I have to search on multiple fields in different languages at the same time.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Spellcheck-in-multilanguage-search-tp1393357p1393431.html
Sent from the Solr - User mailing list archive at Nabble.com.
Sent: Tue 31-08-2010 12:18
To: solr-user@lucene.apache.org
Subject: Spellcheck in multilanguage search
How can spellcheck be configured for multilanguage search? I have to index 17
languages and search across them, and I also want to use spellcheck for
that.
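One common approach (a hedged sketch, not the list's confirmed answer) is to configure one spellcheck dictionary per language field in solrconfig.xml; the field names `spell_en`/`spell_de` and the index directories below are illustrative assumptions, not anything from the original posts:

```xml
<!-- Illustrative sketch: one index-based spellcheck dictionary per language.
     Field and dictionary names (spell_en, spell_de, ...) are hypothetical. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">en</str>
    <str name="field">spell_en</str>
    <str name="spellcheckIndexDir">./spellchecker_en</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">de</str>
    <str name="field">spell_de</str>
    <str name="spellcheckIndexDir">./spellchecker_de</str>
  </lst>
  <!-- ...one <lst> per language, 17 in this poster's case... -->
</searchComponent>
```

At query time the client would then pass `spellcheck.dictionary=en` (or `de`, and so on) to pick the dictionary matching the language of the query.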
On 17 Feb 2009, at 21:26, Grant Ingersoll wrote:
I believe Karl Wettin submitted a Lucene patch for a language
guesser: http://issues.apache.org/jira/browse/LUCENE-826 but it is
marked as won't fix.
The test case of LUCENE-1039 is a language classifier. I've used that
patch to detect the language of documents.
On 2/17/09 12:26 PM, "Grant Ingersoll" wrote:
> If purchasing, several companies offer solutions, but I don't know
> that their quality is any better than what you can get through open
> source, as generally speaking, the problem is solved with a high
> degree of accuracy through n-gram analysis.
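As a rough illustration of the n-gram approach Grant mentions (a toy sketch, not the LUCENE-826/1039 code): build a character trigram profile per language from sample text, then score an input by how much its trigrams overlap each profile.

```python
from collections import Counter

def ngrams(text, n=3):
    """Character n-grams of a lowercased text, padded with spaces."""
    text = f" {text.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def detect(text, profiles):
    """Return the language whose trigram profile overlaps the text the most."""
    grams = ngrams(text)
    def score(profile):
        return sum(min(count, profile[g]) for g, count in grams.items())
    return max(profiles, key=lambda lang: score(profiles[lang]))

# Toy training data; a real system would build profiles from large corpora.
profiles = {
    "en": ngrams("the quick brown fox jumps over the lazy dog and the cat"),
    "de": ngrams("der schnelle braune fuchs springt ueber den faulen hund und die katze"),
}
```

With real corpora per language this simple overlap score already separates most European languages reasonably well, which is why n-gram classifiers are the standard answer here.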
Sent: Tuesday, February 17, 2009 6:39:40 PM
Subject: Re: Multilanguage
Does Apache Tika help find the language of the given document?
On 2/17/09, Till Kinstler wrote:
Paul Libbrecht wrote:
Clearly, then, something that matches words in a dictionary and decides
on the language based on the language of the majority could do a decent
job to decide the analyzer.
…implementation is at the URL below my name.
>
> Otis --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> From: revathy arun
> To: solr-user@lucene.apache.org
> Sent: Tuesday, February 17, 2009 6:39:40 PM
> Subj
Paul Libbrecht wrote:
Clearly, then, something that matches words in a dictionary and decides
on the language based on the language of the majority could do a decent
job to decide the analyzer.
Does such a tool exist?
I once played around with http://ngramj.sourceforge.net/ for language identification.
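The dictionary-majority idea above can be sketched in a few lines (a toy illustration; the tiny word sets are stand-ins for real per-language dictionaries):

```python
def guess_language(text, dictionaries):
    """Pick the language whose dictionary matches the most words in the text."""
    words = text.lower().split()
    counts = {
        lang: sum(1 for w in words if w in vocab)
        for lang, vocab in dictionaries.items()
    }
    return max(counts, key=counts.get)

# Tiny stand-in dictionaries; a real setup would load full word lists.
dictionaries = {
    "en": {"the", "and", "of", "house", "is"},
    "de": {"der", "die", "das", "und", "haus", "ist"},
}
```

In practice matching only high-frequency function words ("the", "der", "und", ...) is enough, since they dominate any running text.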
I was looking for such a tool and haven't found it yet.
Using StandardAnalyzer one can obtain some form of token-stream which
can be used for "agnostic analysis".
Clearly, then, something that matches words in a dictionary and
decides on the language based on the language of the majority could
do a decent job to decide the analyzer.
To: solr-user@lucene.apache.org
Sent: Monday, February 16, 2009 1:42:07 PM
Subject: Multilanguage
Hi,
I have a scenario where I need to convert PDF content to text and then
index it at run time. I do not know what language the PDF would
be; in this case, what is the best solution with respect to the content
field type in the schema where the text content would be indexed?
I recommend that you search both this and the
Lucene list. You'll find that this topic has been
discussed many times, and several approaches
have been outlined.
The searchable archives are linked to from here:
http://lucene.apache.org/java/docs/mailinglists.html.
Best
Erick
On Mon, Feb 16, 2009
Hi,
I have a scenario where I need to convert PDF content to text and then
index it at run time. I do not know what language the PDF would
be; in this case, what is the best solution with respect to the content
field type in the schema where the text content would be indexed?
Thanks
Thank you both for the pointers. For now I am handling it with fuzzy search.
Let's hope this will do for some time :)
Walter Underwood wrote:
> I've done this. There are five cases for the tokens in the search
> index:
>
> 1. Tokens that are unique after stemming (this is good).
> 2. Tokens that are common
Duh. Four cases. For extra credit, what language is "wunder" in?
wunder
On 1/28/09 5:12 PM, "Walter Underwood" wrote:
> I've done this. There are five cases for the tokens in the search
> index:
>
> 1. Tokens that are unique after stemming (this is good).
> 2. Tokens that are common after stem
I've done this. There are five cases for the tokens in the search
index:
1. Tokens that are unique after stemming (this is good).
2. Tokens that are common after stemming (usually trademarks,
like LaserJet).
3. Tokens with collisions after stemming:
German "mit", "MIT" the university
Germ
I'm not entirely sure about the fine points, but consider the
filters that are available that fold all the diacritics into their
low-ascii equivalents. Perhaps using that filter at *both* index
and search time on the English index would do the trick.
In your example, both would be 'munchen'. Strai
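Solr has filters for exactly this folding (e.g. ISOLatin1AccentFilterFactory); the effect Erick describes can be sketched in plain Python using Unicode decomposition (a sketch of the idea, not Solr's implementation):

```python
import unicodedata

def fold_diacritics(text):
    """Lowercase and strip combining marks: 'München' -> 'munchen'."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
```

Applying the same folding at both index and query time is what makes "München" and "munchen" meet in the middle.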
Hi,
I currently have two indexes with Solr: one for the English version and one
for the German version. They use the English and German2 snowball
factories respectively.
Right now, depending on which language the website is currently in, I query the
corresponding index.
There is a requirement though that stuff is found regardless
Hi,
Your problem seems to be lower level than the Solr code. You are sending
an XML request that contains a character that is illegal per the XML spec. You
should strip these characters out of the data that you send, or turn off
XML validation (not recommended because of all kinds of risks).
See
http://www
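Stripping the characters XML 1.0 forbids can be done before sending the update; a sketch, with the allowed ranges taken from the XML 1.0 specification (tab, newline, carriage return, and 0x20 upward excluding the surrogate block and 0xFFFE/0xFFFF):

```python
import re

# Characters allowed by XML 1.0: #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD,
# and the supplementary planes #x10000-#x10FFFF. Everything else is illegal.
_ILLEGAL_XML = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def strip_illegal_xml_chars(text):
    """Remove characters that may not appear in an XML 1.0 document."""
    return _ILLEGAL_XML.sub("", text)
```

Running every field value through such a filter before building the update request avoids the parser error on Solr's side entirely.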
Hi,
I am getting this error in the Tomcat log file on passing Chinese text to
the content field.
The content field uses the CJK tokenizer
and is defined as
INFO: [] webapp=/lang_prototype path=/update params={} status=0 QTime=69
Jan 28, 2009 12:17:03 PM org.apache.solr.common.
Hi,
This is the only info in the tomcat log at indexing
Jan 27, 2009 3:46:15 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/lang_prototype path=/update params={} status=0 QTime=191
I don't see any other errors in the logs.
When I use curl to update I get a success message
and commit
errors: 11
What were those?
My hunch is your indexer had issues. What did Solr output into the
console or log during indexing?
Erik
On Jan 27, 2009, at 6:56 AM, revathy arun wrote:
Hi Shalin,
The admin page stats are as follows
searcherName : searc...@1d4c3d5 main
caching : true
Hi Shalin,
The admin page stats are as follows
searcherName : searc...@1d4c3d5 main
caching : true
numDocs : 0
maxDoc : 0
name: /update
class: org.apache.solr.handler.XmlUpdateRequestHandler
version: $Revision: 690026 $
description: Add documents with XML
stats: handlerStart :
Are you looking for it in the right place? It is very unlikely that a commit
happens and the index is not created.
The index is usually created inside the data directory as configured in your
solrconfig.xml.
Can you search for *:* from the solr admin page and see if documents are
returned?
On Tue, Jan
These are the stats of my update handler,
but I still don't see any index created.
stats: commits : 7
autocommits : 0
optimizes : 2
docsPending : 0
adds : 0
deletesById : 0
deletesByQuery : 0
errors : 0
cumulative_adds : 0
cumulative_deletesById : 0
cumulative_deletesByQuery : 0
cumulative_errors : 0
Hi
I have committed. The admin page does not show any docs pending or committed
or any errors.
Regards
Sujatha
On 1/27/09, Shalin Shekhar Mangar wrote:
>
> Did you commit after the updates?
>
> 2009/1/27 revathy arun
>
> > Hi,
> >
> > I have downloaded Solr 1.3.0.
> >
> > I need to index Chinese
Did you commit after the updates?
2009/1/27 revathy arun
> Hi,
>
> I have downloaded Solr 1.3.0.
>
> I need to index Chinese content; for this I have defined a new field in the
> schema
>
> as
> I believe Solr 1.3 already
Hi,
I have downloaded Solr 1.3.0.
I need to index Chinese content; for this I have defined a new field in the
schema,
as
I believe Solr 1.3 already has the CJKAnalyzer by default.
My schema in the testing stage has only 2 fields.
However, when I index the Chinese text into
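For reference, a hedged sketch of what a CJK field type in a Solr 1.3 schema.xml might look like; the poster's actual definition was stripped from the archive, so the type and field names here are purely illustrative:

```xml
<!-- Illustrative sketch only; the original poster's definition was lost.
     CJKTokenizerFactory shipped with Solr 1.3. -->
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="content" type="text_cjk" indexed="true" stored="true"/>
```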
>>>> Regards
>>>> Sujatha
>>>>
>>>>
>>>>
>>>>
>>>> On 12/18/08, Feak, Todd wrote:
>>>>
>>>>
>>>>> Don't forget to consider scaling concerns (if there are any)
>
> >>
> >>
> >>
> >> On 12/18/08, Feak, Todd wrote:
> >>
> >>> Don't forget to consider scaling concerns (if there are any). There are
> >>> strong differences in the number of searches we receive for each
> >>> lang
>>> if we needed to. We see 2 orders of magnitude difference between our
>>> most popular language and our least popular.
>>>
>>> -Todd Feak
>>>
>>> -Original Message-
>>> From: Julian Davchev [mailto:j...@drun.net]
>>> Sen
: Subject: looking for multilanguage indexing best practice/hint
: References: <49483388.8030...@drun.net>
: <502b8706-828b-4eaa-886d-af0dccf37...@stylesight.com>
: <8c0c601f0812170825j766cf005i9546b2604a19f...@mail.gmail.com
you can pre-define some base query parts and also do score boosting
behind the scenes.
I hope it helps.
Regards,
Daniel
-Original Message-
From: Sujatha Arun [mailto:suja.a...@gmail.com]
Sent: 18 December 2008 04:15
To: solr-user@lucene.apache.org
Subject: Re: looking for multilanguage
ed to. We see 2 orders of magnitude difference between our
most popular language and our least popular.
-Todd Feak
-Original Message-
From: Julian Davchev [mailto:j...@drun.net]
Sent: Wednesday, December 17, 2008 11:31 AM
To: solr-user@lucene.apache.org
Subject: looking for multilan
so far it seems that I will use a single
> schema; at least I don't see a scenario where I'd need more than that.
> So the question is how do I approach multilanguage indexing and multilang
> searching. Will it really make sense for just searching a word, or rather
> should I supply l
Hi,
From my study of Solr and Lucene so far it seems that I will use a single
schema; at least I don't see a scenario where I'd need more than that.
So the question is how do I approach multilanguage indexing and multilanguage
searching. Will it really make sense for just searching a word, or ra
't use it.
My Schema:
http://www.nabble.com/file/p19875539/schema.xml schema.xml
Thanks a lot,
--
View this message in context:
http://www.nabble.com/solr-1.2-to-solr-1.3%2C-manage-multilanguage-error--tp19875539p19875539.html
Sent from the Solr - User mailing list archive at Nabble.com.
Thanks Ryan,
I hadn't thought of Analyzers.
I chose the solution with 2 distinct indexes.
--
D
ryan mckinley wrote:
>
> One thing to consider is you will probably want to use french analyzers
> for the french app and english ones for the english app... depending on
> how you configure schema.xml
If your french/english apps really don't need to share data, I don't
think there is any general rule -- the choice will come down to your
personal taste...
One thing to consider is you will probably want to use french analyzers
for the french app and english ones for the english app... depend
Hi,
I work with some "documents"; each document has the same structure.
But I have about 150,000 documents in French and about 200,000 documents in
English.
I also have 2 different applications (one in French and one in English).
The French app makes queries only on French docs and the English app mak