RE: Spellcheck in multilanguage search

2010-08-31 Thread Grijesh.singh
I have to search multiple fields in different languages at the same time. -- View this message in context: http://lucene.472066.n3.nabble.com/Spellcheck-in-multilanguage-search-tp1393357p1393431.html Sent from the Solr - User mailing list archive at Nabble.com.

RE: Spellcheck in multilanguage search

2010-08-31 Thread Markus Jelsma
: Tue 31-08-2010 12:18 To: solr-user@lucene.apache.org; Subject: Spellcheck in multilanguage search How can spellcheck be configured for multilanguage search? I have to index 17 languages in my indexes and search on them, and I also want to use spellcheck for that.

Spellcheck in multilanguage search

2010-08-31 Thread Grijesh.singh
How can spellcheck be configured for multilanguage search? I have to index 17 languages in my indexes and search on them, and I also want to use spellcheck for that.

Re: Multilanguage

2009-03-04 Thread Karl Wettin
On 17 Feb 2009 at 21:26, Grant Ingersoll wrote: I believe Karl Wettin submitted a Lucene patch for a Language guesser: http://issues.apache.org/jira/browse/LUCENE-826 but it is marked as won't fix. The test case of LUCENE-1039 is a language classifier. I've used the patch to detect languages of

Re: Multilanguage

2009-02-17 Thread Walter Underwood
On 2/17/09 12:26 PM, "Grant Ingersoll" wrote: > If purchasing, several companies offer solutions, but I don't know > that their quality is any better than what you can get through open > source, as generally speaking, the problem is solved with a high > degree of accuracy through n-gram analysis.
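The n-gram approach mentioned in this reply can be illustrated with a rough Python sketch. It is not the code of any tool named in the thread: the two sample sentences, the `SAMPLES` table, and the function names are hypothetical, and a real detector would build its profiles from far larger corpora.

```python
from collections import Counter

# Hypothetical toy training text; real profiles need much more data.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is the house that we have seen",
    "de": "der schnelle braune fuchs springt über den faulen hund und das ist das haus das wir gesehen haben",
}


def trigrams(text):
    """Count character trigrams of the lowercased, space-padded text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


# One trigram profile per language, built once from the samples.
PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}


def guess_language(text):
    """Return the language whose profile shares the most trigrams with text."""
    grams = trigrams(text)

    def overlap(profile):
        return sum(min(count, profile[gram]) for gram, count in grams.items())

    return max(PROFILES, key=lambda lang: overlap(PROFILES[lang]))
```

Calling `guess_language("das ist das haus")` picks the German profile because every trigram of the input occurs in the German sample. With profiles this small, text outside the sample vocabulary can easily be misclassified.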

Re: Multilanguage

2009-02-17 Thread Grant Ingersoll
uesday, February 17, 2009 6:39:40 PM Subject: Re: Multilanguage Does Apache Tika help find the language of the given document? On 2/17/09, Till Kinstler wrote: Paul Libbrecht wrote: Clearly, then, something that matches words in a dictionary and decides on the language based on the langu

Re: Multilanguage

2009-02-17 Thread revathy arun
ementation is at the URL below my name. > > Otis -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > > > From: revathy arun > To: solr-user@lucene.apache.org > Sent: Tuesday, February 17, 2009 6:39:40 PM > Subj

Re: Multilanguage

2009-02-17 Thread Otis Gospodnetic
- Solr - Nutch From: revathy arun To: solr-user@lucene.apache.org Sent: Tuesday, February 17, 2009 6:39:40 PM Subject: Re: Multilanguage Does Apache Tika help find the language of the given document? On 2/17/09, Till Kinstler wrote: > > Paul Libbrecht wrote: > >

Re: Multilanguage

2009-02-17 Thread revathy arun
Does Apache Tika help find the language of the given document? On 2/17/09, Till Kinstler wrote: > > Paul Libbrecht wrote: > > Clearly, then, something that matches words in a dictionary and decides on >> the language based on the language of the majority could do a decent job to >> decide the

Re: Multilanguage

2009-02-17 Thread Till Kinstler
Paul Libbrecht wrote: Clearly, then, something that matches words in a dictionary and decides on the language based on the language of the majority could do a decent job to decide the analyzer. Does such a tool exist? I once played around with http://ngramj.sourceforge.net/ for language

Re: Multilanguage

2009-02-17 Thread Paul Libbrecht
I was looking for such a tool and haven't found it yet. Using StandardAnalyzer one can obtain some form of token-stream which can be used for "agnostic analysis". Clearly, then, something that matches words in a dictionary and decides on the language based on the language of the majority could
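The dictionary-majority idea described in this message might look roughly like the Python sketch below. The tiny word lists in `STOPWORDS` and the function name are hypothetical and stand in for real per-language dictionaries. Tokenization here is a plain regex, not Lucene's StandardAnalyzer.

```python
import re

# Hypothetical miniature dictionaries; a real setup would load full word lists.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein"},
    "fr": {"le", "la", "les", "et", "est", "un", "une", "dans"},
}


def guess_by_dictionary(text):
    """Pick the language whose dictionary matches the most tokens, or None."""
    tokens = re.findall(r"\w+", text.lower())
    hits = {
        lang: sum(1 for token in tokens if token in words)
        for lang, words in STOPWORDS.items()
    }
    best = max(hits, key=hits.get)
    # No dictionary hit at all means there is nothing to base a decision on.
    return best if hits[best] > 0 else None
```

The result could then select the analyzer or field type for the document. Ties are resolved by dictionary order here, which a production version would want to handle more carefully.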

Re: Multilanguage

2009-02-16 Thread Otis Gospodnetic
@lucene.apache.org Sent: Monday, February 16, 2009 1:42:07 PM Subject: Multilanguage Hi, I have a scenario where I need to convert PDF content to text and then index it at run time. I do not know what language the PDF will be in. In this case, what is the best solution I have with respect to the

Re: Multilanguage

2009-02-16 Thread Erick Erickson
I recommend that you search both this and the Lucene list. You'll find that this topic has been discussed many times, and several approaches have been outlined. The searchable archives are linked to from here: http://lucene.apache.org/java/docs/mailinglists.html. Best Erick On Mon, Feb 16, 2009

Multilanguage

2009-02-15 Thread revathy arun
Hi, I have a scenario where I need to convert PDF content to text and then index it at run time. I do not know what language the PDF will be in. In this case, what is the best solution I have with respect to the content field type in the schema where the text content would be indexed? Th

Re: multilanguage + howto search in all languages?

2009-01-29 Thread Julian Davchev
Thank you both for the points. For now I am handling it with fuzzy search. Let's hope this will do for some time :) Walter Underwood wrote: > I've done this. There are five cases for the tokens in the search > index: > > 1. Tokens that are unique after stemming (this is good). > 2. Tokens that are common

Re: multilanguage + howto search in all languages?

2009-01-28 Thread Walter Underwood
Duh. Four cases. For extra credit, what language is "wunder" in? wunder On 1/28/09 5:12 PM, "Walter Underwood" wrote: > I've done this. There are five cases for the tokens in the search > index: > > 1. Tokens that are unique after stemming (this is good). > 2. Tokens that are common after stem

Re: multilanguage + howto search in all languages?

2009-01-28 Thread Walter Underwood
I've done this. There are five cases for the tokens in the search index: 1. Tokens that are unique after stemming (this is good). 2. Tokens that are common after stemming (usually trademarks, like LaserJet). 3. Tokens with collisions after stemming: German "mit", "MIT" the university Germ

Re: multilanguage + howto search in all languages?

2009-01-28 Thread Erick Erickson
I'm not entirely sure about the fine points, but consider the filters that are available that fold all the diacritics into their low-ascii equivalents. Perhaps using that filter at *both* index and search time on the English index would do the trick. In your example, both would be 'munchen'. Strai
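The folding step suggested here can be illustrated outside Solr with a short Python sketch. It is a stand-in for the idea, not the Solr filter itself: the function name is hypothetical, and the lowercasing is added only so the output matches the 'munchen' example.

```python
import unicodedata


def fold_to_ascii(text):
    """Strip combining marks after Unicode decomposition, then lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()
```

Applying the same function to indexed text and to the query is what makes "München" and "Munchen" meet on the same token. Characters without a decomposition, such as "ß", pass through unchanged in this sketch.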

multilanguage + howto search in all languages?

2009-01-28 Thread Julian Davchev
Hi, I currently have two indexes with Solr: one for the English version and one for the German version. They use the english/german2 Snowball factory respectively. Right now, depending on which language the website is currently in, I query the corresponding index. There is a requirement, though, that stuff is found regardless

Re: multilanguage prototype

2009-01-28 Thread Jerven Bolleman
Hi, Your problem seems to be lower level than the Solr code. You are sending an XML request that contains a character that is illegal under the XML spec. You should strip these characters out of the data that you send, or turn off the XML validation (not recommended because of all kinds of risks). See http://www
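Stripping the offending characters before posting could be done along these lines in Python. This is a sketch on the client side, assuming the data is available as a Unicode string; the function name is hypothetical. The character ranges are the ones XML 1.0 allows.

```python
import re

# Anything outside the XML 1.0 "Char" production: tab, LF, CR,
# U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF are legal.
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text):
    """Remove characters that may not appear in an XML 1.0 document."""
    return _ILLEGAL_XML_CHARS.sub("", text)
```

Chinese text passes through untouched. Only control characters such as NUL or vertical tab, and non-characters such as U+FFFE, are dropped. Markup characters like "<" and "&" still need normal XML escaping afterwards.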

Re: multilanguage prototype

2009-01-27 Thread revathy arun
Hi, I am getting this error in the Tomcat log file on passing Chinese text to the content field. The content field uses the CJK tokenizer and is defined as INFO: [] webapp=/lang_prototype path=/update params={} status=0 QTime=69 Jan 28, 2009 12:17:03 PM org.apache.solr.common.

Re: multilanguage prototype

2009-01-27 Thread revathy arun
Hi, This is the only info in the Tomcat log at indexing: Jan 27, 2009 3:46:15 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/lang_prototype path=/update params={} status=0 QTime=191 I don't see any other errors in the logs. When I use curl to update I get a success message. and commit

Re: multilanguage prototype

2009-01-27 Thread Erik Hatcher
errors: 11 What were those? My hunch is your indexer had issues. What did Solr output into the console or log during indexing? Erik On Jan 27, 2009, at 6:56 AM, revathy arun wrote: Hi Shalin, The admin page stats are as follows searcherName : searc...@1d4c3d5 main caching : true

Re: multilanguage prototype

2009-01-27 Thread revathy arun
Hi Shalin, The admin page stats are as follows searcherName : searc...@1d4c3d5 main caching : true numDocs : 0 maxDoc : 0 *name: * /update *class: * org.apache.solr.handler.XmlUpdateRequestHandler *version: * $Revision: 690026 $ *description: * Add documents with XML * stats: *handlerStart :

Re: multilanguage prototype

2009-01-27 Thread Shalin Shekhar Mangar
Are you looking for it in the right place? It is very unlikely that a commit happens and the index is not created. The index is usually created inside the data directory as configured in your solrconfig.xml. Can you search for *:* from the Solr admin page and see if documents are returned? On Tue, Jan
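The same match-all check can be run outside the admin page by requesting the standard /select handler. The Python sketch below only builds the URL. The host and port are assumptions, the /lang_prototype webapp path is taken from the log lines quoted elsewhere in this thread, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def match_all_url(base_url, rows=10):
    """Build a Solr select URL for the match-all query *:*."""
    params = urlencode({"q": "*:*", "rows": rows})
    return f"{base_url}/select?{params}"


# Assumed host and port; adjust to the actual Tomcat instance.
url = match_all_url("http://localhost:8080/lang_prototype")
```

Opening that URL in a browser or with curl returns the response XML. A numFound of 0 in the result element would confirm that no documents reached the index.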

Re: multilanguage prototype

2009-01-27 Thread revathy arun
These are the stats of my update handler, but I still don't see any index created: *stats: *commits : 7 autocommits : 0 optimizes : 2 docsPending : 0 adds : 0 deletesById : 0 deletesByQuery : 0 errors : 0 cumulative_adds : 0 cumulative_deletesById : 0 cumulative_deletesByQuery : 0 cumulative_errors : 0

Re: multilanguage prototype

2009-01-27 Thread revathy arun
Hi, I have committed. The admin page does not show any docs pending or committed or any errors. Regards Sujatha On 1/27/09, Shalin Shekhar Mangar wrote: > > Did you commit after the updates? > > 2009/1/27 revathy arun > > > Hi, > > > > I have downloaded solr1.3.0 . > > > > I need to index chines

Re: multilanguage prototype

2009-01-27 Thread Shalin Shekhar Mangar
Did you commit after the updates? 2009/1/27 revathy arun > Hi, > > I have downloaded solr1.3.0 . > > I need to index Chinese content; for this I have defined a new field in the > schema > > as > > > positionIncrementGap="100"> > > I believe solr1.3 already

multilanguage prototype

2009-01-27 Thread revathy arun
Hi, I have downloaded Solr 1.3.0. I need to index Chinese content; for this I have defined a new field in the schema as I believe solr1.3 already has the cjkanalyzer by default. My schema in the testing stage has only 2 fields. However, when I index the Chinese text into

Re: looking for multilanguage indexing best practice/hint

2008-12-21 Thread Julian Davchev
>>>> Regards >>>> Sujatha >>>> >>>> >>>> >>>> >>>> On 12/18/08, Feak, Todd wrote: >>>> >>>> >>>>> Don't forget to consider scaling concerns (if there are any)

Re: looking for multilanguage indexing best practice/hint

2008-12-19 Thread Sujatha Arun
> > >> > >> > >> > >> On 12/18/08, Feak, Todd wrote: > >> > >>> Don't forget to consider scaling concerns (if there are any). There are > >>> strong differences in the number of searches we receive for each > >>> lang

Re: looking for multilanguage indexing best practice/hint

2008-12-18 Thread Julian Davchev
>>> if we needed to. We see 2 orders of magnitude difference between our >>> most popular language and our least popular. >>> >>> -Todd Feak >>> >>> -Original Message- >>> From: Julian Davchev [mailto:j...@drun.net] >>> Sen

Re: looking for multilanguage indexing best practice/hint

2008-12-18 Thread Chris Hostetter
: Subject: looking for multilanguage indexing best practice/hint : References: <49483388.8030...@drun.net> : <502b8706-828b-4eaa-886d-af0dccf37...@stylesight.com> : <8c0c601f0812170825j766cf005i9546b2604a19f...@mail.gmail.com

RE: looking for multilanguage indexing best practice/hint

2008-12-18 Thread Daniel Alheiros
you can pre-define some base query parts and also do score boosting behind the scenes. I hope it helps. Regards, Daniel -Original Message- From: Sujatha Arun [mailto:suja.a...@gmail.com] Sent: 18 December 2008 04:15 To: solr-user@lucene.apache.org Subject: Re: looking for multilanguage

Re: looking for multilanguage indexing best practice/hint

2008-12-18 Thread Erick Erickson
rs of magnitude difference between our > > most popular language and our least popular. > > > > -Todd Feak > > > > -Original Message- > > From: Julian Davchev [mailto:j...@drun.net] > > Sent: Wednesday, December 17, 2008 11:31 AM > > To: solr-u

Re: looking for multilanguage indexing best practice/hint

2008-12-17 Thread Sujatha Arun
of magnitude difference between our > most popular language and our least popular. > > -Todd Feak > > -Original Message- > From: Julian Davchev [mailto:j...@drun.net] > Sent: Wednesday, December 17, 2008 11:31 AM > To: solr-user@lucene.apache.org > Subject: loo

RE: looking for multilanguage indexing best practice/hint

2008-12-17 Thread Feak, Todd
ed to. We see 2 orders of magnitude difference between our most popular language and our least popular. -Todd Feak -Original Message- From: Julian Davchev [mailto:j...@drun.net] Sent: Wednesday, December 17, 2008 11:31 AM To: solr-user@lucene.apache.org Subject: looking for multilan

Re: looking for multilanguage indexing best practice/hint

2008-12-17 Thread Alexander Ramos Jardim
so far it seems that I will use a single > schema. At least I don't see a scenario where I'd need more than that. > So the question is how do I approach multilanguage indexing and multilang > searching. Will it really make sense for just searching word..or rather > I should supply l

looking for multilanguage indexing best practice/hint

2008-12-17 Thread Julian Davchev
Hi, From my study of Solr and Lucene so far, it seems that I will use a single schema. At least I don't see a scenario where I'd need more than that. So the question is how do I approach multilanguage indexing and multilang searching. Will it really make sense for just searching word..or ra

solr 1.2 to solr 1.3, manage multilanguage error?

2008-10-08 Thread sunnyfr
't use it. My schema: http://www.nabble.com/file/p19875539/schema.xml Thanks a lot,

Re: Same index OR multiple indexes/webapps (multilanguage document)

2007-05-28 Thread John.Smith
Thanks Ryan, I hadn't thought about analyzers. I chose the solution with 2 distinct indexes. -- D ryan mckinley wrote: > > One thing to consider is you will probably want to use french analyzers > for the french app and english ones for the english app... depending on > how you configure schema.xml

Re: Same index OR multiple indexes/webapps (multilanguage document)

2007-05-28 Thread Ryan McKinley
If your french/english apps really don't need to share data, I don't think there is any general rule -- the choice will come down to your personal taste... One thing to consider is you will probably want to use french analyzers for the french app and english ones for the english app... depend

Same index OR multiple indexes/webapps (multilanguage document)

2007-05-28 Thread John.Smith
Hi, I work with some "documents". Each document has the same structure, but I have about 150,000 documents in French and about 200,000 documents in English. I also have 2 different applications (one in French and one in English). The French app makes queries only on French docs and the English app mak