Re: Analyzers thread safe across indexes?

2019-01-25 Thread Bill Gray
Thank you Adrien.

On Thu, Jan 24, 2019 at 10:10 PM Adrien Grand wrote:
> Hi Bill,
>
> Yes, reusing analyzers across different indexes is safe.
>
> Tokenizers (and some token filters and char filters) are stateful, but
> state caching is performed in a ThreadLocal so no state is shared
> between

Re: Analyzers thread safe across indexes?

2019-01-24 Thread Adrien Grand
Hi Bill,

Yes, reusing analyzers across different indexes is safe.

Tokenizers (and some token filters and char filters) are stateful, but state caching is performed in a ThreadLocal so no state is shared between threads.

On Fri, Jan 25, 2019 at 1:04 AM Bill Gray wrote:
>
> Hi,
>
> I'm working on
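
The ThreadLocal point above can be illustrated with a minimal JDK-only sketch. This is not Lucene's code: the class SharedAnalyzerSketch, its TokenizerState and the analyze method are hypothetical stand-ins, assumed here only to show how one shared object can hand each thread its own private mutable state.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// JDK-only illustration; these classes are hypothetical and are not Lucene's.
class SharedAnalyzerSketch {
    // Counts how many per-thread state objects have been created.
    static final AtomicInteger created = new AtomicInteger();

    // Stand-in for a stateful tokenizer: it owns a mutable buffer.
    static final class TokenizerState {
        final StringBuilder buffer = new StringBuilder();
        TokenizerState() { created.incrementAndGet(); }
    }

    // One shared instance; each thread lazily receives its own TokenizerState.
    private final ThreadLocal<TokenizerState> perThread =
            ThreadLocal.withInitial(TokenizerState::new);

    // Lowercases the text using the calling thread's private buffer.
    String analyze(String text) {
        TokenizerState state = perThread.get();
        state.buffer.setLength(0); // reset the reused state before each call
        state.buffer.append(text.toLowerCase(Locale.ROOT));
        return state.buffer.toString();
    }

    // Uses one shared instance from two threads and reports how many states were created.
    static int statesCreatedByTwoThreads() {
        created.set(0);
        SharedAnalyzerSketch shared = new SharedAnalyzerSketch();
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) {
                shared.analyze("Hello World " + i);
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return created.get();
    }

    public static void main(String[] args) {
        // One state object per worker thread, none shared between them.
        System.out.println(statesCreatedByTwoThreads());
    }
}
```

Because every thread reads and writes only its own buffer, the shared object needs no locking, which is the property that makes sharing one instance across threads (and, per the reply above, across indexes) safe in this toy model.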

Re: analyzers-common VS analyzers-icu

2016-06-08 Thread Daniel Bigham
Any other replies to this? Timothy's response was somewhat helpful but hasn't answered in an authoritative way what the current status of these two different "forks" of language analyzers is. Surely there is some history here and some high level status about them? (perhaps I should look at git a

RE: analyzers-common VS analyzers-icu

2016-06-01 Thread Allison, Timothy B.
That package has an ICU tokenizer and the ICUFoldingFilter. The ICUFoldingFilter does advanced (well, Unicode compliant) case folding/lowercasing/normalization and is critical for non-ASCII languages. You can use that in place of the ASCIIFoldingFilter and the LowerCaseFilter, and it should

Re: analyzers for Thai, Telugu, Vietnamese, Korean, Urdu,...

2014-11-09 Thread Olivier Binda
On 11/09/2014 04:52 PM, Ahmet Arslan wrote:
> Hi,
>
> Thai has this, for example: org.apache.lucene.analysis.th.ThaiAnalyzer
>
> Ahmet

Thanks Ahmet (I'm now using it).

What about the Korean analyzer developed in LUCENE-4956? It looks like there

Re: analyzers for Thai, Telugu, Vietnamese, Korean, Urdu,...

2014-11-09 Thread Ahmet Arslan
Hi,

Thai has this, for example: org.apache.lucene.analysis.th.ThaiAnalyzer

Ahmet

On Saturday, November 8, 2014 12:48 PM, Olivier Binda wrote:
> Hello
>
> What should I use for analysing languages like Thai, Telugu, Vietnamese, Korean, Urdu? The StandardAnalyzer? The ICUAnalyzer? It doesn't l

Re: analyzers for Thai, Telugu, Vietnamese, Korean, Urdu,...

2014-11-08 Thread Erick Erickson
There are a bunch of different examples in the schema file that should point you in the right direction; whether these specific languages are supported is an open question, though.

Best,
Erick

On Sat, Nov 8, 2014 at 2:47 AM, Olivier Binda wrote:
> Hello
>
> What should I use for analysing languag

RE: Analyzers aren't reusable?? (lucene 4.2.1)

2013-12-05 Thread Scott Smith
That was an easy fix. Everything works as expected now.

Thanks again.

-----Original Message-----
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Thursday, December 05, 2013 1:46 PM
To: java-user@lucene.apache.org
Subject: RE: Analyzers aren't reusable?? (lucene 4.2.1)

The problem i

RE: Analyzers aren't reusable?? (lucene 4.2.1)

2013-12-05 Thread Scott Smith
Thanks for the quick response. I'll read through the references.

Thanks again
Scott

-----Original Message-----
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Thursday, December 05, 2013 1:46 PM
To: java-user@lucene.apache.org
Subject: RE: Analyzers aren't reusable?? (lucene 4

RE: Analyzers aren't reusable?? (lucene 4.2.1)

2013-12-05 Thread Uwe Schindler
The problem is the CharFilter, which cannot be reused. To implement the Analyzer correctly, do the wrapping of the incoming Reader in the protected initReader(): http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/analysis/Analyzer.html#initReader(java.lang.String, java.io.Reader). In creat
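
A rough JDK-only sketch of the pattern described above: the tokenizing components are built once and reused, while the Reader wrapping happens in a hook that runs for every new input. MiniAnalyzer, Components and DashToSpaceReader are hypothetical stand-ins and do not reproduce Lucene's Analyzer, TokenStreamComponents or CharFilter; only the hook name initReader is borrowed from the message.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// JDK-only illustration; every class here is hypothetical and none is Lucene's.
class InitReaderSketch {

    // Stand-in for a char filter: rewrites '-' to ' ' as characters are read.
    // It is bound to a single Reader, so each input needs a fresh wrapper.
    static final class DashToSpaceReader extends FilterReader {
        DashToSpaceReader(Reader in) { super(in); }

        @Override
        public int read() throws IOException {
            int c = super.read();
            return c == '-' ? ' ' : c;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            for (int i = off; i < off + n; i++) {
                if (buf[i] == '-') buf[i] = ' ';
            }
            return n;
        }
    }

    // Stand-in for the reusable components: created once, re-pointed at each Reader.
    static final class Components {
        private Reader reader;

        void setReader(Reader r) { reader = r; }

        String readAll() {
            try {
                StringBuilder sb = new StringBuilder();
                char[] buf = new char[256];
                int n;
                while ((n = reader.read(buf, 0, buf.length)) != -1) {
                    sb.append(buf, 0, n);
                }
                return sb.toString();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    static class MiniAnalyzer {
        private Components cached; // built on first use, then reused

        // Hook that runs for every input; subclasses wrap the Reader here.
        protected Reader initReader(Reader reader) { return reader; }

        final String analyze(String text) {
            Reader wrapped = initReader(new StringReader(text));
            if (cached == null) cached = new Components();
            cached.setReader(wrapped);
            return cached.readAll();
        }
    }

    // An analyzer whose per-input hook applies the dash-to-space wrapper.
    static MiniAnalyzer dashStripping() {
        return new MiniAnalyzer() {
            @Override
            protected Reader initReader(Reader reader) {
                return new DashToSpaceReader(reader);
            }
        };
    }

    public static void main(String[] args) {
        MiniAnalyzer analyzer = dashStripping();
        System.out.println(analyzer.analyze("full-text"));
        System.out.println(analyzer.analyze("re-use")); // the second input is filtered too
    }
}
```

In this toy model, wrapping the Reader where the components are created would affect only the first input, because later calls merely swap in a new Reader; doing it in the per-call hook avoids that.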

Re: Analyzers

2007-02-08 Thread Chris Lu
t be done?

-----Original Message-----
From: Erick Erickson [mailto:[EMAIL PROTECTED]
Sent: 08 February 2007 17:34
To: java-user@lucene.apache.org
Subject: Re: Analyzers

Use PerFieldAnalyzerWrapper.

On 2/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:
>
> Hi all,
>
> I wanted t

Re: Analyzers

2007-02-08 Thread karl wettin
On 8 Feb 2007, at 18:36, Kainth, Sachin wrote:
> Can you give me an example of how this might be done?

The Javadocs are generally a good place to start:

http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/PerFieldAnalyzerWrapper.html

--
karl

RE: Analyzers

2007-02-08 Thread Kainth, Sachin
Can you provide an example?

-----Original Message-----
From: Chris Lu [mailto:[EMAIL PROTECTED]
Sent: 08 February 2007 17:35
To: java-user@lucene.apache.org
Subject: Re: Analyzers

This is totally possible.

--
Chris Lu
Instant Full-Text Search On Any Database

RE: Analyzers

2007-02-08 Thread Kainth, Sachin
Can you give me an example of how this might be done?

-----Original Message-----
From: Erick Erickson [mailto:[EMAIL PROTECTED]
Sent: 08 February 2007 17:34
To: java-user@lucene.apache.org
Subject: Re: Analyzers

Use PerFieldAnalyzerWrapper.

On 2/8/07, Kainth, Sachin <[EMAIL PROTECTED]>

Re: Analyzers

2007-02-08 Thread Chris Lu
This is totally possible.

--
Chris Lu
Instant Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com

On 2/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I wanted to know if it is possible to store some fields

Re: Analyzers

2007-02-08 Thread Erick Erickson
Use PerFieldAnalyzerWrapper.

On 2/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I wanted to know if it is possible to store some fields in an index with one analyzer and other fields with another analyzer?
>
> Cheers
>
> Sachin
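
Since an example was requested in this thread, here is a minimal JDK-only sketch of the idea behind per-field analysis: look up an analyzer by field name and fall back to a default. It is not Lucene's PerFieldAnalyzerWrapper; the class PerFieldSketch and the two toy analyzers are hypothetical, and the exact constructor or methods of the real class depend on the Lucene version, so check its Javadoc.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// JDK-only illustration of per-field dispatch; hypothetical, not Lucene's API.
class PerFieldSketch {
    // Keeps the whole value as one token, as you might want for an ID field.
    static final Function<String, List<String>> KEYWORD =
            text -> Collections.singletonList(text);

    // Lowercases and splits on whitespace, as you might want for body text.
    static final Function<String, List<String>> WHITESPACE_LOWER =
            text -> Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));

    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField;

    PerFieldSketch(Function<String, List<String>> defaultAnalyzer,
                   Map<String, Function<String, List<String>>> perField) {
        this.defaultAnalyzer = defaultAnalyzer;
        this.perField = new HashMap<>(perField); // defensive copy
    }

    // Picks the analyzer registered for the field, or the default if there is none.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // Example wiring: "id" is kept verbatim, every other field is tokenized.
    static PerFieldSketch example() {
        Map<String, Function<String, List<String>>> overrides = new HashMap<>();
        overrides.put("id", KEYWORD);
        return new PerFieldSketch(WHITESPACE_LOWER, overrides);
    }

    public static void main(String[] args) {
        PerFieldSketch analyzers = example();
        System.out.println(analyzers.analyze("id", "AB 12"));
        System.out.println(analyzers.analyze("body", "Hello  World"));
    }
}
```

With the real wrapper, the same wrapper instance would typically be used at both index time and query time so that each field is analyzed consistently.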

Re: Analyzers and multiple languages (language detection)

2006-11-21 Thread Bob Carpenter
Antony Bowesman wrote:
> Hello,
>
> I'm new to Lucene and wanted some advice on analyzers, stemmers and language analysis. I've got LIA, so have read its chapters.
>
> I am writing a framework that needs to be able to index documents from a range of languages where just the character set of the docu

Re: Analyzers and multiple languages

2006-10-13 Thread Erik Hatcher
On Oct 13, 2006, at 3:42 AM, Antony Bowesman wrote:
> I am writing a framework that needs to be able to index documents from a range of languages where just the character set of the document is known. Has anyone looked at or is using language analysis to determine the language of a document

Re: Analyzers and multiple languages

2006-10-13 Thread Soeren Pekrul
Hello Antony,

I have a similar problem. My collection contains mainly German documents, but some in English and a few in French, Spanish and Latin. I know that each language has its own stemming rules. Language detection is not my domain. But I can imagine it could be possible to detect the lang

Re: Analyzers and multiple languages

2006-10-13 Thread Mark Miller
Generally, stemming is not a method for index size reduction, even though that might be a side effect. It is very useful in search, however: you would generally want a search for skiing to also hit ski and skier (I can't spell so don't get caught up on that). There are lots of those examples...if y

Re: Analyzers and multiple languages

2006-10-13 Thread Erick Erickson
This won't be *really* helpful, but I remember this being discussed at some length a while ago. You'd be able to see some good info if you searched the list archive, probably for "language". I didn't pay much attention since this isn't something I'm concerned with lately, so I can't be much real hel