Re: Custom indexing

2016-04-19 Thread Uwe Schindler
Hi, > The main use case is searching in file names. For example, lucene.txt, > lucene_new.txt, lucene_1_new.txt. If I use 'lucene', I need to get all 3 > files. with 'new' I need to get last two files. Please note that Standard > analyzer/tokenizer of lucene 3.6 is not giving us the results with >

Re: Custom indexing

2016-04-18 Thread Ahmet Arslan
Hi, Please try letter tokenizer, it should cover your example. Ahmet On Monday, April 18, 2016 3:02 PM, PK C wrote: Hi, Thank you very much for your quick responses. Jack Krupansky, The main use case is searching in file names. For example, lucene.txt,
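LetterTokenizer emits maximal runs of letters and discards everything else, so it splits on ".", "_", and digits alike. As a rough stdlib-only simulation of that splitting rule (not the actual Lucene class), the behavior on the filenames from this thread looks like this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LetterSplitDemo {
    // Approximates LetterTokenizer: tokens are maximal runs of letters;
    // digits and punctuation act as delimiters and are dropped.
    static List<String> letterTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String part : text.split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                out.add(part);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(letterTokens("lucene.txt"));       // [lucene, txt]
        System.out.println(letterTokens("lucene_new.txt"));   // [lucene, new, txt]
        System.out.println(letterTokens("lucene_1_new.txt")); // [lucene, new, txt]
    }
}
```

Under this rule a query for 'lucene' matches all three filenames and 'new' matches the last two, as the original poster wanted. One caveat: purely numeric parts like the "1" in lucene_1_new.txt are dropped entirely, so they would not be searchable.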

Re: Custom indexing

2016-04-18 Thread Jack Krupansky
You failed to disclose up front that you are using such an old release of Lucene. Lucene is now on 6.0. I'll defer to others if they wish to provide support for such an old release. -- Jack Krupansky On Mon, Apr 18, 2016 at 8:01 AM, PK C wrote: > Hi, > >Thank you

Re: Custom indexing

2016-04-18 Thread PK C
Hi, Thank you very much for your quick responses. Jack Krupansky, The main use case is searching in file names. For example, lucene.txt, lucene_new.txt, lucene_1_new.txt. If I use 'lucene', I need to get all 3 files. With 'new', I need to get the last two files. Please note that Standard

Re: Custom indexing

2016-04-12 Thread Jack Krupansky
The standard analyzer/tokenizer should do a decent job of splitting on dot, hyphen, and underscore, in addition to whitespace and other punctuation. Can you post some specific test cases you are concerned with? (You should always run some test cases.) -- Jack Krupansky On Tue, Apr 12, 2016 at

Re: Custom indexing

2016-04-12 Thread Ahmet Arslan
Hi Chamarty, Well, there are a lot of options here. 1) Use LetterTokenizer 2) Use WordDelimiterFilter combined with WhitespaceTokenizer 3) Use MappingCharFilter to replace those characters with spaces . . . Ahmet On Tuesday, April 12, 2016 3:58 PM, PrasannaKumar Chamarty
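As a rough illustration of option 3, MappingCharFilter can rewrite "." and "_" to spaces before a whitespace tokenizer runs. The stdlib-only sketch below mimics that pipeline rather than using the real org.apache.lucene.analysis classes; note that, unlike LetterTokenizer, this approach keeps numeric parts as tokens:

```java
import java.util.Arrays;
import java.util.List;

public class MappingFilterDemo {
    // Mimics MappingCharFilter ("." -> " ", "_" -> " ") followed by
    // whitespace tokenization. Digits survive as their own tokens.
    static List<String> tokens(String text) {
        String mapped = text.replace('.', ' ').replace('_', ' ');
        return Arrays.asList(mapped.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(tokens("lucene_1_new.txt")); // [lucene, 1, new, txt]
    }
}
```

Option 2 (WordDelimiterFilter over WhitespaceTokenizer) would produce similar splits and can additionally break on letter/digit and case transitions, which this sketch does not attempt to model.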

Jackrabbit - Custom indexing

2016-04-12 Thread PrasannaKumar Chamarty
Hi, What is the best way (in terms of maintenance required with new lucene releases) to allow splitting of words (into tokens) on "." and "_" for indexing? Please note that I am using lucene through Jackrabbit. Jackrabbit's Search configuration can be found at
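For reference, Jackrabbit lets you swap the analyzer used for fulltext indexing via the SearchIndex element of workspace.xml. A sketch of pointing it at a custom analyzer class is below; com.example.FilenameAnalyzer is a hypothetical class name, and the exact parameter set should be checked against the documentation for your Jackrabbit version:

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- "analyzer" takes a fully qualified Lucene Analyzer class name;
       com.example.FilenameAnalyzer is a hypothetical custom analyzer -->
  <param name="analyzer" value="com.example.FilenameAnalyzer"/>
</SearchIndex>
```

Keeping the custom splitting logic in one analyzer class is also the lower-maintenance choice the poster asks about: only that class needs attention when upgrading Lucene or Jackrabbit.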