RE: IndexUpdater (4.4.0) fails when -verbose is not set

2013-09-16 Thread Uwe Schindler
Hi Bruce, Thanks for investigating! Can you open a bug report on https://issues.apache.org/jira/browse/LUCENE ? Uwe ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -----Original Message----- > From: Bruce Karsh [mailto:bruceka...@gmail.

IndexUpdater (4.4.0) fails when -verbose is not set

2013-09-16 Thread Bruce Karsh
Here it fails because -verbose is not set: $ java -cp ./lucene-core-4.4-SNAPSHOT.jar org.apache.lucene.index.IndexUpgrader ./INDEX Exception in thread "main" java.lang.IllegalArgumentException: printStream must not be null at org.apache.lucene.index.IndexWriterConfig.setInfoStream(IndexWriterConf
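The failure mode reported above (a setter that rejects null being called even though no verbose output was requested) can be sketched in plain Java. This is only an illustration under stated assumptions: Config, configure and InfoStreamGuard are hypothetical stand-ins, not Lucene classes, and the sketch does not claim to show how IndexUpgrader is implemented or how it was later fixed.

```java
import java.io.PrintStream;

public class InfoStreamGuard {
    /** Hypothetical stand-in for a config object whose setter rejects null. */
    static final class Config {
        private PrintStream infoStream; // stays null when no diagnostics are requested

        Config setInfoStream(PrintStream printStream) {
            if (printStream == null) {
                throw new IllegalArgumentException("printStream must not be null");
            }
            this.infoStream = printStream;
            return this;
        }

        PrintStream getInfoStream() {
            return infoStream;
        }
    }

    /** Forwards the stream only when verbose output was requested. */
    static Config configure(boolean verbose, PrintStream out) {
        Config config = new Config();
        if (verbose) {
            config.setInfoStream(out);
        }
        return config;
    }

    public static void main(String[] args) {
        // Assumed flag name, mirroring the -verbose option mentioned in the thread.
        boolean verbose = args.length > 0 && "-verbose".equals(args[0]);
        Config config = configure(verbose, System.out);
        System.out.println("info stream set: " + (config.getInfoStream() != null));
    }
}
```

The guard lives in the caller: the setter keeps its strict null check, and the optional stream is simply never passed when the flag is absent.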

exception while writing to index

2013-09-16 Thread nischal reddy
Hi, I am getting an exception while indexing files. I tried debugging but couldn't figure out the problem. I have a custom analyzer which creates the token stream, and I am indexing around 15k files. Some time after I start the indexing, I get this exception: java.lang.IllegalArgumentException:

Re: org.apache.lucene.analysis.icu.ICUNormalizer2Filter -- why Token?

2013-09-16 Thread Robert Muir
That would be great! On Mon, Sep 16, 2013 at 1:41 PM, Benson Margulies wrote: > Thanks, I might pitch in. > > > On Mon, Sep 16, 2013 at 12:58 PM, Robert Muir wrote: > >> Mostly because our tokenizers like StandardTokenizer will tokenize the >> same way regardless of normalization form or whether

org.apache.lucene.analysis.icu.ICUNormalizer2Filter -- why Token?

2013-09-16 Thread Benson Margulies
Can anyone shed light on why this is a token filter and not a char filter? I'd like one of these _upstream_ of a tokenizer, so that the tokenizer's dictionary lookups see normalized content.

Re: org.apache.lucene.analysis.icu.ICUNormalizer2Filter -- why Token?

2013-09-16 Thread Benson Margulies
Thanks, I might pitch in. On Mon, Sep 16, 2013 at 12:58 PM, Robert Muir wrote: > Mostly because our tokenizers like StandardTokenizer will tokenize the > same way regardless of normalization form or whether it's normalized at > all. > > But for other tokenizers, such a charfilter should be usefu

Re: Multiple field instances and Field.Store.NO

2013-09-16 Thread Alan Burlison
> Is Luke showing you stored fields? If so, this makes no sense ... > Field.Store.NO (single or multiple calls) should have resulted in no > stored fields. It shows the field but shows the content as -- Alan Burlison -- - To

Re: Regarding Compression Tool

2013-09-16 Thread Mark Miller
Have you considered storing your indexes server-side? I haven't used compression, but the usual trade-off of compression is CPU usage, which will also drain battery life. Or maybe consider how important the highlighter is to your users - is it worth the trade-off of either disk space or bat

Re: Lucene Query Syntax with analyzed and unanalyzed text

2013-09-16 Thread Ian Lea
org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper in analyzers-common is what you need. There's an example in the javadocs. Build and use the wrapper instance in place of StandardAnalyzer or whatever you are using now. -- Ian. On Mon, Sep 16, 2013 at 5:36 PM, Scott Smith wrote
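The idea behind a per-field wrapper (choose an analyzer by field name, fall back to a default otherwise) can be sketched in plain Java. This is not the Lucene API: PerFieldDispatch, splitWords and wholeValue are hypothetical names used only for illustration, and the field names "body" and "keyword" are taken from the question in this thread. For real code, the javadocs example mentioned above is the reference.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class PerFieldDispatch {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField;

    PerFieldDispatch(Function<String, List<String>> defaultAnalyzer,
                     Map<String, Function<String, List<String>>> perField) {
        this.defaultAnalyzer = defaultAnalyzer;
        this.perField = perField;
    }

    /** Uses the field-specific analyzer if one is registered, else the default. */
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    /** Toy "analyzed" behaviour: lowercase and split on non-alphanumerics. */
    static List<String> splitWords(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Toy "unanalyzed" behaviour: the whole value is kept as a single token. */
    static List<String> wholeValue(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        Map<String, Function<String, List<String>>> perField = new HashMap<>();
        perField.put("keyword", PerFieldDispatch::wholeValue);
        PerFieldDispatch dispatch = new PerFieldDispatch(PerFieldDispatch::splitWords, perField);

        System.out.println(dispatch.analyze("body", "some phrase"));
        System.out.println(dispatch.analyze("keyword", "my-keyword"));
    }
}
```

Using the same dispatch object at index time and at query time is what keeps a value such as my-keyword from being split on one side and left whole on the other.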

Lucene Query Syntax with analyzed and unanalyzed text

2013-09-16 Thread Scott Smith
I want to be sure I understand this correctly. Suppose I have a search that I'm going to run through the query parser that looks like: body:"some phrase" AND keyword:"my-keyword" clearly "body" and "keyword" are field names. However, the additional information is that the "body" field is anal

Re: org.apache.lucene.analysis.icu.ICUNormalizer2Filter -- why Token?

2013-09-16 Thread Robert Muir
Mostly because our tokenizers like StandardTokenizer will tokenize the same way regardless of normalization form or whether it's normalized at all. But for other tokenizers, such a charfilter should be useful: there is a JIRA for it, but it has some unresolved issues https://issues.apache.org/jira

Re: Multiple field instances and Field.Store.NO

2013-09-16 Thread Michael McCandless
On Mon, Sep 16, 2013 at 9:52 AM, Alan Burlison wrote: > On 16 September 2013 12:40, Michael McCandless > wrote: > >> If you use Field.Store.NO for all fields for a given document then no >> field should have been stored. Can you boil this down to a small test >> case? > > repeated calls to > > d

Re: Multiple field instances and Field.Store.NO

2013-09-16 Thread Alan Burlison
On 16 September 2013 12:40, Michael McCandless wrote: > If you use Field.Store.NO for all fields for a given document then no > field should have been stored. Can you boil this down to a small test > case? repeated calls to doc.add(new TextField("content", c, Field.Store.NO)) result in a sin

Re: Multiple field instances and Field.Store.NO

2013-09-16 Thread Alan Burlison
On 16 September 2013 11:47, Ian Lea wrote: > Not exactly dumb, and I can't tell you exactly what is happening here, > but lucene stores some info at the index level rather than the field > level, and things can get confusing if you don't use the same Field > definition consistently for a field. >

Re: Multiple field instances and Field.Store.NO

2013-09-16 Thread Michael McCandless
That is strange. If you use Field.Store.NO for all fields for a given document then no field should have been stored. Can you boil this down to a small test case? Mike McCandless http://blog.mikemccandless.com On Mon, Sep 16, 2013 at 6:33 AM, Alan Burlison wrote: > I'm creating multiple inst

Re: Multiple field instances and Field.Store.NO

2013-09-16 Thread Ian Lea
Not exactly dumb, and I can't tell you exactly what is happening here, but Lucene stores some info at the index level rather than the field level, and things can get confusing if you don't use the same Field definition consistently for a field. From the javadocs for org.apache.lucene.document.Fie

Multiple field instances and Field.Store.NO

2013-09-16 Thread Alan Burlison
I'm creating multiple instances of a field, some with Field.Store.YES and some with Field.Store.NO, with Lucene 4.4. If Field.Store.YES is set then I see multiple instances of the field in the documents in the resulting index; if I use Field.Store.NO then I only see a single field. Is that expected

Re: possible latency increase from Lucene versions 4.1 to 4.4?

2013-09-16 Thread Adrien Grand
Hi John, I just had a look at Mike's benchmarks[1][2], which don't show any performance difference over approximately the past year. But they only test a conjunction of two terms, so it might still be that latency worsened for more complex queries. [1] http://people.apache.org/~mikemccand/lucenebench/AndHigh

Re: Regarding Compression Tool

2013-09-16 Thread Jebarlin Robertson
I am using Apache Lucene on Android. I have around 1 GB of text documents (logs). When I index these text documents using this *new Field(ContentIndex.KEY_TEXTCONTENT, contents, Field.Store.YES, Field.Index.ANALYZED,TermVector.WITH_POSITIONS_OFFSETS)*, the index directory is consuming 1.59GB memory