I am using Apache Lucene on Android. I have around 1 GB of text documents
(logs). When I index these documents using
new Field(ContentIndex.KEY_TEXTCONTENT, contents, Field.Store.YES,
Field.Index.ANALYZED, TermVector.WITH_POSITIONS_OFFSETS),
the index directory ends up consuming 1.59 GB.
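An index larger than the source text is expected here: Field.Store.YES keeps a full stored copy of every document, and WITH_POSITIONS_OFFSETS term vectors record the token stream again per document. A sketch of a slimmer field definition for the same pre-4.0 API, assuming ContentIndex.KEY_TEXTCONTENT, contents, and doc are as in the question and that neither stored text nor term vectors are needed at search time:

```java
// Slimmer variant of the field from the question (pre-4.0 Lucene API).
// Assumes no highlighting over stored content is required.
Field slim = new Field(ContentIndex.KEY_TEXTCONTENT, contents,
    Field.Store.NO,           // don't keep a stored copy of the raw log text
    Field.Index.ANALYZED,     // still tokenize, so the field stays searchable
    Field.TermVector.NO);     // drop per-document positions/offsets vectors
doc.add(slim);
```

Whether this is acceptable depends on the application: dropping Store.YES means the original text must be fetched from outside the index when displaying results.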
I'm creating multiple instances of a field, some with Field.Store.YES
and some with Field.Store.NO, with Lucene 4.4. If Field.Store.YES is
set then I see multiple instances of the field in the documents in the
resulting index; if I use Field.Store.NO then I only see a single
field. Is that expected, or is this a dumb question?
Not exactly dumb, and I can't tell you exactly what is happening here,
but Lucene stores some information at the index level rather than the
field level, and things can get confusing if you don't use the same
Field definition consistently for a field.
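One way to keep a field's definition consistent across documents in Lucene 4.4 is to build a single frozen FieldType and reuse it for every instance of the field. This is only an illustrative sketch (the field name "content" and the values are made up here):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;

public class ConsistentFieldExample {
    public static void main(String[] args) {
        // Define the field's characteristics once and freeze them, so
        // every instance of "content" is indexed identically.
        FieldType contentType = new FieldType();
        contentType.setIndexed(true);    // searchable
        contentType.setTokenized(true);  // run through the analyzer
        contentType.setStored(false);    // the Field.Store.NO case
        contentType.freeze();            // reject accidental later changes

        Document doc = new Document();
        doc.add(new Field("content", "first chunk of text", contentType));
        doc.add(new Field("content", "second chunk of text", contentType));
    }
}
```

Reusing one frozen FieldType avoids exactly the mixed Store.YES / Store.NO situation described above.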
From the javadocs for
That is strange.
If you use Field.Store.NO for all fields for a given document then no
field should have been stored. Can you boil this down to a small test
case?
Mike McCandless
http://blog.mikemccandless.com
On Mon, Sep 16, 2013 at 6:33 AM, Alan Burlison <alan.burli...@gmail.com> wrote:
> I'm
> On 16 September 2013 11:47, Ian Lea <ian@gmail.com> wrote:
>> Not exactly dumb, and I can't tell you exactly what is happening here,
>> but Lucene stores some info at the index level rather than the field
>> level, and things can get confusing if you don't use the same Field
>> definition consistently
> On 16 September 2013 12:40, Michael McCandless <luc...@mikemccandless.com> wrote:
>> If you use Field.Store.NO for all fields for a given document then no
>> field should have been stored. Can you boil this down to a small test
>> case?
repeated calls to
doc.add(new TextField(content, c,
On Mon, Sep 16, 2013 at 9:52 AM, Alan Burlison <alan.burli...@gmail.com> wrote:
> On 16 September 2013 12:40, Michael McCandless <luc...@mikemccandless.com> wrote:
>> If you use Field.Store.NO for all fields for a given document then no
>> field should have been stored. Can you boil this down to a small
Mostly because our tokenizers like StandardTokenizer will tokenize the
same way regardless of normalization form or whether it's normalized at
all?

But for other tokenizers, such a charfilter should be useful: there is
a JIRA for it, but it has some unresolved issues.
I want to be sure I understand this correctly. Suppose I have a search
that I'm going to run through the query parser that looks like:

body:"some phrase" AND keyword:my-keyword

Clearly body and keyword are field names. However, the additional
information is that the body field is analyzed and
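One common way to handle a query mixing an analyzed field with an unanalyzed one, sketched here under the assumption that keyword was indexed without analysis (Lucene 4.4 API): parse only the body clause, and build the keyword clause directly as a TermQuery so no analyzer ever touches it.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.Version;

public class MixedFieldQuery {
    public static void main(String[] args) throws Exception {
        // Run only the analyzed body clause through the query parser ...
        QueryParser parser = new QueryParser(Version.LUCENE_44, "body",
            new StandardAnalyzer(Version.LUCENE_44));
        Query bodyQuery = parser.parse("\"some phrase\"");

        // ... and build the unanalyzed keyword clause as a raw TermQuery,
        // so nothing splits "my-keyword" on the hyphen.
        Query keywordQuery = new TermQuery(new Term("keyword", "my-keyword"));

        BooleanQuery combined = new BooleanQuery();
        combined.add(bodyQuery, BooleanClause.Occur.MUST);
        combined.add(keywordQuery, BooleanClause.Occur.MUST);
        System.out.println(combined);
    }
}
```

If the whole string must go through the parser instead, a PerFieldAnalyzerWrapper with a keyword analyzer for the keyword field is the usual alternative.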
Is Luke showing you stored fields? If so, this makes no sense ...
Field.Store.NO (single or multiple calls) should have resulted in no
stored fields.
It shows the field but shows the content as not present or not stored
--
Alan Burlison
Thanks, I might pitch in.
On Mon, Sep 16, 2013 at 12:58 PM, Robert Muir <rcm...@gmail.com> wrote:
> Mostly because our tokenizers like StandardTokenizer will tokenize the
> same way regardless of normalization form or whether it's normalized at
> all?
> But for other tokenizers, such a charfilter
Can anyone shed light as to why this is a token filter and not a char
filter? I'm wishing for one of these _upstream_ of a tokenizer, so that the
tokenizer's lookups in its dictionaries are seeing normalized contents.
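Wiring a char filter upstream of the tokenizer is possible today by overriding Analyzer.initReader. The sketch below uses MappingCharFilter as a stand-in for the Unicode-normalizing char filter discussed here (which is what the JIRA tracks and did not exist at the time); the ligature mapping is purely illustrative.

```java
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.charfilter.MappingCharFilter;
import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

public class NormalizingAnalyzer extends Analyzer {
    private static final NormalizeCharMap MAP;
    static {
        NormalizeCharMap.Builder b = new NormalizeCharMap.Builder();
        b.add("ﬁ", "fi");  // fold a ligature before tokenization (example)
        MAP = b.build();
    }

    @Override
    protected Reader initReader(String fieldName, Reader reader) {
        // Runs before the tokenizer, so the tokenizer's dictionary
        // lookups only ever see normalized text.
        return new MappingCharFilter(MAP, reader);
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName,
                                                     Reader reader) {
        return new TokenStreamComponents(
            new WhitespaceTokenizer(Version.LUCENE_44, reader));
    }
}
```

Because initReader applies before createComponents, this pattern works with any tokenizer, which is the point being made about dictionary-based tokenizers above.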
That would be great!
On Mon, Sep 16, 2013 at 1:41 PM, Benson Margulies <ben...@basistech.com> wrote:
> Thanks, I might pitch in.
> On Mon, Sep 16, 2013 at 12:58 PM, Robert Muir <rcm...@gmail.com> wrote:
>> Mostly because our tokenizers like StandardTokenizer will tokenize the
>> same way regardless of
Hi,

I am getting an exception while indexing files. I tried debugging but
couldn't figure out the problem.

I have a custom analyzer which creates the token stream. I am indexing
around 15k files; some time after I start the indexing, I get this
exception:
Here it fails because -verbose is not set:

$ java -cp ./lucene-core-4.4-SNAPSHOT.jar org.apache.lucene.index.IndexUpgrader ./INDEX
Exception in thread "main" java.lang.IllegalArgumentException: printStream must not be null
        at
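A workaround consistent with the failure above (a sketch, not a fix for the underlying bug): pass -verbose so IndexUpgrader has a real PrintStream to write to.

```shell
# Same invocation as the failing one above, plus -verbose; with an
# infoStream attached, the null-printStream IllegalArgumentException
# reported here is not triggered.
java -cp ./lucene-core-4.4-SNAPSHOT.jar \
  org.apache.lucene.index.IndexUpgrader -verbose ./INDEX
```

The underlying problem still deserves the bug report requested below, since upgrading without -verbose should work.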
Hi Bruce,
Thanks for investigating! Can you open a bug report on
https://issues.apache.org/jira/browse/LUCENE ?
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
-Original Message-
From: Bruce Karsh