TextField is dangerous: it is analyzed, possibly into more than one
token, and then your deletes won't work. It's safer to use
StringField for values you later want to delete by.
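To see why, here is a toy model in plain Java. It is not Lucene's API or implementation. A hypothetical `analyze` method lowercases and splits on non-alphanumeric characters, roughly as an analyzer on a TextField might. Deletion matches one exact indexed term, the same way deleting by a `Term` does in Lucene. When analyzed, the value "SKU-42" is indexed as `sku` and `42`, so a delete by the whole value finds nothing. Indexed as a single term (the StringField case), it matches.

```java
import java.util.*;

public class DeleteByTermDemo {
    // Toy inverted index: term -> doc ids, plus the set of live docs.
    // Illustration only; Lucene's real index works very differently.
    static final class ToyIndex {
        private final Map<String, Set<Integer>> postings = new HashMap<>();
        private final Set<Integer> live = new TreeSet<>();

        void add(int docId, List<String> terms) {
            live.add(docId);
            for (String t : terms) {
                postings.computeIfAbsent(t, k -> new HashSet<>()).add(docId);
            }
        }

        // Delete every live doc containing exactly this term (the term is NOT
        // analyzed); returns how many docs were removed.
        int deleteByTerm(String term) {
            int removed = 0;
            for (int id : postings.getOrDefault(term, Set.of())) {
                if (live.remove(id)) removed++;
            }
            return removed;
        }

        int liveCount() { return live.size(); }
    }

    // Hypothetical analyzer: lowercase, split on non-alphanumerics,
    // roughly what a TextField's analyzer might do to an ID value.
    static List<String> analyze(String value) {
        List<String> out = new ArrayList<>();
        for (String tok : value.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!tok.isEmpty()) out.add(tok);
        }
        return out;
    }

    // StringField-like: the whole value is indexed as one term.
    static List<String> keyword(String value) {
        return List.of(value);
    }

    public static void main(String[] args) {
        ToyIndex analyzed = new ToyIndex();
        analyzed.add(1, analyze("SKU-42"));                  // indexed as [sku, 42]
        System.out.println(analyzed.deleteByTerm("SKU-42")); // 0: no such term

        ToyIndex exact = new ToyIndex();
        exact.add(1, keyword("SKU-42"));                     // indexed as [SKU-42]
        System.out.println(exact.deleteByTerm("SKU-42"));    // 1
    }
}
```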
Try making a standalone test that just deletes documents first...
You don't need to iw.commit to make commits
I'm trying to compare two song titles (usually in Latin script) for
similarity. So I'm looking for cases where the two titles seem to be the same
song, accounting for spelling mistakes, additional words, et cetera.
For a number of years I've been doing this by creating a
RAMDirectory, creating a
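One common approach to this kind of fuzzy title matching compares sets of character n-grams. This is not necessarily the RAMDirectory-based method the message goes on to describe. Overlapping n-grams tolerate small spelling mistakes and, to a degree, extra words. The plain-Java sketch below uses character trigrams and Jaccard similarity. The normalization (lowercase, keep letters and digits, pad with spaces) and any match threshold you choose are assumptions you should tune on your own data.

```java
import java.util.*;

public class TitleSimilarity {
    // Normalize: lowercase, keep letters/digits, collapse everything else to single spaces.
    static String normalize(String title) {
        return title.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N}]+", " ")
                .trim();
    }

    // Character trigrams of the normalized title, padded with spaces so that
    // word boundaries also produce n-grams.
    static Set<String> trigrams(String title) {
        String s = " " + normalize(title) + " ";
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            grams.add(s.substring(i, i + 3));
        }
        return grams;
    }

    // Jaccard similarity: size(A intersect B) / size(A union B), in [0, 1].
    static double jaccard(String a, String b) {
        Set<String> ga = trigrams(a);
        Set<String> gb = trigrams(b);
        if (ga.isEmpty() && gb.isEmpty()) return 1.0;
        Set<String> inter = new HashSet<>(ga);
        inter.retainAll(gb);
        Set<String> union = new HashSet<>(ga);
        union.addAll(gb);
        return (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        System.out.println(jaccard("Bohemian Rhapsody", "Bohemian Rapsody (Live)"));
        System.out.println(jaccard("Bohemian Rhapsody", "Yesterday"));
    }
}
```

The misspelled title with an extra word still shares most trigrams with the original, while an unrelated title shares none. Trigram sets are also what an n-gram field in Lucene would index, so the same idea can be moved into an index if you need to compare against many titles.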
Hello,
When I build Lucene from source using these instructions:
https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/BUILD.txt
I end up with only
./build/core/lucene-core-4.10.2-SNAPSHOT.jar
I would like to build lucene-analyzers-common-4.10.2.jar and
lucene-queryparser-4.10.2.jar.
If you run ant -p, it will print the targets and their descriptions.
You want 'ant compile'.
In my opinion, the default target should not be 'jar'; it should print this
list of targets instead, just like the top-level build file does.
On Tue, Dec 2, 2014 at 12:09 PM, Badano Andrea andrea.bad...@sweco.se wrote:
Is it possible to get a total corpus frequency for bigram queries or
higher, i.e., how many times the query occurs in the corpus?
I'm looking to implement a count of occurrences per million terms. I know
for a single term I can use `TermsEnum.totalTermFreq()`; is there any
comparable way to
--Original Message--
From: Peter Organisciak
To: java-user@lucene.apache.org
ReplyTo: java-user@lucene.apache.org
Subject: Total Freq for Bigrams, Trigrams, etc.
Sent: Dec 2, 2014 8:38 PM
It is
If you index the n-grams in their own field using ShingleFilter, you can
get statistics using the same term API on that field, in which the terms
*are* n-grams, and similarly for queries.
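Conceptually, once ShingleFilter writes word bigrams into their own field, each bigram is an ordinary term. Its total frequency is then the same kind of number `totalTermFreq()` returns for a single word. Occurrences per million is that count divided by the total number of word tokens, times 10^6. The sketch below is plain Java, not Lucene code. It mimics shingling with a hypothetical lowercase/whitespace tokenizer to show the arithmetic. In a real index, choose the denominator deliberately. By default ShingleFilter also emits the unigrams, so the shingle field's own token total (for example `Terms.getSumTotalTermFreq()`) would count both unigrams and bigrams.

```java
import java.util.*;

public class BigramFrequency {
    // Hypothetical tokenizer: lowercase and split on whitespace.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Word shingles of size n joined with a single space, similar in spirit
    // to ShingleFilter's default token separator.
    static List<String> shingles(List<String> tokens, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return out;
    }

    // Total corpus frequency of every shingle: the analogue of totalTermFreq()
    // on a field whose terms are n-grams.
    static Map<String, Long> totalFreqs(List<String> docs, int n) {
        Map<String, Long> freqs = new HashMap<>();
        for (String doc : docs) {
            for (String s : shingles(tokenize(doc), n)) {
                freqs.merge(s, 1L, Long::sum);
            }
        }
        return freqs;
    }

    // Occurrences per million, normalized by the total number of word tokens.
    static double perMillion(long count, long totalTokens) {
        return totalTokens == 0 ? 0.0 : count * 1_000_000.0 / totalTokens;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("the quick brown fox", "the quick red fox");
        Map<String, Long> bigrams = totalFreqs(docs, 2);
        long tokens = docs.stream().mapToLong(d -> tokenize(d).size()).sum(); // 8
        long c = bigrams.getOrDefault("the quick", 0L);
        System.out.println(c);                     // 2
        System.out.println(perMillion(c, tokens)); // 250000.0
    }
}
```

Trigrams and longer n-grams work the same way with a larger `n`, at the cost of many more distinct terms in the index.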
-Mike
On 12/02/2014 03:38 PM, Peter Organisciak wrote:
Is it possible to get a total corpus frequency
--Original Message--
From: Michael Sokolov msoko...@safaribooksonline.com
Date: Tue, 02 Dec 2014 17:31:18
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Subject: Re: Total Freq for Bigrams,
I am using an mmap FS directory (MMapDirectory) in Lucene. My index is small (about 3 GB
on disk) and I have plenty of memory available. The problem is that
the first time a term is queried, the query is slow. How can I load the whole
directory into memory? One solution is to run many queries to warm it
up, but I can't query all terms.
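Instead of warming with queries, you can try pre-loading the index files into the OS page cache. Lucene's MMapDirectory later maps the same files, so its page faults can then be served from memory. Lucene versions that offer a preload option on MMapDirectory do something similar internally; check the Javadoc for your version. The plain-JDK sketch below maps each file with `FileChannel.map` and calls `MappedByteBuffer.load()`. The index path is hypothetical. `load()` is only a best-effort hint, and the OS may evict the pages again under memory pressure.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;
import java.util.stream.Stream;

public class IndexWarmer {
    // Map every regular file under dir read-only and call load(), asking the OS
    // to bring its pages into physical memory. Returns the number of bytes touched.
    static long warm(Path dir) throws IOException {
        long total = 0;
        try (Stream<Path> files = Files.walk(dir)) {
            for (Path p : (Iterable<Path>) files.filter(Files::isRegularFile)::iterator) {
                total += warmFile(p);
            }
        }
        return total;
    }

    static long warmFile(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            // A single mapping is limited to Integer.MAX_VALUE bytes, so map in chunks.
            for (long pos = 0; pos < size; pos += Integer.MAX_VALUE) {
                long len = Math.min(Integer.MAX_VALUE, size - pos);
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                buf.load(); // best-effort hint; pages may be evicted later
            }
            return size;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical index location; pass your own path as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/path/to/index");
        if (!Files.isDirectory(dir)) {
            System.err.println("Not a directory: " + dir);
            return;
        }
        System.out.println("Warmed " + warm(dir) + " bytes");
    }
}
```

With a 3 GB index and plenty of free RAM, the whole index should fit in the page cache. Running this once after the index is opened (and again after merges create new files) is a simple alternative to issuing warm-up queries.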
Hi Prasad,
Firstly, the Lucene ‘general’ list is not the appropriate list for this; the
java-user Lucene list is, so I’m replying there instead.
This is mostly about query parsing. If you look at Lucene’s modules,
you’ll see a “queryparser” module. In there, there’s a “flexible” package
which is named