On Wednesday 24 November 2004 00:37, John Wang wrote:
Hi:
I am trying to index 1M documents, in batches of 500 documents.
Each document has a unique text key, which is added as a
Field.Keyword(name, value).
For each batch of 500, I need to make sure I am not adding a
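One way to do that duplicate check (a sketch against the Lucene 1.4 API; the field name "key" is an assumption, not something stated in this thread) is to ask an IndexReader whether the key term already exists before adding the document:

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;

// Sketch: before adding a document, check whether its unique key
// (indexed via Field.Keyword("key", value)) is already present.
public class DuplicateCheck {
    static boolean exists(IndexReader reader, String key) throws IOException {
        // docFreq() is a cheap term-dictionary lookup;
        // > 0 means some document already carries this key.
        return reader.docFreq(new Term("key", key)) > 0;
    }
}
```

Note that the reader only sees what was in the index when it was opened, so it has to be reopened after each committed batch or newly added keys will be invisible to the check.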
On Wednesday 24 November 2004 01:31, Ken McCracken wrote:
Hi,
Thanks for the pointers in your replies. Would it be possible to include
some sort of accrual scorer interface somewhere in the Lucene Query
APIs? This could be passed into a query similar to
MaxDisjunctionQuery; and combine the
: can I get the similar word list as output, so that I can show the end
: user in the column --- do you mean foam?
: How can I get a similar word list in the given content?
This is a non-trivial problem, because the definition of "similar" is
subject to interpretation. I
On Wednesday 24 November 2004 08:16, Morus Walter wrote:
Lucene itself doesn't handle wildcards within phrases.
This can be added using PhrasePrefixQuery (which is slightly misnamed):
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/PhrasePrefixQuery.html
Regards
Daniel
Thanks, everybody, for the responses.
What else can substantially improve query performance?
(I am not speaking now about such things as keeping the index optimized etc. -
that's clear.)
As I experienced on my 2-CPU box, during query execution both
processors were really busy. The question is would it
Hi Daniel,
I couldn't figure out how to use the PhrasePrefixQuery with a phrase like java*
developer. It only provides a method to add terms. Can a term contain a wildcard
character in Lucene?
Thanks,
Terence
On Wednesday 24 November 2004 08:16, Morus Walter wrote:
Lucene itself doesn't
Hi Morus,
I want to search for the string like below:
- java developer
- javascript developer
Searching for java* will return more than I want. That's why I am thinking of
java* developer.
Terence
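One common way to get this effect (a sketch assuming the Lucene 1.4 API; the field name "title" is made up for illustration) is to expand the prefix against the index yourself and hand all matching terms to a PhrasePrefixQuery:

```java
import java.io.IOException;
import java.util.ArrayList;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.search.PhrasePrefixQuery;

// Sketch: build a query that behaves like the phrase "java* developer"
// on a field named "title".
public class WildcardPhrase {
    static PhrasePrefixQuery build(IndexReader reader) throws IOException {
        PhrasePrefixQuery q = new PhrasePrefixQuery();
        ArrayList expansions = new ArrayList();
        // Collect every indexed term on "title" that starts with "java".
        TermEnum terms = reader.terms(new Term("title", "java"));
        try {
            while (terms.term() != null
                    && terms.term().field().equals("title")
                    && terms.term().text().startsWith("java")) {
                expansions.add(terms.term());
                if (!terms.next()) break;
            }
        } finally {
            terms.close();
        }
        // First phrase position: any of the expansions (java, javascript, ...).
        q.add((Term[]) expansions.toArray(new Term[0]));
        // Second phrase position: the literal term.
        q.add(new Term("title", "developer"));
        return q;
    }
}
```

Because all expansions occupy the first phrase position, this matches "java developer" and "javascript developer" without also matching every java* hit on its own.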
Terence Lai writes:
It looks like the wildcard query disappeared. In fact, I am
I am now able to delete from the index using the following:

    if (indexDir.exists()) {
        IndexReader reader = IndexReader.open(indexDir);
        uidIter = reader.terms(new Term(id, ""));
        while (uidIter.term() != null && uidIter.term().field().equals(id)) {
            reader.delete(uidIter.term());
            uidIter.next();
        }
    }
I haven't tried it, but I believe this should work:

    IndexReader reader;

    void delete(long id) throws IOException {
        reader.delete(new Term("id", Long.toString(id)));
    }

This also has the benefit that it does a binary search rather than a
sequential search.
You will want to pad your ids with leading zeroes
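The padding advice matters because term order, and hence the binary search, compares ids lexicographically. A minimal sketch in plain Java (the width of 10 is an assumption; any width at least as wide as the largest id works):

```java
// Sketch: zero-pad numeric ids so that lexicographic (term) order
// matches numeric order.
public class IdPadder {
    static final int WIDTH = 10; // assumed width; must cover the largest id

    static String pad(long id) {
        return String.format("%0" + WIDTH + "d", id);
    }

    public static void main(String[] args) {
        System.out.println(pad(42)); // 0000000042
        // Unpadded, "42" sorts before "7"; padded, numeric order is preserved:
        System.out.println(pad(42).compareTo(pad(7)) > 0); // true
    }
}
```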
Thanks Paul!
Using your suggestion, I have changed the update check code to use
only the indexReader:
    try {
        localReader = IndexReader.open(path);
        while (keyIter.hasNext()) {
            key = (String) keyIter.next();
            term = new Term(key, key);
I have also seen this problem.
In the Lucene code, I don't see where the reader specified when
creating a field is closed. That holds on to the file.
I am looking at DocumentWriter.invertDocument()
Thanks
-John
On Mon, 22 Nov 2004 16:21:35 -0600, Chris Lamprecht
[EMAIL PROTECTED] wrote:
A
Does keyIter return the keys in sorted order? This should reduce seeks,
especially if the keys are dense.
Also, you should be able to call localReader.delete(term) instead of
iterating over the docs (of which I presume there is only one, since
keys are unique). This won't improve performance as
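The sorted-order point can be sketched with plain Java collections (the key values are invented): iterating keys in ascending order means each term lookup seeks forward from the previous one instead of jumping around the term dictionary.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Sketch: feed deletion keys to the reader in sorted order so that
// successive term lookups move forward through the term dictionary.
public class SortedKeys {
    static Iterator sortedKeyIter(List keys) {
        // TreeSet iterates in ascending (lexicographic) order.
        return new TreeSet(keys).iterator();
    }

    public static void main(String[] args) {
        Iterator it = sortedKeyIter(Arrays.asList(new String[] {"0003", "0001", "0002"}));
        StringBuffer sb = new StringBuffer();
        while (it.hasNext()) sb.append(it.next()).append(' ');
        System.out.println(sb.toString().trim()); // 0001 0002 0003
    }
}
```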
When comparing RAMDirectory and FSDirectory it is important to mention
which OS you are using. Linux caches the most recent
disk accesses in memory. Here is a good article that describes its
strategy: http://forums.gentoo.org/viewtopic.php?t=175419
The 2% difference you are
Hi again,
Thanks to everyone who replied. The PerFieldAnalyzerWrapper was a good
suggestion, and one I had overlooked, but for our particular
requirements it wouldn't quite work, so I went with overriding
getFieldQuery().
You were right, Paul. In 1.4.2 a whole heap of QueryParser changes were
Actually, I just realised a PhraseQuery is incorrect...
I only want a single TermQuery, but it just needs to be quoted, d'oh.
-Original Message-
Then I found that because that analyser always returns a single token
(TermQuery), it would let spaces through into the final query string,