Re: Search Hit Score

2004-07-07 Thread Ype Kingma
On Wednesday 07 July 2004 08:25, Ype Kingma wrote:

 For a single term query, one can iterate through
 IndexReader.termDocs(Term) and store the document numbers by
 TermDocs.docFreq().

That should be TermDocs.freq()
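
For reference, a minimal sketch of that iteration (index path, field and term are only
illustrative):

  IndexReader reader = IndexReader.open("/path/to/index");
  TermDocs termDocs = reader.termDocs(new Term("contents", "lucene"));
  while (termDocs.next()) {
    int docNum = termDocs.doc();   // document number of the current match
    int freq = termDocs.freq();    // how often the term occurs in that document
    System.out.println(docNum + " -> " + freq);
  }
  termDocs.close();
  reader.close();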

Oops,
Ype


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Optimizing for long queries?

2004-07-07 Thread Drew Farris
On Mon, 28 Jun 2004 10:04:40 +0200, Julien Nioche
[EMAIL PROTECTED] wrote:
 Hello Drew,
 
 I don't think it's in the FAQ.
 

Julien,

Thanks for the advice, and the in-depth exploration of INDEX_INTERVAL
here and on the developer's list. If I have the opportunity to run
similar benchmarks comparing both modified INDEX_INTERVAL values and sorted
queries, I'll share the results as well.

You mentioned that you were using queries from query logs for your
application. Do you have a general idea of the average query length of
these queries?

Drew.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



MultiSearcher is very slow

2004-07-07 Thread Don Vaillancourt
Hi all,
I've managed to add multi-index searching capability to my code.  But one
thing that I have noticed is that Lucene is extremely slow in searching.

For example, I have been testing with 2 indexes for the past month or so, and
searching them returns results in under 250ms, sometimes even surprising
me at under 50ms.  All of this on my desktop computer.

If I search both indexes with the MultiSearcher class, it takes over 2000ms
to search.  I have compared the search results and they do match.  But why
is it taking so much longer?

I'm assuming that each index is searched separately and then the Hits
results from each search are merged and sorted.  But still, I don't think
that this should take any longer than 750ms.

Is it just the way the Lucene MultiSearcher class was engineered that makes it
slow, or is there something that I am doing wrong?
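
For reference, my setup is roughly like the sketch below (index paths, field name and
query are placeholders), in case the problem is in the setup itself:

  Searchable[] searchables = new Searchable[] {
    new IndexSearcher("/indexes/indexA"),
    new IndexSearcher("/indexes/indexB")
  };
  Searcher searcher = new MultiSearcher(searchables);
  Hits hits = searcher.search(
      QueryParser.parse("foo bar", "contents", new StandardAnalyzer()));
  System.out.println(hits.length() + " hits");
  searcher.close();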

Thanks
Don Vaillancourt
Director of Software Development
WEB IMPACT INC.
416-815-2000 ext. 245
email: [EMAIL PROTECTED]
web: http://www.web-impact.com

This email message is intended only for the addressee(s)
and contains information that may be confidential and/or
copyright.  If you are not the intended recipient please
notify the sender by reply email and immediately delete
this email. Use, disclosure or reproduction of this email
by anyone other than the intended recipient(s) is strictly
prohibited. No representation is made that this email or
any attachments are free of viruses. Virus scanning is
recommended and is the responsibility of the recipient.







Re: upgrade from Lucene 1.3 final to 1.4rc3 problem

2004-07-07 Thread Zilverline info
This is a bug (see the posting 'Lockfile Problem Solved'); upgrade to
1.4-final and you'll be fine.

Alex Aw Seat Kiong wrote:
Hi!
I'm using Lucene 1.3 final currently, and all things were working fine.
But after I upgraded from Lucene 1.3 final to 1.4rc3 (simply overwrote the
lucene-1.4-final.jar with lucene-1.4-rc3.jar and recompiled),
it recompiles successfully, but when we try to index a document it gives the
error below:
java.lang.NullPointerException
   at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:146)
   at org.apache.lucene.store.FSDirectory.init(FSDirectory.java:126)
   at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:102)
   at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:83)
   at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:173)
What's wrong? Please help.
Thanks.
Regards,
Alex

 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Most efficient way to index 14M documents (out of memory/file handles)

2004-07-07 Thread Andrzej Bialecki
[EMAIL PROTECTED] wrote:
A colleague of mine found the fastest way to index was to use a RAMDirectory, letting it grow
to a pre-defined maximum size, then merging it into a new temporary file-based index to
flush it. Repeat this, creating new directories for all the file-based indexes, then perform
a merge into one index once all docs are indexed.

I haven't managed to test this for myself, but my colleague says he noticed a
considerable speed-up by merging once at the end with this approach, so you may want
to give it a try. (This was with Lucene 1.3.)
I can confirm that this approach works quite well - I use it myself in
some applications, both with Lucene 1.3 and 1.4. The disadvantage is of
course that the memory consumption goes up, so you have to be careful to
cap the max size of the RAMDirectory according to your max heap size limits.
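
A rough sketch of the pattern, in case it helps (the flush threshold, analyzer and
document source are illustrative; I cap on document count here where the description
above caps on size):

  Analyzer analyzer = new StandardAnalyzer();
  List partDirs = new ArrayList();              // the temporary file-based indexes
  RAMDirectory ramDir = new RAMDirectory();
  IndexWriter ramWriter = new IndexWriter(ramDir, analyzer, true);
  int part = 0;

  for (int i = 0; i < docs.length; i++) {       // docs: however you obtain your Documents
    ramWriter.addDocument(docs[i]);
    if (ramWriter.docCount() >= 50000) {        // pre-defined maximum - tune to your heap
      ramWriter.close();
      Directory partDir = FSDirectory.getDirectory("/tmp/part-" + (part++), true);
      IndexWriter partWriter = new IndexWriter(partDir, analyzer, true);
      partWriter.addIndexes(new Directory[] { ramDir });   // flush the RAM buffer to disk
      partWriter.close();
      partDirs.add(partDir);
      ramDir = new RAMDirectory();              // start a fresh in-memory buffer
      ramWriter = new IndexWriter(ramDir, analyzer, true);
    }
  }
  ramWriter.close();
  partDirs.add(ramDir);                         // whatever is still buffered in RAM

  IndexWriter merged = new IndexWriter("/data/final-index", analyzer, true);
  merged.addIndexes((Directory[]) partDirs.toArray(new Directory[partDirs.size()]));
  merged.close();                               // addIndexes optimizes the result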

--
Best regards,
Andrzej Bialecki
-
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-
FreeBSD developer (http://www.freebsd.org)
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


RE: Search Hit Score

2004-07-07 Thread Karthik N S
Hey Ype

 Apologies.


 I would be more interested in the Boost/Weight factor in terms of a Query rather
than Fields.

 Please explain with example src.
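
For context, the kind of thing I am after is a boost applied to the Query (e.g. via
Query.setBoost()) rather than to a Field at index time; a rough, illustrative sketch
of what I mean (field names are made up):

  TermQuery important = new TermQuery(new Term("contents", "lucene"));
  important.setBoost(4.0f);             // this clause counts 4x in the score
  TermQuery ordinary = new TermQuery(new Term("contents", "java"));

  BooleanQuery query = new BooleanQuery();
  query.add(important, false, false);   // optional clause (not required, not prohibited)
  query.add(ordinary, false, false);
  Hits hits = searcher.search(query);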

With regards
Karthik


-Original Message-
From: Ype Kingma [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 07, 2004 12:08 PM
To: [EMAIL PROTECTED]
Subject: Re: Search  Hit Score


On Wednesday 07 July 2004 08:25, Ype Kingma wrote:

 For a single term query, one can iterate through
 IndexReader.termDocs(Term) and store the document numbers by
 TermDocs.docFreq().

That should be TermDocs.freq()

Oops,
Ype


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Most efficient way to index 14M documents (out of memory/file handles)

2004-07-07 Thread Harald Kirsch
On Tue, Jul 06, 2004 at 10:44:40PM -0700, Kevin A. Burton wrote:
 I'm trying to burn an index of 14M documents.
 
 I have two problems.
 
 1.  I have to run optimize() every 50k documents or I run out of file 
 handles.  this takes TIME and of course is linear to the size of the 
 index so it just gets slower by the time I complete.  It starts to crawl 
 at about 3M documents.

Recently I indexed roughly this many documents. I first separated the whole
thing into 100 jobs (we happen to have that many machines in the
cluster :-), each indexing its share into its own index. I used
mergeFactor=100 and only optimized just before closing each index.

Then I merged them all into one index simply by

  writer.mergeFactor = 150;
  writer.addIndexes(dirs);

I was surprised myself that it went through easily, in under two
hours for each of the 101 indexes. The documents have, however, only
three fields.
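
Spelled out slightly more, the merge step was roughly like the following (paths and
analyzer are made up for the example):

  IndexWriter writer = new IndexWriter("/data/merged-index", new StandardAnalyzer(), true);
  writer.mergeFactor = 150;

  Directory[] dirs = new Directory[100];
  for (int i = 0; i < dirs.length; i++) {
    dirs[i] = FSDirectory.getDirectory("/data/part-" + i, false);
  }
  writer.addIndexes(dirs);   // merges everything and optimizes the result
  writer.close();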

  Maybe this helps,
  Harald.

-- 

Harald Kirsch | [EMAIL PROTECTED] | +44 (0) 1223/49-2593

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Most efficient way to index 14M documents (out of memory/file handles)

2004-07-07 Thread Doug Cutting
A mergeFactor of 5000 is a bad idea.  If you want to index faster, try 
increasing minMergeDocs instead.  If you have lots of memory this can 
probably be 5000 or higher.

Also, why do you optimize before you're done?  That only slows things down.
Perhaps you have to do it because you've set mergeFactor to such an 
extreme value?  I do not recommend a merge factor higher than 100.

Doug
Kevin A. Burton wrote:
I'm trying to burn an index of 14M documents.
I have two problems.
1.  I have to run optimize() every 50k documents or I run out of file 
handles.  this takes TIME and of course is linear to the size of the 
index so it just gets slower by the time I complete.  It starts to crawl 
at about 3M documents.

2.  I eventually will run out of memory in this configuration.
I KNOW this has been covered before but for the life of me I can't find 
it in the archives, the FAQ or the wiki.
I'm using an IndexWriter with a mergeFactor of 5k and then optimizing 
every 50k documents.

Does it make sense to just create a new IndexWriter for every 50k docs 
and then do one big optimize() at the end?

Kevin
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Most efficient way to index 14M documents (out of memory/file handles)

2004-07-07 Thread Julien Nioche
It is not surprising that you run out of file handles with such a large
mergeFactor.

Before trying more complex strategies involving RAMDirectories and/or
splitting your indexing across several machines, I reckon you should try
simple things like using a low mergeFactor (e.g. 10) combined with a higher
minMergeDocs (e.g. 1000), and optimize only at the end of the process.

By setting a higher value for minMergeDocs, you'll index and merge within a
RAMDirectory. When the limit is reached (e.g. 1000 docs), a segment is written to
the FS. mergeFactor controls the number of segments to be merged, so when
you have 10 segments on the FS (which is already 10x1000 docs), the
IndexWriter will merge them all into a single segment. This is equivalent to
an optimize, I think. The process continues like that until it's finished.

Combining these parameters should be enough to achieve good performance.
The good point of using minMergeDocs is that you make heavy use of the
RAMDirectory used internally by your IndexWriter (== fast) without having to be too
careful with the RAM (which would be the case with an explicit RAMDirectory). At the
same time, keeping your mergeFactor low limits the risk of running into the
too-many-open-file-handles problem.
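
A minimal sketch of that configuration (the two values are the ones suggested above;
path, analyzer and document source are illustrative):

  IndexWriter writer = new IndexWriter("/data/index", new StandardAnalyzer(), true);
  writer.mergeFactor = 10;       // keep the number of segments (and open files) low
  writer.minMergeDocs = 1000;    // buffer this many docs in RAM before writing a segment

  for (int i = 0; i < docs.length; i++) {   // docs: however you obtain your Documents
    writer.addDocument(docs[i]);
  }
  writer.optimize();             // optimize once, at the very end
  writer.close();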


- Original Message - 
From: Kevin A. Burton [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, July 07, 2004 7:44 AM
Subject: Most efficient way to index 14M documents (out of memory/file
handles)


 I'm trying to burn an index of 14M documents.

 I have two problems.

 1.  I have to run optimize() every 50k documents or I run out of file
 handles.  this takes TIME and of course is linear to the size of the
 index so it just gets slower by the time I complete.  It starts to crawl
 at about 3M documents.

 2.  I eventually will run out of memory in this configuration.

 I KNOW this has been covered before but for the life of me I can't find
 it in the archives, the FAQ or the wiki.

 I'm using an IndexWriter with a mergeFactor of 5k and then optimizing
 every 50k documents.

 Does it make sense to just create a new IndexWriter for every 50k docs
 and then do one big optimize() at the end?

 Kevin

 -- 

 Please reply using PGP.

 http://peerfear.org/pubkey.asc

 NewsMonster - http://www.newsmonster.org/

 Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM/YIM - sfburtonator,  Web - http://peerfear.org/
 GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
   IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster


 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: addIndexes vs addDocument

2004-07-07 Thread roy-lucene-user
Otis,

Okay, got it... however we weren't creating new Document objects... just
grabbing a document through an IndexReader and calling addDocument on another
index.  Would that still work with unstored fields (well, it's working for us
since we don't have any unstored fields)?

Thanks a lot!

Roy.

On Tue, 6 Jul 2004 19:46:30 -0700 (PDT), Otis Gospodnetic wrote
 Quick example.
 Index A has fields 'title' and 'contents'.
 Field 'contents' is stored in A as Field.UnStored.
 This means that you cannot retrieve the original content of the
 'contents' field, since that value was not stored verbatim in the
 index.
 Therefore, you cannot create a new Document instance, pull out String
 value of the 'contents' field from A, use it to create another field,
 add it to the new Document instance, and add that Document to a new
 index B using addDocument method.
 
 addIndexes method does not need to pull out the original String field
 values from Documents, so it will work.
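
For reference, our copy loop looks roughly like the sketch below (paths are
placeholders); as you say, only the stored field values come across, which is why it
happens to work for us:

  IndexReader reader = IndexReader.open("/indexes/A");
  IndexWriter writer = new IndexWriter("/indexes/B", new StandardAnalyzer(), true);
  for (int i = 0; i < reader.maxDoc(); i++) {
    if (reader.isDeleted(i))
      continue;
    Document doc = reader.document(i);   // contains only the stored fields
    writer.addDocument(doc);             // Field.UnStored content is simply not there to copy
  }
  writer.close();
  reader.close();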


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



addIndexes and optimize

2004-07-07 Thread roy-lucene-user
Hey y'all again,

Just wondering why the IndexWriter.addIndexes method calls optimize before and after 
it starts merging segments together.

We would like to create an addIndexes method that doesn't optimize and call optimize 
on the IndexWriter later.

Roy.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Most efficient way to index 14M documents (out of memory/file handles)

2004-07-07 Thread Doug Cutting
Julien,
Thanks for the excellent explanation.
I think this thread points to a documentation problem.  We should
improve the javadoc for these parameters to make it easier for folks to
get this right.

In particular, the javadoc for mergeFactor should mention that very
large values (>100) are not recommended, since they can run into file
handle limitations with FSDirectory.  The maximum number of open files
while merging is around mergeFactor * (5 + number of indexed fields).
Perhaps mergeFactor should be tagged as an Expert parameter to discourage
folks from playing with it, as it is such a common source of problems.
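
For example, with mergeFactor set to 5000 and three indexed fields, that estimate
works out to roughly 5000 * (5 + 3) = 40,000 open files during a merge, far beyond
the default per-process file handle limit on most systems.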

The javadoc should instead encourage using minMergeDocs to increase 
indexing speed by using more memory.  This parameter is unfortunately 
poorly named.  It should really be called something like maxBufferedDocs.

Doug
Julien Nioche wrote:
It is not surprising that you run out of file handles with such a large
mergeFactor.
Before trying more complex strategies involving RAMDirectories and/or
splitting your indexation on several machines, I reckon you should try
simple things like using a low mergeFactor (eg: 10) combined with a higher
minMergeDocs (ex: 1000) and optimize only at the end of the process.
By setting a higher value to minMergeDocs, you'll index and merge with a
RAMDirectory. When the limit is reached (ex 1000) a segment is written in
the FS. MergeFactor controls the number of segments to be merged, so when
you have 10 segments on the FS (which is already 10x1000 docs), the
IndexWriter will merge them all into a single segment. This is equivalent to
an optimize I think. The process continues like that until it's finished.
Combining these parameters should be enough to achieve good performance.
The good point of using minMergeDocs is that you make a heavy use of the
RAMDirectory used by your IndexWriter (== fast) without having to be too
careful with the RAM (which would be the case with RamDirectory). At the
same time keeping your mergeFactor low limits the risks of too many handles
problem.
- Original Message - 
From: Kevin A. Burton [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, July 07, 2004 7:44 AM
Subject: Most efficient way to index 14M documents (out of memory/file
handles)


I'm trying to burn an index of 14M documents.
I have two problems.
1.  I have to run optimize() every 50k documents or I run out of file
handles.  this takes TIME and of course is linear to the size of the
index so it just gets slower by the time I complete.  It starts to crawl
at about 3M documents.
2.  I eventually will run out of memory in this configuration.
I KNOW this has been covered before but for the life of me I can't find
it in the archives, the FAQ or the wiki.
I'm using an IndexWriter with a mergeFactor of 5k and then optimizing
every 50k documents.
Does it make sense to just create a new IndexWriter for every 50k docs
and then do one big optimize() at the end?
Kevin
--
Please reply using PGP.
   http://peerfear.org/pubkey.asc
   NewsMonster - http://www.newsmonster.org/
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Searching for asterisk in a term

2004-07-07 Thread yahootintin . 1247688
Can you recommend an analyzer that doesn't discard '*' or '/'?


--- Lucene Users List [EMAIL PROTECTED] wrote:

 The first thing you'll want to check is that you are using an Analyzer
 that does not discard that '*' before indexing.  StandardAnalyzer, for
 instance, will discard it.  Check one of Erik Hatcher's articles that
 includes a tool that helps you see what your Analyzer does with any
 given text input.  You can also use Luke to see what your index
 contains.

 Otis

 --- [EMAIL PROTECTED] wrote:

  Hi,

  I'm trying to search for a term that contains an asterisk.

  This is the field that I indexed:
  - new Field("testField", "Hello *foo bar", true, true, true);

  I'm trying to find this document by matching '*foo':
  - new TermQuery(new Term("testField", "*me"));

  I've also tried to escape the * like this:
  - new TermQuery(new Term("testField", "\\*me"));

  Neither of these queries return this document.  Is this type of search
  possible with Lucene?

  Thanks.

  -
  To unsubscribe, e-mail: [EMAIL PROTECTED]
  For additional commands, e-mail: [EMAIL PROTECTED]


 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Searching for asterisk in a term

2004-07-07 Thread Erik Hatcher
On Jul 7, 2004, at 3:41 PM, [EMAIL PROTECTED] wrote:
Can you recommend an analyzer that doesn't discard '*' or '/'?
WhitespaceAnalyzer :)
Check the wiki AnalysisParalysis page also.
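
A quick sketch, borrowing the field and text from your earlier mail (index path is
made up):

  IndexWriter writer = new IndexWriter("/tmp/star-index", new WhitespaceAnalyzer(), true);
  Document doc = new Document();
  doc.add(Field.Text("testField", "Hello *foo bar"));   // WhitespaceAnalyzer keeps the '*'
  writer.addDocument(doc);
  writer.close();

  IndexSearcher searcher = new IndexSearcher("/tmp/star-index");
  Hits hits = searcher.search(new TermQuery(new Term("testField", "*foo")));
  System.out.println(hits.length());   // 1
  searcher.close();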
Erik
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


RE: Problem with match on a non tokenized field.

2004-07-07 Thread wallen
Use org.apache.lucene.analysis.PerFieldAnalyzerWrapper

Here is how I use it:

PerFieldAnalyzerWrapper analyzer = new
    org.apache.lucene.analysis.PerFieldAnalyzerWrapper(new MyAnalyzer());
analyzer.addAnalyzer("url", new NullAnalyzer());
try
{
    query = QueryParser.parse(searchQuery, "contents", analyzer);
}
catch (ParseException e)
{
    // handle the bad query syntax
}

-Original Message-
From: Polina Litvak [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 07, 2004 4:20 PM
To: [EMAIL PROTECTED]
Subject: Problem with match on a non tokenized field.


I have a Lucene Document with a field named Code which is stored
and indexed but not tokenized. The value of the field is ABC5-LB.
The only way I can match the field when searching is by entering
Code:"ABC5-LB", because when I drop the quotes, every Analyzer I've tried
using breaks my query into Code:ABC5 -Code:LB.

I need to be able to match this field by doing something like
Code:ABC5-L*, therefore always using quotes is not an option.
 
How would I go about writing my own analyzer that will not tokenize the
query ?
 
Thanks,
Polina
 

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: indexing help

2004-07-07 Thread Doug Cutting
John Wang wrote:
 While Lucene tokenizes the words in the document, it counts the
frequency and figures out the position; we are trying to bypass this
stage: for each document, I have a set of words with a known frequency,
e.g. java (5), lucene (6), etc. (I don't care about the position, so it
can always be 0.)
 What I can do now is to create a dummy document, e.g. "java java
java java java lucene lucene lucene lucene lucene", and pass it to
Lucene.
 This seems hacky and cumbersome. Is there a better alternative? I
browsed around in the source code, but couldn't find anything.
Write an analyzer that returns terms with the appropriate distribution.
For example:
public class VectorTokenStream extends TokenStream {
  private String[] terms;
  private int[] freqs;
  private int term = -1;   // current term index; -1 so the first next() advances to 0
  private int freq;
  public VectorTokenStream(String[] terms, int[] freqs) {
    this.terms = terms;
    this.freqs = freqs;
  }
  public Token next() {
    if (freq == 0) {
      term++;
      if (term >= terms.length)
        return null;
      freq = freqs[term];
    }
    freq--;
    return new Token(terms[term], 0, 0);
  }
}
Document doc = new Document();
doc.add(Field.Text("content", ""));
indexWriter.addDocument(doc, new Analyzer() {
  public TokenStream tokenStream(String field, Reader reader) {
    return new VectorTokenStream(new String[] {"java", "lucene"},
                                 new int[] {5, 6});
  }
});
  Too bad the Field class is final, otherwise I can derive from it
and do something on that line...
Extending Field would not help.  That's why it's final.
Doug
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


PhraseQuery with Wildcards?

2004-07-07 Thread yahootintin . 1247688
Hi,

Is there any way to do a PhraseQuery with Wildcards?

I'd like to search for:

  MyField:"foo bar*"

I thought I could cobble something together using PhraseQuery and Wildcards but
I couldn't get this functionality to work due to my lack of experience with Lucene.

Is there a way to do this with Lucene?

Thanks.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: indexing help

2004-07-07 Thread John Wang
Hi Doug:
 Thanks for the response!

 The solution you proposed is still a derivative of creating a
dummy document stream. Taking the same example, java (5), lucene (6),
VectorTokenStream would create a total of 11 Tokens whereas only 2 are
necessary.

Given many documents with many terms and frequencies, it would
create many extra Token instances.

   The reason I was looking at deriving from the Field class is that I
could directly manipulate the FieldInfo by setting the frequency. But
the class is final...

   Any other suggestions?

Thanks

-John

On Wed, 07 Jul 2004 14:20:24 -0700, Doug Cutting [EMAIL PROTECTED] wrote:
 John Wang wrote:
   While lucene tokenizes the words in the document, it counts the
  frequency and figures out the position, we are trying to bypass this
  stage: For each document, I have a set of words with a known frequency,
  e.g. java (5), lucene (6) etc. (I don't care about the position, so it
  can always be 0.)
 
   What I can do now is to create a dummy document, e.g. java java
  java java java lucene lucene lucene lucene lucene and pass it to
  lucene.
 
   This seems hacky and cumbersome. Is there a better alternative? I
  browsed around in the source code, but couldn't find anything.
 
 Write an analyzer that returns terms with the appropriate distribution.
 
 For example:
 
 public class VectorTokenStream extends TokenStream {
   private String[] terms;
   private int[] freqs;
   private int term = -1;
   private int freq;
   public VectorTokenStream(String[] terms, int[] freqs) {
     this.terms = terms;
     this.freqs = freqs;
   }
   public Token next() {
     if (freq == 0) {
       term++;
       if (term >= terms.length)
         return null;
       freq = freqs[term];
     }
     freq--;
     return new Token(terms[term], 0, 0);
   }
 }
 
 Document doc = new Document();
 doc.add(Field.Text("content", ""));
 indexWriter.addDocument(doc, new Analyzer() {
   public TokenStream tokenStream(String field, Reader reader) {
     return new VectorTokenStream(new String[] {"java", "lucene"},
                                  new int[] {5, 6});
   }
 });
 
Too bad the Field class is final, otherwise I can derive from it
  and do something on that line...
 
 Extending Field would not help.  That's why it's final.
 
 Doug
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: PhraseQuery with Wildcards?

2004-07-07 Thread Erik Hatcher
On Jul 7, 2004, at 6:24 PM, [EMAIL PROTECTED] wrote:
Hi,
Is there any way to do a PhraseQuery with Wildcards?
No.
This very question came up a few days ago.  Look at PhrasePrefixQuery - 
although this will be a bit of effort to expand the terms matching the 
wildcarded term.
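
Roughly, that expansion could look like the sketch below (prefix, field name and index
path are illustrative, and this is untested):

  // Collect every term in MyField that starts with "bar".
  IndexReader reader = IndexReader.open("/path/to/index");
  List expansions = new ArrayList();
  TermEnum terms = reader.terms(new Term("MyField", "bar"));
  try {
    do {
      Term t = terms.term();
      if (t == null || !t.field().equals("MyField") || !t.text().startsWith("bar"))
        break;
      expansions.add(t);
    } while (terms.next());
  } finally {
    terms.close();
  }

  PhrasePrefixQuery query = new PhrasePrefixQuery();
  query.add(new Term("MyField", "foo"));                                 // fixed first word
  query.add((Term[]) expansions.toArray(new Term[expansions.size()]));   // any of the "bar*" terms
  Hits hits = new IndexSearcher(reader).search(query);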

I'd like to
search for:
   MyField:"foo bar*"
I thought I could cobble something together
using PhraseQuery and Wildcards but I couldn't get this functionality 
to work
due to my lack of experience with Lucene.

Is there a way to do this with
Lucene?
Thanks.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Deleting a Doc found via a Query

2004-07-07 Thread Bill Tschumy
I must be missing something here, but I can't see an easy way to delete 
a Document that has been found via searching.  The delete() method of 
IndexReader takes a docNum.  How do I get the docNum corresponding to 
the Document in the Hits?

I tried scanning through all the Documents using IndexReader's 
document(i) method, testing for equality (==) with my queried Document, 
but it wasn't found.  I assume this is because the Documents returned 
in Hits are copies of the Documents the document() method returns.
--
Bill Tschumy
Otherwise -- Austin, TX
http://www.otherwise.com

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Deleting a Doc found via a Query

2004-07-07 Thread Peter M Cipollone
Bill,

Check
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Hits.html#id(int)
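
Something like this (the query and index path are placeholders):

  IndexReader reader = IndexReader.open("/path/to/index");
  Searcher searcher = new IndexSearcher(reader);
  Hits hits = searcher.search(new TermQuery(new Term("id", "12345")));
  for (int i = 0; i < hits.length(); i++) {
    reader.delete(hits.id(i));   // hits.id(i) is the internal document number
  }
  reader.close();                // flushes the deletions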

Pete

- Original Message - 
From: Bill Tschumy [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, July 07, 2004 9:46 PM
Subject: Deleting a Doc found via a Query


 I must be missing something here, but I can't see an easy way to delete
 a Document that has been found via searching.  The delete() method of
 IndexReader takes a docNum.  How do I get the docNum corresponding to
 the Document in the Hits?

 I tried scanning through all the Documents using IndexReader's
 document(i) method, testing for equality (==) with my queried Document,
 but it wasn't found.  I assume this is because the Documents returned
 in Hits are copies of the Documents the document() method returns.
 -- 
 Bill Tschumy
 Otherwise -- Austin, TX
 http://www.otherwise.com


 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]





-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Deleting a Doc found via a Query

2004-07-07 Thread Bill Tschumy
Thanks.  This works fine.  I guess I was missing something <g>.  I
would have expected this to be a property of Document.

On Jul 7, 2004, at 8:49 PM, Peter M Cipollone wrote:
Bill,
Check
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/ 
Hits.html#id(int)

Pete
- Original Message -
From: Bill Tschumy [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, July 07, 2004 9:46 PM
Subject: Deleting a Doc found via a Query

I must be missing something here, but I can't see an easy way to  
delete
a Document that has been found via searching.  The delete() method of
IndexReader takes a docNum.  How do I get the docNum corresponding to
the Document in the Hits?

I tried scanning through all the Documents using IndexReader's
document(i) method, testing for equality (==) with my queried  
Document,
but it wasn't found.  I assume this is because the Documents returned
in Hits are copies of the Documents the document() method returns.
--
Bill Tschumy
Otherwise -- Austin, TX
http://www.otherwise.com

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

--
Bill Tschumy
Otherwise -- Austin, TX
http://www.otherwise.com
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: upgrade from Lucene 1.3 final to 1.4rc3 problem

2004-07-07 Thread Alex Aw Seat Kiong
Hi!

Thanks, the problem was solved by using Lucene 1.4 final.

Regards,
AlexAw


- Original Message - 
From: Zilverline info [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, July 07, 2004 10:32 PM
Subject: Re: upgrade from Lucene 1.3 final to 1.4rc3 problem


 This is a bug (see posting 'Lockfile Problem Solved'), upgrade to
 1.4-final, and you'll be fine

 Alex Aw Seat Kiong wrote:

 Hi!
 
 I'm using Lucene 1.3 final currently, all things were working fine.
 But, after i'm upgraded from Lucene 1.3 final to 1.4rc3 (simply overwrite
the lucene-1.4-final.jar to lucene-1.4-rc3.jar and re-compile it)
 We can re-compile it successfuly. but when will try to index the
document. It give the error as below:
 java.lang.NullPointerException
 at
org.apache.lucene.store.FSDirectory.create(FSDirectory.java:146)
 at
org.apache.lucene.store.FSDirectory.init(FSDirectory.java:126)
 at
org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:102)
 at
org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:83)
 at
org.apache.lucene.index.IndexWriter.init(IndexWriter.java:173)
 Which wrong? Pls help.
 
 Thanks.
 
 Regards,
 Alex
 
 
 
 
 
 



 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Lucene shouldn't use java.io.tmpdir

2004-07-07 Thread Kevin A. Burton
As of 1.3 (or was it 1.4?) Lucene migrated to using java.io.tmpdir to
store the locks for the index.

While under most situations this is safe, a lot of application servers
change java.io.tmpdir at runtime.

Tomcat is a good example.  Within Tomcat this property is set to
TOMCAT_HOME/temp.

Under this situation, if I were to create two IndexWriters within two VMs
and try to write to the same index, the index would get corrupted if one
Lucene instance was within Tomcat and the other was within a standard VM.

I think we should consider either:
1. Using our own tmpdir property based on the given OS.
2. Going back to the old mechanism of storing the locks within the index
basedir (if it's not read-only).

Thoughts?
--
Please reply using PGP.
   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Lucene shouldn't use java.io.tmpdir

2004-07-07 Thread Otis Gospodnetic
Hey Kevin,

Not sure if you're aware of it, but you can specify the lock dir, so in
your example, both JVMs could use the exact same lock dir, as long as
you invoke the VMs with the same params.  You shouldn't be writing the
same index with more than 1 IndexWriter though (not sure if this was
just a bad example or a real scenario).

Otis


--- Kevin A. Burton [EMAIL PROTECTED] wrote:
 As of 1.3 (or was it 1.4?) Lucene migrated to using java.io.tmpdir
 to 
 store the locks for the index.
 
 While under most situations this is safe, a lot of application servers
 
 change java.io.tmpdir at runtime.
 
 Tomcat is a good example.  Within Tomcat this property is set to 
 TOMCAT_HOME/temp..
 
 Under this situation if I were to create two IndexWriters within two
 VMs 
 and try to write to the same index  the index would get corrupted if
 one 
 Lucene instance was within Tomcat and the other was within a standard
 VM.
 
 I think we should consider either:
 
 1. Using our own tmpdir property based on the given OS.
 2. Go back to the old mechanism of storing the locks within the index
 
 basedir (if it's not readonly).
 
 Thoughts?
 
 -- 
 
 Please reply using PGP.
 
 http://peerfear.org/pubkey.asc
 
 NewsMonster - http://www.newsmonster.org/
 
 Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM/YIM - sfburtonator,  Web - http://peerfear.org/
 GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
   IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster
 
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
 
 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: unicode-compatible

2004-07-07 Thread Otis Gospodnetic
Moving to lucene-user list.

Persian = Farsi?  What you would need is a Farsi Analyzer, and Lucene
does not come with one, unfortunately.  You'll likely have to write it
yourself, or find an existing one.

Otis

--- shafipour elnaz [EMAIL PROTECTED] wrote:
 I want to make it to be compatible with persian unicode but I don't
 know where should I do these changes.
 
 Erik Hatcher [EMAIL PROTECTED] wrote:On Jul 7, 2004, at
 8:53 AM, shafipour elnaz wrote:
  Could any one tell me how can I make lucene unicode-compatible?
 
 It already is.
 
 Erik


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Search Hit Score

2004-07-07 Thread Karthik N S

Hey

 Dev Guys

 Apologies



  Can somebody explain to me how to retrieve all hits available per indexed
document?

   To explain in detail:


   A physical search on a single document would list 3 places for a certain
word occurrence.

   So if I am supposed to retrieve all 3 occurrences from the same Field
using Lucene ...

   How do I handle the query? Please explain with a simple SRC example.
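
For reference, the per-document occurrences (positions) of a term can be walked with
IndexReader.termPositions(); a rough sketch, with field and term made up:

  IndexReader reader = IndexReader.open("/path/to/index");
  TermPositions tp = reader.termPositions(new Term("contents", "lucene"));
  while (tp.next()) {
    int freq = tp.freq();                // number of occurrences in this document
    System.out.print("doc " + tp.doc() + " positions:");
    for (int i = 0; i < freq; i++) {
      System.out.print(" " + tp.nextPosition());
    }
    System.out.println();
  }
  tp.close();
  reader.close();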


 with regards
  Karthik





-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]