Ref.: Re: IndexSearcher and number of occurrences

2005-01-13 Thread Bertrand VENZAL



Hi,

Thanks for your quick answer. I understand what you meant by using the
IndexSearcher to get the TermFreqVector. But you use an int as an id to
look up the term frequency, so I suppose it is the document's position
number within the IndexReader.
My problem is this: during the indexing phase I can store the id, but if a
document is deleted and recreated later on (as in an update), the numbering
will change and all the ids stored previously will no longer be correct.
Am I right on this point, or am I missing something?

thanks ...
From: Erik Hatcher
To: Lucene Users List
Date: 13/01/2005 11:28
Reply-To: Lucene Users List
Subject: Re: IndexSearcher and number of occurence










On Jan 13, 2005, at 5:03 AM, Bertrand VENZAL wrote:



 Hi all,

 I'm quite new to this mailing list. I am having trouble finding the
 number of occurrences of a word in a document. I need to use IndexSearcher
 because of the query, but the score it returns is not what I'm looking
 for.
 I found the TermDocs class mentioned on the mailing list, but it seems to
 work only with IndexReader.

 If anyone can give me a hand with this one, I would appreciate it.

Perhaps this technique is what you're looking for: set the field(s)
you're interested in capturing frequency on to be vectored.  You'll see
that flag in additional overloaded methods on Field.  You'll still
need to use an IndexReader, but that is no problem.  Construct an
IndexReader and use it to construct the IndexSearcher that you'll also
use.  Here are some snippets of code:

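    // During indexing, the subject field was added like this
    // (the third argument, true, stores the term vector):
    doc.add(Field.UnStored("subject", subject, true));

    ... // now during searching...

    IndexReader reader = IndexReader.open(directory);
    // share the same reader with the searcher
    IndexSearcher searcher = new IndexSearcher(reader);

    ...
    // from your Hits, get the document id
    // (hits.id(i) returns the int id; hits.doc(i) returns the Document)
    int id = hits.id(i);

    TermFreqVector vector =
        reader.getTermFreqVector(id, "subject");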

Now read up on the TermFreqVector API to get at the frequency of a
specific term.

                Erik
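
As a rough sketch of that last step: TermFreqVector exposes two parallel
arrays, getTerms() and getTermFrequencies(), so the frequency of one term can
be found by locating its index in the first array and reading the same index
in the second. The class and method names below (TermFrequencyLookup,
frequencyOf) are hypothetical, and the arrays in main are stand-in data so the
example runs without Lucene on the classpath.

```java
public class TermFrequencyLookup {

    /**
     * Returns the frequency recorded for the given term, or 0 if the term
     * is not present. terms and freqs are parallel arrays: freqs[i] is the
     * number of occurrences of terms[i] in one field of one document.
     */
    public static int frequencyOf(String[] terms, int[] freqs, String term) {
        for (int i = 0; i < terms.length; i++) {
            if (terms[i].equals(term)) {
                return freqs[i];
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Stand-in data (an assumption for illustration). With Lucene these
        // would come from vector.getTerms() and vector.getTermFrequencies().
        String[] terms = {"index", "lucene", "search"};
        int[] freqs = {2, 5, 1};

        System.out.println(frequencyOf(terms, freqs, "lucene")); // prints 5
    }
}
```

With a real vector the call would look like
frequencyOf(vector.getTerms(), vector.getTermFrequencies(), "lucene").
Note that getTermFreqVector returns null when the field was indexed without
a term vector, so check for that first.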


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]










Re: Ref.: Re: IndexSearcher and number of occurrences

2005-01-13 Thread Erik Hatcher
On Jan 13, 2005, at 10:17 AM, Bertrand VENZAL wrote:

 Hi,

 Thanks for your quick answer. I understand what you meant by using the
 IndexSearcher to get the TermFreqVector. But you use an int as an id to
 look up the term frequency, so I suppose it is the document's position
 number within the IndexReader.
 My problem is this: during the indexing phase I can store the id, but if a
 document is deleted and recreated later on (as in an update), the numbering
 will change and all the ids stored previously will no longer be correct.
 Am I right on this point, or am I missing something?

Yes, the Document id (the one Lucene uses) is not to be relied on
long-term.  But in the example you'd get it from Hits immediately
after a search, and thus it would be accurate and usable.  You do not
need to store the id during indexing; Lucene maintains it and
gives it to you from Hits.

Erik
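
To illustrate why a stored id goes stale, here is a toy model, not Lucene
itself: a document's id is simply its position in a list, and an update is a
delete followed by an add. The class ToyIndex and its methods are hypothetical.
Real Lucene only marks a document as deleted and renumbers later, when
segments are merged, whereas this sketch compacts immediately, but the effect
on previously saved ids is the same.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyIndex {
    // Each entry is an application-level key; the list position plays the
    // role of the internal document id.
    private final List<String> keys = new ArrayList<>();

    /** Adds a document and returns the id it has right now. */
    public int add(String key) {
        keys.add(key);
        return keys.size() - 1;
    }

    /** Deletes a document; every later document shifts down by one. */
    public void delete(String key) {
        keys.remove(key);
    }

    /** Looks up the current id by the stable application key (-1 if absent). */
    public int idOf(String key) {
        return keys.indexOf(key);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("a");
        index.add("b");
        int savedIdOfC = index.add("c");   // id saved at indexing time

        // "Update" document a: delete it, then add it again.
        index.delete("a");
        index.add("a");

        System.out.println("saved id of c:   " + savedIdOfC);
        System.out.println("current id of c: " + index.idOf("c"));
    }
}
```

The saved id and the current id differ after the update, which is why the id
should be taken from Hits right after each search rather than stored.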


Ref.: Re: Ref.: Re: IndexSearcher and number of occurrences

2005-01-13 Thread Bertrand VENZAL



Great, thanks for your help. I understand things quickly, but I need lots of
explanation ;-)

For anyone interested, my mistake was using:

int id = hits.doc(i);

instead of:

int id = hits.id(i);

Cheers,
Bertrand




