getting Lucene Docid from inside score()

2018-03-09 Thread Dwaipayan Roy
While searching, I want to get the Lucene-assigned docid (which runs from
0 to the number of documents - 1) of a document containing a particular
query term.

From inside score(), printing 'doc' or calling docId() returns a docid
which, I think, is the internal docid within the segment in which the
document is indexed. However, I want the Lucene-assigned docid. How do I
do that?

Dwaipayan..


Re: getting Lucene Docid from inside score()

2018-03-09 Thread Michael Sokolov
Are you sure you want this? Lucene docids aren't generally useful outside a
narrow internal context. They can change over time, for example.

But if you do, it sounds like what you are seeing is the per-segment
docid. To get a global one you have to add the segment offset, held by a
leaf reader.
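
To make the "segment offset" concrete: in Lucene this offset is exposed as the public docBase field of LeafReaderContext, and the index-wide docid is docBase plus the per-segment docid. The sketch below models only that arithmetic in plain Java and does not depend on Lucene. The class name DocBaseSketch, its methods, and the segment sizes are hypothetical, chosen for illustration, and it assumes every segment holds at least one document.

```java
import java.util.Arrays;

// Hypothetical sketch: models how per-segment docids map to index-wide docids.
// It does not use Lucene; in Lucene the offset is LeafReaderContext.docBase.
class DocBaseSketch {

    // docBase of each segment = total number of docs in all earlier segments.
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int next = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = next;
            next += maxDocs[i];
        }
        return bases;
    }

    // Index-wide docid = segment offset + docid within that segment.
    static int globalDocId(int docBase, int segmentDocId) {
        return docBase + segmentDocId;
    }

    // Inverse lookup: index of the segment that holds a given index-wide docid.
    // Assumes non-empty segments, so the bases are strictly increasing.
    static int segmentOf(int[] bases, int globalDocId) {
        int pos = Arrays.binarySearch(bases, globalDocId);
        return pos >= 0 ? pos : -pos - 2;
    }

    public static void main(String[] args) {
        int[] maxDocs = {5, 3, 4};                    // assumed segment sizes
        int[] bases = docBases(maxDocs);              // {0, 5, 8}
        System.out.println(globalDocId(bases[1], 2)); // prints 7
        System.out.println(segmentOf(bases, 7));      // prints 1
    }
}
```

In real Lucene code the same addition would use the LeafReaderContext that Lucene hands to per-segment components, for example in Weight.scorer(LeafReaderContext) or Collector.getLeafCollector(LeafReaderContext); where exactly that context is available inside a custom scoring function depends on the Lucene version in use.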

On Mar 9, 2018 5:06 AM, "Dwaipayan Roy"  wrote:

> While searching, I want to get the lucene assigned docid (that starts from
> 0 to the number of documents -1) of a document having a particular query
> term.
>
> From inside the score(), printing 'doc' or calling docId() is returning a
> docid which, I think, is the internal docid of a segment in which the
> document is indexed. However, I want to have the lucene assigned docid. How
> to do that?
>
> Dwaipayan..
>


Re: getting Lucene Docid from inside score()

2018-03-09 Thread Dwaipayan Roy
Thank you very much for your reply. Yes, I really want this (for
implementing a retrieval function that extends the LMDir function).
Precisely, I want the same document numbering that we see in Lucene index
viewers like Luke.

I am not sure what you meant by "segment offset, held by a leaf reader".
Could you please explain a little: exactly what do I need to do, and when?

Many thanks.


On 2018/03/09 11:25:44, Michael Sokolov wrote:
> Are you sure you want this? Lucene docids aren't generally useful outside a
> narrow internal context. They can change over time for example.
>
> But if you do, it sounds like maybe what you are seeing is the per segment
> docid. To get a global one you have to add the segment offset, held by a
> leaf reader.
>
> On Mar 9, 2018 5:06 AM, "Dwaipayan Roy" wrote:
>
> > While searching, I want to get the lucene assigned docid (that starts from
> > 0 to the number of documents -1) of a document having a particular query
> > term.
> >
> > From inside the score(), printing 'doc' or calling docId() is returning a
> > docid which, I think, is the internal docid of a segment in which the
> > document is indexed. However, I want to have the lucene assigned docid. How
> > to do that?
> >
> > Dwaipayan..
> >
>


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: getting Lucene Docid from inside score()

2018-03-09 Thread Erick Erickson
You almost certainly do _not_ want this unless you are absolutely and
totally sure that your index does not change between the time you ask
for the internal Lucene doc ID and the time you use it. No docs may be
added and no forceMerges may be done. In fact, I'd go so far as to say
you shouldn't open any new searchers.

Here's the reason. Say I have a single-segment index with internal doc
IDs 1, 2, 3, 4, 5. Say I delete docs 2 and 3. Now say I optimize: the
new segment has IDs 1, 2, 3. This is a simplification to illustrate that
_whenever_ a segment gets rewritten for any reason, internal Lucene
doc IDs may change. All this goes on in the background and you have no
control over when.
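
The delete-then-optimize example above can be pictured with a toy model. This is only a sketch, not Lucene's merge code: a segment is modelled as a list of application-level keys, a document's internal ID is its 0-based position in that list, and the class name MergeRenumberSketch and its rewrite method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model, not Lucene's merge code: a segment is a list of
// application-level keys, and a doc's internal ID is its position in the list.
class MergeRenumberSketch {

    // Rewrites a segment, dropping deleted docs. Survivors are packed densely,
    // so every doc that followed a deleted one ends up with a smaller ID.
    static List<String> rewrite(List<String> segment, Set<Integer> deletedIds) {
        List<String> merged = new ArrayList<>();
        for (int id = 0; id < segment.size(); id++) {
            if (!deletedIds.contains(id)) {
                merged.add(segment.get(id));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("a", "b", "c", "d", "e");
        Set<Integer> deleted = new HashSet<>(Arrays.asList(1, 2)); // "b" and "c"
        List<String> after = rewrite(before, deleted);
        System.out.println(before.indexOf("d")); // prints 3
        System.out.println(after.indexOf("d"));  // prints 1
    }
}
```

The key "d" is the same document before and after, but its internal ID moves from 3 to 1, which is why an ID captured before a rewrite cannot be trusted afterwards.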

Docs may even get renumbered relative to each other. Let's say that
your Solr ID is doc1 and its associated internal ID is 1, and doc100 has
internal ID 100. Segment merging could assign doc1 an ID of 200 and
doc100 an ID of 150. You just don't know.

Luke and the like are using a point-in-time snapshot of the index.

If you still want to get the internal ID, just specify the
pseudo-field [docid], as: "fl=id,[docid]"
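
As a rough illustration of passing that field list to Solr over HTTP, the sketch below builds a select URL with the brackets and comma URL-encoded. The host, the collection name "mycollection", the match-all query, and the class and method names are assumptions made for the example; only the fl value comes from the message above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a Solr select URL that requests the id field
// plus the [docid] pseudo-field. Host and collection name are assumptions.
class DocIdFieldListSketch {

    static String selectUrl(String collectionUrl, String query, String fieldList) {
        try {
            // URL-encode the parameter values; '[', ']' and ',' are escaped.
            return collectionUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&fl=" + URLEncoder.encode(fieldList, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectUrl(
                "http://localhost:8983/solr/mycollection", "*:*", "id,[docid]"));
        // prints http://localhost:8983/solr/mycollection/select?q=*%3A*&fl=id%2C%5Bdocid%5D
    }
}
```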

Best,
Erick

On Fri, Mar 9, 2018 at 3:50 AM, dwaipayan@gmail.com wrote:
> Thank you very much for your reply. Yes, I really want this (for
> implementing a retrieval function that extends the LMDir function).
> Precisely, I want the document numbering same as that we see in
> Lucene-Index-Viewers like Luke.
>
> I am not sure what you meant by "segment offset, held by a leaf reader"..
> Can you please explain a little, exactly when and what I need to do?
>
> Many thanks.
>
> On 2018/03/09 11:25:44, Michael Sokolov  wrote:
>> Are you sure you want this? Lucene docids aren't generally useful outside a
>> narrow internal context. They can change over time for example.
>>
>> But if you do, it sounds like maybe what you are seeing is the per segment
>> docid. To get a global one you have to add the segment offset, held by a
>> leaf reader.
>>
>> On Mar 9, 2018 5:06 AM, "Dwaipayan Roy"  wrote:
>>
>> > While searching, I want to get the lucene assigned docid (that starts from
>> > 0 to the number of documents -1) of a document having a particular query
>> > term.
>> >
>> > From inside the score(), printing 'doc' or calling docId() is returning a
>> > docid which, I think, is the internal docid of a segment in which the
>> > document is indexed. However, I want to have the lucene assigned docid. How
>> > to do that?
>> >
>> > Dwaipayan..
>> >
>>