[jira] [Comment Edited] (LUCENE-7453) Change naming of variables/apis from docid to docnum

2016-09-21 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511145#comment-15511145
 ] 

Paul Elschot edited comment on LUCENE-7453 at 9/21/16 8:56 PM:
---

bq. But the seg examples you have still have docid, just with seg prepended. It 
still has the problem that it uses "id", when id means identifier,

This is meant as an identifier for a document within a segment; in a segment 
this identifier is permanent. There may be another identifier in a document 
field, but that is irrelevant here.

For compound readers there are multiple segments, and also in that case adding 
seg to the name is correct.
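
To illustrate the point about compound readers, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: the class {{SegmentDocIds}} and its methods are hypothetical names, and it assumes every segment holds at least one document. Each segment gets a starting offset (Lucene exposes a comparable value as {{docBase}} on a leaf reader context), and a reader-wide number is the offset plus the segment-local identifier.

```java
import java.util.Arrays;

/** Toy model (hypothetical, not Lucene code): maps reader-wide numbers to segment-local ones. */
final class SegmentDocIds {
    // docBases[i] = number of documents in all segments before segment i
    private final int[] docBases;

    SegmentDocIds(int... segmentSizes) {
        docBases = new int[segmentSizes.length];
        int base = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            docBases[i] = base;
            base += segmentSizes[i];
        }
    }

    /** Reader-wide number for a document identified within one segment. */
    int toGlobal(int segment, int segDocId) {
        return docBases[segment] + segDocId;
    }

    /** Segment that contains the given reader-wide number (assumes non-empty segments). */
    int segmentOf(int globalDocId) {
        int pos = Arrays.binarySearch(docBases, globalDocId);
        // Not found: binarySearch returns -(insertionPoint) - 1; the owning
        // segment is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Segment-local identifier for the given reader-wide number. */
    int toSegmentLocal(int globalDocId) {
        return globalDocId - docBases[segmentOf(globalDocId)];
    }
}
```

With segments of sizes 3, 2 and 4, the reader-wide number 4 resolves to segment 1 and segment-local identifier 1. Only the pair (segment, segment-local identifier) is stable while the segment exists.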



was (Author: paul.elsc...@xs4all.nl):
bq. But the seg examples you have still have docid, just with seg prepended. It 
still has the problem that it uses "id", when id means identifier,

This is meant as an identifier for a document within a segment; in a segment 
this identifier is permanent, and the only one.

For compound readers there are multiple segments, and also in that case adding 
seg to the name is correct.


> Change naming of variables/apis from docid to docnum
> 
>
> Key: LUCENE-7453
> URL: https://issues.apache.org/jira/browse/LUCENE-7453
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ryan Ernst
>
> In SOLR-9528 a suggestion was made to change {{docid}} to {{docnum}}. The 
> reasoning for this is most notably that {{docid}} has a connotation about a 
> persistent unique identifier (eg like {{_id}} in elasticsearch or {{id}} in 
> solr), while {{docid}} in lucene is currently something local to a segment, and 
> not comparable directly across segments.
> When I first started working on Lucene, I had this same confusion. {{docnum}} 
> is a much better name for this transient, segment local identifier for a doc. 
> Regardless of what solr wants to do in their api (eg keeping _docid_), I 
> think we should switch the lucene apis and variable names to use docnum.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-7453) Change naming of variables/apis from docid to docnum

2016-09-21 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511065#comment-15511065
 ] 

Mikhail Khludnev edited comment on LUCENE-7453 at 9/21/16 8:24 PM:
---

Whatever you name it, let's have _dixi_ in code! Please!


was (Author: mkhludnev):
Whatever you call name, let's have _dixi_ in code! Please!




[jira] [Comment Edited] (LUCENE-7453) Change naming of variables/apis from docid to docnum

2016-09-21 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509831#comment-15509831
 ] 

Dawid Weiss edited comment on LUCENE-7453 at 9/21/16 12:55 PM:
---

bq. Are you fine to get docIndex from IndexReader or IndexSearcher after 
submitting docs to IndexWriter?

Yes, I think so. When you add a document to an IndexWriter you don't get any 
document "id" (or its number) anyway. Documents are indexed and made available 
to you once you acquire a new IndexReader -- and then each document will be 
uniquely described with an "index", valid only within this particular 
IndexReader. I think this makes sense, even when you think of methods like 
{{maxDoc}} which could read {{maxDocIndex}}...
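
A minimal sketch of why such an "index" is only meaningful for one reader, again in plain Java and not using Lucene. The class {{ReaderSnapshot}} and its methods are hypothetical, and the model simplifies real behaviour: it numbers only live documents, whereas a real reader may still count deleted documents until a merge removes them.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not Lucene code): a reader numbers its documents 0..maxDoc-1. */
final class ReaderSnapshot {
    // Application-level keys (e.g. an "id" field), in index order
    private final List<String> keys;

    ReaderSnapshot(List<String> liveKeys) {
        this.keys = new ArrayList<>(liveKeys);
    }

    int maxDoc() {
        return keys.size();
    }

    /** Position of the document in this snapshot, or -1 if it is absent. */
    int docIndexOf(String key) {
        return keys.indexOf(key);
    }

    /** Models a merge that drops one deleted document and compacts the rest. */
    ReaderSnapshot afterMergeWithout(String deletedKey) {
        List<String> remaining = new ArrayList<>(keys);
        remaining.remove(deletedKey);
        return new ReaderSnapshot(remaining);
    }
}
```

In this model a document keyed "c" sits at index 2 in a snapshot of "a", "b", "c", and at index 1 in the snapshot produced after "a" is merged away. The key stays the same; the index does not.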


was (Author: dweiss):
bq. Are you fine to get docIndex from IndexReader or IndexSearcher after 
submitting docs to IndexWriter?

Yes, I think so. When you add a document to an IndexWriter you don't get any 
document "id" (or number) anyway. Documents are indexed and made available to 
you once you acquire a new IndexReader -- and then each document will be 
uniquely described with an "index", valid only within this particular 
IndexReader. I think this makes sense, even when you think of methods like 
{{maxDoc}} which could read {{maxDocIndex}}...




[jira] [Comment Edited] (LUCENE-7453) Change naming of variables/apis from docid to docnum

2016-09-21 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509280#comment-15509280
 ] 

Dawid Weiss edited comment on LUCENE-7453 at 9/21/16 9:02 AM:
--

bq. I think docNum is a good improvement because it makes it sound like we are 
numbering the documents, not assigning a unique identifier to them.

Sorry, but this explanation is even more controversial and vague to me (what is 
"numbering" of documents?). I'd prefer simply explaining that identifiers are 
persistent within an index segment (because they are), but index segments can 
be merged and thus a document may be moved across index segments over time, 
changing its per-segment identifier. 

If we really wish to make loops like this not use the "id" naming:
{code}
for (int docId = 0, max = indexReader.maxDoc(); docId < max; docId++) {
  // do something
}
{code}

then really {{docNum}} doesn't make it any better. Even {{docIndex}} seems 
better to me; in fact, this "index" makes sense both at segment level (where 
the index doesn't change) and at composite reader level (where the 'index' of a 
document has more complex semantics). If we make it clear that the document 
index is volatile and is valid (and constant) only for an opened reader, then 
this is clearer to me.

{code}
for (int docIndex = 0, max = indexReader.maxDoc(); docIndex < max; docIndex++) {
  // do something
}
{code}




was (Author: dweiss):
bq. I think docNum is a good improvement because it makes it sounds like we are 
numbering the documents, not assigning a unique identifier to them.

Sorry, but this explanation is even more controversial and vague to me than 
(what is "numbering" of documents?). I'd prefer simply explaining that 
identifiers are persistent within an index segment (because they are), but 
index segments can be merged and thus a document may be moved across index 
segments over time, changing its per-segment identifier. 

If we really wish to make loops like this not use the "id" naming:
{code}
for (int docId = 0, max = indexReader.maxDoc(); docId < max; docId++) {
  // do something
}
{code}

then really {{docNum}} doesn't make it any better. Even {{docIndex}} seems 
better to me; in fact, this "index" makes sense both at segment level (where 
the index doesn't change) and at composite reader level (where the 'index' of a 
document has a more complex semantics). If we make it clear document index is 
volatile and is valid (and constant) only for the a opened reader, then this is 
more clear to me.

{code}
for (int docIndex = 0, max = indexReader.maxDoc(); docIndex < max; docIndex++) {
  // do something
}
{code}






[jira] [Comment Edited] (LUCENE-7453) Change naming of variables/apis from docid to docnum

2016-09-20 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507544#comment-15507544
 ] 

Dawid Weiss edited comment on LUCENE-7453 at 9/20/16 7:30 PM:
--

To me the difference between {{docnum}} and {{docid}} is really that {{docnum}} 
is one letter longer :) Seriously, it doesn't seem to be explaining anything 
more than {{docid}} does. It would be more self-explanatory to call it 
{{docSegmentIndex}}, but this seems verbose.

Don't you think adding better documentation (in one place and linking to it) 
would be a better idea than just renaming? Also, the nomenclature here has been 
with us for years. I don't see an obvious benefit of switching to {{docnum}} 
for new users, and I see how it may be a confusing change for existing 
Lucene-experienced developers (especially if they have their own code that 
would stick to "docid" in local variables, etc.).


was (Author: dweiss):
To me the difference between {{docnum}} and {{docid}} is really that {{docnum}} 
is one letter longer :) Seriously, it doesn't seem to be explaining anything 
more than {{docid}} does. It would be more self-explanatory to call it 
{{segmentIndex}}, but this seems verbose.

Don't you think adding better documentation (in one place and linking to it) 
would be a better idea than just renaming? Also, the nomenclature here has been 
with us for years. I don't see an obvious benefit of switching to {{docnum}} 
for new users and I see how it may be a confusing change to existing 
Lucene-experienced developers (especially if they have their own code that 
would stick to "docid" in local variables, etc.
