[jira] [Comment Edited] (LUCENE-7453) Change naming of variables/apis from docid to docnum
[ https://issues.apache.org/jira/browse/LUCENE-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511145#comment-15511145 ]

Paul Elschot edited comment on LUCENE-7453 at 9/21/16 8:56 PM:
---

bq. But the seg examples you have still have docid, just with seg prepended. It still has the problem that it uses "id", when id means identifier,

This is meant as an identifier for a document within a segment; within a segment this identifier is permanent. There may be another identifier in a document field, but that is irrelevant here. For compound readers there are multiple segments, and in that case, too, adding seg to the name is correct.

was (Author: paul.elsc...@xs4all.nl):
bq. But the seg examples you have still have docid, just with seg prepended. It still has the problem that it uses "id", when id means identifier,

This is meant as an identifier for a document within a segment; in a segment this identifier is permanent, and the only one. For compound readers there are multiple segments, and also in that case adding seg to the name is correct.

> Change naming of variables/apis from docid to docnum
>
> Key: LUCENE-7453
> URL: https://issues.apache.org/jira/browse/LUCENE-7453
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ryan Ernst
>
> In SOLR-9528 a suggestion was made to change {{docid}} to {{docnum}}. The
> reasoning for this is most notably that {{docid}} has a connotation of a
> persistent unique identifier (e.g. like {{_id}} in Elasticsearch or {{id}} in
> Solr), while {{docid}} in Lucene is currently local to a segment, and
> not directly comparable across segments.
> When I first started working on Lucene, I had this same confusion. {{docnum}}
> is a much better name for this transient, segment-local identifier for a doc.
> Regardless of what Solr wants to do in its API (e.g. keeping _docid_), I
> think we should switch the Lucene APIs and variable names to use docnum.
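To make the segment vs. compound-reader distinction concrete, here is a minimal, hypothetical Java sketch (plain arrays, no Lucene dependency) of how a compound reader can map a reader-wide document number to a segment plus a segment-local number: each segment numbers its documents 0..maxDoc-1, and the compound reader offsets them by a per-segment base. The class and method names are made up for illustration; in real Lucene code the offset is exposed as {{LeafReaderContext.docBase}}, and {{ReaderUtil.subIndex}} performs a similar lookup.

```java
/**
 * Hypothetical sketch (not Lucene's API): mapping a reader-wide doc number
 * in a compound reader to (segment, segment-local doc number).
 */
public final class SegmentDocMapping {

    /** bases[i] = total maxDoc of all segments before segment i. */
    public static int[] docBases(int[] segmentMaxDocs) {
        int[] bases = new int[segmentMaxDocs.length];
        int base = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            bases[i] = base;
            base += segmentMaxDocs[i];
        }
        return bases;
    }

    /**
     * Returns the segment containing globalDoc: the last segment whose base is
     * <= globalDoc (this also skips empty segments that share a base).
     */
    public static int segmentOf(int globalDoc, int[] bases) {
        int lo = 0, hi = bases.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (bases[mid] <= globalDoc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] maxDocs = {3, 5, 2};              // three hypothetical segments
        int[] bases = docBases(maxDocs);        // {0, 3, 8}
        int global = 6;
        int seg = segmentOf(global, bases);
        int local = global - bases[seg];        // segment-local number
        System.out.println("doc " + global + " -> segment " + seg + ", local " + local);
        // prints: doc 6 -> segment 1, local 3
    }
}
```

The same document thus has two numbers: a segment-local one that stays fixed for the life of that segment, and a reader-wide one that depends on which segments the reader happens to contain.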
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-7453) Change naming of variables/apis from docid to docnum
[ https://issues.apache.org/jira/browse/LUCENE-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511065#comment-15511065 ]

Mikhail Khludnev edited comment on LUCENE-7453 at 9/21/16 8:24 PM:
---

Whatever you name it, let's have _dixi_ in code! Please!

was (Author: mkhludnev):
Whatever you call name, let's have _dixi_ in code! Please!
[jira] [Comment Edited] (LUCENE-7453) Change naming of variables/apis from docid to docnum
[ https://issues.apache.org/jira/browse/LUCENE-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509831#comment-15509831 ]

Dawid Weiss edited comment on LUCENE-7453 at 9/21/16 12:55 PM:
---

bq. Are you fine to get docIndex from IndexReader or IndexSearcher after submitting docs to IndexWriter?

Yes, I think so. When you add a document to an IndexWriter you don't get any document "id" (or its number) anyway. Documents are indexed and made available to you once you acquire a new IndexReader -- and then each document will be uniquely described with an "index", valid only within this particular IndexReader. I think this makes sense, even when you think of methods like {{maxDoc}} which could read {{maxDocIndex}}...

was (Author: dweiss):
bq. Are you fine to get docIndex from IndexReader or IndexSearcher after submitting docs to IndexWriter?

Yes, I think so. When you add a document to an IndexWriter you don't get any document "id" (or number) anyway. Documents are indexed and made available to you once you acquire a new IndexReader -- and then each document will be uniquely described with an "index", valid only within this particular IndexReader. I think this makes sense, even when you think of methods like {{maxDoc}} which could read {{maxDocIndex}}...
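The remark that {{maxDoc}} could read {{maxDocIndex}} can be illustrated with a small, hypothetical model of a point-in-time reader: {{maxDoc()}} is one past the largest document index and counts deleted slots too, so a loop over indexes has to consult the live-docs bits. The {{ReaderSnapshot}} class is made up for this sketch and does not depend on Lucene; it only mirrors the convention that a null live-docs set means "no deletions".

```java
import java.util.BitSet;

/**
 * Hypothetical stand-in for a point-in-time reader (not Lucene's API).
 * Document indexes run over [0, maxDoc) and are only meaningful for this
 * snapshot; deleted documents still occupy an index slot.
 */
public final class ReaderSnapshot {
    private final int maxDoc;
    private final BitSet liveDocs; // null means "no deletions"

    public ReaderSnapshot(int maxDoc, BitSet liveDocs) {
        this.maxDoc = maxDoc;
        this.liveDocs = liveDocs;
    }

    /** One past the largest document index, deleted slots included. */
    public int maxDoc() {
        return maxDoc;
    }

    /** Walks every document index and counts the ones that are still live. */
    public int countLive() {
        int live = 0;
        for (int docIndex = 0, max = maxDoc(); docIndex < max; docIndex++) {
            if (liveDocs == null || liveDocs.get(docIndex)) {
                live++;
            }
        }
        return live;
    }

    public static void main(String[] args) {
        BitSet live = new BitSet(4);
        live.set(0);
        live.set(2);
        live.set(3); // document index 1 is deleted
        ReaderSnapshot reader = new ReaderSnapshot(4, live);
        System.out.println(reader.maxDoc() + " slots, " + reader.countLive() + " live");
        // prints: 4 slots, 3 live
    }
}
```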
[jira] [Comment Edited] (LUCENE-7453) Change naming of variables/apis from docid to docnum
[ https://issues.apache.org/jira/browse/LUCENE-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509280#comment-15509280 ]

Dawid Weiss edited comment on LUCENE-7453 at 9/21/16 9:02 AM:
--

bq. I think docNum is a good improvement because it makes it sounds like we are numbering the documents, not assigning a unique identifier to them.

Sorry, but this explanation is even more controversial and vague to me (what is "numbering" of documents?). I'd prefer simply explaining that identifiers are persistent within an index segment (because they are), but index segments can be merged and thus a document may be moved across index segments over time, changing its per-segment identifier.

If we really wish to make loops like this not use the "id" naming:
{code}
for (int docId = 0, max = indexReader.maxDoc(); docId < max; docId++) {
  // do something
}
{code}
then really {{docNum}} doesn't make it any better. Even {{docIndex}} seems better to me; in fact, this "index" makes sense both at segment level (where the index doesn't change) and at composite reader level (where the 'index' of a document has more complex semantics). If we make it clear that a document index is volatile and is valid (and constant) only for the opened reader, then this is clearer to me.
{code}
for (int docIndex = 0, max = indexReader.maxDoc(); docIndex < max; docIndex++) {
  // do something
}
{code}

was (Author: dweiss):
bq. I think docNum is a good improvement because it makes it sounds like we are numbering the documents, not assigning a unique identifier to them.

Sorry, but this explanation is even more controversial and vague to me than (what is "numbering" of documents?). I'd prefer simply explaining that identifiers are persistent within an index segment (because they are), but index segments can be merged and thus a document may be moved across index segments over time, changing its per-segment identifier.

If we really wish to make loops like this not use the "id" naming:
{code}
for (int docId = 0, max = indexReader.maxDoc(); docId < max; docId++) {
  // do something
}
{code}
then really {{docNum}} doesn't make it any better. Even {{docIndex}} seems better to me; in fact, this "index" makes sense both at segment level (where the index doesn't change) and at composite reader level (where the 'index' of a document has a more complex semantics). If we make it clear document index is volatile and is valid (and constant) only for the a opened reader, then this is more clear to me.
{code}
for (int docIndex = 0, max = indexReader.maxDoc(); docIndex < max; docIndex++) {
  // do something
}
{code}
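The point that merging moves documents and changes their per-segment identifiers can be sketched with a simplified, hypothetical model: when segments are merged, deleted documents are dropped and the survivors are numbered densely in merge order. This ignores index sorting and other merge-policy details, and {{MergeRenumbering.oldToNew}} is an illustrative name, not a Lucene API.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model (not Lucene's merge code): deleted docs are
 * dropped and survivors are renumbered densely, in merge order, with no
 * index sorting.
 */
public final class MergeRenumbering {

    /**
     * For each document of the merged segments, flattened in merge order,
     * returns its number in the new segment, or -1 if it was deleted.
     */
    public static int[] oldToNew(boolean[][] liveDocsPerSegment) {
        int total = 0;
        for (boolean[] live : liveDocsPerSegment) {
            total += live.length;
        }
        int[] map = new int[total];
        int oldPos = 0;
        int next = 0;
        for (boolean[] live : liveDocsPerSegment) {
            for (boolean isLive : live) {
                map[oldPos++] = isLive ? next++ : -1;
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Two hypothetical segments: the first has 3 docs (doc 1 deleted),
        // the second has 2 live docs.
        boolean[][] live = { {true, false, true}, {true, true} };
        System.out.println(Arrays.toString(oldToNew(live)));
        // prints: [0, -1, 1, 2, 3]
    }
}
```

In this model, the first document of the second segment (number 0 in its old segment, index 3 in a reader over both old segments) becomes number 2 in the merged segment, which is why such a number is only meaningful for the reader it was obtained from.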
[jira] [Comment Edited] (LUCENE-7453) Change naming of variables/apis from docid to docnum
[ https://issues.apache.org/jira/browse/LUCENE-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507544#comment-15507544 ]

Dawid Weiss edited comment on LUCENE-7453 at 9/20/16 7:30 PM:
--

To me the difference between {{docnum}} and {{docid}} is really that {{docnum}} is one letter longer :) Seriously, it doesn't seem to explain anything more than {{docid}} does. It would be more self-explanatory to call it {{docSegmentIndex}}, but this seems verbose. Don't you think adding better documentation (in one place, and linking to it) would be a better idea than just renaming? Also, the nomenclature here has been with us for years. I don't see an obvious benefit of switching to {{docnum}} for new users, and I can see how it may be a confusing change for developers already experienced with Lucene (especially if they have their own code that sticks to "docid" in local variables, etc.).

was (Author: dweiss):
To me the difference between {{docnum}} and {{docid}} is really that {{docnum}} is one letter longer :) Seriously, it doesn't seem to be explaining anything more than {{docid}} does. It would be more self-explanatory to call it {{segmentIndex}}, but this seems verbose. Don't you think adding better documentation (in one place and linking to it) would be a better idea than just renaming? Also, the nomenclature here has been with us for years. I don't see an obvious benefit of switching to {{docnum}} for new users and I see how it may be a confusing change to existing Lucene-experienced developers (especially if they have their own code that would stick to "docid" in local variables, etc.