[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field
[ https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507135#comment-15507135 ]

Hoss Man commented on SOLR-9528:

bq. That works for many contexts, except perhaps Alex's original usecase: fetch me the document that corresponds to a specific \_docid\_ .

Ah, ok see -- i'm glad i asked: that usecase ("search for a document using its internal id") was not actually mentioned anywhere in this issue.

I would argue that we should view that as a broader problem (in a distinct jira): "make it easier to find documents that have a specific function value" (ie: add a new qparser/syntactic sugar for the behavior of frange where the low/high values are the same) ... maybe...

{code}
q={!func eq=123456789}docid()
{code}

> Make _docid_ (lucene id) a pseudo field
> ---
>
> Key: SOLR-9528
> URL: https://issues.apache.org/jira/browse/SOLR-9528
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: master (7.0)
> Reporter: Alexandre Rafalovitch
> Priority: Minor
>
> Lucene document id is a transitory id that cannot be relied on as it can change on document updates, etc.
> However, there are circumstances where it could be useful to use it in a search. The primary use is debugging, where some error messages provide only the lucene document id as the reference. For example:
> {noformat}
> child query must only match non-parent docs, but parent docID=38200 matched childScorer=class org.apache.lucene.search.DisjunctionSumScorer
> {noformat}
> We already expose the lucene id with the \[docid] transformer and with \_docid\_ sorting.
> On the email list, [~yo...@apache.org] proposed that _docid_ should be a legitimate pseudo-field, which would make it returnable, usable in function queries, etc.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field
[ https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505254#comment-15505254 ]

Yonik Seeley commented on SOLR-9528:

docid() as a value source is a good idea... as you say, it makes it clearer that it's a computed value.

That works for many contexts, except perhaps Alex's original usecase: fetch me the document that corresponds to a specific \_docid\_ . One could use something like frange(l=123456789,u=123456789)docid(), but that's pretty clunky.
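A minimal sketch of the frange workaround mentioned above, written out in Solr's local-params form ({!frange l=... u=...}). It only builds the request parameters; nothing is sent to a server. Note that docid() is the function proposed in this thread and is assumed here, it is not something Solr is known to ship, and the helper name frange_exact is made up for illustration.

```python
from urllib.parse import urlencode


def frange_exact(func: str, value: int) -> str:
    """Build an frange query whose lower and upper bounds are the same value,
    so it matches only documents for which `func` evaluates to `value`."""
    return f"{{!frange l={value} u={value}}}{func}"


# ASSUMPTION: docid() is the value source proposed in this thread.
# Any existing function query could be passed instead.
q = frange_exact("docid()", 123456789)

# URL-encode the parameters so they can be appended to a /select request.
query_string = urlencode({"q": q, "fl": "id"})
```

The same helper works for any function, which is the broader "find documents with a specific function value" case: only the func argument changes.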
[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field
[ https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505214#comment-15505214 ]

Hoss Man commented on SOLR-9528:

bq. So I suppose this means a Won't-Fix for this issue, ...

Assuming my vague guess at what's being suggested here is accurate, then yeah -- that would be my vote. But i'm still not certain i actually understand the objective.

If my guess _was_ correct, then we could also just change the title of this jira and use it to track creating a patch that adds {{docid()}} as a ValueSource, and only once it exists update the ref guide to suggest it in any place where {{\_docid\_}} is currently suggested.
[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field
[ https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505203#comment-15505203 ]

David Smiley commented on SOLR-9528:

bq. Special syntax like _docid_ in the sort param made sense in the early days of Solr, but feels hackish now that we have first order functions (which are clearly a "computed" value, with no ambiguity that it might be stored)

+1 to what you say, Hoss. We've got ValueSourceParsers & DocumentTransformers now for this sorta thing. So I suppose this means a Won't-Fix for this issue, and might mean other new issues, and possibly a removal of "\_docid\_" from the Ref guide (it would be deprecated, yet would still work in some situations; I know it doesn't always work).
[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field
[ https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505121#comment-15505121 ]

Hoss Man commented on SOLR-9528:

I don't understand, practically/actionably, what this sentence means...

{noformat}
...proposed that _docid_ should be a legitimate pseudo-field, which would make it returnable, usable in function queries, etc.
{noformat}

* How is (anyone) defining "legitimate pseudo-field" in this context?
* There's not enough context to understand what is implied by the "etc." in this sentence -- what are some concrete examples of what users would be able to do in the future that they can't do now?
** alternatively: what are some examples of existing vs new _syntax_ that is being proposed (either in configs or requests) for functionality that is already supported?

If the crux of this idea here is simply that the string {{\_docid\_}} should be usable anywhere that a fieldname can be used even when it's not defined in the schema, then that seems like a particularly bad/inconsistent idea to me, since all of the other magic {{\_underscore\_}} fields that exist in solr *are* defined in the schema, and it's actually important how/if they are stored, docValues, etc...

I've never been a huge fan of *any* magic field names in solr, and I *personally* would be confused as hell if we started encouraging users to use magic field names that look like real field names but don't actually exist -- especially because i would never be sure, when a user is asking a question, if they actually added {{\_docid\_}} to their schema -- a situation i have actually encountered in real life, and was then *VERY* confused by the described behavior of {{sort=\_docid\_ asc}}.

My straw man proposal would be to (informally/formally) deprecate using {{\_docid\_}} in the sort param, and instead offer a {{docid()}} (or {{docnum()}}, whatever folks prefer) ValueSourceParser out of the box, that people could pass to other functions (for the purpose of filtering, sorting, whatever...), or request in the response via {{fl}}, etc... Special syntax like {{\_docid\_}} in the sort param made sense in the early days of Solr, but feels hackish now that we have first order functions (which are clearly a "computed" value, with no ambiguity that it might be stored).

(for that matter, i would argue we should do the same thing with "{{score}}" => {{score()}}, and add a {{random(seed)}} to replace the way users currently have to configure solr.RandomField ... but i'll save those fights for different jiras)
[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field
[ https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503230#comment-15503230 ]

Yonik Seeley commented on SOLR-9528:

-1 to changing the name. \_docid\_ has been around forever (at least since 2010), and there's a high bar for breaking back compat. Breaking back compat is a major source of frustration for users.

Additionally, I've never actually seen anyone who has run into \_docid\_ confuse it with anything else. If people didn't read the docs carefully, they would be just as likely to fall into the trap of considering "docnum" to be persistent (why wouldn't it be? it's the document number).

new meme: "hypothetical confusion considered harmful" ;-)
[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field
[ https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502960#comment-15502960 ]

Alexandre Rafalovitch commented on SOLR-9528:

+1 on changing the names if that helps. -1 on removing it from the message, unless something better is put there (like Solr document id).
[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field
[ https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502538#comment-15502538 ]

Uwe Schindler commented on SOLR-9528:

Hi,

bq. This might be off topic, but I'd like to remove internal docnum from that exception message.

"docnum" is a much better name anyways! Because it does not contain the magical term "id", which has some "persistent" association in people's brains. I'd also be a fan of changing the method parameter names in Lucene to use this term.
[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field
[ https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502530#comment-15502530 ]

Mikhail Khludnev commented on SOLR-9528:

This might be off topic, but I'd like to remove the internal docnum from that exception message.
[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field
[ https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501376#comment-15501376 ]

Yonik Seeley commented on SOLR-9528:

bq. I'm not a fan of allowing docid to be used as input to a query without a compelling real-world use-case.

We already do though... one can do "sort=_docid_ asc". To not be totally confused by that, one must understand what _docid_ is (and that it can change across commits).

So in my view, it's not a matter of adding a new magic field, but just making the existing one more consistent. If you can sort by it, and retrieve it, let one query by it as well.
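For reference, a small sketch of what is already possible today according to this thread: sorting by _docid_ and returning the internal id through the [docid] transformer. It only assembles the request URL. The host, port, and collection name (localhost:8983, mycollection) and the helper name docid_debug_params are placeholders chosen for illustration.

```python
from urllib.parse import urlencode


def docid_debug_params(rows: int = 10) -> dict:
    """Parameters for a request that returns each document's unique key
    together with its current internal Lucene docid, in docid order."""
    return {
        "q": "*:*",
        "fl": "id,[docid]",     # [docid] is the document transformer
        "sort": "_docid_ asc",  # special sort syntax discussed in this issue
        "rows": str(rows),
    }


params = docid_debug_params()

# ASSUMPTION: placeholder base URL; substitute a real Solr host and collection.
url = "http://localhost:8983/solr/mycollection/select?" + urlencode(params)
```

Because the internal id can change across commits, the values returned by such a request are only meaningful against the same searcher that produced them.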
[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field
[ https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501317#comment-15501317 ]

Erick Erickson commented on SOLR-9528:

The point about debugging is very well taken. Would that functionality be served by having the debug information return the internal Lucene doc ID? That would make it less tempting to use as _input_...

At first blush though, I'm not a fan of allowing _docid_ to be used as _input_ to a query without a compelling real-world use-case. Partly I don't want to deal with "query responses differ when using the same ID" questions and having to unravel "Oh, you mean the internal ID, not the " ;). Allowing _docid_ to be an _output_ is fine IMO.

This isn't a super-strong objection, but I would like to see the practical application (not theoretical, one someone is actually using in the field) before going down this path.
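The "query responses differ when using the same ID" concern can be illustrated with a toy model. This is a deliberately simplified sketch, not Lucene's actual data structures: it assumes only that an update is a delete plus a re-add, and that a merge drops deleted entries and renumbers the survivors. The class and method names are invented for the example.

```python
class ToyIndex:
    """Toy index in which a document's internal id is its slot position."""

    def __init__(self):
        self.slots = []  # slot position = internal id; None marks a deleted doc

    def add(self, key):
        """Append a document and return its internal id."""
        self.slots.append(key)
        return len(self.slots) - 1

    def update(self, key):
        """Model an update as: mark the old copy deleted, append a new copy."""
        self.slots[self.slots.index(key)] = None
        return self.add(key)

    def merge(self):
        """Drop deleted slots, which renumbers every surviving document."""
        self.slots = [k for k in self.slots if k is not None]

    def key_at(self, docid):
        """Return the unique key stored at an internal id (None if deleted)."""
        return self.slots[docid]


index = ToyIndex()
for key in ("a", "b", "c"):
    index.add(key)

before = index.key_at(1)  # internal id 1 before any change
index.update("a")         # "a" is deleted at slot 0 and re-added at the end
index.merge()             # compaction shifts "b" and "c" down by one
after = index.key_at(1)   # internal id 1 now points at a different document
```

In this model the same internal id resolves to "b" before the update and to "c" after the merge, which is why a lookup by internal id is only safe for short-lived debugging.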