[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field

2016-09-20 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507135#comment-15507135
 ] 

Hoss Man commented on SOLR-9528:


bq. That works for many contexts, except perhaps Alex's original usecase: fetch 
me the document that corresponds to a specific \_docid\_ .

Ah, OK, I see -- I'm glad I asked: that use case ("search for a document using its 
internal id") was not actually mentioned anywhere in this issue.

I would argue that we should view that as a broader problem (in a distinct 
jira): "make it less clunky to find documents that have a specific function value" 
(ie: add a new qparser/syntactic sugar for the behavior of frange when the 
low/high values are the same) ... maybe...
{code}
q={!func eq=123456789}docid()
{code}
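The proposed sugar would only be a thin rewrite over frange. Below is a minimal sketch of that equivalence. It assumes the hypothetical {{!func eq=...}} syntax above, which does not exist in Solr, and it uses a hypothetical {{desugar_eq}} helper. Because frange bounds are inclusive by default, setting l and u to the same value selects exactly that value.

```python
import re

# Hypothetical syntax: {!func eq=N}f(...) is only a proposal in this thread.
_EQ_SUGAR = re.compile(r"^\{!func\s+eq=(?P<value>[^\s}]+)\}(?P<func>.+)$")


def desugar_eq(q: str) -> str:
    """Rewrite the proposed eq sugar into an equivalent frange query."""
    m = _EQ_SUGAR.match(q)
    if m is None:
        return q  # not the sugar: leave the query untouched
    v = m.group("value")
    # frange includes both bounds by default (incl/incu), so l == u matches only v
    return "{!frange l=%s u=%s}%s" % (v, v, m.group("func"))


print(desugar_eq("{!func eq=123456789}docid()"))
```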


> Make _docid_ (lucene id) a pseudo field
> ---
>
> Key: SOLR-9528
> URL: https://issues.apache.org/jira/browse/SOLR-9528
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>Reporter: Alexandre Rafalovitch
>Priority: Minor
>
> Lucene document id is a transitory id that cannot be relied on as it can 
> change on document updates, etc.
> However, there are circumstances where it could be useful to use it in a 
> search. The primary use is debugging, where some error messages provide 
> only the Lucene document id as the reference. For example:
> {noformat}
> child query must only match non-parent docs, but parent docID=38200 matched 
> childScorer=class org.apache.lucene.search.DisjunctionSumScorer
> {noformat}
> We already expose the Lucene id via the \[docid] transformer and via \_docid_ 
> sorting.
> On the email list, [~yo...@apache.org] proposed that _docid_ should be a 
> legitimate pseudo-field, which would make it returnable, usable in function 
> queries, etc.
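For reference, the two mechanisms that already exist can be combined in a single request. The Python sketch below assumes a hypothetical local endpoint at http://localhost:8983/solr/collection1/select and a uniqueKey field named id. It requests the \[docid] transformer in fl and sorts on \_docid_.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint: adjust host and core name for a real deployment.
SELECT_URL = "http://localhost:8983/solr/collection1/select"


def docid_debug_url(query: str = "*:*", rows: int = 10) -> str:
    """Build a select URL that returns each hit's transient Lucene docid."""
    params = {
        "q": query,
        "fl": "id,[docid]",     # [docid] transformer adds the internal docid to each doc
        "sort": "_docid_ asc",  # magic sort on internal index order
        "rows": rows,
    }
    return SELECT_URL + "?" + urlencode(params)


url = docid_debug_url()
print(url)
# Decoding the query string shows the parameters Solr would receive.
print(parse_qs(urlparse(url).query))
```

The returned docids are only meaningful for the searcher that served the request. They can change after the next commit.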



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field

2016-09-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505254#comment-15505254
 ] 

Yonik Seeley commented on SOLR-9528:


docid() as a value source is a good idea... as you say, makes it clearer that 
it's a computed value.
That works for many contexts, except perhaps Alex's original usecase: fetch me 
the document that corresponds to a specific \_docid\_ .
One could use something like frange(l=123456789,u=123456789)docid(), but that's 
pretty clunky.
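Here is a sketch of the debugging use case from start to finish. It pulls the internal id out of an error message like the one in the issue description, then builds the frange lookup described above using Solr's local-params form. docid() is the value source *proposed* in this thread and is not assumed to exist. The helper names are also hypothetical.

```python
import re

# Matches the "docID=NNN" fragment of a Lucene error message.
_DOCID_IN_MESSAGE = re.compile(r"docID=(\d+)")


def docid_from_error(message: str) -> int:
    """Return the first internal docID mentioned in an error message."""
    m = _DOCID_IN_MESSAGE.search(message)
    if m is None:
        raise ValueError("no docID found in message")
    return int(m.group(1))


def exact_docid_query(docid: int) -> str:
    """Build the frange lookup with l == u (bounds are inclusive by default).

    Uses the hypothetical docid() value source proposed in this thread.
    """
    return "{!frange l=%d u=%d}docid()" % (docid, docid)


msg = ("child query must only match non-parent docs, but parent docID=38200 "
       "matched childScorer=class org.apache.lucene.search.DisjunctionSumScorer")
print(exact_docid_query(docid_from_error(msg)))
```

The lookup is only reliable against the same index state that produced the error message, because the docid can change once segments are merged.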




[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field

2016-09-19 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505214#comment-15505214
 ] 

Hoss Man commented on SOLR-9528:


bq. So I suppose this means a Won't-Fix for this issue, ...

Assuming my vague guess at what's being suggested here is accurate, then yeah 
-- that would be my vote.  But I'm still not certain I actually understand the 
objective.

If my guess _was_ correct, then we could also just change the title of this 
jira and use it to track creating a patch that adds {{docid()}} as a 
ValueSource, and only once it exists update the ref guide to suggest it in any 
place where {{\_docid\_}} is currently suggested.




[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field

2016-09-19 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505203#comment-15505203
 ] 

David Smiley commented on SOLR-9528:


bq. Special syntax like _docid_ in the sort param made sense in the early days 
of Solr, but feel hackish now that we have first order functions (which are 
clearly a "computed" value, with no ambiguity that it might be stored)

+1 to what you say, Hoss. We've got ValueSourceParsers & DocumentTransformers 
now for this sorta thing.

So I suppose this means a Won't-Fix for this issue, and might mean other new 
issues, and possibly the removal of "\_docid\_" from the Ref Guide (it would be 
deprecated, yet still work in some situations; I know it doesn't always work).




[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field

2016-09-19 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505121#comment-15505121
 ] 

Hoss Man commented on SOLR-9528:


I don't understand, practically/actionably, what this sentence means...

{noformat}
...proposed that _docid_ should be a legitimate pseudo-field, which would 
make it returnable, usable in function queries, etc.
{noformat}

* How is (anyone) defining "legitimate pseudo-field" in this context?
* There's not enough context to understand what is implied by the "etc." in 
this sentence -- what are some concrete examples of what users would be able to 
do in the future that they can't do now?
** alternatively: what are some examples of existing vs new _syntax_ that is 
being proposed (either in configs or requests) for functionality that is 
already supported?



If the crux of this idea here is simply that the string {{\_docid\_}} should be 
usable anywhere that a fieldname can be used even when it's not defined in the 
schema, then that seems like a particularly bad/inconsistent idea to me, since 
all of the other magic {{\_underscore\_}} fields that exist in Solr *are* 
defined in the schema, and it's actually important how/if they are stored, 
docValues, etc...

I've never been a huge fan of *any* magic field names in Solr, and I 
*personally* would be confused as hell if we started encouraging users to 
use magic field names that look like real field names but don't actually exist 
-- especially because I would never be sure, when a user is asking a question, 
whether they actually added {{\_docid\_}} to their schema -- a situation I have 
actually encountered in real life, and I was then *VERY* confused by the described 
behavior of {{sort=\_docid\_ asc}}.

My straw man proposal would be to (informally/formally) deprecate using 
{{\_docid\_}} in the sort param, and instead offer a {{docid()}} (or 
{{docnum()}}, whatever folks prefer) ValueSourceParser out of the box, that 
people could pass to other functions (for the purpose of filtering, sorting, 
whatever...), or request in the response via {{fl}}, etc...
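Under the straw man, migrating existing requests would be mechanical. This sketch maps a legacy sort param onto the proposed function. It assumes the function is named docid() (hypothetical; docnum() was also floated) and uses a hypothetical rewrite_sort helper. Other sort clauses are left unchanged.

```python
def rewrite_sort(sort_param: str, func: str = "docid()") -> str:
    """Replace the magic _docid_ sort field with a function call.

    Simplified: splits clauses on commas, so it does not handle sort
    functions whose arguments themselves contain commas.
    """
    clauses = []
    for clause in sort_param.split(","):
        parts = clause.strip().split()
        if parts and parts[0] == "_docid_":
            parts[0] = func  # hypothetical function name from this thread
        clauses.append(" ".join(parts))
    return ", ".join(clauses)


print(rewrite_sort("score desc, _docid_ asc"))  # score desc, docid() asc
```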

Special syntax like {{\_docid\_}} in the sort param made sense in the early 
days of Solr, but it feels hackish now that we have first order functions (which 
are clearly a "computed" value, with no ambiguity that it might be stored).

(for that matter, I would argue we should do the same thing with "{{score}}" => 
{{score()}}, and add a {{random(seed)}} to replace the way users currently have 
to configure solr.RandomSortField ... but I'll save those fights for different 
jiras)




[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field

2016-09-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503230#comment-15503230
 ] 

Yonik Seeley commented on SOLR-9528:


-1 to changing the name.

\_docid\_ has been around forever (at least since 2010), and there's a high bar 
for breaking back compat: doing so is a major source of frustration for users.  
Additionally, I've never actually seen anyone who has run into \_docid\_ 
confuse it with anything else.  If people didn't read the docs carefully, they 
would be just as likely to fall into the trap of considering "docnum" to be 
persistent (why wouldn't it be? it's the document number).

new meme: "hypothetical confusion considered harmful" ;-)




[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field

2016-09-19 Thread Alexandre Rafalovitch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502960#comment-15502960
 ] 

Alexandre Rafalovitch commented on SOLR-9528:

+1 on changing the names if that helps.

-1 on removing it from the message, unless something better is put there (like 
Solr document id). 




[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field

2016-09-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502538#comment-15502538
 ] 

Uwe Schindler commented on SOLR-9528:

Hi,

bq. This might be off topic, but I'd like to remove internal docnum from that 
exception message.

"docnum" is a much better name anyway! Because it does not contain the 
magical term "id", which has a "persistent" association in people's brains. 
I'd also be a fan of changing the method parameter names in Lucene to use this 
term.




[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field

2016-09-19 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502530#comment-15502530
 ] 

Mikhail Khludnev commented on SOLR-9528:


This might be off topic, but I'd like to remove internal docnum from that 
exception message. 




[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field

2016-09-18 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501376#comment-15501376
 ] 

Yonik Seeley commented on SOLR-9528:


bq.  I'm not a fan of allowing docid to be used as input to a query without a 
compelling real-world use-case. 

We already do, though... one can do "sort=_docid_ asc".
To not be totally confused by that, one must understand what _docid_ is (and 
that it can change across commits).
So in my view, it's not a matter of adding a new magic field, but just making 
the existing one more consistent.  If you can sort by it and retrieve it, let 
one query by it as well.
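A toy model makes it easier to see why the value can change across commits. This is a simplified sketch and not Lucene's code. It assumes the top-level docid is the segment's docBase (the sum of the earlier segments' maxDoc) plus the segment-local doc number. It also assumes a merge just concatenates the live documents of the merged segments in order. The segment contents and keys are made up.

```python
from typing import Dict, List, Tuple

# A segment is a list of (uniqueKey, deleted) pairs in local doc order.
Segment = List[Tuple[str, bool]]


def global_docids(segments: List[Segment]) -> Dict[str, int]:
    """Map each live uniqueKey to its top-level docid (docBase + local doc)."""
    result: Dict[str, int] = {}
    doc_base = 0
    for seg in segments:
        for local_doc, (key, deleted) in enumerate(seg):
            if not deleted:
                result[key] = doc_base + local_doc
        doc_base += len(seg)  # maxDoc counts deleted docs too
    return result


def merge(segments: List[Segment]) -> List[Segment]:
    """Merge all segments into one, dropping deleted docs (renumbers survivors)."""
    return [[(key, False) for seg in segments for key, deleted in seg if not deleted]]


before = [[("a", False), ("b", True), ("c", False)], [("d", False)]]
after = merge(before)
print(global_docids(before))  # {'a': 0, 'c': 2, 'd': 3}
print(global_docids(after))   # {'a': 0, 'c': 1, 'd': 2}
```

Document "c" keeps its uniqueKey but moves from docid 2 to docid 1 once the deleted "b" is merged away. That is the trap you fall into if you treat _docid_ as persistent.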




[jira] [Commented] (SOLR-9528) Make _docid_ (lucene id) a pseudo field

2016-09-18 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501317#comment-15501317
 ] 

Erick Erickson commented on SOLR-9528:

The point about debugging is very well taken. Would that functionality be 
served by having the debug information return the internal Lucene doc ID? That 
would make it less tempting to use as _input_...

At first blush, though, I'm not a fan of allowing _docid_ to be used as _input_ 
to a query without a compelling real-world use-case. Partly I don't want to 
deal with "query responses differ when using the same ID" questions and having 
to unravel "Oh, you mean the internal ID, not the " ;). Allowing 
_docid_ to be an _output_ is fine IMO. This isn't a super-strong objection, but 
I would like to see the practical application (not theoretical; one someone is 
actually using in the field) before going down this path.


