subject:"\[jira\] \[Commented\] \(SOLR\-8220\) Read field from docValues for non stored fields"

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2016-02-19 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154456#comment-15154456
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


Notes for the reference guide:

Page: https://cwiki.apache.org/confluence/display/solr/Defining+Fields
{code}
Property=useDocValuesAsStored
Description=If the field has docValues enabled, setting this to true would 
allow the field to be treated as regular stored fields (even if it has 
stored=false). This means that this field would be returned alongside regular 
stored fields that are returned using the fl parameter.
Values=true or false
Implicit default=false for schema versions <1.6, true for schema versions >=1.6
{code}


Page: https://cwiki.apache.org/confluence/display/solr/DocValues
{code}
 Retrieving docValues during search:

Field values retrieved during search queries are typically returned from stored 
values if the field has stored=true. However, starting with schema version 1.6, 
all non-stored docValues fields will be also returned along with other stored 
fields when all fields (or pattern matching globs) are specified to be returned 
(e.g. fl=*) for search queries. This behavior can be turned on and off by 
setting useDocValuesAsStored parameter for a field or a field type to true 
(implicit default since schema version 1.6) or false (implicit default till 
schema version 1.5). See 
https://cwiki.apache.org/confluence/display/solr/Defining+Fields

Note that enabling this property has performance implications because DocValues 
are column-oriented and may therefore incur additional cost to retrieve for 
each returned document. Also note that while returning non-stored fields from 
docValues (default in schema versions 1.6+, unless useDocValuesAsStored is 
false), the values of a multi-valued field are returned in sorted order (and 
not insertion order). If you require the multi-valued fields to be returned in 
the original insertion order, then make your multi-valued field as stored (such 
a change requires re-indexing).
{code}

Page: 
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-Thefl%28FieldList%29Parameter
{code}
Note: Starting with schema version 1.6, if there are non-stored fields with 
docValues enabled in the index, then a pattern glob like * in the fl parameter 
will retrieve those fields. This is not the case if those fields have 
explicitly useDocValuesAsStored as false in their field definition (see 
https://cwiki.apache.org/confluence/display/solr/Defining+Fields) or the schema 
version is <1.6. However, something like fl=dvfield or fl=*,dvfield (say 
dvfield is a non-stored field with docValues enabled) would retrieve the 
dvfield irrespective of the useDocValuesAsStored value. (See SOLR-8220 for more 
details)
{code}

Could someone please review and update the ref guide with the above 
information? And please feel free to reorganize, modify, drop, or rephrase any 
of this.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2016-01-11 Thread Erick Erickson (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092491#comment-15092491
 ] 

Erick Erickson commented on SOLR-8220:
--

Hmmm, one implication that I just realized (yeah, I'm slow sometimes). In 
addition to the output from a DV field being reordered, identical values are 
collapsed since it's a SortedSet under the covers, right? So if I have a 
MultiValued DocValues field and put in "memory", "memory", "memory", then all I 
get back from the DV field is one copy of "memory".

Since the guarantee that multiValued fields return data in the same order 
inserted when returning the Stored value is not true of returning DV values, 
this is perhaps no biggie. IMO it does deserve a comment when we do the docs 
for this though.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2016-01-11 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092500#comment-15092500
 ] 

Yonik Seeley commented on SOLR-8220:


Yep... I commented earlier that we should prob add an explicit "set" type/flag 
in the future.  It can make sense to have a field that preserves 
order/instances and one that doesn't.


> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-28 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072712#comment-15072712
 ] 

Shalin Shekhar Mangar commented on SOLR-8220:
-

I had a chat with David on IRC. In summary, his objections are:
# The primary is the default — are stored=false & docValues=true returned from 
globs by default (whatever we name the field/fieldtype attributes). It would be 
set to false often, assuming  many schemas index text both ways?
# Globs only match a DV field if useDocValuesAsStored=true and so the property 
name should be indicative of that
# Perhaps if we have a matchesFlGlob parameter we would then have no need for 
useDocValuesAsStored?

I was of the opinion that matchesFlGlob=true is not a great name because what 
if we use JSON request API in future and there's no "fl" so to speak. David 
then also suggested an alternate name e.g. matchesReturnGlob: whether a glob 
(asterisk) pattern in an ‘fl’ parameter (or equivalent) will match this field.  
Applies to stored or docValues fields.  Defaults to what stored is set to.

I'm still not sure about this, David. What does stored=true, 
matchesReturnGlob=false mean? Should that even be allowed?

Replying to some of your earlier points:
bq. The main reason is that for returning the top-X docs with more than a few 
fields, row storage will probably be faster than columnar storage, 
performance-wise.

Agreed, and that's why we need to figure out the specifics in SOLR-8344 so that 
we choose the right way of retrieval.

bq. Another reason is to retain a particular ordering for multi-valued fields

We can make always try to use stored values for multi-valued fields in 
SOLR-8344.

bq. Another reason is that Solr's highlighters don't read from docValues 
(solvable).

Why is this a reason for not using DocValues on the original field?

bq. Ideally most users wouldn't want to monkey with such parameters (IMO). But 
most schemas I've seen have at least one occurrence of an original input string 
indexed two ways for search & sorting/faceting. And if our example schemas do, 
thus motivating us to set it to false for these fields, it just re-confirms my 
point.

The thing is the  copy fields you refer to are deliberately added and I think 
it should be okay to expect that users will choose the value for 
useDocValuesAsStored according to their use-cases for such fields. It is also 
common for people to have docValues=true and stored=true together just because 
they believe that it isn't possible to retrieve doc values. 

What do you think?

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-28 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072789#comment-15072789
 ] 

Yonik Seeley commented on SOLR-8220:


{quote}
> (Yonik) Going forward, why wouldn't one just use docValues on the original 
> field [ rather than a copyField ] ?
The main reason is that for returning the top-X docs with more than a few 
fields, row storage will probably be faster than columnar storage, 
performance-wise.
{quote}

If one still wants to use row-stored for performance tweaks (the exact 
cross-over point will hopefully be determined by SOLR-8344), then they can 
still do that, and it makes sense to do both row-stored and column-stored on 
the original field.  Sorry if it wasn't clear, but that was my original point.

bq. It [ useDocValuesAsStored ] basically means will a 'fl' glob match the DV 
field or not.
That's only one aspect.  Every place that "uses" stored fields should treat DV 
field as a stored field.  It also affects streaming expressions / SQL, 
highlighting, partial updates, etc.

Although I can see why you might think that at first - the fact that the 
current implementation does return fl=exact_field_name even when 
useDocValuesAsStored=false was a surprise to me as well.  I don't see it as a 
big deal though (more like a minor syntactic shortcut for a pseudo field).

To me, useDocValuesAsStored=false means "this isn't really a stored field... 
it's just an implementation artifact for some query-time feature we need".

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-28 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072846#comment-15072846
 ] 

David Smiley commented on SOLR-8220:


To be clear I'm not -1 on this; just -0.

I get the gist of the intent with what is committed.  Thanks for clarifying 
guys.

I think what may _reduce_ the need for users to know about / touch 
useDocValuesAsStored (what I hope for) is if conventionally, stored & 
doc-values fields are the same field, and then we put tokenized text into some 
other field, indexed=true stored=false docValues=false if we also need keyword 
search on the field in question.  I think this is what Yonik is suggesting.  
The only annoyance with this is highlighting -- the highlighters expect the 
stored text to be at the same field name as both the query & index.  
{{hl.requireFieldMatch}} could be set to false but that's a blunt instrument 
and isn't even supported by the postings highlighter.  Nonetheless I think this 
could be solved in another issue.

Can and should the default/example schemas be adjusted to fit the 
aforementioned conventions?  Then we wouldn't have any want to set 
useDocValuesAsStored in them.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072115#comment-15072115
 ] 

ASF subversion and git services commented on SOLR-8220:
---

Commit 1721795 from sha...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1721795 ]

SOLR-8220: Read field from DocValues for non stored fields

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-27 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072116#comment-15072116
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


I suggest we change:

{code}
+  Also note that returning stored fields from docValues (default in schema 
versions 1.6+) returns multiValued
+  fields in sorted order. If you require the older behavior of multiValued 
fields being returned in the
+  original insertion order, set useDocValuesAsStored="false" for the 
individual fields or make
+  sure your schema version is < 1.6. This does not require re-indexing.
+  See SOLR-8220 for more details.
{code}

to 

{code}
+  Also note that while returning non-stored fields from docValues (default in 
schema versions 1.6+, unless useDocValuesAsStored is false) returns multiValued
+  fields in sorted order. If you require the multiValued fields being returned 
in the
+  original insertion order, then make your multiValued field as stored. This 
requires re-indexing.
+  See SOLR-8220 for more details.
 {code}

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-27 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072112#comment-15072112
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


+1, LGTM.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-27 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072246#comment-15072246
 ] 

Shalin Shekhar Mangar commented on SOLR-8220:
-

#3 was fixed in later patches. In the committed code, if you request a field 
explicitly then it will be returned either from stored or from DV.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-27 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072233#comment-15072233
 ] 

David Smiley commented on SOLR-8220:


Rule #5 is very surprising to me, as it seems to conflict with #3 in that it 
ignores useDocValuesAsStored. What is the rationale?  If the field is both 
Stored And DV then from where is it returned?  I REALLY hope not DV since it 
then wouldn't respect value ordering. 

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-27 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072182#comment-15072182
 ] 

David Smiley commented on SOLR-8220:


I've been following this since inception with passing interest and wish I had 
the time this holiday to comment before Shalin committing.  Based on Shalin's 
summary a couple days ago, {{fl=\*}} will return these doc-values only fields 
that have useDocValuesAsStored=true, and it will be true by default going 
forward.  My only concern with this is that it's common-place to index an 
original input value of text 2 ways -- one for keyword search (marked stored as 
well), and another copyField target that isn't stored, isn't indexed, but has 
docValues for faceting or sorting.  Now, this value will be returned twice from 
{{fl=\*}}.  Granted the user could set useDocValuesAsStored=false on these 
fields, and that's not a big deal to do so nor a big deal to forget to do so.  
Is this not lost on Ishan & Shalin who are putting the work into this issue or 
is this just a recognized trade-off?  It didn't have to be this way.  It could 
have designed such that {{fl=\*}} is only for stored fields and those 
_explicitly_ setting useDocValuesAsStored=true.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072243#comment-15072243
 ] 

ASF subversion and git services commented on SOLR-8220:
---

Commit 1721844 from sha...@apache.org in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1721844 ]

SOLR-8220: Read field from DocValues for non stored fields

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-27 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072185#comment-15072185
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


David, do you think having the copyField targets to have useDocValuesAsStored 
as false in our example schemas partly alleviates the problem? 

In fact, I was planning to open another issue to add a few more dynamic field 
types that have stored=false, docValues=true in the example schemas. I can add 
this change to copyFields too, if it makes sense.

Or, rather, do you prefer useDocValuesAsStored to be false by default and 
turned on on-demand? I think that will make adoption harder, and make it harder 
for us (or iow more of an abrupt change for user) to get rid of stored fields 
(even if it makes sense performance wise some day). Having this as the default 
now (i.e. useDocValuesAsStored=true) would make the transition less abrupt. 
What do you think?

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-27 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072234#comment-15072234
 ] 

Shalin Shekhar Mangar commented on SOLR-8220:
-

The idea is that if you have explicitly asked for that field then we find a way 
to return it, if possible. If it is both stored and DV then it returns from 
stored.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-27 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072241#comment-15072241
 ] 

Shalin Shekhar Mangar commented on SOLR-8220:
-

bq. It didn't have to be this way. It could have designed such that fl=* is 
only for stored fields and those explicitly setting useDocValuesAsStored=true.

Let me take a step back. The way I think about useDocValuesAsStored is that it 
gives a hint to Solr that this field *can* be retrieved from DocValues. But 
that doesn't necessarily mean that we will always retrieve it from DocValues. 
Later, in SOLR-8344 we will implement some heuristics to choose whether to 
retrieve from stored or from DV if both are enabled.

So, by making useDocValuesAsStored=true as default, we don't tie our hands in 
SOLR-8344 and we'd be free to choose as we desire. But if the user wants a 
field to never be *automatically* retrieved from DocValues then he can set 
useDocValuesAsStored=false. Keep in mind that this is only for automatic 
selection and if the user wants to explicitly retrieve a field by specifying 
its full name (no globbing) in the 'fl' parameter then we respect his/her 
wishes and retrieve from DV if necessary.

Maybe useDocValuesAsStored is a bad name and 'autoDocValuesAsStored' conveys 
the meaning better?

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-27 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072249#comment-15072249
 ] 

David Smiley commented on SOLR-8220:


How about two Boolean settings:
* {{matchesFlGlob}} defaults to whatever {{stored}} is. Very straight forward 
to describe; no special exceptions. Useful to set to either true or false in 
different circumstances depending if it's stored or DV. This would be a 
separate issue. 
* {{autoDocValuesAsStored}} defaults to whatever {{docValues}} is.  I'm not 
sure how to define it honestly other than to say it doesn't have to do with fl 
globbing. Maybe it could prevent fl choosing the field even if explicitly ?

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-27 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072244#comment-15072244
 ] 

David Smiley commented on SOLR-8220:


Ok then Rule #5 makes perfect sense to me. #3 is questionable to me; hard to 
clearly explain the rules to someone.  So useDocValuesAsStored is only 
considered when FL contains an asterisk?  If so, I wonder if it should be named 
to reflect that somehow. Like "flWildcard". With a name like that, it would be 
useful to set to false on a stored field. 

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-27 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072281#comment-15072281
 ] 

Yonik Seeley commented on SOLR-8220:


bq. copyField target that isn't stored, isn't indexed, but has docValues for 
faceting or sorting.

Going forward, why wouldn't one just use docValues on the original field?
Anyway the copyField thing presents ambiguities for atomic updates as well... 
it's not specific to "fl".  Whatever we support for that can be used for "fl" 
as well (for instance, determinging that the copyField target isn't a "real" 
field because the source is already stored/docValued or something).

It seems like going forward, people will benefit by adjusting their mental 
model to "it's just a different way of storing... row stored or column stored." 
 To a new user, why would one not get all stored fields back?  If column stored 
option had been present in Lucene from the start, that's probably how it would 
have been implemented in Solr from the start.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-27 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072397#comment-15072397
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


Right, in the last patch from me (and maybe the second last too) [0], I 
introduced returning any additional fields with \* even if those additional 
fields are non-stored DVs with useDocValuesAsStored=false. Hence, Shalin's 
point 3 now should read something like:

{quote}
3. {{fl=\*,a1}} will return all stored=true fields and 
useDocValuesAsStored=true DV fields. If the 'a1' field is a stored=false DV 
field with useDocValuesAsStored=false then it will also be returned because it 
was explicitly asked for.
{quote}

[0] - 
https://issues.apache.org/jira/browse/SOLR-8220?focusedCommentId=15071474=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15071474

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-27 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072357#comment-15072357
 ] 

Shalin Shekhar Mangar commented on SOLR-8220:
-

bq. matchesFlGlob defaults to whatever stored is. Very straight forward to 
describe; no special exceptions. Useful to set to either true or false in 
different circumstances depending if it's stored or DV. This would be a 
separate issue.

I don't think we need that level of configurability? Keep in mind that every 
parameter that we add also needs to be explained to a new user. Having trained 
people on Solr, explaining the difference between stored/indexed/docvalues is 
confusing enough. We should avoid adding more complexity if we can. This is 
another reason why I wanted useDocValuesAsStored to be true by default and 
un-specified/hidden in all default schemas.

Also I don't really understand your reasons for adding such a parameter. 
Globbing is allowed on doc values field as long as 
{{useDocValuesAsStored=true}}. In the committed patch, you can do 
{{fl=a,b,c,d*}} and have {{d*}} match all docvalues fields which are 
{{useDocValuesAsStored=true}}. But if you set {{useDocValuesAsStored=false}} 
then globbing will not work. What am I missing?

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-27 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072425#comment-15072425
 ] 

David Smiley commented on SOLR-8220:


bq. (Yonik) Going forward, why wouldn't one just use docValues on the original 
field?

The main reason is that for returning the top-X docs with more than a few 
fields, row storage will probably be faster than columnar storage, 
performance-wise.  So at index time you can pay for both if you need columnar 
for other reasons (sorting/faceting).  Another reason is to retain a particular 
ordering for multi-valued fields.  Another reason is that Solr's highlighters 
don't read from docValues (solvable).

bq. (Ishan) David, do you think having the copyField targets to have 
useDocValuesAsStored as false in our example schemas partly alleviates the 
problem?

Yes, we should do that.  Ideally most users wouldn't want to monkey with such 
parameters (IMO).  But most schemas I've seen have at least one occurrence of 
an original input string indexed two ways for search & sorting/faceting.  And 
if our example schemas do, thus motivating us to set it to false for these 
fields, it just re-confirms my point.

Shalin: I very much care about ease of documenting/explaining this; I thought 
my comments showed I care.  I guess we just see this issue differently.  I'm 
coming around to a new interpretation of what useDocValuesAsStored is, as it 
was committed and today clarified by you and Ishan.  It basically means will a 
'fl' glob match the DV field or not.  If my understanding is true, then I think 
this is evidence I'm on to something with my "matchesFlGlob" suggestion.  You 
are free to disagree but I think it's extremely easy to document/describe/teach 
etc. what matchesFlGlob means, particularly if it's scope is expanded to apply 
to stored fields too.

FWIW I'll be on IRC.  My attempts to ping you haven't received a response.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072140#comment-15072140
 ] 

ASF subversion and git services commented on SOLR-8220:
---

Commit 1721808 from sha...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1721808 ]

SOLR-8220: Improve upgrade notes in CHANGES.txt

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-24 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15071396#comment-15071396
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


Erick, this patch (8220) returns DVs only for non-stored fields. So, before 
this change, the users had no way of returning their multivalued non-stored 
fields (even field() didn't work). However, your suggested text (as is) is spot 
on once we have this and 8344 in. 

For this issue, I think we should document this caveat to suggest that if 
insertion order for multivalued fields is important for you, then have your 
field stored=true (and that this work is not for you).

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-24 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15071395#comment-15071395
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


Erick, this patch (8220) returns DVs only for non-stored fields. So, before 
this change, the users had no way of returning their multivalued non-stored 
fields (even field() didn't work). However, your suggested text (as is) is spot 
on once we have this and 8344 in. 

For this issue, I think we should document this caveat to suggest that if 
insertion order for multivalued fields is important for you, then have your 
field stored=true (and that this work is not for you).

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-24 Thread Erick Erickson (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15071193#comment-15071193
 ] 

Erick Erickson commented on SOLR-8220:
--

When we do the docs, we do need to include a warning about ordering. 
multiValued fields are returned in sorted order when returned from DV fields, 
not the original order, right?

Up until now, we've promised that multiValued fields are returned in the same 
order they were inserted into the document so two MV fields can be treated like 
parallel arrays. This will _not_ be true of DV fields that are multiValued, 
correct?

Suggested text would be something like:

"Returning stored fields from docValues (default in schema versions 1.6+) 
returns multiValued fields in sorted order. If you require the older behavior 
of multiValued fields being returned in the original insertion order, set 
useDocValuesAsStored="false" for the individual fields or make sure your schema 
version is < 1.6. This does _not_ require re-indexing."



> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-23 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070052#comment-15070052
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


bq. does the DocValues.getDocsWithField allocate a BitSet or just return a 
pre-existing instance
While I tried to track it down, I thought it is backed by a pre-existing bitset 
instance, as created at the leaf reader level. I didn't think it was a 
performance concern. Could someone confirm this understanding, please? 
Moreover, since Yonik suggested its use, I was more confident about using it.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-23 Thread Erick Erickson (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070581#comment-15070581
 ] 

Erick Erickson commented on SOLR-8220:
--

LGTM as far as not returning all the DV values with each doc.



> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-22 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068027#comment-15068027
 ] 

Shalin Shekhar Mangar commented on SOLR-8220:
-

Thanks Ishan.

bq. Since, for the fl=* case, we need all non-stored DVs that have 
useDocValuesAsStored=true, but for the general filtering case of fl=dv1,dv2 we 
need to filter using all non-stored DVs (irrespective of the 
useDocValuesAsStored flag)

Okay I see what you are saying. The useDocValuesAsStored=true default applies 
when you request all fields but if you are explicitly asking for a field then 
we can return it from DVs even if it was marked as useDocValuesAsStored=false. 
I have mixed feelings about this but I can see where it can be useful e.g. 1st 
phase of distributed search.

bq. . However, I had a look, and found that responseWriters (e.g. 
JSONResponseWriter) get the whole SolrDocument at the writeSolrDocument() 
method, from where it does the following call to drop fields it doesn't need

Hmm, yeah, we can't do that with doc values, it'd be too expensive.

Is there a test which creates a new field with useDocValuesAsStored as true and 
separately as false using the schema API? I'm assuming you will address Erick's 
concern above about multi-valued fields.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-22 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068075#comment-15068075
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


bq. Is there a test which creates a new field with useDocValuesAsStored as true 
and separately as false using the schema API?
I had added SchemaVersionSpecificBehaviorTest to test for these various 
true/false cases. However, there is no useDocValuesAsStored=false case with 
checking of output. I'll add such a test.

bq. I'm assuming you will address Erick's concern above about multi-valued 
fields.
I'm working through them. So far as I can see, both the current loop with 
values.getValueCount() and what Erick suggested as a loop are running 
identically, i.e values.getValueCount() is indeed returning the count of values 
per document. But I am adding a test to prove it.

For the {{DocValues.getDocsWithField(atomicReader, fieldName).get(docid)}}, not 
having it was resulting in empty fields being returned for documents that 
weren't supposed to have an docValue (the user never added a docValue for that 
document during indexing). Again, I think I should add a specific test for 
that, testing for the number of fields returned (maybe there already is one 
from Keith, but I'll check again).

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-22 Thread Erick Erickson (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068319#comment-15068319
 ] 

Erick Erickson commented on SOLR-8220:
--

bq: For the DocValues.getDocsWithField(atomicReader, fieldName).get(docid), not 
having it was resulting in empty fields being returned for documents that 
weren't supposed to have an docValue (the user never added a docValue for that 
document during indexing).

Right, I had to add a test at the end to avoid that. I didn't track the code 
thoroughly, but does the DocValues.getDocsWithField allocate a BitSet or just 
return a pre-existing instance? Or even cache the BitSet somewhere? If it 
allocates a new BitSet (or even fills up a cache entry), the test at the end 
might be much less expensive. I didn't track it down though, and if it returns 
a reference to a cached bitset that will be created _anyway_, then it's just a 
style thing

{code}
 if (outValues.size() > 0) {
   sdoc.addField()
}
{code}

As for whether the loop returns all values in the field, I saw this "by 
inspection" on the techproducts example (with a few mods for adding 
docValues="true" to the schema). Again, though, this is 4.x after I hacked a 
backport and put it in an entirely different place in the code, specifically 
NOT a visitor pattern. So it's entirely possible that the semantics have 
changed or hacking it into a different part of the code base has a different 
context.  A test would settle it for all time though.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-21 Thread Erick Erickson (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067566#comment-15067566
 ] 

Erick Erickson commented on SOLR-8220:
--

WARNING: I'm stealing this code and back-porting to 4.x for my own purposes so 
this may not pertain to 5x. And I'm not very up on the low-level details.

But this loop for reading multiValued fields puts _all_ the multivalued fields 
for _all_ the docs on the shard into each doc:

{code}
if (values != null && DocValues.getDocsWithField(atomicReader, 
fieldName).get(docid)) {
values.setDocument(docid);
if (values.getValueCount() > 0) {
  List outValues = new LinkedList();
  for (int i = 0; i < values.getValueCount(); i++) { // Iterates 
more than just this doc, I think all of them!

{code}

I had more luck with 
{code}
  if (values != null) {
values.setDocument(docid);
List outValues = new LinkedList();
for (int ord = (int) values.nextOrd(); ord != 
SortedSetDocValues.NO_MORE_ORDS; ord = (int) values.nextOrd()) {
{code}

Note that I also think this is unnecessary in the if test, the loop above 
doesn't do anything bad if there are docs with empty fields: 
{code}
DocValues.getDocsWithField(atomicReader, fieldName).get(docid)
{code}

I changed it to just if (values != null). But I did have to test 
outValues.size() > 0  before doing the addField after the loop or I got empty 
braces in the output doc.

Again let me emphasize that 
1> I don't know this code well, so take this with a grain of salt
2> I needed this for a one-off on the 4.x code line and this may work with 5x 
just fine as-is. Needless to say what I'm doing will never make into the 
official project

But this saved me a TON of work, glad you're tackling this!

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-18 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063741#comment-15063741
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


Btw, just found out that not all query paths actually use a DocsStreamer. I am 
checking as to what this could be down to.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-16 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061581#comment-15061581
 ] 

Shalin Shekhar Mangar commented on SOLR-8220:
-

Thanks Ishan.

# The SolrIndexSearcher.decorateDocValueFields method has a 
honourUseDVsAsStoredFlag which is always true. We can remove it?
# Same for SolrIndexSearcher.getNonStoredDocValuesFieldNames?
# The wantsAllFields flag added to SolrIndexSearcher.doc doesn't seem 
necessary. I guess it was added because the patch adds non stored doc values 
fields to the 'fnames' but if we can separate out stored fnames from the 
non-stored doc values to be returned then we can remove this param from both 
SolrIndexSearcher.doc and SolrIndexSearcher.getNonStoredDocValuesFieldNames
# The pattern matching in the DocStreamer constructor makes a bit nervous. 
Where is the pattern matching done for current stored fields?

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-15 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058168#comment-15058168
 ] 

Shalin Shekhar Mangar commented on SOLR-8220:
-

Hi Ishan, you have misunderstood the "useDocValuesAsStored" parameter. It is 
supposed to be per-field and not on the entire schema so that you can 
selectively disable it on fields that you don't want to be treated as if they 
were stored.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-13 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055523#comment-15055523
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


{quote}
The patch is failing a few tests with very poor error reporting. Here's a 
reproducible failure after applying this patch:
Test suite: TestManagedSchemaDynamicFieldResource, Seed: 
-Dtests.seed=C0DE559FF2A0799
Looking into the failure.
{quote}
It seems this test fails even without the patch. Filed SOLR-8411 for this. 
However, with this patch, still 5-6 tests fail. But I've so far been unable to 
reproduce any of them.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-02 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036219#comment-15036219
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


bq. I've updated the patch to now check for the schema version to be >=1.6
As per an offline discussion with [~hossman], he mentioned that in the past 
we've never tied runtime behaviour with specific schema versions. He suggested, 
as did Yonik above I think, that we use some attribute like 
docValueAsIfStored=true/false (with a default of false for previous schema 
versions).
I'll try to tackle this and update the patch, and drop the naive check for 
schema version.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-02 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036165#comment-15036165
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


bq. I've updated the patch to now check for the schema version to be >=1.6
In the patch, I did some clumsy >= check for a float. Would've been better to 
just check for > 1.5. I'll fix it with the next patch.
I'm working on bumping up the version of the schema for the out of the box 
configsets.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-02 Thread Keith Laban (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036972#comment-15036972
 ] 

Keith Laban commented on SOLR-8220:
---

My only concern is that we may have to add a flag for this and for whatever we 
decide in SOLR-8344 which is just going to add to he confusion. Thoughts?

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: [jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-02 Thread Erick Erickson

Please follow the instructions here to unsubscribe. Note, you _MUST_
use the exact same e-mail you subscribed with.

See the "unsubscribe" link here:
 http://lucene.apache.org/solr/resources.html

If you have problems, have you tried to follow the instructions here?
https://wiki.apache.org/solr/Unsubscribing%20from%20mailing%20lists

Best,
Erick

On Wed, Dec 2, 2015 at 5:55 PM, Johny John  wrote:
> dev-unsubscr...@lucene.apache.org
>
> Sent from Windows Mail
>
> From: Keith Laban (JIRA)
> Sent: ‎Thursday‎, ‎December‎ ‎3‎, ‎2015 ‎6‎:‎19‎ ‎AM
> To: dev@lucene.apache.org
>
>
> [
> https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036972#comment-15036972
> ]
>
> Keith Laban commented on SOLR-8220:
> ---
>
> My only concern is that we may have to add a flag for this and for whatever
> we decide in SOLR-8344 which is just going to add to he confusion. Thoughts?
>
>> Read field from docValues for non stored fields
>> ---
>>
>> Key: SOLR-8220
>> URL: https://issues.apache.org/jira/browse/SOLR-8220
>> Project: Solr
>>  Issue Type: Improvement
>>Reporter: Keith Laban
>> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch,
>> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch,
>> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
>> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
>> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
>> SOLR-8220.patch
>>
>>
>> Many times a value will be both stored="true" and docValues="true" which
>> requires redundant data to be stored on disk. Since reading from docValues
>> is both efficient and a common practice (facets, analytics, streaming, etc),
>> reading values from docValues when a stored version of the field does not
>> exist would be a valuable disk usage optimization.
>> The only caveat with this that I can see would be for multiValued fields
>> as they would always be returned sorted in the docValues approach. I believe
>> this is a fair compromise.
>> I've done a rough implementation for this as a field transform, but I
>> think it should live closer to where stored fields are loaded in the
>> SolrIndexSearcher.
>> Two open questions/observations:
>> 1) There doesn't seem to be a standard way to read values for docValues,
>> facets, analytics, streaming, etc, all seem to be doing their own ways,
>> perhaps some of this logic should be centralized.
>> 2) What will the API behavior be? (Below is my proposed implementation)
>> Parameters for fl:
>> - fl="docValueField"
>>   -- return field from docValue if the field is not stored and in
>> docValues, if the field is stored return it from stored fields
>> - fl="*"
>>   -- return only stored fields
>> - fl="+"
>>-- return stored fields and docValue fields
>> 2a - would be easiest implementation and might be sufficient for a first
>> pass. 2b - is current behavior
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: [jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-02 Thread Johny John

dev-unsubscr...@lucene.apache.org






Sent from Windows Mail





From: Keith Laban (JIRA)
Sent: ‎Thursday‎, ‎December‎ ‎3‎, ‎2015 ‎6‎:‎19‎ ‎AM
To: dev@lucene.apache.org






[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036972#comment-15036972
 ] 

Keith Laban commented on SOLR-8220:
---

My only concern is that we may have to add a flag for this and for whatever we 
decide in SOLR-8344 which is just going to add to he confusion. Thoughts?

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-01 Thread Keith Laban (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035238#comment-15035238
 ] 

Keith Laban commented on SOLR-8220:
---

This still needs work around bumping the schema version, this impl changes the 
default behavior for globs

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-01 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033868#comment-15033868
 ] 

Yonik Seeley commented on SOLR-8220:


bq. It sounds like you are arguing for a common way to access docvalues and 
stored fields using the 'fl' parameter. I'm +1 to that.

Ah, ok... I mis-read your previous comment of "i.e. keep this issue focused on 
adding syntactic sugar to read field from doc values for non-stored fields" as 
advocating for new syntax for "fl" to load from non-stored docValue fields.

bq. Let's discuss this optimization in SOLR-8344 and keep the two issues 
separate.

I didn't really see it as separate (it depends on how you look at it),  I see 
it more as, we have a new feature that treats docValues as "column-stored".  
What should the default behavior be when all requested fields are both 
column-stored and row-stored? I think we can make progress + commit this issue 
separately, but should still come at SOLR-8344 "fresh" (i.e. not put the burden 
of proof on one default more than the other).


> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-01 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033899#comment-15033899
 ] 

Shalin Shekhar Mangar commented on SOLR-8220:
-

Agreed. Let's wrap this up. [~ichattopadhyaya] or [~k317h], can one of you put 
up an updated patch? I can review and commit.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-01 Thread Keith Laban (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033926#comment-15033926
 ] 

Keith Laban commented on SOLR-8220:
---

I found some issues with the way this patch works and have some updated tests 
in separate patch which I haven't uploaded, I need to clean it up a bit and can 
submit it tonight.

One issues I found is that if a document doesn't have a field value for a dv 
field it will get an empty value in the response, we should probably check with 
in the doc value api for docs with field, to make sure that that document had a 
value. 

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-01 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033973#comment-15033973
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


Afaik, there's no way to discern if a singly valued non-stored dv field was 
added to the document or not, since a singly valued dv field either gets the 
value provided during the indexing or it picks up the default. Any suggestions 
how to deal with that?

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-01 Thread Erick Erickson (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034013#comment-15034013
 ] 

Erick Erickson commented on SOLR-8220:
--

Random thought that occurred to me before coffee, so be warned.

The initial statement here is 'Many times a value will be both stored="true" 
and docValues="true" ', then there was a lot of discussion about efficiencies 
etc... 

Why are we trying to anticipate "the right thing to do"? It would be simpler to 
code something like:
> If the field is stored=true, return the stored value (don't even need to look 
> whether DV is true or not).
> If the field is stored=false and docValues=true, return the DV value.

Now it's totally under the control of the user which path is chosen through the 
schema definition; we don't have to try to guess anything. No new syntax. Maybe 
with a new "best practice" or something. There would be a learning curve for 
users around using only docValues=true for efficiency and _not_ setting 
stored=true. Not quite sure what to do if the user defines both however, 
perhaps use the stored value?

The thing one does lose is the ability to get 2 and 1. from the 
_same_ field, so there would be the added burden on the user of having to have 
two fields, one stored-only one dv-only if that distinction was important.

And in the "wild and crazy" department (and for a different ticket _entirely_) 
we could consider disallowing fields with both docValues=true and stored=true. 
Not advocating this last, just throwing out for discussion.

Let me emphasize that I don't have any investment in doing things this way, and 
apologies for thinking of this so late in the game.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-12-01 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034017#comment-15034017
 ] 

Yonik Seeley commented on SOLR-8220:


For strings, an ord of -1 is "missing"
For numerics, you can use DocValues.getDocsWithField();


> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-30 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031910#comment-15031910
 ] 

Yonik Seeley commented on SOLR-8220:


bq. From a performance perspective, reading values from DocValues always (if 
they exist) can be horrible because each field access in docvalues may need a 
random disk seek, whereas, all stored fields for a document are kept together 
and need only 1 random seek and a sequential block read.

A few points:
- stored fields also require decompression (more overhead)
- use of stored fields and docvalues at the same time is less memory efficient 
- the stored fields will also take up needed disk cache (although hopefully the 
OS will figure out which it should cache more aggressively
- presumably one has docvalues because they need to be used, and they need to 
be fast... i.e. they already need to be cached.
- if one as a small set of fields that are normally retrieved, it seems like a 
win again.
- a *very* common case these days is that the entire index fits in memory.
- we're in the SSD era, and multiple "seeks" will still be more expensive if 
not cached, but much less so (and less so over time as non-volatile storage 
keeps improving)

It seems like this should be a big win for the common case, and the ability to 
reindex your data or change config and not have to change the clients is 
important IMO.  It's like being able to reindex a date to a trie-date and have 
the clients not care.  We can already reindex a field as docValues, and sort, 
facet, do analytics, without changing client requests.  Optimizations to field 
value retrieval (or optionally removing redundantly stored data) should be the 
same.


> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-30 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031935#comment-15031935
 ] 

Shalin Shekhar Mangar commented on SOLR-8220:
-

bq. It seems like this should be a big win for the common case, and the ability 
to reindex your data or change config and not have to change the clients is 
important IMO.

It sounds like you are arguing for a common way to access docvalues and stored 
fields using the 'fl' parameter. I'm +1 to that.

But are you also arguing for always loading fields from docvalues even if they 
are stored?

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-30 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032230#comment-15032230
 ] 

Shalin Shekhar Mangar commented on SOLR-8220:
-

bq. On a different note, if we are going to tackle only the non-stored 
docValues fields for now in this issue, does it now make sense to do this, 
performance wise, at the DocTransformer instead of the SolrIndexSearcher?

I don't think there's any difference performance-wise. Changes to DocStreamer 
should be enough as it is called only for writing the response and not the 
entire result-set.

bq. At this point the question that remains; should we move forward with these 
patches and move logic for retrieving dv fields to SolrIndexSearcher, leaving 
out *, *_foo and other optimizations for now? i.e. retrieve fields by name, if 
they exist in dv, but are not stored.

+1 let's create a patch to retrieve fields by name, if they exist in dv, but 
are not stored. I also like Yonik's idea of bumping the schema version to have 
fl=* return all fields (stored + non-stored docvalues) in 5.x and to include 
both by default in trunk (6.x). So +1 to that as well.

bq. a very common case these days is that the entire index fits in memory.

I propose a middle ground. Let's use Lucene's spinning disk utility method and 
prefer docvalues if we detect a SSD and fallback to reading from stored fields 
otherwise. Let's discuss this optimization in SOLR-8344 and keep the two issues 
separate.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-29 Thread Keith Laban (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031289#comment-15031289
 ] 

Keith Laban commented on SOLR-8220:
---

bq. From a performance perspective, reading values from DocValues always (if 
they exist) can be horrible because each field access in docvalues may need a 
random disk seek, whereas, all stored fields for a document are kept together 
and need only 1 random seek and a sequential block read.

I agree that reading values from DocValues always can be horrible, my line of 
thinking was that if you are asking for one or two fields from a large document 
and they are both dv and stored reading from dv would likely be much more 
efficient. It might make more sense to be able to get those values explicitly 
from docValues using the the transformer, or have some logic that can determine 
when it is more efficient. That should be discussed further in [SOLR-8344].

At this point the question that remains; should we move forward with these 
patches and move logic for retrieving dv fields to SolrIndexSearcher, leaving 
out {{\*}}, {{\*\_foo}} and other _optimizations_ for now? i.e. retrieve fields 
by name, if they exist in dv, but are not stored.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-26 Thread Shalin Shekhar Mangar (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15028962#comment-15028962
 ] 

Shalin Shekhar Mangar commented on SOLR-8220:
-

Guys, sorry for not paying attention to this earlier but on a quick reading 
through the comments and an offline conversation with Ishan, I want to point 
out a few things.

bq. Theoretical optimization, will skip reading from stored fields if all the 
requested fields are available in docValues

bq. I don't think some of the Lucene folks want docValues modeled as stored 
fields at the Lucene level.

>From a performance perspective, reading values from DocValues always (if they 
>exist) can be horrible because each field access in docvalues may need a 
>random disk seek, whereas, all stored fields for a document are kept together 
>and need only 1 random seek and a sequential block read. That, and the fact 
>that docvalues aren't in the document cache makes me think that we should not 
>model docvalues as a stored field and treat them equivalently. At least not 
>without supporting benchmarks.

So my suggestion is that we not mix the two issues i.e. keep this issue focused 
on adding syntactic sugar to read field from doc values for non-stored fields 
in whatever ways proposed. By the way, this is already possible using the 
'field' DocTransformer e.g. fl=field(my_dv_field)

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-26 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15029079#comment-15029079
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


bq. So my suggestion is that we not mix the two issues
I've created SOLR-8344 to deal with that optimization of reading stored valued 
fields from docvalues, so that we can focus only on the non-stored dv case 
(which is a functionality improvement/new feature, as opposed to an 
optimization).

bq.  By the way, this is already possible using the 'field' DocTransformer e.g. 
fl=field(my_dv_field)
Indeed. This currently doesn't work for multivalued fields.

On a different note, if we are going to tackle only the non-stored docValues 
fields for now in this issue, does it now make sense to do this, *performance 
wise*, at the DocTransformer instead of the SolrIndexSearcher? If done that 
way, the RTG and atomic updates for such non-stored dv fields will work if we 
also use a doctransformer after the searcher has returned the docs. 
[~ysee...@gmail.com], [~erickerickson], [~shalinmangar] do you have any 
thoughts/recommendation, please?

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-24 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025770#comment-15025770
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


I was thinking of doing exactly that! I'll raise another jira for it.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-24 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025652#comment-15025652
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


Just noticed, EnumFieldTest.testEnumSort() fails with the last patch (and the 
one before it) when the severity_dv is chosen as the enum field. I think we're 
handling the enum dv fields incorrectly.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-24 Thread Keith Laban (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025686#comment-15025686
 ] 

Keith Laban commented on SOLR-8220:
---

I'm working on a separate patch which fixes EnumField, it also adds support for 
"*_dv" type queries. I'll take a look at merging your change in too. Do you 
think it would be worth adding an interface for SolrDocument and 
SolrInputDocument to implement which includes {{containsKey} and {{addField}}

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-20 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15018489#comment-15018489
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


{quote}
 We are adding fields retrieved from docValues by doing the following:
{code}
doc.add(schemaField.getType().createField(schemaField, 
sdv.get(docid).utf8ToString(), 1.0f));
{code}

this {{createField}} call is returning {{null}} based on the code I wrote 
above. Perhaps we need to create fields differently, or change how 
{{createField}} works.
{quote}
[Referencing this comment from SOLR-8316]. Can we "decorate" the SolrDocument 
in DocStreamer instead of trying to do that with the StoredDocument from 
lucene? That will give us the benefits: (a) we won't need to fix SOLR-8316, (b) 
we can leave the StoredDocument as is, and not change it from under the 
document cache (which is probably an awkward thing with the current patch), (c) 
it has efficient containsKey(), if needed, so the linear O(n) cost can be 
avoided. Though, point b will mean we won't need containsKey() anyway.
This also means that SOLR-8276 will have to change, and there we have decorate 
a SolrInputDocument instead of a SolrDocument. 
Keith, Yonik, what do you think?

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-20 Thread Keith Laban (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15018760#comment-15018760
 ] 

Keith Laban commented on SOLR-8220:
---

It looks like calling {{createFields}} instead actually creates both the fields 
we need. 

I was able to change decoration to something like this
{code}
SortedDocValues sdv = leafReader.getSortedDocValues(fieldName);
for(StorableField s: schemaField.getType().createFields(schemaField, 
sdv.get(docid).utf8ToString(), 1.0f)) {
  if(s != null) doc.add(s);
}
{code}

which makes the the SortedDocValueField get added to the document properly, but 
when trying to write the string value later on it doesn't write anything 
because the implementation in {{Field}} doesn't know how to write a 
{{BytesRef}}. We can override this is in {{SortedDocValueField}} but all of 
that stuff is in lucene code. It looks like {{StrField#createFields}} converts 
the string value to a {{BytesRef}} for the constructors of the doc values 
fields.


I'm still a bit confused how [SOLR-8276] works for you, i get a NPE when trying 
pull back the non-indexed/non-stored field in the current impl.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-20 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15018805#comment-15018805
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


bq. I'm still a bit confused how SOLR-8276 works for you, i get a NPE when 
trying pull back the non-indexed/non-stored field in the current impl.
I added another document (id=4) to the test at SOLR-8276. I see no problems 
whatsoever with string dv fields (single valued), which internally uses 
SortedDocValues. The test passes fine. Also, the test at BasicFunctionalityTest 
works fine with the {{test_s_dvo}} field. Both SOLR-8276 and the latter test 
use the latest patch here. So, as per the tests, the createField seems to do 
its job. Am I missing something?

However, beyond this point, should we avoid using the 
schemaField.getType().createField() for fields in the StoredDocument (lucene) 
and instead do this decoration on the SolrDocument which is created from this 
StoredDocument? (See my comment before this one).

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-19 Thread Keith Laban (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013814#comment-15013814
 ] 

Keith Laban commented on SOLR-8220:
---

Created [SOLR-8316] for my last point

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-19 Thread Keith Laban (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013699#comment-15013699
 ] 

Keith Laban commented on SOLR-8220:
---

[~ysee...@gmail.com]: re. * vs **

I'm all for using just one glob pattern {{*}} for both doc values and stored, 
however I think its worth it to consider the philosophical implications on 
backwards compat. While it won't break anything it does introduce some 
unexpected behavior without much warning or a way to disable it.

I propose we add a {{fl.wildcardDV=true}} option to turn on this behavior in 
Solr 5x but enable it by default in 6. We can optionally later add a field type 
option where you can use docValues but not have the field returned in your 
result set. 


Ishan I can tackle those.

Regarding my earlier update about modifying {{Field}}, its doesn't seem as 
trivial as I originally thought as {{createField}} doesn't set 
{{DocValuesType}} there is a separate field that gets created for doc values 
after {{createField}} is called for the first time. I'm not sure what the 
implications of modifying this behavior would be. I think it's ok to leave this 
limitation in for now.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-19 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013633#comment-15013633
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


{quote}
{noformat}
fl=myfield  // returns from either stored or docValues
fl=*_i // returns all stored or docValues fields ending in _i
fl=*// returns all stored fields and all docValues fields that are 
not stored
{noformat}
{quote}
This is a TODO. Also, need to add more tests around multivalued fields (here 
and in SOLR-8276).
As for LazyFields for non stored docvalues, I think it is an optimization that 
we can deal with later in a separate issue. Doing what we have in this patch 
itself is progress. If some committer can please take this forward, can we get 
this in for 5.4?
Keith, do you wish to tackle the above TODO (and other todo items)? If so, I 
will focus on SOLR-5944 for now.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-19 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013915#comment-15013915
 ] 

Yonik Seeley commented on SOLR-8220:


bq. I think you meant a type.docValuesType() == DocValuesType.NONE. I agree, we 
should make the change.

I don't think some of the Lucene folks want docValues modeled as stored fields 
at the Lucene level (i.e. you don't index them that way, and you don't retrieve 
them that way).

One possible option is just move to something higher lievel like SolrDocument.

bq. I propose we add a {{fl.wildcardDV=true}} option to turn on this behavior 
in Solr 5x but enable it by default in 6.
We could bump the schema number to change the default, and that would enable 
the folks on 5x to get transparent migration from stored fields to docValues if 
they want w/o having to change clients / query params.

And if finer grained control is desirable, a schema field flag actAsStored=true 
(or whatever better name people come up with) could have it's default set 
differently based on the schema version.



> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-19 Thread Keith Laban (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013542#comment-15013542
 ] 

Keith Laban commented on SOLR-8220:
---

I'm not sure how this should be handled. 

https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/document/Field.java#L241

needs to be modified to
{code}
if (!type.stored() && type.indexOptions() == IndexOptions.NONE && 
type.docValuesType() != DocValuesType.NONE) {
  throw new IllegalArgumentException("it doesn't make sense to have a field 
that "
+ "is neither indexed nor stored nor docValues");
}
{code}

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-19 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013555#comment-15013555
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


I think you meant a {{type.docValuesType() == DocValuesType.NONE}}. I agree, we 
should make the change.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-18 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011570#comment-15011570
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


Thanks for the review, Keith.

bq. 1. [...] This could potentially be very expensive to compute for every 
singe field for ever single document and also add unnecessary GC pressure by 
creating new HashSet for all the fields for every single document.

I was aware of this, and wanted to fix this as part of the "cleanup / 
refactoring" I promised.

bq. {{doc.getField(fieldName)==null}} the doc fields are a list so this will be 
O( n ) for each lookup.

I used that to ensure we're not re-adding unstored docvalues a second time to 
the same document. This is necessary here so that we don't re-add such fields 
to a document was obtained from the documentCache and already has all unstored 
docvalues in it. I can create a set of fields inside the {{StoredDocument}} 
class so that a hasField lookup can be speeded up. However, given that it is a 
Lucene class, I have left this be. Any suggestions?

bq. 3) Re multivalued fields: doing introspection for every single value for 
field for every document is not fast.

I think it shouldn't be a problem. In modern JVMs, the {{instanceof}} has 
negligible cost. However, I will do it once per multivalued field in my next 
patch.

bq. 4) {{SchemaField schemaField = schema.getField(fieldName);}} this throws an 
exception if the field name is not in the schema (think typos in FL)
If it is a dynamic field, it will still work; a wrong field name won't work 
here. Shouldn't a wrong field name throw an exception, rather than silently 
dropping it? I am split either ways.

bq. This creates a whole bunch of new objects which could be slow and cause a 
lot of GC pressure, although it may not be an issue.
I think this creates at most only the value source object, which isn't too bad. 
Internally, it uses the docvalues API.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-18 Thread Keith Laban (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011760#comment-15011760
 ] 

Keith Laban commented on SOLR-8220:
---

bq. I used that to ensure we're not re-adding unstored docvalues a second time 
to the same document. This is necessary here so that we don't re-add such 
fields to a document was obtained from the documentCache and already has all 
unstored docvalues in it. I can create a set of fields inside the 
StoredDocument class so that a hasField lookup can be speeded up. However, 
given that it is a Lucene class, I have left this be. Any suggestions?

This shouldn't be an issue since the hook is called after caching is done. This 
could get really expensive if you are getting a few thousand documents that 
have hundreds of fields. I think the real issue is how do we cache this 
efficiently. I think that will require modifying LazyDocument, (see my comments 
above)

bq. If it is a dynamic field, it will still work; a wrong field name won't work 
here. Shouldn't a wrong field name throw an exception, rather than silently 
dropping it? I am split either ways.

This is more a backwards compat thing. What is current behavior for stored 
fields?

bq. I think this creates at most only the value source object, which isn't too 
bad. Internally, it uses the docvalues API.

for a string field, getValueSource creates a new StrFieldSource and getValues 
creates a new DocTermsIndexDocValues. Both of these closures add overhead 
especially if you're doing this hundreds of times for thousands of documents



> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-18 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011854#comment-15011854
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


bq. This shouldn't be an issue since the hook is called after caching is done.
Even if this is called after the document has been added to the cache, this 
decorate() method changes the same doc object that has been added to the cache. 
And hence, next time the document is fetched from the cache, it will contain 
the previously decorated docvalues as part of the stored doc from the cache. 

I'll look at what it will take to modify the LazyDocument to make this work 
differently. Are you already looking into it, or have some thoughts around it?

bq. Both of these closures add overhead especially if you're doing this 
hundreds of times for thousands of documents
Yes, that makes sense; I hadn't noticed the second object getting created. We 
should avoid this overhead if possible.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-18 Thread Keith Laban (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011925#comment-15011925
 ] 

Keith Laban commented on SOLR-8220:
---

bq. If there is a need to distinguish between docValues as an alternative to a 
stored field

I think this would be the case only for multi valued fields at least until we 
had an alternative version of docValue multi valued preserving the original 
field (i.e. not sorted, not set) using something like BinaryDocValues 
underneath as you mentioned earlier. 

bq. I'll look at what it will take to modify the LazyDocument to make this work 
differently. Are you already looking into it, or have some thoughts around it?

Doing this properly requires us to be able to know all the possibly docValue 
fields on a document upfront and a way for LazyDocument to be able to load the 
lazy field from doc values. 

A large goal of this should be to have the ability to skip reading stored 
fields altogether if the field requirement is fully satisfied by docValues. 
However I'm not sure if using docValues would be more efficient than stored 
fields when all the fields are being returned. 

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-18 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011950#comment-15011950
 ] 

Yonik Seeley commented on SOLR-8220:


bq. I think this would be the case only for multi valued fields at least until 
we had an alternative version of docValue multi valued preserving the original 
field (i.e. not sorted, not set) using something like BinaryDocValues 
underneath as you mentioned earlier.

Yup, I agree.  I think this is just a case of us having incomplete type 
support.  We need to distinguish between multiValued and setValued in general.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-18 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011884#comment-15011884
 ] 

Yonik Seeley commented on SOLR-8220:


bq. Added a ~ glob, similar to *. fl= here means: return all conventional 
stored fields and all non stored docvalues.

Purely from an interface perspective (I haven't looked at the code), it feels 
like this should be transparent.
It would be nice to be able to transition from an indexed+stored field to an 
indexed+docValues field and not have any of the clients know/care.

{code}
fl=myfield  // returns from either stored or docValues
fl=*_i // returns all stored or docValues fields ending in _i
fl=*// returns all stored fields and all docValues fields that are 
not stored
{code}

If there is a need to distinguish between docValues as an alternative to a 
stored field, and docValues as an implementation detail that you don't want to 
return to the user (say you transitioned from an indexed-only field to an 
indexed+docValues field  or docValues-only field), then we could introduce a 
field flag for the schema.
Something like includeInStored=true/false or asStored=true/false


> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-18 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012076#comment-15012076
 ] 

Yonik Seeley commented on SOLR-8220:


The set of fields that have docValues and are not stored can be computed once 
per index snapshot (from the FieldInfos+schema).
There should be no performance impact if there are no un-stored docValues 
fields in use.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-18 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012580#comment-15012580
 ] 

Yonik Seeley commented on SOLR-8220:


bq. Does that sound fine?

Yep.  Seems like we will only have a perf issue when we have many sparse 
un-stored docValue fields.  At that point it might make sense to have a 
separate docValues field that contains the list of fields for the document.  
That can be saved for a future optimization though.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-18 Thread Erick Erickson (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012633#comment-15012633
 ] 

Erick Erickson commented on SOLR-8220:
--

Given that stored values are compressed in 16k blocks and to return a single 
field from the stored data requires decompressing 16K, how much effect do 
LazyDocuments really have any more? I don't know, just askin'

Since docValues avoids decompressing 16k per doc, disk seeks and the like I 
strongly suspect that it is vastly more efficient than getting the stored 
values. That's how Streaming Aggregation can return on 200k-400k docs/second.

All that said, I suspect that there are negligible savings (or perhaps even 
costs) in mixing the two, i.e. if _any_ field to be returned is not DV, you 
might as well return all the fields from the stored data.

Testing would tell though.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-18 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012035#comment-15012035
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


bq. Purely from an interface perspective (I haven't looked at the code), it 
feels like this should be transparent.
That makes sense, having {{*}} return all stored and non-stored docvalues. 

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-18 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012826#comment-15012826
 ] 

Yonik Seeley commented on SOLR-8220:


Back in the day, LazyField actually had a pointer directly into the index where 
the field value could be read.
That got remove from Lucene at some point, and was replaced with something just 
for compat sake IIRC that had an N^2 bug... doc was loaded on each lazy-field 
access, which Hoss found/fixed.  But that leaves less performance benefit to 
using LazyDocument.  On a quick look, it seems to load all lazy fields at once 
when the first lazy field is touched.  I guess these days it's more of a memory 
optimization than a performance one.

Might be worth considering new approaches (we can break back compat in trunk 
for 6.0).
Or maybe subclass LazyDocument and do something different for docValues fields.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-18 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012795#comment-15012795
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


bq.  Caveats being:
I think we can live with those caveats for now, and optimize later. A docValues 
fields containing list of other docValues fields sounds nice for a later 
optimization. Given that this is a functional improvement, as is, over what we 
have today (i.e. no ability to return nonstored docValues), we should carry on 
with it and optimize later to address those caveats.

bq. Theoretical optimization, will skip reading from stored fields if all the 
requested fields are available in docValues. (changes mostly to DocStreamer)
Sounds good. It would be interesting to perf test this to measure the 
performance gains with doing this.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-18 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012103#comment-15012103
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


bq. The set of fields that have docValues and are not stored can be computed 
once per index snapshot (from the FieldInfos+schema).
That is what I have done at the time of searcher creation (in 
SolrIndexSearcher's constructor) in my last patch. Does that sound fine?

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-17 Thread Keith Laban (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009192#comment-15009192
 ] 

Keith Laban commented on SOLR-8220:
---

I see some potential issues here, mostly performance concerns:

1)
{code}
  if (wantsAllNonStoredDocValues) {
Set dvFields = new HashSet<>();
for (int i=0; i Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-17 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009100#comment-15009100
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


The patch causes a regression with score fields, I'll look into it in a while.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-17 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008302#comment-15008302
 ] 

Jan Høydahl commented on SOLR-8220:
---

How about using {{fl=\*\*}} as glob instead of {{\~}}? In Lucene, {{~}} 
normally indicates fuzziness of some kind, while {{**}} in ant lingo means all 
items recursively... Not important though :)

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-17 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008382#comment-15008382
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


** seems like a better idea than ~. I couldn't get + to work, as Keith 
suggested, I guess it was getting confused with a positive sign.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-16 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006669#comment-15006669
 ] 

Yonik Seeley commented on SOLR-8220:


{quote}
2) There is no metadata (that I can find) stored for each document that says
whether it has an unstored docValue field, so efficiently loading docValues
fields based on FL=* would be difficult.
{quote}

In SolrIndexSearcher there is
{code}
  private final FieldInfos fieldInfos;
  private final Collection fieldNames;
{code}

That gives you the full set of fields actually in-use by the index.  Useful for 
dealing with dynamicFields, where the names aren't explicitly listed in the 
schema.

In the future, perhaps we should have some meta-data about what fields each 
document contains.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-16 Thread Keith Laban (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007107#comment-15007107
 ] 

Keith Laban commented on SOLR-8220:
---

And for the specified FL scenario, LazyDocument assumes all the possible fields 
have been accounted for in the document and the not yet loaded fields have a 
bit offset for quick access in the future. Right now there's no way to cache a 
partially loaded document unless the whole stored document is visited to pre 
populate the lazy field with the required information. 

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-16 Thread Keith Laban (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007093#comment-15007093
 ] 

Keith Laban commented on SOLR-8220:
---

I'm talking more specifically about the "FL=*" scenario 

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-16 Thread Keith Laban (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007068#comment-15007068
 ] 

Keith Laban commented on SOLR-8220:
---

Thanks [~ichattopadhyaya], this is my first time submitting a patch. This was 
originally done against 5.3.1 in the future i'll submit it based on trunk. 

I also noticed that there is a bug in the patch I submited. 

The line which reads 
{code}
if(null == sf){
  return doc;
}
{code}

should be
{code}
if(null == sf){
  continue;
}
{code}

[~ysee...@gmail.com] I think that we definitely need something like that. 
Currently CompressingStoredFieldsReader.visitDocument needs to visit the whole 
document in stored field to know which fields there are. Ideally we would 1) be 
able to avoid doing any work in stored fields if they can instead by read out 
of docValues. 2) have a mechanism for LazyDocument to know to read the lazy 
field from docValues instead of from stored fields. 3) know about values which 
aren't stored but should be read from docValues.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-16 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007091#comment-15007091
 ] 

Yonik Seeley commented on SOLR-8220:


bq. 1) be able to avoid doing any work in stored fields if they can instead by 
read out of docValues.

We should be able to do this optimization regardless?  For the fields 
requested, we can check the schema to see if they have docValues.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-11-16 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006538#comment-15006538
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


Thanks for the patch, Keith. 
I couldn't apply the patch. Is this based on trunk? I saw that 
SolrIndexSearcher.doc() method returns {{StoredDocument}} on trunk, but it 
seems your patch assumes a {{Document}}.
If this isn't based on trunk, can you please re-work the patch and update it to 
trunk? Also, an SVN patch would be easier for most developers to work with.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
> Attachments: SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-10-29 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980479#comment-14980479
 ] 

Yonik Seeley commented on SOLR-8220:


bq. +1 to doing this. I think this will be useful for SOLR-5944, and was anyway 
planning to split this functionality out into its own issue.

Ah, right, atomic updates needs this functionality as well if we are to allow 
docValues fields that aren't stored.
In that case I'll amend my previous comments around ResultContext... that's 
appropriate for decorating documents as they are being returned, but perhaps 
not low enough level for other use cases.


> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-10-29 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980449#comment-14980449
 ] 

Ishan Chattopadhyaya commented on SOLR-8220:


+1 to doing this. I think this will be useful for SOLR-5944, and was anyway 
planning to split this functionality out into its own issue. Also, if we go 
forward with \_version\_ field as docvalues field, then this becomes important. 
In current Solr, the way to read non-stored docValues fields is to use a 
function query, field(mydvfield). 

[~k317h] Are you planning to work on this / have a patch for this? If not, then 
I can give it a try and have SOLR-5944 depend on it.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-10-29 Thread Keith Laban (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980731#comment-14980731
 ] 

Keith Laban commented on SOLR-8220:
---

There are two approaches I can see:

1) implement a new type of StoredFieldReader which is aware of field type (i.e. 
does it have docValues, is it stored). This reader would delegate between 
reading from docValues or the stored fields

2) in the 
[doc:https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L736]
 function do two passes. first pass to get stored fields, a second pass to get 
docValues. This can go a step further to make the SetNonLazyFieldSelector aware 
of docValues fields and instruct the the reader to not load fields which are 
known to be in docValues.


thoughts?

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-10-28 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979837#comment-14979837
 ] 

David Smiley commented on SOLR-8220:


FYI I have a patch in SOLR-5478, and I've done it without patching Solr as well 
using a different technique.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-10-27 Thread Erick Erickson (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977537#comment-14977537
 ] 

Erick Erickson commented on SOLR-8220:
--

The only surprising behavior I see with this approach is that indexing 2.0 
would return different values when returned from docValues and when returned 
from stored, the DV return might be something like 1. or even 
2.0 would be a "surprise".

I'm not against the idea, just want this out there.

And one side benefit that's not entirely obvious. In sharded situations, the 
first pass returns the candidate list ID and "sort criteria". The way it's 
written last I knew was it returned stored values, which required decompression 
because it gets the stored field. If all the sort fields were DV, then we 
wouldn't have to do this.

This can't be the complete story since you can index but not store a sort field 
and distributed works, but it's one path I believe I've seen. It's an open 
question how to wire that in to standard search for a field that's stored, and 
a DV field.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-10-27 Thread Keith Laban (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977560#comment-14977560
 ] 

Keith Laban commented on SOLR-8220:
---

I believe sorting already reaps the benefits of doc values at least according 
to [this 
documentation|https://cwiki.apache.org/confluence/display/solr/DocValues]

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-10-27 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977575#comment-14977575
 ] 

Yonik Seeley commented on SOLR-8220:


bq. reading values from docValues when a stored version of the field does not 
exist would be a valuable disk usage optimization.

+1, and I've heard a number of users request this.

bq. 1) There doesn't seem to be a standard way to read values for docValues, 
facets, analytics, streaming, etc, all seem to be doing their own ways, perhaps 
some of this logic should be centralized.

See ReturnFields / ResultContext, that's currently where stored field handling 
is centralized, and handles anywhere a field list (or pseudo-fields / 
transformers) is specified.

+1 for 2a as a first pass.
For bonus points, prevent stored fields from being loaded at all when not 
needed.  This gets us a big step closer to having the normal request handler 
have the same performance as "/export".

Looking beyond the first pass, it might be nice to use docValues as more of a 
first-class alternate "stored" mechanism, and consider them part of "*".  If 
for some reason it's desirable to treat some docValues fields as stored, and 
others not, we could introduce a flag on  in the schema.

bq. The only caveat with this that I can see would be for multiValued fields as 
they would always be returned sorted in the docValues approach. I believe this 
is a fair compromise.

This shouldn't be much of a concern for approach 2a, but another future option 
would be to add explicit set types, and also implement list-type multi-valued 
docValues fields... prob using binary docValues under the covers).


> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-10-27 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977602#comment-14977602
 ] 

Yonik Seeley commented on SOLR-8220:


bq. 3> shard2 returns its candidate top 10 to shard1. This return packet has 
the top 10 doc IDs and sort criteria.

We currently pull the sort criteria from the field comparators implementing the 
sort:
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java#L638

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-10-27 Thread Erick Erickson (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977592#comment-14977592
 ] 

Erick Erickson commented on SOLR-8220:
--

I'm not talking about sorting here. The sorting during scoring certainly uses 
DV values.

Here's the sequence (two shards, no followers, rows=10, illustrating with just 
the action on shard2)
1> shard1 gets the incoming request
2> shard1 sends a sub-request for candidate top 10 to shard2
3> shard2 returns its candidate top 10 to shard1. This return packet has the 
top 10 doc IDs and sort criteria.
4>  shard1 combines the lists of candidate top 10 docs and picks the true top 10
5> shard1 asks shard2 for the docs that came from shard2 and made it into the 
sorted top 10 list.

What I'm talking about is step <3>. Last I knew, this did not use the DV 
fields, but pulled the stored value thus decompressing. 

I'm not sure how this is resolved for fields that aren't stored, there must be 
a fallback. Or I'm missing something here, wouldn't be the first time. Maybe 
Yonik's idea of making DV fields more "first class citizens" would just take 
care of the issue entirely.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-10-27 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977591#comment-14977591
 ] 

Yonik Seeley commented on SOLR-8220:


bq. The way it's written last I knew was it returned stored values, which 
required decompression because it gets the stored field.
Right, we should change this (and this issue may handle that out-of-the-box, if 
we chose not to store the "id" field).

bq. If all the sort fields were DV, then we wouldn't have to do this.

The sort field values returned in the first phase of distributed search aren't 
obtained from stored field values.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

2015-10-27 Thread Erick Erickson (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977601#comment-14977601
 ] 

Erick Erickson commented on SOLR-8220:
--

bq: The sort field values returned in the first phase of distributed search 
aren't obtained from stored field values.

Thanks, I suspect I was inappropriately generalizing from the . 
Which answers how sort values get returned from non-stored fields


> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
>  Issue Type: Improvement
>Reporter: Keith Laban
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>-- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

1 2 >

100 matches

Mail list logo