[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154456#comment-15154456 ] Ishan Chattopadhyaya commented on SOLR-8220:

Notes for the reference guide:

Page: https://cwiki.apache.org/confluence/display/solr/Defining+Fields
{code}
Property=useDocValuesAsStored
Description=If the field has docValues enabled, setting this to true allows the field to be treated as a regular stored field (even if it has stored=false). This means the field is returned alongside the regular stored fields requested through the fl parameter.
Values=true or false
Implicit default=false for schema versions <1.6, true for schema versions >=1.6
{code}

Page: https://cwiki.apache.org/confluence/display/solr/DocValues
{code}
Retrieving docValues during search:
Field values retrieved during search queries are typically returned from stored values if the field has stored=true. However, starting with schema version 1.6, all non-stored docValues fields are also returned along with other stored fields when all fields (or pattern-matching globs) are requested (e.g. fl=*) for search queries. This behavior can be turned on and off by setting the useDocValuesAsStored parameter for a field or a field type to true (the implicit default since schema version 1.6) or false (the implicit default up to schema version 1.5). See https://cwiki.apache.org/confluence/display/solr/Defining+Fields

Note that enabling this property has performance implications: DocValues are column-oriented and may therefore incur additional cost to retrieve for each returned document. Also note that when non-stored fields are returned from docValues (the default in schema versions 1.6+, unless useDocValuesAsStored is false), the values of a multi-valued field are returned in sorted order (not insertion order). If you require multi-valued fields to be returned in the original insertion order, make your multi-valued field stored (such a change requires re-indexing).
{code}

Page: https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-Thefl%28FieldList%29Parameter
{code}
Note: Starting with schema version 1.6, if there are non-stored fields with docValues enabled in the index, then a pattern glob like * in the fl parameter will retrieve those fields. This is not the case if those fields explicitly set useDocValuesAsStored to false in their field definition (see https://cwiki.apache.org/confluence/display/solr/Defining+Fields) or if the schema version is <1.6. However, something like fl=dvfield or fl=*,dvfield (where dvfield is a non-stored field with docValues enabled) retrieves dvfield irrespective of the useDocValuesAsStored value. (See SOLR-8220 for more details.)
{code}

Could someone please review and update the ref guide with the above information? And please feel free to reorganize, modify, drop, or rephrase any of this.
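The fl rules in these notes boil down to a small decision function. Below is a minimal Python sketch that models only the documented behaviour, not Solr's actual code: the `Field` class, its `use_dv_as_stored` attribute, and the `returned_fields` helper are hypothetical names, and glob handling is simplified to `*` patterns matched with `fnmatch`.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class Field:
    """Hypothetical model of a field definition in the schema."""
    name: str
    stored: bool
    doc_values: bool
    # None means "not set explicitly", so the schema-version default applies.
    use_dv_as_stored: Optional[bool] = None


def effective_use_dv_as_stored(field: Field, schema_version: float) -> bool:
    """Implicit default: false for schema versions < 1.6, true for >= 1.6."""
    if field.use_dv_as_stored is not None:
        return field.use_dv_as_stored
    return schema_version >= 1.6


def returned_fields(fields: List[Field], fl: str, schema_version: float) -> List[str]:
    """Names of the fields included in the response for a comma-separated fl."""
    requested = [p.strip() for p in fl.split(",") if p.strip()]
    globs = [p for p in requested if "*" in p]  # simplification: only '*' globs
    result = []
    for f in fields:
        if not (f.stored or f.doc_values):
            continue  # nothing retrievable for this field
        if f.name in requested:
            # Explicitly named: returned from stored or docValues,
            # irrespective of useDocValuesAsStored.
            result.append(f.name)
        elif any(fnmatchcase(f.name, g) for g in globs):
            # Glob match: stored fields always; non-stored docValues fields
            # only when useDocValuesAsStored is (effectively) true.
            if f.stored or effective_use_dv_as_stored(f, schema_version):
                result.append(f.name)
    return result


example_fields = [
    Field("id", stored=True, doc_values=False),
    Field("dvfield", stored=False, doc_values=True),
    Field("hidden_dv", stored=False, doc_values=True, use_dv_as_stored=False),
]

print(returned_fields(example_fields, "*", 1.6))            # ['id', 'dvfield']
print(returned_fields(example_fields, "*", 1.5))            # ['id']
print(returned_fields(example_fields, "*,hidden_dv", 1.6))  # ['id', 'dvfield', 'hidden_dv']
```

The explicit-name branch is checked before the glob branch on purpose, mirroring the note that fl=dvfield or fl=*,dvfield returns the field regardless of useDocValuesAsStored.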
> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
> Issue Type: Improvement
> Reporter: Keith Laban
> Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch,
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch,
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which
> requires redundant data to be stored on disk. Since reading from docValues is
> both efficient and a common practice (facets, analytics, streaming, etc),
> reading values from docValues when a stored version of the field does not
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as
> they would always be returned sorted in the docValues approach. I believe
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think
> it should live closer to where stored fields are loaded in the
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues,
> facets, analytics, streaming, etc, all seem to be doing their own ways,
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
> -- return field from docValue if the field is not stored and in docValues,
> if the field is stored return it from stored fields
> - fl="*"
> -- return only stored fields
> - fl="+"
> -- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first
> pass. 2b - is current behavior
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092491#comment-15092491 ] Erick Erickson commented on SOLR-8220:

Hmmm, one implication that I just realized (yeah, I'm slow sometimes). In addition to the output from a DV field being reordered, identical values are collapsed since it's a SortedSet under the covers, right? So if I have a multiValued DocValues field and put in "memory", "memory", "memory", then all I get back from the DV field is one copy of "memory". Since returning DV values already doesn't honor the guarantee that multiValued stored fields return data in insertion order, this is perhaps no biggie. IMO it does deserve a comment when we do the docs for this, though.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
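Erick's point about collapsed duplicates can be shown with a toy model: a stored field hands back the values as a list in insertion order, while a sorted-set docValues field keeps only the distinct values, in sorted order. This Python sketch uses plain built-ins to model that behaviour; it does not call Solr or Lucene, the helper names are hypothetical, and it covers only the SortedSet case Erick describes.

```python
from typing import List


def stored_values(inserted: List[str]) -> List[str]:
    """Stored field model: values come back as indexed, in insertion order."""
    return list(inserted)


def sorted_set_doc_values(inserted: List[str]) -> List[str]:
    """Sorted-set docValues model: duplicates collapse, values come back sorted.

    Terms are ordered by their bytes; for plain ASCII strings like the ones
    below, Python's default string ordering agrees with that.
    """
    return sorted(set(inserted))


values = ["memory", "memory", "memory"]
print(stored_values(values))          # ['memory', 'memory', 'memory']
print(sorted_set_doc_values(values))  # ['memory']

mixed = ["zeta", "alpha", "zeta", "beta"]
print(stored_values(mixed))           # ['zeta', 'alpha', 'zeta', 'beta']
print(sorted_set_doc_values(mixed))   # ['alpha', 'beta', 'zeta']
```

Both effects (reordering and deduplication) come from the same set-like storage, which is why a separate "set" flag, as Yonik suggests in the next comment, would make the behaviour explicit.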
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092500#comment-15092500 ] Yonik Seeley commented on SOLR-8220:

Yep... I commented earlier that we should prob add an explicit "set" type/flag in the future. It can make sense to have a field that preserves order/instances and one that doesn't.
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072712#comment-15072712 ] Shalin Shekhar Mangar commented on SOLR-8220:

I had a chat with David on IRC. In summary, his objections are:
# The primary one is the default: are stored=false & docValues=true fields returned from globs by default (whatever we name the field/fieldtype attributes)? It would be set to false often, assuming many schemas index text both ways?
# Globs only match a DV field if useDocValuesAsStored=true, so the property name should be indicative of that.
# Perhaps if we have a matchesFlGlob parameter, we would then have no need for useDocValuesAsStored?

I was of the opinion that matchesFlGlob=true is not a great name because what if we use the JSON request API in the future and there's no "fl", so to speak. David then also suggested an alternate name, e.g. matchesReturnGlob: whether a glob (asterisk) pattern in an ‘fl’ parameter (or equivalent) will match this field. Applies to stored or docValues fields. Defaults to what stored is set to.

I'm still not sure about this, David. What does stored=true, matchesReturnGlob=false mean? Should that even be allowed?

Replying to some of your earlier points:
bq. The main reason is that for returning the top-X docs with more than a few fields, row storage will probably be faster than columnar storage, performance-wise.
Agreed, and that's why we need to figure out the specifics in SOLR-8344 so that we choose the right way of retrieval.

bq. Another reason is to retain a particular ordering for multi-valued fields
We can always try to use stored values for multi-valued fields in SOLR-8344.

bq. Another reason is that Solr's highlighters don't read from docValues (solvable).
Why is this a reason for not using DocValues on the original field?

bq. Ideally most users wouldn't want to monkey with such parameters (IMO). But most schemas I've seen have at least one occurrence of an original input string indexed two ways for search & sorting/faceting. And if our example schemas do, thus motivating us to set it to false for these fields, it just re-confirms my point.
The thing is, the copy fields you refer to are added deliberately, and I think it is reasonable to expect users to choose the value of useDocValuesAsStored according to their use cases for such fields. It is also common for people to set docValues=true and stored=true together just because they believe that it isn't possible to retrieve doc values.

What do you think?
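To compare the two options discussed here, the following minimal Python sketch puts the matchesReturnGlob semantics as David describes them (a glob matches a field only if the attribute is true; it applies to stored or docValues fields and defaults to the value of stored) next to the committed useDocValuesAsStored rule. matchesReturnGlob was only a naming proposal in this thread; the `Field` class and both helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Field:
    """Hypothetical field model; attribute names mirror the schema properties."""
    name: str
    stored: bool
    doc_values: bool
    matches_return_glob: Optional[bool] = None  # proposed attribute, unset by default
    use_dv_as_stored: Optional[bool] = None     # committed attribute, unset by default


def glob_matches_proposed(f: Field) -> bool:
    """Proposal: applies to stored or docValues fields; defaults to `stored`."""
    if not (f.stored or f.doc_values):
        return False
    if f.matches_return_glob is not None:
        return f.matches_return_glob
    return f.stored


def glob_matches_committed(f: Field, schema_version: float = 1.6) -> bool:
    """Committed rule: stored fields always match; non-stored docValues fields
    match when useDocValuesAsStored is true (implicitly so for schema >= 1.6)."""
    if f.stored:
        return True
    use_dv = f.use_dv_as_stored if f.use_dv_as_stored is not None else schema_version >= 1.6
    return f.doc_values and use_dv


# A non-stored docValues field: the two defaults disagree.
price = Field("price", stored=False, doc_values=True)
print(glob_matches_proposed(price), glob_matches_committed(price))  # False True

# Shalin's open question: a stored field hidden from globs.
title = Field("title", stored=True, doc_values=False, matches_return_glob=False)
print(glob_matches_proposed(title), glob_matches_committed(title))  # False True
```

The sketch makes the disagreement concrete: under the proposal a plain docValues field stays out of fl=* unless opted in, while the committed default for schema 1.6+ opts it in, and only the proposal can express the stored=true, matchesReturnGlob=false case Shalin questions.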
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072789#comment-15072789 ] Yonik Seeley commented on SOLR-8220:

{quote}
> (Yonik) Going forward, why wouldn't one just use docValues on the original
> field [ rather than a copyField ] ?
The main reason is that for returning the top-X docs with more than a few fields, row storage will probably be faster than columnar storage, performance-wise.
{quote}

If one still wants to use row-stored for performance tweaks (the exact cross-over point will hopefully be determined by SOLR-8344), then they can still do that, and it makes sense to do both row-stored and column-stored on the original field. Sorry if it wasn't clear, but that was my original point.

bq. It [ useDocValuesAsStored ] basically means will a 'fl' glob match the DV field or not.

That's only one aspect. Every place that "uses" stored fields should treat a DV field as a stored field. It also affects streaming expressions / SQL, highlighting, partial updates, etc.

Although I can see why you might think that at first: the fact that the current implementation does return fl=exact_field_name even when useDocValuesAsStored=false was a surprise to me as well. I don't see it as a big deal though (more like a minor syntactic shortcut for a pseudo field). To me, useDocValuesAsStored=false means "this isn't really a stored field... it's just an implementation artifact for some query-time feature we need".
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072846#comment-15072846 ] David Smiley commented on SOLR-8220:

To be clear, I'm not -1 on this; just -0. I get the gist of the intent with what is committed. Thanks for clarifying, guys.

I think what may _reduce_ the need for users to know about / touch useDocValuesAsStored (what I hope for) is if, by convention, stored & doc-values fields are the same field, and we put tokenized text into some other field (indexed=true stored=false docValues=false) if we also need keyword search on the field in question. I think this is what Yonik is suggesting. The only annoyance with this is highlighting: the highlighters expect the stored text to be in the same field as the one that is queried & indexed. {{hl.requireFieldMatch}} could be set to false, but that's a blunt instrument and isn't even supported by the postings highlighter. Nonetheless, I think this could be solved in another issue.

Can and should the default/example schemas be adjusted to fit the aforementioned conventions? Then we wouldn't need to set useDocValuesAsStored in them.
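The convention David describes can be sketched as two schema fragments and checked against the documented fl=* rule for schema version 1.6+ (stored fields, plus non-stored docValues fields unless useDocValuesAsStored is false). The Python below parses both hypothetical fragments with the standard library's `xml.etree.ElementTree`. The field names, the type names `string` and `text_general`, and the fallback version used when none is given are assumptions for illustration. As a further simplification, only attributes written on `<field>` are read; real Solr also inherits defaults from the fieldType.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical layout: text indexed and stored, plus a string copy kept only
# for sorting/faceting (docValues=true, stored=false).
COPY_FOR_FACETING = """
<schema name="example" version="1.6">
  <field name="title" type="text_general" indexed="true" stored="true"/>
  <field name="title_s" type="string" indexed="false" stored="false" docValues="true"/>
  <copyField source="title" dest="title_s"/>
</schema>
"""

# Hypothetical layout following the convention: the original field is both
# stored and docValues; the tokenized copy is indexed only.
DAVID_CONVENTION = """
<schema name="example" version="1.6">
  <field name="title" type="string" indexed="true" stored="true" docValues="true"/>
  <field name="title_txt" type="text_general" indexed="true" stored="false" docValues="false"/>
  <copyField source="title" dest="title_txt"/>
</schema>
"""


def _flag(elem: ET.Element, attr: str, default: bool = False) -> bool:
    """Read a boolean attribute; a missing attribute falls back to `default`."""
    value = elem.get(attr)
    return default if value is None else value.lower() == "true"


def fields_for_star_glob(schema_xml: str) -> List[str]:
    """Names of the fields that fl=* would return under the documented rule."""
    root = ET.fromstring(schema_xml.strip())
    version = float(root.get("version", "1.0"))  # fallback is an assumption
    names = []
    for field in root.iter("field"):
        stored = _flag(field, "stored")
        doc_values = _flag(field, "docValues")
        use_dv = _flag(field, "useDocValuesAsStored", default=version >= 1.6)
        if stored or (doc_values and use_dv):
            names.append(field.get("name"))
    return names


print(fields_for_star_glob(COPY_FOR_FACETING))  # ['title', 'title_s']
print(fields_for_star_glob(DAVID_CONVENTION))   # ['title']
```

In the first layout the faceting copy shows up in fl=* as a duplicate, which is exactly where someone would have to set useDocValuesAsStored="false"; in the second layout nothing needs to be touched.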
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072115#comment-15072115 ] ASF subversion and git services commented on SOLR-8220:

Commit 1721795 from sha...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1721795 ]
SOLR-8220: Read field from DocValues for non stored fields
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072116#comment-15072116 ] Ishan Chattopadhyaya commented on SOLR-8220:

I suggest we change:
{code}
+ Also note that returning stored fields from docValues (default in schema versions 1.6+) returns multiValued
+ fields in sorted order. If you require the older behavior of multiValued fields being returned in the
+ original insertion order, set useDocValuesAsStored="false" for the individual fields or make
+ sure your schema version is < 1.6. This does not require re-indexing.
+ See SOLR-8220 for more details.
{code}
to
{code}
+ Also note that while returning non-stored fields from docValues (default in schema versions 1.6+, unless useDocValuesAsStored is false) returns multiValued
+ fields in sorted order. If you require the multiValued fields being returned in the
+ original insertion order, then make your multiValued field as stored. This requires re-indexing.
+ See SOLR-8220 for more details.
{code}
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072112#comment-15072112 ] Ishan Chattopadhyaya commented on SOLR-8220:

+1, LGTM.
(Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
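The three fl modes proposed in the issue description can be modelled in a few lines. This sketch covers the *proposal* only. It is not the committed behavior: later comments in this thread describe `*` also matching docValues-only fields. The `ProposedFl` class, the `FieldDef` record and the `Source` enum are hypothetical names, not Solr APIs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the fl semantics proposed in the SOLR-8220 description (hypothetical names, not Solr code). */
final class ProposedFl {

    /** Hypothetical per-field schema flags. */
    record FieldDef(String name, boolean stored, boolean docValues) {}

    /** Where a returned value would be read from. */
    enum Source { STORED, DOC_VALUES }

    /** Resolves a single fl token ("*", "+", or a field name) to field -> source. */
    static Map<String, Source> resolve(String fl, List<FieldDef> schema) {
        Map<String, Source> out = new LinkedHashMap<>();
        for (FieldDef f : schema) {
            boolean selected = fl.equals("*") || fl.equals("+") || fl.equals(f.name());
            if (!selected) {
                continue;
            }
            if (f.stored()) {
                out.put(f.name(), Source.STORED);      // a stored copy is always preferred
            } else if (f.docValues() && !fl.equals("*")) {
                out.put(f.name(), Source.DOC_VALUES);  // "*" never reaches docValues-only fields
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<FieldDef> schema = List.of(
                new FieldDef("title", true, false),
                new FieldDef("price", false, true),
                new FieldDef("id", true, true));
        System.out.println(resolve("*", schema));     // {title=STORED, id=STORED}
        System.out.println(resolve("+", schema));     // {title=STORED, price=DOC_VALUES, id=STORED}
        System.out.println(resolve("price", schema)); // {price=DOC_VALUES}
    }
}
```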
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072246#comment-15072246 ] Shalin Shekhar Mangar commented on SOLR-8220:

#3 was fixed in later patches. In the committed code, if you request a field explicitly then it will be returned either from stored or from DV.
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072233#comment-15072233 ] David Smiley commented on SOLR-8220:

Rule #5 is very surprising to me, as it seems to conflict with #3 in that it ignores useDocValuesAsStored. What is the rationale? If the field is both Stored And DV then from where is it returned? I REALLY hope not DV since it then wouldn't respect value ordering.
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072182#comment-15072182 ] David Smiley commented on SOLR-8220:

I've been following this since inception with passing interest and wish I had had the time this holiday to comment before Shalin committed. Based on Shalin's summary a couple of days ago, {{fl=\*}} will return these docValues-only fields that have useDocValuesAsStored=true, and it will be true by default going forward. My only concern with this is that it's commonplace to index an original input value of text two ways: one for keyword search (marked stored as well), and another copyField target that isn't stored, isn't indexed, but has docValues for faceting or sorting. Now, this value will be returned twice from {{fl=\*}}. Granted, the user could set useDocValuesAsStored=false on these fields; it's not a big deal to do so, nor a big deal to forget to do so. Has this been overlooked by Ishan & Shalin, who are putting the work into this issue, or is it a recognized trade-off? It didn't have to be this way. It could have been designed so that {{fl=\*}} is only for stored fields and those _explicitly_ setting useDocValuesAsStored=true.
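A small sketch makes the copyField concern concrete. It models the `fl=*` rule as summarized in this thread: a field matches if it is stored, or if it has docValues and useDocValuesAsStored=true, with the flag defaulting to true. The `GlobDuplicates` class and the field names `title` / `title_facet` are hypothetical. Only the schema flags are modelled, not analysis or copyField processing.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the fl=* matching rule discussed in SOLR-8220 (hypothetical names, not Solr code). */
final class GlobDuplicates {

    /** Hypothetical schema flags; useDocValuesAsStored is assumed to default to true. */
    record FieldDef(String name, boolean stored, boolean docValues, boolean useDocValuesAsStored) {}

    /** Names of the fields that fl=* would return under the summarized rule. */
    static List<String> matchStar(List<FieldDef> schema) {
        List<String> out = new ArrayList<>();
        for (FieldDef f : schema) {
            if (f.stored() || (f.docValues() && f.useDocValuesAsStored())) {
                out.add(f.name());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "title" is the stored search field; "title_facet" is a docValues-only copyField target.
        List<FieldDef> defaults = List.of(
                new FieldDef("title", true, false, true),
                new FieldDef("title_facet", false, true, true));
        System.out.println(matchStar(defaults)); // [title, title_facet]: same value returned twice

        // Opting the copyField target out avoids the duplicate.
        List<FieldDef> optedOut = List.of(
                new FieldDef("title", true, false, true),
                new FieldDef("title_facet", false, true, false));
        System.out.println(matchStar(optedOut)); // [title]
    }
}
```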
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072243#comment-15072243 ] ASF subversion and git services commented on SOLR-8220:

Commit 1721844 from sha...@apache.org in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1721844 ]
SOLR-8220: Read field from DocValues for non stored fields
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072185#comment-15072185 ] Ishan Chattopadhyaya commented on SOLR-8220:

David, do you think setting useDocValuesAsStored=false on the copyField targets in our example schemas would partly alleviate the problem? In fact, I was planning to open another issue to add a few more dynamic field types with stored=false, docValues=true to the example schemas. I can add this change to the copyField targets too, if it makes sense. Or would you rather have useDocValuesAsStored default to false and be turned on on demand? I think that would make adoption harder, and would make it harder for us to get rid of stored fields some day, should that make sense performance-wise (in other words, it would be a more abrupt change for users). Having useDocValuesAsStored=true as the default now would make the transition less abrupt. What do you think?
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072234#comment-15072234 ] Shalin Shekhar Mangar commented on SOLR-8220:

The idea is that if you have explicitly asked for that field then we find a way to return it, if possible. If it is both stored and DV then it returns from stored.
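The rule for an explicitly requested field can be sketched as a single source-selection function. Following the comments above, a stored copy wins, docValues are the fallback, and useDocValuesAsStored is deliberately not consulted for an explicit request. `ExplicitFieldSource`, `FieldDef` and `Source` are hypothetical names. The SOLR-8344 heuristics mentioned later in the thread are not modelled.

```java
import java.util.Optional;

/** Sketch of source selection for a field named explicitly in fl (hypothetical names, not Solr code). */
final class ExplicitFieldSource {

    /** Hypothetical schema flags for one field. */
    record FieldDef(String name, boolean stored, boolean docValues, boolean useDocValuesAsStored) {}

    enum Source { STORED, DOC_VALUES }

    /** useDocValuesAsStored is intentionally ignored: an explicit request is honored if at all possible. */
    static Optional<Source> sourceFor(FieldDef f) {
        if (f.stored()) {
            return Optional.of(Source.STORED);     // stored wins, keeping the original value order
        }
        if (f.docValues()) {
            return Optional.of(Source.DOC_VALUES); // fall back to docValues
        }
        return Optional.empty();                   // neither stored nor docValues: nothing to return
    }

    public static void main(String[] args) {
        System.out.println(sourceFor(new FieldDef("id", true, true, true)));            // Optional[STORED]
        System.out.println(sourceFor(new FieldDef("popularity", false, true, false)));  // Optional[DOC_VALUES]
        System.out.println(sourceFor(new FieldDef("text", false, false, true)));        // Optional.empty
    }
}
```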
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072241#comment-15072241 ] Shalin Shekhar Mangar commented on SOLR-8220:

bq. It didn't have to be this way. It could have been designed so that fl=* is only for stored fields and those explicitly setting useDocValuesAsStored=true.

Let me take a step back. The way I think about useDocValuesAsStored is that it gives a hint to Solr that this field *can* be retrieved from DocValues. But that doesn't necessarily mean that we will always retrieve it from DocValues. Later, in SOLR-8344, we will implement some heuristics to choose whether to retrieve from stored or from DV if both are enabled. So, by making useDocValuesAsStored=true the default, we don't tie our hands in SOLR-8344 and we'd be free to choose as we desire. But if the user wants a field to never be *automatically* retrieved from DocValues, then they can set useDocValuesAsStored=false. Keep in mind that this is only for automatic selection. If the user explicitly requests a field by specifying its full name (no globbing) in the 'fl' parameter, then we respect their wishes and retrieve it from DV if necessary. Maybe useDocValuesAsStored is a bad name and 'autoDocValuesAsStored' conveys the meaning better?
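Shalin's distinction between automatic selection and an explicit request can be sketched as follows. useDocValuesAsStored is read only when a field is reached through a glob. The `GlobSelection` class and its helpers are hypothetical, and `globMatches` is a simplified stand-in for Solr's own fl glob handling. The future SOLR-8344 heuristics are not modelled.

```java
import java.util.regex.Pattern;

/** Sketch: useDocValuesAsStored as a hint consulted only for glob selection (hypothetical names). */
final class GlobSelection {

    /** Hypothetical schema flags for one field. */
    record FieldDef(String name, boolean stored, boolean docValues, boolean useDocValuesAsStored) {}

    /** Simplified glob match: '*' matches any run of characters, everything else is literal. */
    static boolean globMatches(String glob, String name) {
        String[] parts = glob.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return name.matches(regex.toString());
    }

    /** Automatic selection: a docValues-only field is picked up only if the hint allows it. */
    static boolean selectedByGlob(String glob, FieldDef f) {
        if (!globMatches(glob, f.name())) {
            return false;
        }
        return f.stored() || (f.docValues() && f.useDocValuesAsStored());
    }

    public static void main(String[] args) {
        FieldDef stored = new FieldDef("title", true, false, true);
        FieldDef dvAuto = new FieldDef("price_d", false, true, true);
        FieldDef dvManual = new FieldDef("title_facet", false, true, false);
        System.out.println(selectedByGlob("*", stored));   // true
        System.out.println(selectedByGlob("*", dvAuto));   // true
        System.out.println(selectedByGlob("*", dvManual)); // false: not retrieved automatically
        System.out.println(selectedByGlob("*_d", dvAuto)); // true
    }
}
```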
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072249#comment-15072249 ] David Smiley commented on SOLR-8220:

How about two Boolean settings:
* {{matchesFlGlob}} defaults to whatever {{stored}} is. Very straightforward to describe; no special exceptions. Useful to set to either true or false in different circumstances, depending on whether it's stored or DV. This would be a separate issue.
* {{autoDocValuesAsStored}} defaults to whatever {{docValues}} is. I'm not sure how to define it, honestly, other than to say it doesn't have to do with fl globbing. Maybe it could prevent fl from choosing the field even if explicitly requested?
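The defaulting rules in this suggestion can be pinned down with a small sketch. `matchesFlGlob` and `autoDocValuesAsStored` are only the names proposed in the comment, not existing Solr schema attributes. In the sketch, a `null` value stands for "attribute not set in the schema". Only the defaults are modelled, because the comment leaves the exact meaning of `autoDocValuesAsStored` open.

```java
/** Sketch of the defaults proposed for two hypothetical schema attributes (not Solr code). */
final class ProposedFlags {

    /** null for matchesFlGlob / autoDocValuesAsStored means "not set in the schema". */
    record FieldDef(String name, boolean stored, boolean docValues,
                    Boolean matchesFlGlob, Boolean autoDocValuesAsStored) {

        /** Proposed default: whatever "stored" is. */
        boolean effectiveMatchesFlGlob() {
            return matchesFlGlob != null ? matchesFlGlob : stored;
        }

        /** Proposed default: whatever "docValues" is. */
        boolean effectiveAutoDocValuesAsStored() {
            return autoDocValuesAsStored != null ? autoDocValuesAsStored : docValues;
        }
    }

    public static void main(String[] args) {
        FieldDef dvOnly = new FieldDef("price", false, true, null, null);
        System.out.println(dvOnly.effectiveMatchesFlGlob());         // false
        System.out.println(dvOnly.effectiveAutoDocValuesAsStored()); // true
        FieldDef optedIn = new FieldDef("price", false, true, true, null);
        System.out.println(optedIn.effectiveMatchesFlGlob());        // true
    }
}
```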
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072244#comment-15072244 ] David Smiley commented on SOLR-8220:

Ok, then Rule #5 makes perfect sense to me. #3 is questionable to me; it's hard to explain the rules clearly to someone. So useDocValuesAsStored is only considered when fl contains an asterisk? If so, I wonder if it should be named to reflect that somehow, like "flWildcard". With a name like that, it would be useful to set it to false on a stored field.
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072281#comment-15072281 ] Yonik Seeley commented on SOLR-8220:

bq. copyField target that isn't stored, isn't indexed, but has docValues for faceting or sorting.

Going forward, why wouldn't one just use docValues on the original field? Anyway, the copyField thing presents ambiguities for atomic updates as well... it's not specific to "fl". Whatever we support for that can be used for "fl" as well (for instance, determining that the copyField target isn't a "real" field because the source is already stored/docValued or something). It seems like going forward, people will benefit by adjusting their mental model to "it's just a different way of storing... row stored or column stored." To a new user, why would one not get all stored fields back? If the column-stored option had been present in Lucene from the start, that's probably how it would have been implemented in Solr from the start.
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072397#comment-15072397 ] Ishan Chattopadhyaya commented on SOLR-8220: Right, in my last patch (and maybe the second-to-last one too) [0], I changed fl handling so that fields listed explicitly alongside \* are returned even if they are non-stored DVs with useDocValuesAsStored=false. Hence, Shalin's point 3 should now read something like: {quote} 3. {{fl=\*,a1}} will return all stored=true fields and useDocValuesAsStored=true DV fields. If the 'a1' field is a stored=false DV field with useDocValuesAsStored=false, then it will also be returned because it was explicitly asked for. {quote} [0] - https://issues.apache.org/jira/browse/SOLR-8220?focusedCommentId=15071474=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15071474
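A minimal sketch of the revised point 3, written as a standalone Java model rather than Solr's actual code path. It assumes fl has already been parsed into a set of explicit field names plus a flag for `*`; the class `FlRuleSketch`, its `Field` holder, and `isReturned` are hypothetical names.

```java
import java.util.Set;

// Hypothetical model of the fl rule described above; not Solr's implementation.
final class FlRuleSketch {

    // Minimal stand-in for a schema field definition.
    static final class Field {
        final String name;
        final boolean stored;
        final boolean docValues;
        final boolean useDocValuesAsStored;

        Field(String name, boolean stored, boolean docValues, boolean useDocValuesAsStored) {
            this.name = name;
            this.stored = stored;
            this.docValues = docValues;
            this.useDocValuesAsStored = useDocValuesAsStored;
        }
    }

    /**
     * @param explicitNames field names listed literally in fl, e.g. {"a1"} for fl=*,a1
     * @param wildcard      whether fl contains "*"
     */
    static boolean isReturned(Field f, Set<String> explicitNames, boolean wildcard) {
        // Explicitly requested: return it if there is anything to read,
        // regardless of useDocValuesAsStored.
        if (explicitNames.contains(f.name)) {
            return f.stored || f.docValues;
        }
        // "*" only picks up non-stored DV fields that opted in.
        if (wildcard) {
            return f.stored || (f.docValues && f.useDocValuesAsStored);
        }
        return false;
    }
}
```

Under this model a non-stored DV field `a1` with useDocValuesAsStored=false is skipped by `fl=*` but returned by `fl=*,a1`.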
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072357#comment-15072357 ] Shalin Shekhar Mangar commented on SOLR-8220: bq. matchesFlGlob defaults to whatever stored is. Very straight forward to describe; no special exceptions. Useful to set to either true or false in different circumstances depending if it's stored or DV. This would be a separate issue. I don't think we need that level of configurability? Keep in mind that every parameter that we add also needs to be explained to a new user. Having trained people on Solr, I can say that the difference between stored/indexed/docValues is confusing enough to explain. We should avoid adding more complexity if we can. This is another reason why I wanted useDocValuesAsStored to be true by default and unspecified/hidden in all default schemas. Also, I don't really understand your reasons for adding such a parameter. Globbing is allowed on docValues fields as long as {{useDocValuesAsStored=true}}. In the committed patch, you can do {{fl=a,b,c,d*}} and have {{d*}} match all docValues fields which are {{useDocValuesAsStored=true}}. But if you set {{useDocValuesAsStored=false}} then globbing will not work. What am I missing?
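To make the globbing behavior concrete, here is a hedged sketch of how a pattern such as `d*` in `fl=a,b,c,d*` could be expanded against the schema under the committed rule. `FlGlobSketch`, `globEligible`, and the glob-to-regex translation are assumptions made for illustration; Solr's real glob handling may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of expanding an fl glob such as "d*"; not Solr's implementation.
final class FlGlobSketch {

    static final class Field {
        final String name;
        final boolean stored;
        final boolean docValues;
        final boolean useDocValuesAsStored;

        Field(String name, boolean stored, boolean docValues, boolean useDocValuesAsStored) {
            this.name = name;
            this.stored = stored;
            this.docValues = docValues;
            this.useDocValuesAsStored = useDocValuesAsStored;
        }

        // A glob may match a field only if it is stored, or a DV field that opted in.
        boolean globEligible() {
            return stored || (docValues && useDocValuesAsStored);
        }
    }

    // Turn a simple glob ("d*", "*_s") into a regex: each '*' becomes ".*",
    // everything else is matched literally.
    static Pattern toRegex(String glob) {
        List<String> quoted = new ArrayList<>();
        for (String part : glob.split("\\*", -1)) {
            quoted.add(Pattern.quote(part));
        }
        return Pattern.compile(String.join(".*", quoted));
    }

    // Names of schema fields that the glob would pull into the response.
    static List<String> expand(String glob, List<Field> schema) {
        Pattern pattern = toRegex(glob);
        List<String> matched = new ArrayList<>();
        for (Field f : schema) {
            if (f.globEligible() && pattern.matcher(f.name).matches()) {
                matched.add(f.name);
            }
        }
        return matched;
    }
}
```

For instance, given DV fields `d1` (useDocValuesAsStored=true) and `d2` (useDocValuesAsStored=false) plus a stored field `d3`, `expand("d*", schema)` yields `[d1, d3]`.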
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072425#comment-15072425 ] David Smiley commented on SOLR-8220: bq. (Yonik) Going forward, why wouldn't one just use docValues on the original field? The main reason is that, for returning the top-X docs with more than a few fields, row storage will probably be faster than columnar storage. So at index time you can pay for both if you need columnar for other reasons (sorting/faceting). Another reason is to retain a particular ordering for multi-valued fields. Another reason is that Solr's highlighters don't read from docValues (solvable). bq. (Ishan) David, do you think having the copyField targets to have useDocValuesAsStored as false in our example schemas partly alleviates the problem? Yes, we should do that. Ideally most users wouldn't want to monkey with such parameters (IMO). But most schemas I've seen have at least one occurrence of an original input string indexed two ways for search & sorting/faceting. And if our example schemas do, thus motivating us to set it to false for these fields, it just re-confirms my point. Shalin: I very much care about ease of documenting/explaining this; I thought my comments showed I care. I guess we just see this issue differently. I'm coming around to a new interpretation of what useDocValuesAsStored is, as it was committed and as clarified today by you and Ishan. It basically means whether an 'fl' glob will match the DV field or not. If my understanding is true, then I think this is evidence I'm on to something with my "matchesFlGlob" suggestion. You are free to disagree, but I think it's extremely easy to document/describe/teach etc. what matchesFlGlob means, particularly if its scope is expanded to apply to stored fields too. FWIW I'll be on IRC. My attempts to ping you haven't received a response.
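For comparison, the matchesFlGlob proposal (which is not what was committed) could be modeled as a per-field flag whose default follows `stored`, with an explicit schema setting overriding it. `MatchesFlGlobSketch`, `matchedByGlob`, and the `null`-means-unset convention are hypothetical, chosen only to illustrate the idea.

```java
// Hypothetical sketch of the proposed (not committed) "matchesFlGlob" flag.
final class MatchesFlGlobSketch {

    static final class Field {
        final String name;
        final boolean stored;
        final boolean docValues;
        // null means "not set in the schema", so the default applies.
        final Boolean matchesFlGlob;

        Field(String name, boolean stored, boolean docValues, Boolean matchesFlGlob) {
            this.name = name;
            this.stored = stored;
            this.docValues = docValues;
            this.matchesFlGlob = matchesFlGlob;
        }

        // Default follows `stored`; an explicit setting overrides it. A field with
        // neither stored values nor docValues has nothing to return either way.
        boolean matchedByGlob() {
            boolean hasValues = stored || docValues;
            boolean flag = (matchesFlGlob != null) ? matchesFlGlob : stored;
            return hasValues && flag;
        }
    }
}
```

In this model a DV-only copyField target such as a hypothetical `title_sort` stays out of `fl=*` by default, while a stored field could also opt out by setting the flag to false, which is the broader scope David describes.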
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072140#comment-15072140 ] ASF subversion and git services commented on SOLR-8220: Commit 1721808 from sha...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1721808 ] SOLR-8220: Improve upgrade notes in CHANGES.txt
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15071396#comment-15071396 ] Ishan Chattopadhyaya commented on SOLR-8220: Erick, this patch (8220) returns DVs only for non-stored fields. So, before this change, users had no way of returning their multivalued non-stored fields (even field() didn't work). However, your suggested text (as is) will be spot on once both this and 8344 are in. For this issue, I think we should document the caveat that if insertion order for multivalued fields is important to you, then you should make your field stored=true (and that this feature is not for you).
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15071193#comment-15071193 ] Erick Erickson commented on SOLR-8220: When we do the docs, we do need to include a warning about ordering. multiValued fields are returned in sorted order when returned from DV fields, not the original order, right? Up until now, we've promised that multiValued fields are returned in the same order they were inserted into the document so two MV fields can be treated like parallel arrays. This will _not_ be true of DV fields that are multiValued, correct? Suggested text would be something like: "Returning stored fields from docValues (default in schema versions 1.6+) returns multiValued fields in sorted order. If you require the older behavior of multiValued fields being returned in the original insertion order, set useDocValuesAsStored="false" for the individual fields or make sure your schema version is < 1.6. This does _not_ require re-indexing."
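The parallel-array concern can be illustrated with a small Java model: two multi-valued fields indexed as `colors=[red, blue]` and `sizes=[S, XL]` line up from stored fields, but not once each field is sorted independently. `MultiValuedOrderSketch` is hypothetical, and `TreeSet` is only an approximation of a per-document SORTED_SET docValues field (string values, which Lucene also de-duplicates); multi-valued numeric docValues are sorted but keep duplicates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of multi-valued field retrieval; not Lucene/Solr code.
final class MultiValuedOrderSketch {

    // Stored fields give values back in the order they were indexed.
    static List<String> fromStored(List<String> indexed) {
        return new ArrayList<>(indexed);
    }

    // Approximates a string docValues field: values come back sorted
    // (and duplicates collapse), regardless of insertion order.
    static List<String> fromDocValues(List<String> indexed) {
        return new ArrayList<>(new TreeSet<>(indexed));
    }
}
```

Read back from stored fields, index 0 pairs `red` with `S`; read back from this docValues model it pairs `blue` with `S`, which is the parallel-array breakage described above.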
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070052#comment-15070052 ] Ishan Chattopadhyaya commented on SOLR-8220: bq. does the DocValues.getDocsWithField allocate a BitSet or just return a pre-existing instance When I tried to track it down, it looked to me like it is backed by a pre-existing bitset instance created at the leaf reader level, so I didn't think it was a performance concern. Could someone confirm this understanding, please? Moreover, since Yonik suggested its use, I was more confident about using it.
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070581#comment-15070581 ] Erick Erickson commented on SOLR-8220: LGTM as far as not returning all the DV values with each doc.
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068027#comment-15068027 ] Shalin Shekhar Mangar commented on SOLR-8220: Thanks Ishan. bq. Since, for the fl=* case, we need all non-stored DVs that have useDocValuesAsStored=true, but for the general filtering case of fl=dv1,dv2 we need to filter using all non-stored DVs (irrespective of the useDocValuesAsStored flag) Okay, I see what you are saying. The useDocValuesAsStored=true default applies when you request all fields, but if you are explicitly asking for a field then we can return it from DVs even if it was marked as useDocValuesAsStored=false. I have mixed feelings about this, but I can see where it can be useful, e.g. the 1st phase of distributed search. bq. However, I had a look, and found that responseWriters (e.g. JSONResponseWriter) get the whole SolrDocument at the writeSolrDocument() method, from where it does the following call to drop fields it doesn't need Hmm, yeah, we can't do that with doc values, it'd be too expensive. Is there a test which creates a new field with useDocValuesAsStored as true and separately as false using the schema API? I'm assuming you will address Erick's concern above about multi-valued fields.
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068075#comment-15068075 ] Ishan Chattopadhyaya commented on SOLR-8220: bq. Is there a test which creates a new field with useDocValuesAsStored as true and separately as false using the schema API? I had added SchemaVersionSpecificBehaviorTest to test for these various true/false cases. However, there is no useDocValuesAsStored=false case that checks the output. I'll add such a test. bq. I'm assuming you will address Erick's concern above about multi-valued fields. I'm working through them. So far as I can see, the current loop with values.getValueCount() and the loop Erick suggested behave identically, i.e. values.getValueCount() is indeed returning the count of values per document. But I am adding a test to prove it. As for {{DocValues.getDocsWithField(atomicReader, fieldName).get(docid)}}, not having it was resulting in empty fields being returned for documents that weren't supposed to have a docValue (the user never added a docValue for that document during indexing). Again, I think I should add a specific test for that, checking the number of fields returned (maybe there already is one from Keith, but I'll check again).
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068319#comment-15068319 ] Erick Erickson commented on SOLR-8220: -- bq: For the DocValues.getDocsWithField(atomicReader, fieldName).get(docid), not having it was resulting in empty fields being returned for documents that weren't supposed to have a docValue (the user never added a docValue for that document during indexing). Right, I had to add a test at the end to avoid that. I didn't track the code thoroughly, but does DocValues.getDocsWithField allocate a BitSet or just return a pre-existing instance? Or even cache the BitSet somewhere? If it allocates a new BitSet (or even fills up a cache entry), the test at the end might be much less expensive. I didn't track it down, though, and if it returns a reference to a cached BitSet that will be created _anyway_, then it's just a style thing:
{code}
if (outValues.size() > 0) {
  sdoc.addField(fieldName, outValues);
}
{code}
As for whether the loop returns all values in the field, I saw this "by inspection" on the techproducts example (with a few mods for adding docValues="true" to the schema). Again, though, this is 4.x after I hacked a backport and put it in an entirely different place in the code, specifically NOT a visitor pattern. So it's entirely possible that the semantics have changed, or that hacking it into a different part of the code base gives it a different context. A test would settle it for all time, though.
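To illustrate Erick's point that the bitset pre-check and the post-loop size check are two routes to the same output, here is a sketch with in-memory lists standing in for per-document doc values and a java.util.BitSet standing in for the docs-with-field bits. The class and method names are hypothetical. Whether the Lucene call allocates or caches its bitset is exactly the open question, and this sketch does not answer it; it only shows that either guard keeps empty fields out of the document.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EmptyFieldGuard {
    /** Option A: consult a docs-with-field bitset before reading any values. */
    static Map<String, Object> withBitSet(String field, BitSet docsWithField,
                                          List<List<String>> perDoc, int docId) {
        Map<String, Object> doc = new LinkedHashMap<>();
        if (docsWithField.get(docId)) {
            doc.put(field, new ArrayList<>(perDoc.get(docId)));
        }
        return doc;
    }

    /** Option B: always read, but only add the field if something was collected. */
    static Map<String, Object> withSizeCheck(String field, List<List<String>> perDoc, int docId) {
        Map<String, Object> doc = new LinkedHashMap<>();
        List<String> outValues = new ArrayList<>(perDoc.get(docId));
        if (outValues.size() > 0) {          // avoids the "empty braces" output
            doc.put(field, outValues);
        }
        return doc;
    }
}
```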
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067566#comment-15067566 ] Erick Erickson commented on SOLR-8220: -- WARNING: I'm stealing this code and back-porting it to 4.x for my own purposes, so this may not pertain to 5x. And I'm not very up on the low-level details. But this loop for reading multiValued fields puts _all_ the multivalued fields for _all_ the docs on the shard into each doc:
{code}
if (values != null && DocValues.getDocsWithField(atomicReader, fieldName).get(docid)) {
  values.setDocument(docid);
  if (values.getValueCount() > 0) {
    List outValues = new LinkedList();
    for (int i = 0; i < values.getValueCount(); i++) { // Iterates more than just this doc, I think all of them!
{code}
I had more luck with
{code}
if (values != null) {
  values.setDocument(docid);
  List outValues = new LinkedList();
  for (int ord = (int) values.nextOrd(); ord != SortedSetDocValues.NO_MORE_ORDS; ord = (int) values.nextOrd()) {
{code}
Note that I also think this is unnecessary in the if test; the loop above doesn't do anything bad if there are docs with empty fields:
{code}
DocValues.getDocsWithField(atomicReader, fieldName).get(docid)
{code}
I changed it to just if (values != null). But I did have to test outValues.size() > 0 before doing the addField after the loop, or I got empty braces in the output doc. Again, let me emphasize that 1> I don't know this code well, so take this with a grain of salt, and 2> I needed this for a one-off on the 4.x code line, and this may work with 5x just fine as-is. Needless to say, what I'm doing will never make it into the official project. But this saved me a TON of work; glad you're tackling this!
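The difference between the two loops can be reproduced with a toy model. ToySortedSetDV below is a hypothetical stand-in that only mimics the shape of Lucene 5.x's SortedSetDocValues (setDocument, nextOrd, getValueCount, lookupOrd, NO_MORE_ORDS). In Lucene 5.x, getValueCount() returns the number of unique values in the term dictionary, not a per-document count. The sketch also assumes the truncated first loop looks up each index i as an ord. Under those assumptions, the first loop hands every term in the segment to every document, while the nextOrd() loop stops at the per-document sentinel.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in (NOT Lucene's class): a sorted segment-wide dictionary plus per-doc ords. */
final class ToySortedSetDV {
    static final long NO_MORE_ORDS = -1;
    private final String[] terms;      // unique terms for the whole segment, sorted
    private final long[][] docOrds;    // each document's ords, ascending
    private long[] current = new long[0];
    private int pos;

    ToySortedSetDV(String[] terms, long[][] docOrds) {
        this.terms = terms;
        this.docOrds = docOrds;
    }

    void setDocument(int docId) { current = docOrds[docId]; pos = 0; }
    long nextOrd() { return pos < current.length ? current[pos++] : NO_MORE_ORDS; }
    long getValueCount() { return terms.length; }   // segment-wide, not per document
    String lookupOrd(long ord) { return terms[(int) ord]; }
}

final class MultiValuedRead {
    /** Loop bounded by getValueCount(): leaks every term in the segment into the doc. */
    static List<String> readByValueCount(ToySortedSetDV dv, int docId) {
        dv.setDocument(docId);
        List<String> out = new ArrayList<>();
        for (long i = 0; i < dv.getValueCount(); i++) {
            out.add(dv.lookupOrd(i));
        }
        return out;
    }

    /** Per-document loop: walk this doc's ords until the sentinel. */
    static List<String> readByNextOrd(ToySortedSetDV dv, int docId) {
        dv.setDocument(docId);
        List<String> out = new ArrayList<>();
        for (long ord = dv.nextOrd(); ord != ToySortedSetDV.NO_MORE_ORDS; ord = dv.nextOrd()) {
            out.add(dv.lookupOrd(ord));
        }
        return out;
    }
}
```

Because ords follow term order, the nextOrd() loop also yields each document's values in sorted order, which is the multiValued ordering caveat from the issue description.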
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063741#comment-15063741 ] Ishan Chattopadhyaya commented on SOLR-8220: Btw, just found out that not all query paths actually use a DocsStreamer. I am checking as to what this could be down to.
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061581#comment-15061581 ] Shalin Shekhar Mangar commented on SOLR-8220: - Thanks Ishan.
# The SolrIndexSearcher.decorateDocValueFields method has a honourUseDVsAsStoredFlag which is always true. Can we remove it?
# Same for SolrIndexSearcher.getNonStoredDocValuesFieldNames?
# The wantsAllFields flag added to SolrIndexSearcher.doc doesn't seem necessary. I guess it was added because the patch adds non-stored doc values fields to the 'fnames', but if we separate the stored fnames from the non-stored doc values fields to be returned, then we can remove this param from both SolrIndexSearcher.doc and SolrIndexSearcher.getNonStoredDocValuesFieldNames.
# The pattern matching in the DocsStreamer constructor makes me a bit nervous. Where is the pattern matching done for current stored fields?
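Shalin's suggestion of keeping the stored fnames separate from the non-stored docValues fields, together with the glob question, could look roughly like the following. This is a hypothetical sketch, not the patch: it converts an fl glob such as dv_* into a java.util.regex Pattern (matched with matches(), so the whole name must match), walks a fixed list of field definitions (real code would also have to consider dynamic fields actually present in the index), and fills two disjoint sets. Explicit names pull in non-stored docValues fields regardless of the flag; globs only pull in fields with useDocValuesAsStored=true.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

final class FlPartition {
    /** Hypothetical field definition used only by this sketch. */
    static final class Field {
        final String name;
        final boolean stored;
        final boolean docValues;
        final boolean useDocValuesAsStored;

        Field(String name, boolean stored, boolean docValues, boolean useDocValuesAsStored) {
            this.name = name;
            this.stored = stored;
            this.docValues = docValues;
            this.useDocValuesAsStored = useDocValuesAsStored;
        }
    }

    /** Disjoint outputs: names to load as stored fields, names to decorate from docValues. */
    static final class Plan {
        final Set<String> stored = new LinkedHashSet<>();
        final Set<String> fromDocValues = new LinkedHashSet<>();
    }

    /** Turns an fl glob, where '*' matches any run of characters, into a regex. */
    static Pattern globToRegex(String glob) {
        String[] parts = glob.split("\\*", -1);   // -1 keeps empty leading/trailing parts
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(".*");
            }
            sb.append(Pattern.quote(parts[i]));   // everything except '*' is literal
        }
        return Pattern.compile(sb.toString());
    }

    static Plan plan(List<String> fl, List<Field> fields) {
        Plan plan = new Plan();
        for (String entry : fl) {
            boolean isGlob = entry.indexOf('*') >= 0;
            Pattern glob = isGlob ? globToRegex(entry) : null;
            for (Field f : fields) {
                boolean matches = isGlob ? glob.matcher(f.name).matches() : f.name.equals(entry);
                if (!matches) {
                    continue;
                }
                if (f.stored) {
                    plan.stored.add(f.name);
                } else if (f.docValues && (!isGlob || f.useDocValuesAsStored)) {
                    plan.fromDocValues.add(f.name);
                }
            }
        }
        return plan;
    }
}
```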
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058168#comment-15058168 ] Shalin Shekhar Mangar commented on SOLR-8220: - Hi Ishan, you have misunderstood the "useDocValuesAsStored" parameter. It is supposed to be per-field and not on the entire schema so that you can selectively disable it on fields that you don't want to be treated as if they were stored.
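Since the flag is per-field, resolving its effective value might look like the sketch below. It assumes a field-level attribute overrides the fieldType-level one, with the schema-version implicit default this issue later settled on (false up to 1.5, true from 1.6) as the fallback. A null argument stands for "attribute not set"; the class and parameter names are hypothetical.

```java
final class UseDvAsStoredResolver {
    /**
     * @param fieldSetting  the attribute on the field definition, or null if absent
     * @param typeSetting   the attribute on the fieldType definition, or null if absent
     * @param schemaVersion the schema's version attribute, e.g. 1.5f or 1.6f
     */
    static boolean effective(Boolean fieldSetting, Boolean typeSetting, float schemaVersion) {
        if (fieldSetting != null) {
            return fieldSetting;              // per-field setting wins (assumed precedence)
        }
        if (typeSetting != null) {
            return typeSetting;               // then the fieldType's setting
        }
        return schemaVersion > 1.5f;          // implicit default: false up to 1.5, true from 1.6
    }
}
```

This lets a 1.6 schema keep the new default everywhere while a single field opts out with useDocValuesAsStored="false".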
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055523#comment-15055523 ] Ishan Chattopadhyaya commented on SOLR-8220: {quote} The patch is failing a few tests with very poor error reporting. Here's a reproducible failure after applying this patch: Test suite: TestManagedSchemaDynamicFieldResource, Seed: -Dtests.seed=C0DE559FF2A0799 Looking into the failure. {quote} It seems this test fails even without the patch; I filed SOLR-8411 for this. However, with this patch, 5-6 tests still fail, but so far I've been unable to reproduce any of them.
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036219#comment-15036219 ] Ishan Chattopadhyaya commented on SOLR-8220: bq. I've updated the patch to now check for the schema version to be >=1.6 In an offline discussion, [~hossman] mentioned that in the past we've never tied runtime behaviour to specific schema versions. He suggested, as did Yonik above I think, that we use some attribute like docValueAsIfStored=true/false (with a default of false for previous schema versions). I'll try to tackle this, update the patch, and drop the naive check for schema version.
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036165#comment-15036165 ] Ishan Chattopadhyaya commented on SOLR-8220: bq. I've updated the patch to now check for the schema version to be >=1.6 In the patch, I did a clumsy >= check on a float; it would have been better to just check for > 1.5. I'll fix that in the next patch. I'm also working on bumping up the schema version for the out-of-the-box configsets.
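As an illustration of why a >= check on a float schema version can be clumsy (this is not the code from the patch): comparing a float against a double literal widens the float, and many decimal versions are not exact in binary. 1.3f widens to about 1.29999995, so 1.3f >= 1.3 is false. 1.5 is exactly representable, so a > 1.5f check on a version parsed with Float.parseFloat does not depend on rounding luck. The helper names are hypothetical.

```java
final class SchemaVersionCheck {
    /** Fragile: the float is widened to double before comparing, so 1.3f >= 1.3 is false. */
    static boolean atLeast(float version, double threshold) {
        return version >= threshold;
    }

    /** Compares against 1.5f, which is exact in binary: "1.6" and later pass, "1.5" does not. */
    static boolean newerThan15(float version) {
        return version > 1.5f;
    }
}
```

For example, atLeast(Float.parseFloat("1.3"), 1.3) returns false even though both sides were written as 1.3.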
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036972#comment-15036972 ] Keith Laban commented on SOLR-8220: --- My only concern is that we may have to add a flag for this and another for whatever we decide in SOLR-8344, which is just going to add to the confusion. Thoughts?
Re: [jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
Please follow the instructions here to unsubscribe. Note, you _MUST_ use the exact same e-mail you subscribed with. See the "unsubscribe" link here: http://lucene.apache.org/solr/resources.html If you have problems, have you tried to follow the instructions here? https://wiki.apache.org/solr/Unsubscribing%20from%20mailing%20lists Best, Erick On Wed, Dec 2, 2015 at 5:55 PM, Johny John wrote: > dev-unsubscr...@lucene.apache.org > > Sent from Windows Mail
Re: [jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
dev-unsubscr...@lucene.apache.org

Sent from Windows Mail

From: Keith Laban (JIRA)
Sent: Thursday, December 3, 2015 6:19 AM
To: dev@lucene.apache.org

[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036972#comment-15036972 ]

Keith Laban commented on SOLR-8220:

My only concern is that we may have to add a flag for this and for whatever we decide in SOLR-8344, which is just going to add to the confusion. Thoughts?

> Read field from docValues for non stored fields
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
> Issue Type: Improvement
> Reporter: Keith Laban
> Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
> Many times a value will be both stored="true" and docValues="true" which requires redundant data to be stored on disk. Since reading from docValues is both efficient and a common practice (facets, analytics, streaming, etc), reading values from docValues when a stored version of the field does not exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as they would always be returned sorted in the docValues approach. I believe this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think it should live closer to where stored fields are loaded in the SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, facets, analytics, streaming, etc, all seem to be doing their own ways, perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
> -- return field from docValue if the field is not stored and in docValues, if the field is stored return it from stored fields
> - fl="*"
> -- return only stored fields
> - fl="+"
> -- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first pass. 2b - is current behavior
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035238#comment-15035238 ]

Keith Laban commented on SOLR-8220:

This still needs work around bumping the schema version; this impl changes the default behavior for globs.
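Keith's note about bumping the schema version is where the thread ended up: the reference-guide notes for this issue describe a useDocValuesAsStored property whose implicit default is false for schema versions below 1.6 and true from 1.6 on, with glob patterns in fl returning a non-stored docValues field only when that property is in effect. A minimal sketch of the decision, assuming the schema version is available as a float and an unset property is modeled as a null Boolean; the class and method names are hypothetical.

```java
// Illustrative only: how the effective useDocValuesAsStored value could be chosen.
// UseDvAsStoredSketch and its methods are hypothetical names, not Solr's implementation.
public class UseDvAsStoredSketch {

    /**
     * @param schemaVersion   the schema's version attribute, e.g. 1.5f or 1.6f
     * @param explicitSetting value set on the field or field type, or null if unset
     */
    static boolean effectiveUseDocValuesAsStored(float schemaVersion, Boolean explicitSetting) {
        if (explicitSetting != null) {
            return explicitSetting;       // an explicit setting always wins
        }
        return schemaVersion >= 1.6f;     // implicit default: false before 1.6, true from 1.6
    }

    /** Whether a field matched by a glob such as "*" in fl is included in the response. */
    static boolean returnedByGlob(boolean stored, boolean docValues,
                                  float schemaVersion, Boolean explicitSetting) {
        if (stored) {
            return true;                  // stored fields are always returned by globs
        }
        return docValues && effectiveUseDocValuesAsStored(schemaVersion, explicitSetting);
    }

    public static void main(String[] args) {
        System.out.println(returnedByGlob(false, true, 1.5f, null));          // false
        System.out.println(returnedByGlob(false, true, 1.6f, null));          // true
        System.out.println(returnedByGlob(false, true, 1.6f, Boolean.FALSE)); // false
    }
}
```

Per the same notes, naming the field explicitly (fl=dvfield or fl=*,dvfield) returns it irrespective of useDocValuesAsStored, so this check only governs glob matches.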
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033868#comment-15033868 ]

Yonik Seeley commented on SOLR-8220:

bq. It sounds like you are arguing for a common way to access docvalues and stored fields using the 'fl' parameter. I'm +1 to that.

Ah, ok... I misread your previous comment of "i.e. keep this issue focused on adding syntactic sugar to read field from doc values for non-stored fields" as advocating for new syntax for "fl" to load from non-stored docValue fields.

bq. Let's discuss this optimization in SOLR-8344 and keep the two issues separate.

I didn't really see it as separate (it depends on how you look at it). I see it more as: we have a new feature that treats docValues as "column-stored". What should the default behavior be when all requested fields are both column-stored and row-stored? I think we can make progress and commit this issue separately, but should still come at SOLR-8344 "fresh" (i.e. not put the burden of proof on one default more than the other).
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033899#comment-15033899 ]

Shalin Shekhar Mangar commented on SOLR-8220:

Agreed. Let's wrap this up. [~ichattopadhyaya] or [~k317h], can one of you put up an updated patch? I can review and commit.
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033926#comment-15033926 ]

Keith Laban commented on SOLR-8220:

I found some issues with the way this patch works and have some updated tests in a separate patch which I haven't uploaded yet; I need to clean it up a bit and can submit it tonight.

One issue I found is that if a document doesn't have a value for a dv field, it will get an empty value in the response. We should probably check with the docValues API for docs with the field, to make sure that the document actually had a value.
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033973#comment-15033973 ]

Ishan Chattopadhyaya commented on SOLR-8220:

AFAIK, there's no way to tell whether a single-valued non-stored dv field was added to the document or not, since a single-valued dv field either gets the value provided during indexing or picks up the default. Any suggestions on how to deal with that?
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034013#comment-15034013 ]

Erick Erickson commented on SOLR-8220:

Random thought that occurred to me before coffee, so be warned.

The initial statement here is 'Many times a value will be both stored="true" and docValues="true" ', then there was a lot of discussion about efficiencies etc... Why are we trying to anticipate "the right thing to do"? It would be simpler to code something like:

> If the field is stored=true, return the stored value (don't even need to look whether DV is true or not).
> If the field is stored=false and docValues=true, return the DV value.

Now it's totally under the control of the user which path is chosen through the schema definition; we don't have to try to guess anything. No new syntax. Maybe with a new "best practice" or something. There would be a learning curve for users around using only docValues=true for efficiency and _not_ setting stored=true.

Not quite sure what to do if the user defines both, however; perhaps use the stored value?

The thing one does lose is the ability to get both the stored value and the DV value from the _same_ field, so there would be the added burden on the user of having to have two fields, one stored-only and one dv-only, if that distinction was important.

And in the "wild and crazy" department (and for a different ticket _entirely_) we could consider disallowing fields with both docValues=true and stored=true. Not advocating this last, just throwing it out for discussion.

Let me emphasize that I don't have any investment in doing things this way, and apologies for thinking of this so late in the game.
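Erick's proposal lets the schema, not the request, choose where a value comes from. A minimal sketch of his two rules, assuming (as he tentatively suggests) that a field with both flags set is served from its stored value; `SchemaDrivenSourceSketch` and `sourceFor` are hypothetical names.

```java
import java.util.Optional;

// Hypothetical sketch of the schema-driven rule: no new fl syntax, the flags decide.
public class SchemaDrivenSourceSketch {

    enum Source { STORED, DOC_VALUES }

    static Optional<Source> sourceFor(boolean stored, boolean docValues) {
        if (stored) {
            // stored=true: use the stored value without looking at docValues
            return Optional.of(Source.STORED);
        }
        if (docValues) {
            // stored=false, docValues=true: use the docValues value
            return Optional.of(Source.DOC_VALUES);
        }
        // neither: there is nothing to return
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(sourceFor(true, true));   // Optional[STORED]
        System.out.println(sourceFor(false, true));  // Optional[DOC_VALUES]
        System.out.println(sourceFor(false, false)); // Optional.empty
    }
}
```

A user who needs both behaviors would, as Erick notes, define two fields: one stored-only and one docValues-only.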
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034017#comment-15034017 ]

Yonik Seeley commented on SOLR-8220:

For strings, an ord of -1 is "missing".
For numerics, you can use DocValues.getDocsWithField().
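Yonik's answer to the missing-value problem comes down to two checks. Lucene classes cannot be pulled into a self-contained snippet, so the sketch below models them with plain arrays: a string column where an ordinal of -1 means "missing" (the contract Yonik describes for ords), and a numeric column whose stored 0 is ambiguous without a separate docs-with-field bitset (the role he gives to DocValues.getDocsWithField()). `SortedColumn` and `NumericColumn` are hypothetical stand-ins, not Lucene types.

```java
import java.util.BitSet;

// Self-contained model of the two "missing value" checks. These are NOT Lucene types:
// SortedColumn and NumericColumn are hypothetical in-memory stand-ins.
public class MissingDocValuesSketch {

    /** String column: one ordinal per document, -1 meaning the document has no value. */
    static final class SortedColumn {
        final int[] ords;
        final String[] terms;  // unique values in sorted order, indexed by ordinal
        SortedColumn(int[] ords, String[] terms) {
            this.ords = ords;
            this.terms = terms;
        }
    }

    /** Numeric column: 0 is written for documents without a value, so a bitset records who has one. */
    static final class NumericColumn {
        final long[] values;
        final BitSet docsWithField;
        NumericColumn(long[] values, BitSet docsWithField) {
            this.values = values;
            this.docsWithField = docsWithField;
        }
    }

    /** Returns the string value, or null when the ordinal is -1 ("missing"). */
    static String stringValue(SortedColumn col, int doc) {
        int ord = col.ords[doc];
        return ord == -1 ? null : col.terms[ord];
    }

    /** Returns the numeric value, or null when the document's docs-with-field bit is clear. */
    static Long numericValue(NumericColumn col, int doc) {
        return col.docsWithField.get(doc) ? Long.valueOf(col.values[doc]) : null;
    }

    public static void main(String[] args) {
        SortedColumn color = new SortedColumn(new int[] {1, -1, 0}, new String[] {"blue", "red"});
        BitSet hasPrice = new BitSet();
        hasPrice.set(0);
        hasPrice.set(2);
        NumericColumn price = new NumericColumn(new long[] {42L, 0L, 0L}, hasPrice);

        for (int doc = 0; doc < 3; doc++) {
            System.out.println("doc" + doc + " color=" + stringValue(color, doc)
                    + " price=" + numericValue(price, doc));
        }
        // doc0 color=red price=42
        // doc1 color=null price=null
        // doc2 color=blue price=0
    }
}
```

Returning null gives a caller a way to omit the field for doc1 instead of emitting an empty or default value, while doc2's genuine 0 is still returned.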
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031910#comment-15031910 ]

Yonik Seeley commented on SOLR-8220:

bq. From a performance perspective, reading values from DocValues always (if they exist) can be horrible because each field access in docvalues may need a random disk seek, whereas, all stored fields for a document are kept together and need only 1 random seek and a sequential block read.

A few points:
- stored fields also require decompression (more overhead)
- use of stored fields and docvalues at the same time is less memory efficient
- the stored fields will also take up needed disk cache (although hopefully the OS will figure out which it should cache more aggressively)
- presumably one has docvalues because they need to be used, and they need to be fast... i.e. they already need to be cached.
- if one has a small set of fields that are normally retrieved, it seems like a win again.
- a *very* common case these days is that the entire index fits in memory.
- we're in the SSD era, and multiple "seeks" will still be more expensive if not cached, but much less so (and less so over time as non-volatile storage keeps improving)

It seems like this should be a big win for the common case, and the ability to reindex your data or change config and not have to change the clients is important IMO. It's like being able to reindex a date to a trie-date and have the clients not care. We can already reindex a field as docValues, and sort, facet, do analytics, without changing client requests. Optimizations to field value retrieval (or optionally removing redundantly stored data) should be the same.
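The seek argument Yonik is responding to can be put as rough arithmetic: with nothing cached, returning k fields for each of n documents costs about n random seeks from stored fields (one per document, then a sequential block read) versus up to n·k from docValues (one per column). A back-of-envelope sketch; the method names are hypothetical, and the model deliberately ignores decompression, OS caching and SSD effects, which are exactly Yonik's counterpoints.

```java
// Back-of-envelope model of worst-case (nothing cached) random seeks needed to return
// k fields for each of n documents. Illustrative only: it ignores decompression cost,
// OS caching and SSD vs spinning-disk differences.
public class SeekCountSketch {

    /** Stored fields: one seek per document, then a sequential read of its compressed block. */
    static long storedSeeks(long n) {
        return n;
    }

    /** docValues: each requested field is a separate column, so up to one seek per field per document. */
    static long docValuesSeeks(long n, int k) {
        return n * k;
    }

    public static void main(String[] args) {
        System.out.println(storedSeeks(10));       // 10
        System.out.println(docValuesSeeks(10, 1)); // 10
        System.out.println(docValuesSeeks(10, 5)); // 50
    }
}
```

With k = 1 the seek counts are equal and docValues avoids decompressing the rest of the document, which is the "small set of fields" case; once the columns are cached, as when the whole index fits in memory, the seek count stops being the deciding factor.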
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031935#comment-15031935 ]

Shalin Shekhar Mangar commented on SOLR-8220:

bq. It seems like this should be a big win for the common case, and the ability to reindex your data or change config and not have to change the clients is important IMO.

It sounds like you are arguing for a common way to access docvalues and stored fields using the 'fl' parameter. I'm +1 to that. But are you also arguing for always loading fields from docvalues even if they are stored?
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032230#comment-15032230 ] Shalin Shekhar Mangar commented on SOLR-8220: - bq. On a different note, if we are going to tackle only the non-stored docValues fields for now in this issue, does it now make sense to do this, performance wise, at the DocTransformer instead of the SolrIndexSearcher? I don't think there's any difference performance-wise. Changes to DocStreamer should be enough as it is called only for writing the response and not the entire result-set. bq. At this point the question that remains; should we move forward with these patches and move logic for retrieving dv fields to SolrIndexSearcher, leaving out *, *_foo and other optimizations for now? i.e. retrieve fields by name, if they exist in dv, but are not stored. +1 let's create a patch to retrieve fields by name, if they exist in dv, but are not stored. I also like Yonik's idea of bumping the schema version to have fl=* return all fields (stored + non-stored docvalues) in 5.x and to include both by default in trunk (6.x). So +1 to that as well. bq. a very common case these days is that the entire index fits in memory. I propose a middle ground. Let's use Lucene's spinning disk utility method and prefer docvalues if we detect a SSD and fallback to reading from stored fields otherwise. Let's discuss this optimization in SOLR-8344 and keep the two issues separate. 
> Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 
2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
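The multiValued caveat in the description can be illustrated without any Lucene code. This is a minimal sketch. It assumes the values come back from SORTED_SET docValues, which keep a per-document set of sorted values, so duplicates are collapsed as well. `MultiValuedOrder` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical illustration of stored vs. SORTED_SET docValues ordering. */
final class MultiValuedOrder {
    /** Stored fields return values exactly as they were indexed. */
    static List<String> fromStored(List<String> indexed) {
        return new ArrayList<>(indexed);
    }

    /**
     * SORTED_SET docValues keep each distinct value once, in sorted order.
     * TreeSet orders Strings by UTF-16 code unit, which matches Lucene's
     * unsigned UTF-8 byte order for ASCII values.
     */
    static List<String> fromSortedSetDocValues(List<String> indexed) {
        return new ArrayList<>(new TreeSet<>(indexed));
    }
}
```

For example, indexing `red, blue, red, green` returns that exact list from stored fields, but `blue, green, red` from docValues.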
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031289#comment-15031289 ] Keith Laban commented on SOLR-8220: --- bq. From a performance perspective, reading values from DocValues always (if they exist) can be horrible because each field access in docvalues may need a random disk seek, whereas, all stored fields for a document are kept together and need only 1 random seek and a sequential block read. I agree that always reading values from DocValues can be horrible; my line of thinking was that if you are asking for one or two fields from a large document, and they are both dv and stored, reading from dv would likely be much more efficient. It might make more sense to be able to get those values explicitly from docValues using the transformer, or to have some logic that can determine when it is more efficient. That should be discussed further in [SOLR-8344]. At this point the question that remains is: should we move forward with these patches and move the logic for retrieving dv fields to SolrIndexSearcher, leaving out {{\*}}, {{\*\_foo}} and other _optimizations_ for now? i.e. retrieve fields by name if they exist in dv but are not stored.
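The scope proposed here can be expressed as a per-field rule. A field requested by name is read from docValues only when it is not stored, and `*` keeps returning stored fields only. This is a minimal sketch. `FlResolver` and its `FieldInfo` entries are hypothetical stand-ins for the schema, not Solr classes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical resolver from an fl list to where each value is read from. */
final class FlResolver {
    enum Source { STORED, DOC_VALUES }

    /** Hypothetical schema entry: whether a field is stored and/or has docValues. */
    static final class FieldInfo {
        final boolean stored;
        final boolean docValues;

        FieldInfo(boolean stored, boolean docValues) {
            this.stored = stored;
            this.docValues = docValues;
        }
    }

    /**
     * "*" expands to stored fields only. An explicit name is read from stored
     * fields if possible, otherwise from docValues. Unknown names are skipped.
     */
    static Map<String, Source> resolve(List<String> fl, Map<String, FieldInfo> schema) {
        Map<String, Source> out = new LinkedHashMap<>();
        for (String name : fl) {
            if (name.equals("*")) {
                schema.forEach((field, info) -> {
                    if (info.stored) out.putIfAbsent(field, Source.STORED);
                });
                continue;
            }
            FieldInfo info = schema.get(name);
            if (info == null) continue;
            if (info.stored) {
                out.putIfAbsent(name, Source.STORED);
            } else if (info.docValues) {
                out.putIfAbsent(name, Source.DOC_VALUES);
            }
        }
        return out;
    }
}
```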
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15028962#comment-15028962 ] Shalin Shekhar Mangar commented on SOLR-8220: - Guys, sorry for not paying attention to this earlier, but after a quick read through the comments and an offline conversation with Ishan, I want to point out a few things. bq. Theoretical optimization, will skip reading from stored fields if all the requested fields are available in docValues bq. I don't think some of the Lucene folks want docValues modeled as stored fields at the Lucene level. From a performance perspective, always reading values from DocValues (if they exist) can be horrible, because each field access in docvalues may need a random disk seek, whereas all stored fields for a document are kept together and need only 1 random seek and a sequential block read. That, and the fact that docvalues aren't in the document cache, makes me think that we should not model docvalues as a stored field and treat them equivalently, at least not without supporting benchmarks. So my suggestion is that we not mix the two issues, i.e. keep this issue focused on adding syntactic sugar to read fields from doc values for non-stored fields in whichever of the proposed ways. By the way, this is already possible using the 'field' DocTransformer, e.g. fl=field(my_dv_field)
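The seek argument above can be made concrete by counting random seeks per page of results. The model is deliberately simplified: all stored fields of a document cost one seek followed by a sequential block read, and each docValues field may cost its own seek. It ignores OS and document caching, which is exactly why benchmarks are asked for. `SeekModel` is a hypothetical name.

```java
/** Hypothetical worst-case seek counts under the simplified model described above. */
final class SeekModel {
    /** All stored fields of a document sit together: one seek per returned row. */
    static long storedSeeks(int rows) {
        return rows;
    }

    /** Column-oriented docValues: up to one random seek per field per returned row. */
    static long docValuesSeeks(int rows, int fields) {
        return (long) rows * fields;
    }
}
```

For 10 rows with 5 requested fields, this gives 10 seeks from stored fields versus up to 50 from docValues.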
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15029079#comment-15029079 ] Ishan Chattopadhyaya commented on SOLR-8220: bq. So my suggestion is that we not mix the two issues I've created SOLR-8344 to deal with the optimization of reading the values of stored fields from docvalues, so that we can focus only on the non-stored dv case (which is a functionality improvement/new feature, as opposed to an optimization). bq. By the way, this is already possible using the 'field' DocTransformer e.g. fl=field(my_dv_field) Indeed, but this currently doesn't work for multivalued fields. On a different note, if we are going to tackle only the non-stored docValues fields for now in this issue, does it now make sense to do this, *performance wise*, at the DocTransformer instead of the SolrIndexSearcher? If done that way, RTG and atomic updates for such non-stored dv fields will also work, provided we apply a doctransformer after the searcher has returned the docs. [~ysee...@gmail.com], [~erickerickson], [~shalinmangar], do you have any thoughts or recommendations?
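Doing this in a transformer means the same step can run on every document after it has been fetched, whether the document came from a search or from a real-time get. The following is a minimal sketch of such a step. It assumes documents are plain maps and that docValues are read through a caller-supplied function. `DocValuesFiller` and its reader are hypothetical and are not Solr's `DocTransformer` API. Multivalued fields are returned as lists, which is the case the 'field' transformer reportedly does not cover.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;

/** Hypothetical post-fetch step that fills in non-stored docValues fields. */
final class DocValuesFiller {
    /** Non-stored docValues fields, mapped to whether each one is multiValued. */
    private final Map<String, Boolean> nonStoredDocValues;
    /** (docId, field) -> values read from docValues; stands in for a real reader. */
    private final BiFunction<Integer, String, List<Object>> reader;

    DocValuesFiller(Map<String, Boolean> nonStoredDocValues,
                    BiFunction<Integer, String, List<Object>> reader) {
        this.nonStoredDocValues = nonStoredDocValues;
        this.reader = reader;
    }

    /** Called once per fetched document, from the search path or the real-time get path. */
    void transform(Map<String, Object> doc, int docId, Set<String> requested) {
        for (String field : requested) {
            Boolean multiValued = nonStoredDocValues.get(field);
            if (multiValued == null || doc.containsKey(field)) {
                continue; // not a non-stored docValues field, or already present
            }
            List<Object> values = reader.apply(docId, field);
            if (values.isEmpty()) {
                continue; // this document has no value for the field
            }
            doc.put(field, multiValued ? values : values.get(0));
        }
    }
}
```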
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025770#comment-15025770 ] Ishan Chattopadhyaya commented on SOLR-8220: I was thinking of doing exactly that! I'll raise another jira for it.
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025652#comment-15025652 ] Ishan Chattopadhyaya commented on SOLR-8220: Just noticed, EnumFieldTest.testEnumSort() fails with the last patch (and the one before it) when the severity_dv is chosen as the enum field. I think we're handling the enum dv fields incorrectly.
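One thing enum docValues handling has to get right can be sketched in isolation. This rests on an assumption that the comment does not state: the enum field's docValues hold each value's integer position (0, 1, 2, ... in declared order) rather than its label. Under that assumption, returning the field means mapping the number back to a label, while sorting must keep comparing the numbers. `EnumDocValues` is a hypothetical name, and this is not a diagnosis of the failing test.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical view of an enum field whose docValues hold integer positions. */
final class EnumDocValues {
    private final List<String> labelsInDeclaredOrder;

    EnumDocValues(List<String> labelsInDeclaredOrder) {
        this.labelsInDeclaredOrder = new ArrayList<>(labelsInDeclaredOrder);
    }

    /** Map an integer read from docValues back to the label that was indexed. */
    String label(int value) {
        if (value < 0 || value >= labelsInDeclaredOrder.size()) {
            throw new IllegalArgumentException("unknown enum value: " + value);
        }
        return labelsInDeclaredOrder.get(value);
    }

    /** Enum sorting follows the declared order (the integers), not the label text. */
    int compare(String a, String b) {
        return Integer.compare(labelsInDeclaredOrder.indexOf(a), labelsInDeclaredOrder.indexOf(b));
    }
}
```

For example, with labels `Low, Medium, High`, `High` sorts after `Low` even though it comes first alphabetically.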
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025686#comment-15025686 ] Keith Laban commented on SOLR-8220: --- I'm working on a separate patch which fixes EnumField; it also adds support for "*_dv" type queries. I'll take a look at merging your change in too. Do you think it would be worth adding an interface for SolrDocument and SolrInputDocument to implement, which includes {{containsKey}} and {{addField}}?
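Supporting `*_dv`-style requests means matching an `fl` glob against the schema's field names. This is a minimal sketch that assumes `*` is the only wildcard. `FieldGlob` is hypothetical and is not Solr's own glob handling.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical fl glob matcher in which '*' is the only wildcard. */
final class FieldGlob {
    static Pattern compile(String glob) {
        // A limit of -1 keeps empty leading/trailing parts, so "*_dv" -> ["", "_dv"].
        String[] parts = glob.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*"); // each '*' matches any run of characters
            }
            if (!parts[i].isEmpty()) {
                regex.append(Pattern.quote(parts[i])); // everything else is literal
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** Field names (in the given order) that match the whole glob. */
    static List<String> matching(String glob, Collection<String> fieldNames) {
        Pattern p = compile(glob);
        List<String> out = new ArrayList<>();
        for (String name : fieldNames) {
            if (p.matcher(name).matches()) {
                out.add(name);
            }
        }
        return out;
    }
}
```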
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15018489#comment-15018489 ] Ishan Chattopadhyaya commented on SOLR-8220: {quote} We are adding fields retrieved from docValues by doing the following: {code} doc.add(schemaField.getType().createField(schemaField, sdv.get(docid).utf8ToString(), 1.0f)); {code} this {{createField}} call is returning {{null}} based on the code I wrote above. Perhaps we need to create fields differently, or change how {{createField}} works. {quote} [Referencing this comment from SOLR-8316]. Can we "decorate" the SolrDocument in DocStreamer instead of trying to do that with the StoredDocument from Lucene? That would give us these benefits: (a) we won't need to fix SOLR-8316; (b) we can leave the StoredDocument as is and not change it from under the document cache (which is probably awkward with the current patch); (c) SolrDocument has an efficient containsKey(), if needed, so the linear O(n) cost can be avoided. Point (b) will mean we won't need containsKey() anyway, though. This also means that SOLR-8276 will have to change: there we would have to decorate a SolrInputDocument instead of a SolrDocument. Keith, Yonik, what do you think?
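Decorating a map-backed document means the presence check is a hash lookup rather than a scan over a list of fields. If both document types were seen through one small interface, a single decorator could serve the search path and the SOLR-8276 input-document path. This is a minimal sketch. `FieldContainer`, `MapFieldContainer` and `DocValuesDecorator` are hypothetical names, not existing Solr types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical common view over a response document and an input document. */
interface FieldContainer {
    boolean containsKey(String name);
    void addField(String name, Object value);
}

/** Map-backed implementation: containsKey is a hash lookup, not a linear scan. */
final class MapFieldContainer implements FieldContainer {
    final Map<String, Object> fields = new LinkedHashMap<>();

    public boolean containsKey(String name) {
        return fields.containsKey(name);
    }

    public void addField(String name, Object value) {
        fields.put(name, value);
    }
}

final class DocValuesDecorator {
    /** Add docValues-derived values only for fields the document does not already carry. */
    static void decorate(FieldContainer doc, Map<String, Object> docValues) {
        for (Map.Entry<String, Object> e : docValues.entrySet()) {
            if (!doc.containsKey(e.getKey())) {
                doc.addField(e.getKey(), e.getValue());
            }
        }
    }
}
```

Stored values win because the decorator never overwrites a field that is already present.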
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15018760#comment-15018760 ] Keith Laban commented on SOLR-8220: --- It looks like calling {{createFields}} instead actually creates both of the fields we need. I was able to change the decoration to something like this:
{code}
// Read the single-valued sorted docValues for this field and re-create its fields from the string value.
SortedDocValues sdv = leafReader.getSortedDocValues(fieldName);
for (StorableField s : schemaField.getType().createFields(schemaField, sdv.get(docid).utf8ToString(), 1.0f)) {
  // Skip any entries that createFields() returned as null.
  if (s != null) doc.add(s);
}
{code}
This makes the SortedDocValuesField get added to the document properly, but when the string value is written later on, nothing is written, because the implementation in {{Field}} doesn't know how to write a {{BytesRef}}. We could override this in {{SortedDocValuesField}}, but all of that is Lucene code. It looks like {{StrField#createFields}} converts the string value to a {{BytesRef}} for the constructors of the doc values fields. I'm still a bit confused about how [SOLR-8276] works for you; I get an NPE when trying to pull back the non-indexed/non-stored field in the current impl.
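The underlying problem is that the value reaching the writer is raw UTF-8 bytes rather than a `String`. The following minimal sketch shows the conversion that a writer-side fallback would need. It uses a hypothetical `BytesSlice` (bytes, offset, length) as a stand-in for Lucene's `BytesRef`. In Lucene itself, `BytesRef.utf8ToString()` does this decoding. `ValueForWriter` is also hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for Lucene's BytesRef: a slice of a byte array. */
final class BytesSlice {
    final byte[] bytes;
    final int offset;
    final int length;

    BytesSlice(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }
}

final class ValueForWriter {
    /** Convert a value to something a text response writer can emit. */
    static Object toWritable(Object value) {
        if (value instanceof BytesSlice) {
            BytesSlice b = (BytesSlice) value;
            // Decode only the referenced slice, as UTF-8.
            return new String(b.bytes, b.offset, b.length, StandardCharsets.UTF_8);
        }
        return value; // other values are written as they are
    }
}
```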
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15018805#comment-15018805 ] Ishan Chattopadhyaya commented on SOLR-8220: bq. I'm still a bit confused how SOLR-8276 works for you, i get a NPE when trying pull back the non-indexed/non-stored field in the current impl. I added another document (id=4) to the test at SOLR-8276. I see no problems whatsoever with (single-valued) string dv fields, which internally use SortedDocValues; the test passes fine. Also, the test in BasicFunctionalityTest works fine with the {{test_s_dvo}} field. Both SOLR-8276 and the latter test use the latest patch here. So, as per the tests, createField seems to do its job. Am I missing something? However, beyond this point, should we avoid using schemaField.getType().createField() for fields in the (Lucene) StoredDocument, and instead do this decoration on the SolrDocument that is created from this StoredDocument? (See my comment before this one.)
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013814#comment-15013814 ] Keith Laban commented on SOLR-8220: --- Created [SOLR-8316] for my last point
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013699#comment-15013699 ] Keith Laban commented on SOLR-8220: --- [~ysee...@gmail.com]: re. * vs **: I'm all for using just one glob pattern, {{*}}, for both doc values and stored fields; however, I think it's worth considering the philosophical implications for backwards compatibility. While it won't break anything, it does introduce some unexpected behavior without much warning or a way to disable it. I propose we add a {{fl.wildcardDV=true}} option to turn on this behavior in Solr 5.x, but enable it by default in 6. We can optionally add a field type option later, so that you can use docValues without having the field returned in your result set. Ishan, I can tackle those. Regarding my earlier update about modifying {{Field}}, it doesn't seem as trivial as I originally thought: {{createField}} doesn't set {{DocValuesType}}; instead, a separate field gets created for doc values after {{createField}} is called for the first time. I'm not sure what the implications of modifying this behavior would be. I think it's OK to leave this limitation in for now.
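The backwards-compatible proposal works as follows. `*` always expands to stored fields. It also expands to non-stored docValues fields when the proposed `fl.wildcardDV` switch is on. Under this proposal, the switch is off by default in 5.x and on by default in 6.x. Any field that opts out is skipped. Note that `fl.wildcardDV` is the parameter proposed in this comment, not an existing Solr parameter. `WildcardExpansion` and its opt-out flag are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical expansion of fl=* under the proposed fl.wildcardDV switch. */
final class WildcardExpansion {
    static final class FieldDef {
        final boolean stored;
        final boolean docValues;
        final boolean excludeFromWildcard; // hypothetical per-field opt-out

        FieldDef(boolean stored, boolean docValues, boolean excludeFromWildcard) {
            this.stored = stored;
            this.docValues = docValues;
            this.excludeFromWildcard = excludeFromWildcard;
        }
    }

    /** Proposed default: off in 5.x, on from 6.x. */
    static boolean defaultWildcardDV(int solrMajorVersion) {
        return solrMajorVersion >= 6;
    }

    /** @param wildcardDVParam value of the proposed request parameter, or null if absent */
    static Set<String> expandStar(Map<String, FieldDef> schema, Boolean wildcardDVParam, int solrMajorVersion) {
        boolean includeDV = wildcardDVParam != null ? wildcardDVParam : defaultWildcardDV(solrMajorVersion);
        Set<String> out = new LinkedHashSet<>();
        schema.forEach((name, f) -> {
            if (f.stored) {
                out.add(name); // stored fields are always included
            } else if (includeDV && f.docValues && !f.excludeFromWildcard) {
                out.add(name); // non-stored docValues only when the switch is on
            }
        });
        return out;
    }
}
```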
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013633#comment-15013633 ] Ishan Chattopadhyaya commented on SOLR-8220: {quote}
{noformat}
fl=myfield  // returns from either stored or docValues
fl=*_i      // returns all stored or docValues fields ending in _i
fl=*        // returns all stored fields and all docValues fields that are not stored
{noformat}
{quote}
This is a TODO. We also need to add more tests around multivalued fields (here and in SOLR-8276). As for LazyFields for non-stored docValues, I think that is an optimization we can deal with later in a separate issue; what we have in this patch is progress in itself. Could a committer please take this forward, so that we can get it into 5.4? Keith, do you wish to tackle the above TODO (and the other TODO items)? If so, I will focus on SOLR-5944 for now.
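The quoted TODO asks `fl=*_i` to match both stored and docValues fields ending in `_i`, which needs a glob match over field names. A minimal sketch, assuming `*` is the only wildcard character; `GlobField` and `FlGlob` are hypothetical names, and this is not the matcher Solr itself uses. Literal parts are regex-quoted so that characters such as `.` in a field name are not treated as wildcards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical field description for this sketch.
record GlobField(String name, boolean stored, boolean docValues) {}

class FlGlob {
    // Converts an fl glob such as "*_i" into a regex: literal parts are quoted,
    // each '*' becomes ".*", and the whole field name must match.
    static boolean matches(String glob, String fieldName) {
        String[] parts = glob.split("\\*", -1); // -1 keeps empty leading/trailing parts
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return fieldName.matches(regex.toString());
    }

    // Returns matching fields that can actually be served: stored, or with docValues.
    static List<String> select(String glob, List<GlobField> fields) {
        List<String> out = new ArrayList<>();
        for (GlobField f : fields) {
            if ((f.stored() || f.docValues()) && matches(glob, f.name())) {
                out.add(f.name());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<GlobField> fields = List.of(
                new GlobField("count_i", true, false),
                new GlobField("rank_i", false, true),
                new GlobField("name_s", true, false));
        System.out.println(select("*_i", fields)); // [count_i, rank_i]
    }
}
```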
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013915#comment-15013915 ] Yonik Seeley commented on SOLR-8220: bq. I think you meant a type.docValuesType() == DocValuesType.NONE. I agree, we should make the change.
I don't think some of the Lucene folks want docValues modeled as stored fields at the Lucene level (i.e. you don't index them that way, and you don't retrieve them that way). One possible option is to just move this to something higher level, like SolrDocument.
bq. I propose we add a {{fl.wildcardDV=true}} option to turn on this behavior in Solr 5x but enable it by default in 6.
We could bump the schema version to change the default, and that would let folks on 5x migrate transparently from stored fields to docValues if they want, without having to change clients / query params. And if finer-grained control is desirable, a schema field flag actAsStored=true (or whatever better name people come up with) could have its default set differently based on the schema version.
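Yonik's idea of a per-field flag whose default depends on the schema version can be expressed as a small resolution rule. The reference-guide notes on this issue (comment 15154456) name the flag `useDocValuesAsStored`, with an implicit default of false for schema versions below 1.6 and true from 1.6 on, settable on a field or a field type. The sketch assumes that a field-level setting overrides the field type's and models "not set" as a null `Boolean`; `AsStoredDefaults` is a hypothetical name, not Solr's implementation.

```java
// Hypothetical helper; Solr's own resolution code may differ.
class AsStoredDefaults {
    // Resolves the effective useDocValuesAsStored value for one field.
    // Assumed precedence: field setting, then field type setting, then the
    // schema-version default (false below 1.6, true from 1.6 on).
    static boolean effective(Boolean fieldSetting, Boolean typeSetting, float schemaVersion) {
        if (fieldSetting != null) {
            return fieldSetting;
        }
        if (typeSetting != null) {
            return typeSetting;
        }
        return schemaVersion >= 1.6f;
    }

    public static void main(String[] args) {
        System.out.println(effective(null, null, 1.5f));  // false
        System.out.println(effective(null, null, 1.6f));  // true
        System.out.println(effective(false, null, 1.6f)); // false
    }
}
```

Tying the default to the schema version means existing 1.5 schemas keep their behavior until the schema is deliberately upgraded, which is the transparent-migration point made above.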
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013542#comment-15013542 ] Keith Laban commented on SOLR-8220: --- I'm not sure how this should be handled. https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/document/Field.java#L241 needs to be modified to
{code}
if (!type.stored() && type.indexOptions() == IndexOptions.NONE && type.docValuesType() != DocValuesType.NONE) {
  throw new IllegalArgumentException("it doesn't make sense to have a field that "
      + "is neither indexed nor stored nor docValues");
}
{code}
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013555#comment-15013555 ] Ishan Chattopadhyaya commented on SOLR-8220: I think you meant a {{type.docValuesType() == DocValuesType.NONE}}. I agree, we should make the change.
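With Ishan's correction applied, the guard rejects a field only when it is not stored, not indexed and has no docValues. With the original `!=`, a docValues-only field would be rejected, which is the opposite of what this issue needs. The sketch shows the corrected predicate on its own. `SketchIndexOptions` and `SketchDocValuesType` are hypothetical stand-ins for Lucene's `IndexOptions` and `DocValuesType` enums, reduced to a few constants so the example runs without Lucene; `FieldGuard` is likewise hypothetical.

```java
// Hypothetical stand-ins for Lucene's IndexOptions and DocValuesType enums,
// reduced to a few constants so the sketch runs without Lucene.
enum SketchIndexOptions { NONE, DOCS }
enum SketchDocValuesType { NONE, NUMERIC, SORTED }

class FieldGuard {
    // Corrected guard: reject a field only if it is not stored, not indexed
    // AND has no docValues (== NONE, not != NONE).
    static void check(boolean stored, SketchIndexOptions indexOptions, SketchDocValuesType docValuesType) {
        if (!stored
                && indexOptions == SketchIndexOptions.NONE
                && docValuesType == SketchDocValuesType.NONE) {
            throw new IllegalArgumentException("it doesn't make sense to have a field that "
                    + "is neither indexed nor stored nor docValues");
        }
    }

    // Convenience wrapper: true if check() accepts the combination.
    static boolean isAccepted(boolean stored, SketchIndexOptions indexOptions, SketchDocValuesType docValuesType) {
        try {
            check(stored, indexOptions, docValuesType);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A docValues-only field is accepted.
        System.out.println(isAccepted(false, SketchIndexOptions.NONE, SketchDocValuesType.NUMERIC)); // true
        // A field with no storage of any kind is rejected.
        System.out.println(isAccepted(false, SketchIndexOptions.NONE, SketchDocValuesType.NONE));    // false
    }
}
```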
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011570#comment-15011570 ] Ishan Chattopadhyaya commented on SOLR-8220: Thanks for the review, Keith.
bq. 1. [...] This could potentially be very expensive to compute for every single field for every single document and also add unnecessary GC pressure by creating a new HashSet for all the fields for every single document.
I was aware of this, and wanted to fix this as part of the "cleanup / refactoring" I promised.
bq. {{doc.getField(fieldName)==null}} the doc fields are a list so this will be O( n ) for each lookup.
I used that to ensure we're not re-adding unstored docvalues a second time to the same document. This is necessary here so that we don't re-add such fields to a document that was obtained from the documentCache and already has all unstored docvalues in it. I can create a set of fields inside the {{StoredDocument}} class so that a hasField lookup can be sped up. However, given that it is a Lucene class, I have left this be. Any suggestions?
bq. 3) Re multivalued fields: doing introspection for every single value of a field for every document is not fast.
I think it shouldn't be a problem. In modern JVMs, {{instanceof}} has negligible cost. However, I will do it once per multivalued field in my next patch.
bq. 4) {{SchemaField schemaField = schema.getField(fieldName);}} this throws an exception if the field name is not in the schema (think typos in FL)
If it is a dynamic field, it will still work; a wrong field name won't work here. Shouldn't a wrong field name throw an exception, rather than silently dropping it? I am torn either way.
bq. This creates a whole bunch of new objects which could be slow and cause a lot of GC pressure, although it may not be an issue.
I think this creates at most the value source object, which isn't too bad. Internally, it uses the docvalues API.
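On the `getField` question, one option is the one Ishan floats: a set of field names. The sketch builds a `HashSet` of the names already on a document once, then checks each non-stored docValues field against it, replacing an O(n) list scan per lookup with an expected O(1) set lookup. Fields are reduced to plain name strings, and `FieldDedup` is a hypothetical name. The set is itself a per-document allocation, the kind of GC cost raised in point 1, so this only pays off when documents carry many fields.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FieldDedup {
    // Returns the non-stored docValues fields that are not yet on the document.
    // Existing names are copied into a HashSet once (O(n)), so each check is an
    // expected O(1) lookup instead of a scan of the document's field list.
    static List<String> fieldsToAdd(List<String> existingFieldNames, List<String> nonStoredDocValues) {
        Set<String> present = new HashSet<>(existingFieldNames);
        List<String> toAdd = new ArrayList<>();
        for (String name : nonStoredDocValues) {
            if (!present.contains(name)) {
                toAdd.add(name);
            }
        }
        return toAdd;
    }

    public static void main(String[] args) {
        // "price" is already present, e.g. on a document decorated earlier.
        List<String> existing = List.of("id", "title", "price");
        List<String> docValuesOnly = List.of("price", "rating");
        System.out.println(fieldsToAdd(existing, docValuesOnly)); // [rating]
    }
}
```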
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011760#comment-15011760 ] Keith Laban commented on SOLR-8220: --- bq. I used that to ensure we're not re-adding unstored docvalues a second time to the same document. This is necessary here so that we don't re-add such fields to a document that was obtained from the documentCache and already has all unstored docvalues in it. I can create a set of fields inside the StoredDocument class so that a hasField lookup can be sped up. However, given that it is a Lucene class, I have left this be. Any suggestions?
This shouldn't be an issue since the hook is called after caching is done. This could get really expensive if you are getting a few thousand documents that have hundreds of fields. I think the real issue is how to cache this efficiently, which I think will require modifying LazyDocument (see my comments above).
bq. If it is a dynamic field, it will still work; a wrong field name won't work here. Shouldn't a wrong field name throw an exception, rather than silently dropping it? I am torn either way.
This is more of a backwards-compatibility question. What is the current behavior for stored fields?
bq. I think this creates at most the value source object, which isn't too bad. Internally, it uses the docvalues API.
For a string field, getValueSource creates a new StrFieldSource and getValues creates a new DocTermsIndexDocValues. Both of these closures add overhead, especially if you're doing this hundreds of times for thousands of documents.
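Keith's concern is object creation per field per document. A common remedy is to build each per-field accessor once per request (or per index segment) and reuse it across all documents. The sketch illustrates only that loop structure: `AccessorReuse`, `newAccessor` and the creation counter are hypothetical, and a lambda stands in for the value-source objects mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

class AccessorReuse {
    // Counts how many (hypothetically expensive) accessor objects were built.
    static int accessorsCreated = 0;

    // Hypothetical stand-in for building a per-field reader such as a value source.
    static IntFunction<String> newAccessor(String field) {
        accessorsCreated++;
        return docId -> field + "@" + docId;
    }

    // Naive loop: a new accessor for every field of every document.
    static List<String> perDocument(List<String> fields, int[] docIds) {
        List<String> out = new ArrayList<>();
        for (int doc : docIds) {
            for (String f : fields) {
                out.add(newAccessor(f).apply(doc));
            }
        }
        return out;
    }

    // Hoisted loop: one accessor per field, reused for every document.
    static List<String> perField(List<String> fields, int[] docIds) {
        List<IntFunction<String>> accessors = new ArrayList<>();
        for (String f : fields) {
            accessors.add(newAccessor(f));
        }
        List<String> out = new ArrayList<>();
        for (int doc : docIds) {
            for (IntFunction<String> accessor : accessors) {
                out.add(accessor.apply(doc));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> fields = List.of("a", "b", "c");
        int[] docIds = {1, 2, 3, 4};
        perDocument(fields, docIds);
        System.out.println("per document: " + accessorsCreated); // 12
        accessorsCreated = 0;
        perField(fields, docIds);
        System.out.println("per field: " + accessorsCreated);    // 3
    }
}
```

With 3 fields and 4 documents, the naive loop builds 12 accessors and the hoisted loop builds 3, while both return the same values in the same order.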
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011854#comment-15011854 ] Ishan Chattopadhyaya commented on SOLR-8220: bq. This shouldn't be an issue since the hook is called after caching is done.
Even if this is called after the document has been added to the cache, this decorate() method modifies the same doc object that was added to the cache. Hence, the next time the document is fetched from the cache, it will already contain the previously decorated docValues as part of the stored document. I'll look at what it would take to modify LazyDocument to make this work differently. Are you already looking into it, or do you have some thoughts around it?
bq. Both of these closures add overhead, especially if you're doing this hundreds of times for thousands of documents.
Yes, that makes sense; I hadn't noticed the second object getting created. We should avoid this overhead if possible.
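The problem Ishan describes is aliasing: `decorate()` mutates the instance held in the `documentCache`, so a later cache hit already carries the docValues fields. One way around it is to decorate a copy and leave the cached instance stored-only. In this sketch documents are plain `Map<String, Object>` values, and `DocCacheSketch`, `fetch` and `decorateCopy` are hypothetical names, not Solr API. The copy costs one map allocation per returned document.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class DocCacheSketch {
    // Simulated documentCache: docId -> stored fields only.
    private final Map<Integer, Map<String, Object>> cache = new HashMap<>();

    // Simulated stored-field load; the result is cached and shared across requests.
    Map<String, Object> fetch(int docId) {
        return cache.computeIfAbsent(docId, id -> {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("id", String.valueOf(id));
            return doc;
        });
    }

    // Adds docValues to a copy, so the cached instance is never mutated.
    static Map<String, Object> decorateCopy(Map<String, Object> cached, Map<String, ?> docValues) {
        Map<String, Object> copy = new LinkedHashMap<>(cached);
        copy.putAll(docValues);
        return copy;
    }

    public static void main(String[] args) {
        DocCacheSketch searcher = new DocCacheSketch();
        Map<String, Object> response = decorateCopy(searcher.fetch(7), Map.of("price", 10));
        System.out.println(response);          // {id=7, price=10}
        System.out.println(searcher.fetch(7)); // {id=7}
    }
}
```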
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011925#comment-15011925 ] Keith Laban commented on SOLR-8220: --- bq. If there is a need to distinguish between docValues as an alternative to a stored field
I think this would be the case only for multi valued fields, at least until we have an alternative multi valued docValues version that preserves the original field values (i.e. not sorted, not a set), using something like BinaryDocValues underneath, as you mentioned earlier.
bq. I'll look at what it would take to modify LazyDocument to make this work differently. Are you already looking into it, or do you have some thoughts around it?
Doing this properly requires us to know all the possible docValues fields on a document upfront, and a way for LazyDocument to load the lazy field from doc values. A major goal should be the ability to skip reading stored fields altogether when the requested fields are fully satisfied by docValues. However, I'm not sure whether using docValues would be more efficient than stored fields when all the fields are being returned.
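Skipping stored fields when docValues can satisfy the request reduces to a subset check: if every requested field is a non-stored docValues field, the stored-field reader need not be touched. A minimal sketch, assuming `fl` has already been expanded to explicit field names; `StoredSkip` and `needsStoredFields` are hypothetical. Whether skipping is actually cheaper when many fields are returned is the open question Keith raises, and this sketch does not answer it.

```java
import java.util.Set;

class StoredSkip {
    // True if at least one requested field must come from stored fields,
    // i.e. it is not in the set of non-stored docValues fields.
    // Assumes explicit field names only; globs would need expanding first.
    static boolean needsStoredFields(Set<String> requested, Set<String> nonStoredDocValues) {
        return !nonStoredDocValues.containsAll(requested);
    }

    public static void main(String[] args) {
        Set<String> docValuesOnly = Set.of("price", "rating", "popularity");
        // Every requested field is docValues-only: stored fields can be skipped.
        System.out.println(needsStoredFields(Set.of("price", "rating"), docValuesOnly)); // false
        // "title" has to be read from stored fields.
        System.out.println(needsStoredFields(Set.of("price", "title"), docValuesOnly));  // true
    }
}
```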
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011950#comment-15011950 ] Yonik Seeley commented on SOLR-8220: bq. I think this would be the case only for multi valued fields, at least until we have an alternative multi valued docValues version that preserves the original field values (i.e. not sorted, not a set), using something like BinaryDocValues underneath, as you mentioned earlier.
Yup, I agree. I think this is just a case of us having incomplete type support. We need to distinguish between multiValued and setValued in general.
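To illustrate the multiValued versus setValued distinction: in Solr, multiValued string fields with docValues are backed by SORTED_SET docValues, which keep each distinct value once per document, in sorted order. The simulation below uses a `TreeSet` to mimic that round trip. It is not Lucene code, `SetValuedDemo` is a hypothetical name, and `TreeSet` orders by Java `String` comparison, which matches Lucene's byte order only for ASCII values such as these.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class SetValuedDemo {
    // Simulates writing a multiValued field to SORTED_SET-style docValues and reading
    // it back: duplicates collapse and values come back sorted, not in insertion order.
    static List<String> setValuedRoundTrip(List<String> inserted) {
        return new ArrayList<>(new TreeSet<>(inserted));
    }

    public static void main(String[] args) {
        List<String> inserted = List.of("red", "blue", "red", "green");
        System.out.println(inserted);                     // [red, blue, red, green]
        System.out.println(setValuedRoundTrip(inserted)); // [blue, green, red]
    }
}
```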
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011884#comment-15011884 ] Yonik Seeley commented on SOLR-8220: bq. Added a ~ glob, similar to *. fl= here means: return all conventional stored fields and all non stored docvalues.
Purely from an interface perspective (I haven't looked at the code), it feels like this should be transparent. It would be nice to be able to transition from an indexed+stored field to an indexed+docValues field and not have any of the clients know/care.
{code}
fl=myfield  // returns from either stored or docValues
fl=*_i      // returns all stored or docValues fields ending in _i
fl=*        // returns all stored fields and all docValues fields that are not stored
{code}
If there is a need to distinguish between docValues as an alternative to a stored field, and docValues as an implementation detail that you don't want to return to the user (say you transitioned from an indexed-only field to an indexed+docValues field or docValues-only field), then we could introduce a field flag for the schema. Something like includeInStored=true/false or asStored=true/false.
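Yonik's table plus the optional schema flag can be modeled as one predicate per `fl` token: an explicit name is returned from stored fields or docValues regardless of the flag, while the `*` glob adds non-stored docValues fields only when the flag is on. This matches what the reference-guide notes on this issue later describe for `useDocValuesAsStored` (`fl=dvfield` and `fl=*,dvfield` return the field whatever the flag says). `FlField`, its `asStored` component and `TransparentFl` are hypothetical, and only `*` and explicit names are handled, not patterns like `*_i`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical field description; asStored models the proposed schema flag.
record FlField(String name, boolean stored, boolean docValues, boolean asStored) {}

class TransparentFl {
    // One fl token against one field. Only "*" and explicit names are handled here.
    static boolean returned(String token, FlField f) {
        if (token.equals("*")) {
            // Wildcard: every stored field, plus non-stored docValues fields that opt in.
            return f.stored() || (f.docValues() && f.asStored());
        }
        // Explicit name: returned from stored fields or docValues, whatever the flag says.
        return token.equals(f.name()) && (f.stored() || f.docValues());
    }

    // Fields returned for a whole fl list, each at most once, in schema order.
    static List<String> select(List<String> tokens, List<FlField> fields) {
        List<String> out = new ArrayList<>();
        for (FlField f : fields) {
            for (String token : tokens) {
                if (returned(token, f)) {
                    out.add(f.name());
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<FlField> fields = List.of(
                new FlField("id", true, false, false),
                new FlField("price", false, true, true),
                new FlField("internal_dv", false, true, false));
        System.out.println(select(List.of("*"), fields));                // [id, price]
        System.out.println(select(List.of("*", "internal_dv"), fields)); // [id, price, internal_dv]
    }
}
```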
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012076#comment-15012076 ] Yonik Seeley commented on SOLR-8220: The set of fields that have docValues and are not stored can be computed once per index snapshot (from the FieldInfos+schema). There should be no performance impact if there are no un-stored docValues fields in use.
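Computing the set once per index snapshot could look like the sketch below: the set of non-stored docValues fields is derived when the snapshot is opened and then shared read-only, so the per-document path only consults it and can skip docValues work entirely when it is empty. `FieldDesc` is a hypothetical stand-in for what FieldInfos plus the schema would provide, and `SnapshotFields` is not a Solr class.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical per-field metadata, standing in for FieldInfos combined with the schema.
record FieldDesc(String name, boolean stored, boolean docValues) {}

// Built once when an index snapshot (searcher) is opened, then shared by every request.
final class SnapshotFields {
    private final Set<String> nonStoredDocValues;

    SnapshotFields(List<FieldDesc> fieldsInSnapshot) {
        Set<String> names = new LinkedHashSet<>();
        for (FieldDesc f : fieldsInSnapshot) {
            if (f.docValues() && !f.stored()) {
                names.add(f.name());
            }
        }
        this.nonStoredDocValues = Collections.unmodifiableSet(names);
    }

    Set<String> nonStoredDocValues() {
        return nonStoredDocValues;
    }

    // When this is false, per-document docValues retrieval can be skipped entirely.
    boolean hasNonStoredDocValues() {
        return !nonStoredDocValues.isEmpty();
    }

    public static void main(String[] args) {
        SnapshotFields snapshot = new SnapshotFields(List.of(
                new FieldDesc("id", true, false),
                new FieldDesc("price", false, true),
                new FieldDesc("title", true, true)));
        System.out.println(snapshot.nonStoredDocValues()); // [price]
    }
}
```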
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012580#comment-15012580 ] Yonik Seeley commented on SOLR-8220:

bq. Does that sound fine?

Yep. Seems like we will only have a perf issue when we have many sparse un-stored docValue fields. At that point it might make sense to have a separate docValues field that contains the list of fields for the document. That can be saved for a future optimization though.
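The future optimization for many sparse un-stored docValues fields can be illustrated with a small in-memory model. This is a hypothetical sketch, not Solr code: each docValues "column" is modeled as a `Map<Integer, Object>` from doc id to value, and `fieldsPerDoc` plays the role of the suggested per-document list of fields. The naive path probes every non-stored docValues column for every document; the optimized path reads only the columns the document actually has.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class SparseDocValuesLookup {
    // Naive: probe every non-stored docValues column, even if most are empty for this doc.
    static Map<String, Object> probeAll(int doc, Set<String> nonStoredDv,
                                        Map<String, Map<Integer, Object>> dvColumns) {
        Map<String, Object> out = new TreeMap<>();
        for (String field : nonStoredDv) {
            Object value = dvColumns.getOrDefault(field, Map.of()).get(doc);
            if (value != null) {
                out.put(field, value);
            }
        }
        return out;
    }

    // Optimized: consult a per-document list of populated fields first,
    // then read only those columns (assumes the list is kept consistent with the columns).
    static Map<String, Object> viaFieldList(int doc, Map<Integer, List<String>> fieldsPerDoc,
                                            Map<String, Map<Integer, Object>> dvColumns) {
        Map<String, Object> out = new TreeMap<>();
        for (String field : fieldsPerDoc.getOrDefault(doc, List.of())) {
            out.put(field, dvColumns.get(field).get(doc));
        }
        return out;
    }
}
```

Both paths return the same values; the difference is the number of columns visited, which only matters when the number of fields is large and each document populates few of them.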
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012633#comment-15012633 ] Erick Erickson commented on SOLR-8220:

Given that stored values are compressed in 16k blocks, and returning a single field from the stored data requires decompressing 16K, how much effect do LazyDocuments really have any more? I don't know, just askin'

Since docValues avoids decompressing 16k per doc, disk seeks and the like, I strongly suspect that it is vastly more efficient than getting the stored values. That's how Streaming Aggregation can return 200k-400k docs/second.

All that said, I suspect that there are negligible savings (or perhaps even costs) in mixing the two, i.e. if _any_ field to be returned is not DV, you might as well return all the fields from the stored data. Testing would tell though.
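Erick's heuristic can be sketched as a per-request planning step. The class and method names below are hypothetical, and the rule is the one suggested in the comment (still to be confirmed by testing): if every requested field is available in docValues, skip stored fields entirely; if any requested field is only available as a stored field, the block decompression is paid anyway, so read every stored field from stored data and only the non-stored ones from docValues.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class FieldSourcePlanner {
    enum Source { STORED, DOC_VALUES }

    static Map<String, Source> plan(Set<String> requested, Set<String> stored, Set<String> docValues) {
        boolean allInDocValues = docValues.containsAll(requested);
        Map<String, Source> plan = new LinkedHashMap<>();
        for (String field : requested) {
            if (allInDocValues) {
                plan.put(field, Source.DOC_VALUES);   // no stored-field decompression at all
            } else if (stored.contains(field)) {
                plan.put(field, Source.STORED);       // stored data is being decoded anyway
            } else if (docValues.contains(field)) {
                plan.put(field, Source.DOC_VALUES);   // non-stored: docValues is the only source
            }
            // fields that are neither stored nor docValues cannot be returned and are skipped
        }
        return plan;
    }
}
```

Note that serving a multiValued field from docValues returns its values in sorted order, so this choice can change the visible output, not just the cost.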
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012035#comment-15012035 ] Ishan Chattopadhyaya commented on SOLR-8220:

bq. Purely from an interface perspective (I haven't looked at the code), it feels like this should be transparent.

That makes sense: having {{*}} return all stored fields and all non-stored docValues fields.
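The "transparent" behavior agreed on here, where `*` covers both stored fields and non-stored docValues fields, can be sketched as a small `fl` resolver. This is a hypothetical simplification: only the bare `*` token and exact field names are handled (no other globs, pseudo-fields, or transformers), and `stored` is assumed to already include fields that are both stored and docValues.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class FieldListResolver {
    static Set<String> resolve(List<String> fl, Set<String> stored, Set<String> nonStoredDocValues) {
        Set<String> out = new LinkedHashSet<>();
        for (String token : fl) {
            if (token.equals("*")) {
                // "*" expands to every stored field plus every non-stored docValues field.
                out.addAll(stored);
                out.addAll(nonStoredDocValues);
            } else if (stored.contains(token) || nonStoredDocValues.contains(token)) {
                // An explicitly named field is returned from whichever source has it.
                out.add(token);
            }
        }
        return out;
    }
}
```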
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012826#comment-15012826 ] Yonik Seeley commented on SOLR-8220:

Back in the day, LazyField actually had a pointer directly into the index where the field value could be read. That got removed from Lucene at some point and was replaced, just for compat's sake IIRC, with something that had an N^2 bug (the doc was loaded on each lazy-field access), which Hoss found and fixed. But that leaves less performance benefit to using LazyDocument. On a quick look, it seems to load all lazy fields at once when the first lazy field is touched. I guess these days it's more of a memory optimization than a performance one. Might be worth considering new approaches (we can break back compat in trunk for 6.0). Or maybe subclass LazyDocument and do something different for docValues fields.
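The idea of treating docValues fields differently inside a lazy document can be sketched as follows. This is a hypothetical model, not Lucene's `LazyDocument` API: `storedLoader` stands in for decoding all stored fields of a document at once (the behavior described above), while `docValuesReader` fetches a single docValues field on demand, so touching a docValues field never triggers stored-field decoding.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Supplier;

final class LazyDocSketch {
    private final Supplier<Map<String, Object>> storedLoader; // decodes all stored fields of one doc
    private final Function<String, Object> docValuesReader;  // reads one docValues field of one doc
    private final Set<String> docValuesFields;               // fields served from docValues
    private final Map<String, Object> docValuesCache = new HashMap<>();
    private Map<String, Object> storedCache;                 // null until a stored field is touched
    int storedLoads = 0;                                      // exposed only to illustrate the behavior

    LazyDocSketch(Supplier<Map<String, Object>> storedLoader,
                  Function<String, Object> docValuesReader,
                  Set<String> docValuesFields) {
        this.storedLoader = storedLoader;
        this.docValuesReader = docValuesReader;
        this.docValuesFields = docValuesFields;
    }

    Object get(String field) {
        if (docValuesFields.contains(field)) {
            // Per field: each docValues field is read at most once and independently.
            return docValuesCache.computeIfAbsent(field, docValuesReader);
        }
        if (storedCache == null) {
            // All at once: the first stored-field access decodes every stored field.
            storedCache = storedLoader.get();
            storedLoads++;
        }
        return storedCache.get(field);
    }
}
```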
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012795#comment-15012795 ] Ishan Chattopadhyaya commented on SOLR-8220:

bq. Caveats being:

I think we can live with those caveats for now, and optimize later. A docValues field containing the list of other docValues fields sounds nice for a later optimization. Given that this is, as is, a functional improvement over what we have today (i.e. no ability to return non-stored docValues), we should carry on with it and optimize later to address those caveats.

bq. Theoretical optimization, will skip reading from stored fields if all the requested fields are available in docValues. (changes mostly to DocStreamer)

Sounds good. It would be interesting to perf test this to measure the performance gains from doing this.
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012103#comment-15012103 ] Ishan Chattopadhyaya commented on SOLR-8220:

bq. The set of fields that have docValues and are not stored can be computed once per index snapshot (from the FieldInfos+schema).

That is what I have done at the time of searcher creation (in SolrIndexSearcher's constructor) in my last patch. Does that sound fine?
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009192#comment-15009192 ] Keith Laban commented on SOLR-8220:

I see some potential issues here, mostly performance concerns:

1) {code} if (wantsAllNonStoredDocValues) { Set dvFields = new HashSet<>(); for (int i=0; i
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009100#comment-15009100 ] Ishan Chattopadhyaya commented on SOLR-8220:

The patch causes a regression with score fields, I'll look into it in a while.
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008302#comment-15008302 ] Jan Høydahl commented on SOLR-8220:

How about using {{fl=\*\*}} as glob instead of {{\~}}? In Lucene, {{~}} normally indicates fuzziness of some kind, while {{**}} in ant lingo means all items recursively... Not important though :)
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15008382#comment-15008382 ] Ishan Chattopadhyaya commented on SOLR-8220:

** seems like a better idea than ~. I couldn't get +, as Keith suggested, to work; I guess it was getting confused with a positive sign.
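Separately from the parsing issue described above, a bare `+` is awkward in an HTTP query string for another reason: under form URL encoding, a literal `+` decodes to a space, whereas `*` passes through encoding unchanged. The snippet below uses only the standard `java.net.URLEncoder`/`URLDecoder` (Java 10+ `Charset` overloads) to show this; it is an illustration of URL encoding, not a claim about which cause applied in the patch.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class FlEncodingDemo {
    public static void main(String[] args) {
        // In form-encoded query strings, a literal '+' is decoded as a space.
        System.out.println("[" + URLDecoder.decode("+", StandardCharsets.UTF_8) + "]"); // prints "[ ]"
        // Sending an actual '+' therefore requires percent-encoding it.
        System.out.println(URLEncoder.encode("+", StandardCharsets.UTF_8)); // prints "%2B"
        // '*' is left unchanged by URLEncoder, so fl=** survives as typed.
        System.out.println(URLEncoder.encode("**", StandardCharsets.UTF_8)); // prints "**"
    }
}
```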
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006669#comment-15006669 ] Yonik Seeley commented on SOLR-8220:

{quote}
2) There is no metadata (that I can find) stored for each document that says whether it has an unstored docValue field, so efficiently loading docValues fields based on FL=* would be difficult.
{quote}

In SolrIndexSearcher there is
{code}
private final FieldInfos fieldInfos;
private final Collection fieldNames;
{code}

That gives you the full set of fields actually in-use by the index. Useful for dealing with dynamicFields, where the names aren't explicitly listed in the schema. In the future, perhaps we should have some meta-data about what fields each document contains.
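Because dynamic fields are not listed by name in the schema, the in-use field names have to be matched against dynamic field patterns to find the concrete non-stored docValues fields. A minimal sketch, assuming the input list comes from something like the `fieldNames` collection above and that the patterns are the dynamic fields declared with docValues="true" stored="false". Solr-style patterns with a single leading or trailing `*` are supported; Solr's rule of preferring the longest matching pattern is deliberately ignored here, so this is a simplification, and the class is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DynamicFieldMatcher {
    // Matches a name against a pattern with a single leading or trailing '*'.
    static boolean matches(String pattern, String name) {
        if (pattern.startsWith("*")) {
            return name.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return name.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(name);
    }

    // Concrete field names from the index that fall under a non-stored docValues pattern.
    static Set<String> nonStoredDvFields(List<String> namesInIndex, List<String> nonStoredDvPatterns) {
        Set<String> out = new LinkedHashSet<>();
        for (String name : namesInIndex) {
            for (String pattern : nonStoredDvPatterns) {
                if (matches(pattern, name)) {
                    out.add(name);
                    break;
                }
            }
        }
        return out;
    }
}
```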
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007107#comment-15007107 ] Keith Laban commented on SOLR-8220:

And for the specified FL scenario, LazyDocument assumes all the possible fields have been accounted for in the document, and the not-yet-loaded fields have a bit offset for quick access in the future. Right now there's no way to cache a partially loaded document unless the whole stored document is visited to pre-populate the lazy fields with the required information.
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007093#comment-15007093 ] Keith Laban commented on SOLR-8220: --- I'm talking more specifically about the "FL=*" scenario > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 
> 2b - is current behavior
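The multiValued caveat in the description can be illustrated without Solr. This is a hypothetical sketch (class and method names are illustrative only) that assumes a multi-valued docValues field behaves like Lucene's SORTED_SET docValues type: values come back in sorted order, with duplicates collapsed, whereas stored values keep their insertion order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of the multiValued caveat: stored values keep
// insertion order, while SORTED_SET-style docValues come back sorted with
// duplicates collapsed.
class MultiValuedOrderDemo {

    // Stored fields return the values exactly as they were indexed.
    static List<String> fromStored(List<String> indexed) {
        return new ArrayList<>(indexed);
    }

    // SORTED_SET-style docValues: each distinct value once, in sorted order.
    // (For ASCII strings, String order matches Lucene's byte order.)
    static List<String> fromSortedSetDocValues(List<String> indexed) {
        return new ArrayList<>(new TreeSet<>(indexed));
    }

    public static void main(String[] args) {
        List<String> indexed = List.of("red", "blue", "red", "green");
        System.out.println("stored:    " + fromStored(indexed));             // [red, blue, red, green]
        System.out.println("docValues: " + fromSortedSetDocValues(indexed)); // [blue, green, red]
    }
}
```

When insertion order matters, the remedy given in the reference-guide notes on this issue is to keep the multi-valued field stored.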
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007068#comment-15007068 ] Keith Laban commented on SOLR-8220: ---
Thanks [~ichattopadhyaya], this is my first time submitting a patch. This one was originally done against 5.3.1; in the future I'll submit patches based on trunk. I also noticed a bug in the patch I submitted. The line which reads
{code}
if (null == sf) {
  return doc; // returns immediately, so any remaining fields are skipped
}
{code}
should be
{code}
if (null == sf) {
  continue; // skips only this field
}
{code}
[~ysee...@gmail.com] I think that we definitely need something like that. Currently CompressingStoredFieldsReader.visitDocument needs to visit the whole document in stored fields to know which fields there are. Ideally we would:
1) be able to avoid doing any work in stored fields if they can instead be read out of docValues;
2) have a mechanism for LazyDocument to know to read the lazy field from docValues instead of from stored fields;
3) know about values which aren't stored but should be read from docValues.
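Why `continue` rather than `return doc` matters: the snippet sits inside a loop over fields, so returning early drops every field after the first one that has no schema entry. Below is a minimal, hypothetical reconstruction; the loop, the `lookup` helper, and the `schema`/`values` collections are assumptions for illustration, not the actual patch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical reconstruction of the loop around the buggy line.
class ContinueVsReturnDemo {

    // Stand-in for a schema lookup: the field name if it is defined, else null.
    static String lookup(Set<String> schema, String name) {
        return schema.contains(name) ? name : null;
    }

    // Buggy variant: the first field missing from the schema ends the whole loop.
    static Map<String, String> withReturn(List<String> fields, Set<String> schema,
                                          Map<String, String> values) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (String name : fields) {
            String sf = lookup(schema, name);
            if (null == sf) {
                return doc; // every later field is silently dropped
            }
            doc.put(sf, values.get(sf));
        }
        return doc;
    }

    // Fixed variant: skip only the unknown field and keep going.
    static Map<String, String> withContinue(List<String> fields, Set<String> schema,
                                            Map<String, String> values) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (String name : fields) {
            String sf = lookup(schema, name);
            if (null == sf) {
                continue;
            }
            doc.put(sf, values.get(sf));
        }
        return doc;
    }

    public static void main(String[] args) {
        Set<String> schema = Set.of("id", "price");
        Map<String, String> values = Map.of("id", "1", "price", "9.99");
        List<String> fields = List.of("id", "unknown_field", "price");
        System.out.println(withReturn(fields, schema, values));   // {id=1}
        System.out.println(withContinue(fields, schema, values)); // {id=1, price=9.99}
    }
}
```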
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007091#comment-15007091 ] Yonik Seeley commented on SOLR-8220:
bq. 1) be able to avoid doing any work in stored fields if they can instead be read out of docValues.

Shouldn't we be able to do this optimization regardless? For the fields requested, we can check the schema to see if they have docValues.
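The schema check Yonik suggests can be sketched as a planning step that runs before any document is read. `FieldDef`, `Plan`, and `plan` are hypothetical stand-ins for Solr's schema metadata. The rule of keeping stored multi-valued fields on the stored path (so their insertion order survives) is an assumption layered on top of the discussion, not something decided on this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: decide per requested field where its value comes from,
// using only schema metadata, before any document is read.
class FieldSourcePlanner {

    // Minimal stand-in for the schema properties that matter here.
    record FieldDef(boolean stored, boolean docValues, boolean multiValued) {}

    // Which requested fields to load from stored fields and which from docValues.
    record Plan(List<String> fromStored, List<String> fromDocValues) {}

    static Plan plan(List<String> requested, Map<String, FieldDef> schema) {
        List<String> fromStored = new ArrayList<>();
        List<String> fromDocValues = new ArrayList<>();
        for (String name : requested) {
            FieldDef f = schema.get(name);
            if (f == null) {
                continue; // unknown field: nothing to load
            }
            // Assumption: use docValues whenever they exist, except for stored
            // multi-valued fields, whose insertion order docValues would lose.
            if (f.docValues() && !(f.multiValued() && f.stored())) {
                fromDocValues.add(name);
            } else if (f.stored()) {
                fromStored.add(name);
            }
        }
        return new Plan(fromStored, fromDocValues);
    }

    public static void main(String[] args) {
        Map<String, FieldDef> schema = Map.of(
                "id", new FieldDef(true, false, false),
                "price", new FieldDef(true, true, false),
                "tags", new FieldDef(true, true, true),
                "popularity", new FieldDef(false, true, false));
        Plan p = plan(List.of("id", "price", "tags", "popularity"), schema);
        // If fromStored() is empty, the stored-fields reader need not be touched at all.
        System.out.println(p.fromStored());    // [id, tags]
        System.out.println(p.fromDocValues()); // [price, popularity]
    }
}
```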
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006538#comment-15006538 ] Ishan Chattopadhyaya commented on SOLR-8220:
Thanks for the patch, Keith. I couldn't apply it. Is this based on trunk? I saw that the SolrIndexSearcher.doc() method returns {{StoredDocument}} on trunk, but your patch seems to assume a {{Document}}. If this isn't based on trunk, can you please rework the patch and update it to trunk? Also, an SVN patch would be easier for most developers to work with.
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980479#comment-14980479 ] Yonik Seeley commented on SOLR-8220:
bq. +1 to doing this. I think this will be useful for SOLR-5944, and was anyway planning to split this functionality out into its own issue.

Ah, right, atomic updates need this functionality as well if we are to allow docValues fields that aren't stored. In that case I'll amend my previous comments around ResultContext: that's appropriate for decorating documents as they are being returned, but perhaps not low-level enough for other use cases.
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980449#comment-14980449 ] Ishan Chattopadhyaya commented on SOLR-8220:
+1 to doing this. I think this will be useful for SOLR-5944, and was anyway planning to split this functionality out into its own issue. Also, if we go forward with the \_version\_ field as a docValues field, then this becomes important. In current Solr, the way to read non-stored docValues fields is to use a function query, field(mydvfield).
[~k317h] Are you planning to work on this, or do you have a patch for it? If not, then I can give it a try and have SOLR-5944 depend on it.
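For reference, the function-query workaround Ishan describes puts `field(...)` in `fl`, using the `alias:function` form so the value appears under a readable key. The sketch below only builds the request URL; the host, the core name `collection1`, the alias `mydv`, and the field `mydvfield` are placeholders, and it assumes a single-valued field, since `field()` reads a single value per document.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Builds a select URL that returns a non-stored docValues field through the
// field() function query, aliased as "mydv". Host, core and field names are placeholders.
class FunctionQueryFl {

    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Joins the parameters as an encoded query string, keeping insertion order.
    static String selectUrl(String base, Map<String, String> params) {
        return base + "?" + params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        // alias:function returns the function's value as a pseudo-field named "mydv"
        params.put("fl", "id,mydv:field(mydvfield)");
        System.out.println(selectUrl("http://localhost:8983/solr/collection1/select", params));
    }
}
```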
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980731#comment-14980731 ] Keith Laban commented on SOLR-8220: ---
There are two approaches I can see:
1) Implement a new type of StoredFieldReader which is aware of the field type (i.e. does it have docValues, is it stored). This reader would delegate between reading from docValues or from the stored fields.
2) In the [doc:https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L736] function, do two passes: a first pass to get stored fields, and a second pass to get docValues. This can go a step further by making the SetNonLazyFieldSelector aware of docValues fields and instructing the reader not to load fields which are known to be in docValues.
Thoughts?
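Approach 2 can be sketched independently of Lucene: a first pass collects stored values, a second pass fills in fields that are not stored but have docValues, and the stored pass is skipped entirely when nothing stored is requested. The `Source` interface and the field sets are hypothetical stand-ins for the stored-fields visitor and per-field docValues readers; this is not the SolrIndexSearcher API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "approach 2": two passes over a single document.
class TwoPassLoader {

    // Stand-in for reading one document's values from some storage.
    interface Source {
        Map<String, Object> read(int docId, Set<String> fields);
    }

    static Map<String, Object> load(int docId,
                                    Set<String> storedFields,
                                    Set<String> nonStoredDocValuesFields,
                                    Source stored,
                                    Source docValues) {
        Map<String, Object> doc = new LinkedHashMap<>();
        // Pass 1: stored fields, skipped entirely when nothing stored is requested.
        if (!storedFields.isEmpty()) {
            doc.putAll(stored.read(docId, storedFields));
        }
        // Pass 2: docValues, only for fields that have no stored copy.
        if (!nonStoredDocValuesFields.isEmpty()) {
            doc.putAll(docValues.read(docId, nonStoredDocValuesFields));
        }
        return doc;
    }

    public static void main(String[] args) {
        // Toy sources that ignore the requested field set.
        Source stored = (id, fields) -> Map.of("id", "doc" + id);
        Source docValues = (id, fields) -> Map.of("popularity", 42L);
        System.out.println(load(7, Set.of("id"), Set.of("popularity"), stored, docValues));
        // {id=doc7, popularity=42}
    }
}
```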
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979837#comment-14979837 ] David Smiley commented on SOLR-8220: FYI I have a patch in SOLR-5478, and I've done it without patching Solr as well using a different technique.
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977537#comment-14977537 ] Erick Erickson commented on SOLR-8220: --
The only surprising behavior I see with this approach is that indexing 2.0 could return different values depending on whether it is returned from docValues or from stored fields; the DV return might be something like 1. or even 2.0 would be a "surprise". I'm not against the idea, I just want this out there.

And there's one side benefit that's not entirely obvious. In sharded situations, the first pass returns the candidate list of IDs and "sort criteria". The way it's written last I knew was it returned stored values, which required decompression because it gets the stored field. If all the sort fields were DV, then we wouldn't have to do this. This can't be the complete story, since you can index but not store a sort field and distributed search still works, but it's one path I believe I've seen. It's an open question how to wire that in to standard search for a field that's both stored and a DV field.
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977560#comment-14977560 ] Keith Laban commented on SOLR-8220: --- I believe sorting already reaps the benefits of doc values at least according to [this documentation|https://cwiki.apache.org/confluence/display/solr/DocValues]
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977575#comment-14977575 ] Yonik Seeley commented on SOLR-8220:
bq. reading values from docValues when a stored version of the field does not exist would be a valuable disk usage optimization.

+1, and I've heard a number of users request this.

bq. 1) There doesn't seem to be a standard way to read values for docValues, facets, analytics, streaming, etc, all seem to be doing their own ways, perhaps some of this logic should be centralized.

See ReturnFields / ResultContext; that's currently where stored field handling is centralized, and it handles anywhere a field list (or pseudo-fields / transformers) is specified.

+1 for 2a as a first pass. For bonus points, prevent stored fields from being loaded at all when not needed. This gets us a big step closer to having the normal request handler perform as well as "/export".

Looking beyond the first pass, it might be nice to use docValues as more of a first-class alternate "stored" mechanism, and consider them part of "*". If for some reason it's desirable to treat some docValues fields as stored, and others not, we could introduce a flag in the schema.

bq. The only caveat with this that I can see would be for multiValued fields as they would always be returned sorted in the docValues approach. I believe this is a fair compromise.

This shouldn't be much of a concern for approach 2a, but another future option would be to add explicit set types, and also implement list-type multi-valued docValues fields (probably using binary docValues under the covers).
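Yonik's longer-term idea, docValues as a first-class "stored" mechanism with a schema flag deciding which docValues fields count toward `*`, can be sketched as a glob-expansion rule. The flag name `useDocValuesAsStored` and the rule itself (`fl=*` includes a non-stored docValues field only when the flag is true, while naming the field explicitly returns it regardless) follow the reference-guide notes Ishan later posted on this issue. The `Field` record and `expand` method are hypothetical, and partial globs such as `name_*` are ignored.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of expanding an fl list when docValues can act as an
// alternate "stored" mechanism, controlled by a per-field flag.
class FlExpansion {

    record Field(String name, boolean stored, boolean docValues, boolean useDocValuesAsStored) {}

    static Set<String> expand(List<String> fl, List<Field> schema) {
        Set<String> out = new LinkedHashSet<>();
        for (String entry : fl) {
            for (Field f : schema) {
                if (entry.equals("*")) {
                    // "*" covers stored fields plus docValues fields flagged to act as stored.
                    if (f.stored() || (f.docValues() && f.useDocValuesAsStored())) {
                        out.add(f.name());
                    }
                } else if (f.name().equals(entry) && (f.stored() || f.docValues())) {
                    // An explicitly named field is returned whenever it has a value source.
                    out.add(f.name());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Field> schema = List.of(
                new Field("id", true, false, false),
                new Field("popularity", false, true, true),
                new Field("internal_rank", false, true, false));
        System.out.println(expand(List.of("*"), schema));                  // [id, popularity]
        System.out.println(expand(List.of("*", "internal_rank"), schema)); // [id, popularity, internal_rank]
        System.out.println(expand(List.of("internal_rank"), schema));      // [internal_rank]
    }
}
```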
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977602#comment-14977602 ] Yonik Seeley commented on SOLR-8220:
bq. 3> shard2 returns its candidate top 10 to shard1. This return packet has the top 10 doc IDs and sort criteria.

We currently pull the sort criteria from the field comparators implementing the sort: https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java#L638
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977592#comment-14977592 ] Erick Erickson commented on SOLR-8220: --
I'm not talking about sorting here. The sorting during scoring certainly uses DV values. Here's the sequence (two shards, no followers, rows=10, illustrating with just the action on shard2):
1> shard1 gets the incoming request
2> shard1 sends a sub-request for candidate top 10 to shard2
3> shard2 returns its candidate top 10 to shard1. This return packet has the top 10 doc IDs and sort criteria.
4> shard1 combines the lists of candidate top 10 docs and picks the true top 10
5> shard1 asks shard2 for the docs that came from shard2 and made it into the sorted top 10 list.
What I'm talking about is step <3>. Last I knew, this did not use the DV fields, but pulled the stored value, thus decompressing. I'm not sure how this is resolved for fields that aren't stored; there must be a fallback. Or I'm missing something here; it wouldn't be the first time. Maybe Yonik's idea of making DV fields more "first class citizens" would just take care of the issue entirely.
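Steps 3 and 4 of that sequence amount to a top-N merge on the coordinating shard: only IDs and sort values travel in this phase, and the full documents are fetched afterwards in step 5. A minimal sketch, assuming a single numeric sort value where higher ranks first; `Candidate` and `topN` are hypothetical names, not QueryComponent's actual code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of step 4: merge per-shard candidates into the true top n.
class ShardMerge {

    // One first-phase candidate: which shard it came from, its id, and its sort value.
    record Candidate(String shard, String id, double sortValue) {}

    static List<Candidate> topN(List<List<Candidate>> perShard, int n) {
        // Min-heap on sortValue: the weakest of the current top n sits at the head.
        PriorityQueue<Candidate> heap =
                new PriorityQueue<>(Comparator.comparingDouble(Candidate::sortValue));
        for (List<Candidate> shardList : perShard) {
            for (Candidate c : shardList) {
                heap.offer(c);
                if (heap.size() > n) {
                    heap.poll(); // drop the current weakest candidate
                }
            }
        }
        List<Candidate> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble(Candidate::sortValue).reversed());
        return result;
    }

    public static void main(String[] args) {
        List<Candidate> shard1 = List.of(new Candidate("shard1", "a", 9.0), new Candidate("shard1", "b", 4.0));
        List<Candidate> shard2 = List.of(new Candidate("shard2", "c", 7.0), new Candidate("shard2", "d", 1.0));
        for (Candidate c : topN(List.of(shard1, shard2), 3)) {
            System.out.println(c.shard() + " " + c.id() + " " + c.sortValue());
        }
    }
}
```

Per Yonik's reply in this thread, the sort values shipped in step 3 come from the field comparators implementing the sort rather than from stored fields.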
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977591#comment-14977591 ]

Yonik Seeley commented on SOLR-8220:

bq. The way it's written last I knew was it returned stored values, which required decompression because it gets the stored field.

Right, we should change this (and this issue may handle that out-of-the-box, if we choose not to store the "id" field).

bq. If all the sort fields were DV, then we wouldn't have to do this.

The sort field values returned in the first phase of distributed search aren't obtained from stored field values.

> Read field from docValues for non stored fields
> ---
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
> Issue Type: Improvement
> Reporter: Keith Laban
>
> Many times a value will be both stored="true" and docValues="true" which
> requires redundant data to be stored on disk. Since reading from docValues is
> both efficient and a common practice (facets, analytics, streaming, etc),
> reading values from docValues when a stored version of the field does not
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as
> they would always be returned sorted in the docValues approach. I believe
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think
> it should live closer to where stored fields are loaded in the
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues,
> facets, analytics, streaming, etc, all seem to be doing their own ways,
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
> -- return field from docValue if the field is not stored and in docValues,
> if the field is stored return it from stored fields
> - fl="*"
> -- return only stored fields
> - fl="+"
> -- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first
> pass. 2b - is current behavior
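The multiValued caveat in the quoted description comes from how Lucene keeps per-document values in docValues. A SORTED_SET field holds a sorted set of unique terms for each document. A SORTED_NUMERIC field holds its numbers in ascending order, with duplicates kept. Either way, the original insertion order cannot be recovered from docValues alone. The plain-Java simulation below has no Lucene dependency, and its class and method names are hypothetical. It uses Java `String` ordering, which matches Lucene's UTF-8 byte ordering for ASCII terms but can differ for some non-ASCII text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Simulates reading a multiValued field back from docValues (no Lucene dependency). */
public class MultiValuedOrderSketch {

    /** SORTED_SET-style: a document's values become a sorted set of unique terms. */
    static List<String> asSortedSetDocValues(List<String> indexed) {
        return new ArrayList<>(new TreeSet<>(indexed));
    }

    /** SORTED_NUMERIC-style: a document's values are sorted, but duplicates are kept. */
    static List<Long> asSortedNumericDocValues(List<Long> indexed) {
        List<Long> copy = new ArrayList<>(indexed);
        copy.sort(null); // null comparator = natural (ascending) ordering
        return copy;
    }

    public static void main(String[] args) {
        List<String> tags = List.of("solr", "lucene", "docvalues", "lucene");
        System.out.println("stored:    " + tags);                       // insertion order, duplicate kept
        System.out.println("docValues: " + asSortedSetDocValues(tags)); // [docvalues, lucene, solr]

        List<Long> nums = List.of(30L, 10L, 20L, 10L);
        System.out.println("stored:    " + nums);
        System.out.println("docValues: " + asSortedNumericDocValues(nums)); // [10, 10, 20, 30]
    }
}
```

Keeping stored="true" on a multiValued field preserves the original order and any duplicate strings. The cost is the redundant on-disk copy that the issue is trying to avoid.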
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977601#comment-14977601 ]

Erick Erickson commented on SOLR-8220:

bq. The sort field values returned in the first phase of distributed search aren't obtained from stored field values.

Thanks, I suspect I was inappropriately generalizing from the . That answers how sort values get returned from non-stored fields.
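The exchange above about first-phase sort values can be pictured with a toy model of the first phase of a distributed search. Each shard reports only the unique key and the sort value for its top hits. The coordinator merges those by sort value, and full documents are fetched only for the winners in a later phase. In this sketch, per-field arrays stand in for docValues columns, so phase one never loads a stored document. The `Shard` and `Hit` records and the ascending `price` sort are hypothetical, and ties between equal sort values are not broken deterministically.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model of phase one of a distributed search (hypothetical names, not Solr API). */
public class FirstPhaseSketch {

    /** One shard: column arrays indexed by docId, standing in for docValues. */
    record Shard(String name, String[] idColumn, long[] priceColumn) {}

    /** What phase one ships back to the coordinator for each hit. */
    record Hit(String shard, String id, long sortValue) {}

    /** Per-shard phase one: top n docs by ascending price, read from columns only. */
    static List<Hit> shardTopN(Shard s, int n) {
        List<Hit> hits = new ArrayList<>();
        for (int doc = 0; doc < s.idColumn().length; doc++) {
            // Two column lookups by docId; no stored document is loaded or decompressed.
            hits.add(new Hit(s.name(), s.idColumn()[doc], s.priceColumn()[doc]));
        }
        hits.sort(Comparator.comparingLong(Hit::sortValue));
        return new ArrayList<>(hits.subList(0, Math.min(n, hits.size())));
    }

    /** Coordinator: merge the shard lists by sort value and keep the global top n. */
    static List<Hit> merge(List<List<Hit>> perShard, int n) {
        PriorityQueue<Hit> pq = new PriorityQueue<Hit>(Comparator.comparingLong(Hit::sortValue));
        perShard.forEach(pq::addAll);
        List<Hit> top = new ArrayList<>();
        while (!pq.isEmpty() && top.size() < n) {
            top.add(pq.poll());
        }
        return top;
    }

    public static void main(String[] args) {
        Shard a = new Shard("shard1", new String[] {"a1", "a2", "a3"}, new long[] {30, 10, 50});
        Shard b = new Shard("shard2", new String[] {"b1", "b2"}, new long[] {20, 40});
        List<Hit> top = merge(List.of(shardTopN(a, 2), shardTopN(b, 2)), 3);
        // Only these ids would be fetched in the second phase.
        // Prints a2 10, b1 20, a1 30 (one per line).
        top.forEach(h -> System.out.println(h.id() + " " + h.sortValue()));
    }
}
```

If the id column is itself docValues-only, as Yonik suggests, phase one in this model never has to decompress a stored-field block to report a hit.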