[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-8220: -- Description: useDocValuesAsStored Many times a value will be both stored="true" and docValues="true" which requires redundant data to be stored on disk. Since reading from docValues is both efficient and a common practice (facets, analytics, streaming, etc), reading values from docValues when a stored version of the field does not exist would be a valuable disk usage optimization. The only caveat with this that I can see would be for multiValued fields as they would always be returned sorted in the docValues approach. I believe this is a fair compromise. I've done a rough implementation for this as a field transform, but I think it should live closer to where stored fields are loaded in the SolrIndexSearcher. Two open questions/observations: 1) There doesn't seem to be a standard way to read values for docValues, facets, analytics, streaming, etc, all seem to be doing their own ways, perhaps some of this logic should be centralized. 2) What will the API behavior be? (Below is my proposed implementation) Parameters for fl: - fl="docValueField" -- return field from docValue if the field is not stored and in docValues, if the field is stored return it from stored fields - fl="*" -- return only stored fields - fl="+" -- return stored fields and docValue fields 2a - would be easiest implementation and might be sufficient for a first pass. 2b - is current behavior was: Many times a value will be both stored="true" and docValues="true" which requires redundant data to be stored on disk. Since reading from docValues is both efficient and a common practice (facets, analytics, streaming, etc), reading values from docValues when a stored version of the field does not exist would be a valuable disk usage optimization. The only caveat with this that I can see would be for multiValued fields as they would always be returned sorted in the docValues approach. I believe this is a fair compromise. I've done a rough implementation for this as a field transform, but I think it should live closer to where stored fields are loaded in the SolrIndexSearcher. Two open questions/observations: 1) There doesn't seem to be a standard way to read values for docValues, facets, analytics, streaming, etc, all seem to be doing their own ways, perhaps some of this logic should be centralized. 2) What will the API behavior be? (Below is my proposed implementation) Parameters for fl: - fl="docValueField" -- return field from docValue if the field is not stored and in docValues, if the field is stored return it from stored fields - fl="*" -- return only stored fields - fl="+" -- return stored fields and docValue fields 2a - would be easiest implementation and might be sufficient for a first pass. 2b - is current behavior > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban >Assignee: Shalin Shekhar Mangar >Priority: Major > Fix For: 5.5, 6.0 > > Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch > > > useDocValuesAsStored > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-8220: Attachment: SOLR-8220.patch Thanks Ishan. Changes: # Create unmodifiable map in SolrIndexSearcher's constructor so that we don't create unmodifiable instances on every call to SolrIndexSearcher.getNonStoredDVs # Fixed javadocs for ReturnFields.getLuceneFieldNames to note that it can return null even if ignoreWantsAll=true # Optimized field selection in DocsStreamer to create fewer objects. We iterate over requested field names and do intersection in the loop. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-8220: Attachment: SOLR-8220.patch # Removed extra license header in schema-non-stored-docvalues.xml which failed the parsing # Fixed the code comment in TestUseDocValuesAsStored2.java that is no longer valid now that we return useDocValuesAsStored=false field if they are explicitly requested in addition to all fields. I'll commit this shortly. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-8220: Attachment: SOLR-8220-branch_5x.patch Patch for branch_5x. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-8220: --- Attachment: SOLR-8220.patch Added a code comment to the previous patch. Plus very minor cleanup at DocsStreamer's constructor. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-8220: Attachment: SOLR-8220.patch Improved Ishan's new managed schema test to actually add documents and assert query responses in different scenarios. Just to recap, following is the behavior of "fl" with doc value fields with the latest patch: # All doc value fields in schema version 1.6 are useDocValuesAsStored=true by default # fl=`*` will return all stored=true and useDocValuesAsStored=true fields # fl=`*,a1` will return all stored=true and useDocValuesAsStored=true fields. If the 'a1' field is useDocValuesAsStored=false then it will *NOT* be returned even though it was explicitly asked for. # fl=`a*` will match the specified pattern against only stored=true and useDocValuesAsStored=true fields. # fl=`a1` will return a1 as long as it is stored=true or docValues=true (i.e. useDocValuesAsStored does not matter if a field is explicitly requested) > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-8220: --- Attachment: SOLR-8220.patch Updating the patch: # SolrIndexSearcher's getNonStoredDVs() method now returns unmodifiable sets of field names # Added another method in ReturnFields and SolrReturnFields which returns the additionally requested field names, irrespective of a wantsAll ({{\*}}) being present in the fl=. # Now fl=*,a3 case also returns a3, irrespective of whether a3 has useDocValuesAsStored. (This change was made in DocsStreamer's constructor, under: {code} // add non-stored DV fields that may have been requested if (rctx.getReturnFields().wantsAllFields()) { {code} > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-8220: --- Attachment: SOLR-8220.patch bq. Ishan, can you please add a test which uses the schema API to add/modify a field with the useDocValuesAsStored parameter set to true (default), explicitly false and then true again? Added two extra tests: # TestUseDocValuesAsStored.testManagedSchema() for testing with managed schema. # TestUseDocValuesAsStored2 suite for testing with Schema API. Also, renamed the test methods in TestUseDocValuesAsStored to shorter/simpler names. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-8220: Attachment: SOLR-8220.patch With the right patch this time. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-8220: Attachment: SOLR-8220.patch Changes: # DocStreamer calculates dv fields to return in the constructor for every scenario instead of doing conditional logic and set intersection in next(). Also uses HashSet instead of Treeset (we don't care about order) # Renamed nonStoredDVs in SolrIndexSearcher to allNonStoredDVs # Added simple javadocs describing allNonStoredDVs and nonStoredDVsUsedAsStored # Reformatted new code according to the project code style # Fixed formatting in schema-nonstored-docvalues.xml which failed precommit (tabs instead of spaces) # Added a basic sanity test in TestUseDocValuesAsStored.testNonStoredDocValueFieldsOnEmptyIndex to assert that no exceptions are thrown on unknown fields with various fl combinations. Ishan, can you please add a test which uses the schema API to add/modify a field with the useDocValuesAsStored parameter set to true (default), explicitly false and then true again? > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-8220: Attachment: SOLR-8220.patch # Added a null check in DocStreamer so that we do not try to decorate dv fields if none are required # Deletes all docs in testNonStoredDocValueFieldsOnEmptyIndex before running any test > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-8220: --- Attachment: SOLR-8220.patch Added tests for the previous two scenarios. @Erick, thanks for the catch, the values.getValueCount() wasn't working indeed, and only after adding the test for it could I catch it. Shalin, Erick, please review this latest patch. Thanks. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-8220: --- Attachment: SOLR-8220.patch Thanks for your review, Shalin. I've updated the patch to address your suggestions. bq. The SolrIndexSearcher.decorateDocValueFields method has a honourUseDVsAsStoredFlag which is always true. We can remove it? bq.Same for SolrIndexSearcher.getNonStoredDocValuesFieldNames? Refactored the decorateDocValues() a bit to not send in wantsAllFields flag to the method and to handle it at the DocsStreamer itself. Hence, now, the decorateDocValues() method takes in only the field names it needs to do anything about; the filtering for non-stored dvs is taken care of at DocsStreamer.next() itself. Since, for the {{fl=\*}} case, we need all non-stored DVs that have {{useDocValuesAsStored}}=true, but for the general filtering case of {{fl=dv1,dv2}} we need to filter using all non-stored DVs (irrespective of the useDocValuesAsStored flag), I've retained this true/false logic in the getNonStoredDocValuesFieldNames() method. Renamed that method, however, to call it {{getNonStoredDVs(boolean onlyUseDocValuesAsStored)}} and added a clear javadoc to this effect. bq.The wantsAllFields flag added to SolrIndexSearcher.doc doesn't seem necessary. I guess it was added because the patch adds non stored doc values fields to the 'fnames' but if we can separate out stored fnames from the non-stored doc values to be returned then we can remove this param from both SolrIndexSearcher.doc and SolrIndexSearcher.getNonStoredDocValuesFieldNames I think the original motivation was to deal with cases {{fl=\*,nonstoredDv1}}. Here, the idea initially was that {{\*}} returns all stored fields, and nonstoredDv1 is added to it. But now, since {{\*}} takes care of all stored and non-stored dvs, this logic isn't needed. So, this wantsAllFields flag was a left over from a previous patch which I've now removed. bq.The pattern matching in the DocStreamer constructor makes a bit nervous. Where is the pattern matching done for current stored fields? Keith can weigh in on this better. However, I had a look, and found that responseWriters (e.g. JSONResponseWriter) get the whole SolrDocument at the {{writeSolrDocument()}} method, from where it does the following call to drop fields it doesn't need: {code} for (String fname : doc.getFieldNames()) { if (returnFields!= null && !returnFields.wantsField(fname)) { continue; } {code} This wantsField() call uses wildcard handling. So, reviewing this information, it seems like our handling of this at the DocsStreamer is fine here. It doesn't look costly to me, since it is performed only when fl has a pattern, and that pattern is checked against only non-stored DVs. Do you think there's something better that can be done which I'm missing? bq.The conditional logic in SolrIndexSearcher.decorateDocValueFields for multi-valued fields is too complicated! Can we please simplify this? Made it simpler. :-) > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField"
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-8220: --- Attachment: SOLR-8220.patch Updated the patch to have the useDocValuesAsStored attribute on a per field basis (instead of being at schema's top level). > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-8220: --- Attachment: SOLR-8220-5x.patch It seems the failures on trunk were unrelated to this patch, and possibly related to the Jetty upgrade to 9.3. To verify this, I ported the patch to branch5x (attached the patch) and it passes all tests. TODO: Make sure managed schema can handle the newly added useDocValuesAsStored field in the schema (test for persistence across edits to managed schema). Apart from the TODO item, can [~k317h] and [~shalinmangar] please review the patch (the trunk one)? > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-8220: --- Attachment: SOLR-8220.patch Updated the patch. bq. TODO: Make sure managed schema can handle the newly added useDocValuesAsStored parameter in the schema (test for persistence across edits to managed schema). Added support for writing back (persisting) of this flag when used with managed schema. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-8220: --- Attachment: SOLR-8220.patch Updating the patch to bump the version up to 1.6. # Renamed Keith's TestStoredDocValues to TestNonStoredDocValues. Renamed the corresponding schema file too. # Adds useDocValuesAsStored parameter (true/false) to schema file. It defaults to true >= 1.6 version of schema. # Changing all 1.5 version schema files (test and examples) to 1.6. This will help catch bugs/unexpected behaviour. The patch is failing a few tests with very poor error reporting. Here's a reproducible failure after applying this patch: Test suite: TestManagedSchemaDynamicFieldResource, Seed: -Dtests.seed=C0DE559FF2A0799 Looking into the failure. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-8220: --- Attachment: SOLR-8220.patch Looks good, Keith. I was wondering why you chose to name the test file and the schema file as "TestStoredDocValues" and "schema-stored-docvalues.xml", whereas we are dealing with only non-stored docvalues in this issue? I've updated the patch to now check for the schema version to be >=1.6. Please review, and feel free to refactor further. We'll have to add a test (using a schema version < 1.6) to ensure backcompat behaviour is not broken. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-8220: --- Attachment: SOLR-8220.patch bq. Removed leaked in SolrBaseDocument references The intention of using SolrDocumentBase, instead of SolrDocument, was so that SOLR-8276 can use the same decorate method on a SolrInputDocument. I've re-added the SolrDocumentBase reference, which is implemented in SOLR-8339. Please apply SOLR-8339 first, and then apply/work with this patch. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-8220: --- Attachment: SOLR-8220.patch bq. have some updated tests in separate patch which I haven't uploaded, I need to clean it up a bit and can submit it tonight. I've fixed the issue in this updated patch, thanks to Yonik's suggestion. Keith, looking forward to the tests. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keith Laban updated SOLR-8220: -- Attachment: SOLR-8220.patch - Removed leaked in {{SolrBaseDocument}} references - Fixed support for EnumField (single and multi valued) - Fixed support for {{*_i}} type globs - Removed tests from {{BasicFuntionalityTests}} and moved them to a new test class along with its own test schema. - Added more robust testing including multivalued field testing > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keith Laban updated SOLR-8220: -- Attachment: SOLR-8220.patch Crushed one more bug with multivalued fields being added on docs without that value and a test for it > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keith Laban updated SOLR-8220: -- Attachment: SOLR-8220.patch One more patch.. It looks like at some point the logic for {{onlyPseudoFields}} was modified to something that may not work, so reverting that line back to the original. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-8220: --- Attachment: SOLR-8220.patch Updating the patch. # Only tackles non-stored docvalues. # Depends on the refactoring performed at SOLR-8339. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-8220: --- Attachment: SOLR-8220.patch Updating the patch with the following changes: # Since this decorate method would be used from the RealTimeGetComponent as well, and there it will have to decorate a SolrInputDocument, I've changed the method to handle both a SolrDocument and SolrInputDocument, depending on what is sent in. # The BasicFunctionalityTest was failing for me since it depended on a field "test_s_dv", which didn't exist in the schema. I've added that to the schema.xml. # Added a javadoc for the decorate method. Keith, please review. If you think these changes make sense, then I'll base SOLR-8276 on this one. As you mentioned, the schema version check is still a TODO. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keith Laban updated SOLR-8220: -- Attachment: SOLR-8220.patch I changed {{decorateDocValueFields}} to operate based on a {{SolrDocument}} instead of a lucene {{StoredDocument}} and no longer relies on createField (and it now works on non-stored/non-indexed fields). Also includes a test for the non-stored/non-indexed field type. In this patch I also removed code related to {{\*\*}} in favor of {{*}} for docValue related fields. Still TODO is to make this behavior only apply to a new schema version. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, > SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-8220: --- Attachment: SOLR-8220.patch Updated the patch with a minor one line fix in the multi-valued string field case: {{doc.add(schemaField.getType().createField(schemaField, values.lookupOrd(i).utf8ToString(), 1f));}} Fyi, since I had already copied over the multivalued fields support from SOLR-8276 patch to this patch earlier, I've made SOLR-8276 depend on this issue. So, if we fix this issue now, we'll be able to fix (a) search results with non stored docvalues, (b) RTG of documents containing non stored docvalues, (c) atomic updates of documents containing non stored docvalues (for updates to both regular fields as well as non stored docvalues). I will make SOLR-5944 depend on this now. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keith Laban updated SOLR-8220: -- Attachment: SOLR-8220.patch Added a patch based on [~ichattopadhyaya] latest patch. needs to be perf tested. Theoretical optimization, will skip reading from stored fields if all the requested fields are available in docValues. (changes mostly to DocStreamer) Caveats being: - Cannot optimize if any fields are multi valued. - Cannot optimize for * queries. - Does not cache the document (slower in the long run?) -- How can we cache? using doc.getField, perhaps? or LazyDocument? > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keith Laban updated SOLR-8220: -- Attachment: SOLR-8220.patch reformatted patch to be svn style and cleaned up code from last update > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-8220: --- Attachment: SOLR-8220-ishan.patch Updated patch. This has: * Now uses {{**}} for returning all non-stored docvalues fields. * Using docvalues API directly for single valued fields (instead of valuesource api) * Few cleanups here and there. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-8220: --- Attachment: SOLR-8220-ishan.patch > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-8220: --- Attachment: SOLR-8220.patch Updated your patch to trunk (svn) with no changes to your logic. I'll review it soon. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-8220: --- Attachment: SOLR-8220-ishan.patch Here's an alternate patch (SOLR-8220-ishan) for this functionality. Uses the same hooks that you've created, i.e. decorateDocValueFields(), and uses your unit tests. * Changed the handling of single valued dv fields to have less code, * Adds support for multivalued dv fields support, * In your patch, {{fl=*,mydvfield}} wasn't working. Fixed this. * Added a '~' glob, similar to '*'. {{fl=~}} here means: return all conventional stored fields and all non stored docvalues. Some cleanup / refactoring and a few tests might be needed. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-8220: --- Attachment: SOLR-8220-ishan.patch Updated last patch to use FieldInfos, since searcher.getSchema().getFields() doesn't have the dynamic fields. Modified the test to randomly test the {{fl}} parameter with ~ or the dv field name itself. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keith Laban updated SOLR-8220: -- Attachment: SOLR-8220.patch adding an initial attempt at this patch. What does everyone think about taking an approach like this? This patch will decorate a document with docValue values after the stored values have been read. For now it skips multivalued fields and only reads from docValues if FL is specified and the field is not stored. Some problems I noticed with this approach are: 1) LazyDocument doesn't support a notion of loading from docValues, without mucking around in there I can't see a way apply the docValues before caching because of various FLs. 2) There is no metadata (that I can find) stored for each document that says whether it has an unstored docValue field, so efficiently loading docValues fields based on FL=* would be difficult. The only possibly way right now is to iterate over all schema fields looking for the viable docValue fields for each matched document. 3) More of a question: What kind of FieldType should be created when adding these docValues to the document? I have an alternate patch that attempts to preload stored values from docValues before handing over to the IndexReader. Those fields are then skipped later on. It passes tests, but it's not a very elegent approach. And also has its limitations; it only works when FL is specified and it wouldn't work on the cached hit of the document if a field is loaded with LazyDocument. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org