[jira] [Commented] (SOLR-8109) Option to Copy just the first value from a multivalued field
[ https://issues.apache.org/jira/browse/SOLR-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949669#comment-14949669 ] Gus Heck commented on SOLR-8109: [~hossman] Did you see SOLR-8113? Any thoughts on this, that or the difference between them? > Option to Copy just the first value from a multivalued field > > > Key: SOLR-8109 > URL: https://issues.apache.org/jira/browse/SOLR-8109 > Project: Solr > Issue Type: Improvement > Components: Schema and Analysis >Affects Versions: 5.3 >Reporter: Gus Heck > Attachments: SOLR-8109.patch > > > Provide a firstValueOnly boolean option for copyField -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8109) Option to Copy just the first value from a multivalued field
[ https://issues.apache.org/jira/browse/SOLR-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937734#comment-14937734 ] Hoss Man commented on SOLR-8109:

Isn't this situation already pretty well resolved, and more robustly, by CloneFieldUpdateProcessorFactory + FirstFieldValueUpdateProcessorFactory instead of using copyField? Particularly because it can be configured on a per-processor-chain basis, so you can pick "first" in some cases, or "max" in others (depending on who/what/where the docs are coming from).

https://lucene.apache.org/solr/5_3_0/solr-core/org/apache/solr/update/processor/FirstFieldValueUpdateProcessorFactory.html
https://lucene.apache.org/solr/5_3_0/solr-core/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html

In general i think update processors are a much better way to address problems like this moving forward, rather than adding more features at the "schema" level (like new copyField options) ... particularly when you consider how low-level copyField operations are, and how they (by definition) *must* happen after both atomic update operations and any distributed processing for cloud setups.
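To make the "clone, then pick first or max" idea concrete, here is a minimal plain-Java sketch. It only simulates the behaviour described above on a Map-based document. It does not use or reproduce the Solr processor classes, and every class, method, and field name in it (CloneThenSelectSketch, cloneField, keepFirst, keepMax, author_s, author_sort) is a hypothetical chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: a document is modelled as field name -> list of values. */
public class CloneThenSelectSketch {

    /** "Clone" step: append every value of the source field to the dest field. */
    public static void cloneField(Map<String, List<String>> doc, String source, String dest) {
        List<String> values = doc.get(source);
        if (values == null) {
            return; // nothing to clone
        }
        doc.computeIfAbsent(dest, k -> new ArrayList<>()).addAll(values);
    }

    /** "First" selection: reduce the field to its first value. */
    public static void keepFirst(Map<String, List<String>> doc, String field) {
        List<String> values = doc.get(field);
        if (values != null && values.size() > 1) {
            doc.put(field, new ArrayList<>(values.subList(0, 1)));
        }
    }

    /** "Max" selection: reduce the field to its largest value (natural String order). */
    public static void keepMax(Map<String, List<String>> doc, String field) {
        List<String> values = doc.get(field);
        if (values != null && values.size() > 1) {
            String max = Collections.max(values);
            doc.put(field, new ArrayList<>(Collections.singletonList(max)));
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("author_s", new ArrayList<>(Arrays.asList("smith", "jones")));

        // One chain might clone and keep the first value ...
        cloneField(doc, "author_s", "author_sort");
        keepFirst(doc, "author_sort");

        // ... while another chain could call keepMax instead.
        System.out.println(doc);
    }
}
```

The selection step is separate from the copy step, so which value survives is a per-chain decision rather than a property of the copy itself.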
[jira] [Commented] (SOLR-8109) Option to Copy just the first value from a multivalued field
[ https://issues.apache.org/jira/browse/SOLR-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938168#comment-14938168 ] Gus Heck commented on SOLR-8109:

Does FirstFieldValueUpdateProcessorFactory accept wildcards? The case that my code is designed to handle looks like this:

{code} {code}

Even if this can be done with processors that accept wildcards, the configuration of the update processor in solrconfig.xml has to coordinate with schema.xml (the wildcard patterns have to match). Granted, the copyField directive has to coordinate with the dynamic field too, but I tend to prefer having things that have to coordinate next to each other in the same file.

As for the weight of the code that "must happen after atomic update operations"... If you look at my code, you'll notice that the only part that gets run on a per-document basis is:

{code:title=DocumentBuilder.java}
if (cf.isFirstValueOnly() && destHasValues) {
  continue;
}
{code}

Everything else is parsing the attribute and just passing the boolean value around (or unit tests).
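For readers who want to see how a check like the one quoted from DocumentBuilder.java behaves inside a copy loop, here is a rough, self-contained Java sketch. It is not the attached patch and does not use Solr's classes. The CopyRule type, the applyCopyRules method, and the field names are all assumptions made up for this illustration, and it assumes destHasValues means "the destination field has already received at least one value".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: simulates copy rules with a firstValueOnly flag. */
public class FirstValueOnlyCopySketch {

    /** Hypothetical stand-in for a copyField rule. */
    public static final class CopyRule {
        final String source;
        final String dest;
        final boolean firstValueOnly;

        public CopyRule(String source, String dest, boolean firstValueOnly) {
            this.source = source;
            this.dest = dest;
            this.firstValueOnly = firstValueOnly;
        }
    }

    /** Builds the output document, applying every rule to each incoming value in order. */
    public static Map<String, List<Object>> applyCopyRules(
            Map<String, List<Object>> input, List<CopyRule> rules) {
        Map<String, List<Object>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<Object>> field : input.entrySet()) {
            for (Object value : field.getValue()) {
                // The source field itself always keeps all of its values.
                out.computeIfAbsent(field.getKey(), k -> new ArrayList<>()).add(value);
                for (CopyRule rule : rules) {
                    if (!rule.source.equals(field.getKey())) {
                        continue; // rule does not apply to this field
                    }
                    List<Object> destValues = out.get(rule.dest);
                    boolean destHasValues = destValues != null && !destValues.isEmpty();
                    // The per-value check: skip once the destination is populated.
                    if (rule.firstValueOnly && destHasValues) {
                        continue;
                    }
                    out.computeIfAbsent(rule.dest, k -> new ArrayList<>()).add(value);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Object>> input = new LinkedHashMap<>();
        input.put("tags", new ArrayList<>(Arrays.<Object>asList("red", "green", "blue")));
        List<CopyRule> rules = Collections.singletonList(new CopyRule("tags", "tags_sort", true));
        System.out.println(applyCopyRules(input, rules));
    }
}
```

In this sketch the extra per-value cost of the flag is one boolean test and one emptiness check, which is the point being made about the weight of the change.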
[jira] [Commented] (SOLR-8109) Option to Copy just the first value from a multivalued field
[ https://issues.apache.org/jira/browse/SOLR-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938312#comment-14938312 ] Gus Heck commented on SOLR-8109: So what I think you are saying is that copyField will soon be deprecated?
[jira] [Commented] (SOLR-8109) Option to Copy just the first value from a multivalued field
[ https://issues.apache.org/jira/browse/SOLR-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938196#comment-14938196 ] Hoss Man commented on SOLR-8109:

bq. Does FirstFieldValueUpdateProcessorFactory accept wildcards?

it does, but not in the way you are asking about - all FirstFieldValueUpdateProcessorFactory does is "prune" a list of values down to the first value for each of the configured fields (which can be specified as a regex).

the "copy" part is handled by CloneFieldUpdateProcessorFactory, and in looking at its docs again, i see there is still one disconnect between its functionality and the older style copyField: wildcards in the dest. we should definitely add equivalent functionality to cover that case.

bq. As for the weight of the code that "must happen after atomic update operations" ...

I didn't say anything about the "weight" of your code ... my point was that, by design, copyFields (and any new features we might add to copyFields) happen after the full processor chain -- the user doesn't have any choice about it. In particular this means any new features we might add to copyFields must also happen after atomic updates and distributed/cloud updates, which makes the utility of any new features we add to copyField extremely limited, since copyField already doesn't play nicely with those other features (see the blue note box on https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents, SOLR-3743, etc...).

ergo: i think it's a bad idea to keep trying to add features to copyField. All the reasons mentioned above (and finer control over the _order_ in which various features may be applied via the pipeline configuration) are the whole reason why CloneFieldUpdateProcessorFactory and the various FieldMutatingUpdateProcessorFactory subclasses were added in the first place. I think it's important to move *away* from encouraging copyField usage, not towards it.
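The two points above (pruning fields selected by a regex, and the user controlling the order of steps) can be sketched in plain Java as follows. This is a toy model only: it does not call any Solr API, and ProcessorOrderSketch, keepFirstMatching, cloneTo, runChain, and the field names are hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.regex.Pattern;

/** Illustration only: a "chain" is an ordered list of steps that mutate a document. */
public class ProcessorOrderSketch {

    /** Step that prunes every field whose name matches the regex down to its first value. */
    public static Consumer<Map<String, List<String>>> keepFirstMatching(String fieldRegex) {
        Pattern pattern = Pattern.compile(fieldRegex);
        return doc -> {
            for (Map.Entry<String, List<String>> e : doc.entrySet()) {
                List<String> values = e.getValue();
                if (pattern.matcher(e.getKey()).matches() && values.size() > 1) {
                    // Replacing an entry's value is not a structural change to the map.
                    e.setValue(new ArrayList<>(values.subList(0, 1)));
                }
            }
        };
    }

    /** Step that appends all values of the source field to the dest field. */
    public static Consumer<Map<String, List<String>>> cloneTo(String source, String dest) {
        return doc -> {
            List<String> values = doc.get(source);
            if (values != null) {
                doc.computeIfAbsent(dest, k -> new ArrayList<>()).addAll(values);
            }
        };
    }

    /** Runs the steps strictly in the order the caller listed them. */
    public static void runChain(Map<String, List<String>> doc,
                                List<Consumer<Map<String, List<String>>>> chain) {
        for (Consumer<Map<String, List<String>>> step : chain) {
            step.accept(doc);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("title_s", new ArrayList<>(Arrays.asList("a", "b")));

        // Clone first, then prune: title_sort ends up with a single value.
        List<Consumer<Map<String, List<String>>>> chain =
                Arrays.asList(cloneTo("title_s", "title_sort"), keepFirstMatching(".*_sort"));
        runChain(doc, chain);
        System.out.println(doc);
    }
}
```

Swapping the two steps changes the outcome, since the prune step then runs before title_sort exists. That kind of ordering choice is available in a configurable chain but not for a step that is fixed to run after everything else.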
[jira] [Commented] (SOLR-8109) Option to Copy just the first value from a multivalued field
[ https://issues.apache.org/jira/browse/SOLR-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937711#comment-14937711 ] Gus Heck commented on SOLR-8109: Sometimes when using dynamic fields, the majority of the dynamic fields generated by the input data are single valued but a few annoying fields happen to be multivalued. This forces the dynamic field to be multivalued, which precludes sorting on all fields originating from the dynamic field. At present this can only be handled by having an ingestion pipeline (or other preprocessing code) anticipate the dynamic configuration and add a second field for sorting. This creates a need for the ingestion to model (and duplicate) the dynamic field configuration. Although not always appropriate, the most basic thing such processing can do is pick the first value for the "sort" field, and ignore the rest. The patch I am attaching adds a firstValueOnly attribute to copyField in the solr schema, which provides this first basic workaround without an ingestion pipeline.