[jira] [Commented] (SOLR-8109) Option to Copy just the first value from a multivalued field

2015-10-08 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949669#comment-14949669
 ] 

Gus Heck commented on SOLR-8109:


[~hossman] Did you see SOLR-8113? Any thoughts on this issue, that one, or the 
difference between them?

> Option to Copy just the first value from a multivalued field
>
> Key: SOLR-8109
> URL: https://issues.apache.org/jira/browse/SOLR-8109
> Project: Solr
> Issue Type: Improvement
> Components: Schema and Analysis
> Affects Versions: 5.3
> Reporter: Gus Heck
> Attachments: SOLR-8109.patch
>
> Provide a firstValueOnly boolean option for copyField



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8109) Option to Copy just the first value from a multivalued field

2015-09-30 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937734#comment-14937734
 ] 

Hoss Man commented on SOLR-8109:


Isn't this situation already pretty well resolved, and more robustly, by 
CloneFieldUpdateProcessorFactory + FirstFieldValueUpdateProcessorFactory 
instead of using copyField?

Particularly because it can be configured on a per-processor-chain basis, so 
you can pick "first" in some cases, or "max" in others (depending on 
who/what/where the docs are coming from).

https://lucene.apache.org/solr/5_3_0/solr-core/org/apache/solr/update/processor/FirstFieldValueUpdateProcessorFactory.html
https://lucene.apache.org/solr/5_3_0/solr-core/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html

In general I think update processors are a much better way to address problems 
like this moving forward, rather than adding more features at the "schema" 
level (like new copyField options) ... particularly when you consider how 
low-level copyField operations are, and how they (by definition) *must* happen 
after both atomic update operations and any distributed processing for cloud 
setups.
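
The clone-then-prune idea can be sketched in plain Java. This is only a minimal illustration of why a per-chain configuration is flexible, not Solr code: `ChainSketch`, `cloneField`, `keepFirst`, `keepMax`, and `run` are hypothetical names, and a document is modeled as a simple map of field names to string values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: each "processor" is a step that mutates a document,
// and a chain is an ordered list of such steps.
final class ChainSketch {

    // Copy every value of `source` into `dest` (the "clone" step).
    static Consumer<Map<String, List<String>>> cloneField(String source, String dest) {
        return doc -> {
            List<String> values = doc.get(source);
            if (values != null) {
                doc.computeIfAbsent(dest, k -> new ArrayList<>()).addAll(values);
            }
        };
    }

    // Prune `field` down to its first value.
    static Consumer<Map<String, List<String>>> keepFirst(String field) {
        return doc -> {
            List<String> values = doc.get(field);
            if (values != null && values.size() > 1) {
                doc.put(field, new ArrayList<>(values.subList(0, 1)));
            }
        };
    }

    // Prune `field` down to its largest value (natural String ordering).
    static Consumer<Map<String, List<String>>> keepMax(String field) {
        return doc -> {
            List<String> values = doc.get(field);
            if (values != null && values.size() > 1) {
                doc.put(field, new ArrayList<>(Collections.singletonList(Collections.max(values))));
            }
        };
    }

    // Apply the steps in the order given.
    static void run(Map<String, List<String>> doc, List<Consumer<Map<String, List<String>>>> chain) {
        for (Consumer<Map<String, List<String>>> step : chain) {
            step.accept(doc);
        }
    }
}
```

One chain could be built as `cloneField("title", "title_sort")` followed by `keepFirst("title_sort")`, and another as the same clone step followed by `keepMax("title_sort")`; the source field keeps all of its values in both cases.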





[jira] [Commented] (SOLR-8109) Option to Copy just the first value from a multivalued field

2015-09-30 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938168#comment-14938168
 ] 

Gus Heck commented on SOLR-8109:


Does FirstFieldValueUpdateProcessorFactory accept wildcards? The case that my 
code is designed to handle looks like this:

{code}
  
  
  
{code}

Even if this can be done with processors that accept wildcards, the 
configuration of the update processor in solrconfig.xml has to coordinate with 
schema.xml (the wildcard patterns have to match). Granted, the copyField 
directive has to coordinate with the dynamic field too, but I tend to prefer 
having things that have to coordinate next to each other in the same file. 

As for the weight of the code that "must happen after atomic update 
operations"... If you look at my code, you'll notice that the only part that 
gets run on a per-document basis is:

{code:title=DocumentBuilder.java}
  if (cf.isFirstValueOnly() && destHasValues) {
    continue;
  }
{code}

Everything else is parsing the attribute and just passing the boolean value 
around (or unit tests). 
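
To illustrate how small that per-document check is, here is a minimal, self-contained sketch of a copy loop that honors a first-value-only flag. It is not the patch itself: `CopyRule`, `FirstValueCopySketch`, and `applyCopyRules` are hypothetical names, and the document is modeled as a plain map rather than Solr's document classes. Only the `isFirstValueOnly()` / `destHasValues` check mirrors the snippet above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a copyField rule; not Solr's actual CopyField class.
final class CopyRule {
    final String source;
    final String dest;
    final boolean firstValueOnly;

    CopyRule(String source, String dest, boolean firstValueOnly) {
        this.source = source;
        this.dest = dest;
        this.firstValueOnly = firstValueOnly;
    }

    boolean isFirstValueOnly() {
        return firstValueOnly;
    }
}

final class FirstValueCopySketch {

    // Returns a copy of `input` with every rule applied. When a rule is
    // flagged firstValueOnly, values are skipped once dest already has one.
    static Map<String, List<Object>> applyCopyRules(Map<String, List<Object>> input,
                                                    List<CopyRule> rules) {
        Map<String, List<Object>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<Object>> e : input.entrySet()) {
            out.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        for (CopyRule cf : rules) {
            List<Object> values = input.get(cf.source);
            if (values == null) {
                continue; // nothing to copy for this rule
            }
            for (Object v : values) {
                List<Object> destValues = out.computeIfAbsent(cf.dest, k -> new ArrayList<>());
                boolean destHasValues = !destValues.isEmpty();
                if (cf.isFirstValueOnly() && destHasValues) {
                    continue; // the per-value check from the snippet above
                }
                destValues.add(v);
            }
        }
        return out;
    }
}
```

With a source field holding `a`, `b`, `c` and a rule flagged first-value-only, the destination ends up with just `a`; without the flag it receives all three values.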




[jira] [Commented] (SOLR-8109) Option to Copy just the first value from a multivalued field

2015-09-30 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938312#comment-14938312
 ] 

Gus Heck commented on SOLR-8109:


So what I think you are saying is that copyField will soon be deprecated?




[jira] [Commented] (SOLR-8109) Option to Copy just the first value from a multivalued field

2015-09-30 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938196#comment-14938196
 ] 

Hoss Man commented on SOLR-8109:


bq. Does FirstFieldValueUpdateProcessorFactory accept wildcards?

It does, but not in the way you are asking about - all 
FirstFieldValueUpdateProcessorFactory does is "prune" a list of values down to 
the first value for each of the configured fields (which can be specified as a 
regex).
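
As a rough illustration of that "prune" behavior, the following plain-Java sketch trims every field whose name matches a pattern down to its first value. It is not Solr code: `FirstValuePruneSketch` and `keepFirstValue` are hypothetical names, the document is modeled as a map, and the sketch assumes a full-name match, which may differ from how Solr applies its field regexes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of pruning multi-valued fields selected by a regex.
final class FirstValuePruneSketch {

    // For each field whose name fully matches `fieldRegex`, keep only the
    // first value. Fields that do not match are left untouched.
    static void keepFirstValue(Map<String, List<Object>> doc, Pattern fieldRegex) {
        for (Map.Entry<String, List<Object>> e : doc.entrySet()) {
            List<Object> values = e.getValue();
            if (fieldRegex.matcher(e.getKey()).matches() && values.size() > 1) {
                // Replacing the value of an existing entry is safe during iteration.
                e.setValue(new ArrayList<>(values.subList(0, 1)));
            }
        }
    }
}
```

Note that nothing is copied here; the pruning happens in place, which is why a separate clone step is needed to get copyField-like behavior.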

The "copy" part is handled by CloneFieldUpdateProcessorFactory, and in looking 
at its docs again, I see there is still one disconnect between its 
functionality and the older style copyField: wildcards in the dest.  We should 
definitely add equivalent functionality to cover that case.

bq. As for the weight of the code that "must happen after atomic update 
operations" ...

I didn't say anything about the "weight" of your code ... my point was that, by 
design, copyFields (and any new features we might add to copyFields) happen 
after the full processor chain -- the user doesn't have any choice about it.  
In particular, this means any new features we might add to copyFields must 
also happen after atomic updates and distributed/cloud updates, which makes the 
utility of any new features we add to copyField extremely limited, since 
copyField already doesn't play nicely with those other features (see the blue 
note box on 
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents, 
SOLR-3743, etc...).

Ergo: I think it's a bad idea to keep trying to add features to copyField.  All 
the reasons mentioned above (and finer control over the _order_ in which 
various features may be applied via the pipeline configuration) are the whole 
reason why CloneFieldUpdateProcessorFactory and the various 
FieldMutatingUpdateProcessorFactory subclasses were added in the first place.  
I think it's important to move *away* from encouraging copyField usage, not 
towards it.




[jira] [Commented] (SOLR-8109) Option to Copy just the first value from a multivalued field

2015-09-30 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937711#comment-14937711
 ] 

Gus Heck commented on SOLR-8109:


Sometimes when using dynamic fields, the majority of the dynamic fields 
generated by the input data are single valued but a few annoying fields happen 
to be multivalued. This forces the dynamic field to be multivalued, which 
precludes sorting on all fields originating from the dynamic field. At present 
this can only be handled by having an ingestion pipeline (or other 
preprocessing code) anticipate the dynamic configuration and add a second field 
for sorting. This creates a need for the ingestion to model (and duplicate) the 
dynamic field configuration. Although not always appropriate, the most basic 
thing such processing can do is pick the first value for the "sort" field, and 
ignore the rest. The patch I am attaching adds a firstValueOnly attribute to 
copyField in the Solr schema, which provides this first basic workaround 
without an ingestion pipeline.
