[ 
https://issues.apache.org/jira/browse/SOLR-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-314:
-------------------------------

    Attachment: SOLR-314-StoreAnalysis.patch

This adds the StoreAnalysisProcessor to the default chain.  It is skipped 
unless the request includes the parameter "store.analysis=true".

It chooses the field type based on a field param: 
f.fieldname.analyze=FieldTypeName

I'm not totally happy with the field names.  Suggestions?
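
To make the parameter convention above concrete, here is a minimal sketch in plain Java (no Solr classes) of how the two parameters might be read. It is not taken from the patch: the class name, the method name, and the use of a plain Map for request parameters are assumptions made for illustration. The ".analyze" suffix follows the comment above; the issue description uses ".analysis" instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the attached patch.
public class StoreAnalysisParams {
    static final String ENABLE_PARAM = "store.analysis";
    static final String PREFIX = "f.";
    // Assumption: suffix as written in the comment ("f.fieldname.analyze").
    static final String SUFFIX = ".analyze";

    /**
     * Returns field name -> field type name. The map is empty when
     * store.analysis is not "true", i.e. when the processor is skipped.
     */
    public static Map<String, String> fieldTypesFor(Map<String, String> params) {
        Map<String, String> result = new LinkedHashMap<>();
        if (!"true".equals(params.get(ENABLE_PARAM))) {
            return result;
        }
        for (Map.Entry<String, String> e : params.entrySet()) {
            String key = e.getKey();
            boolean matches = key.startsWith(PREFIX) && key.endsWith(SUFFIX)
                    && key.length() > PREFIX.length() + SUFFIX.length();
            if (matches) {
                // Strip "f." and ".analyze" to recover the field name.
                String field = key.substring(PREFIX.length(),
                        key.length() - SUFFIX.length());
                result.put(field, e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("store.analysis", "true");
        params.put("f.feature.analyze", "text_ws");
        System.out.println(fieldTypesFor(params)); // prints {feature=text_ws}
    }
}
```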

- - - - -

The one big issue I'm not sure how to deal with is stitching a multi-valued 
request into a single TokenStream.

Consider the input 
<add> <doc>
 <field name="feature">aaa bbb ccc</field>
 <field name="feature">bbb ccc ddd</field>
</doc></add> 

As is, if the FieldType has a 'RemoveDuplicates' filter, it won't remove the 
duplicates between the two values, because each input field gets its own Reader.

Any ideas for a way around this?

Can I extract the Tokenizer explicitly?
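
The difference between analyzing each value separately and analyzing one stitched stream can be illustrated with a small sketch in plain Java. It deliberately avoids Lucene classes: tokenize() stands in for a whitespace tokenizer, and dedupe() is a simplified stand-in that drops any token text already seen in its stream. It is not Lucene's filter, whose exact semantics may differ. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration, not part of the attached patch.
public class StitchSketch {
    /** Stand-in for a whitespace tokenizer: splits one value into tokens. */
    public static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Simplified dedupe: keeps the first occurrence of each token text. */
    public static List<String> dedupe(List<String> tokens) {
        return new ArrayList<>(new LinkedHashSet<>(tokens));
    }

    /** Each value gets its own stream, so duplicates across values survive. */
    public static List<String> analyzePerValue(List<String> values) {
        List<String> out = new ArrayList<>();
        for (String v : values) {
            out.addAll(dedupe(tokenize(v)));
        }
        return out;
    }

    /** All values feed one stream, so the dedupe step sees every token. */
    public static List<String> analyzeStitched(List<String> values) {
        List<String> all = new ArrayList<>();
        for (String v : values) {
            all.addAll(tokenize(v));
        }
        return dedupe(all);
    }

    public static void main(String[] args) {
        List<String> values = List.of("aaa bbb ccc", "bbb ccc ddd");
        System.out.println(analyzePerValue(values)); // [aaa, bbb, ccc, bbb, ccc, ddd]
        System.out.println(analyzeStitched(values)); // [aaa, bbb, ccc, ddd]
    }
}
```

In this sketch the stitching happens at the token level. Whether the same effect is achievable by chaining Readers or by extracting the Tokenizer from the FieldType is the open question above.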



> Store Analyzed token text from an incoming SolrInputDocument
> ------------------------------------------------------------
>
>                 Key: SOLR-314
>                 URL: https://issues.apache.org/jira/browse/SOLR-314
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Ryan McKinley
>         Attachments: SOLR-314-StoreAnalysis.patch
>
>
> This is an UpdateRequestProcessor that runs incoming fields through a Field 
> Analyzer and stores the output of each token as a field value.
> For example, if you have a field type defined:
>   <fieldType name="text_ws" class="solr.TextField" >
>       <analyzer>
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       </analyzer>
>   </fieldType>
> And send a request:
> /update?store.analysis=true&f.feature.analysis=text_ws
> <add> <doc>
>  <field name="feature">aaa bbb ccc</field>
> </doc></add>
> The returned document will look like:
> <doc>
>  <arr name="feature">
>   <str>aaa</str>
>   <str>bbb</str>
>   <str>ccc</str>
>  </arr>
> </doc>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
