If you want the “kept” words themselves stored, consider the trick that
example/files uses for URL/e-mail extraction, described here (note the related
fix in the patch in the JIRA issue mentioned):

   https://lucidworks.com/blog/2016/01/27/example_files/
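
To make the stored-versus-indexed distinction concrete, here is a rough sketch in plain Python. It is not Solr code: the tokenizer is a crude stand-in, and the function names, the sample description, and the keep list are all made up for illustration. It only shows why a stored field returns the original text while the analyzed terms are filtered, and how writing the analyzed output into its own stored field gives you the kept words back.

```python
import re

def tokenize(text):
    # Crude stand-in for a word tokenizer: split on runs of non-alphanumerics.
    return [tok for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]

def keep_word_filter(tokens, keep_words, ignore_case=True):
    # Keep only the tokens that appear in the keep list.
    if ignore_case:
        wanted = {w.lower() for w in keep_words}
        return [t for t in tokens if t.lower() in wanted]
    wanted = set(keep_words)
    return [t for t in tokens if t in wanted]

def index_document(description, keep_words):
    kept = keep_word_filter(tokenize(description), keep_words)
    return {
        # What a stored field hands back: the raw input, untouched by analysis.
        "stored_value": description,
        # What queries match against: only the analyzed (kept) terms.
        "indexed_terms": kept,
        # The idea behind the trick: put the analyzed output in its own stored field.
        "tags_stored": kept,
    }

# Hypothetical keep list and description, for illustration only.
doc = index_document("Titanium bone screw, 4.5mm", ["screw", "plate"])
no_match = index_document("Sterile gauze pad", ["screw", "plate"])
```

In this sketch `doc["stored_value"]` is still the full description even though only one term survived the filter, which matches the behaviour described in the quoted message below.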




> On Feb 1, 2016, at 3:23 PM, John Blythe <j...@curvolabs.com> wrote:
> 
> i immediately realized after sending that i'd had stored="true" in the
> field's config and that it was storing the original data, not the processed
> data. silly me, thanks anyway!
> 
> -- 
> *John Blythe*
> Product Manager & Lead Developer
> 
> 251.605.3071 | j...@curvolabs.com
> www.curvolabs.com
> 
> 58 Adams Ave
> Evansville, IN 47713
> 
> On Mon, Feb 1, 2016 at 3:18 PM, John Blythe <j...@curvolabs.com> wrote:
> 
>> hi all,
>> 
>> i'm having trouble with what would seem to be a pretty straightforward
>> filter.
>> 
>> i'm trying to 'tag' documents based off of a list of relevant words that a
>> description field may contain. if the data contains any of the words then
>> this field is populated with it and acts as a quick reference for
>> relevant/bucketed documents.
>> 
>> i receive no errors when reloading the core or indexing the data. each
>> document, however, has its description listed in this tag field *even if
>> none of the targeted words are in it.*
>> 
>> here's the analyzer, tokenizer, and filter:
>> 
>> <analyzer>
>>        <tokenizer class="solr.StandardTokenizerFactory" />
>>        <filter class="solr.KeepWordFilterFactory" words="tags.txt" ignoreCase="true"/>
>> </analyzer>
>> 
>> to add to the confusion, when i run test data through both of the
>> appropriate FieldName/FieldType in the Analysis UI I get the expected
>> results: the non-targeted words are left out of processing.
>> 
>> thanks for any info/help-
>> 
