And if you want the “kept” words themselves stored, consider the trick used in example/files for URL/e-mail extraction, described here (note the related fix in the patch in the JIRA issue mentioned):
https://lucidworks.com/blog/2016/01/27/example_files/

> On Feb 1, 2016, at 3:23 PM, John Blythe <j...@curvolabs.com> wrote:
>
> i immediately realized after sending that i'd had stored="true" in the
> field's config and that it was storing the original data, not the processed
> data. silly me, thanks anyway!
>
> --
> *John Blythe*
> Product Manager & Lead Developer
>
> 251.605.3071 | j...@curvolabs.com
> www.curvolabs.com
>
> 58 Adams Ave
> Evansville, IN 47713
>
> On Mon, Feb 1, 2016 at 3:18 PM, John Blythe <j...@curvolabs.com> wrote:
>
>> hi all,
>>
>> i'm having trouble with what would seem to be a pretty straightforward
>> filter.
>>
>> i'm trying to 'tag' documents based off of a list of relevant words that a
>> description field may contain. if the data contains any of the words then
>> this field is populated with it and acts as a quick reference for
>> relevant/bucketed documents.
>>
>> i receive no errors when reloading the core or indexing the data. each
>> document, however, has its description listed in this tag field *even if
>> none of the targeted words are in it.*
>>
>> here's the analyzer, tokenizer, and filter:
>>
>> <analyzer>
>>   <tokenizer class="solr.StandardTokenizerFactory" />
>>   <filter class="solr.KeepWordFilterFactory" words="tags.txt"
>>           ignoreCase="true"/>
>> </analyzer>
>>
>> to add to the confusion, when i run test data through both of the
>> appropriate FieldName/FieldType in the Analysis UI I get the expected
>> results: the non-targeted words are left out of processing.
>>
>> thanks for any info/help-
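
To make the stored-versus-indexed point concrete: a stored field holds the value exactly as it was sent, and the analyzer chain only shapes the indexed terms. So to get the kept words into a stored field, they have to be computed before the document reaches that field. Below is a minimal client-side sketch in Python of that idea. It is not the example/files approach itself. The regex tokenizer is only a rough stand-in for StandardTokenizer, and the field names `description` and `tags` and the word list are made up for illustration.

```python
import re


def keep_words(text, keep):
    """Return the tokens of `text` that appear in `keep`, ignoring case.

    Rough approximation of StandardTokenizer + KeepWordFilter with
    ignoreCase="true". Unlike the filter, this also drops repeats,
    since the result is meant to be used as a list of tags.
    """
    wanted = {w.lower() for w in keep}
    # Stand-in tokenizer: lowercase, then split into runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    kept = []
    for tok in tokens:
        if tok in wanted and tok not in kept:
            kept.append(tok)
    return kept


# Hypothetical document and keep list (in practice, the contents of tags.txt).
doc = {"id": "1", "description": "Titanium spinal Screw, 6.5mm"}
doc["tags"] = keep_words(doc["description"], ["screw", "plate", "titanium"])
```

The document would then be indexed with `tags` as an ordinary multi-valued stored field, so what comes back in search results is the filtered list instead of the full description.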