How to copy and extract information from a multi-line text before the tokenizer

Michael Kliewe Tue, 23 Aug 2011 03:29:44 -0700

Hello all,

I have a custom schema which has a few fields, and I would like to create a new 
field in the schema that indexes only one specific line of another field. 
Let's use this example:


The field AllData (TextField) contains, for example, this data:
Title: exampleTitle of the book
Author: Example Author
Date: 01.01.1980

Each line is separated by a line break.
I now need a new field named OnlyAuthor which contains only the Author 
information, so I can search and facet on specific authors. I added this 
to my schema:

<fieldType name="authorField" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="^.*\nAuthor: (.*?)\n.*$" replacement="$1" replace="all" />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="^.*\nAuthor: (.*?)\n.*$" replacement="$1" replace="all" />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="OnlyAuthor" type="authorField" indexed="true" stored="true" />

<copyField source="AllData" dest="OnlyAuthor"/>


But this is not working: the new OnlyAuthor field still contains all of the 
data, because the regex didn't match. I need just "Example Author" in that 
field (I think) to be able to search and facet on author information only.
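
To narrow this down, the pattern can be tried outside Solr. The following is 
only a minimal sketch, assuming the char filter passes the pattern to 
java.util.regex unchanged and that the line breaks are plain "\n". The class 
name AuthorRegexCheck, the helper extract, the five-line sample and the "(?s)" 
variant are made up for illustration and are not part of my schema.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for testing the pattern; not part of Solr.
class AuthorRegexCheck {
    // The pattern exactly as written in the schema.
    static final String ORIGINAL = "^.*\\nAuthor: (.*?)\\n.*$";
    // Same pattern with the DOTALL flag, so '.' also matches line breaks.
    static final String DOTALL = "(?s)" + ORIGINAL;

    // Mimics replace="all": every match is replaced by capture group 1.
    static String extract(String regex, String input) {
        return Pattern.compile(regex).matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        String threeLines = "Title: exampleTitle of the book\n"
                + "Author: Example Author\n"
                + "Date: 01.01.1980";
        // Made-up variant with one extra line before and one after.
        String fiveLines = "Id: 1\n" + threeLines + "\nPublisher: Example";

        System.out.println(extract(ORIGINAL, threeLines));                  // Example Author
        System.out.println(extract(ORIGINAL, fiveLines).equals(fiveLines)); // true (no match)
        System.out.println(extract(DOTALL, fiveLines));                     // Example Author
    }
}
```

With these inputs the original pattern extracts the author from the three-line 
sample, but leaves the five-line variant untouched, because "." does not match 
line breaks by default. The "(?s)" version extracts the author in both cases. 
This only exercises the regex; it says nothing about what Solr returns as the 
stored value of the field.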

I don't know where the problem is. Perhaps one of you can give me a hint, or 
suggest a totally different way to extract a single line from this multi-line 
text.

Kind regards and thanks for any help
Michael
