If you use the stripping char filter (HTMLStripCharFilterFactory) in the
field's analyzer, the stored text is the original HTML; only the indexed
tokens are stripped. You can then highlight text inside the HTML. If you
use the DIH HTMLStripTransformer instead, the stripped text is what gets
stored, so it will be somewhat smaller. You can highlight the stripped
text, but you can't highlight the original HTML.
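A minimal sketch of the char-filter approach in schema.xml, assuming Solr 1.4 or later (where `solr.HTMLStripCharFilterFactory` is available). It reuses the analyzer chain from the question below with the char filter added in front; the type name `text_html` is hypothetical.

```xml
<!-- schema.xml: hypothetical "text_html" type. The char filter strips tags
     before tokenizing; the stored value keeps the original HTML. -->
<fieldType name="text_html" class="solr.TextField" omitNorms="false">
  <analyzer>
    <!-- runs before the tokenizer; offsets are corrected back to the raw HTML -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- catenateNumbers (plural) is the attribute name the factory reads -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<!-- stored="true" lets the highlighter return excerpts of the original HTML -->
<field name="description" type="text_html" indexed="true" stored="true"/>
```

With this type, a query using `hl=true&hl.fl=description` highlights fragments of the stored HTML. For the DIH alternative you would instead set `transformer="HTMLStripTransformer"` on the entity and `stripHTML="true"` on the column in data-config.xml, and keep the plain `text` type; the stored value is then the stripped text.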

On Wed, May 26, 2010 at 7:26 PM, Blargy <zman...@hotmail.com> wrote:
>
> We have user-entered item listings that have a title and contain HTML in
> their descriptions. I would like to index the full descriptions (minus the
> HTML, which I'm stripping out via the DIH HTMLStripTransformer) so I can
> search across them as well as perform highlighting/excerpting.
>
> Can someone recommend a good fieldType and field for this? The
> following is what I've been using up to this point for both fields (title
> and description).
>
>   <fieldType name="text" class="solr.TextField" omitNorms="false">
>      <analyzer>
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.WordDelimiterFilterFactory"
>                generateWordParts="1"
>                generateNumberParts="1"
>                catenateWords="1"
>                catenateNumbers="1"
>                catenateAll="1"
>                splitOnCaseChange="1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"/>
>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>      </analyzer>
>    </fieldType>
>
> Should I be using the DIH HTMLStripTransformer or HTMLStripCharFilterFactory
> to remove the html? Which one is faster?
>
> Any suggestions on my fieldType?
>
> Thanks a lot!
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Need-guidance-on-schema-type-tp846923p846923.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Lance Norskog
goks...@gmail.com
