Re: Indexing content, storing html

Reece Fri, 22 Feb 2008 11:32:46 -0800

Well, I don't remember its specific name; I just wrote that
because it sounded close :)


There is a list of them here though:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

-Reece



On Fri, Feb 22, 2008 at 2:10 PM, Paul deGrandis
<[EMAIL PROTECTED]> wrote:
> Thanks!
>
>  Does Solr include an HTMLTokenFilterFactory?
>
>  Paul
>
>
>
>  On 2/22/08, Reece <[EMAIL PROTECTED]> wrote:
>  > I did this as well, but ran into problems when searching (tags in
>  >  between words caused searching nightmares).  I recommend stripping out
>  >  all the tags using the HTMLTokenFilterFactory or your own regex when
>  >  indexing, and storing the original HTML in a separate database.
>  >
>  >  If you really want to store the HTML though, you can use CDATA in the
>  >  XML like this:
>  >
>  >  <?xml version="1.0" encoding="UTF-8" ?>
>  >  <add>
>  >      <doc>
>  >          <field name="id">123</field>
>  >          <field name="title"><![CDATA[yourbightmlstring]]></field>
>  >      </doc>
>  >  </add>
>  >
>  >  The CDATA section basically says that anything between its delimiters
>  >  is taken literally as the field value.  It only breaks if your HTML
>  >  string contains "]]>", which ends the CDATA section early.
>  >
>  >
>  >  -Reece
>  >
>  >
>  >
>  >
>  >  On Fri, Feb 22, 2008 at 12:19 PM, Paul deGrandis
>  >  <[EMAIL PROTECTED]> wrote:
>  >  > Hi all,
>  >  >
>  >  >  I'm working on a Solr app that pulls HTML from an embedded JavaScript
>  >  >  WYSIWYG editor, and I need to index on the content, but store and
>  >  >  reproduce the HTML.  The problem I have is when I try to add and
>  >  >  commit, the HTML gets interpreted as XML.  Is the way to do this
>  >  >  properly to create an HTMLTokenFilterFactory?  And if so, is there a
>  >  >  collection of plugins (like filters and such) that someone can point
>  >  >  me to?
>  >  >
>  >  >  Regards,
>  >  >  Paul
>  >  >
>  >
>
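
A note on the "your own regex" suggestion quoted above: as a rough
sketch only, the tags can also be dropped before indexing with Python's
standard html.parser module rather than a regex.  This is not Solr's own
filter; the names strip_tags and _TextExtractor are made up for this
illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops tags (hypothetical helper, not a Solr API)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self._chunks.append(data)

    def text(self):
        # Join with spaces so words separated only by tags do not run together,
        # then collapse runs of whitespace into single spaces.
        return " ".join(" ".join(self._chunks).split())


def strip_tags(html):
    """Return the plain text of an HTML fragment, suitable for an indexed field."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flushes any text still buffered by the parser
    return parser.text()
```

The stripped text would go into the indexed field while the original
HTML is kept elsewhere.  Joining the chunks with spaces also splits a
word that has a tag in its middle (wor<b>ld</b> becomes "wor ld"), and
the contents of <script> and <style> elements are kept as text, so a
real setup may need more care.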

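On the "]]>" caveat quoted above: one common workaround is to split
every "]]>" across two adjacent CDATA sections, so the sequence never
appears inside a single section.  The sketch below assumes the id and
title field names from the example in the thread; cdata and
build_add_xml are hypothetical helper names, not part of Solr.

```python
from xml.sax.saxutils import escape


def cdata(text):
    """Wrap text in CDATA, splitting any "]]>" so it cannot end the section early."""
    # "]]>" becomes "]]" + end of section + start of new section + ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def build_add_xml(doc_id, html):
    """Build an <add> message like the one in the thread (field names assumed)."""
    return (
        '<?xml version="1.0" encoding="UTF-8" ?>\n'
        "<add><doc>"
        f'<field name="id">{escape(str(doc_id))}</field>'
        f'<field name="title">{cdata(html)}</field>'
        "</doc></add>"
    )
```

An XML parser joins the adjacent CDATA sections back together, so the
field value should come out identical to the original HTML string.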