Well, I don't remember its specific name; I just wrote that because it sounded close :)
There is a list of them here though:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

-Reece

On Fri, Feb 22, 2008 at 2:10 PM, Paul deGrandis <[EMAIL PROTECTED]> wrote:
> Thanks!
>
> Does Solr include an HTMLTokenFilterFactory?
>
> Paul
>
>
> On 2/22/08, Reece <[EMAIL PROTECTED]> wrote:
> > I did this as well, but found problems when searching (tags in between
> > words caused searching nightmares). I recommend stripping out all the
> > tags using the HTMLTokenFilterFactory or your own regex when indexing,
> > and storing the actual HTML in an actual database.
> >
> > If you really want to store the HTML though, you can use CDATA in the
> > XML like this:
> >
> > <?xml version="1.0" encoding="UTF-8" ?>
> > <add>
> >   <doc>
> >     <field name="id">123</field>
> >     <field name="title"><![CDATA[yourbightmlstring]]></field>
> >   </doc>
> > </add>
> >
> > The CDATA section basically says that anything between its delimiters
> > is taken literally as the field value. It only breaks if your HTML
> > string contains "]]>", which ends the CDATA section.
> >
> > -Reece
> >
> > On Fri, Feb 22, 2008 at 12:19 PM, Paul deGrandis
> > <[EMAIL PROTECTED]> wrote:
> > > Hi all,
> > >
> > > I'm working on a Solr app that pulls HTML from an embedded JavaScript
> > > WYSIWYG editor, and I need to index on the content, but store and
> > > reproduce the HTML. The problem I have is that when I try to add and
> > > commit, the HTML gets interpreted as XML. Is the proper way to do
> > > this to create an HTMLTokenFilterFactory? And if so, is there a
> > > collection of plugins (like filters and such) that someone can point
> > > me to?
> > >
> > > Regards,
> > > Paul
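
The two suggestions in this thread (strip the tags for indexing, and wrap the raw HTML in CDATA for storage) can be sketched client-side in Python, roughly as below. This is only an illustration under some assumptions: the `id` and `title` fields come from the example above, while the `title_text` field and all of the function names are hypothetical, and the tag stripping uses Python's standard html.parser module rather than any Solr filter. The `cdata` helper works around the "]]>" problem by splitting that sequence across two adjacent CDATA sections.

```python
import re
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Return the text of `html` with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join with a space so words on either side of a tag stay separate.
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def cdata(text):
    """Wrap `text` in CDATA, splitting any ']]>' so it cannot end the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def build_add_doc(doc_id, html):
    """Build an <add> message: raw HTML in `title`, stripped text in `title_text`.

    `title_text` is a hypothetical field name; it would need to exist in the schema.
    """
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<add>\n"
        "  <doc>\n"
        f'    <field name="id">{escape(str(doc_id))}</field>\n'
        f'    <field name="title">{cdata(html)}</field>\n'
        f'    <field name="title_text">{escape(strip_tags(html))}</field>\n'
        "  </doc>\n"
        "</add>\n"
    )


if __name__ == "__main__":
    print(build_add_doc(123, "<p>Hello <b>big</b> world, even with ]]> inside</p>"))
```

Joining the text nodes with a space keeps words on either side of a tag apart, at the cost of splitting a word that has inline markup in the middle of it; whether that trade-off is right depends on what the editor emits.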