Re: How does HTMLStripWhitespaceTokenizerFactory work?

2007-06-12 Thread Thierry Collogne
Ok. Thanks for the clarification. We will do the stripping before the indexing.
On 11/06/07, Chris Hostetter [EMAIL PROTECTED] wrote:
> : Ok. Is it possible to get back the content without the html tags?
> Solr never does anything to modify the stored value of a field, so you'd really need to

Re: How does HTMLStripWhitespaceTokenizerFactory work?

2007-06-11 Thread Thierry Collogne
Ok. Is it possible to get back the content without the html tags?
On 08/06/07, Yonik Seeley [EMAIL PROTECTED] wrote:
> On 6/8/07, Thierry Collogne [EMAIL PROTECTED] wrote:
> > I am trying to use the solr.HTMLStripWhitespaceTokenizerFactory analyzer with no luck.
> > [...]
> > Is this normal? Shouldn't

Re: How does HTMLStripWhitespaceTokenizerFactory work?

2007-06-11 Thread Mike Klaas
On 11-Jun-07, at 3:54 AM, Thierry Collogne wrote:
> Ok. Is it possible to get back the content without the html tags?
Well, it isn't stored anywhere in Solr. It's best to think of Lucene/Solr as two systems: the indexer applies a tokenization transformation to the data and creates an
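The split described in this reply, between what the indexer tokenizes and what is kept as the stored value, can be pictured with a toy model. This is only a sketch and not Solr or Lucene code: `ToyIndex`, `analyze`, and the sample document are hypothetical names invented here, and the regex tag stripping is a crude stand-in for a real analyzer.

```python
import re


def analyze(raw):
    """Hypothetical stand-in for an analyzer: drop tags, split on whitespace."""
    return re.sub(r"<[^>]*>", " ", raw).split()


class ToyIndex:
    """Toy model (an assumption, not Solr internals): tokens go to the
    index, while the stored value is kept exactly as it was submitted."""

    def __init__(self):
        self.stored = {}    # doc id -> raw value, never modified
        self.postings = {}  # token -> set of doc ids

    def add(self, doc_id, raw):
        self.stored[doc_id] = raw
        for token in analyze(raw):
            self.postings.setdefault(token, set()).add(doc_id)

    def search(self, term):
        # Matching uses the analyzed tokens; results return the stored value.
        return [self.stored[d] for d in sorted(self.postings.get(term, ()))]


index = ToyIndex()
index.add(1, 'test <a href="test">link</a> post')  # assumed sample input
print(index.search("link"))  # the hit still carries the original markup
print(index.search("href"))  # no hit: the tag text was never indexed
```

In this model a search on a word inside the markup-free text matches, but the value handed back is still the raw input, which is the behaviour the question ran into.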

Re: How does HTMLStripWhitespaceTokenizerFactory work?

2007-06-11 Thread Chris Hostetter
: Ok. Is it possible to get back the content without the html tags?
Solr never does anything to modify the stored value of a field, so you'd really need to send Solr the value after stripping the HTML to get this to work. Internally, the HTMLStripWhitespaceTokenizerFactory does the HTML
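One way to follow this advice is to strip the markup on the client before the document is sent for indexing. The sketch below uses only Python's standard `html.parser` module; `strip_html` and `_TextExtractor` are hypothetical helper names, the sample input is assumed, and the output is not guaranteed to match what HTMLStripWhitespaceTokenizerFactory produces internally.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of a document; tags are discarded."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        # Join chunks with a space and collapse runs of whitespace.
        # Caveat: a word split by inline tags (b<b>old</b>) becomes "b old".
        return " ".join(" ".join(self._chunks).split())


def strip_html(markup):
    """Return the visible text of an HTML fragment (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text()


# Assumed sample value; the result is what would be sent as the field value.
print(strip_html('test <a href="test">link</a> post'))
```

With the stripping done before submission, the stored value and the indexed tokens are both free of markup, so search results return plain text.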

How does HTMLStripWhitespaceTokenizerFactory work?

2007-06-08 Thread Thierry Collogne
Hello,
I am trying to use the solr.HTMLStripWhitespaceTokenizerFactory analyzer with no luck. I have a field content that contains the following:
<field name="content"><![CDATA[test <a href="test">link</a> post]]></field>
When I do a search I get the following result

Re: How does HTMLStripWhitespaceTokenizerFactory work?

2007-06-08 Thread Yonik Seeley
On 6/8/07, Thierry Collogne [EMAIL PROTECTED] wrote:
> I am trying to use the solr.HTMLStripWhitespaceTokenizerFactory analyzer with no luck.
> [...]
> Is this normal? Shouldn't the html code and the white spaces be removed from the field?
For indexing purposes, yes. The stored field you get back