Re: HTMLStripCharFilterFactory not working in Solr4?

2012-01-25 Thread Mike Hugo
> Steve
>
> > -Original Message-
> > From: Mike Hugo [mailto:m...@piragua.com]
> > Sent: Tuesday, January 24, 2012 3:56 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: HTMLStripCharFilterFactory not working in Solr4?
> >
> > Thanks for the

RE: HTMLStripCharFilterFactory not working in Solr4?

2012-01-25 Thread Steven A Rowe
lters have been working there all along.)
Steve
> -Original Message-
> From: Mike Hugo [mailto:m...@piragua.com]
> Sent: Tuesday, January 24, 2012 3:56 PM
> To: solr-user@lucene.apache.org
> Subject: Re: HTMLStripCharFilterFactory not working in Solr4?
>
> Thanks

Re: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Mike Hugo
Thanks for the responses, everyone. Steve, the test method you provided also works for me. However, when I try a more end-to-end test with the HTMLStripCharFilterFactory configured for a field, I still have the same problem. I attached a failing unit test and configuration to the following is

Re: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Yonik Seeley
Oops, I didn't read carefully enough to see that you wanted those constructs entirely stripped out. Given that you're seeing numbers indexed, this strongly indicates an escaping bug in the SolrJ client that must have been introduced at some point. I'll see if I can reproduce it in a unit test. -
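One way an escaping problem could leave digits in the index, sketched under a hypothetical assumption: if the `&` of an entity such as `&#174;` were escaped one time too many (arriving as `&amp;#174;`), a single entity-decoding pass would turn `&amp;` back into `&` and leave the literal text `&#174;`, whose digits a tokenizer then emits as a token. The helpers `decodeOnce` and `tokenize` are simplified stand-ins invented for illustration, not SolrJ or Lucene code, and this is not a diagnosis of the actual bug.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DoubleEscapeDemo {
    // Matches &amp; or a decimal numeric reference such as &#174;
    private static final Pattern ENTITY = Pattern.compile("&(?:amp|#([0-9]{1,7}));");

    // Hypothetical stand-in for a char filter's entity decoding: a single
    // left-to-right pass, so text produced by decoding is never decoded again.
    static String decodeOnce(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = "&";
            if (m.group(1) != null) {
                int codePoint = Integer.parseInt(m.group(1));
                // Out-of-range numbers are kept verbatim.
                replacement = Character.isValidCodePoint(codePoint)
                        ? new String(Character.toChars(codePoint))
                        : m.group();
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Crude tokenizer: anything that is not a letter or digit separates tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String escapedOnce = "Bose&#174;";       // what the filter should see
        String escapedTwice = "Bose&amp;#174;";  // hypothetical extra escaping pass
        System.out.println(tokenize(decodeOnce(escapedOnce)));   // [Bose]
        System.out.println(tokenize(decodeOnce(escapedTwice)));  // [Bose, 174]
    }
}
```

With one layer of escaping the registered-trademark sign decodes to a symbol the tokenizer drops; with the extra layer, `174` survives as a token.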

RE: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Michael Ryan
Try putting the HTMLStripCharFilterFactory before the StandardTokenizerFactory instead of after it. I vaguely recall being burned by something like this before. -Michael
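A toy model of why HTML has to be stripped before tokenization, assuming the usual Lucene analysis order in which char filters rewrite the raw character stream and only then does the tokenizer split it. `stripTags` and `tokenize` are hypothetical helpers, not Lucene's `HTMLStripCharFilter` or `StandardTokenizer`; the point is only that once markup has been split into tokens, the `<` and `>` that identified it are gone.

```java
import java.util.ArrayList;
import java.util.List;

public class AnalysisOrderDemo {
    // Hypothetical char filter: replaces anything that looks like a tag with a space.
    static String stripTags(String text) {
        return text.replaceAll("<[^>]*>", " ");
    }

    // Crude tokenizer: anything that is not a letter or digit separates tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Char filter first, on the raw characters; then tokenize.
    static List<String> charFilterThenTokenize(String raw) {
        return tokenize(stripTags(raw));
    }

    // Tokenize first, then try to strip tags from each token: too late,
    // because the tokenizer has already discarded the '<' and '>'.
    static List<String> tokenizeThenStrip(String raw) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(raw)) {
            String stripped = stripTags(token).trim();
            if (!stripped.isEmpty()) {
                out.add(stripped);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String raw = "<b>Bose</b> speakers";
        System.out.println(charFilterThenTokenize(raw)); // [Bose, speakers]
        System.out.println(tokenizeThenStrip(raw));      // [b, Bose, b, speakers]
    }
}
```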

RE: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Steven A Rowe
Hi Mike,
When I add the following test to TestHTMLStripCharFilterFactory.java on Solr trunk, it passes:
  public void testNumericCharacterEntities() throws Exception {
    final String text = "Bose® ™";  // |Bose® ™|
    HTMLStripCharFilterFactory htmlStripFactory = new HTMLStripCharFilterFactory()
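The decoding this test checks can be sketched in plain Java, assuming the original test input used numeric character references such as `&#174;` (®) and `&#8482;` (™), which the archive may have rendered as the symbols themselves. The `decodeNumericEntities` helper is hypothetical and covers only numeric references, not the full markup handling of `HTMLStripCharFilter`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericEntityDemo {
    // Decimal (&#174;) or hexadecimal (&#x2122;) numeric character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    // Hypothetical helper, not Lucene code: replaces each numeric character
    // reference with the character it names and leaves everything else alone.
    static String decodeNumericEntities(String input) {
        Matcher m = NUMERIC_REF.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement;
            try {
                int codePoint = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2));
                replacement = new String(Character.toChars(codePoint));
            } catch (IllegalArgumentException e) {
                // Overflowing or out-of-range numbers are kept verbatim.
                replacement = m.group();
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Bose&#174; &#8482;";
        // Wrapped in pipes like the |Bose® ™| comment in the test.
        System.out.println("|" + decodeNumericEntities(text) + "|");
    }
}
```

Named entities such as `&amp;` pass through untouched here; a real HTML-stripping filter also handles those and removes tags.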

Re: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Mike Hugo
Thanks for the response, Yonik. Interestingly enough, changing to the LegacyHTMLStripCharFilterFactory does NOT solve the problem; in fact, I get the same result. I can see that the LegacyHTMLStripCharFilterFactory is being applied at startup:
Jan 24, 2012 1:25:29 PM org.apache.solr.util.plugin.

Re: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Yonik Seeley
You can use LegacyHTMLStripCharFilterFactory to get the previous behavior. See https://issues.apache.org/jira/browse/LUCENE-3690 for more details.
-Yonik
http://www.lucidimagination.com
On Tue, Jan 24, 2012 at 1:34 PM, Mike Hugo wrote:
> We recently updated to the latest build of Solr4 and eve