Re: how to use HTMLStripCharFilter in solrJ?

2018-07-06 Thread Arturas Mazeika
Hi Ahmet, Thanks a lot for the post, details and info. I've started trying out all the options that you suggested, and I must say that I am not able to reproduce my error: even the code that I posted works with flying colors. I am puzzled. Cheers, Arturas On Thu, Jul 5, 2

Re: how to use HTMLStripCharFilter in solrJ?

2018-07-06 Thread Arturas Mazeika
Hi Alex, I suppose the explanation in [0] (references) did not shed enough light on the reasons, so I'll try to give additional and more detailed arguments. I can take the HTML text, strip it of HTML tags and index the content. I can store the original text in Solr as well. Storing the inte

Re: how to use HTMLStripCharFilter in solrJ?

2018-07-05 Thread Ahmet Arslan
Hi Arturas, Here are some things to try: 1) HTMLStripCharFilter stripper = new HTMLStripCharFilter(strReader.markSupported() ? strReader : new BufferedReader(strReader)) 2) Consider using the HTML strip update processor factory. 3) Create a custom Lucene analyzer using html strip char filter a
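Suggestion 1 can be fleshed out into a small client-side helper. This is a minimal sketch, assuming Lucene's analysis-common module (`lucene-analyzers-common` up to 8.x, `lucene-analysis-common` in 9.x) is on the classpath; the class and method names (`HtmlStripExample`, `stripHtml`) are hypothetical. It wraps the source in a `BufferedReader` only when `mark()` is unsupported, as in the suggestion, then drains the char filter into a string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter;

public class HtmlStripExample {

    // Hypothetical helper: returns the text of `html` with markup removed
    // and character entities (e.g. &amp;) decoded by HTMLStripCharFilter.
    static String stripHtml(String html) throws IOException {
        Reader source = new StringReader(html);
        // Wrap only if the reader cannot mark/reset (suggestion 1 above).
        Reader input = source.markSupported() ? source : new BufferedReader(source);
        StringBuilder out = new StringBuilder();
        // HTMLStripCharFilter is a java.io.Reader, so try-with-resources closes it.
        try (HTMLStripCharFilter stripper = new HTMLStripCharFilter(input)) {
            char[] buf = new char[1024];
            int n;
            while ((n = stripper.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(stripHtml("<p>Hello <b>Solr</b> &amp; Lucene</p>"));
    }
}
```

The stripped string could then be sent to Solr in a SolrJ document field, while the original HTML goes into a separate stored field. Note that block-level tags may be replaced by line breaks rather than removed outright, so the output can carry extra whitespace.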

Re: how to use HTMLStripCharFilter in solrJ?

2018-07-05 Thread Alexandre Rafalovitch
I am confused. Why don't you just add the CharFilter definition to the field type you need? You seem to be trying to do it completely on the client side? Not sure. Regards, Alex On Thu, Jul 5, 2018, 2:53 AM Arturas Mazeika, wrote: > Hi Solr Folk, > > What would be the easiest way to use som
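Alex's suggestion puts the char filter into a field type's analyzer chain on the server. If the same chain is wanted on the client (Ahmet's suggestion 3), Lucene's `CustomAnalyzer` builder can assemble an equivalent one. This is a minimal sketch, assuming Lucene 5 or later with the analysis-common module on the classpath; the chain (HTML strip, standard tokenizer, lowercase), the field name `"content"` and the helper names are illustrative choices, not taken from the thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class HtmlStripAnalyzerExample {

    // Client-side analogue of a field type analyzer:
    // HTML strip char filter -> standard tokenizer -> lowercase filter.
    // Components are looked up by their SPI names.
    static Analyzer buildAnalyzer() throws IOException {
        return CustomAnalyzer.builder()
                .addCharFilter("htmlStrip")
                .withTokenizer("standard")
                .addTokenFilter("lowercase")
                .build();
    }

    // Runs `text` through the analyzer and collects the emitted terms.
    // "content" is a hypothetical field name.
    static List<String> analyze(Analyzer analyzer, String text) throws IOException {
        List<String> tokens = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream("content", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                       // required before incrementToken()
            while (ts.incrementToken()) {
                tokens.add(term.toString());
            }
            ts.end();
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = buildAnalyzer()) {
            System.out.println(analyze(analyzer, "<p>Hello <b>Solr</b> &amp; Lucene</p>"));
        }
    }
}
```

The trade-off is where analysis lives: in the schema, every client gets the same behavior automatically; on the client, the application controls it but must keep it in sync with the index.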

how to use HTMLStripCharFilter in solrJ?

2018-07-04 Thread Arturas Mazeika
Hi Solr Folk, What would be the easiest way to use some of the Solr and Lucene components in SolrJ? I am amazed at how much thought and careful engineering went into some individual components to cover the wild real world effectively. And I wonder whether one could re-use some of them in othe