removal of deprecated HtmlStrip*Tokenizer factories

2010-03-15 Thread Robert Muir
Hello, Is there any concern with removing the deprecated HtmlStrip*Tokenizer factories? These can be done with CharFilter instead, and they have some problems with Lucene's trunk. If no one objects, I'd like to remove these in the branch. Otherwise, Uwe tells me there is some way to make them

Re: removal of deprecated HtmlStrip*Tokenizer factories

2010-03-15 Thread Paul Borgermans
On Mon, Mar 15, 2010 at 9:39 PM, Robert Muir rcm...@gmail.com wrote: Hello, Is there any concern with removing the deprecated HtmlStrip*Tokenizer factories? Maybe a communication issue: you need to read the source code or javadocs to know it is deprecated. These can be done with CharFilter

Re: removal of deprecated HtmlStrip*Tokenizer factories

2010-03-15 Thread Mark Miller
On 03/15/2010 05:24 PM, Paul Borgermans wrote: On Mon, Mar 15, 2010 at 9:39 PM, Robert Muir rcm...@gmail.com wrote: Hello, Is there any concern with removing the deprecated HtmlStrip*Tokenizer factories? Maybe a communication issue, you need to read the source code or javadocs to

Re: removal of deprecated HtmlStrip*Tokenizer factories

2010-03-15 Thread Shalin Shekhar Mangar
On Tue, Mar 16, 2010 at 2:09 AM, Robert Muir rcm...@gmail.com wrote: Hello, Is there any concern with removing the deprecated HtmlStrip*Tokenizer factories? These can be done with CharFilter instead, and they have some problems with Lucene's trunk. If no one objects, I'd like to remove

Re: removal of deprecated HtmlStrip*Tokenizer factories

2010-03-15 Thread Robert Muir
On Mon, Mar 15, 2010 at 5:30 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Is there a way we can fix LUCENE-2098 too? I think this is good to fix, yet removing the deprecations is unrelated to this slowdown. The deprecated functionality (HtmlStrip*Tokenizer) is implemented in terms

Re: removal of deprecated HtmlStrip*Tokenizer factories

2010-03-15 Thread Chris Hostetter
: Is there any concern with removing the deprecated HtmlStrip*Tokenizer factories? I'm not averse to gutting *internal* deprecated classes on just about any release (requiring plugin writers to deal with the deprecation), but if it's possible to keep things working for users with no java

Re: removal of deprecated HtmlStrip*Tokenizer factories

2010-03-15 Thread Robert Muir
On Mon, Mar 15, 2010 at 7:18 PM, Chris Hostetter hossman_luc...@fucit.org wrote: In the case of these factories: can't we eliminate the Html*Tokenizers themselves, but make the *factories* return the necessary *Tokenizer wrapped in an HtmlStripCharFilter? They would not be able to re-use if

Re: removal of deprecated HtmlStrip*Tokenizer factories

2010-03-15 Thread Chris Hostetter
: They would not be able to re-use if you did this, because when you call reset(Reader) on them, the Reader would not be wrapped. Hmmm... I'm not sure I understand how any declared CharFilter/Tokenizer combo will be able to deal with this any better, but I'll take your word for it. Kill it

Re: removal of deprecated HtmlStrip*Tokenizer factories

2010-03-15 Thread Robert Muir
On Mon, Mar 15, 2010 at 7:25 PM, Chris Hostetter hossman_luc...@fucit.org wrote: Hmmm... I'm not sure I understand how any declared CharFilter/Tokenizer combo will be able to deal with this any better, but I'll take your word for it. You can see this behavior in SolrAnalyzer's
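
The re-use problem raised in this thread (a factory wraps the Reader once, but a later reset(Reader) hands the tokenizer a Reader that was never wrapped) can be illustrated with a small standalone sketch. Everything below is a hypothetical stand-in written for illustration: StripTagsReader, SimpleTokenizer and create() are not Lucene or Solr classes, and the tag stripping is deliberately naive. It only assumes that reset(Reader) swaps the tokenizer's input, as described in the messages above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-ins only; none of these are Lucene or Solr classes.
class ReuseSketch {

    /** Toy char filter: drops everything between '<' and '>'. */
    static class StripTagsReader extends Reader {
        private final Reader in;
        private boolean inTag = false;

        StripTagsReader(Reader in) { this.in = in; }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = 0;
            while (n < len) {
                int c = in.read();
                if (c == -1) break;               // underlying input exhausted
                if (c == '<') inTag = true;
                else if (c == '>') inTag = false;
                else if (!inTag) buf[off + n++] = (char) c;
            }
            return (n == 0 && len > 0) ? -1 : n;  // -1 signals end of stream
        }

        @Override
        public void close() throws IOException { in.close(); }
    }

    /** Toy tokenizer: just returns whatever text its current input yields. */
    static class SimpleTokenizer {
        private Reader input;

        SimpleTokenizer(Reader input) { this.input = input; }

        /** Re-use path: the new Reader is stored as-is, with no wrapping. */
        void reset(Reader input) { this.input = input; }

        String readAll() {
            try {
                StringBuilder sb = new StringBuilder();
                char[] buf = new char[64];
                int n;
                while ((n = input.read(buf, 0, buf.length)) != -1) {
                    sb.append(buf, 0, n);
                }
                return sb.toString();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Factory that wraps the Reader once, at creation time only. */
    static SimpleTokenizer create(Reader r) {
        return new SimpleTokenizer(new StripTagsReader(r));
    }

    static String firstUse(String text) {
        return create(new StringReader(text)).readAll();
    }

    static String reuse(String first, String second) {
        SimpleTokenizer t = create(new StringReader(first));
        t.readAll();
        t.reset(new StringReader(second));        // bypasses the factory's wrapping
        return t.readAll();
    }

    public static void main(String[] args) {
        System.out.println(firstUse("<b>hi</b> there"));        // prints "hi there"
        System.out.println(reuse("<b>hi</b>", "<i>again</i>")); // prints "<i>again</i>"
    }
}
```

In this sketch the second document comes back with its markup intact, because only the code path that calls reset(Reader) could re-apply the filter. Whatever owns the re-use therefore has to wrap the Reader on every call, which the factory alone cannot do.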