Re: Multi tokenizer

2009-08-06 Thread Koji Sekiguchi
Chris Hostetter wrote:
: I need to tokenize my field on whitespaces, html, punctuation, apostrophe
: but if I use HTMLStripStandardTokenizerFactory it strips only html
: but no apostrophes
you might consider using one of the HTML Tokenizers, and then use a PatternReplaceFilterFactory ...

Re: Multi tokenizer

2008-12-15 Thread Antonio Zippo
>>: I need to tokenize my field on whitespaces, html, punctuation, apostrophe
>>
>>: but if I use HTMLStripStandardTokenizerFactory it strips only html
>>: but no apostrophes
> you might consider using one of the HTML Tokenizers, and then use a
> PatternReplaceFilterFactory ... or if you kno

Re: Multi tokenizer

2008-12-14 Thread Chris Hostetter
: I need to tokenize my field on whitespaces, html, punctuation, apostrophe
: but if I use HTMLStripStandardTokenizerFactory it strips only html
: but no apostrophes
you might consider using one of the HTML Tokenizers, and then use a PatternReplaceFilterFactory ... or if you know java write

Multi tokenizer

2008-12-10 Thread Antonio Zippo
Hi all, I need to tokenize my field on whitespace, HTML, punctuation, and apostrophes, but if I use HTMLStripStandardTokenizerFactory it strips only the HTML and not the apostrophes. If I use PatternTokenizerFactory, I don't know if I can create a pattern to tokenize all of these characters...(html, apo
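
For what it's worth, the splitting behaviour asked about above can be sketched outside Solr with a plain regular expression. This is only an illustration in Python, not a Solr configuration: the `tokenize` helper and both patterns are hypothetical names invented here, and the tag matcher is deliberately crude (an assumption that the markup is simple, with no `>` inside attributes).

```python
import re

# Assumption: tags are simple enough that "<...>" matching is adequate.
# Real-world HTML would need a proper parser or an HTML-stripping tokenizer.
TAG_RE = re.compile(r"<[^>]+>")

# Split on any run of non-alphanumeric characters: whitespace,
# punctuation, apostrophes, and underscores all act as separators.
SPLIT_RE = re.compile(r"[\W_]+")


def tokenize(text):
    """Strip HTML tags, then split on whitespace, punctuation and apostrophes."""
    without_tags = TAG_RE.sub(" ", text)
    # re.split leaves empty strings at the edges; drop them.
    return [token for token in SPLIT_RE.split(without_tags) if token]


if __name__ == "__main__":
    print(tokenize("<p>L'albero e' verde, no?</p>"))
```

The design mirrors the two-step idea suggested in the replies: remove the markup first, then treat everything that is not a letter or digit as a separator. Whether the same character class behaves identically as a pattern in PatternTokenizerFactory has not been checked here.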