Hi,

I'm not sure, but I think you need to use class
org.apache.lucene.analysis.cjk.CJKAnalyzer

See http://wiki.apache.org/jackrabbit/Search - parameter analyzer

Can you please verify this is correct? I will then update the documentation.

Regards,
Thomas

On Sun, Aug 9, 2009 at 4:38 PM, go canal<[email protected]> wrote:
> Just tested:
> the default configuration supports full CJK text search for Text, Word and
> PPT file; but can not search PDF/Excel files.
>
> rgds,
> canal
>
> ________________________________
> From: go canal <[email protected]>
> To: [email protected]
> Sent: Sunday, August 9, 2009 10:20:28 PM
> Subject: full text search for CJK languages
>
> Hi,
> could not find detailed info wrt supporting full text search for 2-byte
> languages like CJK (Chinese, Japanese and Korea).
>
> 1) anybody know if there is one such library available ? and
> 2) how to config this in Jackrabbit ? Should I replace all the extractors in
> the current configuration:
> <SearchIndex .....
> <param name="textFilterClasses"
>
> value="org.apache.jackrabbit.extractor.PlainTextExtractor,
> org.apache.jackrabbit.extractor.MsWordTextExtractor,
> org.apache.jackrabbit.extractor.MsExcelTextExtractor,
> org.apache.jackrabbit.extractor.MsPowerPointTextExtractor,
> org.apache.jackrabbit.extractor.PdfTextExtractor,
> org.apache.jackrabbit.extractor.OpenOfficeTextExtractor,
> org.apache.jackrabbit.extractor.RTFTextExtractor,
> org.apache.jackrabbit.extractor.HTMLTextExtractor,
> org.apache.jackrabbit.extractor.XMLTextExtractor" />
> </SearchIndex>
> rgds,
> canal
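
P.S. For reference, a minimal sketch of what the analyzer setting suggested at the top of this mail might look like in the workspace configuration. This is unverified: the class attribute and the path parameter are assumptions taken from a typical default Jackrabbit configuration, not from the configuration quoted above, and the Lucene jar that contains CJKAnalyzer would need to be on the classpath.

```xml
<!-- Sketch only: the class and path values are assumed defaults, adjust to your setup -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Replace the default analyzer with the CJK analyzer from Lucene -->
  <param name="analyzer" value="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
</SearchIndex>
```

The existing textFilterClasses parameter should be able to stay as it is, since the analyzer controls how extracted text is tokenized rather than how it is extracted. Content indexed before the change will probably need to be re-indexed for the new analyzer to take effect.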
