Just tested: the default configuration supports full text search of CJK text in plain text, Word and PowerPoint files, but cannot search PDF or Excel files.
rgds,
canal

________________________________
From: go canal <[email protected]>
To: [email protected]
Sent: Sunday, August 9, 2009 10:20:28 PM
Subject: full text search for CJK languages

Hi,

I could not find detailed information on supporting full text search for 2-byte languages like CJK (Chinese, Japanese and Korean).

1) Does anybody know if such a library is available?
2) How do I configure this in Jackrabbit? Should I replace all the extractors in the current configuration:

<SearchIndex .....
  <param name="textFilterClasses" value="org.apache.jackrabbit.extractor.PlainTextExtractor,
    org.apache.jackrabbit.extractor.MsWordTextExtractor,
    org.apache.jackrabbit.extractor.MsExcelTextExtractor,
    org.apache.jackrabbit.extractor.MsPowerPointTextExtractor,
    org.apache.jackrabbit.extractor.PdfTextExtractor,
    org.apache.jackrabbit.extractor.OpenOfficeTextExtractor,
    org.apache.jackrabbit.extractor.RTFTextExtractor,
    org.apache.jackrabbit.extractor.HTMLTextExtractor,
    org.apache.jackrabbit.extractor.XMLTextExtractor" />
</SearchIndex>

rgds,
canal
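
On question 2, one setting that may be worth a look is the index analyzer rather than the extractors. The fragment below is only a sketch and has not been tested in this thread. It assumes the Lucene contrib analyzers jar that provides org.apache.lucene.analysis.cjk.CJKAnalyzer is on the classpath and matches the Lucene version bundled with Jackrabbit. The other SearchIndex attributes are left elided, as in the configuration quoted in the question.

```xml
<!-- Assumption: CJKAnalyzer (Lucene contrib analyzers) is available on the classpath. -->
<!-- The textFilterClasses param from the existing configuration stays as it is. -->
<SearchIndex .....
  <param name="analyzer" value="org.apache.lucene.analysis.cjk.CJKAnalyzer" />
</SearchIndex>
```

Content indexed before such a change would most likely need to be re-indexed for the new analyzer to take effect. This setting affects how extracted text is tokenized, so on its own it probably does not explain the PDF/Excel result, which looks more like a text extraction issue.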
