Just tested:
  the default configuration supports full-text search of CJK text in plain text,
Word and PowerPoint files, but it cannot search PDF or Excel files.

 rgds,
canal




________________________________
From: go canal <[email protected]>
To: [email protected]
Sent: Sunday, August 9, 2009 10:20:28 PM
Subject: full text search for CJK languages

Hi,
I could not find detailed information on supporting full-text search for
double-byte languages such as CJK (Chinese, Japanese and Korean).

1) Does anybody know whether such a library is available? And
2) How do I configure this in Jackrabbit? Should I replace all the extractors in
the current configuration:
    <SearchIndex .....
      <param name="textFilterClasses"
        value="org.apache.jackrabbit.extractor.PlainTextExtractor,
               org.apache.jackrabbit.extractor.MsWordTextExtractor,
               org.apache.jackrabbit.extractor.MsExcelTextExtractor,
               org.apache.jackrabbit.extractor.MsPowerPointTextExtractor,
               org.apache.jackrabbit.extractor.PdfTextExtractor,
               org.apache.jackrabbit.extractor.OpenOfficeTextExtractor,
               org.apache.jackrabbit.extractor.RTFTextExtractor,
               org.apache.jackrabbit.extractor.HTMLTextExtractor,
               org.apache.jackrabbit.extractor.XMLTextExtractor" />
    </SearchIndex>
rgds,
canal


      
