Re: Anybody using Japanese SEN with recent versions of Solr ?

Koji Sekiguchi Tue, 30 Jun 2009 17:35:27 -0700

Mark,

I think you can write your own tokenizer that calls SEN to tokenize Japanese sentences.

To write your tokenizer, you can refer to the source code of lucene-ja.
I think the source code is included in lucene-ja.jar, but I'm not sure.


Koji


Mark Bennett wrote:

I've been reading through the SEN project doc and various Japanese blogs,
but still having some issues.

In particular, it seems like perhaps you're supposed to have BOTH
sen-1.2.2.1 and lucene-ja-2.0test2 installed?

I guess the lucene-ja is an adapter layer between the org.apache.lucene
analyzers and base net.java Tokenizers, whereas sen-1.2.2.1 is the base SEN
package, and is not aware of Lucene/Solr.  So I guess you need both.

But both packages contain Lucene classes, and the lucene-ja code seems to be
using a very old Lucene. I'm not sure how you layer all of this together with a
more recent Solr implementation (using nightly stable).

Or perhaps the older lucene-ja is intended to already include SEN? It does have
some SEN files, but they are quite a bit older than the SEN 1.2.2.1 files, and
you would still have the old Lucene version issue.

Any input would be appreciated.

--
Mark Bennett / New Idea Engineering, Inc.
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
