On Jun 1, 2008, at 11:35 PM, "Cloud Zhang" <[EMAIL PROTECTED]> wrote:

Sure, there are two Chinese analyzers (including the CJKAnalyzer) bundled with Lucene. But both are character-based and far from acceptable.

A practical Chinese tokenizer should know Chinese words (with one or several dictionaries) rather than just characters. It is reasonable not to bundle a dictionary-based analyzer with Lucene, since the dictionary alone would be several megabytes, yet not helpful to other parts of the world :)

Ah ok. Understood.

Andi..




On Mon, Jun 2, 2008 at 2:09 PM, Andi Vajda <[EMAIL PROTECTED]> wrote:

On Jun 1, 2008, at 10:53 PM, "Cloud Zhang" <[EMAIL PROTECTED]> wrote:

Thanks a lot for this very detailed guide. I'll forward it to the Chinese Python community, since the first thing a Chinese developer looks for in Lucene is a tokenizer for Chinese, and then gets stuck importing a jar...

Isn't there a Chinese analyzer already shipped with Java Lucene in contrib/analyzers?
That contrib is already built into PyLucene.

Andi..



On Mon, Jun 2, 2008 at 1:15 PM, Andi Vajda <[EMAIL PROTECTED]> wrote:

On Mon, 2 Jun 2008, Cloud Zhang wrote:

Adding a new analyzer (in jar form) in Java is really straightforward, but when I tried to add one to pyLucene, I found no way to refer to the jar
package.

I went through the build process of pyLucene and guess maybe I could:
* put the analyzer source under
PyLucene-2.3.2-1/lucene-java-2.3.2/contrib/analyzers/src/java/, and
recompile Lucene then pyLucene
or
* put the analyzer jar somewhere in the build folder and add it to the
Makefile, then recompile pyLucene

Could these work? Or is there another solution that is as straightforward as
setting CLASSPATH in Java?

To access your class(es) by name from Python, you must have JCC generate wrappers for it (them). This is what is done from line 177 onward in PyLucene's Makefile. The easiest way for you to add your own Java classes to PyLucene is to create another jar file with your own analyzer classes and code and add it to the JCC invocation there.

For example, the Makefile snippet in question currently says:

GENERATE=$(JCC) $(foreach jar,$(JARS),--jar $(jar)) \
          --package java.lang java.lang.System \
                              java.lang.Runtime \
          --package java.util \
                    java.text.SimpleDateFormat \
          --package java.io java.io.StringReader \
                            java.io.InputStreamReader \
                            java.io.FileInputStream \
          --exclude org.apache.lucene.queryParser.Token \
          --exclude org.apache.lucene.queryParser.TokenMgrError \
          --exclude org.apache.lucene.queryParser.QueryParserTokenManager \
          --exclude org.apache.lucene.queryParser.ParseException \
          --python lucene \
          --mapping org.apache.lucene.document.Document 'get:(Ljava/lang/String;)Ljava/lang/String;' \
          --mapping java.util.Properties 'getProperty:(Ljava/lang/String;)Ljava/lang/String;' \
          --sequence org.apache.lucene.search.Hits 'length:()I' 'doc:(I)Lorg/apache/lucene/document/Document;' \
          --version $(LUCENE_VER) \
          --files $(NUM_FILES)


change the first line to say:

GENERATE=$(JCC) $(foreach jar,$(JARS),--jar $(jar)) --jar myjar.jar \
  ...

and rebuild PyLucene. That should be all you need to do. Your jar file is going to be installed along with Lucene's in the lucene egg, and it is going to be put on lucene.CLASSPATH, which you use with lucene.initVM().
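
Once PyLucene has been rebuilt with the extra jar, using a wrapped class from Python could look roughly like the sketch below. The helper name load_wrapped_class is made up for illustration, and it assumes the rebuilt lucene module is importable; lucene.initVM() and lucene.CLASSPATH are the names mentioned above.

```python
def load_wrapped_class(class_name):
    """Start the JVM and return the JCC-generated wrapper named class_name.

    Hypothetical helper: meant to be called once per process, on a
    PyLucene build whose JCC invocation included your jar via --jar.
    """
    # Imported here so this sketch can be loaded even where PyLucene
    # is not installed; the call itself still requires it.
    import lucene

    # lucene.CLASSPATH lists the jars installed in the lucene egg,
    # which should include the extra jar passed to JCC.
    lucene.initVM(lucene.CLASSPATH)

    # Wrapped classes are looked up by simple name on the lucene module,
    # regardless of the Java package they were declared in.
    return getattr(lucene, class_name)
```

For example, load_wrapped_class("MyChineseAnalyzer") would return the wrapper for a class of that name, where MyChineseAnalyzer stands in for whatever analyzer class your own jar actually defines.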

Your classes can be declared in any Java package you want. Just make sure that their names don't clash with other Lucene class names that you also need to use, as the class namespace is flattened in PyLucene.
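
Because only the simple class name survives the flattening, a quick check for clashes before rebuilding may save some time. The following is only a sketch: find_name_clashes is a hypothetical helper, the com.example names are placeholders for your own classes, and the list of class names has to be supplied by you (for instance from the jar listings).

```python
from collections import defaultdict


def find_name_clashes(qualified_names):
    """Group fully qualified Java class names by simple name and return
    only the simple names that are claimed by more than one class."""
    by_simple_name = defaultdict(list)
    for qualified in qualified_names:
        # "org.apache.lucene.analysis.Token" -> "Token"
        simple = qualified.rsplit(".", 1)[-1]
        by_simple_name[simple].append(qualified)
    return {name: classes
            for name, classes in by_simple_name.items()
            if len(classes) > 1}


# com.example.* are placeholder names for classes in your own jar.
clashes = find_name_clashes([
    "org.apache.lucene.analysis.Token",
    "com.example.analysis.Token",
    "com.example.analysis.MyChineseAnalyzer",
])
print(clashes)
```

Any name reported here would need to be renamed on your side, or one of the two classes left out of the wrappers with JCC's --exclude option as the Makefile above already does for some queryParser classes.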

For more information about JCC and its command line args see JCC's README file at [1].

Andi..

[1] http://svn.osafoundation.org/pylucene/trunk/jcc/jcc/README
_______________________________________________
pylucene-dev mailing list
pylucene-dev@osafoundation.org
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev



--
Cheers,
Cloud




--
Cheers,
Cloud