On Jun 1, 2008, at 11:35 PM, "Cloud Zhang" <[EMAIL PROTECTED]> wrote:
Sure, there are two Chinese analyzers (including the CJKAnalyzer)
bundled with Lucene. But both are character based and far from
acceptable.
A practical Chinese tokenizer should know Chinese words (from one or
several dictionaries) rather than just characters. It is reasonable not
to bundle a dictionary-based analyzer with Lucene, since the dictionary
alone would be several megabytes, yet not helpful to other parts of
the world :)
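To make the difference concrete, here is a minimal Python sketch contrasting character-based splitting with a greedy dictionary lookup (forward maximum matching). It is only an illustration, not how any analyzer bundled with Lucene is implemented: the names char_tokens, dict_tokens and TOY_DICTIONARY are made up for this example, and the four-word set stands in for a real multi-megabyte dictionary.

```python
# Toy illustration only: this word list is a hypothetical stand-in for a
# real dictionary, which would hold many thousands of entries.
TOY_DICTIONARY = {"中文", "分词", "中文分词", "工具"}


def char_tokens(text):
    """Character-based tokenization: every character becomes its own token."""
    return list(text)


def dict_tokens(text, dictionary, max_word_len=4):
    """Greedy forward maximum matching against a word dictionary."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches,
        # so unknown characters fall through as one-character tokens.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens
```

With the toy dictionary, dict_tokens keeps known words whole while char_tokens splits them into single characters. A real tokenizer would also need to handle ambiguous segmentations, which this greedy sketch does not attempt.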
Ah ok. Understood.
Andi..
On Mon, Jun 2, 2008 at 2:09 PM, Andi Vajda <[EMAIL PROTECTED]>
wrote:
On Jun 1, 2008, at 10:53 PM, "Cloud Zhang" <[EMAIL PROTECTED]> wrote:
Thanks a lot for this very detailed guide. I'll forward it to the
Chinese Python community, since the first thing a Chinese developer
looks for in Lucene is a tokenizer for Chinese, only to get stuck
importing a jar...
Isn't there a Chinese analyzer already shipped with Java Lucene in
contrib/analyzers?
That contrib is already built into PyLucene.
Andi..
On Mon, Jun 2, 2008 at 1:15 PM, Andi Vajda
<[EMAIL PROTECTED]> wrote:
On Mon, 2 Jun 2008, Cloud Zhang wrote:
Adding a new analyzer (in jar form) in Java is really straightforward,
but when I tried to add one to PyLucene, I found no way to refer to
the jar package.
I went through the build process of PyLucene and guess maybe I could:
* put the analyzer source under
  PyLucene-2.3.2-1/lucene-java-2.3.2/contrib/analyzers/src/java/, and
  recompile Lucene, then PyLucene
or
* put the analyzer jar somewhere in the build folder, add it to the
  Makefile, then recompile PyLucene
Would either of these work? Or is there another solution that is as
straightforward as setting CLASSPATH in Java?
To access your class(es) by name from Python, you must have JCC
generate wrappers for it (them). This is what is done from line 177
onward in PyLucene's Makefile. The easiest way for you to add your own
Java classes to PyLucene is to create another jar file with your
own analyzer classes and code and add it to the JCC invocation there.
For example, the Makefile snippet in question currently says:
GENERATE=$(JCC) $(foreach jar,$(JARS),--jar $(jar)) \
           --package java.lang java.lang.System \
                               java.lang.Runtime \
           --package java.util \
                     java.text.SimpleDateFormat \
           --package java.io java.io.StringReader \
                             java.io.InputStreamReader \
                             java.io.FileInputStream \
           --exclude org.apache.lucene.queryParser.Token \
           --exclude org.apache.lucene.queryParser.TokenMgrError \
           --exclude org.apache.lucene.queryParser.QueryParserTokenManager \
           --exclude org.apache.lucene.queryParser.ParseException \
           --python lucene \
           --mapping org.apache.lucene.document.Document 'get:(Ljava/lang/String;)Ljava/lang/String;' \
           --mapping java.util.Properties 'getProperty:(Ljava/lang/String;)Ljava/lang/String;' \
           --sequence org.apache.lucene.search.Hits 'length:()I' 'doc:(I)Lorg/apache/lucene/document/Document;' \
           --version $(LUCENE_VER) \
           --files $(NUM_FILES)
change the first line to say:
GENERATE=$(JCC) $(foreach jar,$(JARS),--jar $(jar)) --jar myjar.jar \
...
and rebuild PyLucene. That should be all you need to do. Your jar
file is going to be installed along with lucene's in the lucene egg
and it is going to be put on lucene.CLASSPATH which you use with
lucene.initVM().
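On the Python side, using the rebuilt module might look like the sketch below. It relies only on the two names mentioned above, lucene.initVM() and lucene.CLASSPATH. The helper load_wrapper and the class name MyChineseAnalyzer are hypothetical, and the no-argument constructor shown in the comment is an assumption about your own analyzer class.

```python
def load_wrapper(lucene_module, class_name):
    """Start the JVM with the module's classpath, then fetch a wrapper by name."""
    # CLASSPATH is expected to already list the extra jar after the rebuild,
    # so nothing else has to be added to it by hand.
    lucene_module.initVM(lucene_module.CLASSPATH)
    # Wrapped classes are looked up by simple class name on the module.
    return getattr(lucene_module, class_name)


# Intended use with a real install (MyChineseAnalyzer is a hypothetical name,
# and the no-argument constructor is an assumption):
#   import lucene
#   analyzer_class = load_wrapper(lucene, "MyChineseAnalyzer")
#   analyzer = analyzer_class()
```

The module is passed in as a parameter only so the helper can be tried without PyLucene installed. The helper should be called once per process, since it starts the Java VM.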
Your classes can be declared in any Java package you want. Just
make sure that their names don't clash with other Lucene class
names that you also need to use as the class namespace is flattened
in PyLucene.
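Because the namespace is flattened, it can help to check for clashes before rebuilding. The following is a rough sketch, not part of JCC or PyLucene: find_name_clashes is a made-up helper, and the com.example.chinese class names are hypothetical. It groups fully qualified Java class names by their simple name and reports any simple name that occurs more than once.

```python
from collections import defaultdict


def find_name_clashes(qualified_names):
    """Return {simple_name: [qualified names]} for simple names used more than once."""
    by_simple_name = defaultdict(list)
    # set() drops exact duplicates so a class listed twice is not a clash.
    for name in set(qualified_names):
        simple_name = name.rsplit(".", 1)[-1]
        by_simple_name[simple_name].append(name)
    return {
        simple: sorted(names)
        for simple, names in by_simple_name.items()
        if len(names) > 1
    }


# Hypothetical example: a custom Token class would collide with Lucene's.
example = [
    "org.apache.lucene.analysis.Token",
    "com.example.chinese.Token",
    "com.example.chinese.DictAnalyzer",
]
clashes = find_name_clashes(example)
```

Any class that shows up in the result would need to be renamed on your side before wrapping.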
For more information about JCC and its command line args see JCC's
README file at [1].
Andi..
[1] http://svn.osafoundation.org/pylucene/trunk/jcc/jcc/README
_______________________________________________
pylucene-dev mailing list
pylucene-dev@osafoundation.org
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
--
Cheers,
Cloud