Re: [pylucene-dev] WhitespaceTokenizer

Andi Vajda Mon, 02 Jan 2006 11:17:04 -0800


On Sun, 1 Jan 2006, Alf Eaton wrote:

I just started working with PyLucene (1.0.1) a couple of weeks ago and have it working well building indexes using either StandardAnalyzer, SimpleAnalyzer or WhitespaceAnalyzer. However, when I tried to use my own analyzer, I got an error: "NameError: global name 'WhitespaceTokenizer' is not defined". Below is the code I was using for the analyzer - is there something I'm doing wrong, or is this particular tokenizer not available to PyLucene?
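from PyLucene import *  # PyLucene 1.x exposes the Lucene classes here

class MyAnalyzer(object):
    def tokenStream(self, fieldName, reader):
        # stopwords: a list of stop words defined elsewhere in the script
        result = WhitespaceTokenizer(reader)
        result = LowerCaseFilter(result)
        result = StopFilter(result, stopwords)
        return result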


You found a bug: support for WhitespaceTokenizer is missing.
I have now fixed the bug in svn, on the main development trunk (version 1.9).
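
The analyzer in the question chains three stages, each wrapping the previous one: split on whitespace, lowercase, then drop stop words. The following is a pure-Python illustration of that chain using generators. It does not use PyLucene; the function names are hypothetical stand-ins for WhitespaceTokenizer, LowerCaseFilter and StopFilter, and str.split() only approximates Java's Character.isWhitespace test.

```python
from typing import Iterable, Iterator, List, Set


def whitespace_tokens(text: str) -> Iterator[str]:
    # Tokens are maximal runs of non-whitespace characters, so punctuation
    # stays attached to the word ("fox," is a single token).
    yield from text.split()


def lower_case_filter(tokens: Iterable[str]) -> Iterator[str]:
    for tok in tokens:
        yield tok.lower()


def stop_filter(tokens: Iterable[str], stopwords: Set[str]) -> Iterator[str]:
    # Exact-match removal: running it after lowercasing catches "The" and "THE".
    for tok in tokens:
        if tok not in stopwords:
            yield tok


def analyze(text: str, stopwords: Set[str]) -> List[str]:
    # Same order as the quoted tokenStream(): tokenize, lowercase, filter.
    result = whitespace_tokens(text)
    result = lower_case_filter(result)
    result = stop_filter(result, stopwords)
    return list(result)
```

For example, `analyze("The Quick-Brown fox, and THE dog", {"the", "and"})` returns `['quick-brown', 'fox,', 'dog']`. Because each stage is a generator, tokens flow through one at a time rather than being materialized between stages.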

You have several options to get the fix:

 1. Use PyLucene 1.9 and build it from svn (ant and jdk 1.4 ARE required)

 2. Use PyLucene 1.9 and build it from the source archive at
http://downloads.osafoundation.org/PyLucene/src/PyLucene-src-1.9rc1-3.tar.gz
    after applying the patch below (ant and jdk 1.4 are NOT required)

 3. Apply the patch to the PyLucene 1.0.1 sources and add
    WhitespaceTokenizer.h to the long list of header files generated by
    the Makefile

I recommend you use option 2, the simplest. PyLucene 1.9rc1 appears to be quite stable and usable.
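
Whichever option you choose, the original NameError came from a class the module did not export, so one quick check after rebuilding is to look the class up by name. This is a small sketch; the module name `PyLucene` is assumed from how 1.x releases are imported, and `has_class` is a hypothetical helper, not part of PyLucene.

```python
import importlib


def has_class(module_name: str, class_name: str) -> bool:
    """Return True if module_name imports cleanly and defines class_name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module not installed, or its native library failed to load.
        return False
    return hasattr(module, class_name)


if __name__ == "__main__":
    # Assumed module name for PyLucene 1.x builds.
    print(has_class("PyLucene", "WhitespaceTokenizer"))
```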


Andi..

Patch to PyLucene.i to add support for WhitespaceTokenizer:
------------------------------------------------------------

Index: PyLucene.i
===================================================================
--- PyLucene.i  (revision 217)
+++ PyLucene.i  (working copy)
@@ -109,6 +109,7 @@
 #include "org/apache/lucene/analysis/SimpleAnalyzer.h"
 #include "org/apache/lucene/analysis/StopAnalyzer.h"
 #include "org/apache/lucene/analysis/WhitespaceAnalyzer.h"
+#include "org/apache/lucene/analysis/WhitespaceTokenizer.h"
 #include "org/apache/lucene/analysis/PerFieldAnalyzerWrapper.h"
 #include "org/apache/lucene/analysis/standard/StandardAnalyzer.h"
 #include "org/apache/lucene/analysis/standard/StandardFilter.h"
@@ -2664,6 +2665,10 @@
                 public:
                     WhitespaceAnalyzer();
                 };
+                class WhitespaceTokenizer : public CharTokenizer {
+                public:
+                    WhitespaceTokenizer(jreader);
+                };
                 class PerFieldAnalyzerWrapper : public Analyzer {
                 public:
                     PerFieldAnalyzerWrapper(janalyzer);
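
The patch declares WhitespaceTokenizer with nothing but a constructor on top of CharTokenizer. That matches how Lucene itself structures the class: CharTokenizer does the scanning, and a subclass only decides which characters belong to a token (Lucene's isTokenChar, which for WhitespaceTokenizer is true for any non-whitespace character). The sketch below mirrors that split in plain Python. It is an illustration only: the class and method names are hypothetical Python stand-ins rather than the PyLucene API, and it ignores details of the Java implementation such as its internal buffer limits.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    text: str
    start: int  # offset of the token's first character
    end: int    # offset one past the token's last character


class CharTokenizer:
    """Simplified analogue of Lucene's CharTokenizer."""

    def is_token_char(self, c: str) -> bool:
        raise NotImplementedError  # subclasses define what a token is

    def normalize(self, c: str) -> str:
        return c  # hook for per-character changes, e.g. lowercasing

    def tokens(self, text: str) -> Iterator[Token]:
        buf = []
        start = 0
        for i, c in enumerate(text):
            if self.is_token_char(c):
                if not buf:
                    start = i  # first character of a new token
                buf.append(self.normalize(c))
            elif buf:
                yield Token("".join(buf), start, i)
                buf = []
        if buf:  # flush a token that runs to the end of the input
            yield Token("".join(buf), start, len(text))


class WhitespaceTokenizer(CharTokenizer):
    def is_token_char(self, c: str) -> bool:
        # Approximates Java's !Character.isWhitespace(c).
        return not c.isspace()
```

With this split, `WhitespaceTokenizer().tokens("Hello  big\tworld")` yields tokens `Hello` (0-5), `big` (7-10) and `world` (11-16); a letter-only tokenizer would differ solely in its `is_token_char`.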

_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
