On Sun, 1 Jan 2006, Alf Eaton wrote:

I just started working with PyLucene (1.0.1) a couple of weeks ago and have it working well building indexes using either StandardAnalyzer, SimpleAnalyzer or WhitespaceAnalyzer. However, when I tried to use my own analyzer, I got an error: "NameError: global name 'WhitespaceTokenizer' is not defined". Below is the code I was using for the analyzer - is there something I'm doing wrong, or is this particular tokenizer not available to PyLucene?

class MyAnalyzer(object):
    def tokenStream(self, fieldName, reader):
        result = WhitespaceTokenizer(reader)
        result = LowerCaseFilter(result)
        result = StopFilter(result, stopwords)
        return result

You found a bug: support for WhitespaceTokenizer is missing.
I have now fixed it in svn, on the main development trunk (version 1.9).

You have several options to get the fix:

 1. Use PyLucene 1.9 and build it from svn (ant and jdk 1.4 ARE required)

 2. Use PyLucene 1.9 and build it from the source archive at
http://downloads.osafoundation.org/PyLucene/src/PyLucene-src-1.9rc1-3.tar.gz
    after applying the patch below (ant and jdk 1.4 are NOT required)

 3. Apply the patch to the PyLucene 1.0.1 sources and add
    WhitespaceTokenizer.h to the long list of header file generations in
    Makefile

I recommend you use option 2, the simplest. PyLucene 1.9rc1 appears to be quite stable and usable.
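
Whichever option you pick, it can be useful to confirm that the resulting build actually exposes the class before wiring it into an analyzer. The following is only a minimal sketch in present-day Python: the helper name exports_symbol is hypothetical, and it assumes the classes are reachable as attributes of a top-level module named PyLucene.

```python
import importlib


def exports_symbol(module_name, symbol):
    """Return True if module_name imports cleanly and exposes symbol.

    Hypothetical helper for illustration; not part of PyLucene.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module is not installed or failed to import.
        return False
    return hasattr(module, symbol)


if __name__ == "__main__":
    # Assumes the binding is importable as "PyLucene".
    print(exports_symbol("PyLucene", "WhitespaceTokenizer"))
```

If this prints False on a build where StandardAnalyzer and the other analyzers work, the build most likely predates the fix.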

Andi..

Patch to PyLucene.i to add support for WhitespaceTokenizer:
------------------------------------------------------------

Index: PyLucene.i
===================================================================
--- PyLucene.i  (revision 217)
+++ PyLucene.i  (working copy)
@@ -109,6 +109,7 @@
 #include "org/apache/lucene/analysis/SimpleAnalyzer.h"
 #include "org/apache/lucene/analysis/StopAnalyzer.h"
 #include "org/apache/lucene/analysis/WhitespaceAnalyzer.h"
+#include "org/apache/lucene/analysis/WhitespaceTokenizer.h"
 #include "org/apache/lucene/analysis/PerFieldAnalyzerWrapper.h"
 #include "org/apache/lucene/analysis/standard/StandardAnalyzer.h"
 #include "org/apache/lucene/analysis/standard/StandardFilter.h"
@@ -2664,6 +2665,10 @@
                 public:
                     WhitespaceAnalyzer();
                 };
+                class WhitespaceTokenizer : public CharTokenizer {
+                public:
+                    WhitespaceTokenizer(jreader);
+                };
                 class PerFieldAnalyzerWrapper : public Analyzer {
                 public:
                     PerFieldAnalyzerWrapper(janalyzer);
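
For reference, the chain that the analyzer in the original question is trying to build (whitespace tokenizer, then lowercase filter, then stop filter) can be sketched in plain Python without PyLucene. This is a stand-in for illustration only, not the PyLucene API: the function names are hypothetical, and str.split() is assumed to be a close enough approximation of how WhitespaceTokenizer splits text.

```python
def whitespace_tokenize(text):
    """Yield tokens split on runs of whitespace (stand-in for WhitespaceTokenizer)."""
    for token in text.split():
        yield token


def lowercase_filter(tokens):
    """Yield each token lowercased (stand-in for LowerCaseFilter)."""
    for token in tokens:
        yield token.lower()


def stop_filter(tokens, stopwords):
    """Yield tokens that are not in stopwords (stand-in for StopFilter)."""
    stop = set(stopwords)
    for token in tokens:
        if token not in stop:
            yield token


def token_stream(text, stopwords):
    """Chain the three stages in the same order as MyAnalyzer.tokenStream."""
    result = whitespace_tokenize(text)
    result = lowercase_filter(result)
    result = stop_filter(result, stopwords)
    return result


if __name__ == "__main__":
    print(list(token_stream("The Quick  brown FOX", ["the"])))
```

The order matters: because the stop filter compares tokens exactly, lowercasing first lets a lowercase stop word list match "The" as well as "the".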

_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
