On 01 May 2006, at 02:53, Andi Vajda wrote:
Secondly, it doesn't seem to be possible (in PyLucene 1.9.1) to
search an untokenized field using a term that contains spaces. For
a document that has a creator "Doe J", the query
creator:"Doe J"
doesn't return any results, and
creator:Doe J
doesn't match what it needs to.
Again, please send in code that reproduces the problem. If you can
make sure that what you're trying to do works in Java Lucene, that's
a plus.
Ideally, your sample code would be organized as unit tests.
Writing the tests was a good idea: I realised that StandardAnalyzer
was converting the search terms to lowercase when used via
QueryParser, but not when adding untokenized fields to the document
using IndexWriter, so the two weren't matching. Fixed now, thanks (and
it's presumably not a PyLucene problem).
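For the record, the mismatch can be shown in plain Python, without
PyLucene at all (the analyze() helper below is just a stand-in
imitating StandardAnalyzer's split-and-lowercase step, not the real
API): QueryParser lowercases the query terms, the UN_TOKENIZED field
stores the value verbatim, so the two sides never compare equal.

```python
# Stand-in for StandardAnalyzer as applied by QueryParser:
# tokenize on whitespace, then lowercase each token.
def analyze(text):
    return [token.lower() for token in text.split()]

indexed_term = "Doe J"             # UN_TOKENIZED field: stored verbatim
query_terms = analyze("Doe J")     # what QueryParser actually searches for

print(query_terms)                 # ['doe', 'j'] -- neither equals 'Doe J'
print(indexed_term in query_terms) # False: no hit against the index term
```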
alf.
--------
#!/usr/bin/env python
from PyLucene import *

filestore = FSDirectory.getDirectory("test", True)
analyzer = StandardAnalyzer()
filewriter = IndexWriter(filestore, analyzer, True)

doc = Document()
doc.add(Field('author-space', "Doe J", Field.Store.YES,
              Field.Index.UN_TOKENIZED))
doc.add(Field('author-space-tok', "Doe J", Field.Store.YES,
              Field.Index.TOKENIZED))
doc.add(Field('author-underscore', "Doe_J", Field.Store.YES,
              Field.Index.UN_TOKENIZED))
doc.add(Field('author-underscore-tok', "Doe_J", Field.Store.YES,
              Field.Index.TOKENIZED))
filewriter.addDocument(doc)
filewriter.close()

searcher = IndexSearcher("test")
for q in ("Doe J", "Doe_J"):
    for f in ("author-space", "author-space-tok",
              "author-underscore", "author-underscore-tok"):
        #query = QueryParser.parse(q, f, analyzer)  # only works for tokenized fields
        query = TermQuery(Term(f, q))  # only works for untokenized fields
        hits = searcher.search(query)
        print "\nQ: %s\nQuery: %s\n" % (q, query)
        for i, doc in hits:
            print "Result: %s\n" % doc[f]
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev