Hi Joh,

I think you meant to send this to the list too...

Joh N. wrote:

Thanks to David and Andi for your replies...

Before I go on to my questions for [email protected] :)

Is there any sample or starting point showing how to index full text, whatever the tokenization, just to see what PyLucene index creation looks like? I looked in the samples directory, but the fields don't seem to be individual words, rather some kind of huge "content bag". Or does the analyzer do that job?
PyLucene.IndexWriter(store, analyzer, True)
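Yes, the analyzer is what turns that "content bag" into individual indexed terms: you hand the whole text to a field, and tokenization happens when the document is added. As a rough pure-Python illustration of what an analyzer along the lines of StandardAnalyzer does conceptually (the lowercasing, splitting, and stop-word list here are simplified assumptions, not Lucene's actual code):

```python
import re

# Tiny stop-word list, just for illustration.
STOP_WORDS = {"a", "an", "and", "or", "the", "of", "to"}

def simple_analyze(text):
    # Lowercase, split on non-word characters, drop stop words --
    # roughly the stream of terms an analyzer hands to the index.
    terms = re.findall(r"\w+", text.lower())
    return [t for t in terms if t not in STOP_WORDS]

print(simple_analyze("The quick Brown Fox jumps over the lazy dog"))
# -> ['quick', 'brown', 'fox', 'jumps', 'over', 'lazy', 'dog']
```

So the fields in the samples really do hold the full text; the index itself still ends up keyed by individual terms.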

Also, it might be useful to decode the command in this line of SearchFiles.py:
query = QueryParser.parse(unicode(command, 'iso-8859-1'), "contents", analyzer)
because running it from a DOS console with any accented characters raises an error. (I agree it may be genuinely difficult to know the terminal's encoding.) Even with that change, it doesn't seem to be enough to handle accented characters for me: I get results for non-accented words, but none for accented ones :(

Last question: does Lucene provide any compression (for its postings lists)? I got a strange _d.cfs file, and it is at least as big as my original corpus :(
I'm not a Lucene expert, but I think you can configure a Field to be indexed only, without storing its value. You're probably storing the values too.
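That would explain the size: a stored field keeps each document's raw text in the index files alongside the postings, which easily matches the corpus size. From memory of the Lucene 1.4-era API that PyLucene wraps, an unstored field looks something like this (untested sketch; the exact helper names may differ between versions):

```python
doc = PyLucene.Document()
# Tokenized and indexed, but the raw text is NOT kept in the index:
doc.add(PyLucene.Field.UnStored("contents", text))
# Store only the small things you need to display in results:
doc.add(PyLucene.Field.Keyword("path", filename))
writer.addDocument(doc)
```

With "contents" unstored, the index holds only the term dictionary and postings, which are typically much smaller than the original text.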

David
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
