On Feb 14, 2007, at 12:27 PM, Jack L wrote:
The core of Nutch, Lucene, has a Python port, PyLucene. I wonder
if there is a Python port for Nutch? We have some distributed
Nutch searchers running. I'm thinking it would be nice to
have the merger/frontend available to Python and take advantage of
the powerful Python web frameworks.
There is a Python frontend to Nutch built by Dennis Kubes:
http://wiki.apache.org/nutch/Automating_Fetches_with_Python
And in our setup we mix Nutch's Java parsers and crawlers with our
own homebuilt Python ones. We use Solr via a Python class to inject
data into the main Nutch index. You have to be very careful with
index and segment merging, but otherwise it works well.
I was initially using PyLucene for this task but I found that Solr
does a great job at abstracting the index files from the application,
and we can run multiple crawl processes on many machines all feeding
to the same Solr-led index. With PyLucene/Lucene you need to worry
about index locks and managing the IndexWriter/IndexReader yourself.
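
To give an idea of what injecting data through Solr looks like from
Python, here is a minimal sketch. It is not our actual class. It assumes
Solr's XML update handler is reachable at
http://localhost:8983/solr/update (adjust for your install), and the
names build_add_xml and SolrInjector, along with the id/title fields,
are made up for illustration; the fields have to match whatever your
Solr schema defines.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def build_add_xml(docs):
    """Build a Solr <add> update message from a list of {field: value} dicts."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # A list or tuple value is written as a multi-valued field.
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                parts.append(
                    "<field name=%s>%s</field>" % (quoteattr(name), escape(str(v)))
                )
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


class SolrInjector:
    """Hypothetical helper that posts update messages to one Solr instance."""

    def __init__(self, update_url="http://localhost:8983/solr/update"):
        # Assumed default URL; change it to match your deployment.
        self.update_url = update_url

    def _post(self, xml):
        req = urllib.request.Request(
            self.update_url,
            data=xml.encode("utf-8"),
            headers={"Content-Type": "text/xml; charset=utf-8"},
        )
        with urllib.request.urlopen(req) as resp:
            return resp.read().decode("utf-8")

    def add(self, docs):
        """Send documents to Solr; they are not searchable until commit()."""
        return self._post(build_add_xml(docs))

    def commit(self):
        """Ask Solr to commit pending documents."""
        return self._post("<commit/>")
```

Each crawl process would create its own SolrInjector, call add() with a
batch of parsed pages, and call commit() every so often. Since Solr owns
the index files, the crawlers never touch Lucene locks directly.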
For more on Nutch->Solr, see
http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html