Hello Brian,

Thanks for the reply. (I'm not sure whether this discussion is of interest to the PyLucene dev list. If it's considered off-topic, I'll take the next email offline.)
I looked at the first link you sent, but it's not actually what I'm looking for. In our setup, we have multiple crawler/indexer/searcher boxes talking to a single merger/web-server front end over Nutch IPC. The front end sends each query to the back-end searchers, merges the results it receives, and presents them in a web page. I'm hoping to replace that Java front end with Python. So the piece I'm looking for doesn't touch the segments at all. Instead, it speaks Nutch IPC, parses the query strings, issues the queries to the back ends, merges the results, and renders them in a web page.

Thanks for mentioning your experience with Solr. I haven't tried it with a large amount of data. My concern is that inserting documents via HTTP POST is much less efficient than local file access (the Nutch approach), and I'm not sure it can handle millions of submissions a day.

--
Best regards,
Jack

Wednesday, February 14, 2007, 9:34:34 AM, you wrote:

> On Feb 14, 2007, at 12:27 PM, Jack L wrote:

>> The core of Nutch, Lucene, has a Python port, PyLucene. I wonder
>> if there is a Python port for Nutch? We have some distributed
>> Nutch searchers running. I'm thinking it would be nice to
>> have the merger/front end available to Python and take advantage of
>> the powerful Python web frameworks.

> There is a Python frontend to Nutch built by Dennis Kubes:
> http://wiki.apache.org/nutch/Automating_Fetches_with_Python

> And in our setup we mix Nutch's Java parsers and crawlers with our
> own homebuilt Python ones. We use Solr via a Python class to inject
> data into the main Nutch index. You have to be very careful with
> index and segment merging, but otherwise it works well.

> I was initially using PyLucene for this task, but I found that Solr
> does a great job at abstracting the index files from the application,
> and we can run multiple crawl processes on many machines all feeding
> to the same Solr-led index.
> With PyLucene/Lucene you need to worry about locks and the
> IndexWriter/IndexReader. For more on Nutch->Solr, see
> http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html

> _______________________________________________
> pylucene-dev mailing list
> [email protected]
> http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
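A minimal sketch of the merging step Jack describes, written in Python since that is the target: the front end fans one query out to every back-end searcher in parallel and keeps the global top N hits by score. The Nutch IPC client itself is not shown. Each back end is assumed to be a plain callable returning scored hits, and the names Hit, federated_search, and make_fake_searcher are hypothetical. The sketch also assumes scores from different back ends are directly comparable, which is a simplification, since each index computes its own term statistics.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical hit record returned by a back-end searcher."""
    score: float
    doc_id: str
    backend: str


# Hypothetical back-end interface: (query, max_hits) -> hits.
# A real implementation would wrap the Nutch IPC call (not shown here).
Searcher = Callable[[str, int], List[Hit]]


def federated_search(query: str, searchers: List[Searcher], top_n: int = 10) -> List[Hit]:
    """Query every back end in parallel and return the global top_n hits by score."""
    if not searchers:
        return []
    with ThreadPoolExecutor(max_workers=len(searchers)) as pool:
        # Each back end returns at most its own top_n hits.
        per_backend = list(pool.map(lambda s: s(query, top_n), searchers))
    # The global top_n must be among the per-back-end top_n candidates.
    candidates = (hit for hits in per_backend for hit in hits)
    return heapq.nlargest(top_n, candidates, key=lambda h: h.score)


def make_fake_searcher(name: str, docs: Dict[str, float]) -> Searcher:
    """Hypothetical in-memory stand-in for a remote searcher box."""
    def search(query: str, max_hits: int) -> List[Hit]:
        hits = [Hit(score, doc, name) for doc, score in docs.items() if query in doc]
        return sorted(hits, reverse=True)[:max_hits]
    return search


box_a = make_fake_searcher("box-a", {"nutch-ipc": 0.9, "nutch-faq": 0.4})
box_b = make_fake_searcher("box-b", {"nutch-python": 0.7, "lucene": 0.8})
results = federated_search("nutch", [box_a, box_b], top_n=2)
print([(h.doc_id, h.backend) for h in results])
```

Because each back end already returns its own top N, a single heapq.nlargest over the combined candidates is enough to produce the merged page. A real front end would also need per-back-end timeouts and a policy for a searcher box that fails to respond.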
