Re: Clustered Indexing on common network filesystem

2007-08-02 Thread Michael McCandless
"Zach Bailey" <[EMAIL PROTECTED]> wrote: > Unfortunately, I am not sure the leader of the project would feel good > about running code from trunk, save without an explicit endorsement from > a majority of the developers or contributors for that particular code > (do those people keep up with t

Re: Clustered Indexing on common network filesystem

2007-08-02 Thread Michael McCandless
I have been meaning to write up a Wiki page on this general topic but have not quite made time yet ... Sharing an index with a shared filesystem will work; however, there are some caveats: * This is somewhat uncharted territory because it's fairly recent fixes to Lucene that have enabled

Re: Clustered Indexing on common network filesystem

2007-08-02 Thread Zach Bailey
Mark, Thanks so much for your response. Unfortunately, I am not sure the leader of the project would feel good about running code from trunk, save without an explicit endorsement from a majority of the developers or contributors for that particular code (do those people keep up with this list

Re: Clustered Indexing on common network filesystem

2007-08-02 Thread Zach Bailey
Rajesh, I forgot to mention this, but we did investigate this option as well and even prototyped it for an internal project. It ended up being too slow for us. It was adding a lot of overhead even to small updates, IIRC, mainly due to the fact that the index was essentially stored as a files

Re: Clustered Indexing on common network filesystem

2007-08-02 Thread Rajesh parab
One more alternative, though I am not sure if anyone is using it. Apache Compass has added a plug-in to allow storing Lucene index files inside the database. This should work in a clustered environment as all nodes will share the same database instance. I am not sure of the impact it will have on perf

Re: Clustered Indexing on common network filesystem

2007-08-02 Thread Mark Miller
Some quick info: NFS should work, but I think you'll want to be working off the trunk. Also, sharing an index over NFS is supposed to be slow. The standard so far, if you are not partitioning the index, is to use a Unix/Linux filesystem and hardlinks + rsync to efficiently share index changes

Re: Clustered Indexing on common network filesystem

2007-08-02 Thread Zach Bailey
Thanks for your response -- Based on my understanding, Hadoop and Nutch are essentially the same thing, with Nutch being derived from Hadoop, and are primarily intended to be standalone applications. We are not looking for a standalone application, rather we must use a framework to implement

Re: Clustered Indexing on common network filesystem

2007-08-02 Thread testn

Re: Clustered Indexing on common network filesystem

2007-08-02 Thread Zach Bailey
Hi, It's been a couple of days now and I haven't heard anything on this topic, while there has been substantial list traffic otherwise. Am I asking in the wrong place? Was I unclear? I know there are people out there that have used/are using Lucene in a clustered environment. I am just looki

Clustered Indexing on common network filesystem

2007-07-31 Thread Zach Bailey
Hello all, First a little background - we are developing a clustered application that will in part leverage Lucene to provide index and search capabilities. We have already spent time investigating various index storage implementations (database vs. filesystem) and we've decided for performan