Re: Why is lucene so slow indexing in nfs file system ?

2008-01-10 Thread Ariel
Thanks to all of you for your answers. I am going to change a few things in my application and run tests. One thing: I haven't found another good PDF-to-text converter like PDFBox. Do you know of any faster one? Greetings, and thanks for your answers. Ariel On Jan 9, 2008 11:08 PM, Otis Gospodnetic [EMAIL

Re: Why is lucene so slow indexing in nfs file system ?

2008-01-10 Thread Ariel
In a distributed environment the application should make exhaustive use of the network, and there is no other way to access the documents in a remote repository than through the NFS file system. One thing I must clarify: I index the documents in memory, I use RAMDirectory to do that, then

Re: Why is lucene so slow indexing in nfs file system ?

2008-01-10 Thread Erick Erickson
This seems really clunky. Especially if your merge step also optimizes. There's not much point in indexing into RAM then merging explicitly. Just use an FSDirectory rather than a RAMDirectory. There is *already* buffering built in to FSDirectory, and your merge factor etc. control how much RAM is

Re: Why is lucene so slow indexing in nfs file system ?

2008-01-10 Thread Michael McCandless
If possible you should also test the soon-to-be-released version 2.3, which has a number of speedups to indexing. Also try the steps here: http://wiki.apache.org/lucene-java/ImproveIndexingSpeed You should also try an A/B test: A) writing your index to the NFS directory and then B) to
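An A/B comparison of this kind can be roughed out by timing the same writes against two directories. The message above is cut off before it names the second target, so the sketch below simply takes both directories as arguments. It is plain Java that measures raw file-write time only, not Lucene indexing, and the class name WriteTimingAB, the probe file names, and the file counts and sizes are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: times identical file writes against two directories. */
public class WriteTimingAB {

    /** Writes `files` files of `bytesPerFile` bytes into dir and returns elapsed nanoseconds. */
    public static long timeWrites(Path dir, int files, int bytesPerFile) throws IOException {
        Files.createDirectories(dir);
        byte[] payload = new byte[bytesPerFile];
        long start = System.nanoTime();
        for (int i = 0; i < files; i++) {
            // Probe file names are arbitrary; existing probes are overwritten.
            Files.write(dir.resolve("probe-" + i + ".bin"), payload);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java WriteTimingAB <dirA> <dirB>");
            return;
        }
        // Assumed workload: 200 files of 64 KiB each; adjust to resemble real segment sizes.
        long a = timeWrites(Paths.get(args[0]), 200, 64 * 1024);
        long b = timeWrites(Paths.get(args[1]), 200, 64 * 1024);
        System.out.printf("A: %d ms, B: %d ms%n", a / 1_000_000, b / 1_000_000);
    }
}
```

Operating-system caching can hide much of the difference for small workloads, so a test like this is only a first hint; timing the real indexing run against each directory is the more meaningful comparison.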

Re: Why is lucene so slow indexing in nfs file system ?

2008-01-10 Thread Ariel
I am indexing into RAM and then merging explicitly because my application demands it: I have designed it as a distributed environment, so many threads or workers on different machines index into RAM and serialize to disk, and another thread on another machine accesses the segment index to merge it

Re: Why is lucene so slow indexing in nfs file system ?

2008-01-10 Thread Otis Gospodnetic
Ariel, Comments inline. - Original Message From: Ariel [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Thursday, January 10, 2008 10:05:28 AM Subject: Re: Why is lucene so slow indexing in nfs file system ? In a distributed environment the application should make exhaustive

Re: Why is lucene so slow indexing in nfs file system ?

2008-01-10 Thread Ariel
2:59 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote: Ariel, Comments inline. - Original Message From: Ariel [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Thursday, January 10, 2008 10:05:28 AM Subject: Re: Why is lucene so slow indexing in nfs file system

Re: Why is lucene so slow indexing in nfs file system ?

2008-01-10 Thread Chris Lu
. Ariel On Jan 10, 2008 2:59 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote: Ariel, Comments inline. - Original Message From: Ariel [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Thursday, January 10, 2008 10:05:28 AM Subject: Re: Why is lucene so slow

Re: Why is lucene so slow indexing in nfs file system ?

2008-01-10 Thread Otis Gospodnetic
Subject: Re: Why is lucene so slow indexing in nfs file system ? Thanks for your suggestions. I'm sorry, I didn't know, but I would like to know what you mean by SAN and FC. Another thing: I have visited the Lucene home page and version 2.3 has not been released there; could you tell me where

Re: Why is lucene so slow indexing in nfs file system ?

2008-01-09 Thread Erick Erickson
would like to find out why my application has this big delay to index Well, then you have to measure <G>. The first thing I'd do is pinpoint where the time is being spent. Until you have that answered, you simply cannot take any meaningful action. 1 don't do any of the indexing. No new
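One way to act on the advice to pinpoint where the time is spent is to wrap each stage of the pipeline in a timer and compare the totals. The sketch below is only an illustration in plain Java, with no Lucene or PDFBox calls; the class name StageTimer, the stage names, and the placeholder work in main are all assumptions rather than anything taken from the thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper: accumulates wall-clock time per named stage. */
public class StageTimer {
    private final Map<String, Long> nanosByStage = new LinkedHashMap<>();

    /** Runs the work, adds its elapsed time to the named stage, and returns its result. */
    public <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanosByStage.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total nanoseconds recorded for a stage, or 0 if the stage was never timed. */
    public long totalNanos(String stage) {
        return nanosByStage.getOrDefault(stage, 0L);
    }

    /** Name of the stage with the largest accumulated time, or null if none was timed. */
    public String slowestStage() {
        String slowest = null;
        long max = -1;
        for (Map.Entry<String, Long> e : nanosByStage.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                slowest = e.getKey();
            }
        }
        return slowest;
    }

    /** One line per stage, in the order the stages were first seen. */
    public String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : nanosByStage.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue() / 1_000_000).append(" ms\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();
        // Placeholder work: substitute the real file read, PDF-to-text conversion, and index call.
        for (int i = 0; i < 3; i++) {
            byte[] raw = timer.time("read", () -> new byte[1024]);
            String text = timer.time("convert", () -> new String(raw));
            timer.time("index", () -> text.length());
        }
        System.out.print(timer.report());
    }
}
```

If one stage dominates the report (for example the conversion step rather than the index write), that is where tuning effort is likely to pay off first.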

RE: Why is lucene so slow indexing in nfs file system ?

2008-01-09 Thread Steven A Rowe
Hi Ariel, On 01/09/2008 at 8:50 AM, Ariel wrote: Do you know of other distributed-architecture applications that use Lucene to index large numbers of documents? Apache Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit

Re: Why is lucene so slow indexing in nfs file system ?

2008-01-09 Thread Grant Ingersoll
There's also Nutch. However, 10GB isn't that big... Perhaps you can index where the docs/index lives, then just make the index available via NFS? Or, better yet, use rsync to replicate it like Solr does. -Grant On Jan 9, 2008, at 10:49 AM, Steven A Rowe wrote: Hi Ariel, On 01/09/2008

Re: Why is lucene so slow indexing in nfs file system ?

2008-01-09 Thread Antony Bowesman
Ariel wrote: The problem I have is that my application spends a lot of time indexing all the documents: indexing 10 GB of PDF documents takes about 2 days (I am using PDFBox to convert PDF to text). That is of course a lot of time; other applications based on Lucene, for instance IBM

Re: Why is lucene so slow indexing in nfs file system ?

2008-01-09 Thread Otis Gospodnetic
Ariel, I believe PDFBox is not the fastest thing and was built more to handle all possible PDFs than for speed (just my impression - Ben, PDFBox's author might still be on this list and might comment). Pulling data from NFS to index seems like a bad idea. I hope at least the indices are