RE: Multi + Parallel
I am using 6 indexers / indexes to balance indexing speed against query performance for 40+ million documents. I arrived at this number through trial and error, and through performance testing on the indexing side with a fast 4-processor machine. The trick is to max out the I/O throughput.

-Will

-----Original Message-----
From: Justin Swanhart [mailto:[EMAIL PROTECTED]]
Sent: Thursday, October 14, 2004 2:43 PM
To: Lucene Users List
Subject: Re: Multi + Parallel

The overhead of creating that many searcher objects is going to far outweigh any performance benefit you could possibly hope to gain by splitting your index up.

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Multi + Parallel
Hi,

Apologies. Can somebody provide approximate answers on which is the better choice: a search of 10,000 subindexes using MultiSearcher, or a search on one single merged index (the 10,000 subindexes merged)?

a) Subindexes = 10,000 (future)
b) Fields to be searched upon = 4
c) Field types present in indexed format = 15
d) RAM = 1GB
e) O/S = Linux [clustered environment]
f) Processor make = AMD [probably high end]
g) Web server = Tomcat 5.0.x

1) Which would be faster?
2) If neither, what may be the probable solution?

Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 13, 2004 3:53 PM
To: Lucene Users List
Subject: Re: Multi + Parallel

On Oct 13, 2004, at 3:14 AM, Karthik N S wrote:
> I was curious to know the difference between ParallelMultiSearcher and MultiSearcher:
> 1) Is the internal working of these the same or different?

They are different internally. Externally they should return identical results and not appear different at all. Internally, ParallelMultiSearcher searches each index in a separate thread (the search waits until all threads finish before returning). In MultiSearcher, each index is searched serially.

You will not likely see a benefit from using ParallelMultiSearcher unless your environment is specialized to accommodate multi-threading (multiple CPUs, indexes on separate drives that can operate independently, etc.).

> 2) In terms of time, do these differ when searching the same number of fields / words?
> 3) What are the features used in each API?

There is no external difference between the two implementations. Benchmark searches using both and see what is best, but generally MultiSearcher will be better in most environments as it avoids the overhead of starting up and managing multiple threads.

Erik
Re: Multi + Parallel
Search a single merged index.

Otis

--- Karthik N S [EMAIL PROTECTED] wrote:

> Hi,
>
> Apologies. Can somebody provide approximate answers on which is the better choice: a search of 10,000 subindexes using MultiSearcher, or a search on one single merged index (the 10,000 subindexes merged)?
>
> a) Subindexes = 10,000 (future)
> b) Fields to be searched upon = 4
> c) Field types present in indexed format = 15
> d) RAM = 1GB
> e) O/S = Linux [clustered environment]
> f) Processor make = AMD [probably high end]
> g) Web server = Tomcat 5.0.x
>
> 1) Which would be faster?
> 2) If neither, what may be the probable solution?
>
> Karthik
Re: Multi + Parallel
The overhead of creating that many searcher objects is going to far outweigh any performance benefit you could possibly hope to gain by splitting your index up.

On Thu, 14 Oct 2004 04:42:27 -0700 (PDT), Otis Gospodnetic [EMAIL PROTECTED] wrote:
> Search a single merged index.
>
> Otis
Multi + Parallel
Hi Guys,

Apologies. I was curious to know the difference between ParallelMultiSearcher and MultiSearcher:

1) Is the internal working of these the same or different?
2) In terms of time, do these differ when searching the same number of fields / words?
3) What are the features used in each API?

Thanks in advance.

With warm regards, have a nice day
[ N.S.KARTHIK ]
Re: Multi + Parallel
On Oct 13, 2004, at 3:14 AM, Karthik N S wrote:
> I was curious to know the difference between ParallelMultiSearcher and MultiSearcher:
> 1) Is the internal working of these the same or different?

They are different internally. Externally they should return identical results and not appear different at all. Internally, ParallelMultiSearcher searches each index in a separate thread (the search waits until all threads finish before returning). In MultiSearcher, each index is searched serially.

You will not likely see a benefit from using ParallelMultiSearcher unless your environment is specialized to accommodate multi-threading (multiple CPUs, indexes on separate drives that can operate independently, etc.).

> 2) In terms of time, do these differ when searching the same number of fields / words?
> 3) What are the features used in each API?

There is no external difference between the two implementations. Benchmark searches using both and see what is best, but generally MultiSearcher will be better in most environments as it avoids the overhead of starting up and managing multiple threads.

Erik
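
For readers who want to see the serial versus threaded distinction Erik describes, here is a minimal plain-Java sketch. It does not use Lucene and is not how MultiSearcher or ParallelMultiSearcher are actually implemented. The class FanOutSketch, its methods, and the in-memory "shards" (lists of strings standing in for sub-indexes) are all hypothetical names made up for illustration. It only shows the shape of the two strategies: one loop over the sub-indexes, versus one task per sub-index with a wait for all of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only: each "shard" is a list of document strings,
// standing in for one sub-index. No Lucene classes are used here.
class FanOutSketch {

    // Stand-in for searching one sub-index: count documents containing the term.
    static int searchShard(List<String> shard, String term) {
        int hits = 0;
        for (String doc : shard) {
            if (doc.contains(term)) {
                hits++;
            }
        }
        return hits;
    }

    // Serial strategy: visit the shards one after another on the calling thread.
    static int searchSerial(List<List<String>> shards, String term) {
        int total = 0;
        for (List<String> shard : shards) {
            total += searchShard(shard, term);
        }
        return total;
    }

    // Threaded strategy: one task per shard, then block until every task is done.
    static int searchParallel(List<List<String>> shards, String term)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (List<String> shard : shards) {
                Callable<Integer> task = () -> searchShard(shard, term);
                pending.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : pending) {
                total += f.get(); // waits for this shard's result
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> shards = Arrays.asList(
                Arrays.asList("lucene index", "merged index"),
                Arrays.asList("parallel search"),
                Arrays.asList("index on separate drive", "thread overhead"));

        long t0 = System.nanoTime();
        int serial = searchSerial(shards, "index");
        long t1 = System.nanoTime();
        int parallel = searchParallel(shards, "index");
        long t2 = System.nanoTime();

        // Both strategies should report the same hit count; only the timing differs.
        System.out.println("serial   hits=" + serial + " ns=" + (t1 - t0));
        System.out.println("parallel hits=" + parallel + " ns=" + (t2 - t1));
    }
}
```

On tiny inputs like these, the threaded version will usually be slower, because starting and coordinating threads costs more than the work itself. That is the overhead Erik mentions, and it is why timing both approaches on your own indexes and hardware is the only reliable guide.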