RE: Multi + Parallel

2004-10-15 Thread Will Allen
I am using 6 indexers / indexes to balance the speed of indexing against query
performance for 40+ million documents.  I came to this number through trial and error
and performance testing on the indexing side with a fast 4-processor machine.  The
trick is to max out the I/O throughput.
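Will does not say how documents are routed to his six indexers. As a minimal sketch, assuming simple round-robin assignment by document number, each worker below owns one partition. The counter is a hypothetical placeholder for a per-partition index writer, so nothing is actually written to disk; the point is only that each worker touches its own partition and can run concurrently with the others.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLongArray;

public class PartitionedIndexing {

    /**
     * Round-robin sketch: worker p handles documents p, p + N, p + 2N, ...
     * where N is the number of partitions. Returns how many documents each
     * partition received.
     */
    static long[] indexInParallel(long totalDocs, int numPartitions) {
        AtomicLongArray indexed = new AtomicLongArray(numPartitions);
        ExecutorService pool = Executors.newFixedThreadPool(numPartitions);
        for (int p = 0; p < numPartitions; p++) {
            final int partition = p;
            pool.execute(() -> {
                long count = 0;
                for (long doc = partition; doc < totalDocs; doc += numPartitions) {
                    // Hypothetical placeholder for adding the document to this
                    // partition's own index; here we only count it.
                    count++;
                }
                indexed.set(partition, count);
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long[] result = new long[numPartitions];
        for (int p = 0; p < numPartitions; p++) {
            result[p] = indexed.get(p);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] perIndex = indexInParallel(40_000_000L, 6);
        for (int p = 0; p < perIndex.length; p++) {
            System.out.println("index " + p + ": " + perIndex[p] + " documents");
        }
    }
}
```

With 40,000,000 documents and 6 partitions, the first four partitions receive 6,666,667 documents each and the last two 6,666,666.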

-Will

-Original Message-
From: Justin Swanhart [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 14, 2004 2:43 PM
To: Lucene Users List
Subject: Re: Multi + Parallel


The overhead of creating that many searcher objects is going to far
outweigh any performance benefit you could possibly hope to gain by
splitting your index up.




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Multi + Parallel

2004-10-14 Thread Karthik N S
Hi

Apologies.


  Can somebody give me an approximate answer as to which is the better choice:

  a search of 10,000 subindexes using MultiSearcher

  or

  a search on one single merged index [the 10,000 subindexes merged into one]


a) Subindexes = 10,000 (in the future)

b) Fields to be searched = 4

c) Indexed fields present = 15

d) RAM = 1 GB

e) OS = Linux [clustered environment]

f) Processor = AMD [probably high end]

g) Web server = Tomcat 5.0.x


  1) Which would be faster?

  2) If neither, what might be a probable solution?


Karthik




-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 13, 2004 3:53 PM
To: Lucene Users List
Subject: Re: Multi + Parallel 


On Oct 13, 2004, at 3:14 AM, Karthik N S wrote:
 I was curious to know the difference between ParallelMultiSearcher and
 MultiSearcher.

 1) Is the internal functionality of these the same or different?

They are different internally.  Externally they should return identical 
results and not appear different at all.

Internally, ParallelMultiSearcher searches each index in a separate 
thread (searches wait until all threads finish before returning).   In 
MultiSearcher, each index is searched serially.

You will not likely see a benefit to using ParallelMultiSearcher unless 
your environment is specialized to accommodate multi-threading 
(multiple CPUs, indexes on separate drives that can operate 
independently, etc).

 2) In terms of time, do these differ when searching the same number of
 fields / words?

 3) What features does each API offer?

There is no external difference between using either implementation.  
Benchmark searches using both and see what is best, but generally 
MultiSearcher will be better in most environments as it avoids the 
overhead of starting up and managing multiple threads.

Erik





Re: Multi + Parallel

2004-10-14 Thread Otis Gospodnetic
Search a single merged index.

Otis
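Conceptually, merging sub-indexes means concatenating their postings while shifting each sub-index's document numbers past the documents that precede it. The toy model below uses plain Java collections, not Lucene's file format or classes, and its names (SubIndex, merge) are hypothetical; in Lucene itself the merge would be done through IndexWriter.addIndexes(Directory[]).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MergeSubIndexes {

    /** Hypothetical toy sub-index: postings plus its document count. */
    static final class SubIndex {
        final Map<String, List<Integer>> postings; // term -> ascending local doc numbers
        final int maxDoc;                          // number of documents in this sub-index

        SubIndex(Map<String, List<Integer>> postings, int maxDoc) {
            this.postings = postings;
            this.maxDoc = maxDoc;
        }
    }

    /**
     * Concatenates postings in sub-index order, adding a running docBase to
     * each local document number so numbers stay unique and ascending.
     */
    static Map<String, List<Integer>> merge(List<SubIndex> subIndexes) {
        Map<String, List<Integer>> merged = new TreeMap<>();
        int docBase = 0;
        for (SubIndex sub : subIndexes) {
            for (Map.Entry<String, List<Integer>> entry : sub.postings.entrySet()) {
                List<Integer> target = merged.computeIfAbsent(entry.getKey(), term -> new ArrayList<>());
                for (int localDoc : entry.getValue()) {
                    target.add(docBase + localDoc);
                }
            }
            docBase += sub.maxDoc; // the next sub-index starts after this one's documents
        }
        return merged;
    }

    public static void main(String[] args) {
        SubIndex first = new SubIndex(Map.of("lucene", List.of(0, 1), "search", List.of(1)), 2);
        SubIndex second = new SubIndex(Map.of("lucene", List.of(2), "index", List.of(0, 2)), 3);
        // prints {index=[2, 4], lucene=[0, 1, 4], search=[1]}
        System.out.println(merge(List.of(first, second)));
    }
}
```

After the merge, a query touches one set of postings instead of one per sub-index, which is the intuition behind searching a single merged index.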




Re: Multi + Parallel

2004-10-14 Thread Justin Swanhart
The overhead of creating that many searcher objects is going to far
outweigh any performance benefit you could possibly hope to gain by
splitting your index up.


On Thu, 14 Oct 2004 04:42:27 -0700 (PDT), Otis Gospodnetic
[EMAIL PROTECTED] wrote:
 Search a single merged index.
 
 Otis
 
 
 



Multi + Parallel

2004-10-13 Thread Karthik N S


Hi Guys,

Apologies.


I was curious to know the difference between ParallelMultiSearcher and
MultiSearcher.

1) Is the internal functionality of these the same or different?

2) In terms of time, do these differ when searching the same number of fields
/ words?

3) What features does each API offer?


Thanks in advance.


  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







Re: Multi + Parallel

2004-10-13 Thread Erik Hatcher
On Oct 13, 2004, at 3:14 AM, Karthik N S wrote:
 I was curious to know the difference between ParallelMultiSearcher and
 MultiSearcher.

 1) Is the internal functionality of these the same or different?

They are different internally.  Externally they should return identical 
results and not appear different at all.

Internally, ParallelMultiSearcher searches each index in a separate 
thread (searches wait until all threads finish before returning).   In 
MultiSearcher, each index is searched serially.

You will not likely see a benefit to using ParallelMultiSearcher unless 
your environment is specialized to accommodate multi-threading 
(multiple CPUs, indexes on separate drives that can operate 
independently, etc).
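To make the serial/parallel distinction concrete, here is a minimal sketch in plain Java. It does not use Lucene: Hit, SubSearcher, searchSerially and searchInParallel are hypothetical names, each toy index returns hits already numbered in a shared document space, and the parallel variant uses an ExecutorService for brevity rather than starting one thread per index as described above. Both paths feed the same merge step, which is why they return identical hits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SerialVsParallelSearch {

    /** A scored hit; doc numbers are assumed to be unique across all indexes. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }

        @Override
        public String toString() {
            return doc + ":" + score;
        }
    }

    /** Hypothetical stand-in for one per-index searcher. */
    interface SubSearcher {
        List<Hit> search(String query, int n);
    }

    // Highest score first; ties broken by document number for a stable order.
    static final Comparator<Hit> BY_SCORE =
            Comparator.comparingDouble((Hit h) -> h.score).reversed().thenComparingInt(h -> h.doc);

    /** Merges per-index results and keeps the overall top n. */
    static List<Hit> mergeTopN(List<List<Hit>> perIndex, int n) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perIndex) {
            all.addAll(hits);
        }
        all.sort(BY_SCORE);
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    /** MultiSearcher-style: each index is searched in turn on the calling thread. */
    static List<Hit> searchSerially(List<SubSearcher> indexes, String query, int n) {
        List<List<Hit>> perIndex = new ArrayList<>();
        for (SubSearcher index : indexes) {
            perIndex.add(index.search(query, n));
        }
        return mergeTopN(perIndex, n);
    }

    /** ParallelMultiSearcher-style: one task per index, then wait for every task. */
    static List<Hit> searchInParallel(List<SubSearcher> indexes, String query, int n,
                                      ExecutorService pool) {
        List<Future<List<Hit>>> futures = new ArrayList<>();
        for (SubSearcher index : indexes) {
            Callable<List<Hit>> task = () -> index.search(query, n);
            futures.add(pool.submit(task));
        }
        List<List<Hit>> perIndex = new ArrayList<>();
        try {
            for (Future<List<Hit>> future : futures) {
                perIndex.add(future.get()); // blocks until this index has been searched
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return mergeTopN(perIndex, n);
    }

    public static void main(String[] args) {
        // Three toy indexes that return fixed hits regardless of the query.
        List<SubSearcher> indexes = List.of(
                (q, n) -> List.of(new Hit(0, 0.9f), new Hit(1, 0.2f)),
                (q, n) -> List.of(new Hit(2, 0.7f)),
                (q, n) -> List.of(new Hit(3, 0.8f), new Hit(4, 0.1f)));
        ExecutorService pool = Executors.newFixedThreadPool(indexes.size());
        try {
            System.out.println("serial:   " + searchSerially(indexes, "lucene", 3));
            System.out.println("parallel: " + searchInParallel(indexes, "lucene", 3, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

The parallel path only pays off when the per-index searches can genuinely overlap (several CPUs, independent disks); otherwise it adds task and thread management on top of the same work.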

 2) In terms of time, do these differ when searching the same number of
 fields / words?

 3) What features does each API offer?

There is no external difference between using either implementation.  
Benchmark searches using both and see what is best, but generally 
MultiSearcher will be better in most environments as it avoids the 
overhead of starting up and managing multiple threads.
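Following the advice to benchmark both, a rough timing harness could look like the sketch below. The per-index work in fakeSearch is a hypothetical CPU-only stand-in; real searches are often bound by disk I/O, so the numbers only mean something once the stand-in is replaced by searches against your actual indexes. For careful measurements, a dedicated harness such as JMH is a better choice than hand-rolled System.nanoTime loops.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SearchBenchmark {

    static volatile long sink; // consumes results so the JIT cannot drop the work

    /** Hypothetical CPU-only stand-in for searching one index. */
    static long fakeSearch(int index) {
        long h = index;
        for (int i = 0; i < 200_000; i++) {
            h = h * 31 + i;
        }
        return h;
    }

    /** Serial strategy: search each index on the calling thread. */
    static void serial(int numIndexes) {
        long total = 0;
        for (int i = 0; i < numIndexes; i++) {
            total += fakeSearch(i);
        }
        sink = total;
    }

    /** Parallel strategy: one task per index, then wait for all of them. */
    static void parallel(int numIndexes, ExecutorService pool) {
        List<Future<Long>> futures = new ArrayList<>();
        for (int i = 0; i < numIndexes; i++) {
            final int index = i;
            Callable<Long> task = () -> fakeSearch(index);
            futures.add(pool.submit(task));
        }
        long total = 0;
        try {
            for (Future<Long> future : futures) {
                total += future.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        sink = total;
    }

    /** Runs `warmups` untimed iterations, then returns the median of `runs` timed ones (ns). */
    static long medianNanos(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        int numIndexes = 6;
        ExecutorService pool = Executors.newFixedThreadPool(numIndexes);
        try {
            long serialNs = medianNanos(() -> serial(numIndexes), 5, 21);
            long parallelNs = medianNanos(() -> parallel(numIndexes, pool), 5, 21);
            System.out.printf("CPUs: %d, serial: %.2f ms, parallel: %.2f ms%n",
                    Runtime.getRuntime().availableProcessors(), serialNs / 1e6, parallelNs / 1e6);
        } finally {
            pool.shutdown();
        }
    }
}
```

Printing the CPU count next to the timings makes it easier to see whether a parallel win (or loss) tracks the hardware, as the advice above suggests it should.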

Erik