RE: Lucene scalability/clustering

2004-02-26 Thread Jochen Frey
Anson,

One way of doing it is having subsets of your indexes / data on
different machines. Each machine indexes its own data. You implement a
system that distributes queries to the various machines and merges the
results back.

How well it works depends entirely on your implementation of the
distributed search.
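
As a rough illustration of the distribute-and-merge idea, here is a minimal
plain-Java sketch. The names Shard, Hit and search are made up for this
example and are not Lucene classes. The merge also assumes that scores
coming back from different machines are directly comparable, which a real
deployment would have to ensure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical types for illustration; none of these are Lucene classes.
class ScatterGatherSketch {

    /** One search result: a document id plus its relevance score. */
    static final class Hit {
        final String docId;
        final double score;
        Hit(String docId, double score) { this.docId = docId; this.score = score; }
    }

    /** Stand-in for one machine that searches its own sub-index. */
    interface Shard {
        List<Hit> search(String query, int topN);
    }

    /** Sends the query to every shard, then keeps the topN best hits overall. */
    static List<Hit> search(List<Shard> shards, String query, int topN) {
        List<Hit> all = new ArrayList<>();
        for (Shard shard : shards) {
            all.addAll(shard.search(query, topN));
        }
        // Assumption: scores from different shards can be compared directly.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(topN, all.size())));
    }

    public static void main(String[] args) {
        Shard a = (q, n) -> Arrays.asList(new Hit("a1", 0.9), new Hit("a2", 0.4));
        Shard b = (q, n) -> Arrays.asList(new Hit("b1", 0.7));
        for (Hit h : search(Arrays.asList(a, b), "lucene", 2)) {
            System.out.println(h.docId + " " + h.score);
        }
    }
}
```

With the two stub shards in main, the merged top two are a1 and then b1.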

I believe there was a discussion about implementing this using a
MultiSearcher somewhere as well.

Cheers!
Jochen


-----Original Message-----
From: Anson Lau [mailto:[EMAIL PROTECTED]]
Sent: Sunday, February 22, 2004 2:17 PM
To: 'Lucene Users List'
Subject: RE: Lucene scalability/clustering


Further on this topic - has anyone tried implementing a distributed
search with Lucene?  How does it work and does it work well?


Anson


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




RE: Lucene scalability/clustering

2004-02-26 Thread markharw00d
I tend to think of scaling in two dimensions: scaling by volume of users and
scaling by volume of data. The former is addressed through replicated indexes
and the latter through segmented indexes.

Distribute replicated segments across multiple boxes and create a broker which:
a) Determines which segments to query
b) Load-balances query requests across the replicated servers for each segment
c) Merges the responses

Make sure your communications are batched to avoid too much fine-grained chatter.

This is the basis of a scalable architecture.
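
A minimal plain-Java sketch of such a broker might look like the following.
All names (Router, Segment, Replica, Hit) are hypothetical and not part of
Lucene, and round-robin is just one possible load-balancing policy chosen
for the example. Each segment gets a single request per query, in keeping
with the advice to batch communications.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical names throughout; none of this is a Lucene API.
class BrokerSketch {

    static final class Hit {
        final String docId;
        final double score;
        Hit(String docId, double score) { this.docId = docId; this.score = score; }
    }

    /** One server holding a full copy of one index segment. */
    interface Replica {
        List<Hit> search(String query, int topN);
    }

    /** Decides whether a segment could contain matches for a query (step a). */
    interface Router {
        boolean mayMatch(Segment segment, String query);
    }

    /** A slice of the data plus the interchangeable servers that hold copies of it. */
    static final class Segment {
        final String name;
        private final List<Replica> replicas; // assumed non-empty
        private final AtomicInteger next = new AtomicInteger();

        Segment(String name, List<Replica> replicas) {
            this.name = name;
            this.replicas = replicas;
        }

        /** Round-robin choice among this segment's replicas (step b). */
        Replica pickReplica() {
            return replicas.get(Math.floorMod(next.getAndIncrement(), replicas.size()));
        }
    }

    static List<Hit> search(List<Segment> segments, Router router, String query, int topN) {
        List<Hit> merged = new ArrayList<>();
        for (Segment segment : segments) {
            if (!router.mayMatch(segment, query)) {
                continue; // step a: skip segments that cannot match
            }
            // step b: one batched request per segment, sent to a single replica
            merged.addAll(segment.pickReplica().search(query, topN));
        }
        // step c: merge by descending score and keep the best topN
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
    }

    public static void main(String[] args) {
        Replica r1 = (q, n) -> Arrays.asList(new Hit("doc-from-r1", 1.0));
        Replica r2 = (q, n) -> Arrays.asList(new Hit("doc-from-r2", 1.0));
        List<Segment> segments = Arrays.asList(new Segment("segment-0", Arrays.asList(r1, r2)));
        Router queryEverything = (segment, q) -> true;
        // Two identical queries are served by alternating replicas.
        System.out.println(search(segments, queryEverything, "lucene", 10).get(0).docId);
        System.out.println(search(segments, queryEverything, "lucene", 10).get(0).docId);
    }
}
```

A production broker would also need health checks and timeouts for failed
replicas, which this sketch leaves out.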

Cheers
Mark



RE : RE : Lucene scalability/clustering

2004-02-24 Thread Rasik Pandey
 I'm trying to see what are some common ways to scale lucene onto
 multiple boxes.  Is RMI based search and using a MultiSearcher the
 general approach?

More details about what you are attempting would be helpful.


RBP



Re: RE : Lucene scalability/clustering

2004-02-24 Thread Doug Cutting
Anson Lau wrote:
 I'm trying to see what are some common ways to scale lucene onto
 multiple boxes.  Is RMI based search and using a MultiSearcher the
 general approach?

Yes, although you probably want to use ParallelMultiSearcher.
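
The practical difference is that the sub-indexes are searched concurrently
rather than one after another, so the response time is closer to that of the
slowest sub-index than to the sum of all of them. The sketch below shows
that idea in plain Java with a thread pool. SubSearcher and searchAll are
hypothetical names, and the code does not use or reproduce Lucene's own
implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java illustration only; SubSearcher and searchAll are hypothetical
// names and do not reproduce Lucene's classes.
class ParallelFanOutSketch {

    /** Stand-in for one (possibly remote) sub-index; returns matching document ids. */
    interface SubSearcher {
        List<String> search(String query);
    }

    /** Queries every sub-searcher concurrently and concatenates the results in input order. */
    static List<String> searchAll(List<SubSearcher> searchers, String query) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, searchers.size()));
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (SubSearcher s : searchers) {
                tasks.add(() -> s.search(query));
            }
            List<String> results = new ArrayList<>();
            // invokeAll waits for all tasks and returns futures in task order.
            for (Future<List<String>> f : pool.invokeAll(tasks)) {
                results.addAll(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while searching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a sub-searcher failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Simulates network and search latency. */
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        SubSearcher a = q -> { pause(200); return Arrays.asList("a1"); };
        SubSearcher b = q -> { pause(200); return Arrays.asList("b1"); };
        long start = System.nanoTime();
        List<String> hits = searchAll(Arrays.asList(a, b), "lucene");
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Elapsed time should be close to one pause, not the sum of both.
        System.out.println(hits + " in about " + elapsedMs + " ms");
    }
}
```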

Doug



RE : Lucene scalability/clustering

2004-02-24 Thread Anson Lau
RBP,

I'm implementing a search engine for a project at work.  It's going to
index approx 1.5 rows in a database.

I am trying to get a feel for what my options are when scalability
becomes an issue.  I also want to know whether those options require me
to implement my app in a different way right from the start.

Anson

-----Original Message-----
From: Rasik Pandey [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 24, 2004 9:34 PM
To: 'Lucene Users List'
Subject: RE : RE : Lucene scalability/clustering

 I'm trying to see what are some common ways to scale lucene
 onto multiple boxes.  Is RMI based search and using a 
 MultiSearcher the general approach?

More details about what you are attempting would be helpful.


RBP



RE : Lucene scalability/clustering

2004-02-23 Thread Rasik Pandey
 Further on this topic - has anyone tried implementing a distributed
 search with Lucene?  How does it work and does it work well?

I assume you are referring to RMI-based search? It works well, as does MultiSearcher.

RBP



RE: RE : Lucene scalability/clustering

2004-02-23 Thread Anson Lau
I'm trying to see what some common ways are to scale Lucene onto
multiple boxes.  Is RMI-based search with a MultiSearcher the
general approach?

There don't seem to be many articles on the web on how to implement a
Lucene search cluster.  If anyone knows of a good article, could you
please post it here?

Thanks,

Anson

-----Original Message-----
From: Rasik Pandey [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 23, 2004 9:46 PM
To: 'Lucene Users List'
Subject: RE : Lucene scalability/clustering

 Further on this topic - has anyone tried implementing a distributed
 search with Lucene?  How does it work and does it work well?

I assume you are referring to RMI based search? It works well as does
MultiSearcher. 

RBP



Re: Lucene scalability/clustering

2004-02-22 Thread lucene
On Saturday 21 February 2004 20:24, Otis Gospodnetic wrote:
 http://jakarta.apache.org/lucene/docs/benchmarks.html

BTW, where can I get Peter Halacsy's IndexSearcherCache?



Re: Lucene scalability/clustering

2004-02-22 Thread Hamish Carpenter
Hi All,

I'm Hamish Carpenter, who contributed the benchmarks with the comment
about the IndexSearcherCache.  Using this solved our issues with too
many open files under Linux.

The original IndexSearcherCache email is here:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg01967.html

See here for a copy of the above message and a download link:
http://www.geocities.com/haytona/lucene/
The mailing list doesn't like attachments.  The source is 10K in size.
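
For readers who cannot reach those links, the general idea of a searcher
cache can be sketched as below. This is not Peter Halacsy's code, which is
not reproduced in this thread. It is a generic plain-Java illustration with
hypothetical names, where S stands in for a thread-safe searcher type.
Opening one searcher per index and sharing it across requests keeps the
number of open index files bounded, instead of opening a fresh set of files
for every query.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Generic sketch with hypothetical names; S stands in for a thread-safe searcher type.
class SearcherCacheSketch<S> {

    private final ConcurrentMap<String, S> byPath = new ConcurrentHashMap<>();
    private final Function<String, S> opener;

    /** opener is called to open a searcher the first time an index path is requested. */
    SearcherCacheSketch(Function<String, S> opener) {
        this.opener = opener;
    }

    /** Returns the shared searcher for an index path, opening it on first use only. */
    S get(String indexPath) {
        return byPath.computeIfAbsent(indexPath, opener);
    }

    public static void main(String[] args) {
        int[] opens = {0};
        SearcherCacheSketch<Object> cache =
            new SearcherCacheSketch<>(path -> { opens[0]++; return new Object(); });
        for (int i = 0; i < 1000; i++) {
            cache.get("/data/index"); // every request reuses the same instance
        }
        System.out.println("opened " + opens[0] + " time(s)");
    }
}
```

A real cache would also have to notice when the index changes, open a new
searcher, and close the old one once no request is using it. None of that is
handled here.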
HTH

Hamish Carpenter.

[EMAIL PROTECTED] wrote:
 BTW, where can I get Peter Halacsy's IndexSearcherCache?


RE: Lucene scalability/clustering

2004-02-22 Thread Anson Lau

Further on this topic - has anyone tried implementing a distributed
search with Lucene?  How does it work and does it work well?


Anson


-----Original Message-----
From: Hamish Carpenter [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 23, 2004 5:24 AM
To: Lucene Users List
Subject: Re: Lucene scalability/clustering

Hi All,

I'm Hamish Carpenter who contributed the benchmarks with the comment
about the IndexSearcherCache.  Using this solved our issues with too
many files open under linux.

The original IndexSearcherCache email is here:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg01967.html

See here for a copy of the above message and a download link:
http://www.geocities.com/haytona/lucene/
The mailing list doesn't like attachments.  The source is 10K in size.

HTH

Hamish Carpenter.

[EMAIL PROTECTED] wrote:
  BTW, where can I get Peter Halacsy's IndexSearcherCache?



Re: Lucene scalability/clustering

2004-02-21 Thread Otis Gospodnetic
http://jakarta.apache.org/lucene/docs/benchmarks.html

--- [EMAIL PROTECTED] wrote:
 Hi!
 
 How well does Lucene scale? Is it able to handle 100,000 (more or less
 complex) queries a day (i.e. 9 to 5) on an index with half a million
 docs?
 
 What hardware is recommended for that demand? What to do if it cannot
 handle it quickly enough?
 
 Regards,
 Timo
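
For a sense of scale, the load in the question can be turned into an average
rate. The sketch below assumes the queries are spread evenly over the eight
hours from 9 to 5, which is only an assumption since real traffic will have
peaks. The name averageQps is invented for this example.

```java
// Back-of-the-envelope helper; averageQps is a hypothetical name.
class QueryRateSketch {

    /** Average queries per second for a daily total spread evenly over a working window. */
    static double averageQps(long queriesPerDay, int windowHours) {
        return queriesPerDay / (windowHours * 3600.0);
    }

    public static void main(String[] args) {
        // 100,000 queries between 9:00 and 17:00, i.e. 8 hours = 28,800 seconds
        System.out.println(averageQps(100_000, 8) + " queries/second on average");
    }
}
```

That works out to roughly 3.5 queries per second on average, so the hardware
has to be sized for whatever multiple of that the busiest minutes reach.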
 