Since there is so little overlap, I would look at a core for each user...

However, to manage 20K cores, you will not want to use the off-the-shelf core management implementation to maintain them. Consider overriding SolrDispatchFilter to initialize a CoreContainer that you manage yourself.
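The usual trick when the number of cores far exceeds what fits in memory is to keep only a bounded set open at once and open the rest lazily. Below is only a sketch of that idea in plain JDK Java, not Solr code: `UserCore`, `openCore`, and `MAX_OPEN` are hypothetical stand-ins for whatever your custom CoreContainer would wrap, and the LRU policy is one possible choice, not the list's recommendation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: bounds memory by keeping at most MAX_OPEN user "cores"
// open at once, closing the least recently used one on overflow.
// UserCore and openCore() are hypothetical stand-ins, not Solr classes.
public class CoreCache {
    static final int MAX_OPEN = 500;

    static class UserCore {
        final String name;
        UserCore(String name) { this.name = name; }
        void close() { /* a real core would release index readers/writers here */ }
    }

    // accessOrder=true turns this LinkedHashMap into an LRU cache
    private final Map<String, UserCore> open =
        new LinkedHashMap<String, UserCore>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, UserCore> e) {
                if (size() > MAX_OPEN) {
                    e.getValue().close();  // evict: close the coldest core
                    return true;
                }
                return false;
            }
        };

    public synchronized UserCore getCore(String user) {
        UserCore core = open.get(user);
        if (core == null) {
            core = openCore(user);  // lazily open on first request
            open.put(user, core);
        }
        return core;
    }

    private UserCore openCore(String user) {
        return new UserCore(user);  // a real version would load the index dir
    }

    public synchronized int openCount() { return open.size(); }

    public static void main(String[] args) {
        CoreCache cache = new CoreCache();
        for (int i = 0; i < 1000; i++) {
            cache.getCore("user" + i);
        }
        // Only the most recently used MAX_OPEN cores remain open
        System.out.println(cache.openCount());  // prints 500
    }
}
```

With a cap like this, 20K cores on disk cost you only the open subset in RAM; the trade-off is the open/close latency on a cold user's first request.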


On May 17, 2009, at 10:11 PM, Chris Cornell wrote:

On Sun, May 17, 2009 at 8:38 PM, Otis Gospodnetic
<otis_gospodne...@yahoo.com> wrote:

Chris,

Yes, disk space is cheap, and with so little overlap you won't gain much by putting everything in a single index. Plus, when each user has a separate index, it's easy to split users and distribute them over multiple machines if you ever need to, it's easy and fast to completely reindex one user's data without affecting other users, etc.

Several years ago I built Simpy at http://www.simpy.com/ that way (but pre-Solr, so it uses Lucene directly) and never regretted it. There are way more than 20K users there with many searches per second and with constant indexing. Each user has an index for bookmarks and an index for notes. Each group has its own index, shared by all group members. The main bookmark search is another index. People search is yet another index. And so on. Single server.


Thank you very much for your insight and experience; it sounds like we
shouldn't be "prematurely optimizing" this.

Has someone actually used multicore this way, though? With thousands of them?

Independently of advice in that regard, I guess our next step is to
explore and create some "dummy" scenarios/tests to try to stress
multicore (search latency is not as much of a factor as memory usage
is).  I'll report back on any conclusions we come to.

Thanks!
Chris
