Really great to know you were able to fire up about 100 cores. But I wonder how it would perform when it scales up to around 1000 or even more.
I have a question regarding IDs, i.e. the unique key. Since two users might add the same document, how should we set the ID? I was thinking of appending the user ID to the ID I would otherwise use, e.g. "/system/bar.pdfuserid25". Otherwise, Solr would replace one user's document with the other's, which is not what we want. The same applies to deleteById. Is there a better way to do this?

On Tue, Oct 26, 2010 at 7:45 PM, Jonathan Rochkind <rochk...@jhu.edu> wrote:
> mike anderson wrote:
>>
>> I'm really curious if there is a clever solution to the obvious problem
>> with: "So your better off using a single index and with a user id and use
>> a query filter with the user id when fetching data.", i.e. when you have
>> hundreds of thousands of user IDs tagged on each article. That just
>> doesn't sound like it scales very well..
>>
>
> Actually, I think that design would scale pretty fine, I don't think there's
> an 'obvious' problem. You store your userIDs in a multi-valued field (or as
> multiple terms in a single value, ends up being similar). You fq on there
> with the current userID. There's one way to find out of course, but that
> doesn't seem a patently ridiculous scenario or anything, that's the kind of
> thing Solr is generally good at, it's what it's built for. The problem
> might actually be in the time it takes to add such a document to the index;
> but not in query time.
>
> Doesn't mean it's the best solution for your problem though, I can't say.
>
> My impression is that Solr in general isn't really designed to support the
> kind of multi-tenancy use case people are talking about lately. So trying
> to make it work anyway... if multi-cores work for you, then great, but be
> aware they weren't really designed for that (having thousands of cores) and
> may not. If a single index can work for you instead, great, but as you've
> discovered it's not necessarily obvious how to set up the schema to do what
> you need -- really this applies to Solr in general, unlike an rdbms where
> you just third-form-normalize everything and figure it'll work for almost
> any use case that comes up, in Solr you generally need to custom fit the
> schema for your particular use cases, sometimes being kind of clever to
> figure out the optimal way to do that.
>
> This is, I'd argue/agree, indeed kind of a disadvantage, setting up a Solr
> index takes more intellectual work than setting up an rdbms. The trade off
> is you get speed, and flexible ways to set up relevancy (that still perform
> well). Took a couple decades for rdbms to get as brainless to use as they
> are, maybe in a couple more we'll have figured out ways to make indexing
> engines like solr equally brainless, but not yet -- but it's still pretty
> damn easy for what it is, the lucene/Solr folks have done a remarkable job.
>

--
Regards,
Tharindu
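
P.S. To make the ID idea above concrete, here is a rough sketch of what I have in mind, in plain Python with no Solr client involved. Everything named here is my own assumption: the "!" delimiter, the helper names, and the "user_id" field name are hypothetical and not anything Solr prescribes. Putting the user ID first with a delimiter avoids the ambiguity of simply gluing "userid25" onto the path, and the same key can be passed to deleteById.

```python
from typing import Tuple

# Hypothetical delimiter (an assumption): choose one that can never occur in a user ID.
SEPARATOR = "!"


def make_doc_id(user_id: str, path: str) -> str:
    """Build a per-user unique key so two users' copies of one file do not collide."""
    if SEPARATOR in user_id:
        raise ValueError("user_id must not contain the separator")
    return f"{user_id}{SEPARATOR}{path}"


def split_doc_id(doc_id: str) -> Tuple[str, str]:
    """Recover (user_id, path); splits on the first separator only."""
    user_id, path = doc_id.split(SEPARATOR, 1)
    return user_id, path


def user_filter(user_id: str, field: str = "user_id") -> str:
    """Build an fq clause restricting results to one user.

    The field name is an assumption; it would have to exist in the schema.
    """
    escaped = user_id.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


if __name__ == "__main__":
    key = make_doc_id("25", "/system/bar.pdf")
    print(key)                # unique key for user 25's copy
    print(split_doc_id(key))  # back to (user_id, path)
    print(user_filter("25"))  # value for the fq parameter
```

With this, user 25 and user 26 adding "/system/bar.pdf" would get different unique keys, so neither overwrites the other, and deleting one user's copy means deleting by that user's key only.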