On 1/10/2011 5:03 PM, Dennis Gearon wrote:
What I seem to see suggested here is to use different cores for the things you
suggested:
- different types of documents
- Access Control Lists
I wonder how sharding would work in that scenario?
Sharding has nothing to do with that scenario at all. Different cores
are essentially _entirely separate_. While it can be convenient to use
different cores like this, it means you don't get ANY searches that
'join' over multiple 'kinds' of data in different cores.
Solr is not great at handling heterogeneous data like that. Putting each
kind in its own core is one solution, although then the kinds are entirely
separate. If that works, great. Another solution is putting them in
the same index, but using mostly different fields, and perhaps having a
'type' field shared amongst all of your 'kinds' of data, and then always
querying with an 'fq' for the right 'kind'. Or if the fields they use
are entirely different, you don't even need the fq, since a query on a
certain field will only match a certain 'kind' of document.
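A minimal sketch of the single-index approach, assuming a hypothetical
string field named 'type' on every document and a hypothetical core URL.
It only builds the /select URL; 'q', 'fq' and 'wt' are standard Solr
request parameters, and the filter query restricts results to one kind
without affecting relevance scores.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, kind=None):
    """Build a Solr /select URL against one index holding several kinds.

    Assumes (hypothetically) that every document has a 'type' field whose
    value names its kind.
    """
    params = [("q", query), ("wt", "json")]
    if kind is not None:
        # Filter query: limit matches to one kind of document.
        params.append(("fq", f"type:{kind}"))
    return f"{base_url}/select?{urlencode(params)}"


print(build_select_url("http://localhost:8983/solr/main", "title:solr", kind="book"))
# http://localhost:8983/solr/main/select?q=title%3Asolr&wt=json&fq=type%3Abook
```

If the kinds use entirely disjoint fields, you could call it with kind=None
and rely on the field in 'q' to do the restricting.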
Solr is not great at handling complex queries over data with
heterogeneous schemata. Solr wants you to flatten all your data into
one single set of documents.
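One way to do that flattening, sketched in Python under some loudly
hypothetical conventions: each source record is a flat dict with an 'id'
unique within its kind, the shared field is called 'type', and every other
field is prefixed with the kind name so the kinds mostly use different
fields. Those prefixed fields would still have to be declared in the
schema (or matched by a dynamic field) before Solr would accept them.

```python
def flatten(kind, records, id_field="id"):
    """Turn records of one kind into flat, Solr-style document dicts.

    Hypothetical conventions: ids are made unique across kinds by
    prefixing the kind, and non-id fields are renamed '<kind>_<field>'.
    """
    docs = []
    for rec in records:
        doc = {
            # Keep the uniqueKey unique across all kinds in one index.
            "id": f"{kind}-{rec[id_field]}",
            # Shared field used later for fq=type:<kind>.
            "type": kind,
        }
        for key, value in rec.items():
            if key != id_field:
                # Give each kind its own, mostly non-overlapping fields.
                doc[f"{kind}_{key}"] = value
        docs.append(doc)
    return docs


books = [{"id": 1, "title": "Moby Dick"}]
users = [{"id": 1, "name": "dennis"}]
index = flatten("book", books) + flatten("user", users)
```

Note that both source records had id 1; the kind prefix keeps them from
overwriting each other once they share a single index.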
Sharding is a way of splitting up a single index (multiple cores are
_multiple indexes_) amongst several hosts for performance reasons,
mostly when you have a very large index. That is it. The end. If you
have multiple cores, that's the same as having multiple Solr indexes
(which may or may not happen to be on the same machine). Any one or more
of those cores could be sharded if you want. That is a separate issue.
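To make the distinction concrete, here is a sketch of querying ONE
logical index that has been split across two hosts, using Solr's 'shards'
request parameter (a comma-separated list of host:port/path entries
without the http:// prefix, the format documented for Solr's distributed
search at the time). Host and core names are hypothetical. Every shard is
the same 'books' index with the same schema; pointing 'shards' at
unrelated cores would not give you a join between kinds of data.

```python
from urllib.parse import urlencode


def build_sharded_url(entry_core, shards, query):
    """Build a distributed-search URL for one index split over hosts.

    entry_core: the core that receives the request and merges results.
    shards: 'host:port/path' strings, all copies of the same schema.
    """
    params = {"q": query, "shards": ",".join(shards)}
    return f"{entry_core}/select?{urlencode(params)}"


url = build_sharded_url(
    "http://host1:8983/solr/books",
    ["host1:8983/solr/books", "host2:8983/solr/books"],
    "title:solr",
)
```

The entry core fans the query out to each shard and merges the hits, so
from the client's side it still looks like a single index.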