I believe it is a misunderstanding to use custom routing (or sharding, as Erick calls it) for this kind of thing. Custom routing is nice if you want to control which slice/shard under a collection a specific document goes to - mainly to ensure that two (or more) documents are indexed on the same slice/shard, but also simply to control on which slice/shard a specific document is indexed. Knowing/controlling this can be used for a lot of nice purposes. But you don't want to move slices/shards around among collections, or delete/add slices from/to a collection - unless it's for elasticity reasons.
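For illustration, here is a minimal SolrJ sketch of that co-location property, assuming the compositeId-style routing the custom-sharding work introduces, where the prefix before '!' in a document id acts as the shard key. The collection name and ids below are made up:

// A minimal SolrJ sketch of shard-key routing, assuming a collection whose
// router hashes the prefix before '!' in the document id (compositeId-style).
// Documents sharing a prefix land on the same slice/shard - the co-location
// property described above. Names here are hypothetical.
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class RoutedIndexing {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("localhost:9983"); // ZooKeeper address (assumed)
        server.setDefaultCollection("mycollection");

        SolrInputDocument doc1 = new SolrInputDocument();
        doc1.addField("id", "customerA!order-1"); // shard key "customerA"
        SolrInputDocument doc2 = new SolrInputDocument();
        doc2.addField("id", "customerA!order-2"); // same key -> same shard as doc1

        server.add(doc1);
        server.add(doc2);
        server.commit();
        server.shutdown();
    }
}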

I think you should fill a collection every week/month and just keep those collections as they are. Instead of ending up with one big "historic" collection containing many slices/shards/cores (one for each historic week/month), you will end up with many historic collections (one for each historic week/month). Searching historic data, you will have to cross-search those historic collections, but that is no problem at all. If SolrCloud is built the way it is supposed to be built (and I believe it is), it shouldn't require more resources, or be harder in any way, to cross-search X slices spread across many collections than it is to cross-search X slices under the same collection.
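To make that concrete: a new collection per month can be created with the Collections API (action=CREATE), an old month can be expired with action=DELETE, and a single query can be fanned out across many collections with the "collection" request parameter. A minimal sketch, assuming hypothetical collection names of the form logs-YYYY-MM and a Solr node on localhost:8983:

// A SolrJ sketch of the collection-per-month scheme. The collection names
// and shard/replica counts are made up for illustration.
import java.net.HttpURLConnection;
import java.net.URL;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MonthlyCollections {
    public static void main(String[] args) throws Exception {
        // 1. At the start of each month, create a fresh collection via the
        //    Collections API. Old months are never touched; to drop expired
        //    data you would simply call action=DELETE on the oldest one.
        URL create = new URL("http://localhost:8983/solr/admin/collections"
                + "?action=CREATE&name=logs-2013-01&numShards=4&replicationFactor=2");
        HttpURLConnection conn = (HttpURLConnection) create.openConnection();
        System.out.println("CREATE returned HTTP " + conn.getResponseCode());
        conn.disconnect();

        // 2. To search historic data, cross-search all the monthly collections
        //    in one request via the "collection" parameter.
        CloudSolrServer server = new CloudSolrServer("localhost:9983"); // ZooKeeper address (assumed)
        server.setDefaultCollection("logs-2013-01"); // any one collection as the entry point

        SolrQuery query = new SolrQuery("level:ERROR");
        query.set("collection", "logs-2012-11,logs-2012-12,logs-2013-01");

        QueryResponse response = server.query(query);
        System.out.println("Hits across all months: " + response.getResults().getNumFound());
        server.shutdown();
    }
}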

Besides that, see my answer in the topic "Will SolrCloud always slice by ID hash?" from a few days back.

Regards, Per Steffensen

On 12/24/12 1:07 AM, Erick Erickson wrote:
I think this is one of the primary use-cases for custom sharding. Solr 4.0
doesn't really lend itself to this scenario, but I _believe_ that the patch
for custom sharding has been committed...

That said, I'm not quite sure how you drop off the old shard if you don't
need to keep old data. I'd guess it's possible, but haven't implemented
anything like that myself.

FWIW,
Erick


On Fri, Dec 21, 2012 at 12:17 PM, Upayavira <u...@odoko.co.uk> wrote:

I'm working on a system for indexing logs. We're probably looking at
filling one core every month.

We'll maintain a short term index containing the last 7 days - that one
is easy to handle.

For the longer term stuff, we'd like to maintain a collection that will
query across all the historic data, but that means every month we need
to add another core to an existing collection, which as I understand it
in 4.0 is not possible.

How do people handle this sort of situation where you have rolling new
content arriving? I'm sure I've heard people using SolrCloud for this
sort of thing.

Given it is logs, distributed IDF has no real bearing.

Upayavira

