Re: Dynamic collections in SolrCloud for log indexing

Otis Gospodnetic Mon, 24 Dec 2012 22:49:55 -0800

Hi,

Right, this is not really about routing in the ElasticSearch sense.
What's handy for indexing logs is index aliases, which I thought I had
added to JIRA a while back, but it looks like I have not.
Index aliases would let you keep a "last 7 days" alias fixed while
underneath you push and pop an index every day without the client app
having to adjust.
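
A minimal sketch of the alias idea described above, in plain Python. It is
not a Solr or ElasticSearch API: the class RollingAlias, the alias name
"last_7_days" and the "logs_YYYYMMDD" index naming scheme are all
hypothetical and only illustrate how a fixed alias can keep resolving to the
newest seven daily indexes while one is pushed and one is popped each day.

```python
from collections import deque
from datetime import date, timedelta


class RollingAlias:
    """Hypothetical alias: a fixed name that resolves to the newest N daily indexes."""

    def __init__(self, name, max_days=7):
        self.name = name
        # A deque with maxlen discards its oldest entry when a new one is appended.
        self._indexes = deque(maxlen=max_days)

    def push(self, day):
        """Add the index for `day`; return the index name that dropped out, if any."""
        full = len(self._indexes) == self._indexes.maxlen
        dropped = self._indexes[0] if full else None
        # Assumed naming scheme: one index per day, e.g. logs_20121224.
        self._indexes.append("logs_" + day.strftime("%Y%m%d"))
        return dropped

    def resolve(self):
        """Return the concrete index names the alias currently points at, oldest first."""
        return list(self._indexes)


alias = RollingAlias("last_7_days")
start = date(2012, 12, 17)
for offset in range(8):
    # Eight daily pushes into a 7-day window: the first day falls out.
    alias.push(start + timedelta(days=offset))
```

The client app would only ever query "last_7_days"; whatever sits behind
resolve() can change daily without the client noticing.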


Otis
--
Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html



On Mon, Dec 24, 2012 at 4:30 AM, Per Steffensen <st...@designware.dk> wrote:

> I believe it is a misunderstanding to use custom routing (or sharding as
> Erick calls it) for this kind of stuff. Custom routing is nice if you want
> to control which slice/shard under a collection a specific document goes to
> - mainly to be able to control that two (or more) documents are indexed on
> the same slice/shard, but also just to be able to control on which
> slice/shard a specific document is indexed. Knowing/controlling this kind
> of stuff can be used for a lot of nice purposes. But you don't want to move
> slices/shards around among collections or delete/add slices from/to a
> collection - unless it's for elasticity reasons.
>
> I think you should fill a collection every week/month and just keep those
> collections as is. Instead of ending up with a big "historic" collection
> containing many slices/shards/cores (one for each historic week/month), you
> will end up with many historic collections (one for each historic
> week/month). Searching historic data you will have to cross-search those
> historic collections, but that is no problem at all. If Solr Cloud is made
> as it is supposed to be made (and I believe it is) it shouldn't require more
> resources or be harder in any way to cross-search X slices across many
> collections than it is to cross-search X slices under the same collection.
>
> Besides that see my answer for topic "Will SolrCloud always slice by ID
> hash?" a few days back.
>
> Regards, Per Steffensen
>
>
> On 12/24/12 1:07 AM, Erick Erickson wrote:
>
>> I think this is one of the primary use-cases for custom sharding. Solr 4.0
>> doesn't really lend itself to this scenario, but I _believe_ that the
>> patch for custom sharding has been committed...
>>
>> That said, I'm not quite sure how you drop off the old shard if you don't
>> need to keep old data. I'd guess it's possible, but haven't implemented
>> anything like that myself.
>>
>> FWIW,
>> Erick
>>
>>
>> On Fri, Dec 21, 2012 at 12:17 PM, Upayavira <u...@odoko.co.uk> wrote:
>>
>>> I'm working on a system for indexing logs. We're probably looking at
>>> filling one core every month.
>>>
>>> We'll maintain a short term index containing the last 7 days - that one
>>> is easy to handle.
>>>
>>> For the longer term stuff, we'd like to maintain a collection that will
>>> query across all the historic data, but that means every month we need
>>> to add another core to an existing collection, which as I understand it
>>> in 4.0 is not possible.
>>>
>>> How do people handle this sort of situation where you have rolling new
>>> content arriving? I'm sure I've heard people using SolrCloud for this
>>> sort of thing.
>>>
>>> Given it is logs, distributed IDF has no real bearing.
>>>
>>> Upayavira
>>>
>>>
>
