Re: Allow Join over two sharded collection

Glick, David Sat, 01 Jul 2017 17:18:49 -0700

Unsubscribe 

Sent from my iPhone


> On Jul 1, 2017, at 8:02 PM, Susheel Kumar <susheel2...@gmail.com> wrote:
> 
> Depending on your use case people also use collection aliasing for time
> series data.  See below
> 
> https://blog.cloudera.com/blog/2013/10/collection-aliasing-near-real-time-search-for-really-big-data/
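
A rough sketch of the aliasing idea, assuming one collection per month and a read alias that spans the recent ones. CREATEALIAS is a real Collections API action; the host, the alias name "recent" and the collection names below are made up for illustration.

```python
from urllib.parse import urlencode


def create_alias_url(alias, collections, base="http://localhost:8983/solr"):
    """Build a Collections API URL that points `alias` at several collections.

    `base`, `alias` and the collection names are hypothetical placeholders.
    """
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        # CREATEALIAS takes a comma-separated list of collection names.
        "collections": ",".join(collections),
    }
    return base + "/admin/collections?" + urlencode(params)


if __name__ == "__main__":
    # Re-issuing the call with a new list moves the alias forward in time.
    print(create_alias_url("recent", ["docs_2017_06", "docs_2017_07"]))
```

Queries then go to the alias rather than to an individual collection, so the set of months being searched can change without touching the clients.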
> 
>> On Sat, Jul 1, 2017 at 7:13 PM, Susheel Kumar <susheel2...@gmail.com> wrote:
>> 
>> As Erick said, 1M docs/month isn't a big deal.  I have 45+ million docs in one
>> shard, but YMMV depending on other factors.
>> 
>> Also, there is a lot of confusion in the terminology. The default routing is
>> compositeId routing.  The implicit routing which Erick mentioned is
>> manual routing.  https://issues.apache.org/jira/browse/SOLR-6630
>> 
>> Which routing are you suggesting to use? Can you clarify again?  Also,
>> what's your exact use case?  Do you query older documents, or do you not
>> need to, so that most or all of your queries go to the shard with the
>> newer documents?
>> 
>> Thanks,
>> Susheel
>> 
>> On Sat, Jul 1, 2017 at 12:14 PM, Erick Erickson <erickerick...@gmail.com>
>> wrote:
>> 
>>> 1M docs/month shouldn't make Solr break a sweat. If it really worries
>>> you and you're indexing in a big batch, index during off hours. At
>>> very worst, if you're ingesting them all at once you might have to
>>> throttle the indexing a bit.
>>> 
>>> Frankly, most of the time acquiring the documents from the system of
>>> record is where the bottleneck is and Solr easily handles the indexing
>>> load.
>>> 
>>> The other advantage is that if you use implicit routing rather than a
>>> composite ID, you can add shards to your collection one at a time as
>>> required. For time-series data that's an elegant way to "age out" old
>>> documents.
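
A minimal sketch of what adding and retiring shards could look like with the Collections API, assuming a collection that was created with router.name=implicit. CREATESHARD and DELETESHARD are real actions; the host, the collection name "docs" and the month-style shard names are made up for illustration.

```python
from urllib.parse import urlencode

# Assumed address of a local Solr node; adjust for your cluster.
SOLR_COLLECTIONS_API = "http://localhost:8983/solr/admin/collections"


def collections_api_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = {"action": action}
    query.update(params)
    return SOLR_COLLECTIONS_API + "?" + urlencode(query)


def create_shard_url(collection, shard):
    # CREATESHARD is only supported for collections using the implicit router.
    return collections_api_url("CREATESHARD", collection=collection, shard=shard)


def delete_shard_url(collection, shard):
    # DELETESHARD drops a shard, e.g. one whose documents have aged out.
    return collections_api_url("DELETESHARD", collection=collection, shard=shard)


if __name__ == "__main__":
    # Hypothetical monthly rotation: add the new month, drop the oldest one.
    print(create_shard_url("docs", "2017_08"))
    print(delete_shard_url("docs", "2016_08"))
```

With the implicit router the indexing side still has to say which shard each document belongs to, for example through a router.field set when the collection is created.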
>>> 
>>> Best,
>>> Erick
>>> 
>>>> On Sat, Jul 1, 2017 at 8:57 AM, mganeshs <mgane...@live.in> wrote:
>>>> Hi Susheel,
>>>> 
>>>> Currently we already have around 20M documents, and from now on we
>>>> expect about 1M new documents every month.
>>>> The reason we don't want to go for time-based implicit routing is that
>>>> all new documents will end up in the most recent shard, so indexing will
>>>> be heavy for the new shard, whereas older shards will be used just for
>>>> queries. With default sharding, the indexing load is distributed across
>>>> all the shards. That's the reason we would like to stick to default
>>>> sharding. But Join is the issue here when default sharding is used :-(
>>>> 
>>>> 
>>>> 
>>>> --
>>>> View this message in context: http://lucene.472066.n3.nabble.com/Allow-Join-over-two-sharded-collection-tp4343443p4343803.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>> 
>>
