Re: Solr document routing using composite key

2018-04-07 Thread Nawab Zada Asad Iqbal
Thanks Shawn and Erick.

This is what I ended up finding as well: as the number of buckets increased, I
noticed the issue.

Zheng: I am using Solr 7. But this was only an experiment on the hash, i.e., to
see what distribution I should expect from it (as the gist above shows). I
didn't actually index into Solr 7, but I would expect it to do something like
the above if I had indexed into Solr with these partitions and IDs.

On Fri, Mar 16, 2018 at 9:24 AM, Erick Erickson wrote:

> What Shawn said. 117 shards and 117 docs tells you absolutely nothing
> useful. I've never seen the number of docs on various shards be off by
> more than 2-3% when enough docs are indexed to be statistically valid.
>
> Best,
> Erick


Re: Solr document routing using composite key

2018-03-16 Thread Erick Erickson
What Shawn said. 117 shards and 117 docs tells you absolutely nothing
useful. I've never seen the number of docs on various shards be off by
more than 2-3% when enough docs are indexed to be statistically valid.
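To see why the balance improves with volume, here is a small Python simulation. It assumes a well-mixed hash can be modelled as a uniformly random shard choice (`random.Random` stands in for MurmurHash3; this is not Solr's router), and `max_relative_deviation` is a hypothetical helper. With N documents over k shards, each shard's count is roughly binomial with relative standard deviation √((k−1)/N): about 100% for 117 documents on 117 shards, but only about 1.1% for a million documents.

```python
import random


def max_relative_deviation(num_docs: int, num_shards: int, seed: int = 42) -> float:
    """Largest |count - mean| / mean over all shards.

    Shard choice is drawn uniformly at random: an idealized stand-in for a
    well-mixed hash such as MurmurHash3, not Solr's actual routing code.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    counts = [0] * num_shards
    for _ in range(num_docs):
        counts[rng.randrange(num_shards)] += 1
    mean = num_docs / num_shards
    return max(abs(c - mean) for c in counts) / mean


if __name__ == "__main__":
    for n in (117, 10_000, 1_000_000):
        print(f"{n:>9} docs -> worst shard off by {max_relative_deviation(n, 117):.1%}")
```

The worst-shard deviation printed for the largest run is what the "few percent" figure is about; the 117-document run is dominated by empty and doubled-up shards.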

Best,
Erick

On Fri, Mar 16, 2018 at 5:34 AM, Shawn Heisey wrote:
> On 3/6/2018 11:53 AM, Nawab Zada Asad Iqbal wrote:
>>
>> I have 117 shards and I tried using document IDs from 0 to 116. I find
>> that the distribution is very uneven, e.g., the largest bucket receives
>> 5 documents in total, and 41 shards are empty. Is this expected?
>
>
> With such a small data set, this fits what I would expect.
>
> Choosing buckets by hashing (which is what compositeId does) is not perfect,
> but if you send it thousands or millions of documents, it will be
> *generally* balanced.
>
> Thanks,
> Shawn
>


Re: Solr document routing using composite key

2018-03-16 Thread Shawn Heisey

On 3/6/2018 11:53 AM, Nawab Zada Asad Iqbal wrote:

> I have 117 shards and I tried using document IDs from 0 to 116. I find
> that the distribution is very uneven, e.g., the largest bucket receives
> 5 documents in total, and 41 shards are empty. Is this expected?


With such a small data set, this fits what I would expect.
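A back-of-the-envelope check of why this is expected, assuming (as an idealization) that the hash assigns each of n documents to one of n shards uniformly at random. A given shard is missed by every document with probability (1 − 1/n)^n, and receives exactly one document with probability (1 − 1/n)^(n−1):

```latex
\mathbb{E}[\#\text{empty shards}] = n\left(1 - \frac{1}{n}\right)^{n} \approx \frac{n}{e},
\qquad
\mathbb{E}[\#\text{shards with one doc}] = n\left(1 - \frac{1}{n}\right)^{n-1}

% With n = 117:
117\left(\tfrac{116}{117}\right)^{117} \approx 42.9,
\qquad
117\left(\tfrac{116}{117}\right)^{116} \approx 43.2
```

Counting the result list in the original post gives 41 empty shards and 46 shards holding exactly one document, which is in line with what a uniform hash predicts.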

Choosing buckets by hashing (which is what compositeId does) is not 
perfect, but if you send it thousands or millions of documents, it will 
be *generally* balanced.
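A rough Python sketch of the hash-range routing described above follows. Assumptions, all labelled: document IDs are hashed with MurmurHash3 x86_32 (seed 0) over their UTF-8 bytes, which is what Solr's compositeId router is understood to do; a two-part `prefix!suffix` key takes its upper 16 bits from the prefix hash and its lower 16 bits from the suffix hash (Solr's default split); and the 32-bit hash space is cut into equal-width ranges, a simplification of how Solr actually computes shard ranges. `murmur3_32`, `composite_hash`, `shard_for` and `route` are hypothetical helpers, not Solr APIs, so per-shard counts need not match the gist's output exactly.

```python
def to_signed32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed int, like a Java int."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86_32 (reference algorithm), returned as a signed 32-bit int."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF

    def rotl(x: int, r: int) -> int:
        return ((x << r) | (x >> (32 - r))) & mask

    def mix_k1(k1: int) -> int:
        k1 = (k1 * c1) & mask
        k1 = rotl(k1, 15)
        return (k1 * c2) & mask

    h1 = seed & mask
    nblocks = len(data) // 4
    for i in range(nblocks):
        # Each 4-byte block is read little-endian.
        h1 ^= mix_k1(int.from_bytes(data[4 * i:4 * i + 4], "little"))
        h1 = rotl(h1, 13)
        h1 = (h1 * 5 + 0xE6546B64) & mask

    tail = data[4 * nblocks:]
    if tail:
        # 1-3 leftover bytes, also little-endian.
        h1 ^= mix_k1(int.from_bytes(tail, "little"))

    # Finalization mix (fmix32).
    h1 ^= len(data)
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & mask
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & mask
    h1 ^= h1 >> 16
    return to_signed32(h1)


def composite_hash(doc_id: str) -> int:
    """Assumed default: for 'prefix!suffix', upper 16 bits from the prefix hash and
    lower 16 bits from the suffix hash; IDs without '!' are hashed whole."""
    if "!" not in doc_id:
        return murmur3_32(doc_id.encode("utf-8"))
    prefix, suffix = doc_id.split("!", 1)
    upper = murmur3_32(prefix.encode("utf-8")) & 0xFFFF0000
    lower = murmur3_32(suffix.encode("utf-8")) & 0x0000FFFF
    return to_signed32(upper | lower)


def shard_for(hash32: int, num_shards: int) -> int:
    """Hypothetical simplification: equal-width contiguous ranges over the signed
    32-bit hash space (Solr rounds its range boundaries differently)."""
    offset = hash32 + 2**31  # shift [-2^31, 2^31) onto [0, 2^32)
    width = 2**32 // num_shards
    return min(offset // width, num_shards - 1)


def route(doc_ids, num_shards: int) -> list:
    """Count how many of doc_ids land in each shard."""
    counts = [0] * num_shards
    for doc_id in doc_ids:
        counts[shard_for(composite_hash(doc_id), num_shards)] += 1
    return counts


if __name__ == "__main__":
    counts = route([str(i) for i in range(117)], 117)
    print("empty shards:", counts.count(0), "largest shard:", max(counts))
```

Passing a million IDs to `route` instead of 117 is a quick way to watch the per-shard counts even out, which is the point being made in this thread.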


Thanks,
Shawn



Re: Solr document routing using composite key

2018-03-15 Thread Zheng Lin Edwin Yeo
Hi,

What version of Solr are you running? How did you configure your shards in
Solr?

Regards,
Edwin

On 7 March 2018 at 02:53, Nawab Zada Asad Iqbal wrote:

> Hi Solr community,
>
>
> I have been thinking of using composite keys for the next iteration of my
> project, and tried them today to see how they distribute documents.
>
> Here is a gist of my code:
> https://gist.github.com/niqbal/3e293e2bcb800d6912a250d914c9d478
>
> I have 117 shards and I tried using document IDs from 0 to 116. I find
> that the distribution is very uneven, e.g., the largest bucket receives
> 5 documents in total, and 41 shards are empty. Is this expected?


Solr document routing using composite key

2018-03-06 Thread Nawab Zada Asad Iqbal
Hi Solr community,


I have been thinking of using composite keys for the next iteration of my
project, and tried them today to see how they distribute documents.

Here is a gist of my code:
https://gist.github.com/niqbal/3e293e2bcb800d6912a250d914c9d478

I have 117 shards and I tried using document IDs from 0 to 116. I find
that the distribution is very uneven, e.g., the largest bucket receives
5 documents in total, and 41 shards are empty. Is this expected?

In the following result, the first value is the shard number and the second
is the list of document IDs that shard received.

List(98:List(29)
, 34:List(36)
, 8:List(54)
, 73:List(31)
, 19:List(77)
, 23:List(59)
, 62:List(86)
, 77:List(105)
, 11:List(11)
, 104:List(23)
, 44:List(4)
, 37:List(0)
, 61:List(71)
, 107:List(37)
, 46:List(34)
, 99:List(19)
, 24:List(32)
, 94:List(90)
, 103:List(106)
, 72:List(97)
, 59:List(2)
, 76:List(6)
, 54:List(20)
, 65:List(3)
, 71:List(26)
, 108:List(17)
, 106:List(57)
, 17:List(108)
, 25:List(13)
, 60:List(56)
, 102:List(87)
, 69:List(60)
, 64:List(53)
, 53:List(85)
, 42:List(35)
, 115:List(82)
, 0:List(28)
, 20:List(27)
, 81:List(39)
, 101:List(92)
, 30:List(16)
, 41:List(63)
, 3:List(10)
, 91:List(21)
, 85:List(18)
, 28:List(8)
, 113:List(76, 95)
, 51:List(47, 102)
, 78:List(30, 67)
, 4:List(52, 84)
, 110:List(112, 116)
, 9:List(1, 40)
, 50:List(22, 101)
, 13:List(72, 83)
, 35:List(73, 100)
, 16:List(48, 64)
, 112:List(69, 103)
, 10:List(14, 66)
, 87:List(68, 104)
, 57:List(49, 114)
, 36:List(79, 99)
, 1:List(24, 70)
, 96:List(5, 98)
, 95:List(45, 89)
, 75:List(9, 91)
, 70:List(62, 78)
, 2:List(74, 75)
, 114:List(81, 88)
, 74:List(7, 115)
, 52:List(46, 111)
, 55:List(12, 50, 113)
, 47:List(43, 44, 96)
, 92:List(25, 33, 58)
, 39:List(15, 41, 61, 107)
, 21:List(38, 51, 55, 93, 110)
, 27:List(42, 65, 80, 94, 109)
)