Hi,
I have a bulk loading job that aggregates user data.
Before I run the bulk load, I want to pre-create the regions.
The user IDs look like this:
943e2c6d66d732e06ab257903f240d27
a0617cb2b964690a39b0d93e7fe2f021
ac85b4dee6d8c8495d61201234dfb73e
b8416d5e0fe2a1228f042dffa8d291e2
c422be9e75d28d9afe0f1f98f59cda92
fe6b0ad1822455958586e240eb75c1d7
1790ee2ce4487d976cd9eddd036275d6
344c3de9449a9522d2a4de8bb9e81b02
4fcccd6790aec3056f897741b467d08c
6b67dc1922e4fc0cd6fa31f64bd51ef3
87f1374e7c900a243450f5b5c3a2b2b9
a4180db6a62f300cdecf77310f0010ac
I have ~50,000,000 users. I run the aggregation on a daily basis, and each
day's data amounts to ~30 regions.
So the objective is to create 30 regions with a more or less equal
distribution of users.
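
To make the objective concrete, here is a minimal sketch of one way evenly spaced split keys could be computed and checked against a sample. It assumes the user IDs are uniformly distributed 32-character lowercase hex strings (a guess based on the samples above, which look like hash digests). The function names `hex_split_keys` and `region_index` are hypothetical helpers, not part of any library.

```python
import bisect
from collections import Counter


def hex_split_keys(num_regions: int, key_len: int = 32) -> list[str]:
    """Return num_regions - 1 evenly spaced split keys over the hex keyspace.

    Assumption: row keys are lowercase hex strings of length key_len and are
    uniformly distributed, so equal-width key ranges hold roughly equal counts.
    """
    if num_regions < 2:
        return []
    keyspace = 16 ** key_len
    return [
        format(keyspace * i // num_regions, f"0{key_len}x")
        for i in range(1, num_regions)
    ]


def region_index(user_id: str, split_keys: list[str]) -> int:
    """Return the index of the region a key falls into.

    Region i covers [split_keys[i - 1], split_keys[i]); a key equal to a
    split key belongs to the region that starts at that key.
    """
    return bisect.bisect_right(split_keys, user_id)


if __name__ == "__main__":
    splits = hex_split_keys(30)
    sample_ids = [
        "943e2c6d66d732e06ab257903f240d27",
        "1790ee2ce4487d976cd9eddd036275d6",
        "fe6b0ad1822455958586e240eb75c1d7",
    ]
    # Count how many sample IDs land in each region to check the balance.
    counts = Counter(region_index(uid, splits) for uid in sample_ids)
    for region, count in sorted(counts.items()):
        print(region, count)
```

Running the same count over a large sample of real user IDs would show whether the uniform-distribution assumption holds. If it does not, the split keys could instead be taken from quantiles of the sorted sample.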
The question is: what is the best practice for choosing and verifying the
start/end keys of the regions in my use case?
Thanks in advance,
Oleg.