Re: Zookeeper for generating sequential IDs

Jeff Hodges Mon, 28 Feb 2011 09:05:02 -0800

If you patch snowflake to remove 4 bits from the timestamp section,
the time before the generated IDs overflow the JVM's 63-bit limit
drops from about 70 years (2 ** 41 milliseconds) to a little over
4 years (2 ** 37 milliseconds). This is likely unacceptable for your
use case.
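
As a rough check of those numbers, here is a minimal Python sketch. It
assumes the timestamp field counts milliseconds since a fixed epoch and
that a year is 365.25 days; the function name is made up for
illustration.

```python
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25  # assumption: 365.25-day year


def years_until_overflow(timestamp_bits: int) -> float:
    """Years a millisecond counter of the given bit width lasts before wrapping."""
    return (2 ** timestamp_bits) / MS_PER_YEAR


print(round(years_until_overflow(41), 1))  # 69.7 (stock 41-bit timestamp)
print(round(years_until_overflow(37), 1))  # 4.4 (after giving up 4 bits)
```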


However, the larger point is that encoding additional information
about your data in its ID is, in general, a bad idea. It means your
architecture is tightly coupled to your current and likely
less-than-perfect understanding of the problem, which makes it harder
to iterate towards a better one. For instance, we had to rewrite
certain parts of our search infrastructure when migrating to snowflake
because they had assumed that the generated id space of tweets was
uniform across time.

But, of course, I'm just some dude on the internet who doesn't know
your particular problem or design in detail. Godspeed and good luck.

On Mon, Feb 28, 2011 at 8:35 AM, Ertio Lew <[email protected]> wrote:
> Yes, I think we could perhaps reduce the millisecond precision it
> provides (41 bits, I think) to an extent that matches our needs.
>
> On Mon, Feb 28, 2011 at 9:38 PM, Ted Dunning <[email protected]> wrote:
>> So patch it!
>>
>> On Mon, Feb 28, 2011 at 7:59 AM, Ertio Lew <[email protected]> wrote:
>>
>>> First, it does not start at 0, since it comprises a timestamp,
>>> workerId and noOfGeneratedIds, so it is not sequential! Second, if I
>>> insert my 4 bits into this ID, I risk overwriting bits of the ID it
>>> has already created.
>>>
>>> On Mon, Feb 28, 2011 at 9:16 PM, Ted Dunning <[email protected]> wrote:
>>> > Uh.... any sequential generator that starts at zero will take a LONG time
>>> > until it generates a value > 2^60.
>>> >
>>> > If you generate a million ids per second (about 2^20), it will be
>>> > more than 30,000 years before you get past 2^60.
>>> >
>>> > Is this *really* a problem?
>>> >
>>> > On Mon, Feb 28, 2011 at 7:25 AM, Ertio Lew <[email protected]> wrote:
>>> >
>>> >> Could you recommend any other ID generator that could give me
>>> >> increasing IDs (not necessarily sequential) of size <= 60 bits?
>>> >>
>>> >> Thanks
>>> >>
>>> >>
>>> >> On Mon, Feb 28, 2011 at 8:30 PM, Ertio Lew <[email protected]> wrote:
>>> >> > Thanks Patrick,
>>> >> >
>>> >> > I considered your suggestion, but sadly it does not fit my use case.
>>> >> > I am looking for a solution that generates 64-bit IDs but leaves at
>>> >> > least 4 of those 64 bits free, so that I can use them to distinguish
>>> >> > the type of data for a particular entity in the same column family.
>>> >> >
>>> >> > If I could keep snowflake's ID size to around 60 bits, that would
>>> >> > be great.
>>> >> >
>>> >> > On Sat, Feb 26, 2011 at 5:13 AM, Patrick Hunt <[email protected]> wrote:
>>> >> >> Keep in mind that blog post is pretty old. I see comments like this in
>>> >> >> the commit log
>>> >> >>
>>> >> >> "hard to call it alpha/experimental after serving billions of ids"
>>> >> >>
>>> >> >> so it seems it's in production at twitter at least...
>>> >> >>
>>> >> >> Patrick
>>> >> >>
>>> >> >> On Fri, Feb 25, 2011 at 2:58 PM, Ertio Lew <[email protected]> wrote:
>>> >> >>> Thanks Patrick,
>>> >> >>>
>>> >> >>> The fact that it is still in the alpha stage and Twitter is not yet
>>> >> >>> using it makes me look at other solutions as well, ones that have a
>>> >> >>> large community/user base and are more mature.
>>> >> >>>
>>> >> >>> I do not know much about snowflake, or whether anyone is using it
>>> >> >>> in production.
>>> >> >>>
>>> >> >>> On Fri, Feb 25, 2011 at 11:21 PM, Patrick Hunt <[email protected]> wrote:
>>> >> >>>> Have you looked at snowflake?
>>> >> >>>>
>>> >> >>>> http://engineering.twitter.com/2010/06/announcing-snowflake.html
>>> >> >>>>
>>> >> >>>> Patrick
>>> >> >>>>
>>> >> >>>> On Fri, Feb 25, 2011 at 9:43 AM, Ted Dunning <[email protected]> wrote:
>>> >> >>>>> If your id's don't need to be exactly sequential or if the generation
>>> >> >>>>> rate is less than a few thousand per second, ZK is a fine choice.
>>> >> >>>>>
>>> >> >>>>> To get very high generation rates, what is typically done is to
>>> >> >>>>> allocate blocks of id's using ZK and then allocate out of the block
>>> >> >>>>> locally. This can cause you to wind up with a slightly swiss-cheesed
>>> >> >>>>> id space and it means that the ordering of id's only approximates the
>>> >> >>>>> time ordering of when the id's were assigned. Neither of these is
>>> >> >>>>> typically a problem.
>>> >> >>>>>
>>> >> >>>>> On Fri, Feb 25, 2011 at 1:50 AM, Ertio Lew <[email protected]> wrote:
>>> >> >>>>>
>>> >> >>>>>> Hi all,
>>> >> >>>>>>
>>> >> >>>>>> I am involved in a project where we're building a social application
>>> >> >>>>>> using Cassandra DB and Java. I am looking for a solution to generate
>>> >> >>>>>> unique sequential IDs for the content on the application. Some people
>>> >> >>>>>> have suggested that I have a look at Zookeeper for this. I would
>>> >> >>>>>> highly appreciate it if anyone could say whether Zookeeper is suitable
>>> >> >>>>>> for this purpose and point me to any good resources about Zookeeper.
>>> >> >>>>>>
>>> >> >>>>>> Since the application is based on an eventually consistent distributed
>>> >> >>>>>> platform using Cassandra, we have felt a need to look at other
>>> >> >>>>>> solutions instead of building our own using our DB.
>>> >> >>>>>>
>>> >> >>>>>> Any comments or suggestions are highly welcome! :)
>>> >> >>>>>>
>>> >> >>>>>> Regards
>>> >> >>>>>> Ertio Lew.
>>> >> >>>>>>
>>> >> >>>>>
>>> >> >>>>
>>> >> >>>
>>> >> >>
>>> >> >
>>> >>
>>> >
>>>
>>
>
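
The block-allocation approach Ted describes in the quoted thread
(reserve a block of ids through ZK, then hand them out locally) could
look roughly like the sketch below. This is not ZooKeeper client code:
BlockCoordinator is a hypothetical in-memory stand-in for whatever
shared counter ZK would provide, and LocalIdAllocator and the block
size are likewise made up for illustration.

```python
import threading


class BlockCoordinator:
    """Hypothetical in-memory stand-in for the shared counter kept in ZK."""

    def __init__(self, block_size: int = 1000):
        self._next_start = 0
        self._block_size = block_size
        self._lock = threading.Lock()

    def reserve_block(self) -> range:
        # One coordinated step per block instead of one per id.
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
        return range(start, start + self._block_size)


class LocalIdAllocator:
    """Hands out ids from a reserved block (not thread-safe on its own)."""

    def __init__(self, coordinator: BlockCoordinator):
        self._coordinator = coordinator
        self._block = iter(())  # empty until the first reservation

    def next_id(self) -> int:
        for value in self._block:
            return value
        # Current block is used up: reserve a fresh one.
        self._block = iter(self._coordinator.reserve_block())
        return next(self._block)


coordinator = BlockCoordinator(block_size=3)
a = LocalIdAllocator(coordinator)
b = LocalIdAllocator(coordinator)
ids = [a.next_id(), b.next_id(), a.next_id(), b.next_id()]
print(ids)  # [0, 3, 1, 4]
```

With two allocators the ids interleave as 0, 3, 1, 4, so id order only
approximates assignment order. If an allocator dies before using up
its block, those ids are never issued, which is the swiss-cheese
effect mentioned in the thread.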
