Hi Ash,

Typically users come up with a composite primary key based on event data that is unique, for example: <event type><event time><event ID>. If you don't have an event ID from an external source, perhaps you can use either a cyclical sequence or a random number (using our RAND built-in function). Would that give you the uniqueness you need? Depending on how many event types you have, you may want to salt the table as well. HBase has only a single PK that is indexed (yes, you get more through secondary indexing, but there's a cost to that), so it's best to use something that will be useful when querying.
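As a rough illustration (the table and column names here are just placeholders, and you'd pick SALT_BUCKETS based on your cluster size), the schema might look something like this:

    CREATE TABLE IF NOT EXISTS events (
        event_type VARCHAR NOT NULL,
        event_time DATE NOT NULL,
        event_id BIGINT NOT NULL,
        payload VARCHAR,
        CONSTRAINT pk PRIMARY KEY (event_type, event_time, event_id)
    ) SALT_BUCKETS = 16;

Leading the key with event type and event time keeps it useful for the range scans you'll likely run, and salting spreads the monotonically increasing time component across regions to avoid hotspotting a single region server.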
The gaps you see in sequences won't be filled, but you can decrease the gaps by decreasing the amount of caching [1] done on the client (at the expense of more RPC calls). Another option would be to create a UDF [2] that generates UUIDs and use this in the PK. This would be a useful UDF to contribute back to Phoenix if you go this route. A rough sketch of both ideas is at the bottom of this message.

HTH. Thanks,
James

[1] https://phoenix.apache.org/language/index.html#create_sequence
[2] https://phoenix.apache.org/udf.html

On Fri, May 5, 2017 at 12:15 AM, Ash N <742...@gmail.com> wrote:
> Could anyone please help with guidance for the below, or point me to any
> documents?
>
> Thanks
>
>
> On May 3, 2017 1:01 AM, "Ash N" <742...@gmail.com> wrote:
>
> John,
>
> Thank you so much for responding. I appreciate the link to the ppt, which
> is something I could not find myself, but I have read about snowflake.
> I was looking for guidance on the sequence numbers vs. UUID approach.
>
> Could I use sequence numbers? Are the gaps in the sequence numbers ever
> backfilled?
> There is not much documentation on how this works. If someone explains, I
> will be more than happy to update the documentation.
>
>
> Thanks again,
> -ash
>
> On Wed, May 3, 2017 at 12:51 AM, John Leach <jlea...@gmail.com> wrote:
>
>> Ash,
>>
>> I built one a while back based on Twitter's snowflake algorithm.
>>
>> Here is a link to a presentation from Twitter on it…
>>
>> https://www.slideshare.net/davegardnerisme/unique-id-generation-in-distributed-systems
>>
>> We used it as the primary key for the table when in essence there was not
>> a primary key (we just needed uniqueness).
>>
>> Good luck.
>>
>> Regards,
>> John Leach
>>
>> On May 2, 2017, at 6:46 PM, Ash N <742...@gmail.com> wrote:
>>
>> Hello,
>>
>> We have a distributed web application with millions of users connecting
>> to the site.
>>
>> We are receiving about 150,000 events/sec through a Kinesis stream.
>> We need to store these events in a Phoenix table, identified by an ID
>> that is the primary key for the table.
>>
>> What is the best way to accomplish this?
>>
>> Option 1
>> I played with sequences and they seem to work well, although with a lot
>> of gaps.
>> Will the gaps be filled at all? If not, we will run out of IDs pretty
>> soon.
>>
>> Option 2
>> UUIDs.
>>
>> What is the best way to generate UUIDs, local or network?
>>
>> How are folks typically handling this situation?
>>
>> Which route is recommended, sequences or UUIDs?
>>
>> Thanks,
>> -ash
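The sketch mentioned above. These statements are illustrative only: the sequence name, cache size, table (from the earlier example), UDF class, and jar path are all placeholders you'd replace with your own, and [2] has the exact CREATE FUNCTION grammar.

    -- A smaller CACHE shrinks the gaps left by unused cached values, at the
    -- cost of a sequence RPC every 10 values instead of every 100 (the default):
    CREATE SEQUENCE IF NOT EXISTS event_id_seq CACHE 10;

    UPSERT INTO events (event_type, event_time, event_id)
    VALUES ('click', NOW(), NEXT VALUE FOR event_id_seq);

    -- If you go the UDF route instead, you'd write a small Java class that
    -- returns a UUID string, ship it in a jar, and register it roughly like:
    CREATE FUNCTION gen_uuid() RETURNS VARCHAR
        AS 'com.example.GenUuidFunction'
        USING JAR 'hdfs://namenode:8020/hbase/lib/uuid-udf.jar';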