Re: RowID format tradeoffs

Ariel Valentin Sun, 06 Apr 2014 04:59:19 -0700

Russ,

I experienced the same problem. In the end, we decided to take another 
property, use it as a prefix, and then presplit the tables, e.g. 
apples\0454316778. We still have situations where nodes run hot during peak 
usage, but we are able to live with it.
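A rough sketch of the prefix-and-presplit idea above, in Python for brevity. It assumes the prefix is an existing record property (like apples), that a NUL byte separates the prefix from the ID, and that IDs are zero-padded to a fixed width so they sort numerically as bytes. `make_row`, `split_points` and `ID_WIDTH` are hypothetical names. The split rows are the kind of thing you would hand to Accumulo's `TableOperations.addSplits` to presplit the table.

```python
# Hypothetical sketch: row ID = <prefix> + NUL + <zero-padded sequence number>.
# ID_WIDTH is an assumed fixed width so numeric IDs also sort correctly as bytes.

ID_WIDTH = 10


def make_row(prefix: str, seq: int) -> bytes:
    """Build a row ID from a record property (the prefix) and a sequence number."""
    if "\x00" in prefix:
        raise ValueError("prefix must not contain the NUL separator")
    if seq < 0:
        raise ValueError("sequence numbers are assumed to be non-negative")
    return prefix.encode("utf-8") + b"\x00" + str(seq).zfill(ID_WIDTH).encode("ascii")


def split_points(prefixes):
    """Return one split row per distinct prefix except the lowest.

    Accumulo tablets cover (previous end row, end row], so a split at b"bananas"
    sends every b"bananas\\x00..." row (which sorts after b"bananas") to the
    next tablet, giving each prefix its own tablet.
    """
    distinct = sorted({p.encode("utf-8") for p in prefixes})
    return distinct[1:]


if __name__ == "__main__":
    rows = [make_row("bananas", 7), make_row("apples", 100), make_row("apples", 99)]
    for row in sorted(rows):  # byte order: by prefix first, then by padded ID
        print(row)
    print(split_points(["cherries", "apples", "bananas", "apples"]))
```

Within a single prefix, new rows still arrive at the end of that prefix's tablet, which is one plausible reason some nodes still run hot at peak.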


Thanks,
Ariel
---
Sent from my mobile device. Please excuse any errors.

> On Apr 6, 2014, at 3:16 AM, Russ Weeks <[email protected]> wrote:
> 
> Hi,
> 
> I'm looking for advice re. the best way to structure my row IDs. 
> Monotonically increasing IDs have the very appealing property that I can 
> quickly scan all recently-ingested unprocessed rows, particularly because I 
> maintain a "checkpoint" of the most-recently processed row.
> 
> Of course, the problem with increasing IDs is that only the lowest-order bits 
> change, which (I think?) makes them a poor choice for distributing 
> data across my cluster. I guess the ways to get around this are to 
> either reverse the ID or to define partitions, and use the partition ID as 
> the high-order bits of the row id? Reversing the ID will destroy the property 
> I describe above; I guess that using partitions may preserve it as long as I 
> use a BatchScanner, but would a BatchScanner play nicely with 
> AccumuloInputFormat? So many questions.
> 
> Anyways, I think there's a pretty good chance that I'm missing something 
> obvious in this analysis. For instance, if it's easy to "rebalance" the data 
> across my tablet servers periodically, then I'd probably just stick with 
> increasing IDs.
> 
> Very interested to hear your advice, or the pros and cons of any of these 
> approaches.
> 
> Thanks,
> -Russ
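A minimal sketch of the tradeoff Russ describes, in Python for brevity and using plain strings in place of Accumulo rows. Assumptions: a hypothetical "<partition>_<seq>" row format, seq % NUM_PARTITIONS as the partition, and fixed-width, zero-padded fields so that string order matches numeric order. With plain increasing IDs, everything after the checkpoint is one contiguous range; with a partition prefix it becomes one range per partition, which is the shape of input a BatchScanner's `setRanges` takes. Reversing the ID spreads writes but leaves no such range at all.

```python
# Hypothetical sketch of the partition-prefix tradeoff, using strings in place
# of Accumulo rows. Assumed row format: "<2-digit partition>_<12-digit seq>".

NUM_PARTITIONS = 4
SEQ_WIDTH = 12


def plain_row(seq: int) -> str:
    # Monotonic IDs: every new row sorts after all existing rows (one hot tablet).
    return f"{seq:0{SEQ_WIDTH}d}"


def partitioned_row(seq: int) -> str:
    # Partition prefix: consecutive IDs rotate across NUM_PARTITIONS key ranges.
    return f"{seq % NUM_PARTITIONS:02d}_{seq:0{SEQ_WIDTH}d}"


def reversed_row(seq: int) -> str:
    # Reversed digits: writes spread out, but "everything after the checkpoint"
    # is no longer a contiguous key range.
    return plain_row(seq)[::-1]


def ranges_after_checkpoint(checkpoint: int):
    """(start_exclusive, end_inclusive) per partition, covering seq > checkpoint."""
    last = "9" * SEQ_WIDTH
    return [
        (f"{p:02d}_{checkpoint:0{SEQ_WIDTH}d}", f"{p:02d}_{last}")
        for p in range(NUM_PARTITIONS)
    ]


def in_ranges(row: str, ranges) -> bool:
    # Stand-in for what a multi-range scan would return from a real table.
    return any(start < row <= end for start, end in ranges)


if __name__ == "__main__":
    rows = [partitioned_row(s) for s in range(20)]
    ranges = ranges_after_checkpoint(11)
    print(ranges)  # one range per partition
    print(sorted(r for r in rows if in_ranges(r, ranges)))  # seq 12..19
```

The cost is that "scan recently ingested rows" fans out to NUM_PARTITIONS ranges instead of one; the benefit, assuming the table is presplit on the partition prefixes, is that new writes land in NUM_PARTITIONS tablets instead of always the last one.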
