Russ, I experienced the same problem. In the end, what we decided to do was take another property, use it as a prefix, and then pre-split the tables, e.g. apples\0454316778. We still have situations where nodes run hot during peak usage, but we are able to live with it.
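A minimal sketch of the prefix approach, in plain Java so it runs without an Accumulo dependency. It assumes the \0 in the example row ID is a null-byte separator between the prefix and the ID. It also zero-pads the ID to a fixed width, which is an addition not in the reply above, so that string order matches numeric order within each prefix. The class and method names are hypothetical. Accumulo treats a split point as the inclusive end row of a tablet, so rows such as apples\0... sort after the split "apples" and land in the tablet that ends at the next prefix.

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class PrefixedRowIds {
    // Assumption: the "\0" in apples\0454316778 is a null-byte separator.
    static final char SEP = '\u0000';

    // Prefix + separator + ID zero-padded to 19 digits (enough for any
    // non-negative long), so string order matches numeric order per prefix.
    static String rowId(String prefix, long id) {
        return String.format("%s%c%019d", prefix, SEP, id);
    }

    // One split point per prefix value. A split is the inclusive end row of a
    // tablet, so rows "apples" + SEP + ... sort after "apples" and fall into
    // the tablet ending at the next split.
    static SortedSet<String> splitPoints(List<String> prefixes) {
        return new TreeSet<>(prefixes);
    }

    public static void main(String[] args) {
        // Replace the invisible separator with '|' for display.
        System.out.println(rowId("apples", 454316778L).replace(SEP, '|'));
        // prints [apples, bananas, pears]
        System.out.println(splitPoints(List.of("pears", "apples", "bananas")));
    }
}
```

To apply the splits to a real table, the set would need to be converted to org.apache.hadoop.io.Text values and passed to something like TableOperations.addSplits(tableName, splits); that step is omitted here to keep the sketch self-contained.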
Thanks,
Ariel

> On Apr 6, 2014, at 3:16 AM, Russ Weeks wrote:
>
> Hi,
>
> I'm looking for advice re. the best way to structure my row IDs.
> Monotonically increasing IDs have the very appealing property that I can
> quickly scan all recently-ingested unprocessed rows, particularly because I
> maintain a "checkpoint" of the most-recently processed row.
>
> Of course, the problem with increasing IDs is that it's the lowest-order bits
> which are changing, which (I think?) means it's less optimal for distributing
> data across my cluster. I guess that the ways to get around this are to
> either reverse the ID or to define partitions, and use the partition ID as
> the high-order bits of the row id? Reversing the ID will destroy the property
> I describe above; I guess that using partitions may preserve it as long as I
> use a BatchScanner, but would a BatchScanner play nicely with
> AccumuloInputFormat? So many questions.
>
> Anyways, I think there's a pretty good chance that I'm missing something
> obvious in this analysis. For instance, if it's easy to "rebalance" the data
> across my tablet servers periodically, then I'd probably just stick with
> increasing IDs.
>
> Very interested to hear your advice, or the pros and cons of any of these
> approaches.
>
> Thanks,
> -Russ
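The partition idea from the quoted message can be sketched the same way: put a fixed-width partition number in the high-order position, followed by the zero-padded ID. Here the partition is id % NUM_PARTITIONS, a hypothetical choice, so consecutive IDs rotate across partitions while each partition stays ordered. "Everything after the checkpoint" then becomes one range per partition instead of a single range. This is plain Java with no Accumulo dependency; RowRange is a stand-in for Accumulo's Range, and the sketch assumes non-negative IDs and a checkpoint below Long.MAX_VALUE.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionedRowIds {
    // Hypothetical partition count; a real value would depend on the cluster.
    static final int NUM_PARTITIONS = 4;

    // Fixed-width partition number first (high-order), then the zero-padded ID.
    // Assumes non-negative IDs.
    static String rowId(int partition, long id) {
        return String.format("%02d_%019d", partition, id);
    }

    // Hash-style assignment: consecutive IDs rotate across partitions.
    static String rowId(long id) {
        return rowId((int) (id % NUM_PARTITIONS), id);
    }

    // Inclusive row range; a stand-in for Accumulo's Range class.
    static final class RowRange {
        final String start;
        final String end;

        RowRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        boolean contains(String row) {
            return row.compareTo(start) >= 0 && row.compareTo(end) <= 0;
        }
    }

    // Everything newer than the checkpoint: one range per partition, because
    // IDs are only ordered within a partition. Assumes checkpoint < Long.MAX_VALUE.
    static List<RowRange> rangesAfter(long checkpoint) {
        List<RowRange> ranges = new ArrayList<>();
        for (int p = 0; p < NUM_PARTITIONS; p++) {
            ranges.add(new RowRange(rowId(p, checkpoint + 1), rowId(p, Long.MAX_VALUE)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long id = 0; id < 6; id++) {
            System.out.println(rowId(id));
        }
        for (RowRange r : rangesAfter(3)) {
            System.out.println(r.start + " .. " + r.end);
        }
    }
}
```

In Accumulo, ranges like these would be converted to Range objects and handed to BatchScanner.setRanges. The trade-off is that every "recent rows" read issues NUM_PARTITIONS ranges instead of one, in exchange for spreading ingest across tablets.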
