The "acct:" prefix in the row ID seems unnecessary; the ID alone should be enough. You'll also want to consider the maximum number of transactions you want to support per account. You don't want a single row to grow indefinitely, but it would probably take GBs of (compressed) data in one row before that becomes a problem.

The column family is usually best used as a filtering mechanism. Limiting it to "payment" alone is a good idea, since you can then filter efficiently on that column family (or other relevant column families) by configuring a locality group.

You could then make the column qualifier: timestamp_receiverId_edgeId.
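As a rough sketch of that layout (plain Python, not the Accumulo API), the snippet below builds such a qualifier and shows that a fixed-width timestamp prefix keeps lexicographic order equal to time order. The function name make_qualifier, the yyyyMMddHHmmss format, and the sample IDs are assumptions made for illustration only.

```python
from datetime import datetime, timezone

def make_qualifier(ts: datetime, receiver_id: str, edge_id: str) -> str:
    """Build 'timestamp_receiverId_edgeId' with a fixed-width UTC timestamp."""
    # A fixed-width yyyyMMddHHmmss prefix makes string order match time order.
    stamp = ts.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{stamp}_{receiver_id}_{edge_id}"

# Hypothetical payments, deliberately listed out of order.
payments = [
    make_qualifier(datetime(2014, 6, 17, 12, 24, 0, tzinfo=timezone.utc), "456", "1001"),
    make_qualifier(datetime(2014, 6, 16, 9, 5, 30, tzinfo=timezone.utc), "789", "1000"),
]

# Sorting the strings mimics the sorted order of keys within a row:
# the oldest payment comes first with this plain encoding.
ordered = sorted(payments)
```

With this encoding a forward scan returns the oldest payment first, so reaching the latest one means reading to the end of the row.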

You might also be able to use the ReverseLexicoder[1] and the DateLexicoder[2] to encode the date so you can get the most recent transactions first.
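To illustrate why a reversed encoding helps, here is a minimal standalone sketch in Python. It imitates the effect of a reverse encoding, not the byte format the Accumulo lexicoders actually produce; the function names and sample timestamps are hypothetical.

```python
import struct

def encode_reverse_millis(epoch_millis: int) -> bytes:
    """Encode a non-negative millisecond timestamp so larger values sort first."""
    if not 0 <= epoch_millis < 2**63:
        raise ValueError("expected a non-negative 63-bit timestamp")
    big_endian = struct.pack(">Q", epoch_millis)  # fixed width, ascending byte order
    return bytes(0xFF - b for b in big_endian)    # complementing each byte flips the order

def decode_reverse_millis(encoded: bytes) -> int:
    """Undo encode_reverse_millis."""
    return struct.unpack(">Q", bytes(0xFF - b for b in encoded))[0]

# Hypothetical payment times in epoch milliseconds.
older = 1402909530000
newer = 1403007840000

# Sorting the encoded bytes mimics a forward scan over sorted keys.
keys = sorted([encode_reverse_millis(older), encode_reverse_millis(newer)])
latest = decode_reverse_millis(keys[0])  # the newest timestamp now sorts first
```

With an encoding like this at the front of the column qualifier, the first entry a forward scan returns for a row is the most recent payment, so no reversed range is needed.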

Lots of different ways to approach this, but it depends on what exactly you want to support.

[1] http://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/client/lexicoder/ReverseLexicoder.html
[2] http://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/client/lexicoder/DateLexicoder.html

On 6/16/14, 10:02 PM, Jianshi Huang wrote:
Hi all,

I'm thinking about storing payments in the following format:

rowId: senderId (i.e. "acct:123")
CF: "payment@<timestamp>" (i.e. "payment@201406171224000")
CQ: receiverId_edgeId ("acct:456_payment:1001")
Value: properties

Is this a good way to model payment events? The most frequent ops is to
get the last payment, so can I scan the table using a reversed range?

Also I'd like to know if point-in-time status data can be modeled in a
similar fashion, or should I take advantage of the timestamp column.


Cheers,
--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
