The "acct:" prefix in the row ID seems to be unnecessary; the ID alone
should be enough. You'll want to consider the maximum number of
transactions that you want to support per sender. You don't want a single
row to grow indefinitely, but you're probably talking about GBs of data
(compressed).
The column family usually works best as a filtering mechanism.
Limiting it to "payment" alone is a good idea, as you can then
efficiently filter on that column family (or other relevant column
families) by configuring a locality group.
You could then make the column qualifier: timestamp_receiverId_edgeId.
You might also be able to use the ReverseLexicoder[1] and the
DateLexicoder[2] to encode the date so you can get the most recent
transactions first.
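To illustrate why a reversed encoding gives you the newest entries first, here is a minimal sketch in plain Python. It does not use the Accumulo Lexicoder classes; it only mimics the idea of a fixed-width, big-endian, inverted timestamp so that a lexicographic sort puts the latest payment at the front. The helper names (reverse_ts, make_cq, decode_ts) and the sample IDs are made up for illustration.

```python
import struct

MAX_U64 = 2**64 - 1  # largest value that fits in an unsigned 64-bit field


def reverse_ts(millis: int) -> bytes:
    """Encode a millisecond timestamp so that newer values sort first.

    Subtracting from MAX_U64 inverts the ordering; big-endian packing
    makes the byte-wise (lexicographic) order match the numeric order.
    """
    return struct.pack(">Q", MAX_U64 - millis)


def make_cq(millis: int, receiver_id: str, edge_id: str) -> bytes:
    """Build a qualifier shaped like timestamp_receiverId_edgeId (hypothetical layout)."""
    return reverse_ts(millis) + b"_" + f"{receiver_id}_{edge_id}".encode("utf-8")


def decode_ts(cq: bytes) -> int:
    """Recover the original timestamp from the first 8 bytes of the qualifier."""
    return MAX_U64 - struct.unpack(">Q", cq[:8])[0]


# Three payments inserted out of order; sorting the qualifiers the way a
# sorted key-value store would yields the most recent one first.
qualifiers = sorted(make_cq(t, "456", "1001") for t in (1000, 3000, 2000))
latest_first = [decode_ts(cq) for cq in qualifiers]
```

With this layout, the first entry a forward scan returns for a row is the most recent payment, so no reverse scan is needed.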
There are lots of different ways to approach this, but the right one
depends on what exactly you want to support.
[1] http://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/client/lexicoder/ReverseLexicoder.html
[2] http://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/client/lexicoder/DateLexicoder.html
On 6/16/14, 10:02 PM, Jianshi Huang wrote:
Hi all,
I'm thinking about storing payments in the following format:
rowId: senderId (e.g. "acct:123")
CF: "payment@<timestamp>" (e.g. "payment@201406171224000")
CQ: receiverId_edgeId (e.g. "acct:456_payment:1001")
Value: properties
Is this a good way to model payment events? The most frequent operation
is to get the last payment, so can I scan the table using a reversed range?
Also, I'd like to know whether point-in-time status data can be modeled in
a similar fashion, or whether I should take advantage of the timestamp column.
Cheers,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/