Another possible solution for you: use Phoenix:
https://github.com/forcedotcom/phoenix
Phoenix would allow you to model your scenario using SQL through JDBC,
like this:
Connection conn = DriverManager.getConnection("jdbc:phoenix:<your
zookeeper quorum>");
Statement stmt = conn.createStatement();
stmt.execute(
    "CREATE TABLE article_discussion (" +
    " category_id BIGINT not null," +
    " article_id BIGINT not null," +
    " comment_id BIGINT not null" +
    " CONSTRAINT pk PRIMARY KEY (category_id, article_id, comment_id))");
You could, of course, add other non-primary-key columns to the above.
You could also pre-split the table and/or pass through any HBase
configuration parameters necessary.
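For instance, one way to pre-split through the DDL is Phoenix's SALT_BUCKETS table option, which hashes rows into N pre-split buckets. This is only a sketch of what that DDL could look like; the bucket count of 16 is an illustrative value, not a recommendation:

```java
// Sketch: the CREATE TABLE statement from above with Phoenix's
// SALT_BUCKETS table option appended, which pre-splits the table into
// the given number of salted buckets. The helper below just builds the
// DDL string; you would pass it to stmt.execute() as shown earlier.
class SaltedDdl {
    static String saltedDdl(int buckets) {
        return "CREATE TABLE article_discussion (" +
               " category_id BIGINT not null," +
               " article_id BIGINT not null," +
               " comment_id BIGINT not null" +
               " CONSTRAINT pk PRIMARY KEY (category_id, article_id, comment_id))" +
               " SALT_BUCKETS=" + buckets;
    }

    public static void main(String[] args) {
        System.out.println(saltedDdl(16));
    }
}
```

Note that salting trades away some row ordering in exchange for spreading write load across region servers, so whether it fits depends on your scan patterns.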
Then, you populate your table like this:
PreparedStatement stmt = conn.prepareStatement(
    "UPSERT INTO article_discussion VALUES(?,?,?)");
stmt.setLong(1, categoryId);
stmt.setLong(2, articleId);
stmt.setLong(3, commentId);
stmt.execute();
conn.commit();
And to query for all comments in a given category, you'd do this:
PreparedStatement stmt1 = conn.prepareStatement(
    "SELECT * FROM article_discussion WHERE category_id = ?");
stmt1.setLong(1, categoryId);
ResultSet rs = stmt1.executeQuery();
while (rs.next()) {
    System.out.println(
        "category_id=" + rs.getLong("category_id") +
        ",article_id=" + rs.getLong("article_id") +
        ",comment_id=" + rs.getLong("comment_id"));
}
and for all comments for a given category and article:
PreparedStatement stmt1 = conn.prepareStatement(
    "SELECT * FROM article_discussion WHERE category_id = ? AND " +
    "article_id = ?");
stmt1.setLong(1, categoryId);
stmt1.setLong(2, articleId);
ResultSet rs = stmt1.executeQuery();
You could also use arbitrary WHERE clauses, like this:
SELECT * FROM article_discussion
WHERE (article_name LIKE '% baseball %' OR article_name LIKE '% food %')
AND category_id IN (?, ?, ?)
AND published_date >= to_date('2013-01-01 00:00:00')
AND published_date <= to_date('2013-01-31 00:00:00')
or do aggregation like this:
SELECT category_id, count(article_id), count(comment_id)
FROM article_discussion
WHERE published_date > ?
GROUP BY category_id
HAVING count(comment_id) > 100
Regards,
James
On 03/06/2013 11:42 PM, Asaf Mesika wrote:
I would convert each id to a long and then use Bytes.toBytes to convert
the long to a byte array. If it's an int, even better.
Now write all 3 longs one after another into one array, which will be
your rowkey.
This gives you:
* fixed size
* a small rowkey - 3*8 bytes if you use longs, and 3*4 for ints
Why do you need to use a prefix split policy?
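The fixed-size key Asaf describes can be sketched like this. In an HBase application you would typically call org.apache.hadoop.hbase.util.Bytes.toBytes(long); the sketch below uses java.nio.ByteBuffer as a self-contained stand-in that produces the same big-endian 8-byte encoding:

```java
import java.nio.ByteBuffer;

// Build a fixed-size 24-byte rowkey by concatenating three longs,
// mirroring the suggestion above. ByteBuffer.putLong writes big-endian
// by default, matching HBase's Bytes.toBytes(long), so keys sort the
// same way lexicographically as the underlying ids sort numerically
// (for non-negative ids).
class RowKeys {
    static byte[] rowKey(long categoryId, long articleId, long commentId) {
        return ByteBuffer.allocate(3 * Long.BYTES)
                .putLong(categoryId)
                .putLong(articleId)
                .putLong(commentId)
                .array();
    }

    public static void main(String[] args) {
        byte[] key = rowKey(42L, 7L, 1001L);
        System.out.println("rowkey length = " + key.length);
    }
}
```

Because every key is exactly 24 bytes, split points are easy to compute: a split at a given categoryId is just that long followed by 16 zero bytes.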
On Monday, March 4, 2013, Lukáš Drbal wrote:
Hi,
I have a question about rowkey design and pre-splitting a table.
My use case:
I need to store a lot of comments, where each comment belongs to one
article and each article belongs to one category.
What I need:
1) read one comment by id (where I know the commentId, articleId and
categoryId)
2) read all comments for an article (I know the categoryId and articleId)
3) read all comments for a category (I know the categoryId)
From this read pattern I see one good rowkey:
<categoryId>_<articleId>_<commentId>
But this rowkey doesn't have a fixed size, so I don't know how to define
the split pattern. How can this be solved?
These ids come from an external system and grow very fast, so adding
something like padding to each part is hard.
Maybe I could use a hash function for each part,
md5(<categoryId>)_md5(<articleId>)_md5(<commentId>), but that rowkey is very
long (3*32+2 bytes), and I have no experience with rowkeys this long.
Can someone give me a suggestion, please?
Regards
Lukas Drbal