Columns in a table are identified by column-family:column-name.
A column name is a byte array, so you can assign it a dynamic value.
So, in this case, you can have a table with variable columns where each column
represents one web site, and the cell value is the number of times the user
viewed that page.
Rowkey    <==================== Columns ====================>
          pageviews:www.yahoo.com    pageviews:www.google.com
userid1   10                         20
So, if you run
hbase> get "useractivity", "userid1", {COLUMN=>'pageviews:www.yahoo.com'}
you should get the desired value.
This solution has some gotchas too, though. Keep in mind that a Get request
can fetch either specific columns or all columns of a row. If the number of
columns is very large, fetching all columns at once risks an out-of-memory
error. If you are going to query by column name (a specific web site in
this case), you should be okay with this design.
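A minimal sketch of the wide-row layout described above, assuming an in-memory `TreeMap` as a stand-in for one table row. This is not HBase client code, and String column names replace HBase's byte arrays for readability. `WideRowSketch`, `recordView`, `getColumn` and `getRow` are hypothetical names: `getColumn` mirrors a Get restricted to one column, while `getRow` mirrors an all-columns Get, whose result grows with the number of sites the user has visited.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Toy in-memory stand-in for a "wide row" table: row key -> (family:qualifier -> count).
 * Hypothetical illustration only; this is NOT the HBase client API.
 */
class WideRowSketch {
    // Each row keeps its columns sorted by name.
    private final Map<String, SortedMap<String, Long>> rows = new HashMap<>();

    /** Adds one page view for the given site, creating the column on first use. */
    void recordView(String rowKey, String site) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .merge("pageviews:" + site, 1L, Long::sum);
    }

    /** Like a Get restricted to one column: returns only that cell (0 if absent). */
    long getColumn(String rowKey, String column) {
        SortedMap<String, Long> row = rows.get(rowKey);
        if (row == null) {
            return 0L;
        }
        return row.getOrDefault(column, 0L);
    }

    /** Like a Get for the whole row: every column comes back at once. */
    SortedMap<String, Long> getRow(String rowKey) {
        SortedMap<String, Long> row = rows.get(rowKey);
        return row == null ? Collections.emptySortedMap() : Collections.unmodifiableSortedMap(row);
    }

    public static void main(String[] args) {
        WideRowSketch table = new WideRowSketch();
        for (int i = 0; i < 10; i++) table.recordView("userid1", "www.yahoo.com");
        for (int i = 0; i < 20; i++) table.recordView("userid1", "www.google.com");

        System.out.println(table.getColumn("userid1", "pageviews:www.yahoo.com")); // 10
        System.out.println(table.getRow("userid1").size()); // 2 (one column per site)
    }
}
```

The single-column read touches one cell no matter how many sites the user has visited; the whole-row read returns one entry per site, which is why very wide rows become a memory concern.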
Alternatively, you can define your row key to contain the web site name, for
example with one row per user per web site.
Your row key would then look like "userID1-com.yahoo.www" (reversed domain
names are typically recommended; see
http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable ).
The row key, too, is just a byte array, and it is up to you to decide what it
should consist of.
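The one-row-per-user-per-site key described above can be built with a small helper. This is a sketch under stated assumptions: `RowKeys`, `reverseDomain` and `rowKey` are hypothetical names, and the "-" separator simply follows the "userID1-com.yahoo.www" example. In a real client the resulting string would still need converting to bytes (for example with `Bytes.toBytes`) before being used as a key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper that builds a "user-per-website" row key such as
 * "userID1-com.yahoo.www". The "-" separator is an assumption for illustration.
 */
class RowKeys {
    /** Reverses the dot-separated labels of a host name: www.yahoo.com -> com.yahoo.www. */
    static String reverseDomain(String host) {
        List<String> labels = new ArrayList<>(Arrays.asList(host.split("\\.")));
        Collections.reverse(labels);
        return String.join(".", labels);
    }

    /** Concatenates the user ID and the reversed host name into one row key string. */
    static String rowKey(String userId, String host) {
        return userId + "-" + reverseDomain(host);
    }

    public static void main(String[] args) {
        System.out.println(rowKey("userID1", "www.yahoo.com")); // userID1-com.yahoo.www
    }
}
```

Because rows are sorted by key, all rows for one user share the "userID1-" prefix and sit next to each other, and within that range sites of the same domain (com.yahoo.mail, com.yahoo.www) also cluster together. If user IDs can themselves contain "-", a different separator or a fixed-length user ID would be needed to keep keys unambiguous.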
Regards,
Srikanth
-----Original Message-----
From: Narayanan K [mailto:[email protected]]
Sent: Monday, July 11, 2011 2:46 PM
To: [email protected]
Subject: Re: Fetching and iterating through all column values belonging to all Timestamps of a Row
Hi Srikanth,
Yes, versions would help me if I had a fixed number of versions.
But in my case, I will not know the number of versions beforehand. The table
will be populated from feed files using a MapReduce program.
Once loaded, all of these will go into the same column family:column. Then I
want to count the number of times a particular URI was accessed by
userid1.
For this, I need to be able to scan through all the versions loaded into that
rowkey and increment a counter.
How can I do this if I do not know how many versions are being loaded into a
table rowkey, since that is dynamic (each feed file may have a different
number of records)?
Is the setTimeRange method of Get and Scan meant to do this? If so, why am I
not getting all the column values for a particular rowkey?
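The counting step described above can be sketched on its own, assuming every stored version of `pageviews:uri` has already been read back from HBase into a list of strings (newest first). `UriVisitCounter` and `countVisits` are hypothetical names, and the sketch treats each stored version as one visit.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of the counting step only: given the values of every stored version of
 * pageviews:uri, count visits per URI. Fetching the versions is assumed to have
 * happened already; this is not HBase client code.
 */
class UriVisitCounter {
    static Map<String, Integer> countVisits(List<String> versionValues) {
        // LinkedHashMap keeps URIs in the order they first appear in the list.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String uri : versionValues) {
            counts.merge(uri, 1, Integer::sum); // one stored version = one recorded visit
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> versions = Arrays.asList(
            "http://www.yahoo.co.in",
            "http://www.allaboutdata.net",
            "http://www.yahoo.co.in");
        System.out.println(countVisits(versions));
        // {http://www.yahoo.co.in=2, http://www.allaboutdata.net=1}
    }
}
```

This only counts correctly if every visit was stored as a separate version: two puts to the same row and column with the same timestamp overwrite each other, and versions beyond the column family's VERSIONS limit are discarded, so they never reach this loop.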
Regards,
Narayanan
On Mon, Jul 11, 2011 at 12:28 PM, Srikanth P. Shreenivas <[email protected]> wrote:
> Hi Narayanan,
>
> I think you need to create the table with versions enabled.
>
> For example, if you need to store 5 versions, you can create the table like this:
>
> hbase> create 'useractivity', {NAME => 'pageviews', VERSIONS => 5}
>
> hbase> put 'useractivity', 'userid1', 'pageviews:uri', 'http://www.allaboutdata.net'
> hbase> put 'useractivity', 'userid1', 'pageviews:uri', 'http://www.yahoo.co.in'
>
> hbase> get "useractivity", "userid1", {COLUMN=>'pageviews',VERSIONS=>2}
> COLUMN          CELL
> pageviews:uri   timestamp=1310367267049, value=http://www.yahoo.co.in
> pageviews:uri   timestamp=1310367221129, value=http://www.allaboutdata.net
> 2 row(s) in 0.0440 seconds
>
>
> One thing you need to watch out for is that VERSIONS is defined per column
> family, so it is fixed as part of the column family's schema (changing it
> later means altering the column family). This works if your application
> only needs to store a fixed number of versions. If that is not the case,
> you need to revisit your table design and model the data some other way.
>
> Regards,
> Srikanth
>
> -----Original Message-----
> From: Narayanan K [mailto:[email protected]]
> Sent: Monday, July 11, 2011 11:07 AM
> To: [email protected]
> Subject: Fetching and iterating through all column values belonging to all Timestamps of a Row
>
> Hi all,
>
> I am using Hadoop 0.20.1 and HBase 0.20.
>
> Currently, I am trying to retrieve and iterate through all the column
> values of a particular rowkey in an HBase table.
> But I am able to retrieve *only* the cell value having the *latest timestamp*.
>
> For example:
>
> hbase> create 'useractivity', 'pageviews'
> hbase> put 'useractivity', 'userid1', 'pageviews:uri', 'http://www.allaboutdata.net'
> hbase> put 'useractivity', 'userid1', 'pageviews:uri', 'http://www.yahoo.co.in'
>
> hbase> get 'useractivity', 'userid1'
> fetches only the "http://www.yahoo.co.in" value, as it has the
> latest timestamp.
>
> I want to view both values in the column uri.
>
> I tried the same with the Java API, using both Get and Scan, but both of
> them returned only the value that was inserted last.
> I also read through some old archives and found that I could call
> setTimeRange on Get/Scan, but that does not solve my problem either.
>
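> get.setTimeRange(0, Long.MAX_VALUE); as in:
>
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.util.Bytes;
>
> HTable table = new HTable(new HBaseConfiguration(), "useractivity");
> Get get = new Get(Bytes.toBytes("userid1"));
> get.addFamily(Bytes.toBytes("pageviews"));
> // Widens the time range to every timestamp; how many versions come back
> // is still governed by the Get's max-versions setting (1 by default).
> get.setTimeRange(0, Long.MAX_VALUE);
> Result result = table.get(get);
> // getValue() returns only the newest version of pageviews:uri.
> byte[] value = result.getValue(Bytes.toBytes("pageviews"),
>     Bytes.toBytes("uri"));
>
> System.out.println(Bytes.toString(value));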
>
> This fetches only the column value with the latest timestamp.
>
> I tried the same with the Scan API, but I get the same result.
>
> Could you please let me know how I can retrieve the column values for all
> timestamps of a particular rowkey?
>
> Many Thanks,
> Narayanan
>
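The version behavior discussed in this thread can be illustrated with a small in-memory model, assuming a single cell whose versions are kept in a `TreeMap` keyed by timestamp. `VersionedCellSketch` is a hypothetical class, not HBase code: its constructor argument plays the role of the column family's VERSIONS setting (how many versions are kept), and the argument to `get` plays the role of the per-request version limit (how many are returned). In the real Java client that per-request limit is raised with `Get.setMaxVersions()`; otherwise it stays at one, which is why only the newest value came back even with `setTimeRange(0, Long.MAX_VALUE)`. The timestamps are the ones from the shell output quoted earlier in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Toy model of one versioned cell: timestamp -> value, capped at a per-family
 * VERSIONS limit. Hypothetical illustration only; NOT the HBase client API.
 */
class VersionedCellSketch {
    private final int maxStoredVersions; // plays the role of the family's VERSIONS setting
    private final NavigableMap<Long, String> versions = new TreeMap<>();

    VersionedCellSketch(int maxStoredVersions) {
        this.maxStoredVersions = maxStoredVersions;
    }

    /** Stores a value; an equal timestamp overwrites, and versions beyond the cap are evicted oldest-first. */
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxStoredVersions) {
            versions.pollFirstEntry(); // drop the oldest timestamp
        }
    }

    /** Returns up to maxVersions values, newest first (maxVersions = 1 mimics a default read). */
    List<String> get(int maxVersions) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : versions.descendingMap().entrySet()) {
            if (out.size() == maxVersions) {
                break;
            }
            out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedCellSketch uri = new VersionedCellSketch(5);
        uri.put(1310367221129L, "http://www.allaboutdata.net");
        uri.put(1310367267049L, "http://www.yahoo.co.in");

        System.out.println(uri.get(1)); // [http://www.yahoo.co.in]
        System.out.println(uri.get(Integer.MAX_VALUE)); // [http://www.yahoo.co.in, http://www.allaboutdata.net]
    }
}
```

The model evicts old versions eagerly for simplicity; it does not try to mirror when HBase physically removes them. The point it illustrates is that two separate limits apply: what the column family keeps, and what a single read asks for.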