Hi all,

Could somebody please shed some light on this?

If it is a limitation of HBase that I can access only the column value
with the latest timestamp in a row, then I will have to think about a
different schema in which each user event goes into its own row.

Also, could somebody let me know whether the setTimeRange method of Get or
Scan can be used to access all the column values, across all timestamps, of
a particular row?

Thanks,
Narayanan

On Mon, Jul 11, 2011 at 5:15 PM, Narayanan K <[email protected]> wrote:

> Hi,
>
> Let me illustrate my question with an example.
>
> The flat feed file (just a sample scenario to make my point clear) would
> contain the user events recorded at various times of the day.
>
> Eg:
>
> UserID   Time    Url     Views   Timespent
> 1        05:27   a.com   1       20
> 2        05:34   b.com   2       12
> 1        06:00   a.com   1       18
> 3        06:02   c.com   3       56
> 1        07:03   a.com   2       10
>
> This data would be loaded into an HBase table with the date (for example
> 2011-07-01) as the rowkey and with the columns User:UserID, Http:Url,
> Metrics:Views and Metrics:Timespent.
> The next day, the rowkey is incremented, and all of that day's feeds are
> loaded into the table under the new rowkey 2011-07-02, and so on.
>
> Now I need to sum up the Timespent or the Views for user "1" and url
> "a.com" for a given day, say 2011-07-01 (just a sample scenario). That
> means I need to sum up all the Timespent values for a particular UserID
> and a particular Url present in the row 2011-07-01.
>
> A GET on this table for this rowkey gives me only the latest entry written
> to the row. But I need to be able to scan through all the values in the row
> and sum them up.
> The output should look like this:
>
> Output to different table:
>
> Date         ->   User   Url     TotalViews   TotalTS
> 2011-07-01   ->   1      a.com   4            48
>                   2      b.com   2            12
>                   3      c.com   3            56
>
> 2011-07-02   ->   and so on...
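
The roll-up in the table above can be sketched in plain Java, independent of HBase. This only illustrates the arithmetic: the `EventRollup` and `Event` classes, the `rollUp` method, and the `user|url` map key are hypothetical names, and the input is the five sample events from the feed file for 2011-07-01.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EventRollup {

    /** One line of the sample feed file (hypothetical holder class). */
    public static final class Event {
        final String userId;
        final String url;
        final int views;
        final int timespent;

        public Event(String userId, String url, int views, int timespent) {
            this.userId = userId;
            this.url = url;
            this.views = views;
            this.timespent = timespent;
        }
    }

    /** Sums views and timespent per (user, url); each value is {totalViews, totalTimespent}. */
    public static Map<String, int[]> rollUp(Event[] events) {
        Map<String, int[]> totals = new LinkedHashMap<>();
        for (Event e : events) {
            String key = e.userId + "|" + e.url;
            int[] sum = totals.computeIfAbsent(key, k -> new int[2]);
            sum[0] += e.views;
            sum[1] += e.timespent;
        }
        return totals;
    }

    /** The five sample events from the feed file. */
    public static Event[] sampleEvents() {
        return new Event[] {
            new Event("1", "a.com", 1, 20),
            new Event("2", "b.com", 2, 12),
            new Event("1", "a.com", 1, 18),
            new Event("3", "c.com", 3, 56),
            new Event("1", "a.com", 2, 10),
        };
    }

    public static void main(String[] args) {
        for (Map.Entry<String, int[]> entry : rollUp(sampleEvents()).entrySet()) {
            System.out.println(entry.getKey() + " -> views=" + entry.getValue()[0]
                    + ", timespent=" + entry.getValue()[1]);
        }
    }
}
```

For user 1 and a.com this gives 4 views and a Timespent of 48, matching the first output line above. The open question in this thread is only how to get every event back out of HBase so that a loop like this can run over them.
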
>
> I hope this makes my question a little clearer.
>
> Thanks,
> Narayanan
>
>
> On Mon, Jul 11, 2011 at 2:59 PM, Srikanth P. Shreenivas <
> [email protected]> wrote:
>
>> Columns in a table are identified by column-family:column-name.
>>
>> A column-name is a byte array, so you can assign it a dynamic value.
>> In this case, you can have a table with a variable set of columns, where
>> each column represents one web site and the cell value is the number of
>> times the user has viewed that site.
>>
>> Rowkey   <==================== Columns ====================>
>>
>> User1    pageviews:www.yahoo.com     pageviews:www.google.com
>>          10                          20
>>
>>
>> So, if you do
>>
>> hbase> get "useractivity", "userid1", {COLUMN=>'pageviews:www.yahoo.com'}
>>
>> then you should get the desired value.
>>
>>
>> This solution has some gotchas too, though. Keep in mind that a Get request
>> can fetch either specific columns or all columns. If the number of
>> columns is very large, you risk an out-of-memory error when fetching
>> all columns. If you are going to query by column name (a specific web
>> site in this case), then you should be okay with this design.
>>
>> Alternatively, you can define your row key to contain the web site name.
>> For example, you can have one row per user per website.
>> Your row key would then look like "userID1-com.yahoo.www" (it is typically
>> suggested to use reverse domain names; see
>> http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable).
>> The row key too is just a byte array, and it is up to you to decide
>> what it should consist of.
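
A minimal sketch of building such a key in plain Java, with no HBase dependency. The class and method names (`RowKeys`, `reverseDomain`, `userSiteKey`) are made up for illustration, and the "-" separator simply follows the example key above.

```java
public class RowKeys {

    /** Reverses the dot-separated labels of a host name, e.g. www.yahoo.com becomes com.yahoo.www. */
    public static String reverseDomain(String host) {
        String[] labels = host.split("\\.");
        StringBuilder reversed = new StringBuilder();
        for (int i = labels.length - 1; i >= 0; i--) {
            reversed.append(labels[i]);
            if (i > 0) {
                reversed.append('.');
            }
        }
        return reversed.toString();
    }

    /** Builds a "one row per user per website" key: userId, a dash, then the reversed host. */
    public static String userSiteKey(String userId, String host) {
        return userId + "-" + reverseDomain(host);
    }

    public static void main(String[] args) {
        // prints userID1-com.yahoo.www
        System.out.println(userSiteKey("userID1", "www.yahoo.com"));
    }
}
```

Since the row key is stored as a byte array, the resulting string still has to be converted to bytes before it is used in a Put or Get.
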
>>
>>
>> Regards,
>> Srikanth
>>
>>
>> -----Original Message-----
>> From: Narayanan K [mailto:[email protected]]
>> Sent: Monday, July 11, 2011 2:46 PM
>> To: [email protected]
>> Subject: Re: Fetching and iterating through all column values belonging to
>> all Timestamps of a Row
>>
>> Hi Srikanth,
>>
>> Yes. Versions will help me if I have a fixed number of versions.
>>
>> But in my case, I will not know the number of versions beforehand. The
>> table will be populated from feed files using a MapReduce program.
>> Once loaded, all these values will go into the same column family:column.
>> Then I would want to count the number of times a particular URI was
>> accessed by userid1.
>> For this, I need to be able to scan through all the versions loaded in
>> that rowkey and increment a counter.
>>
>> How is this possible if I do not know the number of versions being
>> loaded into a rowkey, since it is a dynamic property (each feed file may
>> have a different number of records)?
>>
>> Is the setTimeRange method of Get and Scan meant to do this? If so, why am
>> I not getting all the column values for a particular rowkey?
>>
>> Regards,
>> Narayanan
>>
>>
>> On Mon, Jul 11, 2011 at 12:28 PM, Srikanth P. Shreenivas <
>> [email protected]> wrote:
>>
>> > Hi Narayanan,
>> >
>> > I think you need to create the table with versions enabled.
>> >
>> > For example, if you need to store 5 versions, you can create the table
>> > like this:
>> >
>> > hbase> create 'useractivity', {NAME => 'pageviews', VERSIONS => 5}
>> >
>> > hbase> put 'useractivity', 'userid1', 'pageviews:uri', 'http://www.allaboutdata.net'
>> > hbase> put 'useractivity', 'userid1', 'pageviews:uri', 'http://www.yahoo.co.in'
>> >
>> > hbase> get "useractivity", "userid1", {COLUMN=>'pageviews',VERSIONS=>2}
>> > COLUMN            CELL
>> >  pageviews:uri    timestamp=1310367267049, value=http://www.yahoo.co.in
>> >  pageviews:uri    timestamp=1310367221129, value=http://www.allaboutdata.net
>> > 2 row(s) in 0.0440 seconds
>> >
>> >
>> > One thing you need to watch out for is that VERSIONS is defined on the
>> > column family, and hence you cannot change it once you have defined your
>> > column family. This will work if your application needs to store only a
>> > fixed number of versions. If that is not the case, you need to
>> > rethink your table design and achieve this some other way.
>> >
>> > Regards,
>> > Srikanth
>> >
>> > -----Original Message-----
>> > From: Narayanan K [mailto:[email protected]]
>> > Sent: Monday, July 11, 2011 11:07 AM
>> > To: [email protected]
>> > Subject: Fetching and iterating through all column values belonging to
>> all
>> > Timestamps of a Row
>> >
>> > Hi all,
>> >
>> > I am using Hadoop 0.20.1 and HBase 0.20.
>> >
>> > Currently, I am trying to retrieve and iterate through all the column
>> > values of a particular rowkey in an HBase table.
>> > But I am able to retrieve only the cell value having the latest
>> > timestamp.
>> >
>> > Eg:
>> >
>> > hbase> create 'useractivity', 'pageviews'
>> > hbase> put 'useractivity', 'userid1', 'pageviews:uri', 'http://www.allaboutdata.net'
>> > hbase> put 'useractivity', 'userid1', 'pageviews:uri', 'http://www.yahoo.co.in'
>> >
>> > hbase> get 'useractivity', 'userid1'
>> >
>> > fetches only the "http://www.yahoo.co.in" column value, as it has the
>> > latest timestamp.
>> >
>> > I wanted to view both the values in the column uri.
>> >
>> > I tried the same with the Java API, using Get as well as Scan. But both
>> > of them gave me the same result: only the column value that was
>> > inserted last.
>> > I also read through some old archives and found that I could call
>> > setTimeRange on Get/Scan, but that is not solving my problem either.
>> >
>> > get.setTimeRange(0, Long.MAX_VALUE); as in:
>> >
>> >  HTable table = new HTable(new HBaseConfiguration(), "useractivity");
>> >  Get get = new Get(Bytes.toBytes("userid1"));
>> >  get.addFamily(Bytes.toBytes("pageviews"));
>> >  // Accept cells of any timestamp (the constant is Long.MAX_VALUE)
>> >  get.setTimeRange(0, Long.MAX_VALUE);
>> >  Result result = table.get(get);
>> >  // getValue returns the newest matching cell of pageviews:uri
>> >  byte[] value = result.getValue(Bytes.toBytes("pageviews"),
>> >          Bytes.toBytes("uri"));
>> >
>> >  System.out.println(Bytes.toString(value));
>> >
>> > This fetches only the column value with the latest timestamp.
>> >
>> > I tried the same with the Scan API, but I get the same result.
>> >
>> > Could you please let me know how I can retrieve the column values for
>> > all timestamps of a particular rowkey?
>> >
>> > Many Thanks,
>> > Narayanan
>> >
>> > ________________________________
>> >
>> > http://www.mindtree.com/email/disclaimer.html
>> >
>>
>
>
