I ran into a similar problem. Here is what I do:

After the logs are collected, I run one MR job to process them and store the results in HBase:

RowKey           Column
date + userId    list of URLs

Because the URL list is very large, I compress it.

So if I need one person's URL history for one day, it is a single GET.
If I need one person's URL history over several days, it is a scan, and because the number of rows is not large, the scan is fast.

Hope this helps.

2015-12-01 18:39 GMT+08:00 Rajeshkumar J <[email protected]>:

> Hi
>
> Thats an sample use case for my doubt . This is my use case
>
> Customers visiting our website are generated as logs and we will be
> processing it which is usually done by Apache Pig for processing it and
> inserts the output from pig into hbase table(test) directly using
> HbaseStorage. This will be done every morning. Data consists of following
> columns
>
> Customerid | Name | visitedurl | timestamp | location | companyname
>
> I have only one column family (test_family)
>
> As of now I have generated random no for each row and it is inserted as row
> key for that table. For ex I have following data to be inserted into table
>
> 1725|xxx|www.something.com|127987834 | india |zzzz
> 1726|yyy|www.some.com|128389478 | UK | yyyy
>
> If so I will add 1 as row key for first row and 2 for second one and so on.
>
> Note : Same id will be repeated for different days so I chose random no to
> be row-key
>
> while querying data from table where I use scan 'test',
>
> {FILTER=>"SingleColumnValueFilter('test_family',Customerr'id',=,'binary:1002')"}
> it takes more than 2 minutes to return the results.
>
> Suggest me a way so that I have to bring down this process to 1 to 2
> seconds since I am using it in real-time analytics
>
> Thanks
>
> On Tue, Dec 1, 2015 at 3:40 PM, Heng Chen <[email protected]> wrote:
>
> > So, maybe we can use 1212 + customerId as rowKey.
> > btw, what is 1212 used for?
> >
> > 2015-12-01 17:49 GMT+08:00 Rajeshkumar J <[email protected]>:
> >
> > > Hi chen,
> > >
> > > yes I have customerid column to represent each customers
> > >
> > > On Tue, Dec 1, 2015 at 3:11 PM, Heng Chen <[email protected]> wrote:
> > >
> > > > Hm.., is there anything unique like userId to represent one people?
> > > >
> > > > 2015-12-01 16:33 GMT+08:00 Rajeshkumar J <[email protected]>:
> > > >
> > > > > Is there any other way to store only id becoz there may be new rows with
> > > > > the same name like
> > > > >
> > > > > 1212 | xxxx | 20
> > > > > 1212 | yyyy | 21
> > > > > 1212 | xxxx | 22
> > > > >
> > > > > On Tue, Dec 1, 2015 at 1:59 PM, Heng Chen <[email protected]> wrote:
> > > > >
> > > > > > Yeah, if you want to get all records about 1212, just scan rows with
> > > > > > prefix 1212
> > > > > >
> > > > > > 2015-12-01 16:27 GMT+08:00 Rajeshkumar J <[email protected]>:
> > > > > >
> > > > > > > so you want me to design row-key value by appending name column value to
> > > > > > > the rowkey
> > > > > > >
> > > > > > > On Tue, Dec 1, 2015 at 1:19 PM, Heng Chen <[email protected]> wrote:
> > > > > > >
> > > > > > > > So, why not
> > > > > > > >
> > > > > > > > 1212-xxx 20
> > > > > > > > 1212-yyy 21
> > > > > > > > 1212-zzz 22
> > > > > > > >
> > > > > > > > 2015-12-01 15:33 GMT+08:00 Rajeshkumar J <[email protected]>:
> > > > > > > >
> > > > > > > > > Hi
> > > > > > > > >
> > > > > > > > > I meant like below is this possible
> > > > > > > > >
> > > > > > > > > Rowkey | column family
> > > > > > > > >
> > > > > > > > > Name | Age
> > > > > > > > >
> > > > > > > > > 1212 | xxxx | 20
> > > > > > > > > 1212 | yyyy | 21
> > > > > > > > > 1212 | zzzz | 22
> > > > > > > > >
> > > > > > > > > On Tue, Dec 1, 2015 at 12:03 PM, Heng Chen <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > why not
> > > > > > > > > >
> > > > > > > > > > 1212 | 10, 11, 12, 13, 14, 15, 16, 27, 28 ?
> > > > > > > > > >
> > > > > > > > > > 2015-12-01 14:29 GMT+08:00 Rajeshkumar J <[email protected]>:
> > > > > > > > > >
> > > > > > > > > > > Hi Ted,
> > > > > > > > > > >
> > > > > > > > > > > This is my use case. I have to store values like this is it possible?
> > > > > > > > > > >
> > > > > > > > > > > RowKey | Values
> > > > > > > > > > >
> > > > > > > > > > > 1212 | 10,11,12
> > > > > > > > > > >
> > > > > > > > > > > 1212 | 13, 14, 15
> > > > > > > > > > >
> > > > > > > > > > > 1212 | 16,27,28
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Nov 30, 2015 at 10:40 PM, Ted Yu <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Have you read http://hbase.apache.org/book.html#rowkey.design ?
> > > > > > > > > > > >
> > > > > > > > > > > > bq. we can store more than one row for a row-key value.
> > > > > > > > > > > >
> > > > > > > > > > > > Can you clarify your intention / use case ? If row key is the same, key
> > > > > > > > > > > > values would be in the same row.
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Nov 30, 2015 at 8:30 AM, Rajeshkumar J <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I am new to Apache Hbase and I know that in a table when we try to insert
> > > > > > > > > > > > > row key value which is already present either new value is discarded or
> > > > > > > > > > > > > updated. Also I came across row version through which we can store
> > > > > > > > > > > > > different versions of row key based on timestamp. Any one correct me if I
> > > > > > > > > > > > > am wrong? Also I need to know is there any way we can store more than one
> > > > > > > > > > > > > row for a row-key value.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
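
To make the date + userId layout from the top of this message more concrete, here is a rough Python sketch. It does not talk to HBase: SortedTable is a hypothetical in-memory stand-in that only mimics sorted row keys, GET and range scan. The "|" separator, the YYYYMMDD date format, the JSON + zlib encoding of the URL list, and all function names are my assumptions for illustration, not something taken from a real job.

```python
import bisect
import json
import zlib


def row_key(date, user_id):
    # Assumed layout: "YYYYMMDD|userId" (date first, as in "date + userId").
    return f"{date}|{user_id}".encode()


def pack_urls(urls):
    # Compress the URL list so one large cell stays small on disk.
    return zlib.compress(json.dumps(urls).encode())


def unpack_urls(blob):
    return json.loads(zlib.decompress(blob).decode())


class SortedTable:
    """Hypothetical in-memory stand-in for a table sorted by row key."""

    def __init__(self):
        self._keys = []
        self._rows = {}

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def scan(self, start, stop):
        # Yield rows with start <= key < stop, in key order.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


def history(table, user_id, first_date, last_date):
    # Scan the date range and keep only the rows of the requested user.
    suffix = f"|{user_id}".encode()
    start = first_date.encode()
    stop = last_date.encode() + b"~"  # "~" sorts after "|", so last_date is included
    result = {}
    for key, blob in table.scan(start, stop):
        if key.endswith(suffix):
            result[key.split(b"|")[0].decode()] = unpack_urls(blob)
    return result


table = SortedTable()
table.put(row_key("20151201", "1725"), pack_urls(["www.something.com"]))
table.put(row_key("20151201", "1726"), pack_urls(["www.some.com"]))
table.put(row_key("20151202", "1725"), pack_urls(["www.some.com", "www.something.com"]))

# One person, one day: a single GET on the exact row key.
one_day = unpack_urls(table.get(row_key("20151201", "1725")))

# One person, several days: a scan bounded by the date range.
several_days = history(table, "1725", "20151201", "20151202")
```

Note that with the date first, the multi-day scan also passes over other users' rows for those days; that is acceptable here only because there is one row per user per day, so the row count stays small.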
