Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)? It'll use all of the parts of your row key and depending on how much data you're returning back to the client, will query over 10 million row in seconds.
James @JamesPlusPlus http://phoenix-hbase.blogspot.com On Apr 30, 2013, at 1:59 PM, "[email protected]" <[email protected]> wrote: > That depends on how dynamic your data is. If it is pretty static, you can > also consider using something like Create Table As Select (CTAS) to create > a snapshot of your data to HDFS and then run queries on top of that data. > > So your query might become something like: > > create table my_table as select * from event where key.name=’Signup’ and > key.dateCreated=’2013-03-06 16:39:55.353’ and key.uid=’7af4c330-5988-4255- > 9250-924ce5864e3bf’; > > Since your data is now in HDFS, this should give you a considerable > performance boost. > > > On Tue, Apr 30, 2013 at 3:00 PM, Rupinder Singh <[email protected]> wrote: > >> Swarnim,**** >> >> ** ** >> >> Thanks. So this means custom map reduce is the viable option when working >> with hbase tables having composite keys, since it allows to set the start >> and stop keys. Hive+Hbase combination is out.**** >> >> ** ** >> >> Regards**** >> >> Rupinder**** >> >> ** ** >> >> *From:* [email protected] [mailto:[email protected]] >> *Sent:* Wednesday, May 01, 2013 12:17 AM >> >> *To:* [email protected] >> *Cc:* [email protected] >> *Subject:* Re: Very poor read performance with composite keys in hbase**** >> >> ** ** >> >> Rupinder,**** >> >> ** ** >> >> Hive supports a filter pushdown[1] which means that the predicates in the >> where clause are pushed down to the storage handler level where either they >> get handled by the storage handler or delegated to hive if they cannot >> handle them. As of now, the HBaseStorageHandler only supports primitive >> types. So when you use strings as keys, behind the scenes they get >> converted to start and stop keys and restrict the hbase scan. This does not >> happen for structs. Hence you see a full table scan causing bad performance. >> **** >> >> ** ** >> >> [1] https://cwiki.apache.org/Hive/filterpushdowndev.html**** >> >> ** ** >> >> On Tue, Apr 30, 2013 at 1:04 PM, Sanjay Subramanian < >> [email protected]> wrote:**** >> >> My experience with hive + hbase has been about 8x slower on an average. So >> I went ahead with hive only option. >> >> Sent from my iPhone**** >> >> >> On Apr 30, 2013, at 11:19 PM, "Rupinder Singh" <[email protected]> wrote:*** >> * >> >> Hi,**** >> >> **** >> >> I have an hbase cluster where I have a table with a composite key. I map >> this table to a Hive external table using which I insert/select data >> into/from this table:**** >> >> CREATE EXTERNAL TABLE event(key >> struct<name:string,dateCreated:string,uid:string>, {more columns here})*** >> * >> >> ROW FORMAT DELIMITED**** >> >> COLLECTION ITEMS TERMINATED BY '~'**** >> >> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'**** >> >> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, other columns ")*** >> * >> >> TBLPROPERTIES ("hbase.table.name" = "event");**** >> >> **** >> >> The table has about 10 million rows. When I do a select * using all 3 >> components of the key, essentially selecting just 1 row, the response time >> is almost 700 sec, which seems pretty bad.**** >> >> **** >> >> For comparison purpose, I created another table with a simple string key, >> and the rest of the columns etc same. The key is a string UUID. Table has >> same number of column families and same number of rows.**** >> >> CREATE EXTERNAL TABLE test_event(key string, blah blah…..**** >> >> TBLPROPERTIES ("hbase.table.name" = "test_event");**** >> >> **** >> >> When I select a single row from this table by doing select * where >> key=’something’, the response time is 35 sec.**** >> >> **** >> >> This seems to indicate that in case of composite keys, there is a full >> table scan happening. This seems weird.**** >> >> **** >> >> What am I missing here? Is there something special I need to do to get >> good read performance if I am using composite keys ?**** >> >> Insert performance in both cases is comparable and is as per expectation.* >> *** >> >> **** >> >> Any help is appreciated.**** >> >> Here is the env spec:**** >> >> **** >> >> Amazon EMR**** >> >> Hbase Cluster- 3 core nodes with 7.5 GB RAM each, 2 CPUs of 2.2 GHz each. >> Master 7.5 GB RAM, 2 CPUs of 2.2 GHz each**** >> >> Hive Cluster – 3 core nodes 3.75 GB RAM each, 1 CPU of 1.8 GHz. Master >> 3.75 GB RAM, 1 CPU of 1.8 GHz**** >> >> **** >> >> Thanks**** >> >> Rupinder**** >> >> ** ** >> >> ** ** >> >> This email is intended for the person(s) to whom it is addressed and may >> contain information that is PRIVILEGED or CONFIDENTIAL. Any unauthorized >> use, distribution, copying, or disclosure by any person other than the >> addressee(s) is strictly prohibited. If you have received this email in >> error, please notify the sender immediately by return email and delete the >> message and any attachments from your system.**** >> >> ** ** >> >> ** ** >> >> CONFIDENTIALITY NOTICE >> ====================== >> This email message and any attachments are for the exclusive use of the >> intended recipient(s) and may contain confidential and privileged >> information. Any unauthorized review, use, disclosure or distribution is >> prohibited. If you are not the intended recipient, please contact the >> sender by reply email and destroy all copies of the original message along >> with any attachments, from your computer system. If you are the intended >> recipient, please be advised that the content of this message is subject to >> access, review and disclosure by the sender's Email System Administrator.* >> *** >> >> >> >> **** >> >> ** ** >> >> -- >> Swarnim **** >> > > > > -- > Swarnim
