It seems the byte_substr function is not documented anywhere. Should we add an example like the following to the Wiki?
In the HBase shell, create an HBase or MapR-DB table:

create '/testtable','cf'
put '/testtable','ABCD_9223370655563575807','cf:c','abc'

Then we can get a substring of the row key in Drill:

select convert_from(byte_substr(row_key,1,4),'UTF8') from dfs.`/testtable`;
+------------+
|   EXPR$0   |
+------------+
| ABCD       |
+------------+
1 row selected (0.293 seconds)

select convert_from(byte_substr(row_key,6,length(row_key)),'UTF8') from dfs.`/testtable`;
+-----------------------+
|        EXPR$0         |
+-----------------------+
| 9223370655563575807   |
+-----------------------+
1 row selected (0.263 seconds)

Thanks,
Hao
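For a composite key like this one, the decoded pieces can also be exposed as named columns through a view, in the same spirit as the lineitem view quoted below. A minimal sketch, assuming a writable workspace such as dfs.tmp (the view and column names here are illustrative, not from the thread):

create view dfs.tmp.testtable_keys as
select
  convert_from(byte_substr(t.row_key, 1, 4), 'UTF8')                 as key_prefix,
  convert_from(byte_substr(t.row_key, 6, length(t.row_key)), 'UTF8') as reverse_ts
from dfs.`/testtable` t;

Queries could then refer to the key parts directly, e.g. select key_prefix, reverse_ts from dfs.tmp.testtable_keys;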
On Tue, Jan 20, 2015 at 1:44 PM, Aditya <[email protected]> wrote:

> I have been experimenting with the byte_substr() function to build composite
> keys as part of the HBase row key with good results.
>
> Here is a sample view for the TPCH lineitem table:
>
> select
>   convert_from(byte_substr(lineitem.row_key, 9, 4), 'int_be') as l_orderkey,
>   convert_from(byte_substr(lineitem.row_key, 14, 4), 'int_be') as l_partkey,
>   convert_from(byte_substr(lineitem.row_key, 18, 4), 'int_be') as l_suppkey,
>   convert_from(byte_substr(lineitem.row_key, 13, 1), 'tinyint') as l_linenumber,
>   convert_from(lineitem.F.l_quantity, 'double_be') as l_quantity,
>   convert_from(lineitem.F.l_extendedprice, 'double_be') as l_extendedprice,
>   convert_from(lineitem.F.l_discount, 'double_be') as l_discount,
>   convert_from(lineitem.F.l_tax, 'double_be') as l_tax,
>   convert_from(lineitem.F.l_returnflag, 'utf8') as l_returnflag,
>   convert_from(lineitem.F.l_linestatus, 'utf8') as l_linestatus,
>   convert_from(byte_substr(lineitem.row_key, 1, 8), 'date_epoch_be') as l_shipdate,
>   convert_from(lineitem.F.l_commitdate, 'date_epoch_be') as l_commitdate,
>   convert_from(lineitem.F.l_receiptdate, 'date_epoch_be') as l_receiptdate,
>   convert_from(lineitem.F.l_shipinstruct, 'utf8') as l_shipinstruct,
>   convert_from(lineitem.F.l_shipmode, 'utf8') as l_shipmode,
>   convert_from(lineitem.G.l_comment, 'utf8') as l_comment
> from hbase.lineitem lineitem;
>
> To answer Ted's question on filter pushdown: we have identified this as a
> requirement, i.e. to be able to push down predicates with byte_substr()
> into the HBase scan, but it is not yet implemented.
>
> On Tue, Jan 20, 2015 at 1:27 PM, Ted Dunning <[email protected]> wrote:
>
> > What about filter pushdown in these cases? I know that some filter ops
> > push down through convert calls. What about through byte_substr?
> >
> > On Tue, Jan 20, 2015 at 12:39 PM, Jacques Nadeau <[email protected]> wrote:
> >
> > > I believe there is byte_substr (or similar) which you could use before
> > > handing the value to convert_from.
> > >
> > > On Tue, Jan 20, 2015 at 7:56 AM, Carol McDonald <[email protected]> wrote:
> > >
> > > > What if the HBase primary key is a composite key composed of multiple
> > > > types, for example a string followed by a reverse timestamp (long),
> > > > like AMZN_9223370655563575807?
> > > >
> > > > Are there parameters to specify the length in the function
> > > > convert_from(string bytea, src_encoding name)?
> > > >
> > > > On Thu, Dec 18, 2014 at 12:22 AM, Jacques Nadeau <[email protected]> wrote:
> > > >
> > > > > String keys work but aren't the most performant or appropriate
> > > > > encoding to use in many cases. Drill provides CONVERT_TO and
> > > > > CONVERT_FROM with a large number of encodings (including those used
> > > > > by many Hadoop applications as well as the Apache Phoenix project).
> > > > > This improves the performance of data use in HBase. You can use
> > > > > strings, but you should use an encoding appropriate to your actual
> > > > > data. Drill will then do projection pushdown, filter pushdown and
> > > > > range pruning based on your query.
> > > > >
> > > > > On Wed, Dec 17, 2014 at 8:33 AM, Carol Bourgade <[email protected]> wrote:
> > > > >
> > > > > > The Impala documentation says that for best performance you should
> > > > > > use the string data type for HBase row keys. I know that you do not
> > > > > > have to define the data types for Drill queries, but do string
> > > > > > bytes work better for Drill queries on HBase row keys?
> > > > > >
> > > > > > http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_hbase.html
> > > > > >
> > > > > > For best performance of Impala queries against HBase tables, most
> > > > > > queries will perform comparisons in the WHERE clause against the
> > > > > > column that corresponds to the HBase row key. When creating the
> > > > > > table through the Hive shell, use the STRING data type for the
> > > > > > column that corresponds to the HBase row key. Impala can translate
> > > > > > conditional tests (through operators such as =, <, BETWEEN, and IN)
> > > > > > against this column into fast lookups in HBase, but this
> > > > > > optimization ("predicate pushdown") only works when that column is
> > > > > > defined as STRING.
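To make the pushdown question in the quoted thread concrete: a filter like the following (hypothetical, reusing the /testtable example at the top of this thread) would, per Aditya's note earlier in the thread, be evaluated inside Drill rather than pushed into the HBase scan, since pushdown of predicates through byte_substr() is not yet implemented as of this discussion:

select convert_from(t.row_key, 'UTF8')
from dfs.`/testtable` t
where convert_from(byte_substr(t.row_key, 1, 4), 'UTF8') = 'ABCD';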
