String keys work, but in many cases they aren't the most performant or
appropriate encoding to use.  Drill provides CONVERT_TO and CONVERT_FROM with
a large number of encodings (including those used by many Hadoop applications
as well as the Apache Phoenix project).  Using one of these improves the
performance of queries on data in HBase.  You can use strings, but you should
use an encoding appropriate to your actual data.  Drill will then do
projection pushdown, filter pushdown and range pruning based on your query.
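
To illustrate why the encoding matters for range pruning, here is a rough
sketch in plain Python.  It is not Drill or HBase code, and the helper names
are made up for the example.  It only relies on the fact that HBase orders row
keys byte by byte, and compares numeric keys stored as strings with the same
keys stored as fixed-width big-endian integers (the fixed 4-byte width is an
assumption chosen for the example).

```python
import struct

# Hypothetical helpers for illustration only; these are not Drill functions.
def encode_as_string(n: int) -> bytes:
    """Store the number as its decimal text, e.g. 10 -> b"10"."""
    return str(n).encode("utf-8")

def encode_as_uint32_be(n: int) -> bytes:
    """Store the number as a fixed-width, 4-byte big-endian unsigned int."""
    return struct.pack(">I", n)

keys = [9, 10, 100, 2]

# HBase keeps rows sorted by the raw bytes of the row key, so sorting the
# encoded bytes mimics the order in which rows would be laid out.
string_order = sorted(keys, key=encode_as_string)
binary_order = sorted(keys, key=encode_as_uint32_be)

print(string_order)  # [10, 100, 2, 9]  numeric order is lost
print(binary_order)  # [2, 9, 10, 100]  numeric order is preserved
```

With the string encoding, a numeric range such as "between 2 and 10" does not
map to one contiguous run of row keys, while with the big-endian encoding it
does, so a scan can be limited to a start and stop key.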

On Wed, Dec 17, 2014 at 8:33 AM, Carol Bourgade <[email protected]>
wrote:
>
> Impala documentation says for best performance use the string data type
> for HBase row keys.  I know that you do not have to define the data types
> for Drill queries, but do string bytes work better for Drill queries on
> HBase row keys?
>
>
> http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_hbase.html
> For best performance of Impala queries against HBase tables, most queries
> will perform comparisons in the WHERE against the column that corresponds
> to the HBase row key. When creating the table through the Hive shell, use
> the STRING data type for the column that corresponds to the HBase row key.
> Impala can translate conditional tests (through operators such as =, <,
> BETWEEN, and IN) against this column into fast lookups in HBase, but this
> optimization ("predicate pushdown") only works when that column is defined
> as STRING.
>
