I have an HBase table defined as an external table in Hive, and I'm having 
trouble determining how newlines in the stored byte arrays should be escaped.

The primary use-case of this table is writing via the HBase client API, then 
reading via HiveQL select queries against HiveServer2.

I've found two behaviors. If I leave the newlines alone (a plain \n), a query 
with a WHERE clause returns extraneous rows filled with NULL values. If I 
instead write them to HBase as \\n, the queries return the correct rows, but 
the newlines stay escaped in the query results.

I expected to need to escape them, since I'm writing to HBase outside of 
Hive, but I also expected them to come back out of Hive without an extra 
un-escaping step.
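For reference, the \\n workaround amounts to an escape step on the write side and a matching un-escape step after reading the HiveQL results. Below is a minimal Java sketch, assuming the body is handled as a String before it is converted to bytes for the HBase write. The class and method names (NewlineEscaper, escape, unescape) are hypothetical and not part of HBase or Hive. It also escapes backslashes so the round trip is lossless, which goes slightly beyond simply writing \n as \\n.

```java
// Hypothetical helper: reversible escaping of newlines in a String
// before writing it to HBase, and un-escaping after reading it back.
public final class NewlineEscaper {
    private NewlineEscaper() {}

    // Escape backslashes first so the transformation can be reversed,
    // then replace each newline with the two characters '\' and 'n'.
    public static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') {
                out.append("\\\\");
            } else if (c == '\n') {
                out.append("\\n");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Inverse of escape(): "\n" (two chars) -> newline, "\\" -> backslash.
    // Any other backslash sequence is copied through unchanged.
    public static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(i + 1);
                if (next == 'n') {
                    out.append('\n');
                    i++;
                    continue;
                }
                if (next == '\\') {
                    out.append('\\');
                    i++;
                    continue;
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String body = "line one\nline two with a \\ backslash";
        String stored = escape(body);          // value written to post:body
        System.out.println(stored);
        System.out.println(unescape(stored).equals(body));
    }
}
```

With this approach, the un-escaping the post hoped to avoid still happens, just explicitly in the client reading from HiveServer2.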

I'm running Hive 0.10 from CDH4.2.1. The table structure looks like this:

CREATE EXTERNAL TABLE blog_post (
    id STRUCT<blog_name: STRING, post_id: STRING>,
    blog_name STRING,
    post_id STRING,
    body STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
    'hbase.columns.mapping' = ':key,post:blog_name,post:post_id,post:body',
    'hbase.table.default.storage.type' = 'binary'
)
TBLPROPERTIES (
    'hbase.table.name' = 'blog_post'
);

Example query:

SELECT * FROM blog_post WHERE blog_name = 'testblog';

Thanks,

Rob Roland
