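It looks like your table is using the text storage format. With 
TextInputFormat, binary data is expected to be stored as base64, so those 
values are probably being interpreted as base64 strings on read. The copy 
would then write them back out re-encoded as base64, which would explain 
the "=" padding you see.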
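
To illustrate the idea, here is a minimal Python sketch, not Hive's actual 
code path. It assumes the reader decodes each text value as base64 while 
tolerating missing "=" padding, and that the writer re-encodes the resulting 
bytes as padded base64. The function name roundtrip_base64 is made up for 
this example.

```python
import base64


def roundtrip_base64(text: str) -> str:
    """Decode text as base64, then re-encode the resulting bytes."""
    # Assumption: the reader tolerates missing '=' padding, so we pad the
    # value to a multiple of 4 characters before decoding.
    padded = text + "=" * (-len(text) % 4)
    # Bits left over in a short final group are discarded by the decoder.
    raw = base64.b64decode(padded)
    # Re-encoding always emits canonical, padded base64.
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # Sample rows from the original message.
    for value in ["10000101", "121", "10", "1011", "Asfs"]:
        print(f"{value} -> {roundtrip_base64(value)}")
```

Values whose length is a multiple of 4 come back unchanged, while "121" 
becomes "120=" and "10" becomes "1w==", which matches the contents of the 
bintarget file in the message below.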


________________________________
From: Ujjwal Wadhawan <uwadha...@gmail.com>
Sent: Monday, September 14, 2015 2:32 PM
To: user@hive.apache.org
Subject: binary column data consistency in hive table copy

Hi all,

I recently observed a behavior in Hive that I'd like to share and get input on.

Scenario:

Say you have a Hive table with a binary column.

create table binsource (bincol binary);

and some input data

$ cat /nis3/home/ujjwal2/test2/binin
10000101
121
10
1011
Asfs


Let's load the data in the table

LOAD DATA LOCAL INPATH '/home/ujjwal2/test2/binin' OVERWRITE INTO TABLE 
binsource;

When I do a select * in the Hive CLI, I see the following characters (see image)

[http://puu.sh/k6HBw/877367d595.png]

The underlying HDFS file still has the actual input though.

[cid:image001.png@01D0EF10.AE2240E0]

Now I make a copy of this table using the command "create table 
ujjwal2.bintarget as select * from ujjwal2.binsource;".

[http://puu.sh/k6HEj/b34a8bd4a0.png]

ISSUE:

Now when I look at the underlying file created on HDFS for bintarget, I see 
some extra characters.

[cid:image006.png@01D0EF10.AE2240E0]

In the many combinations I have tried, the extra characters are always "=", "w" or "A".

10000101
120=
1w==
1011
Asfs

Does anyone know what these characters signify?

Best,
Ujjwal

