Great! Thank you all for your inputs. -Ujjwal
On Tue, Sep 15, 2015 at 8:08 AM, Gabriel Balan <gabriel.ba...@oracle.com> wrote:

> Hi
>
> You see "1w==" when you do a CTAS into a table using text files and
> LazySimpleSerDe because in that case binary columns are stored as base64.
>
> That also means LazySimpleSerDe will expect your 'binin' text file to
> have base64-encoded values.
> The strange things you see when you select from binsource must be the
> base64 decoding of '10000101', etc.
>
> You can read about base64 here: https://en.wikipedia.org/wiki/Base64
>
> Also, I find the intended use for binary very interesting. According to
> https://cwiki.apache.org/confluence/display/Hive/Binary+DataType+Proposal:
>
> "Often [...] a row in a data might be very wide with hundreds of columns.
> Sometimes, user is just interested in few of those columns and doesn't want
> to bother about exact type information for rest of columns. In such cases,
> he may just declare the types of those columns as binary and Hive will not
> try to interpret those columns."
>
> hth
> Gabriel Balan
>
> The statements and opinions expressed here are my own and do not
> necessarily represent those of Oracle Corporation.
>
>
> ----- Original Message -----
> From: xihuyu2...@126.com
> To: user@hive.apache.org
> Sent: Monday, September 14, 2015 7:17:24 PM GMT -05:00 US/Canada Eastern
> Subject: Re: Re: binary column data consistency in hive table copy
>
> If you use CTAS, an MR job runs. Maybe the problem is in the MR job.
> 2015-09-15
> ------------------------------
> xihuyu2000
> ------------------------------
>
> *From:* Jason Dere <jd...@hortonworks.com>
> *Sent:* 2015-09-15 06:00
> *Subject:* Re: binary column data consistency in hive table copy
> *To:* "user@hive.apache.org" <user@hive.apache.org>
> *Cc:*
>
>
> Looks like your table is using text storage format. Binary data needs to
> be stored as base64 in TextInputFormat, so those values are probably being
> interpreted as base64 strings.
>
>
> ------------------------------
> *From:* Ujjwal Wadhawan <uwadha...@gmail.com>
> *Sent:* Monday, September 14, 2015 2:32 PM
> *To:* user@hive.apache.org
> *Subject:* binary column data consistency in hive table copy
>
> Hi all,
>
> I recently observed a behavior in Hive that I'd like to share and get
> input on.
>
> *Scenario:*
>
> Say you have a Hive table with a binary column.
>
> create table binsource (bincol binary);
>
> and some input data
>
> $ cat /nis3/home/ujjwal2/test2/binin
> 10000101
> 121
> 10
> 1011
> Asfs
>
> Let's load the data into the table
>
> LOAD DATA LOCAL INPATH '/home/ujjwal2/test2/binin' OVERWRITE INTO TABLE binsource;
>
> When I do a select * in the Hive CLI, I see the following characters
> (see image)
>
> [image: http://puu.sh/k6HBw/877367d595.png]
>
> The underlying HDFS file still has the actual input though.
>
> Now I make a copy of this table using the command "create table
> ujjwal2.bintarget as select * from ujjwal2.binsource;".
>
> [image: http://puu.sh/k6HEj/b34a8bd4a0.png]
>
> *ISSUE:*
>
> Now when I look at the underlying file created on HDFS for bintarget, I
> see some extra characters.
>
> In the many combinations I have tried, the extra characters are “=”, “w”
> and “A”.
>
> 10000101
> 120=
> 1w==
> 1011
> Asfs
>
> Does anyone know what these characters signify?
>
> Best,
> Ujjwal
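
The base64 explanation given in the replies can be checked outside Hive with a small Python sketch. It is not Hive's code: `lenient_b64decode` and `round_trip` are hypothetical helpers, and the sketch assumes the text SerDe decodes unpadded input leniently, dropping trailing bits that do not fill a whole byte, and then writes the bytes back out as standard padded base64. Under that assumption it reproduces the values seen in the bintarget file.

```python
import base64

# Standard base64 alphabet (RFC 4648): index in this string = 6-bit value.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def lenient_b64decode(text: str) -> bytes:
    """Decode possibly unpadded base64 text.

    Assumption (not taken from Hive's source): leftover bits that do not
    fill a complete byte are silently discarded.
    """
    bits = 0    # accumulated bits not yet emitted
    nbits = 0   # how many bits are currently held in `bits`
    out = bytearray()
    for ch in text:
        idx = ALPHABET.find(ch)
        if idx < 0:
            continue  # skip '=' padding and any non-alphabet character
        bits = (bits << 6) | idx
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
            bits &= (1 << nbits) - 1  # keep only the unused low bits
    return bytes(out)


def round_trip(text: str) -> str:
    """Simulate the copy: decode the input row, then re-encode it as base64."""
    return base64.b64encode(lenient_b64decode(text)).decode("ascii")


if __name__ == "__main__":
    # The five rows of the 'binin' file from the original question.
    for row in ["10000101", "121", "10", "1011", "Asfs"]:
        print(f"{row!r:12} -> {round_trip(row)!r}")
```

Rows whose length is a multiple of four ("10000101", "1011", "Asfs") are already complete base64 groups and come back unchanged. "121" and "10" are incomplete groups, so the re-encoded form differs ("120=" and "1w=="), which matches the extra characters reported in the question.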