Hi Chuck - I've used BINARY columns with newlines in the data, with RCFile as the storage format. It has worked well so far. I can't say whether this is "the" way to get data in, but my transform script outputs hex-encoded data, and the final insert into the table applies unhex(sourcedata). It seems a bit hackish, but it has never been a problem for me.
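
A minimal sketch of the hex-encoding step described above, assuming the transform script is written in Python. The function names and sample bytes are hypothetical, chosen only for illustration; this is not the actual script. The point it shows is that hex text never contains a newline byte, so each binary record fits on one line of a staging file and can be decoded back losslessly (the job that unhex(sourcedata) does on the Hive side).

```python
import binascii


def hex_encode_records(records):
    """Return one ASCII hex string per binary record (hypothetical helper)."""
    return [binascii.hexlify(record).decode("ascii") for record in records]


def hex_decode_line(line):
    """Reverse the encoding, mirroring what unhex() does at insert time."""
    return binascii.unhexlify(line.strip())


# Made-up sample records that contain raw newline and NUL bytes.
records = [b"\x08\x96\x01\nabc", b"line1\nline2\x00"]

# Each record becomes a single newline-free line of text.
lines = hex_encode_records(records)

# Decoding every line gives back the original bytes.
decoded = [hex_decode_line(line) for line in lines]
```

The trade-off is size: the hex-encoded staging data is twice as large as the raw bytes, which only matters until the final insert converts it back to binary.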
On Sat, Dec 1, 2012 at 10:50 AM, Connell, Chuck <[email protected]> wrote:

> I am trying to use BINARY columns and believe I have the perfect
> use-case for it, but I am missing something. Has anyone used this for true
> binary data (which may contain newlines)?
>
> Here is the background... I have some files that each contain just one
> logical field, which is a binary object. (The files are Google Protobuf
> format.) I want to put these binary files into a larger file, where each
> protobuf is a logical record. Then I want to define a Hive table that
> stores each protobuf as one row, with the entire protobuf object in one
> BINARY column. Then I will use a custom UDF to select/query the binary
> object.
>
> This is about as simple as can be for putting binary data into Hive.
>
> What file format should I use to package the binary rows? What should
> the Hive table definition be? Which SerDe option (LazySimpleBinary?). I
> cannot use TEXTFILE, since the binary may contain newlines. Many of my
> attempts have choked on the newlines.
>
> Thank you,
> Chuck Connell
> Nuance
> Burlington, MA
