Querying Delimited Sequence file

Robin Moffatt Tue, 30 Aug 2016 07:18:22 -0700

Hi,

I'm trying to read a sequence file, in which the key is null and the value
holds multiple columns [1], delimited by \x01. In Hive I simply define it
as :


CREATE EXTERNAL TABLE foo (col1 string, col2 string, col3 timestamp)
ROW FORMAT DELIMITED
STORED as sequencefile
LOCATION '/user/oracle/foo/bar';

In Drill I've got as far as

SELECT CONVERT_FROM(binary_value, 'UTF8') from  `/user/oracle/foo/bar`

which yields the data but as a single column. I can cast it to individual
columns but this is no use if the field positions change

SELECT substr(CONVERT_FROM(binary_value, 'UTF8'),5,1) as
col0,substr(CONVERT_FROM(binary_value, 'UTF8'),7,13) as
col1,substr(CONVERT_FROM(binary_value, 'UTF8'),20,20) as col3 from
 `/user/oracle/seq/pdb.soe.logon` limit 5;
+-------+----------------+-----------------------+
| col0  |      col1      |         col3          |
+-------+----------------+-----------------------+
| I     | PDB.SOE.LOGON  | 2016-07-29 13:36:40  |


Is there a way to treat a column as delimited and burst it out into
multiple columns? Presumably I could somehow dump the string contents to
CSV and then re-read it - but I'm interested here in using Drill the query
existing data; wrangling it to suit Drill isn't really what I'm looking for
(and maybe Drill just isn't the right tool here?).


thanks,

Robin.

[1]
https://docs.oracle.com/goldengate/bd1221/gg-bd/GADBD/GUID-85A82B2E-CD51-463A-8674-3D686C3C0EC0.htm#GADBD-GUID-4CAFC347-0F7D-49AB-B293-EFBCE95B66D6

Querying Delimited Sequence file

Reply via email to