Hi, I'm trying to read a sequence file, in which the key is null and the value holds multiple columns [1], delimited by \x01. In Hive I simply define it as :
CREATE EXTERNAL TABLE foo (col1 string, col2 string, col3 timestamp) ROW FORMAT DELIMITED STORED as sequencefile LOCATION '/user/oracle/foo/bar'; In Drill I've got as far as SELECT CONVERT_FROM(binary_value, 'UTF8') from `/user/oracle/foo/bar` which yields the data but as a single column. I can cast it to individual columns but this is no use if the field positions change SELECT substr(CONVERT_FROM(binary_value, 'UTF8'),5,1) as col0,substr(CONVERT_FROM(binary_value, 'UTF8'),7,13) as col1,substr(CONVERT_FROM(binary_value, 'UTF8'),20,20) as col3 from `/user/oracle/seq/pdb.soe.logon` limit 5; +-------+----------------+-----------------------+ | col0 | col1 | col3 | +-------+----------------+-----------------------+ | I | PDB.SOE.LOGON | 2016-07-29 13:36:40 | Is there a way to treat a column as delimited and burst it out into multiple columns? Presumably I could somehow dump the string contents to CSV and then re-read it - but I'm interested here in using Drill the query existing data; wrangling it to suit Drill isn't really what I'm looking for (and maybe Drill just isn't the right tool here?). thanks, Robin. [1] https://docs.oracle.com/goldengate/bd1221/gg-bd/GADBD/GUID-85A82B2E-CD51-463A-8674-3D686C3C0EC0.htm#GADBD-GUID-4CAFC347-0F7D-49AB-B293-EFBCE95B66D6
