Hi, Thanks - I think SPLIT gets me some of the way, but after the FLATTEN I want to PIVOT, so instead of :
0: jdbc:drill:zk=cdh57-01-node-01.moffatt.me:> select flatten(split(version,'.')) from sys.version; +---------+ | EXPR$0 | +---------+ | 1 | | 7 | | 0 | +---------+ I'd get something like: +---------+---------+---------+ | EXPR$0 | EXPR$1 | EXPR$2 | +---------+---------+---------+ | 1 | 7 | 0 | +---------+---------+---------+ I'm guessing this isn't possible in Drill yet? Also, what would be be the syntax to enter the \x01 character in the SPLIT function? Entered literally I get an error: 0: jdbc:drill:zk=cdh57-01-node-01.moffatt.me:> SELECT split(CONVERT_FROM(binary_value, 'UTF8'),'\x01') from `/user/oracle/seq/pdb.soe.logon` limit 5; Error: SYSTEM ERROR: IllegalArgumentException: Only single character delimiters are supported for split() BTW I didn't realise SPLIT was supported, and it's not listed in https://drill.apache.org/docs/string-manipulation/ or https://drill.apache.org/search/?q=split -- is there somewhere I should log this kind of documentation issue? thanks, Robin. On 30 August 2016 at 16:07, Zelaine Fong <[email protected]> wrote: > If the column is delimited by some character, you can use the SPLIT() > function to separate the value into an array of values. You can then use > the FLATTEN() function to separate the array of values into individual > records. > > E.g., if your column has the value "a:b", where your delimiter is ":", you > would run the following query: > > 0: jdbc:drill:zk=local> select flatten(split(columns[0],':')) from > `/tmp/foo.csv`; > +---------+ > | EXPR$0 | > +---------+ > | a | > | b | > +---------+ > 2 rows selected (0.319 seconds) > > Is that what you had in mind? > > -- Zelaine > > On Tue, Aug 30, 2016 at 7:17 AM, Robin Moffatt < > [email protected]> wrote: > > > Hi, > > > > I'm trying to read a sequence file, in which the key is null and the > value > > holds multiple columns [1], delimited by \x01. In Hive I simply define it > > as : > > > > CREATE EXTERNAL TABLE foo (col1 string, col2 string, col3 timestamp) > > ROW FORMAT DELIMITED > > STORED as sequencefile > > LOCATION '/user/oracle/foo/bar'; > > > > In Drill I've got as far as > > > > SELECT CONVERT_FROM(binary_value, 'UTF8') from `/user/oracle/foo/bar` > > > > which yields the data but as a single column. I can cast it to individual > > columns but this is no use if the field positions change > > > > SELECT substr(CONVERT_FROM(binary_value, 'UTF8'),5,1) as > > col0,substr(CONVERT_FROM(binary_value, 'UTF8'),7,13) as > > col1,substr(CONVERT_FROM(binary_value, 'UTF8'),20,20) as col3 from > > `/user/oracle/seq/pdb.soe.logon` limit 5; > > +-------+----------------+-----------------------+ > > | col0 | col1 | col3 | > > +-------+----------------+-----------------------+ > > | I | PDB.SOE.LOGON | 2016-07-29 13:36:40 | > > > > > > Is there a way to treat a column as delimited and burst it out into > > multiple columns? Presumably I could somehow dump the string contents to > > CSV and then re-read it - but I'm interested here in using Drill the > query > > existing data; wrangling it to suit Drill isn't really what I'm looking > for > > (and maybe Drill just isn't the right tool here?). > > > > > > thanks, > > > > Robin. > > > > [1] > > https://docs.oracle.com/goldengate/bd1221/gg-bd/GADBD/ > > GUID-85A82B2E-CD51-463A-8674-3D686C3C0EC0.htm#GADBD-GUID- > > 4CAFC347-0F7D-49AB-B293-EFBCE95B66D6 > > >
