Thanks, SPLIT_PART looks useful. 0: jdbc:drill:zk=cdh57-01-node-01.moffatt.me:> select split_part(version,'.',1),split_part(version,'.',2),split_part(version,'.',3) from sys.version; +---------+---------+---------+ | EXPR$0 | EXPR$1 | EXPR$2 | +---------+---------+---------+ | 1 | 7 | 0 | +---------+---------+---------+ 1 row selected (0.351 seconds)
But used with my actual data (sequence file), I get an error. I've successfully SPLIT it using CHR(1) for the \x01 delimiter: 0: jdbc:drill:zk=cdh57-01-node-01.moffatt.me:> SELECT split(CONVERT_FROM(binary_value, 'UTF8'),chr(1)) from `/user/oracle/seq/pdb.soe.logon` limit 1; +--------+ | EXPR$0 | +--------+ | ["\u0000\u0000\u0000|I","PDB.SOE.LOGON","2016-08-30 10:34:01.000145","2016-08-30T11:34:07.934000","00000000000001558898","","338328","13645","2016-08-30:11:34:01"] | +--------+ But if I now try to access one of those elements, it errors: 0: jdbc:drill:zk=cdh57-01-node-01.moffatt.me:> SELECT split_part(CONVERT_FROM(binary_value, 'UTF8'),chr(1),1) from `/user/oracle/seq/pdb.soe.logon` limit 5; Error: SYSTEM ERROR: IllegalArgumentException: length: -123 (expected: >= 0) Fragment 0:0 [Error Id: beba85c3-8c5b-4c05-9ae7-d12263811af4 on cdh57-01-node-02.moffatt.me:31010] (state=,code=0) 0: jdbc:drill:zk=cdh57-01-node-01.moffatt.me:> SELECT split_part(CONVERT_FROM(binary_value, 'UTF8'),chr(1),2) from `/user/oracle/seq/pdb.soe.logon` limit 5; Error: SYSTEM ERROR: IllegalArgumentException: length: -6 (expected: >= 0) Fragment 0:0 [Error Id: b4f18223-2999-4388-9450-dc9683c543ec on cdh57-01-node-02.moffatt.me:31010] (state=,code=0) Should this work? thanks. On 30 August 2016 at 19:06, rahul challapalli <[email protected]> wrote: > You should be able to use split_part function (I haven't tried it > myself...but it is supported). With this function you can extract > individual columns. Unfortunately I couldn't find the documentation for > this function as well. But it should be similar to how other databases > implement this function. > > Also as you have observed, split does not support delimiters with more than > one character. You can raise a jira and mark it as documentation related. > > Rahul > > > > On Tue, Aug 30, 2016 at 8:58 AM, Robin Moffatt < > [email protected]> wrote: > > > Hi, > > > > Thanks - I think SPLIT gets me some of the way, but after the FLATTEN I > > want to PIVOT, so instead of : > > > > 0: jdbc:drill:zk=cdh57-01-node-01.moffatt.me:> select > > flatten(split(version,'.')) from sys.version; > > +---------+ > > | EXPR$0 | > > +---------+ > > | 1 | > > | 7 | > > | 0 | > > +---------+ > > > > I'd get something like: > > > > +---------+---------+---------+ > > | EXPR$0 | EXPR$1 | EXPR$2 | > > +---------+---------+---------+ > > | 1 | 7 | 0 | > > +---------+---------+---------+ > > > > I'm guessing this isn't possible in Drill yet? > > > > Also, what would be be the syntax to enter the \x01 character in the > SPLIT > > function? Entered literally I get an error: > > > > 0: jdbc:drill:zk=cdh57-01-node-01.moffatt.me:> SELECT > > split(CONVERT_FROM(binary_value, 'UTF8'),'\x01') from > > `/user/oracle/seq/pdb.soe.logon` limit 5; > > Error: SYSTEM ERROR: IllegalArgumentException: Only single character > > delimiters are supported for split() > > > > BTW I didn't realise SPLIT was supported, and it's not listed in > > https://drill.apache.org/docs/string-manipulation/ or > > https://drill.apache.org/search/?q=split -- is there somewhere I should > > log > > this kind of documentation issue? > > > > thanks, Robin. > > > > > > On 30 August 2016 at 16:07, Zelaine Fong <[email protected]> wrote: > > > > > If the column is delimited by some character, you can use the SPLIT() > > > function to separate the value into an array of values. You can then > use > > > the FLATTEN() function to separate the array of values into individual > > > records. > > > > > > E.g., if your column has the value "a:b", where your delimiter is ":", > > you > > > would run the following query: > > > > > > 0: jdbc:drill:zk=local> select flatten(split(columns[0],':')) from > > > `/tmp/foo.csv`; > > > +---------+ > > > | EXPR$0 | > > > +---------+ > > > | a | > > > | b | > > > +---------+ > > > 2 rows selected (0.319 seconds) > > > > > > Is that what you had in mind? > > > > > > -- Zelaine > > > > > > On Tue, Aug 30, 2016 at 7:17 AM, Robin Moffatt < > > > [email protected]> wrote: > > > > > > > Hi, > > > > > > > > I'm trying to read a sequence file, in which the key is null and the > > > value > > > > holds multiple columns [1], delimited by \x01. In Hive I simply > define > > it > > > > as : > > > > > > > > CREATE EXTERNAL TABLE foo (col1 string, col2 string, col3 timestamp) > > > > ROW FORMAT DELIMITED > > > > STORED as sequencefile > > > > LOCATION '/user/oracle/foo/bar'; > > > > > > > > In Drill I've got as far as > > > > > > > > SELECT CONVERT_FROM(binary_value, 'UTF8') from > `/user/oracle/foo/bar` > > > > > > > > which yields the data but as a single column. I can cast it to > > individual > > > > columns but this is no use if the field positions change > > > > > > > > SELECT substr(CONVERT_FROM(binary_value, 'UTF8'),5,1) as > > > > col0,substr(CONVERT_FROM(binary_value, 'UTF8'),7,13) as > > > > col1,substr(CONVERT_FROM(binary_value, 'UTF8'),20,20) as col3 from > > > > `/user/oracle/seq/pdb.soe.logon` limit 5; > > > > +-------+----------------+-----------------------+ > > > > | col0 | col1 | col3 | > > > > +-------+----------------+-----------------------+ > > > > | I | PDB.SOE.LOGON | 2016-07-29 13:36:40 | > > > > > > > > > > > > Is there a way to treat a column as delimited and burst it out into > > > > multiple columns? Presumably I could somehow dump the string contents > > to > > > > CSV and then re-read it - but I'm interested here in using Drill the > > > query > > > > existing data; wrangling it to suit Drill isn't really what I'm > looking > > > for > > > > (and maybe Drill just isn't the right tool here?). > > > > > > > > > > > > thanks, > > > > > > > > Robin. > > > > > > > > [1] > > > > https://docs.oracle.com/goldengate/bd1221/gg-bd/GADBD/ > > > > GUID-85A82B2E-CD51-463A-8674-3D686C3C0EC0.htm#GADBD-GUID- > > > > 4CAFC347-0F7D-49AB-B293-EFBCE95B66D6 > > > > > > > > > >
