Cool, thx for testing. Best to file a JIRA.
—Andries

On Apr 2, 2015, at 1:27 PM, Sudhakar Thota <[email protected]> wrote:

> Vince/Andries,
>
> Perhaps this could be a bug. I get the same results.
>
> But the plan is very different, the UnionExchange is set up immediately after
> the scan operation in successful case( Case -1 ), where as UnionExchange is
> happening after scan->project (Case -2).
>
> Case -1.Successful case.
>
> 0: jdbc:drill:> explain plan for select to_timestamp(t.t,
> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from
> dfs.sthota_prq.`/tstamp_test/*.parquet` limit 13015351) t;
> +------------+------------+
> | text | json |
> +------------+------------+
> | 00-00 Screen
> 00-01 Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'),
> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
> 00-02 SelectionVectorRemover
> 00-03 Limit(fetch=[13015351])
> 00-04 UnionExchange
> 01-01 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath
> [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet],
> ReadEntryWithPath
> [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet],
> ReadEntryWithPath
> [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]],
> selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test,
> numFiles=3, columns=[`*`]]])
> | {
>   "head" : {
>     "version" : 1,
>     "generator" : {
>       "type" : "ExplainHandler",
>       "info" : ""
>     },
>     "type" : "APACHE_DRILL_PHYSICAL",
>     "options" : [ ],
>     "queue" : 0,
>     "resultMode" : "EXEC"
>   },
>
> Case -2. Unsuccessful case:
>
> 0: jdbc:drill:> explain plan for select to_timestamp(t.t,
> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from
> dfs.sthota_prq.`/tstamp_test/*.parquet` ) t;
> +------------+------------+
> | text | json |
> +------------+------------+
> | 00-00 Screen
> 00-01 UnionExchange
> 01-01 Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'),
> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
> 01-02 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet],
> ReadEntryWithPath
> [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet],
> ReadEntryWithPath
> [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]],
> selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test,
> numFiles=3, columns=[`*`]]])
> | {
>   "head" : {
>     "version" : 1,
>     "generator" : {
>       "type" : "ExplainHandler",
>       "info" : ""
>     },
>     "type" : "APACHE_DRILL_PHYSICAL",
>     "options" : [ ],
>     "queue" : 0,
>     "resultMode" : "EXEC"
>   },
>
> Thanks
> Sudhakar Thota
>
>
> On Apr 2, 2015, at 12:01 PM, Vince Gonzalez <[email protected]> wrote:
>
>> Ok, will do. Thanks.
>>
>> On Thu, Apr 2, 2015 at 2:49 PM, Andries Engelbrecht <[email protected]> wrote:
>>
>>> Compare the query plans and you probably want to look at the log file to
>>> see what fails and post here.
>>>
>>> —Andries
>>>
>>> On Apr 1, 2015, at 12:54 PM, Vince Gonzalez <[email protected]> wrote:
>>>
>>>> Is this a bug?
>>>>
>>>> Created a parquet table (using CTAS) with one column containing text
>>>> timestamps.
>>>>
>>>> 0: jdbc:drill:zk=localhost:2181> select * from tstamp_test limit 1;
>>>> +------------+
>>>> | t |
>>>> +------------+
>>>> | 2015-01-27T13:43:53.000Z |
>>>> +------------+
>>>> 1 row selected (0.119 seconds)
>>>>
>>>> The below queries, identical apart from the limit clause, behave
>>>> differently. The one with the limit clause works, the one without doesn't.
>>>> The limit is larger than the total number of rows, so in both cases we
>>>> should be processing all rows.
>>>>
>>>> No limit clause. It fails:
>>>>
>>>> ```
>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
>>>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test) as t;
>>>> Query failed: RemoteRpcException: Failure while trying to start remote
>>>> fragment, Expression has syntax error! line 1:30:mismatched input 'T'
>>>> expecting CParen [ 7d30d753-0822-4820-afd0-b7e7fe5e639c on
>>>> 192.168.99.1:31010 ]
>>>> ```
>>>>
>>>> Limit clause in the subselect (larger than the number of rows in the table)
>>>> succeeds.
>>>>
>>>> ```
>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
>>>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test limit
>>>> 100000000) as t;
>>>> ...
>>>> | 2015-02-17 07:18:00.0 |
>>>> +------------+
>>>> 13,015,350 rows selected (105.257 seconds)
>>>> ```
>>>>
>>>> Data can be downloaded here:
>>>>
>>>> https://s3.amazonaws.com/vgonzalez/data/tstamp_test.tar.gz
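
A side note on the quoting in the queries above. The doubled single quotes are SQL escaping, so the format string the function receives is YYYY-MM-dd'T'HH:mm:ss.SSS'Z'. The error text (mismatched input 'T') is the kind of message a parser would give if that string were embedded in an expression again without doubling the quotes. Nothing in this thread confirms that this is what happens in the failing plan, so the Python sketch below only illustrates the quoting behaviour. read_literal and sql_quote are hypothetical helpers written for this note, not Drill functions.

```python
def sql_quote(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def read_literal(text: str) -> tuple[str, str]:
    """Read one single-quoted SQL literal from the start of text.

    Returns (value, remainder). A doubled quote inside the literal is
    read as one quote character; a lone quote ends the literal.
    """
    if not text.startswith("'"):
        raise ValueError("expected an opening single quote")
    out = []
    i = 1
    while i < len(text):
        if text[i] == "'":
            if i + 1 < len(text) and text[i + 1] == "'":
                out.append("'")   # escaped quote
                i += 2
                continue
            return "".join(out), text[i + 1:]  # closing quote
        out.append(text[i])
        i += 1
    raise ValueError("unterminated literal")


# The literal exactly as typed in the queries in this thread.
typed = "'YYYY-MM-dd''T''HH:mm:ss.SSS''Z'''"
pattern, rest = read_literal(typed)

# Re-embedding the pattern with and without escaping (assumption: this
# naive form is only a guess at what could produce the reported error).
escaped = sql_quote(pattern)
naive = "'" + pattern + "'"
naive_value, naive_rest = read_literal(naive)

print(pattern)
print(escaped == typed)
print(naive_value, "|", naive_rest)
```

With the naive form the literal ends right after YYYY-MM-dd and the next token is a bare T, which is the shape of the failure reported for the query without the limit clause.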
