Steven,

Thanks. Andries already pointed that out to me, and I opened a public JIRA:
https://issues.apache.org/jira/browse/DRILL-2669

Thanks
Sudhakar Thota

On Apr 2, 2015, at 4:09 PM, Steven Phillips <[email protected]> wrote:

> Could you please file a public jira. That link is to an internal issue.
>
> On Thu, Apr 2, 2015 at 1:52 PM, Sudhakar Thota <[email protected]> wrote:
>
>> Here it is:
>>
>> https://maprdrill.atlassian.net/browse/MD-204?filter=-2
>>
>> Thanks
>> Sudhakar Thota
>>
>>
>> On Apr 2, 2015, at 1:36 PM, Andries Engelbrecht <[email protected]> wrote:
>>
>>> Cool, thx for testing.
>>>
>>> Best to file a JIRA.
>>>
>>> —Andries
>>>
>>> On Apr 2, 2015, at 1:27 PM, Sudhakar Thota <[email protected]> wrote:
>>>
>>>> Vince/Andries,
>>>>
>>>> Perhaps this could be a bug. I get the same results.
>>>>
>>>> But the plan is very different, the UnionExchange is set up immediately after the scan operation in successful case( Case -1 ), where as UnionExchange is happening after scan->project (Case -2).
>>>>
>>>> Case -1.Successful case.
>>>>
>>>> 0: jdbc:drill:> explain plan for select to_timestamp(t.t, 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from dfs.sthota_prq.`/tstamp_test/*.parquet` limit 13015351) t;
>>>> +------------+------------+
>>>> | text | json |
>>>> +------------+------------+
>>>> | 00-00 Screen
>>>> 00-01 Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'), 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
>>>> 00-02 SelectionVectorRemover
>>>> 00-03 Limit(fetch=[13015351])
>>>> 00-04 UnionExchange
>>>> 01-01 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]], selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test, numFiles=3, columns=[`*`]]])
>>>> | {
>>>>   "head" : {
>>>>     "version" : 1,
>>>>     "generator" : {
>>>>       "type" : "ExplainHandler",
>>>>       "info" : ""
>>>>     },
>>>>     "type" : "APACHE_DRILL_PHYSICAL",
>>>>     "options" : [ ],
>>>>     "queue" : 0,
>>>>     "resultMode" : "EXEC"
>>>>   },
>>>>
>>>> Case -2. Unsuccessful case:
>>>>
>>>> 0: jdbc:drill:> explain plan for select to_timestamp(t.t, 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from dfs.sthota_prq.`/tstamp_test/*.parquet` ) t;
>>>> +------------+------------+
>>>> | text | json |
>>>> +------------+------------+
>>>> | 00-00 Screen
>>>> 00-01 UnionExchange
>>>> 01-01 Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'), 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
>>>> 01-02 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]], selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test, numFiles=3, columns=[`*`]]])
>>>> | {
>>>>   "head" : {
>>>>     "version" : 1,
>>>>     "generator" : {
>>>>       "type" : "ExplainHandler",
>>>>       "info" : ""
>>>>     },
>>>>     "type" : "APACHE_DRILL_PHYSICAL",
>>>>     "options" : [ ],
>>>>     "queue" : 0,
>>>>     "resultMode" : "EXEC"
>>>>   },
>>>>
>>>> Thanks
>>>> Sudhakar Thota
>>>>
>>>>
>>>> On Apr 2, 2015, at 12:01 PM, Vince Gonzalez <[email protected]> wrote:
>>>>
>>>>> Ok, will do. Thanks.
>>>>>
>>>>> On Thu, Apr 2, 2015 at 2:49 PM, Andries Engelbrecht <[email protected]> wrote:
>>>>>
>>>>>> Compare the query plans and you probably want to look at the log file to see what fails and post here.
>>>>>>
>>>>>>
>>>>>>
>>>>>> —Andries
>>>>>>
>>>>>>
>>>>>> On Apr 1, 2015, at 12:54 PM, Vince Gonzalez <[email protected]> wrote:
>>>>>>
>>>>>>> Is this a bug?
>>>>>>>
>>>>>>> Created a parquet table (using CTAS) with one column containing text timestamps.
>>>>>>>
>>>>>>> 0: jdbc:drill:zk=localhost:2181> select * from tstamp_test limit 1;
>>>>>>> +------------+
>>>>>>> | t |
>>>>>>> +------------+
>>>>>>> | 2015-01-27T13:43:53.000Z |
>>>>>>> +------------+
>>>>>>> 1 row selected (0.119 seconds)
>>>>>>>
>>>>>>> The below queries, identical apart from the limit clause, behave differently. The one with the limit clause works, the one without doesn't. The limit is larger than the total number of rows, so in both cases we should be processing all rows.
>>>>>>>
>>>>>>> No limit clause. It fails:
>>>>>>>
>>>>>>> ```
>>>>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t, 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test) as t;
>>>>>>> Query failed: RemoteRpcException: Failure while trying to start remote fragment, Expression has syntax error! line 1:30:mismatched input 'T' expecting CParen [ 7d30d753-0822-4820-afd0-b7e7fe5e639c on 192.168.99.1:31010 ]
>>>>>>> ```
>>>>>>>
>>>>>>> Limit clause in the subselect (larger than the number of rows in the table) succeeds.
>>>>>>>
>>>>>>> ```
>>>>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t, 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test limit 100000000) as t;
>>>>>>> ...
>>>>>>> | 2015-02-17 07:18:00.0 |
>>>>>>> +------------+
>>>>>>> 13,015,350 rows selected (105.257 seconds)
>>>>>>> ```
>>>>>>>
>>>>>>> Data can be downloaded here:
>>>>>>>
>>>>>>> https://s3.amazonaws.com/vgonzalez/data/tstamp_test.tar.gz
>>>>>>
>>>>>>
>>>>
>>>
>>
>>
>
>
> --
> Steven Phillips
> Software Engineer
>
> mapr.com
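
For anyone reading the thread later, here is a small Python sketch of the quoting involved. It is not Drill code. It shows how the doubled single quotes in the SQL literal collapse to the pattern `YYYY-MM-dd'T'HH:mm:ss.SSS'Z'`, and how a literal that is re-quoted without re-escaping would end just before `T`. That second step is only an assumption about why the remote fragment reports "mismatched input 'T'" in the failing plan, where the Project runs on the remote side of the UnionExchange; the thread does not confirm the cause. The helper names `sql_unescape` and `sql_escape` are hypothetical.

```python
from datetime import datetime


def sql_unescape(literal: str) -> str:
    """Strip the outer quotes of a SQL string literal and collapse '' to '."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("not a single-quoted SQL literal")
    return literal[1:-1].replace("''", "'")


def sql_escape(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


# The format literal exactly as typed in the queries from the thread.
sql_literal = "'YYYY-MM-dd''T''HH:mm:ss.SSS''Z'''"
pattern = sql_unescape(sql_literal)

# Assumption for illustration only: the pattern is wrapped in quotes again
# without doubling the embedded quotes before being sent to another node.
naive_literal = "'" + pattern + "'"

# A parser reading naive_literal sees the string end at the second quote,
# so the next character it meets is the bare T.
end_of_first_string = naive_literal.index("'", 1)
next_char = naive_literal[end_of_first_string + 1]

# Properly escaped, the literal round-trips to what the user typed.
round_trip = sql_escape(pattern)

# The sample value from the table parses with the equivalent strptime format.
parsed = datetime.strptime("2015-01-27T13:43:53.000Z", "%Y-%m-%dT%H:%M:%S.%fZ")

if __name__ == "__main__":
    print(pattern)
    print(naive_literal, "-> next token starts with", next_char)
    print(round_trip == sql_literal)
    print(parsed)
```

If this guess is right, the limit variant would succeed simply because its Project stays in fragment 00 and the expression is never re-parsed remotely; the logs requested earlier in the thread would be the way to verify that.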
