Vince/Andries,
This could be a bug; I get the same results.
The plans are very different, though. In the successful case (Case 1) the
UnionExchange sits immediately after the scan, whereas in the failing case
(Case 2) the UnionExchange comes after scan->project.
Case 1. Successful case:
0: jdbc:drill:> explain plan for select to_timestamp(t.t,
'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from
dfs.sthota_prq.`/tstamp_test/*.parquet` limit 13015351) t;
+------------+------------+
| text | json |
+------------+------------+
| 00-00 Screen
00-01 Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'),
'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
00-02 SelectionVectorRemover
00-03 Limit(fetch=[13015351])
00-04 UnionExchange
01-01 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet],
ReadEntryWithPath
[path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet],
ReadEntryWithPath
[path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]],
selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test, numFiles=3,
columns=[`*`]]])
| {
"head" : {
"version" : 1,
"generator" : {
"type" : "ExplainHandler",
"info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ ],
"queue" : 0,
"resultMode" : "EXEC"
},
Case 2. Unsuccessful case:
0: jdbc:drill:> explain plan for select to_timestamp(t.t,
'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from
dfs.sthota_prq.`/tstamp_test/*.parquet` ) t;
+------------+------------+
| text | json |
+------------+------------+
| 00-00 Screen
00-01 UnionExchange
01-01 Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'),
'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
01-02 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet],
ReadEntryWithPath
[path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet],
ReadEntryWithPath
[path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]],
selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test, numFiles=3,
columns=[`*`]]])
| {
"head" : {
"version" : 1,
"generator" : {
"type" : "ExplainHandler",
"info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ ],
"queue" : 0,
"resultMode" : "EXEC"
},
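One possible explanation, and this is only a guess until we see the drillbit log: in Case 2 the Project runs in fragment 01, so its expression has to be shipped to a remote fragment as text and parsed again there. If the format string were written back out without re-doubling its single quotes, the parser would see the literal end right before T, which would match the "mismatched input 'T'" error Vince reported. The Python sketch below only illustrates that quoting effect. The helper names are made up for this example, and none of it is Drill code.

```python
def sql_unescape(literal: str) -> str:
    """Strip the outer quotes of a SQL string literal and collapse '' to '."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("not a single-quoted SQL literal")
    return literal[1:-1].replace("''", "'")


def sql_escape(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def first_literal_end(text: str) -> int:
    """Return the index just past the first string literal in text.

    Assumes text starts with an opening quote and that '' inside a
    literal is an escaped quote (standard SQL quoting).
    """
    i = 1
    while i < len(text):
        if text[i] == "'":
            if i + 1 < len(text) and text[i + 1] == "'":
                i += 2  # escaped quote, still inside the literal
                continue
            return i + 1  # closing quote
        i += 1
    raise ValueError("unterminated literal")


# The format literal exactly as typed in the query.
literal = "'YYYY-MM-dd''T''HH:mm:ss.SSS''Z'''"

# What the parser holds after reading it.
pattern = sql_unescape(literal)

# Hypothetical faulty serialization: quotes added back, inner quotes not doubled.
naive = "'" + pattern + "'"

print("pattern value   :", pattern)
print("naive re-quoting:", naive)
print("literal ends at :", first_literal_end(naive), "next char:", naive[first_literal_end(naive)])
print("safe re-quoting :", sql_escape(pattern))
```

With correct escaping the whole string is consumed as one literal. With the naive re-quoting the literal closes after "dd" and the next character the parser meets is T.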
Thanks
Sudhakar Thota
On Apr 2, 2015, at 12:01 PM, Vince Gonzalez <[email protected]> wrote:
> Ok, will do. Thanks.
>
> On Thu, Apr 2, 2015 at 2:49 PM, Andries Engelbrecht <
> [email protected]> wrote:
>
>> Compare the query plans and you probably want to look at the log file to
>> see what fails and post here.
>>
>>
>>
>> —Andries
>>
>>
>> On Apr 1, 2015, at 12:54 PM, Vince Gonzalez <[email protected]>
>> wrote:
>>
>>> Is this a bug?
>>>
>>> Created a parquet table (using CTAS) with one column containing text
>>> timestamps.
>>>
>>> 0: jdbc:drill:zk=localhost:2181> select * from tstamp_test limit 1;
>>> +------------+
>>> | t |
>>> +------------+
>>> | 2015-01-27T13:43:53.000Z |
>>> +------------+
>>> 1 row selected (0.119 seconds)
>>>
>>> The below queries, identical apart from the limit clause, behave
>>> differently. The one with the limit clause works, the one without doesn't.
>>> The limit is larger than the total number of rows, so in both cases we
>>> should be processing all rows.
>>>
>>> No limit clause. It fails:
>>>
>>> ```
>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
>>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test) as t;
>>> Query failed: RemoteRpcException: Failure while trying to start remote
>>> fragment, Expression has syntax error! line 1:30:mismatched input 'T'
>>> expecting CParen [ 7d30d753-0822-4820-afd0-b7e7fe5e639c on
>>> 192.168.99.1:31010 ]
>>> ```
>>>
>>> Limit clause in the subselect (larger than the number of rows in the table)
>>> succeeds.
>>>
>>> ```
>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
>>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test limit
>>> 100000000) as t;
>>> ...
>>> | 2015-02-17 07:18:00.0 |
>>> +------------+
>>> 13,015,350 rows selected (105.257 seconds)
>>> ```
>>>
>>> Data can be downloaded here:
>>>
>>> https://s3.amazonaws.com/vgonzalez/data/tstamp_test.tar.gz
>>
>>