Re: more on parsing timestamps

Sudhakar Thota Thu, 02 Apr 2015 16:36:45 -0700

Steven,

Thanks. Andries already pointed that out to me, and I have opened a public JIRA:


https://issues.apache.org/jira/browse/DRILL-2669

Thanks
Sudhakar Thota


On Apr 2, 2015, at 4:09 PM, Steven Phillips <[email protected]> wrote:

> Could you please file a public jira. That link is to an internal issue.
> 
> On Thu, Apr 2, 2015 at 1:52 PM, Sudhakar Thota <[email protected]> wrote:
> 
>> Here it is:
>> 
>> https://maprdrill.atlassian.net/browse/MD-204?filter=-2
>> 
>> Thanks
>> Sudhakar Thota
>> 
>> 
>> On Apr 2, 2015, at 1:36 PM, Andries Engelbrecht <[email protected]>
>> wrote:
>> 
>>> Cool, thx for testing.
>>> 
>>> Best to file a JIRA.
>>> 
>>> —Andries
>>> 
>>> On Apr 2, 2015, at 1:27 PM, Sudhakar Thota <[email protected]> wrote:
>>> 
>>>> Vince/Andries,
>>>> 
>>>> Perhaps this could be a bug. I get the same results.
>>>> 
>>>> But the plans are very different: in the successful case (Case -1) the
>>>> UnionExchange is set up immediately after the scan operation, whereas in
>>>> the failing case (Case -2) the UnionExchange happens after scan->project.
>>>> 
>>>> Case -1. Successful case:
>>>> 
>>>> 0: jdbc:drill:> explain plan for select to_timestamp(t.t, 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from dfs.sthota_prq.`/tstamp_test/*.parquet` limit 13015351) t;
>>>> +------------+------------+
>>>> |    text    |    json    |
>>>> +------------+------------+
>>>> | 00-00    Screen
>>>> 00-01      Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'), 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
>>>> 00-02        SelectionVectorRemover
>>>> 00-03          Limit(fetch=[13015351])
>>>> 00-04            UnionExchange
>>>> 01-01              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]], selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test, numFiles=3, columns=[`*`]]])
>>>> | {
>>>> "head" : {
>>>>  "version" : 1,
>>>>  "generator" : {
>>>>    "type" : "ExplainHandler",
>>>>    "info" : ""
>>>>  },
>>>>  "type" : "APACHE_DRILL_PHYSICAL",
>>>>  "options" : [ ],
>>>>  "queue" : 0,
>>>>  "resultMode" : "EXEC"
>>>> },
>>>> 
>>>> Case -2. Unsuccessful case:
>>>> 
>>>> 0: jdbc:drill:> explain plan for select to_timestamp(t.t, 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from dfs.sthota_prq.`/tstamp_test/*.parquet` ) t;
>>>> +------------+------------+
>>>> |    text    |    json    |
>>>> +------------+------------+
>>>> | 00-00    Screen
>>>> 00-01      UnionExchange
>>>> 01-01        Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'), 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
>>>> 01-02          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]], selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test, numFiles=3, columns=[`*`]]])
>>>> | {
>>>> "head" : {
>>>>  "version" : 1,
>>>>  "generator" : {
>>>>    "type" : "ExplainHandler",
>>>>    "info" : ""
>>>>  },
>>>>  "type" : "APACHE_DRILL_PHYSICAL",
>>>>  "options" : [ ],
>>>>  "queue" : 0,
>>>>  "resultMode" : "EXEC"
>>>> },
>>>> 
>>>> Thanks
>>>> Sudhakar Thota
>>>> 
>>>> 
>>>> On Apr 2, 2015, at 12:01 PM, Vince Gonzalez <[email protected]> wrote:
>>>> 
>>>>> Ok, will do. Thanks.
>>>>> 
>>>>> On Thu, Apr 2, 2015 at 2:49 PM, Andries Engelbrecht <
>>>>> [email protected]> wrote:
>>>>> 
>>>>>> Compare the query plans. You probably also want to look at the log file
>>>>>> to see what fails, and post it here.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> —Andries
>>>>>> 
>>>>>> 
>>>>>> On Apr 1, 2015, at 12:54 PM, Vince Gonzalez <[email protected]> wrote:
>>>>>> 
>>>>>>> Is this a bug?
>>>>>>> 
>>>>>>> Created a parquet table (using CTAS) with one column containing text
>>>>>>> timestamps.
>>>>>>> 
>>>>>>> 0: jdbc:drill:zk=localhost:2181> select * from tstamp_test limit 1;
>>>>>>> +------------+
>>>>>>> |     t      |
>>>>>>> +------------+
>>>>>>> | 2015-01-27T13:43:53.000Z |
>>>>>>> +------------+
>>>>>>> 1 row selected (0.119 seconds)
>>>>>>> 
>>>>>>> The queries below, identical apart from the limit clause, behave
>>>>>>> differently. The one with the limit clause works; the one without doesn't.
>>>>>>> The limit is larger than the total number of rows, so in both cases we
>>>>>>> should be processing all rows.
>>>>>>> 
>>>>>>> No limit clause. It fails:
>>>>>>> 
>>>>>>> ```
>>>>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t, 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test) as t;
>>>>>>> Query failed: RemoteRpcException: Failure while trying to start remote fragment, Expression has syntax error! line 1:30:mismatched input 'T' expecting CParen [ 7d30d753-0822-4820-afd0-b7e7fe5e639c on 192.168.99.1:31010 ]
>>>>>>> ```
>>>>>>> 
>>>>>>> With a limit clause in the subselect (larger than the number of rows in
>>>>>>> the table), it succeeds:
>>>>>>> 
>>>>>>> ```
>>>>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t, 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test limit 100000000) as t;
>>>>>>> ...
>>>>>>> | 2015-02-17 07:18:00.0 |
>>>>>>> +------------+
>>>>>>> 13,015,350 rows selected (105.257 seconds)
>>>>>>> ```
>>>>>>> 
>>>>>>> Data can be downloaded here:
>>>>>>> 
>>>>>>> https://s3.amazonaws.com/vgonzalez/data/tstamp_test.tar.gz
>>>>>> 
>>>>>> 
>>>> 
>>> 
>> 
>> 
> 
> 
> -- 
> Steven Phillips
> Software Engineer
> 
> mapr.com
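
A side note on the format string quoted throughout this thread: inside a SQL string literal, a doubled single quote stands for one literal quote, so the pattern the function actually receives is shorter than what is typed at the prompt. The sketch below is plain Python, not Drill code, and does not reproduce the failure. The helper name unescape_sql_literal is hypothetical, and the strptime format is only an assumed Python counterpart of the pattern, used here to check the sample value from the original post.

```python
from datetime import datetime, timezone


def unescape_sql_literal(literal: str) -> str:
    """Strip the outer quotes of a SQL string literal and collapse '' into '.

    Hypothetical helper for illustration; it is not part of Drill.
    """
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("not a single-quoted SQL literal")
    return literal[1:-1].replace("''", "'")


# The literal exactly as typed in the queries in this thread.
sql_literal = "'YYYY-MM-dd''T''HH:mm:ss.SSS''Z'''"
pattern = unescape_sql_literal(sql_literal)
print(pattern)

# Assumed Python equivalent of that pattern: 'T' and 'Z' are matched as
# literal characters, and %f accepts the three-digit fraction.
sample = "2015-01-27T13:43:53.000Z"
parsed = datetime.strptime(sample, "%Y-%m-%dT%H:%M:%S.%fZ").replace(
    tzinfo=timezone.utc
)
print(parsed.isoformat())
```

The unescaped pattern contains bare single quotes around T and Z, which may be worth keeping in mind when reading the "mismatched input 'T'" message in the failing case.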
