[jira] [Commented] (HIVE-15634) Hive/Druid integration: Timestamp column inconsistent w/o Fetch optimization

2017-03-25 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941902#comment-15941902
 ] 

Pengcheng Xiong commented on HIVE-15634:


I am deferring this to Hive 3.0 as we are going to cut the first RC and it is 
not marked as blocker.


> Hive/Druid integration: Timestamp column inconsistent w/o Fetch optimization
> 
>
> Key: HIVE-15634
> URL: https://issues.apache.org/jira/browse/HIVE-15634
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: slim bouguerra
>Priority: Critical
>
> {{SET hive.tez.java.opts=-Duser.timezone="UTC";}} can be used to change 
> timezone for Tez tasks. However, when Fetch optimizer kicks in because we can 
> push the full query to Druid, we obtain different values for the timestamp 
> than when jobs are executed. This probably has to do with the timezone on the 
> client side. How should we handle this issue?
> For instance, this can be observed with the following query:
> {code:sql}
> set hive.fetch.task.conversion=more;
> SELECT DISTINCT `__time`
> FROM store_sales_sold_time_subset
> WHERE `__time` < '1999-11-10 00:00:00';
> OK
> 1999-10-31 19:00:00
> 1999-11-01 19:00:00
> 1999-11-02 19:00:00
> 1999-11-03 19:00:00
> 1999-11-04 19:00:00
> 1999-11-05 19:00:00
> 1999-11-06 19:00:00
> 1999-11-07 19:00:00
> 1999-11-08 19:00:00
> set hive.fetch.task.conversion=none;
> SELECT DISTINCT `__time`
> FROM store_sales_sold_time_subset
> WHERE `__time` < '1999-11-10 00:00:00';
> OK
> 1999-11-01 00:00:00
> 1999-11-02 00:00:00
> 1999-11-03 00:00:00
> 1999-11-04 00:00:00
> 1999-11-05 00:00:00
> 1999-11-06 00:00:00
> 1999-11-07 00:00:00
> 1999-11-08 00:00:00
> 1999-11-09 00:00:00
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15634) Hive/Druid integration: Timestamp column inconsistent w/o Fetch optimization

2017-03-25 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941894#comment-15941894
 ] 

Pengcheng Xiong commented on HIVE-15634:


[~jcamachorodriguez], will this go into 2.3.0? Thanks.

> Hive/Druid integration: Timestamp column inconsistent w/o Fetch optimization
> 
>
> Key: HIVE-15634
> URL: https://issues.apache.org/jira/browse/HIVE-15634
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: slim bouguerra
>Priority: Critical
>
> {{SET hive.tez.java.opts=-Duser.timezone="UTC";}} can be used to change 
> timezone for Tez tasks. However, when Fetch optimizer kicks in because we can 
> push the full query to Druid, we obtain different values for the timestamp 
> than when jobs are executed. This probably has to do with the timezone on the 
> client side. How should we handle this issue?
> For instance, this can be observed with the following query:
> {code:sql}
> set hive.fetch.task.conversion=more;
> SELECT DISTINCT `__time`
> FROM store_sales_sold_time_subset
> WHERE `__time` < '1999-11-10 00:00:00';
> OK
> 1999-10-31 19:00:00
> 1999-11-01 19:00:00
> 1999-11-02 19:00:00
> 1999-11-03 19:00:00
> 1999-11-04 19:00:00
> 1999-11-05 19:00:00
> 1999-11-06 19:00:00
> 1999-11-07 19:00:00
> 1999-11-08 19:00:00
> set hive.fetch.task.conversion=none;
> SELECT DISTINCT `__time`
> FROM store_sales_sold_time_subset
> WHERE `__time` < '1999-11-10 00:00:00';
> OK
> 1999-11-01 00:00:00
> 1999-11-02 00:00:00
> 1999-11-03 00:00:00
> 1999-11-04 00:00:00
> 1999-11-05 00:00:00
> 1999-11-06 00:00:00
> 1999-11-07 00:00:00
> 1999-11-08 00:00:00
> 1999-11-09 00:00:00
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15634) Hive/Druid integration: Timestamp column inconsistent w/o Fetch optimization

2017-02-01 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849116#comment-15849116
 ] 

slim bouguerra commented on HIVE-15634:
---

making sure that the timezone is the same as the hive server will solve the 
issue.
Hence avoid doing {code} SET hive.tez.java.opts=-Duser.timezone="UTC" {code}

> Hive/Druid integration: Timestamp column inconsistent w/o Fetch optimization
> 
>
> Key: HIVE-15634
> URL: https://issues.apache.org/jira/browse/HIVE-15634
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
>
> {{SET hive.tez.java.opts=-Duser.timezone="UTC";}} can be used to change 
> timezone for Tez tasks. However, when Fetch optimizer kicks in because we can 
> push the full query to Druid, we obtain different values for the timestamp 
> than when jobs are executed. This probably has to do with the timezone on the 
> client side. How should we handle this issue?
> For instance, this can be observed with the following query:
> {code:sql}
> set hive.fetch.task.conversion=more;
> SELECT DISTINCT `__time`
> FROM store_sales_sold_time_subset
> WHERE `__time` < '1999-11-10 00:00:00';
> OK
> 1999-10-31 19:00:00
> 1999-11-01 19:00:00
> 1999-11-02 19:00:00
> 1999-11-03 19:00:00
> 1999-11-04 19:00:00
> 1999-11-05 19:00:00
> 1999-11-06 19:00:00
> 1999-11-07 19:00:00
> 1999-11-08 19:00:00
> set hive.fetch.task.conversion=none;
> SELECT DISTINCT `__time`
> FROM store_sales_sold_time_subset
> WHERE `__time` < '1999-11-10 00:00:00';
> OK
> 1999-11-01 00:00:00
> 1999-11-02 00:00:00
> 1999-11-03 00:00:00
> 1999-11-04 00:00:00
> 1999-11-05 00:00:00
> 1999-11-06 00:00:00
> 1999-11-07 00:00:00
> 1999-11-08 00:00:00
> 1999-11-09 00:00:00
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)