[jira] [Commented] (HIVE-19474) Decimal type should be casted as part of the CTAS or INSERT Clause.

2018-05-14 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474570#comment-16474570
 ] 

Vineet Garg commented on HIVE-19474:


Pushed to branch-3

> Decimal type should be casted as part of the CTAS or INSERT Clause.
> ---
>
> Key: HIVE-19474
> URL: https://issues.apache.org/jira/browse/HIVE-19474
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>Priority: Major
>  Labels: druid
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-19474.patch
>
>
> HIVE-18569 introduced a runtime config variable that allows Decimal columns
> to be indexed as Double. This leads to an inconsistent state: the Hive
> metadata says the column is still decimal while it is actually stored as
> double. Because the Hive metadata type of the column is Decimal, the logical
> optimizer will not push down aggregates. I tried to fix this by adding some
> logic to the application, but it makes the code very clumsy, with a lot of
> branches. Instead, I propose to revert HIVE-18569 and let the user introduce
> an explicit cast. This is better because the metadata reflects the actual
> storage type, aggregate push-down kicks in, and no config is needed, all
> without adding any code or bugs.
> cc [~ashutoshc] and [~nishantbangarwa]
> You can see the difference with the following DDL:
> {code:sql}
> create table test_base_table(`timecolumn` timestamp, `interval_marker` 
> string, `num_l` DECIMAL(10,2));
> insert into test_base_table values ('2015-03-08 00:00:00', 'i1-start', 4.5);
> set hive.druid.approx.result=true;
> CREATE TABLE druid_test_table
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.segment.granularity" = "DAY")
> AS
> select cast(`timecolumn` as timestamp with local time zone) as `__time`, 
> `interval_marker`, cast(`num_l` as double)
> FROM test_base_table;
> describe druid_test_table;
> explain select sum(num_l), min(num_l) FROM druid_test_table;
> CREATE TABLE druid_test_table_2
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.segment.granularity" = "DAY")
> AS
> select cast(`timecolumn` as timestamp with local time zone) as `__time`, 
> `interval_marker`, `num_l`
> FROM test_base_table;
> describe druid_test_table_2;
> explain select sum(num_l), min(num_l) FROM druid_test_table_2;
> {code}
>  
>  
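
The DDL above only exercises the CTAS path. As a hedged sketch of the INSERT path named in the issue title, assuming druid_test_table already exists with num_l declared as double (as created by the first CTAS above), the same explicit cast would look roughly like this:

{code:sql}
-- Assumes test_base_table and druid_test_table from the DDL above.
-- The explicit cast keeps the inserted value aligned with the DOUBLE
-- column type recorded in the Hive metadata.
INSERT INTO TABLE druid_test_table
SELECT cast(`timecolumn` as timestamp with local time zone) as `__time`,
       `interval_marker`,
       cast(`num_l` as double) as `num_l`
FROM test_base_table;
{code}

This statement has not been run against the patch; it only illustrates where the cast would go.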



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19474) Decimal type should be casted as part of the CTAS or INSERT Clause.

2018-05-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16473390#comment-16473390
 ] 

Ashutosh Chauhan commented on HIVE-19474:
-

+1. I agree it doesn't make sense to declare a column as decimal when it is 
actually stored as double.
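
A minimal illustration of why the declared type matters, assuming standard Hive semantics for DECIMAL and DOUBLE arithmetic: a reader who sees DECIMAL(10,2) in the metadata expects exact results, which a column physically stored as double cannot guarantee. The column aliases below are illustrative only.

{code:sql}
-- Exact decimal arithmetic versus binary floating point.
SELECT cast(0.1 as decimal(10,2)) + cast(0.2 as decimal(10,2)) AS decimal_sum,
       cast(0.1 as double) + cast(0.2 as double) AS double_sum;
{code}

The first expression is computed exactly, while the second is subject to floating-point rounding.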



[jira] [Commented] (HIVE-19474) Decimal type should be casted as part of the CTAS or INSERT Clause.

2018-05-10 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470940#comment-16470940
 ] 

Jesus Camacho Rodriguez commented on HIVE-19474:


[~bslim], could we change the message in {{DruidStorageHandlerUtils}} to "Cast 
to any numeric type supported by Druid: x, y, z, t"? Thanks



[jira] [Commented] (HIVE-19474) Decimal type should be casted as part of the CTAS or INSERT Clause.

2018-05-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470623#comment-16470623
 ] 

Hive QA commented on HIVE-19474:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12922588/HIVE-19474.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 43 failed/errored test(s), 13544 tests executed
*Failed tests:*
{noformat}
TestDbNotificationListener - did not produce a TEST-*.xml file (likely timed out) (batchId=247)
TestHCatHiveCompatibility - did not produce a TEST-*.xml file (likely timed out) (batchId=247)
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=95)


[jira] [Commented] (HIVE-19474) Decimal type should be casted as part of the CTAS or INSERT Clause.

2018-05-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470419#comment-16470419
 ] 

Hive QA commented on HIVE-19474:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 37s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m  7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 29s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 30s{color} | {color:blue} common in master has 62 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 25s{color} | {color:blue} druid-handler in master has 12 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 24s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 15s{color} | {color:green} common: The patch generated 0 new + 427 unchanged - 1 fixed = 427 total (was 428) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 12s{color} | {color:red} druid-handler: The patch generated 1 new + 180 unchanged - 8 fixed = 181 total (was 188) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 13s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 14m 32s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-10803/dev-support/hive-personality.sh |
| git revision | master / 1cd5274 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-10803/yetus/diff-checkstyle-druid-handler.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-10803/yetus/patch-asflicense-problems.txt |
| modules | C: common druid-handler U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-10803/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HIVE-19474) Decimal type should be casted as part of the CTAS or INSERT Clause.

2018-05-08 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468274#comment-16468274
 ] 

slim bouguerra commented on HIVE-19474:
---

Results of the posted DDL statements:

{code}

PREHOOK: query: create table test_base_table(`timecolumn` timestamp, 
`interval_marker` string, `num_l` DECIMAL(10,2))
PREHOOK: type: CREATETABLE
PREHOOK: Output: database:default
PREHOOK: Output: default@test_base_table
POSTHOOK: query: create table test_base_table(`timecolumn` timestamp, 
`interval_marker` string, `num_l` DECIMAL(10,2))
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: database:default
POSTHOOK: Output: default@test_base_table
PREHOOK: query: insert into test_base_table values ('2015-03-08 00:00:00', 
'i1-start', 4.5)
PREHOOK: type: QUERY
PREHOOK: Input: _dummy_database@_dummy_table
PREHOOK: Output: default@test_base_table
POSTHOOK: query: insert into test_base_table values ('2015-03-08 00:00:00', 
'i1-start', 4.5)
POSTHOOK: type: QUERY
POSTHOOK: Input: _dummy_database@_dummy_table
POSTHOOK: Output: default@test_base_table
POSTHOOK: Lineage: test_base_table.interval_marker SCRIPT []
POSTHOOK: Lineage: test_base_table.num_l SCRIPT []
POSTHOOK: Lineage: test_base_table.timecolumn SCRIPT []
PREHOOK: query: CREATE TABLE druid_test_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS
select cast(`timecolumn` as timestamp with local time zone) as `__time`, 
`interval_marker`, cast(`num_l` as double)
FROM test_base_table
PREHOOK: type: CREATETABLE_AS_SELECT
PREHOOK: Input: default@test_base_table
PREHOOK: Output: database:default
PREHOOK: Output: default@druid_test_table
POSTHOOK: query: CREATE TABLE druid_test_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS
select cast(`timecolumn` as timestamp with local time zone) as `__time`, 
`interval_marker`, cast(`num_l` as double)
FROM test_base_table
POSTHOOK: type: CREATETABLE_AS_SELECT
POSTHOOK: Input: default@test_base_table
POSTHOOK: Output: database:default
POSTHOOK: Output: default@druid_test_table
POSTHOOK: Lineage: druid_test_table.__time EXPRESSION 
[(test_base_table)test_base_table.FieldSchema(name:timecolumn, type:timestamp, 
comment:null), ]
POSTHOOK: Lineage: druid_test_table.interval_marker SIMPLE 
[(test_base_table)test_base_table.FieldSchema(name:interval_marker, 
type:string, comment:null), ]
POSTHOOK: Lineage: druid_test_table.num_l EXPRESSION 
[(test_base_table)test_base_table.FieldSchema(name:num_l, type:decimal(10,2), 
comment:null), ]
PREHOOK: query: describe druid_test_table
PREHOOK: type: DESCTABLE
PREHOOK: Input: default@druid_test_table
POSTHOOK: query: describe druid_test_table
POSTHOOK: type: DESCTABLE
POSTHOOK: Input: default@druid_test_table
__time timestamp with local time zone from deserializer 
interval_marker string from deserializer 
num_l double from deserializer 
PREHOOK: query: explain select sum(num_l), min(num_l) FROM druid_test_table
PREHOOK: type: QUERY
POSTHOOK: query: explain select sum(num_l), min(num_l) FROM druid_test_table
POSTHOOK: type: QUERY
STAGE DEPENDENCIES:
 Stage-0 is a root stage

STAGE PLANS:
 Stage: Stage-0
 Fetch Operator
 limit: -1
 Processor Tree:
 TableScan
 alias: druid_test_table
 properties:
 druid.fieldNames $f0,$f1
 druid.fieldTypes double,double
 druid.query.json {"queryType":"timeseries","dataSource":"default.druid_test_table","descending":false,"granularity":"all","aggregations":[{"type":"doubleSum","name":"$f0","fieldName":"num_l"},{"type":"doubleMin","name":"$f1","fieldName":"num_l"}],"intervals":["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"],"context":{"skipEmptyBuckets":true}}
 druid.query.type timeseries
 Select Operator
 expressions: $f0 (type: double), $f1 (type: double)
 outputColumnNames: _col0, _col1
 ListSink

PREHOOK: query: CREATE TABLE druid_test_table_2
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS
select cast(`timecolumn` as timestamp with local time zone) as `__time`, 
`interval_marker`, `num_l`
FROM test_base_table
PREHOOK: type: CREATETABLE_AS_SELECT
PREHOOK: Input: default@test_base_table
PREHOOK: Output: database:default
PREHOOK: Output: default@druid_test_table_2
POSTHOOK: query: CREATE TABLE druid_test_table_2
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS
select cast(`timecolumn` as timestamp with local time zone) as `__time`, 
`interval_marker`, `num_l`
FROM test_base_table
POSTHOOK: type: CREATETABLE_AS_SELECT
POSTHOOK: Input: default@test_base_table
POSTHOOK: Output: database:default
POSTHOOK: Output: default@druid_test_table_2
POSTHOOK: Lineage: druid_test_table_2.__time EXPRESSION 
[(test_base_table)test_base_table.FieldSchema(name:timecolumn, type:timestamp, 
comment:null), ]
POSTHOOK: Lineage: