[jira] [Commented] (HIVE-19474) Decimal type should be casted as part of the CTAS or INSERT Clause.
[ https://issues.apache.org/jira/browse/HIVE-19474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474570#comment-16474570 ]

Vineet Garg commented on HIVE-19474:

Pushed to branch-3

> Decimal type should be casted as part of the CTAS or INSERT Clause.
> -------------------------------------------------------------------
>
> Key: HIVE-19474
> URL: https://issues.apache.org/jira/browse/HIVE-19474
> Project: Hive
> Issue Type: Bug
> Components: Druid integration
> Reporter: slim bouguerra
> Assignee: slim bouguerra
> Priority: Major
> Labels: druid
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-19474.patch
>
> HIVE-18569 introduced a runtime config variable to allow indexing Decimal as
> Double. This leads to a messy state: the Hive metadata still thinks the
> column is decimal while it is actually stored as double. Since the Hive
> metadata of the column is Decimal, the logical optimizer will not push down
> aggregates. I tried to fix this by adding some logic to the application, but
> it makes the code very clumsy, with a lot of branches. Instead, I propose to
> revert HIVE-18569 and let the user introduce an explicit cast. This is
> better: the metadata then reflects the actual storage type, aggregate push
> down kicks in, and no config variable is needed, without adding any code or
> bugs.
> cc [~ashutoshc] and [~nishantbangarwa]
> You can see the difference with the following DDL:
> {code:java}
> create table test_base_table(`timecolumn` timestamp, `interval_marker` string, `num_l` DECIMAL(10,2));
> insert into test_base_table values ('2015-03-08 00:00:00', 'i1-start', 4.5);
> set hive.druid.approx.result=true;
>
> CREATE TABLE druid_test_table
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.segment.granularity" = "DAY")
> AS
> select cast(`timecolumn` as timestamp with local time zone) as `__time`,
>        `interval_marker`, cast(`num_l` as double)
> FROM test_base_table;
> describe druid_test_table;
> explain select sum(num_l), min(num_l) FROM druid_test_table;
>
> CREATE TABLE druid_test_table_2
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.segment.granularity" = "DAY")
> AS
> select cast(`timecolumn` as timestamp with local time zone) as `__time`,
>        `interval_marker`, `num_l`
> FROM test_base_table;
> describe druid_test_table_2;
> explain select sum(num_l), min(num_l) FROM druid_test_table_2;
> {code}
>
> -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[ https://issues.apache.org/jira/browse/HIVE-19474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16473390#comment-16473390 ]

Ashutosh Chauhan commented on HIVE-19474:

+1. I agree it doesn't make sense to declare a column as decimal when it is actually stored as double.
[ https://issues.apache.org/jira/browse/HIVE-19474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470940#comment-16470940 ]

Jesus Camacho Rodriguez commented on HIVE-19474:

[~bslim], could we change the message in {{DruidStorageHandlerUtils}} to "Cast to any numeric type supported by Druid: x, y, z, t"? Thanks
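The suggestion above concerns the wording of the validation error thrown for Druid-unsupported column types. A minimal hypothetical sketch of how such a check might surface the suggested message — the class name, method, and supported-type list below are assumptions for illustration, not the actual {{DruidStorageHandlerUtils}} code:

```java
import java.util.Arrays;
import java.util.List;

public class DruidTypeCheck {
    // Assumed set of numeric types Druid can index natively (for illustration only).
    private static final List<String> SUPPORTED =
            Arrays.asList("bigint", "int", "smallint", "tinyint", "float", "double");

    /** Rejects DECIMAL columns with a message pointing users at an explicit cast. */
    public static void validate(String columnName, String hiveType) {
        // Strip precision/scale, e.g. "decimal(10,2)" -> "decimal".
        String base = hiveType.replaceAll("\\(.*\\)", "").toLowerCase();
        if (base.equals("decimal")) {
            throw new IllegalArgumentException(
                "Druid does not support DECIMAL column [" + columnName + "]. "
                + "Cast to any numeric type supported by Druid: " + SUPPORTED);
        }
    }

    public static void main(String[] args) {
        DruidTypeCheck.validate("cnt", "bigint"); // supported type: no exception
        try {
            DruidTypeCheck.validate("num_l", "decimal(10,2)");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this shape, a failing CTAS such as the one in the issue description would tell the user exactly which cast to add instead of only reporting an unsupported type.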
[ https://issues.apache.org/jira/browse/HIVE-19474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470623#comment-16470623 ]

Hive QA commented on HIVE-19474:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12922588/HIVE-19474.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 43 failed/errored test(s), 13544 tests executed

*Failed tests:*
{noformat}
TestDbNotificationListener - did not produce a TEST-*.xml file (likely timed out) (batchId=247)
TestHCatHiveCompatibility - did not produce a TEST-*.xml file (likely timed out) (batchId=247)
TestNegativeCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=95)
{noformat}
[ https://issues.apache.org/jira/browse/HIVE-19474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470419#comment-16470419 ]

Hive QA commented on HIVE-19474:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 37s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 30s{color} | {color:blue} common in master has 62 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 25s{color} | {color:blue} druid-handler in master has 12 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} common: The patch generated 0 new + 427 unchanged - 1 fixed = 427 total (was 428) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 12s{color} | {color:red} druid-handler: The patch generated 1 new + 180 unchanged - 8 fixed = 181 total (was 188) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 13s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 14m 32s{color} | {color:black} {color} |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-10803/dev-support/hive-personality.sh |
| git revision | master / 1cd5274 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-10803/yetus/diff-checkstyle-druid-handler.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-10803/yetus/patch-asflicense-problems.txt |
| modules | C: common druid-handler U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-10803/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[ https://issues.apache.org/jira/browse/HIVE-19474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468274#comment-16468274 ]

slim bouguerra commented on HIVE-19474:

Results of the posted DDLs:
{code}
PREHOOK: query: create table test_base_table(`timecolumn` timestamp, `interval_marker` string, `num_l` DECIMAL(10,2))
PREHOOK: type: CREATETABLE
PREHOOK: Output: database:default
PREHOOK: Output: default@test_base_table
POSTHOOK: query: create table test_base_table(`timecolumn` timestamp, `interval_marker` string, `num_l` DECIMAL(10,2))
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: database:default
POSTHOOK: Output: default@test_base_table
PREHOOK: query: insert into test_base_table values ('2015-03-08 00:00:00', 'i1-start', 4.5)
PREHOOK: type: QUERY
PREHOOK: Input: _dummy_database@_dummy_table
PREHOOK: Output: default@test_base_table
POSTHOOK: query: insert into test_base_table values ('2015-03-08 00:00:00', 'i1-start', 4.5)
POSTHOOK: type: QUERY
POSTHOOK: Input: _dummy_database@_dummy_table
POSTHOOK: Output: default@test_base_table
POSTHOOK: Lineage: test_base_table.interval_marker SCRIPT []
POSTHOOK: Lineage: test_base_table.num_l SCRIPT []
POSTHOOK: Lineage: test_base_table.timecolumn SCRIPT []
PREHOOK: query: CREATE TABLE druid_test_table STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler' TBLPROPERTIES ("druid.segment.granularity" = "DAY") AS select cast(`timecolumn` as timestamp with local time zone) as `__time`, `interval_marker`, cast(`num_l` as double) FROM test_base_table
PREHOOK: type: CREATETABLE_AS_SELECT
PREHOOK: Input: default@test_base_table
PREHOOK: Output: database:default
PREHOOK: Output: default@druid_test_table
POSTHOOK: query: CREATE TABLE druid_test_table STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler' TBLPROPERTIES ("druid.segment.granularity" = "DAY") AS select cast(`timecolumn` as timestamp with local time zone) as `__time`, `interval_marker`, cast(`num_l` as double) FROM test_base_table
POSTHOOK: type: CREATETABLE_AS_SELECT
POSTHOOK: Input: default@test_base_table
POSTHOOK: Output: database:default
POSTHOOK: Output: default@druid_test_table
POSTHOOK: Lineage: druid_test_table.__time EXPRESSION [(test_base_table)test_base_table.FieldSchema(name:timecolumn, type:timestamp, comment:null), ]
POSTHOOK: Lineage: druid_test_table.interval_marker SIMPLE [(test_base_table)test_base_table.FieldSchema(name:interval_marker, type:string, comment:null), ]
POSTHOOK: Lineage: druid_test_table.num_l EXPRESSION [(test_base_table)test_base_table.FieldSchema(name:num_l, type:decimal(10,2), comment:null), ]
PREHOOK: query: describe druid_test_table
PREHOOK: type: DESCTABLE
PREHOOK: Input: default@druid_test_table
POSTHOOK: query: describe druid_test_table
POSTHOOK: type: DESCTABLE
POSTHOOK: Input: default@druid_test_table
__time              	timestamp with local time zone	from deserializer
interval_marker     	string              	from deserializer
num_l               	double              	from deserializer
PREHOOK: query: explain select sum(num_l), min(num_l) FROM druid_test_table
PREHOOK: type: QUERY
POSTHOOK: query: explain select sum(num_l), min(num_l) FROM druid_test_table
POSTHOOK: type: QUERY
STAGE DEPENDENCIES:
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: druid_test_table
          properties:
            druid.fieldNames $f0,$f1
            druid.fieldTypes double,double
            druid.query.json {"queryType":"timeseries","dataSource":"default.druid_test_table","descending":false,"granularity":"all","aggregations":[{"type":"doubleSum","name":"$f0","fieldName":"num_l"},{"type":"doubleMin","name":"$f1","fieldName":"num_l"}],"intervals":["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"],"context":{"skipEmptyBuckets":true}}
            druid.query.type timeseries
          Select Operator
            expressions: $f0 (type: double), $f1 (type: double)
            outputColumnNames: _col0, _col1
            ListSink
PREHOOK: query: CREATE TABLE druid_test_table_2 STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler' TBLPROPERTIES ("druid.segment.granularity" = "DAY") AS select cast(`timecolumn` as timestamp with local time zone) as `__time`, `interval_marker`, `num_l` FROM test_base_table
PREHOOK: type: CREATETABLE_AS_SELECT
PREHOOK: Input: default@test_base_table
PREHOOK: Output: database:default
PREHOOK: Output: default@druid_test_table_2
POSTHOOK: query: CREATE TABLE druid_test_table_2 STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler' TBLPROPERTIES ("druid.segment.granularity" = "DAY") AS select cast(`timecolumn` as timestamp with local time zone) as `__time`, `interval_marker`, `num_l` FROM test_base_table
POSTHOOK: type: CREATETABLE_AS_SELECT
POSTHOOK: Input: default@test_base_table
POSTHOOK: Output: database:default
POSTHOOK: Output: default@druid_test_table_2
POSTHOOK: Lineage: druid_test_table_2.__time EXPRESSION [(test_base_table)test_base_table.FieldSchema(name:timecolumn, type:timestamp, comment:null), ]
POSTHOOK: Lineage: