[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine
[ https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233670#comment-16233670 ]

Lefty Leverenz commented on HIVE-17433:
---

Doc note: This adds *hive.vectorized.input.format.supports.enabled* and *hive.test.vectorized.execution.enabled.override* to HiveConf.java. Only *hive.vectorized.input.format.supports.enabled* needs to be documented in the wiki, because *hive.test.vectorized.execution.enabled.override* is for internal use only.

Besides documenting the configuration parameter, perhaps this should also be mentioned in the Data Types doc.

* [Configuration Properties -- Vectorization | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Vectorization]
* [Hive Data Types -- Decimals | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-DecimalsdecimalDecimals]

Added a TODOC3.0 label.

([~mmccline], please update the fix version.)

> Vectorization: Support Decimal64 in Hive Query Engine
> -----------------------------------------------------
>
> Key: HIVE-17433
> URL: https://issues.apache.org/jira/browse/HIVE-17433
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Labels: TODOC3.0
> Attachments: HIVE-17433.03.patch, HIVE-17433.04.patch,
> HIVE-17433.05.patch, HIVE-17433.06.patch, HIVE-17433.07.patch,
> HIVE-17433.08.patch, HIVE-17433.09.patch, HIVE-17433.091.patch,
> HIVE-17433.092.patch, HIVE-17433.093.patch, HIVE-17433.094.patch
>
>
> Provide partial support for Decimal64 within Hive. By partial I mean that
> our current decimal has a large surface area of features (rounding, multiply,
> divide, remainder, power, big precision, and many more), but only a small
> number have been identified as performance hotspots.
> Those are small-precision decimals with precision <= 18 that fit within a
> 64-bit long; we are calling these Decimal64. Just as we optimize row-mode
> execution engine hotspots by selectively adding new vectorization code, we
> can treat the current decimal as the full-featured one and add Decimal64
> optimizations where query benchmarks show that they help.
> This change creates a Decimal64ColumnVector.
> This change currently detects small decimals in Hive for the vectorized text
> input format and uses some new Decimal64 vectorized classes for comparison and
> addition, and later perhaps for a few GroupBy aggregations such as sum, avg,
> min, and max.
> The patch also supports a new annotation that can mark a
> VectorizedInputFormat as supporting Decimal64 (the feature is called DECIMAL_64).
> Other formats, such as ORC and Parquet, can then be handled in separate, later
> JIRAs so that they also participate in the Decimal64 performance optimization.
> The idea is that when you annotate your input format with:
>   @VectorizedInputFormatSupports(supports = {DECIMAL_64})
> the Vectorizer in Hive will plan usage of Decimal64ColumnVector instead of
> DecimalColumnVector. When an input format sees that Decimal64ColumnVector is
> being used, it can fill that column vector with decimal64 longs instead of
> the HiveDecimalWritable objects of DecimalColumnVector.
> There will be a Hive configuration variable
> hive.vectorized.input.format.supports.enabled that holds a string list of
> supported features. The default will start as "decimal_64". It can be
> turned off to allow for performance comparisons and testing.
> The query
>   SELECT * FROM DECIMAL_6_1_txt where key - 100BD < 200BD ORDER BY key, value
> will have a vectorized explain plan like:
> ...
>   Filter Operator
>     Filter Vectorization:
>       className: VectorFilterOperator
>       native: true
>       predicateExpression:
>         FilterDecimal64ColLessDecimal64Scalar(col 2, val 2000)(children:
>         Decimal64ColSubtractDecimal64Scalar(col 0, val 1000,
>         outputDecimal64AbsMax 999) -> 2:decimal(11,5)/DECIMAL_64) -> boolean
>     predicate: ((key - 100) < 200) (type: boolean)
> ...

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
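The Decimal64 idea in HIVE-17433 depends on one piece of arithmetic. A decimal(p, s) value can be stored as its unscaled integer, which is the value times 10^s. For p <= 18, every unscaled value has a magnitude of at most 10^18 - 1, which is below Long.MAX_VALUE (about 9.22 x 10^18). A 19-digit value would not always fit. The following is a minimal sketch of that encoding using only java.math.BigDecimal. The class and method names are hypothetical, and this is not Hive's Decimal64ColumnVector or HiveDecimalWritable code. HALF_UP rounding to the column scale is also an assumption.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

// Hypothetical helper illustrating a Decimal64-style encoding; not Hive code.
final class Decimal64Sketch {
    // 10^18 - 1 < Long.MAX_VALUE (~9.22e18) but 10^19 - 1 is not,
    // so 18 is the largest precision whose unscaled values always fit in a long.
    static final int MAX_DECIMAL64_PRECISION = 18;

    // Largest unscaled magnitude allowed for the given precision: 10^precision - 1.
    static long absMax(int precision) {
        if (precision < 1 || precision > MAX_DECIMAL64_PRECISION) {
            throw new IllegalArgumentException("precision out of Decimal64 range: " + precision);
        }
        long max = 1;
        for (int i = 0; i < precision; i++) {
            max *= 10;
        }
        return max - 1;
    }

    // Encodes value as decimal(precision, scale): rounds to the scale (HALF_UP is an assumption),
    // then returns the unscaled integer, rejecting values with too many digits.
    static long toDecimal64(BigDecimal value, int precision, int scale) {
        BigInteger unscaled = value.setScale(scale, RoundingMode.HALF_UP).unscaledValue();
        if (unscaled.abs().compareTo(BigInteger.valueOf(absMax(precision))) > 0) {
            throw new ArithmeticException(value + " does not fit decimal(" + precision + "," + scale + ")");
        }
        return unscaled.longValueExact();
    }

    // Decodes an unscaled long back into a BigDecimal with the column's scale.
    static BigDecimal fromDecimal64(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale);
    }

    public static void main(String[] args) {
        long encoded = toDecimal64(new BigDecimal("123.45"), 10, 5);
        System.out.println(encoded);                     // 12345000
        System.out.println(fromDecimal64(encoded, 5));   // 123.45000
        System.out.println(absMax(18) < Long.MAX_VALUE); // true
    }
}
```

Once values are encoded this way, comparison and addition of two columns at the same scale become plain long operations. That is what makes this representation cheaper than operating on HiveDecimalWritable objects.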
[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine
[ https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224196#comment-16224196 ]

Matt McCline commented on HIVE-17433:
---

Committed to master. [~teddy.choi] thank you for your code review!
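The explain plan in the HIVE-17433 description shows two kinds of expression planned for the example query. The first is a column-minus-scalar subtraction (Decimal64ColSubtractDecimal64Scalar) that carries an outputDecimal64AbsMax bound. Its result feeds a column-less-than-scalar filter (FilterDecimal64ColLessDecimal64Scalar). The following is a minimal sketch of what such kernels can look like over plain long arrays. It is hypothetical: the column type and method names are stand-ins, both operands are assumed to already be at the same scale, and an out-of-range result is assumed to become NULL. It also omits the isRepeating, noNulls, and selectedInUse handling that Hive's real vectorized expressions must deal with. The sample data uses scale 1 for readability and does not reproduce the plan's literals.

```java
import java.util.Arrays;

// Hypothetical sketch of Decimal64 column-scalar kernels; not Hive's generated expression classes.
final class Decimal64KernelSketch {
    // Stand-in for a Decimal64 column: unscaled longs at one fixed scale, plus null flags.
    static final class Decimal64Column {
        final long[] vector;
        final boolean[] isNull;

        Decimal64Column(long... values) {
            vector = values.clone();
            isNull = new boolean[values.length];
        }
    }

    // out[i] = in[i] - scalar for the first n rows. Both sides share one scale, so this is plain
    // long subtraction; with operands bounded by 10^18 - 1 the long itself cannot overflow.
    // A result whose magnitude exceeds outputAbsMax does not fit the output precision,
    // and this sketch (by assumption) marks it NULL.
    static void colSubtractScalar(Decimal64Column in, long scalar, long outputAbsMax,
                                  Decimal64Column out, int n) {
        for (int i = 0; i < n; i++) {
            if (in.isNull[i]) {
                out.isNull[i] = true;
                continue;
            }
            long result = in.vector[i] - scalar;
            out.vector[i] = result;
            out.isNull[i] = Math.abs(result) > outputAbsMax;
        }
    }

    // Writes the indices of non-NULL rows with col[i] < scalar into selected; returns their count.
    static int filterColLessScalar(Decimal64Column col, long scalar, int[] selected, int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (!col.isNull[i] && col.vector[i] < scalar) {
                selected[count++] = i;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int n = 4;
        // key decimal(10,1): 150.0, 350.5, 250.0, 99.9 stored as unscaled longs at scale 1.
        Decimal64Column key = new Decimal64Column(1500, 3505, 2500, 999);
        Decimal64Column diff = new Decimal64Column(new long[n]);
        // key - 100 < 200 at scale 1: subtract 1000, then compare with 2000.
        // The output type is decimal(11,1), so its absMax is 10^11 - 1.
        colSubtractScalar(key, 1000, 99_999_999_999L, diff, n);
        int[] selected = new int[n];
        int count = filterColLessScalar(diff, 2000, selected, n);
        System.out.println(Arrays.toString(Arrays.copyOf(selected, count))); // [0, 2, 3]
    }
}
```

The filter returns row indices instead of copying values. This mirrors the general vectorized style of narrowing a batch's selected rows rather than materializing a new batch.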
[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine
[ https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224190#comment-16224190 ]

Matt McCline commented on HIVE-17433:
---

Test failures are unrelated.
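According to the HIVE-17433 description, the Vectorizer plans Decimal64ColumnVector only when an input format is annotated with @VectorizedInputFormatSupports(supports = {DECIMAL_64}). The configuration variable hive.vectorized.input.format.supports.enabled (default "decimal_64") can switch the feature off. One way to read that gate is as the intersection of what the format declares and what the configuration enables. The Java sketch below declares its own stand-in annotation and enum and reads the annotation reflectively. It treats the property value as a comma-separated, case-insensitive list in which an empty value disables everything. All of that (the names and the parsing rules) is an assumption for illustration, not Hive's Vectorizer code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-ins modeled on the issue description; not Hive's actual classes.
final class SupportsGateSketch {
    enum Support { DECIMAL_64 }

    // RUNTIME retention so planner-side code can read the annotation reflectively.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface VectorizedInputFormatSupports {
        Support[] supports();
    }

    @VectorizedInputFormatSupports(supports = {Support.DECIMAL_64})
    static final class AnnotatedTextInputFormat { }

    static final class UnannotatedInputFormat { }

    // Parses a value such as "decimal_64" (comma-separated, case-insensitive by assumption).
    // An empty value enables nothing; an unknown name throws IllegalArgumentException.
    static Set<Support> parseEnabled(String value) {
        Set<Support> enabled = EnumSet.noneOf(Support.class);
        for (String token : value.split(",")) {
            String name = token.trim();
            if (!name.isEmpty()) {
                enabled.add(Support.valueOf(name.toUpperCase(Locale.ROOT)));
            }
        }
        return enabled;
    }

    // Features actually used: declared by the input format AND enabled by configuration.
    static Set<Support> effectiveSupport(Class<?> inputFormat, String enabledConf) {
        Set<Support> result = EnumSet.noneOf(Support.class);
        VectorizedInputFormatSupports ann = inputFormat.getAnnotation(VectorizedInputFormatSupports.class);
        if (ann != null) {
            result.addAll(Arrays.asList(ann.supports()));
        }
        result.retainAll(parseEnabled(enabledConf));
        return result;
    }

    // Which column vector a planner following this rule would pick for a decimal column.
    static String columnVectorFor(int precision, Set<Support> effective) {
        return precision <= 18 && effective.contains(Support.DECIMAL_64)
                ? "Decimal64ColumnVector" : "DecimalColumnVector";
    }

    public static void main(String[] args) {
        Set<Support> on = effectiveSupport(AnnotatedTextInputFormat.class, "decimal_64");
        Set<Support> off = effectiveSupport(AnnotatedTextInputFormat.class, "");
        System.out.println(columnVectorFor(11, on));  // Decimal64ColumnVector
        System.out.println(columnVectorFor(11, off)); // DecimalColumnVector
        System.out.println(columnVectorFor(38, on));  // DecimalColumnVector
        System.out.println(effectiveSupport(UnannotatedInputFormat.class, "decimal_64")); // []
    }
}
```

Requiring both conditions keeps the optimization opt-in per format. The configuration value then acts as a single switch for the performance comparisons and testing mentioned in the description.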
[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine
[ https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224133#comment-16224133 ]

Hive QA commented on HIVE-17433:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12894638/HIVE-17433.094.patch

{color:green}SUCCESS:{color} +1 due to 51 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11340 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=101)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=93)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=205)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=222)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7549/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7549/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7549/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12894638 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine
[ https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223932#comment-16223932 ]

Hive QA commented on HIVE-17433:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12894620/HIVE-17433.093.patch

{color:green}SUCCESS:{color} +1 due to 51 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 49 failed/errored test(s), 11340 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acidvec_part] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acidvec_table] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_part] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_part_all_complex] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_part_all_primitive] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_table] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part_all_complex] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part_all_primitive] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_table] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part_all_complex] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part_all_primitive] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_table] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_udf] (batchId=166)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=172)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=172)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vector_non_string_partition] (batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vectorization_div0] (batchId=101)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vectorization_limit] (batchId=100)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_mapjoin_reduce] (batchId=137)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_orderby_5] (batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_string_concat] (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_0] (batchId=139)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_12] (batchId=106)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_13] (batchId=124)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_14] (batchId=108)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_16] (batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_17] (batchId=140)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_1] (batchId=128)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_2] (batchId=111)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_3] (batchId=136)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_4] (batchId=111)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_5] (batchId=127)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_9] (batchId=102)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_div0] (batchId=132)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_short_regress] (batchId=123)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_case] (batchId=127)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_mapjoin] (batchId=134)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf]
[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine
[ https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223703#comment-16223703 ]

Hive QA commented on HIVE-17433:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12894505/HIVE-17433.092.patch

{color:green}SUCCESS:{color} +1 due to 48 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 11340 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid_fast] (batchId=38)
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver[udaf_example_avg] (batchId=237)
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver[udtf_output_on_close] (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[acid_no_buckets] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_3] (batchId=174)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=100)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_shufflejoin] (batchId=134)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=205)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=222)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerHighShuffleBytes (batchId=229)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerHighShuffleBytes (batchId=229)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7540/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7540/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7540/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12894505 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine
[ https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222901#comment-16222901 ]

Hive QA commented on HIVE-17433:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12894421/HIVE-17433.091.patch

{color:green}SUCCESS:{color} +1 due to 48 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 11327 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_table_stats] (batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid_fast] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[acid_no_buckets] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_distinct_gby] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=100)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_shufflejoin] (batchId=134)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=205)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=222)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerHighShuffleBytes (batchId=229)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7519/testReport
Console output:
https://builds.apache.org/job/PreCommit-HIVE-Build/7519/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7519/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12894421 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine
[ https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222406#comment-16222406 ]

Hive QA commented on HIVE-17433:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12894263/HIVE-17433.08.patch

{color:green}SUCCESS:{color} +1 due to 46 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 38 failed/errored test(s), 11327 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_table_stats] (batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_6] (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid_fast] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mergejoin] (batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_no_row_serde] (batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_parquet_projection] (batchId=42)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[acid_no_buckets] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketmapjoin1] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketmapjoin2] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketmapjoin3] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_join_hash] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_udf] (batchId=166)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_distinct_gby] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=172)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=100)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_input_format_excludes] (batchId=122)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_parquet_projection] (batchId=121)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_short_regress] (batchId=123)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_shufflejoin] (batchId=134)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=205)
org.apache.hadoop.hive.llap.security.TestLlapSignerImpl.testSigning (batchId=294)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorDateExpressions.testVectorUDFWeekOfYear (batchId=276)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorMathFunctions.testVectorBin (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorMathFunctions.testVectorHex (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testRegex (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testStringLike (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testStringLikeMultiByte (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testStringLikePatternType (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testStringLikeRandomized (batchId=277)
org.apache.hadoop.hive.ql.optimizer.physical.TestVectorizer.testAggregateOnUDF (batchId=273) org.apache.hadoop.hive.ql.optimizer.physical.TestVectorizer.testValidateNestedExpressions (batchId=273) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=222) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7503/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7503/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7503/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 38 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12894263 - PreCommit-HIVE-Build > Vectorization:
[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine
[ https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222137#comment-16222137 ]

Hive QA commented on HIVE-17433:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12894263/HIVE-17433.08.patch

{color:green}SUCCESS:{color} +1 due to 46 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 37 failed/errored test(s), 11327 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_table_stats] (batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid_fast] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mergejoin] (batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_no_row_serde] (batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_parquet_projection] (batchId=42)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[acid_no_buckets] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketmapjoin1] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketmapjoin2] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketmapjoin3] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_join_hash] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_udf] (batchId=166)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_distinct_gby] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=172)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=100)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_input_format_excludes] (batchId=122)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_parquet_projection] (batchId=121)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_short_regress] (batchId=123)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_shufflejoin] (batchId=134)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=205)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorDateExpressions.testVectorUDFWeekOfYear (batchId=276)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorMathFunctions.testVectorBin (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorMathFunctions.testVectorHex (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testRegex (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testStringLike (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testStringLikeMultiByte (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testStringLikePatternType (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testStringLikeRandomized (batchId=277)
org.apache.hadoop.hive.ql.optimizer.physical.TestVectorizer.testAggregateOnUDF (batchId=273)
org.apache.hadoop.hive.ql.optimizer.physical.TestVectorizer.testValidateNestedExpressions (batchId=273)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=222)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7500/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7500/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7500/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 37 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12894263 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine
[ https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218011#comment-16218011 ]

Hive QA commented on HIVE-17433:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12893727/HIVE-17433.05.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7459/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7459/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7459/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-10-25 02:48:56.379
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-7459/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-10-25 02:48:56.382
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 84950cf HIVE-17764 : alter view fails when hive.metastore.disallow.incompatible.col.type.changes set to true (Janaki Lahorani, reviewed by Andrew Sherman and Vihang Karajgaonkar)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 84950cf HIVE-17764 : alter view fails when hive.metastore.disallow.incompatible.col.type.changes set to true (Janaki Lahorani, reviewed by Andrew Sherman and Vihang Karajgaonkar)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-10-25 02:48:56.906
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: patch failed: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:2898
error: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: patch does not apply
error: patch failed: ql/src/test/results/clientpositive/llap/vector_between_columns.q.out:91
error: ql/src/test/results/clientpositive/llap/vector_between_columns.q.out: patch does not apply
error: patch failed: ql/src/test/results/clientpositive/llap/vector_complex_all.q.out:678
error: ql/src/test/results/clientpositive/llap/vector_complex_all.q.out: patch does not apply
error: patch failed: ql/src/test/results/clientpositive/llap/vector_groupby_mapjoin.q.out:39
error: ql/src/test/results/clientpositive/llap/vector_groupby_mapjoin.q.out: patch does not apply
error: patch failed: ql/src/test/results/clientpositive/llap/vector_include_no_sel.q.out:224
error: ql/src/test/results/clientpositive/llap/vector_include_no_sel.q.out: patch does not apply
error: patch failed: ql/src/test/results/clientpositive/llap/vectorized_dynamic_partition_pruning.q.out:5955
error: ql/src/test/results/clientpositive/llap/vectorized_dynamic_partition_pruning.q.out: patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12893727 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine
[ https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217539#comment-16217539 ]

Matt McCline commented on HIVE-17433:

Known Wrong Vectorization Results on Master:
HIVE-17893: Vectorization: Wrong results for vector_udf3.q
HIVE-17892: Vectorization: Wrong results for vectorized_timestamp_funcs.q
HIVE-17890: Vectorization: Wrong results for vectorized_case.q
HIVE-17889: Vectorization: Wrong results for vectorization_15.q
HIVE-17863: Vectorization: Two Q files produce wrong PTF query results
HIVE-17123: Vectorization: Wrong results for vector_groupby_cube1.q

> Vectorization: Support Decimal64 in Hive Query Engine
> -----------------------------------------------------
>
> Key: HIVE-17433
> URL: https://issues.apache.org/jira/browse/HIVE-17433
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-17433.03.patch, HIVE-17433.04.patch,
> HIVE-17433.05.patch
>
>
> Provide partial support for Decimal64 within Hive. By partial I mean that
> our current decimal has a large surface area of features (rounding, multiply,
> divide, remainder, power, big precision, and many more) but only a small
> number have been identified as performance hotspots.
> Those are small-precision decimals with precision <= 18 that fit within a
> 64-bit long; we are calling them Decimal64. Just as we optimize row-mode
> execution engine hotspots by selectively adding new vectorization code, we
> can treat the current decimal as the full-featured one and add additional
> Decimal64 optimization where query benchmarks really show it helps.
> This change creates a Decimal64ColumnVector.
> This change currently detects small decimals in Hive for the vectorized text
> input format and uses some new Decimal64 vectorized classes for comparison,
> addition, and later perhaps a few GroupBy aggregations like sum, avg, min,
> max.
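The Decimal64 idea above can be sketched in plain Java. This is an illustration under stated assumptions, not Hive's Decimal64ColumnVector or its vector expressions: it assumes a Decimal64 value is a decimal's unscaled digits held in a `long`, with one scale shared by the whole column, so that comparison and subtraction of same-scale values become ordinary `long` operations. The 18-digit limit follows from the range of a signed 64-bit long: 10^18 - 1 is below `Long.MAX_VALUE` (about 9.22 x 10^18), while 10^19 - 1 is not. The names `Decimal64Sketch`, `subtractScalar` and `filterLessScalar` are hypothetical. The demo assumes a key column with scale 5 and evaluates `key - 100 < 200` as a subtract-scalar child feeding a less-than-scalar filter, mirroring the shape of the explain plan quoted in this issue.

```java
import java.math.BigDecimal;

// Illustrative sketch only: not Hive's Decimal64ColumnVector or its vector expressions.
// Assumption: a Decimal64 value is the unscaled digits of a decimal stored in a long,
// and every value in a column shares one scale.
final class Decimal64Sketch {

    // 10^18 - 1 fits in a signed long; 10^19 - 1 does not.
    static final int MAX_DECIMAL64_PRECISION = 18;

    /** Largest absolute unscaled value for a precision: 10^precision - 1. */
    static long absMax(int precision) {
        if (precision < 1 || precision > MAX_DECIMAL64_PRECISION) {
            throw new IllegalArgumentException("precision must be 1.." + MAX_DECIMAL64_PRECISION);
        }
        long max = 1;
        for (int i = 0; i < precision; i++) {
            max *= 10;
        }
        return max - 1;
    }

    /** Encodes exactly at the given scale; throws ArithmeticException if rounding would be needed. */
    static long encode(BigDecimal value, int scale) {
        return value.setScale(scale).unscaledValue().longValueExact();
    }

    /** Decodes an unscaled long back to a BigDecimal with the given scale. */
    static BigDecimal decode(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale);
    }

    /**
     * Hypothetical column-minus-scalar kernel. Same-scale operands make this plain long
     * subtraction; if both operands are within 10^18 - 1 in absolute value, the difference
     * cannot overflow a long. Returns false as soon as a result exceeds outAbsMax, where a
     * real engine would need a full-precision fallback (not shown).
     */
    static boolean subtractScalar(long[] col, long scalar, long[] out, int n, long outAbsMax) {
        for (int i = 0; i < n; i++) {
            long r = col[i] - scalar;
            if (r > outAbsMax || r < -outAbsMax) {
                return false;
            }
            out[i] = r;
        }
        return true;
    }

    /** Hypothetical filter kernel: stores indices of rows where col[i] < scalar, returns the count. */
    static int filterLessScalar(long[] col, long scalar, int n, int[] selected) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (col[i] < scalar) {
                selected[count++] = i;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int scale = 5; // assumed scale of the key column
        String[] keys = {"-4400", "250.5", "299.99999", "300"};
        long[] key = new long[keys.length];
        for (int i = 0; i < keys.length; i++) {
            key[i] = encode(new BigDecimal(keys[i]), scale);
        }
        // key - 100 < 200, with both literals encoded at the column's scale.
        long[] diff = new long[key.length];
        boolean ok = subtractScalar(key, encode(new BigDecimal("100"), scale), diff, key.length, absMax(11));
        int[] selected = new int[key.length];
        int count = filterLessScalar(diff, encode(new BigDecimal("200"), scale), key.length, selected);
        System.out.println("ok=" + ok + ", selected=" + count);
        for (int i = 0; i < count; i++) {
            System.out.println(decode(key[selected[i]], scale));
        }
    }
}
```

Running it prints `ok=true, selected=3` followed by the three keys that pass the filter. The point of the design is that no HiveDecimalWritable objects are created inside the loops; only the range check against the output precision stands between the fast path and a fallback.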
> The patch also supports a new annotation that can mark a
> VectorizedInputFormat as supporting Decimal64 (it is called DECIMAL_64). So,
> in separate work, other formats such as ORC, PARQUET, etc. can be done in
> later JIRAs so they participate in the Decimal64 performance optimization.
> The idea is that when you annotate your input format with:
> @VectorizedInputFormatSupports(supports = {DECIMAL_64})
> the Vectorizer in Hive will plan usage of Decimal64ColumnVector instead of
> DecimalColumnVector. Upon seeing Decimal64ColumnVector being used, the input
> format can fill that column vector with decimal64 longs instead of the
> HiveDecimalWritable objects of DecimalColumnVector.
> There will be a Hive configuration variable
> hive.vectorized.input.format.supports.enabled that has a string list of
> supported features. The default will start as "decimal_64". It can be
> turned off to allow for performance comparisons and testing.
> The query SELECT * FROM DECIMAL_6_1_txt where key - 100BD < 200BD ORDER BY
> key, value
> will have a vectorized explain plan looking like:
> ...
>     Filter Operator
>       Filter Vectorization:
>           className: VectorFilterOperator
>           native: true
>           predicateExpression: FilterDecimal64ColLessDecimal64Scalar(col 2, val 2000)(children: Decimal64ColSubtractDecimal64Scalar(col 0, val 1000, outputDecimal64AbsMax 999) -> 2:decimal(11,5)/DECIMAL_64) -> boolean
>       predicate: ((key - 100) < 200) (type: boolean)
> ...

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
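The planning decision described above (an input format declares DECIMAL_64 support, the configuration enables it, and the Vectorizer then chooses Decimal64ColumnVector) can be sketched with plain Java reflection. Everything here is a hypothetical stand-in modelled on the description, not Hive's actual types: the `Feature` enum, the `SupportsSketch` annotation (playing the role of `@VectorizedInputFormatSupports`), and the two input format classes are local declarations. The sketch assumes the planner intersects what an input format declares with the comma-separated list in `hive.vectorized.input.format.supports.enabled`, and picks `Decimal64ColumnVector` only for a decimal column whose precision is at most 18.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the planning decision; none of these types are Hive's own.
final class VectorizerPlanSketch {

    /** Parses a comma-separated setting such as "decimal_64"; unknown names throw IllegalArgumentException. */
    static Set<Feature> enabledFeatures(String setting) {
        Set<Feature> enabled = EnumSet.noneOf(Feature.class);
        for (String token : setting.split(",")) {
            String t = token.trim();
            if (!t.isEmpty()) {
                enabled.add(Feature.valueOf(t.toUpperCase(Locale.ROOT)));
            }
        }
        return enabled;
    }

    /** Features an input format declares through the annotation (empty if unannotated). */
    static Set<Feature> declaredFeatures(Class<?> inputFormat) {
        Set<Feature> declared = EnumSet.noneOf(Feature.class);
        SupportsSketch annotation = inputFormat.getAnnotation(SupportsSketch.class);
        if (annotation != null) {
            declared.addAll(Arrays.asList(annotation.supports()));
        }
        return declared;
    }

    /** Chooses the column vector kind for a decimal column of the given precision. */
    static String decimalVectorFor(Class<?> inputFormat, String setting, int precision) {
        Set<Feature> usable = declaredFeatures(inputFormat);
        usable.retainAll(enabledFeatures(setting)); // declared AND enabled
        boolean useDecimal64 = usable.contains(Feature.DECIMAL_64) && precision <= 18;
        return useDecimal64 ? "Decimal64ColumnVector" : "DecimalColumnVector";
    }

    public static void main(String[] args) {
        System.out.println(decimalVectorFor(TextInputFormatSketch.class, "decimal_64", 10));
        System.out.println(decimalVectorFor(TextInputFormatSketch.class, "", 10));      // feature turned off
        System.out.println(decimalVectorFor(LegacyInputFormatSketch.class, "decimal_64", 10));
        System.out.println(decimalVectorFor(TextInputFormatSketch.class, "decimal_64", 38)); // too wide
    }
}

enum Feature { DECIMAL_64 }

// RUNTIME retention is required so getAnnotation() can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SupportsSketch {
    Feature[] supports();
}

@SupportsSketch(supports = {Feature.DECIMAL_64})
class TextInputFormatSketch {}

class LegacyInputFormatSketch {}
```

Only the first call yields `Decimal64ColumnVector`. Keeping the feature list in configuration, as the description notes, lets the optimization be switched off (an empty value in this sketch) for performance comparisons without touching the input format itself.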
[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine
[ https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217208#comment-16217208 ]

Teddy Choi commented on HIVE-17433:

+1 tests pending.

This is a massive and great change to vectorization. The vectorizer finally understands input types. It adds more flexibility for future vectorization implementations.

There are a few small errors. vector_decimal_2.q seems to have an error in a SET command. vector_decimal_udf.q should use a single name for DECIMAL_UDF_small and DECIMAL_UDF_txt_small. I will add more comments on the review board after the test is finished.

Thanks for the hard work!
[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine
[ https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16207728#comment-16207728 ]

Hive QA commented on HIVE-17433:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12892567/HIVE-17433.04.patch

{color:green}SUCCESS:{color} +1 due to 20 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 211 failed/errored test(s), 11154 tests executed
*Failed tests:*
{noformat}
TestConstantVectorExpression - did not produce a TEST-*.xml file (likely timed out) (batchId=277)
TestVectorDateExpressions - did not produce a TEST-*.xml file (likely timed out) (batchId=275)
TestVectorFilterExpressions - did not produce a TEST-*.xml file (likely timed out) (batchId=275)
TestVectorGenericDateExpressions - did not produce a TEST-*.xml file (likely timed out) (batchId=275)
TestVectorLogicalExpressions - did not produce a TEST-*.xml file (likely timed out) (batchId=275)
TestVectorTimestampExpressions - did not produce a TEST-*.xml file (likely timed out) (batchId=276)
TestVectorTypeCasts - did not produce a TEST-*.xml file (likely timed out) (batchId=275)
TestVectorizationContext - did not produce a TEST-*.xml file (likely timed out) (batchId=275)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mergejoin] (batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_no_row_serde] (batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_reference_windowed] (batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_casts] (batchId=80)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_partitioned] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_vector_nohybridgrace] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mergejoin] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_distinct_gby] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=171)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_inner_join] (batchId=173)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join0] (batchId=173)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join1] (batchId=172)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join2] (batchId=171)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vector_non_string_partition] (batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vectorization_div0] (batchId=101)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vectorization_limit] (batchId=100)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multi] (batchId=110)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_notin] (batchId=133)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_scalar] (batchId=119)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_select] (batchId=119)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_views] (batchId=108)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_between_in] (batchId=127)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_cast_constant] (batchId=106)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_char_4] (batchId=141)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_count_distinct] (batchId=113)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_data_types] (batchId=136)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_decimal_aggregate] (batchId=110)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_decimal_mapjoin] (batchId=126)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_distinct_2] (batchId=124)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_elt] (batchId=117)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_groupby_3] (batchId=130)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_left_outer_join] (batchId=112)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_mapjoin_reduce] (batchId=137)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_orderby_5] (batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_string_concat] (batchId=116)
[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine
[ https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16207027#comment-16207027 ]

Hive QA commented on HIVE-17433:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12892510/HIVE-17433.03.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7340/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7340/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7340/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-10-17 05:21:16.710
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-7340/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-10-17 05:21:16.712
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   8fea117..8c3f0e4  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 8fea117 HIVE-17371 : Move tokenstores to metastore module (Vihang Karajgaonkar, reviewed by Alan Gates, Thejas M Nair)
+ git clean -f -d
Removing standalone-metastore/src/gen/org/
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at 8c3f0e4 HIVE-17815: prevent OOM with Atlas Hive hook (Anishek Agarwal reviewed by Thejas Nair)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-10-17 05:21:20.549
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: patch failed: ql/src/test/results/clientpositive/llap/vector_groupby_grouping_id2.q.out:612
error: ql/src/test/results/clientpositive/llap/vector_groupby_grouping_id2.q.out: patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12892510 - PreCommit-HIVE-Build