[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine

2017-10-31 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233670#comment-16233670
 ] 

Lefty Leverenz commented on HIVE-17433:
---

Doc note:  This adds *hive.vectorized.input.format.supports.enabled* and 
*hive.test.vectorized.execution.enabled.override* to HiveConf.java.

Only *hive.vectorized.input.format.supports.enabled* needs to be documented in 
the wiki, because *hive.test.vectorized.execution.enabled.override* is for 
internal use only.
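
For the wiki entry, a minimal sketch of how a string-list property like this one could be interpreted. It is not taken from the patch: the class name, the method names, and the assumption that the list is comma-separated are all hypothetical; only the property name and the "decimal_64" default come from this issue.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Hive: illustrates reading a feature list
// such as the value of hive.vectorized.input.format.supports.enabled.
public class SupportsEnabledSketch {

    // Assumption: features are separated by commas; blanks are ignored.
    static Set<String> parseEnabledFeatures(String value) {
        if (value == null) {
            return new TreeSet<>();
        }
        return Arrays.stream(value.split(","))
                .map(s -> s.trim().toLowerCase(Locale.ROOT))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // True when the list names the decimal_64 feature.
    static boolean isDecimal64Enabled(String value) {
        return parseEnabledFeatures(value).contains("decimal_64");
    }

    public static void main(String[] args) {
        System.out.println(isDecimal64Enabled("decimal_64")); // default from the issue text
        System.out.println(isDecimal64Enabled(""));           // feature turned off
    }
}
```

Under these assumptions, an empty value turns the feature off, which matches the "can be turned off" wording in the issue description.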

Besides documenting the configuration parameter, perhaps this should also be 
mentioned in the Data Types doc.

* [Configuration Properties -- Vectorization | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Vectorization]
* [Hive Data Types -- Decimals | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-DecimalsdecimalDecimals]

Added a TODOC3.0 label.

([~mmccline], please update the fix version.)

> Vectorization: Support Decimal64 in Hive Query Engine
> -
>
> Key: HIVE-17433
> URL: https://issues.apache.org/jira/browse/HIVE-17433
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
>  Labels: TODOC3.0
> Attachments: HIVE-17433.03.patch, HIVE-17433.04.patch, 
> HIVE-17433.05.patch, HIVE-17433.06.patch, HIVE-17433.07.patch, 
> HIVE-17433.08.patch, HIVE-17433.09.patch, HIVE-17433.091.patch, 
> HIVE-17433.092.patch, HIVE-17433.093.patch, HIVE-17433.094.patch
>
>
> Provide partial support for Decimal64 within Hive.  By partial I mean that 
> our current decimal has a large surface area of features (rounding, multiply, 
> divide, remainder, power, big precision, and many more) but only a small 
> number have been identified as performance hotspots.
> Those are small-precision decimals with precision <= 18 that fit within a 
> 64-bit long, which we are calling Decimal64.  Just as we optimize row-mode 
> execution engine hotspots by selectively adding new vectorization code, we 
> can treat the current decimal as the full-featured one and add additional 
> Decimal64 optimizations where query benchmarks really show they help.
> This change creates a Decimal64ColumnVector.
> This change currently detects small decimal with Hive for Vectorized text 
> input format and uses some new Decimal64 vectorized classes for comparison, 
> addition, and later perhaps a few GroupBy aggregations like sum, avg, min, 
> max.
> The patch also supports a new annotation that can mark a 
> VectorizedInputFormat as supporting Decimal64 (it is called DECIMAL_64).  So, 
> in separate work those other formats such as ORC, PARQUET, etc can be done in 
> later JIRAs so they participate in the Decimal64 performance optimization.
> The idea is when you annotate your input format with:
> @VectorizedInputFormatSupports(supports = {DECIMAL_64})
> the Vectorizer in Hive will plan usage of Decimal64ColumnVector instead of 
> DecimalColumnVector.  Upon an input format seeing Decimal64ColumnVector being 
> used, the input format can fill that column vector with decimal64 longs 
> instead of HiveDecimalWritable objects of DecimalColumnVector.
> There will be a Hive configuration property 
> hive.vectorized.input.format.supports.enabled that holds a string list of 
> supported features.  The default will start as "decimal_64".  It can be 
> turned off to allow for performance comparisons and testing.
> The query SELECT * FROM DECIMAL_6_1_txt where key - 100BD < 200BD ORDER BY 
> key, value
> Will have a vectorized explain plan looking like:
> ...
> Filter Operator
>   Filter Vectorization:
>   className: VectorFilterOperator
>   native: true
>   predicateExpression: 
> FilterDecimal64ColLessDecimal64Scalar(col 2, val 2000)(children: 
> Decimal64ColSubtractDecimal64Scalar(col 0, val 1000, 
> outputDecimal64AbsMax 999) -> 2:decimal(11,5)/DECIMAL_64) -> boolean
>   predicate: ((key - 100) < 200) (type: boolean)
> ...
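
A minimal sketch of the idea in the quoted description: a decimal with precision <= 18 can be held as an unscaled 64-bit long, so comparison and subtraction become plain long arithmetic. This is not the patch's code; the class and method names are hypothetical. The scale of 1 is an assumption, chosen because it is consistent with the table name DECIMAL_6_1_txt and with the scalars 1000 and 2000 in the explain plan above.

```java
import java.math.BigDecimal;

// Hypothetical illustration, not Hive's Decimal64ColumnVector.
public class Decimal64Sketch {

    // Every unscaled value with at most 18 digits fits in a signed 64-bit long.
    static final int MAX_DECIMAL64_PRECISION = 18;

    static boolean fitsDecimal64(int precision) {
        return precision <= MAX_DECIMAL64_PRECISION;
    }

    // Encode a literal as an unscaled long at the given scale.
    // Throws ArithmeticException if the literal needs rounding or overflows a long.
    static long toDecimal64(String literal, int scale) {
        return new BigDecimal(literal).setScale(scale).unscaledValue().longValueExact();
    }

    // Evaluates (col - scalar) < bound with all three values at the same scale.
    static boolean subtractThenLess(long col, long scalar, long bound) {
        return (col - scalar) < bound;
    }

    public static void main(String[] args) {
        int scale = 1; // assumed scale, see the note above
        long hundred = toDecimal64("100", scale);
        long twoHundred = toDecimal64("200", scale);
        long key = toDecimal64("123.4", scale);

        System.out.println(hundred);    // prints 1000
        System.out.println(twoHundred); // prints 2000
        System.out.println(subtractThenLess(key, hundred, twoHundred)); // prints true
    }
}
```

Because both operands share one scale, no rescaling is needed before the subtraction or the comparison; mixed scales would need an extra alignment step that this sketch leaves out.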



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine

2017-10-29 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224196#comment-16224196
 ] 

Matt McCline commented on HIVE-17433:
-

Committed to master.  [~teddy.choi] thank you for your code review!



[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine

2017-10-29 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224190#comment-16224190
 ] 

Matt McCline commented on HIVE-17433:
-

Test failures are unrelated.



[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine

2017-10-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224133#comment-16224133
 ] 

Hive QA commented on HIVE-17433:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12894638/HIVE-17433.094.patch

{color:green}SUCCESS:{color} +1 due to 51 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11340 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=101)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=93)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=205)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=222)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7549/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7549/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7549/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12894638 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine

2017-10-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223932#comment-16223932
 ] 

Hive QA commented on HIVE-17433:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12894620/HIVE-17433.093.patch

{color:green}SUCCESS:{color} +1 due to 51 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 49 failed/errored test(s), 11340 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acidvec_part] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acidvec_table] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_part] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_part_all_complex] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_part_all_primitive] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_table] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part_all_complex] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part_all_primitive] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_table] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part_all_complex] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part_all_primitive] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_table] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_udf] (batchId=166)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=172)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=172)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vector_non_string_partition] (batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vectorization_div0] (batchId=101)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vectorization_limit] (batchId=100)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_mapjoin_reduce] (batchId=137)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_orderby_5] (batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_string_concat] (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_0] (batchId=139)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_12] (batchId=106)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_13] (batchId=124)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_14] (batchId=108)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_16] (batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_17] (batchId=140)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_1] (batchId=128)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_2] (batchId=111)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_3] (batchId=136)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_4] (batchId=111)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_5] (batchId=127)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_9] (batchId=102)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_div0] (batchId=132)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_short_regress] (batchId=123)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_case] (batchId=127)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_mapjoin] (batchId=134)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] 

[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine

2017-10-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223703#comment-16223703
 ] 

Hive QA commented on HIVE-17433:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12894505/HIVE-17433.092.patch

{color:green}SUCCESS:{color} +1 due to 48 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 11340 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid_fast] (batchId=38)
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver[udaf_example_avg] (batchId=237)
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver[udtf_output_on_close] (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[acid_no_buckets] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_3] (batchId=174)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=100)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_shufflejoin] (batchId=134)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=205)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=222)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerHighShuffleBytes (batchId=229)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerHighShuffleBytes (batchId=229)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7540/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7540/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7540/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12894505 - PreCommit-HIVE-Build


[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine

2017-10-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222901#comment-16222901
 ] 

Hive QA commented on HIVE-17433:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12894421/HIVE-17433.091.patch

{color:green}SUCCESS:{color} +1 due to 48 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 11327 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_table_stats] (batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid_fast] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[acid_no_buckets] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_distinct_gby] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=100)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_shufflejoin] (batchId=134)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=205)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=222)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerHighShuffleBytes (batchId=229)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7519/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7519/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7519/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12894421 - PreCommit-HIVE-Build


[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine

2017-10-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222406#comment-16222406
 ] 

Hive QA commented on HIVE-17433:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12894263/HIVE-17433.08.patch

{color:green}SUCCESS:{color} +1 due to 46 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 38 failed/errored test(s), 11327 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_table_stats] (batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_6] (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid_fast] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mergejoin] (batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_no_row_serde] (batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_parquet_projection] (batchId=42)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[acid_no_buckets] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketmapjoin1] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketmapjoin2] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketmapjoin3] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_join_hash] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_udf] (batchId=166)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_distinct_gby] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=172)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=100)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_input_format_excludes] (batchId=122)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_parquet_projection] (batchId=121)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_short_regress] (batchId=123)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_shufflejoin] (batchId=134)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=205)
org.apache.hadoop.hive.llap.security.TestLlapSignerImpl.testSigning (batchId=294)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorDateExpressions.testVectorUDFWeekOfYear
 (batchId=276)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorMathFunctions.testVectorBin
 (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorMathFunctions.testVectorHex
 (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testRegex
 (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testStringLike
 (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testStringLikeMultiByte
 (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testStringLikePatternType
 (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testStringLikeRandomized
 (batchId=277)
org.apache.hadoop.hive.ql.optimizer.physical.TestVectorizer.testAggregateOnUDF 
(batchId=273)
org.apache.hadoop.hive.ql.optimizer.physical.TestVectorizer.testValidateNestedExpressions
 (batchId=273)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints 
(batchId=222)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7503/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7503/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7503/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 38 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12894263 - PreCommit-HIVE-Build

[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine

2017-10-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222137#comment-16222137
 ] 

Hive QA commented on HIVE-17433:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12894263/HIVE-17433.08.patch

{color:green}SUCCESS:{color} +1 due to 46 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 37 failed/errored test(s), 11327 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_table_stats] 
(batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid_fast] 
(batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mergejoin] (batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_no_row_serde] 
(batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_parquet_projection]
 (batchId=42)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[acid_no_buckets]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketmapjoin1]
 (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketmapjoin2]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketmapjoin3]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] 
(batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_join_hash]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_udf]
 (batchId=166)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_distinct_gby]
 (batchId=163)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=172)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=100)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc]
 (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_input_format_excludes]
 (batchId=122)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_parquet_projection]
 (batchId=121)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_short_regress]
 (batchId=123)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_shufflejoin]
 (batchId=134)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut 
(batchId=205)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorDateExpressions.testVectorUDFWeekOfYear
 (batchId=276)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorMathFunctions.testVectorBin
 (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorMathFunctions.testVectorHex
 (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testRegex
 (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testStringLike
 (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testStringLikeMultiByte
 (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testStringLikePatternType
 (batchId=277)
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorStringExpressions.testStringLikeRandomized
 (batchId=277)
org.apache.hadoop.hive.ql.optimizer.physical.TestVectorizer.testAggregateOnUDF 
(batchId=273)
org.apache.hadoop.hive.ql.optimizer.physical.TestVectorizer.testValidateNestedExpressions
 (batchId=273)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints 
(batchId=222)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7500/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7500/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7500/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 37 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12894263 - PreCommit-HIVE-Build

[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine

2017-10-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218011#comment-16218011
 ] 

Hive QA commented on HIVE-17433:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12893727/HIVE-17433.05.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7459/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7459/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7459/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-10-25 02:48:56.379
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-7459/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-10-25 02:48:56.382
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 84950cf HIVE-17764 : alter view fails when 
hive.metastore.disallow.incompatible.col.type.changes set to true (Janaki 
Lahorani, reviewed by Andrew Sherman and Vihang Karajgaonkar)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 84950cf HIVE-17764 : alter view fails when 
hive.metastore.disallow.incompatible.col.type.changes set to true (Janaki 
Lahorani, reviewed by Andrew Sherman and Vihang Karajgaonkar)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-10-25 02:48:56.906
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:2898
error: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: patch does 
not apply
error: patch failed: 
ql/src/test/results/clientpositive/llap/vector_between_columns.q.out:91
error: ql/src/test/results/clientpositive/llap/vector_between_columns.q.out: 
patch does not apply
error: patch failed: 
ql/src/test/results/clientpositive/llap/vector_complex_all.q.out:678
error: ql/src/test/results/clientpositive/llap/vector_complex_all.q.out: patch 
does not apply
error: patch failed: 
ql/src/test/results/clientpositive/llap/vector_groupby_mapjoin.q.out:39
error: ql/src/test/results/clientpositive/llap/vector_groupby_mapjoin.q.out: 
patch does not apply
error: patch failed: 
ql/src/test/results/clientpositive/llap/vector_include_no_sel.q.out:224
error: ql/src/test/results/clientpositive/llap/vector_include_no_sel.q.out: 
patch does not apply
error: patch failed: 
ql/src/test/results/clientpositive/llap/vectorized_dynamic_partition_pruning.q.out:5955
error: 
ql/src/test/results/clientpositive/llap/vectorized_dynamic_partition_pruning.q.out:
 patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12893727 - PreCommit-HIVE-Build

[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine

2017-10-24 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217539#comment-16217539
 ] 

Matt McCline commented on HIVE-17433:
-

Known Wrong Vectorization Results on Master:

HIVE-17893: Vectorization: Wrong results for vector_udf3.q
HIVE-17892: Vectorization: Wrong results for vectorized_timestamp_funcs.q
HIVE-17890: Vectorization: Wrong results for vectorized_case.q
HIVE-17889: Vectorization: Wrong results for vectorization_15.q
HIVE-17863: Vectorization: Two Q files produce wrong PTF query results
HIVE-17123: Vectorization: Wrong results for vector_groupby_cube1.q

> Vectorization: Support Decimal64 in Hive Query Engine
> -
>
> Key: HIVE-17433
> URL: https://issues.apache.org/jira/browse/HIVE-17433
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-17433.03.patch, HIVE-17433.04.patch, 
> HIVE-17433.05.patch
>
>
> Provide partial support for Decimal64 within Hive.  By partial I mean that 
> our current decimal has a large surface area of features (rounding, multiply, 
> divide, remainder, power, big precision, and many more) but only a small 
> number have been identified as performance hotspots.
> Those are small-precision decimals with precision <= 18 that fit within a 
> 64-bit long, which we are calling Decimal64.  Just as we optimize row-mode 
> execution engine hotspots by selectively adding new vectorization code, we 
> can treat the current decimal as the full-featured one and add additional 
> Decimal64 optimizations where query benchmarks really show they help.
> This change creates a Decimal64ColumnVector.
> This change currently detects small decimals in Hive for the vectorized text 
> input format and uses some new Decimal64 vectorized classes for comparison, 
> addition, and later perhaps a few GroupBy aggregations like sum, avg, min, 
> max.
> The patch also supports a new annotation that can mark a 
> VectorizedInputFormat as supporting Decimal64 (it is called DECIMAL_64).  So, 
> in separate work, support for other formats such as ORC, PARQUET, etc. can be 
> added in later JIRAs so they participate in the Decimal64 performance optimization.
> The idea is when you annotate your input format with:
> @VectorizedInputFormatSupports(supports = {DECIMAL_64})
> the Vectorizer in Hive will plan usage of Decimal64ColumnVector instead of 
> DecimalColumnVector.  When an input format sees a Decimal64ColumnVector being 
> used, it can fill that column vector with decimal64 longs 
> instead of the HiveDecimalWritable objects of a DecimalColumnVector.
> There will be a Hive configuration property 
> hive.vectorized.input.format.supports.enabled that has a string list of 
> supported features.  The default will start as "decimal_64".  It can be 
> turned off to allow for performance comparisons and testing.
> The query SELECT * FROM DECIMAL_6_1_txt where key - 100BD < 200BD ORDER BY 
> key, value
> will have a vectorized explain plan looking like:
> ...
> Filter Operator
>   Filter Vectorization:
>   className: VectorFilterOperator
>   native: true
>   predicateExpression: 
> FilterDecimal64ColLessDecimal64Scalar(col 2, val 2000)(children: 
> Decimal64ColSubtractDecimal64Scalar(col 0, val 1000, 
> outputDecimal64AbsMax 999) -> 2:decimal(11,5)/DECIMAL_64) -> boolean
>   predicate: ((key - 100) < 200) (type: boolean)
> ...
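
A minimal sketch of the Decimal64 idea from the description: a decimal with precision <= 18 is held as an unscaled 64-bit long, so subtraction and comparison against scalars become plain long arithmetic once the scalars are converted to the column's scale. The class and method names are hypothetical, the scale of 2 used in the usage note is an arbitrary choice for illustration, and the code does not try to reproduce the scalar values printed in the explain plan or Hive's actual expression classes.

```java
import java.math.BigDecimal;

class Decimal64Sketch {
    // Largest precision whose unscaled value always fits in a signed 64-bit long.
    static final int MAX_PRECISION = 18;

    static boolean fitsInDecimal64(int precision) {
        return precision <= MAX_PRECISION;
    }

    // Converts a decimal literal to its unscaled long at the given scale.
    // setScale without a rounding mode throws ArithmeticException if digits
    // would be lost; longValueExact throws if the value overflows a long.
    static long toDecimal64(String literal, int scale) {
        return new BigDecimal(literal).setScale(scale).unscaledValue().longValueExact();
    }

    // Evaluates (key - subtrahend) < bound, where all three arguments are
    // unscaled longs that share the same scale.
    static boolean subtractThenLess(long key, long subtrahend, long bound) {
        return Math.subtractExact(key, subtrahend) < bound;
    }
}
```

For example, at an assumed scale of 2 the literals 100 and 200 become the longs 10000 and 20000, so a key of 250.50 (stored as 25050) passes the filter key - 100 < 200, while a key of 300.00 (stored as 30000) does not.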



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine

2017-10-24 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217208#comment-16217208
 ] 

Teddy Choi commented on HIVE-17433:
---

+1, tests pending.

This is a massive and great change to vectorization. The vectorizer finally 
understands input types, which adds more flexibility for future vectorization 
implementations.

There are a few small errors. vector_decimal_2.q seems to have an error in a 
SET command. vector_decimal_udf.q should use one name for DECIMAL_UDF_small 
and DECIMAL_UDF_txt_small. I will add more comments on Review Board after the 
test is finished.

Thanks for the hard work!


[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine

2017-10-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16207728#comment-16207728
 ] 

Hive QA commented on HIVE-17433:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12892567/HIVE-17433.04.patch

{color:green}SUCCESS:{color} +1 due to 20 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 211 failed/errored test(s), 11154 tests 
executed
*Failed tests:*
{noformat}
TestConstantVectorExpression - did not produce a TEST-*.xml file (likely timed 
out) (batchId=277)
TestVectorDateExpressions - did not produce a TEST-*.xml file (likely timed 
out) (batchId=275)
TestVectorFilterExpressions - did not produce a TEST-*.xml file (likely timed 
out) (batchId=275)
TestVectorGenericDateExpressions - did not produce a TEST-*.xml file (likely 
timed out) (batchId=275)
TestVectorLogicalExpressions - did not produce a TEST-*.xml file (likely timed 
out) (batchId=275)
TestVectorTimestampExpressions - did not produce a TEST-*.xml file (likely 
timed out) (batchId=276)
TestVectorTypeCasts - did not produce a TEST-*.xml file (likely timed out) 
(batchId=275)
TestVectorizationContext - did not produce a TEST-*.xml file (likely timed out) 
(batchId=275)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mergejoin] (batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_no_row_serde] 
(batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_reference_windowed]
 (batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_casts] 
(batchId=80)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_partitioned]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_vector_nohybridgrace]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mergejoin] 
(batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan]
 (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_distinct_gby]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=171)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_inner_join]
 (batchId=173)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join0]
 (batchId=173)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join1]
 (batchId=172)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join2]
 (batchId=171)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vector_non_string_partition]
 (batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vectorization_div0]
 (batchId=101)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vectorization_limit]
 (batchId=100)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multi] 
(batchId=110)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_notin] 
(batchId=133)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_scalar] 
(batchId=119)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_select] 
(batchId=119)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_views] 
(batchId=108)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_between_in] 
(batchId=127)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_cast_constant]
 (batchId=106)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_char_4] 
(batchId=141)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_count_distinct]
 (batchId=113)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_data_types] 
(batchId=136)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_decimal_aggregate]
 (batchId=110)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_decimal_mapjoin]
 (batchId=126)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_distinct_2] 
(batchId=124)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_elt] 
(batchId=117)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_groupby_3] 
(batchId=130)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_left_outer_join]
 (batchId=112)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_mapjoin_reduce]
 (batchId=137)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_orderby_5] 
(batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_string_concat]
 (batchId=116)

[jira] [Commented] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine

2017-10-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16207027#comment-16207027
 ] 

Hive QA commented on HIVE-17433:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12892510/HIVE-17433.03.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7340/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7340/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7340/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-10-17 05:21:16.710
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-7340/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-10-17 05:21:16.712
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   8fea117..8c3f0e4  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 8fea117 HIVE-17371 : Move tokenstores to metastore module 
(Vihang Karajgaonkar, reviewed by Alan Gates, Thejas M Nair)
+ git clean -f -d
Removing standalone-metastore/src/gen/org/
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at 8c3f0e4 HIVE-17815: prevent OOM with Atlas Hive hook (Anishek 
Agarwal reviewed by Thejas Nair)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-10-17 05:21:20.549
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
ql/src/test/results/clientpositive/llap/vector_groupby_grouping_id2.q.out:612
error: 
ql/src/test/results/clientpositive/llap/vector_groupby_grouping_id2.q.out: 
patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12892510 - PreCommit-HIVE-Build

> Vectorization: Support Decimal64 in Hive Query Engine
> -
>
> Key: HIVE-17433
> URL: https://issues.apache.org/jira/browse/HIVE-17433
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-17433.03.patch
>
>
> Provide partial support for Decimal64 within Hive.  By partial I mean that 
> our current decimal has a large surface area of features (rounding, multiply, 
> divide, remainder, power, big precision, and many more) but only a small 
> number have been identified as performance hotspots.
> Those are small-precision decimals with precision <= 18 that fit within a 
> 64-bit long, which we are calling Decimal64.  Just as we optimize row-mode 
> execution engine hotspots by selectively adding new vectorization code, we 
> can treat the current decimal as the full-featured one and add additional 
> Decimal64 optimizations where query benchmarks really show they help.
> This change creates a Decimal64ColumnVector.
> This change currently detects small decimals in Hive for the vectorized text 
> input format and uses some new Decimal64 vectorized classes for comparison, 
> addition, and later perhaps a few GroupBy aggregations like sum, avg, min, 
> max.
> The patch also supports a new annotation that can mark a 
> VectorizedInputFormat as supporting Decimal64 (it is called DECIMAL_64).  So, 
> in separate work, support for other formats such as ORC, PARQUET, etc. can be added in 
>