[jira] [Commented] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

2017-09-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165784#comment-16165784
 ] 

Hive QA commented on HIVE-17139:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886850/HIVE-17139.9.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 84 failed/errored test(s), 11040 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[foldts] (batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_bitmap_compression]
 (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] 
(batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_types_non_dictionary_encoding_vectorization]
 (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_types_vectorization]
 (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_between_columns] 
(batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_binary_join_groupby]
 (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_coalesce] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_complex_join] 
(batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_custom_udf_configure]
 (batchId=63)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_data_types] 
(batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_math_funcs]
 (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_udf2] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_struct_in] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_casts] 
(batchId=79)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_date_funcs] 
(batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_distinct_gby] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_math_funcs] 
(batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_parquet_types]
 (batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_timestamp_ints_casts]
 (batchId=47)
org.apache.hadoop.hive.cli.TestCompareCliDriver.testCliDriver[vectorized_math_funcs]
 (batchId=235)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[parquet_types_vectorization]
 (batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_partitioned]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acidvec_part]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acidvec_table]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_table]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_table]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_adaptor_usage_mode]
 (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_auto_smb_mapjoin_14]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_between_columns]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_between_in]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_binary_join_groupby]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_coalesce]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_complex_join]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_data_types]
 (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_math_funcs]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_udf2]
 

[jira] [Commented] (HIVE-17428) REPL LOAD of ALTER_PARTITION event doesn't create import tasks if the partition doesn't exist during analyze phase.

2017-09-13 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165760#comment-16165760
 ] 

anishek commented on HIVE-17428:


+1 

cc [~thejas]/[~daijy]

> REPL LOAD of ALTER_PARTITION event doesn't create import tasks if the 
> partition doesn't exist during analyze phase.
> ---
>
> Key: HIVE-17428
> URL: https://issues.apache.org/jira/browse/HIVE-17428
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17428.01.patch, HIVE-17428.02.patch, 
> HIVE-17428.03.patch
>
>
> If the incremental dump event sequence has an ADD_PARTITION followed by an 
> ALTER_PARTITION, no import task is created for the ALTER_PARTITION event 
> because the partition doesn't exist during the analyze phase. Due to this, 
> REPL STATUS returns the wrong last repl ID.
> Scenario:
> 1. Create DB
> 2. Create partitioned table.
> 3. Bootstrap dump and load
> 4. Insert into table to a dynamically created partition. - This insert 
> generate ADD_PARTITION and ALTER_PARTITION events.
> 5. Incremental dump and load.
> - Load will be successful.
> - But the last repl ID set was incorrect as ALTER_PARTITION event was never 
> applied.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17261) Hive use deprecated ParquetInputSplit constructor which blocked parquet dictionary filter

2017-09-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165722#comment-16165722
 ] 

Hive QA commented on HIVE-17261:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886803/HIVE-17261.11.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11040 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_char] 
(batchId=9)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=156)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[drop_table_failure2]
 (batchId=89)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
org.apache.hadoop.hive.ql.TestAcidOnTez.testCtasTezUnion (batchId=215)
org.apache.hadoop.hive.ql.TestAcidOnTez.testNonStandardConversion01 
(batchId=215)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6809/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6809/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6809/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886803 - PreCommit-HIVE-Build

> Hive use deprecated ParquetInputSplit constructor which blocked parquet 
> dictionary filter
> -
>
> Key: HIVE-17261
> URL: https://issues.apache.org/jira/browse/HIVE-17261
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Junjie Chen
>Assignee: Junjie Chen
> Attachments: HIVE-17261.10.patch, HIVE-17261.11.patch, 
> HIVE-17261.2.patch, HIVE-17261.3.patch, HIVE-17261.4.patch, 
> HIVE-17261.5.patch, HIVE-17261.6.patch, HIVE-17261.7.patch, 
> HIVE-17261.8.patch, HIVE-17261.diff, HIVE-17261.patch
>
>
> Hive uses the deprecated ParquetInputSplit constructor in 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L128]
> Please see the interface definition in 
> [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputSplit.java#L80]
> The old constructor sets rowGroupOffsets values, which causes the dictionary 
> filter to be skipped in Parquet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17261) Hive use deprecated ParquetInputSplit constructor which blocked parquet dictionary filter

2017-09-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165648#comment-16165648
 ] 

Hive QA commented on HIVE-17261:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886803/HIVE-17261.11.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11040 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_char] 
(batchId=9)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=156)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[drop_table_failure2]
 (batchId=89)
org.apache.hadoop.hive.ql.TestAcidOnTez.testCtasTezUnion (batchId=215)
org.apache.hadoop.hive.ql.TestAcidOnTez.testNonStandardConversion01 
(batchId=215)
org.apache.hadoop.hive.ql.io.orc.TestNewInputOutputFormat.testNewOutputFormat 
(batchId=262)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6808/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6808/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6808/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886803 - PreCommit-HIVE-Build

> Hive use deprecated ParquetInputSplit constructor which blocked parquet 
> dictionary filter
> -
>
> Key: HIVE-17261
> URL: https://issues.apache.org/jira/browse/HIVE-17261
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Junjie Chen
>Assignee: Junjie Chen
> Attachments: HIVE-17261.10.patch, HIVE-17261.11.patch, 
> HIVE-17261.2.patch, HIVE-17261.3.patch, HIVE-17261.4.patch, 
> HIVE-17261.5.patch, HIVE-17261.6.patch, HIVE-17261.7.patch, 
> HIVE-17261.8.patch, HIVE-17261.diff, HIVE-17261.patch
>
>
> Hive uses the deprecated ParquetInputSplit constructor in 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L128]
> Please see the interface definition in 
> [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputSplit.java#L80]
> The old constructor sets rowGroupOffsets values, which causes the dictionary 
> filter to be skipped in Parquet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17530) ClassCastException when converting uniontype

2017-09-13 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-17530:
---
Status: Patch Available  (was: Open)

Attached patch. Also posted RB: https://reviews.apache.org/r/62321/

> ClassCastException when converting uniontype
> 
>
> Key: HIVE-17530
> URL: https://issues.apache.org/jira/browse/HIVE-17530
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 3.0.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-17530.1.patch
>
>
> To repro:
> {noformat}
> SET hive.exec.schema.evolution = false;
> CREATE TABLE avro_orc_partitioned_uniontype (a uniontype<boolean,string>) 
> PARTITIONED BY (b int) STORED AS ORC;
> INSERT INTO avro_orc_partitioned_uniontype PARTITION (b=1) SELECT 
> create_union(1, true, value) FROM src LIMIT 5;
> ALTER TABLE avro_orc_partitioned_uniontype SET FILEFORMAT AVRO;
> SELECT * FROM avro_orc_partitioned_uniontype;
> {noformat}
> The exception you get is:
> {code}
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: java.util.ArrayList cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.UnionObject
> {code}
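
For illustration, a minimal self-contained Java sketch of the shape of that failure (hypothetical demo code, not Hive's actual read path; only the two serde2 types are real): union-aware code expects a UnionObject, while the failing path hands the union back as a plain ArrayList.

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hive.serde2.objectinspector.StandardUnionObjectInspector.StandardUnion;
import org.apache.hadoop.hive.serde2.objectinspector.UnionObject;

public class UnionCastSketch {
  public static void main(String[] args) {
    // What union-aware code expects: a tagged UnionObject.
    UnionObject expected = new StandardUnion((byte) 0, Boolean.TRUE);
    System.out.println(expected.getTag() + " -> " + expected.getObject());

    // What the failing path effectively produces: the value as a raw list.
    List<Object> raw = new ArrayList<>();
    raw.add(Boolean.TRUE);

    Object value = raw;
    UnionObject broken = (UnionObject) value; // throws ClassCastException, as above
  }
}
{code}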



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17530) ClassCastException when converting uniontype

2017-09-13 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-17530:
---
Attachment: HIVE-17530.1.patch

> ClassCastException when converting uniontype
> 
>
> Key: HIVE-17530
> URL: https://issues.apache.org/jira/browse/HIVE-17530
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 3.0.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-17530.1.patch
>
>
> To repro:
> {noformat}
> SET hive.exec.schema.evolution = false;
> CREATE TABLE avro_orc_partitioned_uniontype (a uniontype<boolean,string>) 
> PARTITIONED BY (b int) STORED AS ORC;
> INSERT INTO avro_orc_partitioned_uniontype PARTITION (b=1) SELECT 
> create_union(1, true, value) FROM src LIMIT 5;
> ALTER TABLE avro_orc_partitioned_uniontype SET FILEFORMAT AVRO;
> SELECT * FROM avro_orc_partitioned_uniontype;
> {noformat}
> The exception you get is:
> {code}
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: java.util.ArrayList cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.UnionObject
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17530) ClassCastException when converting uniontype

2017-09-13 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu reassigned HIVE-17530:
--


> ClassCastException when converting uniontype
> 
>
> Key: HIVE-17530
> URL: https://issues.apache.org/jira/browse/HIVE-17530
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 3.0.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
>
> To repro:
> {noformat}
> SET hive.exec.schema.evolution = false;
> CREATE TABLE avro_orc_partitioned_uniontype (a uniontype<boolean,string>) 
> PARTITIONED BY (b int) STORED AS ORC;
> INSERT INTO avro_orc_partitioned_uniontype PARTITION (b=1) SELECT 
> create_union(1, true, value) FROM src LIMIT 5;
> ALTER TABLE avro_orc_partitioned_uniontype SET FILEFORMAT AVRO;
> SELECT * FROM avro_orc_partitioned_uniontype;
> {noformat}
> The exception you get is:
> {code}
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: java.util.ArrayList cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.UnionObject
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17529) Bucket Map Join : Sets incorrect edge type causing execution failure

2017-09-13 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-17529:
--
Attachment: (was: HIVE-17529.1.patch)

> Bucket Map Join : Sets incorrect edge type causing execution failure
> 
>
> Key: HIVE-17529
> URL: https://issues.apache.org/jira/browse/HIVE-17529
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-17529.2.patch
>
>
> While traversing the tree to generate tasks, a bucket mapjoin may set its 
> edge type to CUSTOM_SIMPLE_EDGE instead of CUSTOM_EDGE if the big table has 
> not yet been traversed, causing Tez to assert and fail the vertex.
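
A hypothetical, much-simplified Java sketch of that ordering pitfall (the enum values mirror Hive's TezEdgeProperty.EdgeType; the selection logic here is illustrative only, not the actual Hive code):

{code}
enum EdgeType { CUSTOM_EDGE, CUSTOM_SIMPLE_EDGE }

public class BucketMapJoinEdgeSketch {
  // Edge choice depends on traversal order: the bucketing info is only
  // available once the big table side has been visited.
  static EdgeType edgeFor(boolean bucketMapJoin, boolean bigTableTraversed) {
    if (bucketMapJoin && bigTableTraversed) {
      return EdgeType.CUSTOM_EDGE;        // routed edge for bucketed data
    }
    // Big table not yet traversed: the edge silently degrades -- the bug.
    return EdgeType.CUSTOM_SIMPLE_EDGE;
  }

  public static void main(String[] args) {
    System.out.println(edgeFor(true, true));   // CUSTOM_EDGE
    System.out.println(edgeFor(true, false));  // CUSTOM_SIMPLE_EDGE, fails in Tez
  }
}
{code}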



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17529) Bucket Map Join : Sets incorrect edge type causing execution failure

2017-09-13 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-17529:
--
Attachment: HIVE-17529.2.patch

Uploaded wrong patch

> Bucket Map Join : Sets incorrect edge type causing execution failure
> 
>
> Key: HIVE-17529
> URL: https://issues.apache.org/jira/browse/HIVE-17529
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-17529.2.patch
>
>
> While traversing the tree to generate tasks, a bucket mapjoin may set its 
> edge type to CUSTOM_SIMPLE_EDGE instead of CUSTOM_EDGE if the big table has 
> not yet been traversed, causing Tez to assert and fail the vertex.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17529) Bucket Map Join : Sets incorrect edge type causing execution failure

2017-09-13 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-17529:
--
Attachment: HIVE-17529.1.patch

[~jdere] [~hagleitn] [~gopalv] Can you please review?

Fixed the issue and updated the tests to actually test bucket mapjoin.

> Bucket Map Join : Sets incorrect edge type causing execution failure
> 
>
> Key: HIVE-17529
> URL: https://issues.apache.org/jira/browse/HIVE-17529
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-17529.1.patch
>
>
> While traversing the tree to generate tasks, a bucket mapjoin may set its 
> edge type to CUSTOM_SIMPLE_EDGE instead of CUSTOM_EDGE if the big table has 
> not yet been traversed, causing Tez to assert and fail the vertex.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17529) Bucket Map Join : Sets incorrect edge type causing execution failure

2017-09-13 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-17529:
--
Status: Patch Available  (was: In Progress)

> Bucket Map Join : Sets incorrect edge type causing execution failure
> 
>
> Key: HIVE-17529
> URL: https://issues.apache.org/jira/browse/HIVE-17529
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> While traversing the tree to generate tasks, a bucket mapjoin may set its 
> edge type to CUSTOM_SIMPLE_EDGE instead of CUSTOM_EDGE if the big table has 
> not yet been traversed, causing Tez to assert and fail the vertex.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HIVE-17529) Bucket Map Join : Sets incorrect edge type causing execution failure

2017-09-13 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17529 started by Deepak Jaiswal.
-
> Bucket Map Join : Sets incorrect edge type causing execution failure
> 
>
> Key: HIVE-17529
> URL: https://issues.apache.org/jira/browse/HIVE-17529
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> While traversing the tree to generate tasks, a bucket mapjoin may set its 
> edge type to CUSTOM_SIMPLE_EDGE instead of CUSTOM_EDGE if the big table has 
> not yet been traversed, causing Tez to assert and fail the vertex.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17522) cleanup old 'repl dump' dirs

2017-09-13 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165617#comment-16165617
 ] 

Tao Li commented on HIVE-17522:
---

Test looks good (failures are unrelated). [~daijy] Can you please take a look 
at this change?

> cleanup old 'repl dump' dirs
> 
>
> Key: HIVE-17522
> URL: https://issues.apache.org/jira/browse/HIVE-17522
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17522.1.patch
>
>
> We want to clean up the old dump dirs to save space and reduce scan time when 
> needed.
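
For context, a minimal sketch of what such a cleanup could look like with the Hadoop FileSystem API; the retention-count policy, method name, and dump-root argument are assumptions for illustration, not the actual patch:

{code}
import java.io.IOException;
import java.util.Arrays;
import java.util.Comparator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplDumpCleanupSketch {
  /** Keep the newest maxDumps dump dirs under dumpRoot and delete the rest. */
  static void cleanup(Configuration conf, Path dumpRoot, int maxDumps) throws IOException {
    FileSystem fs = dumpRoot.getFileSystem(conf);
    FileStatus[] dumps = fs.listStatus(dumpRoot);
    // Newest first, by modification time.
    Arrays.sort(dumps, Comparator.comparingLong(FileStatus::getModificationTime).reversed());
    for (int i = maxDumps; i < dumps.length; i++) {
      fs.delete(dumps[i].getPath(), true); // recursive delete of an old dump dir
    }
  }
}
{code}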



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17522) cleanup old 'repl dump' dirs

2017-09-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165608#comment-16165608
 ] 

Hive QA commented on HIVE-17522:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886788/HIVE-17522.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11040 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[drop_table_failure2]
 (batchId=89)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
org.apache.hadoop.hive.ql.TestAcidOnTez.testCtasTezUnion (batchId=215)
org.apache.hadoop.hive.ql.TestAcidOnTez.testNonStandardConversion01 
(batchId=215)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6807/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6807/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6807/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886788 - PreCommit-HIVE-Build

> cleanup old 'repl dump' dirs
> 
>
> Key: HIVE-17522
> URL: https://issues.apache.org/jira/browse/HIVE-17522
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17522.1.patch
>
>
> We want to clean up the old dump dirs to save space and reduce scan time when 
> needed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Issue Comment Deleted] (HIVE-16332) When create a partitioned text format table with one partition, after we change the format of table to orc, then the array type field may output error.

2017-09-13 Thread Zhizhen Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhizhen Hou updated HIVE-16332:
---
Comment: was deleted

(was: IMHO, ArrayList.ensureCapacity does not clear the data of the previous 
row. When the array of the current row is smaller than that of the previous 
row, the list's data is not fully overwritten, and the leftover data from the 
previous row is output.)
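
A standalone Java sketch of the failure mode that deleted comment describes (entirely hypothetical code illustrating the reuse bug, not Hive's ORC reader):

{code}
import java.util.ArrayList;
import java.util.List;

public class StaleListSketch {
  // One list reused across rows, as the deleted comment describes.
  static final ArrayList<String> reused = new ArrayList<>();

  static List<String> readRow(String[] values) {
    reused.ensureCapacity(values.length); // grows capacity, never shrinks size
    for (int i = 0; i < values.length; i++) {
      if (i < reused.size()) {
        reused.set(i, values[i]);         // overwrite in place
      } else {
        reused.add(values[i]);
      }
    }
    // BUG: the list is never truncated to values.length, so a shorter row
    // still exposes the tail of the previous, longer row.
    return reused;
  }

  public static void main(String[] args) {
    System.out.println(readRow(new String[]{"a1", "a2", "a3"})); // [a1, a2, a3]
    System.out.println(readRow(new String[]{"b1"}));             // [b1, a2, a3]
  }
}
{code}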

> When create a partitioned text format table with one partition, after we 
> change the format of table to orc, then the array type field may output error.
> ---
>
> Key: HIVE-16332
> URL: https://issues.apache.org/jira/browse/HIVE-16332
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.1
>Reporter: Zhizhen Hou
>Assignee: Zhizhen Hou
>Priority: Critical
>  Labels: patch
> Attachments: HIVE-16332.1.patch
>
>
> ## The steps to reproduce the result.
> 1. First create a text format table with an array type field in Hive.
> ```
>  create table test_text_orc (
>   col_int bigint,
>   col_text string, 
>   col_array array<string>, 
>   col_map map<string,string>
>   ) 
>   PARTITIONED BY (
>day string
>)
>ROW FORMAT DELIMITED
>  FIELDS TERMINATED BY ',' 
>  collection items TERMINATED  BY ']'
>  map keys TERMINATED BY ':'
>   ;
>  
> ```
> 2. Create a new text file hive-orc-text-file-array-error-test.txt.
> ```
> 1,text_value1,array_value1]array_value2]array_value3, 
> map_key1:map_value1,map_key2:map_value2
> 2,text_value2,array_value4, map_key1:map_value3
> ,text_value3,, map_key1:]map_key3:map_value3
> ```
> 3.  Load the data into one partition.
> ```
>  LOAD DATA local INPATH '.hive-orc-text-file-array-error-test.txt' overwrite 
> into table test_text_orc partition(day=20170329)
> ```
> 4. Select the data to verify the result.
> ```
> hive> select * from test.test_text_orc;
> OK
> 1 text_value1 ["array_value1","array_value2","array_value3"]  {" 
> map_key1":"map_value1","map_key2":"map_value2"}  20170329
> 2 text_value2 ["array_value4"]{"map_key1":"map_value3"}   
> 20170329
> NULL  text_value3 []  {" map_key1":"","map_key3":"map_value3"}
> 20170329
> ```
> 5. Alter the file format of the table to ORC;
> ```
>  alter table test_text_orc set fileformat orc;
> ```
> 6. Check the result again, and you can see the erroneous result.
> ```
> hive> select * from test.test_text_orc;
> OK
> 1 text_value1 ["array_value1","array_value2","array_value3"]  {" 
> map_key1":"map_value1","map_key2":"map_value2"}  20170329
> 2 text_value2 ["array_value4","array_value2","array_value3"]  
> {"map_key1":"map_value3"}   20170329
> NULL  text_value3 ["array_value4","array_value2","array_value3"]  
> {"map_key3":"map_value3"," map_key1":""}20170329
> ```



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Issue Comment Deleted] (HIVE-16332) When create a partitioned text format table with one partition, after we change the format of table to orc, then the array type field may output error.

2017-09-13 Thread Zhizhen Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhizhen Hou updated HIVE-16332:
---
Comment: was deleted

(was: IMHO, ArrayList.ensureCapacity does not clear the data of the previous 
row. When the array of the current row is smaller than that of the previous 
row, the list's data is not fully overwritten, and the leftover data from the 
previous row is output.)

> When create a partitioned text format table with one partition, after we 
> change the format of table to orc, then the array type field may output error.
> ---
>
> Key: HIVE-16332
> URL: https://issues.apache.org/jira/browse/HIVE-16332
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.1
>Reporter: Zhizhen Hou
>Assignee: Zhizhen Hou
>Priority: Critical
>  Labels: patch
> Attachments: HIVE-16332.1.patch
>
>
> ## The steps to reproduce the result.
> 1. First create a text format table with an array type field in Hive.
> ```
>  create table test_text_orc (
>   col_int bigint,
>   col_text string, 
>   col_array array<string>, 
>   col_map map<string,string>
>   ) 
>   PARTITIONED BY (
>day string
>)
>ROW FORMAT DELIMITED
>  FIELDS TERMINATED BY ',' 
>  collection items TERMINATED  BY ']'
>  map keys TERMINATED BY ':'
>   ;
>  
> ```
> 2. Create a new text file hive-orc-text-file-array-error-test.txt.
> ```
> 1,text_value1,array_value1]array_value2]array_value3, 
> map_key1:map_value1,map_key2:map_value2
> 2,text_value2,array_value4, map_key1:map_value3
> ,text_value3,, map_key1:]map_key3:map_value3
> ```
> 3.  Load the data into one partition.
> ```
>  LOAD DATA local INPATH '.hive-orc-text-file-array-error-test.txt' overwrite 
> into table test_text_orc partition(day=20170329)
> ```
> 4. Select the data to verify the result.
> ```
> hive> select * from test.test_text_orc;
> OK
> 1 text_value1 ["array_value1","array_value2","array_value3"]  {" 
> map_key1":"map_value1","map_key2":"map_value2"}  20170329
> 2 text_value2 ["array_value4"]{"map_key1":"map_value3"}   
> 20170329
> NULL  text_value3 []  {" map_key1":"","map_key3":"map_value3"}
> 20170329
> ```
> 5. Alter the file format of the table to ORC;
> ```
>  alter table test_text_orc set fileformat orc;
> ```
> 6. Check the result again, and you can see the erroneous result.
> ```
> hive> select * from test.test_text_orc;
> OK
> 1 text_value1 ["array_value1","array_value2","array_value3"]  {" 
> map_key1":"map_value1","map_key2":"map_value2"}  20170329
> 2 text_value2 ["array_value4","array_value2","array_value3"]  
> {"map_key1":"map_value3"}   20170329
> NULL  text_value3 ["array_value4","array_value2","array_value3"]  
> {"map_key3":"map_value3"," map_key1":""}20170329
> ```



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-14836) Test the predicate pushing down support for Parquet vectorization read path

2017-09-13 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-14836:

Attachment: HIVE-14836.patch

> Test the predicate pushing down support for Parquet vectorization read path
> ---
>
> Key: HIVE-14836
> URL: https://issues.apache.org/jira/browse/HIVE-14836
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14836.patch
>
>
> Currently we filter blocks using predicate pushdown. We should support it 
> in the page reader as well to improve its efficiency. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-14836) Test the predicate pushing down support for Parquet vectorization read path

2017-09-13 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-14836:

Summary: Test the predicate pushing down support for Parquet vectorization 
read path  (was: Implement predicate pushing down in Vectorized Page reader)

> Test the predicate pushing down support for Parquet vectorization read path
> ---
>
> Key: HIVE-14836
> URL: https://issues.apache.org/jira/browse/HIVE-14836
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
>
> Currently we filter blocks using predicate pushdown. We should support it 
> in the page reader as well to improve its efficiency. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17529) Bucket Map Join : Sets incorrect edge type causing execution failure

2017-09-13 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal reassigned HIVE-17529:
-


> Bucket Map Join : Sets incorrect edge type causing execution failure
> 
>
> Key: HIVE-17529
> URL: https://issues.apache.org/jira/browse/HIVE-17529
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> While traversing the tree to generate tasks, a bucket mapjoin may set its 
> edge type to CUSTOM_SIMPLE_EDGE instead of CUSTOM_EDGE if the big table has 
> not yet been traversed, causing Tez to assert and fail the vertex.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-09-13 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165582#comment-16165582
 ] 

Prasanth Jayachandran commented on HIVE-15665:
--

+1, pending tests

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.01.patch, HIVE-15665.02.patch, 
> HIVE-15665.03.patch, HIVE-15665.04.patch, HIVE-15665.05.patch, 
> HIVE-15665.06.patch, HIVE-15665.07.patch, HIVE-15665.08.patch, 
> HIVE-15665.09.patch, HIVE-15665.10.patch, HIVE-15665.11.patch, 
> HIVE-15665.patch
>
>
> OrcFileMetadata internally has filestats, stripestats, etc., which are 
> allocated on the heap. On large data sets, this could have an impact on the 
> heap usage and the memory usage by different executors in LLAP.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17514) Use SHA-256 for cookie signer to improve security

2017-09-13 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165573#comment-16165573
 ] 

Tao Li commented on HIVE-17514:
---

Tests look good. [~thejas] Can you please take a look at this change?

> Use SHA-256 for cookie signer to improve security
> -
>
> Key: HIVE-17514
> URL: https://issues.apache.org/jira/browse/HIVE-17514
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17514.1.patch
>
>
> See HIVE-17226 for detailed description.
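
For background, the change amounts to hashing with SHA-256 via the JDK's MessageDigest instead of a weaker digest (the same idea as sibling task HIVE-17515). A minimal sketch; the method name and hex encoding are ours, not the actual Hive code:

{code}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha256Sketch {
  static String sha256Hex(String s) throws NoSuchAlgorithmException {
    byte[] digest = MessageDigest.getInstance("SHA-256")
        .digest(s.getBytes(StandardCharsets.UTF_8));
    StringBuilder sb = new StringBuilder(digest.length * 2);
    for (byte b : digest) {
      sb.append(String.format("%02x", b)); // lowercase hex, zero-padded
    }
    return sb.toString();
  }
}
{code}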



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17515) Use SHA-256 for GenericUDFMaskHash to improve security

2017-09-13 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165567#comment-16165567
 ] 

Tao Li commented on HIVE-17515:
---

Tests look good. [~thejas] Can you please take a look?

> Use SHA-256 for GenericUDFMaskHash to improve security
> --
>
> Key: HIVE-17515
> URL: https://issues.apache.org/jira/browse/HIVE-17515
> Project: Hive
>  Issue Type: Sub-task
>  Components: UDF
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17515.1.patch
>
>
> See HIVE-17226 for detailed description.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17515) Use SHA-256 for GenericUDFMaskHash to improve security

2017-09-13 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165567#comment-16165567
 ] 

Tao Li edited comment on HIVE-17515 at 9/14/17 1:05 AM:


Tests look good. [~thejas] Can you please take a look at this change?


was (Author: taoli-hwx):
Tests look good. [~thejas] Can you please take a look?

> Use SHA-256 for GenericUDFMaskHash to improve security
> --
>
> Key: HIVE-17515
> URL: https://issues.apache.org/jira/browse/HIVE-17515
> Project: Hive
>  Issue Type: Sub-task
>  Components: UDF
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17515.1.patch
>
>
> See HIVE-17226 for detailed description.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17482) External LLAP client: acquire locks for tables queried directly by LLAP

2017-09-13 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17482:
--
Attachment: HIVE-17482.2.patch

> External LLAP client: acquire locks for tables queried directly by LLAP
> ---
>
> Key: HIVE-17482
> URL: https://issues.apache.org/jira/browse/HIVE-17482
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17482.1.patch, HIVE-17482.2.patch
>
>
> When using the LLAP external client with simple queries (filter/project of 
> single table), the appropriate locks should be taken on the table being read 
> like they are for normal Hive queries. This is important in the case of 
> transactional tables being queried, since the compactor relies on the 
> presence of table locks to determine whether it can safely delete old 
> versions of compacted files without affecting currently running queries.
> This does not have to happen in the complex query case, since a query is used 
> (with the appropriate locking mechanisms) to create/populate the temp table 
> holding the results of the complex query.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17482) External LLAP client: acquire locks for tables queried directly by LLAP

2017-09-13 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17482:
--
Attachment: (was: HIVE-17482.2.patch)

> External LLAP client: acquire locks for tables queried directly by LLAP
> ---
>
> Key: HIVE-17482
> URL: https://issues.apache.org/jira/browse/HIVE-17482
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17482.1.patch
>
>
> When using the LLAP external client with simple queries (filter/project of 
> single table), the appropriate locks should be taken on the table being read 
> like they are for normal Hive queries. This is important in the case of 
> transactional tables being queried, since the compactor relies on the 
> presence of table locks to determine whether it can safely delete old 
> versions of compacted files without affecting currently running queries.
> This does not have to happen in the complex query case, since a query is used 
> (with the appropriate locking mechanisms) to create/populate the temp table 
> holding the results of the complex query.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17508) Implement pool rules and triggers based on counters

2017-09-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165547#comment-16165547
 ] 

Sergey Shelukhin commented on HIVE-17508:
-

General comment: rules can change during execution, if the RP is updated or if 
the query moves to a different pool. They should not be attached to a query but 
rather applied to the query by workload management. That will probably also 
affect the location of the code, i.e. TezJobMonitor.
I didn't review a lot of the structure for now because it will probably change 
when merging.

> Implement pool rules and triggers based on counters
> ---
>
> Key: HIVE-17508
> URL: https://issues.apache.org/jira/browse/HIVE-17508
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17508.1.patch, HIVE-17508.WIP.2.patch, 
> HIVE-17508.WIP.patch
>
>
> Workload management can define rules that are bound to a resource plan. Each 
> rule can have a trigger expression and an action associated with it. Trigger 
> expressions are evaluated at runtime after a configurable check interval, 
> based on which actions like killing a query, moving a query to a different 
> pool, etc. will get invoked. A simple rule could be something like
> {code}
> CREATE RULE slow_query IN resource_plan_name
> WHEN execution_time_ms > 1
> MOVE TO slow_queue
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17482) External LLAP client: acquire locks for tables queried directly by LLAP

2017-09-13 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17482:
--
Attachment: HIVE-17482.2.patch

> External LLAP client: acquire locks for tables queried directly by LLAP
> ---
>
> Key: HIVE-17482
> URL: https://issues.apache.org/jira/browse/HIVE-17482
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17482.1.patch, HIVE-17482.2.patch
>
>
> When using the LLAP external client with simple queries (filter/project of 
> single table), the appropriate locks should be taken on the table being read 
> like they are for normal Hive queries. This is important in the case of 
> transactional tables being queried, since the compactor relies on the 
> presence of table locks to determine whether it can safely delete old 
> versions of compacted files without affecting currently running queries.
> This does not have to happen in the complex query case, since a query is used 
> (with the appropriate locking mechanisms) to create/populate the temp table 
> holding the results of the complex query.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17482) External LLAP client: acquire locks for tables queried directly by LLAP

2017-09-13 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165540#comment-16165540
 ] 

Jason Dere commented on HIVE-17482:
---

So far it has not been possible to test this using a unit test - I will test it 
out on a cluster.
Thanks for pointing out that a new transaction is opened during 
Driver.compile(); that was not being accounted for in the patch, and you're 
right that the new TxnManager would not have worked correctly because of that. 
I am reworking the patch so that the Driver class can be initialized with the 
TxnManager, so that it does not have to use the one from the SessionState, 
though the default behavior will be to use the SessionState txn manager as 
before.
The configuration will not be clobbered by the next query - each query will get 
a new Configuration, which will then be passed along in the LLAP input splits.

> External LLAP client: acquire locks for tables queried directly by LLAP
> ---
>
> Key: HIVE-17482
> URL: https://issues.apache.org/jira/browse/HIVE-17482
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17482.1.patch
>
>
> When using the LLAP external client with simple queries (filter/project of 
> single table), the appropriate locks should be taken on the table being read 
> like they are for normal Hive queries. This is important in the case of 
> transactional tables being queried, since the compactor relies on the 
> presence of table locks to determine whether it can safely delete old 
> versions of compacted files without affecting currently running queries.
> This does not have to happen in the complex query case, since a query is used 
> (with the appropriate locking mechanisms) to create/populate the temp table 
> holding the results of the complex query.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17386) support LLAP workload management in HS2 (low level only)

2017-09-13 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165535#comment-16165535
 ] 

Zhiyuan Yang commented on HIVE-17386:
-

Sure, I will review it soon.

> support LLAP workload management in HS2 (low level only)
> 
>
> Key: HIVE-17386
> URL: https://issues.apache.org/jira/browse/HIVE-17386
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17386.01.only.patch, HIVE-17386.01.patch, 
> HIVE-17386.01.patch, HIVE-17386.02.patch, HIVE-17386.03.patch, 
> HIVE-17386.04.patch, HIVE-17386.only.patch, HIVE-17386.patch
>
>
> This makes use of HIVE-17297 and creates building blocks for workload 
> management policies, etc.
> For now, there are no policies - a single yarn queue is designated for all 
> LLAP query AMs, and the capacity is distributed equally.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17386) support LLAP workload management in HS2 (low level only)

2017-09-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165534#comment-16165534
 ] 

Sergey Shelukhin commented on HIVE-17386:
-

Test failures are unrelated. [~aplusplus] can you please take a look at the 
update? thanks

> support LLAP workload management in HS2 (low level only)
> 
>
> Key: HIVE-17386
> URL: https://issues.apache.org/jira/browse/HIVE-17386
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17386.01.only.patch, HIVE-17386.01.patch, 
> HIVE-17386.01.patch, HIVE-17386.02.patch, HIVE-17386.03.patch, 
> HIVE-17386.04.patch, HIVE-17386.only.patch, HIVE-17386.patch
>
>
> This makes use of HIVE-17297 and creates building blocks for workload 
> management policies, etc.
> For now, there are no policies - a single yarn queue is designated for all 
> LLAP query AMs, and the capacity is distributed equally.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-09-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165529#comment-16165529
 ] 

Sergey Shelukhin commented on HIVE-15665:
-

Test failures are unrelated. [~prasanth_j] can you take a look at the update?

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.01.patch, HIVE-15665.02.patch, 
> HIVE-15665.03.patch, HIVE-15665.04.patch, HIVE-15665.05.patch, 
> HIVE-15665.06.patch, HIVE-15665.07.patch, HIVE-15665.08.patch, 
> HIVE-15665.09.patch, HIVE-15665.10.patch, HIVE-15665.11.patch, 
> HIVE-15665.patch
>
>
> OrcFileMetadata internally has filestats, stripestats, etc., which are 
> allocated on the heap. On large data sets, this could have an impact on the 
> heap usage and the memory usage by different executors in LLAP.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17473) implement workload management pools

2017-09-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165522#comment-16165522
 ] 

Hive QA commented on HIVE-17473:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886780/HIVE-17473.01.only.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6806/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6806/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6806/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-09-14 00:23:32.128
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-6806/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-09-14 00:23:32.131
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at ea9ebe4 HIVE-17430: Add LOAD DATA test for blobstores (Yuzhou 
Sun, reviewed by Sergio Pena)
+ git clean -f -d
Removing 
llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/MetadataCache.java
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at ea9ebe4 HIVE-17430: Add LOAD DATA test for blobstores (Yuzhou 
Sun, reviewed by Sergio Pena)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-09-14 00:23:36.889
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java:50
error: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java: 
patch does not apply
error: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java: No 
such file or directory
error: ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestWorkloadManager.java: 
No such file or directory
error: patch failed: 
service/src/java/org/apache/hive/service/server/HiveServer2.java:165
error: service/src/java/org/apache/hive/service/server/HiveServer2.java: patch 
does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886780 - PreCommit-HIVE-Build

> implement workload management pools
> ---
>
> Key: HIVE-17473
> URL: https://issues.apache.org/jira/browse/HIVE-17473
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17473.01.only.patch, HIVE-17473.01.patch, 
> HIVE-17473.only.patch, HIVE-17473.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-09-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165519#comment-16165519
 ] 

Hive QA commented on HIVE-15665:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886777/HIVE-15665.11.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11040 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=156)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[drop_table_failure2]
 (batchId=89)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=234)
org.apache.hadoop.hive.ql.TestAcidOnTez.testCtasTezUnion (batchId=215)
org.apache.hadoop.hive.ql.TestAcidOnTez.testNonStandardConversion01 
(batchId=215)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6805/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6805/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6805/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886777 - PreCommit-HIVE-Build

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.01.patch, HIVE-15665.02.patch, 
> HIVE-15665.03.patch, HIVE-15665.04.patch, HIVE-15665.05.patch, 
> HIVE-15665.06.patch, HIVE-15665.07.patch, HIVE-15665.08.patch, 
> HIVE-15665.09.patch, HIVE-15665.10.patch, HIVE-15665.11.patch, 
> HIVE-15665.patch
>
>
> OrcFileMetadata internally has filestats, stripestats, etc., which are 
> allocated on the heap. On large data sets, this could have an impact on the 
> heap usage and the memory usage by different executors in LLAP.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17465) Statistics: Drill-down filters don't reduce row-counts progressively

2017-09-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17465:
---
Status: Patch Available  (was: Open)

> Statistics: Drill-down filters don't reduce row-counts progressively
> 
>
> Key: HIVE-17465
> URL: https://issues.apache.org/jira/browse/HIVE-17465
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Statistics
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-17465.1.patch, HIVE-17465.2.patch, 
> HIVE-17465.3.patch, HIVE-17465.4.patch, HIVE-17465.5.patch
>
>
> {code}
> explain select count(d_date_sk) from date_dim where d_year=2001 ;
> explain select count(d_date_sk) from date_dim where d_year=2001  and d_moy = 
> 9;
> explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = 9 
> and d_dom = 21;
> {code}
> All 3 queries end up with the same row-count estimates after the filter.
> {code}
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: (d_year = 2001) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: (d_year = 2001) (type: boolean)
> Statistics: Num rows: 363 Data size: 4356 Basic stats: 
> COMPLETE Column stats: COMPLETE
>  
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
> Statistics: Num rows: 363 Data size: 5808 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
> Statistics: Num rows: 363 Data size: 7260 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {code}
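
For illustration, here is a minimal sketch of how AND'ed predicates would be 
expected to compound the row-count estimate. This is not Hive's 
stats-annotation code, and the NDV values are assumptions made up for this 
example, not figures from the actual table:

{code}
// Sketch: conjunctive predicate selectivities should multiply.
// NDV values below are illustrative assumptions, not real date_dim stats.
public class SelectivitySketch {
  public static void main(String[] args) {
    long totalRows = 73049;        // date_dim row count from the plans above
    double selYear = 1.0 / 201;    // assumed ~201 distinct d_year values
    double selMoy  = 1.0 / 12;     // 12 distinct months
    double selDom  = 1.0 / 31;     // up to 31 distinct days of month

    System.out.println(Math.round(totalRows * selYear));                   // ~363
    System.out.println(Math.round(totalRows * selYear * selMoy));          // ~30
    System.out.println(Math.round(totalRows * selYear * selMoy * selDom)); // ~1
  }
}
{code}

Under these assumptions, the three plans above would be expected to estimate 
roughly 363, 30, and 1 rows respectively, rather than 363 in all three cases.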



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17465) Statistics: Drill-down filters don't reduce row-counts progressively

2017-09-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17465:
---
Attachment: HIVE-17465.5.patch

Latest patch (5) addresses review comments.

> Statistics: Drill-down filters don't reduce row-counts progressively
> 
>
> Key: HIVE-17465
> URL: https://issues.apache.org/jira/browse/HIVE-17465
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Statistics
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-17465.1.patch, HIVE-17465.2.patch, 
> HIVE-17465.3.patch, HIVE-17465.4.patch, HIVE-17465.5.patch
>
>
> {code}
> explain select count(d_date_sk) from date_dim where d_year=2001 ;
> explain select count(d_date_sk) from date_dim where d_year=2001  and d_moy = 
> 9;
> explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = 9 
> and d_dom = 21;
> {code}
> All 3 queries end up with the same row-count estimates after the filter.
> {code}
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: (d_year = 2001) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: (d_year = 2001) (type: boolean)
> Statistics: Num rows: 363 Data size: 4356 Basic stats: 
> COMPLETE Column stats: COMPLETE
>  
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
> Statistics: Num rows: 363 Data size: 5808 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
> Statistics: Num rows: 363 Data size: 7260 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17465) Statistics: Drill-down filters don't reduce row-counts progressively

2017-09-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17465:
---
Status: Open  (was: Patch Available)

> Statistics: Drill-down filters don't reduce row-counts progressively
> 
>
> Key: HIVE-17465
> URL: https://issues.apache.org/jira/browse/HIVE-17465
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Statistics
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-17465.1.patch, HIVE-17465.2.patch, 
> HIVE-17465.3.patch, HIVE-17465.4.patch
>
>
> {code}
> explain select count(d_date_sk) from date_dim where d_year=2001 ;
> explain select count(d_date_sk) from date_dim where d_year=2001  and d_moy = 
> 9;
> explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = 9 
> and d_dom = 21;
> {code}
> All 3 queries end up with the same row-count estimates after the filter.
> {code}
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: (d_year = 2001) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: (d_year = 2001) (type: boolean)
> Statistics: Num rows: 363 Data size: 4356 Basic stats: 
> COMPLETE Column stats: COMPLETE
>  
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
> Statistics: Num rows: 363 Data size: 5808 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
> Statistics: Num rows: 363 Data size: 7260 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17493) Improve PKFK cardinality estimation in Physical planning

2017-09-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165481#comment-16165481
 ] 

Ashutosh Chauhan commented on HIVE-17493:
-

+1 pending tests

> Improve PKFK cardinality estimation in Physical planning
> 
>
> Key: HIVE-17493
> URL: https://issues.apache.org/jira/browse/HIVE-17493
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17493.1.patch, HIVE-17493.2.patch, 
> HIVE-17493.3.patch
>
>
> Cardinality estimation of a join, after a PK-FK relation has been 
> ascertained, could be improved when the parent of the join operator is a 
> LEFT or RIGHT outer join.
> Currently the estimate is computed by estimating the reduction of rows on 
> the PK side and then applying that reduction to the FK-side row count. This 
> reduction estimate doesn't distinguish between INNER and OUTER joins; it 
> could be improved to handle outer joins better.
> TPC-DS query45 is impacted by this.
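
As a rough illustration of the estimate described above and of an 
outer-join-aware refinement, consider the following sketch; the names and 
structure are hypothetical, not Hive's actual stats code:

{code}
// Hypothetical sketch of the PK-FK join estimate discussed above.
enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER }

final class PkFkEstimateSketch {
  // pkReduction: fraction of PK-side rows surviving filters (0.0 to 1.0).
  static long estimateJoinRows(long fkRows, double pkReduction,
                               JoinType type, boolean fkSideIsPreserved) {
    // Current approach: scale the FK-side row count by the PK-side reduction.
    long innerEstimate = (long) (fkRows * pkReduction);
    // Refinement: if the FK side is the preserved side of an outer join,
    // unmatched FK rows are still emitted, so no reduction should apply.
    if (type != JoinType.INNER && fkSideIsPreserved) {
      return fkRows;
    }
    return innerEstimate;
  }
}
{code}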



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17508) Implement pool rules and triggers based on counters

2017-09-13 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165464#comment-16165464
 ] 

Prasanth Jayachandran commented on HIVE-17508:
--

[~sershe] Initial patch is up for review. It is not hooked up with the 
metastore rules yet (it is driven by configs for now).
Also, for elapsed time we can map the rule to hive.query.timeout.seconds, 
which HS2 already handles. Will add qtests after the metastore integration.


> Implement pool rules and triggers based on counters
> ---
>
> Key: HIVE-17508
> URL: https://issues.apache.org/jira/browse/HIVE-17508
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17508.1.patch, HIVE-17508.WIP.2.patch, 
> HIVE-17508.WIP.patch
>
>
> Workload management can define rules that are bound to a resource plan. Each 
> rule can have a trigger expression and an action associated with it. Trigger 
> expressions are evaluated at runtime after a configurable check interval; 
> based on them, actions like killing a query or moving a query to a different 
> pool will be invoked. A simple rule could be something like
> {code}
> CREATE RULE slow_query IN resource_plan_name
> WHEN execution_time_ms > 1
> MOVE TO slow_queue
> {code}
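
A minimal sketch of what evaluating such a counter-based trigger could look 
like; all names and the threshold here are hypothetical, not the patch's 
actual classes:

{code}
import java.util.Map;

// Hypothetical trigger shape; the real implementation is in the patch.
final class TriggerSketch {
  final String counter;   // e.g. "execution_time_ms"
  final long threshold;   // e.g. 10000 (assumed value for illustration)
  final String action;    // e.g. "MOVE TO slow_queue" or "KILL"

  TriggerSketch(String counter, long threshold, String action) {
    this.counter = counter;
    this.threshold = threshold;
    this.action = action;
  }

  // Called once per configurable check interval with the query's counters.
  boolean fires(Map<String, Long> queryCounters) {
    return queryCounters.getOrDefault(counter, 0L) > threshold;
  }
}
{code}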



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17508) Implement pool rules and triggers based on counters

2017-09-13 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17508:
-
Status: Patch Available  (was: Open)

> Implement pool rules and triggers based on counters
> ---
>
> Key: HIVE-17508
> URL: https://issues.apache.org/jira/browse/HIVE-17508
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17508.1.patch, HIVE-17508.WIP.2.patch, 
> HIVE-17508.WIP.patch
>
>
> Workload management can define rules that are bound to a resource plan. Each 
> rule can have a trigger expression and an action associated with it. Trigger 
> expressions are evaluated at runtime after a configurable check interval; 
> based on them, actions like killing a query or moving a query to a different 
> pool will be invoked. A simple rule could be something like
> {code}
> CREATE RULE slow_query IN resource_plan_name
> WHEN execution_time_ms > 1
> MOVE TO slow_queue
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17508) Implement pool rules and triggers based on counters

2017-09-13 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17508:
-
Attachment: HIVE-17508.1.patch

> Implement pool rules and triggers based on counters
> ---
>
> Key: HIVE-17508
> URL: https://issues.apache.org/jira/browse/HIVE-17508
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17508.1.patch, HIVE-17508.WIP.2.patch, 
> HIVE-17508.WIP.patch
>
>
> Workload management can define rules that are bound to a resource plan. Each 
> rule can have a trigger expression and an action associated with it. Trigger 
> expressions are evaluated at runtime after a configurable check interval; 
> based on them, actions like killing a query or moving a query to a different 
> pool will be invoked. A simple rule could be something like
> {code}
> CREATE RULE slow_query IN resource_plan_name
> WHEN execution_time_ms > 1
> MOVE TO slow_queue
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17511) Error while populating orc cache in llap

2017-09-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165443#comment-16165443
 ] 

Sergey Shelukhin commented on HIVE-17511:
-

Frankly, the only explanation I can see is that the same ProcCacheChunk is 
returned twice from the object pool due to a bug (or is returned TO the pool 
twice by a thread and then legitimately handed out twice). It seems like the 
lists from two racing threads are merged at an item while both threads are 
just straightforwardly uncompressing ORC CBs linearly from a 100% cache miss. 
At the same time, given that the item is initialized after it is obtained, I'd 
expect one of the lists to have a completely invalid item, whereas here it 
looks like only one link is invalid while all the lists are contiguous, 
without the item that would have been overwritten. So it's really weird. 
Looking at it now.
It looks like the ordering-checks patch that could have made the error clearer 
is missing from this build; I'm backporting it for now.
The pool has pretty good multi-threaded tests, so I'm not sure yet how this 
can happen.
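
To make the suspected failure mode concrete, here is a toy pool (not Hive's 
actual pool implementation) showing how a buggy double return hands the same 
object out twice:

{code}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Toy pool with no double-return guard; illustrates the hazard only.
final class ToyPool<T> {
  private final Deque<T> free = new ArrayDeque<>();

  synchronized void offer(T t) { free.push(t); }

  synchronized T take(Supplier<T> factory) {
    return free.isEmpty() ? factory.get() : free.pop();
  }

  public static void main(String[] args) {
    ToyPool<StringBuilder> pool = new ToyPool<>();
    StringBuilder chunk = new StringBuilder();
    pool.offer(chunk);
    pool.offer(chunk); // buggy second return of the same object
    StringBuilder a = pool.take(StringBuilder::new);
    StringBuilder b = pool.take(StringBuilder::new);
    System.out.println(a == b); // true: two callers now share one "chunk"
  }
}
{code}

Two threads that each believe they own the chunk would then link it into two 
different lists, which matches the merged-lists symptom described above.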


> Error while populating orc cache in llap
> 
>
> Key: HIVE-17511
> URL: https://issues.apache.org/jira/browse/HIVE-17511
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Reporter: Ashutosh Chauhan
>Assignee: Sergey Shelukhin
>
> Observed that, while querying, an error is thrown while loading the cache in 
> LLAP daemons.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17523) Insert into druid table hangs Hive server2 in an infinite loop

2017-09-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165433#comment-16165433
 ] 

Hive QA commented on HIVE-17523:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886768/HIVE-17523.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11041 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=156)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[drop_table_failure2]
 (batchId=89)
org.apache.hadoop.hive.ql.TestAcidOnTez.testCtasTezUnion (batchId=215)
org.apache.hadoop.hive.ql.TestAcidOnTez.testNonStandardConversion01 
(batchId=215)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6804/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6804/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6804/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886768 - PreCommit-HIVE-Build

> Insert into druid table hangs Hive server2 in an infinite loop
> --
>
> Key: HIVE-17523
> URL: https://issues.apache.org/jira/browse/HIVE-17523
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: Nishant Bangarwa
>  Labels: pull-request-available
> Attachments: HIVE-17523.patch
>
>
> Inserting data via INSERT INTO a table backed by Druid can lead to a Hive 
> server hang.
> This is due to a bug in the naming of Druid segment partitions.
> To reproduce the issue:
> {code}
> drop table login_hive;
> create table login_hive(`timecolumn` timestamp, `userid` string, `num_l` 
> double);
> insert into login_hive values ('2015-01-01 00:00:00', 'user1', 5);
> insert into login_hive values ('2015-01-01 01:00:00', 'user2', 4);
> insert into login_hive values ('2015-01-01 02:00:00', 'user3', 2);
> insert into login_hive values ('2015-01-02 00:00:00', 'user1', 1);
> insert into login_hive values ('2015-01-02 01:00:00', 'user2', 2);
> insert into login_hive values ('2015-01-02 02:00:00', 'user3', 8);
> insert into login_hive values ('2015-01-03 00:00:00', 'user1', 5);
> insert into login_hive values ('2015-01-03 01:00:00', 'user2', 9);
> insert into login_hive values ('2015-01-03 04:00:00', 'user3', 2);
> insert into login_hive values ('2015-03-09 00:00:00', 'user3', 5);
> insert into login_hive values ('2015-03-09 01:00:00', 'user1', 0);
> insert into login_hive values ('2015-03-09 05:00:00', 'user2', 0);
> drop table login_druid;
> CREATE TABLE login_druid
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "druid_login_test_tmp", 
> "druid.segment.granularity" = "DAY", "druid.query.granularity" = "HOUR")
> AS
> select `timecolumn` as `__time`, `userid`, `num_l` FROM login_hive;
> select * FROM login_druid;
> insert into login_druid values ('2015-03-09 05:00:00', 'user4', 0); 
> {code}
> This patch unifies the pushing and segment-naming logic by using the Druid 
> data segment pusher as much as possible.
> It also includes some minor code refactoring and test enhancements.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17196) CM: ReplCopyTask should retain the original file names even if copied from CM path.

2017-09-13 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165407#comment-16165407
 ] 

Daniel Dai commented on HIVE-17196:
---

[~sankarh], can you review?

> CM: ReplCopyTask should retain the original file names even if copied from CM 
> path.
> ---
>
> Key: HIVE-17196
> URL: https://issues.apache.org/jira/browse/HIVE-17196
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Daniel Dai
> Fix For: 3.0.0
>
> Attachments: HIVE-17196.1.patch
>
>
> Consider the scenario below:
> 1. Insert into table T1 with value (X).
> 2. Insert into table T1 with value (X).
> 3. Truncate the table T1.
> – This step backs up 2 files with the same content to cmroot, which ends up 
> as one file in cmroot since the checksums match.
> 4. Incremental repl with the above 3 operations.
> – In this step, both insert event files are read from cmroot, where the copy 
> of one overwrites the other because the file name in the cm path is the same 
> (the checksum is used as the file name).
> So this leads to data loss, and hence it is necessary to retain the original 
> file names even when we copy from the cm path.
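
A sketch of the idea only (ReplCopyTask's real logic differs in detail): key 
the copy destination on the original file name carried with the event, rather 
than on the checksum-named CM file:

{code}
import org.apache.hadoop.fs.Path;

// Sketch: derive the destination from the original name, not the CM name.
final class CmCopySketch {
  static Path destinationFor(Path cmSource, Path destDir, String originalName) {
    // cmSource is checksum-named (e.g. cmroot/<checksum>) and is only the
    // copy source; the destination keeps the event's original file name
    // ("000000_0" would be an illustrative example). Two events whose files
    // collapsed into one CM file then land on their own destination names,
    // so neither copy overwrites the other.
    return new Path(destDir, originalName);
  }
}
{code}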



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17508) Implement pool rules and triggers based on counters

2017-09-13 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17508:
-
Attachment: HIVE-17508.WIP.2.patch

Added simple expression parsing from string. Some more unit tests. 

> Implement pool rules and triggers based on counters
> ---
>
> Key: HIVE-17508
> URL: https://issues.apache.org/jira/browse/HIVE-17508
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17508.WIP.2.patch, HIVE-17508.WIP.patch
>
>
> Workload management can define rules that are bound to a resource plan. Each 
> rule can have a trigger expression and an action associated with it. Trigger 
> expressions are evaluated at runtime after a configurable check interval; 
> based on them, actions like killing a query or moving a query to a different 
> pool will be invoked. A simple rule could be something like
> {code}
> CREATE RULE slow_query IN resource_plan_name
> WHEN execution_time_ms > 1
> MOVE TO slow_queue
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17493) Improve PKFK cardinality estimation in Physical planning

2017-09-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17493:
---
Attachment: HIVE-17493.3.patch

> Improve PKFK cardinality estimation in Physical planning
> 
>
> Key: HIVE-17493
> URL: https://issues.apache.org/jira/browse/HIVE-17493
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17493.1.patch, HIVE-17493.2.patch, 
> HIVE-17493.3.patch
>
>
> Cardinality estimation of a join, after a PK-FK relation has been 
> ascertained, could be improved when the parent of the join operator is a 
> LEFT or RIGHT outer join.
> Currently the estimate is computed by estimating the reduction of rows on 
> the PK side and then applying that reduction to the FK-side row count. This 
> reduction estimate doesn't distinguish between INNER and OUTER joins; it 
> could be improved to handle outer joins better.
> TPC-DS query45 is impacted by this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17493) Improve PKFK cardinality estimation in Physical planning

2017-09-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17493:
---
Status: Patch Available  (was: Open)

> Improve PKFK cardinality estimation in Physical planning
> 
>
> Key: HIVE-17493
> URL: https://issues.apache.org/jira/browse/HIVE-17493
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17493.1.patch, HIVE-17493.2.patch, 
> HIVE-17493.3.patch
>
>
> Cardinality estimation of a join, after a PK-FK relation has been 
> ascertained, could be improved when the parent of the join operator is a 
> LEFT or RIGHT outer join.
> Currently the estimate is computed by estimating the reduction of rows on 
> the PK side and then applying that reduction to the FK-side row count. This 
> reduction estimate doesn't distinguish between INNER and OUTER joins; it 
> could be improved to handle outer joins better.
> TPC-DS query45 is impacted by this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17493) Improve PKFK cardinality estimation in Physical planning

2017-09-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17493:
---
Status: Open  (was: Patch Available)

> Improve PKFK cardinality estimation in Physical planning
> 
>
> Key: HIVE-17493
> URL: https://issues.apache.org/jira/browse/HIVE-17493
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17493.1.patch, HIVE-17493.2.patch, 
> HIVE-17493.3.patch
>
>
> Cardinality estimation of a join, after a PK-FK relation has been 
> ascertained, could be improved when the parent of the join operator is a 
> LEFT or RIGHT outer join.
> Currently the estimate is computed by estimating the reduction of rows on 
> the PK side and then applying that reduction to the FK-side row count. This 
> reduction estimate doesn't distinguish between INNER and OUTER joins; it 
> could be improved to handle outer joins better.
> TPC-DS query45 is impacted by this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17465) Statistics: Drill-down filters don't reduce row-counts progressively

2017-09-13 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165381#comment-16165381
 ] 

Vineet Garg commented on HIVE-17465:


[~ashutoshc] A new patch is uploaded, and the Review Board link is attached to the JIRA.

> Statistics: Drill-down filters don't reduce row-counts progressively
> 
>
> Key: HIVE-17465
> URL: https://issues.apache.org/jira/browse/HIVE-17465
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Statistics
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-17465.1.patch, HIVE-17465.2.patch, 
> HIVE-17465.3.patch, HIVE-17465.4.patch
>
>
> {code}
> explain select count(d_date_sk) from date_dim where d_year=2001 ;
> explain select count(d_date_sk) from date_dim where d_year=2001  and d_moy = 
> 9;
> explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = 9 
> and d_dom = 21;
> {code}
> All 3 queries end up with the same row-count estimates after the filter.
> {code}
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: (d_year = 2001) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: (d_year = 2001) (type: boolean)
> Statistics: Num rows: 363 Data size: 4356 Basic stats: 
> COMPLETE Column stats: COMPLETE
>  
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
> Statistics: Num rows: 363 Data size: 5808 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
> Statistics: Num rows: 363 Data size: 7260 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17465) Statistics: Drill-down filters don't reduce row-counts progressively

2017-09-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17465:
---
Status: Patch Available  (was: Open)

> Statistics: Drill-down filters don't reduce row-counts progressively
> 
>
> Key: HIVE-17465
> URL: https://issues.apache.org/jira/browse/HIVE-17465
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Statistics
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-17465.1.patch, HIVE-17465.2.patch, 
> HIVE-17465.3.patch, HIVE-17465.4.patch
>
>
> {code}
> explain select count(d_date_sk) from date_dim where d_year=2001 ;
> explain select count(d_date_sk) from date_dim where d_year=2001  and d_moy = 
> 9;
> explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = 9 
> and d_dom = 21;
> {code}
> All 3 queries end up with the same row-count estimates after the filter.
> {code}
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: (d_year = 2001) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: (d_year = 2001) (type: boolean)
> Statistics: Num rows: 363 Data size: 4356 Basic stats: 
> COMPLETE Column stats: COMPLETE
>  
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
> Statistics: Num rows: 363 Data size: 5808 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
> Statistics: Num rows: 363 Data size: 7260 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17465) Statistics: Drill-down filters don't reduce row-counts progressively

2017-09-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17465:
---
Attachment: HIVE-17465.4.patch

> Statistics: Drill-down filters don't reduce row-counts progressively
> 
>
> Key: HIVE-17465
> URL: https://issues.apache.org/jira/browse/HIVE-17465
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Statistics
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-17465.1.patch, HIVE-17465.2.patch, 
> HIVE-17465.3.patch, HIVE-17465.4.patch
>
>
> {code}
> explain select count(d_date_sk) from date_dim where d_year=2001 ;
> explain select count(d_date_sk) from date_dim where d_year=2001  and d_moy = 
> 9;
> explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = 9 
> and d_dom = 21;
> {code}
> All 3 queries end up with the same row-count estimates after the filter.
> {code}
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: (d_year = 2001) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: (d_year = 2001) (type: boolean)
> Statistics: Num rows: 363 Data size: 4356 Basic stats: 
> COMPLETE Column stats: COMPLETE
>  
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
> Statistics: Num rows: 363 Data size: 5808 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
> Statistics: Num rows: 363 Data size: 7260 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17465) Statistics: Drill-down filters don't reduce row-counts progressively

2017-09-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17465:
---
Status: Open  (was: Patch Available)

> Statistics: Drill-down filters don't reduce row-counts progressively
> 
>
> Key: HIVE-17465
> URL: https://issues.apache.org/jira/browse/HIVE-17465
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Statistics
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-17465.1.patch, HIVE-17465.2.patch, 
> HIVE-17465.3.patch, HIVE-17465.4.patch
>
>
> {code}
> explain select count(d_date_sk) from date_dim where d_year=2001 ;
> explain select count(d_date_sk) from date_dim where d_year=2001  and d_moy = 
> 9;
> explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = 9 
> and d_dom = 21;
> {code}
> All 3 queries end up with the same row-count estimates after the filter.
> {code}
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: (d_year = 2001) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: (d_year = 2001) (type: boolean)
> Statistics: Num rows: 363 Data size: 4356 Basic stats: 
> COMPLETE Column stats: COMPLETE
>  
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
> Statistics: Num rows: 363 Data size: 5808 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
> Statistics: Num rows: 363 Data size: 7260 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17488) Move first set of classes to standalone metastore

2017-09-13 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165335#comment-16165335
 ] 

Alan Gates commented on HIVE-17488:
---

Most of these test failures are the usual recurring ones. TestAcidOnTez fails 
for me on master. TestExportImport passes for me on master and with this patch.

> Move first set of classes to standalone metastore
> -
>
> Key: HIVE-17488
> URL: https://issues.apache.org/jira/browse/HIVE-17488
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17488.2.patch, HIVE-17488.3.patch, HIVE-17488.patch
>
>
> There are a whole set of classes that can be moved with few changes other 
> than the config file, shims, etc.  This task will move those classes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17386) support LLAP workload management in HS2 (low level only)

2017-09-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165328#comment-16165328
 ] 

Hive QA commented on HIVE-17386:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886767/HIVE-17386.04.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11046 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=156)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[drop_table_failure2]
 (batchId=89)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
org.apache.hadoop.hive.ql.TestAcidOnTez.testCtasTezUnion (batchId=215)
org.apache.hadoop.hive.ql.TestAcidOnTez.testNonStandardConversion01 
(batchId=215)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6803/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6803/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6803/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886767 - PreCommit-HIVE-Build

> support LLAP workload management in HS2 (low level only)
> 
>
> Key: HIVE-17386
> URL: https://issues.apache.org/jira/browse/HIVE-17386
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17386.01.only.patch, HIVE-17386.01.patch, 
> HIVE-17386.01.patch, HIVE-17386.02.patch, HIVE-17386.03.patch, 
> HIVE-17386.04.patch, HIVE-17386.only.patch, HIVE-17386.patch
>
>
> This makes use of HIVE-17297 and creates building blocks for workload 
> management policies, etc.
> For now, there are no policies - a single YARN queue is designated for all 
> LLAP query AMs, and the capacity is distributed equally.
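
The interim policy amounts to a trivial computation, sketched here purely for 
illustration:

{code}
// Sketch of the interim policy: the single queue's capacity split evenly
// across the active LLAP query AMs.
final class EqualShareSketch {
  static double clusterFractionPerAm(double queueFraction, int activeQueryAms) {
    return activeQueryAms == 0 ? 0.0 : queueFraction / activeQueryAms;
  }
}
{code}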



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17465) Statistics: Drill-down filters don't reduce row-counts progressively

2017-09-13 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165300#comment-16165300
 ] 

Vineet Garg commented on HIVE-17465:


[~ashutoshc] I am investigating a few suspicious test failures. I'll create an 
RB as soon as I am done with the investigation.

> Statistics: Drill-down filters don't reduce row-counts progressively
> 
>
> Key: HIVE-17465
> URL: https://issues.apache.org/jira/browse/HIVE-17465
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Statistics
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-17465.1.patch, HIVE-17465.2.patch, 
> HIVE-17465.3.patch
>
>
> {code}
> explain select count(d_date_sk) from date_dim where d_year=2001 ;
> explain select count(d_date_sk) from date_dim where d_year=2001  and d_moy = 
> 9;
> explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = 9 
> and d_dom = 21;
> {code}
> All 3 queries end up with the same row-count estimates after the filter.
> {code}
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: (d_year = 2001) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: (d_year = 2001) (type: boolean)
> Statistics: Num rows: 363 Data size: 4356 Basic stats: 
> COMPLETE Column stats: COMPLETE
>  
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9)) (type: 
> boolean)
> Statistics: Num rows: 363 Data size: 5808 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: date_dim
>   filterExpr: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
>   Statistics: Num rows: 73049 Data size: 82034027 Basic 
> stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 9) and (d_dom = 
> 21)) (type: boolean)
> Statistics: Num rows: 363 Data size: 7260 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17493) Improve PKFK cardinality estimation in Physical planning

2017-09-13 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165298#comment-16165298
 ] 

Vineet Garg commented on HIVE-17493:


Yeah looks like it. I'll rebase and re-upload the patch.

> Improve PKFK cardinality estimation in Physical planning
> 
>
> Key: HIVE-17493
> URL: https://issues.apache.org/jira/browse/HIVE-17493
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17493.1.patch, HIVE-17493.2.patch
>
>
> Cardinality estimation of a join, after a PK-FK relation has been 
> ascertained, could be improved when the parent of the join operator is a 
> LEFT or RIGHT outer join.
> Currently the estimate is computed by estimating the reduction of rows on 
> the PK side and then applying that reduction to the FK-side row count. This 
> reduction estimate doesn't distinguish between INNER and OUTER joins; it 
> could be improved to handle outer joins better.
> TPC-DS query45 is impacted by this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17493) Improve PKFK cardinality estimation in Physical planning

2017-09-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165297#comment-16165297
 ] 

Ashutosh Chauhan commented on HIVE-17493:
-

Seems like the rebase went wrong? It contains many irrelevant changes.

> Improve PKFK cardinality estimation in Physical planning
> 
>
> Key: HIVE-17493
> URL: https://issues.apache.org/jira/browse/HIVE-17493
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17493.1.patch, HIVE-17493.2.patch
>
>
> Cardinality estimation of a join, after a PK-FK relation has been 
> ascertained, could be improved when the parent of the join operator is a 
> LEFT or RIGHT outer join.
> Currently the estimate is computed by estimating the reduction of rows on 
> the PK side and then applying that reduction to the FK-side row count. This 
> reduction estimate doesn't distinguish between INNER and OUTER joins; it 
> could be improved to handle outer joins better.
> TPC-DS query45 is impacted by this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17496) Bootstrap repl is not cleaning up staging dirs

2017-09-13 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17496:
--
Status: Open  (was: Patch Available)

> Bootstrap repl is not cleaning up staging dirs
> --
>
> Key: HIVE-17496
> URL: https://issues.apache.org/jira/browse/HIVE-17496
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17496.1.patch, HIVE-17496.2.patch, 
> HIVE-17496.3.patch
>
>
> This will put more pressure on the HDFS file limit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17496) Bootstrap repl is not cleaning up staging dirs

2017-09-13 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17496:
--
Status: Patch Available  (was: Open)

> Bootstrap repl is not cleaning up staging dirs
> --
>
> Key: HIVE-17496
> URL: https://issues.apache.org/jira/browse/HIVE-17496
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17496.1.patch, HIVE-17496.2.patch, 
> HIVE-17496.3.patch
>
>
> This will put more pressure on the HDFS file limit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17496) Bootstrap repl is not cleaning up staging dirs

2017-09-13 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17496:
--
Attachment: HIVE-17496.3.patch

> Bootstrap repl is not cleaning up staging dirs
> --
>
> Key: HIVE-17496
> URL: https://issues.apache.org/jira/browse/HIVE-17496
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17496.1.patch, HIVE-17496.2.patch, 
> HIVE-17496.3.patch
>
>
> This will put more pressure on the HDFS file limit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17496) Bootstrap repl is not cleaning up staging dirs

2017-09-13 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17496:
--
Attachment: (was: HIVE-17496.3.patch)

> Bootstrap repl is not cleaning up staging dirs
> --
>
> Key: HIVE-17496
> URL: https://issues.apache.org/jira/browse/HIVE-17496
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17496.1.patch, HIVE-17496.2.patch, 
> HIVE-17496.3.patch
>
>
> This will put more pressure on the HDFS file limit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17466) Metastore API to list unique partition-key-value combinations

2017-09-13 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165262#comment-16165262
 ] 

Mithun Radhakrishnan commented on HIVE-17466:
-

+1. The failures are unrelated, and tracked separately. I'll check this in 
shortly, unless there is objection.

> Metastore API to list unique partition-key-value combinations
> -
>
> Key: HIVE-17466
> URL: https://issues.apache.org/jira/browse/HIVE-17466
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17466.1.patch, HIVE-17466.2-branch-2.patch, 
> HIVE-17466.2.patch, HIVE-17466.3.patch
>
>
> Raising this on behalf of [~thiruvel], who wrote this initially as part of a 
> tangential "data-discovery" system.
> Programs like Apache Oozie, Apache Falcon (or Yahoo GDM), etc. launch 
> workflows based on the availability of tables/partitions. Partitions are 
> currently discovered by listing partitions using (what boils down to) 
> {{HiveMetaStoreClient.listPartitions()}}. This can be slow and cumbersome, 
> given that {{Partition}} objects are heavyweight and carry redundant 
> information. The alternative is to use partition-names, which will need 
> client-side parsing to extract part-key values.
> When checking which hourly partitions for a particular day have been 
> published already, it would be preferable to have an API that pushed down 
> part-key extraction into the {{RawStore}} layer, and returned key-values as 
> the result. This would be similar to how {{SELECT DISTINCT part_key FROM 
> my_table;}} would run, but at the {{HiveMetaStoreClient}} level.
> Here's what we've been using at Yahoo.
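
For illustration, a hypothetical client-side shape for such an API; the real 
signature is whatever the attached patches define:

{code}
import java.util.List;

// Hypothetical interface: roughly "SELECT DISTINCT part_key FROM my_table"
// pushed down into the RawStore, returning key values per unique combination
// instead of heavyweight Partition objects.
interface PartitionValueLister {
  List<List<String>> listPartitionValues(String dbName, String tableName,
                                         List<String> partKeys,
                                         boolean ascending, short maxResults);
}
{code}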



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17493) Improve PKFK cardinality estimation in Physical planning

2017-09-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165215#comment-16165215
 ] 

Hive QA commented on HIVE-17493:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886757/HIVE-17493.2.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 11036 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=156)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=234)
org.apache.hadoop.hive.ql.TestAcidOnTez.testCtasTezUnion (batchId=215)
org.apache.hadoop.hive.ql.TestAcidOnTez.testNonStandardConversion01 
(batchId=215)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6802/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6802/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6802/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886757 - PreCommit-HIVE-Build

> Improve PKFK cardinality estimation in Physical planning
> 
>
> Key: HIVE-17493
> URL: https://issues.apache.org/jira/browse/HIVE-17493
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17493.1.patch, HIVE-17493.2.patch
>
>
> Cardinality estimation of a join, after a PK-FK relation has been 
> ascertained, could be improved when the parent of the join operator is a 
> LEFT or RIGHT outer join.
> Currently the estimate is computed by estimating the reduction of rows on 
> the PK side and then applying that reduction to the FK-side row count. This 
> reduction estimate doesn't distinguish between INNER and OUTER joins; it 
> could be improved to handle outer joins better.
> TPC-DS query45 is impacted by this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17527) Support replication for rename/move table across database

2017-09-13 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17527:

Status: Patch Available  (was: Open)

> Support replication for rename/move table across database
> -
>
> Key: HIVE-17527
> URL: https://issues.apache.org/jira/browse/HIVE-17527
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, pull-request-available, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17527.01.patch
>
>
> Renaming/moving a table across databases should be supported by replication. 
> The scenario is as follows.
> 1. Create 2 databases (db1 and db2) in the source cluster.
> 2. Create the table db1.tbl1.
> 3. Run bootstrap replication for db1 and db2 to the target cluster.
> 4. Rename db1.tbl1 to db2.tbl1 in the source.
> 5. Run incremental replication for both db1 and db2.
> - The db1 dump misses the rename operation, as no event is generated for 
> db1. So the table still exists after load.
> - The db2 load skips the rename event, as the source table is missing in the 
> target.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17527) Support replication for rename/move table across database

2017-09-13 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17527:

Attachment: HIVE-17527.01.patch

Added 01.patch with the updates below.
- Modified the rename table operation into DropTable+CreateTable+ events when 
the table is renamed/moved across databases.
Request review from [~thejas]/[~daijy]/[~anishek]!
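
A rough sketch of the conversion described above; the event names are 
illustrative strings, not the actual message classes:

{code}
import java.util.List;

// Sketch: a cross-database rename becomes drop + create in the two streams.
final class RenameEventSketch {
  static List<String> eventsFor(String srcDb, String dstDb, String table) {
    if (srcDb.equalsIgnoreCase(dstDb)) {
      return List.of("ALTER_TABLE(rename " + table + ")");
    }
    // Each single-database replication policy then sees a consistent
    // sequence: the source db's stream drops, the destination's creates.
    return List.of("DROP_TABLE(" + srcDb + "." + table + ")",
                   "CREATE_TABLE(" + dstDb + "." + table + ")");
  }
}
{code}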

> Support replication for rename/move table across database
> -
>
> Key: HIVE-17527
> URL: https://issues.apache.org/jira/browse/HIVE-17527
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, pull-request-available, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17527.01.patch
>
>
> Renaming/moving a table across databases should be supported by replication. 
> The scenario is as follows.
> 1. Create 2 databases (db1 and db2) in the source cluster.
> 2. Create the table db1.tbl1.
> 3. Run bootstrap replication for db1 and db2 to the target cluster.
> 4. Rename db1.tbl1 to db2.tbl1 in the source.
> 5. Run incremental replication for both db1 and db2.
> - The db1 dump misses the rename operation, as no event is generated for 
> db1. So the table still exists after load.
> - The db2 load skips the rename event, as the source table is missing in the 
> target.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17511) Error while populating orc cache in llap

2017-09-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165167#comment-16165167
 ] 

Sergey Shelukhin commented on HIVE-17511:
-

Looks like there's some weird race condition, in 2 different threads, just 
before the first error due to the problematic cache buffer appears. Looking...

> Error while populating orc cache in llap
> 
>
> Key: HIVE-17511
> URL: https://issues.apache.org/jira/browse/HIVE-17511
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Reporter: Ashutosh Chauhan
>Assignee: Sergey Shelukhin
>
> Observed that, while querying, an error is thrown while loading the cache in 
> LLAP daemons.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-15212) merge branch into master

2017-09-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165163#comment-16165163
 ] 

Sergey Shelukhin commented on HIVE-15212:
-

Nm, looks like the logic takes the directory name and doesn't actually rely on 
its exact format. We'll see.

> merge branch into master
> 
>
> Key: HIVE-15212
> URL: https://issues.apache.org/jira/browse/HIVE-15212
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15212.01.patch, HIVE-15212.02.patch, 
> HIVE-15212.03.patch, HIVE-15212.04.patch, HIVE-15212.05.patch, 
> HIVE-15212.06.patch, HIVE-15212.07.patch, HIVE-15212.08.patch, 
> HIVE-15212.09.patch, HIVE-15212.10.patch, HIVE-15212.11.patch, 
> HIVE-15212.12.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17422) Skip non-native/temporary tables for all major table/partition related scenarios

2017-09-13 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-17422:
--
Status: Patch Available  (was: Open)

> Skip non-native/temporary tables for all major table/partition related 
> scenarios
> 
>
> Key: HIVE-17422
> URL: https://issues.apache.org/jira/browse/HIVE-17422
> Project: Hive
>  Issue Type: Improvement
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17422.1.patch, HIVE-17422.2.patch
>
>
> Currently, during an incremental dump, non-native/temporary table info is 
> partially dumped in the metadata file and is ignored later by the repl load. 
> We can optimize this by moving the check (whether the table should be 
> exported or not) earlier, so that we don't save any info to the dump file 
> for such tables. CreateTableHandler already has this optimization, so we 
> just need to apply similar logic to the other scenarios.
> The change is to apply the EximUtil.shouldExportTable check to all scenarios 
> (e.g. alter table) that call into the common dump method. 
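
A sketch of the hoisted check; only the check's name 
(EximUtil.shouldExportTable) comes from the description above, everything else 
is illustrative:

{code}
import java.util.function.Predicate;

// Illustrative dump flow: decide before writing anything at all.
final class DumpSketch<T> {
  private final Predicate<T> shouldExport; // stand-in for EximUtil.shouldExportTable

  DumpSketch(Predicate<T> shouldExport) { this.shouldExport = shouldExport; }

  void dump(T table, Runnable writeMetadataAndData) {
    if (!shouldExport.test(table)) {
      return; // non-native/temporary: write nothing, not even partial metadata
    }
    writeMetadataAndData.run();
  }
}
{code}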



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17513) Refactor PathUtils to not contain instance fields

2017-09-13 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165157#comment-16165157
 ] 

Tao Li commented on HIVE-17513:
---

Test result looks good.

> Refactor PathUtils to not contain instance fields
> -
>
> Key: HIVE-17513
> URL: https://issues.apache.org/jira/browse/HIVE-17513
> Project: Hive
>  Issue Type: Improvement
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Minor
> Attachments: HIVE-17513.1.patch
>
>
> This util class should just provide the static helper methods.
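
An illustrative shape for such a refactor (the real PathUtils differs):

{code}
import org.apache.hadoop.fs.Path;

// After the refactor: no instance state; all inputs arrive as parameters.
final class PathUtilsSketch {
  private PathUtilsSketch() { } // not instantiable, hence no instance fields

  static Path copyPath(Path base, String dbName, String tableName) {
    return new Path(new Path(base, dbName), tableName);
  }
}
{code}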



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-15212) merge branch into master

2017-09-13 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15212:

Attachment: HIVE-15212.12.patch

Rebasing again. It doesn't look like the union prefix change broke any tests 
(that I ran). I thought I relied on that somewhere; need to double-check. For 
now, at least I'll get some results here.

> merge branch into master
> 
>
> Key: HIVE-15212
> URL: https://issues.apache.org/jira/browse/HIVE-15212
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15212.01.patch, HIVE-15212.02.patch, 
> HIVE-15212.03.patch, HIVE-15212.04.patch, HIVE-15212.05.patch, 
> HIVE-15212.06.patch, HIVE-15212.07.patch, HIVE-15212.08.patch, 
> HIVE-15212.09.patch, HIVE-15212.10.patch, HIVE-15212.11.patch, 
> HIVE-15212.12.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work stopped] (HIVE-17527) Support replication for rename/move table across database

2017-09-13 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17527 stopped by Sankar Hariappan.
---
> Support replication for rename/move table across database
> -
>
> Key: HIVE-17527
> URL: https://issues.apache.org/jira/browse/HIVE-17527
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, pull-request-available, replication
> Fix For: 3.0.0
>
>
> Rename/move table across database should be supported for replication. The 
> scenario is as follows.
> 1. Create 2 databases (db1 and db2) in source cluster.
> 2. Create the table db1.tbl1.
> 3. Run bootstrap replication for db1 and db2 to target cluster.
> 4. Rename db1.tbl1 to db2.tbl1 in source.
> 5. Run incremental replication for both db1 and db2.
> - The db1 dump misses the rename-table operation, as no event is generated 
> for db1, so the table still exists after load.
> - The db2 load skips the rename event, as the source table is missing in the 
> target.
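A minimal repro sketch of the scenario above, expressed as JDBC calls against 
a HiveServer2 endpoint on the source cluster; the URL, credentials, and 
repl-id handling are placeholders:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RenameAcrossDbRepro {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection(
            "jdbc:hive2://source-hs2:10000/default", "hive", "");
         Statement st = conn.createStatement()) {
      st.execute("CREATE DATABASE IF NOT EXISTS db1");
      st.execute("CREATE DATABASE IF NOT EXISTS db2");
      st.execute("CREATE TABLE IF NOT EXISTS db1.tbl1 (id INT)");
      st.execute("REPL DUMP db1");  // bootstrap dumps; REPL LOAD runs on target
      st.execute("REPL DUMP db2");
      st.execute("ALTER TABLE db1.tbl1 RENAME TO db2.tbl1");
      // Incremental dumps (FROM the last repl id of each bootstrap) then hit
      // the two failure modes described above: db1 emits no event for the
      // rename, and db2's load cannot find the source table.
    }
  }
}
{code}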



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17527) Support replication for rename/move table across database

2017-09-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165117#comment-16165117
 ] 

ASF GitHub Bot commented on HIVE-17527:
---

GitHub user sankarh opened a pull request:

https://github.com/apache/hive/pull/250

HIVE-17527: Support replication for rename/move table across database



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sankarh/hive HIVE-17527

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/250.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #250


commit 223682c15e418b5cfbfecf71575229722f3dca25
Author: Sankar Hariappan 
Date:   2017-09-13T19:03:30Z

HIVE-17527: Support replication for rename/move table across database




> Support replication for rename/move table across database
> -
>
> Key: HIVE-17527
> URL: https://issues.apache.org/jira/browse/HIVE-17527
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, pull-request-available, replication
> Fix For: 3.0.0
>
>
> Rename/move table across database should be supported for replication. The 
> scenario is as follows.
> 1. Create 2 databases (db1 and db2) in source cluster.
> 2. Create the table db1.tbl1.
> 3. Run bootstrap replication for db1 and db2 to target cluster.
> 4. Rename db1.tbl1 to db2.tbl1 in source.
> 5. Run incremental replication for both db1 and db2.
> - The db1 dump misses the rename-table operation, as no event is generated 
> for db1, so the table still exists after load.
> - The db2 load skips the rename event, as the source table is missing in the 
> target.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17527) Support replication for rename/move table across database

2017-09-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-17527:
--
Labels: DR pull-request-available replication  (was: DR replication)

> Support replication for rename/move table across database
> -
>
> Key: HIVE-17527
> URL: https://issues.apache.org/jira/browse/HIVE-17527
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, pull-request-available, replication
> Fix For: 3.0.0
>
>
> Rename/move table across database should be supported for replication. The 
> scenario is as follows.
> 1. Create 2 databases (db1 and db2) in source cluster.
> 2. Create the table db1.tbl1.
> 3. Run bootstrap replication for db1 and db2 to target cluster.
> 4. Rename db1.tbl1 to db2.tbl1 in source.
> 5. Run incremental replication for both db1 and db2.
> - The db1 dump misses the rename-table operation, as no event is generated 
> for db1, so the table still exists after load.
> - The db2 load skips the rename event, as the source table is missing in the 
> target.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17488) Move first set of classes to standalone metastore

2017-09-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165107#comment-16165107
 ] 

Hive QA commented on HIVE-17488:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886740/HIVE-17488.3.patch

{color:green}SUCCESS:{color} +1 due to 10 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11040 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[drop_table_failure2]
 (batchId=89)
org.apache.hadoop.hive.ql.TestAcidOnTez.testCtasTezUnion (batchId=215)
org.apache.hadoop.hive.ql.TestAcidOnTez.testNonStandardConversion01 
(batchId=215)
org.apache.hadoop.hive.ql.parse.TestExportImport.dataImportAfterMetadataOnlyImport
 (batchId=218)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6801/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6801/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6801/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886740 - PreCommit-HIVE-Build

> Move first set of classes to standalone metastore
> -
>
> Key: HIVE-17488
> URL: https://issues.apache.org/jira/browse/HIVE-17488
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17488.2.patch, HIVE-17488.3.patch, HIVE-17488.patch
>
>
> There is a whole set of classes that can be moved with few changes other 
> than the config file, shims, etc. This task will move those classes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17528) Add more q-tests for Hive-on-Spark with Parquet vectorized reader

2017-09-13 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-17528:
--

Assignee: (was: Vihang Karajgaonkar)

> Add more q-tests for Hive-on-Spark with Parquet vectorized reader
> -
>
> Key: HIVE-17528
> URL: https://issues.apache.org/jira/browse/HIVE-17528
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>
> Most of the vectorization-related q-tests operate on ORC tables using Tez. It 
> would be good to add more coverage on a different combination of engine and 
> file format. We can model existing q-tests using Parquet tables and run them 
> using TestSparkCliDriver.
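A sketch of the kind of coverage proposed, expressed here as JDBC calls (in 
the real suite this would live in a .q file run by TestSparkCliDriver; the 
table name, settings, and endpoint are illustrative):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ParquetSparkCoverage {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection(
            "jdbc:hive2://localhost:10000/default", "hive", "");
         Statement st = conn.createStatement()) {
      st.execute("SET hive.execution.engine=spark");
      st.execute("SET hive.vectorized.execution.enabled=true");
      st.execute("CREATE TABLE IF NOT EXISTS vec_parquet (id INT, val STRING) "
          + "STORED AS PARQUET");
      st.execute("INSERT INTO vec_parquet VALUES (1, 'a'), (2, 'b'), (3, 'a')");
      // The kind of aggregate a vectorization q-test would verify:
      try (ResultSet rs = st.executeQuery(
              "SELECT val, COUNT(*) FROM vec_parquet GROUP BY val")) {
        while (rs.next()) {
          System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
        }
      }
    }
  }
}
{code}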



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17521) Improve defaults for few runtime configs

2017-09-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17521:

Status: Patch Available  (was: Open)

> Improve defaults for few runtime configs
> 
>
> Key: HIVE-17521
> URL: https://issues.apache.org/jira/browse/HIVE-17521
> Project: Hive
>  Issue Type: Task
>  Components: Configuration
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>Priority: Minor
> Attachments: HIVE-17521.2.patch, HIVE-17521.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17521) Improve defaults for few runtime configs

2017-09-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17521:

Attachment: HIVE-17521.2.patch

> Improve defaults for few runtime configs
> 
>
> Key: HIVE-17521
> URL: https://issues.apache.org/jira/browse/HIVE-17521
> Project: Hive
>  Issue Type: Task
>  Components: Configuration
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>Priority: Minor
> Attachments: HIVE-17521.2.patch, HIVE-17521.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17521) Improve defaults for few runtime configs

2017-09-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17521:

Status: Open  (was: Patch Available)

> Improve defaults for few runtime configs
> 
>
> Key: HIVE-17521
> URL: https://issues.apache.org/jira/browse/HIVE-17521
> Project: Hive
>  Issue Type: Task
>  Components: Configuration
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>Priority: Minor
> Attachments: HIVE-17521.2.patch, HIVE-17521.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17479) Staging directories do not get cleaned up for update/delete queries

2017-09-13 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17479:
---
Attachment: HIVE-17479.02.patch

> Staging directories do not get cleaned up for update/delete queries
> ---
>
> Key: HIVE-17479
> URL: https://issues.apache.org/jira/browse/HIVE-17479
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17479.01.patch, HIVE-17479.02.patch, 
> HIVE-17479.patch
>
>
> When these queries are internally rewritten, a new context is created with a 
> new execution id. This id is used to create the scratch directories. However, 
> only the original context is cleared, and thus only the directories created 
> with the original execution id are removed.
> The solution is to pass the execution id to the new context when the queries 
> are internally rewritten.
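A schematic sketch of that fix; the Context shape below is a stand-in for ql's 
Context class, not the actual API:

{code:java}
// Stand-in for the real Context: scratch directories are derived from the
// execution id, and clear() removes only the dirs for its own id.
class Context {
  private final String executionId;
  Context(String executionId) { this.executionId = executionId; }
  String getExecutionId() { return executionId; }
  void clear() { /* delete scratch dirs derived from executionId */ }
}

class RewriteSketch {
  Context rewriteContext(Context original) {
    // Before: new Context(newExecutionId()) left the rewritten query's
    // scratch dirs orphaned, since only the original context was cleared.
    // After: propagate the original id, so clearing covers everything.
    return new Context(original.getExecutionId());
  }
}
{code}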



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17527) Support replication for rename/move table across database

2017-09-13 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17527:

Description: 
Rename/move table across database should be supported for replication. The 
scenario is as follows.

1. Create 2 databases (db1 and db2) in source cluster.
2. Create the table db1.tbl1.
3. Run bootstrap replication for db1 and db2 to target cluster.
4. Rename db1.tbl1 to db2.tbl1 in source.
5. Run incremental replication for both db1 and db2.
- The db1 dump misses the rename-table operation, as no event is generated for 
db1, so the table still exists after load.
- The db2 load skips the rename event, as the source table is missing in the 
target.

  was:
Rename/move table across database should be supported for replication. The 
scenario is as follows.

1. Create 2 databases (db1 and db2) in source cluster.
2. Create the table db1.tbl1.
3. Run bootstrap replication for db1 and db2 to target cluster.
4. Rename db1.tbl1 to db2.tbl1 in source.
5. Run incremental replication for both db1 and db2.
- The db1 dump fails, reporting that rename across databases is not supported.
- The db2 dump misses the table, as no event is generated when it is moved to 
db2.


> Support replication for rename/move table across database
> -
>
> Key: HIVE-17527
> URL: https://issues.apache.org/jira/browse/HIVE-17527
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
>
> Rename/move table across database should be supported for replication. The 
> scenario is as follows.
> 1. Create 2 databases (db1 and db2) in source cluster.
> 2. Create the table db1.tbl1.
> 3. Run bootstrap replication for db1 and db2 to target cluster.
> 4. Rename db1.tbl1 to db2.tbl1 in source.
> 5. Run incremental replication for both db1 and db2.
> - The db1 dump misses the rename-table operation, as no event is generated 
> for db1, so the table still exists after load.
> - The db2 load skips the rename event, as the source table is missing in the 
> target.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17502) Reuse of default session should not throw an exception in LLAP w/ Tez

2017-09-13 Thread Thai Bui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165060#comment-16165060
 ] 

Thai Bui commented on HIVE-17502:
-

[~sershe] Thanks for the prompt reply. I see that SessionState.get keeps a 
thread-local SessionState object which gets set/unset every time a session is 
returned to the pool. I'm not that familiar with the Hive/TezPool/TezTask 
session code base, but given that it is thread-local, it shouldn't impact 
other threads and/or clients, correct?

Also, to help us understand the use case, let me elaborate on what I'm trying 
to do.

Let's say a client C1 requests a new session; the request is handled by thread 
T1 in HS2, which yields session S1 from the pool. The same client C1 then 
makes a parallel request that reuses S1. Since S1 is still being used, the 
session pool should skip S1 and return S2 from the pool, given that S2 is 
unused.

S2 should be handled by another thread T2, since T1 is busy executing the 
TezTask of the first request in HS2. Given that SessionState.get is 
thread-local to T1, T2 should get a different SessionState. But I think that 
is not the case: SessionState.get will always yield the same currently used 
SessionState of client C1, which will be unset, and that is the problem, 
correct?

If that's the case, what's the best course of action I can take to make 
parallel query execution of the same client with a session pool work?
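To make the thread-local point concrete, here is a self-contained illustration 
with a plain ThreadLocal standing in for SessionState.get (not Hive code):

{code:java}
// Each thread sees only the value it set itself: T1's set() is invisible to
// T2. This is why an unset on one HS2 handler thread should not, by itself,
// affect another thread's SessionState.
public class ThreadLocalDemo {
  private static final ThreadLocal<String> SESSION = new ThreadLocal<>();

  public static void main(String[] args) throws InterruptedException {
    Runnable worker = () -> {
      SESSION.set("state-of-" + Thread.currentThread().getName());
      System.out.println(Thread.currentThread().getName()
          + " sees " + SESSION.get());
    };
    Thread t1 = new Thread(worker, "T1");
    Thread t2 = new Thread(worker, "T2");
    t1.start(); t2.start();
    t1.join(); t2.join();
  }
}
{code}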

> Reuse of default session should not throw an exception in LLAP w/ Tez
> -
>
> Key: HIVE-17502
> URL: https://issues.apache.org/jira/browse/HIVE-17502
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Tez
>Affects Versions: 2.1.1, 2.2.0
> Environment: HDP 2.6.1.0-129, Hue 4
>Reporter: Thai Bui
>Assignee: Thai Bui
>
> Hive2 w/ LLAP on Tez doesn't allow a currently used, default session to be 
> skipped mostly because of this line 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L365.
> However, some clients, such as Hue 4, allow multiple sessions to be used per 
> user. Under this configuration, a Thrift client will send a request to either 
> reuse or open a new session. The reuse request could include the session id 
> of a currently running snippet being executed in Hue; this causes HS2 to 
> throw an exception:
> {noformat}
> 2017-09-10T17:51:36,548 INFO  [Thread-89]: tez.TezSessionPoolManager 
> (TezSessionPoolManager.java:canWorkWithSameSession(512)) - The current user: 
> hive, session user: hive
> 2017-09-10T17:51:36,549 ERROR [Thread-89]: exec.Task 
> (TezTask.java:execute(232)) - Failed to execute tez graph.
> org.apache.hadoop.hive.ql.metadata.HiveException: The pool session 
> sessionId=5b61a578-6336-41c5-860d-9838166f97fe, queueName=llap, user=hive, 
> doAs=false, isOpen=true, isDefault=true, expires in 591015330ms should have 
> been returned to the pool
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.canWorkWithSameSession(TezSessionPoolManager.java:534)
>  ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.getSession(TezSessionPoolManager.java:544)
>  ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:147) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79) 
> [hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
> {noformat}
> Note that every query is issued as a single 'hive' user to share the LLAP 
> daemon pool; a pre-determined number of AMs is initialized at setup time. 
> Thus, HS2 should allow new sessions from a Thrift client to be used out of 
> the pool, or an existing session to be skipped and an unused session from 
> the pool to be returned. The logic of throwing an exception in 
> `canWorkWithSameSession` doesn't make sense to me.
> I have a solution to fix this issue in my local branch at 
> https://github.com/thaibui/hive/commit/078a521b9d0906fe6c0323b63e567f6eee2f3a70.
>  When applied, the log will become like so
> {noformat}
> 2017-09-10T09:15:33,578 INFO  [Thread-239]: tez.TezSessionPoolManager 
> (TezSessionPoolManager.java:canWorkWithSameSession(533)) - Skipping default 
> session 

[jira] [Commented] (HIVE-17472) Drop-partition for multi-level partition fails, if data does not exist.

2017-09-13 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165035#comment-16165035
 ] 

Mithun Radhakrishnan commented on HIVE-17472:
-

{{master}} seems to fare better with this patch. Still waiting on 
{{branch-2.2}} and {{branch-2}}.

> Drop-partition for multi-level partition fails, if data does not exist.
> ---
>
> Key: HIVE-17472
> URL: https://issues.apache.org/jira/browse/HIVE-17472
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17472.1.patch, HIVE-17472.2-branch-2.patch, 
> HIVE-17472.2.patch, HIVE-17472.3-branch-2.2.patch, 
> HIVE-17472.3-branch-2.patch, HIVE-17472.3.patch, 
> HIVE-17472.4-branch-2.2.patch, HIVE-17472.4-branch-2.patch, HIVE-17472.4.patch
>
>
> Raising this on behalf of [~cdrome] and [~selinazh]. 
> Here's how to reproduce the problem:
> {code:sql}
> CREATE TABLE foobar ( foo STRING, bar STRING ) PARTITIONED BY ( dt STRING, 
> region STRING ) STORED AS RCFILE LOCATION '/tmp/foobar';
> ALTER TABLE foobar ADD PARTITION ( dt='1', region='A' ) ;
> dfs -rm -R -skipTrash /tmp/foobar/dt=1;
> ALTER TABLE foobar DROP PARTITION ( dt='1' );
> {code}
> This causes a client-side error as follows:
> {code}
> 15/02/26 23:08:32 ERROR exec.DDLTask: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unknown error. Please check 
> logs.
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17472) Drop-partition for multi-level partition fails, if data does not exist.

2017-09-13 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17472:

Status: Open  (was: Patch Available)

> Drop-partition for multi-level partition fails, if data does not exist.
> ---
>
> Key: HIVE-17472
> URL: https://issues.apache.org/jira/browse/HIVE-17472
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17472.1.patch, HIVE-17472.2-branch-2.patch, 
> HIVE-17472.2.patch, HIVE-17472.3-branch-2.2.patch, 
> HIVE-17472.3-branch-2.patch, HIVE-17472.3.patch, 
> HIVE-17472.4-branch-2.2.patch, HIVE-17472.4-branch-2.patch, HIVE-17472.4.patch
>
>
> Raising this on behalf of [~cdrome] and [~selinazh]. 
> Here's how to reproduce the problem:
> {code:sql}
> CREATE TABLE foobar ( foo STRING, bar STRING ) PARTITIONED BY ( dt STRING, 
> region STRING ) STORED AS RCFILE LOCATION '/tmp/foobar';
> ALTER TABLE foobar ADD PARTITION ( dt='1', region='A' ) ;
> dfs -rm -R -skipTrash /tmp/foobar/dt=1;
> ALTER TABLE foobar DROP PARTITION ( dt='1' );
> {code}
> This causes a client-side error as follows:
> {code}
> 15/02/26 23:08:32 ERROR exec.DDLTask: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unknown error. Please check 
> logs.
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17472) Drop-partition for multi-level partition fails, if data does not exist.

2017-09-13 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17472:

Status: Patch Available  (was: Open)

> Drop-partition for multi-level partition fails, if data does not exist.
> ---
>
> Key: HIVE-17472
> URL: https://issues.apache.org/jira/browse/HIVE-17472
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17472.1.patch, HIVE-17472.2-branch-2.patch, 
> HIVE-17472.2.patch, HIVE-17472.3-branch-2.2.patch, 
> HIVE-17472.3-branch-2.patch, HIVE-17472.3.patch, 
> HIVE-17472.4-branch-2.2.patch, HIVE-17472.4-branch-2.patch, HIVE-17472.4.patch
>
>
> Raising this on behalf of [~cdrome] and [~selinazh]. 
> Here's how to reproduce the problem:
> {code:sql}
> CREATE TABLE foobar ( foo STRING, bar STRING ) PARTITIONED BY ( dt STRING, 
> region STRING ) STORED AS RCFILE LOCATION '/tmp/foobar';
> ALTER TABLE foobar ADD PARTITION ( dt='1', region='A' ) ;
> dfs -rm -R -skipTrash /tmp/foobar/dt=1;
> ALTER TABLE foobar DROP PARTITION ( dt='1' );
> {code}
> This causes a client-side error as follows:
> {code}
> 15/02/26 23:08:32 ERROR exec.DDLTask: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unknown error. Please check 
> logs.
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17479) Staging directories do not get cleaned up for update/delete queries

2017-09-13 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17479:
---
Attachment: HIVE-17479.01.patch

> Staging directories do not get cleaned up for update/delete queries
> ---
>
> Key: HIVE-17479
> URL: https://issues.apache.org/jira/browse/HIVE-17479
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17479.01.patch, HIVE-17479.patch
>
>
> When these queries are internally rewritten, a new context is created with a 
> new execution id. This id is used to create the scratch directories. However, 
> only the original context is cleared, and thus only the directories created 
> with the original execution id are removed.
> The solution is to pass the execution id to the new context when the queries 
> are internally rewritten.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17528) Add more q-tests for Hive-on-Spark with Parquet vectorized reader

2017-09-13 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-17528:
--


> Add more q-tests for Hive-on-Spark with Parquet vectorized reader
> -
>
> Key: HIVE-17528
> URL: https://issues.apache.org/jira/browse/HIVE-17528
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>
> Most of the vectorization related q-tests operate on ORC tables using Tez. It 
> would be good to add more coverage on a different combination of engine and 
> file-format. We can model existing q-tests using parquet tables and run it 
> using TestSparkCliDriver



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17465) Statistics: Drill-down filters don't reduce row-counts progressively

2017-09-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165002#comment-16165002
 ] 

Hive QA commented on HIVE-17465:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886731/HIVE-17465.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 135 failed/errored test(s), 11041 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_deep_filters]
 (batchId=86)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_filter] 
(batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer8] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[filter_cond_pushdown] 
(batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[fold_eq_with_case_when] 
(batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[folder_predicate] 
(batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_grouping_sets_grouping]
 (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_multi_single_reducer3]
 (batchId=45)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_unused] 
(batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join34] (batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join35] (batchId=63)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join45] (batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[multi_insert_move_tasks_share_dependencies]
 (batchId=53)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pcr] (batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pcs] (batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd2] (batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_gby_join] 
(batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join2] (batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join3] (batchId=18)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_transform] 
(batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppr_allchildsarenull] 
(batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[push_or] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[skewjoin_mapjoin10] 
(batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_notin_having] 
(batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_mapjoin] 
(batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_windowing_multipartitioning]
 (batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_10] 
(batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_11] 
(batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_12] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_13] 
(batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_14] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_15] 
(batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_16] 
(batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_17] 
(batchId=84)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_2] 
(batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_3] 
(batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_4] 
(batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_5] 
(batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_6] 
(batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_7] 
(batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_9] 
(batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_case] 
(batchId=55)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_7]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_partition_pruning]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_opt_vectorization]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_optimization]
 (batchId=156)

[jira] [Commented] (HIVE-17430) Add LOAD DATA test for blobstores

2017-09-13 Thread Yuzhou Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164969#comment-16164969
 ] 

Yuzhou Sun commented on HIVE-17430:
---

Thank you [~spena] !

> Add LOAD DATA test for blobstores
> -
>
> Key: HIVE-17430
> URL: https://issues.apache.org/jira/browse/HIVE-17430
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Yuzhou Sun
>Assignee: Yuzhou Sun
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17430.patch
>
>
> This patch introduces load_data.q regression tests into the hive-blobstore 
> qtest module.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17521) Improve defaults for few runtime configs

2017-09-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164883#comment-16164883
 ] 

Hive QA commented on HIVE-17521:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886727/HIVE-17521.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 22 failed/errored test(s), 11039 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynamic_semijoin_user_level]
 (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction_2]
 (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction_3]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction_sw]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mapjoin_hint]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mergejoin] 
(batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[semijoin_hint]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_dynamic_semijoin_reduction2]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_dynamic_semijoin_reduction]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[drop_table_failure2]
 (batchId=89)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=234)
org.apache.hadoop.hive.ql.TestAcidOnTez.testCtasTezUnion (batchId=215)
org.apache.hadoop.hive.ql.TestAcidOnTez.testNonStandardConversion01 
(batchId=215)
org.apache.hadoop.hive.ql.parse.TestParseNegativeDriver.testCliDriver[wrong_distinct2]
 (batchId=238)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6799/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6799/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6799/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 22 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886727 - PreCommit-HIVE-Build

> Improve defaults for few runtime configs
> 
>
> Key: HIVE-17521
> URL: https://issues.apache.org/jira/browse/HIVE-17521
> Project: Hive
>  Issue Type: Task
>  Components: Configuration
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>Priority: Minor
> Attachments: HIVE-17521.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HIVE-17527) Support replication for rename/move table across database

2017-09-13 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17527 started by Sankar Hariappan.
---
> Support replication for rename/move table across database
> -
>
> Key: HIVE-17527
> URL: https://issues.apache.org/jira/browse/HIVE-17527
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
>
> Rename/move table across database should be supported for replication. The 
> scenario is as follows.
> 1. Create 2 databases (db1 and db2) in source cluster.
> 2. Create the table db1.tbl1.
> 3. Run bootstrap replication for db1 and db2 to target cluster.
> 4. Rename db1.tbl1 to db2.tbl1 in source.
> 5. Run incremental replication for both db1 and db2.
> - The db1 dump fails, reporting that rename across databases is not 
> supported.
> - The db2 dump misses the table, as no event is generated when it is moved 
> to db2.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17527) Support replication for rename/move table across database

2017-09-13 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan reassigned HIVE-17527:
---


> Support replication for rename/move table across database
> -
>
> Key: HIVE-17527
> URL: https://issues.apache.org/jira/browse/HIVE-17527
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
>
> Rename/move table across database should be supported for replication. The 
> scenario is as follows.
> 1. Create 2 databases (db1 and db2) in source cluster.
> 2. Create the table db1.tbl1.
> 3. Run bootstrap replication for db1 and db2 to target cluster.
> 4. Rename db1.tbl1 to db2.tbl1 in source.
> 5. Run incremental replication for both db1 and db2.
> - The db1 dump fails, reporting that rename across databases is not 
> supported.
> - The db2 dump misses the table, as no event is generated when it is moved 
> to db2.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17496) Bootstrap repl is not cleaning up staging dirs

2017-09-13 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164809#comment-16164809
 ] 

Tao Li commented on HIVE-17496:
---

Investigating the "testDeleteStagingDir" failure, which is the test added for 
the staging dir cleanup.

> Bootstrap repl is not cleaning up staging dirs
> --
>
> Key: HIVE-17496
> URL: https://issues.apache.org/jira/browse/HIVE-17496
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17496.1.patch, HIVE-17496.2.patch, 
> HIVE-17496.3.patch
>
>
> This will put more pressure on the HDFS file limit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17430) Add LOAD DATA test for blobstores

2017-09-13 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-17430:
---
   Resolution: Fixed
Fix Version/s: 2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Thanks [~yuzhousun] for your contribution.
I committed this to master and branch-2.

> Add LOAD DATA test for blobstores
> -
>
> Key: HIVE-17430
> URL: https://issues.apache.org/jira/browse/HIVE-17430
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Yuzhou Sun
>Assignee: Yuzhou Sun
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17430.patch
>
>
> This patch introduces load_data.q regression tests into the hive-blobstore 
> qtest module.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17494) Bootstrap REPL DUMP throws exception if a partitioned table is dropped while reading partitions.

2017-09-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-17494:
--
Labels: DR pull-request-available replication  (was: DR replication)

> Bootstrap REPL DUMP throws exception if a partitioned table is dropped while 
> reading partitions.
> 
>
> Key: HIVE-17494
> URL: https://issues.apache.org/jira/browse/HIVE-17494
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, pull-request-available, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17494.01.patch
>
>
> When a table is dropped between fetching the table and fetching its 
> partitions, bootstrap dump throws an exception.
> 1. Fetch table names.
> 2. Get table
> 3. Dump table object
> 4. Drop table from another thread.
> 5. Fetch partitions (throws exception from fireReadTablePreEvent)
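One way to tolerate the race in the steps above, sketched against the real 
HiveMetaStoreClient.listPartitions signature (the surrounding dump logic is 
simplified for illustration):

{code:java}
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
import org.apache.hadoop.hive.metastore.api.Partition;

class BootstrapDumpSketch {
  /** Returns the table's partitions, or an empty list if the table was
      dropped concurrently, instead of failing the whole bootstrap dump. */
  List<Partition> fetchPartitionsSafely(HiveMetaStoreClient client,
      String db, String table) throws Exception {
    try {
      return client.listPartitions(db, table, (short) -1);
    } catch (NoSuchObjectException dropped) {
      // Table vanished between steps 2 and 5; nothing left to dump for it.
      return Collections.emptyList();
    }
  }
}
{code}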



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17494) Bootstrap REPL DUMP throws exception if a partitioned table is dropped while reading partitions.

2017-09-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164770#comment-16164770
 ] 

ASF GitHub Bot commented on HIVE-17494:
---

Github user sankarh closed the pull request at:

https://github.com/apache/hive/pull/248


> Bootstrap REPL DUMP throws exception if a partitioned table is dropped while 
> reading partitions.
> 
>
> Key: HIVE-17494
> URL: https://issues.apache.org/jira/browse/HIVE-17494
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, pull-request-available, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17494.01.patch
>
>
> When a table is dropped between fetching the table and fetching its 
> partitions, bootstrap dump throws an exception.
> 1. Fetch table names.
> 2. Get table
> 3. Dump table object
> 4. Drop table from another thread.
> 5. Fetch partitions (throws exception from fireReadTablePreEvent)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17428) REPL LOAD of ALTER_PARTITION event doesn't create import tasks if the partition doesn't exist during analyze phase.

2017-09-13 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17428:

Status: Patch Available  (was: Open)

> REPL LOAD of ALTER_PARTITION event doesn't create import tasks if the 
> partition doesn't exist during analyze phase.
> ---
>
> Key: HIVE-17428
> URL: https://issues.apache.org/jira/browse/HIVE-17428
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17428.01.patch, HIVE-17428.02.patch, 
> HIVE-17428.03.patch
>
>
> If the incremental dump event sequence has ADD_PARTITION followed by 
> ALTER_PARTITION, REPL LOAD doesn't create any task for the ALTER_PARTITION 
> event, as the partition doesn't exist during the analyze phase. Due to this, 
> REPL STATUS returns the wrong last repl ID.
> Scenario:
> 1. Create DB
> 2. Create partitioned table.
> 3. Bootstrap dump and load
> 4. Insert into the table into a dynamically created partition. This insert 
> generates ADD_PARTITION and ALTER_PARTITION events.
> 5. Incremental dump and load.
> - Load will be successful.
> - But the last repl ID set is incorrect, as the ALTER_PARTITION event was 
> never applied.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17428) REPL LOAD of ALTER_PARTITION event doesn't create import tasks if the partition doesn't exist during analyze phase.

2017-09-13 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17428:

Attachment: HIVE-17428.03.patch

Added 03.patch after rebasing with master.

> REPL LOAD of ALTER_PARTITION event doesn't create import tasks if the 
> partition doesn't exist during analyze phase.
> ---
>
> Key: HIVE-17428
> URL: https://issues.apache.org/jira/browse/HIVE-17428
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17428.01.patch, HIVE-17428.02.patch, 
> HIVE-17428.03.patch
>
>
> If the incremental dump event sequence has ADD_PARTITION followed by 
> ALTER_PARTITION, REPL LOAD doesn't create any task for the ALTER_PARTITION 
> event, as the partition doesn't exist during the analyze phase. Due to this, 
> REPL STATUS returns the wrong last repl ID.
> Scenario:
> 1. Create DB
> 2. Create partitioned table.
> 3. Bootstrap dump and load
> 4. Insert into the table into a dynamically created partition. This insert 
> generates ADD_PARTITION and ALTER_PARTITION events.
> 5. Incremental dump and load.
> - Load will be successful.
> - But the last repl ID set is incorrect, as the ALTER_PARTITION event was 
> never applied.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17466) Metastore API to list unique partition-key-value combinations

2017-09-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164761#comment-16164761
 ] 

Hive QA commented on HIVE-17466:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886726/HIVE-17466.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11039 tests 
executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=156)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[drop_table_failure2]
 (batchId=89)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=234)
org.apache.hadoop.hive.ql.TestAcidOnTez.testCtasTezUnion (batchId=215)
org.apache.hadoop.hive.ql.TestAcidOnTez.testNonStandardConversion01 
(batchId=215)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6798/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6798/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6798/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886726 - PreCommit-HIVE-Build

> Metastore API to list unique partition-key-value combinations
> -
>
> Key: HIVE-17466
> URL: https://issues.apache.org/jira/browse/HIVE-17466
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-17466.1.patch, HIVE-17466.2-branch-2.patch, 
> HIVE-17466.2.patch, HIVE-17466.3.patch
>
>
> Raising this on behalf of [~thiruvel], who wrote this initially as part of a 
> tangential "data-discovery" system.
> Programs like Apache Oozie, Apache Falcon (or Yahoo GDM), etc. launch 
> workflows based on the availability of table/partitions. Partitions are 
> currently discovered by listing partitions using (what boils down to) 
> {{HiveMetaStoreClient.listPartitions()}}. This can be slow and cumbersome, 
> given that {{Partition}} objects are heavyweight and carry redundant 
> information. The alternative is to use partition-names, which will need 
> client-side parsing to extract part-key values.
> When checking which hourly partitions for a particular day have been 
> published already, it would be preferable to have an API that pushed down 
> part-key extraction into the {{RawStore}} layer, and returned key-values as 
> the result. This would be similar to how {{SELECT DISTINCT part_key FROM 
> my_table;}} would run, but at the {{HiveMetaStoreClient}} level.
> Here's what we've been using at Yahoo.
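A hypothetical sketch of what such a client call could look like; the method 
name and shape below are illustrative assumptions for this digest, not a 
documented HiveMetaStoreClient API:

{code:java}
import java.util.List;

/** Hypothetical API shape: distinct values of one partition key, computed
    server-side in the RawStore layer -- analogous to
    SELECT DISTINCT part_key FROM my_table, but at the metastore client. */
interface PartitionValuesClientSketch {
  List<String> listDistinctPartitionValues(String db, String table,
      String partKey) throws Exception;
}

// Intended use: find which hourly partitions of a day already exist, without
// materializing heavyweight Partition objects:
//   List<String> hours =
//       client.listDistinctPartitionValues("default", "my_table", "hour");
{code}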



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17428) REPL LOAD of ALTER_PARTITION event doesn't create import tasks if the partition doesn't exist during analyze phase.

2017-09-13 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17428:

Status: Open  (was: Patch Available)

> REPL LOAD of ALTER_PARTITION event doesn't create import tasks if the 
> partition doesn't exist during analyze phase.
> ---
>
> Key: HIVE-17428
> URL: https://issues.apache.org/jira/browse/HIVE-17428
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17428.01.patch, HIVE-17428.02.patch, 
> HIVE-17428.03.patch
>
>
> If the incremental dump event sequence has ADD_PARTITION followed by 
> ALTER_PARTITION, REPL LOAD doesn't create any task for the ALTER_PARTITION 
> event, as the partition doesn't exist during the analyze phase. Due to this, 
> REPL STATUS returns the wrong last repl ID.
> Scenario:
> 1. Create DB
> 2. Create partitioned table.
> 3. Bootstrap dump and load
> 4. Insert into the table into a dynamically created partition. This insert 
> generates ADD_PARTITION and ALTER_PARTITION events.
> 5. Incremental dump and load.
> - Load will be successful.
> - But the last repl ID set is incorrect, as the ALTER_PARTITION event was 
> never applied.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17523) Insert into druid table hangs Hive server2 in an infinite loop

2017-09-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164759#comment-16164759
 ] 

Ashutosh Chauhan commented on HIVE-17523:
-

Mostly looks good. Few minor comments on RB

> Insert into druid table hangs Hive server2 in an infinite loop
> --
>
> Key: HIVE-17523
> URL: https://issues.apache.org/jira/browse/HIVE-17523
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: Nishant Bangarwa
>  Labels: pull-request-available
> Attachments: HIVE-17523.patch
>
>
> Inserting data via INSERT INTO a table backed by Druid can lead to a Hive 
> server hang.
> This is due to a bug in the naming of Druid segment partitions.
> To reproduce the issue:
> {code}
> drop table login_hive;
> create table login_hive(`timecolumn` timestamp, `userid` string, `num_l` 
> double);
> insert into login_hive values ('2015-01-01 00:00:00', 'user1', 5);
> insert into login_hive values ('2015-01-01 01:00:00', 'user2', 4);
> insert into login_hive values ('2015-01-01 02:00:00', 'user3', 2);
> insert into login_hive values ('2015-01-02 00:00:00', 'user1', 1);
> insert into login_hive values ('2015-01-02 01:00:00', 'user2', 2);
> insert into login_hive values ('2015-01-02 02:00:00', 'user3', 8);
> insert into login_hive values ('2015-01-03 00:00:00', 'user1', 5);
> insert into login_hive values ('2015-01-03 01:00:00', 'user2', 9);
> insert into login_hive values ('2015-01-03 04:00:00', 'user3', 2);
> insert into login_hive values ('2015-03-09 00:00:00', 'user3', 5);
> insert into login_hive values ('2015-03-09 01:00:00', 'user1', 0);
> insert into login_hive values ('2015-03-09 05:00:00', 'user2', 0);
> drop table login_druid;
> CREATE TABLE login_druid
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "druid_login_test_tmp", 
> "druid.segment.granularity" = "DAY", "druid.query.granularity" = "HOUR")
> AS
> select `timecolumn` as `__time`, `userid`, `num_l` FROM login_hive;
> select * FROM login_druid;
> insert into login_druid values ('2015-03-09 05:00:00', 'user4', 0); 
> {code}
> This patch unifies the segment-pushing and segment-naming logic by using the 
> Druid data segment pusher as much as possible.
> This patch also has some minor code refactoring and test enhancements.
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17526) Disable conversion to ACID if table has _copy_N files on branch-1

2017-09-13 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros updated HIVE-17526:

Attachment: HIVE-17526.1-branch-1.patch

Attaching patch #1. This adds a check to TransactionalValidationListener so it 
won't allow the conversion if there are *_copy_N files in the table.
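A sketch of such a check, assuming the listener can reach the table location 
through a Hadoop FileSystem; the *_copy_N pattern and the method's placement 
are illustrative, not the exact patch:

{code:java}
import java.io.IOException;
import java.util.regex.Pattern;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

class CopyFileCheckSketch {
  // Matches file names like 000000_0_copy_1 produced by repeated loads.
  private static final Pattern COPY_N = Pattern.compile(".*_copy_[0-9]+$");

  static boolean hasCopyFiles(FileSystem fs, Path tableLocation)
      throws IOException {
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(tableLocation, true);
    while (it.hasNext()) {
      if (COPY_N.matcher(it.next().getPath().getName()).matches()) {
        return true; // conversion to ACID must be rejected
      }
    }
    return false;
  }
}
{code}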

> Disable conversion to ACID if table has _copy_N files on branch-1
> -
>
> Key: HIVE-17526
> URL: https://issues.apache.org/jira/browse/HIVE-17526
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Voros
>Assignee: Daniel Voros
> Fix For: 1.3.0
>
> Attachments: HIVE-17526.1-branch-1.patch
>
>
> As discussed in HIVE-16177, non-ACID to ACID conversion can lead to data loss 
> if the table has *_copy_N files.
> The patch for HIVE-16177 is quite massive and would basically need a 
> reimplementation to apply to branch-1, since the related code paths have 
> diverged a lot. We could instead disable the conversion to ACID if there are 
> *_copy_N files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

