[jira] [Updated] (HIVE-17308) Improvement in join cardinality estimation

2017-08-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17308:
---
Attachment: HIVE-17308.4.patch

> Improvement in join cardinality estimation
> --
>
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, 
> HIVE-17308.3.patch, HIVE-17308.4.patch
>
>
> Currently, during logical planning, join cardinality is estimated assuming no 
> correlation among the join keys (this estimation is done using exponential 
> backoff). Physical planning, on the other hand, considers correlation for 
> multiple keys and uses a different estimate. We should consider correlation 
> during logical planning as well.
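
For context, a minimal sketch of the exponential-backoff idea (illustrative only, not Hive's actual implementation; the class and method names are made up, and per-key NDV statistics are assumed to be available):

{code}
import java.util.Arrays;

public class JoinSelectivitySketch {
  // Per-key selectivity: 1 / max(NDV on the left side, NDV on the right side).
  static double keySelectivity(long leftNdv, long rightNdv) {
    return 1.0d / Math.max(Math.max(leftNdv, rightNdv), 1L);
  }

  // Exponential backoff: sort the factors so the most selective comes first,
  // then damp each additional factor instead of assuming full independence:
  // s0 * s1^(1/2) * s2^(1/4) * ...
  static double combinedSelectivity(double[] selectivities) {
    Arrays.sort(selectivities);
    double result = 1.0d;
    double exponent = 1.0d;
    for (double s : selectivities) {
      result *= Math.pow(s, exponent);
      exponent /= 2.0d;
    }
    return result;
  }

  public static void main(String[] args) {
    // Two join keys with NDVs 1000 and 100: full independence would give
    // 1/1000 * 1/100 = 1e-5, while backoff gives 1/1000 * (1/100)^0.5 = 1e-4,
    // i.e. a larger (less aggressive) cardinality estimate.
    double[] s = { keySelectivity(1000, 800), keySelectivity(100, 90) };
    System.out.println(combinedSelectivity(s));
  }
}
{code}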



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17308) Improvement in join cardinality estimation

2017-08-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17308:
---
Status: Open  (was: Patch Available)

> Improvement in join cardinality estimation
> --
>
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, 
> HIVE-17308.3.patch, HIVE-17308.4.patch
>
>
> Currently, during logical planning, join cardinality is estimated assuming no 
> correlation among the join keys (this estimation is done using exponential 
> backoff). Physical planning, on the other hand, considers correlation for 
> multiple keys and uses a different estimate. We should consider correlation 
> during logical planning as well.





[jira] [Updated] (HIVE-17308) Improvement in join cardinality estimation

2017-08-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17308:
---
Status: Patch Available  (was: Open)

> Improvement in join cardinality estimation
> --
>
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, 
> HIVE-17308.3.patch, HIVE-17308.4.patch
>
>
> Currently, during logical planning, join cardinality is estimated assuming no 
> correlation among the join keys (this estimation is done using exponential 
> backoff). Physical planning, on the other hand, considers correlation for 
> multiple keys and uses a different estimate. We should consider correlation 
> during logical planning as well.





[jira] [Updated] (HIVE-16853) Minor org.apache.hadoop.hive.ql.exec.HashTableSinkOperator Improvement

2017-08-13 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-16853:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks [~belugabehr] for the patch.

> Minor org.apache.hadoop.hive.ql.exec.HashTableSinkOperator Improvement
> --
>
> Key: HIVE-16853
> URL: https://issues.apache.org/jira/browse/HIVE-16853
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.1, 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HIVE-16853.1.patch
>
>
> # Remove the custom buffer size for {{BufferedOutputStream}} and rely on the 
> JVM default (usually a larger 8192 bytes these days)
> # Remove needless logger checks
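
A minimal before/after sketch of the two changes (illustrative; the stream setup and logger shown here are assumptions, not the actual HashTableSinkOperator code):

{code}
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SinkSketch {
  private static final Logger LOG = LoggerFactory.getLogger(SinkSketch.class);

  static OutputStream open(String path) throws IOException {
    // Before: new BufferedOutputStream(new FileOutputStream(path), 4096);
    // After: rely on the JDK default buffer size (8192 bytes in current JDKs).
    OutputStream out = new BufferedOutputStream(new FileOutputStream(path));
    // Before: if (LOG.isInfoEnabled()) { LOG.info("Dumping to " + path); }
    // After: SLF4J parameterized logging makes the guard needless for cheap
    // arguments, since the message is only formatted when INFO is enabled.
    LOG.info("Dumping hash table to {}", path);
    return out;
  }
}
{code}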





[jira] [Comment Edited] (HIVE-17292) Change TestMiniSparkOnYarnCliDriver test configuration to use the configured cores

2017-08-13 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125157#comment-16125157
 ] 

Rui Li edited comment on HIVE-17292 at 8/14/17 2:40 AM:


[~pvary], thanks for the explanation. Would you mind setting the config in 
{{MiniSparkShim}}, so that we can have all the configs in one place?


was (Author: lirui):
[~pvary], thanks for the explanation. Would you mind setting the config in code, so 
that we can have all the configs in one place?

> Change TestMiniSparkOnYarnCliDriver test configuration to use the configured 
> cores
> --
>
> Key: HIVE-17292
> URL: https://issues.apache.org/jira/browse/HIVE-17292
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark, Test
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17292.1.patch
>
>
> Currently the {{hive-site.xml}} for the {{TestMiniSparkOnYarnCliDriver}} test 
> defines 2 cores and 2 executors, but only 1 is used, because the MiniCluster 
> does not allow the creation of the 3rd container.
> The FairScheduler uses 1GB increments for memory, but the containers only 
> want to use 512MB. We should change the FairScheduler configuration to 
> allocate only the requested 512MB.
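
A minimal sketch of the direction (assumptions: where exactly to set this, e.g. a MiniCluster shim, is not decided here; {{yarn.scheduler.increment-allocation-mb}} is Hadoop's FairScheduler allocation-increment property):

{code}
import org.apache.hadoop.conf.Configuration;

public class MiniClusterConfSketch {
  static Configuration schedulerConf() {
    Configuration conf = new Configuration();
    // Let the FairScheduler hand out 512MB containers instead of rounding
    // every request up to the next 1GB increment.
    conf.setInt("yarn.scheduler.increment-allocation-mb", 512);
    conf.setInt("yarn.scheduler.minimum-allocation-mb", 512);
    return conf;
  }
}
{code}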





[jira] [Commented] (HIVE-17292) Change TestMiniSparkOnYarnCliDriver test configuration to use the configured cores

2017-08-13 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125157#comment-16125157
 ] 

Rui Li commented on HIVE-17292:
---

[~pvary], thanks for the explanation. Would you mind setting the config in code, so 
that we can have all the configs in one place?

> Change TestMiniSparkOnYarnCliDriver test configuration to use the configured 
> cores
> --
>
> Key: HIVE-17292
> URL: https://issues.apache.org/jira/browse/HIVE-17292
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark, Test
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17292.1.patch
>
>
> Currently the {{hive-site.xml}} for the {{TestMiniSparkOnYarnCliDriver}} test 
> defines 2 cores and 2 executors, but only 1 is used, because the MiniCluster 
> does not allow the creation of the 3rd container.
> The FairScheduler uses 1GB increments for memory, but the containers only 
> want to use 512MB. We should change the FairScheduler configuration to 
> allocate only the requested 512MB.





[jira] [Commented] (HIVE-17287) HoS can not deal with skewed data group by

2017-08-13 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125156#comment-16125156
 ] 

Rui Li commented on HIVE-17287:
---

The groupByKey shuffle uses unbounded memory. You can set 
{{hive.spark.use.groupby.shuffle=false}} to use MR shuffle instead. By default 
the config is true.
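
For reference, a sketch of flipping the setting programmatically (a session-level {{set hive.spark.use.groupby.shuffle=false;}} is equivalent; HiveConf inherits setBoolean from Hadoop's Configuration):

{code}
import org.apache.hadoop.hive.conf.HiveConf;

public class ShuffleConfSketch {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    // Fall back to the sort-based (MR-style) shuffle for group-by instead of
    // Spark's groupByKey shuffle, which buffers each key's values in memory.
    conf.setBoolean("hive.spark.use.groupby.shuffle", false);
    System.out.println(conf.get("hive.spark.use.groupby.shuffle"));
  }
}
{code}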

> HoS can not deal with skewed data group by
> --
>
> Key: HIVE-17287
> URL: https://issues.apache.org/jira/browse/HIVE-17287
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: query67-fail-at-groupby.png, 
> query67-groupby_shuffle_metric.png
>
>
> In 
> [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql],
>  the fact table {{store_sales}} joins with the small tables {{date_dim}}, 
> {{item}}, and {{store}}. After the join, the intermediate data is grouped.
> Here the data of {{store_sales}} at the 3TB TPC-DS scale is skewed: there are 
> 1824 partitions. The biggest partition is 25.7G while the others are around 715M.
> {code}
> hadoop fs -du -h 
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales
> 
> 715.0 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639
> 713.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640
> 714.1 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641
> 712.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642
> 25.7 G   
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
> {code}
> The skewed table {{store_sales}} causes the job to fail. Is there any way to 
> solve the group-by problem for a skewed table? I tried enabling 
> {{hive.groupby.skewindata}} to first distribute the data more evenly and then 
> do the group by, but the job still hangs.





[jira] [Commented] (HIVE-17287) HoS can not deal with skewed data group by

2017-08-13 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125155#comment-16125155
 ] 

liyunzhang_intel commented on HIVE-17287:
-

[~lirui]:
bq.Have you tried hive.spark.use.groupby.shuffle? I think it can avoid 
unbounded mem usage.
  I have not enabled {{hive.spark.use.groupby.shuffle}} in my cluster and will try 
this configuration later. But why does HiveConf say "Spark groupByKey 
transformation has better performance but uses unbounded memory"? Will this use 
unbounded memory?
bq.For the error you mentioned, I usually disable 
yarn.nodemanager.pmem-check-enabled as a workaround.
I have disabled this configuration in my cluster, but the error still occurred.


> HoS can not deal with skewed data group by
> --
>
> Key: HIVE-17287
> URL: https://issues.apache.org/jira/browse/HIVE-17287
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: query67-fail-at-groupby.png, 
> query67-groupby_shuffle_metric.png
>
>
> In 
> [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql],
>  the fact table {{store_sales}} joins with the small tables {{date_dim}}, 
> {{item}}, and {{store}}. After the join, the intermediate data is grouped.
> Here the data of {{store_sales}} at the 3TB TPC-DS scale is skewed: there are 
> 1824 partitions. The biggest partition is 25.7G while the others are around 715M.
> {code}
> hadoop fs -du -h 
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales
> 
> 715.0 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639
> 713.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640
> 714.1 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641
> 712.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642
> 25.7 G   
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
> {code}
> The skewed table {{store_sales}} causes the job to fail. Is there any way to 
> solve the group-by problem for a skewed table? I tried enabling 
> {{hive.groupby.skewindata}} to first distribute the data more evenly and then 
> do the group by, but the job still hangs.





[jira] [Commented] (HIVE-17291) Set the number of executors based on config if client does not provide information

2017-08-13 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125137#comment-16125137
 ] 

Rui Li commented on HIVE-17291:
---

Is this just to avoid unstable test output? If so, it seems we're doing 
something similar in {{QTestUtil}}: 
https://github.com/apache/hive/blob/master/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java#L1188

> Set the number of executors based on config if client does not provide 
> information
> --
>
> Key: HIVE-17291
> URL: https://issues.apache.org/jira/browse/HIVE-17291
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17291.1.patch
>
>
> When calculating memory and cores, if the client does not provide the 
> information, we should fall back to the values provided by the configuration 
> defaults. This can happen on startup, when {{spark.dynamicAllocation.enabled}} 
> is not enabled.
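
A minimal sketch of the fallback idea (illustrative; the surrounding method is made up, and {{spark.executor.instances}} is Spark's static-allocation executor count, whose default is 2):

{code}
import org.apache.hadoop.conf.Configuration;

public class ExecutorCountSketch {
  // When the client gave us no executor count (e.g. on startup with
  // spark.dynamicAllocation.enabled off), fall back to the configured value.
  static int executorCount(Configuration conf, int fromClient) {
    if (fromClient > 0) {
      return fromClient;
    }
    return conf.getInt("spark.executor.instances", 2);
  }
}
{code}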





[jira] [Commented] (HIVE-17287) HoS can not deal with skewed data group by

2017-08-13 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125130#comment-16125130
 ] 

Rui Li commented on HIVE-17287:
---

Have you tried {{hive.spark.use.groupby.shuffle}}? I think it can avoid 
unbounded mem usage.
For the error you mentioned, I usually disable 
{{yarn.nodemanager.pmem-check-enabled}} as a workaround.

> HoS can not deal with skewed data group by
> --
>
> Key: HIVE-17287
> URL: https://issues.apache.org/jira/browse/HIVE-17287
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: query67-fail-at-groupby.png, 
> query67-groupby_shuffle_metric.png
>
>
> In 
> [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql],
>  the fact table {{store_sales}} joins with the small tables {{date_dim}}, 
> {{item}}, and {{store}}. After the join, the intermediate data is grouped.
> Here the data of {{store_sales}} at the 3TB TPC-DS scale is skewed: there are 
> 1824 partitions. The biggest partition is 25.7G while the others are around 715M.
> {code}
> hadoop fs -du -h 
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales
> 
> 715.0 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639
> 713.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640
> 714.1 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641
> 712.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642
> 25.7 G   
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
> {code}
> The skewed table {{store_sales}} causes the job to fail. Is there any way to 
> solve the group-by problem for a skewed table? I tried enabling 
> {{hive.groupby.skewindata}} to first distribute the data more evenly and then 
> do the group by, but the job still hangs.





[jira] [Commented] (HIVE-17287) HoS can not deal with skewed data group by

2017-08-13 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125125#comment-16125125
 ] 

liyunzhang_intel commented on HIVE-17287:
-

[~xuefuz]: the memory-related error is
{noformat}
Container killed by YARN for exceeding memory limits. 36.1 GB of 33 GB physical 
memory used. Consider boosting spark.yarn.executor.memoryOverhead.
{noformat}
 It shows that the job exceeded the memory assigned to the task. I can increase 
the value of spark.yarn.executor.memoryOverhead, but I guess that even if I 
increase the value the error will appear again, as the problem is that the keys 
are not evenly distributed across tasks in the group-by operation.

> HoS can not deal with skewed data group by
> --
>
> Key: HIVE-17287
> URL: https://issues.apache.org/jira/browse/HIVE-17287
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: query67-fail-at-groupby.png, 
> query67-groupby_shuffle_metric.png
>
>
> In 
> [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql],
>  the fact table {{store_sales}} joins with the small tables {{date_dim}}, 
> {{item}}, and {{store}}. After the join, the intermediate data is grouped.
> Here the data of {{store_sales}} at the 3TB TPC-DS scale is skewed: there are 
> 1824 partitions. The biggest partition is 25.7G while the others are around 715M.
> {code}
> hadoop fs -du -h 
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales
> 
> 715.0 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639
> 713.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640
> 714.1 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641
> 712.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642
> 25.7 G   
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
> {code}
> The skewed table {{store_sales}} causes the job to fail. Is there any way to 
> solve the group-by problem for a skewed table? I tried enabling 
> {{hive.groupby.skewindata}} to first distribute the data more evenly and then 
> do the group by, but the job still hangs.





[jira] [Commented] (HIVE-17288) LlapOutputFormatService: Increase netty event loop threads

2017-08-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125098#comment-16125098
 ] 

Hive QA commented on HIVE-17288:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881150/HIVE-17288.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 11004 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6379/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6379/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6379/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881150 - PreCommit-HIVE-Build

> LlapOutputFormatService: Increase netty event loop threads
> --
>
> Key: HIVE-17288
> URL: https://issues.apache.org/jira/browse/HIVE-17288
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-17288.1.patch
>
>
> Currently it is set to 1, which is used for both the parent (acceptor) and 
> client groups. It would be good to leave it at the default, which sets the 
> number of threads to "number of processors * 2". It can still be modified 
> later via {{-Dio.netty.eventLoopThreads}}.
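
For reference, a sketch of the Netty behavior described (Netty's public API; passing 0 threads makes Netty use its default, which honors {{-Dio.netty.eventLoopThreads}} and otherwise falls back to twice the number of processors):

{code}
import io.netty.channel.nio.NioEventLoopGroup;

public class EventLoopSketch {
  public static void main(String[] args) {
    // Current setting: one fixed thread shared by acceptor and client work.
    NioEventLoopGroup single = new NioEventLoopGroup(1);
    // Proposed: 0 means "use the default", i.e. io.netty.eventLoopThreads,
    // defaulting to availableProcessors() * 2.
    NioEventLoopGroup dflt = new NioEventLoopGroup(0);
    single.shutdownGracefully();
    dflt.shutdownGracefully();
  }
}
{code}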





[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation

2017-08-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125062#comment-16125062
 ] 

Hive QA commented on HIVE-17308:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881666/HIVE-17308.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 11004 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_alt_syntax] 
(batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_cond_pushdown_2] 
(batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_cond_pushdown_4] 
(batchId=79)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDate2 (batchId=183)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=228)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6378/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6378/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6378/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881666 - PreCommit-HIVE-Build

> Improvement in join cardinality estimation
> --
>
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, 
> HIVE-17308.3.patch
>
>
> Currently, during logical planning, join cardinality is estimated assuming no 
> correlation among the join keys (this estimation is done using exponential 
> backoff). Physical planning, on the other hand, considers correlation for 
> multiple keys and uses a different estimate. We should consider correlation 
> during logical planning as well.





[jira] [Updated] (HIVE-17308) Improvement in join cardinality estimation

2017-08-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17308:
---
Status: Open  (was: Patch Available)

> Improvement in join cardinality estimation
> --
>
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, 
> HIVE-17308.3.patch
>
>
> Currently, during logical planning, join cardinality is estimated assuming no 
> correlation among the join keys (this estimation is done using exponential 
> backoff). Physical planning, on the other hand, considers correlation for 
> multiple keys and uses a different estimate. We should consider correlation 
> during logical planning as well.





[jira] [Updated] (HIVE-17308) Improvement in join cardinality estimation

2017-08-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17308:
---
Attachment: HIVE-17308.3.patch

> Improvement in join cardinality estimation
> --
>
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, 
> HIVE-17308.3.patch
>
>
> Currently, during logical planning, join cardinality is estimated assuming no 
> correlation among the join keys (this estimation is done using exponential 
> backoff). Physical planning, on the other hand, considers correlation for 
> multiple keys and uses a different estimate. We should consider correlation 
> during logical planning as well.





[jira] [Updated] (HIVE-17308) Improvement in join cardinality estimation

2017-08-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17308:
---
Status: Patch Available  (was: Open)

> Improvement in join cardinality estimation
> --
>
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, 
> HIVE-17308.3.patch
>
>
> Currently, during logical planning, join cardinality is estimated assuming no 
> correlation among the join keys (this estimation is done using exponential 
> backoff). Physical planning, on the other hand, considers correlation for 
> multiple keys and uses a different estimate. We should consider correlation 
> during logical planning as well.





[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation

2017-08-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125044#comment-16125044
 ] 

Hive QA commented on HIVE-17308:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881665/HIVE-17308.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6377/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6377/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6377/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-08-13 20:53:22.070
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-6377/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-08-13 20:53:22.073
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 0f9990b HIVE-17301: Make JSONMessageFactory.getTObj method 
thread safe
+ git clean -f -d
Removing ql/src/test/queries/clientpositive/alter_partition_onto_nocurrent_db.q
Removing 
ql/src/test/results/clientpositive/alter_partition_onto_nocurrent_db.q.out
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 0f9990b HIVE-17301: Make JSONMessageFactory.getTObj method 
thread safe
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-08-13 20:53:28.659
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: a/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: No such 
file or directory
error: 
a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HivePlannerContext.java:
 No such file or directory
error: 
a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdSelectivity.java:
 No such file or directory
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java: No 
such file or directory
error: 
a/ql/src/test/org/apache/hadoop/hive/ql/optimizer/calcite/TestCBORuleFiredOnlyOnce.java:
 No such file or directory
error: a/ql/src/test/results/clientpositive/join_alt_syntax.q.out: No such file 
or directory
error: a/ql/src/test/results/clientpositive/join_cond_pushdown_2.q.out: No such 
file or directory
error: a/ql/src/test/results/clientpositive/join_cond_pushdown_4.q.out: No such 
file or directory
error: a/ql/src/test/results/clientpositive/perf/query17.q.out: No such file or 
directory
error: a/ql/src/test/results/clientpositive/perf/query24.q.out: No such file or 
directory
error: a/ql/src/test/results/clientpositive/perf/query25.q.out: No such file or 
directory
error: a/ql/src/test/results/clientpositive/perf/query29.q.out: No such file or 
directory
error: a/ql/src/test/results/clientpositive/perf/query50.q.out: No such file or 
directory
error: a/ql/src/test/results/clientpositive/perf/query54.q.out: No such file or 
directory
error: a/ql/src/test/results/clientpositive/perf/query64.q.out: No such file or 
directory
error: a/ql/src/test/results/clientpositive/perf/query72.q.out: No such file or 
directory
error: a/ql/src/test/results/clientpositive/perf/query85.q.out: No such file or 
directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881665 - PreCommit-HIVE-Build

> Improvement in join cardinality estimation
> --
>
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
>  Issue Type: Improvement
>  

[jira] [Updated] (HIVE-17308) Improvement in join cardinality estimation

2017-08-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17308:
---
Status: Open  (was: Patch Available)

> Improvement in join cardinality estimation
> --
>
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch
>
>
> Currently, during logical planning, join cardinality is estimated assuming no 
> correlation among the join keys (this estimation is done using exponential 
> backoff). Physical planning, on the other hand, considers correlation for 
> multiple keys and uses a different estimate. We should consider correlation 
> during logical planning as well.





[jira] [Updated] (HIVE-17308) Improvement in join cardinality estimation

2017-08-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17308:
---
Attachment: HIVE-17308.2.patch

> Improvement in join cardinality estimation
> --
>
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch
>
>
> Currently, during logical planning, join cardinality is estimated assuming no 
> correlation among the join keys (this estimation is done using exponential 
> backoff). Physical planning, on the other hand, considers correlation for 
> multiple keys and uses a different estimate. We should consider correlation 
> during logical planning as well.





[jira] [Updated] (HIVE-17308) Improvement in join cardinality estimation

2017-08-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17308:
---
Status: Patch Available  (was: Open)

> Improvement in join cardinality estimation
> --
>
> Key: HIVE-17308
> URL: https://issues.apache.org/jira/browse/HIVE-17308
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch
>
>
> Currently, during logical planning, join cardinality is estimated assuming no 
> correlation among the join keys (this estimation is done using exponential 
> backoff). Physical planning, on the other hand, considers correlation for 
> multiple keys and uses a different estimate. We should consider correlation 
> during logical planning as well.





[jira] [Work started] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-08-13 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17100 started by Sankar Hariappan.
---
> Improve HS2 operation logs for REPL commands.
> -
>
> Key: HIVE-17100
> URL: https://issues.apache.org/jira/browse/HIVE-17100
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
>
> It is necessary to log the progress of the replication tasks in a structured 
> manner as follows.
> *+Bootstrap Dump:+*
> * At the start of the bootstrap dump, will add one log with the following details.
> {color:#59afe1}* Database Name
> * Dump Type (BOOTSTRAP)
> * (Estimated) Total number of tables/views to dump
> * (Estimated) Total number of functions to dump.
> * Dump Start Time{color}
> * After each table dump, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table dump end time
> * Table dump progress. Format is Table sequence no/(Estimated) Total number 
> of tables and views.{color}
> * After each function dump, will add a log as follows
> {color:#59afe1}* Function Name
> * Function dump end time
> * Function dump progress. Format is Function sequence no/(Estimated) Total 
> number of functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> dump.
> {color:#59afe1}* Database Name.
> * Dump Type (BOOTSTRAP).
> * Dump End Time.
> * (Actual) Total number of tables/views dumped.
> * (Actual) Total number of functions dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The actual and estimated numbers of tables/functions may not match if 
> any table/function is dropped while the dump is in progress.
> *+Bootstrap Load:+*
> * At the start of the bootstrap load, will add one log with the following details.
> {color:#59afe1}* Database Name
> * Dump directory
> * Load Type (BOOTSTRAP)
> * Total number of tables/views to load
> * Total number of functions to load.
> * Load Start Time{color}
> * After each table load, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table load completion time
> * Table load progress. Format is Table sequence no/Total number of tables and 
> views.{color}
> * After each function load, will add a log as follows
> {color:#59afe1}* Function Name
> * Function load completion time
> * Function load progress. Format is Function sequence no/Total number of 
> functions.{color}
> * After completion of all loads, will add a log as follows to consolidate the 
> load.
> {color:#59afe1}* Database Name.
> * Load Type (BOOTSTRAP).
> * Load End Time.
> * Total number of tables/views loaded.
> * Total number of functions loaded.
> * Last Repl ID of the loaded database.{color}
> *+Incremental Dump:+*
> * At the start of the database dump, will add one log with the following details.
> {color:#59afe1}* Database Name
> * Dump Type (INCREMENTAL)
> * (Estimated) Total number of events to dump.
> * Dump Start Time{color}
> * After each event dump, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event dump end time
> * Event dump progress. Format is Event sequence no/ (Estimated) Total number 
> of events.{color}
> * After completion of all event dumps, will add a log as follows.
> {color:#59afe1}* Database Name.
> * Dump Type (INCREMENTAL).
> * Dump End Time.
> * (Actual) Total number of events dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The estimated number of events can differ significantly from the actual 
> number, as we don't know the number of events upfront until we read from the 
> metastore NotificationEvents table.
> *+Incremental Load:+*
> * At the start of the incremental load, will add one log with the following details.
> {color:#59afe1}* Target Database Name 
> * Dump directory
> * Load Type (INCREMENTAL)
> * Total number of events to load
> * Load Start Time{color}
> * After each event load, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Target Table/View/Function/Name
> * Target Partition Name (in case of partition operations such as 
> ADD_PARTITION, DROP_PARTITION, ALTER_PARTITION etc. For other operations, it 
> will be “null")
> * Event load end time
> * Event load progress. Format is Event sequence no/ Total number of 
> events.{color}
> * After completion of all event loads, will add a log as follows to 
> consolidate the load.
> {color:#59afe1}* Target Database Name.
> * Load Type (INCREMENTAL).
> * Load End Time.
> * Total number of events loaded.
> * Last Repl ID of the loaded 

[jira] [Commented] (HIVE-17309) alter partition onto a table not in current database throw InvalidOperationException

2017-08-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124981#comment-16124981
 ] 

Hive QA commented on HIVE-17309:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881656/HIVE-17309.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11005 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6376/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6376/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6376/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881656 - PreCommit-HIVE-Build

> alter partition onto a table not in current database throw 
> InvalidOperationException
> 
>
> Key: HIVE-17309
> URL: https://issues.apache.org/jira/browse/HIVE-17309
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.2, 2.1.1, 2.2.0
>Reporter: Wang Haihua
>Assignee: Wang Haihua
> Attachments: HIVE-17309.1.patch
>
>
> When altering a partition of a table that is not in the current database, an 
> InvalidOperationException is thrown.
> SQL example:
> {code}
> use default;
> ALTER TABLE anotherdb.test_table_for_alter_partition_nocurrentdb 
> partition(ds='haihua001') CHANGE COLUMN a a_new BOOLEAN;
> {code}
> The code below in {{DDLTask.java}} has a potential problem: it does not pass 
> the table name qualified with the database name when {{db.alterPartitions}} is 
> called (a sketch of the fix direction follows the snippet).
> {code}
>   if (allPartitions == null) {
> db.alterTable(alterTbl.getOldName(), tbl, alterTbl.getIsCascade(), 
> alterTbl.getEnvironmentContext());
>   } else {
> db.alterPartitions(tbl.getTableName(), allPartitions, 
> alterTbl.getEnvironmentContext());
>   }
> {code}
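
A minimal sketch of the fix direction referenced above (illustrative only, not the committed patch; it simply qualifies the table name with its database before the call):

{code}
// Inside the else branch above: pass "db.table" rather than the bare name,
// so the metastore resolves the table in its own database, not the current one.
String qualifiedName = tbl.getDbName() + "." + tbl.getTableName();
db.alterPartitions(qualifiedName, allPartitions, alterTbl.getEnvironmentContext());
{code}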
> stacktrace:
> {code}
> 2017-07-19T11:06:39,639  INFO [main] metastore.HiveMetaStore: New partition 
> values:[2017-07-14]
> 2017-07-19T11:06:39,654 ERROR [main] metastore.RetryingHMSHandler: 
> InvalidOperationException(message:alter is not possible)
> at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:526)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_with_environment_context(HiveMetaStore.java:3560)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at 
> com.sun.proxy.$Proxy21.alter_partitions_with_environment_context(Unknown 
> Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_partitions(HiveMetaStoreClient.java:1486)
> at 

[jira] [Commented] (HIVE-17300) WebUI query plan graphs

2017-08-13 Thread Wang Haihua (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124979#comment-16124979
 ] 

Wang Haihua commented on HIVE-17300:


 Cool. Tez has a graph but MR does not; I'm hopeful about this work.

> WebUI query plan graphs
> ---
>
> Key: HIVE-17300
> URL: https://issues.apache.org/jira/browse/HIVE-17300
> Project: Hive
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Karen Coppage
>Assignee: Karen Coppage
> Attachments: complete_success.png, full_mapred_stats.png, 
> graph_with_mapred_stats.png, last_stage_error.png, last_stage_running.png, 
> non_mapred_task_selected.png
>
>
> Hi all,
> I’m working on a feature of the Hive WebUI Query Plan tab that would provide 
> the option to display the query plan as a nice graph (scroll down for 
> screenshots). If you click on one of the graph’s stages, the plan for that 
> stage appears as text below. 
> Stages are color-coded if they have a status (Success, Error, Running), and 
> the rest are grayed out. Coloring is based on status already available in the 
> WebUI, under the Stages tab.
> There is an additional option to display stats for MapReduce tasks. This 
> includes the job’s ID, tracking URL (where the logs are found), and mapper 
> and reducer numbers/progress, among other info. 
> The library I’m using for the graph is called vis.js (http://visjs.org/). It 
> has an Apache license, and the only necessary file to be included from this 
> library is about 700 KB.
> I tried to keep server-side changes minimal, and graph generation is taken 
> care of by the client. Plans with more than a given number of stages 
> (default: 25) won't be displayed in order to preserve resources.
> I’d love to hear any and all input from the community about this feature: do 
> you think it’s useful, and is there anything important I’m missing?
> Thanks,
> Karen Coppage





[jira] [Updated] (HIVE-17309) alter partition onto a table not in current database throw InvalidOperationException

2017-08-13 Thread Wang Haihua (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang Haihua updated HIVE-17309:
---
Attachment: HIVE-17309.1.patch

> alter partition onto a table not in current database throw 
> InvalidOperationException
> 
>
> Key: HIVE-17309
> URL: https://issues.apache.org/jira/browse/HIVE-17309
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.2, 2.1.1, 2.2.0
>Reporter: Wang Haihua
>Assignee: Wang Haihua
> Attachments: HIVE-17309.1.patch
>
>
> When altering a partition of a table that is not in the current database, an 
> InvalidOperationException is thrown.
> SQL example:
> {code}
> use default;
> ALTER TABLE anotherdb.test_table_for_alter_partition_nocurrentdb 
> partition(ds='haihua001') CHANGE COLUMN a a_new BOOLEAN;
> {code}
> The code below in {{DDLTask.java}} has a potential problem: it does not pass 
> the table name qualified with the database name when {{db.alterPartitions}} is 
> called.
> {code}
>   if (allPartitions == null) {
> db.alterTable(alterTbl.getOldName(), tbl, alterTbl.getIsCascade(), 
> alterTbl.getEnvironmentContext());
>   } else {
> db.alterPartitions(tbl.getTableName(), allPartitions, 
> alterTbl.getEnvironmentContext());
>   }
> {code}
> stacktrace:
> {code}
> 2017-07-19T11:06:39,639  INFO [main] metastore.HiveMetaStore: New partition 
> values:[2017-07-14]
> 2017-07-19T11:06:39,654 ERROR [main] metastore.RetryingHMSHandler: 
> InvalidOperationException(message:alter is not possible)
> at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:526)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_with_environment_context(HiveMetaStore.java:3560)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at 
> com.sun.proxy.$Proxy21.alter_partitions_with_environment_context(Unknown 
> Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_partitions(HiveMetaStoreClient.java:1486)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at com.sun.proxy.$Proxy22.alter_partitions(Unknown Source)
> at org.apache.hadoop.hive.ql.metadata.Hive.alterPartitions(Hive.java:712)
> at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3338)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:368)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2166)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1837)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1713)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1543)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1174)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1164)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> 2017-07-19T11:06:39,669 ERROR [main] exec.DDLTask: 
> 

[jira] [Updated] (HIVE-17309) alter partition onto a table not in current database throw InvalidOperationException

2017-08-13 Thread Wang Haihua (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang Haihua updated HIVE-17309:
---
Status: Patch Available  (was: In Progress)

> alter partition onto a table not in current database throw 
> InvalidOperationException
> 
>
> Key: HIVE-17309
> URL: https://issues.apache.org/jira/browse/HIVE-17309
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.2.0, 2.1.1, 1.2.2
>Reporter: Wang Haihua
>Assignee: Wang Haihua
> Attachments: HIVE-17309.1.patch
>
>
> When altering a partition of a table that is not in the current database, an 
> InvalidOperationException is thrown.
> SQL example:
> {code}
> use default;
> ALTER TABLE anotherdb.test_table_for_alter_partition_nocurrentdb 
> partition(ds='haihua001') CHANGE COLUMN a a_new BOOLEAN;
> {code}
> The code below in {{DDLTask.java}} has a potential problem: it does not pass 
> the table name qualified with the database name when {{db.alterPartitions}} is 
> called.
> {code}
>   if (allPartitions == null) {
> db.alterTable(alterTbl.getOldName(), tbl, alterTbl.getIsCascade(), 
> alterTbl.getEnvironmentContext());
>   } else {
> db.alterPartitions(tbl.getTableName(), allPartitions, 
> alterTbl.getEnvironmentContext());
>   }
> {code}
> stacktrace:
> {code}
> 2017-07-19T11:06:39,639  INFO [main] metastore.HiveMetaStore: New partition 
> values:[2017-07-14]
> 2017-07-19T11:06:39,654 ERROR [main] metastore.RetryingHMSHandler: 
> InvalidOperationException(message:alter is not possible)
> at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:526)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_with_environment_context(HiveMetaStore.java:3560)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at 
> com.sun.proxy.$Proxy21.alter_partitions_with_environment_context(Unknown 
> Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_partitions(HiveMetaStoreClient.java:1486)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at com.sun.proxy.$Proxy22.alter_partitions(Unknown Source)
> at org.apache.hadoop.hive.ql.metadata.Hive.alterPartitions(Hive.java:712)
> at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3338)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:368)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2166)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1837)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1713)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1543)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1174)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1164)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> 2017-07-19T11:06:39,669 ERROR [main] exec.DDLTask: 

[jira] [Work started] (HIVE-17309) alter partition onto a table not in current database throw InvalidOperationException

2017-08-13 Thread Wang Haihua (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17309 started by Wang Haihua.
--
> alter partition onto a table not in current database throw 
> InvalidOperationException
> 
>
> Key: HIVE-17309
> URL: https://issues.apache.org/jira/browse/HIVE-17309
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.2, 2.1.1, 2.2.0
>Reporter: Wang Haihua
>Assignee: Wang Haihua
> Attachments: HIVE-17309.1.patch
>
>
> When altering a partition of a table that is not in the current database, 
> an InvalidOperationException is thrown.
> SQL example:
> {code}
> use default;
> ALTER TABLE anotherdb.test_table_for_alter_partition_nocurrentdb 
> partition(ds='haihua001') CHANGE COLUMN a a_new BOOLEAN;
> {code}
> In {{DDLTask.java}} we see a potential problem: the table name is not 
> qualified with the database name when {{db.alterPartitions}} is called.
> {code}
>   if (allPartitions == null) {
> db.alterTable(alterTbl.getOldName(), tbl, alterTbl.getIsCascade(), 
> alterTbl.getEnvironmentContext());
>   } else {
> db.alterPartitions(tbl.getTableName(), allPartitions, 
> alterTbl.getEnvironmentContext());
>   }
> {code}
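> A possible fix (a minimal sketch, not necessarily the attached patch; it 
> assumes {{Table.getDbName()}} is available at this point) is to pass the 
> database-qualified name instead:
> {code}
>   if (allPartitions == null) {
>     db.alterTable(alterTbl.getOldName(), tbl, alterTbl.getIsCascade(), 
>         alterTbl.getEnvironmentContext());
>   } else {
>     // Qualify the table name so the metastore does not resolve it against 
>     // the current database.
>     String qualifiedName = tbl.getDbName() + "." + tbl.getTableName();
>     db.alterPartitions(qualifiedName, allPartitions, 
>         alterTbl.getEnvironmentContext());
>   }
> {code}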
> stacktrace:
> {code}
> 2017-07-19T11:06:39,639  INFO [main] metastore.HiveMetaStore: New partition 
> values:[2017-07-14]
> 2017-07-19T11:06:39,654 ERROR [main] metastore.RetryingHMSHandler: 
> InvalidOperationException(message:alter is not possible)
> at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:526)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_with_environment_context(HiveMetaStore.java:3560)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at 
> com.sun.proxy.$Proxy21.alter_partitions_with_environment_context(Unknown 
> Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_partitions(HiveMetaStoreClient.java:1486)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at com.sun.proxy.$Proxy22.alter_partitions(Unknown Source)
> at org.apache.hadoop.hive.ql.metadata.Hive.alterPartitions(Hive.java:712)
> at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3338)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:368)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2166)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1837)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1713)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1543)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1174)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1164)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> 2017-07-19T11:06:39,669 ERROR [main] exec.DDLTask: 
> 

[jira] [Assigned] (HIVE-17309) alter partition onto a table not in current database throw InvalidOperationException

2017-08-13 Thread Wang Haihua (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang Haihua reassigned HIVE-17309:
--


> alter partition onto a table not in current database throw 
> InvalidOperationException
> 
>
> Key: HIVE-17309
> URL: https://issues.apache.org/jira/browse/HIVE-17309
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.2.0, 2.1.1, 1.2.2
>Reporter: Wang Haihua
>Assignee: Wang Haihua
>
> When altering a partition of a table that is not in the current database, 
> an InvalidOperationException is thrown.
> SQL example:
> {code}
> use default;
> ALTER TABLE anotherdb.test_table_for_alter_partition_nocurrentdb 
> partition(ds='haihua001') CHANGE COLUMN a a_new BOOLEAN;
> {code}
> In {{DDLTask.java}} we see a potential problem: the table name is not 
> qualified with the database name when {{db.alterPartitions}} is called.
> {code}
>   if (allPartitions == null) {
> db.alterTable(alterTbl.getOldName(), tbl, alterTbl.getIsCascade(), 
> alterTbl.getEnvironmentContext());
>   } else {
> db.alterPartitions(tbl.getTableName(), allPartitions, 
> alterTbl.getEnvironmentContext());
>   }
> {code}
> stacktrace:
> {code}
> 2017-07-19T11:06:39,639  INFO [main] metastore.HiveMetaStore: New partition 
> values:[2017-07-14]
> 2017-07-19T11:06:39,654 ERROR [main] metastore.RetryingHMSHandler: 
> InvalidOperationException(message:alter is not possible)
> at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:526)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_with_environment_context(HiveMetaStore.java:3560)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at 
> com.sun.proxy.$Proxy21.alter_partitions_with_environment_context(Unknown 
> Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_partitions(HiveMetaStoreClient.java:1486)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at com.sun.proxy.$Proxy22.alter_partitions(Unknown Source)
> at org.apache.hadoop.hive.ql.metadata.Hive.alterPartitions(Hive.java:712)
> at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3338)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:368)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2166)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1837)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1713)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1543)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1174)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1164)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> 2017-07-19T11:06:39,669 ERROR [main] exec.DDLTask: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter partition. 
> 

[jira] [Commented] (HIVE-17063) insert overwrite partition onto a external table fail when drop partition first

2017-08-13 Thread Wang Haihua (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124916#comment-16124916
 ] 

Wang Haihua commented on HIVE-17063:


ping [~ashutoshc]

> insert overwrite partition onto a external table fail when drop partition 
> first
> ---
>
> Key: HIVE-17063
> URL: https://issues.apache.org/jira/browse/HIVE-17063
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.2, 2.1.1, 2.2.0
>Reporter: Wang Haihua
>Assignee: Wang Haihua
> Attachments: HIVE-17063.1.patch, HIVE-17063.2.patch, 
> HIVE-17063.3.patch
>
>
> The default value of {{hive.exec.stagingdir}} is a relative path, and 
> dropping a partition of an external table does not delete the underlying 
> data. As a result, running insert overwrite partition twice can fail 
> because the target data to be moved already exists.
> We hit this while reproducing partition data for an external table.
> I see that the existing target data is cleared only when the {{immediately 
> generated data}} is a child of {{the target data directory}}, so my 
> proposal is to delete any pre-existing target file when renaming the 
> {{immediately generated data}} into {{the target data directory}}.
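> In {{Hive.moveFile}} terms the idea is roughly the following (a hedged 
> sketch against the Hadoop {{FileSystem}} API; {{srcPath}}, {{destPath}}, 
> and {{destFs}} are assumed names, not the actual patch):
> {code}
>   // If a stale target file survived a dropped partition, remove it before 
>   // renaming so the rename cannot fail on an already-existing path.
>   if (destFs.exists(destPath)) {
>     destFs.delete(destPath, true);
>   }
>   if (!destFs.rename(srcPath, destPath)) {
>     throw new IOException("rename for src path: " + srcPath
>         + " to dest path: " + destPath + " returned false");
>   }
> {code}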
> Operation reproduced:
> {code}
> create external table insert_after_drop_partition(key string, val string) 
> partitioned by (insertdate string);
> from src insert overwrite table insert_after_drop_partition partition 
> (insertdate='2008-01-01') select *;
> alter table insert_after_drop_partition drop partition 
> (insertdate='2008-01-01');
> from src insert overwrite table insert_after_drop_partition partition 
> (insertdate='2008-01-01') select *;
> {code}
> Stack trace:
> {code}
> 2017-07-09T08:32:05,212 ERROR [f3bc51c8-2441-4689-b1c1-d60aef86c3aa main] 
> exec.Task: Failed with exception java.io.IOException: rename for src path: 
> pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0
>  to dest 
> path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0
>  returned false
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: rename 
> for src path: 
> pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0
>  to dest 
> path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0
>  returned false
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2992)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3248)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1532)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1461)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:498)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1137)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:)
> at 
> org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:120)
> at 
> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_after_drop_partition(TestCliDriver.java:103)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
>