[jira] [Updated] (HIVE-17191) Add InterfaceAudience and InterfaceStability annotations for StorageHandler APIs

2017-07-27 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17191:

Attachment: HIVE-17191.1.patch

> Add InterfaceAudience and InterfaceStability annotations for StorageHandler 
> APIs
> 
>
> Key: HIVE-17191
> URL: https://issues.apache.org/jira/browse/HIVE-17191
> Project: Hive
>  Issue Type: Sub-task
>  Components: StorageHandler
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17191.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17191) Add InterfaceAudience and InterfaceStability annotations for StorageHandler APIs

2017-07-27 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17191:

Status: Patch Available  (was: Open)

> Add InterfaceAudience and InterfaceStability annotations for StorageHandler 
> APIs
> 
>
> Key: HIVE-17191
> URL: https://issues.apache.org/jira/browse/HIVE-17191
> Project: Hive
>  Issue Type: Sub-task
>  Components: StorageHandler
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17191.1.patch
>
>






[jira] [Commented] (HIVE-16759) Add table type information to HMS log notifications

2017-07-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104476#comment-16104476
 ] 

Hive QA commented on HIVE-16759:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879231/HIVE16759.4.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10994 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_7] (batchId=240)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6163/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6163/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6163/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879231 - PreCommit-HIVE-Build

> Add table type information to HMS log notifications
> ---
>
> Key: HIVE-16759
> URL: https://issues.apache.org/jira/browse/HIVE-16759
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Sergio Peña
>Assignee: Janaki Lahorani
> Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, 
> HIVE16759.3.patch, HIVE16759.4.patch
>
>
> The DB notifications used by HiveMetaStore should include the table type for 
> all notifications that include table events, such as create, drop and alter 
> table.
> This would be useful for consumers to identify views vs tables.





[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

2017-07-27 Thread Ke Jia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated HIVE-17139:
--
Attachment: HIVE-17139.4.patch

> Conditional expressions optimization: skip the expression evaluation if the 
> condition is not satisfied for vectorization engine.
> 
>
> Key: HIVE-17139
> URL: https://issues.apache.org/jira/browse/HIVE-17139
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ke Jia
>Assignee: Ke Jia
> Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, 
> HIVE-17139.3.patch, HIVE-17139.4.patch
>
>
> Execution of CASE WHEN and IF statements under Hive vectorization is not 
> optimal: the current implementation evaluates all of the conditional and 
> else expressions. The optimized approach is to update the selected array 
> of the batch parameter after the conditional expression is executed, so 
> that the else expression only processes the selected rows instead of all rows.
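
The approach described above can be illustrated with a minimal sketch. This is not the code from the attached patches; the function name `evaluate_if`, its parameters, and the list-based batch are hypothetical stand-ins for a column batch with a selected array. It only shows the idea of narrowing the selected rows after the condition is evaluated, so each branch runs on its own rows.

```python
from typing import Callable, List, Optional, Tuple


def evaluate_if(
    batch: List[int],
    selected: List[int],
    cond: Callable[[int], bool],
    then_expr: Callable[[int], int],
    else_expr: Callable[[int], int],
) -> Tuple[List[Optional[int]], List[int], List[int]]:
    """Evaluate IF(cond, then_expr, else_expr) over the rows listed in `selected`.

    Hypothetical illustration only: `batch` stands in for one column of a
    vectorized batch and `selected` for its selected array.
    """
    out: List[Optional[int]] = [None] * len(batch)

    # Step 1: run the condition once and split the selected array.
    then_sel = [i for i in selected if cond(batch[i])]
    matched = set(then_sel)
    else_sel = [i for i in selected if i not in matched]

    # Step 2: each branch only touches the rows selected for it.
    for i in then_sel:
        out[i] = then_expr(batch[i])
    for i in else_sel:
        out[i] = else_expr(batch[i])
    return out, then_sel, else_sel


batch = [4, 0, 10, 0, 5]
out, then_sel, else_sel = evaluate_if(
    batch,
    selected=list(range(len(batch))),
    cond=lambda x: x != 0,
    then_expr=lambda x: 100 // x,  # would raise ZeroDivisionError on rows 1 and 3
    else_expr=lambda x: -1,
)
```

In this example the then-branch is never evaluated on the rows where the condition is false, which is why the division does not fail.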





[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

2017-07-27 Thread Ke Jia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated HIVE-17139:
--
Status: Patch Available  (was: Open)

> Conditional expressions optimization: skip the expression evaluation if the 
> condition is not satisfied for vectorization engine.
> 
>
> Key: HIVE-17139
> URL: https://issues.apache.org/jira/browse/HIVE-17139
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ke Jia
>Assignee: Ke Jia
> Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, 
> HIVE-17139.3.patch, HIVE-17139.4.patch
>
>
> Execution of CASE WHEN and IF statements under Hive vectorization is not 
> optimal: the current implementation evaluates all of the conditional and 
> else expressions. The optimized approach is to update the selected array 
> of the batch parameter after the conditional expression is executed, so 
> that the else expression only processes the selected rows instead of all rows.





[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

2017-07-27 Thread Ke Jia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated HIVE-17139:
--
Status: Open  (was: Patch Available)

> Conditional expressions optimization: skip the expression evaluation if the 
> condition is not satisfied for vectorization engine.
> 
>
> Key: HIVE-17139
> URL: https://issues.apache.org/jira/browse/HIVE-17139
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ke Jia
>Assignee: Ke Jia
> Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, 
> HIVE-17139.3.patch
>
>
> Execution of CASE WHEN and IF statements under Hive vectorization is not 
> optimal: the current implementation evaluates all of the conditional and 
> else expressions. The optimized approach is to update the selected array 
> of the batch parameter after the conditional expression is executed, so 
> that the else expression only processes the selected rows instead of all rows.





[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

2017-07-27 Thread Ke Jia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated HIVE-17139:
--
Attachment: (was: HIVE-17139.4.patch)

> Conditional expressions optimization: skip the expression evaluation if the 
> condition is not satisfied for vectorization engine.
> 
>
> Key: HIVE-17139
> URL: https://issues.apache.org/jira/browse/HIVE-17139
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ke Jia
>Assignee: Ke Jia
> Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, 
> HIVE-17139.3.patch
>
>
> Execution of CASE WHEN and IF statements under Hive vectorization is not 
> optimal: the current implementation evaluates all of the conditional and 
> else expressions. The optimized approach is to update the selected array 
> of the batch parameter after the conditional expression is executed, so 
> that the else expression only processes the selected rows instead of all rows.





[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

2017-07-27 Thread Ke Jia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated HIVE-17139:
--
Attachment: HIVE-17139.4.patch

> Conditional expressions optimization: skip the expression evaluation if the 
> condition is not satisfied for vectorization engine.
> 
>
> Key: HIVE-17139
> URL: https://issues.apache.org/jira/browse/HIVE-17139
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ke Jia
>Assignee: Ke Jia
> Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, 
> HIVE-17139.3.patch, HIVE-17139.4.patch
>
>
> Execution of CASE WHEN and IF statements under Hive vectorization is not 
> optimal: the current implementation evaluates all of the conditional and 
> else expressions. The optimized approach is to update the selected array 
> of the batch parameter after the conditional expression is executed, so 
> that the else expression only processes the selected rows instead of all rows.





[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

2017-07-27 Thread Ke Jia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated HIVE-17139:
--
Status: Patch Available  (was: Open)

> Conditional expressions optimization: skip the expression evaluation if the 
> condition is not satisfied for vectorization engine.
> 
>
> Key: HIVE-17139
> URL: https://issues.apache.org/jira/browse/HIVE-17139
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ke Jia
>Assignee: Ke Jia
> Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, 
> HIVE-17139.3.patch, HIVE-17139.4.patch
>
>
> Execution of CASE WHEN and IF statements under Hive vectorization is not 
> optimal: the current implementation evaluates all of the conditional and 
> else expressions. The optimized approach is to update the selected array 
> of the batch parameter after the conditional expression is executed, so 
> that the else expression only processes the selected rows instead of all rows.





[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

2017-07-27 Thread Ke Jia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated HIVE-17139:
--
Status: Open  (was: Patch Available)

> Conditional expressions optimization: skip the expression evaluation if the 
> condition is not satisfied for vectorization engine.
> 
>
> Key: HIVE-17139
> URL: https://issues.apache.org/jira/browse/HIVE-17139
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ke Jia
>Assignee: Ke Jia
> Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, 
> HIVE-17139.3.patch, HIVE-17139.4.patch
>
>
> Execution of CASE WHEN and IF statements under Hive vectorization is not 
> optimal: the current implementation evaluates all of the conditional and 
> else expressions. The optimized approach is to update the selected array 
> of the batch parameter after the conditional expression is executed, so 
> that the else expression only processes the selected rows instead of all rows.





[jira] [Commented] (HIVE-17190) Don't store bitvectors for unpartitioned table

2017-07-27 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104467#comment-16104467
 ] 

Gopal V commented on HIVE-17190:


Does keeping bitvectors for flat tables help with merging across multiple insert-into operations?

> Don't store bitvectors for unpartitioned table
> --
>
> Key: HIVE-17190
> URL: https://issues.apache.org/jira/browse/HIVE-17190
> Project: Hive
>  Issue Type: Test
>  Components: Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-17190.patch
>
>
> Since current ones can't be intersected, there is no advantage of storing 
> them for unpartitioned tables.





[jira] [Commented] (HIVE-16965) SMB join may produce incorrect results

2017-07-27 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104464#comment-16104464
 ] 

Gopal V commented on HIVE-16965:


LGTM - +1

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, 
> HIVE-16965.3.patch, HIVE-16965.4.patch, HIVE-16965.5.patch, 
> HIVE-16965.6.patch, HIVE-16965.7.patch, HIVE-16965.8.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]





[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-07-27 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Status: Patch Available  (was: Open)

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.1.patch, HIVE-16811.2.patch
>
>
> Currently, join ordering bails out completely in the absence of statistics, 
> which can lead to bad joins such as cross joins.
> For example, the following select query will produce a cross join:
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
> CREATE TABLE lineitem (L_ORDERKEY INT,
> L_PARTKEY INT,
> L_SUPPKEY INT,
> L_LINENUMBER INT,
> L_QUANTITY DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT DOUBLE,
> L_TAX DOUBLE,
> L_RETURNFLAG STRING,
> L_LINESTATUS STRING,
> l_shipdate STRING,
> L_COMMITDATE STRING,
> L_RECEIPTDATE STRING,
> L_SHIPINSTRUCT STRING,
> L_SHIPMODE STRING,
> L_COMMENT STRING) partitioned by (dl int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats will prevent the join ordering algorithm from bailing out, 
> so it can come up with a join that is at least better than a cross join.





[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-07-27 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Status: Open  (was: Patch Available)

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.1.patch, HIVE-16811.2.patch
>
>
> Currently, join ordering bails out completely in the absence of statistics, 
> which can lead to bad joins such as cross joins.
> For example, the following select query will produce a cross join:
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
> CREATE TABLE lineitem (L_ORDERKEY INT,
> L_PARTKEY INT,
> L_SUPPKEY INT,
> L_LINENUMBER INT,
> L_QUANTITY DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT DOUBLE,
> L_TAX DOUBLE,
> L_RETURNFLAG STRING,
> L_LINESTATUS STRING,
> l_shipdate STRING,
> L_COMMITDATE STRING,
> L_RECEIPTDATE STRING,
> L_SHIPINSTRUCT STRING,
> L_SHIPMODE STRING,
> L_COMMENT STRING) partitioned by (dl int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats will prevent the join ordering algorithm from bailing out, 
> so it can come up with a join that is at least better than a cross join.





[jira] [Commented] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-07-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104419#comment-16104419
 ] 

Hive QA commented on HIVE-15665:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879225/HIVE-15665.08.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 11013 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=240)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=142)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6162/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6162/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6162/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879225 - PreCommit-HIVE-Build

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.01.patch, HIVE-15665.02.patch, 
> HIVE-15665.03.patch, HIVE-15665.04.patch, HIVE-15665.05.patch, 
> HIVE-15665.06.patch, HIVE-15665.07.patch, HIVE-15665.08.patch, 
> HIVE-15665.patch
>
>
> OrcFileMetadata internally has filestats, stripestats etc which are allocated 
> in heap. On large data sets, this could have an impact on the heap usage and 
> the memory usage by different executors in LLAP.





[jira] [Assigned] (HIVE-17190) Don't store bitvectors for unpartitioned table

2017-07-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan reassigned HIVE-17190:
---

Assignee: Ashutosh Chauhan

> Don't store bitvectors for unpartitioned table
> --
>
> Key: HIVE-17190
> URL: https://issues.apache.org/jira/browse/HIVE-17190
> Project: Hive
>  Issue Type: Test
>  Components: Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-17190.patch
>
>
> Since current ones can't be intersected, there is no advantage of storing 
> them for unpartitioned tables.





[jira] [Updated] (HIVE-17190) Don't store bitvectors for unpartitioned table

2017-07-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17190:

Attachment: HIVE-17190.patch

Initial patch for testing.

> Don't store bitvectors for unpartitioned table
> --
>
> Key: HIVE-17190
> URL: https://issues.apache.org/jira/browse/HIVE-17190
> Project: Hive
>  Issue Type: Test
>  Components: Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Ashutosh Chauhan
> Attachments: HIVE-17190.patch
>
>
> Since current ones can't be intersected, there is no advantage of storing 
> them for unpartitioned tables.





[jira] [Assigned] (HIVE-17193) HoS: don't combine map works that are targets of different DPPs

2017-07-27 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li reassigned HIVE-17193:
-


> HoS: don't combine map works that are targets of different DPPs
> ---
>
> Key: HIVE-17193
> URL: https://issues.apache.org/jira/browse/HIVE-17193
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
>






[jira] [Updated] (HIVE-591) create new type of join ( 1 row for a given key from multiple tables) (UNIQUEJOIN)

2017-07-27 Thread xinzhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xinzhang updated HIVE-591:
--
Description: 
It will be useful to support a new type of join:

say:

.

select .. from JOIN TABLES (A,B,C) WITH KEYS (A.key, B.key, C.key) where 


The semantics are that for a given key only 1 row is created - nulls are 
present for the tables which do not contain a row for that key.
There is no limit on the number of tables; the number of keys should be the 
same as the number of tables.

  was:
It will be useful to support a new type of join:

say:



select .. from JOIN TABLES (A,B,C) WITH KEYS (A.key, B.key, C.key) where 


The semantics are that for a given key only 1 row is created - nulls are 
present for the tables which do not contain a row for that key.
There is no limit on the number of tables; the number of keys should be the 
same as the number of tables.


> create new type of join ( 1 row for a given key from multiple tables) 
> (UNIQUEJOIN)
> --
>
> Key: HIVE-591
> URL: https://issues.apache.org/jira/browse/HIVE-591
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Emil Ibrishimov
> Fix For: 0.5.0
>
> Attachments: HIVE-591.1.patch, HIVE-591.2.patch, HIVE-591.3.patch, 
> HIVE-591.4.patch
>
>
> It will be useful to support a new type of join:
> say:
> .
> select .. from JOIN TABLES (A,B,C) WITH KEYS (A.key, B.key, C.key) where 
> The semantics are that for a given key only 1 row is created - nulls are 
> present for the tables which do not contain a row for that key.
> There is no limit on the number of tables; the number of keys should be the 
> same as the number of tables.
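
The semantics described above can be sketched as follows. This is only an illustration of the stated behaviour (one output row per key, nulls for tables with no row for that key), not Hive's implementation. The function name `unique_join` and the dict-per-table representation are hypothetical, and the sketch assumes each table holds at most one row per key, which the description does not spell out.

```python
from typing import Any, Dict, List, Tuple


def unique_join(*tables: Dict[Any, Any]) -> List[Tuple[Any, ...]]:
    """Return one row per distinct key across all tables.

    Each table is modelled as a dict mapping key -> row value (assumption:
    at most one row per key). A table with no row for a key contributes None.
    """
    all_keys = sorted(set().union(*(t.keys() for t in tables)))
    return [(key, *(t.get(key) for t in tables)) for key in all_keys]


A = {1: "a1", 2: "a2"}
B = {2: "b2", 3: "b3"}
C = {1: "c1"}

rows = unique_join(A, B, C)
# Key 3 exists only in B, so the A and C columns are None for that row.
```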





[jira] [Commented] (HIVE-16948) Invalid explain when running dynamic partition pruning query in Hive On Spark

2017-07-27 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104380#comment-16104380
 ] 

Rui Li commented on HIVE-16948:
---

Thinking more about this, I found a bug in combining equivalent works. If two 
map works contain the same operators but will be pruned by different DPP sinks, 
they can't be combined. E.g., let's slightly change the above example into:
{code}
explain select * from (select srcpart.ds,srcpart.key from srcpart join src on 
srcpart.ds=src.key) a join (select srcpart.ds,srcpart.key from srcpart join src 
on srcpart.ds=src.value) b on a.key=b.key;
{code}
The two map works for {{srcpart}} still get combined. However, they need to be 
pruned by different values: {{src.key}} and {{src.value}} respectively. In the 
current implementation we'll probably have incorrect results.
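
To make the condition concrete, here is a rough sketch of the check being argued for. It is not Hive's code: `MapWork`, `pruned_by`, `can_combine`, and the work names "Map 1" and "Map 5" are hypothetical names used only for illustration. The point is that comparing operator trees alone is not enough; the source of the pruning values has to match too.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class MapWork:
    """Hypothetical stand-in for a map work and the DPP sources that prune it."""
    name: str
    operators: Tuple[str, ...]
    pruned_by: FrozenSet[str]  # columns whose values drive partition pruning


def can_combine(a: MapWork, b: MapWork) -> bool:
    # Equal operator trees are necessary but not sufficient: both works must
    # also be pruned by the same DPP sources.
    return a.operators == b.operators and a.pruned_by == b.pruned_by


ops = ("TableScan[srcpart]", "ReduceSink")
by_key = MapWork("Map 1", ops, frozenset({"src.key"}))
by_key_again = MapWork("Map 5", ops, frozenset({"src.key"}))
by_value = MapWork("Map 5", ops, frozenset({"src.value"}))

same_source = can_combine(by_key, by_key_again)   # same operators, same DPP source
different_source = can_combine(by_key, by_value)  # same operators, different DPP source
```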

> Invalid explain when running dynamic partition pruning query in Hive On Spark
> -
>
> Key: HIVE-16948
> URL: https://issues.apache.org/jira/browse/HIVE-16948
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16948_1.patch, HIVE-16948.patch
>
>
> in 
> [union_subquery.q|https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q#L107]
>  in spark_dynamic_partition_pruning.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> explain select ds from (select distinct(ds) as ds from srcpart union all 
> select distinct(ds) as ds from srcpart) s where s.ds in (select 
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> explain 
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>   Edges:
> Reducer 11 <- Map 10 (GROUP, 1)
> Reducer 13 <- Map 12 (GROUP, 1)
>   DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2
>   Vertices:
> Map 10 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: max(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Map 12 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: min(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Reducer 11 
> Reduce Operator Tree:
>   Group By Operator
> aggregations: max(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: _col0 is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
>   

[jira] [Commented] (HIVE-16948) Invalid explain when running dynamic partition pruning query in Hive On Spark

2017-07-27 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104374#comment-16104374
 ] 

Rui Li commented on HIVE-16948:
---

[~kellyzly], I'm not talking about the 3 places. Here's an example:
{noformat}
set hive.cbo.enable=false;
explain select * from (select srcpart.ds,srcpart.key from srcpart join src on 
srcpart.ds=src.key) a join (select srcpart.ds,srcpart.key from srcpart join src 
on srcpart.ds=src.key) b on a.key=b.key;

STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-1 depends on stages: Stage-2
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-2
Spark
  DagName: lirui_20170728110559_4c2bc0ba-ab9a-428b-bf09-23f1b19b068f:16
  Vertices:
Map 8
Map Operator Tree:
TableScan
  alias: src
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
Select Operator
  expressions: key (type: string)
  outputColumnNames: _col0
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  Group By Operator
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
Spark Partition Pruning Sink Operator
  partition key expr: ds
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  target column name: ds
  target work: Map 1
Execution mode: vectorized
Map 9
Map Operator Tree:
TableScan
  alias: src
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
Select Operator
  expressions: key (type: string)
  outputColumnNames: _col0
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  Group By Operator
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
Spark Partition Pruning Sink Operator
  partition key expr: ds
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  target column name: ds
  target work: Map 5
Execution mode: vectorized

  Stage: Stage-1
Spark
  Edges:
Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 4 (PARTITION-LEVEL 
SORT, 1)
Reducer 3 <- Reducer 2 (PARTITION-LEVEL SORT, 1), Reducer 6 
(PARTITION-LEVEL SORT, 1)
Reducer 6 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 4 (PARTITION-LEVEL 
SORT, 1)
  DagName: lirui_20170728110559_4c2bc0ba-ab9a-428b-bf09-23f1b19b068f:15
  Vertices:
Map 1
Map Operator Tree:
TableScan
  alias: srcpart
  Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
COMPLETE Column stats: NONE
  Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
COMPLETE Column stats: NONE
Reduce Output Operator
  key expressions: ds (type: string)
  sort order: +
  Map-reduce partition columns: ds (type: string)
  Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
COMPLETE Column stats: NONE
  value expressions: key (type: string)
Execution mode: vectorized
Map 4
Map Operator Tree:
TableScan
  alias: src
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
Reduce 

[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results

2017-07-27 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16965:
--
Attachment: HIVE-16965.8.patch

The last patch mysteriously failed in the build. Recreated it after a code refresh.

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, 
> HIVE-16965.3.patch, HIVE-16965.4.patch, HIVE-16965.5.patch, 
> HIVE-16965.6.patch, HIVE-16965.7.patch, HIVE-16965.8.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]





[jira] [Commented] (HIVE-17167) Create metastore specific configuration tool

2017-07-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104361#comment-16104361
 ] 

Hive QA commented on HIVE-17167:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879216/HIVE-17167.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11030 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=207)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6161/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6161/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6161/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879216 - PreCommit-HIVE-Build

> Create metastore specific configuration tool
> 
>
> Key: HIVE-17167
> URL: https://issues.apache.org/jira/browse/HIVE-17167
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17167.patch
>
>
> As part of making the metastore a separately releasable module we need 
> configuration tools that are specific to that module.  It cannot use or 
> extend HiveConf as that is in hive common.  But it must take a HiveConf 
> object and be able to operate on it.
> The best way to achieve this is using Hadoop's Configuration object (which 
> HiveConf extends) together with enums and static methods.
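A minimal sketch of the "enums and static methods" approach described above. To stay self-contained it uses `java.util.Properties` as a stand-in for Hadoop's `Configuration`; the class name `MetastoreConfSketch`, the `ConfVar` enum, and the two keys and defaults are hypothetical and not taken from the attached patch.

```java
import java.util.Properties;

/** Hypothetical sketch: typed config access that does not depend on HiveConf. */
final class MetastoreConfSketch {

  /** Each constant carries its key and default value (illustrative values only). */
  enum ConfVar {
    SERVER_URIS("example.metastore.uris", ""),
    SERVER_PORT("example.metastore.port", "9083");

    final String key;
    final String defaultValue;

    ConfVar(String key, String defaultValue) {
      this.key = key;
      this.defaultValue = defaultValue;
    }
  }

  private MetastoreConfSketch() {}

  /** Reads a value, falling back to the default declared on the enum constant. */
  static String getVar(Properties conf, ConfVar var) {
    return conf.getProperty(var.key, var.defaultValue);
  }

  /** Reads a value and parses it as an int. */
  static int getIntVar(Properties conf, ConfVar var) {
    return Integer.parseInt(getVar(conf, var));
  }

  /** Writes a value under the key declared on the enum constant. */
  static void setVar(Properties conf, ConfVar var, String value) {
    conf.setProperty(var.key, value);
  }
}
```

Because the helpers are static and take the configuration object as a parameter, any object of the shared base type can be passed in, which is the property the description asks for when it says the tool must accept a HiveConf without extending it.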





[jira] [Commented] (HIVE-17164) Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)

2017-07-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104272#comment-16104272
 ] 

Hive QA commented on HIVE-17164:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879215/HIVE-17164.02.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11013 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_windowing_expressions]
 (batchId=75)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_ptf_part_simple]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.llap.security.TestLlapSignerImpl.testSigning 
(batchId=292)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6160/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6160/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6160/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879215 - PreCommit-HIVE-Build

> Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)
> ---
>
> Key: HIVE-17164
> URL: https://issues.apache.org/jira/browse/HIVE-17164
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-17164.01.patch, HIVE-17164.02.patch
>
>
> Add disk storage backing.  Turn hive.vectorized.execution.ptf.enabled on by 
> default.
> Add hive.vectorized.ptf.max.memory.buffering.batch.count to specify the 
> maximum number of vectorized row batches to buffer in memory before spilling to 
> disk.
> Add the hive.vectorized.testing.reducer.batch.size parameter so that the Tez 
> Reducer produces small batches, creating many key group batches that exercise 
> memory buffering and disk storage backing.





[jira] [Assigned] (HIVE-17192) Add InterfaceAudience and InterfaceStability annotations for Stats Collection APIs

2017-07-27 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-17192:
---


> Add InterfaceAudience and InterfaceStability annotations for Stats Collection 
> APIs
> --
>
> Key: HIVE-17192
> URL: https://issues.apache.org/jira/browse/HIVE-17192
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>






[jira] [Assigned] (HIVE-17191) Add InterfaceAudience and InterfaceStability annotations for StorageHandler APIs

2017-07-27 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-17191:
---


> Add InterfaceAudience and InterfaceStability annotations for StorageHandler 
> APIs
> 
>
> Key: HIVE-17191
> URL: https://issues.apache.org/jira/browse/HIVE-17191
> Project: Hive
>  Issue Type: Sub-task
>  Components: StorageHandler
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>






[jira] [Commented] (HIVE-16965) SMB join may produce incorrect results

2017-07-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104115#comment-16104115
 ] 

Hive QA commented on HIVE-16965:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879247/HIVE-16965.7.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6159/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6159/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6159/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-07-27 23:55:16.127
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-6159/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-07-27 23:55:16.129
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   e15b2de..61d8b7c  master -> origin/master
+ git reset --hard HEAD
HEAD is now at e15b2de HIVE-17168 Create separate module for stand alone 
metastore (Alan Gates, reviewed by Vihang Karajgaonkar)
+ git clean -f -d
Removing 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruning.java
Removing 
ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_mapjoin_only.q
Removing 
ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_mapjoin_only.q.out
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at 61d8b7c HIVE-17087: Remove unnecessary HoS DPP trees during 
map-join conversion (Sahil Takiar, reviewed by Liyun Zhang, Rui Li)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-07-27 23:55:21.989
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MapRecordSource.java: 
No such file or directory
error: 
a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValueInputMerger.java:
 No such file or directory
error: a/ql/src/test/results/clientpositive/llap/llap_smb.q.out: No such file 
or directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879247 - PreCommit-HIVE-Build

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, 
> HIVE-16965.3.patch, HIVE-16965.4.patch, HIVE-16965.5.patch, 
> HIVE-16965.6.patch, HIVE-16965.7.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from 

[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-07-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104108#comment-16104108
 ] 

Hive QA commented on HIVE-16998:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879205/HIVE16998.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 11013 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=240)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6158/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6158/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6158/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879205 - PreCommit-HIVE-Build

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
> Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, 
> HIVE16998.4.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.





[jira] [Commented] (HIVE-17184) Unexpected new line in beeline output when running with -f option

2017-07-27 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104106#comment-16104106
 ] 

Vihang Karajgaonkar commented on HIVE-17184:


Test failures are unrelated. [~pvary] Can you please review?

> Unexpected new line in beeline output when running with -f option
> -
>
> Key: HIVE-17184
> URL: https://issues.apache.org/jira/browse/HIVE-17184
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-17184.01.patch
>
>
> When running in -f mode on BeeLine I see an extra new line getting added at 
> the end of the results.
> {noformat}
> vihang-MBP:bin vihang$ beeline -f /tmp/query.sql 2>/dev/null
> +--+---+
> | test.id  | test.val  |
> +--+---+
> | 1| one   |
> | 2| two   |
> | 1| three |
> +--+---+
> vihang-MBP:bin vihang$ beeline -e "select * from test;" 2>/dev/null
> +--+---+
> | test.id  | test.val  |
> +--+---+
> | 1| one   |
> | 2| two   |
> | 1| three |
> +--+---+
> vihang-MBP:bin vihang$
> {noformat}





[jira] [Work started] (HIVE-17189) Fix backwards incompatibility in HiveMetaStoreClient

2017-07-27 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17189 started by Vihang Karajgaonkar.
--
> Fix backwards incompatibility in HiveMetaStoreClient
> 
>
> Key: HIVE-17189
> URL: https://issues.apache.org/jira/browse/HIVE-17189
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17189.01.patch
>
>
> HIVE-12730 adds the ability to edit the basic stats using {{alter table}} and 
> {{alter partition}} commands. However, it changes the signature of the @public 
> interface of MetastoreClient and removes some methods, which breaks backwards 
> compatibility. This can be fixed easily by re-introducing the removed methods 
> and making them call into the newly added method 
> {{alter_table_with_environment_context}}.
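The fix proposed in the description (bring back the removed signatures and have them forward to the new method) might look roughly like the sketch below. `ClientSketch` and its method names and parameters are hypothetical stand-ins, not the real `HiveMetaStoreClient` API; the `calls` list exists only so the forwarding can be observed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a metastore client; not the real HiveMetaStoreClient. */
final class ClientSketch {

  /** Records each call so the delegation is observable in a test. */
  final List<String> calls = new ArrayList<>();

  /** Newly added method that takes an extra context argument. */
  void alterTableWithEnvironmentContext(String db, String table, String newTable, String context) {
    calls.add(db + "." + table + " -> " + newTable + " [context=" + context + "]");
  }

  /** Re-introduced older signature: existing callers keep compiling and are forwarded with no context. */
  void alterTable(String db, String table, String newTable) {
    alterTableWithEnvironmentContext(db, table, newTable, null);
  }
}
```

Keeping the old overload as a thin wrapper means there is still a single implementation to maintain, while code compiled against the earlier interface continues to link.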





[jira] [Updated] (HIVE-17189) Fix backwards incompatibility in HiveMetaStoreClient

2017-07-27 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17189:
---
Status: Patch Available  (was: In Progress)

> Fix backwards incompatibility in HiveMetaStoreClient
> 
>
> Key: HIVE-17189
> URL: https://issues.apache.org/jira/browse/HIVE-17189
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17189.01.patch
>
>
> HIVE-12730 adds the ability to edit the basic stats using {{alter table}} and 
> {{alter partition}} commands. However, it changes the signature of the @public 
> interface of MetastoreClient and removes some methods, which breaks backwards 
> compatibility. This can be fixed easily by re-introducing the removed methods 
> and making them call into the newly added method 
> {{alter_table_with_environment_context}}.





[jira] [Updated] (HIVE-17189) Fix backwards incompatibility in HiveMetaStoreClient

2017-07-27 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17189:
---
Attachment: HIVE-17189.01.patch

> Fix backwards incompatibility in HiveMetaStoreClient
> 
>
> Key: HIVE-17189
> URL: https://issues.apache.org/jira/browse/HIVE-17189
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17189.01.patch
>
>
> HIVE-12730 adds the ability to edit the basic stats using {{alter table}} and 
> {{alter partition}} commands. However, it changes the signature of the @public 
> interface of MetastoreClient and removes some methods, which breaks backwards 
> compatibility. This can be fixed easily by re-introducing the removed methods 
> and making them call into the newly added method 
> {{alter_table_with_environment_context}}.





[jira] [Updated] (HIVE-17008) DbNotificationListener should skip failed events

2017-07-27 Thread Dan Burkert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Burkert updated HIVE-17008:
---
Attachment: HIVE-17008.2.patch

> DbNotificationListener should skip failed events
> 
>
> Key: HIVE-17008
> URL: https://issues.apache.org/jira/browse/HIVE-17008
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Dan Burkert
>Assignee: Dan Burkert
> Attachments: HIVE-17008.0.patch, HIVE-17008.1.patch, 
> HIVE-17008.2.patch
>
>
> When dropping a non-existent database, the HMS will still fire registered 
> {{DROP_DATABASE}} event listeners.  This results in an NPE when the listeners 
> attempt to deref the {{null}} database parameter.
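A minimal sketch of the behaviour named in the issue title: the listener checks whether the event succeeded before it touches the payload. `DropDatabaseEventSketch` and `NotificationListenerSketch` are hypothetical stand-ins written for this illustration; the real event and listener classes, and the way they expose success or failure, are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical event: databaseName is null when the drop failed because nothing existed. */
final class DropDatabaseEventSketch {
  final boolean succeeded;
  final String databaseName;

  DropDatabaseEventSketch(boolean succeeded, String databaseName) {
    this.succeeded = succeeded;
    this.databaseName = databaseName;
  }
}

/** Hypothetical listener that records one notification per successful event. */
final class NotificationListenerSketch {
  final List<String> notifications = new ArrayList<>();

  void onDropDatabase(DropDatabaseEventSketch event) {
    if (!event.succeeded) {
      return; // skip failed events before dereferencing the (possibly null) payload
    }
    // Only reached for successful events, so databaseName is expected to be non-null.
    notifications.add("DROP_DATABASE:" + event.databaseName.toLowerCase());
  }
}
```

Without the early return, the `toLowerCase()` call on a failed event would throw the kind of NullPointerException the description reports.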





[jira] [Comment Edited] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-07-27 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104042#comment-16104042
 ] 

Sahil Takiar edited comment on HIVE-16998 at 7/27/17 11:11 PM:
---

You'll probably need to rebase this too, since I just merged HIVE-17087.


was (Author: stakiar):
You'll probably need to rebase this too, since I just merged HIVE-16923.

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
> Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, 
> HIVE16998.4.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.





[jira] [Commented] (HIVE-17169) Avoid extra call to KeyProvider::getMetadata()

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104059#comment-16104059
 ] 

Mithun Radhakrishnan commented on HIVE-17169:
-

An additional reason to avoid {{KeyProvider::getMetadata()}} is that HDFS 
might be set up to disallow this call for all but HDFS super-users. The 
{{EncryptionZone}} instance already provides what we need.

> Avoid extra call to KeyProvider::getMetadata()
> --
>
> Key: HIVE-17169
> URL: https://issues.apache.org/jira/browse/HIVE-17169
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17169.1.patch
>
>
> Here's the code from {{Hadoop23Shims}}:
> {code:title=Hadoop23Shims.java|borderStyle=solid}
> @Override
> public int comparePathKeyStrength(Path path1, Path path2) throws IOException {
>   EncryptionZone zone1, zone2;
>   zone1 = hdfsAdmin.getEncryptionZoneForPath(path1);
>   zone2 = hdfsAdmin.getEncryptionZoneForPath(path2);
>   if (zone1 == null && zone2 == null) {
>     return 0;
>   } else if (zone1 == null) {
>     return -1;
>   } else if (zone2 == null) {
>     return 1;
>   }
>   return compareKeyStrength(zone1.getKeyName(), zone2.getKeyName());
> }
>
> private int compareKeyStrength(String keyname1, String keyname2) throws IOException {
>   KeyProvider.Metadata meta1, meta2;
>   if (keyProvider == null) {
>     throw new IOException("HDFS security key provider is not configured on your server.");
>   }
>   meta1 = keyProvider.getMetadata(keyname1);
>   meta2 = keyProvider.getMetadata(keyname2);
>   if (meta1.getBitLength() < meta2.getBitLength()) {
>     return -1;
>   } else if (meta1.getBitLength() == meta2.getBitLength()) {
>     return 0;
>   } else {
>     return 1;
>   }
> }
> {code}
> It turns out that {{EncryptionZone}} already has the cipher's bit-length 
> stored in a member variable. One shouldn't need an additional name-node call 
> ({{KeyProvider::getMetadata()}}) only to fetch it again.
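The idea in the description, comparing strengths from information the zone already carries instead of asking the key provider, can be sketched as follows. `ZoneSketch` is a hypothetical stand-in with a plain `bitLength` field; how the real `EncryptionZone` exposes this value is not shown in the message, so nothing here should be read as the Hadoop API or as the content of the attached patch.

```java
/** Hypothetical stand-in for an encryption zone that already knows its cipher's bit length. */
final class ZoneSketch {
  final int bitLength;

  ZoneSketch(int bitLength) {
    this.bitLength = bitLength;
  }
}

final class KeyStrengthSketch {
  private KeyStrengthSketch() {}

  /**
   * Keeps the null handling of comparePathKeyStrength (a path with no zone is the weakest),
   * then compares the bit lengths held by the zones. No key-provider lookup is involved.
   */
  static int compare(ZoneSketch zone1, ZoneSketch zone2) {
    if (zone1 == null && zone2 == null) {
      return 0;
    } else if (zone1 == null) {
      return -1;
    } else if (zone2 == null) {
      return 1;
    }
    return Integer.compare(zone1.bitLength, zone2.bitLength);
  }
}
```

Since the comparison no longer depends on a key provider, the "provider is not configured" failure path and the extra remote call both drop out of this sketch.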





[jira] [Updated] (HIVE-17169) Avoid extra call to KeyProvider::getMetadata()

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17169:

Status: Patch Available  (was: Open)

Submitting patch for tests.

> Avoid extra call to KeyProvider::getMetadata()
> --
>
> Key: HIVE-17169
> URL: https://issues.apache.org/jira/browse/HIVE-17169
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17169.1.patch
>
>
> Here's the code from {{Hadoop23Shims}}:
> {code:title=Hadoop23Shims.java|borderStyle=solid}
>     @Override
>     public int comparePathKeyStrength(Path path1, Path path2) throws IOException {
>       EncryptionZone zone1, zone2;
>       zone1 = hdfsAdmin.getEncryptionZoneForPath(path1);
>       zone2 = hdfsAdmin.getEncryptionZoneForPath(path2);
>       if (zone1 == null && zone2 == null) {
>         return 0;
>       } else if (zone1 == null) {
>         return -1;
>       } else if (zone2 == null) {
>         return 1;
>       }
>       return compareKeyStrength(zone1.getKeyName(), zone2.getKeyName());
>     }
>     private int compareKeyStrength(String keyname1, String keyname2) throws IOException {
>       KeyProvider.Metadata meta1, meta2;
>       if (keyProvider == null) {
>         throw new IOException("HDFS security key provider is not configured on your server.");
>       }
>       meta1 = keyProvider.getMetadata(keyname1);
>       meta2 = keyProvider.getMetadata(keyname2);
>       if (meta1.getBitLength() < meta2.getBitLength()) {
>         return -1;
>       } else if (meta1.getBitLength() == meta2.getBitLength()) {
>         return 0;
>       } else {
>         return 1;
>       }
>     }
>   }
> {code}
> It turns out that {{EncryptionZone}} already has the cipher's bit-length 
> stored in a member variable. One shouldn't need an additional name-node call 
> ({{KeyProvider::getMetadata()}}) only to fetch it again.





[jira] [Updated] (HIVE-17169) Avoid extra call to KeyProvider::getMetadata()

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17169:

Attachment: (was: HIVE-17169.branch-2.2.patch)

> Avoid extra call to KeyProvider::getMetadata()
> --
>
> Key: HIVE-17169
> URL: https://issues.apache.org/jira/browse/HIVE-17169
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17169.1.patch
>
>
> Here's the code from {{Hadoop23Shims}}:
> {code:title=Hadoop23Shims.java|borderStyle=solid}
>     @Override
>     public int comparePathKeyStrength(Path path1, Path path2) throws IOException {
>       EncryptionZone zone1, zone2;
>       zone1 = hdfsAdmin.getEncryptionZoneForPath(path1);
>       zone2 = hdfsAdmin.getEncryptionZoneForPath(path2);
>       if (zone1 == null && zone2 == null) {
>         return 0;
>       } else if (zone1 == null) {
>         return -1;
>       } else if (zone2 == null) {
>         return 1;
>       }
>       return compareKeyStrength(zone1.getKeyName(), zone2.getKeyName());
>     }
>     private int compareKeyStrength(String keyname1, String keyname2) throws IOException {
>       KeyProvider.Metadata meta1, meta2;
>       if (keyProvider == null) {
>         throw new IOException("HDFS security key provider is not configured on your server.");
>       }
>       meta1 = keyProvider.getMetadata(keyname1);
>       meta2 = keyProvider.getMetadata(keyname2);
>       if (meta1.getBitLength() < meta2.getBitLength()) {
>         return -1;
>       } else if (meta1.getBitLength() == meta2.getBitLength()) {
>         return 0;
>       } else {
>         return 1;
>       }
>     }
>   }
> {code}
> It turns out that {{EncryptionZone}} already has the cipher's bit-length 
> stored in a member variable. One shouldn't need an additional name-node call 
> ({{KeyProvider::getMetadata()}}) only to fetch it again.





[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-07-27 Thread Janaki Lahorani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104048#comment-16104048
 ] 

Janaki Lahorani commented on HIVE-16998:


Thanks [~stakiar].  I will rebase.

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
> Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, 
> HIVE16998.4.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.
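
A rough sketch of how such a switch could gate the optimization, assuming a three-way setting. The enum `DppMode`, its values, and the method name are hypothetical; the thread does not give the actual config key or how it is read.

```java
/** Hypothetical three-way setting; the real config key and values are not given in this thread. */
enum DppMode { OFF, MAP_JOIN_ONLY, ALL }

final class DppGate {
  private DppGate() {}

  /**
   * Decides whether dynamic partition pruning should be applied to a join.
   * With MAP_JOIN_ONLY, pruning is kept only for map-joins, where the
   * operator tree does not have to be split in two.
   */
  static boolean shouldApplyDpp(DppMode mode, boolean isMapJoin) {
    switch (mode) {
      case ALL:
        return true;
      case MAP_JOIN_ONLY:
        return isMapJoin;
      default:
        return false;
    }
  }
}
```

With `MAP_JOIN_ONLY`, a query like the union-all example above would skip pruning for the non-map-join case and so avoid running the subquery twice.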





[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-07-27 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104042#comment-16104042
 ] 

Sahil Takiar commented on HIVE-16998:
-

You'll probably need to rebase this too, since I just merged HIVE-16923.

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
> Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, 
> HIVE16998.4.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.





[jira] [Updated] (HIVE-17087) Remove unnecessary HoS DPP trees during map-join conversion

2017-07-27 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17087:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the reviews, [~lirui] and [~kellyzly]. Committed to master.

> Remove unnecessary HoS DPP trees during map-join conversion
> ---
>
> Key: HIVE-17087
> URL: https://issues.apache.org/jira/browse/HIVE-17087
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17087.1.patch, HIVE-17087.2.patch, 
> HIVE-17087.3.patch, HIVE-17087.4.patch, HIVE-17087.5.patch
>
>
> Ran the following query in the {{TestSparkCliDriver}}:
> {code:sql}
> set hive.spark.dynamic.partition.pruning=true;
> set hive.auto.convert.join=true;
> create table partitioned_table1 (col int) partitioned by (part_col int);
> create table partitioned_table2 (col int) partitioned by (part_col int);
> create table regular_table (col int);
> insert into table regular_table values (1);
> alter table partitioned_table1 add partition (part_col = 1);
> insert into table partitioned_table1 partition (part_col = 1) values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
> alter table partitioned_table2 add partition (part_col = 1);
> insert into table partitioned_table2 partition (part_col = 1) values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
> explain select * from partitioned_table1, partitioned_table2 where partitioned_table1.part_col = partitioned_table2.part_col;
> {code}
> and got the following explain plan:
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-3 depends on stages: Stage-2
>   Stage-1 depends on stages: Stage-3
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
>     Spark
> #### A masked pattern was here ####
>       Vertices:
>         Map 3
>             Map Operator Tree:
>                 TableScan
>                   alias: partitioned_table1
>                   Statistics: Num rows: 10 Data size: 11 Basic stats: COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: col (type: int), part_col (type: int)
>                     outputColumnNames: _col0, _col1
>                     Statistics: Num rows: 10 Data size: 11 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: _col1 (type: int)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 10 Data size: 11 Basic stats: COMPLETE Column stats: NONE
>                       Group By Operator
>                         keys: _col0 (type: int)
>                         mode: hash
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 10 Data size: 11 Basic stats: COMPLETE Column stats: NONE
>                         Spark Partition Pruning Sink Operator
>                           partition key expr: part_col
>                           Statistics: Num rows: 10 Data size: 11 Basic stats: COMPLETE Column stats: NONE
>                           target column name: part_col
>                           target work: Map 2
>   Stage: Stage-3
>     Spark
> #### A masked pattern was here ####
>       Vertices:
>         Map 2
>             Map Operator Tree:
>                 TableScan
>                   alias: partitioned_table2
>                   Statistics: Num rows: 10 Data size: 11 Basic stats: COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: col (type: int), part_col (type: int)
>                     outputColumnNames: _col0, _col1
>                     Statistics: Num rows: 10 Data size: 11 Basic stats: COMPLETE Column stats: NONE
>                     Spark HashTable Sink Operator
>                       keys:
>                         0 _col1 (type: int)
>                         1 _col1 (type: int)
>             Local Work:
>               Map Reduce Local Work
>   Stage: Stage-1
>     Spark
> #### A masked pattern was here ####
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: partitioned_table1
>                   Statistics: Num rows: 10 Data size: 11 Basic stats: COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: col (type: int), part_col (type: int)
>                     outputColumnNames: _col0, _col1
>                     Statistics: Num rows: 10 Data size: 11 Basic stats: COMPLETE Column stats: NONE
>                     Map Join Operator
>                       condition map:
>                            Inner Join 0 to 1
>   

[jira] [Commented] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104030#comment-16104030
 ] 

Mithun Radhakrishnan commented on HIVE-17188:
-

P.S. I've added clarification in the JIRA description.

We've had a rash of JIRAs with anaemic descriptions recently. I hope this 
version is more clear.

> ObjectStore runs out of memory for large batches of addPartitions().
> 
>
> Key: HIVE-17188
> URL: https://issues.apache.org/jira/browse/HIVE-17188
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17188.1.patch
>
>
> For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
> runs out of memory. Flushing the {{PersistenceManager}} alleviates the 
> problem.
> Note: The problem being addressed here isn't so much with the size of the 
> hundreds of Partition objects, but the cruft that builds with the 
> PersistenceManager, in the JDO layer, as confirmed through memory-profiling.
> (Raising this on behalf of [~cdrome] and [~thiruvel].)





[jira] [Updated] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17188:

Description: 
For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
runs out of memory. Flushing the {{PersistenceManager}} alleviates the problem.

Note: The problem being addressed here isn't so much with the size of the 
hundreds of Partition objects, but the cruft that builds with the 
PersistenceManager, in the JDO layer, as confirmed through memory-profiling.

(Raising this on behalf of [~cdrome] and [~thiruvel].)

  was:
For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
runs out of memory. Flushing the {{PersistenceManager}} alleviates the problem.

(Raising this on behalf of [~cdrome] and [~thiruvel].)


> ObjectStore runs out of memory for large batches of addPartitions().
> 
>
> Key: HIVE-17188
> URL: https://issues.apache.org/jira/browse/HIVE-17188
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17188.1.patch
>
>
> For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
> runs out of memory. Flushing the {{PersistenceManager}} alleviates the 
> problem.
> Note: The problem being addressed here isn't so much with the size of the 
> hundreds of Partition objects, but the cruft that builds with the 
> PersistenceManager, in the JDO layer, as confirmed through memory-profiling.
> (Raising this on behalf of [~cdrome] and [~thiruvel].)





[jira] [Updated] (HIVE-17006) LLAP: Parquet caching

2017-07-27 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17006:

Attachment: HIVE-17006.01.patch

> LLAP: Parquet caching
> -
>
> Key: HIVE-17006
> URL: https://issues.apache.org/jira/browse/HIVE-17006
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17006.01.patch, HIVE-17006.patch, 
> HIVE-17006.WIP.patch
>
>
> There are multiple options to do Parquet caching in LLAP:
> 1) Full elevator (too intrusive for now).
> 2) Page based cache like ORC (requires some changes to Parquet or 
> copy-pasted).
> 3) Cache disk data on column chunk level as is.
> Given that Parquet reads at column chunk granularity, (2) is not as useful as 
> for ORC, but still a good idea. I messaged the dev list about it but didn't 
> get a response; we may follow up later.
> For now, do (3). 
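
A toy sketch of option (3), caching raw disk data per column chunk. The class name, the `file@offset` key, and the `Supplier`-based disk read are all hypothetical and chosen only to keep the example self-contained; this is not the LLAP cache from the attached patch, and it ignores eviction and thread safety.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical cache keyed by file and chunk offset; holds chunk bytes exactly as read from disk. */
final class ChunkCache {
  private final Map<String, byte[]> chunks = new HashMap<>();
  int diskReads = 0;

  /** Returns the raw bytes of a column chunk, invoking diskRead only on a cache miss. */
  byte[] get(String file, long offset, Supplier<byte[]> diskRead) {
    String key = file + "@" + offset;
    byte[] cached = chunks.get(key);
    if (cached == null) {
      cached = diskRead.get();
      diskReads++;
      chunks.put(key, cached);
    }
    return cached;
  }
}
```

Because the bytes are stored as is, the reader still decodes them on every access; the cache only saves the disk read.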





[jira] [Commented] (HIVE-16948) Invalid explain when running dynamic partition pruning query in Hive On Spark

2017-07-27 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104017#comment-16104017
 ] 

liyunzhang_intel commented on HIVE-16948:
-

[~lirui]: 
{quote}
Is it possible the reduce works only contain one DPP sink?
{quote}
There are three cases in which a DPP sink is removed:
1. SparkRemoveDynamicPruningBySize 
2. SparkCompiler#runCycleAnalysisForPartitionPruning
3. SparkMapJoinOptimizer (HIVE-17087)

If I use one of these conditions to remove a DPP sink, can you give an example 
in which one sink is removed while the other remains?

> Invalid explain when running dynamic partition pruning query in Hive On Spark
> -
>
> Key: HIVE-16948
> URL: https://issues.apache.org/jira/browse/HIVE-16948
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16948_1.patch, HIVE-16948.patch
>
>
> in 
> [union_subquery.q|https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q#L107]
>  in spark_dynamic_partition_pruning.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> explain select ds from (select distinct(ds) as ds from srcpart union all 
> select distinct(ds) as ds from srcpart) s where s.ds in (select 
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> explain 
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
>     Spark
>       Edges:
>         Reducer 11 <- Map 10 (GROUP, 1)
>         Reducer 13 <- Map 12 (GROUP, 1)
>       DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2
>       Vertices:
>         Map 10
>             Map Operator Tree:
>                 TableScan
>                   alias: srcpart
>                   Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
>                   Select Operator
>                     expressions: ds (type: string)
>                     outputColumnNames: ds
>                     Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
>                     Group By Operator
>                       aggregations: max(ds)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         sort order:
>                         Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                         value expressions: _col0 (type: string)
>         Map 12
>             Map Operator Tree:
>                 TableScan
>                   alias: srcpart
>                   Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
>                   Select Operator
>                     expressions: ds (type: string)
>                     outputColumnNames: ds
>                     Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
>                     Group By Operator
>                       aggregations: min(ds)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         sort order:
>                         Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                         value expressions: _col0 (type: string)
>         Reducer 11
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: max(VALUE._col0)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                 Filter Operator
>                   predicate: _col0 is not null (type: boolean)
>                   Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                   Group By Operator
>                     keys: _col0 (type: string)
>                     mode: hash
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>   

[jira] [Commented] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104014#comment-16104014
 ] 

Mithun Radhakrishnan commented on HIVE-17188:
-

@[~vihangk1]: Thank you for your attention. :]

bq. Can you please update the patch with HIVE specific JIRA number and 
description of this JIRA as per our convention?
Sorry, it's been a while, so perhaps you could clarify for me. My memory of the 
convention is that patches are named 
{{HIVE-..patch}}. If the patch is a port to 
another branch, then it's {{HIVE-..patch}}. 
From perusing the JIRAs included in [the Hive 2.2 
release|https://issues.apache.org/jira/projects/HIVE/versions/12335837], this 
seems like the format of choice. Could you please clarify what I'm missing?

bq. You can add a line in the description where this patch was cherry-picked 
from if you like.
This is a port from Yahoo's internal production branch. The commit dates back 
to April of 2014. :]

bq. If there are hundreds of partitions being added, aren't they already in 
memory in the {{List}} parts object?
A fair question. :] I can try to answer this, although [~cdrome] and 
[~thiruvel] are really the experts on this one. 
The problem being addressed here isn't so much with the size of the hundreds of 
{{Partition}} objects, but the cruft that builds with the 
{{PersistenceManager}}, in the JDO layer, as confirmed through memory-profiling.

Our larger commit also plugged leaks from neglecting to call 
{{Query::close()}}, etc. It looks like those have independently been solved 
already.


> ObjectStore runs out of memory for large batches of addPartitions().
> 
>
> Key: HIVE-17188
> URL: https://issues.apache.org/jira/browse/HIVE-17188
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17188.1.patch
>
>
> For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
> runs out of memory. Flushing the {{PersistenceManager}} alleviates the 
> problem.
> (Raising this on behalf of [~cdrome] and [~thiruvel].)





[jira] [Updated] (HIVE-17006) LLAP: Parquet caching

2017-07-27 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17006:

Attachment: (was: HIVE-17006.01.patch)

> LLAP: Parquet caching
> -
>
> Key: HIVE-17006
> URL: https://issues.apache.org/jira/browse/HIVE-17006
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17006.patch, HIVE-17006.WIP.patch
>
>
> There are multiple options to do Parquet caching in LLAP:
> 1) Full elevator (too intrusive for now).
> 2) Page based cache like ORC (requires some changes to Parquet or 
> copy-pasted).
> 3) Cache disk data on column chunk level as is.
> Given that Parquet reads at column chunk granularity, (2) is not as useful as 
> for ORC, but still a good idea. I messaged the dev list about it but didn't 
> get a response; we may follow up later.
> For now, do (3). 





[jira] [Updated] (HIVE-17006) LLAP: Parquet caching

2017-07-27 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17006:

Attachment: HIVE-17006.01.patch

Fixing the initialization order, other minor changes.
I can observe the cache working on a small LLAP cluster, seemingly without 
errors.

> LLAP: Parquet caching
> -
>
> Key: HIVE-17006
> URL: https://issues.apache.org/jira/browse/HIVE-17006
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17006.patch, HIVE-17006.WIP.patch
>
>
> There are multiple options to do Parquet caching in LLAP:
> 1) Full elevator (too intrusive for now).
> 2) Page based cache like ORC (requires some changes to Parquet or 
> copy-pasted).
> 3) Cache disk data on column chunk level as is.
> Given that Parquet reads at column chunk granularity, (2) is not as useful as 
> for ORC, but still a good idea. I messaged the dev list about it but didn't 
> get a response; we may follow up later.
> For now, do (3). 





[jira] [Commented] (HIVE-17129) Increase usage of InterfaceAudience and InterfaceStability annotations

2017-07-27 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104008#comment-16104008
 ] 

Sahil Takiar commented on HIVE-17129:
-

[~spena] what are your thoughts on marking {{MetaStoreEventListener}}, 
{{ListenerEvent}}, and the classes under 
{{org.apache.hadoop.hive.metastore.events}} as Public APIs? Do we expect Hive 
users, or even other Apache projects, to use these APIs?

> Increase usage of InterfaceAudience and InterfaceStability annotations 
> ---
>
> Key: HIVE-17129
> URL: https://issues.apache.org/jira/browse/HIVE-17129
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> The {{InterfaceAudience}} and {{InterfaceStability}} annotations were added a 
> while ago to mark certain classes as available for public use. However, they 
> were only added to a few classes. The annotations are largely missing for 
> major APIs such as the SerDe and UDF APIs. We should update these interfaces 
> to use these annotations.
> When done in conjunction with HIVE-17130, we should have an automated way to 
> prevent backwards incompatible changes to Hive APIs.
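
As an illustration of what such marking looks like, here is a self-contained sketch. The nested `Classification.Public` and `Classification.Stable` annotations are local stand-ins defined only for this example, and `ExampleSerDe` is a hypothetical interface; real code would use the project's own {{InterfaceAudience}} and {{InterfaceStability}} annotations instead.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

/** Local stand-ins for audience and stability annotations, defined here only for the sketch. */
final class Classification {
  private Classification() {}

  /** Marks a type as intended for use outside the project. */
  @Documented
  @Retention(RetentionPolicy.RUNTIME)
  @interface Public {}

  /** Marks a type whose signature should not change incompatibly between minor releases. */
  @Documented
  @Retention(RetentionPolicy.RUNTIME)
  @interface Stable {}
}

/** Hypothetical extension point marked as a public, stable API. */
@Classification.Public
@Classification.Stable
interface ExampleSerDe {
  Object deserialize(byte[] raw);
}
```

Runtime retention is chosen here so a tool can discover annotated types reflectively, which is one way an automated compatibility check could find the APIs to guard.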





[jira] [Assigned] (HIVE-17189) Fix backwards incompatibility in HiveMetaStoreClient

2017-07-27 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-17189:
--


> Fix backwards incompatibility in HiveMetaStoreClient
> 
>
> Key: HIVE-17189
> URL: https://issues.apache.org/jira/browse/HIVE-17189
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>
> HIVE-12730 adds the ability to edit the basic stats using {{alter table}} and 
> {{alter partition}} commands. However, it changes the signature of the @public 
> interface of MetastoreClient and removes some methods, which breaks backwards 
> compatibility. This can be fixed easily by re-introducing the removed methods 
> and making them call into the newly added method 
> {{alter_table_with_environment_context}}.
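
The proposed fix can be sketched as a delegating overload. Everything here is simplified and hypothetical: the class name, the parameter lists, and the use of a plain `Map` as the context are assumptions, since the thread does not show the real signatures.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified client; the real method signatures are not shown in this thread. */
class SketchMetaStoreClient {
  String lastCall = "";

  /** Newly added method that takes an extra environment context. */
  void alterTableWithEnvironmentContext(String db, String table, Map<String, String> context) {
    lastCall = db + "." + table + " context=" + context;
  }

  /** Re-introduced legacy method: old callers keep compiling and are routed to the new method. */
  void alterTable(String db, String table) {
    alterTableWithEnvironmentContext(db, table, new HashMap<>());
  }
}
```

Keeping the old signature as a thin wrapper means there is still a single implementation to maintain while existing callers are not broken.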





[jira] [Commented] (HIVE-17184) Unexpected new line in beeline output when running with -f option

2017-07-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103998#comment-16103998
 ] 

Hive QA commented on HIVE-17184:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879204/HIVE-17184.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11012 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6157/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6157/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6157/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879204 - PreCommit-HIVE-Build

> Unexpected new line in beeline output when running with -f option
> -
>
> Key: HIVE-17184
> URL: https://issues.apache.org/jira/browse/HIVE-17184
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-17184.01.patch
>
>
> When running in -f mode on BeeLine I see an extra new line getting added at 
> the end of the results.
> {noformat}
> vihang-MBP:bin vihang$ beeline -f /tmp/query.sql 2>/dev/null
> +----------+-----------+
> | test.id  | test.val  |
> +----------+-----------+
> | 1        | one       |
> | 2        | two       |
> | 1        | three     |
> +----------+-----------+
> vihang-MBP:bin vihang$ beeline -e "select * from test;" 2>/dev/null
> +----------+-----------+
> | test.id  | test.val  |
> +----------+-----------+
> | 1        | one       |
> | 2        | two       |
> | 1        | three     |
> +----------+-----------+
> vihang-MBP:bin vihang$
> {noformat}





[jira] [Commented] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().

2017-07-27 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103956#comment-16103956
 ] 

Vihang Karajgaonkar commented on HIVE-17188:


Hi Mithun, thanks for providing the patch. Can you please update the patch 
with the Hive-specific JIRA number and the description of this JIRA, as per our 
convention? You can add a line in the description noting where this patch was 
cherry-picked from, if you like. Also, I'm wondering how the patch alleviates 
the problem. If there are hundreds of partitions being added, aren't they 
already in memory in the {{List parts}} object? If you have any stats to 
share, that would be great, e.g. before --> running out of memory at X 
partitions; after --> running out of memory at X+Y partitions. 
Thanks!

> ObjectStore runs out of memory for large batches of addPartitions().
> 
>
> Key: HIVE-17188
> URL: https://issues.apache.org/jira/browse/HIVE-17188
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17188.1.patch
>
>
> For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
> runs out of memory. Flushing the {{PersistenceManager}} alleviates the 
> problem.
> (Raising this on behalf of [~cdrome] and [~thiruvel].)





[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results

2017-07-27 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16965:
--
Attachment: HIVE-16965.7.patch

Fixed the assert to compare paths and not the objects.

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, 
> HIVE-16965.3.patch, HIVE-16965.4.patch, HIVE-16965.5.patch, 
> HIVE-16965.6.patch, HIVE-16965.7.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]





[jira] [Commented] (HIVE-16759) Add table type information to HMS log notifications

2017-07-27 Thread Janaki Lahorani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103932#comment-16103932
 ] 

Janaki Lahorani commented on HIVE-16759:


Thanks [~spena].
I have uploaded HIVE16759.4.patch after rebasing.  Job #6163 with the new patch 
is pending.

> Add table type information to HMS log notifications
> ---
>
> Key: HIVE-16759
> URL: https://issues.apache.org/jira/browse/HIVE-16759
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Sergio Peña
>Assignee: Janaki Lahorani
> Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, 
> HIVE16759.3.patch, HIVE16759.4.patch
>
>
> The DB notifications used by HiveMetaStore should include the table type for 
> all notifications that include table events, such as create, drop and alter 
> table.
> This would be useful for consumers to identify views vs tables.





[jira] [Commented] (HIVE-17115) MetaStoreUtils.getDeserializer doesn't catch the java.lang.ClassNotFoundException

2017-07-27 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103916#comment-16103916
 ] 

Daniel Dai commented on HIVE-17115:
---

[~erik.fang], I find that if SerDe.initialize throws an exception, the create 
table statement also fails, since it goes through the same 
MetaStoreUtils.getDeserializer code. Do you know how this table was created and 
why we don't see the exception at creation time?

> MetaStoreUtils.getDeserializer doesn't catch the 
> java.lang.ClassNotFoundException
> -
>
> Key: HIVE-17115
> URL: https://issues.apache.org/jira/browse/HIVE-17115
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1
>Reporter: Erik.fang
>Assignee: Erik.fang
> Attachments: HIVE-17115.1.patch, HIVE-17115.patch
>
>
> Suppose we create a table with a custom SerDe, then call 
> HiveMetaStoreClient.getSchema(String db, String tableName) to extract the 
> metadata from the HiveMetaStore service.
> The Thrift client hangs, and the HiveMetaStore service's log shows an 
> exception such as:
> {code:java}
> Exception in thread "pool-5-thread-129" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hbase/util/Bytes
> at 
> org.apache.hadoop.hive.hbase.HBaseSerDe.parseColumnsMapping(HBaseSerDe.java:184)
> at 
> org.apache.hadoop.hive.hbase.HBaseSerDeParameters.<init>(HBaseSerDeParameters.java:73)
> at 
> org.apache.hadoop.hive.hbase.HBaseSerDe.initialize(HBaseSerDe.java:117)
> at 
> org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:53)
> at 
> org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:521)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:401)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_fields_with_environment_context(HiveMetaStore.java:3556)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_schema_with_environment_context(HiveMetaStore.java:3636)
> at sun.reflect.GeneratedMethodAccessor104.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy4.get_schema_with_environment_context(Unknown 
> Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema_with_environment_context.getResult(ThriftHiveMetastore.java:9146)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema_with_environment_context.getResult(ThriftHiveMetastore.java:9130)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:551)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:546)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:546)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hbase.util.Bytes
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> {code}
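
The trace above shows a java.lang.NoClassDefFoundError, which is an Error and 
not an Exception, so a plain catch (Exception e) around SerDe initialization 
would not intercept it. The following is a minimal, self-contained sketch of 
that difference. It is not the code from the attached patch: 
DeserializerLoadSketch, MetaLoadException and Initializer are hypothetical 
names invented for illustration, and the failing SerDe is simulated by throwing 
the error directly.

```java
/** Sketch only: all names here are hypothetical, not Hive classes. */
class DeserializerLoadSketch {

    /** Hypothetical checked exception standing in for a metastore error. */
    static class MetaLoadException extends Exception {
        MetaLoadException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical stand-in for a SerDe's initialize step. */
    interface Initializer {
        void initialize() throws Exception;
    }

    /** Catches Exception only: a NoClassDefFoundError escapes to the caller. */
    static void initCatchingException(Initializer init) throws MetaLoadException {
        try {
            init.initialize();
        } catch (Exception e) {
            throw new MetaLoadException("SerDe initialization failed", e);
        }
    }

    /** Also catches LinkageError, the superclass of NoClassDefFoundError. */
    static void initCatchingLinkageError(Initializer init) throws MetaLoadException {
        try {
            init.initialize();
        } catch (Exception | LinkageError e) {
            throw new MetaLoadException("SerDe initialization failed: " + e, e);
        }
    }

    public static void main(String[] args) {
        // Simulate a SerDe whose dependency jar is missing from the classpath.
        Initializer missingDependency = () -> {
            throw new NoClassDefFoundError("org/apache/hadoop/hbase/util/Bytes");
        };

        try {
            initCatchingException(missingDependency);
        } catch (MetaLoadException e) {
            System.out.println("wrapped (not reached for an Error): " + e);
        } catch (NoClassDefFoundError e) {
            System.out.println("escaped as Error: " + e);
        }

        try {
            initCatchingLinkageError(missingDependency);
        } catch (MetaLoadException e) {
            System.out.println("reported to caller: " + e.getMessage());
        }
    }
}
```

Whether the metastore should catch LinkageError, Throwable, or something 
narrower is a design choice for the patch review; the sketch only shows why 
the error bypasses an Exception handler.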





[jira] [Commented] (HIVE-16759) Add table type information to HMS log notifications

2017-07-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103910#comment-16103910
 ] 

Sergio Peña commented on HIVE-16759:


Something failed with the patch. Could you rebase your patch?
Btw, +1 on the patch.

> Add table type information to HMS log notifications
> ---
>
> Key: HIVE-16759
> URL: https://issues.apache.org/jira/browse/HIVE-16759
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Sergio Peña
>Assignee: Janaki Lahorani
> Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, 
> HIVE16759.3.patch, HIVE16759.4.patch
>
>
> The DB notifications used by HiveMetaStore should include the table type for 
> all notifications that include table events, such as create, drop and alter 
> table.
> This would be useful for consumers to identify views vs tables.





[jira] [Updated] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17188:

Status: Patch Available  (was: Open)

Submitting, to run tests.

> ObjectStore runs out of memory for large batches of addPartitions().
> 
>
> Key: HIVE-17188
> URL: https://issues.apache.org/jira/browse/HIVE-17188
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17188.1.patch
>
>
> For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
> runs out of memory. Flushing the {{PersistenceManager}} alleviates the 
> problem.
> (Raising this on behalf of [~cdrome] and [~thiruvel].)





[jira] [Updated] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17188:

Attachment: HIVE-17188.1.patch

Here's the patch ported for {{master/}}.

I wonder if it's better to flush at an interval, instead of for *every* 
partition.
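
To make the interval idea concrete, here is a minimal sketch of flushing after 
every N additions instead of after every one. It does not use the real JDO 
PersistenceManager: CountingStore is a hypothetical stand-in that only counts 
flushes and tracks how many objects are pending, and the interval of 100 is an 
arbitrary illustrative value, not a measured recommendation.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: flushes a store at a fixed interval while adding a batch. */
class IntervalFlushSketch {

    /** Adds all partitions, flushing after every `interval` additions. */
    static void addPartitions(CountingStore store, List<String> partitions, int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        int sinceFlush = 0;
        for (String partition : partitions) {
            store.makePersistent(partition);
            if (++sinceFlush == interval) {
                store.flush();
                sinceFlush = 0;
            }
        }
        if (sinceFlush > 0) {
            store.flush(); // flush whatever is left after the last full interval
        }
    }

    public static void main(String[] args) {
        List<String> partitions = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            partitions.add("ds=" + i);
        }

        CountingStore everyPartition = new CountingStore();
        addPartitions(everyPartition, partitions, 1);

        CountingStore everyHundred = new CountingStore();
        addPartitions(everyHundred, partitions, 100);

        System.out.println("interval=1:   flushes=" + everyPartition.flushCount
                + ", peak pending=" + everyPartition.maxPending);
        System.out.println("interval=100: flushes=" + everyHundred.flushCount
                + ", peak pending=" + everyHundred.maxPending);
    }
}

/** Hypothetical stand-in for a persistence manager; it only keeps counters. */
class CountingStore {
    final List<String> pending = new ArrayList<>();
    int flushCount = 0;
    int maxPending = 0;

    void makePersistent(String partition) {
        pending.add(partition);
        maxPending = Math.max(maxPending, pending.size());
    }

    void flush() {
        pending.clear();
        flushCount++;
    }
}
```

The trade-off is the usual one: a larger interval means fewer flush calls but 
more objects held in memory between flushes, so the right value would have to 
be found by testing against real batch sizes.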

> ObjectStore runs out of memory for large batches of addPartitions().
> 
>
> Key: HIVE-17188
> URL: https://issues.apache.org/jira/browse/HIVE-17188
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17188.1.patch
>
>
> For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
> runs out of memory. Flushing the {{PersistenceManager}} alleviates the 
> problem.
> (Raising this on behalf of [~cdrome] and [~thiruvel].)





[jira] [Commented] (HIVE-17164) Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)

2017-07-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103878#comment-16103878
 ] 

Hive QA commented on HIVE-17164:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879215/HIVE-17164.02.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11012 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_windowing_expressions]
 (batchId=75)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_ptf_part_simple]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6156/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6156/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6156/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879215 - PreCommit-HIVE-Build

> Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)
> ---
>
> Key: HIVE-17164
> URL: https://issues.apache.org/jira/browse/HIVE-17164
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-17164.01.patch, HIVE-17164.02.patch
>
>
> Add disk storage backing.  Turn hive.vectorized.execution.ptf.enabled on by 
> default.
> Add hive.vectorized.ptf.max.memory.buffering.batch.count to specify the 
> maximum number of vectorized row batches to buffer in memory before spilling 
> to disk.
> Add the hive.vectorized.testing.reducer.batch.size parameter so that the Tez 
> Reducer produces small batches, creating many key group batches that exercise 
> memory buffering and disk storage backing.





[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-07-27 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103871#comment-16103871
 ] 

Sahil Takiar commented on HIVE-16998:
-

[~janulatha], some minor comments on the changes to {{HiveConf}}, other than 
that, LGTM.

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
> Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, 
> HIVE16998.4.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.





[jira] [Assigned] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17188:
---


> ObjectStore runs out of memory for large batches of addPartitions().
> 
>
> Key: HIVE-17188
> URL: https://issues.apache.org/jira/browse/HIVE-17188
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>
> For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
> runs out of memory. Flushing the {{PersistenceManager}} alleviates the 
> problem.
> (Raising this on behalf of [~cdrome] and [~thiruvel].)





[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results

2017-07-27 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16965:
--
Attachment: HIVE-16965.6.patch

Added a better assert as suggested by Gopal.
[~sershe] [~gopalv] Can you please review?

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, 
> HIVE-16965.3.patch, HIVE-16965.4.patch, HIVE-16965.5.patch, HIVE-16965.6.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]





[jira] [Commented] (HIVE-15767) Hive On Spark is not working on secure clusters from Oozie

2017-07-27 Thread Peter Cseh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103843#comment-16103843
 ] 

Peter Cseh commented on HIVE-15767:
---

The problem is that we're not setting the _proper_ 
mapreduce.job.credentials.binary, but 
[here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java#L235],
 we're passing every property from the HiveConf conf to the configuration for 
Spark.
If HiveCLI is called from the Oozie LauncherMapper, that HiveConf will contain 
the "mapreduce.job.credentials.binary" property for the LauncherMapper, e.g. 
/yarn/nm/usercache/systest/appcache/application_1501079366372_0045/container_1501079366372_0045_01_01/container_tokens
This property has to be there so HiveCLI can access the tokens properly.

Passing this path to the Spark driver is problematic, as the driver will often 
be executed on another machine in the cluster, where it won't be able to read 
the file because it isn't there. There are a couple of ways to define the 
location of the container_tokens file, and YARN takes care of Spark getting the 
correct location on the node the driver will be executed on.
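
One way to picture a fix is to skip the node-local credentials property while 
copying the configuration for Spark. The sketch below is only an illustration 
under that assumption and is not the code from the attached patches: it uses 
plain java.util.Map objects in place of HiveConf and the Spark configuration, 
and SparkConfCopySketch, copyForSpark and the sample path are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: copies properties for Spark, leaving out one node-local key. */
class SparkConfCopySketch {

    /** Property named in this issue; its value is a path local to one node. */
    static final String CREDENTIALS_BINARY = "mapreduce.job.credentials.binary";

    /** Returns a copy of the given properties without the credentials path. */
    static Map<String, String> copyForSpark(Map<String, String> hiveConf) {
        Map<String, String> sparkConf = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : hiveConf.entrySet()) {
            if (CREDENTIALS_BINARY.equals(entry.getKey())) {
                continue; // the file only exists on the launcher's node
            }
            sparkConf.put(entry.getKey(), entry.getValue());
        }
        return sparkConf;
    }

    public static void main(String[] args) {
        Map<String, String> hiveConf = new LinkedHashMap<>();
        hiveConf.put("hive.execution.engine", "spark");
        // Hypothetical placeholder path, not a real container directory.
        hiveConf.put(CREDENTIALS_BINARY, "/path/to/container_tokens");

        Map<String, String> sparkConf = copyForSpark(hiveConf);
        System.out.println("passed to Spark: " + sparkConf.keySet());
        System.out.println("still in HiveConf: " + hiveConf.containsKey(CREDENTIALS_BINARY));
    }
}
```

The original map is left untouched, so HiveCLI can still read the tokens 
through the property while the driver never sees the launcher's local path.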


> Hive On Spark is not working on secure clusters from Oozie
> --
>
> Key: HIVE-15767
> URL: https://issues.apache.org/jira/browse/HIVE-15767
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Peter Cseh
>Assignee: Peter Cseh
> Attachments: HIVE-15767-001.patch, HIVE-15767-002.patch, 
> HIVE-15767.1.patch
>
>
> When a HiveAction is launched from Oozie with Hive On Spark enabled, we're 
> getting errors:
> {noformat}
> Caused by: java.io.IOException: Exception reading 
> file:/yarn/nm/usercache/yshi/appcache/application_1485271416004_0022/container_1485271416004_0022_01_02/container_tokens
> at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:188)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.mergeBinaryTokens(TokenCache.java:155)
> {noformat}
> This is caused by passing the {{mapreduce.job.credentials.binary}} property 
> to the Spark configuration in RemoteHiveSparkClient.





[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-8472:
---
Status: Open  (was: Patch Available)

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-8472.1.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.





[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-8472:
---
Status: Patch Available  (was: Open)

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-8472.1.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.





[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-07-27 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-8472:
---
Attachment: (was: HIVE-8472.branch-2.2.patch)

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-8472.1.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.





[jira] [Updated] (HIVE-16759) Add table type information to HMS log notifications

2017-07-27 Thread Janaki Lahorani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janaki Lahorani updated HIVE-16759:
---
Attachment: HIVE16759.4.patch

Resolved merge conflicts.

> Add table type information to HMS log notifications
> ---
>
> Key: HIVE-16759
> URL: https://issues.apache.org/jira/browse/HIVE-16759
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Sergio Peña
>Assignee: Janaki Lahorani
> Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, 
> HIVE16759.3.patch, HIVE16759.4.patch
>
>
> The DB notifications used by HiveMetaStore should include the table type for 
> all notifications that include table events, such as create, drop and alter 
> table.
> This would be useful for consumers to identify views vs tables.





[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-07-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103742#comment-16103742
 ] 

Hive QA commented on HIVE-16998:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879205/HIVE16998.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 11013 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_25] 
(batchId=84)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge1]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge3]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge_diff_fs]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[quotedid_smb]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_combine_equivalent_work]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_use_op_stats]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[truncate_column_buckets]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6155/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6155/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6155/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879205 - PreCommit-HIVE-Build

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
> Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, 
> HIVE16998.4.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.





[jira] [Commented] (HIVE-17187) WebHCat SPNEGO support is incompleted

2017-07-27 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103740#comment-16103740
 ] 

Eric Yang commented on HIVE-17187:
--

See [the 
blog|https://developer.ibm.com/hadoop/2016/05/12/hbase-rest-gateway-security/] 
written by IBM about SPNEGO for the HBase REST API.  It is a good reference for 
implementing SPNEGO properly, with doAs calls made as the service principal 
instead of as a proxy user with the SPNEGO credential.

> WebHCat SPNEGO support is incompleted
> -
>
> Key: HIVE-17187
> URL: https://issues.apache.org/jira/browse/HIVE-17187
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 1.2.1
>Reporter: Eric Yang
>
> [Some online 
> document|https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_security/content/spnego_setup_for_webhcat.html]
>  describes how to set up WebHCat with SPNEGO support.  However, multiple 
> services on the same host may use SPNEGO.  For example, the HBase REST API 
> can also be set up to use the HTTP principal for SPNEGO support.  When the 
> HTTP principal is shared with other services, Hadoop proxy user settings 
> cannot identify whether a doAs call made with the HTTP principal was invoked 
> by the HBase REST API or by WebHCat.  Ideally, WebHCat should keep track of 
> its own service principal, independent of the SPNEGO principal, to ensure 
> that the SPNEGO principal is only given authentication access.  The SPNEGO 
> principal should not be used in proxy user settings to grant authorization 
> access.





[jira] [Commented] (HIVE-17006) LLAP: Parquet caching

2017-07-27 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103723#comment-16103723
 ] 

Sergey Shelukhin commented on HIVE-17006:
-

load_dyn_part5 may be related (need to double-check); the rest are unrelated. 
[~prasanth_j] do you want to review? A lot of the code for metadata cache is 
the same as in HIVE-15665, so only parts of the patch need separate review.

> LLAP: Parquet caching
> -
>
> Key: HIVE-17006
> URL: https://issues.apache.org/jira/browse/HIVE-17006
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17006.patch, HIVE-17006.WIP.patch
>
>
> There are multiple options to do Parquet caching in LLAP:
> 1) Full elevator (too intrusive for now).
> 2) Page based cache like ORC (requires some changes to Parquet or 
> copy-pasted).
> 3) Cache disk data on column chunk level as is.
> Given that Parquet reads at column chunk granularity, (2) is not as useful as 
> for ORC, but still a good idea. I messaged the dev list about it but didn't 
> get a response, we may follow up later.
> For now, do (3). 





[jira] [Updated] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-07-27 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15665:

Attachment: HIVE-15665.08.patch

The same patch; looks like QA didn't trigger.

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.01.patch, HIVE-15665.02.patch, 
> HIVE-15665.03.patch, HIVE-15665.04.patch, HIVE-15665.05.patch, 
> HIVE-15665.06.patch, HIVE-15665.07.patch, HIVE-15665.08.patch, 
> HIVE-15665.patch
>
>
> OrcFileMetadata internally has filestats, stripestats etc which are allocated 
> in heap. On large data sets, this could have an impact on the heap usage and 
> the memory usage by different executors in LLAP.





[jira] [Updated] (HIVE-17167) Create metastore specific configuration tool

2017-07-27 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17167:
--
Attachment: HIVE-17167.patch

A patch with a MetastoreConf class.  This class is not itself instantiated.  It 
contains an enum that defines the conf values and a set of static methods that 
operate on Hadoop Configuration objects to read and write the values.

It honors existing Hive configuration values (e.g. 
"hive.metastore.rawstore.impl") while allowing metastore specific values (e.g. 
"metastore.rawstore.impl").  

Using Hadoop's Configuration class assures that a HiveConf object can be read 
from and written to using MetastoreConf methods.  It also allows operations on 
plain Configuration objects, which are passed through many of Hive's interfaces.
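
A minimal sketch of the lookup described above, written in Python rather than 
Java for brevity. The names ConfVars, get_var and set_var, the placeholder 
default, and the lookup order (metastore key first, then the Hive key, then the 
default) are assumptions for illustration and are not taken from the patch; a 
plain dict stands in for a Hadoop Configuration object.

```python
from enum import Enum


class ConfVars(Enum):
    """Hypothetical enum of settings: (metastore key, Hive key, default)."""

    RAWSTORE_IMPL = (
        "metastore.rawstore.impl",
        "hive.metastore.rawstore.impl",
        "ObjectStore",  # placeholder default, not taken from the patch
    )

    @property
    def metastore_key(self):
        return self.value[0]

    @property
    def hive_key(self):
        return self.value[1]

    @property
    def default(self):
        return self.value[2]


def get_var(conf, var):
    """Read a value: metastore key first, then the Hive key, then the default.

    The precedence order is an assumption made for this sketch.
    """
    for key in (var.metastore_key, var.hive_key):
        if key in conf:
            return conf[key]
    return var.default


def set_var(conf, var, value):
    """Write a value under the metastore-specific key."""
    conf[var.metastore_key] = value


# A dict standing in for a HiveConf that only carries the existing Hive key.
conf = {"hive.metastore.rawstore.impl": "LegacyStore"}
print(get_var(conf, ConfVars.RAWSTORE_IMPL))
```

Because the helpers only need key lookup and assignment, the same functions 
work on any mapping-like object, which mirrors the point about accepting both 
HiveConf and plain Configuration objects.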


> Create metastore specific configuration tool
> 
>
> Key: HIVE-17167
> URL: https://issues.apache.org/jira/browse/HIVE-17167
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17167.patch
>
>
> As part of making the metastore a separately releasable module we need 
> configuration tools that are specific to that module.  It cannot use or 
> extend HiveConf as that is in hive common.  But it must take a HiveConf 
> object and be able to operate on it.
> The best way to achieve this is using Hadoop's Configuration object (which 
> HiveConf extends) together with enums and static methods.





[jira] [Updated] (HIVE-17167) Create metastore specific configuration tool

2017-07-27 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17167:
--
Status: Patch Available  (was: Open)

> Create metastore specific configuration tool
> 
>
> Key: HIVE-17167
> URL: https://issues.apache.org/jira/browse/HIVE-17167
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17167.patch
>
>
> As part of making the metastore a separately releasable module we need 
> configuration tools that are specific to that module.  It cannot use or 
> extend HiveConf as that is in hive common.  But it must take a HiveConf 
> object and be able to operate on it.
> The best way to achieve this is using Hadoop's Configuration object (which 
> HiveConf extends) together with enums and static methods.





[jira] [Commented] (HIVE-17167) Create metastore specific configuration tool

2017-07-27 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103661#comment-16103661
 ] 

Alan Gates commented on HIVE-17167:
---

[~vihangk1], I'm not sure what in SchemaTool you are suggesting we use.  It 
looked fairly different from what I was thinking, but I might be missing 
something.

> Create metastore specific configuration tool
> 
>
> Key: HIVE-17167
> URL: https://issues.apache.org/jira/browse/HIVE-17167
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
>
> As part of making the metastore a separately releasable module we need 
> configuration tools that are specific to that module.  It cannot use or 
> extend HiveConf as that is in hive common.  But it must take a HiveConf 
> object and be able to operate on it.
> The best way to achieve this is using Hadoop's Configuration object (which 
> HiveConf extends) together with enums and static methods.





[jira] [Commented] (HIVE-17167) Create metastore specific configuration tool

2017-07-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103657#comment-16103657
 ] 

ASF GitHub Bot commented on HIVE-17167:
---

GitHub user alanfgates opened a pull request:

https://github.com/apache/hive/pull/211

HIVE-17167 Create metastore specific configuration tool



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alanfgates/hive hive17167

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/211.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #211






> Create metastore specific configuration tool
> 
>
> Key: HIVE-17167
> URL: https://issues.apache.org/jira/browse/HIVE-17167
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
>
> As part of making the metastore a separately releasable module we need 
> configuration tools that are specific to that module.  It cannot use or 
> extend HiveConf as that is in hive common.  But it must take a HiveConf 
> object and be able to operate on it.
> The best way to achieve this is using Hadoop's Configuration object (which 
> HiveConf extends) together with enums and static methods.





[jira] [Updated] (HIVE-17164) Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)

2017-07-27 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-17164:

Attachment: HIVE-17164.02.patch

> Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)
> ---
>
> Key: HIVE-17164
> URL: https://issues.apache.org/jira/browse/HIVE-17164
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-17164.01.patch, HIVE-17164.02.patch
>
>
> Add disk storage backing.  Turn hive.vectorized.execution.ptf.enabled on by 
> default.
> Add hive.vectorized.ptf.max.memory.buffering.batch.count to specify the 
> maximum number of vectorized row batches to buffer in memory before spilling to 
> disk.
> Add hive.vectorized.testing.reducer.batch.size parameter to have the Tez 
> Reducer make small batches for making a lot of key group batches that cause 
> memory buffering and disk storage backing.
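
The buffering behaviour described in the issue can be pictured with a small 
Python sketch. It is not Hive's implementation: the class name 
SpillingBatchBuffer, the use of pickle, and the temporary file are all 
assumptions chosen to keep the example self-contained. It only shows the idea 
of holding a bounded number of batches in memory and writing the rest to disk.

```python
import pickle
import tempfile


class SpillingBatchBuffer:
    """Hypothetical buffer: keeps up to N batches in memory, spills the rest."""

    def __init__(self, max_in_memory_batches):
        self.max_in_memory_batches = max_in_memory_batches
        self.in_memory = []
        self.spill_file = None
        self.spilled_count = 0

    def add(self, batch):
        # Keep the batch in memory while there is room.
        if len(self.in_memory) < self.max_in_memory_batches:
            self.in_memory.append(batch)
            return
        # Otherwise append it to a temporary file on disk.
        if self.spill_file is None:
            self.spill_file = tempfile.TemporaryFile()
        self.spill_file.seek(0, 2)  # move to the end before writing
        pickle.dump(batch, self.spill_file)
        self.spilled_count += 1

    def __iter__(self):
        # Replay batches in insertion order: memory first, then disk.
        yield from self.in_memory
        if self.spill_file is not None:
            self.spill_file.seek(0)
            for _ in range(self.spilled_count):
                yield pickle.load(self.spill_file)


buffer = SpillingBatchBuffer(max_in_memory_batches=2)
for i in range(5):
    buffer.add([i, i + 1])
print(len(buffer.in_memory), buffer.spilled_count)
```

A small batch size, as with the testing parameter mentioned in the issue, makes 
a threshold like this easy to exceed, so the spill path gets exercised.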





[jira] [Resolved] (HIVE-15863) Calendar inside DATE, TIME and TIMESTAMP literals for Calcite should have UTC timezone

2017-07-27 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-15863.

Resolution: Duplicate

> Calendar inside DATE, TIME and TIMESTAMP literals for Calcite should have UTC 
> timezone
> --
>
> Key: HIVE-15863
> URL: https://issues.apache.org/jira/browse/HIVE-15863
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> Related to CALCITE-1623.
> At query preparation time, Calcite uses a Calendar to hold the value of DATE, 
> TIME, TIMESTAMP literals. It assumes that Calendar has a UTC (GMT) time zone, 
> and bad things might happen if it does not. Currently, we pass the Calendar 
> object with user timezone from Hive. We need to pass it with UTC timezone and 
> make the inverse conversion when we go back from Calcite to Hive.
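
One possible reading of the conversion described in the issue, sketched in 
Python. This is an illustration only: the function names and the fixed-offset 
user time zone are hypothetical, and the actual Hive and Calcite code works 
with java.util.Calendar. The sketch keeps the literal's wall-clock fields and 
relabels them as UTC on the way in, then restores the user time zone on the way 
back.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical user time zone, modelled as a fixed offset for simplicity.
USER_TZ = timezone(timedelta(hours=-7))


def to_utc_fields(local_value):
    """Keep the wall-clock fields but label them as UTC."""
    return local_value.replace(tzinfo=timezone.utc)


def from_utc_fields(utc_value, user_tz):
    """Inverse conversion: restore the user's time zone on the same fields."""
    return utc_value.replace(tzinfo=user_tz)


literal = datetime(2017, 7, 27, 18, 0, 0, tzinfo=USER_TZ)
as_utc = to_utc_fields(literal)
round_trip = from_utc_fields(as_utc, USER_TZ)
print(as_utc.isoformat(), round_trip == literal)
```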





[jira] [Updated] (HIVE-17168) Create separate module for stand alone metastore

2017-07-27 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17168:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Patch committed.  Thanks Vihang for the review.

> Create separate module for stand alone metastore
> 
>
> Key: HIVE-17168
> URL: https://issues.apache.org/jira/browse/HIVE-17168
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 3.0.0
>
> Attachments: HIVE-17168.patch
>
>
> We need to create a separate maven module for the stand alone metastore.





[jira] [Commented] (HIVE-16759) Add table type information to HMS log notifications

2017-07-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103618#comment-16103618
 ] 

Hive QA commented on HIVE-16759:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879190/HIVE16759.3.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6154/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6154/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6154/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-07-27 18:01:22.772
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-6154/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-07-27 18:01:22.775
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 0f7c33d HIVE-17088: HS2 WebUI throws a NullPointerException when 
opened (Sergio Pena, reviewed by Aihua Xu)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 0f7c33d HIVE-17088: HS2 WebUI throws a NullPointerException when 
opened (Sergio Pena, reviewed by Aihua Xu)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-07-27 18:01:25.483
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/TestDbNotificationListener.java:41
error: 
itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/TestDbNotificationListener.java:
 patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879190 - PreCommit-HIVE-Build

> Add table type information to HMS log notifications
> ---
>
> Key: HIVE-16759
> URL: https://issues.apache.org/jira/browse/HIVE-16759
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Sergio Peña
>Assignee: Janaki Lahorani
> Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, 
> HIVE16759.3.patch
>
>
> The DB notifications used by HiveMetaStore should include the table type for 
> all notifications that include table events, such as create, drop and alter 
> table.
> This would be useful for consumers to identify views vs tables.





[jira] [Commented] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103612#comment-16103612
 ] 

Hive QA commented on HIVE-17148:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879177/HIVE-17148.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 11012 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[nested_column_pruning] 
(batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[semijoin5] (batchId=15)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction_2]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage2] 
(batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_in] 
(batchId=128)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6153/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6153/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6153/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879177 - PreCommit-HIVE-Build

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL    1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows if 
> any column mentioned in the COALESCE has a null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> 

[jira] [Commented] (HIVE-17039) Implement optimization rewritings that rely on database SQL constraints

2017-07-27 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103600#comment-16103600
 ] 

Jesus Camacho Rodriguez commented on HIVE-17039:


[~sershe], currently they are not, but we already have different options to 
enforce them vs rely on them for optimization purposes (other RDBMS can make 
this distinction too).

> Implement optimization rewritings that rely on database SQL constraints
> ---
>
> Key: HIVE-17039
> URL: https://issues.apache.org/jira/browse/HIVE-17039
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>
> Hive already has support to declare multiple SQL constraints (PRIMARY KEY, 
> FOREIGN KEY, UNIQUE, and NOT NULL). Although these constraints cannot be 
> currently enforced on the data, they can be made available to the optimizer 
> by using the 'RELY' keyword.
> This ticket is an umbrella for all the rewriting optimizations based on SQL 
> constraints that we will be including in Hive.





[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results

2017-07-27 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16965:
--
Attachment: HIVE-16965.5.patch

Somehow the last patch did not trigger a run. Retrying.

[~sershe][~gopalv] can you please review?

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, 
> HIVE-16965.3.patch, HIVE-16965.4.patch, HIVE-16965.5.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]





[jira] [Updated] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-07-27 Thread Janaki Lahorani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janaki Lahorani updated HIVE-16998:
---
Status: Patch Available  (was: In Progress)

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
> Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, 
> HIVE16998.4.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.





[jira] [Updated] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-07-27 Thread Janaki Lahorani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janaki Lahorani updated HIVE-16998:
---
Status: In Progress  (was: Patch Available)

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
> Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, 
> HIVE16998.4.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.





[jira] [Updated] (HIVE-17184) Unexpected new line in beeline output when running with -f option

2017-07-27 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17184:
---
Summary: Unexpected new line in beeline output when running with -f option  
(was: Unexpected new line in beeline when running with -f option)

> Unexpected new line in beeline output when running with -f option
> -
>
> Key: HIVE-17184
> URL: https://issues.apache.org/jira/browse/HIVE-17184
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-17184.01.patch
>
>
> When running in -f mode on BeeLine I see an extra new line getting added at 
> the end of the results.
> {noformat}
> vihang-MBP:bin vihang$ beeline -f /tmp/query.sql 2>/dev/null
> +--+---+
> | test.id  | test.val  |
> +--+---+
> | 1| one   |
> | 2| two   |
> | 1| three |
> +--+---+
> vihang-MBP:bin vihang$ beeline -e "select * from test;" 2>/dev/null
> +--+---+
> | test.id  | test.val  |
> +--+---+
> | 1| one   |
> | 2| two   |
> | 1| three |
> +--+---+
> vihang-MBP:bin vihang$
> {noformat}





[jira] [Updated] (HIVE-17184) Unexpected new line in beeline when running with -f option

2017-07-27 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17184:
---
Status: Patch Available  (was: Open)

> Unexpected new line in beeline when running with -f option
> --
>
> Key: HIVE-17184
> URL: https://issues.apache.org/jira/browse/HIVE-17184
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-17184.01.patch
>
>
> When running in -f mode on BeeLine I see an extra new line getting added at 
> the end of the results.
> {noformat}
> vihang-MBP:bin vihang$ beeline -f /tmp/query.sql 2>/dev/null
> +--+---+
> | test.id  | test.val  |
> +--+---+
> | 1| one   |
> | 2| two   |
> | 1| three |
> +--+---+
> vihang-MBP:bin vihang$ beeline -e "select * from test;" 2>/dev/null
> +--+---+
> | test.id  | test.val  |
> +--+---+
> | 1| one   |
> | 2| two   |
> | 1| three |
> +--+---+
> vihang-MBP:bin vihang$
> {noformat}





[jira] [Updated] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-07-27 Thread Janaki Lahorani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janaki Lahorani updated HIVE-16998:
---
Attachment: HIVE16998.4.patch

Generated using
git show --full-index --no-prefix --no-renames

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
> Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, 
> HIVE16998.4.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.





[jira] [Updated] (HIVE-17184) Unexpected new line in beeline when running with -f option

2017-07-27 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17184:
---
Attachment: HIVE-17184.01.patch

> Unexpected new line in beeline when running with -f option
> --
>
> Key: HIVE-17184
> URL: https://issues.apache.org/jira/browse/HIVE-17184
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-17184.01.patch
>
>
> When running in -f mode on BeeLine I see an extra new line getting added at 
> the end of the results.
> {noformat}
> vihang-MBP:bin vihang$ beeline -f /tmp/query.sql 2>/dev/null
> +--+---+
> | test.id  | test.val  |
> +--+---+
> | 1| one   |
> | 2| two   |
> | 1| three |
> +--+---+
> vihang-MBP:bin vihang$ beeline -e "select * from test;" 2>/dev/null
> +--+---+
> | test.id  | test.val  |
> +--+---+
> | 1| one   |
> | 2| two   |
> | 1| three |
> +--+---+
> vihang-MBP:bin vihang$
> {noformat}





[jira] [Commented] (HIVE-17186) `double` type constant operation loses precision

2017-07-27 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103559#comment-16103559
 ] 

Gopal V commented on HIVE-17186:


bq. Is there any way for Hive to fix this?

No, this is the {{0.1+0.2 != 0.3}} problem with IEEE 754 arithmetic.

Decimal and 0.1BD + 0.2BD wouldn't cause these rounding errors.
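
The effect is not specific to Hive and can be reproduced in any language that 
uses IEEE 754 doubles. The short Python sketch below is an illustration only: 
Python's decimal module stands in for Hive's decimal type and BD literals. It 
shows that the double sum of 0.06 and 0.01 lands just below 0.07, while the 
decimal sum is exact.

```python
from decimal import Decimal

# Binary doubles cannot represent 0.1, 0.2, 0.06 or 0.01 exactly,
# so the rounded sums differ from the literals 0.3 and 0.07.
double_sum = 0.1 + 0.2
upper_bound = 0.06 + 0.01

print(double_sum == 0.3)   # compares two rounded binary values
print(upper_bound < 0.07)  # the computed bound falls short of 0.07

# Decimal arithmetic keeps the base-10 digits exactly.
decimal_upper = Decimal("0.06") + Decimal("0.01")
print(decimal_upper == Decimal("0.07"))
```

This matches the BETWEEN predicate in the plan quoted below, where the upper 
bound computed in double arithmetic excludes rows with l_discount equal to 0.07.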

> `double` type constant operation loses precision
> 
>
> Key: HIVE-17186
> URL: https://issues.apache.org/jira/browse/HIVE-17186
> Project: Hive
>  Issue Type: Bug
>Reporter: Dongjoon Hyun
>
> This might be an issue where Hive loses precision and generates a wrong 
> result when handling *double* constant operations. This was reported in the 
> following environment.
> *ENVIRONMENT*
> https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-gen/ddl/orc.sql
> *SQL*
> {code}
> hive> explain select l_discount from lineitem where l_discount between 0.06 - 
> 0.01 and 0.06 + 0.01 limit 10;
> OK
> Plan not optimized by CBO.
> Stage-0
>Fetch Operator
>   limit:10
>   Stage-1
>  Map 1 vectorized
>  File Output Operator [FS_9]
> compressed:false
> Statistics:Num rows: 10 Data size: 80 Basic stats: COMPLETE 
> Column stats: COMPLETE
> table:{"input 
> format:":"org.apache.hadoop.mapred.TextInputFormat","output 
> format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
> Limit [LIM_8]
>Number of rows:10
>Statistics:Num rows: 10 Data size: 80 Basic stats: COMPLETE 
> Column stats: COMPLETE
>Select Operator [OP_7]
>   outputColumnNames:["_col0"]
>   Statistics:Num rows: 294854 Data size: 2358832 
> Basic stats: COMPLETE Column stats: COMPLETE
>   Filter Operator [FIL_6]
>  predicate:l_discount BETWEEN 0.049996 AND 
> 0.06999 (type: boolean)
>  Statistics:Num rows: 294854 Data size: 2358832 
> Basic stats: COMPLETE Column stats: COMPLETE
>  TableScan [TS_0]
> alias:lineitem
> Statistics:Num rows: 589709 Data size: 
> 4832986297043 Basic stats: COMPLETE Column stats: COMPLETE
> hive> select max(l_discount) from lineitem where l_discount between 0.06 - 
> 0.01 and 0.06 + 0.01 limit 10;
> OK
> 0.06
> Time taken: 314.923 seconds, Fetched: 1 row(s)
> {code}
> Hive excludes 0.07, contrary to users' intuition. This difference also 
> confuses some users, because they believe that Hive's result is the correct 
> one. Is there any way for Hive to fix this?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16614) Support "set local time zone" statement

2017-07-27 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103555#comment-16103555
 ] 

Jesus Camacho Rodriguez commented on HIVE-16614:


[~ashutoshc], this should be ready to review, could you take a look at 
https://reviews.apache.org/r/61188/ ?

Thanks

> Support "set local time zone" statement
> ---
>
> Key: HIVE-16614
> URL: https://issues.apache.org/jira/browse/HIVE-16614
> Project: Hive
>  Issue Type: Improvement
>Reporter: Carter Shanklin
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16614.01.patch, HIVE-16614.02.patch, 
> HIVE-16614.03.patch, HIVE-16614.patch
>
>
> HIVE-14412 introduces a timezone-aware timestamp.
> SQL has a concept of default time zone displacements, which are transparently 
> applied when converting between timezone-unaware types and timezone-aware 
> types and, in Hive's case, are also used to shift a timezone aware type to a 
> different time zone, depending on configuration.
> SQL also provides that the default time zone displacement be settable at a 
> session level, so that clients can access a database simultaneously from 
> different time zones and see time values in their own time zone.
> Currently the time zone displacement is fixed and is set based on the system 
> time zone where the Hive client runs (HiveServer2 or Hive CLI). It will be 
> more convenient for users if they have the ability to set their time zone of 
> choice.
> SQL defines "set time zone" with 2 ways of specifying the time zone, first 
> using an interval and second using the special keyword LOCAL.
> Examples:
>   • set time zone '-8:00';
>   • set time zone LOCAL;
> LOCAL means to set the current default time zone displacement to the 
> session's original default time zone displacement.
> Reference: SQL:2011 section 19.4
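
As a rough illustration of the session-level time zone displacement described above, the following Java sketch renders one instant under two different displacements using {{java.time}}. The class name, the {{render}} helper, and the sample instant are hypothetical; this only illustrates the concept and is not how Hive implements "set time zone".

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

final class SessionTimeZoneSketch {
    // Render the same instant under a given session time zone displacement.
    static String render(Instant instant, ZoneOffset displacement) {
        return OffsetDateTime.ofInstant(instant, displacement).toString();
    }

    public static void main(String[] args) {
        Instant instant = Instant.parse("2017-07-27T12:00:00Z");

        // Analogous to a session that ran: set time zone '-8:00';
        ZoneOffset sessionA = ZoneOffset.of("-08:00");
        // A second session left at UTC for comparison.
        ZoneOffset sessionB = ZoneOffset.UTC;

        System.out.println(render(instant, sessionA)); // 2017-07-27T04:00-08:00
        System.out.println(render(instant, sessionB)); // 2017-07-27T12:00Z
    }
}
```

Both sessions read the same stored instant; only the displayed wall-clock value differs.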



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17186) `double` type constant operation loses precision

2017-07-27 Thread Andrew Sherman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103551#comment-16103551
 ] 

Andrew Sherman commented on HIVE-17186:
---

This looks like an artifact of floating point arithmetic:

{noformat}
double d1 = 0.06D;
double d2 = 0.01D;
double d3 = d1 + d2;
double d4 = d1 - d2;
System.out.println("d3 = " + d3);
System.out.println("d4 = " + d4);
{noformat}
gives
{noformat}
d3 = 0.06999999999999999
d4 = 0.049999999999999996
{noformat}

> `double` type constant operation loses precision
> 
>
> Key: HIVE-17186
> URL: https://issues.apache.org/jira/browse/HIVE-17186
> Project: Hive
>  Issue Type: Bug
>Reporter: Dongjoon Hyun
>
> This might be an issue where Hive loses precision and generates a wrong 
> result when handling *double* constant operations. This was reported in the 
> following environment.
> *ENVIRONMENT*
> https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-gen/ddl/orc.sql
> *SQL*
> {code}
> hive> explain select l_discount from lineitem where l_discount between 0.06 - 
> 0.01 and 0.06 + 0.01 limit 10;
> OK
> Plan not optimized by CBO.
> Stage-0
>Fetch Operator
>   limit:10
>   Stage-1
>  Map 1 vectorized
>  File Output Operator [FS_9]
> compressed:false
> Statistics:Num rows: 10 Data size: 80 Basic stats: COMPLETE 
> Column stats: COMPLETE
> table:{"input 
> format:":"org.apache.hadoop.mapred.TextInputFormat","output 
> format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
> Limit [LIM_8]
>Number of rows:10
>Statistics:Num rows: 10 Data size: 80 Basic stats: COMPLETE 
> Column stats: COMPLETE
>Select Operator [OP_7]
>   outputColumnNames:["_col0"]
>   Statistics:Num rows: 294854 Data size: 2358832 
> Basic stats: COMPLETE Column stats: COMPLETE
>   Filter Operator [FIL_6]
>  predicate:l_discount BETWEEN 0.049999999999999996 AND 
> 0.06999999999999999 (type: boolean)
>  Statistics:Num rows: 294854 Data size: 2358832 
> Basic stats: COMPLETE Column stats: COMPLETE
>  TableScan [TS_0]
> alias:lineitem
> Statistics:Num rows: 589709 Data size: 
> 4832986297043 Basic stats: COMPLETE Column stats: COMPLETE
> hive> select max(l_discount) from lineitem where l_discount between 0.06 - 
> 0.01 and 0.06 + 0.01 limit 10;
> OK
> 0.06
> Time taken: 314.923 seconds, Fetched: 1 row(s)
> {code}
> Hive excludes 0.07, contrary to users' intuition. This difference also 
> confuses some users, because they believe that Hive's result is the correct 
> one. Is there any way for Hive to fix this?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17164) Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)

2017-07-27 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-17164:

Status: Patch Available  (was: Open)

> Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)
> ---
>
> Key: HIVE-17164
> URL: https://issues.apache.org/jira/browse/HIVE-17164
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-17164.01.patch
>
>
> Add disk storage backing.  Turn hive.vectorized.execution.ptf.enabled on by 
> default.
> Add hive.vectorized.ptf.max.memory.buffering.batch.count to specify the 
> maximum number of vectorized row batches to buffer in memory before spilling 
> to disk.
> Add the hive.vectorized.testing.reducer.batch.size parameter to have the Tez 
> Reducer produce small batches, creating many key group batches that exercise 
> memory buffering and disk storage backing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17164) Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)

2017-07-27 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-17164:

Attachment: HIVE-17164.01.patch

> Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)
> ---
>
> Key: HIVE-17164
> URL: https://issues.apache.org/jira/browse/HIVE-17164
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-17164.01.patch
>
>
> Add disk storage backing.  Turn hive.vectorized.execution.ptf.enabled on by 
> default.
> Add hive.vectorized.ptf.max.memory.buffering.batch.count to specify the 
> maximum number of vectorized row batches to buffer in memory before spilling 
> to disk.
> Add the hive.vectorized.testing.reducer.batch.size parameter to have the Tez 
> Reducer produce small batches, creating many key group batches that exercise 
> memory buffering and disk storage backing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-15217) Add watch mode to llap status tool

2017-07-27 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103544#comment-16103544
 ] 

Prasanth Jayachandran commented on HIVE-15217:
--

[~leftylev] Thanks for the reminder again! Updated "LLAP Status" section in the 
wiki with all command options for llap status tool.

> Add watch mode to llap status tool
> --
>
> Key: HIVE-15217
> URL: https://issues.apache.org/jira/browse/HIVE-15217
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15217.1.patch, HIVE-15217.2.patch, 
> HIVE-15217.3.patch
>
>
> There is a few seconds of overhead when launching the llap status command. To 
> avoid it, we can add a "watch" mode to the llap status tool that refreshes the 
> status after a configured interval.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-07-27 Thread Janaki Lahorani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103523#comment-16103523
 ] 

Janaki Lahorani commented on HIVE-16998:


Addressed comments from [~stakiar].
DPP for all joins: hive.spark.dynamic.partition.pruning is true
DPP for map joins: hive.spark.dynamic.partition.pruning is false and 
hive.spark.dynamic.partition.pruning.map.join.only is true
Fixed bug: remove unnecessary pruning sink
Fixed comments.
Will upload to RB after test results.
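
The configuration rules listed in this comment can be sketched as a small decision function. This is a hedged illustration only: the {{DppMode}} enum and {{resolve}} method are hypothetical names, and the behavior when both properties are false (DPP disabled) or both are true (DPP for all joins) is an assumption, not something confirmed by the patch.

```java
final class DppModeSketch {
    // Hypothetical modes; not taken from the Hive code base.
    enum DppMode { ALL_JOINS, MAP_JOIN_ONLY, DISABLED }

    // dppEnabled  ~ hive.spark.dynamic.partition.pruning
    // mapJoinOnly ~ hive.spark.dynamic.partition.pruning.map.join.only
    static DppMode resolve(boolean dppEnabled, boolean mapJoinOnly) {
        if (dppEnabled) {
            // Assumption: the general flag takes precedence when both are set.
            return DppMode.ALL_JOINS;
        }
        if (mapJoinOnly) {
            return DppMode.MAP_JOIN_ONLY;
        }
        // Assumption: with both flags off, DPP is not applied at all.
        return DppMode.DISABLED;
    }

    public static void main(String[] args) {
        System.out.println(resolve(true, false));  // ALL_JOINS
        System.out.println(resolve(false, true));  // MAP_JOIN_ONLY
        System.out.println(resolve(false, false)); // DISABLED
    }
}
```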

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
> Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17185) TestHiveMetaStoreStatsMerge.testStatsMerge is failing

2017-07-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103519#comment-16103519
 ] 

Ashutosh Chauhan commented on HIVE-17185:
-

[~pxiong] Can you please take a look?

> TestHiveMetaStoreStatsMerge.testStatsMerge is failing
> -
>
> Key: HIVE-17185
> URL: https://issues.apache.org/jira/browse/HIVE-17185
> Project: Hive
>  Issue Type: Test
>  Components: Metastore, Test
>Affects Versions: 3.0.0
>Reporter: Ashutosh Chauhan
>
> Likely because of HIVE-16997



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17184) Unexpected new line in beeline when running with -f option

2017-07-27 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-17184:
--


> Unexpected new line in beeline when running with -f option
> --
>
> Key: HIVE-17184
> URL: https://issues.apache.org/jira/browse/HIVE-17184
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
>
> When running in -f mode on BeeLine I see an extra new line getting added at 
> the end of the results.
> {noformat}
> vihang-MBP:bin vihang$ beeline -f /tmp/query.sql 2>/dev/null
> +--+---+
> | test.id  | test.val  |
> +--+---+
> | 1| one   |
> | 2| two   |
> | 1| three |
> +--+---+
> vihang-MBP:bin vihang$ beeline -e "select * from test;" 2>/dev/null
> +--+---+
> | test.id  | test.val  |
> +--+---+
> | 1| one   |
> | 2| two   |
> | 1| three |
> +--+---+
> vihang-MBP:bin vihang$
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17179) Add InterfaceAudience and InterfaceStability annotations for Hook APIs

2017-07-27 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103513#comment-16103513
 ] 

Aihua Xu commented on HIVE-17179:
-

+1.

> Add InterfaceAudience and InterfaceStability annotations for Hook APIs
> --
>
> Key: HIVE-17179
> URL: https://issues.apache.org/jira/browse/HIVE-17179
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hooks
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17179.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-07-27 Thread Janaki Lahorani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janaki Lahorani updated HIVE-16998:
---
Attachment: HIVE16998.3.patch

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
> Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-11681) sometimes when query mr job progress, stream closed exception will happen

2017-07-27 Thread frank luo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103506#comment-16103506
 ] 

frank luo commented on HIVE-11681:
--

https://issues.apache.org/jira/browse/HADOOP-13809 is a similar case.

I believe they are all related to 
https://bugs.openjdk.java.net/browse/JDK-6947916, which hasn't been released.

I am able to reproduce it with Oracle JDK 1.8.0_131.

> sometimes when query mr job progress, stream closed exception will happen
> -
>
> Key: HIVE-11681
> URL: https://issues.apache.org/jira/browse/HIVE-11681
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.2.1
>Reporter: wangwenli
>
> Sometimes HiveServer2 throws the exception below:
> 2015-08-28 05:05:44,107 | FATAL | Thread-82995 | error parsing conf 
> mapred-default.xml | 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2404)
> java.io.IOException: Stream closed
>   at 
> java.util.zip.InflaterInputStream.ensureOpen(InflaterInputStream.java:84)
>   at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:160)
>   at java.io.FilterInputStream.read(FilterInputStream.java:133)
>   at 
> com.sun.org.apache.xerces.internal.impl.XMLEntityManager$RewindableInputStream.read(XMLEntityManager.java:2902)
>   at 
> com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:302)
>   at 
> com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1753)
>   at 
> com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1426)
>   at 
> com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2807)
>   at 
> com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
>   at 
> com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:117)
>   at 
> com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
>   at 
> com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
>   at 
> com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
>   at 
> com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
>   at 
> com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:243)
>   at 
> com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:347)
>   at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
>   at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2246)
>   at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2234)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2305)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2258)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2175)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:854)
>   at 
> org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:2069)
>   at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:477)
>   at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:467)
>   at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:187)
>   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:580)
>   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:578)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1612)
>   at 
> org.apache.hadoop.mapred.JobClient.getJobUsingCluster(JobClient.java:578)
>   at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:596)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:289)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:548)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:435)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:159)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:72)
> After analysis, we found the root cause. Below are the steps to reproduce the issue:
> 1.  open one beeline 

[jira] [Commented] (HIVE-16759) Add table type information to HMS log notifications

2017-07-27 Thread Janaki Lahorani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103489#comment-16103489
 ] 

Janaki Lahorani commented on HIVE-16759:


Thanks [~vihangk1].  I attached the patch again.

> Add table type information to HMS log notifications
> ---
>
> Key: HIVE-16759
> URL: https://issues.apache.org/jira/browse/HIVE-16759
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Sergio Peña
>Assignee: Janaki Lahorani
> Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, 
> HIVE16759.3.patch
>
>
> The DB notifications used by HiveMetaStore should include the table type for 
> all notifications that include table events, such as create, drop and alter 
> table.
> This would be useful for consumers to identify views vs tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

