[jira] [Commented] (HIVE-16738) Notification ID generation in DBNotification might not be unique across HS2 instances.

2017-06-14 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050034#comment-16050034
 ] 

anishek commented on HIVE-16738:


No, I haven't started on this one; please go ahead.

> Notification ID generation in DBNotification might not be unique across HS2 
> instances.
> --
>
> Key: HIVE-16738
> URL: https://issues.apache.org/jira/browse/HIVE-16738
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> Going to explain the problem in the scope of the "replication" feature for 
> Hive 2 that is being built, as it is easier to explain there:
> To allow replication to work we need to set 
> "hive.metastore.transactional.event.listeners" to DBNotificationListener. 
> For use cases where there are multiple HiveServer2 instances running:
> {code}
>   private void process(NotificationEvent event, ListenerEvent listenerEvent)
>       throws MetaException {
>     event.setMessageFormat(msgFactory.getMessageFormat());
>     synchronized (NOTIFICATION_TBL_LOCK) {
>       LOG.debug("DbNotificationListener: Processing : {}:{}", event.getEventId(),
>           event.getMessage());
>       HMSHandler.getMSForConf(hiveConf).addNotificationEvent(event);
>     }
>     // Set the DB_NOTIFICATION_EVENT_ID for future reference by other listeners.
>     if (event.isSetEventId()) {
>       listenerEvent.putParameter(
>           MetaStoreEventListenerConstants.DB_NOTIFICATION_EVENT_ID_KEY_NAME,
>           Long.toString(event.getEventId()));
>     }
>   }
> {code}
> The above code in DbNotificationListener holding an object lock is not 
> guarantee enough to make sure that all events get a unique id, since the lock 
> only exists within a single JVM. The transaction isolation level at the db 
> ("read-committed" or "repeatable-read") would also not guarantee uniqueness, 
> unless a lock is taken at the db level, preferably on table 
> {{NOTIFICATION_SEQUENCE}}, which only has one row.
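The single-JVM limitation described above can be sketched with a standalone example. This is hypothetical illustration code, not Hive code; the class and variable names are invented, and the plain `long` field stands in for the single row in {{NOTIFICATION_SEQUENCE}}:

```java
// A synchronized block makes the read-increment-write atomic for threads
// inside one JVM, which is exactly what the object lock in
// DbNotificationListener buys -- but two HS2 instances are separate JVMs,
// so the protection does not extend across them; only a database-level
// lock on the sequence row would.
public class SequenceRaceSketch {
    private static long nextEventId = 0;          // stands in for the NOTIFICATION_SEQUENCE row
    private static final Object LOCK = new Object();

    private static long allocateId() {
        synchronized (LOCK) {                     // JVM-local lock, like NOTIFICATION_TBL_LOCK
            return ++nextEventId;                 // read-modify-write, atomic only within this JVM
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = 8;
        int perThread = 10_000;
        java.util.Set<Long> ids = java.util.concurrent.ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) ids.add(allocateId());
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        // All IDs are unique within this single JVM; a second JVM running the
        // same code would start from its own counter and reuse the same values.
        System.out.println(ids.size() == threads * perThread);  // true
    }
}
```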



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16905) Add zookeeper ACL for hiveserver2

2017-06-14 Thread Saijin Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saijin Huang updated HIVE-16905:

Description: Add zookeeper ACL for hiveserver2

> Add zookeeper ACL for hiveserver2
> -
>
> Key: HIVE-16905
> URL: https://issues.apache.org/jira/browse/HIVE-16905
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Saijin Huang
>Assignee: Saijin Huang
>
> Add zookeeper ACL for hiveserver2





[jira] [Assigned] (HIVE-16905) Add zookeeper ACL for hiveserver2

2017-06-14 Thread Saijin Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saijin Huang reassigned HIVE-16905:
---


> Add zookeeper ACL for hiveserver2
> -
>
> Key: HIVE-16905
> URL: https://issues.apache.org/jira/browse/HIVE-16905
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Saijin Huang
>Assignee: Saijin Huang
>






[jira] [Assigned] (HIVE-16904) during repl load for large number of partitions the metadata file can be huge and can lead to out of memory

2017-06-14 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek reassigned HIVE-16904:
--


> during repl load for large number of partitions the metadata file can be huge 
> and can lead to out of memory 
> 
>
> Key: HIVE-16904
> URL: https://issues.apache.org/jira/browse/HIVE-16904
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> The metadata pertaining to a table and its partitions is stored in a single 
> file. During repl load, all of this data is loaded into memory in one shot and 
> then individual partitions are processed. This can lead to a huge memory 
> overhead, since the entire file is read into memory. Try to deserialize the 
> partition objects with some sort of streaming JSON deserializer.
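The difference between the two loading strategies can be sketched as follows. This is illustration only, not the Hive metadata format: each line of the temp file stands in for one serialized partition object, and a real fix would apply the same pattern with a streaming JSON parser instead of a line reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingLoadSketch {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("partitions", ".txt");
        Files.write(file, java.util.List.of(
                "part=2017-01-01", "part=2017-01-02", "part=2017-01-03"));

        // Memory-hungry approach: the entire file is materialized at once,
        // analogous to deserializing every partition object in one shot.
        int allAtOnce = Files.readAllLines(file).size();

        // Streaming approach: hold only one record in memory at a time.
        int streamed = 0;
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String record;
            while ((record = reader.readLine()) != null) {
                streamed++;  // process this one partition, then let it be GC'd
            }
        }
        Files.delete(file);
        System.out.println(allAtOnce + " " + streamed);  // 3 3
    }
}
```

Both approaches see the same records; only the peak memory footprint differs, which is the point of the proposed fix.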





[jira] [Commented] (HIVE-16876) RpcServer should be re-created when Rpc configs change

2017-06-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049986#comment-16049986
 ] 

Xuefu Zhang commented on HIVE-16876:


+1

> RpcServer should be re-created when Rpc configs change
> --
>
> Key: HIVE-16876
> URL: https://issues.apache.org/jira/browse/HIVE-16876
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-16876.1.patch
>
>






[jira] [Updated] (HIVE-16903) LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE

2017-06-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-16903:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Thanks [~gopalv]. Committed to master.

> LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE
> --
>
> Key: HIVE-16903
> URL: https://issues.apache.org/jira/browse/HIVE-16903
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HIVE-16903.1.patch
>
>
> https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java#L130





[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-06-14 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049923#comment-16049923
 ] 

Chao Sun commented on HIVE-11297:
-

Sure. Added comments in RB. Regarding the output file, you can just use 
{{-Dtest.output.overwrite=true}} to generate the new file.

> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, 
> HIVE-11297.3.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition column, multiple operator trees 
> are created, which all start from the same table scan op but have different 
> Spark partition pruning sinks.
> As an optimization, we can combine these op trees so that we don't have to do 
> the table scan multiple times.





[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-06-14 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049902#comment-16049902
 ] 

liyunzhang_intel commented on HIVE-11297:
-

[~csun]: could you help review HIVE-11297.3.patch, which changes 
{{SplitOpTreeForDPP.java}} and {{spark.dynamic.partition.pruning.q.out}}? Thanks.

> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, 
> HIVE-11297.3.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition column, multiple operator trees 
> are created, which all start from the same table scan op but have different 
> Spark partition pruning sinks.
> As an optimization, we can combine these op trees so that we don't have to do 
> the table scan multiple times.





[jira] [Updated] (HIVE-16332) When create a partitioned text format table with one partition, after we change the format of table to orc, then the array type field may output error.

2017-06-14 Thread Zhizhen Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhizhen Hou updated HIVE-16332:
---
Summary: When create a partitioned text format table with one partition, 
after we change the format of table to orc, then the array type field may 
output error.  (was: We create a partitioned text format table with one 
partition, after we change the format of table to orc, then the array type 
field may output error.)

> When create a partitioned text format table with one partition, after we 
> change the format of table to orc, then the array type field may output error.
> ---
>
> Key: HIVE-16332
> URL: https://issues.apache.org/jira/browse/HIVE-16332
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.1
>Reporter: Zhizhen Hou
>Priority: Critical
>
> ## Steps to reproduce
> 1. First create a text format table with an array type field in Hive.
> ```
> create table test_text_orc (
>   col_int bigint,
>   col_text string,
>   col_array array<string>,
>   col_map map<string,string>
> )
> PARTITIONED BY (
>   day string
> )
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY ','
>   COLLECTION ITEMS TERMINATED BY ']'
>   MAP KEYS TERMINATED BY ':'
> ;
> ```
> 2. Create new text file hive-orc-text-file-array-error-test.txt.
> ```
> 1,text_value1,array_value1]array_value2]array_value3, 
> map_key1:map_value1,map_key2:map_value2
> 2,text_value2,array_value4, map_key1:map_value3
> ,text_value3,, map_key1:]map_key3:map_value3
> ```
> 3.  Load the data into one partition.
> ```
>  LOAD DATA local INPATH '.hive-orc-text-file-array-error-test.txt' overwrite 
> into table test_text_orc partition(day=20170329)
> ```
> 4. Select the data to verify the result.
> ```
> hive> select * from test.test_text_orc;
> OK
> 1 text_value1 ["array_value1","array_value2","array_value3"]  {" 
> map_key1":"map_value1","map_key2":"map_value2"}  20170329
> 2 text_value2 ["array_value4"]{"map_key1":"map_value3"}   
> 20170329
> NULL  text_value3 []  {" map_key1":"","map_key3":"map_value3"}
> 20170329
> ```
> 5. Alter the file format of the table to ORC.
> ```
>  alter table test_text_orc set fileformat orc;
> ```
> 6. Check the result again, and you can see the erroneous result.
> ```
> hive> select * from test.test_text_orc;
> OK
> 1 text_value1 ["array_value1","array_value2","array_value3"]  {" 
> map_key1":"map_value1","map_key2":"map_value2"}  20170329
> 2 text_value2 ["array_value4","array_value2","array_value3"]  
> {"map_key1":"map_value3"}   20170329
> NULL  text_value3 ["array_value4","array_value2","array_value3"]  
> {"map_key3":"map_value3"," map_key1":""}20170329
> ```





[jira] [Commented] (HIVE-16903) LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE

2017-06-14 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049847#comment-16049847
 ] 

Gopal V commented on HIVE-16903:


+1

> LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE
> --
>
> Key: HIVE-16903
> URL: https://issues.apache.org/jira/browse/HIVE-16903
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-16903.1.patch
>
>
> https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java#L130





[jira] [Commented] (HIVE-16903) LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE

2017-06-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049828#comment-16049828
 ] 

Hive QA commented on HIVE-16903:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12873045/HIVE-16903.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10831 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=232)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5647/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5647/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5647/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12873045 - PreCommit-HIVE-Build

> LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE
> --
>
> Key: HIVE-16903
> URL: https://issues.apache.org/jira/browse/HIVE-16903
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-16903.1.patch
>
>
> https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java#L130





[jira] [Commented] (HIVE-16902) investigate "failed to remove operation log" errors

2017-06-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049775#comment-16049775
 ] 

Hive QA commented on HIVE-16902:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12873026/HIVE-16902.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10830 tests 
executed
*Failed tests:*
{noformat}
TestJdbcWithMiniKdcSQLAuthHttp - did not produce a TEST-*.xml file (likely 
timed out) (batchId=238)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=232)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5646/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5646/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5646/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12873026 - PreCommit-HIVE-Build

> investigate "failed to remove operation log" errors
> ---
>
> Key: HIVE-16902
> URL: https://issues.apache.org/jira/browse/HIVE-16902
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-16902.1.patch
>
>
> When we call {{set a=3;}} from beeline, the following exception is thrown. 
> {noformat}
> [HiveServer2-Handler-Pool: Thread-46]: Failed to remove corresponding log 
> file of operation: OperationHandle [opType=GET_TABLES, 
> getHandleIdentifier()=50f58d7b-f935-4590-922f-de7051a34658]
> java.io.FileNotFoundException: File does not exist: 
> /var/log/hive/operation_logs/7f613077-e29d-484a-96e1-43c81f9c0999/hive_20170531101400_28d52b7d-ffb9-4815-8c6c-662319628915
>   at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2275)
>   at 
> org.apache.hadoop.hive.ql.session.OperationLog$LogFile.remove(OperationLog.java:122)
>   at 
> org.apache.hadoop.hive.ql.session.OperationLog.close(OperationLog.java:90)
>   at 
> org.apache.hive.service.cli.operation.Operation.cleanupOperationLog(Operation.java:287)
>   at 
> org.apache.hive.service.cli.operation.MetadataOperation.close(MetadataOperation.java:58)
>   at 
> org.apache.hive.service.cli.operation.OperationManager.closeOperation(OperationManager.java:273)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.closeOperation(HiveSessionImpl.java:822)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> 

[jira] [Assigned] (HIVE-16903) LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE

2017-06-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned HIVE-16903:
---

Assignee: Rajesh Balamohan

> LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE
> --
>
> Key: HIVE-16903
> URL: https://issues.apache.org/jira/browse/HIVE-16903
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-16903.1.patch
>
>
> https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java#L130





[jira] [Updated] (HIVE-16903) LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE

2017-06-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-16903:

Attachment: HIVE-16903.1.patch

cc [~sseth].

> LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE
> --
>
> Key: HIVE-16903
> URL: https://issues.apache.org/jira/browse/HIVE-16903
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-16903.1.patch
>
>
> https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java#L130





[jira] [Updated] (HIVE-16903) LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE

2017-06-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-16903:

Status: Patch Available  (was: Open)

> LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE
> --
>
> Key: HIVE-16903
> URL: https://issues.apache.org/jira/browse/HIVE-16903
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-16903.1.patch
>
>
> https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java#L130





[jira] [Updated] (HIVE-16903) LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE

2017-06-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-16903:

Component/s: llap

> LLAP: Fix config name issue in SHUFFLE_MANAGE_OS_CACHE
> --
>
> Key: HIVE-16903
> URL: https://issues.apache.org/jira/browse/HIVE-16903
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Priority: Trivial
>
> https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java#L130





[jira] [Commented] (HIVE-8434) Vectorization logic using wrong values for DATE and TIMESTAMP partitioning columns in vectorized row batches...

2017-06-14 Thread Charles Pritchard (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049759#comment-16049759
 ] 

Charles Pritchard commented on HIVE-8434:
-

Hit this in 1.2.1 when using MONTH(CAST(datestr_partitioncol as date)) in 
select and group by; it gives unstable results, with a lot of 7s and 31s showing up.

> Vectorization logic using wrong values for DATE and TIMESTAMP partitioning 
> columns in vectorized row batches...
> ---
>
> Key: HIVE-8434
> URL: https://issues.apache.org/jira/browse/HIVE-8434
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 0.14.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8434.01.patch, HIVE-8434.02.patch
>
>
> VectorizedRowBatchCtx.addPartitionColsToBatch uses wrong values to populate 
> DATE and TIMESTAMP data types.





[jira] [Updated] (HIVE-16902) investigate "failed to remove operation log" errors

2017-06-14 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16902:

Status: Patch Available  (was: Open)

patch-1: in some cases no log file is actually created, since there is no log 
to be printed to the client console. So only try to remove the log file if it 
exists.
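The idea behind patch-1 can be sketched with java.nio. This is a hypothetical standalone example, not the actual Hive change: deleting a log file that may never have been created should be a no-op, not a FileNotFoundException, and {{Files.deleteIfExists}} expresses that directly, unlike {{FileUtils.forceDelete}}, which throws when the file is absent.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RemoveIfExistsSketch {
    public static void main(String[] args) throws IOException {
        Path opLog = Files.createTempFile("operation_log", ".log");
        boolean first = Files.deleteIfExists(opLog);   // file exists: deleted, returns true
        boolean second = Files.deleteIfExists(opLog);  // already gone: quietly returns false
        System.out.println(first + " " + second);      // true false
    }
}
```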

> investigate "failed to remove operation log" errors
> ---
>
> Key: HIVE-16902
> URL: https://issues.apache.org/jira/browse/HIVE-16902
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-16902.1.patch
>
>
> When we call {{set a=3;}} from beeline, the following exception is thrown. 
> {noformat}
> [HiveServer2-Handler-Pool: Thread-46]: Failed to remove corresponding log 
> file of operation: OperationHandle [opType=GET_TABLES, 
> getHandleIdentifier()=50f58d7b-f935-4590-922f-de7051a34658]
> java.io.FileNotFoundException: File does not exist: 
> /var/log/hive/operation_logs/7f613077-e29d-484a-96e1-43c81f9c0999/hive_20170531101400_28d52b7d-ffb9-4815-8c6c-662319628915
>   at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2275)
>   at 
> org.apache.hadoop.hive.ql.session.OperationLog$LogFile.remove(OperationLog.java:122)
>   at 
> org.apache.hadoop.hive.ql.session.OperationLog.close(OperationLog.java:90)
>   at 
> org.apache.hive.service.cli.operation.Operation.cleanupOperationLog(Operation.java:287)
>   at 
> org.apache.hive.service.cli.operation.MetadataOperation.close(MetadataOperation.java:58)
>   at 
> org.apache.hive.service.cli.operation.OperationManager.closeOperation(OperationManager.java:273)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.closeOperation(HiveSessionImpl.java:822)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1857)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>   at com.sun.proxy.$Proxy38.closeOperation(Unknown Source)
>   at 
> org.apache.hive.service.cli.CLIService.closeOperation(CLIService.java:475)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.CloseOperation(ThriftCLIService.java:671)
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1677)
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1662)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:605)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}





[jira] [Updated] (HIVE-16902) investigate "failed to remove operation log" errors

2017-06-14 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16902:

Attachment: HIVE-16902.1.patch

> investigate "failed to remove operation log" errors
> ---
>
> Key: HIVE-16902
> URL: https://issues.apache.org/jira/browse/HIVE-16902
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-16902.1.patch
>
>
> When we call {{set a=3;}} from beeline, the following exception is thrown. 
> {noformat}
> [HiveServer2-Handler-Pool: Thread-46]: Failed to remove corresponding log 
> file of operation: OperationHandle [opType=GET_TABLES, 
> getHandleIdentifier()=50f58d7b-f935-4590-922f-de7051a34658]
> java.io.FileNotFoundException: File does not exist: 
> /var/log/hive/operation_logs/7f613077-e29d-484a-96e1-43c81f9c0999/hive_20170531101400_28d52b7d-ffb9-4815-8c6c-662319628915
>   at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2275)
>   at 
> org.apache.hadoop.hive.ql.session.OperationLog$LogFile.remove(OperationLog.java:122)
>   at 
> org.apache.hadoop.hive.ql.session.OperationLog.close(OperationLog.java:90)
>   at 
> org.apache.hive.service.cli.operation.Operation.cleanupOperationLog(Operation.java:287)
>   at 
> org.apache.hive.service.cli.operation.MetadataOperation.close(MetadataOperation.java:58)
>   at 
> org.apache.hive.service.cli.operation.OperationManager.closeOperation(OperationManager.java:273)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.closeOperation(HiveSessionImpl.java:822)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1857)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>   at com.sun.proxy.$Proxy38.closeOperation(Unknown Source)
>   at 
> org.apache.hive.service.cli.CLIService.closeOperation(CLIService.java:475)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.CloseOperation(ThriftCLIService.java:671)
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1677)
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1662)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:605)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16902) investigate "failed to remove operation log" errors

2017-06-14 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu reassigned HIVE-16902:
---


> investigate "failed to remove operation log" errors
> ---
>
> Key: HIVE-16902
> URL: https://issues.apache.org/jira/browse/HIVE-16902
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> When we call {{set a=3;}} from beeline, the following exception is thrown. 
> {noformat}
> [HiveServer2-Handler-Pool: Thread-46]: Failed to remove corresponding log 
> file of operation: OperationHandle [opType=GET_TABLES, 
> getHandleIdentifier()=50f58d7b-f935-4590-922f-de7051a34658]
> java.io.FileNotFoundException: File does not exist: 
> /var/log/hive/operation_logs/7f613077-e29d-484a-96e1-43c81f9c0999/hive_20170531101400_28d52b7d-ffb9-4815-8c6c-662319628915
>   at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2275)
>   at 
> org.apache.hadoop.hive.ql.session.OperationLog$LogFile.remove(OperationLog.java:122)
>   at 
> org.apache.hadoop.hive.ql.session.OperationLog.close(OperationLog.java:90)
>   at 
> org.apache.hive.service.cli.operation.Operation.cleanupOperationLog(Operation.java:287)
>   at 
> org.apache.hive.service.cli.operation.MetadataOperation.close(MetadataOperation.java:58)
>   at 
> org.apache.hive.service.cli.operation.OperationManager.closeOperation(OperationManager.java:273)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.closeOperation(HiveSessionImpl.java:822)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1857)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>   at com.sun.proxy.$Proxy38.closeOperation(Unknown Source)
>   at 
> org.apache.hive.service.cli.CLIService.closeOperation(CLIService.java:475)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.CloseOperation(ThriftCLIService.java:671)
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1677)
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1662)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:605)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
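The FileNotFoundException above indicates the operation log file is removed twice — once by the operation close path and again by some other cleanup that got there first. A minimal sketch of a tolerant removal is below; the class and method names are hypothetical, not Hive's actual OperationLog API:

```java
import java.io.File;
import java.io.IOException;

// Illustrative helper: treat "file already gone" as a successful cleanup
// instead of letting forceDelete throw FileNotFoundException.
public class OperationLogCleanup {

    /**
     * Deletes the operation log file if it still exists. Returns true when
     * the file is gone afterwards, whether this call deleted it or a racing
     * cleanup already had.
     */
    public static boolean removeIfPresent(File logFile) throws IOException {
        if (!logFile.exists()) {
            // Another code path (e.g. session-level cleanup) removed it
            // first; that is not an error worth logging a stack trace for.
            return true;
        }
        if (!logFile.delete()) {
            throw new IOException("Unable to delete " + logFile);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("operation_log", ".tmp");
        boolean first = removeIfPresent(tmp);   // deletes the file
        boolean second = removeIfPresent(tmp);  // harmless no-op, no exception
        System.out.println(first + " " + second);
    }
}
```

The key design point is that a missing file during cleanup is an expected race, not a failure, so it should be handled (or at most logged at debug level) rather than surfaced with a full stack trace.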





[jira] [Commented] (HIVE-16288) Add blobstore tests for ORC and RCFILE file formats

2017-06-14 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049527#comment-16049527
 ] 

Thomas Poepping commented on HIVE-16288:


I hate to bump this again, but it looks like branch-2 is still wrong. Can you 
help again, [~ashutoshc]? Thank you!

> Add blobstore tests for ORC and RCFILE file formats
> ---
>
> Key: HIVE-16288
> URL: https://issues.apache.org/jira/browse/HIVE-16288
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Fix For: 2.3.0, 3.0.0
>
> Attachments: HIVE-16288.patch
>
>
> This patch adds four tests each for ORC and RCFILE when running against 
> blobstore filesystems:
>   * Test for bucketed tables
>   * Test for nonpartitioned tables
>   * Test for partitioned tables
>   * Test for partitioned tables with nonstandard partition locations





[jira] [Updated] (HIVE-16901) Distcp optimization - One distcp per CopyTask

2017-06-14 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16901:

Description: 
Currently, if a CopyTask is created to copy a list of files, distcp is invoked 
once for each file. Instead, the full list of source files should be passed to 
the distcp tool in a single invocation, which copies the files in parallel and 
yields a significant performance gain.

If the bulk copy of the file list fails, traverse the destination directory to 
find files that are missing or have checksum mismatches, then re-copy those 
files one by one.

  was:
Currently, if a ReplCopyTask is created to copy a list of files, distcp is 
invoked once for each file. Instead, the full list of source files should be 
passed to the distcp tool in a single invocation, which copies the files in 
parallel and yields a significant performance gain.

If the bulk copy of the file list fails, traverse the destination directory to 
find files that are missing or have checksum mismatches, then re-copy those 
files one by one.


> Distcp optimization - One distcp per CopyTask 
> --
>
> Key: HIVE-16901
> URL: https://issues.apache.org/jira/browse/HIVE-16901
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
>
> Currently, if a CopyTask is created to copy a list of files, distcp is 
> invoked once for each file. Instead, the full list of source files should be 
> passed to the distcp tool in a single invocation, which copies the files in 
> parallel and yields a significant performance gain.
> If the bulk copy of the file list fails, traverse the destination directory 
> to find files that are missing or have checksum mismatches, then re-copy 
> those files one by one.
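The two-phase strategy described in the issue — one bulk copy of the whole list, then a per-file retry only for what is still missing — can be sketched as follows. This uses local java.nio paths as a stand-in for the HDFS/distcp calls, and the class and method names are illustrative, not Hive's CopyTask API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Sketch of "one distcp per CopyTask": bulk-copy the whole list once,
// then verify the destination and retry only the files that are missing.
public class BatchedCopy {

    /** Bulk-copy all sources into destDir in one pass (stand-in for a single distcp run). */
    static void bulkCopy(List<Path> sources, Path destDir) throws IOException {
        for (Path src : sources) {
            Files.copy(src, destDir.resolve(src.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** After the bulk pass, list the source files that never arrived at the destination. */
    static List<Path> findMissing(List<Path> sources, Path destDir) {
        List<Path> missing = new ArrayList<>();
        for (Path src : sources) {
            if (!Files.exists(destDir.resolve(src.getFileName()))) {
                missing.add(src);
            }
        }
        return missing;
    }

    /** One bulk attempt, then a per-file retry only for whatever is still missing. */
    public static void copyWithRetry(List<Path> sources, Path destDir) throws IOException {
        try {
            bulkCopy(sources, destDir);
        } catch (IOException bulkFailure) {
            // Fall through: the per-file pass below picks up whatever failed.
        }
        for (Path src : findMissing(sources, destDir)) {
            Files.copy(src, destDir.resolve(src.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

A real implementation would also compare checksums in `findMissing` (as the issue notes), since a file can exist at the destination yet be partially copied.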





[jira] [Updated] (HIVE-16901) Distcp optimization - One distcp per CopyTask

2017-06-14 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16901:

Summary: Distcp optimization - One distcp per CopyTask   (was: Distcp 
optimization - One distcp per ReplCopyTask )

> Distcp optimization - One distcp per CopyTask 
> --
>
> Key: HIVE-16901
> URL: https://issues.apache.org/jira/browse/HIVE-16901
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
>
> Currently, if a ReplCopyTask is created to copy a list of files, distcp is 
> invoked once for each file. Instead, the full list of source files should be 
> passed to the distcp tool in a single invocation, which copies the files in 
> parallel and yields a significant performance gain.
> If the bulk copy of the file list fails, traverse the destination directory 
> to find files that are missing or have checksum mismatches, then re-copy 
> those files one by one.





[jira] [Assigned] (HIVE-16901) Distcp optimization - One distcp per ReplCopyTask

2017-06-14 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan reassigned HIVE-16901:
---


> Distcp optimization - One distcp per ReplCopyTask 
> --
>
> Key: HIVE-16901
> URL: https://issues.apache.org/jira/browse/HIVE-16901
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
>
> Currently, if a ReplCopyTask is created to copy a list of files, distcp is 
> invoked once for each file. Instead, the full list of source files should be 
> passed to the distcp tool in a single invocation, which copies the files in 
> parallel and yields a significant performance gain.
> If the bulk copy of the file list fails, traverse the destination directory 
> to find files that are missing or have checksum mismatches, then re-copy 
> those files one by one.





[jira] [Commented] (HIVE-16885) Non-equi Joins: Filter clauses should be pushed into the ON clause

2017-06-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049458#comment-16049458
 ] 

Hive QA commented on HIVE-16885:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12872992/HIVE-16885.01.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10834 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=232)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5645/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5645/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5645/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12872992 - PreCommit-HIVE-Build

> Non-equi Joins: Filter clauses should be pushed into the ON clause
> --
>
> Key: HIVE-16885
> URL: https://issues.apache.org/jira/browse/HIVE-16885
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16885.01.patch, HIVE-16885.patch
>
>
> FIL_24 -> MAPJOIN_23
> {code}
> hive> explain  select * from part where p_size > (select max(p_size) from 
> part group by p_type);
> Warning: Map Join MAPJOIN[14][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 3 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_26]
> Select Operator [SEL_25] (rows=110 width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_24] (rows=110 width=625)
> predicate:(_col5 > _col9)
> Map Join Operator [MAPJOIN_23] (rows=330 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"]
> <-Reducer 3 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_21]
> Select Operator [SEL_20] (rows=165 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_19] (rows=165 width=109)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_18]
>   PartitionCols:_col0
>   Group By Operator [GBY_17] (rows=14190 width=109)
> 
> Output:["_col0","_col1"],aggregations:["max(p_size)"],keys:p_type
> Select Operator [SEL_16] (rows=2 width=109)
>   Output:["p_type","p_size"]
>   TableScan [TS_2] (rows=2 width=109)
> 
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Select Operator 

[jira] [Commented] (HIVE-16885) Non-equi Joins: Filter clauses should be pushed into the ON clause

2017-06-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049384#comment-16049384
 ] 

Hive QA commented on HIVE-16885:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12872992/HIVE-16885.01.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10834 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=232)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5644/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5644/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5644/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12872992 - PreCommit-HIVE-Build

> Non-equi Joins: Filter clauses should be pushed into the ON clause
> --
>
> Key: HIVE-16885
> URL: https://issues.apache.org/jira/browse/HIVE-16885
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16885.01.patch, HIVE-16885.patch
>
>
> FIL_24 -> MAPJOIN_23
> {code}
> hive> explain  select * from part where p_size > (select max(p_size) from 
> part group by p_type);
> Warning: Map Join MAPJOIN[14][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 3 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_26]
> Select Operator [SEL_25] (rows=110 width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_24] (rows=110 width=625)
> predicate:(_col5 > _col9)
> Map Join Operator [MAPJOIN_23] (rows=330 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"]
> <-Reducer 3 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_21]
> Select Operator [SEL_20] (rows=165 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_19] (rows=165 width=109)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_18]
>   PartitionCols:_col0
>   Group By Operator [GBY_17] (rows=14190 width=109)
> 
> Output:["_col0","_col1"],aggregations:["max(p_size)"],keys:p_type
> Select Operator [SEL_16] (rows=2 width=109)
>   Output:["p_type","p_size"]
>   TableScan [TS_2] (rows=2 width=109)
> 
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Select Operator 

[jira] [Updated] (HIVE-16835) Addendum to HIVE-16745

2017-06-14 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-16835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-16835:
---
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

> Addendum to HIVE-16745
> --
>
> Key: HIVE-16835
> URL: https://issues.apache.org/jira/browse/HIVE-16835
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Fix For: 3.0.0
>
> Attachments: HIVE-16835.01.patch
>
>
> HIVE-16745 missed fixing the syntax error in hive-schema-1.1.0.mysql.sql





[jira] [Commented] (HIVE-16835) Addendum to HIVE-16745

2017-06-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049347#comment-16049347
 ] 

Sergio Peña commented on HIVE-16835:


Looks simple
+1

> Addendum to HIVE-16745
> --
>
> Key: HIVE-16835
> URL: https://issues.apache.org/jira/browse/HIVE-16835
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-16835.01.patch
>
>
> HIVE-16745 missed fixing the syntax error in hive-schema-1.1.0.mysql.sql





[jira] [Commented] (HIVE-14747) Remove JAVA paths from profiles by sending them from ptest-client

2017-06-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049337#comment-16049337
 ] 

Hive QA commented on HIVE-14747:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12872986/HIVE-14747.02.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10831 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=157)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=232)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5643/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5643/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5643/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12872986 - PreCommit-HIVE-Build

> Remove JAVA paths from profiles by sending them from ptest-client
> -
>
> Key: HIVE-14747
> URL: https://issues.apache.org/jira/browse/HIVE-14747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-14747.01.patch, HIVE-14747.02.patch
>
>
> Hive ptest uses per-branch properties files that contain information about 
> how to execute the tests.
> These profiles include JAVA paths used to build and execute the tests. We 
> should get rid of these by passing that information from Jenkins to the 
> ptest-server. If a profile needs a different Java version, we can create a 
> specific Jenkins job for it.





[jira] [Updated] (HIVE-16885) Non-equi Joins: Filter clauses should be pushed into the ON clause

2017-06-14 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16885:
---
Attachment: (was: HIVE-16885.01.patch)

> Non-equi Joins: Filter clauses should be pushed into the ON clause
> --
>
> Key: HIVE-16885
> URL: https://issues.apache.org/jira/browse/HIVE-16885
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16885.01.patch, HIVE-16885.patch
>
>
> FIL_24 -> MAPJOIN_23
> {code}
> hive> explain  select * from part where p_size > (select max(p_size) from 
> part group by p_type);
> Warning: Map Join MAPJOIN[14][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 3 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_26]
> Select Operator [SEL_25] (rows=110 width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_24] (rows=110 width=625)
> predicate:(_col5 > _col9)
> Map Join Operator [MAPJOIN_23] (rows=330 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"]
> <-Reducer 3 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_21]
> Select Operator [SEL_20] (rows=165 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_19] (rows=165 width=109)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_18]
>   PartitionCols:_col0
>   Group By Operator [GBY_17] (rows=14190 width=109)
> 
> Output:["_col0","_col1"],aggregations:["max(p_size)"],keys:p_type
> Select Operator [SEL_16] (rows=2 width=109)
>   Output:["p_type","p_size"]
>   TableScan [TS_2] (rows=2 width=109)
> 
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Select Operator [SEL_22] (rows=2 width=621)
> 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
> TableScan [TS_0] (rows=2 width=621)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"]
> {code}
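In the plan above, the non-equi predicate `(_col5 > _col9)` sits in FIL_24, applied only after MAPJOIN_23 has materialized a full cross product. The issue title asks for the predicate to live in the join's ON clause instead. A toy Java sketch of the two shapes (illustrative names, not Hive operators — both enumerate every pair here, but the first materializes the whole cross product before filtering, which is what pushing the filter into the join avoids):

```java
import java.util.ArrayList;
import java.util.List;

// Toy nested-loop join contrasting filter-after-cross-product with a
// non-equi predicate evaluated inside the join itself.
public class NonEquiJoin {

    /** Current plan shape: cross join first, then a separate filter (FIL_24). */
    static List<int[]> crossThenFilter(int[] left, int[] right) {
        List<int[]> crossProduct = new ArrayList<>();
        for (int l : left)
            for (int r : right)
                crossProduct.add(new int[]{l, r});   // materialize every pair
        List<int[]> out = new ArrayList<>();
        for (int[] row : crossProduct)
            if (row[0] > row[1])                     // predicate applied afterwards
                out.add(row);
        return out;
    }

    /** Desired plan shape: the predicate is checked while joining (ON clause). */
    static List<int[]> joinWithPredicate(int[] left, int[] right) {
        List<int[]> out = new ArrayList<>();
        for (int l : left)
            for (int r : right)
                if (l > r)                           // predicate inside the join
                    out.add(new int[]{l, r});
        return out;
    }
}
```

Both methods return the same rows; the difference is the size of the intermediate result, which for a broadcast join over large inputs is the dominant cost.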





[jira] [Updated] (HIVE-16885) Non-equi Joins: Filter clauses should be pushed into the ON clause

2017-06-14 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16885:
---
Attachment: HIVE-16885.01.patch

> Non-equi Joins: Filter clauses should be pushed into the ON clause
> --
>
> Key: HIVE-16885
> URL: https://issues.apache.org/jira/browse/HIVE-16885
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16885.01.patch, HIVE-16885.patch
>
>
> FIL_24 -> MAPJOIN_23
> {code}
> hive> explain  select * from part where p_size > (select max(p_size) from 
> part group by p_type);
> Warning: Map Join MAPJOIN[14][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 3 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_26]
> Select Operator [SEL_25] (rows=110 width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_24] (rows=110 width=625)
> predicate:(_col5 > _col9)
> Map Join Operator [MAPJOIN_23] (rows=330 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"]
> <-Reducer 3 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_21]
> Select Operator [SEL_20] (rows=165 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_19] (rows=165 width=109)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_18]
>   PartitionCols:_col0
>   Group By Operator [GBY_17] (rows=14190 width=109)
> 
> Output:["_col0","_col1"],aggregations:["max(p_size)"],keys:p_type
> Select Operator [SEL_16] (rows=2 width=109)
>   Output:["p_type","p_size"]
>   TableScan [TS_2] (rows=2 width=109)
> 
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Select Operator [SEL_22] (rows=2 width=621)
> 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
> TableScan [TS_0] (rows=2 width=621)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"]
> {code}





[jira] [Updated] (HIVE-16885) Non-equi Joins: Filter clauses should be pushed into the ON clause

2017-06-14 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16885:
---
Attachment: HIVE-16885.01.patch

> Non-equi Joins: Filter clauses should be pushed into the ON clause
> --
>
> Key: HIVE-16885
> URL: https://issues.apache.org/jira/browse/HIVE-16885
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16885.01.patch, HIVE-16885.patch
>
>
> FIL_24 -> MAPJOIN_23
> {code}
> hive> explain  select * from part where p_size > (select max(p_size) from 
> part group by p_type);
> Warning: Map Join MAPJOIN[14][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 3 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_26]
> Select Operator [SEL_25] (rows=110 width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_24] (rows=110 width=625)
> predicate:(_col5 > _col9)
> Map Join Operator [MAPJOIN_23] (rows=330 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"]
> <-Reducer 3 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_21]
> Select Operator [SEL_20] (rows=165 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_19] (rows=165 width=109)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_18]
>   PartitionCols:_col0
>   Group By Operator [GBY_17] (rows=14190 width=109)
> 
> Output:["_col0","_col1"],aggregations:["max(p_size)"],keys:p_type
> Select Operator [SEL_16] (rows=2 width=109)
>   Output:["p_type","p_size"]
>   TableScan [TS_2] (rows=2 width=109)
> 
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Select Operator [SEL_22] (rows=2 width=621)
> 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
> TableScan [TS_0] (rows=2 width=621)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"]
> {code}





[jira] [Commented] (HIVE-14747) Remove JAVA paths from profiles by sending them from ptest-client

2017-06-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049276#comment-16049276
 ] 

Hive QA commented on HIVE-14747:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12872982/HIVE-14747.01.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10817 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_decimal] 
(batchId=9)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=157)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=232)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=103)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5642/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5642/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5642/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12872982 - PreCommit-HIVE-Build

> Remove JAVA paths from profiles by sending them from ptest-client
> -
>
> Key: HIVE-14747
> URL: https://issues.apache.org/jira/browse/HIVE-14747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-14747.01.patch, HIVE-14747.02.patch
>
>
> Hive ptest uses some properties files per branch that contain information 
> about how to execute the tests.
> This profile includes JAVA paths to build and execute the tests. We should 
> get rid of these by passing such information from Jenkins to the 
> ptest-server. In case a profile needs a different java version, then we can 
> create a specific Jenkins job for that.





[jira] [Work started] (HIVE-16357) Failed folder creation when creating a new table is reported incorrectly

2017-06-14 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16357 started by Barna Zsombor Klara.
--
> Failed folder creation when creating a new table is reported incorrectly
> 
>
> Key: HIVE-16357
> URL: https://issues.apache.org/jira/browse/HIVE-16357
> Project: Hive
>  Issue Type: Bug
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>
> If the directory for a Hive table could not be created, then the HMS will 
> throw a MetaException:
> {code}
>  if (tblPath != null) {
>   if (!wh.isDir(tblPath)) {
> if (!wh.mkdirs(tblPath, true)) {
>   throw new MetaException(tblPath
>   + " is not a directory or unable to create one");
> }
> madeDir = true;
>   }
> }
> {code}
> However, in the finally block we always try to call the 
> DbNotificationListener, which in turn will also throw an exception because 
> the directory is missing, overwriting the initial exception with a 
> FileNotFoundException.
> Actual stacktrace seen by the caller:
> {code}
> 2017-04-03T05:58:00,128 ERROR [pool-7-thread-2] metastore.RetryingHMSHandler: 
> MetaException(message:java.lang.RuntimeException: 
> java.io.FileNotFoundException: File file:/.../0 does not exist)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:6074)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1496)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at com.sun.proxy.$Proxy28.create_table_with_environment_context(Unknown 
> Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:11125)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:11109)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File 
> file:/.../0 does not exist
>   at 
> org.apache.hive.hcatalog.listener.DbNotificationListener$FileIterator.(DbNotificationListener.java:203)
>   at 
> org.apache.hive.hcatalog.listener.DbNotificationListener.onCreateTable(DbNotificationListener.java:137)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1463)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1482)
>   ... 20 more
> Caused by: java.io.FileNotFoundException: File file:/.../0 does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
>   at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
>   at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:574)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:243)
>   at 
> org.apache.hadoop.fs.ProxyFileSystem.listStatus(ProxyFileSystem.java:195)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:243)
>   at 
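The masking behaviour in this stack trace can be reproduced with a few lines: an exception thrown from a {{finally}} block replaces whatever exception was already propagating from the {{try}} block, which is how the original MetaException is swallowed and the caller only sees the listener's failure. The class below is an illustrative sketch, not Hive code; names and messages are stand-ins.

```java
// Illustrative sketch, not Hive code: a throw inside a finally block
// discards the exception already propagating from the try block, just as
// DbNotificationListener's failure replaces the original MetaException.
public class FinallyMasking {

    static String createTable(boolean dirCreated) throws Exception {
        try {
            if (!dirCreated) {
                // models the MetaException from the mkdirs check
                throw new Exception("is not a directory or unable to create one");
            }
            return "ok";
        } finally {
            if (!dirCreated) {
                // models the listener failing on the missing directory; this
                // exception wins and the one thrown above is silently dropped
                throw new Exception("File file:/.../0 does not exist");
            }
        }
    }

    public static void main(String[] args) {
        try {
            createTable(false);
        } catch (Exception e) {
            // Only the listener's message survives.
            System.out.println(e.getMessage()); // prints: File file:/.../0 does not exist
        }
    }
}
```

One conventional remedy is to catch the listener's failure inside the finally block and attach it to the original via {{Throwable#addSuppressed}}, so the first exception still reaches the caller.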

[jira] [Assigned] (HIVE-16357) Failed folder creation when creating a new table is reported incorrectly

2017-06-14 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara reassigned HIVE-16357:
--

Assignee: Barna Zsombor Klara

> Failed folder creation when creating a new table is reported incorrectly
> 
>
> Key: HIVE-16357
> URL: https://issues.apache.org/jira/browse/HIVE-16357
> Project: Hive
>  Issue Type: Bug
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>
> If the directory for a Hive table could not be created, then the HMS will 
> throw a MetaException:
> {code}
>  if (tblPath != null) {
>   if (!wh.isDir(tblPath)) {
> if (!wh.mkdirs(tblPath, true)) {
>   throw new MetaException(tblPath
>   + " is not a directory or unable to create one");
> }
> madeDir = true;
>   }
> }
> {code}
> However, in the finally block we always try to call the 
> DbNotificationListener, which in turn will also throw an exception because 
> the directory is missing, overwriting the initial exception with a 
> FileNotFoundException.
> Actual stacktrace seen by the caller:
> {code}
> 2017-04-03T05:58:00,128 ERROR [pool-7-thread-2] metastore.RetryingHMSHandler: 
> MetaException(message:java.lang.RuntimeException: 
> java.io.FileNotFoundException: File file:/.../0 does not exist)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:6074)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1496)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at com.sun.proxy.$Proxy28.create_table_with_environment_context(Unknown 
> Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:11125)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:11109)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File 
> file:/.../0 does not exist
>   at 
> org.apache.hive.hcatalog.listener.DbNotificationListener$FileIterator.(DbNotificationListener.java:203)
>   at 
> org.apache.hive.hcatalog.listener.DbNotificationListener.onCreateTable(DbNotificationListener.java:137)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1463)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1482)
>   ... 20 more
> Caused by: java.io.FileNotFoundException: File file:/.../0 does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
>   at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
>   at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1555)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:574)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:243)
>   at 
> org.apache.hadoop.fs.ProxyFileSystem.listStatus(ProxyFileSystem.java:195)
>   at 
> 

[jira] [Commented] (HIVE-16738) Notification ID generation in DBNotification might not be unique across HS2 instances.

2017-06-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049242#comment-16049242
 ] 

Sergio Peña commented on HIVE-16738:


Ah, thanks, I forgot about the embedded metastore. So this issue is the same 
as HIVE-16886 then. Btw, are you working on this patch? I was thinking about 
providing a patch for HIVE-16886, but if you have some work in progress, then 
I'll wait.

> Notification ID generation in DBNotification might not be unique across HS2 
> instances.
> --
>
> Key: HIVE-16738
> URL: https://issues.apache.org/jira/browse/HIVE-16738
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> Going to explain the problem in scope of "replication" feature for hive 2 
> that is being built, as it is easier to explain:
> To allow replication to work we need to set 
> "hive.metastore.transactional.event.listeners"  to DBNotificationListener. 
> For use cases where there are multiple HiveServer2 Instances running 
> {code}
> private void process(NotificationEvent event, ListenerEvent listenerEvent) throws MetaException {
>   event.setMessageFormat(msgFactory.getMessageFormat());
>   synchronized (NOTIFICATION_TBL_LOCK) {
>     LOG.debug("DbNotificationListener: Processing : {}:{}", event.getEventId(), event.getMessage());
>     HMSHandler.getMSForConf(hiveConf).addNotificationEvent(event);
>   }
>   // Set the DB_NOTIFICATION_EVENT_ID for future reference by other listeners.
>   if (event.isSetEventId()) {
>     listenerEvent.putParameter(
>         MetaStoreEventListenerConstants.DB_NOTIFICATION_EVENT_ID_KEY_NAME,
>         Long.toString(event.getEventId()));
>   }
> }
> {code}
> The above code in DbNotificationListener holding the object lock won't be 
> enough to guarantee that all events get a unique id. The transaction 
> isolation level at the db ("read-committed" or "repeatable-read") would 
> also not guarantee this, unless a lock is taken at the db level, preferably 
> on the {{NOTIFICATION_SEQUENCE}} table, which only has one row.
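The failure mode can be modeled outside Hive: a {{synchronized}} block only serializes threads inside one JVM, so two HS2/metastore instances each holding their own NOTIFICATION_TBL_LOCK can still interleave the read and write of the shared sequence. In the sketch below (illustrative Java, not Hive code), two distinct lock objects stand in for the two JVMs, and one shared lock stands in for a db-level lock on the NOTIFICATION_SEQUENCE row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative model only, not Hive code. Two distinct lock objects play the
// role of NOTIFICATION_TBL_LOCK in two separate JVMs; the static 'sequence'
// field plays the role of the single NOTIFICATION_SEQUENCE row.
public class SequenceRace {

    static long sequence = 0; // models the NOTIFICATION_SEQUENCE row

    // Read-then-write with a small pause to widen the race window, mimicking
    // the select/update pair behind addNotificationEvent().
    static long nextId(Object lock) throws InterruptedException {
        synchronized (lock) {
            long current = sequence;   // read next_event_id
            Thread.sleep(5);
            sequence = current + 1;    // write next_event_id
            return sequence;
        }
    }

    // Generates 10 ids on 2 threads, alternating between the two given locks.
    public static List<Long> run(Object lockA, Object lockB) throws Exception {
        sequence = 0;
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Future<Long>> futures = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            Object lock = (i % 2 == 0) ? lockA : lockB;
            futures.add(pool.submit(() -> nextId(lock)));
        }
        List<Long> ids = new ArrayList<>();
        for (Future<Long> f : futures) {
            ids.add(f.get());
        }
        pool.shutdown();
        return ids;
    }

    public static void main(String[] args) throws Exception {
        // Two JVM-local locks: both threads can read the same value, so
        // duplicate ids are possible.
        List<Long> racy = run(new Object(), new Object());
        // One lock shared by everyone (what a db-level lock provides):
        // every id is unique.
        Object dbLock = new Object();
        List<Long> safe = run(dbLock, dbLock);
        System.out.println("distinct ids, per-JVM locks: " + racy.stream().distinct().count());
        System.out.println("distinct ids, shared lock:   " + safe.stream().distinct().count());
    }
}
```

Running {{main}} typically shows fewer than 10 distinct ids in the two-lock configuration, while the shared-lock configuration always yields 10 distinct ids.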





[jira] [Updated] (HIVE-14747) Remove JAVA paths from profiles by sending them from ptest-client

2017-06-14 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-14747:
---
Attachment: HIVE-14747.02.patch

Added comments in the second version of the patch.

> Remove JAVA paths from profiles by sending them from ptest-client
> -
>
> Key: HIVE-14747
> URL: https://issues.apache.org/jira/browse/HIVE-14747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-14747.01.patch, HIVE-14747.02.patch
>
>
> Hive ptest uses some properties files per branch that contain information 
> about how to execute the tests.
> This profile includes JAVA paths to build and execute the tests. We should 
> get rid of these by passing such information from Jenkins to the 
> ptest-server. In case a profile needs a different java version, then we can 
> create a specific Jenkins job for that.





[jira] [Updated] (HIVE-14747) Remove JAVA paths from profiles by sending them from ptest-client

2017-06-14 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-14747:
---
Status: Patch Available  (was: In Progress)

> Remove JAVA paths from profiles by sending them from ptest-client
> -
>
> Key: HIVE-14747
> URL: https://issues.apache.org/jira/browse/HIVE-14747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-14747.01.patch
>
>
> Hive ptest uses some properties files per branch that contain information 
> about how to execute the tests.
> This profile includes JAVA paths to build and execute the tests. We should 
> get rid of these by passing such information from Jenkins to the 
> ptest-server. In case a profile needs a different java version, then we can 
> create a specific Jenkins job for that.





[jira] [Updated] (HIVE-14747) Remove JAVA paths from profiles by sending them from ptest-client

2017-06-14 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-14747:
---
Attachment: HIVE-14747.01.patch

> Remove JAVA paths from profiles by sending them from ptest-client
> -
>
> Key: HIVE-14747
> URL: https://issues.apache.org/jira/browse/HIVE-14747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-14747.01.patch
>
>
> Hive ptest uses some properties files per branch that contain information 
> about how to execute the tests.
> This profile includes JAVA paths to build and execute the tests. We should 
> get rid of these by passing such information from Jenkins to the 
> ptest-server. In case a profile needs a different java version, then we can 
> create a specific Jenkins job for that.





[jira] [Work started] (HIVE-14747) Remove JAVA paths from profiles by sending them from ptest-client

2017-06-14 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-14747 started by Barna Zsombor Klara.
--
> Remove JAVA paths from profiles by sending them from ptest-client
> -
>
> Key: HIVE-14747
> URL: https://issues.apache.org/jira/browse/HIVE-14747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Barna Zsombor Klara
>
> Hive ptest uses some properties files per branch that contain information 
> about how to execute the tests.
> This profile includes JAVA paths to build and execute the tests. We should 
> get rid of these by passing such information from Jenkins to the 
> ptest-server. In case a profile needs a different java version, then we can 
> create a specific Jenkins job for that.





[jira] [Assigned] (HIVE-14747) Remove JAVA paths from profiles by sending them from ptest-client

2017-06-14 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara reassigned HIVE-14747:
--

Assignee: Barna Zsombor Klara

> Remove JAVA paths from profiles by sending them from ptest-client
> -
>
> Key: HIVE-14747
> URL: https://issues.apache.org/jira/browse/HIVE-14747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Barna Zsombor Klara
>
> Hive ptest uses some properties files per branch that contain information 
> about how to execute the tests.
> This profile includes JAVA paths to build and execute the tests. We should 
> get rid of these by passing such information from Jenkins to the 
> ptest-server. In case a profile needs a different java version, then we can 
> create a specific Jenkins job for that.





[jira] [Comment Edited] (HIVE-16865) Handle replication bootstrap of large databases

2017-06-14 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048991#comment-16048991
 ] 

anishek edited comment on HIVE-16865 at 6/14/17 9:59 AM:
-

h4. Bootstrap Replication Dump
* db metadata is fetched one db at a time
* function metadata is fetched one function at a time
* get all table names — which should not be overwhelming 
* tables are also requested one at a time; all partition definitions are 
loaded in one go for a table, and we don't expect a table with more than 5-10 
partition definition columns
* the partition objects themselves will be fetched in batches via the 
PartitionIteratable
* the only problem seems to be when writing data to _files, where we load all 
the file status objects per partition (for partitioned tables) or per table 
otherwise, in memory. This might lead to OOM cases. Decision: this is not a 
problem, as we do the same for split computation, where we have not faced 
this issue.
* we create a replCopyTask that creates the _files for all tables/partitions 
etc. during analysis time and then goes to the execution engine; this will 
lead to a lot of objects stored in memory given the above scale targets. 
*possibly* 
** move the dump into a task which manages its own thread pools to 
subsequently analyze/dump tables in the execution phase; this will blur the 
demarcation between the execution and analysis phases within hive. 
** another mode might be a lazy, incremental handoff of tasks from the 
analysis phase to the execution phase, such that both phases run 
simultaneously rather than one completing before the other starts; this 
would require significant code changes and currently seems to be needed only 
for replication.
** we might have to do the same for _incremental replication dump_ too, as 
the range between the _*from*_ and _*to*_ event ids might cover millions of 
events, all of them inserts. Though the creation of _files is handled 
differently there (we write the files along with the metadata), we should be 
able to do the same for bootstrap replication rather than creating a 
replCopyTask; this would mean replCopyTask is effectively only used during 
load time. The only problem with this approach is that, since the process is 
single threaded, we are going to dump data sequentially, which might take a 
long time, unless we add some threading in ReplicationSemanticAnalyzer to 
dump tables in parallel, since there is no dependency between tables when 
dumping them; a similar approach might be required for partitions within 
tables. 

h4. Bootstrap Replication Load
* list all the table metadata files per db. For massive databases we will, 
per the above, load on the order of a million FileStatus objects in memory. 
This is a significantly higher order of objects than is probably loaded 
during split computation, and hence might need a closer look; most probably 
move to 
{code}org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listFiles(Path f, boolean recursive){code}
* a task will be created for each type of operation; in the case of 
bootstrap, one task per table/partition/function/database, hence we will hit 
the last problem noted under _*Bootstrap Replication Dump*_ 

h4. Additional thoughts
* Since there can be multiple metastore instances, from the point of view of 
integrating with beacon for replication, would it be better to have a 
dedicated metastore instance for replication-related workload (at least for 
bootstrap)? Since the execution of tasks takes place on the metastore 
instance, customers may be better served by one metastore dedicated to 
replication and others handling normal workloads. This can be achieved, I 
think, based on how the URLs are configured on the HS2 client/orchestration 
engine of replication. 
* When calling distcp in replCopyTask, can we log the source path to dest 
path? Otherwise, if there are problems during copying, we won't know the 
actual paths.
* On the replica warehouse, replication tasks will run alongside normal 
execution of other hive tasks (assuming there are multiple dbs on the 
replica). How do we constrain resource allocation for replication vs normal 
tasks? How do we ensure replication doesn't lag behind significantly? 



was (Author: anishek):
h4. Bootstrap Replication Dump
* db metadata one at a time
* function metadata one at a time
* get all tableNames — which should not be overwhelming 
* tables is also requested one a time. — all partition definitions are loaded 
in one go for a table and we dont expect a table with more than 5-10 partition 
definition columns
* the partition object  themselves will be done in batches via the 
PartitionIteratable
* only problem seems to be when writing data to _files where in we load all the 
file status objects per partition ( for partitioned tables) and per table 
otherwise , in memory. this might lead  OOM cases :: decision : this is not a 

[jira] [Commented] (HIVE-16865) Handle replication bootstrap of large databases

2017-06-14 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048991#comment-16048991
 ] 

anishek commented on HIVE-16865:


h4. Bootstrap Replication Dump
* db metadata is fetched one db at a time
* function metadata is fetched one function at a time
* get all table names — which should not be overwhelming 
* tables are also requested one at a time; all partition definitions are 
loaded in one go for a table, and we don't expect a table with more than 5-10 
partition definition columns
* the partition objects themselves will be fetched in batches via the 
PartitionIteratable
* the only problem seems to be when writing data to _files, where we load all 
the file status objects per partition (for partitioned tables) or per table 
otherwise, in memory. This might lead to OOM cases. Decision: this is not a 
problem, as we do the same for split computation, where we have not faced 
this issue.
* we create a replCopyTask that creates the _files for all tables/partitions 
etc. during analysis time and then goes to the execution engine; this will 
lead to a lot of objects stored in memory given the above scale targets. 
*possibly* 
** move the dump into a task which manages its own thread pools to 
subsequently analyze/dump tables in the execution phase; this will blur the 
demarcation between the execution and analysis phases within hive. 
** another mode might be a lazy, incremental handoff of tasks from the 
analysis phase to the execution phase, such that both phases run 
simultaneously rather than one completing before the other starts; this 
would require significant code changes and currently seems to be needed only 
for replication.
** we might have to do the same for _incremental replication dump_ too, as 
the range between the _*from*_ and _*to*_ event ids might cover millions of 
events, all of them inserts. Though the creation of _files is handled 
differently there (we write the files along with the metadata), we should be 
able to do the same for bootstrap replication rather than creating a 
replCopyTask; this would mean replCopyTask is effectively only used during 
load time. The only problem with this approach is that, since the process is 
single threaded, we are going to dump data sequentially, which might take a 
long time, unless we add some threading in ReplicationSemanticAnalyzer to 
dump tables in parallel, since there is no dependency between tables when 
dumping them; a similar approach might be required for partitions within 
tables. 

h4. Bootstrap Replication Load
* list all the table metadata files per db. For massive databases we will, 
per the above, load on the order of a million FileStatus objects in memory. 
This is a significantly higher order of objects than is probably loaded 
during split computation, and hence might need a closer look; most probably 
move to 
{code}org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listFiles(Path f, boolean recursive){code}
* a task will be created for each type of operation; in the case of 
bootstrap, one task per table/partition/function/database, hence we will hit 
the last problem noted under _*Bootstrap Replication Dump*_ 

h4. Additional thoughts
* Since there can be multiple metastore instances, from the point of view of 
integrating with beacon for replication, would it be better to have a 
dedicated metastore instance for replication-related workload (at least for 
bootstrap)? Since the execution of tasks takes place on the metastore 
instance, customers may be better served by one metastore dedicated to 
replication and others handling normal workloads. This can be achieved, I 
think, based on how the URLs are configured on the HS2/beacon side. 
* When calling distcp in replCopyTask, can we log the source path to dest 
path? Otherwise, if there are problems during copying, we won't know the 
actual paths.
* On the replica warehouse, replication tasks will run alongside normal 
execution of other hive tasks (assuming there are multiple dbs on the 
replica). How do we constrain resource allocation for replication vs normal 
tasks? How do we ensure replication doesn't lag behind significantly? 
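For the load-side listing concern above, Hadoop's {{FileSystem#listFiles(Path, boolean)}} returns a {{RemoteIterator}}, so entries are fetched incrementally instead of materializing a million FileStatus objects at once. The same lazy-iteration pattern can be sketched with plain {{java.nio}} (illustrative stand-in only, not the Hadoop API and not Hive code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Illustrative stand-in for Hadoop's RemoteIterator-based listing: entries
// are consumed lazily as the stream advances, instead of building a List of
// a million FileStatus objects up front.
public class LazyListing {

    // Counts regular files under 'root' without materializing the listing.
    public static long countFiles(Path root) throws IOException {
        try (Stream<Path> entries = Files.walk(root)) { // lazy traversal
            return entries.filter(Files::isRegularFile).count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("repl-metadata");
        for (int i = 0; i < 5; i++) {
            Files.createFile(dir.resolve("table-" + i + ".json"));
        }
        System.out.println(countFiles(dir)); // prints 5
    }
}
```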


> Handle replication bootstrap of large databases
> ---
>
> Key: HIVE-16865
> URL: https://issues.apache.org/jira/browse/HIVE-16865
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> for larger databases make sure that we can handle replication bootstrap.
> * Assuming large database can have close to million tables or a few tables 
> with few hundred thousand partitions. 
> *  for function replication if a primary warehouse has large number of custom 
> functions defined such that the same binary file in 

[jira] [Updated] (HIVE-16900) optimization to give distcp a list of input files to copy to a destination target directory during repl load

2017-06-14 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16900:
---
Description: 
During repl copy we currently only allow per-file operations, as against the 
list of files supported by distcp. During bootstrap table/partition load it 
would be great to load all files listed in {noformat}_files{noformat} in a 
single distcp job to make it more efficient; this would require changes to 
the _shims_ sub-project in hive to additionally expose APIs which take 
multiple source files.


  was:
During repl Copy currently we only allow operations per file as against list of 
files supported by distcp, During bootstrap table/partitions load it will be 
great to load all files listed in _files in a single distcp job to make it more 
efficient, this would require changes to the _shims_ sub project in hive to 
additionally expose api's which take multiple source files.



> optimization to give distcp a list of input files to copy to a destination 
> target directory during repl load
> 
>
> Key: HIVE-16900
> URL: https://issues.apache.org/jira/browse/HIVE-16900
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> During repl copy we currently only allow per-file operations, as against 
> the list of files supported by distcp. During bootstrap table/partition 
> load it would be great to load all files listed in 
> {noformat}_files{noformat} in a single distcp job to make it more 
> efficient; this would require changes to the _shims_ sub-project in hive to 
> additionally expose APIs which take multiple source files.
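Since distcp accepts several source paths ahead of a single target directory, the batching described above amounts to assembling one invocation from the whole _files manifest instead of one invocation per file. The helper below is a hypothetical sketch of that assembly; the class and method names are illustrative, not Hive's actual shim API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Hive's shim API: distcp takes several source
// paths followed by one target directory, so a single invocation can cover
// a whole _files manifest.
public class DistcpArgs {

    public static List<String> buildCommand(List<String> sources, String targetDir) {
        List<String> cmd = new ArrayList<>(Arrays.asList("hadoop", "distcp"));
        cmd.addAll(sources);  // every file listed in the _files manifest
        cmd.add(targetDir);   // single destination directory
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand(
                Arrays.asList("hdfs://primary/wh/t/f1", "hdfs://primary/wh/t/f2"),
                "hdfs://replica/wh/t");
        System.out.println(String.join(" ", cmd));
        // prints: hadoop distcp hdfs://primary/wh/t/f1 hdfs://primary/wh/t/f2 hdfs://replica/wh/t
    }
}
```

The shim-level change would then forward such a source list to one distcp job rather than launching a job per file.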





[jira] [Updated] (HIVE-16900) optimization to give distcp a list of input files to copy to a destination target directory during repl load

2017-06-14 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16900:
---
Summary: optimization to give distcp a list of input files to copy to a 
destination target directory during repl load  (was: optimization to give 
distcp a list of input files to copy to a destination target directory)

> optimization to give distcp a list of input files to copy to a destination 
> target directory during repl load
> 
>
> Key: HIVE-16900
> URL: https://issues.apache.org/jira/browse/HIVE-16900
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> Repl copy currently supports only one file per operation, whereas distcp 
> supports a list of input files. During bootstrap table/partition load it would 
> be far more efficient to copy all files listed in _files in a single distcp 
> job; this requires changes to the _shims_ sub-project in Hive to additionally 
> expose APIs that take multiple source files.





[jira] [Assigned] (HIVE-16900) optimization to give distcp a list of input files to copy to a destination target directory

2017-06-14 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek reassigned HIVE-16900:
--


> optimization to give distcp a list of input files to copy to a destination 
> target directory
> ---
>
> Key: HIVE-16900
> URL: https://issues.apache.org/jira/browse/HIVE-16900
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> Repl copy currently supports only one file per operation, whereas distcp 
> supports a list of input files. During bootstrap table/partition load it would 
> be far more efficient to copy all files listed in _files in a single distcp 
> job; this requires changes to the _shims_ sub-project in Hive to additionally 
> expose APIs that take multiple source files.





[jira] [Updated] (HIVE-16895) Multi-threaded execution of bootstrap dump of partitions / functions

2017-06-14 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16895:
---
Issue Type: Sub-task  (was: Improvement)
Parent: HIVE-16865

>  Multi-threaded execution of bootstrap dump of partitions / functions
> -
>
> Key: HIVE-16895
> URL: https://issues.apache.org/jira/browse/HIVE-16895
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> To allow faster execution of the bootstrap dump phase, we dump multiple 
> partitions from the same table simultaneously.
> Even though dumping functions is not going to be a blocker, moving to 
> similar execution modes for all metastore objects will make the code more 
> coherent.
> Bootstrap dump at db level does:
> * bootstrap of all tables
> ** bootstrap of all partitions in a table (scope of current jira)
> * bootstrap of all functions (scope of current jira)
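The parallel dump described above can be sketched with a plain JDK thread pool. This is a hedged illustration only: `dumpPartition` and the partition names are hypothetical stand-ins, not Hive's actual metastore API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPartitionDump {
    // Hypothetical stand-in for dumping one partition's metadata and data.
    static String dumpPartition(String partition) {
        return "dumped:" + partition;
    }

    public static void main(String[] args) throws Exception {
        List<String> partitions = List.of("p=1", "p=2", "p=3", "p=4");
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<String>> futures = new ArrayList<>();
        // Partitions have no mutual dependency, so each dump is an independent task.
        for (String p : partitions) {
            futures.add(pool.submit(() -> dumpPartition(p)));
        }
        // Future.get() rethrows any failure from an individual partition dump.
        for (Future<String> f : futures) {
            System.out.println(f.get());
        }
        pool.shutdown();
    }
}
```

The same pattern applies to functions: since there is no ordering requirement between metastore objects of the same kind, each becomes one submitted task.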





[jira] [Updated] (HIVE-16897) repl load does not lead to excessive memory consumption for multiple functions from same binary jar

2017-06-14 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16897:
---
Issue Type: Sub-task  (was: Improvement)
Parent: HIVE-16865

> repl load does not lead to excessive memory consumption for multiple 
> functions from same binary  jar
> 
>
> Key: HIVE-16897
> URL: https://issues.apache.org/jira/browse/HIVE-16897
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> As part of function replication we currently keep a separate copy of the 
> binary jar associated with each function. (This should be the same on the 
> primary warehouse as well, since each HDFS jar location given during function 
> creation downloads the resource to a separate resource location, leading to 
> the same jar being included in the classpath multiple times.)
> This leads to excessive space being used to keep all the jars on the 
> classpath. Solve this by identifying the common binary jar (using the 
> checksum from the primary on the replica) and not creating multiple copies, 
> thus preventing excessive memory usage.
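The checksum-based dedup amounts to keeping one local copy per distinct jar content. A minimal sketch with pure JDK (the `resolve` method, the paths, and the byte-array input are hypothetical; Hive would checksum files on HDFS instead):

```java
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

public class JarDedup {
    // Maps jar content checksum -> the one local copy kept for that content.
    private final Map<String, String> checksumToLocalPath = new HashMap<>();

    static String sha256(byte[] jarBytes) throws Exception {
        byte[] d = MessageDigest.getInstance("SHA-256").digest(jarBytes);
        StringBuilder sb = new StringBuilder();
        for (byte b : d) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    // If a jar with the same checksum was already downloaded, reuse that copy;
    // otherwise record the candidate path as the canonical copy.
    String resolve(byte[] jarBytes, String candidateLocalPath) throws Exception {
        return checksumToLocalPath.computeIfAbsent(sha256(jarBytes), k -> candidateLocalPath);
    }
}
```

Two functions created from the same binary jar then resolve to a single classpath entry instead of two.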





[jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task

2017-06-14 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16896:
---
Issue Type: Sub-task  (was: Improvement)
Parent: HIVE-16865

> move replication load related work in semantic analysis phase to execution 
> phase using a task
> -
>
> Key: HIVE-16896
> URL: https://issues.apache.org/jira/browse/HIVE-16896
> Project: Hive
>  Issue Type: Sub-task
>Reporter: anishek
>Assignee: anishek
>
> We want to avoid creating too many tasks in memory in the analysis phase 
> while loading data. Currently we load all the files in the bootstrap dump 
> location as {{FileStatus[]}} and then iterate over them to load objects; we 
> should instead move to
> {code}
> org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listFiles(Path f, boolean recursive)
> {code}
> which internally batches and returns values.
> Additionally, since we can't hand off partial tasks from the analysis phase 
> to the execution phase, we are going to move the whole repl load 
> functionality to the execution phase so we can better control 
> creation/execution of tasks (not related to Hive {{Task}}; we may get rid of 
> ReplCopyTask).
> An additional consideration at the end of this jira is whether we want to do 
> a specifically multi-threaded load of the bootstrap dump.
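The memory difference described above is between materializing the whole listing up front versus consuming a batching iterator one entry at a time. A simplified sketch using a plain `java.util.Iterator` as a stand-in for Hadoop's `RemoteIterator<LocatedFileStatus>` (the method names here are illustrative, not Hive's):

```java
import java.util.Iterator;
import java.util.List;

public class BatchedLoad {
    // Eager: snapshots every entry up front, as loading FileStatus[] does today.
    // Memory footprint is proportional to the total number of files.
    static int countEager(List<String> allFiles) {
        String[] snapshot = allFiles.toArray(new String[0]);
        return snapshot.length;
    }

    // Streaming: consumes one entry at a time, as RemoteIterator returned by
    // FileSystem.listFiles(path, recursive) would; only one entry (plus the
    // iterator's internal batch) is held at any moment.
    static int countStreaming(Iterator<String> files) {
        int n = 0;
        while (files.hasNext()) {
            files.next();
            n++;
        }
        return n;
    }
}
```

Both walks see the same entries; only the peak memory differs, which is what matters for very large bootstrap dumps.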





[jira] [Updated] (HIVE-16893) move replication dump related work in semantic analysis phase to execution phase using a task

2017-06-14 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16893:
---
Issue Type: Sub-task  (was: Improvement)
Parent: HIVE-16865

> move replication dump related work in semantic analysis phase to execution 
> phase using a task
> -
>
> Key: HIVE-16893
> URL: https://issues.apache.org/jira/browse/HIVE-16893
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> Since we run into the possibility of creating a large number of tasks during 
> replication bootstrap dump:
> * we may not be able to hold all of them in memory for really large 
> databases (which might no longer hold true once we complete HIVE-16892);
> * also, a compile-time lock is taken such that only one query runs in this 
> phase, which in the replication bootstrap scenario is going to be a very 
> long-running task; moving the work to the execution phase will limit the 
> lock period in the compile phase.





[jira] [Updated] (HIVE-16894) Multi-threaded execution of bootstrap dump of tables.

2017-06-14 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16894:
---
Issue Type: Sub-task  (was: Improvement)
Parent: HIVE-16865

> Multi-threaded execution of bootstrap dump of tables. 
> --
>
> Key: HIVE-16894
> URL: https://issues.apache.org/jira/browse/HIVE-16894
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> After completing HIVE-16893 the bootstrap process will dump a single table 
> at a time and hence will be very time consuming while not optimally 
> utilizing the available resources. Since there is no dependency between 
> dumps of the various tables, we should be able to do this in parallel.
> Bootstrap dump at db level does:
> * bootstrap of all tables (scope of current jira)
> ** bootstrap of all partitions in a table
> * bootstrap of all functions





[jira] [Updated] (HIVE-16892) Move creation of _files from ReplCopyTask to analysis phase for boostrap replication

2017-06-14 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16892:
---
Issue Type: Sub-task  (was: Improvement)
Parent: HIVE-16865

> Move creation of _files from ReplCopyTask to analysis phase for boostrap 
> replication 
> -
>
> Key: HIVE-16892
> URL: https://issues.apache.org/jira/browse/HIVE-16892
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> During replication bootstrap we create the _files via ReplCopyTask for 
> partitions and tables; this can be done inline as part of the analysis phase 
> rather than creating the ReplCopyTask.
> This is done to prevent creation of a huge number of these tasks in memory 
> before handing them to the execution engine.





[jira] [Assigned] (HIVE-16898) Validation of file after distcp in repl load

2017-06-14 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek reassigned HIVE-16898:
--


> Validation of file after distcp in repl load 
> -
>
> Key: HIVE-16898
> URL: https://issues.apache.org/jira/browse/HIVE-16898
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> The source file can change between the time the source and destination paths 
> for distcp are decided and the time distcp is invoked, so distcp might copy 
> the wrong file to the destination. We should therefore add a check on the 
> checksum of the source file after distcp finishes, to make sure the file did 
> not change during the copy. If it did, delete the previously copied file on 
> the destination, copy the new source file, and repeat until the correct file 
> has been copied.
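The validate-and-retry loop described above can be sketched as follows. This is an illustrative sketch only: `readSource` is a hypothetical stand-in for both the copy step and for re-reading the possibly-changing source file, whereas real repl load would compare HDFS file checksums around a distcp job.

```java
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.function.Supplier;

public class CopyValidator {
    static byte[] digest(byte[] data) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(data);
    }

    // Copies the source, then re-checks the source checksum; if the source
    // changed mid-copy, discard the copy and try again, up to maxAttempts.
    static byte[] copyWithValidation(Supplier<byte[]> readSource, int maxAttempts) throws Exception {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            byte[] copied = readSource.get();          // the "distcp" copy
            byte[] afterCopy = digest(readSource.get()); // source checksum after copy
            if (Arrays.equals(digest(copied), afterCopy)) {
                return copied;                         // source did not change mid-copy
            }
            // else: delete the stale destination copy and retry
        }
        throw new IllegalStateException("source kept changing during copy");
    }
}
```

With a stable source the first attempt validates and returns; an unstable source triggers the delete-and-recopy path until the checksums agree.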





[jira] [Updated] (HIVE-16898) Validation of source file after distcp in repl load

2017-06-14 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16898:
---
Summary: Validation of source file after distcp in repl load   (was: 
Validation of file after distcp in repl load )

> Validation of source file after distcp in repl load 
> 
>
> Key: HIVE-16898
> URL: https://issues.apache.org/jira/browse/HIVE-16898
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> The source file can change between the time the source and destination paths 
> for distcp are decided and the time distcp is invoked, so distcp might copy 
> the wrong file to the destination. We should therefore add a check on the 
> checksum of the source file after distcp finishes, to make sure the file did 
> not change during the copy. If it did, delete the previously copied file on 
> the destination, copy the new source file, and repeat until the correct file 
> has been copied.





[jira] [Assigned] (HIVE-16897) repl load does not lead to excessive memory consumption for multiple functions from same binary jar

2017-06-14 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek reassigned HIVE-16897:
--


> repl load does not lead to excessive memory consumption for multiple 
> functions from same binary  jar
> 
>
> Key: HIVE-16897
> URL: https://issues.apache.org/jira/browse/HIVE-16897
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> As part of function replication we currently keep a separate copy of the 
> binary jar associated with each function. (This should be the same on the 
> primary warehouse as well, since each HDFS jar location given during function 
> creation downloads the resource to a separate resource location, leading to 
> the same jar being included in the classpath multiple times.)
> This leads to excessive space being used to keep all the jars on the 
> classpath. Solve this by identifying the common binary jar (using the 
> checksum from the primary on the replica) and not creating multiple copies, 
> thus preventing excessive memory usage.





[jira] [Assigned] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task

2017-06-14 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek reassigned HIVE-16896:
--


> move replication load related work in semantic analysis phase to execution 
> phase using a task
> -
>
> Key: HIVE-16896
> URL: https://issues.apache.org/jira/browse/HIVE-16896
> Project: Hive
>  Issue Type: Improvement
>Reporter: anishek
>Assignee: anishek
>
> We want to avoid creating too many tasks in memory in the analysis phase 
> while loading data. Currently we load all the files in the bootstrap dump 
> location as {{FileStatus[]}} and then iterate over them to load objects; we 
> should instead move to
> {code}
> org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listFiles(Path f, boolean recursive)
> {code}
> which internally batches and returns values.
> Additionally, since we can't hand off partial tasks from the analysis phase 
> to the execution phase, we are going to move the whole repl load 
> functionality to the execution phase so we can better control 
> creation/execution of tasks (not related to Hive {{Task}}; we may get rid of 
> ReplCopyTask).
> An additional consideration at the end of this jira is whether we want to do 
> a specifically multi-threaded load of the bootstrap dump.





[jira] [Commented] (HIVE-16876) RpcServer should be re-created when Rpc configs change

2017-06-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048890#comment-16048890
 ] 

Hive QA commented on HIVE-16876:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12872937/HIVE-16876.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10831 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=232)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5641/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5641/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5641/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12872937 - PreCommit-HIVE-Build

> RpcServer should be re-created when Rpc configs change
> --
>
> Key: HIVE-16876
> URL: https://issues.apache.org/jira/browse/HIVE-16876
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-16876.1.patch
>
>






[jira] [Updated] (HIVE-16893) move replication dump related work in semantic analysis phase to execution phase using a task

2017-06-14 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16893:
---
Summary: move replication dump related work in semantic analysis phase to 
execution phase using a task  (was: move replication related work in semantic 
analysis phase to execution phase using a task)

> move replication dump related work in semantic analysis phase to execution 
> phase using a task
> -
>
> Key: HIVE-16893
> URL: https://issues.apache.org/jira/browse/HIVE-16893
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> Since we run into the possibility of creating a large number of tasks during 
> replication bootstrap dump:
> * we may not be able to hold all of them in memory for really large 
> databases (which might no longer hold true once we complete HIVE-16892);
> * also, a compile-time lock is taken such that only one query runs in this 
> phase, which in the replication bootstrap scenario is going to be a very 
> long-running task; moving the work to the execution phase will limit the 
> lock period in the compile phase.





[jira] [Updated] (HIVE-16894) Multi-threaded execution of bootstrap dump of tables.

2017-06-14 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16894:
---
Description: 
After completing HIVE-16893 the bootstrap process will dump a single table at a 
time and hence will be very time consuming while not optimally utilizing the 
available resources. Since there is no dependency between dumps of the various 
tables, we should be able to do this in parallel.

Bootstrap dump at db level does:
* bootstrap of all tables (scope of current jira)
** bootstrap of all partitions in a table
* bootstrap of all functions


  was:
after completing HIVE-16893 the bootstrap process will dump single table at a 
time and hence will be very time consuming while not optimally utilizing the 
available resources. 

Bootstrap dump at db level does :
* boostrap of all tables (scope of current jira) 
** boostrap of all partitions in a table. 
* boostrap of all functions



> Multi-threaded execution of bootstrap dump of tables. 
> --
>
> Key: HIVE-16894
> URL: https://issues.apache.org/jira/browse/HIVE-16894
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> After completing HIVE-16893 the bootstrap process will dump a single table 
> at a time and hence will be very time consuming while not optimally 
> utilizing the available resources. Since there is no dependency between 
> dumps of the various tables, we should be able to do this in parallel.
> Bootstrap dump at db level does:
> * bootstrap of all tables (scope of current jira)
> ** bootstrap of all partitions in a table
> * bootstrap of all functions





[jira] [Assigned] (HIVE-16895) Multi-threaded execution of bootstrap dump of partitions / functions

2017-06-14 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek reassigned HIVE-16895:
--


>  Multi-threaded execution of bootstrap dump of partitions / functions
> -
>
> Key: HIVE-16895
> URL: https://issues.apache.org/jira/browse/HIVE-16895
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> To allow faster execution of the bootstrap dump phase, we dump multiple 
> partitions from the same table simultaneously.
> Even though dumping functions is not going to be a blocker, moving to 
> similar execution modes for all metastore objects will make the code more 
> coherent.
> Bootstrap dump at db level does:
> * bootstrap of all tables
> ** bootstrap of all partitions in a table (scope of current jira)
> * bootstrap of all functions (scope of current jira)





[jira] [Updated] (HIVE-16600) Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel order by in multi_insert cases

2017-06-14 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-16600:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks [~kellyzly] for the contribution!

> Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel 
> order by in multi_insert cases
> 
>
> Key: HIVE-16600
> URL: https://issues.apache.org/jira/browse/HIVE-16600
> Project: Hive
>  Issue Type: Sub-task
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Fix For: 3.0.0
>
> Attachments: HIVE-16600.10.patch, HIVE-16600.11.patch, 
> HIVE-16600.12.patch, HIVE-16600.13.patch, HIVE-16600.1.patch, 
> HIVE-16600.2.patch, HIVE-16600.3.patch, HIVE-16600.4.patch, 
> HIVE-16600.5.patch, HIVE-16600.6.patch, HIVE-16600.7.patch, 
> HIVE-16600.8.patch, HIVE-16600.9.patch, mr.explain, 
> mr.explain.log.HIVE-16600, Node.java, 
> TestSetSparkReduceParallelism_MultiInsertCase.java
>
>
> multi_insert_gby.case.q
> {code}
> set hive.exec.reducers.bytes.per.reducer=256;
> set hive.optimize.sampling.orderby=true;
> drop table if exists e1;
> drop table if exists e2;
> create table e1 (key string, value string);
> create table e2 (key string);
> FROM (select key, cast(key as double) as keyD, value from src order by key) a
> INSERT OVERWRITE TABLE e1
> SELECT key, value
> INSERT OVERWRITE TABLE e2
> SELECT key;
> select * from e1;
> select * from e2;
> {code} 
> the parallelism of Sort is 1 even when we enable parallel order by 
> ("hive.optimize.sampling.orderby" is set to "true"). This is not 
> reasonable because the parallelism should be calculated by 
> [Utilities.estimateReducers|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L170]
> this is because SetSparkReducerParallelism#needSetParallelism returns false 
> when [children size of 
> RS|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207]
>  is greater than 1.
> in this case, the children size of {{RS[2]}} is two.
> the logical plan of the case
> {code}
> TS[0]-SEL[1]-RS[2]-SEL[3]-SEL[4]-FS[5]
>                   -SEL[6]-FS[7]
> {code}
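`Utilities.estimateReducers`, referenced above, essentially divides the input size by `hive.exec.reducers.bytes.per.reducer` and clamps the result. A simplified sketch (the real method also applies rounding heuristics, so this is an approximation, not the exact Hive implementation):

```java
public class ReducerEstimate {
    // Simplified estimateReducers: ceil(totalBytes / bytesPerReducer),
    // clamped to the range [1, maxReducers].
    static int estimateReducers(long totalBytes, long bytesPerReducer, int maxReducers) {
        long reducers = (totalBytes + bytesPerReducer - 1) / bytesPerReducer; // ceiling division
        return (int) Math.max(1, Math.min(maxReducers, reducers));
    }
}
```

With `hive.exec.reducers.bytes.per.reducer=256` and a kilobyte of input, the sort would get 4 reducers instead of the single reducer the `needSetParallelism` bug forces.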





[jira] [Assigned] (HIVE-16894) Multi-threaded execution of bootstrap dump of tables.

2017-06-14 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek reassigned HIVE-16894:
--


> Multi-threaded execution of bootstrap dump of tables. 
> --
>
> Key: HIVE-16894
> URL: https://issues.apache.org/jira/browse/HIVE-16894
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> After completing HIVE-16893 the bootstrap process will dump a single table 
> at a time and hence will be very time consuming while not optimally 
> utilizing the available resources.
> Bootstrap dump at db level does:
> * bootstrap of all tables (scope of current jira)
> ** bootstrap of all partitions in a table
> * bootstrap of all functions





[jira] [Assigned] (HIVE-16893) move replication related work in semantic analysis phase to execution phase using a task

2017-06-14 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek reassigned HIVE-16893:
--


> move replication related work in semantic analysis phase to execution phase 
> using a task
> 
>
> Key: HIVE-16893
> URL: https://issues.apache.org/jira/browse/HIVE-16893
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> Since we run into the possibility of creating a large number of tasks during 
> replication bootstrap dump:
> * we may not be able to hold all of them in memory for really large 
> databases (which might no longer hold true once we complete HIVE-16892);
> * also, a compile-time lock is taken such that only one query runs in this 
> phase, which in the replication bootstrap scenario is going to be a very 
> long-running task; moving the work to the execution phase will limit the 
> lock period in the compile phase.





[jira] [Assigned] (HIVE-16892) Move creation of _files from ReplCopyTask to analysis phase for boostrap replication

2017-06-14 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek reassigned HIVE-16892:
--


> Move creation of _files from ReplCopyTask to analysis phase for boostrap 
> replication 
> -
>
> Key: HIVE-16892
> URL: https://issues.apache.org/jira/browse/HIVE-16892
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 3.0.0
>
>
> During replication bootstrap we create the _files via ReplCopyTask for 
> partitions and tables; this can be done inline as part of the analysis phase 
> rather than creating the ReplCopyTask.
> This is done to prevent creation of a huge number of these tasks in memory 
> before handing them to the execution engine.





[jira] [Commented] (HIVE-16600) Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel order by in multi_insert cases

2017-06-14 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048850#comment-16048850
 ] 

liyunzhang_intel commented on HIVE-16600:
-

[~lirui]: really thanks for review.

> Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel 
> order by in multi_insert cases
> 
>
> Key: HIVE-16600
> URL: https://issues.apache.org/jira/browse/HIVE-16600
> Project: Hive
>  Issue Type: Sub-task
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16600.10.patch, HIVE-16600.11.patch, 
> HIVE-16600.12.patch, HIVE-16600.13.patch, HIVE-16600.1.patch, 
> HIVE-16600.2.patch, HIVE-16600.3.patch, HIVE-16600.4.patch, 
> HIVE-16600.5.patch, HIVE-16600.6.patch, HIVE-16600.7.patch, 
> HIVE-16600.8.patch, HIVE-16600.9.patch, mr.explain, 
> mr.explain.log.HIVE-16600, Node.java, 
> TestSetSparkReduceParallelism_MultiInsertCase.java
>
>
> multi_insert_gby.case.q
> {code}
> set hive.exec.reducers.bytes.per.reducer=256;
> set hive.optimize.sampling.orderby=true;
> drop table if exists e1;
> drop table if exists e2;
> create table e1 (key string, value string);
> create table e2 (key string);
> FROM (select key, cast(key as double) as keyD, value from src order by key) a
> INSERT OVERWRITE TABLE e1
> SELECT key, value
> INSERT OVERWRITE TABLE e2
> SELECT key;
> select * from e1;
> select * from e2;
> {code} 
> the parallelism of Sort is 1 even when we enable parallel order by 
> ("hive.optimize.sampling.orderby" is set to "true"). This is not 
> reasonable because the parallelism should be calculated by 
> [Utilities.estimateReducers|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L170]
> this is because SetSparkReducerParallelism#needSetParallelism returns false 
> when [children size of 
> RS|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207]
>  is greater than 1.
> in this case, the children size of {{RS[2]}} is two.
> the logical plan of the case
> {code}
> TS[0]-SEL[1]-RS[2]-SEL[3]-SEL[4]-FS[5]
>                   -SEL[6]-FS[7]
> {code}





[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-06-14 Thread Alexander Kolbasov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048842#comment-16048842
 ] 

Alexander Kolbasov commented on HIVE-16886:
---

A reasonable way to make it unique is to:

* declare a unique constraint on the event ID
* insert each new value as a separate row

It would be good to have some transaction retry logic as well, but that may be 
a bigger change.
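The suggested retry logic amounts to: read the last ID, attempt an insert that a uniqueness check can reject, and on a duplicate-key failure refresh and retry. A hedged in-memory sketch (a `Set` stands in for the DB column with the unique constraint; the class and method names are illustrative, not HMS code):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class EventIdAllocator {
    // Stand-in for a DB table whose event_id column has a UNIQUE constraint.
    private final Set<Long> table = ConcurrentHashMap.newKeySet();
    private final AtomicLong lastSeen = new AtomicLong(0);

    // Allocate the next event ID. If another writer claimed the candidate
    // first, the "insert" fails the uniqueness check and we refresh and retry
    // instead of persisting a duplicate ID.
    long allocate(int maxRetries) {
        for (int i = 0; i < maxRetries; i++) {
            long candidate = lastSeen.get() + 1;   // like SELECT max(event_id) + 1
            if (table.add(candidate)) {            // like INSERT; false == constraint violation
                lastSeen.set(candidate);
                return candidate;
            }
            lastSeen.set(candidate);               // refresh our view, then retry
        }
        throw new IllegalStateException("too many ID conflicts");
    }
}
```

Under the current scheme two HMS instances reading the same ID both "succeed" and persist duplicates; with the constraint, the second insert fails and retries with a fresh ID.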

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, and then persisted back to the datastore. If two servers 
> read the same ID, both write a new notification with the same ID.
> The event ID is neither unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
> final int NUM_THREADS = 2;
> CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
> CountDownLatch countOut = new CountDownLatch(1);
> HiveConf conf = new HiveConf();
> conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
> ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
> FutureTask<Void>[] tasks = new FutureTask[NUM_THREADS];
> for (int i = 0; i < NUM_THREADS; ++i) {
>   final int n = i;
>   tasks[i] = new FutureTask<Void>(new Callable<Void>() {
> @Override
> public Void call() throws Exception {
>   ObjectStore store = new ObjectStore();
>   store.setConf(conf);
>   NotificationEvent dbEvent =
>   new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>   System.out.println("ADDING NOTIFICATION");
>   countIn.countDown();
>   countOut.await();
>   store.addNotificationEvent(dbEvent);
>   System.out.println("FINISH NOTIFICATION");
>   return null;
> }
>   });
>   executorService.execute(tasks[i]);
> }
> countIn.await();
> countOut.countDown();
> for (int i = 0; i < NUM_THREADS; ++i) {
>   tasks[i].get();
> }
> NotificationEventResponse eventResponse = 
> objectStore.getNextNotification(new NotificationEventRequest());
> Assert.assertEquals(2, eventResponse.getEventsSize());
> Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
> // This fails because the next notification has an event ID = 1
> Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
>   }
> {noformat}
> The last assertion fails expecting an event ID 1 instead of 2. 
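The lost update behind that failing assertion does not need a database to reproduce. A deterministic sketch (assumed names, not Hive code) of the interleaving, where two simulated HMS instances both read the sequence before either writes it back:

```java
public class DuplicateIdDemo {
    public static void main(String[] args) {
        // Stands in for the single NOTIFICATION_SEQUENCE row both servers share.
        long[] nextEventId = {0};

        long readByA = nextEventId[0];       // server A: SELECT -> 0
        long readByB = nextEventId[0];       // server B: SELECT -> 0, before A writes
        long idA = readByA + 1;              // server A increments locally
        long idB = readByB + 1;              // server B increments locally
        nextEventId[0] = idA;                // server A: UPDATE -> 1
        nextEventId[0] = idB;                // server B: UPDATE -> 1 (lost update)

        System.out.println(idA + " " + idB); // both notifications get event ID 1
    }
}
```

No in-process lock can exclude this interleaving, because the two reads happen in separate JVMs; only the datastore sees both writers.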





[jira] [Commented] (HIVE-16600) Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel order by in multi_insert cases

2017-06-14 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048841#comment-16048841
 ] 

Rui Li commented on HIVE-16600:
---

+1

> Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel 
> order by in multi_insert cases
> 
>
> Key: HIVE-16600
> URL: https://issues.apache.org/jira/browse/HIVE-16600
> Project: Hive
>  Issue Type: Sub-task
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16600.10.patch, HIVE-16600.11.patch, 
> HIVE-16600.12.patch, HIVE-16600.13.patch, HIVE-16600.1.patch, 
> HIVE-16600.2.patch, HIVE-16600.3.patch, HIVE-16600.4.patch, 
> HIVE-16600.5.patch, HIVE-16600.6.patch, HIVE-16600.7.patch, 
> HIVE-16600.8.patch, HIVE-16600.9.patch, mr.explain, 
> mr.explain.log.HIVE-16600, Node.java, 
> TestSetSparkReduceParallelism_MultiInsertCase.java
>
>
> multi_insert_gby.case.q
> {code}
> set hive.exec.reducers.bytes.per.reducer=256;
> set hive.optimize.sampling.orderby=true;
> drop table if exists e1;
> drop table if exists e2;
> create table e1 (key string, value string);
> create table e2 (key string);
> FROM (select key, cast(key as double) as keyD, value from src order by key) a
> INSERT OVERWRITE TABLE e1
> SELECT key, value
> INSERT OVERWRITE TABLE e2
> SELECT key;
> select * from e1;
> select * from e2;
> {code} 
> The parallelism of Sort is 1 even when parallel order by is enabled 
> ("hive.optimize.sampling.orderby" set to "true"). This is not 
> reasonable, because the parallelism should be calculated by 
> [Utilities.estimateReducers|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L170].
> This is because SetSparkReducerParallelism#needSetParallelism returns false 
> when [children size of 
> RS|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207]
>  is greater than 1.
> In this case, {{RS[2]}} has two children.
> The logical plan of the case is:
> {code}
> TS[0]-SEL[1]-RS[2]-SEL[3]-SEL[4]-FS[5]
>                   -SEL[6]-FS[7]
> {code}
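For reference, the estimate the reporter expects is roughly ceil(totalInputFileSize / bytesPerReducer), at least 1, clamped to the configured maximum. A simplified model of that calculation (the real Utilities.estimateReducers takes additional parameters; the 5000-byte input and max of 1009 below are assumed for illustration):

```java
public class EstimateReducersSketch {
    // Simplified model: ceil(totalInputFileSize / bytesPerReducer),
    // floored at 1 and clamped to maxReducers.
    static int estimateReducers(long totalInputFileSize, long bytesPerReducer,
                                int maxReducers) {
        int reducers = (int) ((totalInputFileSize + bytesPerReducer - 1) / bytesPerReducer);
        return Math.min(maxReducers, Math.max(1, reducers));
    }

    public static void main(String[] args) {
        // With hive.exec.reducers.bytes.per.reducer=256 as in the repro query,
        // even a small input should get many reducers rather than 1.
        System.out.println(estimateReducers(5000, 256, 1009));
    }
}
```

This is why skipping the estimate when the RS has multiple children forces the sort down to a single reducer.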





[jira] [Commented] (HIVE-16600) Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel order by in multi_insert cases

2017-06-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048827#comment-16048827
 ] 

Hive QA commented on HIVE-16600:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12872933/HIVE-16600.13.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10817 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=232)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=103)
org.apache.hadoop.hive.metastore.TestMetaStoreAuthorization.testMetaStoreAuthorization
 (batchId=205)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5640/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5640/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5640/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12872933 - PreCommit-HIVE-Build

> Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel 
> order by in multi_insert cases
> 
>
> Key: HIVE-16600
> URL: https://issues.apache.org/jira/browse/HIVE-16600
> Project: Hive
>  Issue Type: Sub-task
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16600.10.patch, HIVE-16600.11.patch, 
> HIVE-16600.12.patch, HIVE-16600.13.patch, HIVE-16600.1.patch, 
> HIVE-16600.2.patch, HIVE-16600.3.patch, HIVE-16600.4.patch, 
> HIVE-16600.5.patch, HIVE-16600.6.patch, HIVE-16600.7.patch, 
> HIVE-16600.8.patch, HIVE-16600.9.patch, mr.explain, 
> mr.explain.log.HIVE-16600, Node.java, 
> TestSetSparkReduceParallelism_MultiInsertCase.java
>
>
> multi_insert_gby.case.q
> {code}
> set hive.exec.reducers.bytes.per.reducer=256;
> set hive.optimize.sampling.orderby=true;
> drop table if exists e1;
> drop table if exists e2;
> create table e1 (key string, value string);
> create table e2 (key string);
> FROM (select key, cast(key as double) as keyD, value from src order by key) a
> INSERT OVERWRITE TABLE e1
> SELECT key, value
> INSERT OVERWRITE TABLE e2
> SELECT key;
> select * from e1;
> select * from e2;
> {code} 
> The parallelism of Sort is 1 even when parallel order by is enabled 
> ("hive.optimize.sampling.orderby" set to "true"). This is not 
> reasonable, because the parallelism should be calculated by 
> [Utilities.estimateReducers|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L170].
> This is because SetSparkReducerParallelism#needSetParallelism returns false 
> when [children size of 
> RS|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207]
>  is greater than 1.
> In this case, {{RS[2]}} has two children.
> The logical plan of the case is:
> {code}
> TS[0]-SEL[1]-RS[2]-SEL[3]-SEL[4]-FS[5]
>                   -SEL[6]-FS[7]
> {code}





[jira] [Updated] (HIVE-16876) RpcServer should be re-created when Rpc configs change

2017-06-14 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-16876:
--
Status: Patch Available  (was: Open)

> RpcServer should be re-created when Rpc configs change
> --
>
> Key: HIVE-16876
> URL: https://issues.apache.org/jira/browse/HIVE-16876
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-16876.1.patch
>
>






[jira] [Updated] (HIVE-16876) RpcServer should be re-created when Rpc configs change

2017-06-14 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-16876:
--
Attachment: HIVE-16876.1.patch

I've made all Rpc configs immutable at runtime.

> RpcServer should be re-created when Rpc configs change
> --
>
> Key: HIVE-16876
> URL: https://issues.apache.org/jira/browse/HIVE-16876
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-16876.1.patch
>
>






[jira] [Updated] (HIVE-16600) Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel order by in multi_insert cases

2017-06-14 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-16600:

Attachment: HIVE-16600.13.patch

[~lirui]: thanks for the review. Updated HIVE-16600.13.patch here and on RB 
according to the last round of review.



> Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel 
> order by in multi_insert cases
> 
>
> Key: HIVE-16600
> URL: https://issues.apache.org/jira/browse/HIVE-16600
> Project: Hive
>  Issue Type: Sub-task
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16600.10.patch, HIVE-16600.11.patch, 
> HIVE-16600.12.patch, HIVE-16600.13.patch, HIVE-16600.1.patch, 
> HIVE-16600.2.patch, HIVE-16600.3.patch, HIVE-16600.4.patch, 
> HIVE-16600.5.patch, HIVE-16600.6.patch, HIVE-16600.7.patch, 
> HIVE-16600.8.patch, HIVE-16600.9.patch, mr.explain, 
> mr.explain.log.HIVE-16600, Node.java, 
> TestSetSparkReduceParallelism_MultiInsertCase.java
>
>
> multi_insert_gby.case.q
> {code}
> set hive.exec.reducers.bytes.per.reducer=256;
> set hive.optimize.sampling.orderby=true;
> drop table if exists e1;
> drop table if exists e2;
> create table e1 (key string, value string);
> create table e2 (key string);
> FROM (select key, cast(key as double) as keyD, value from src order by key) a
> INSERT OVERWRITE TABLE e1
> SELECT key, value
> INSERT OVERWRITE TABLE e2
> SELECT key;
> select * from e1;
> select * from e2;
> {code} 
> The parallelism of Sort is 1 even when parallel order by is enabled 
> ("hive.optimize.sampling.orderby" set to "true"). This is not 
> reasonable, because the parallelism should be calculated by 
> [Utilities.estimateReducers|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L170].
> This is because SetSparkReducerParallelism#needSetParallelism returns false 
> when [children size of 
> RS|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207]
>  is greater than 1.
> In this case, {{RS[2]}} has two children.
> The logical plan of the case is:
> {code}
> TS[0]-SEL[1]-RS[2]-SEL[3]-SEL[4]-FS[5]
>                   -SEL[6]-FS[7]
> {code}





[jira] [Commented] (HIVE-16821) Vectorization: support Explain Analyze in vectorized mode

2017-06-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048772#comment-16048772
 ] 

Hive QA commented on HIVE-16821:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12872928/HIVE-16821.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 10831 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat6]
 (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[setop_no_distinct] 
(batchId=76)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=232)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=216)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=216)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5639/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5639/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5639/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12872928 - PreCommit-HIVE-Build

> Vectorization: support Explain Analyze in vectorized mode
> -
>
> Key: HIVE-16821
> URL: https://issues.apache.org/jira/browse/HIVE-16821
> Project: Hive
>  Issue Type: Bug
>  Components: Diagnosability, Vectorization
>Affects Versions: 2.1.1, 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-16821.1.patch, HIVE-16821.2.patch, 
> HIVE-16821.2.patch, HIVE-16821.3.patch
>
>
> Currently, to avoid a branch in the operator inner loop - the runtime stats 
> are only available in non-vector mode.





[jira] [Commented] (HIVE-16864) add validation to stream position search in LLAP IO

2017-06-14 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048764#comment-16048764
 ] 

Prasanth Jayachandran commented on HIVE-16864:
--

Is the orc_ppd_basic test failure related?

> add validation to stream position search in LLAP IO
> ---
>
> Key: HIVE-16864
> URL: https://issues.apache.org/jira/browse/HIVE-16864
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16864.01.patch, HIVE-16864.patch
>
>
> There's a TODO there to add the checks. We've seen issues before where 
> incorrect ranges led to obscure errors after this method returned a bad 
> result due to the absence of validity checks; we are seeing one now.
> Adding the checks.


