[jira] [Updated] (HIVE-17405) HoS DPP ConstantPropagate should use ConstantPropagateOption.SHORTCUT
[ https://issues.apache.org/jira/browse/HIVE-17405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated HIVE-17405:
--------------------------------
    Attachment: HIVE-17405.1.patch

> HoS DPP ConstantPropagate should use ConstantPropagateOption.SHORTCUT
> ---------------------------------------------------------------------
>
>                 Key: HIVE-17405
>                 URL: https://issues.apache.org/jira/browse/HIVE-17405
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-17405.1.patch
>
> In {{SparkCompiler#runDynamicPartitionPruning}} we should change {{new ConstantPropagate().transform(parseContext)}} to {{new ConstantPropagate(ConstantPropagateOption.SHORTCUT).transform(parseContext)}}. Hive-on-Tez does the same thing.
> Running the full constant propagation isn't really necessary; we just want to eliminate any {{and true}} predicates that were introduced by {{SyntheticJoinPredicate}} and {{DynamicPartitionPruningOptimization}}. {{SyntheticJoinPredicate}} introduces dummy filter predicates into the operator tree, and {{DynamicPartitionPruningOptimization}} replaces them. The predicates introduced via {{SyntheticJoinPredicate}} are necessary to help {{DynamicPartitionPruningOptimization}} determine whether DPP can be used.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
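For context on what the SHORTCUT option buys: it restricts constant propagation to cheap boolean simplifications, such as folding the {{and true}} predicates mentioned above. The toy sketch below illustrates only that folding idea; it is not Hive's {{ConstantPropagate}} code, and the {{fold}} helper is hypothetical:

```java
import java.util.*;

// Toy model of shortcut-style boolean simplification: fold the
// "and true" / "or false" children that SyntheticJoinPredicate-style
// rewrites leave behind. Illustrative only, not Hive's implementation.
public class ShortcutFold {
    // A child is either a leaf predicate string or the constant "true"/"false".
    static String fold(String op, List<String> children) {
        List<String> kept = new ArrayList<>();
        for (String c : children) {
            if (op.equals("AND") && c.equals("true")) continue;  // AND(x, true) -> x
            if (op.equals("OR") && c.equals("false")) continue;  // OR(x, false) -> x
            kept.add(c);
        }
        if (kept.isEmpty()) return op.equals("AND") ? "true" : "false";
        if (kept.size() == 1) return kept.get(0);
        return op + "(" + String.join(", ", kept) + ")";
    }

    public static void main(String[] args) {
        // An "and true" left over after a synthetic join predicate is replaced:
        System.out.println(fold("AND", Arrays.asList("part_col = ds", "true")));
    }
}
```

A full constant-propagation pass would also evaluate deterministic expressions over constants; the shortcut mode skips that and only performs simplifications of this shape.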
[jira] [Updated] (HIVE-17405) HoS DPP ConstantPropagate should use ConstantPropagateOption.SHORTCUT
[ https://issues.apache.org/jira/browse/HIVE-17405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated HIVE-17405:
--------------------------------
    Status: Patch Available  (was: Open)
[jira] [Assigned] (HIVE-17405) HoS DPP ConstantPropagate should use ConstantPropagateOption.SHORTCUT
[ https://issues.apache.org/jira/browse/HIVE-17405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar reassigned HIVE-17405:
-----------------------------------
[jira] [Commented] (HIVE-17216) Additional qtests for HoS DPP
[ https://issues.apache.org/jira/browse/HIVE-17216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144722#comment-16144722 ]

Sahil Takiar commented on HIVE-17216:
-------------------------------------
[~janulatha], [~vihangk1] could you take a look? No actual code change, just testing updates.

{{spark_vectorized_dynamic_partition_pruning.q}} has been failing for a while (HIVE-17200). {{TestSparkCliDriver}} failed in the most recent run, but all the DPP tests run in {{TestMiniSparkOnYarnCliDriver}}.

> Additional qtests for HoS DPP
> -----------------------------
>
>                 Key: HIVE-17216
>                 URL: https://issues.apache.org/jira/browse/HIVE-17216
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-17216.1.patch, HIVE-17216.2.patch, HIVE-17216.3.patch
>
> There are a few query patterns that the current tests don't cover; we can add queries to the HoS DPP tests to increase coverage.
[jira] [Commented] (HIVE-17216) Additional qtests for HoS DPP
[ https://issues.apache.org/jira/browse/HIVE-17216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144714#comment-16144714 ]

Hive QA commented on HIVE-17216:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12884163/HIVE-17216.3.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 11000 tests executed

*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=234)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=102)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testFloatCast2DoubleThriftSerializeInTasks (batchId=227)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6573/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6573/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6573/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12884163 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144682#comment-16144682 ]

Hive QA commented on HIVE-16886:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12884156/HIVE-16886.4.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 11014 tests executed

*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6572/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6572/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6572/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12884156 - PreCommit-HIVE-Build

> HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16886
>                 URL: https://issues.apache.org/jira/browse/HIVE-16886
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Metastore
>            Reporter: Sergio Peña
>            Assignee: anishek
>         Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch, HIVE-16886.2.patch, HIVE-16886.3.patch, HIVE-16886.4.patch
>
> When running multiple Hive Metastore servers and DB notifications are enabled, I could see that notifications can be persisted with a duplicated event ID.
> This does not happen when running multiple threads in a single HMS node, due to the locking acquired on the DbNotificationsLog class, but multiple HMS instances can conflict.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID fetched from the datastore is used for the new notification, incremented in the server itself, then persisted or updated back to the datastore. If 2 servers read the same ID, then both servers write a new notification with the same ID.
> The event ID is neither unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
> public void testConcurrentAddNotifications() throws ExecutionException, InterruptedException {
>   final int NUM_THREADS = 2;
>   CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
>   CountDownLatch countOut = new CountDownLatch(1);
>   HiveConf conf = new HiveConf();
>   conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, MockPartitionExpressionProxy.class.getName());
>   ExecutorService executorService = Executors.newFixedThreadPool(NUM_THREADS);
>   FutureTask<Void> tasks[] = new FutureTask[NUM_THREADS];
>   for (int i = 0; i < NUM_THREADS; i++) {
>     final int n = i;
>     tasks[i] = new FutureTask<Void>(new Callable<Void>() {
>       @Override
>       public Void call() throws Exception {
>         ObjectStore store = new ObjectStore();
>         store.setConf(conf);
>         NotificationEvent dbEvent =
>             new NotificationEvent(0, 0, EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>         System.out.println("ADDING NOTIFICATION");
>         countIn.countDown();
>         countOut.await();
>         store.addNotificationEvent(dbEvent);
>         System.out.println("FINISH NOTIFICATION");
>         return null;
>       }
>     });
>     executorService.execute(tasks[i]);
>   }
>   countIn.await();
>   countOut.countDown();
>   for (int i = 0; i < NUM_THREADS; ++i) {
>     tasks[i].get();
>   }
>   NotificationEventResponse eventResponse = objectStore.getNextNotification(new NotificationEventRequest());
>   Assert.assertEquals(2, eventResponse.getEventsSize());
>   Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
>   // This fails because the next notification has an event ID = 1
>   Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
> }
> {noformat}
> The last assertion fails: the second event has ID 1 instead of the expected 2.
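The read-increment-write race the test exposes can also be shown deterministically. The sketch below is a simplified model (not the ObjectStore code): two "servers" both read the stored sequence value before either writes it back, so both persist the same event ID.

```java
import java.util.*;

// Deterministic model of the HMS event-ID race: each "server" does
// read -> increment -> write against a shared counter with no cross-process
// locking. If both read before either writes, the IDs collide.
public class EventIdRace {
    static long storedId = 0;  // stands in for the persisted sequence value

    static List<Long> raceTwoServers() {
        long readA = storedId;                  // server A reads the sequence
        long readB = storedId;                  // server B reads the same value
        long idA = readA + 1; storedId = idA;   // A increments and writes back
        long idB = readB + 1; storedId = idB;   // B does the same -> duplicate ID
        return Arrays.asList(idA, idB);
    }

    public static void main(String[] args) {
        System.out.println(raceTwoServers());   // both events get ID 1
    }
}
```

Serializing the read (e.g. a SELECT ... FOR UPDATE on the sequence row) or making the ID a database-generated key removes the window between the two reads.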
[jira] [Commented] (HIVE-17381) When we enable Parquet Writer Version V2, hive throws an exception: Unsupported encoding: DELTA_BYTE_ARRAY.
[ https://issues.apache.org/jira/browse/HIVE-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144679#comment-16144679 ]

Ferdinand Xu commented on HIVE-17381:
-------------------------------------
Hi, Thanks for your email. I am on a business trip from August 28th to August 30th and have limited access to mail. Please expect some delays in replies.

Yours,
Ferdinand Xu

> When we enable Parquet Writer Version V2, hive throws an exception: Unsupported encoding: DELTA_BYTE_ARRAY.
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-17381
>                 URL: https://issues.apache.org/jira/browse/HIVE-17381
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Ke Jia
>            Assignee: Colin Ma
>
> When we set "hive.vectorized.execution.enabled=true" and "parquet.writer.version=v2" simultaneously, hive throws the following exception:
> {noformat}
> Caused by: java.io.IOException: java.io.IOException: java.lang.UnsupportedOperationException: Unsupported encoding: DELTA_BYTE_ARRAY
>   at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>   at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>   at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:232)
>   at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:142)
>   at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:254)
>   at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:208)
>   at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>   at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>   at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:83)
>   at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
>   at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
>   at org.apache.spark.scheduler.Task.run(Task.scala:86)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: java.lang.UnsupportedOperationException: Unsupported encoding: DELTA_BYTE_ARRAY
>   at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>   at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>   at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
>   at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:167)
>   at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:52)
>   at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
>   at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:229)
>   ... 16 more
> {noformat}
[jira] [Assigned] (HIVE-17381) When we enable Parquet Writer Version V2, hive throws an exception: Unsupported encoding: DELTA_BYTE_ARRAY.
[ https://issues.apache.org/jira/browse/HIVE-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Ma reassigned HIVE-17381:
-------------------------------
    Assignee: Colin Ma
[jira] [Commented] (HIVE-15687) SQL Standard auth: INSERT and DELETE privileges don't actually exist.
[ https://issues.apache.org/jira/browse/HIVE-15687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144672#comment-16144672 ]

youchuikai commented on HIVE-15687:
-----------------------------------
[~cltlfcjin] Have you found a way to solve the problem in HIVE-15687? You said to set hive.security.authorization.manager to org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory so that v2 authorization can grant INSERT | DELETE. Does that method solve the problem?

> SQL Standard auth: INSERT and DELETE privileges don't actually exist.
> ----------------------------------------------------------------------
>
>                 Key: HIVE-15687
>                 URL: https://issues.apache.org/jira/browse/HIVE-15687
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Carter Shanklin
>            Assignee: Lantao Jin
>
> The documentation (https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization#SQLStandardBasedHiveAuthorization-ObjectPrivilegeCommands) states that there are privilege types of INSERT | SELECT | UPDATE | DELETE | ALL. Experience suggests otherwise:
> {code}
> 0: jdbc:hive2://localhost:1/default> grant select on table secured_table to role my_role;
> No rows affected (0.034 seconds)
> 0: jdbc:hive2://localhost:1/default> grant insert on table secured_table to role my_role;
> Error: Error while compiling statement: FAILED: SemanticException Undefined privilege Insert (state=42000,code=4)
> 0: jdbc:hive2://localhost:1/default> grant update on table secured_table to role my_role;
> No rows affected (0.037 seconds)
> 0: jdbc:hive2://localhost:1/default> grant delete on table secured_table to role my_role;
> Error: Error while compiling statement: FAILED: SemanticException Undefined privilege Delete (state=42000,code=4)
> 0: jdbc:hive2://localhost:1/default> select version();
> +--------------------------------------------------------------+
> |                             _c0                              |
> +--------------------------------------------------------------+
> | 2.1.0.2.6.0.0-369 r434bfeb707d21f6b44121ac7dfe5adbadb746387  |
> +--------------------------------------------------------------+
> {code}
> It would be good to support these, especially since Hive supports updates and deletions.
[jira] [Assigned] (HIVE-17404) Orc split generation cache does not handle files with file tail
[ https://issues.apache.org/jira/browse/HIVE-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran reassigned HIVE-17404:
--------------------------------------------

> Orc split generation cache does not handle files with file tail
> ----------------------------------------------------------------
>
>                 Key: HIVE-17404
>                 URL: https://issues.apache.org/jira/browse/HIVE-17404
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 2.4.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>            Priority: Blocker
>
> Some old files do not have an Orc FileTail. If the file tail does not exist, split generation should fall back to the old way of storing footers.
[jira] [Updated] (HIVE-17404) Orc split generation cache does not handle files with file tail
[ https://issues.apache.org/jira/browse/HIVE-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated HIVE-17404:
-----------------------------------------
    Description:
Some old files do not have an Orc FileTail. If the file tail does not exist, split generation should fall back to the old way of storing footers. This can result in exceptions like the one below:
{code}
ORC split generation failed with exception: Malformed ORC file. Invalid postscript length 9
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1735)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1822)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:450)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:569)
	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.orc.FileFormatException: Malformed ORC file. Invalid postscript length 9
	at org.apache.orc.impl.ReaderImpl.ensureOrcFooter(ReaderImpl.java:297)
	at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:470)
	at org.apache.hadoop.hive.ql.io.orc.LocalCache.getAndValidate(LocalCache.java:103)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$ETLSplitStrategy.getSplits(OrcInputFormat.java:804)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$ETLSplitStrategy.runGetSplitsSync(OrcInputFormat.java:922)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$ETLSplitStrategy.generateSplitWork(OrcInputFormat.java:891)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.scheduleSplits(OrcInputFormat.java:1763)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1707)
	... 15 more
{code}

    was: Some old files do not have an Orc FileTail. If the file tail does not exist, split generation should fall back to the old way of storing footers.
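The shape of the requested fix can be sketched as "validate the cached tail before using it, and fall back to the legacy footer path instead of throwing." The model below is illustrative only: it mimics the ORC convention that a file ends with the magic bytes "ORC" followed by a one-byte postscript length, but it is not the real ReaderImpl/LocalCache code.

```java
// Simplified model of ORC tail validation with a legacy fallback path.
// Assumption (hedged): the last byte holds the postscript length and the
// three bytes before it are the magic "ORC"; the real check lives in
// ReaderImpl.ensureOrcFooter and is more involved.
public class TailOrFallback {
    static boolean looksLikeOrcTail(byte[] fileEnd) {
        int n = fileEnd.length;
        if (n < 4) return false;
        int psLen = fileEnd[n - 1] & 0xff;
        if (psLen < 3 || psLen >= n) return false;  // "Invalid postscript length"
        return fileEnd[n - 4] == 'O' && fileEnd[n - 3] == 'R' && fileEnd[n - 2] == 'C';
    }

    static String planSplits(byte[] cachedTail) {
        // Instead of throwing on an old-format cache entry, take the legacy path.
        return looksLikeOrcTail(cachedTail) ? "use-cached-tail" : "legacy-footer-path";
    }

    public static void main(String[] args) {
        byte[] good = {1, 2, 'O', 'R', 'C', 5};   // valid-looking tail
        byte[] old  = {1, 2, 3, 4, 5, 9};         // old cached footer, no magic
        System.out.println(planSplits(good) + " / " + planSplits(old));
    }
}
```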
[jira] [Commented] (HIVE-17403) Fail concatenation for unmanaged tables
[ https://issues.apache.org/jira/browse/HIVE-17403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144653#comment-16144653 ]

Prasanth Jayachandran commented on HIVE-17403:
----------------------------------------------
This could happen for managed tables as well, since hive supports "LOAD DATA" commands.

> Fail concatenation for unmanaged tables
> ---------------------------------------
>
>                 Key: HIVE-17403
>                 URL: https://issues.apache.org/jira/browse/HIVE-17403
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 3.0.0, 2.4.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>            Priority: Blocker
>
> ALTER TABLE .. CONCATENATE should fail if the table is not managed by hive. For unmanaged tables, file names can be anything. Hive makes some assumptions about file names, which can result in data loss for unmanaged tables.
> An example is a table/partition having 2 different files (part-m-0__1417075294718 and part-m-00018__1417075294718). Although both are completely different files, hive thinks they are files generated by separate instances of the same task (because of failure or speculative execution). Hive will end up removing this file:
> {code}
> 2017-08-28T18:19:29,516 WARN [b27f10d5-d957-4695-ab2a-1453401793df main]: exec.Utilities (:()) - Duplicate taskid file removed: file:/Users/table/part=20141120/.hive-staging_hive_2017-08-28_18-19-27_210_3381701454205724533-1/_tmp.-ext-1/part-m-00018__1417075294718 with length 958510. Existing file: file:/Users/table/part=20141120/.hive-staging_hive_2017-08-28_18-19-27_210_3381701454205724533-1/_tmp.-ext-1/part-m-0__1417075294718 with length 1123116
> {code}
> DDL should restrict concatenation for unmanaged tables.
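The data-loss mechanism in HIVE-17403 can be illustrated with a simplified model of duplicate-task pruning. Note the assumptions: the task-ID extraction rule below (the token after the last underscore) is a stand-in, not Hive's actual parsing in Utilities, and {{keepOnePerTask}} is a hypothetical helper; the point is only that files Hive did not name can collide under such a rule and get silently dropped.

```java
import java.util.*;

// Simplified model: group files by an extracted "task id" and keep only
// the largest file in each group, as duplicate-attempt cleanup would.
// The extraction rule is illustrative, not Hive's real filename parsing.
public class DuplicateTaskIdModel {
    static String taskId(String name) {
        return name.substring(name.lastIndexOf('_') + 1);  // stand-in rule
    }

    static List<String> keepOnePerTask(Map<String, Long> fileSizes) {
        Map<String, String> best = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : fileSizes.entrySet()) {
            String id = taskId(e.getKey());
            String cur = best.get(id);
            if (cur == null || fileSizes.get(cur) < e.getValue()) best.put(id, e.getKey());
        }
        return new ArrayList<>(best.values());  // everything else gets removed
    }

    public static void main(String[] args) {
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("part-m-0__1417075294718", 1123116L);     // both names map to
        files.put("part-m-00018__1417075294718", 958510L);  // the same "task id"
        System.out.println(keepOnePerTask(files));          // the smaller file is dropped
    }
}
```

Under this model the two externally written files from the log above collide, and the 958510-byte file is discarded exactly as the WARN message describes, which is why the issue proposes rejecting CONCATENATE for unmanaged tables.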
[jira] [Assigned] (HIVE-17403) Fail concatenation for unmanaged tables
[ https://issues.apache.org/jira/browse/HIVE-17403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran reassigned HIVE-17403:
--------------------------------------------
[jira] [Commented] (HIVE-17401) Hive session idle timeout doesn't function properly
[ https://issues.apache.org/jira/browse/HIVE-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144652#comment-16144652 ]

Hive QA commented on HIVE-17401:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12884150/HIVE-17401.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11014 tests executed

*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6571/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6571/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6571/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12884150 - PreCommit-HIVE-Build > Hive session idle timeout doesn't function properly > --- > > Key: HIVE-17401 > URL: https://issues.apache.org/jira/browse/HIVE-17401 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Attachments: HIVE-17401.1.patch, HIVE-17401.patch > > > It's apparent in our production environment that HS2 leaks sessions, which at > least contributed to memory leaks in HS2. We further found that idle HS2 > sessions rarely get timed out and the number of live sessions keeps increasing > as time goes on. Eventually, HS2 becomes unresponsive and demands a restart. > Investigation shows that session idle timeout doesn't work appropriately. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17216) Additional qtests for HoS DPP
[ https://issues.apache.org/jira/browse/HIVE-17216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-17216: Attachment: HIVE-17216.3.patch > Additional qtests for HoS DPP > - > > Key: HIVE-17216 > URL: https://issues.apache.org/jira/browse/HIVE-17216 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17216.1.patch, HIVE-17216.2.patch, > HIVE-17216.3.patch > > > The current HoS DPP tests don't cover a few query patterns; we can add > queries that exercise them to increase coverage. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144623#comment-16144623 ] Alexander Kolbasov commented on HIVE-16886: --- Is there any reason you don't want to use Query q = pm.newQuery(...); q.setSerializeRead(true); which does convert to appropriate SELECT FOR UPDATE statements? See http://www.datanucleus.org/products/accessplatform_3_0/jdo/transaction_types.html for some description. > HMS log notifications may have duplicated event IDs if multiple HMS are > running concurrently > > > Key: HIVE-16886 > URL: https://issues.apache.org/jira/browse/HIVE-16886 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore >Reporter: Sergio Peña >Assignee: anishek > Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch, > HIVE-16886.2.patch, HIVE-16886.3.patch, HIVE-16886.4.patch > > > When running multiple Hive Metastore servers and DB notifications are > enabled, I could see that notifications can be persisted with a duplicated > event ID. > This does not happen when running multiple threads in a single HMS node due > to the locking acquired on the DbNotificationsLog class, but multiple HMS > could cause conflicts. > The issue is in the ObjectStore#addNotificationEvent() method. The event ID > fetched from the datastore is used for the new notification, incremented in > the server itself, then persisted or updated back to the datastore. If 2 > servers read the same ID, then these 2 servers write a new notification with > the same ID. > The event ID is neither unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
> public void testConcurrentAddNotifications() throws ExecutionException, InterruptedException {
>   final int NUM_THREADS = 2;
>   CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
>   CountDownLatch countOut = new CountDownLatch(1);
>   HiveConf conf = new HiveConf();
>   conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, MockPartitionExpressionProxy.class.getName());
>   ExecutorService executorService = Executors.newFixedThreadPool(NUM_THREADS);
>   FutureTask<Void> tasks[] = new FutureTask[NUM_THREADS];
>   for (int i = 0; i < NUM_THREADS; i++) {
>     final int n = i;
>     tasks[i] = new FutureTask<Void>(new Callable<Void>() {
>       @Override
>       public Void call() throws Exception {
>         ObjectStore store = new ObjectStore();
>         store.setConf(conf);
>         NotificationEvent dbEvent = new NotificationEvent(0, 0,
>             EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>         System.out.println("ADDING NOTIFICATION");
>         countIn.countDown();
>         countOut.await();
>         store.addNotificationEvent(dbEvent);
>         System.out.println("FINISH NOTIFICATION");
>         return null;
>       }
>     });
>     executorService.execute(tasks[i]);
>   }
>   countIn.await();
>   countOut.countDown();
>   for (int i = 0; i < NUM_THREADS; ++i) {
>     tasks[i].get();
>   }
>   NotificationEventResponse eventResponse = objectStore.getNextNotification(new NotificationEventRequest());
>   Assert.assertEquals(2, eventResponse.getEventsSize());
>   Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
>   // This fails because the next notification has an event ID = 1
>   Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
> }
> {noformat}
> The last assertion fails: it expects event ID 2 but gets 1. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
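The read-increment-write race that the description (and the test above) exercises can be shown deterministically in a single-threaded sketch. Names here are illustrative, not Hive's actual ObjectStore code:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the race described above: each "HMS" reads the current max
// event ID from the datastore, increments it locally, and writes the new
// event back. With no row lock, two servers that interleave their reads
// before either write persist the same ID.
public class EventIdRace {
    // Stands in for the NOTIFICATION_LOG table.
    public static final List<Long> eventIds = new ArrayList<>();

    // SELECT the current max event ID (no FOR UPDATE, so no lock is taken).
    public static long readNextId() {
        return eventIds.isEmpty() ? 0L : eventIds.get(eventIds.size() - 1);
    }

    // INSERT the new notification row.
    public static void persist(long id) {
        eventIds.add(id);
    }

    public static void main(String[] args) {
        // Interleaving: both servers read before either writes.
        long serverA = readNextId() + 1;  // reads 0, computes 1
        long serverB = readNextId() + 1;  // also reads 0, computes 1
        persist(serverA);
        persist(serverB);
        System.out.println(eventIds);     // [1, 1] -- duplicate event IDs
    }
}
```

The serialized-read approach Kolbasov suggests (e.g. DataNucleus's `q.setSerializeRead(true)`, which issues SELECT ... FOR UPDATE) closes this window by making the second reader block until the first transaction commits its write.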
[jira] [Commented] (HIVE-17373) Upgrade some dependency versions
[ https://issues.apache.org/jira/browse/HIVE-17373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144616#comment-16144616 ] Prasanth Jayachandran commented on HIVE-17373: -- This will help fix LOG4J2-1542, which I encountered recently:
{code}
Exception in thread "LocalContainerLauncher-SubTaskRunner" java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.logging.log4j.message.ParameterizedMessage.formatTo(ParameterizedMessage.java:221)
    at org.apache.logging.log4j.message.ParameterizedMessage.getFormattedMessage(ParameterizedMessage.java:200)
    at org.apache.logging.log4j.core.async.RingBufferLogEvent.setMessage(RingBufferLogEvent.java:126)
    at org.apache.logging.log4j.core.async.RingBufferLogEvent.setValues(RingBufferLogEvent.java:104)
    at org.apache.logging.log4j.core.async.RingBufferLogEventTranslator.translateTo(RingBufferLogEventTranslator.java:56)
    at org.apache.logging.log4j.core.async.RingBufferLogEventTranslator.translateTo(RingBufferLogEventTranslator.java:34)
    at com.lmax.disruptor.RingBuffer.translateAndPublish(RingBuffer.java:930)
    at com.lmax.disruptor.RingBuffer.tryPublishEvent(RingBuffer.java:456)
    at org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.tryPublish(AsyncLoggerDisruptor.java:190)
    at org.apache.logging.log4j.core.async.AsyncLogger.publish(AsyncLogger.java:160)
    at org.apache.logging.log4j.core.async.AsyncLogger.logWithThreadLocalTranslator(AsyncLogger.java:156)
    at org.apache.logging.log4j.core.async.AsyncLogger.logMessage(AsyncLogger.java:126)
    at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2005)
    at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1876)
    at org.apache.logging.slf4j.Log4jLogger.debug(Log4jLogger.java:124)
{code}
> Upgrade some dependency versions > > > Key: HIVE-17373 > URL: https://issues.apache.org/jira/browse/HIVE-17373 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.0.0 >Reporter: Aihua Xu >Assignee: Aihua
Xu > Attachments: HIVE-17373.1.patch, HIVE-17373.2.patch > > > Upgrade some libraries including log4j to 2.8.2, accumulo to 1.8.1 and > commons-httpclient to 3.1. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17393) AMReporter needs to heartbeat every external 'AM'
[ https://issues.apache.org/jira/browse/HIVE-17393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144613#comment-16144613 ] Hive QA commented on HIVE-17393: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12884145/HIVE-17393.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11015 tests executed *Failed tests:* {noformat} TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=234) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6570/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6570/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6570/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12884145 - PreCommit-HIVE-Build > AMReporter needs to heartbeat every external 'AM' > > > Key: HIVE-17393 > URL: https://issues.apache.org/jira/browse/HIVE-17393 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Fix For: 3.0.0 > > Attachments: HIVE-17393.1.patch, HIVE-17393.2.patch, > HIVE-17393.3.patch > > > AMReporter only remembers the first AM that submitted the query and heartbeats to it. > In the case of external clients, there might be multiple 'AM's, and each of them > needs node heartbeats. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144596#comment-16144596 ] ASF GitHub Bot commented on HIVE-16886: --- GitHub user anishek opened a pull request: https://github.com/apache/hive/pull/237 HIVE-16886: HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently You can merge this pull request into a Git repository by running: $ git pull https://github.com/anishek/hive HIVE-16886 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/237.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #237 commit 1d358291c5c5fc46e3d921d74967e51d1a7418b5 Author: Anishek Agarwal Date: 2017-08-24T00:10:00Z HIVE-16886: HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144590#comment-16144590 ] Alexander Kolbasov commented on HIVE-16886: --- [~anishek] Would you mind adding a link to reviewboard? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144587#comment-16144587 ] anishek commented on HIVE-16886: Failing tests are from older builds. [~thejas]/[~daijy]/[~sankarh] please review. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-16886: --- Attachment: HIVE-16886.4.patch Fixing Oracle schema upgrades. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17401) Hive session idle timeout doesn't function properly
[ https://issues.apache.org/jira/browse/HIVE-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144558#comment-16144558 ] Hive QA commented on HIVE-17401: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12884141/HIVE-17401.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 11014 tests executed *Failed tests:* {noformat} TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6569/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6569/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6569/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12884141 - PreCommit-HIVE-Build > Hive session idle timeout doesn't function properly > --- > > Key: HIVE-17401 > URL: https://issues.apache.org/jira/browse/HIVE-17401 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Attachments: HIVE-17401.1.patch, HIVE-17401.patch > > > It's apparent in our production environment that HS2 leaks sessions, which at > least contributed to memory leaks in HS2. 
We further found that idle HS2 > sessions rarely get timed out and the number of live sessions keeps increasing > as time goes on. Eventually, HS2 becomes unresponsive and demands a restart. > Investigation shows that session idle timeout doesn't work appropriately. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17307) Change the metastore to not use the metrics code in hive/common
[ https://issues.apache.org/jira/browse/HIVE-17307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144510#comment-16144510 ] Alan Gates commented on HIVE-17307: --- Thanks for the review. I've responded to the comments in the PR. > Change the metastore to not use the metrics code in hive/common > --- > > Key: HIVE-17307 > URL: https://issues.apache.org/jira/browse/HIVE-17307 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-17307.2.patch, HIVE-17307.3.patch, HIVE-17307.patch > > > As we move code into the standalone metastore module, it cannot use the > metrics in hive-common. We could copy the current Metrics interface or we > could change the metastore code to directly use codahale metrics. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17401) Hive session idle timeout doesn't function properly
[ https://issues.apache.org/jira/browse/HIVE-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-17401: --- Attachment: HIVE-17401.1.patch With the change in the patch, the activeCalls variable is no longer used. Patch #1 removes it; I don't feel it was strongly needed in the first place anyway. > Hive session idle timeout doesn't function properly > --- > > Key: HIVE-17401 > URL: https://issues.apache.org/jira/browse/HIVE-17401 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Attachments: HIVE-17401.1.patch, HIVE-17401.patch > > > It's apparent in our production environment that HS2 leaks sessions, which at > least contributed to memory leaks in HS2. We further found that idle HS2 > sessions rarely get timed out and the number of live sessions keeps increasing > as time goes on. Eventually, HS2 becomes unresponsive and demands a restart. > Investigation shows that session idle timeout doesn't work appropriately. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
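The kind of idle-timeout check this issue concerns can be sketched as follows. This is an illustrative model, not HiveServer2's actual SessionManager: a reaper compares each session's last-access time against the configured timeout. The bug pattern described above is that when expiry is additionally gated on something like an activeCalls counter that never returns to zero, idle sessions are never closed and leak:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch of an idle-session reaper (hypothetical names,
// not HiveServer2's actual code): close sessions whose last access is
// older than the idle timeout.
public class SessionReaper {
    public static class Session {
        public long lastAccessTime;
        public boolean open = true;
        public Session(long lastAccess) { this.lastAccessTime = lastAccess; }
    }

    // Close sessions idle longer than timeoutMs; returns the number closed.
    public static int reap(List<Session> sessions, long now, long timeoutMs) {
        int closed = 0;
        Iterator<Session> it = sessions.iterator();
        while (it.hasNext()) {
            Session s = it.next();
            if (s.open && now - s.lastAccessTime > timeoutMs) {
                s.open = false;   // time out the idle session
                it.remove();
                closed++;
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        List<Session> sessions = new ArrayList<>();
        sessions.add(new Session(0L));      // idle since t=0
        sessions.add(new Session(9_000L));  // recently active
        int closed = reap(sessions, 10_000L, 5_000L);
        System.out.println(closed + " closed, " + sessions.size() + " live");
    }
}
```

If an extra guard such as `s.activeCalls == 0` is added to the condition and the counter is never decremented back to zero, the `if` never fires and live sessions accumulate until the server becomes unresponsive, matching the behavior reported.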
[jira] [Commented] (HIVE-17386) support LLAP workload management in HS2 (low level only)
[ https://issues.apache.org/jira/browse/HIVE-17386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144483#comment-16144483 ] Hive QA commented on HIVE-17386: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12884132/HIVE-17386.01.patch {color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11002 tests executed *Failed tests:* {noformat} TestJdbcWithMiniHS2 - did not produce a TEST-*.xml file (likely timed out) (batchId=227) TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=281) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=239) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit (batchId=276) org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testShowLocksAgentInfo (batchId=283) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6568/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6568/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6568/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12884132 - PreCommit-HIVE-Build > support LLAP workload management in HS2 (low level only) > > > Key: HIVE-17386 > URL: https://issues.apache.org/jira/browse/HIVE-17386 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17386.01.only.patch, HIVE-17386.01.patch, > HIVE-17386.01.patch, HIVE-17386.only.patch, HIVE-17386.patch > > > This makes use of HIVE-17297 and creates building blocks for workload > management policies, etc. > For now, there are no policies - a single yarn queue is designated for all > LLAP query AMs, and the capacity is distributed equally. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17393) AMReporter needs to heartbeat every external 'AM'
[ https://issues.apache.org/jira/browse/HIVE-17393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated HIVE-17393: Attachment: HIVE-17393.3.patch Attached new patch to address comments. > AMReporter needs to heartbeat every external 'AM' > > > Key: HIVE-17393 > URL: https://issues.apache.org/jira/browse/HIVE-17393 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Fix For: 3.0.0 > > Attachments: HIVE-17393.1.patch, HIVE-17393.2.patch, > HIVE-17393.3.patch > > > AMReporter only remembers the first AM that submitted the query and heartbeats to it. > In the case of external clients, there might be multiple 'AM's, and each of them > needs node heartbeats. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17393) AMReporter needs to heartbeat every external 'AM'
[ https://issues.apache.org/jira/browse/HIVE-17393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144466#comment-16144466 ] Zhiyuan Yang commented on HIVE-17393: - Thanks [~sershe] for the review! We need AMNodeInfo to be non-static so that we can override its methods for unit tests. > AMReporter needs to heartbeat every external 'AM' > > > Key: HIVE-17393 > URL: https://issues.apache.org/jira/browse/HIVE-17393 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Fix For: 3.0.0 > > Attachments: HIVE-17393.1.patch, HIVE-17393.2.patch > > > AMReporter only remembers the first AM that submitted the query and heartbeats to it. > In the case of external clients, there might be multiple 'AM's, and each of them > needs node heartbeats. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
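The test seam discussed in that comment can be sketched as follows. All names here are hypothetical, not Hive's actual AMReporter internals: keeping the node-info type a non-static inner class (rather than static) lets a test subclass swap the real heartbeat RPC for a counting stub by overriding a factory method.

```java
public class AMReporterSketch {
    // Non-static inner class: tests can subclass it anonymously and
    // override heartbeat() without touching the reporter's logic.
    public class AMNodeInfo {
        public void heartbeat() {
            // the real implementation would RPC to the external AM here
        }
    }

    // Factory method: tests override this to return a stubbed AMNodeInfo.
    protected AMNodeInfo newNodeInfo() { return new AMNodeInfo(); }

    // Heartbeat every registered 'AM' (simplified to a count here).
    public void heartbeatAll(int nodes) {
        for (int i = 0; i < nodes; i++) {
            newNodeInfo().heartbeat();
        }
    }

    // A test double that counts heartbeats instead of making RPCs.
    public static class Recording extends AMReporterSketch {
        public int beats = 0;
        @Override protected AMNodeInfo newNodeInfo() {
            return new AMNodeInfo() {
                @Override public void heartbeat() { beats++; }
            };
        }
    }

    public static void main(String[] args) {
        Recording r = new Recording();
        r.heartbeatAll(3);
        System.out.println(r.beats); // 3
    }
}
```

This is the trade-off the comment points at: a static nested class cannot be overridden per enclosing instance this way, so making AMNodeInfo non-static buys testability at the cost of each instance holding a reference to its reporter.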
[jira] [Comment Edited] (HIVE-17323) Improve upon HIVE-16260
[ https://issues.apache.org/jira/browse/HIVE-17323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144278#comment-16144278 ] Deepak Jaiswal edited comment on HIVE-17323 at 8/28/17 10:17 PM: - Initial patch. Added a test which ensures the semijoin edge feeding into a map AFTER the DPP is not there. [~gopalv] [~jdere] Can you please review? was (Author: djaiswal): Initial patch. Added a test which ensures the semijoin edge feeding into a map AFTER the DPP is not there. > Improve upon HIVE-16260 > --- > > Key: HIVE-17323 > URL: https://issues.apache.org/jira/browse/HIVE-17323 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > Attachments: HIVE-17323.1.patch > > > HIVE-16260 allows removal of parallel edges of semijoin with mapjoins. > https://issues.apache.org/jira/browse/HIVE-16260 > However, it should also consider dynamic partition pruning edge like semijoin > without removing it while traversing the query tree. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17307) Change the metastore to not use the metrics code in hive/common
[ https://issues.apache.org/jira/browse/HIVE-17307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144450#comment-16144450 ] Vihang Karajgaonkar commented on HIVE-17307: Hi [~alangates] Thanks for the patch. I have made some comments on the github request. Let me know what you think. > Change the metastore to not use the metrics code in hive/common > --- > > Key: HIVE-17307 > URL: https://issues.apache.org/jira/browse/HIVE-17307 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-17307.2.patch, HIVE-17307.3.patch, HIVE-17307.patch > > > As we move code into the standalone metastore module, it cannot use the > metrics in hive-common. We could copy the current Metrics interface or we > could change the metastore code to directly use codahale metrics. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17401) Hive session idle timeout doesn't function properly
[ https://issues.apache.org/jira/browse/HIVE-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-17401: --- Attachment: HIVE-17401.patch > Hive session idle timeout doesn't function properly > --- > > Key: HIVE-17401 > URL: https://issues.apache.org/jira/browse/HIVE-17401 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Attachments: HIVE-17401.patch > > > It's apparent in our production environment that HS2 leaks sessions, which at > least contributed to memory leaks in HS2. We further found that idle HS2 > sessions rarely get timed out and the number of live session keeps increasing > as time goes on. Eventually, HS2 becomes irresponsive and demands a restart. > Investigation shows that session idle timeout doesn't work appropriately. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17401) Hive session idle timeout doesn't function properly
[ https://issues.apache.org/jira/browse/HIVE-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-17401: --- Status: Patch Available (was: Open) > Hive session idle timeout doesn't function properly > --- > > Key: HIVE-17401 > URL: https://issues.apache.org/jira/browse/HIVE-17401 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Attachments: HIVE-17401.patch > > > It's apparent in our production environment that HS2 leaks sessions, which at > least contributed to memory leaks in HS2. We further found that idle HS2 > sessions rarely get timed out and the number of live session keeps increasing > as time goes on. Eventually, HS2 becomes irresponsive and demands a restart. > Investigation shows that session idle timeout doesn't work appropriately. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17401) Hive session idle timeout doesn't function properly
[ https://issues.apache.org/jira/browse/HIVE-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144383#comment-16144383 ] Xuefu Zhang commented on HIVE-17401:
{code}
private void startTimeoutChecker() {
  ...
  final Runnable timeoutChecker = new Runnable() {
    @Override
    public void run() {
      sleepFor(interval);
      while (!shutdown) {
        ...
        if (sessionTimeout > 0 && session.getLastAccessTime() + sessionTimeout <= current
            && (!checkOperation || session.getNoOperationTime() > sessionTimeout)) {
          ...
        } else {
          session.closeExpiredOperations();
        }
      }
      sleepFor(interval);
    }
  }
{code}
If the condition {{session.getNoOperationTime() > sessionTimeout}} is not true, execution goes to the else clause. However, {{session.closeExpiredOperations()}} will eventually call HiveSessionImpl.acquire(), which sets {{lastIdleTime = 0}}. As a result, the condition {{session.getNoOperationTime() > sessionTimeout}} can never become true, causing a session leak. > Hive session idle timeout doesn't function properly > --- > > Key: HIVE-17401 > URL: https://issues.apache.org/jira/browse/HIVE-17401 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > > It's apparent in our production environment that HS2 leaks sessions, which at > least contributed to memory leaks in HS2. We further found that idle HS2 > sessions rarely get timed out and the number of live session keeps increasing > as time goes on. Eventually, HS2 becomes irresponsive and demands a restart. > Investigation shows that session idle timeout doesn't work appropriately. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
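The leak described in the comment above can be reproduced with a toy model: the cleanup branch itself "acquires" the session, which zeroes its idle clock, so the no-operation time never exceeds the timeout. The classes and methods below are simplified stand-ins for illustration, not Hive's actual HiveSessionImpl:

```java
// Toy model of the HIVE-17401 session leak (illustrative names only).
public class SessionLeakSketch {
    static class Session {
        long lastIdleTime;  // epoch millis when the session went idle; 0 = in use

        Session(long idleSince) { this.lastIdleTime = idleSince; }

        long getNoOperationTime(long now) {
            return lastIdleTime == 0 ? 0 : now - lastIdleTime;
        }

        // Mirrors the effect of HiveSessionImpl.acquire(): marks the session busy.
        void acquire() { lastIdleTime = 0; }

        // The bug: even the timeout checker's cleanup path calls acquire().
        void closeExpiredOperations() { acquire(); }
    }

    static boolean wouldExpire(Session s, long now, long timeoutMs) {
        return s.getNoOperationTime(now) > timeoutMs;
    }

    public static void main(String[] args) {
        long timeoutMs = 1000;
        Session s = new Session(1);  // session went idle at t=1

        // Checker pass at t=500: not yet expired, so the else-branch runs
        // closeExpiredOperations(), which resets the idle clock...
        s.closeExpiredOperations();

        // ...and even at t=5000, far past the timeout, it never expires:
        System.out.println(wouldExpire(s, 5000, timeoutMs)); // false -> leaked
    }
}
```

Without the cleanup call, `wouldExpire` returns true at t=5000; with it, the session stays alive indefinitely.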
[jira] [Updated] (HIVE-17386) support LLAP workload management in HS2 (low level only)
[ https://issues.apache.org/jira/browse/HIVE-17386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17386: Attachment: HIVE-17386.01.patch > support LLAP workload management in HS2 (low level only) > > > Key: HIVE-17386 > URL: https://issues.apache.org/jira/browse/HIVE-17386 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17386.01.only.patch, HIVE-17386.01.patch, > HIVE-17386.01.patch, HIVE-17386.only.patch, HIVE-17386.patch > > > This makes use of HIVE-17297 and creates building blocks for workload > management policies, etc. > For now, there are no policies - a single yarn queue is designated for all > LLAP query AMs, and the capacity is distributed equally. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17323) Improve upon HIVE-16260
[ https://issues.apache.org/jira/browse/HIVE-17323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144375#comment-16144375 ] Hive QA commented on HIVE-17323: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12884116/HIVE-17323.1.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 11014 tests executed *Failed tests:* {noformat} TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6566/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6566/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6566/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12884116 - PreCommit-HIVE-Build > Improve upon HIVE-16260 > --- > > Key: HIVE-17323 > URL: https://issues.apache.org/jira/browse/HIVE-17323 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > Attachments: HIVE-17323.1.patch > > > HIVE-16260 allows removal of parallel edges of semijoin with mapjoins. 
> https://issues.apache.org/jira/browse/HIVE-16260 > However, it should also consider dynamic partition pruning edge like semijoin > without removing it while traversing the query tree. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17386) support LLAP workload management in HS2 (low level only)
[ https://issues.apache.org/jira/browse/HIVE-17386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144377#comment-16144377 ] Hive QA commented on HIVE-17386: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12884130/HIVE-17386.01.only.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6567/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6567/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6567/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-08-28 21:26:47.307 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-6567/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2017-08-28 21:26:47.309 + cd apache-github-source-source + git fetch origin >From https://github.com/apache/hive e352ef4..dd04a92 master -> origin/master + git reset --hard HEAD HEAD is now at e352ef4 HIVE-17309 alter partition onto a table not in current database throw InvalidOperationException (Wang Haihua via Alan Gates) + git clean -f -d + git checkout master Already on 'master' Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded. (use "git pull" to update your local branch) + git reset --hard origin/master HEAD is now at dd04a92 HIVE-17375: stddev_samp,var_samp standard compliance (Zoltan Haindrich, reviewed by Ashutosh Chauhan) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-08-28 21:26:50.913 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch error: patch failed: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:3155 error: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: patch does not apply error: patch failed: llap-client/src/java/org/apache/hadoop/hive/llap/tez/LlapProtocolClientProxy.java:71 error: llap-client/src/java/org/apache/hadoop/hive/llap/tez/LlapProtocolClientProxy.java: patch does not apply error: llap-common/src/gen/protobuf/gen-java/org/apache/hadoop/hive/llap/plugin/rpc/LlapPluginProtocolProtos.java: No such file or directory error: llap-common/src/java/org/apache/hadoop/hive/llap/impl/LlapPluginProtocolClientImpl.java: No such file or directory error: llap-common/src/java/org/apache/hadoop/hive/llap/impl/ProtobufProxy.java: No such file or directory error: llap-common/src/protobuf/LlapPluginProtocol.proto: 
No such file or directory error: patch failed: llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java:243 error: llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java: patch does not apply error: llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/endpoint/LlapPluginServerImpl.java: No such file or directory The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12884130 - PreCommit-HIVE-Build > support LLAP workload management in HS2 (low level only) > > > Key: HIVE-17386 > URL: https://issues.apache.org/jira/browse/HIVE-17386 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17386.01.only.patch, HIVE-17386.01.patch, > HIVE-17386.only.patch, HIVE-17386.patch > > > This makes use of HIVE-17297 and creates building blocks for workload > management policies, etc. > For now, there are no policies - a single yarn queue is
[jira] [Assigned] (HIVE-17401) Hive session idle timeout doesn't function properly
[ https://issues.apache.org/jira/browse/HIVE-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang reassigned HIVE-17401: -- > Hive session idle timeout doesn't function properly > --- > > Key: HIVE-17401 > URL: https://issues.apache.org/jira/browse/HIVE-17401 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > > It's apparent in our production environment that HS2 leaks sessions, which at > least contributed to memory leaks in HS2. We further found that idle HS2 > sessions rarely get timed out and the number of live session keeps increasing > as time goes on. Eventually, HS2 becomes irresponsive and demands a restart. > Investigation shows that session idle timeout doesn't work appropriately. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17386) support LLAP workload management in HS2 (low level only)
[ https://issues.apache.org/jira/browse/HIVE-17386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17386: Attachment: HIVE-17386.01.only.patch HIVE-17386.01.patch > support LLAP workload management in HS2 (low level only) > > > Key: HIVE-17386 > URL: https://issues.apache.org/jira/browse/HIVE-17386 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17386.01.only.patch, HIVE-17386.01.patch, > HIVE-17386.only.patch, HIVE-17386.patch > > > This makes use of HIVE-17297 and creates building blocks for workload > management policies, etc. > For now, there are no policies - a single yarn queue is designated for all > LLAP query AMs, and the capacity is distributed equally. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17386) support LLAP workload management in HS2 (low level only)
[ https://issues.apache.org/jira/browse/HIVE-17386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17386: Attachment: (was: HIVE-17386.01.patch) > support LLAP workload management in HS2 (low level only) > > > Key: HIVE-17386 > URL: https://issues.apache.org/jira/browse/HIVE-17386 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17386.01.only.patch, HIVE-17386.01.patch, > HIVE-17386.only.patch, HIVE-17386.patch > > > This makes use of HIVE-17297 and creates building blocks for workload > management policies, etc. > For now, there are no policies - a single yarn queue is designated for all > LLAP query AMs, and the capacity is distributed equally. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17386) support LLAP workload management in HS2 (low level only)
[ https://issues.apache.org/jira/browse/HIVE-17386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17386: Attachment: HIVE-17386.01.patch Updated the patch to handle concurrent changes. > support LLAP workload management in HS2 (low level only) > > > Key: HIVE-17386 > URL: https://issues.apache.org/jira/browse/HIVE-17386 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17386.01.patch, HIVE-17386.only.patch, > HIVE-17386.patch > > > This makes use of HIVE-17297 and creates building blocks for workload > management policies, etc. > For now, there are no policies - a single yarn queue is designated for all > LLAP query AMs, and the capacity is distributed equally. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17386) support LLAP workload management in HS2 (low level only)
[ https://issues.apache.org/jira/browse/HIVE-17386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17386: Attachment: HIVE-17386.01.only.patch > support LLAP workload management in HS2 (low level only) > > > Key: HIVE-17386 > URL: https://issues.apache.org/jira/browse/HIVE-17386 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17386.01.patch, HIVE-17386.only.patch, > HIVE-17386.patch > > > This makes use of HIVE-17297 and creates building blocks for workload > management policies, etc. > For now, there are no policies - a single yarn queue is designated for all > LLAP query AMs, and the capacity is distributed equally. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17386) support LLAP workload management in HS2 (low level only)
[ https://issues.apache.org/jira/browse/HIVE-17386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17386: Attachment: (was: HIVE-17386.01.only.patch) > support LLAP workload management in HS2 (low level only) > > > Key: HIVE-17386 > URL: https://issues.apache.org/jira/browse/HIVE-17386 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17386.01.patch, HIVE-17386.only.patch, > HIVE-17386.patch > > > This makes use of HIVE-17297 and creates building blocks for workload > management policies, etc. > For now, there are no policies - a single yarn queue is designated for all > LLAP query AMs, and the capacity is distributed equally. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17375) stddev_samp,var_samp standard compliance
[ https://issues.apache.org/jira/browse/HIVE-17375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-17375: Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) pushed to master, thank you Ashutosh for the review! > stddev_samp,var_samp standard compliance > > > Key: HIVE-17375 > URL: https://issues.apache.org/jira/browse/HIVE-17375 > Project: Hive > Issue Type: Sub-task > Components: SQL >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-17375.1.patch, HIVE-17375.2.patch > > > these two udaf-s are returning 0 in case of only one element - however the > stadard requires NULL to be returned -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-16811) Estimate statistics in absence of stats
[ https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144344#comment-16144344 ] Vineet Garg edited comment on HIVE-16811 at 8/28/17 9:02 PM: - Hi [~xuefuz] The estimation is largely done/controlled using configuration parameters and is pure random. e.g. ndv is set to 20% of the number of rows, number of nulls are set to 5% and so on. There is no hard and fast rule for estimation. This is done mostly to prevent planner bailing out and going non-cbo path. was (Author: vgarg): Hi [~xuefu.w...@kodak.com] The estimation is largely done/controlled using configuration parameters and is pure random. e.g. ndv is set to 20% of the number of rows, number of nulls are set to 5% and so on. There is no hard and fast rule for estimation. This is done mostly to prevent planner bailing out and going non-cbo path. > Estimate statistics in absence of stats > --- > > Key: HIVE-16811 > URL: https://issues.apache.org/jira/browse/HIVE-16811 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, > HIVE-16811.1.patch, HIVE-16811.2.patch, HIVE-16811.3.patch, > HIVE-16811.4.patch, HIVE-16811.5.patch, HIVE-16811.6.patch, > HIVE-16811.7.patch, HIVE-16811.8.patch, HIVE-16811.9.patch > > > Currently Join ordering completely bails out in absence of statistics and > this could lead to bad joins such as cross joins. > e.g. following select query will produce cross join. 
> {code:sql} > create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, > S_NATIONKEY INT, > S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING); > CREATE TABLE lineitem (L_ORDERKEY INT, > L_PARTKEY INT, > L_SUPPKEY INT, > L_LINENUMBER INT, > L_QUANTITY DOUBLE, > L_EXTENDEDPRICE DOUBLE, > L_DISCOUNT DOUBLE, > L_TAX DOUBLE, > L_RETURNFLAG STRING, > L_LINESTATUS STRING, > l_shipdate STRING, > L_COMMITDATE STRING, > L_RECEIPTDATE STRING, > L_SHIPINSTRUCT STRING, > L_SHIPMODE STRING, > L_COMMENT STRING) partitioned by (dl int) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '|'; > CREATE TABLE part( > p_partkey INT, > p_name STRING, > p_mfgr STRING, > p_brand STRING, > p_type STRING, > p_size INT, > p_container STRING, > p_retailprice DOUBLE, > p_comment STRING > ); > explain select count(1) from part, supplier, lineitem where p_partkey = > l_partkey and s_suppkey = l_suppkey; > {code} > Estimating stats will prevent the join ordering algorithm from bailing out and help it come up > with a join at least better than a cross join -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16811) Estimate statistics in absence of stats
[ https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144344#comment-16144344 ] Vineet Garg commented on HIVE-16811: Hi [~xuefu.w...@kodak.com] The estimation is largely done/controlled using configuration parameters and is fairly arbitrary. For example, ndv is set to 20% of the number of rows, the number of nulls is set to 5%, and so on. There is no hard and fast rule for estimation. This is done mostly to prevent the planner from bailing out and going down the non-CBO path. > Estimate statistics in absence of stats > --- > > Key: HIVE-16811 > URL: https://issues.apache.org/jira/browse/HIVE-16811 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, > HIVE-16811.1.patch, HIVE-16811.2.patch, HIVE-16811.3.patch, > HIVE-16811.4.patch, HIVE-16811.5.patch, HIVE-16811.6.patch, > HIVE-16811.7.patch, HIVE-16811.8.patch, HIVE-16811.9.patch > > > Currently Join ordering completely bails out in absence of statistics and > this could lead to bad joins such as cross joins. > e.g. following select query will produce cross join. 
> {code:sql} > create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, > S_NATIONKEY INT, > S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING); > CREATE TABLE lineitem (L_ORDERKEY INT, > L_PARTKEY INT, > L_SUPPKEY INT, > L_LINENUMBER INT, > L_QUANTITY DOUBLE, > L_EXTENDEDPRICE DOUBLE, > L_DISCOUNT DOUBLE, > L_TAX DOUBLE, > L_RETURNFLAG STRING, > L_LINESTATUS STRING, > l_shipdate STRING, > L_COMMITDATE STRING, > L_RECEIPTDATE STRING, > L_SHIPINSTRUCT STRING, > L_SHIPMODE STRING, > L_COMMENT STRING) partitioned by (dl int) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '|'; > CREATE TABLE part( > p_partkey INT, > p_name STRING, > p_mfgr STRING, > p_brand STRING, > p_type STRING, > p_size INT, > p_container STRING, > p_retailprice DOUBLE, > p_comment STRING > ); > explain select count(1) from part, supplier, lineitem where p_partkey = > l_partkey and s_suppkey = l_suppkey; > {code} > Estimating stats will prevent the join ordering algorithm from bailing out and help it come up > with a join at least better than a cross join -- This message was sent by Atlassian JIRA (v6.4.14#64029)
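The heuristic described in the comment above (ndv at roughly 20% of the row count, nulls at roughly 5%) amounts to simple arithmetic on the row count. The sketch below is an illustration only: the constant and method names are invented, and Hive reads the actual ratios from configuration parameters rather than hard-coding them:

```java
// Illustrative sketch of heuristic column-stat estimation (invented names;
// the 20%/5% figures come from the comment above, not from Hive's source).
public class StatsEstimateSketch {
    static final long NDV_PERCENT = 20;    // assumed: ndv ~ 20% of row count
    static final long NULLS_PERCENT = 5;   // assumed: nulls ~ 5% of row count

    static long estimateNdv(long numRows)   { return numRows * NDV_PERCENT / 100; }
    static long estimateNulls(long numRows) { return numRows * NULLS_PERCENT / 100; }

    public static void main(String[] args) {
        long numRows = 1_000_000;
        // With no column statistics available, fabricate just enough for the
        // planner to stay on the CBO path instead of bailing out.
        System.out.println("ndv=" + estimateNdv(numRows));     // ndv=200000
        System.out.println("nulls=" + estimateNulls(numRows)); // nulls=50000
    }
}
```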
[jira] [Commented] (HIVE-16811) Estimate statistics in absence of stats
[ https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144331#comment-16144331 ] Xuefu Zhang commented on HIVE-16811: Could we have some high-level description on how we estimate stats if they are missing? The approach seems unclear unless one reads the code. > Estimate statistics in absence of stats > --- > > Key: HIVE-16811 > URL: https://issues.apache.org/jira/browse/HIVE-16811 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, > HIVE-16811.1.patch, HIVE-16811.2.patch, HIVE-16811.3.patch, > HIVE-16811.4.patch, HIVE-16811.5.patch, HIVE-16811.6.patch, > HIVE-16811.7.patch, HIVE-16811.8.patch, HIVE-16811.9.patch > > > Currently Join ordering completely bails out in absence of statistics and > this could lead to bad joins such as cross joins. > e.g. following select query will produce cross join. > {code:sql} > create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, > S_NATIONKEY INT, > S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING); > CREATE TABLE lineitem (L_ORDERKEY INT, > L_PARTKEY INT, > L_SUPPKEY INT, > L_LINENUMBER INT, > L_QUANTITY DOUBLE, > L_EXTENDEDPRICE DOUBLE, > L_DISCOUNT DOUBLE, > L_TAX DOUBLE, > L_RETURNFLAG STRING, > L_LINESTATUS STRING, > l_shipdate STRING, > L_COMMITDATE STRING, > L_RECEIPTDATE STRING, > L_SHIPINSTRUCT STRING, > L_SHIPMODE STRING, > L_COMMENT STRING) partitioned by (dl int) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '|'; > CREATE TABLE part( > p_partkey INT, > p_name STRING, > p_mfgr STRING, > p_brand STRING, > p_type STRING, > p_size INT, > p_container STRING, > p_retailprice DOUBLE, > p_comment STRING > ); > explain select count(1) from part, supplier, lineitem where p_partkey = > l_partkey and s_suppkey = l_suppkey; > {code} > Estimating stats will prevent the join ordering algorithm from bailing out and help it come up > with a join at least better than a cross join --
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17386) support LLAP workload management in HS2 (low level only)
[ https://issues.apache.org/jira/browse/HIVE-17386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17386: Attachment: (was: HIVE-17386.01.only.patch) > support LLAP workload management in HS2 (low level only) > > > Key: HIVE-17386 > URL: https://issues.apache.org/jira/browse/HIVE-17386 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17386.only.patch, HIVE-17386.patch > > > This makes use of HIVE-17297 and creates building blocks for workload > management policies, etc. > For now, there are no policies - a single yarn queue is designated for all > LLAP query AMs, and the capacity is distributed equally. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17386) support LLAP workload management in HS2 (low level only)
[ https://issues.apache.org/jira/browse/HIVE-17386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17386: Attachment: HIVE-17386.01.only.patch > support LLAP workload management in HS2 (low level only) > > > Key: HIVE-17386 > URL: https://issues.apache.org/jira/browse/HIVE-17386 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17386.01.only.patch, HIVE-17386.only.patch, > HIVE-17386.patch > > > This makes use of HIVE-17297 and creates building blocks for workload > management policies, etc. > For now, there are no policies - a single yarn queue is designated for all > LLAP query AMs, and the capacity is distributed equally. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17400) Estimate stats in absence of stats for complex types
[ https://issues.apache.org/jira/browse/HIVE-17400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg reassigned HIVE-17400: -- > Estimate stats in absence of stats for complex types > > > Key: HIVE-17400 > URL: https://issues.apache.org/jira/browse/HIVE-17400 > Project: Hive > Issue Type: Improvement > Components: Query Planning >Reporter: Vineet Garg >Assignee: Vineet Garg > > HIVE-16811 adds support for estimation of stats for primitive types if it > doesn't exist. This JIRA is to extend that support for complex data types. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17323) Improve upon HIVE-16260
[ https://issues.apache.org/jira/browse/HIVE-17323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-17323: -- Attachment: HIVE-17323.1.patch Initial patch. Added a test which ensures the semijoin edge feeding into a map AFTER the DPP is not there. > Improve upon HIVE-16260 > --- > > Key: HIVE-17323 > URL: https://issues.apache.org/jira/browse/HIVE-17323 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > Attachments: HIVE-17323.1.patch > > > HIVE-16260 allows removal of parallel edges of semijoin with mapjoins. > https://issues.apache.org/jira/browse/HIVE-16260 > However, it should also consider dynamic partition pruning edge like semijoin > without removing it while traversing the query tree. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17323) Improve upon HIVE-16260
[ https://issues.apache.org/jira/browse/HIVE-17323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-17323: -- Status: Patch Available (was: In Progress) > Improve upon HIVE-16260 > --- > > Key: HIVE-17323 > URL: https://issues.apache.org/jira/browse/HIVE-17323 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > > HIVE-16260 allows removal of parallel edges of semijoin with mapjoins. > https://issues.apache.org/jira/browse/HIVE-16260 > However, it should also consider dynamic partition pruning edge like semijoin > without removing it while traversing the query tree. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144269#comment-16144269 ] Hive QA commented on HIVE-16886: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12884096/HIVE-16886.3.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 11013 tests executed *Failed tests:* {noformat} TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6565/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6565/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6565/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12884096 - PreCommit-HIVE-Build > HMS log notifications may have duplicated event IDs if multiple HMS are > running concurrently > > > Key: HIVE-16886 > URL: https://issues.apache.org/jira/browse/HIVE-16886 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore >Reporter: Sergio Peña >Assignee: anishek > Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch, > HIVE-16886.2.patch, HIVE-16886.3.patch > > > When running multiple Hive Metastore servers and DB notifications are > enabled, I could see that notifications can be persisted with a duplicated > event ID. > This does not happen when running multiple threads in a single HMS node due > to the locking acquired on the DbNotificationsLog class, but multiple HMS > could cause conflicts. > The issue is in the ObjectStore#addNotificationEvent() method. The event ID > fetched from the datastore is used for the new notification, incremented in > the server itself, then persisted or updated back to the datastore. If 2 > servers read the same ID, then these 2 servers write a new notification with > the same ID. > The event ID is not unique nor a primary key. 
> Here's a test case using the TestObjectStore class that confirms this issue: > {noformat} > @Test > public void testConcurrentAddNotifications() throws ExecutionException, > InterruptedException { > final int NUM_THREADS = 2; > CountDownLatch countIn = new CountDownLatch(NUM_THREADS); > CountDownLatch countOut = new CountDownLatch(1); > HiveConf conf = new HiveConf(); > conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, > MockPartitionExpressionProxy.class.getName()); > ExecutorService executorService = > Executors.newFixedThreadPool(NUM_THREADS); > FutureTask tasks[] = new FutureTask[NUM_THREADS]; > for (int i = 0; i < NUM_THREADS; ++i) { > final int n = i; > tasks[i] = new FutureTask(new Callable() { > @Override > public Void call() throws Exception { > ObjectStore store = new ObjectStore(); > store.setConf(conf); > NotificationEvent dbEvent = > new NotificationEvent(0, 0, > EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n); > System.out.println("ADDING NOTIFICATION"); > countIn.countDown(); > countOut.await(); > store.addNotificationEvent(dbEvent); > System.out.println("FINISH NOTIFICATION"); > return null; > } > }); > executorService.execute(tasks[i]); > } > countIn.await(); > countOut.countDown(); > for (int i = 0; i < NUM_THREADS; ++i) { > tasks[i].get(); > } > NotificationEventResponse eventResponse = > objectStore.getNextNotification(new NotificationEventRequest()); > Assert.assertEquals(2, eventResponse.getEventsSize()); > Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId()); > // This fails because the next notification has an event ID = 1 > Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId()); > } > {noformat} > The last assertion fails because the second event's ID is 1 instead of the expected 2. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
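The read-increment-write race described above can be reproduced outside the metastore with a minimal sketch. This is plain Java, not Hive code: `storedId`, `addNotification`, and the class name are illustrative stand-ins for the notification sequence row and `ObjectStore#addNotificationEvent`. A latch forces both "servers" to read the current ID before either writes back, so both persist event ID 1.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Minimal sketch of the race: the "datastore" ID is read, incremented in
// the server, and written back without a datastore-side lock, so two
// concurrent writers can persist the same event ID.
public class EventIdRace {
    static volatile long storedId = 0;                       // simulates the event-ID sequence row
    static final CountDownLatch bothRead = new CountDownLatch(2);

    static long addNotification() throws InterruptedException {
        long id = storedId;     // 1. read the current ID from the "datastore"
        bothRead.countDown();
        bothRead.await();       // force both servers to read before either writes
        long newId = id + 1;    // 2. increment in the server
        storedId = newId;       // 3. persist back -- no uniqueness check
        return newId;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Callable<Long> server = EventIdRace::addNotification;
        Future<Long> server1 = pool.submit(server);
        Future<Long> server2 = pool.submit(server);
        System.out.println(server1.get() + " " + server2.get()); // prints "1 1"
        pool.shutdown();
    }
}
```

A datastore-level lock on the sequence row (the direction the HIVE-16886.3 patch takes, per the comment below about replacing the Java lock with a DB lock) would block step 1 until the other writer commits.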
[jira] [Assigned] (HIVE-17399) Do not remove semijoin branch if it feeds to TS->DPP_EVENT
[ https://issues.apache.org/jira/browse/HIVE-17399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal reassigned HIVE-17399: - > Do not remove semijoin branch if it feeds to TS->DPP_EVENT > -- > > Key: HIVE-17399 > URL: https://issues.apache.org/jira/browse/HIVE-17399 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > > If there is an incoming semijoin branch to a TS which has DPP event, then try > to keep it as it may serve as an excellent filter for DPP thus reducing the > input to join drastically. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17309) alter partition onto a table not in current database throw InvalidOperationException
[ https://issues.apache.org/jira/browse/HIVE-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-17309: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Patch committed to master. Thanks Wang for the contribution. > alter partition onto a table not in current database throw > InvalidOperationException > > > Key: HIVE-17309 > URL: https://issues.apache.org/jira/browse/HIVE-17309 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 1.2.2, 2.1.1, 2.2.0 >Reporter: Wang Haihua >Assignee: Wang Haihua > Fix For: 3.0.0 > > Attachments: HIVE-17309.1.patch, HIVE-17309.2.patch > > > When executing an alter partition on a table that is not in the current > database, an InvalidOperationException is thrown. > SQL example: > {code} > use default; > ALTER TABLE anotherdb.test_table_for_alter_partition_nocurrentdb > partition(ds='haihua001') CHANGE COLUMN a a_new BOOLEAN; > {code} > This code in {{DDLTask.java}} shows the potential problem: the table name is not > qualified with the database name when {{db.alterPartitions}} is called. 
> {code} > if (allPartitions == null) { > db.alterTable(alterTbl.getOldName(), tbl, alterTbl.getIsCascade(), > alterTbl.getEnvironmentContext()); > } else { > db.alterPartitions(tbl.getTableName(), allPartitions, > alterTbl.getEnvironmentContext()); > } > {code} > stacktrace: > {code} > 2017-07-19T11:06:39,639 INFO [main] metastore.HiveMetaStore: New partition > values:[2017-07-14] > 2017-07-19T11:06:39,654 ERROR [main] metastore.RetryingHMSHandler: > InvalidOperationException(message:alter is not possible) > at > org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:526) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_with_environment_context(HiveMetaStore.java:3560) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99) > at > com.sun.proxy.$Proxy21.alter_partitions_with_environment_context(Unknown > Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_partitions(HiveMetaStoreClient.java:1486) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154) > at com.sun.proxy.$Proxy22.alter_partitions(Unknown Source) > at org.apache.hadoop.hive.ql.metadata.Hive.alterPartitions(Hive.java:712) > at 
org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3338) > at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:368) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2166) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1837) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1713) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1543) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1174) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1164) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) > at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at
[jira] [Commented] (HIVE-17398) Support Costing/Heuristics to enable or disable DPP
[ https://issues.apache.org/jira/browse/HIVE-17398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144222#comment-16144222 ] Janaki Lahorani commented on HIVE-17398: There are scenarios where the cost of doing DPP doesn't cover the benefit provided by DPP. Don't use DPP if it doesn't provide tangible benefit. > Support Costing/Heuristics to enable or disable DPP > --- > > Key: HIVE-17398 > URL: https://issues.apache.org/jira/browse/HIVE-17398 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Janaki Lahorani >Assignee: Janaki Lahorani > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17398) Support Costing/Heuristics to enable or disable DPP
[ https://issues.apache.org/jira/browse/HIVE-17398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Janaki Lahorani reassigned HIVE-17398: -- Assignee: Janaki Lahorani > Support Costing/Heuristics to enable or disable DPP > --- > > Key: HIVE-17398 > URL: https://issues.apache.org/jira/browse/HIVE-17398 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Janaki Lahorani >Assignee: Janaki Lahorani > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17397) vector_outer_join4.q.out explain plan not formatted correctly
[ https://issues.apache.org/jira/browse/HIVE-17397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar reassigned HIVE-17397: -- > vector_outer_join4.q.out explain plan not formatted correctly > - > > Key: HIVE-17397 > URL: https://issues.apache.org/jira/browse/HIVE-17397 > Project: Hive > Issue Type: Bug >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Trivial > > {{vector_outer_join4.q}} uses {{explain vectorization detail formatted}}, > which just dumps a JSON string without any indentation or new lines. > Either there is no option for {{explain vectorization detail formatted}} and > it should just be {{explain vectorization detail}}, or this may be a bug. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17396) Support DPP with map joins where the source and target belong in the same stage
[ https://issues.apache.org/jira/browse/HIVE-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144195#comment-16144195 ] Janaki Lahorani commented on HIVE-17396: HIVE-17225.1 has a potential fix. This will be further enhanced in this JIRA. > Support DPP with map joins where the source and target belong in the same > stage > --- > > Key: HIVE-17396 > URL: https://issues.apache.org/jira/browse/HIVE-17396 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Janaki Lahorani >Assignee: Janaki Lahorani > > When the target of a partition pruning sink operator is not the same as > the target of the hash table sink operator, both source and target get scheduled > within the same spark job, and that can result in a File Not Found Exception. > HIVE-17225 has a fix to disable DPP in that scenario. This JIRA is to > support DPP for such cases. > Test Case: > SET hive.spark.dynamic.partition.pruning=true; > SET hive.auto.convert.join=true; > SET hive.strict.checks.cartesian.product=false; > CREATE TABLE part_table1 (col int) PARTITIONED BY (part1_col int); > CREATE TABLE part_table2 (col int) PARTITIONED BY (part2_col int); > CREATE TABLE reg_table (col int); > ALTER TABLE part_table1 ADD PARTITION (part1_col = 1); > ALTER TABLE part_table2 ADD PARTITION (part2_col = 1); > ALTER TABLE part_table2 ADD PARTITION (part2_col = 2); > INSERT INTO TABLE part_table1 PARTITION (part1_col = 1) VALUES (1); > INSERT INTO TABLE part_table2 PARTITION (part2_col = 1) VALUES (1); > INSERT INTO TABLE part_table2 PARTITION (part2_col = 2) VALUES (2); > INSERT INTO table reg_table VALUES (1), (2), (3), (4), (5), (6); > EXPLAIN SELECT * > FROM part_table1 pt1, > part_table2 pt2, > reg_table rt > WHERE rt.col = pt1.part1_col > AND pt2.part2_col = pt1.part1_col; > Plan: > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-1 depends on stages: Stage-2 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > A masked pattern was here > 
Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: pt1 > Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE > Column stats: NONE > Select Operator > expressions: col (type: int), part1_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 1 Data size: 1 Basic stats: > COMPLETE Column stats: NONE > Spark HashTable Sink Operator > keys: > 0 _col1 (type: int) > 1 _col1 (type: int) > 2 _col0 (type: int) > Select Operator > expressions: _col1 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 1 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 1 Basic stats: > COMPLETE Column stats: NONE > Spark Partition Pruning Sink Operator > Target column: part2_col (int) > partition key expr: part2_col > Statistics: Num rows: 1 Data size: 1 Basic stats: > COMPLETE Column stats: NONE > target work: Map 2 > Local Work: > Map Reduce Local Work > Map 2 > Map Operator Tree: > TableScan > alias: pt2 > Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE > Column stats: NONE > Select Operator > expressions: col (type: int), part2_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 2 Data size: 2 Basic stats: > COMPLETE Column stats: NONE > Spark HashTable Sink Operator > keys: > 0 _col1 (type: int) > 1 _col1 (type: int) > 2 _col0 (type: int) > Local Work: > Map Reduce Local Work > Stage: Stage-1 > Spark > A masked pattern was here > Vertices: > Map 3 > Map Operator Tree: > TableScan > alias: rt > Statistics: Num rows: 6 Data size: 6 Basic stats: COMPLETE > Column stats: NONE > Filter
[jira] [Updated] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] anishek updated HIVE-16886: --- Attachment: HIVE-16886.3.patch fixing errors for derby schema and removing the java lock for updating notification logs since we do a db lock for multiple HMS > HMS log notifications may have duplicated event IDs if multiple HMS are > running concurrently > > > Key: HIVE-16886 > URL: https://issues.apache.org/jira/browse/HIVE-16886 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore >Reporter: Sergio Peña >Assignee: anishek > Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch, > HIVE-16886.2.patch, HIVE-16886.3.patch > > > When running multiple Hive Metastore servers and DB notifications are > enabled, I could see that notifications can be persisted with a duplicated > event ID. > This does not happen when running multiple threads in a single HMS node due > to the locking acquired on the DbNotificationsLog class, but multiple HMS > could cause conflicts. > The issue is in the ObjectStore#addNotificationEvent() method. The event ID > fetched from the datastore is used for the new notification, incremented in > the server itself, then persisted or updated back to the datastore. If 2 > servers read the same ID, then these 2 servers write a new notification with > the same ID. > The event ID is not unique nor a primary key. 
> Here's a test case using the TestObjectStore class that confirms this issue: > {noformat} > @Test > public void testConcurrentAddNotifications() throws ExecutionException, > InterruptedException { > final int NUM_THREADS = 2; > CountDownLatch countIn = new CountDownLatch(NUM_THREADS); > CountDownLatch countOut = new CountDownLatch(1); > HiveConf conf = new HiveConf(); > conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, > MockPartitionExpressionProxy.class.getName()); > ExecutorService executorService = > Executors.newFixedThreadPool(NUM_THREADS); > FutureTask tasks[] = new FutureTask[NUM_THREADS]; > for (int i = 0; i < NUM_THREADS; ++i) { > final int n = i; > tasks[i] = new FutureTask(new Callable() { > @Override > public Void call() throws Exception { > ObjectStore store = new ObjectStore(); > store.setConf(conf); > NotificationEvent dbEvent = > new NotificationEvent(0, 0, > EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n); > System.out.println("ADDING NOTIFICATION"); > countIn.countDown(); > countOut.await(); > store.addNotificationEvent(dbEvent); > System.out.println("FINISH NOTIFICATION"); > return null; > } > }); > executorService.execute(tasks[i]); > } > countIn.await(); > countOut.countDown(); > for (int i = 0; i < NUM_THREADS; ++i) { > tasks[i].get(); > } > NotificationEventResponse eventResponse = > objectStore.getNextNotification(new NotificationEventRequest()); > Assert.assertEquals(2, eventResponse.getEventsSize()); > Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId()); > // This fails because the next notification has an event ID = 1 > Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId()); > } > {noformat} > The last assertion fails because the second event's ID is 1 instead of the expected 2. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17396) Support DPP with map joins where the source and target belong in the same stage
[ https://issues.apache.org/jira/browse/HIVE-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Janaki Lahorani updated HIVE-17396: --- Description: When the target of a partition pruning sink operator is not the same as the target of the hash table sink operator, both source and target get scheduled within the same spark job, and that can result in a File Not Found Exception. HIVE-17225 has a fix to disable DPP in that scenario. This JIRA is to support DPP for such cases. Test Case: SET hive.spark.dynamic.partition.pruning=true; SET hive.auto.convert.join=true; SET hive.strict.checks.cartesian.product=false; CREATE TABLE part_table1 (col int) PARTITIONED BY (part1_col int); CREATE TABLE part_table2 (col int) PARTITIONED BY (part2_col int); CREATE TABLE reg_table (col int); ALTER TABLE part_table1 ADD PARTITION (part1_col = 1); ALTER TABLE part_table2 ADD PARTITION (part2_col = 1); ALTER TABLE part_table2 ADD PARTITION (part2_col = 2); INSERT INTO TABLE part_table1 PARTITION (part1_col = 1) VALUES (1); INSERT INTO TABLE part_table2 PARTITION (part2_col = 1) VALUES (1); INSERT INTO TABLE part_table2 PARTITION (part2_col = 2) VALUES (2); INSERT INTO table reg_table VALUES (1), (2), (3), (4), (5), (6); EXPLAIN SELECT * FROM part_table1 pt1, part_table2 pt2, reg_table rt WHERE rt.col = pt1.part1_col AND pt2.part2_col = pt1.part1_col; Plan: STAGE DEPENDENCIES: Stage-2 is a root stage Stage-1 depends on stages: Stage-2 Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-2 Spark A masked pattern was here Vertices: Map 1 Map Operator Tree: TableScan alias: pt1 Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: col (type: int), part1_col (type: int) outputColumnNames: _col0, _col1 Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE Spark HashTable Sink Operator keys: 0 _col1 (type: int) 1 _col1 (type: int) 2 _col0 (type: int) Select Operator expressions: _col1 
(type: int) outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE Group By Operator keys: _col0 (type: int) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE Spark Partition Pruning Sink Operator Target column: part2_col (int) partition key expr: part2_col Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE target work: Map 2 Local Work: Map Reduce Local Work Map 2 Map Operator Tree: TableScan alias: pt2 Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: col (type: int), part2_col (type: int) outputColumnNames: _col0, _col1 Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE Spark HashTable Sink Operator keys: 0 _col1 (type: int) 1 _col1 (type: int) 2 _col0 (type: int) Local Work: Map Reduce Local Work Stage: Stage-1 Spark A masked pattern was here Vertices: Map 3 Map Operator Tree: TableScan alias: rt Statistics: Num rows: 6 Data size: 6 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: col is not null (type: boolean) Statistics: Num rows: 6 Data size: 6 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: col (type: int) outputColumnNames: _col0 Statistics: Num rows: 6 Data size: 6 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 Inner Join 0 to 2 keys: 0 _col1 (type: int) 1 _col1 (type: int) 2 _col0
[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths and Get-Input-Summary thread pool
[ https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-16949: Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the review [~vihangk1]. Merged to master. > Leak of threads from Get-Input-Paths and Get-Input-Summary thread pool > -- > > Key: HIVE-16949 > URL: https://issues.apache.org/jira/browse/HIVE-16949 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Birger Brunswiek >Assignee: Sahil Takiar > Attachments: HIVE-16949.1.patch > > > The commit > [20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3] > which was part of HIVE-15546 [introduced a thread > pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109] > which is not shutdown upon completion of its threads. This leads to a leak > of threads for each query which uses more than 1 partition. They are not > removed automatically. When queries spanning multiple partitions are made the > number of threads increases and is never reduced. On my machine hiveserver2 > starts to get slower and slower once 10k threads are reached. > Thread pools only shutdown automatically in special circumstances (see > [documentation section > _Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]). > This is not currently the case for the Get-Input-Paths thread pool. I would > add a _pool.shutdown()_ in a finally block just before returning the result > to make sure the threads are really shutdown. > My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
> This prevents the thread pool from being spawned > [\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118] > > [\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107]. > The same issue probably also applies to the [Get-Input-Summary thread > pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
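The fix proposed in the report, a `pool.shutdown()` in a finally block just before returning the result, can be sketched as follows. The method and variable names here are illustrative, not the actual `Utilities.getInputPaths` internals:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: shut the pool down in a finally block so its worker threads do
// not outlive the call, even when a task throws.
public class PoolShutdownSketch {
    static List<String> listPaths(List<String> partitions, int maxThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String partition : partitions) {
                futures.add(pool.submit(() -> "listed:" + partition)); // stand-in for path listing
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown(); // lets submitted tasks finish, then reclaims the threads
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(listPaths(List.of("part1_col=1", "part2_col=1"), 2));
    }
}
```

Without the finally block, each call leaks `maxThreads` idle threads, which matches the reported slowdown once thousands of queries have run.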
[jira] [Commented] (HIVE-17393) AMReporter need hearbeat every external 'AM'
[ https://issues.apache.org/jira/browse/HIVE-17393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144150#comment-16144150 ] Sergey Shelukhin commented on HIVE-17393: - nit: getAMNodeInfo gets the same thing twice from the hashmap. Why did AMNodeInfo cease being static? > AMReporter need hearbeat every external 'AM' > > > Key: HIVE-17393 > URL: https://issues.apache.org/jira/browse/HIVE-17393 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Fix For: 3.0.0 > > Attachments: HIVE-17393.1.patch, HIVE-17393.2.patch > > > AMReporter only remembers the first AM that submitted the query and heartbeats to it. > In the case of an external client, there might be multiple 'AM's, and every one of them > needs node heartbeats. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17396) Support DPP with map joins where the source and target belong in the same stage
[ https://issues.apache.org/jira/browse/HIVE-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Janaki Lahorani reassigned HIVE-17396: -- Assignee: Janaki Lahorani > Support DPP with map joins where the source and target belong in the same > stage > --- > > Key: HIVE-17396 > URL: https://issues.apache.org/jira/browse/HIVE-17396 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Janaki Lahorani >Assignee: Janaki Lahorani > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17380) refactor LlapProtocolClientProxy to be usable with other protocols
[ https://issues.apache.org/jira/browse/HIVE-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144135#comment-16144135 ] Sergey Shelukhin commented on HIVE-17380: - Not really, it's an internal change to use some code. > refactor LlapProtocolClientProxy to be usable with other protocols > -- > > Key: HIVE-17380 > URL: https://issues.apache.org/jira/browse/HIVE-17380 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 3.0.0 > > Attachments: HIVE-17380.patch, HIVE-17380.patch > > > This basically moves a bunch of code into a generic async PB RPC proxy, in > llap-common for now. Moving to common would require one to move LlapNodeId, > which can be done later. > The only logic change is that the concurrent hash map, which never expires entries, is > replaced by a Guava cache. A path to shut down a proxy is added, but does > nothing. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17048) Pass HiveOperation info to HiveSemanticAnalyzerHook through HiveSemanticAnalyzerHookContext
[ https://issues.apache.org/jira/browse/HIVE-17048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-17048: --- Fix Version/s: 2.4.0 2.1.2 2.3.1 2.2.1 > Pass HiveOperation info to HiveSemanticAnalyzerHook through > HiveSemanticAnalyzerHookContext > --- > > Key: HIVE-17048 > URL: https://issues.apache.org/jira/browse/HIVE-17048 > Project: Hive > Issue Type: Improvement > Components: Hooks >Affects Versions: 2.1.1 >Reporter: Aihua Xu >Assignee: Aihua Xu > Fix For: 2.1.2, 3.0.0, 2.4.0, 2.2.1, 2.3.1 > > Attachments: HIVE-17048.1.patch, HIVE-17048.2.patch > > > Currently hive passes the following info to HiveSemanticAnalyzerHook through > HiveSemanticAnalyzerHookContext (see > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L553). > But the operation type (HiveOperation) is also needed in some cases, e.g., > when integrating with Sentry. > {noformat} > hookCtx.setConf(conf); > hookCtx.setUserName(userName); > hookCtx.setIpAddress(SessionState.get().getUserIpAddress()); > hookCtx.setCommand(command); > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17385) Fix incremental repl error for non-native tables
[ https://issues.apache.org/jira/browse/HIVE-17385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144111#comment-16144111 ] Daniel Dai commented on HIVE-17385: --- Yes, we shall use the same logic for both bootstrap and incremental dump, nice catch. However, shall we simply skip non-native tables instead of throwing an exception? Also, why do we do a null check in ImportSemanticAnalyzer? Do you see a case where we load a null table? > Fix incremental repl error for non-native tables > > > Key: HIVE-17385 > URL: https://issues.apache.org/jira/browse/HIVE-17385 > Project: Hive > Issue Type: Bug > Components: repl >Reporter: Tao Li >Assignee: Tao Li > Attachments: HIVE-17385.1.patch, HIVE-17385.2.patch, > HIVE-17385.3.patch, HIVE-17385.4.patch > > > See the below error with incremental replication for non-native (storage handler > based) tables. The bug is that we are not checking whether a table should be > dumped/exported during incremental dump. > 2017-08-02T12:31:48,195 ERROR [HiveServer2-Background-Pool: Thread-8078]: > exec.DDLTask (DDLTask.java:failed(632)) - > org.apache.hadoop.hive.ql.metadata.HiveException: > MetaException(message:LOCATION may not be specified for HBase.) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
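The "skip instead of throw" alternative raised in the comment above might look like the following sketch. This is plain Java, not the actual repl dump code: `Table`, `nonNative`, and `tablesToDump` are hypothetical stand-ins used only to illustrate the filtering choice.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: filter non-native (storage-handler backed) tables out of the
// dump list instead of failing the whole incremental dump.
public class DumpFilterSketch {
    static final class Table {
        final String name;
        final boolean nonNative; // true for e.g. HBase-backed tables
        Table(String name, boolean nonNative) { this.name = name; this.nonNative = nonNative; }
    }

    // The same check would apply to both bootstrap and incremental dump.
    static List<Table> tablesToDump(List<Table> all) {
        List<Table> out = new ArrayList<>();
        for (Table t : all) {
            if (t.nonNative) {
                System.out.println("skipping non-native table: " + t.name);
                continue; // skip rather than throw
            }
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Table> all = List.of(new Table("orders", false), new Table("hbase_idx", true));
        System.out.println(tablesToDump(all).size()); // prints 1
    }
}
```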
[jira] [Commented] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row
[ https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144090#comment-16144090 ] Ratandeep Ratti commented on HIVE-17394: I've found this problem with Hive-1.1. Didn't look too closely at Hive-2.x / trunk, but from a high level, looking at the code, it seems the problem also exists there. > AvroSerde is regenerating TypeInfo objects for each nullable Avro field for > every row > - > > Key: HIVE-17394 > URL: https://issues.apache.org/jira/browse/HIVE-17394 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Ratandeep Ratti > Attachments: AvroSerDe.nps, AvroSerDeUnionTypeInfo.png > > > The following methods in {{AvroDeserializer}} keep regenerating TypeInfo > objects for every nullable field in a row. > This is happening in the following methods. > {code} > private Object deserializeNullableUnion(Object datum, Schema fileSchema, > Schema recordSchema) throws AvroSerdeException { > // elided > line 312: return worker(datum, fileSchema, newRecordSchema, > SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null)); > } > .. > private Object deserializeSingleItemNullableUnion(Object datum, Schema fileSchema, Schema > recordSchema) > // elided > line 357: return worker(datum, currentFileSchema, schema, > SchemaToTypeInfo.generateTypeInfo(schema, null)); > {code} > This is really bad in terms of performance. I'm not sure why we didn't use > the TypeInfo we already have instead of generating it again for each nullable > field. If you look at the {{worker}} method which calls the method > {{deserializeNullableUnion}}, the typeInfo corresponding to the nullable field > column is already determined. Not sure why we have to determine that > information again. > Moreover, the cache in SchemaToTypeInfo does not help in the nullable Avro records > case, as checking if an Avro record schema object already exists in the cache > requires traversing all the fields in the record schema. 
> I've attached a profiling snapshot which shows the maximum time is being spent in > the cache. > One way of fixing this IMO is to make use of the column TypeInfo which is > already passed in the worker method. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
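The per-row regeneration described above, and the cheaper reuse, can be illustrated with a small identity-keyed memoization sketch. `Schema` and `TypeInfo` here are stand-ins for the Avro and Hive classes, and `generateTypeInfo` for the expensive `SchemaToTypeInfo` traversal; the actual fix suggested in the report is to pass down the column TypeInfo already computed in `worker`.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Sketch: memoize the expensive schema-to-TypeInfo conversion keyed by
// object identity, so per-row lookups do not traverse every field of the
// record schema the way an equality-keyed cache must.
public class TypeInfoReuseSketch {
    static final class Schema { final String name; Schema(String name) { this.name = name; } }
    static final class TypeInfo { final String desc; TypeInfo(String desc) { this.desc = desc; } }

    static int generateCalls = 0;

    // Stands in for SchemaToTypeInfo.generateTypeInfo: a deep traversal
    // of the record schema in the real code.
    static TypeInfo generateTypeInfo(Schema s) {
        generateCalls++;
        return new TypeInfo("typeinfo:" + s.name);
    }

    // Identity-keyed cache: lookup is a reference comparison, not a
    // field-by-field schema comparison.
    static final Map<Schema, TypeInfo> CACHE = new IdentityHashMap<>();

    static TypeInfo typeInfoFor(Schema s) {
        return CACHE.computeIfAbsent(s, TypeInfoReuseSketch::generateTypeInfo);
    }

    public static void main(String[] args) {
        Schema nullableUnion = new Schema("union<null,int>");
        for (int row = 0; row < 100_000; row++) {
            typeInfoFor(nullableUnion); // cache hit after the first row
        }
        System.out.println(generateCalls); // prints 1: generated once, not per row
    }
}
```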
[jira] [Updated] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row
[ https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ratandeep Ratti updated HIVE-17394: --- Description: The following methods in {{AvroDeserializer}} keep regenerating TypeInfo objects for every nullable field in a row. This is happening in the following methods. {code} private Object deserializeNullableUnion(Object datum, Schema fileSchema, Schema recordSchema) throws AvroSerdeException { // elided line 312: return worker(datum, fileSchema, newRecordSchema, SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null)); } .. private Object deserializeSingleItemNullableUnion(Object datum, Schema Schema recordSchema) // elided line 357: return worker(datum, currentFileSchema, schema, SchemaToTypeInfo.generateTypeInfo(schema, null)); {code} This is really bad in terms of performance. I'm not sure why didn't we use the TypeInfo we already have instead of generating again for each nullable field. If you look at the {{worker}} method which calls the method {{deserializeNullableUnion}} the typeInfo corresponding to the nullable field column is already determined. Not sure why we have to determine that information again. Moreover the cache in SchmaToTypeInfo does not help in nullable Avro records case as checking if an Avro record schema object already exists in the cache requires traversing all the fields in the record schema. I've attached profiling snapshot which shows maximum time is being spent in the cache. One way of fixing this IMO is to make use of the column TypeInfo which is already passed in the worker method. was: The following methods in {{AvroDeserializer}} keep regenerating TypeInfo objects for every nullable field in a row. This is happening in the following methods. 
{code} private Object deserializeNullableUnion(Object datum, Schema fileSchema, Schema recordSchema) throws AvroSerdeException { // elided line 312: return worker(datum, fileSchema, newRecordSchema, SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null)); } .. private Object deserializeSingleItemNullableUnion(Object datum, Schema fileSchema, Schema recordSchema) // elided line 357: return worker(datum, currentFileSchema, schema, SchemaToTypeInfo.generateTypeInfo(schema, null)); {code} This is really bad in terms of performance. I'm not sure why we didn't use the TypeInfo we already have instead of regenerating it for each nullable field. If you look at the {{worker}} method which calls {{deserializeNullableUnion}}, the TypeInfo corresponding to the nullable field column is already determined. Not sure why we have to determine that information again. Moreover, the cache in SchemaToTypeInfo does not help in the nullable Avro record case, as checking whether an Avro record schema object already exists in the cache requires traversing all the fields in the record schema. I've attached a profiling snapshot which shows that the maximum time is being spent in the cache. One way of fixing this, IMO, is to make use of the column TypeInfo which is already passed into the worker method. > AvroSerde is regenerating TypeInfo objects for each nullable Avro field for > every row > - > > Key: HIVE-17394 > URL: https://issues.apache.org/jira/browse/HIVE-17394 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Ratandeep Ratti > Attachments: AvroSerDe.nps, AvroSerDeUnionTypeInfo.png > > > The following methods in {{AvroDeserializer}} keep regenerating TypeInfo > objects for every nullable field in a row. > This is happening in the following methods. 
> {code} > private Object deserializeNullableUnion(Object datum, Schema fileSchema, > Schema recordSchema) throws AvroSerdeException { > // elided > line 312: return worker(datum, fileSchema, newRecordSchema, > SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null)); > } > .. > private Object deserializeSingleItemNullableUnion(Object datum, Schema fileSchema, Schema > recordSchema) > // elided > line 357: return worker(datum, currentFileSchema, schema, > SchemaToTypeInfo.generateTypeInfo(schema, null)); > {code} > This is really bad in terms of performance. I'm not sure why we didn't use > the TypeInfo we already have instead of regenerating it for each nullable > field. If you look at the {{worker}} method which calls > {{deserializeNullableUnion}}, the TypeInfo corresponding to the nullable field > column is already determined. Not sure why we have to determine that > information again. > Moreover, the cache in SchemaToTypeInfo does not help in the nullable Avro record > case, as checking whether an Avro record schema object already exists in the cache > requires traversing all the fields in the record schema. > I've attached profiling
[jira] [Updated] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row
[ https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ratandeep Ratti updated HIVE-17394: --- Attachment: AvroSerDeUnionTypeInfo.png AvroSerDe.nps > AvroSerde is regenerating TypeInfo objects for each nullable Avro field for > every row > - > > Key: HIVE-17394 > URL: https://issues.apache.org/jira/browse/HIVE-17394 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Ratandeep Ratti > Attachments: AvroSerDe.nps, AvroSerDeUnionTypeInfo.png > > > The following methods in {{AvroDeserializer}} keep regenerating TypeInfo > objects for every nullable field in a row. > This is happening in the following methods. > {code} > private Object deserializeNullableUnion(Object datum, Schema fileSchema, > Schema recordSchema) throws AvroSerdeException { > // elided > line 312: return worker(datum, fileSchema, newRecordSchema, > SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null)); > } > .. > private Object deserializeSingleItemNullableUnion(Object datum, Schema fileSchema, Schema > recordSchema) > // elided > line 357: return worker(datum, currentFileSchema, schema, > SchemaToTypeInfo.generateTypeInfo(schema, null)); > {code} > This is really bad in terms of performance. I'm not sure why we didn't use > the TypeInfo we already have instead of regenerating it for each nullable > field. If you look at the {{worker}} method which calls > {{deserializeNullableUnion}}, the TypeInfo corresponding to the nullable field > column is already determined. Not sure why we have to determine that > information again. > Moreover, the cache in SchemaToTypeInfo does not help in the nullable Avro record case > as checking whether an Avro record schema object already exists in the cache > requires traversing all the fields in the record schema. > I've attached a profiling snapshot which shows that the maximum time is being spent in > the cache. 
> One way of fixing this, IMO, is to make use of the column TypeInfo which is > already passed into the worker method. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17366) Constraint replication in bootstrap
[ https://issues.apache.org/jira/browse/HIVE-17366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144082#comment-16144082 ] ASF GitHub Bot commented on HIVE-17366: --- GitHub user daijyc opened a pull request: https://github.com/apache/hive/pull/236 HIVE-17366: Constraint replication in bootstrap You can merge this pull request into a Git repository by running: $ git pull https://github.com/daijyc/hive HIVE-17366 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/236.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #236 commit 44ab859e7d1041915fd910eda6fced270938fa56 Author: Daniel Dai Date: 2017-08-28T17:19:29Z HIVE-17366: Constraint replication in bootstrap > Constraint replication in bootstrap > --- > > Key: HIVE-17366 > URL: https://issues.apache.org/jira/browse/HIVE-17366 > Project: Hive > Issue Type: New Feature > Components: repl >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-17366.1.patch > > > Incremental constraint replication is tracked in HIVE-15705. This is to track > the bootstrap replication. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row
[ https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ratandeep Ratti updated HIVE-17394: --- Summary: AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row (was: AvroSerde is regenerating TypeInfo objects for each nullable Avro field in a row) > AvroSerde is regenerating TypeInfo objects for each nullable Avro field for > every row > - > > Key: HIVE-17394 > URL: https://issues.apache.org/jira/browse/HIVE-17394 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Ratandeep Ratti > > The following methods in {{AvroDeserializer}} keep regenerating TypeInfo > objects for every nullable field in a row. > This is happening in the following methods. > {code} > private Object deserializeNullableUnion(Object datum, Schema fileSchema, > Schema recordSchema) throws AvroSerdeException { > // elided > line 312: return worker(datum, fileSchema, newRecordSchema, > SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null)); > } > .. > private Object deserializeSingleItemNullableUnion(Object datum, Schema fileSchema, Schema > recordSchema) > // elided > line 357: return worker(datum, currentFileSchema, schema, > SchemaToTypeInfo.generateTypeInfo(schema, null)); > {code} > This is really bad in terms of performance. I'm not sure why we didn't use > the TypeInfo we already have instead of regenerating it for each nullable > field. If you look at the {{worker}} method which calls > {{deserializeNullableUnion}}, the TypeInfo corresponding to the nullable field > column is already determined. Not sure why we have to determine that > information again. > Moreover, the cache in SchemaToTypeInfo does not help in the nullable Avro record case > as checking whether an Avro record schema object already exists in the cache > requires traversing all the fields in the record schema. > I've attached a profiling snapshot which shows that the maximum time is being spent in > the cache. 
> One way of fixing this, IMO, is to make use of the column TypeInfo which is > already passed into the worker method. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
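Both fix directions mentioned above — reusing the TypeInfo already passed into {{worker}}, or making the cache lookup cheap — rest on the same observation: Avro's Schema equals()/hashCode() traverse every field, so a value-keyed cache pays that traversal on every row. The sketch below shows an identity-keyed cache that avoids the traversal; the class and names are hypothetical stand-ins, not Hive's actual SchemaToTypeInfo API:

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: key the cache on schema *identity*, so a lookup is a
// reference comparison instead of a deep equals()/hashCode() over all fields.
class TypeInfoCache<S, T> {
    private final Map<S, T> byIdentity = new IdentityHashMap<>();
    private final Function<S, T> generator;

    TypeInfoCache(Function<S, T> generator) {
        this.generator = generator;
    }

    synchronized T get(S schema) {
        // Generate at most once per distinct schema *instance*.
        return byIdentity.computeIfAbsent(schema, generator);
    }
}

public class Demo {
    static int calls = 0;

    public static void main(String[] args) {
        TypeInfoCache<String, String> cache =
            new TypeInfoCache<>(s -> { calls++; return "typeinfo:" + s; });
        String schema = "union{null,int}"; // one schema instance, reused per row
        for (int i = 0; i < 1000; i++) {
            cache.get(schema);             // simulates one call per row
        }
        System.out.println(calls);         // the generator ran only once
    }
}
```

The simpler fix suggested in the issue — passing the already-known column TypeInfo down from {{worker}} — avoids the cache entirely; the identity cache is only an alternative when the caller does not already hold the TypeInfo.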
[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143952#comment-16143952 ] anishek commented on HIVE-16886: [~lina.li] changes to the notification log are in nested transactions, with the enclosing transaction executing the necessary statements for the user-initiated operations; it's as you have stated in your example: { Execute path SQL statements, Persist notification log }. > HMS log notifications may have duplicated event IDs if multiple HMS are > running concurrently > > > Key: HIVE-16886 > URL: https://issues.apache.org/jira/browse/HIVE-16886 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore >Reporter: Sergio Peña >Assignee: anishek > Attachments: datastore-identity-holes.diff, HIVE-16886.1.patch, > HIVE-16886.2.patch > > > When running multiple Hive Metastore servers and DB notifications are > enabled, I could see that notifications can be persisted with a duplicated > event ID. > This does not happen when running multiple threads in a single HMS node due > to the locking acquired on the DbNotificationsLog class, but multiple HMS > could cause conflicts. > The issue is in the ObjectStore#addNotificationEvent() method. The event ID > fetched from the datastore is used for the new notification, incremented in > the server itself, then persisted or updated back to the datastore. If 2 > servers read the same ID, then these 2 servers write a new notification with > the same ID. > The event ID is neither unique nor a primary key. 
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
> public void testConcurrentAddNotifications() throws ExecutionException, InterruptedException {
>   final int NUM_THREADS = 2;
>   CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
>   CountDownLatch countOut = new CountDownLatch(1);
>   HiveConf conf = new HiveConf();
>   conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, MockPartitionExpressionProxy.class.getName());
>   ExecutorService executorService = Executors.newFixedThreadPool(NUM_THREADS);
>   FutureTask<Void> tasks[] = new FutureTask[NUM_THREADS];
>   for (int i = 0; i < NUM_THREADS; ++i) {
>     final int n = i;
>     tasks[i] = new FutureTask<Void>(new Callable<Void>() {
>       @Override
>       public Void call() throws Exception {
>         ObjectStore store = new ObjectStore();
>         store.setConf(conf);
>         NotificationEvent dbEvent = new NotificationEvent(0, 0, EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>         System.out.println("ADDING NOTIFICATION");
>         countIn.countDown();
>         countOut.await();
>         store.addNotificationEvent(dbEvent);
>         System.out.println("FINISH NOTIFICATION");
>         return null;
>       }
>     });
>     executorService.execute(tasks[i]);
>   }
>   countIn.await();
>   countOut.countDown();
>   for (int i = 0; i < NUM_THREADS; ++i) {
>     tasks[i].get();
>   }
>   NotificationEventResponse eventResponse = objectStore.getNextNotification(new NotificationEventRequest());
>   Assert.assertEquals(2, eventResponse.getEventsSize());
>   Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
>   // This fails because the next notification has an event ID = 1
>   Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
> }
> {noformat}
> The last assertion fails: the second notification also gets event ID 1 instead of 2. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
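The failing assertion comes from a classic read-increment-write race: each HMS reads the current event ID from the datastore, increments it in memory, and writes it back, so two servers that read before either writes persist the same ID. The plain-Java sketch below shows the race and one possible fix direction; the AtomicLong is only a stand-in for an atomic increment at the shared store (in practice a row lock, SELECT ... FOR UPDATE, or a DB sequence), not Hive's actual ObjectStore code:

```java
import java.util.concurrent.atomic.AtomicLong;

public class EventIdRace {
    public static void main(String[] args) {
        long storedId = 0;              // stands in for the datastore's sequence row

        // Unsafe pattern: both "servers" read before either writes back.
        long read1 = storedId;          // server A reads
        long read2 = storedId;          // server B reads before A persists
        long idA = read1 + 1;           // both increment in their own memory
        long idB = read2 + 1;
        System.out.println(idA == idB); // duplicate event IDs

        // Fix direction: make the increment atomic at the shared store, so
        // no two callers can ever observe the same pre-increment value.
        AtomicLong seq = new AtomicLong(0);
        System.out.println(seq.incrementAndGet() == seq.incrementAndGet());
    }
}
```

In a single JVM the class-level lock mentioned in the description serializes the read-increment-write, which is why the bug only appears with multiple metastore servers sharing one database.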
[jira] [Commented] (HIVE-17388) Spark Stats for the WebUI Query Plan
[ https://issues.apache.org/jira/browse/HIVE-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143953#comment-16143953 ] Xuefu Zhang commented on HIVE-17388: Sorry that I am behind reviewing this patch and the patch in the other JIRA. I will try to get to them this week. > Spark Stats for the WebUI Query Plan > > > Key: HIVE-17388 > URL: https://issues.apache.org/jira/browse/HIVE-17388 > Project: Hive > Issue Type: Improvement > Components: Web UI >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Minor > Labels: features, patch > Attachments: HIVE-17388.patch, running_1.png, running_2.png, > success_1.png, success_2.png, success_3.png > > > Click on a Spark stage in the WebUI/Drilldown/Query Plan graph, and Spark > task progress as well as the log file path will be displayed if > hive.server2.webui.show.stats=true. If the task is successful, > SparkStatistics will also be shown. > Screenshots attached are from a run on a CDH cluster. > Issues: > * SparkStatistics aren't shown if task fails or is running. > * Will need rebasing after HIVE-17300 is committed (current patch includes > HIVE-17300 changes) > * Will need testing upstream. > Suggestion > * It would be really easy to incorporate a progress bar to follow Spark > progress, with only a few tweaks to the JavaScript in: > service/src/resources/hive-webapps/static/js/query-plan-graph.js -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-14951) ArrayIndexOutOfBoundsException in GroupByOperator
[ https://issues.apache.org/jira/browse/HIVE-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143679#comment-16143679 ] Zoltan Haindrich commented on HIVE-14951: - I think there are 2 issues here: one affects the query plan printout and gives a misleading plan description by adding both Maps below the same GroupBy; the other is that the second GBY is not able to return the data for tag=1. I suspect that the purpose of the DUMMY_STORE operator would be to hide this tweak... however, I don't see it during execution... > ArrayIndexOutOfBoundsException in GroupByOperator > - > > Key: HIVE-14951 > URL: https://issues.apache.org/jira/browse/HIVE-14951 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang > > Engine: > Tez > Query: > select * from (select distinct a from f16) as f16, (select distinct a from > f1) as fprime where f16.a = fprime.a; > Table: > create table f1 (a int, b string); > create table f16 (a int, b string); > Config: > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=false; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17383) ArrayIndexOutOfBoundsException in VectorGroupByOperator
[ https://issues.apache.org/jira/browse/HIVE-17383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143467#comment-16143467 ] liyunzhang_intel commented on HIVE-17383: - [~lirui]: after enable vectorization, it throws ArrayIndexOutOfBoundsException. query {code} set hive.cbo.enable=false; set hive.user.install.directory=file:///tmp; set fs.default.name=file:///; set fs.defaultFS=file:///; set tez.staging-dir=/tmp; set tez.ignore.lib.uris=true; set tez.runtime.optimize.local.fetch=true; set tez.local.mode=true; set hive.explain.user=false; set hive.vectorized.execution.enabled=true; select count(*) from (select key from src group by key) s where s.key='98'; {code} the explain {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez DagId: root_20170828025707_7b882df3-3e96-47f0-b189-9b6919d44512:1 Edges: Reducer 2 <- Map 1 (SIMPLE_EDGE) Reducer 3 <- Reducer 2 (CUSTOM_SIMPLE_EDGE) DagName: root_20170828025707_7b882df3-3e96-47f0-b189-9b6919d44512:1 Vertices: Map 1 Map Operator Tree: TableScan alias: src Statistics: Num rows: 2906 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (key = '98') (type: boolean) Statistics: Num rows: 1453 Data size: 2906 Basic stats: COMPLETE Column stats: NONE Select Operator Statistics: Num rows: 1453 Data size: 2906 Basic stats: COMPLETE Column stats: NONE Group By Operator keys: '98' (type: string) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1453 Data size: 2906 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: '98' (type: string) sort order: + Map-reduce partition columns: '98' (type: string) Statistics: Num rows: 1453 Data size: 2906 Basic stats: COMPLETE Column stats: NONE Execution mode: vectorized Reducer 2 Execution mode: vectorized Reduce Operator Tree: Group By Operator keys: '98' (type: string) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 726 Data 
size: 1452 Basic stats: COMPLETE Column stats: NONE Select Operator Statistics: Num rows: 726 Data size: 1452 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: count() mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint) Reducer 3 Execution mode: vectorized Reduce Operator Tree: Group By Operator aggregations: count(VALUE._col0) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {code} > ArrayIndexOutOfBoundsException in VectorGroupByOperator > --- > > Key: HIVE-17383 > URL: https://issues.apache.org/jira/browse/HIVE-17383 > Project: Hive > Issue Type: Bug >Reporter: Rui Li > > Query to reproduce: > {noformat} > set hive.cbo.enable=false; > select count(*) from (select key from src group by key) s where s.key='98'; > {noformat} > The stack trace is: > {noformat} > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:107) > at >
[jira] [Commented] (HIVE-16823) "ArrayIndexOutOfBoundsException" in spark_vectorized_dynamic_partition_pruning.q
[ https://issues.apache.org/jira/browse/HIVE-16823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143455#comment-16143455 ] Rui Li commented on HIVE-16823: --- [~kellyzly], the v1 patch doesn't fix the root cause of the issue so it's not the right way to go. Let's figure out a fix for HIVE-17383 first and come back here. Besides, we do need to get rid of the "and true and true" conditions. Seems tez doesn't have such filters in the query plans. > "ArrayIndexOutOfBoundsException" in > spark_vectorized_dynamic_partition_pruning.q > > > Key: HIVE-16823 > URL: https://issues.apache.org/jira/browse/HIVE-16823 > Project: Hive > Issue Type: Bug >Reporter: Jianguo Tian >Assignee: liyunzhang_intel > Attachments: explain.spark, explain.tez, HIVE-16823.1.patch, > HIVE-16823.patch > > > spark_vectorized_dynamic_partition_pruning.q > {code} > set hive.optimize.ppd=true; > set hive.ppd.remove.duplicatefilters=true; > set hive.spark.dynamic.partition.pruning=true; > set hive.optimize.metadataonly=false; > set hive.optimize.index.filter=true; > set hive.vectorized.execution.enabled=true; > set hive.strict.checks.cartesian.product=false; > -- parent is reduce tasks > select count(*) from srcpart join (select ds as ds, ds as `date` from srcpart > group by ds) s on (srcpart.ds = s.ds) where s.`date` = '2008-04-08'; > {code} > The exceptions are as follows: > {code} > 2017-06-05T09:20:31,468 ERROR [Executor task launch worker-0] > spark.SparkReduceRecordHandler: Fatal error: > org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing > vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES > ["2008-04-08", "2008-04-08"] > org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing > vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES > ["2008-04-08", "2008-04-08"] > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processVectors(SparkReduceRecordHandler.java:413) > 
~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:301) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:54) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) > ~[scala-library-2.11.8.jar:?] > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > ~[scala-library-2.11.8.jar:?] > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > ~[scala-library-2.11.8.jar:?] 
> at > org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127) > ~[spark-core_2.11-2.0.0.jar:2.0.0] > at > org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127) > ~[spark-core_2.11-2.0.0.jar:2.0.0] > at > org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) > ~[spark-core_2.11-2.0.0.jar:2.0.0] > at > org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) > ~[spark-core_2.11-2.0.0.jar:2.0.0] > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > ~[spark-core_2.11-2.0.0.jar:2.0.0] > at org.apache.spark.scheduler.Task.run(Task.scala:85) > ~[spark-core_2.11-2.0.0.jar:2.0.0] > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > ~[spark-core_2.11-2.0.0.jar:2.0.0] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [?:1.8.0_112] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [?:1.8.0_112] > at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112] > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:107) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeReduceMergePartial.doProcessBatch(VectorGroupByOperator.java:832) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at >
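The "and true and true" conditions discussed in this thread are what a shortcut constant-propagation pass targets: rather than running full constant propagation, it only needs to drop the literal TRUE conjuncts left behind when DynamicPartitionPruningOptimization replaces the synthetic join predicates. A toy sketch of that folding, with strings standing in for Hive's expression tree (not the actual ExprNodeDesc API):

```java
import java.util.ArrayList;
import java.util.List;

public class ShortcutFold {
    // Drop literal TRUE conjuncts from an AND; the full constant-propagation
    // pass does much more, but this is the only part the DPP cleanup needs.
    static List<String> foldAnd(List<String> conjuncts) {
        List<String> kept = new ArrayList<>();
        for (String c : conjuncts) {
            if (!c.equals("true")) {
                kept.add(c); // keep every non-trivial predicate
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Residue left after synthetic join predicates were replaced by TRUE.
        List<String> pred = List.of("srcpart.ds = s.ds", "true", "true");
        System.out.println(String.join(" and ", foldAnd(pred)));
    }
}
```

This mirrors the intent described for HoS DPP: Hive-on-Tez plans do not carry these filters because the shortcut pass removes them after pruning-predicate rewriting.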
[jira] [Commented] (HIVE-17383) ArrayIndexOutOfBoundsException in VectorGroupByOperator
[ https://issues.apache.org/jira/browse/HIVE-17383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143449#comment-16143449 ] Rui Li commented on HIVE-17383: --- [~kellyzly], I don't see the works shown as vectorized in your explain. Have you enabled vectorization? > ArrayIndexOutOfBoundsException in VectorGroupByOperator > --- > > Key: HIVE-17383 > URL: https://issues.apache.org/jira/browse/HIVE-17383 > Project: Hive > Issue Type: Bug >Reporter: Rui Li > > Query to reproduce: > {noformat} > set hive.cbo.enable=false; > select count(*) from (select key from src group by key) s where s.key='98'; > {noformat} > The stack trace is: > {noformat} > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:107) > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeReduceMergePartial.doProcessBatch(VectorGroupByOperator.java:831) > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processBatch(VectorGroupByOperator.java:174) > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.process(VectorGroupByOperator.java:1046) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:462) > ... 18 more > {noformat} > More details can be found in HIVE-16823 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17383) ArrayIndexOutOfBoundsException in VectorGroupByOperator
[ https://issues.apache.org/jira/browse/HIVE-17383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143439#comment-16143439 ] liyunzhang_intel commented on HIVE-17383: - [~lirui]: this passes in latest master(6be50b7) in my tez env. If there is some wrong with the configuration, tell me! query {code} set hive.cbo.enable=false; set hive.user.install.directory=file:///tmp; set fs.default.name=file:///; set fs.defaultFS=file:///; set tez.staging-dir=/tmp; set tez.ignore.lib.uris=true; set tez.runtime.optimize.local.fetch=true; set tez.local.mode=true; set hive.explain.user=false; explain select count(*) from (select key from src group by key) s where s.key='98'; {code} explain {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez DagId: root_20170828023743_be3df7bf-49cc-4c71-a4a7-25814558804c:1 Edges: Reducer 2 <- Map 1 (SIMPLE_EDGE) Reducer 3 <- Reducer 2 (CUSTOM_SIMPLE_EDGE) DagName: root_20170828023743_be3df7bf-49cc-4c71-a4a7-25814558804c:1 Vertices: Map 1 Map Operator Tree: TableScan alias: src Statistics: Num rows: 2906 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (key = '98') (type: boolean) Statistics: Num rows: 1453 Data size: 2906 Basic stats: COMPLETE Column stats: NONE Select Operator Statistics: Num rows: 1453 Data size: 2906 Basic stats: COMPLETE Column stats: NONE Group By Operator keys: '98' (type: string) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1453 Data size: 2906 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: '98' (type: string) sort order: + Map-reduce partition columns: '98' (type: string) Statistics: Num rows: 1453 Data size: 2906 Basic stats: COMPLETE Column stats: NONE Reducer 2 Reduce Operator Tree: Group By Operator keys: '98' (type: string) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 726 Data size: 1452 Basic stats: COMPLETE Column stats: NONE 
Select Operator Statistics: Num rows: 726 Data size: 1452 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: count() mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint) Reducer 3 Reduce Operator Tree: Group By Operator aggregations: count(VALUE._col0) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {code} > ArrayIndexOutOfBoundsException in VectorGroupByOperator > --- > > Key: HIVE-17383 > URL: https://issues.apache.org/jira/browse/HIVE-17383 > Project: Hive > Issue Type: Bug >Reporter: Rui Li > > Query to reproduce: > {noformat} > set hive.cbo.enable=false; > select count(*) from (select key from src group by key) s where s.key='98'; > {noformat} > The stack trace is: > {noformat} > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:107) > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeReduceMergePartial.doProcessBatch(VectorGroupByOperator.java:831) > at >
[jira] [Commented] (HIVE-17383) ArrayIndexOutOfBoundsException in VectorGroupByOperator
[ https://issues.apache.org/jira/browse/HIVE-17383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143413#comment-16143413 ] Rui Li commented on HIVE-17383: --- [~kellyzly], I can reproduce the issue with latest master. > ArrayIndexOutOfBoundsException in VectorGroupByOperator > --- > > Key: HIVE-17383 > URL: https://issues.apache.org/jira/browse/HIVE-17383 > Project: Hive > Issue Type: Bug >Reporter: Rui Li > > Query to reproduce: > {noformat} > set hive.cbo.enable=false; > select count(*) from (select key from src group by key) s where s.key='98'; > {noformat} > The stack trace is: > {noformat} > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:107) > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeReduceMergePartial.doProcessBatch(VectorGroupByOperator.java:831) > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processBatch(VectorGroupByOperator.java:174) > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.process(VectorGroupByOperator.java:1046) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:462) > ... 18 more > {noformat} > More details can be found in HIVE-16823 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
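The stack traces in this thread all point at VectorGroupKeyHelper.copyGroupKey indexing past the end of an array. One plausible reading — hedged, since the thread has not yet pinned down the root cause — is that the reduce-side key helper expects more key column vectors than the incoming batch actually materialized (note the group key in these plans is the folded constant '98'). The toy Java below reproduces only that indexing failure mode and uses none of Hive's actual classes:

```java
public class CopyGroupKeySketch {
    public static void main(String[] args) {
        // The batch materialized only one column vector...
        long[][] batchColumns = new long[1][8];
        // ...but the (hypothetical) key helper was set up for two key columns.
        int[] keyColumns = {0, 1};
        try {
            for (int k : keyColumns) {
                long[] src = batchColumns[k]; // k == 1 indexes past the batch
                long first = src[0];
            }
            System.out.println("copied");
        } catch (ArrayIndexOutOfBoundsException e) {
            // Mirrors the ArrayIndexOutOfBoundsException: 1 in the report.
            System.out.println("caught AIOOBE");
        }
    }
}
```

Whether the mismatch in Hive comes from constant folding, the CBO setting, or the vectorized reduce path is exactly what the reproduction attempts above are trying to isolate.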
[jira] [Comment Edited] (HIVE-16823) "ArrayIndexOutOfBoundsException" in spark_vectorized_dynamic_partition_pruning.q
[ https://issues.apache.org/jira/browse/HIVE-16823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143397#comment-16143397 ] liyunzhang_intel edited comment on HIVE-16823 at 8/28/17 6:06 AM: -- [~lirui]: can you help review the patch? i have 1 question about {{spark_vectorized_dynamic_partition_pruning.q}}, should we add {{-- SORT_QUERY_RESULTS}} to the file, otherwise the result of {code} select distinct ds from srcpart {code} {code} 2008-04-09 2008-04-08 {code} while the result in the q.out is {code} 2008-04-08 2008-04-09 {code} was (Author: kellyzly): [~lirui]: can you help review the patch? i have 1 question about {{spark_vectorized_dynamic_partition_pruning.q}}, should we add {{-- SORT_QUERY_RESULTS}} to the file, otherwise in the q.out the result of {code} select distinct ds from srcpart {code} {code} 2008-04-09 2008-04-08 {code} > "ArrayIndexOutOfBoundsException" in > spark_vectorized_dynamic_partition_pruning.q > > > Key: HIVE-16823 > URL: https://issues.apache.org/jira/browse/HIVE-16823 > Project: Hive > Issue Type: Bug >Reporter: Jianguo Tian >Assignee: liyunzhang_intel > Attachments: explain.spark, explain.tez, HIVE-16823.1.patch, > HIVE-16823.patch > > > spark_vectorized_dynamic_partition_pruning.q > {code} > set hive.optimize.ppd=true; > set hive.ppd.remove.duplicatefilters=true; > set hive.spark.dynamic.partition.pruning=true; > set hive.optimize.metadataonly=false; > set hive.optimize.index.filter=true; > set hive.vectorized.execution.enabled=true; > set hive.strict.checks.cartesian.product=false; > -- parent is reduce tasks > select count(*) from srcpart join (select ds as ds, ds as `date` from srcpart > group by ds) s on (srcpart.ds = s.ds) where s.`date` = '2008-04-08'; > {code} > The exceptions are as follows: > {code} > 2017-06-05T09:20:31,468 ERROR [Executor task launch worker-0] > spark.SparkReduceRecordHandler: Fatal error: > org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing > vector batch 
> (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing vector batch (tag=0) Column vector types: 0:BYTES, 1:BYTES
> ["2008-04-08", "2008-04-08"]
>         at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processVectors(SparkReduceRecordHandler.java:413) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:301) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:54) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>         at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) ~[scala-library-2.11.8.jar:?]
>         at scala.collection.Iterator$class.foreach(Iterator.scala:893) ~[scala-library-2.11.8.jar:?]
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) ~[scala-library-2.11.8.jar:?]
>         at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127) ~[spark-core_2.11-2.0.0.jar:2.0.0]
>         at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127) ~[spark-core_2.11-2.0.0.jar:2.0.0]
>         at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) ~[spark-core_2.11-2.0.0.jar:2.0.0]
>         at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1974) ~[spark-core_2.11-2.0.0.jar:2.0.0]
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) ~[spark-core_2.11-2.0.0.jar:2.0.0]
>         at org.apache.spark.scheduler.Task.run(Task.scala:85) ~[spark-core_2.11-2.0.0.jar:2.0.0]
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) ~[spark-core_2.11-2.0.0.jar:2.0.0]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_112]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_112]
>         at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
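For context on the {{-- SORT_QUERY_RESULTS}} question above: qfile comparison is line-by-line against the golden q.out, so any nondeterministic row order (common for {{select distinct}} over partitioned data) causes spurious diffs. A minimal Python sketch, not Hive's actual QTestUtil logic, of the order-insensitive comparison that the directive enables:

```python
def results_match(actual: str, expected: str, sort_results: bool = False) -> bool:
    """Compare query output against golden-file output, line by line.

    When sort_results is True (loosely mimicking a '-- SORT_QUERY_RESULTS'
    directive), both sides are sorted first so row order does not matter.
    """
    actual_lines = actual.strip().splitlines()
    expected_lines = expected.strip().splitlines()
    if sort_results:
        actual_lines = sorted(actual_lines)
        expected_lines = sorted(expected_lines)
    return actual_lines == expected_lines

# The two orderings from the comment above:
actual = "2008-04-09\n2008-04-08"   # what the query returned
golden = "2008-04-08\n2008-04-09"   # what the q.out records
```

With sorting enabled the two orderings compare equal; without it the test fails even though the result set is the same.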
[jira] [Commented] (HIVE-17383) ArrayIndexOutOfBoundsException in VectorGroupByOperator
[ https://issues.apache.org/jira/browse/HIVE-17383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143405#comment-16143405 ]

liyunzhang_intel commented on HIVE-17383:
-----------------------------------------
[~lirui]: can you help verify whether the ArrayIndexOutOfBoundsException appears for the above query? In my env (Hive version: f86878b) no similar exception is thrown and the query passes. If an RS follows the GBY, the exception is not thrown.

> ArrayIndexOutOfBoundsException in VectorGroupByOperator
> -------------------------------------------------------
>
>                 Key: HIVE-17383
>                 URL: https://issues.apache.org/jira/browse/HIVE-17383
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rui Li
>
> Query to reproduce:
> {noformat}
> set hive.cbo.enable=false;
> select count(*) from (select key from src group by key) s where s.key='98';
> {noformat}
> The stack trace is:
> {noformat}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
>         at org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:107)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeReduceMergePartial.doProcessBatch(VectorGroupByOperator.java:831)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processBatch(VectorGroupByOperator.java:174)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.process(VectorGroupByOperator.java:1046)
>         at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:462)
> ... 18 more
> {noformat}
> More details can be found in HIVE-16823
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
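Both stack traces bottom out in code that copies group-key columns between vectorized row batches ({{VectorGroupKeyHelper.copyGroupKey}}). A toy Python sketch, hypothetical and not Hive's actual implementation, of how a mismatch between the key indices assumed by the plan and the number of columns actually allocated in the destination batch yields an index-out-of-bounds (the Python analogue of the {{ArrayIndexOutOfBoundsException: 1}} above):

```python
class ToyRowBatch:
    """Toy stand-in for a vectorized row batch: a list of column vectors."""
    def __init__(self, num_cols: int):
        self.cols = [[] for _ in range(num_cols)]

def copy_group_key(src: ToyRowBatch, dst: ToyRowBatch, key_indices) -> None:
    """Copy the group-key columns from src into dst.

    If dst was allocated with fewer columns than the key expressions assume
    (e.g. the operator tree changed shape under vectorization), accessing
    dst.cols[i] raises IndexError for the missing column.
    """
    for i in key_indices:
        dst.cols[i].extend(src.cols[i])

src = ToyRowBatch(2)            # two key columns, as in "0:BYTES, 1:BYTES"
src.cols[0].append("2008-04-08")
src.cols[1].append("2008-04-08")

ok_dst = ToyRowBatch(2)
copy_group_key(src, ok_dst, [0, 1])      # matching shapes: copies cleanly

bad_dst = ToyRowBatch(1)                 # batch sized for only one key column
try:
    copy_group_key(src, bad_dst, [0, 1]) # index 1 is out of bounds
    out_of_bounds = False
except IndexError:
    out_of_bounds = True
```

The sketch only illustrates the failure shape; where the real column-count mismatch comes from (the missing RS after the GBY, per the comment above) is exactly what the JIRA discussion is trying to pin down.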