[jira] [Commented] (HIVE-13314) Hive on spark mapjoin errors if spark.master is not set

2016-03-21 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205852#comment-15205852
 ] 

Szehon Ho commented on HIVE-13314:
--

I think you are right; I should have searched before I wasted a few hours debugging 
this :)

> Hive on spark mapjoin errors if spark.master is not set
> ---
>
> Key: HIVE-13314
> URL: https://issues.apache.org/jira/browse/HIVE-13314
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Reporter: Szehon Ho
> Assignee: Szehon Ho
> Priority: Minor
>
> There are some errors that happen if spark.master is not set.
> This is despite the code defaulting to yarn-cluster if spark.master is not 
> set by the user or in the config files: 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java#L51]
> The funny thing is that while it works the first time due to this default, 
> subsequent tries will fail as the hiveConf is refreshed without that default 
> being set.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java#L180]
> The exception is as follows:
> {noformat}
> Job aborted due to stage failure: Task 40 in stage 1.0 failed 4 times, most recent failure: Lost task 40.3 in stage 1.0 (TID 22, d2409.halxg.cloudera.com): java.lang.RuntimeException: Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
>   at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:154)
>   at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48)
>   at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
>   at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
>   at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
>   at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
>   at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
>   at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
>   at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:117)
>   at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:197)
>   at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:223)
>   at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
>   at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:490)
>   at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
>   ... 16 more
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.isDedicatedCluster(SparkUtilities.java:108)
>   at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:124)
>   at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:114)
>   ... 24 more
> Driver stacktrace:
> {noformat}
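
The failure mode described in the issue (a default that is applied once and then lost when the configuration is rebuilt) can be sketched roughly as follows. This is an illustration only, not the Hive code: the class and all four methods are hypothetical, a plain Map stands in for HiveConf, and the prefix check inside the isDedicatedCluster stand-ins is an assumption about what such a method might test.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for HiveConf, and none of these
// methods are the actual Hive implementations.
class SparkMasterDefaultSketch {
    static final String SPARK_MASTER = "spark.master";
    static final String DEFAULT_MASTER = "yarn-cluster";

    // First-time path: user settings are copied and the default is filled in.
    static Map<String, String> initialConf(Map<String, String> userConf) {
        Map<String, String> conf = new HashMap<>(userConf);
        conf.putIfAbsent(SPARK_MASTER, DEFAULT_MASTER);
        return conf;
    }

    // Refresh path: user settings are copied again, but the default is not re-applied.
    static Map<String, String> refreshedConf(Map<String, String> userConf) {
        return new HashMap<>(userConf);
    }

    // Dereferences the value without a null check, so a missing key
    // results in a NullPointerException.
    static boolean isDedicatedClusterUnsafe(Map<String, String> conf) {
        String master = conf.get(SPARK_MASTER);
        return master.startsWith("yarn-") || master.startsWith("local");
    }

    // Falls back to the same default when the key is missing.
    static boolean isDedicatedClusterSafe(Map<String, String> conf) {
        String master = conf.getOrDefault(SPARK_MASTER, DEFAULT_MASTER);
        return master.startsWith("yarn-") || master.startsWith("local");
    }

    public static void main(String[] args) {
        Map<String, String> userConf = new HashMap<>(); // spark.master not set
        System.out.println(isDedicatedClusterUnsafe(initialConf(userConf)));
        System.out.println(isDedicatedClusterSafe(refreshedConf(userConf)));
        try {
            isDedicatedClusterUnsafe(refreshedConf(userConf));
        } catch (NullPointerException e) {
            System.out.println("NullPointerException after refresh");
        }
    }
}
```

Either re-applying the default on refresh or reading the key with a fallback would avoid the null dereference in this sketch; which of the two fits Hive better is not something the issue text settles.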



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13294) AvroSerde leaks the connection in a case when reading schema from a url

2016-03-21 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205836#comment-15205836
 ] 

Lefty Leverenz commented on HIVE-13294:
---

[~ctang.ma], I don't see this in master (for 2.1.0) and only got an email 
message for the branch-2.0 commit.  Will the commit to master come later?

> AvroSerde leaks the connection in a case when reading schema from a url
> ---
>
> Key: HIVE-13294
> URL: https://issues.apache.org/jira/browse/HIVE-13294
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Reporter: Chaoyu Tang
> Assignee: Chaoyu Tang
> Fix For: 2.1.0, 2.0.1
>
> Attachments: HIVE-13294.1.patch, HIVE-13294.patch
>
>
> AvroSerde leaks the connection when reading the schema from a URL:
> In 
> public static Schema determineSchemaOrThrowException {
> ...
> return AvroSerdeUtils.getSchemaFor(new URL(schemaString).openStream());
> ...
> }
> The opened inputStream is never closed.
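
One common way to avoid this kind of leak is try-with-resources, which closes the stream whether parsing succeeds or throws. The sketch below is not the attached patch: StreamSource, StreamParser, parseAndClose, and TrackingStream are hypothetical names, and an in-memory stream stands in for the URL connection so the example runs without network access.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of closing a schema stream after parsing.
class SchemaStreamSketch {
    // Stand-in for something that opens a stream, e.g. a URL connection.
    interface StreamSource {
        InputStream open() throws IOException;
    }

    // Stand-in for a parser that consumes the stream, e.g. a schema parser.
    interface StreamParser<T> {
        T parse(InputStream in) throws IOException;
    }

    // try-with-resources closes the stream on both the normal and the exceptional path.
    static <T> T parseAndClose(StreamSource source, StreamParser<T> parser) throws IOException {
        try (InputStream in = source.open()) {
            return parser.parse(in);
        }
    }

    // In-memory stream that records whether close() was called.
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingStream stream =
            new TrackingStream("{\"type\":\"string\"}".getBytes(StandardCharsets.UTF_8));
        StreamParser<Integer> firstByte = in -> in.read();
        Integer b = parseAndClose(() -> stream, firstByte);
        System.out.println("first byte: " + (char) b.intValue() + ", closed: " + stream.closed);
    }
}
```

Applied to the snippet quoted in the issue, the source would open the URL stream and the parser would be the schema-reading call; that wiring is an assumption, not a description of what was committed.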





[jira] [Updated] (HIVE-13300) Hive on spark throws exception for multi-insert with join

2016-03-21 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-13300:
-
Attachment: HIVE-13300.3.patch

> Hive on spark throws exception for multi-insert with join
> -
>
> Key: HIVE-13300
> URL: https://issues.apache.org/jira/browse/HIVE-13300
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Affects Versions: 2.0.0
> Reporter: Szehon Ho
> Assignee: Szehon Ho
> Attachments: HIVE-13300.2.patch, HIVE-13300.3.patch, HIVE-13300.patch
>
>
> For certain multi-insert queries, Hive on Spark throws a deserialization 
> error.
> {noformat}
> create table status_updates(userid int,status string,ds string);
> create table profiles(userid int,school string,gender int);
> drop table school_summary; create table school_summary(school string,cnt int) 
> partitioned by (ds string);
> drop table gender_summary; create table gender_summary(gender int,cnt int) 
> partitioned by (ds string);
> insert into status_updates values (1, "status_1", "2016-03-16");
> insert into profiles values (1, "school_1", 0);
> set hive.auto.convert.join=false;
> set hive.execution.engine=spark;
> FROM (SELECT a.status, b.school, b.gender
> FROM status_updates a JOIN profiles b
> ON (a.userid = b.userid and
> a.ds='2009-03-20' )
> ) subq1
> INSERT OVERWRITE TABLE gender_summary
> PARTITION(ds='2009-03-20')
> SELECT subq1.gender, COUNT(1) GROUP BY subq1.gender
> INSERT OVERWRITE TABLE school_summary
> PARTITION(ds='2009-03-20')
> SELECT subq1.school, COUNT(1) GROUP BY subq1.school
> {noformat}
> Error:
> {noformat}
> 16/03/17 13:29:00 [task-result-getter-3]: WARN scheduler.TaskSetManager: Lost task 0.0 in stage 2.0 (TID 3, localhost): java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x128x0x0 with properties {serialization.sort.order.null=a, columns=reducesinkkey0, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+, columns.types=int}
>   at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:279)
>   at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
>   at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>   at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
>   at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>   at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:724)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x128x0x0 with properties {serialization.sort.order.null=a, columns=reducesinkkey0, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+, columns.types=int}
>   at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:251)
>   ... 12 more
> Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException
>   at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:241)
>   at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:249)
>   ... 12 more
> Caused by: java.io.EOFException
>   at org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54)
>   at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserializeInt(BinarySortableSerDe.java:597)
>   at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:288)
>   at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:237)
>   ... 13 more
> {noformat}




[jira] [Updated] (HIVE-13300) Hive on spark throws exception for multi-insert with join

2016-03-21 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-13300:
-
Attachment: (was: HIVE-13300.3.patch)





[jira] [Commented] (HIVE-13107) LLAP: Rotate GC logs periodically to prevent full disks

2016-03-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205797#comment-15205797
 ] 

Prasanth Jayachandran commented on HIVE-13107:
--

+1

> LLAP: Rotate GC logs periodically to prevent full disks
> ---
>
> Key: HIVE-13107
> URL: https://issues.apache.org/jira/browse/HIVE-13107
> Project: Hive
> Issue Type: Improvement
> Components: llap
> Affects Versions: 2.0.0, 2.1.0
> Reporter: Gopal V
> Assignee: Gopal V
> Priority: Trivial
> Attachments: HIVE-13107.1.patch
>
>
> STDOUT cannot be rotated easily, so write GC logs to a separate file and 
> rotate them periodically with -XX:+UseGCLogFileRotation.
> NO PRECOMMIT TESTS
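
As a rough illustration of the flags involved, the sketch below assembles a HotSpot (JDK 7/8 era) option list that sends GC output to its own file and rotates it. The class and method names, the log path, the file count, and the size are all hypothetical values chosen for the example; the patch may use different settings. Note that JDK 9 and later replaced these flags with unified logging (-Xlog).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: builds JVM options for rotating GC logs.
class GcLogRotationSketch {
    static List<String> gcLogOptions(String logFile, int fileCount, String fileSize) {
        List<String> opts = new ArrayList<>();
        opts.add("-Xloggc:" + logFile);                   // write GC output to a file instead of STDOUT
        opts.add("-XX:+UseGCLogFileRotation");            // enable rotation of that file
        opts.add("-XX:NumberOfGCLogFiles=" + fileCount);  // how many rotated files to keep
        opts.add("-XX:GCLogFileSize=" + fileSize);        // size at which the current file rotates
        return opts;
    }

    public static void main(String[] args) {
        // Example values only; real deployments would pick their own path and limits.
        System.out.println(String.join(" ", gcLogOptions("/tmp/llap-gc.log", 4, "100M")));
    }
}
```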





[jira] [Resolved] (HIVE-13327) SessionID added to HS2 threadname does not trim spaces

2016-03-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-13327.
--
   Resolution: Fixed
Fix Version/s: 2.1.0

Committed to master. Thanks [~gopalv]!

> SessionID added to HS2 threadname does not trim spaces
> --
>
> Key: HIVE-13327
> URL: https://issues.apache.org/jira/browse/HIVE-13327
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.1.0
> Reporter: Carter Shanklin
> Assignee: Prasanth Jayachandran
> Fix For: 2.1.0
>
> Attachments: HIVE-13327.1.patch
>
>
> HIVE-13153 introduced off-by-one in appending spaces to thread names. 
> NO PRECOMMIT TESTS





[jira] [Updated] (HIVE-13322) LLAP: ZK registry throws at shutdown due to slf4j trying to initialize a log4j logger

2016-03-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13322:
-
Description: 
{noformat}
2016-03-08 23:56:34,883 Thread-5 FATAL Unable to register shutdown hook because JVM is shutting down. java.lang.IllegalStateException: Cannot add new shutdown hook as this is not started. Current state: STOPPED
at org.apache.logging.log4j.core.util.DefaultShutdownCallbackRegistry.addShutdownCallback(DefaultShutdownCallbackRegistry.java:113)
at org.apache.logging.log4j.core.impl.Log4jContextFactory.addShutdownCallback(Log4jContextFactory.java:271)
at org.apache.logging.log4j.core.LoggerContext.setUpShutdownHook(LoggerContext.java:256)
at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:216)
at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:146)
at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)
at org.apache.logging.log4j.LogManager.getContext(LogManager.java:185)
at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:103)
at org.apache.logging.slf4j.Log4jLoggerFactory.getContext(Log4jLoggerFactory.java:43)
at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:42)
at org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:29)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:285)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:305)
at org.apache.curator.utils.CloseableUtils.<clinit>(CloseableUtils.java:33)
at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.stop(LlapZookeeperRegistryImpl.java:584)
at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.serviceStop(LlapRegistryService.java:105)
at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
at org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.serviceStop(LlapDaemon.java:294)
at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:65)
at org.apache.hadoop.service.CompositeService$CompositeServiceShutdownHook.run(CompositeService.java:183)
at org.apache.hive.common.util.ShutdownHookManager$1.run(ShutdownHookManager.java:63)
{noformat}

NO PRECOMMIT TESTS

  was:
{noformat}
2016-03-08 23:56:34,883 Thread-5 FATAL Unable to register shutdown hook because 
JVM is shutting down. java.lang.IllegalStateException: Cannot add new shutdown 
hook as this is not started. Current state: STOPPED
at 
org.apache.logging.log4j.core.util.DefaultShutdownCallbackRegistry.addShutdownCallback(DefaultShutdownCallbackRegistry.java:113)
at 
org.apache.logging.log4j.core.impl.Log4jContextFactory.addShutdownCallback(Log4jContextFactory.java:271)
at 
org.apache.logging.log4j.core.LoggerContext.setUpShutdownHook(LoggerContext.java:256)
at 
org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:216)
at 
org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:146)
at 
org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)
at org.apache.logging.log4j.LogManager.getContext(LogManager.java:185)
at 
org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:103)
at 
org.apache.logging.slf4j.Log4jLoggerFactory.getContext(Log4jLoggerFactory.java:43)
at 
org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:42)
at 
org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:29)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:285)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:305)
at 
org.apache.curator.utils.CloseableUtils.(CloseableUtils.java:33)
at 
org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.stop(LlapZookeeperRegistryImpl.java:584)
at 

[jira] [Commented] (HIVE-13327) SessionID added to HS2 threadname does not trim spaces

2016-03-21 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205786#comment-15205786
 ] 

Gopal V commented on HIVE-13327:


LGTM - +1

Since the trim() is inside the if() it can't NPE.
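
The point about the guarded trim() can be illustrated with a small sketch. The patch itself is not quoted in this thread, so the class, the method, and the way the session ID is combined with the thread name are all hypothetical; only the idea that trim() is reached solely under a null check comes from the comment above.

```java
// Hypothetical sketch: prefixes a thread name with a session ID when one is present.
class ThreadNameSketch {
    static String withSessionId(String sessionId, String threadName) {
        if (sessionId != null) {
            // trim() is only reached when sessionId is non-null,
            // so it cannot throw a NullPointerException here.
            String id = sessionId.trim();
            if (!id.isEmpty()) {
                return id + " " + threadName;
            }
        }
        return threadName;
    }

    public static void main(String[] args) {
        System.out.println(withSessionId(" session-1 ", "main"));
        System.out.println(withSessionId(null, "main"));
    }
}
```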

> SessionID added to HS2 threadname does not trim spaces
> --
>
> Key: HIVE-13327
> URL: https://issues.apache.org/jira/browse/HIVE-13327
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.1.0
> Reporter: Carter Shanklin
> Assignee: Prasanth Jayachandran
> Attachments: HIVE-13327.1.patch
>
>
> HIVE-13153 introduced off-by-one in appending spaces to thread names. 
> NO PRECOMMIT TESTS





[jira] [Updated] (HIVE-13327) SessionID added to HS2 threadname does not trim spaces

2016-03-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13327:
-
Description: 
HIVE-13153 introduced off-by-one in appending spaces to thread names. 

NO PRECOMMIT TESTS

  was:HIVE-13153 introduced off-by-one in appending spaces to thread names. 


> SessionID added to HS2 threadname does not trim spaces
> --
>
> Key: HIVE-13327
> URL: https://issues.apache.org/jira/browse/HIVE-13327
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.1.0
> Reporter: Carter Shanklin
> Assignee: Prasanth Jayachandran
> Attachments: HIVE-13327.1.patch
>
>
> HIVE-13153 introduced off-by-one in appending spaces to thread names. 
> NO PRECOMMIT TESTS





[jira] [Updated] (HIVE-13327) SessionID added to HS2 threadname does not trim spaces

2016-03-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13327:
-
Attachment: HIVE-13327.1.patch

> SessionID added to HS2 threadname does not trim spaces
> --
>
> Key: HIVE-13327
> URL: https://issues.apache.org/jira/browse/HIVE-13327
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.1.0
> Reporter: Carter Shanklin
> Assignee: Prasanth Jayachandran
> Attachments: HIVE-13327.1.patch
>
>
> HIVE-13153 introduced off-by-one in appending spaces to thread names. 





[jira] [Commented] (HIVE-13322) LLAP: ZK registry throws at shutdown due to slf4j trying to initialize a log4j logger

2016-03-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205768#comment-15205768
 ] 

Prasanth Jayachandran commented on HIVE-13322:
--

LGTM, +1

> LLAP: ZK registry throws at shutdown due to slf4j trying to initialize a 
> log4j logger
> -
>
> Key: HIVE-13322
> URL: https://issues.apache.org/jira/browse/HIVE-13322
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Prasanth Jayachandran
> Priority: Minor
> Attachments: HIVE-13322.1.patch
>
>
> {noformat}
> 2016-03-08 23:56:34,883 Thread-5 FATAL Unable to register shutdown hook because JVM is shutting down. java.lang.IllegalStateException: Cannot add new shutdown hook as this is not started. Current state: STOPPED
>   at org.apache.logging.log4j.core.util.DefaultShutdownCallbackRegistry.addShutdownCallback(DefaultShutdownCallbackRegistry.java:113)
>   at org.apache.logging.log4j.core.impl.Log4jContextFactory.addShutdownCallback(Log4jContextFactory.java:271)
>   at org.apache.logging.log4j.core.LoggerContext.setUpShutdownHook(LoggerContext.java:256)
>   at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:216)
>   at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:146)
>   at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)
>   at org.apache.logging.log4j.LogManager.getContext(LogManager.java:185)
>   at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:103)
>   at org.apache.logging.slf4j.Log4jLoggerFactory.getContext(Log4jLoggerFactory.java:43)
>   at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:42)
>   at org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:29)
>   at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:285)
>   at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:305)
>   at org.apache.curator.utils.CloseableUtils.<clinit>(CloseableUtils.java:33)
>   at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.stop(LlapZookeeperRegistryImpl.java:584)
>   at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.serviceStop(LlapRegistryService.java:105)
>   at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>   at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>   at org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.serviceStop(LlapDaemon.java:294)
>   at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:65)
>   at org.apache.hadoop.service.CompositeService$CompositeServiceShutdownHook.run(CompositeService.java:183)
>   at org.apache.hive.common.util.ShutdownHookManager$1.run(ShutdownHookManager.java:63)
> {noformat}
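
The trace shows a logger being requested from a CloseableUtils frame while stop() runs inside a shutdown hook. One plausible reading is that the class is used for the first time, and therefore initialized for the first time, during shutdown. The sketch below illustrates only that general JVM behavior (class initialization is lazy) and one possible mitigation (forcing initialization early). LazyInitSketch, Helper, and warmUp are hypothetical names, and this is not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of lazy class initialization and an early warm-up.
class LazyInitSketch {
    static final List<String> EVENTS = new ArrayList<>();

    // Stand-in for a utility class whose static initializer does real work,
    // such as creating a logger.
    static class Helper {
        static {
            EVENTS.add("Helper initialized");
        }

        static void closeQuietly() {
            EVENTS.add("closeQuietly called");
        }
    }

    // Forces Helper's static initializer to run now rather than at first use.
    // A class literal alone does not trigger initialization; Class.forName(String) does.
    static void warmUp() {
        try {
            Class.forName(Helper.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        warmUp();              // initialization happens here, during normal operation
        Helper.closeQuietly(); // a later call finds the class already initialized
        System.out.println(EVENTS);
    }
}
```

Without the warmUp() call, the static initializer would run on the first closeQuietly() call instead, which in the scenario above would be inside the shutdown hook.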





[jira] [Updated] (HIVE-13322) LLAP: ZK registry throws at shutdown due to slf4j trying to initialize a log4j logger

2016-03-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13322:
-
Assignee: Gopal V  (was: Prasanth Jayachandran)

> LLAP: ZK registry throws at shutdown due to slf4j trying to initialize a 
> log4j logger
> -
>
> Key: HIVE-13322
> URL: https://issues.apache.org/jira/browse/HIVE-13322
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Gopal V
> Priority: Minor
> Attachments: HIVE-13322.1.patch
>
>





[jira] [Updated] (HIVE-13322) LLAP: ZK registry throws at shutdown due to slf4j trying to initialize a log4j logger

2016-03-21 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-13322:
---
Status: Patch Available  (was: Open)

> LLAP: ZK registry throws at shutdown due to slf4j trying to initialize a 
> log4j logger
> -
>
> Key: HIVE-13322
> URL: https://issues.apache.org/jira/browse/HIVE-13322
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Prasanth Jayachandran
>Priority: Minor
> Attachments: HIVE-13322.1.patch
>
>
> {noformat}
> 2016-03-08 23:56:34,883 Thread-5 FATAL Unable to register shutdown hook 
> because JVM is shutting down. java.lang.IllegalStateException: Cannot add new 
> shutdown hook as this is not started. Current state: STOPPED
>   at 
> org.apache.logging.log4j.core.util.DefaultShutdownCallbackRegistry.addShutdownCallback(DefaultShutdownCallbackRegistry.java:113)
>   at 
> org.apache.logging.log4j.core.impl.Log4jContextFactory.addShutdownCallback(Log4jContextFactory.java:271)
>   at 
> org.apache.logging.log4j.core.LoggerContext.setUpShutdownHook(LoggerContext.java:256)
>   at 
> org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:216)
>   at 
> org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:146)
>   at 
> org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)
>   at org.apache.logging.log4j.LogManager.getContext(LogManager.java:185)
>   at 
> org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:103)
>   at 
> org.apache.logging.slf4j.Log4jLoggerFactory.getContext(Log4jLoggerFactory.java:43)
>   at 
> org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:42)
>   at 
> org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:29)
>   at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:285)
>   at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:305)
>   at 
> org.apache.curator.utils.CloseableUtils.<clinit>(CloseableUtils.java:33)
>   at 
> org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.stop(LlapZookeeperRegistryImpl.java:584)
>   at 
> org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.serviceStop(LlapRegistryService.java:105)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.serviceStop(LlapDaemon.java:294)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:65)
>   at 
> org.apache.hadoop.service.CompositeService$CompositeServiceShutdownHook.run(CompositeService.java:183)
>   at 
> org.apache.hive.common.util.ShutdownHookManager$1.run(ShutdownHookManager.java:63)
> {noformat}
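
The trace above shows a shutdown hook triggering the static initializer of CloseableUtils, which asks slf4j for a logger after log4j has already stopped. One possible mitigation, shown here only as a sketch and not taken from HIVE-13322.1.patch, is to force such classes to initialize at daemon startup so that no logger has to be created during shutdown. The class EagerInit and its initialize method are hypothetical names; Class.forName is the standard JDK call.

```java
final class EagerInit {
    /**
     * Hypothetical helper: loads the named class and runs its static
     * initializer immediately, instead of on first use.
     * Returns false if the class is not on the classpath.
     */
    static boolean initialize(String className) {
        try {
            // The second argument (true) asks the JVM to run static initializers now.
            Class.forName(className, true, EagerInit.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: called once at startup, before any shutdown hook can run.
        boolean loaded = initialize("org.apache.curator.utils.CloseableUtils");
        System.out.println("CloseableUtils initialized early: " + loaded);
    }
}
```

Whether this is appropriate depends on where the logging framework is shut down relative to the service shutdown hooks; the sketch only illustrates moving logger creation out of the shutdown path.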





[jira] [Commented] (HIVE-13325) Excessive logging when ORC PPD fails type conversions

2016-03-21 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205751#comment-15205751
 ] 

Gopal V commented on HIVE-13325:


+1 tests pending.

> Excessive logging when ORC PPD fails type conversions
> -
>
> Key: HIVE-13325
> URL: https://issues.apache.org/jira/browse/HIVE-13325
> Project: Hive
>  Issue Type: Bug
>  Components: Logging, ORC
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Siddharth Seth
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13325.1.patch
>
>
> Timestamp was specified as "YYYY-MM-DD HH:MM:SS": 2016-01-23 00:00:00
> {code}
> 2016-02-10 02:15:43,175 [WARN] [TezChild] |orc.RecordReaderImpl|: Exception 
> when evaluating predicate. Skipping ORC PPD. Exception: 
> java.lang.IllegalArgumentException: ORC SARGS could not convert from String 
> to TIMESTAMP
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.getBaseObjectForComparison(RecordReaderImpl.java:659)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.evaluatePredicateRange(RecordReaderImpl.java:373)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.evaluatePredicateProto(RecordReaderImpl.java:338)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:710)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.pickRowGroups(RecordReaderImpl.java:751)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:777)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:205)
> at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:598)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:183)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.<init>(OrcRawRecordMerger.java:226)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:437)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1269)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1151)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:135)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:101)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:149)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:650)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:406)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
>

[jira] [Commented] (HIVE-13325) Excessive logging when ORC PPD fails type conversions

2016-03-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205745#comment-15205745
 ] 

Prasanth Jayachandran commented on HIVE-13325:
--

[~gopalv]/[~sseth] Can someone please take a look?

> Excessive logging when ORC PPD fails type conversions
> -
>
> Key: HIVE-13325
> URL: https://issues.apache.org/jira/browse/HIVE-13325
> Project: Hive
>  Issue Type: Bug
>  Components: Logging, ORC
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Siddharth Seth
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13325.1.patch
>
>
> Timestamp was specified as "YYYY-MM-DD HH:MM:SS": 2016-01-23 00:00:00
> {code}
> 2016-02-10 02:15:43,175 [WARN] [TezChild] |orc.RecordReaderImpl|: Exception 
> when evaluating predicate. Skipping ORC PPD. Exception: 
> java.lang.IllegalArgumentException: ORC SARGS could not convert from String 
> to TIMESTAMP
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.getBaseObjectForComparison(RecordReaderImpl.java:659)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.evaluatePredicateRange(RecordReaderImpl.java:373)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.evaluatePredicateProto(RecordReaderImpl.java:338)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:710)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.pickRowGroups(RecordReaderImpl.java:751)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:777)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:205)
> at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:598)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:183)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.<init>(OrcRawRecordMerger.java:226)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:437)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1269)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1151)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:135)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:101)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:149)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:650)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:406)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
> at 
> 

[jira] [Comment Edited] (HIVE-13325) Excessive logging when ORC PPD fails type conversions

2016-03-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205745#comment-15205745
 ] 

Prasanth Jayachandran edited comment on HIVE-13325 at 3/22/16 4:04 AM:
---

[~gopalv]/[~sseth] Can someone please take a look? This patch avoids logging 
the full stacktrace.
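
A minimal sketch of that idea, not the contents of HIVE-13325.1.patch: log a one-line summary of the exception at warning level and emit the full stack trace only when debug-level logging is enabled. It uses java.util.logging so that it is self-contained (Hive itself logs through slf4j); the class PredicateFailureLogging and its methods are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class PredicateFailureLogging {
    private static final Logger LOG =
        Logger.getLogger(PredicateFailureLogging.class.getName());

    /** Builds a one-line summary: exception class name plus message, no stack frames. */
    static String summarize(Throwable t) {
        return t.getClass().getName() + ": " + t.getMessage();
    }

    /** Logs the short summary at WARNING; the full trace goes out only at FINE (debug). */
    static void logSkippedPpd(Throwable t) {
        LOG.warning("Exception when evaluating predicate. Skipping ORC PPD. Exception: "
            + summarize(t));
        if (LOG.isLoggable(Level.FINE)) {
            LOG.log(Level.FINE, "Full stack trace for skipped PPD", t);
        }
    }

    public static void main(String[] args) {
        logSkippedPpd(new IllegalArgumentException(
            "ORC SARGS could not convert from String to TIMESTAMP"));
    }
}
```

With the default logging configuration this prints a single warning line per failed conversion rather than the roughly forty stack frames quoted below.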


was (Author: prasanth_j):
[~gopalv]/[~sseth] Can someone please take a look?

> Excessive logging when ORC PPD fails type conversions
> -
>
> Key: HIVE-13325
> URL: https://issues.apache.org/jira/browse/HIVE-13325
> Project: Hive
>  Issue Type: Bug
>  Components: Logging, ORC
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Siddharth Seth
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13325.1.patch
>
>
> Timestamp was specified as "YYYY-MM-DD HH:MM:SS": 2016-01-23 00:00:00
> {code}
> 2016-02-10 02:15:43,175 [WARN] [TezChild] |orc.RecordReaderImpl|: Exception 
> when evaluating predicate. Skipping ORC PPD. Exception: 
> java.lang.IllegalArgumentException: ORC SARGS could not convert from String 
> to TIMESTAMP
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.getBaseObjectForComparison(RecordReaderImpl.java:659)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.evaluatePredicateRange(RecordReaderImpl.java:373)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.evaluatePredicateProto(RecordReaderImpl.java:338)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:710)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.pickRowGroups(RecordReaderImpl.java:751)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:777)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:205)
> at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:598)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:183)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.<init>(OrcRawRecordMerger.java:226)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:437)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1269)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1151)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:135)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:101)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:149)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:650)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:406)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> 

[jira] [Updated] (HIVE-13325) Excessive logging when ORC PPD fails type conversions

2016-03-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13325:
-
Status: Patch Available  (was: Open)

> Excessive logging when ORC PPD fails type conversions
> -
>
> Key: HIVE-13325
> URL: https://issues.apache.org/jira/browse/HIVE-13325
> Project: Hive
>  Issue Type: Bug
>  Components: Logging, ORC
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Siddharth Seth
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13325.1.patch
>
>
> Timestamp was specified as "YYYY-MM-DD HH:MM:SS": 2016-01-23 00:00:00
> {code}
> 2016-02-10 02:15:43,175 [WARN] [TezChild] |orc.RecordReaderImpl|: Exception 
> when evaluating predicate. Skipping ORC PPD. Exception: 
> java.lang.IllegalArgumentException: ORC SARGS could not convert from String 
> to TIMESTAMP
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.getBaseObjectForComparison(RecordReaderImpl.java:659)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.evaluatePredicateRange(RecordReaderImpl.java:373)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.evaluatePredicateProto(RecordReaderImpl.java:338)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:710)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.pickRowGroups(RecordReaderImpl.java:751)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:777)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:205)
> at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:598)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:183)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.<init>(OrcRawRecordMerger.java:226)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:437)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1269)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1151)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:135)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:101)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:149)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:650)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:406)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
> at 
> 

[jira] [Updated] (HIVE-13325) Excessive logging when ORC PPD fails type conversions

2016-03-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13325:
-
Attachment: HIVE-13325.1.patch

> Excessive logging when ORC PPD fails type conversions
> -
>
> Key: HIVE-13325
> URL: https://issues.apache.org/jira/browse/HIVE-13325
> Project: Hive
>  Issue Type: Bug
>  Components: Logging, ORC
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Siddharth Seth
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13325.1.patch
>
>
> Timestamp was specified as "YYYY-MM-DD HH:MM:SS": 2016-01-23 00:00:00
> {code}
> 2016-02-10 02:15:43,175 [WARN] [TezChild] |orc.RecordReaderImpl|: Exception 
> when evaluating predicate. Skipping ORC PPD. Exception: 
> java.lang.IllegalArgumentException: ORC SARGS could not convert from String 
> to TIMESTAMP
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.getBaseObjectForComparison(RecordReaderImpl.java:659)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.evaluatePredicateRange(RecordReaderImpl.java:373)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.evaluatePredicateProto(RecordReaderImpl.java:338)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:710)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.pickRowGroups(RecordReaderImpl.java:751)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:777)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:205)
> at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:598)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:183)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.<init>(OrcRawRecordMerger.java:226)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:437)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1269)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1151)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:135)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:101)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:149)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:650)
> at 
> org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
> at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:406)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)

[jira] [Commented] (HIVE-11388) Allow ACID Compactor components to run in multiple metastores

2016-03-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205729#comment-15205729
 ] 

Hive QA commented on HIVE-11388:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12794647/HIVE-11388.7.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9850 tests executed
*Failed tests:*
{noformat}
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not 
produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7333/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7333/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7333/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12794647 - PreCommit-HIVE-TRUNK-Build

> Allow ACID Compactor components to run in multiple metastores
> -
>
> Key: HIVE-11388
> URL: https://issues.apache.org/jira/browse/HIVE-11388
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-11388.2.patch, HIVE-11388.4.patch, 
> HIVE-11388.5.patch, HIVE-11388.6.patch, HIVE-11388.7.patch, HIVE-11388.patch
>
>
> (this description is no longer accurate; see further comments)
> org.apache.hadoop.hive.ql.txn.compactor.Initiator is a thread that runs 
> inside the metastore service to manage compactions of ACID tables.  There 
> should be exactly 1 instance of this thread (even with multiple Thrift 
> services).
> This is documented in 
> https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration
>  but not enforced.
> Should add enforcement, since more than 1 Initiator could cause concurrent 
> attempts to compact the same table/partition - which will not work.
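
The single-instance requirement in the quoted description can be illustrated with a small mutual-exclusion sketch. This is an in-process illustration only, under the assumption that each metastore has a unique id; it is not the mechanism used by any HIVE-11388 patch, and real enforcement across metastore processes would need a shared store such as the metastore database. InitiatorLease and its methods are hypothetical names.

```java
final class InitiatorLease {
    // Id of the instance currently allowed to run the Initiator; null when free.
    private String holder;

    /** Grants the lease if it is free or already held by the same instance. */
    synchronized boolean tryAcquire(String instanceId) {
        if (holder == null) {
            holder = instanceId;
            return true;
        }
        return holder.equals(instanceId);
    }

    /** Releases the lease only if the caller is the current holder. */
    synchronized boolean release(String instanceId) {
        if (instanceId.equals(holder)) {
            holder = null;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        InitiatorLease lease = new InitiatorLease();
        System.out.println("metastore-1 acquires: " + lease.tryAcquire("metastore-1"));
        System.out.println("metastore-2 acquires: " + lease.tryAcquire("metastore-2"));
    }
}
```

A second instance that fails tryAcquire would simply skip its compaction cycle, which avoids two Initiators working on the same table or partition at once.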





[jira] [Updated] (HIVE-13111) Fix timestamp / interval_day_time wrong results with HIVE-9862

2016-03-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13111:

Attachment: HIVE-13111.04.patch

> Fix timestamp / interval_day_time wrong results with HIVE-9862 
> ---
>
> Key: HIVE-13111
> URL: https://issues.apache.org/jira/browse/HIVE-13111
> Project: Hive
>  Issue Type: Bug
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-13111.01.patch, HIVE-13111.02.patch, 
> HIVE-13111.03.patch, HIVE-13111.04.patch
>
>
> Fix timestamp / interval_day_time issues discovered when testing the 
> Vectorized Text patch.





[jira] [Updated] (HIVE-13111) Fix timestamp / interval_day_time wrong results with HIVE-9862

2016-03-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13111:

Status: Patch Available  (was: In Progress)

> Fix timestamp / interval_day_time wrong results with HIVE-9862 
> ---
>
> Key: HIVE-13111
> URL: https://issues.apache.org/jira/browse/HIVE-13111
> Project: Hive
>  Issue Type: Bug
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-13111.01.patch, HIVE-13111.02.patch, 
> HIVE-13111.03.patch, HIVE-13111.04.patch
>
>
> Fix timestamp / interval_day_time issues discovered when testing the 
> Vectorized Text patch.





[jira] [Updated] (HIVE-13111) Fix timestamp / interval_day_time wrong results with HIVE-9862

2016-03-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13111:

Status: In Progress  (was: Patch Available)

> Fix timestamp / interval_day_time wrong results with HIVE-9862 
> ---
>
> Key: HIVE-13111
> URL: https://issues.apache.org/jira/browse/HIVE-13111
> Project: Hive
>  Issue Type: Bug
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-13111.01.patch, HIVE-13111.02.patch, 
> HIVE-13111.03.patch, HIVE-13111.04.patch
>
>
> Fix timestamp / interval_day_time issues discovered when testing the 
> Vectorized Text patch.





[jira] [Updated] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-03-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-9660:
---
Attachment: HIVE-9660.patch

This doesn't quite work for uncompressed data; I'd need to fix some things. I 
was able to see at least some tests pass with it, though. It also needs some 
comments and cleanup. Let's see what fails...

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.WIP2.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic
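
The cost of estimating the end offset can be illustrated with a toy model. This is only a sketch under stated assumptions, not ORC's reader: the function names and the `slop` parameter are hypothetical, and the model simply assumes that a reader without a stored end offset must read some safety margin past the next row group's recorded start.

```python
def bytes_read_with_estimate(rg_start, next_rg_start, stream_end, slop):
    """Bytes read when the end of a row group has to be guessed.

    Hypothetical model: the reader reads up to the next row group's start
    plus a safety margin (slop), capped at the end of the stream.
    """
    estimated_end = min(stream_end, next_rg_start + slop)
    return estimated_end - rg_start


def bytes_read_with_stored_end(rg_start, rg_end):
    """Bytes read when the exact end offset is stored alongside the index."""
    return rg_end - rg_start


if __name__ == "__main__":
    # Illustrative numbers only; they do not come from a real ORC file.
    print(bytes_read_with_estimate(0, 1000, 10_000, 512))
    print(bytes_read_with_stored_end(0, 1100))
```

In this model the over-read grows with the safety margin, which is the "extra data" the description refers to; storing the end offset removes the margin entirely.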





[jira] [Updated] (HIVE-13324) LLAP: history log for FRAGMENT_START doesn't log DagId correctly

2016-03-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13324:

Description: 
{noformat}
$ grep -B 1 "TaskId=212" history.log 
Event=FRAGMENT_START, HostName=..., 
ApplicationId=application_1455662455106_2695, 
ContainerId=container_1_2695_01_000213, DagName=select
sum(l_extendedprice * l_discount...25(Stage-1), DagId=0, VertexName=Map 1, 
TaskId=212, TaskAttemptId=0, SubmitTime=1457493007357
--
Event=FRAGMENT_END, HostName=..., ApplicationId=application_1455662455106_2695, 
ContainerId=container_1_2695_01_000213, DagName=select
sum(l_extendedprice * l_discount...25(Stage-1), DagId=2, VertexName=Map 1, 
TaskId=212, TaskAttemptId=0, ThreadName=Task-Executor-1, Succeeded=true, 
StartTime=1457493007358, EndTime=1457493011916
--
Event=FRAGMENT_START, HostName=..., 
ApplicationId=application_1455662455106_2695, 
ContainerId=container_1_2695_01_000434, DagName=select
sum(l_extendedprice * l_discount...25(Stage-1), DagId=0, VertexName=Map 1, 
TaskId=212, TaskAttemptId=0, SubmitTime=1457493023131
--
Event=FRAGMENT_END, HostName=..., ApplicationId=application_1455662455106_2695, 
ContainerId=container_1_2695_01_000434, DagName=select
sum(l_extendedprice * l_discount...25(Stage-1), DagId=3, VertexName=Map 1, 
TaskId=212, TaskAttemptId=0, ThreadName=Task-Executor-2, Succeeded=true, 
StartTime=1457493023132, EndTime=1457493024695
{noformat}
etc. 
It's always 0.

  was:
{noformat}
$ grep -B 1 "TaskId=212" history.log 
Event=FRAGMENT_START, HostName=cn109-10.l42scl.hortonworks.com, 
ApplicationId=application_1455662455106_2695, 
ContainerId=container_1_2695_01_000213, DagName=select
sum(l_extendedprice * l_discount...25(Stage-1), DagId=0, VertexName=Map 1, 
TaskId=212, TaskAttemptId=0, SubmitTime=1457493007357
--
Event=FRAGMENT_END, HostName=cn109-10.l42scl.hortonworks.com, 
ApplicationId=application_1455662455106_2695, 
ContainerId=container_1_2695_01_000213, DagName=select
sum(l_extendedprice * l_discount...25(Stage-1), DagId=2, VertexName=Map 1, 
TaskId=212, TaskAttemptId=0, ThreadName=Task-Executor-1, Succeeded=true, 
StartTime=1457493007358, EndTime=1457493011916
--
Event=FRAGMENT_START, HostName=cn109-10.l42scl.hortonworks.com, 
ApplicationId=application_1455662455106_2695, 
ContainerId=container_1_2695_01_000434, DagName=select
sum(l_extendedprice * l_discount...25(Stage-1), DagId=0, VertexName=Map 1, 
TaskId=212, TaskAttemptId=0, SubmitTime=1457493023131
--
Event=FRAGMENT_END, HostName=cn109-10.l42scl.hortonworks.com, 
ApplicationId=application_1455662455106_2695, 
ContainerId=container_1_2695_01_000434, DagName=select
sum(l_extendedprice * l_discount...25(Stage-1), DagId=3, VertexName=Map 1, 
TaskId=212, TaskAttemptId=0, ThreadName=Task-Executor-2, Succeeded=true, 
StartTime=1457493023132, EndTime=1457493024695
{noformat}
etc. 
It's always 0.


> LLAP: history log for FRAGMENT_START doesn't log DagId correctly
> 
>
> Key: HIVE-13324
> URL: https://issues.apache.org/jira/browse/HIVE-13324
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Siddharth Seth
>
> {noformat}
> $ grep -B 1 "TaskId=212" history.log 
> Event=FRAGMENT_START, HostName=..., 
> ApplicationId=application_1455662455106_2695, 
> ContainerId=container_1_2695_01_000213, DagName=select
> sum(l_extendedprice * l_discount...25(Stage-1), DagId=0, VertexName=Map 1, 
> TaskId=212, TaskAttemptId=0, SubmitTime=1457493007357
> --
> Event=FRAGMENT_END, HostName=..., 
> ApplicationId=application_1455662455106_2695, 
> ContainerId=container_1_2695_01_000213, DagName=select
> sum(l_extendedprice * l_discount...25(Stage-1), DagId=2, VertexName=Map 1, 
> TaskId=212, TaskAttemptId=0, ThreadName=Task-Executor-1, Succeeded=true, 
> StartTime=1457493007358, EndTime=1457493011916
> --
> Event=FRAGMENT_START, HostName=..., 
> ApplicationId=application_1455662455106_2695, 
> ContainerId=container_1_2695_01_000434, DagName=select
> sum(l_extendedprice * l_discount...25(Stage-1), DagId=0, VertexName=Map 1, 
> TaskId=212, TaskAttemptId=0, SubmitTime=1457493023131
> --
> Event=FRAGMENT_END, HostName=..., 
> ApplicationId=application_1455662455106_2695, 
> ContainerId=container_1_2695_01_000434, DagName=select
> sum(l_extendedprice * l_discount...25(Stage-1), DagId=3, VertexName=Map 1, 
> TaskId=212, TaskAttemptId=0, ThreadName=Task-Executor-2, Succeeded=true, 
> StartTime=1457493023132, EndTime=1457493024695
> {noformat}
> etc. 
> It's always 0.
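
The mismatch in the log excerpt above can be detected mechanically. The following is a minimal sketch, assuming each history-log record has already been joined into a single string (the excerpt shows them wrapped across lines); `parse_event` and `mismatched_dag_ids` are hypothetical helper names, not part of Hive or LLAP.

```python
import re

# Only the fields needed to pair START and END events are extracted.
_FIELD = re.compile(r"\b(Event|ContainerId|TaskId|DagId)=([^,\s]+)")


def parse_event(record):
    """Extract Event, ContainerId, TaskId and DagId from one record."""
    return dict(_FIELD.findall(record))


def mismatched_dag_ids(records):
    """Return (container, task, start_dag_id, end_dag_id) where the two differ."""
    starts = {}
    mismatches = []
    for record in records:
        fields = parse_event(record)
        key = (fields.get("ContainerId"), fields.get("TaskId"))
        if fields.get("Event") == "FRAGMENT_START":
            starts[key] = fields.get("DagId")
        elif fields.get("Event") == "FRAGMENT_END" and key in starts:
            start_id = starts.pop(key)
            if start_id != fields.get("DagId"):
                mismatches.append((key[0], key[1], start_id, fields.get("DagId")))
    return mismatches


if __name__ == "__main__":
    sample = [
        "Event=FRAGMENT_START, ContainerId=container_1_2695_01_000213, "
        "DagId=0, VertexName=Map 1, TaskId=212, TaskAttemptId=0",
        "Event=FRAGMENT_END, ContainerId=container_1_2695_01_000213, "
        "DagId=2, VertexName=Map 1, TaskId=212, TaskAttemptId=0",
    ]
    for row in mismatched_dag_ids(sample):
        print(row)
```

Pairing on container and task is a choice made for this sketch; the excerpt shows the same TaskId in two different containers, so TaskId alone would not be a unique key.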





[jira] [Updated] (HIVE-13197) Add adapted constprog2.q and constprog_partitioner.q tests back

2016-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-13197:

Status: Patch Available  (was: Open)

> Add adapted constprog2.q and constprog_partitioner.q tests back
> ---
>
> Key: HIVE-13197
> URL: https://issues.apache.org/jira/browse/HIVE-13197
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Jesus Camacho Rodriguez
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-13197.patch
>
>
> HIVE-12749 removes constprog2.q and constprog_partitioner.q tests, as they 
> did not test constant propagation anymore. Ideally, we should create them 
> again with compatible types to test constant propagation and constant 
> propagation in the presence of partitions.





[jira] [Updated] (HIVE-13197) Add adapted constprog2.q and constprog_partitioner.q tests back

2016-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-13197:

Attachment: HIVE-13197.patch

> Add adapted constprog2.q and constprog_partitioner.q tests back
> ---
>
> Key: HIVE-13197
> URL: https://issues.apache.org/jira/browse/HIVE-13197
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Jesus Camacho Rodriguez
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-13197.patch
>
>
> HIVE-12749 removes constprog2.q and constprog_partitioner.q tests, as they 
> did not test constant propagation anymore. Ideally, we should create them 
> again with compatible types to test constant propagation and constant 
> propagation in the presence of partitions.





[jira] [Commented] (HIVE-12650) Spark-submit is killed when Hive times out. Killing spark-submit doesn't cancel AM request. When AM is finally launched, it tries to connect back to Hive and gets refus

2016-03-21 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205603#comment-15205603
 ] 

Rui Li commented on HIVE-12650:
---

Regarding a better error message, do you think we can throw a timeout exception 
if the SparkContext is not up after a certain amount of time? Otherwise the user 
only gets a timeout on the future and doesn't know the cause. On the other hand, 
this means adding another property, and I think it only works for yarn-client.

> Spark-submit is killed when Hive times out. Killing spark-submit doesn't 
> cancel AM request. When AM is finally launched, it tries to connect back to 
> Hive and gets refused.
> ---
>
> Key: HIVE-12650
> URL: https://issues.apache.org/jira/browse/HIVE-12650
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.1
>Reporter: JoneZhang
>Assignee: Xuefu Zhang
>
> I think hive.spark.client.server.connect.timeout should be set greater than 
> spark.yarn.am.waitTime. The default value for 
> spark.yarn.am.waitTime is 100s, and the default value for 
> hive.spark.client.server.connect.timeout is 90s, which is not good. We can 
> increase it to a larger value such as 120s.
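
The relationship described above can be checked before a job is submitted. A minimal sketch, assuming durations are written as an integer followed by `ms`, `s`, or `m`; `parse_millis` and `connect_timeout_is_safe` are hypothetical helper names, not part of Hive or Spark. The 90s and 100s values are the defaults quoted in the description.

```python
import re

# Multipliers to milliseconds for the units this sketch understands.
_UNIT_MILLIS = {"ms": 1, "s": 1000, "m": 60_000}


def parse_millis(value):
    """Convert a duration such as '90s' or '120000ms' to integer milliseconds."""
    match = re.fullmatch(r"\s*(\d+)\s*(ms|s|m)\s*", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_MILLIS[unit]


def connect_timeout_is_safe(hive_connect_timeout, spark_am_wait_time):
    """True when Hive waits strictly longer than the AM wait time."""
    return parse_millis(hive_connect_timeout) > parse_millis(spark_am_wait_time)


if __name__ == "__main__":
    # Defaults as reported in the issue, then the suggested larger value.
    print(connect_timeout_is_safe("90s", "100s"))
    print(connect_timeout_is_safe("120s", "100s"))
```

The comparison is strict on purpose: with equal values the two timers could still expire at roughly the same moment.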





[jira] [Commented] (HIVE-13277) Exception "Unable to create serializer 'org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer' " occurred during query execution on spark engine when ve

2016-03-21 Thread Xin Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205599#comment-15205599
 ] 

Xin Hao commented on HIVE-13277:


Hi, Kapil & Rui,
TPCx-BB query2 is only an example here. Many queries in TPCx-BB failed for a 
similar reason. 

> Exception "Unable to create serializer 
> 'org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer' " 
> occurred during query execution on spark engine when vectorized execution is 
> switched on
> -
>
> Key: HIVE-13277
> URL: https://issues.apache.org/jira/browse/HIVE-13277
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Hive Version: Apache Hive 2.0.0
> Spark Version: Apache Spark 1.6.0
>Reporter: Xin Hao
>
> Found during TPCx-BB query2 execution on spark engine when vectorized 
> execution is switched on:
> (1) set hive.vectorized.execution.enabled=true; 
> (2) set hive.vectorized.execution.reduce.enabled=true; (default value for 
> Apache Hive 2.0.0)
> It's OK for spark engine when hive.vectorized.execution.enabled is switched 
> off:
> (1) set hive.vectorized.execution.enabled=false;
> (2) set hive.vectorized.execution.reduce.enabled=true;
> For MR engine, the query could pass and no exception occurred when vectorized 
> execution is either switched on or switched off.
> Detail Error Message is below:
> {noformat}
> 2016-03-14T10:09:33,692 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - 16/03/14 10:09:33 INFO 
> spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 154 
> bytes
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - 16/03/14 10:09:33 WARN 
> scheduler.TaskSetManager: Lost task 0.0 in stage 4.0 (TID 25, bhx3): 
> java.lang.RuntimeException: Failed to load plan: 
> hdfs://bhx3:8020/tmp/hive/root/40b90ebd-32d4-47bc-a5ab-12ff1c05d0d2/hive_2016-03-14_10-08-56_307_7692316402338632647-1/-mr-10002/ab0c0021-0c1a-496e-9703-87d5879353c8/reduce.xml:
>  org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.IllegalArgumentException: Unable to create serializer 
> "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
> class: org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - Serialization trace:
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - childOperators 
> (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - childOperators 
> (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - childOperators 
> (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - reducer 
> (org.apache.hadoop.hive.ql.plan.ReduceWork)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:451)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.Utilities.getReduceWork(Utilities.java:306)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.init(SparkReduceRecordHandler.java:117)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:46)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:28)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:192)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> 

[jira] [Commented] (HIVE-13277) Exception "Unable to create serializer 'org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer' " occurred during query execution on spark engine when ve

2016-03-21 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205580#comment-15205580
 ] 

Rui Li commented on HIVE-13277:
---

Yes, I'm using an ORC table. Pinging [~xhao1] regarding whether there are other 
queries that have this issue.

> Exception "Unable to create serializer 
> 'org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer' " 
> occurred during query execution on spark engine when vectorized execution is 
> switched on
> -
>
> Key: HIVE-13277
> URL: https://issues.apache.org/jira/browse/HIVE-13277
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Hive Version: Apache Hive 2.0.0
> Spark Version: Apache Spark 1.6.0
>Reporter: Xin Hao
>
> Found during TPCx-BB query2 execution on spark engine when vectorized 
> execution is switched on:
> (1) set hive.vectorized.execution.enabled=true; 
> (2) set hive.vectorized.execution.reduce.enabled=true; (default value for 
> Apache Hive 2.0.0)
> It's OK for spark engine when hive.vectorized.execution.enabled is switched 
> off:
> (1) set hive.vectorized.execution.enabled=false;
> (2) set hive.vectorized.execution.reduce.enabled=true;
> For MR engine, the query could pass and no exception occurred when vectorized 
> execution is either switched on or switched off.
> Detail Error Message is below:
> {noformat}
> 2016-03-14T10:09:33,692 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - 16/03/14 10:09:33 INFO 
> spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 154 
> bytes
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - 16/03/14 10:09:33 WARN 
> scheduler.TaskSetManager: Lost task 0.0 in stage 4.0 (TID 25, bhx3): 
> java.lang.RuntimeException: Failed to load plan: 
> hdfs://bhx3:8020/tmp/hive/root/40b90ebd-32d4-47bc-a5ab-12ff1c05d0d2/hive_2016-03-14_10-08-56_307_7692316402338632647-1/-mr-10002/ab0c0021-0c1a-496e-9703-87d5879353c8/reduce.xml:
>  org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.IllegalArgumentException: Unable to create serializer 
> "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
> class: org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - Serialization trace:
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - childOperators 
> (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - childOperators 
> (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - childOperators 
> (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - reducer 
> (org.apache.hadoop.hive.ql.plan.ReduceWork)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:451)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.Utilities.getReduceWork(Utilities.java:306)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.init(SparkReduceRecordHandler.java:117)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:46)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:28)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:192)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> 

[jira] [Updated] (HIVE-13300) Hive on spark throws exception for multi-insert with join

2016-03-21 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-13300:
-
Attachment: HIVE-13300.3.patch

Address comments.

> Hive on spark throws exception for multi-insert with join
> -
>
> Key: HIVE-13300
> URL: https://issues.apache.org/jira/browse/HIVE-13300
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.0.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-13300.2.patch, HIVE-13300.3.patch, HIVE-13300.patch
>
>
> For certain multi-insert queries, Hive on Spark throws a deserialization 
> error.
> {noformat}
> create table status_updates(userid int,status string,ds string);
> create table profiles(userid int,school string,gender int);
> drop table school_summary; create table school_summary(school string,cnt int) 
> partitioned by (ds string);
> drop table gender_summary; create table gender_summary(gender int,cnt int) 
> partitioned by (ds string);
> insert into status_updates values (1, "status_1", "2016-03-16");
> insert into profiles values (1, "school_1", 0);
> set hive.auto.convert.join=false;
> set hive.execution.engine=spark;
> FROM (SELECT a.status, b.school, b.gender
> FROM status_updates a JOIN profiles b
> ON (a.userid = b.userid and
> a.ds='2009-03-20' )
> ) subq1
> INSERT OVERWRITE TABLE gender_summary
> PARTITION(ds='2009-03-20')
> SELECT subq1.gender, COUNT(1) GROUP BY subq1.gender
> INSERT OVERWRITE TABLE school_summary
> PARTITION(ds='2009-03-20')
> SELECT subq1.school, COUNT(1) GROUP BY subq1.school
> {noformat}
> Error:
> {noformat}
> 16/03/17 13:29:00 [task-result-getter-3]: WARN scheduler.TaskSetManager: Lost 
> task 0.0 in stage 2.0 (TID 3, localhost): java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable 
> to deserialize reduce input key from x1x128x0x0 with properties 
> {serialization.sort.order.null=a, columns=reducesinkkey0, 
> serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe,
>  serialization.sort.order=+, columns.types=int}
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:279)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:724)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error: Unable to deserialize reduce input key from x1x128x0x0 with properties 
> {serialization.sort.order.null=a, columns=reducesinkkey0, 
> serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe,
>  serialization.sort.order=+, columns.types=int}
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:251)
>   ... 12 more
> Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException
>   at 
> org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:241)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:249)
>   ... 12 more
> Caused by: java.io.EOFException
>   at 
> org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54)
>   at 
> org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserializeInt(BinarySortableSerDe.java:597)
>   at 
> org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:288)
>   at 
> org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:237)
>   ... 13 more
> {noformat}




[jira] [Commented] (HIVE-13115) MetaStore Direct SQL getPartitions call fail when the columns schemas for a partition are null

2016-03-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205476#comment-15205476
 ] 

Hive QA commented on HIVE-13115:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12794308/HIVE-13115.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9849 tests executed
*Failed tests:*
{noformat}
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not 
produce a TEST-*.xml file
org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testTempTable
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7332/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7332/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7332/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12794308 - PreCommit-HIVE-TRUNK-Build

> MetaStore Direct SQL getPartitions call fail when the columns schemas for a 
> partition are null
> --
>
> Key: HIVE-13115
> URL: https://issues.apache.org/jira/browse/HIVE-13115
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>  Labels: DirectSql, MetaStore, ORM
> Attachments: HIVE-13115.patch, HIVE-13115.reproduce.issue.patch
>
>
> We are seeing the following exception in our MetaStore logs
> {noformat}
> 2016-02-11 00:00:19,002 DEBUG metastore.MetaStoreDirectSql 
> (MetaStoreDirectSql.java:timingTrace(602)) - Direct SQL query in 5.842372ms + 
> 1.066728ms, the query is [select "PARTITIONS"."PART_ID" from "PARTITIONS"  
> inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and 
> "TBLS"."TBL_NAME" = ?   inner join 
> "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID"  and "DBS"."NAME" = ?  order by 
> "PART_NAME" asc]
> 2016-02-11 00:00:19,021 ERROR metastore.ObjectStore 
> (ObjectStore.java:handleDirectSqlError(2243)) - Direct SQL failed, falling 
> back to ORM
> MetaException(message:Unexpected null for one of the IDs, SD 6437, column 
> null, serde 6437 for a non- view)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:360)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitions(MetaStoreDirectSql.java:224)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:1563)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:1559)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2208)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsInternal(ObjectStore.java:1570)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:1553)
> at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
> at com.sun.proxy.$Proxy5.getPartitions(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions(HiveMetaStore.java:2526)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions.getResult(ThriftHiveMetastore.java:8747)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions.getResult(ThriftHiveMetastore.java:8731)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> 

[jira] [Updated] (HIVE-13250) Compute predicate conversions on the client, instead of per row group

2016-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-13250:

Status: Open  (was: Patch Available)

> Compute predicate conversions on the client, instead of per row group
> -
>
> Key: HIVE-13250
> URL: https://issues.apache.org/jira/browse/HIVE-13250
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Siddharth Seth
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-13250.2.patch, HIVE-13250.2.patch, HIVE-13250.patch
>
>
> When running a query of the form 
> select count from table where ts_field = "2016-01-23 00:00:00";
> or
> select count from table where ts_field = 1453507200
> ts_field is of type TIMESTAMP
> The predicate is converted to whatever format is appropriate for TIMESTAMP 
> processing on each and every row group.
> It would be far more efficient to process this once on the client - or even 
> once per task.
> The same applies to ORC split elimination as well - this is applied for each 
> stripe.
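
The idea of converting the predicate literal once can be sketched as follows. This is an illustration under stated assumptions, not Hive's or ORC's implementation: `to_epoch_seconds` and `matching_row_groups` are hypothetical names, row-group statistics are modelled as plain `(min, max)` pairs in epoch seconds, and the string literal is interpreted as UTC, which may differ from how Hive resolves time zones.

```python
from datetime import datetime, timezone


def to_epoch_seconds(literal):
    """Normalise a timestamp literal (string or integer) to epoch seconds.

    Assumption of this sketch: string literals are interpreted as UTC.
    """
    if isinstance(literal, int):
        return literal
    parsed = datetime.strptime(literal, "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def matching_row_groups(row_group_stats, literal):
    """Yield indexes of row groups whose [min, max] range may hold the literal.

    The literal is converted exactly once, before the loop, instead of
    once per row group.
    """
    target = to_epoch_seconds(literal)  # hoisted out of the loop
    for index, (minimum, maximum) in enumerate(row_group_stats):
        if minimum <= target <= maximum:
            yield index


if __name__ == "__main__":
    stats = [(1453500000, 1453507199), (1453507200, 1453510000)]
    print(list(matching_row_groups(stats, "2016-01-23 00:00:00")))
```

The same hoisting applies at a coarser level: the converted value could be computed on the client and shipped with the plan, so each task and each stripe reuses it.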





[jira] [Updated] (HIVE-13250) Compute predicate conversions on the client, instead of per row group

2016-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-13250:

Attachment: HIVE-13250.2.patch

> Compute predicate conversions on the client, instead of per row group
> -
>
> Key: HIVE-13250
> URL: https://issues.apache.org/jira/browse/HIVE-13250
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Siddharth Seth
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-13250.2.patch, HIVE-13250.2.patch, HIVE-13250.patch
>
>
> When running a query of the form 
> select count from table where ts_field = "2016-01-23 00:00:00";
> or
> select count from table where ts_field = 1453507200
> ts_field is of type TIMESTAMP
> The predicate is converted to whatever format is appropriate for TIMESTAMP 
> processing on each and every row group.
> It would be far more efficient to process this once on the client - or even 
> once per task.
> The same applies to ORC split elimination as well - this is applied for each 
> stripe.





[jira] [Updated] (HIVE-11388) Allow ACID Compactor components to run in multiple metastores

2016-03-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11388:
--
Attachment: HIVE-11388.7.patch

> Allow ACID Compactor components to run in multiple metastores
> -
>
> Key: HIVE-11388
> URL: https://issues.apache.org/jira/browse/HIVE-11388
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-11388.2.patch, HIVE-11388.4.patch, 
> HIVE-11388.5.patch, HIVE-11388.6.patch, HIVE-11388.7.patch, HIVE-11388.patch
>
>
> (this description is no longer accurate; see further comments)
> org.apache.hadoop.hive.ql.txn.compactor.Initiator is a thread that runs 
> inside the metastore service to manage compactions of ACID tables.  There 
> should be exactly 1 instance of this thread (even with multiple Thrift 
> services).
> This is documented in 
> https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration
>  but not enforced.
> Should add enforcement, since more than 1 Initiator could cause concurrent 
> attempts to compact the same table/partition - which will not work.





[jira] [Updated] (HIVE-13326) HiveServer2: Make ZK config publishing configurable

2016-03-21 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-13326:

Attachment: HIVE-13326.1.patch

> HiveServer2: Make ZK config publishing configurable
> ---
>
> Key: HIVE-13326
> URL: https://issues.apache.org/jira/browse/HIVE-13326
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-13326.1.patch
>
>
> We should revert to older behaviour when config publishing is disabled.





[jira] [Updated] (HIVE-13326) HiveServer2: Make ZK config publishing configurable

2016-03-21 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-13326:

Status: Patch Available  (was: Open)

> HiveServer2: Make ZK config publishing configurable
> ---
>
> Key: HIVE-13326
> URL: https://issues.apache.org/jira/browse/HIVE-13326
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-13326.1.patch
>
>
> We should revert to older behaviour when config publishing is disabled.





[jira] [Commented] (HIVE-13151) Clean up UGI objects in FileSystem cache for transactions

2016-03-21 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205436#comment-15205436
 ] 

Wei Zheng commented on HIVE-13151:
--

{quote}
Shouldn't this be solved in Hadoop? If anything else is using UGI the cache 
will still leak. What do other submodules do?
{quote}
[~thejas] Can you comment on this?

> Clean up UGI objects in FileSystem cache for transactions
> -
>
> Key: HIVE-13151
> URL: https://issues.apache.org/jira/browse/HIVE-13151
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13151.1.patch, HIVE-13151.2.patch, 
> HIVE-13151.3.patch
>
>
> One issue with FileSystem.CACHE is that it does not clean itself. The key in 
> that cache includes UGI object. When new UGI objects are created and used 
> with the FileSystem api, new entries get added to the cache.
> We need to manually clean up those UGI objects once they are no longer in use.





[jira] [Commented] (HIVE-13151) Clean up UGI objects in FileSystem cache for transactions

2016-03-21 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205392#comment-15205392
 ] 

Eugene Koifman commented on HIVE-13151:
---

Shouldn't this be solved in Hadoop?  If anything else is using UGI the cache 
will still leak.  What do other submodules do?


1. TestTxnCommands2 - seems to have unused imports added
2. Would it make sense to include CompactionInfo.getFullPartitionName() in the 
error message for more context?  (Worker/Initiator/Cleaner - basically 
throughout)


> Clean up UGI objects in FileSystem cache for transactions
> -
>
> Key: HIVE-13151
> URL: https://issues.apache.org/jira/browse/HIVE-13151
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13151.1.patch, HIVE-13151.2.patch, 
> HIVE-13151.3.patch
>
>
> One issue with FileSystem.CACHE is that it does not clean itself. The key in 
> that cache includes UGI object. When new UGI objects are created and used 
> with the FileSystem api, new entries get added to the cache.
> We need to manually clean up those UGI objects once they are no longer in use.





[jira] [Updated] (HIVE-13326) HiveServer2: Make ZK config publishing configurable

2016-03-21 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-13326:

Component/s: JDBC

> HiveServer2: Make ZK config publishing configurable
> ---
>
> Key: HIVE-13326
> URL: https://issues.apache.org/jira/browse/HIVE-13326
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> We should revert to older behaviour when config publishing is disabled.





[jira] [Commented] (HIVE-13286) Query ID is being reused across queries

2016-03-21 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205308#comment-15205308
 ] 

Vikram Dixit K commented on HIVE-13286:
---

I tested the latest patch. It works as expected. +1

> Query ID is being reused across queries
> ---
>
> Key: HIVE-13286
> URL: https://issues.apache.org/jira/browse/HIVE-13286
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Aihua Xu
>Priority: Critical
> Attachments: HIVE-13286.1.patch, HIVE-13286.2.patch, 
> HIVE-13286.3.patch, HIVE-13286.4.patch
>
>
> [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is 
> being reused across queries. This defeats the purpose of a query id. I am not 
> sure what the purpose of the change in that jira is but it breaks the 
> assumption about a query id being unique for each query. Please take a look 
> into this at the earliest.





[jira] [Commented] (HIVE-10751) Hive View Specification Needs Update

2016-03-21 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205299#comment-15205299
 ] 

Wei Zheng commented on HIVE-10751:
--

[~leftylev] Thanks for catching that. I've updated that wikipage.

> Hive View Specification Needs Update
> 
>
> Key: HIVE-10751
> URL: https://issues.apache.org/jira/browse/HIVE-10751
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.13.1
>Reporter: Tim Gattone
>Assignee: Wei Zheng
>Priority: Trivial
>
> On this page: 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/AlterView
>  
> There is no mention in the spec of the optional [db_name] to qualify the View 
> with, but it is mentioned with CREATE TABLE.
>  
> Isn’t this an oversight?  My understanding is that we can qualify the CREATE 
> VIEW statement with the database such as CREATE VIEW jrnl.sbb_base_v
>  
> The reason I ask is another vendor is questioning this syntax.
>  
> CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment], 
> ...) ]
>   [COMMENT view_comment]
>   [TBLPROPERTIES (property_name = property_value, ...)]
>   AS SELECT ...;
> =
> Email response from Hortonworks:
> Subject: Hive - Question
>  
> Good morning Tim, 
>  
> I have received this email thread from Doug. I have looked into it; below are 
> my findings. 
>  
> Yes. We can qualify the CREATE VIEW statement with the database. 
>  
> hive> create view db1.vw3 as select * from default.x1;
> OK
> Time taken: 0.084 seconds
> hive>
>  
> This could be a possible documentation issue or improvement, which can be 
> addressed by raising an Apache JIRA. This documentation is maintained by 
> apache and HWX contributes to it. Anyone can register and raise Apache JIRA.  
>  
> Please let me know if you have any questions or concerns. 
>   
>  
> Thank you,
> Sai





[jira] [Updated] (HIVE-11388) Allow ACID Compactor components to run in multiple metastores

2016-03-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11388:
--
Attachment: HIVE-11388.6.patch

> Allow ACID Compactor components to run in multiple metastores
> -
>
> Key: HIVE-11388
> URL: https://issues.apache.org/jira/browse/HIVE-11388
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-11388.2.patch, HIVE-11388.4.patch, 
> HIVE-11388.5.patch, HIVE-11388.6.patch, HIVE-11388.patch
>
>
> (this description is no longer accurate; see further comments)
> org.apache.hadoop.hive.ql.txn.compactor.Initiator is a thread that runs 
> inside the metastore service to manage compactions of ACID tables.  There 
> should be exactly 1 instance of this thread (even with multiple Thrift 
> services).
> This is documented in 
> https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration
>  but not enforced.
> Should add enforcement, since more than 1 Initiator could cause concurrent 
> attempts to compact the same table/partition - which will not work.





[jira] [Commented] (HIVE-13300) Hive on spark throws exception for multi-insert with join

2016-03-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205293#comment-15205293
 ] 

Xuefu Zhang commented on HIVE-13300:


We should be able to make a keywritable directly from the input (key) with the 
first n-1 bytes, without copying all bytes and stripping off the last byte, which 
might give a slight performance advantage.
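
As a rough analogy only (the Hive code in question is Java, and this is not the patch): the difference between copying the key and viewing its first n-1 bytes can be sketched in Python with a memoryview slice, which shares the underlying buffer instead of allocating a new one. The function name and the trailing byte value used below are hypothetical.

```python
def key_without_last_byte(raw_key):
    """Return a zero-copy view of the first n-1 bytes of raw_key."""
    # Slicing a memoryview does not copy; the view still refers to raw_key.
    return memoryview(raw_key)[:-1]
```

For a five-byte input whose first four bytes are 1, 128, 0, 0, the view exposes exactly those four bytes while `view.obj` remains the original buffer.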

> Hive on spark throws exception for multi-insert with join
> -
>
> Key: HIVE-13300
> URL: https://issues.apache.org/jira/browse/HIVE-13300
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.0.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-13300.2.patch, HIVE-13300.patch
>
>
> For certain multi-insert queries, Hive on Spark throws a deserialization 
> error.
> {noformat}
> create table status_updates(userid int,status string,ds string);
> create table profiles(userid int,school string,gender int);
> drop table school_summary; create table school_summary(school string,cnt int) 
> partitioned by (ds string);
> drop table gender_summary; create table gender_summary(gender int,cnt int) 
> partitioned by (ds string);
> insert into status_updates values (1, "status_1", "2016-03-16");
> insert into profiles values (1, "school_1", 0);
> set hive.auto.convert.join=false;
> set hive.execution.engine=spark;
> FROM (SELECT a.status, b.school, b.gender
> FROM status_updates a JOIN profiles b
> ON (a.userid = b.userid and
> a.ds='2009-03-20' )
> ) subq1
> INSERT OVERWRITE TABLE gender_summary
> PARTITION(ds='2009-03-20')
> SELECT subq1.gender, COUNT(1) GROUP BY subq1.gender
> INSERT OVERWRITE TABLE school_summary
> PARTITION(ds='2009-03-20')
> SELECT subq1.school, COUNT(1) GROUP BY subq1.school
> {noformat}
> Error:
> {noformat}
> 16/03/17 13:29:00 [task-result-getter-3]: WARN scheduler.TaskSetManager: Lost 
> task 0.0 in stage 2.0 (TID 3, localhost): java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable 
> to deserialize reduce input key from x1x128x0x0 with properties 
> {serialization.sort.order.null=a, columns=reducesinkkey0, 
> serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe,
>  serialization.sort.order=+, columns.types=int}
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:279)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:724)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error: Unable to deserialize reduce input key from x1x128x0x0 with properties 
> {serialization.sort.order.null=a, columns=reducesinkkey0, 
> serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe,
>  serialization.sort.order=+, columns.types=int}
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:251)
>   ... 12 more
> Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException
>   at 
> org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:241)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:249)
>   ... 12 more
> Caused by: java.io.EOFException
>   at 
> org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54)
>   at 
> org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserializeInt(BinarySortableSerDe.java:597)
>   at 
> org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:288)
>   

[jira] [Commented] (HIVE-12619) Switching the field order within an array of structs causes the query to fail

2016-03-21 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205252#comment-15205252
 ] 

Jimmy Xiang commented on HIVE-12619:


Patch v6 is uploaded to RB: https://reviews.apache.org/r/45128/

> Switching the field order within an array of structs causes the query to fail
> -
>
> Key: HIVE-12619
> URL: https://issues.apache.org/jira/browse/HIVE-12619
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Ang Zhang
>Assignee: Mohammad Kamrul Islam
>Priority: Minor
> Attachments: HIVE-12619.2.patch, HIVE-12619.3.patch, 
> HIVE-12619.4.patch, HIVE-12619.5.patch, HIVE-12619.6.patch
>
>
> Switching the field order within an array of structs causes the query to fail 
> or return the wrong data for the fields, but switching the field order within 
> just a struct works.
> How to reproduce:
> Case 1: if the two fields have the same type, the query will return wrong data for 
> the fields
> drop table if exists schema_test;
> create table schema_test (msg array<struct<f1: string, f2: string>>) stored 
> as parquet;
> insert into table schema_test select stack(2, array(named_struct('f1', 'abc', 
> 'f2', 'abc2')), array(named_struct('f1', 'efg', 'f2', 'efg2'))) from one 
> limit 2;
> select * from schema_test;
> --returns
> --[{"f1":"efg","f2":"efg2"}]
> --[{"f1":"abc","f2":"abc2"}]
> alter table schema_test change msg msg array<struct<f2: string, f1: string>>;
> select * from schema_test;
> --returns
> --[{"f2":"efg","f1":"efg2"}]
> --[{"f2":"abc","f1":"abc2"}]
> Case 2: if the two fields have different types, the query will fail
> drop table if exists schema_test;
> create table schema_test (msg array<struct<f1: string, f2: int>>) stored as 
> parquet;
> insert into table schema_test select stack(2, array(named_struct('f1', 'abc', 
> 'f2', 1)), array(named_struct('f1', 'efg', 'f2', 2))) from one limit 2;
> select * from schema_test;
> --returns
> --[{"f1":"efg","f2":2}]
> --[{"f1":"abc","f2":1}]
> alter table schema_test change msg msg array<struct<f2: int, f1: string>>;
> select * from schema_test;
> Failed with exception 
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
> org.apache.hadoop.io.IntWritable





[jira] [Commented] (HIVE-13254) GBY cardinality estimation is wrong partition columns is involved

2016-03-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205238#comment-15205238
 ] 

Prasanth Jayachandran commented on HIVE-13254:
--

[~jcamachorodriguez] To give more context on this issue: most TPCDS queries 
join a fact table with multiple dimension tables, followed by some aggregation. 
The query provided in the description is one of that kind. The plans generated for 
the query, with and without hive.transpose.aggr.join enabled, are attached. With 
the optimization enabled, all such queries run significantly slower (on the order 
of 4x-6x). For this specific query, it took 17.9s on 1TB without this optimization, 
and 98.24s with hive.transpose.aggr.join enabled.

My initial understanding of this optimization was that only the map-side GBY would be 
pushed through the join, reducing the number of rows that are 
broadcast to the fact table. But looking at the plans, the GBY is pushed 
at the logical level, which then gets compiled to a map-side GBY and a reduce-side 
GBY followed by the JOIN. This shuffles approximately 600M before joining, which I 
think is adding to the overall execution time. I filed this issue thinking that 
the map-side GBY would broadcast its output to the fact table for the join. Per 
[~ashutoshc]'s comment below, it seems we don't support that. 

> GBY cardinality estimation is wrong partition columns is involved
> -
>
> Key: HIVE-13254
> URL: https://issues.apache.org/jira/browse/HIVE-13254
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.3.0, 2.0.0, 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Jesus Camacho Rodriguez
> Attachments: q3.svg, q3_ef_transpose_aggr.svg
>
>
> When running the following query at TPCDS-1000 scale, setting 
> hive.transpose.aggr.join=true is expected to generate an optimal plan, but it 
> does not. 
> {code:title=Query}
> SELECT `date_dim`.`d_day_name` AS `d_day_name`, 
>`item`.`i_category` AS `i_category` 
> FROM   `store_sales` `store_sales` 
>INNER JOIN `item` `item` 
>ON ( `store_sales`.`ss_item_sk` = `item`.`i_item_sk` ) 
>INNER JOIN `date_dim` `date_dim` 
>ON ( `store_sales`.`ss_sold_date_sk` = `date_dim`.`d_date_sk` 
> ) 
> GROUP  BY `d_day_name`, 
>   `i_category`;
> {code}
> The reason is that the stats annotation rule for GROUP BY does not take 
> partition columns into account. For the above query, the generated plan is 
> attached. As we can see from the plan, GBY is pushed to the fact table 
> (store_sales), but the output of GBY is shuffled to perform the join instead of 
> being converted to a MapJoin. This is because of the wrong estimate of the 
> cardinality/data size of GBY on store_sales (Map 1). 
> What's happening internally is that GBY computes an estimated cardinality, which in 
> this case is NDV(ss_item_sk) * NDV(ss_sold_date_sk) = 338901 * 1823 ~= 617M. 
> This estimate is wrong because ss_sold_date_sk is a partition column and the estimator 
> assumes it is a non-partition column. In this case, not every task reads data 
> from all partitions. We need to take the estimated task parallelism into account. 
> For example: if task parallelism is determined to be 100, the estimate from 
> GBY should be ~6M, which should convert this vertex into a map join vertex. 
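
The arithmetic in the description can be checked with a small sketch. The function names are hypothetical, and dividing the NDV product by the task parallelism is only the adjustment suggested in the description's example, not necessarily how Hive's stats annotation would implement a fix.

```python
def naive_gby_estimate(ndvs):
    """Multiply the NDVs of the grouping columns, as the current estimator does."""
    product = 1
    for ndv in ndvs:
        product *= ndv
    return product


def per_task_gby_estimate(ndvs, task_parallelism):
    """Scale the naive estimate down by the estimated task parallelism."""
    # Assumption: each task sees roughly 1/task_parallelism of the partitions.
    return naive_gby_estimate(ndvs) // task_parallelism
```

With the NDVs from the description (338901 and 1823), the naive product is 617,816,523, and a parallelism of 100 brings it down to 6,178,165, matching the ~617M and ~6M figures quoted above.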





[jira] [Updated] (HIVE-12619) Switching the field order within an array of structs causes the query to fail

2016-03-21 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-12619:
---
Status: Patch Available  (was: Open)

> Switching the field order within an array of structs causes the query to fail
> -
>
> Key: HIVE-12619
> URL: https://issues.apache.org/jira/browse/HIVE-12619
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Ang Zhang
>Assignee: Mohammad Kamrul Islam
>Priority: Minor
> Attachments: HIVE-12619.2.patch, HIVE-12619.3.patch, 
> HIVE-12619.4.patch, HIVE-12619.5.patch, HIVE-12619.6.patch
>
>
> Switching the field order within an array of structs causes the query to fail 
> or return the wrong data for the fields, but switching the field order within 
> just a struct works.
> How to reproduce:
> Case 1: if the two fields have the same type, the query will return wrong data for 
> the fields
> drop table if exists schema_test;
> create table schema_test (msg array<struct<f1: string, f2: string>>) stored 
> as parquet;
> insert into table schema_test select stack(2, array(named_struct('f1', 'abc', 
> 'f2', 'abc2')), array(named_struct('f1', 'efg', 'f2', 'efg2'))) from one 
> limit 2;
> select * from schema_test;
> --returns
> --[{"f1":"efg","f2":"efg2"}]
> --[{"f1":"abc","f2":"abc2"}]
> alter table schema_test change msg msg array<struct<f2: string, f1: string>>;
> select * from schema_test;
> --returns
> --[{"f2":"efg","f1":"efg2"}]
> --[{"f2":"abc","f1":"abc2"}]
> Case 2: if the two fields have different types, the query will fail
> drop table if exists schema_test;
> create table schema_test (msg array<struct<f1: string, f2: int>>) stored as 
> parquet;
> insert into table schema_test select stack(2, array(named_struct('f1', 'abc', 
> 'f2', 1)), array(named_struct('f1', 'efg', 'f2', 2))) from one limit 2;
> select * from schema_test;
> --returns
> --[{"f1":"efg","f2":2}]
> --[{"f1":"abc","f2":1}]
> alter table schema_test change msg msg array<struct<f2: int, f1: string>>;
> select * from schema_test;
> Failed with exception 
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
> org.apache.hadoop.io.IntWritable





[jira] [Updated] (HIVE-12619) Switching the field order within an array of structs causes the query to fail

2016-03-21 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-12619:
---
Attachment: HIVE-12619.6.patch

> Switching the field order within an array of structs causes the query to fail
> -
>
> Key: HIVE-12619
> URL: https://issues.apache.org/jira/browse/HIVE-12619
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Ang Zhang
>Assignee: Mohammad Kamrul Islam
>Priority: Minor
> Attachments: HIVE-12619.2.patch, HIVE-12619.3.patch, 
> HIVE-12619.4.patch, HIVE-12619.5.patch, HIVE-12619.6.patch
>
>
> Switching the field order within an array of structs causes the query to fail 
> or return the wrong data for the fields, but switching the field order within 
> just a struct works.
> How to reproduce:
> Case 1: if the two fields have the same type, the query will return wrong data for 
> the fields
> drop table if exists schema_test;
> create table schema_test (msg array<struct<f1: string, f2: string>>) stored 
> as parquet;
> insert into table schema_test select stack(2, array(named_struct('f1', 'abc', 
> 'f2', 'abc2')), array(named_struct('f1', 'efg', 'f2', 'efg2'))) from one 
> limit 2;
> select * from schema_test;
> --returns
> --[{"f1":"efg","f2":"efg2"}]
> --[{"f1":"abc","f2":"abc2"}]
> alter table schema_test change msg msg array<struct<f2: string, f1: string>>;
> select * from schema_test;
> --returns
> --[{"f2":"efg","f1":"efg2"}]
> --[{"f2":"abc","f1":"abc2"}]
> Case 2: if the two fields have different types, the query will fail
> drop table if exists schema_test;
> create table schema_test (msg array<struct<f1: string, f2: int>>) stored as 
> parquet;
> insert into table schema_test select stack(2, array(named_struct('f1', 'abc', 
> 'f2', 1)), array(named_struct('f1', 'efg', 'f2', 2))) from one limit 2;
> select * from schema_test;
> --returns
> --[{"f1":"efg","f2":2}]
> --[{"f1":"abc","f2":1}]
> alter table schema_test change msg msg array<struct<f2: int, f1: string>>;
> select * from schema_test;
> Failed with exception 
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
> org.apache.hadoop.io.IntWritable





[jira] [Commented] (HIVE-12612) beeline always exits with 0 status when reading query from standard input

2016-03-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-12612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205232#comment-15205232
 ] 

Sergio Peña commented on HIVE-12612:


I think beeline should stop as soon as an error occurs. Take mysql as an 
example:
{noformat}
$ echo "s; show databases;" | mysql
ERROR 1064 (42000) at line 1: You have an error in your SQL syntax; check the 
manual that corresponds to your MySQL server version for the right syntax to 
use near 's' at line 1
{noformat}

The {{s}} command does not exist, so MySQL fails without executing the rest of 
the commands.

> beeline always exits with 0 status when reading query from standard input
> -
>
> Key: HIVE-12612
> URL: https://issues.apache.org/jira/browse/HIVE-12612
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.1.0
> Environment: CDH5.5.0
>Reporter: Paulo Sequeira
>Assignee: Reuben Kuhnert
>Priority: Minor
>
> Similar to what was reported on HIVE-6978, but now it only happens when the 
> query is read from the standard input. For example, the following fails as 
> expected:
> {code}
> bash$ if beeline -u "jdbc:hive2://..." -e "boo;" ; then echo "Ok?!" ; else 
> echo "Failed!" ; fi
> Connecting to jdbc:hive2://...
> Connected to: Apache Hive (version 1.1.0-cdh5.5.0)
> Driver: Hive JDBC (version 1.1.0-cdh5.5.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Error: Error while compiling statement: FAILED: ParseException line 1:0 
> cannot recognize input near 'boo' '' '' (state=42000,code=4)
> Closing: 0: jdbc:hive2://...
> Failed!
> {code}
> But the following does not:
> {code}
> bash$ if echo "boo;"|beeline -u "jdbc:hive2://..." ; then echo "Ok?!" ; else 
> echo "Failed!" ; fi
> Connecting to jdbc:hive2://...
> Connected to: Apache Hive (version 1.1.0-cdh5.5.0)
> Driver: Hive JDBC (version 1.1.0-cdh5.5.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 1.1.0-cdh5.5.0 by Apache Hive
> 0: jdbc:hive2://...:8> Error: Error while compiling statement: FAILED: 
> ParseException line 1:0 cannot recognize input near 'boo' '' '' 
> (state=42000,code=4)
> 0: jdbc:hive2://...:8> Closing: 0: jdbc:hive2://...
> Ok?!
> {code}
> This was misleading our batch scripts into always believing that the execution of 
> the queries succeeded, when sometimes that was not the case. 
> h2. Workaround
> We found we can work around the issue by always using the -e or the -f 
> parameters, and even reading the standard input through the /dev/stdin device 
> (this was useful because a lot of the scripts fed the queries from here 
> documents), like this:
> {code:title=some-script.sh}
> #!/bin/sh
> set -o nounset -o errexit -o pipefail
> # As beeline is failing to report an error status if reading the query
> # to be executed from STDIN, check whether no -f or -e option is used
> # and, in that case, pretend it has to read the query from a regular
> # file using -f to read from /dev/stdin
> function beeline_workaround_exit_status () {
>     for arg in "$@"
>     do if [ "$arg" = "-f" -o "$arg" = "-e" ]
>        then beeline -u "..." "$@"
>             return
>        fi
>     done
>     beeline -u "..." "$@" -f /dev/stdin
> }
> beeline_workaround_exit_status <<EOF
> boo;
> EOF
> {code}





[jira] [Updated] (HIVE-12439) CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements

2016-03-21 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-12439:
-
   Resolution: Fixed
Fix Version/s: 2.1.0
   1.3.0
   Status: Resolved  (was: Patch Available)

Thanks [~ekoifman]. Committed to master and branch-1.

> CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements
> --
>
> Key: HIVE-12439
> URL: https://issues.apache.org/jira/browse/HIVE-12439
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
> Fix For: 1.3.0, 2.1.0
>
> Attachments: HIVE-12439.1.patch, HIVE-12439.2.patch, 
> HIVE-12439.3.patch
>
>
> # add a safeguard to make sure IN clause is not too large; break up by txn id 
> to delete from TXN_COMPONENTS where tc_txnid in ...
> # TxnHandler.openTxns() - use 1 insert with many rows in values() clause, 
> rather than 1 DB roundtrip per row
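
The two items above can be sketched as follows. This is an illustration of the general technique, not the committed patch: the function names, the batch size of 1000, and the table and column names passed to `multi_row_insert` are hypothetical; only `TXN_COMPONENTS` and `tc_txnid` come from the description.

```python
def delete_statements(txn_ids, max_in_clause=1000):
    """Split txn ids into batches so that no IN clause exceeds max_in_clause."""
    statements = []
    for start in range(0, len(txn_ids), max_in_clause):
        batch = txn_ids[start:start + max_in_clause]
        # int() guards against non-numeric ids being spliced into the SQL text.
        id_list = ", ".join(str(int(txn_id)) for txn_id in batch)
        statements.append(
            "delete from TXN_COMPONENTS where tc_txnid in (" + id_list + ")"
        )
    return statements


def multi_row_insert(table, columns, rows):
    """Build one parameterized INSERT carrying many rows in its VALUES clause."""
    row_placeholder = "(" + ", ".join("?" for _ in columns) + ")"
    sql = "insert into {} ({}) values {}".format(
        table,
        ", ".join(columns),
        ", ".join(row_placeholder for _ in rows),
    )
    # Flatten the rows so the parameters line up with the placeholders.
    params = [value for row in rows for value in row]
    return sql, params
```

Batching keeps each statement under whatever limit the backing database places on IN lists, and the multi-row form replaces one roundtrip per row with one per statement, assuming the database accepts multi-row VALUES.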





[jira] [Commented] (HIVE-13313) TABLESAMPLE ROWS feature broken for Vectorization

2016-03-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205223#comment-15205223
 ] 

Hive QA commented on HIVE-13313:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12794298/HIVE-13313.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 9835 tests executed
*Failed tests:*
{noformat}
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_skewtable
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7331/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7331/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7331/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12794298 - PreCommit-HIVE-TRUNK-Build

> TABLESAMPLE ROWS feature broken for Vectorization
> -
>
> Key: HIVE-13313
> URL: https://issues.apache.org/jira/browse/HIVE-13313
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-13313.01.patch
>
>
> For vectorization, the ROWS clause is ignored causing many rows to be 
> returned.
> SELECT * FROM source TABLESAMPLE(10 ROWS);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13254) GBY cardinality estimation is wrong partition columns is involved

2016-03-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13254:
-
Attachment: q3.svg

> GBY cardinality estimation is wrong partition columns is involved
> -
>
> Key: HIVE-13254
> URL: https://issues.apache.org/jira/browse/HIVE-13254
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.3.0, 2.0.0, 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Jesus Camacho Rodriguez
> Attachments: q3.svg, q3_ef_transpose_aggr.svg
>
>
> When running the following query on TPCDS-1000 scale, setting 
> hive.transpose.aggr.join=true is expected to generate an optimal plan, but it 
> does not. 
> {code:title=Query}
> SELECT `date_dim`.`d_day_name` AS `d_day_name`, 
>        `item`.`i_category` AS `i_category` 
> FROM   `store_sales` `store_sales` 
>        INNER JOIN `item` `item` 
>                ON ( `store_sales`.`ss_item_sk` = `item`.`i_item_sk` ) 
>        INNER JOIN `date_dim` `date_dim` 
>                ON ( `store_sales`.`ss_sold_date_sk` = `date_dim`.`d_date_sk` ) 
> GROUP  BY `d_day_name`, 
>           `i_category`;
> {code}
> The reason is that the stats annotation rule for GROUP BY does not take 
> partition columns into account. For the above query, the generated plan is 
> attached. As the plan shows, GBY is pushed to the fact table (store_sales), 
> but the output of GBY is shuffled to perform the join instead of being 
> converted to a MapJoin. This is because the cardinality/data size of GBY on 
> store_sales (Map 1) is estimated incorrectly. 
> Internally, GBY computes the estimated cardinality, which in this case is 
> NDV(ss_item_sk) * NDV(ss_sold_date_sk) = 338901 * 1823 ~= 617M. 
> This estimate is wrong because ss_sold_date_sk is a partition column and the 
> estimator assumes it is a non-partition column. In this case, not every task 
> reads data from all partitions, so we need to take the estimated task 
> parallelism into account. 
> For example, if task parallelism is determined to be 100, the estimate from 
> GBY should be ~6M, which should convert this vertex into a map join vertex. 
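
The arithmetic in the description can be reproduced with a short sketch. This is only an illustration of the adjustment the reporter proposes, not Hive's stats annotation rule; the function names are hypothetical, and dividing by task parallelism is the assumption taken from the example above.

```python
from math import prod


def naive_gby_cardinality(ndvs):
    """Product of the NDVs, treating every key as a non-partition column."""
    return prod(ndvs)


def parallelism_adjusted_cardinality(ndvs, task_parallelism):
    """Per-task estimate: the naive product divided by the task parallelism.

    Assumption from the issue text: each task reads only a share of the
    partitions, so the naive product is scaled down by the parallelism.
    """
    return naive_gby_cardinality(ndvs) // task_parallelism


# NDV(ss_item_sk) and NDV(ss_sold_date_sk) as quoted in the description.
naive = naive_gby_cardinality([338901, 1823])
adjusted = parallelism_adjusted_cardinality([338901, 1823], 100)
print(naive, adjusted)
```

With the numbers from the issue, the naive estimate is about 617M and the adjusted one about 6M, matching the figures quoted by the reporter.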





[jira] [Commented] (HIVE-13277) Exception "Unable to create serializer 'org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer' " occurred during query execution on spark engine when ve

2016-03-21 Thread Kapil Rastogi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205220#comment-15205220
 ] 

Kapil Rastogi commented on HIVE-13277:
--

[~lirui] Are you using an ORC data set to run this query? 

Given the TPCx-BB query set, is this the only query failing? 

> Exception "Unable to create serializer 
> 'org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer' " 
> occurred during query execution on spark engine when vectorized execution is 
> switched on
> -
>
> Key: HIVE-13277
> URL: https://issues.apache.org/jira/browse/HIVE-13277
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Hive Version: Apache Hive 2.0.0
> Spark Version: Apache Spark 1.6.0
>Reporter: Xin Hao
>
> Found during TPCx-BB query2 execution on the Spark engine when vectorized 
> execution is switched on:
> (1) set hive.vectorized.execution.enabled=true; 
> (2) set hive.vectorized.execution.reduce.enabled=true; (default value for 
> Apache Hive 2.0.0)
> It's OK for spark engine when hive.vectorized.execution.enabled is switched 
> off:
> (1) set hive.vectorized.execution.enabled=false;
> (2) set hive.vectorized.execution.reduce.enabled=true;
> For MR engine, the query could pass and no exception occurred when vectorized 
> execution is either switched on or switched off.
> The detailed error message is below:
> {noformat}
> 2016-03-14T10:09:33,692 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - 16/03/14 10:09:33 INFO 
> spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 154 
> bytes
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - 16/03/14 10:09:33 WARN 
> scheduler.TaskSetManager: Lost task 0.0 in stage 4.0 (TID 25, bhx3): 
> java.lang.RuntimeException: Failed to load plan: 
> hdfs://bhx3:8020/tmp/hive/root/40b90ebd-32d4-47bc-a5ab-12ff1c05d0d2/hive_2016-03-14_10-08-56_307_7692316402338632647-1/-mr-10002/ab0c0021-0c1a-496e-9703-87d5879353c8/reduce.xml:
>  org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.IllegalArgumentException: Unable to create serializer 
> "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
> class: org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - Serialization trace:
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - childOperators 
> (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - childOperators 
> (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - childOperators 
> (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - reducer 
> (org.apache.hadoop.hive.ql.plan.ReduceWork)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:451)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.Utilities.getReduceWork(Utilities.java:306)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.init(SparkReduceRecordHandler.java:117)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:46)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:28)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:192)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> 

[jira] [Commented] (HIVE-12612) beeline always exits with 0 status when reading query from standard input

2016-03-21 Thread Reuben Kuhnert (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205219#comment-15205219
 ] 

Reuben Kuhnert commented on HIVE-12612:
---

I was able to reproduce this issue earlier, but I am wondering what we want the 
desired behavior to be. For example, if the user executes something like:

{code}
echo "should-fail; show tables;" | beeline
{code}

what should the exit status be? Should it return '0' because the last command 
ran successfully, or should it return some error code because the first command 
failed? In addition, when running a standard (long-running) beeline session, 
should we ever return a failure result (say, if one of the commands during the 
session fails)? It seems like the workaround here is the only realistic 
solution, but I would love input.
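
To make the two candidate behaviors concrete, here is a minimal sketch. It does not call beeline and the status values are hypothetical; it only models "return the status of the last statement" versus "return the first failure" over a list of per-statement statuses.

```python
def exit_status_last(statuses):
    """Policy A: report only the status of the last statement."""
    return statuses[-1] if statuses else 0


def exit_status_first_failure(statuses):
    """Policy B: report the first non-zero status, or 0 if none failed."""
    return next((status for status in statuses if status != 0), 0)


# Hypothetical statuses for: echo "should-fail; show tables;" | beeline
# The first statement fails (1), the second succeeds (0).
statuses = [1, 0]
print(exit_status_last(statuses), exit_status_first_failure(statuses))
```

Under policy A a batch script would see success for the example above; under policy B it would see the failure, which is closer to what the reporter's scripts expect.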

> beeline always exits with 0 status when reading query from standard input
> -
>
> Key: HIVE-12612
> URL: https://issues.apache.org/jira/browse/HIVE-12612
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.1.0
> Environment: CDH5.5.0
>Reporter: Paulo Sequeira
>Assignee: Reuben Kuhnert
>Priority: Minor
>
> Similar to what was reported on HIVE-6978, but now it only happens when the 
> query is read from the standard input. For example, the following fails as 
> expected:
> {code}
> bash$ if beeline -u "jdbc:hive2://..." -e "boo;" ; then echo "Ok?!" ; else 
> echo "Failed!" ; fi
> Connecting to jdbc:hive2://...
> Connected to: Apache Hive (version 1.1.0-cdh5.5.0)
> Driver: Hive JDBC (version 1.1.0-cdh5.5.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Error: Error while compiling statement: FAILED: ParseException line 1:0 
> cannot recognize input near 'boo' '' '' (state=42000,code=4)
> Closing: 0: jdbc:hive2://...
> Failed!
> {code}
> But the following does not:
> {code}
> bash$ if echo "boo;"|beeline -u "jdbc:hive2://..." ; then echo "Ok?!" ; else 
> echo "Failed!" ; fi
> Connecting to jdbc:hive2://...
> Connected to: Apache Hive (version 1.1.0-cdh5.5.0)
> Driver: Hive JDBC (version 1.1.0-cdh5.5.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 1.1.0-cdh5.5.0 by Apache Hive
> 0: jdbc:hive2://...:8> Error: Error while compiling statement: FAILED: 
> ParseException line 1:0 cannot recognize input near 'boo' '' '' 
> (state=42000,code=4)
> 0: jdbc:hive2://...:8> Closing: 0: jdbc:hive2://...
> Ok?!
> {code}
> This was misleading our batch scripts into always believing that the execution 
> of the queries succeeded, when sometimes that was not the case. 
> h2. Workaround
> We found we can work around the issue by always using the -e or the -f 
> parameters, and even reading the standard input through the /dev/stdin device 
> (this was useful because a lot of the scripts fed the queries from here 
> documents), like this:
> {code:title=some-script.sh}
> #!/bin/bash
> set -o nounset -o errexit -o pipefail
> # As beeline is failing to report an error status if reading the query
> # to be executed from STDIN, check whether no -f or -e option is used
> # and, in that case, pretend it has to read the query from a regular
> # file using -f to read from /dev/stdin
> function beeline_workaround_exit_status () {
>     for arg in "$@"
>     do if [ "$arg" = "-f" -o "$arg" = "-e" ]
>        then beeline -u "..." "$@"
>             return
>        fi
>     done
>     beeline -u "..." "$@" -f /dev/stdin
> }
> beeline_workaround_exit_status <<EOF
> boo;
> EOF
> {code}





[jira] [Updated] (HIVE-13295) Improvement to LDAP search queries in HS2 LDAP Authenticator

2016-03-21 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-13295:
-
Attachment: HIVE-13295.2.patch

Incorporating feedback from review.

> Improvement to LDAP search queries in HS2 LDAP Authenticator
> 
>
> Key: HIVE-13295
> URL: https://issues.apache.org/jira/browse/HIVE-13295
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 1.3.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-13295.1.patch, HIVE-13295.2.patch
>
>
> As more use cases emerge for various LDAP flavors and deployments, Hive's 
> LDAP authentication provider needs additional configuration properties to 
> make it flexible enough to work with different LDAP deployments.
> For example:
> 1) Not every LDAP server supports a "memberOf" property on user entries that 
> refers to the groups the user belongs to. This attribute is used for group 
> filter support. So instead of relying on this attribute being set, we can 
> reverse the search and find all the groups that have an attribute set that 
> refers to their members, for example "member" or "memberUid".
> Since this attribute name differs from LDAP to LDAP, it's best we make this 
> configurable, with a default value of "member".
> 2) In HIVE-12885, a new property was introduced to make the attribute for a 
> user/group search key user-configurable instead of assuming it's "uid" (when 
> baseDN is set) or "cn" (otherwise). This change was deferred from the initial 
> patch.
> 3) LDAP groups can have various objectClasses, for example objectClass=group, 
> objectClass=groupOfNames, objectClass=posixGroup, or 
> objectClass=groupOfUniqueNames. There could be others we don't know of, 
> so we need a property to make this user-configurable with a certain default. 
> 4) There is also a bug where the lists for groupFilter and userFilter are not 
> re-initialized each time init() is called.
> These lists are only re-initialized if the new HiveConf has userFilter or 
> groupFilter values set. Otherwise, the provider will use values from the 
> previous initialization.
> I found this bug when writing some new tests.
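
The "reverse" group search in item 1 can be sketched as an LDAP filter string built from a configurable object class and membership attribute. This is an illustration only, not the patch's code: the function names and the "groupOfNames" default are assumptions (the issue only says "a certain default"), while the "member" default comes from the description. Special characters are escaped following the LDAP filter string rules.

```python
def escape_filter_value(value):
    """Escape characters that are special inside an LDAP search filter."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def group_search_filter(member_value, group_class="groupOfNames",
                        membership_attr="member"):
    """Build a filter that finds groups listing member_value as a member.

    group_class and membership_attr stand in for the configurable
    properties discussed in the issue; their names here are hypothetical.
    """
    return "(&(objectClass={})({}={}))".format(
        group_class, membership_attr, escape_filter_value(member_value))


print(group_search_filter("uid=user1,ou=People,dc=example,dc=com"))
print(group_search_filter("user1", "posixGroup", "memberUid"))
```

Searching groups this way avoids any dependency on "memberOf" being populated on the user entry, which is the point of item 1.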





[jira] [Updated] (HIVE-13295) Improvement to LDAP search queries in HS2 LDAP Authenticator

2016-03-21 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-13295:
-
Status: Patch Available  (was: Open)

> Improvement to LDAP search queries in HS2 LDAP Authenticator
> 
>
> Key: HIVE-13295
> URL: https://issues.apache.org/jira/browse/HIVE-13295
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 1.3.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-13295.1.patch, HIVE-13295.2.patch
>
>
> As more use cases emerge for various LDAP flavors and deployments, Hive's 
> LDAP authentication provider needs additional configuration properties to 
> make it flexible enough to work with different LDAP deployments.
> For example:
> 1) Not every LDAP server supports a "memberOf" property on user entries that 
> refers to the groups the user belongs to. This attribute is used for group 
> filter support. So instead of relying on this attribute being set, we can 
> reverse the search and find all the groups that have an attribute set that 
> refers to their members, for example "member" or "memberUid".
> Since this attribute name differs from LDAP to LDAP, it's best we make this 
> configurable, with a default value of "member".
> 2) In HIVE-12885, a new property was introduced to make the attribute for a 
> user/group search key user-configurable instead of assuming it's "uid" (when 
> baseDN is set) or "cn" (otherwise). This change was deferred from the initial 
> patch.
> 3) LDAP groups can have various objectClasses, for example objectClass=group, 
> objectClass=groupOfNames, objectClass=posixGroup, or 
> objectClass=groupOfUniqueNames. There could be others we don't know of, 
> so we need a property to make this user-configurable with a certain default. 
> 4) There is also a bug where the lists for groupFilter and userFilter are not 
> re-initialized each time init() is called.
> These lists are only re-initialized if the new HiveConf has userFilter or 
> groupFilter values set. Otherwise, the provider will use values from the 
> previous initialization.
> I found this bug when writing some new tests.





[jira] [Updated] (HIVE-13295) Improvement to LDAP search queries in HS2 LDAP Authenticator

2016-03-21 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-13295:
-
Status: Open  (was: Patch Available)

> Improvement to LDAP search queries in HS2 LDAP Authenticator
> 
>
> Key: HIVE-13295
> URL: https://issues.apache.org/jira/browse/HIVE-13295
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 1.3.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-13295.1.patch
>
>
> As more use cases emerge for various LDAP flavors and deployments, Hive's 
> LDAP authentication provider needs additional configuration properties to 
> make it flexible enough to work with different LDAP deployments.
> For example:
> 1) Not every LDAP server supports a "memberOf" property on user entries that 
> refers to the groups the user belongs to. This attribute is used for group 
> filter support. So instead of relying on this attribute being set, we can 
> reverse the search and find all the groups that have an attribute set that 
> refers to their members, for example "member" or "memberUid".
> Since this attribute name differs from LDAP to LDAP, it's best we make this 
> configurable, with a default value of "member".
> 2) In HIVE-12885, a new property was introduced to make the attribute for a 
> user/group search key user-configurable instead of assuming it's "uid" (when 
> baseDN is set) or "cn" (otherwise). This change was deferred from the initial 
> patch.
> 3) LDAP groups can have various objectClasses, for example objectClass=group, 
> objectClass=groupOfNames, objectClass=posixGroup, or 
> objectClass=groupOfUniqueNames. There could be others we don't know of, 
> so we need a property to make this user-configurable with a certain default. 
> 4) There is also a bug where the lists for groupFilter and userFilter are not 
> re-initialized each time init() is called.
> These lists are only re-initialized if the new HiveConf has userFilter or 
> groupFilter values set. Otherwise, the provider will use values from the 
> previous initialization.
> I found this bug when writing some new tests.





[jira] [Updated] (HIVE-13178) Enhance ORC Schema Evolution to handle more standard data type conversions

2016-03-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13178:

Attachment: HIVE-13178.05.patch

> Enhance ORC Schema Evolution to handle more standard data type conversions
> --
>
> Key: HIVE-13178
> URL: https://issues.apache.org/jira/browse/HIVE-13178
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-13178.01.patch, HIVE-13178.02.patch, 
> HIVE-13178.03.patch, HIVE-13178.04.patch, HIVE-13178.05.patch
>
>
> Currently, SHORT -> INT -> BIGINT is supported.
> Handle the ORC data type conversions allowed by the 
> TypeInfoUtils.implicitConvertible method:
>   * STRING_GROUP -> DOUBLE
>   * STRING_GROUP -> DECIMAL
>   * DATE_GROUP -> STRING
>   * NUMERIC_GROUP -> STRING
>   * STRING_GROUP -> STRING_GROUP
> Upward from a "lower" type to a "higher" numeric type:
>   * BYTE -> SHORT -> INT -> BIGINT -> FLOAT -> DOUBLE -> DECIMAL
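
The conversion rules listed above can be modeled with a small lookup. This is a sketch of the rules as written in the issue, not a port of the Hive method; the function name is hypothetical, and the members of STRING_GROUP and DATE_GROUP are assumptions, since the issue does not enumerate them.

```python
# Numeric types in "lower" to "higher" order, as listed in the issue.
NUMERIC_CHAIN = ["BYTE", "SHORT", "INT", "BIGINT", "FLOAT", "DOUBLE", "DECIMAL"]
# Group membership below is assumed for illustration.
STRING_GROUP = {"STRING", "CHAR", "VARCHAR"}
DATE_GROUP = {"DATE", "TIMESTAMP"}


def implicitly_convertible(src, dst):
    """Return True if the listed rules allow converting src to dst."""
    if src == dst:
        return True
    if src in STRING_GROUP:
        # STRING_GROUP -> DOUBLE, DECIMAL, or another STRING_GROUP type.
        return dst in {"DOUBLE", "DECIMAL"} or dst in STRING_GROUP
    if dst == "STRING":
        # DATE_GROUP -> STRING and NUMERIC_GROUP -> STRING.
        return src in DATE_GROUP or src in NUMERIC_CHAIN
    if src in NUMERIC_CHAIN and dst in NUMERIC_CHAIN:
        # Only upward moves along the numeric chain are allowed.
        return NUMERIC_CHAIN.index(src) < NUMERIC_CHAIN.index(dst)
    return False


print(implicitly_convertible("SHORT", "BIGINT"))
print(implicitly_convertible("BIGINT", "INT"))
```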





[jira] [Commented] (HIVE-13296) Add vectorized Q test with complex types showing count(*) etc work correctly

2016-03-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205171#comment-15205171
 ] 

Prasanth Jayachandran commented on HIVE-13296:
--

+1

> Add vectorized Q test with complex types showing count(*) etc work correctly
> 
>
> Key: HIVE-13296
> URL: https://issues.apache.org/jira/browse/HIVE-13296
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-13296.01.patch
>
>






[jira] [Commented] (HIVE-13151) Clean up UGI objects in FileSystem cache for transactions

2016-03-21 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205150#comment-15205150
 ] 

Wei Zheng commented on HIVE-13151:
--

Test failure for testTempTable doesn't seem related.

[~ekoifman] Can you take a look?

> Clean up UGI objects in FileSystem cache for transactions
> -
>
> Key: HIVE-13151
> URL: https://issues.apache.org/jira/browse/HIVE-13151
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13151.1.patch, HIVE-13151.2.patch, 
> HIVE-13151.3.patch
>
>
> One issue with FileSystem.CACHE is that it does not clean itself. The key in 
> that cache includes a UGI object. When new UGI objects are created and used 
> with the FileSystem API, new entries get added to the cache.
> We need to manually clean up those UGI objects once they are no longer in use.





[jira] [Commented] (HIVE-13249) Hard upper bound on number of open transactions

2016-03-21 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205139#comment-15205139
 ] 

Wei Zheng commented on HIVE-13249:
--

[~alangates] [~ekoifman] ping..

> Hard upper bound on number of open transactions
> ---
>
> Key: HIVE-13249
> URL: https://issues.apache.org/jira/browse/HIVE-13249
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13249.1.patch, HIVE-13249.2.patch
>
>
> We need a safeguard: an upper bound on open transactions to avoid a huge 
> number of open-transaction requests, usually due to improper configuration of 
> clients such as Storm.
> Once that limit is reached, clients will start failing.
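
The behavior described above (a hard cap, with requests failing once it is reached) can be sketched as a counter guarded by a lock. This is a minimal illustration under assumed names; the class, exception, and method names are hypothetical and do not come from the patch.

```python
import threading


class OpenTxnLimitReached(Exception):
    """Raised when opening a transaction would exceed the upper bound."""


class OpenTxnLimiter:
    def __init__(self, max_open_txns):
        self._max = max_open_txns
        self._open = 0
        self._lock = threading.Lock()

    def open_txn(self):
        """Count a newly opened transaction, or fail if the cap is reached."""
        with self._lock:
            if self._open >= self._max:
                raise OpenTxnLimitReached(
                    "open transaction limit of %d reached" % self._max)
            self._open += 1
            return self._open

    def close_txn(self):
        """Release one slot when a transaction commits or aborts."""
        with self._lock:
            if self._open > 0:
                self._open -= 1


limiter = OpenTxnLimiter(max_open_txns=2)
limiter.open_txn()
limiter.open_txn()
try:
    limiter.open_txn()
except OpenTxnLimitReached as err:
    print("rejected:", err)
```

Failing fast at the cap makes a misconfigured client visible immediately instead of letting open transactions accumulate.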





[jira] [Assigned] (HIVE-13323) ORC: propagate RLE information from decoder to vectors

2016-03-21 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-13323:
--

Assignee: Gopal V  (was: Matt McCline)

> ORC: propagate RLE information from decoder to vectors
> --
>
> Key: HIVE-13323
> URL: https://issues.apache.org/jira/browse/HIVE-13323
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Vectorization
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: orc-rlev2-fill.png
>
>
> !orc-rlev2-fill.png!





[jira] [Updated] (HIVE-13323) ORC: propagate RLE information from decoder to vectors

2016-03-21 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-13323:
---
Description: 
!orc-rlev2-fill.png!


> ORC: propagate RLE information from decoder to vectors
> --
>
> Key: HIVE-13323
> URL: https://issues.apache.org/jira/browse/HIVE-13323
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Vectorization
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Matt McCline
> Attachments: orc-rlev2-fill.png
>
>
> !orc-rlev2-fill.png!





[jira] [Updated] (HIVE-13323) ORC: propagate RLE information from decoder to vectors

2016-03-21 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-13323:
---
Attachment: orc-rlev2-fill.png

> ORC: propagate RLE information from decoder to vectors
> --
>
> Key: HIVE-13323
> URL: https://issues.apache.org/jira/browse/HIVE-13323
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Vectorization
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Matt McCline
> Attachments: orc-rlev2-fill.png
>
>






[jira] [Commented] (HIVE-13286) Query ID is being reused across queries

2016-03-21 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205076#comment-15205076
 ] 

Aihua Xu commented on HIVE-13286:
-

[~vikram.dixit] How is the new patch? Can you take a look?

> Query ID is being reused across queries
> ---
>
> Key: HIVE-13286
> URL: https://issues.apache.org/jira/browse/HIVE-13286
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Aihua Xu
>Priority: Critical
> Attachments: HIVE-13286.1.patch, HIVE-13286.2.patch, 
> HIVE-13286.3.patch, HIVE-13286.4.patch
>
>
> [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is 
> being reused across queries. This defeats the purpose of a query id. I am not 
> sure what the purpose of the change in that jira is but it breaks the 
> assumption about a query id being unique for each query. Please take a look 
> into this at the earliest.





[jira] [Commented] (HIVE-13250) Compute predicate conversions on the client, instead of per row group

2016-03-21 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205041#comment-15205041
 ] 

Ashutosh Chauhan commented on HIVE-13250:
-

I think we can special-case this for the equality predicate. I will update the 
patch for that.

> Compute predicate conversions on the client, instead of per row group
> -
>
> Key: HIVE-13250
> URL: https://issues.apache.org/jira/browse/HIVE-13250
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Siddharth Seth
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-13250.2.patch, HIVE-13250.patch
>
>
> When running a query of the form 
> select count from table where ts_field = "2016-01-23 00:00:00";
> or
> select count from table where ts_field = 1453507200
> where ts_field is of type TIMESTAMP, 
> the predicate is converted to whatever format is appropriate for TIMESTAMP 
> processing on each and every row group.
> It would be far more efficient to process this once on the client - or even 
> once per task.
> The same applies to ORC split elimination as well - this is applied for each 
> stripe.
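
The idea of converting the predicate literal once, instead of once per row group, can be sketched as follows. This is not ORC reader code: row groups are modeled as plain lists of epoch seconds, the function names are hypothetical, and the string literal is interpreted as UTC purely for illustration.

```python
from datetime import datetime, timezone


def to_epoch_seconds(literal):
    """Convert a predicate literal (string or int) to epoch seconds.

    Assumption: string literals are interpreted as UTC.
    """
    if isinstance(literal, int):
        return literal
    parsed = datetime.strptime(literal, "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def count_matches(row_groups, literal):
    """Count rows equal to the literal across all row groups."""
    # The conversion happens once here, outside the per-row-group loop.
    target = to_epoch_seconds(literal)
    return sum(1 for group in row_groups for ts in group if ts == target)


row_groups = [[1453507200, 1453507201], [1453507200]]
print(count_matches(row_groups, "2016-01-23 00:00:00"))
```

Hoisting the conversion out of the loop keeps the per-row-group work to a plain comparison, which is the saving the issue asks for.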





[jira] [Commented] (HIVE-13321) Add support for different output strategies

2016-03-21 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205023#comment-15205023
 ] 

Sanjay Radia commented on HIVE-13321:
-

bq. A common pattern in MapReduce and Hive is to write all output into a 
temporary folder and then rename this temporary folder to match the final 
output location. When using some of the newer FileSystems with Hive, the 
performance can be improved by directly writing output and avoiding the 
temporary folder write & rename.
Note: the temp folder was necessary to deal with failures and also with 
multiple attempts. Rename in traditional file systems is very low cost and 
involves no copy of data unless it is across volumes. In the case of MapReduce, 
the tmp folder is a subdir of the output folder so that the rename is not 
across volumes. In the cloud's object stores (like S3), the rename requires a 
data copy (hence HADOOP-9565's proposal to add a server-side copy, but that is 
still an extra copy that you are trying to avoid in this Jira).
Optimization for cloud storage makes a lot of sense, but one has to deal with 
the failure case and multiple attempts/speculative execution; the output 
directory cannot be left in a mess. Could you please elaborate on how you plan 
to deal with failures?
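
For reference, the temp-folder pattern discussed above can be sketched on a local file system. This is an illustration only, not Hive or MapReduce code: the "_tmp" directory name, the attempt naming, and the function names are hypothetical. Each attempt writes under a subdirectory of the output folder, and only a successful attempt is renamed into place.

```python
import os


def write_attempt(output_dir, name, attempt_id, data):
    """Write one task attempt's output under a temp subdir of output_dir."""
    tmp_dir = os.path.join(output_dir, "_tmp")
    os.makedirs(tmp_dir, exist_ok=True)
    tmp_path = os.path.join(tmp_dir, "%s.attempt_%d" % (name, attempt_id))
    with open(tmp_path, "wb") as handle:
        handle.write(data)
    return tmp_path


def commit_attempt(output_dir, name, tmp_path):
    """Publish a finished attempt by renaming it to its final location.

    The temp dir lives inside output_dir, so the rename stays on one volume.
    """
    final_path = os.path.join(output_dir, name)
    os.replace(tmp_path, final_path)
    return final_path
```

A failed or speculative attempt that is never committed leaves nothing in the final location, which is the property a direct-write strategy would need to provide some other way.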

> Add support for different output strategies
> ---
>
> Key: HIVE-13321
> URL: https://issues.apache.org/jira/browse/HIVE-13321
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rob Leidle
>
> The Hadoop ecosystem has expanded to support a wider variety of data-stores 
> and filesystems than simply HDFS. These FileSystems have different write 
> atomicity and read consistency guarantees.  There are enhancements we can 
> make to Hive to ensure Hive works even better with a wider variety of 
> FileSystems in the Hadoop ecosystem. We can see work going on in the Hadoop 
> project to robustly support these FileSystems. One such example is 
> HADOOP-9565 where the behavior of MapReduce output is enhanced to do what is 
> optimal for different FileSystems.
>  
> A common pattern in MapReduce and Hive is to write all output into a 
> temporary folder and then rename this temporary folder to match the final 
> output location. When using some of the newer FileSystems with Hive, the 
> performance can be improved by directly writing output and avoiding the 
> temporary folder write & rename.
>  
> The proposal is to enhance Hive to support different strategies for file 
> output. One such strategy would be a concept named “DirectWrite”. DirectWrite 
> will be optionally enabled, likely on a per-FileSystem basis. When 
> DirectWrite is enabled, all Hive job output will be written directly to the 
> output location.
>  
> This is an umbrella JIRA for all the tasks related to this functionality.





[jira] [Commented] (HIVE-13322) LLAP: ZK registry throws at shutdown due to slf4j trying to initialize a log4j logger

2016-03-21 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205007#comment-15205007
 ] 

Gopal V commented on HIVE-13322:


{{at org.apache.curator.utils.CloseableUtils.<clinit>(CloseableUtils.java:33)}}

Hilarious, error from a static init block - we need to add a ref to the class 
in the serviceStart, I guess.

> LLAP: ZK registry throws at shutdown due to slf4j trying to initialize a 
> log4j logger
> -
>
> Key: HIVE-13322
> URL: https://issues.apache.org/jira/browse/HIVE-13322
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Prasanth Jayachandran
>Priority: Minor
>
> {noformat}
> 2016-03-08 23:56:34,883 Thread-5 FATAL Unable to register shutdown hook 
> because JVM is shutting down. java.lang.IllegalStateException: Cannot add new 
> shutdown hook as this is not started. Current state: STOPPED
>   at 
> org.apache.logging.log4j.core.util.DefaultShutdownCallbackRegistry.addShutdownCallback(DefaultShutdownCallbackRegistry.java:113)
>   at 
> org.apache.logging.log4j.core.impl.Log4jContextFactory.addShutdownCallback(Log4jContextFactory.java:271)
>   at 
> org.apache.logging.log4j.core.LoggerContext.setUpShutdownHook(LoggerContext.java:256)
>   at 
> org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:216)
>   at 
> org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:146)
>   at 
> org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)
>   at org.apache.logging.log4j.LogManager.getContext(LogManager.java:185)
>   at 
> org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:103)
>   at 
> org.apache.logging.slf4j.Log4jLoggerFactory.getContext(Log4jLoggerFactory.java:43)
>   at 
> org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:42)
>   at 
> org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:29)
>   at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:285)
>   at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:305)
>   at 
> org.apache.curator.utils.CloseableUtils.<clinit>(CloseableUtils.java:33)
>   at 
> org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.stop(LlapZookeeperRegistryImpl.java:584)
>   at 
> org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.serviceStop(LlapRegistryService.java:105)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.serviceStop(LlapDaemon.java:294)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:65)
>   at 
> org.apache.hadoop.service.CompositeService$CompositeServiceShutdownHook.run(CompositeService.java:183)
>   at 
> org.apache.hive.common.util.ShutdownHookManager$1.run(ShutdownHookManager.java:63)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12619) Switching the field order within an array of structs causes the query to fail

2016-03-21 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204960#comment-15204960
 ] 

Jimmy Xiang commented on HIVE-12619:


Sure. I will upload the next patch to RB.

> Switching the field order within an array of structs causes the query to fail
> -
>
> Key: HIVE-12619
> URL: https://issues.apache.org/jira/browse/HIVE-12619
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Ang Zhang
>Assignee: Mohammad Kamrul Islam
>Priority: Minor
> Attachments: HIVE-12619.2.patch, HIVE-12619.3.patch, 
> HIVE-12619.4.patch, HIVE-12619.5.patch
>
>
> Switching the field order within an array of structs causes the query to fail 
> or return the wrong data for the fields, but switching the field order within 
> just a struct works.
> How to reproduce:
> Case 1: if the two fields have the same type, the query will return wrong data 
> for the fields
> drop table if exists schema_test;
> create table schema_test (msg array<struct<f1:string,f2:string>>) stored 
> as parquet;
> insert into table schema_test select stack(2, array(named_struct('f1', 'abc', 
> 'f2', 'abc2')), array(named_struct('f1', 'efg', 'f2', 'efg2'))) from one 
> limit 2;
> select * from schema_test;
> --returns
> --[{"f1":"efg","f2":"efg2"}]
> --[{"f1":"abc","f2":"abc2"}]
> alter table schema_test change msg msg array<struct<f2:string,f1:string>>;
> select * from schema_test;
> --returns
> --[{"f2":"efg","f1":"efg2"}]
> --[{"f2":"abc","f1":"abc2"}]
> Case 2: if the two fields have different types, the query will fail
> drop table if exists schema_test;
> create table schema_test (msg array<struct<f1:string,f2:int>>) stored as 
> parquet;
> insert into table schema_test select stack(2, array(named_struct('f1', 'abc', 
> 'f2', 1)), array(named_struct('f1', 'efg', 'f2', 2))) from one limit 2;
> select * from schema_test;
> --returns
> --[{"f1":"efg","f2":2}]
> --[{"f1":"abc","f2":1}]
> alter table schema_test change msg msg array<struct<f2:int,f1:string>>;
> select * from schema_test;
> Failed with exception 
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
> org.apache.hadoop.io.IntWritable
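
Both symptoms are what one would see if the reader paired the table's field names with the stored values by position. That is an assumption about the cause, not something the report states. A small sketch of the difference between positional and by-name resolution, using hypothetical helpers rather than the Parquet SerDe code:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StructFieldOrder {
  // Pairs the i-th table field name with the i-th stored value.
  static Map<String, Object> readByPosition(List<String> tableFields, List<Object> storedValues) {
    Map<String, Object> row = new LinkedHashMap<>();
    for (int i = 0; i < tableFields.size(); i++) {
      row.put(tableFields.get(i), storedValues.get(i));
    }
    return row;
  }

  // Looks each table field up by name in the file's own field list.
  static Map<String, Object> readByName(
      List<String> tableFields, List<String> fileFields, List<Object> storedValues) {
    Map<String, Object> row = new LinkedHashMap<>();
    for (String field : tableFields) {
      int idx = fileFields.indexOf(field);
      row.put(field, idx >= 0 ? storedValues.get(idx) : null);
    }
    return row;
  }

  public static void main(String[] args) {
    List<String> fileFields = Arrays.asList("f1", "f2");          // order written to the file
    List<Object> stored = Arrays.<Object>asList("abc", "abc2");   // values in file order
    List<String> altered = Arrays.asList("f2", "f1");             // order after ALTER TABLE
    System.out.println(readByPosition(altered, stored));
    System.out.println(readByName(altered, fileFields, stored));
  }
}
```

In Case 2 the same positional pairing would hand a string to a field declared as int, which would be consistent with the ClassCastException above.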





[jira] [Commented] (HIVE-13302) direct SQL: cast to date doesn't work on Oracle

2016-03-21 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204948#comment-15204948
 ] 

Sushanth Sowmyan commented on HIVE-13302:
-

+1.

Looks reasonable to me, having looked at the TO_DATE syntax.

> direct SQL: cast to date doesn't work on Oracle
> ---
>
> Key: HIVE-13302
> URL: https://issues.apache.org/jira/browse/HIVE-13302
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13302.patch
>
>






[jira] [Commented] (HIVE-13283) LLAP: make sure IO elevator is enabled by default in the daemons

2016-03-21 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204929#comment-15204929
 ] 

Vikram Dixit K commented on HIVE-13283:
---

Minor comment:

{code}
+String defaultVal = null;
+String val = HiveConf.getVar(conf, ConfVars.LLAP_IO_ENABLED, defaultVal);
+boolean isEnabled = (val != null) ? Boolean.parseBoolean(val) : LlapProxy.isDaemon();
{code}

You could just use the HiveConf.getBoolVar() method with a default 
LlapProxy.isDaemon() to get the same effect.

Just for clarification: could a user set this configuration to false so that the 
daemon would not use the IOElevator? Is that the right way to look at this 
configuration, or did you want the daemon to always use the IOElevator?
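
For illustration, a self-contained sketch of the suggested simplification. It uses `java.util.Properties` and a made-up key in place of `HiveConf` and `ConfVars.LLAP_IO_ENABLED`, and a plain boolean in place of `LlapProxy.isDaemon()`, so every name in it is an assumption rather than Hive's API. Both helpers behave the same: an explicit setting wins, otherwise the daemon flag decides.

```java
import java.util.Properties;

final class LlapIoDefault {
  // Hypothetical key; the real setting is ConfVars.LLAP_IO_ENABLED.
  static final String KEY = "llap.io.enabled";

  // Shape of the patch: read a string with a null default, then branch.
  static boolean viaStringVar(Properties conf, boolean isDaemon) {
    String val = conf.getProperty(KEY, null);
    return (val != null) ? Boolean.parseBoolean(val) : isDaemon;
  }

  // Stand-in for a getBoolVar-style accessor that takes a default value.
  static boolean getBool(Properties conf, String key, boolean defaultVal) {
    String val = conf.getProperty(key);
    return (val == null) ? defaultVal : Boolean.parseBoolean(val);
  }

  // Shape of the suggestion: one call, with the daemon flag as the default.
  static boolean viaBoolVar(Properties conf, boolean isDaemon) {
    return getBool(conf, KEY, isDaemon);
  }

  public static void main(String[] args) {
    Properties unset = new Properties();
    Properties off = new Properties();
    off.setProperty(KEY, "false");
    System.out.println(viaBoolVar(unset, true)); // nothing set: the daemon flag decides
    System.out.println(viaBoolVar(off, true));   // explicit setting overrides the daemon flag
  }
}
```

Under this reading, an explicit false on a daemon does switch the IO elevator off, which is exactly the behavior the question above asks about.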

> LLAP: make sure IO elevator is enabled by default in the daemons
> 
>
> Key: HIVE-13283
> URL: https://issues.apache.org/jira/browse/HIVE-13283
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13283.01.patch, HIVE-13283.02.patch, 
> HIVE-13283.patch
>
>






[jira] [Updated] (HIVE-13320) Apply HIVE-11544 to explicit conversions as well as implicit ones

2016-03-21 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-13320:
---
Summary: Apply HIVE-11544 to explicit conversions as well as implicit ones  
(was: Apple HIVE-11544 to explicit conversions as well as implicit ones)

> Apply HIVE-11544 to explicit conversions as well as implicit ones
> -
>
> Key: HIVE-13320
> URL: https://issues.apache.org/jira/browse/HIVE-13320
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.3.0, 1.2.1, 2.0.0, 2.1.0
>Reporter: Gopal V
>Assignee: Gopal V
>
> Parsing 1 million blank values through cast(x as int) is 3x slower than 
> parsing a valid single digit.
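
A rough sketch of why blank input can cost more than a valid digit, assuming the slow path is a `NumberFormatException` being constructed and caught per value (my reading of HIVE-11544, not something stated above). `parseOrNull` is a hypothetical helper, not Hive's cast UDF: it rejects obviously non-numeric input with a cheap character scan before calling `Integer.parseInt`.

```java
final class BlankSafeParse {
  // Cheap scan: optional sign followed by at least one ASCII digit.
  static boolean looksLikeInt(String s) {
    if (s == null || s.isEmpty()) {
      return false;
    }
    int start = (s.charAt(0) == '-' || s.charAt(0) == '+') ? 1 : 0;
    if (start == s.length()) {
      return false;
    }
    for (int i = start; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < '0' || c > '9') {
        return false;
      }
    }
    return true;
  }

  // Returns null (standing in for SQL NULL) for input that cannot be an int.
  static Integer parseOrNull(String s) {
    if (!looksLikeInt(s)) {
      return null; // no exception is constructed for blanks
    }
    try {
      return Integer.parseInt(s);
    } catch (NumberFormatException overflow) {
      return null; // digits only, but outside the int range
    }
  }

  public static void main(String[] args) {
    System.out.println(parseOrNull(""));
    System.out.println(parseOrNull("7"));
  }
}
```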





[jira] [Updated] (HIVE-11887) spark tests break the build on a shared machine, can break HiveQA

2016-03-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11887:

Attachment: HIVE-11887.01.patch

> spark tests break the build on a shared machine, can break HiveQA
> -
>
> Key: HIVE-11887
> URL: https://issues.apache.org/jira/browse/HIVE-11887
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11887.01.patch, HIVE-11887.patch
>
>
> Spark download creates UDFExampleAdd jar in /tmp; when building on a shared 
> machine, someone else's jar from a build prevents this jar from being created 
> (I have no permissions to this file because it was created by a different 
> user) and the build fails.





[jira] [Updated] (HIVE-11887) spark tests break the build on a shared machine, can break HiveQA

2016-03-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11887:

Summary: spark tests break the build on a shared machine, can break HiveQA  
(was: spark tests break the build on a shared machine)

> spark tests break the build on a shared machine, can break HiveQA
> -
>
> Key: HIVE-11887
> URL: https://issues.apache.org/jira/browse/HIVE-11887
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11887.01.patch, HIVE-11887.patch
>
>
> Spark download creates UDFExampleAdd jar in /tmp; when building on a shared 
> machine, someone else's jar from a build prevents this jar from being created 
> (I have no permissions to this file because it was created by a different 
> user) and the build fails.





[jira] [Commented] (HIVE-11887) spark tests break the build on a shared machine

2016-03-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204901#comment-15204901
 ] 

Sergey Shelukhin commented on HIVE-11887:
-

This also breaks HiveQA occasionally:
{noformat}
 [exec] + sed '/package /d' 
/home/hiveptest/54.166.14.147-hiveptest-0/apache-github-source-source/itests/../contrib/src/java/org/apache/hadoop/hive/contrib/udf/example/UDFExampleAdd.java
 [exec] + javac -cp 
/home/hiveptest/54.166.14.147-hiveptest-0/maven/org/apache/hive/hive-exec/2.1.0-SNAPSHOT/hive-exec-2.1.0-SNAPSHOT.jar
 /tmp/UDFExampleAdd.java -d /tmp
 [exec] error: error reading /tmp/UDFExampleAdd.java; 
/tmp/UDFExampleAdd.java (No such file or directory)
 [exec] 1 error
 [exec] + jar -cf /tmp/udfexampleadd-1.0.jar -C /tmp UDFExampleAdd.class
 [exec] /tmp/UDFExampleAdd.class : no such file or directory
{noformat}
(seen in some JIRA).
I am going to submit a patch to remove this jar for the time being, and disable 
the tests that depend on it if there are not too many. It can be restored later 
in some acceptable manner.
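
One acceptable manner might be to stop using fixed names under /tmp and build into a directory that is unique per build. A minimal sketch, assuming that would be fine for the tests; `PrivateBuildDir` and the directory prefix are hypothetical names, while `Files.createTempDirectory` is the standard JDK call.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class PrivateBuildDir {
  // Creates a fresh directory under java.io.tmpdir for the current user.
  static Path newBuildDir() {
    try {
      return Files.createTempDirectory("udfexampleadd-");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Where the jar would go inside that private directory.
  static Path jarPath(Path buildDir) {
    return buildDir.resolve("udfexampleadd-1.0.jar");
  }

  // Removes the (empty) directory again.
  static void cleanup(Path buildDir) {
    try {
      Files.deleteIfExists(buildDir);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Path first = newBuildDir();
    Path second = newBuildDir();
    // Two builds on the same machine get different, writable locations.
    System.out.println(jarPath(first));
    System.out.println(jarPath(second));
    cleanup(first);
    cleanup(second);
  }
}
```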

> spark tests break the build on a shared machine
> ---
>
> Key: HIVE-11887
> URL: https://issues.apache.org/jira/browse/HIVE-11887
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11887.patch
>
>
> Spark download creates UDFExampleAdd jar in /tmp; when building on a shared 
> machine, someone else's jar from a build prevents this jar from being created 
> (I have no permissions to this file because it was created by a different 
> user) and the build fails.





[jira] [Commented] (HIVE-13303) spill to YARN directories, not tmp, when available

2016-03-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204898#comment-15204898
 ] 

Sergey Shelukhin commented on HIVE-13303:
-

TestParseNegative failed due to a conflict with the /tmp/ UDF jar (unrelated to 
this patch). The rest of the failures are also unrelated.


> spill to YARN directories, not tmp, when available
> --
>
> Key: HIVE-13303
> URL: https://issues.apache.org/jira/browse/HIVE-13303
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13303.patch
>
>
> RowContainer::setupWriter, HybridHashTableContainer::spillPartition, 
> (KeyValueContainer|ObjectContainer)::setupOutput, 
> VectorMapJoinRowBytesContainer::setupOutputFileStreams create files in tmp. 
> Maybe some other code does it too, those are the ones I see on the execution 
> path. When there are multiple YARN output directories and multiple tasks 
> running on a machine, it's better to use the YARN directories. The only 
> question is cleanup.
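
A minimal sketch of the directory choice, assuming the YARN directories are visible to the task as the comma-separated `LOCAL_DIRS` environment variable that YARN sets for containers, and that spreading tasks across them by a hash of the task id is good enough. `SpillDirChooser` is a hypothetical helper, not the code in the patch, and it does not address the cleanup question.

```java
import java.util.ArrayList;
import java.util.List;

final class SpillDirChooser {
  // Splits a comma-separated directory list, dropping empty entries.
  static List<String> parseDirs(String localDirs) {
    List<String> dirs = new ArrayList<>();
    if (localDirs != null) {
      for (String d : localDirs.split(",")) {
        if (!d.trim().isEmpty()) {
          dirs.add(d.trim());
        }
      }
    }
    return dirs;
  }

  // Picks a YARN dir by task id when any exist, otherwise the tmp fallback.
  static String choose(String localDirs, String taskId, String tmpFallback) {
    List<String> dirs = parseDirs(localDirs);
    if (dirs.isEmpty()) {
      return tmpFallback;
    }
    int idx = Math.floorMod(taskId.hashCode(), dirs.size());
    return dirs.get(idx);
  }

  public static void main(String[] args) {
    String fromEnv = System.getenv("LOCAL_DIRS"); // null outside a YARN container
    String tmp = System.getProperty("java.io.tmpdir");
    System.out.println(choose(fromEnv, "attempt_0", tmp));
  }
}
```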





[jira] [Commented] (HIVE-13241) LLAP: Incremental Caching marks some small chunks as "incomplete CB"

2016-03-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204886#comment-15204886
 ] 

Sergey Shelukhin commented on HIVE-13241:
-

I don't think test failures are related.

> LLAP: Incremental Caching marks some small chunks as "incomplete CB"
> 
>
> Key: HIVE-13241
> URL: https://issues.apache.org/jira/browse/HIVE-13241
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13241.01.patch, HIVE-13241.patch
>
>
> Run #3 of a query with 1 node still has cache misses.
> {code}
> LLAP IO Summary
> --
>   VERTICES ROWGROUPS  META_HIT  META_MISS  DATA_HIT  DATA_MISS  ALLOCATION  USED  TOTAL_IO
> --
>  Map 111  1116  0  1.65GB  93.61MB  0B  0B  32.72s
> --
> {code}
> {code}
> 2016-03-08T21:05:39,417 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:prepareRangesForCompressedRead(695)) - Locking 
> 0x1c44401d(1) due to reuse
> 2016-03-08T21:05:39,417 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:prepareRangesForCompressedRead(701)) - Adding an 
> already-uncompressed buffer 0x1c44401d(2)
> 2016-03-08T21:05:39,417 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:prepareRangesForCompressedRead(695)) - Locking 
> 0x4e51b032(1) due to reuse
> 2016-03-08T21:05:39,417 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:prepareRangesForCompressedRead(701)) - Adding an 
> already-uncompressed buffer 0x4e51b032(2)
> 2016-03-08T21:05:39,418 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:addOneCompressionBuffer(1161)) - Found CB at 1373931, 
> chunk length 86587, total 86590, compressed
> 2016-03-08T21:05:39,418 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:addIncompleteCompressionBuffer(1241)) - Replacing 
> data range [1373931, 1408408), size: 34474(!) type: direct (and 0 previous 
> chunks) with incomplete CB start: 1373931 end: 1408408 in the buffers
> 2016-03-08T21:05:39,418 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:createRgColumnStreamData(441)) - Getting data for 
> column 7 RG 14 stream DATA at 1460521, 319811 index position 0: compressed 
> [1626961, 1780332)
> {code}
> {code}
> 2016-03-08T21:05:38,925 INFO  
> [IO-Elevator-Thread-7[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.OrcEncodedDataReader (OrcEncodedDataReader.java:readFileData(878)) - 
> Disk ranges after disk read (file 5372745, base offset 3): [{start: 18986 
> end: 20660 cache buffer: 0x660faf7c(1)}, {start: 20660 end: 35775 cache 
> buffer: 0x1dcb1d97(1)}, {start: 318852 end: 422353 cache buffer: 
> 0x6c7f9a05(1)}, {start: 1148616 end: 1262468 cache buffer: 0x196e1d41(1)}, 
> {start: 1262468 end: 1376342 cache buffer: 0x201255f(1)}, {data range 
> [1376342, 1410766), size: 34424 type: direct}, {start: 1631359 end: 1714694 
> cache buffer: 0x47e3a72d(1)}, {start: 1714694 end: 1785770 cache buffer: 
> 0x57dca266(1)}, {start: 4975035 end: 5095215 cache buffer: 0x3e3139c9(1)}, 
> {start: 5095215 end: 5197863 cache buffer: 0x3511c88d(1)}, {start: 7448387 
> end: 7572268 cache buffer: 0x6f11dbcd(1)}, {start: 7572268 end: 7696182 cache 
> buffer: 0x5d6c9bdb(1)}, {data range [7696182, 7710537), size: 14355 type: 
> direct}, {start: 8235756 end: 8345367 cache buffer: 0x6a241ece(1)}, {start: 
> 8345367 end: 8455009 cache buffer: 0x51caf6a7(1)}, {data range [8455009, 
> 8497906), size: 42897 type: direct}, {start: 9035815 end: 9159708 cache 
> buffer: 0x306480e0(1)}, {start: 9159708 end: 9283629 cache buffer: 
> 0x9ef7774(1)}, {data range [9283629, 9297965), size: 14336 type: direct}, 
> {start: 9989884 end: 10113731 cache buffer: 0x43f7cae9(1)}, {start: 10113731 
> end: 10237589 cache buffer: 0x458e63fe(1)}, {data range [10237589, 10252034), 
> size: 14445 type: direct}, {start: 11897896 end: 12021787 cache buffer: 
> 0x51f9982f(1)}, {start: 12021787 end: 12145656 cache buffer: 0x23df01b3(1)}, 
> {data range [12145656, 12160046), size: 14390 type: 

[jira] [Commented] (HIVE-13283) LLAP: make sure IO elevator is enabled by default in the daemons

2016-03-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204884#comment-15204884
 ] 

Sergey Shelukhin commented on HIVE-13283:
-

[~vikram.dixit] can you take a look?

> LLAP: make sure IO elevator is enabled by default in the daemons
> 
>
> Key: HIVE-13283
> URL: https://issues.apache.org/jira/browse/HIVE-13283
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13283.01.patch, HIVE-13283.02.patch, 
> HIVE-13283.patch
>
>






[jira] [Updated] (HIVE-12992) Hive on tez: Bucket map join plan is incorrect

2016-03-21 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-12992:
--
Attachment: HIVE-12992.1.patch

> Hive on tez: Bucket map join plan is incorrect
> --
>
> Key: HIVE-12992
> URL: https://issues.apache.org/jira/browse/HIVE-12992
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
>  Labels: tez
> Attachments: HIVE-12992.1.patch
>
>
> TPCH Query 9 fails when bucket map join is enabled:
> {code}
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 
> 5, vertexId=vertex_1450634494433_0007_2_06, diagnostics=[Exception in 
> EdgeManager, vertex=vertex_1450634494433_0007_2_06 [Reducer 5], Fail to 
> sendTezEventToDestinationTasks, event:DataMovementEvent [sourceIndex=0, 
> targetIndex=-1, version=0], sourceInfo:{ producerConsumerType=OUTPUT, 
> taskVertexName=Map 1, edgeVertexName=Reducer 5, 
> taskAttemptId=attempt_1450634494433_0007_2_05_00_0 }, 
> destinationInfo:null, EdgeInfo: sourceVertexName=Map 1, 
> destinationVertexName=Reducer 5, java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.CustomPartitionEdge.routeDataMovementEventToDestination(CustomPartitionEdge.java:88)
>   at 
> org.apache.tez.dag.app.dag.impl.Edge.sendTezEventToDestinationTasks(Edge.java:458)
>   at 
> org.apache.tez.dag.app.dag.impl.Edge.handleCompositeDataMovementEvent(Edge.java:386)
>   at 
> org.apache.tez.dag.app.dag.impl.Edge.sendTezEventToDestinationTasks(Edge.java:439)
>   at 
> org.apache.tez.dag.app.dag.impl.VertexImpl.handleRoutedTezEvents(VertexImpl.java:4382)
>   at 
> org.apache.tez.dag.app.dag.impl.VertexImpl.access$4000(VertexImpl.java:202)
>   at 
> org.apache.tez.dag.app.dag.impl.VertexImpl$RouteEventTransition.transition(VertexImpl.java:4172)
>   at 
> org.apache.tez.dag.app.dag.impl.VertexImpl$RouteEventTransition.transition(VertexImpl.java:4164)
> {code}





[jira] [Updated] (HIVE-12992) Hive on tez: Bucket map join plan is incorrect

2016-03-21 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-12992:
--
Status: Patch Available  (was: Open)

> Hive on tez: Bucket map join plan is incorrect
> --
>
> Key: HIVE-12992
> URL: https://issues.apache.org/jira/browse/HIVE-12992
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.0.0, 1.2.1
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
>  Labels: tez
> Attachments: HIVE-12992.1.patch
>
>
> TPCH Query 9 fails when bucket map join is enabled:
> {code}
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 
> 5, vertexId=vertex_1450634494433_0007_2_06, diagnostics=[Exception in 
> EdgeManager, vertex=vertex_1450634494433_0007_2_06 [Reducer 5], Fail to 
> sendTezEventToDestinationTasks, event:DataMovementEvent [sourceIndex=0, 
> targetIndex=-1, version=0], sourceInfo:{ producerConsumerType=OUTPUT, 
> taskVertexName=Map 1, edgeVertexName=Reducer 5, 
> taskAttemptId=attempt_1450634494433_0007_2_05_00_0 }, 
> destinationInfo:null, EdgeInfo: sourceVertexName=Map 1, 
> destinationVertexName=Reducer 5, java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.CustomPartitionEdge.routeDataMovementEventToDestination(CustomPartitionEdge.java:88)
>   at 
> org.apache.tez.dag.app.dag.impl.Edge.sendTezEventToDestinationTasks(Edge.java:458)
>   at 
> org.apache.tez.dag.app.dag.impl.Edge.handleCompositeDataMovementEvent(Edge.java:386)
>   at 
> org.apache.tez.dag.app.dag.impl.Edge.sendTezEventToDestinationTasks(Edge.java:439)
>   at 
> org.apache.tez.dag.app.dag.impl.VertexImpl.handleRoutedTezEvents(VertexImpl.java:4382)
>   at 
> org.apache.tez.dag.app.dag.impl.VertexImpl.access$4000(VertexImpl.java:202)
>   at 
> org.apache.tez.dag.app.dag.impl.VertexImpl$RouteEventTransition.transition(VertexImpl.java:4172)
>   at 
> org.apache.tez.dag.app.dag.impl.VertexImpl$RouteEventTransition.transition(VertexImpl.java:4164)
> {code}





[jira] [Commented] (HIVE-13250) Compute predicate conversions on the client, instead of per row group

2016-03-21 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204859#comment-15204859
 ] 

Gopal V commented on HIVE-13250:


bq.  I'd expect the cast to change the value to whatever can be compared 
directly against storage. 

UDFs disable all predicate pushdown. Only constants are applied to PPD, so 
retaining the UDFToString disabled all PPD into the storage system.

> Compute predicate conversions on the client, instead of per row group
> -
>
> Key: HIVE-13250
> URL: https://issues.apache.org/jira/browse/HIVE-13250
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Siddharth Seth
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-13250.2.patch, HIVE-13250.patch
>
>
> When running a query of the form 
> select count from table where ts_field = "2016-01-23 00:00:00";
> or
> select count from table where ts_field = 1453507200
> ts_field is of type TIMESTAMP
> The predicate is converted to whatever format is appropriate for TIMESTAMP 
> processing on each and every row group.
> It would be far more efficient to process this once on the client - or even 
> once per task.
> The same applies to ORC split elimination as well - this is applied for each 
> stripe.





[jira] [Commented] (HIVE-13241) LLAP: Incremental Caching marks some small chunks as "incomplete CB"

2016-03-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204852#comment-15204852
 ] 

Hive QA commented on HIVE-13241:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12794285/HIVE-13241.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 9801 tests executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-auto_join30.q-vector_data_types.q-filter_join_breaktask.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-explainuser_4.q-orc_merge6.q-mapreduce1.q-and-12-more - 
did not produce a TEST-*.xml file
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_auto_partitioned
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7330/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7330/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7330/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12794285 - PreCommit-HIVE-TRUNK-Build

> LLAP: Incremental Caching marks some small chunks as "incomplete CB"
> 
>
> Key: HIVE-13241
> URL: https://issues.apache.org/jira/browse/HIVE-13241
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13241.01.patch, HIVE-13241.patch
>
>
> Run #3 of a query with 1 node still has cache misses.
> {code}
> LLAP IO Summary
> --
>   VERTICES ROWGROUPS  META_HIT  META_MISS  DATA_HIT  DATA_MISS  ALLOCATION  USED  TOTAL_IO
> --
>  Map 111  1116  0  1.65GB  93.61MB  0B  0B  32.72s
> --
> {code}
> {code}
> 2016-03-08T21:05:39,417 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:prepareRangesForCompressedRead(695)) - Locking 
> 0x1c44401d(1) due to reuse
> 2016-03-08T21:05:39,417 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:prepareRangesForCompressedRead(701)) - Adding an 
> already-uncompressed buffer 0x1c44401d(2)
> 2016-03-08T21:05:39,417 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:prepareRangesForCompressedRead(695)) - Locking 
> 0x4e51b032(1) due to reuse
> 2016-03-08T21:05:39,417 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:prepareRangesForCompressedRead(701)) - Adding an 
> already-uncompressed buffer 0x4e51b032(2)
> 2016-03-08T21:05:39,418 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:addOneCompressionBuffer(1161)) - Found CB at 1373931, 
> chunk length 86587, total 86590, compressed
> 2016-03-08T21:05:39,418 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:addIncompleteCompressionBuffer(1241)) - Replacing 
> data range [1373931, 1408408), size: 34474(!) type: direct (and 0 previous 
> chunks) with incomplete CB start: 1373931 end: 1408408 in the buffers
> 2016-03-08T21:05:39,418 INFO  
> 

[jira] [Commented] (HIVE-13262) LLAP: Remove log levels from DebugUtils

2016-03-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204850#comment-15204850
 ] 

Prasanth Jayachandran commented on HIVE-13262:
--

Rebased the patch. [~sershe], can you please take another look?

> LLAP: Remove log levels from DebugUtils
> ---
>
> Key: HIVE-13262
> URL: https://issues.apache.org/jira/browse/HIVE-13262
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13262.1.patch, HIVE-13262.2.patch, 
> HIVE-13262.2.patch
>
>
> DebugUtils has many hardcoded log levels. To enable logging we need to 
> recompile the code with the desired value. Instead, add loggers for these 
> classes and configure their log levels via log4j properties. Also use 
> parametrized logging in the IO elevator. 
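
A small sketch of the difference, using `java.util.logging` from the JDK as a stand-in for the slf4j/log4j2 loggers Hive uses, so the API shown is not the one in the patch. The logger name and `describe` are hypothetical. The supplier form plays the role of parametrized logging here: the level comes from configuration, and the message is only built when that level is enabled.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class ParamLogging {
  static final Logger LOG = Logger.getLogger("llap.io.sketch"); // hypothetical logger name

  // Counts how often the (potentially expensive) description is built.
  static int describeCalls = 0;

  static String describe(long start, long end) {
    describeCalls++;
    return "[" + start + ", " + end + ")";
  }

  // Hardcoded flag in the style of DebugUtils: changing it means recompiling.
  static final boolean DEBUG_ENABLED = false;

  static void logOldStyle(long start, long end) {
    if (DEBUG_ENABLED) {
      LOG.info("Disk range " + describe(start, end));
    }
  }

  // The level is decided by configuration; the message is built only if FINE is on.
  static void logNewStyle(long start, long end) {
    LOG.log(Level.FINE, () -> "Disk range " + describe(start, end));
  }

  public static void main(String[] args) {
    LOG.setLevel(Level.INFO);
    logNewStyle(18986, 20660); // FINE is off: describe() is not called
    LOG.setLevel(Level.FINE);
    logNewStyle(18986, 20660); // FINE is on: describe() is called
    System.out.println(describeCalls);
  }
}
```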





[jira] [Updated] (HIVE-13262) LLAP: Remove log levels from DebugUtils

2016-03-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13262:
-
Attachment: HIVE-13262.2.patch

> LLAP: Remove log levels from DebugUtils
> ---
>
> Key: HIVE-13262
> URL: https://issues.apache.org/jira/browse/HIVE-13262
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13262.1.patch, HIVE-13262.2.patch, 
> HIVE-13262.2.patch
>
>
> DebugUtils has many hardcoded log levels. To enable logging we need to 
> recompile the code with the desired value. Instead, add loggers for these 
> classes and configure their log levels via log4j properties. Also use 
> parametrized logging in the IO elevator. 





[jira] [Commented] (HIVE-13264) JDBC driver makes 2 Open Session Calls for every open session

2016-03-21 Thread NITHIN MAHESH (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204813#comment-15204813
 ] 

NITHIN MAHESH commented on HIVE-13264:
--

Thanks for the comment [~damien.carol]. It was intended when I wanted to 
replace the open session call with a new interface method "list session". It is 
indeed not needed now. I will remove it.

> JDBC driver makes 2 Open Session Calls for every open session
> -
>
> Key: HIVE-13264
> URL: https://issues.apache.org/jira/browse/HIVE-13264
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Reporter: NITHIN MAHESH
>Assignee: NITHIN MAHESH
>  Labels: jdbc
> Attachments: HIVE-13264.1.patch, HIVE-13264.2.patch, HIVE-13264.patch
>
>
> When HTTP is used as the transport mode by the Hive JDBC driver, we noticed 
> that there is an additional open/close session just to validate the 
> connection. 
>  
> TCLIService.Iface client = new TCLIService.Client(new TBinaryProtocol(transport));
> TOpenSessionResp openResp = client.OpenSession(new TOpenSessionReq());
> if (openResp != null) {
>   client.CloseSession(new TCloseSessionReq(openResp.getSessionHandle()));
> }
>  
> The open session call is a costly one and should not be used to test 
> transport. 





[jira] [Commented] (HIVE-13250) Compute predicate conversions on the client, instead of per row group

2016-03-21 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204812#comment-15204812
 ] 

Siddharth Seth commented on HIVE-13250:
---

bq. I misunderstood this bug report. Without patch, filter expression for {{ 
ts_field = "2016-01-23 00:00:00"}} gets executed as (UDFToString(ts_field) = 
'2016-01-23 00:00:00') In the patch I made changes such that cast is on 
constant (ts_field = UDFTOTimeStamp('2016-01-23 00:00:00')) which gets folded 
compile time to (ts_field = 2016-01-23 00:00:00.0)
I'd expect the cast to change the value to whatever can be compared directly 
against storage. However, I think the type promotion system is far more 
complicated, so this may not always be possible.
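
To make the cost argument concrete, here is a small sketch of converting the constant once and reusing it for every row group, as opposed to converting it inside the per-row-group check. It is not ORC's SARG code; `RowGroupStats` and `mightContain` are hypothetical, and `java.sql.Timestamp.valueOf` merely stands in for whatever conversion the reader performs.

```java
import java.sql.Timestamp;

final class FoldOnce {
  // Hypothetical min/max statistics for one row group.
  static final class RowGroupStats {
    final Timestamp min;
    final Timestamp max;

    RowGroupStats(String min, String max) {
      this.min = Timestamp.valueOf(min);
      this.max = Timestamp.valueOf(max);
    }
  }

  // Counts how many times the predicate literal is converted.
  static int conversions = 0;

  static Timestamp convert(String literal) {
    conversions++;
    return Timestamp.valueOf(literal);
  }

  // True when the row group's [min, max] range could contain the value.
  static boolean mightContain(RowGroupStats rg, Timestamp value) {
    return !value.before(rg.min) && !value.after(rg.max);
  }

  // Converts the literal once, then checks every row group against it.
  static int countCandidates(RowGroupStats[] groups, String literal) {
    Timestamp folded = convert(literal);
    int hits = 0;
    for (RowGroupStats rg : groups) {
      if (mightContain(rg, folded)) {
        hits++;
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    RowGroupStats[] groups = {
      new RowGroupStats("2016-01-01 00:00:00", "2016-01-15 00:00:00"),
      new RowGroupStats("2016-01-16 00:00:00", "2016-01-31 00:00:00")
    };
    System.out.println(countCandidates(groups, "2016-01-23 00:00:00"));
    System.out.println(conversions);
  }
}
```

The number of conversions stays at one regardless of how many row groups are checked, which is the effect the issue asks for.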

> Compute predicate conversions on the client, instead of per row group
> -
>
> Key: HIVE-13250
> URL: https://issues.apache.org/jira/browse/HIVE-13250
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Siddharth Seth
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-13250.2.patch, HIVE-13250.patch
>
>
> When running a query of the form 
> select count from table where ts_field = "2016-01-23 00:00:00";
> or
> select count from table where ts_field = 1453507200
> ts_field is of type TIMESTAMP
> The predicate is converted to whatever format is appropriate for TIMESTAMP 
> processing on each and every row group.
> It would be far more efficient to process this once on the client - or even 
> once per task.
> The same applies to ORC split elimination as well - this is applied for each 
> stripe.
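A minimal sketch of the idea in the description above, not Hive's actual code: the string literal is converted to a timestamp value once, and the converted value is reused for every row group. The class name PredicateConversionSketch and the min/max arrays standing in for row-group statistics are hypothetical.

```java
import java.sql.Timestamp;

class PredicateConversionSketch {
    // Counts the row groups whose [min, max] range may contain the literal.
    // The literal is parsed exactly once, outside the per-row-group loop.
    static int countMatchingRowGroups(String literal, long[] minMillis, long[] maxMillis) {
        long target = Timestamp.valueOf(literal).getTime(); // single conversion
        int matches = 0;
        for (int i = 0; i < minMillis.length; i++) {
            if (minMillis[i] <= target && target <= maxMillis[i]) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        long t = Timestamp.valueOf("2016-01-23 00:00:00").getTime();
        long[] min = {t - 1000, t + 1, t};
        long[] max = {t + 1000, t + 2000, t};
        System.out.println(countMatchingRowGroups("2016-01-23 00:00:00", min, max));
    }
}
```

With schema evolution the target type may differ per file, in which case the same conversion could be done once per task instead, as suggested above.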





[jira] [Commented] (HIVE-13250) Compute predicate conversions on the client, instead of per row group

2016-03-21 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204809#comment-15204809
 ] 

Siddharth Seth commented on HIVE-13250:
---

[~ashutoshc] - this is what was observed.

The following exception was seen for every row group in ORC files. Note how the 
constant is being cast each and every time. The intent was to avoid that. It 
seems like this is something that can be avoided on the client itself by casting 
the constant to whatever format the column requires. Now, with schema 
evolution, this may not always be possible - which is why the suggestion for 
once per task.
{code}
2016-02-10 02:15:43,175 [WARN] [TezChild] |orc.RecordReaderImpl|: Exception 
when evaluating predicate. Skipping ORC PPD. Exception: 
java.lang.IllegalArgumentException: ORC SARGS could not convert from String to 
TIMESTAMP
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.getBaseObjectForComparison(RecordReaderImpl.java:659)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.evaluatePredicateRange(RecordReaderImpl.java:373)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.evaluatePredicateProto(RecordReaderImpl.java:338)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:710)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.pickRowGroups(RecordReaderImpl.java:751)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:777)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:205)
at 
org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:598)
at 
org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:183)
at 
org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.<init>(OrcRawRecordMerger.java:226)
at 
org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:437)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1269)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1151)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:135)
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:101)
at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:149)
at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
at 
org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:650)
at 
org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
at 
org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
at 
org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:406)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 

[jira] [Commented] (HIVE-12619) Switching the field order within an array of structs causes the query to fail

2016-03-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204797#comment-15204797
 ] 

Sergio Peña commented on HIVE-12619:


[~jxiang] could you upload the patch to the review board? I have some comments 
I think it would be easier to leave there.

> Switching the field order within an array of structs causes the query to fail
> -
>
> Key: HIVE-12619
> URL: https://issues.apache.org/jira/browse/HIVE-12619
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Ang Zhang
>Assignee: Mohammad Kamrul Islam
>Priority: Minor
> Attachments: HIVE-12619.2.patch, HIVE-12619.3.patch, 
> HIVE-12619.4.patch, HIVE-12619.5.patch
>
>
> Switching the field order within an array of structs causes the query to fail 
> or return the wrong data for the fields, but switching the field order within 
> just a struct works.
> How to reproduce:
> Case1: if the two fields have the same type, the query will return wrong data 
> for the fields
> drop table if exists schema_test;
> create table schema_test (msg array<struct<f1:string,f2:string>>) stored 
> as parquet;
> insert into table schema_test select stack(2, array(named_struct('f1', 'abc', 
> 'f2', 'abc2')), array(named_struct('f1', 'efg', 'f2', 'efg2'))) from one 
> limit 2;
> select * from schema_test;
> --returns
> --[{"f1":"efg","f2":"efg2"}]
> --[{"f1":"abc","f2":"abc2"}]
> alter table schema_test change msg msg array<struct<f2:string,f1:string>>;
> select * from schema_test;
> --returns
> --[{"f2":"efg","f1":"efg2"}]
> --[{"f2":"abc","f1":"abc2"}]
> Case2: if the two fields have different type, the query will fail
> drop table if exists schema_test;
> create table schema_test (msg array<struct<f1:string,f2:int>>) stored as 
> parquet;
> insert into table schema_test select stack(2, array(named_struct('f1', 'abc', 
> 'f2', 1)), array(named_struct('f1', 'efg', 'f2', 2))) from one limit 2;
> select * from schema_test;
> --returns
> --[{"f1":"efg","f2":2}]
> --[{"f1":"abc","f2":1}]
> alter table schema_test change msg msg array<struct<f2:int,f1:string>>;
> select * from schema_test;
> Failed with exception 
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
> org.apache.hadoop.io.IntWritable
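The symptoms above are consistent with values being matched to struct fields by position rather than by name. The sketch below only illustrates that difference; it is not the Parquet reader code and says nothing about how the attached patches fix the issue. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StructFieldOrderSketch {
    // Name-based read: looks each table field up in the file schema, so a
    // reordered table schema still receives the right values.
    static List<Object> readByName(List<String> fileFields, List<Object> storedValues,
                                   List<String> tableFields) {
        List<Object> result = new ArrayList<>();
        for (String name : tableFields) {
            int idx = fileFields.indexOf(name);
            result.add(idx >= 0 ? storedValues.get(idx) : null);
        }
        return result;
    }

    // Positional read: ignores names, so after a reorder the first table
    // field simply receives the first stored value.
    static List<Object> readByPosition(List<Object> storedValues, List<String> tableFields) {
        int n = Math.min(storedValues.size(), tableFields.size());
        return new ArrayList<>(storedValues.subList(0, n));
    }

    public static void main(String[] args) {
        List<String> fileFields = Arrays.asList("f1", "f2");
        List<Object> stored = Arrays.<Object>asList("abc", 1);
        List<String> tableFields = Arrays.asList("f2", "f1");
        System.out.println(readByPosition(stored, tableFields));
        System.out.println(readByName(fileFields, stored, tableFields));
    }
}
```

In the positional case f2 receives the string "abc", which mirrors Case2: a string value ends up where an int is expected.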





[jira] [Updated] (HIVE-13298) nested join support causes undecipherable errors in SemanticAnalyzer

2016-03-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13298:

   Resolution: Fixed
Fix Version/s: 2.1.0
   1.3.0
   Status: Resolved  (was: Patch Available)

Committed to master and branch-1.

> nested join support causes undecipherable errors in SemanticAnalyzer
> 
>
> Key: HIVE-13298
> URL: https://issues.apache.org/jira/browse/HIVE-13298
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 1.3.0, 2.1.0
>
> Attachments: HIVE-13298.01.patch, HIVE-13298.patch
>
>






[jira] [Updated] (HIVE-13291) ORC BI Split strategy should consider block size instead of file size

2016-03-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13291:
-
   Resolution: Fixed
Fix Version/s: 2.1.0
   Status: Resolved  (was: Patch Available)

Committed to master

> ORC BI Split strategy should consider block size instead of file size
> -
>
> Key: HIVE-13291
> URL: https://issues.apache.org/jira/browse/HIVE-13291
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Prasanth Jayachandran
> Fix For: 2.1.0
>
> Attachments: HIVE-13291.1.patch, HIVE-13291.2.patch, 
> HIVE-13291.3.patch
>
>
> When the split strategy is forced to "BI" (using 
> hive.exec.orc.split.strategy), the entire file is treated as a single split. 
> This can be inefficient when the files are large. Instead, BI should 
> consider splitting at block boundaries. 
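A rough sketch of what splitting at block boundaries means, assuming a plain file length and a fixed block size. The class BlockSplitSketch is hypothetical and is not the committed patch; a real ORC split also has to respect stripe layout, which is ignored here.

```java
import java.util.ArrayList;
import java.util.List;

class BlockSplitSketch {
    // Returns {offset, length} pairs, one per block-sized chunk of the file.
    static List<long[]> splitAtBlockBoundaries(long fileLength, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            // The last chunk may be shorter than a full block.
            long length = Math.min(blockSize, fileLength - offset);
            splits.add(new long[] {offset, length});
        }
        return splits;
    }

    public static void main(String[] args) {
        for (long[] s : splitAtBlockBoundaries(300L, 128L)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```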





[jira] [Commented] (HIVE-13311) MetaDataFormatUtils throws NPE when HiveDecimal.create is null

2016-03-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-13311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204701#comment-15204701
 ] 

Sergio Peña commented on HIVE-13311:


Thanks [~sircodesalot] for your contribution.
I committed this to master.

> MetaDataFormatUtils throws NPE when HiveDecimal.create is null
> --
>
> Key: HIVE-13311
> URL: https://issues.apache.org/jira/browse/HIVE-13311
> Project: Hive
>  Issue Type: Bug
>Reporter: Reuben Kuhnert
>Assignee: Reuben Kuhnert
>Priority: Minor
> Fix For: 2.1.0
>
> Attachments: HIVE-13311.01.patch
>
>
> The {{MetaDataFormatUtils.convertToString}} functions guard against {{val}} 
> being null; however, {{HiveDecimal.create}} can also return null, in which 
> case calling {{.toString()}} on the result throws an NPE.
> {code}
>   private static String convertToString(Decimal val) {
>     if (val == null) {
>       return "";
>     }
>     // HERE: Will throw NPE when HiveDecimal.create returns null.
>     return HiveDecimal.create(new BigInteger(val.getUnscaled()),
>         val.getScale()).toString();
>   }
> {code}
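A minimal sketch of the missing guard. Since HiveDecimal is not available outside Hive, createOrNull below is a hypothetical stand-in that returns null for values wider than 38 digits; returning an empty string in that case is also an assumption and not necessarily what the committed patch does.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class NullSafeConvertSketch {
    static final int MAX_PRECISION = 38;

    // Hypothetical stand-in for a factory that, like HiveDecimal.create in the
    // description, may return null instead of throwing.
    static BigDecimal createOrNull(BigInteger unscaled, int scale) {
        BigDecimal d = new BigDecimal(unscaled, scale);
        return d.precision() > MAX_PRECISION ? null : d;
    }

    // Guards both the input and the factory result before calling toString().
    static String convertToString(BigInteger unscaled, int scale) {
        if (unscaled == null) {
            return "";
        }
        BigDecimal d = createOrNull(unscaled, scale);
        return d == null ? "" : d.toString();
    }

    public static void main(String[] args) {
        System.out.println(convertToString(BigInteger.valueOf(12345), 2));
        System.out.println(convertToString(BigInteger.TEN.pow(40), 0).isEmpty());
    }
}
```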





[jira] [Updated] (HIVE-13311) MetaDataFormatUtils throws NPE when HiveDecimal.create is null

2016-03-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-13311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-13311:
---
   Resolution: Fixed
Fix Version/s: 2.1.0
   Status: Resolved  (was: Patch Available)

> MetaDataFormatUtils throws NPE when HiveDecimal.create is null
> --
>
> Key: HIVE-13311
> URL: https://issues.apache.org/jira/browse/HIVE-13311
> Project: Hive
>  Issue Type: Bug
>Reporter: Reuben Kuhnert
>Assignee: Reuben Kuhnert
>Priority: Minor
> Fix For: 2.1.0
>
> Attachments: HIVE-13311.01.patch
>
>
> The {{MetaDataFormatUtils.convertToString}} functions guard against {{val}} 
> being null; however, {{HiveDecimal.create}} can also return null, in which 
> case calling {{.toString()}} on the result throws an NPE.
> {code}
>   private static String convertToString(Decimal val) {
>     if (val == null) {
>       return "";
>     }
>     // HERE: Will throw NPE when HiveDecimal.create returns null.
>     return HiveDecimal.create(new BigInteger(val.getUnscaled()),
>         val.getScale()).toString();
>   }
> {code}





[jira] [Commented] (HIVE-13254) GBY cardinality estimation is wrong partition columns is involved

2016-03-21 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204700#comment-15204700
 ] 

Ashutosh Chauhan commented on HIVE-13254:
-

[~prasanth_j] Are you suggesting that the Map1 -> Reducer2 edge should have been 
a broadcast instead of a shuffle? I am not sure we support a broadcast edge 
between map-side GBY and reduce-side GBY.


> GBY cardinality estimation is wrong partition columns is involved
> -
>
> Key: HIVE-13254
> URL: https://issues.apache.org/jira/browse/HIVE-13254
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.3.0, 2.0.0, 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Jesus Camacho Rodriguez
> Attachments: q3_ef_transpose_aggr.svg
>
>
> When running the following query on TPCDS-1000 scale, setting 
> hive.transpose.aggr.join=true is expected to generate an optimal plan, but it 
> does not. 
> {code:title=Query}
> SELECT `date_dim`.`d_day_name` AS `d_day_name`, 
>        `item`.`i_category` AS `i_category` 
> FROM   `store_sales` `store_sales` 
>        INNER JOIN `item` `item` 
>                ON ( `store_sales`.`ss_item_sk` = `item`.`i_item_sk` ) 
>        INNER JOIN `date_dim` `date_dim` 
>                ON ( `store_sales`.`ss_sold_date_sk` = `date_dim`.`d_date_sk` ) 
> GROUP  BY `d_day_name`, 
>           `i_category`;
> {code}
> The reason is that the stats annotation rule for GROUP BY does not take 
> partition columns into account. For the above query, the generated plan is 
> attached. As we can see from the plan, GBY is pushed to the fact table 
> (store_sales), but the output of GBY is shuffled to perform the join instead 
> of being converted to a MapJoin. This is because of the wrong estimation of 
> the cardinality/data size of GBY on store_sales (Map 1). 
> What's happening internally is that GBY computes an estimated cardinality, 
> which in this case is NDV(ss_item_sk) * NDV(ss_sold_date_sk) = 338901 * 1823 
> ~= 617M. This estimate is wrong because ss_sold_date_sk is a partition column 
> and the estimator assumes it is a non-partition column. In this case, not 
> every task reads data from all partitions. We need to take the estimated task 
> parallelism into account. For example: if task parallelism is determined to 
> be 100, the estimate from GBY should be ~6M, which should convert this vertex 
> into a map join vertex. 




[jira] [Updated] (HIVE-13149) Remove some unnecessary HMS connections from HS2

2016-03-21 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-13149:

Status: Patch Available  (was: In Progress)

Attached patch 6, which fixes the failed unit test. CodahaleMetrics will now be 
created when we need to access HMS, not every time we start a session, so 
getting an instance of CodahaleMetrics is postponed.

> Remove some unnecessary HMS connections from HS2 
> -
>
> Key: HIVE-13149
> URL: https://issues.apache.org/jira/browse/HIVE-13149
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-13149.1.patch, HIVE-13149.2.patch, 
> HIVE-13149.3.patch, HIVE-13149.4.patch, HIVE-13149.5.patch, HIVE-13149.6.patch
>
>
> In the SessionState class, we currently always try to get an HMS connection 
> in {{start(SessionState startSs, boolean isAsync, LogHelper console)}}, 
> regardless of whether the connection will be used later. 
> SessionState is accessed by the tasks in TaskRunner.java, although most 
> tasks, other than a few such as StatsTask, don't need to access HMS. 
> Currently a new HMS connection is established for each Task thread. If 
> HiveServer2 is configured to run in parallel and the query involves many 
> tasks, then the connections are created but unused.
> {noformat}
>   @Override
>   public void run() {
>     runner = Thread.currentThread();
>     try {
>       OperationLog.setCurrentOperationLog(operationLog);
>       SessionState.start(ss);
>       runSequential();
> {noformat}
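One common way to avoid opening connections that are never used is to create them lazily on first access. The sketch below shows that pattern in isolation; LazyConnectionSketch and its Supplier-based factory are hypothetical and are not taken from the HIVE-13149 patches.

```java
import java.util.function.Supplier;

class LazyConnectionSketch<T> {
    private final Supplier<T> factory;
    private T connection;
    private int opened;

    LazyConnectionSketch(Supplier<T> factory) {
        this.factory = factory;
    }

    // Opens the connection on first use only; later calls reuse it.
    synchronized T get() {
        if (connection == null) {
            connection = factory.get();
            opened++;
        }
        return connection;
    }

    // Number of times the factory was actually invoked.
    synchronized int timesOpened() {
        return opened;
    }

    public static void main(String[] args) {
        LazyConnectionSketch<String> hms = new LazyConnectionSketch<>(() -> "connection");
        System.out.println(hms.timesOpened()); // nothing opened at construction
        hms.get();
        hms.get();
        System.out.println(hms.timesOpened()); // opened once despite two uses
    }
}
```

A task that never calls get() never triggers the factory, which is the behaviour the description asks for.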





[jira] [Updated] (HIVE-13149) Remove some unnecessary HMS connections from HS2

2016-03-21 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-13149:

Attachment: HIVE-13149.6.patch

> Remove some unnecessary HMS connections from HS2 
> -
>
> Key: HIVE-13149
> URL: https://issues.apache.org/jira/browse/HIVE-13149
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-13149.1.patch, HIVE-13149.2.patch, 
> HIVE-13149.3.patch, HIVE-13149.4.patch, HIVE-13149.5.patch, HIVE-13149.6.patch
>
>
> In the SessionState class, we currently always try to get an HMS connection 
> in {{start(SessionState startSs, boolean isAsync, LogHelper console)}}, 
> regardless of whether the connection will be used later. 
> SessionState is accessed by the tasks in TaskRunner.java, although most 
> tasks, other than a few such as StatsTask, don't need to access HMS. 
> Currently a new HMS connection is established for each Task thread. If 
> HiveServer2 is configured to run in parallel and the query involves many 
> tasks, then the connections are created but unused.
> {noformat}
>   @Override
>   public void run() {
>     runner = Thread.currentThread();
>     try {
>       OperationLog.setCurrentOperationLog(operationLog);
>       SessionState.start(ss);
>       runSequential();
> {noformat}





[jira] [Updated] (HIVE-13149) Remove some unnecessary HMS connections from HS2

2016-03-21 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-13149:

Status: In Progress  (was: Patch Available)

> Remove some unnecessary HMS connections from HS2 
> -
>
> Key: HIVE-13149
> URL: https://issues.apache.org/jira/browse/HIVE-13149
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-13149.1.patch, HIVE-13149.2.patch, 
> HIVE-13149.3.patch, HIVE-13149.4.patch, HIVE-13149.5.patch
>
>
> In the SessionState class, we currently always try to get an HMS connection 
> in {{start(SessionState startSs, boolean isAsync, LogHelper console)}}, 
> regardless of whether the connection will be used later. 
> SessionState is accessed by the tasks in TaskRunner.java, although most 
> tasks, other than a few such as StatsTask, don't need to access HMS. 
> Currently a new HMS connection is established for each Task thread. If 
> HiveServer2 is configured to run in parallel and the query involves many 
> tasks, then the connections are created but unused.
> {noformat}
>   @Override
>   public void run() {
>     runner = Thread.currentThread();
>     try {
>       OperationLog.setCurrentOperationLog(operationLog);
>       SessionState.start(ss);
>       runSequential();
> {noformat}





[jira] [Updated] (HIVE-13149) Remove some unnecessary HMS connections from HS2

2016-03-21 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-13149:

Attachment: (was: HIVE-13149.6.patch)

> Remove some unnecessary HMS connections from HS2 
> -
>
> Key: HIVE-13149
> URL: https://issues.apache.org/jira/browse/HIVE-13149
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-13149.1.patch, HIVE-13149.2.patch, 
> HIVE-13149.3.patch, HIVE-13149.4.patch, HIVE-13149.5.patch
>
>
> In the SessionState class, we currently always try to get an HMS connection 
> in {{start(SessionState startSs, boolean isAsync, LogHelper console)}}, 
> regardless of whether the connection will be used later. 
> SessionState is accessed by the tasks in TaskRunner.java, although most 
> tasks, other than a few such as StatsTask, don't need to access HMS. 
> Currently a new HMS connection is established for each Task thread. If 
> HiveServer2 is configured to run in parallel and the query involves many 
> tasks, then the connections are created but unused.
> {noformat}
>   @Override
>   public void run() {
>     runner = Thread.currentThread();
>     try {
>       OperationLog.setCurrentOperationLog(operationLog);
>       SessionState.start(ss);
>       runSequential();
> {noformat}





[jira] [Updated] (HIVE-13294) AvroSerde leaks the connection in a case when reading schema from a url

2016-03-21 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-13294:
---
   Resolution: Fixed
Fix Version/s: 2.0.1
   2.1.0
   Status: Resolved  (was: Patch Available)

Committed to 2.1.0 and 2.0.1. Thanks [~aihuaxu] for review.

> AvroSerde leaks the connection in a case when reading schema from a url
> ---
>
> Key: HIVE-13294
> URL: https://issues.apache.org/jira/browse/HIVE-13294
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 2.1.0, 2.0.1
>
> Attachments: HIVE-13294.1.patch, HIVE-13294.patch
>
>
> AvroSerde leaks the connection when reading the schema from a URL:
> In 
> public static Schema determineSchemaOrThrowException {
> ...
> return AvroSerdeUtils.getSchemaFor(new URL(schemaString).openStream());
> ...
> }
> The opened inputStream is never closed.
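The usual remedy for a stream that is opened and never closed is try-with-resources. The sketch below shows the pattern on an in-memory stream so it can run without a network; in the situation described above the source would be new URL(schemaString).openStream(). CloseStreamSketch and TrackingStream are hypothetical names, and this is not the committed patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class CloseStreamSketch {
    // Demo helper: an in-memory stream that records whether close() was called.
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed;

        TrackingStream(String content) {
            super(content.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Reads the whole stream as UTF-8 text; try-with-resources closes the
    // stream whether reading succeeds or throws.
    static String readAndClose(InputStream source) {
        try (InputStream in = source) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TrackingStream in = new TrackingStream("{\"type\":\"string\"}");
        System.out.println(readAndClose(in));
        System.out.println(in.closed);
    }
}
```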





[jira] [Commented] (HIVE-13298) nested join support causes undecipherable errors in SemanticAnalyzer

2016-03-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204455#comment-15204455
 ] 

Hive QA commented on HIVE-13298:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12794283/HIVE-13298.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9832 tests executed
*Failed tests:*
{noformat}
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more
 - did not produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7329/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7329/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7329/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12794283 - PreCommit-HIVE-TRUNK-Build

> nested join support causes undecipherable errors in SemanticAnalyzer
> 
>
> Key: HIVE-13298
> URL: https://issues.apache.org/jira/browse/HIVE-13298
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13298.01.patch, HIVE-13298.patch
>
>






[jira] [Updated] (HIVE-13141) Hive on Spark over HBase should accept parameters starting with "zookeeper.znode"

2016-03-21 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-13141:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.1.0
   Status: Resolved  (was: Patch Available)

Thanks Nemon for the patch. Integrated into trunk.

> Hive on Spark over HBase should accept parameters starting with 
> "zookeeper.znode"
> -
>
> Key: HIVE-13141
> URL: https://issues.apache.org/jira/browse/HIVE-13141
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>Priority: Minor
> Fix For: 2.1.0
>
> Attachments: HIVE-13141.patch
>
>
> HBase-related parameters were added by HIVE-12708.
> In the same way, parameters starting with "zookeeper.znode" should be 
> added too, as they are also HBase-related parameters.
> See http://blog.cloudera.com/blog/2013/10/what-are-hbase-znodes/
> I have seen a failure with Hive on Spark over HBase due to a customized 
> zookeeper.znode.parent.
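A sketch of prefix-based selection of configuration entries, to illustrate the request above. Only the "zookeeper.znode" prefix comes from the description; the "hbase" prefix, the class name and the method name are assumptions, and this is not the code from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ConfPrefixFilterSketch {
    // Assumed prefix list: "zookeeper.znode" is from the issue, "hbase" is a guess.
    static final String[] FORWARDED_PREFIXES = {"hbase", "zookeeper.znode"};

    // Keeps only the entries whose key starts with one of the prefixes.
    static Map<String, String> selectForwarded(Map<String, String> conf) {
        Map<String, String> selected = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            for (String prefix : FORWARDED_PREFIXES) {
                if (e.getKey().startsWith(prefix)) {
                    selected.put(e.getKey(), e.getValue());
                    break;
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("zookeeper.znode.parent", "/hbase-custom");
        conf.put("mapreduce.job.name", "demo");
        System.out.println(selectForwarded(conf).keySet());
    }
}
```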





[jira] [Assigned] (HIVE-13306) Better Decimal vectorization

2016-03-21 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi reassigned HIVE-13306:
-

Assignee: Teddy Choi

> Better Decimal vectorization
> 
>
> Key: HIVE-13306
> URL: https://issues.apache.org/jira/browse/HIVE-13306
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
>
> Decimal Vectorization Requirements
> • Today, the LongColumnVector, DoubleColumnVector, BytesColumnVector, and 
> TimestampColumnVector classes store the data as primitive Java data types 
> (long, double, or byte arrays) for efficiency.
> • DecimalColumnVector is different - it has an array of Object references 
> to HiveDecimal objects.
> • The HiveDecimal object uses an internal BigDecimal object for its 
> implementation. Further, BigDecimal itself uses an internal BigInteger 
> object for its implementation, and BigInteger uses an int array. That is 4 
> objects in total.
> • HiveDecimal is also an immutable object, which means arithmetic and 
> other operations produce a new HiveDecimal object with 3 new objects 
> underneath.
> • A major reason Vectorization is fast is that the ColumnVector classes, 
> except DecimalColumnVector, do not have to allocate additional memory per 
> row. This avoids the memory fragmentation and pressure on the Java Garbage 
> Collector that DecimalColumnVector can generate. It is very significant.
> • What can be done with DecimalColumnVector to make it much more 
> efficient?
>   o Design several new decimal classes that allow the caller to manage the 
>   decimal storage.
>   o If it takes N int values to store a decimal (e.g. N=1..5), then a new 
>   DecimalColumnVector would have an int[] of length N*1024 (where 1024 is the 
>   default column vector size).
>   o Why store a decimal in separate int values?
>     • Java does not support 128 bit integers.
>     • Java does not support unsigned integers.
>     • In order to do multiplication of a decimal represented in a long you 
>     need twice the storage (i.e. 128 bits). So you need to represent parts in 
>     32 bit integers.
>     • But since we do not have unsigned integers, you can really only do 
>     multiplications on N-1 bits, or 31 bits.
>     • So, 5 ints are needed for decimal storage... of 38 digits.
>   o It makes sense to have just one algorithm for decimals rather than one 
>   for HiveDecimal and another for DecimalColumnVector. So, make HiveDecimal 
>   store N int values, too.
>   o A lower level primitive decimal class would accept decimals stored as 
>   int arrays and produce results into int arrays. It would be used by 
>   HiveDecimal and DecimalColumnVector.
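A minimal sketch of the storage layout proposed above: one int[] of length N*1024 allocated up front, with each row's N words stored side by side so that setting a value allocates nothing. FlatDecimalStorageSketch and its methods are hypothetical and only show the layout, not the decimal arithmetic.

```java
class FlatDecimalStorageSketch {
    static final int VECTOR_SIZE = 1024; // default column vector size from the text

    final int intsPerValue; // N in the description
    final int[] words;      // N * 1024 ints, allocated once

    FlatDecimalStorageSketch(int intsPerValue) {
        this.intsPerValue = intsPerValue;
        this.words = new int[intsPerValue * VECTOR_SIZE];
    }

    // Copies the N words of one decimal into the row's slot; no per-row allocation.
    void set(int row, int[] value) {
        System.arraycopy(value, 0, words, row * intsPerValue, intsPerValue);
    }

    // Copies the row's N words into a caller-managed buffer.
    void get(int row, int[] out) {
        System.arraycopy(words, row * intsPerValue, out, 0, intsPerValue);
    }

    public static void main(String[] args) {
        FlatDecimalStorageSketch vector = new FlatDecimalStorageSketch(5);
        vector.set(3, new int[] {1, 2, 3, 4, 5});
        int[] out = new int[5];
        vector.get(3, out);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

On the count of 5 ints: 38 decimal digits need about 127 bits (38 x log2(10) is roughly 126.2), which is more than 4 x 31 = 124 usable bits but fits in 5 x 31 = 155.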




