[jira] [Commented] (HIVE-10005) remove some unnecessary branches from the inner loop

2015-03-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366978#comment-14366978
 ] 

Hive QA commented on HIVE-10005:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12705309/HIVE-10005.1.patch

{color:red}ERROR:{color} -1 due to 194 failed/errored test(s), 7770 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join17
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join19
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join21
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join24
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join26
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join29
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join30
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join31
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join33
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_nulls
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_stats2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_if_with_path_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_spark1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_spark2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_spark3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_spark4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_7

[jira] [Commented] (HIVE-10006) RSC has memory leak while execute multi queries.[Spark Branch]

2015-03-18 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366899#comment-14366899
 ] 

Chengxiang Li commented on HIVE-10006:
--

Root Cause:
In RSC, when Spark calls CombineHiveInputFormat::getSplits to split the job
into tasks on a thread named dag-scheduler-event-loop, the MapWork is added
to a ThreadLocal map of that thread and never gets removed. As
dag-scheduler-event-loop is a long-lived daemon thread, all the MapWorks are
held in the ThreadLocal map until the RSC JVM crashes or exits.
Hive hits this issue in MR mode as well; it is just lucky that the thread
which calls CombineHiveInputFormat::getSplits is a TaskRunner, which is
abandoned after the query finishes, so the Hive driver does not leak memory
there.

 RSC has memory leak while execute multi queries.[Spark Branch]
 --

 Key: HIVE-10006
 URL: https://issues.apache.org/jira/browse/HIVE-10006
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: 1.1.0
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Priority: Critical
  Labels: Spark-M5

 While executing queries with RSC, the MapWork/ReduceWork count increases all
 the time, and leads to OOM in the end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10005) remove some unnecessary branches from the inner loop

2015-03-18 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-10005:
--
Attachment: HIVE-10005.1.patch

 remove some unnecessary branches from the inner loop
 

 Key: HIVE-10005
 URL: https://issues.apache.org/jira/browse/HIVE-10005
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-10005.1.patch


 Operator.forward is doing too much. There's no reason to do the done 
 checking per row and update it inline. It's much more efficient to just do 
 that when the event that completes an operator happens.
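The change described above can be sketched as follows. This is an illustrative example of hoisting a per-row branch out of the inner loop, not the actual Hive Operator code:

```java
// Sketch: instead of branching on a "done" flag for every forwarded row,
// handle completion once, when the event that completes the operator fires.
public class ForwardSketch {
    private boolean done = false;
    private int forwarded = 0;

    // Before: the inner loop pays for this branch on every single row.
    public void forwardWithCheck(Object row) {
        if (done) {
            return;
        }
        forwarded++;
    }

    // After: the hot path forwards unconditionally; completion is handled
    // by the event callback below, outside the per-row loop.
    public void forward(Object row) {
        forwarded++;
    }

    public void onOperatorDone() {
        done = true; // evaluated once per operator, not once per row
    }

    public int forwardedRows() {
        return forwarded;
    }
}
```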



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10003) MiniTez ut fail with missing configs

2015-03-18 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366871#comment-14366871
 ] 

Gopal V commented on HIVE-10003:


[~hagleitn]: The LLAP mode is turned on for MiniTez tests as well.

https://github.com/apache/hive/blob/llap/data/conf/tez/hive-site.xml#L43

 MiniTez ut fail with missing configs
 

 Key: HIVE-10003
 URL: https://issues.apache.org/jira/browse/HIVE-10003
 Project: Hive
  Issue Type: Sub-task
Reporter: Gunther Hagleitner
Assignee: Gopal V
 Attachments: HIVE-10003.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10003) MiniTez ut fail with missing configs

2015-03-18 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366904#comment-14366904
 ] 

Gopal V commented on HIVE-10003:


[~sseth]: I tried copying your default llap-daemon-site.xml from 
src/test/resources into the data/conf/tez directory to match the locations, but 
that doesn't work at all.

Looks like this config file is not getting added to the MiniTez cluster in the
test cases. I can take a look at this later, but I assigned it to you since you
might know whether we can run anything with a MiniTez cluster without an LLAP
daemon attached to it, even if I configure localhost in the hosts list.

 MiniTez ut fail with missing configs
 

 Key: HIVE-10003
 URL: https://issues.apache.org/jira/browse/HIVE-10003
 Project: Hive
  Issue Type: Sub-task
Reporter: Gunther Hagleitner
Assignee: Siddharth Seth
 Attachments: HIVE-10003.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10003) MiniTez ut fail with missing configs

2015-03-18 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366889#comment-14366889
 ] 

Gopal V commented on HIVE-10003:


I had to rebase to reproduce this error; it looks like this came in as part of
HIVE-.

 MiniTez ut fail with missing configs
 

 Key: HIVE-10003
 URL: https://issues.apache.org/jira/browse/HIVE-10003
 Project: Hive
  Issue Type: Sub-task
Reporter: Gunther Hagleitner
Assignee: Siddharth Seth
 Attachments: HIVE-10003.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10003) MiniTez ut fail with missing configs

2015-03-18 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10003:
---
Assignee: Siddharth Seth  (was: Gopal V)

 MiniTez ut fail with missing configs
 

 Key: HIVE-10003
 URL: https://issues.apache.org/jira/browse/HIVE-10003
 Project: Hive
  Issue Type: Sub-task
Reporter: Gunther Hagleitner
Assignee: Siddharth Seth
 Attachments: HIVE-10003.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9970) Hive on spark

2015-03-18 Thread Amithsha (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366945#comment-14366945
 ] 

Amithsha commented on HIVE-9970:


Also, while using Beeline I am getting this error:




2015-03-18 16:03:21,458 ERROR [pool-3-thread-8]: DataNucleus.Datastore 
(Log4JLogger.java:error(115)) - An exception was thrown while adding/validating 
class(es) : Specified key was too long; max key length is 767 bytes
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was 
too long; max key length is 767 bytes
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.Util.getInstance(Util.java:386)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1054)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4237)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4169)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2617)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2778)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2819)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2768)
at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:949)
at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:795)
at com.jolbox.bonecp.StatementHandle.execute(StatementHandle.java:254)
at 
org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:760)
at 
org.datanucleus.store.rdbms.table.TableImpl.createIndices(TableImpl.java:648)
at 
org.datanucleus.store.rdbms.table.TableImpl.validateIndices(TableImpl.java:593)
at 
org.datanucleus.store.rdbms.table.TableImpl.validateConstraints(TableImpl.java:390)
at 
org.datanucleus.store.rdbms.table.ClassTable.validateConstraints(ClassTable.java:3463)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3464)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:3190)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2841)
at 
org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:122)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:1605)
at 
org.datanucleus.store.AbstractStoreManager.addClass(AbstractStoreManager.java:954)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:679)
at 
org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:408)
at 
org.datanucleus.store.rdbms.query.JDOQLQuery.compileQueryFull(JDOQLQuery.java:947)
at 
org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:370)
at org.datanucleus.store.query.Query.executeQuery(Query.java:1744)
at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672)
at org.datanucleus.store.query.Query.execute(Query.java:1654)
at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:221)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.ensureDbInit(MetaStoreDirectSql.java:172)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.init(MetaStoreDirectSql.java:130)
at 
org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:275)
at 
org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:238)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.init(RawStoreProxy.java:56)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:65)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:579)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:557)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database_core(HiveMetaStore.java:933)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:907)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 

[jira] [Updated] (HIVE-9664) Hive add jar command should be able to download and add jars from a repository

2015-03-18 Thread Anant Nag (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anant Nag updated HIVE-9664:

Attachment: HIVE-9664.patch

 Hive add jar command should be able to download and add jars from a 
 repository
 

 Key: HIVE-9664
 URL: https://issues.apache.org/jira/browse/HIVE-9664
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.14.0
Reporter: Anant Nag
Assignee: Anant Nag
  Labels: hive, patch
 Attachments: HIVE-9664.patch, HIVE-9664.patch, HIVE-9664.patch


 Currently Hive's add jar command takes a local path to the dependency jar.
 This clutters the local file-system, as users may forget to remove this jar
 later.
 It would be nice if Hive supported a Gradle-like notation to download the jar
 from a repository.
 Example:  add jar org:module:version
 
 It should also be backward compatible and should take jar from the local 
 file-system as well. 
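The coordinate notation proposed above could be resolved against a standard Maven-style repository layout. The sketch below is an assumption for illustration; the class and method names are not part of the actual patch:

```java
// Maps a "org:module:version" coordinate to the conventional Maven
// repository path org/module/version/module-version.jar.
public class JarCoordinateSketch {
    public static String jarPath(String coordinate) {
        String[] parts = coordinate.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected org:module:version");
        }
        // Dots in the group id become directory separators.
        String org = parts[0].replace('.', '/');
        String module = parts[1];
        String version = parts[2];
        return org + "/" + module + "/" + version + "/"
                + module + "-" + version + ".jar";
    }
}
```

For example, `add jar org.apache.hive:hive-exec:1.1.0` would resolve to `org/apache/hive/hive-exec/1.1.0/hive-exec-1.1.0.jar` under the repository root, while a plain local path would be used unchanged for backward compatibility.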



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9970) Hive on spark

2015-03-18 Thread Amithsha (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367048#comment-14367048
 ] 

Amithsha commented on HIVE-9970:


mysql version
Server version: 5.1.73-log Source distribution

 Hive on spark
 -

 Key: HIVE-9970
 URL: https://issues.apache.org/jira/browse/HIVE-9970
 Project: Hive
  Issue Type: Bug
Reporter: Amithsha

 Hi all,
 Recently I have configured Spark 1.2.0, and my environment is Hadoop
 2.6.0 and Hive 1.1.0. Here I have tried Hive on Spark; while executing
 INSERT INTO I am getting the following error.
 Query ID = hadoop2_20150313162828_8764adad-a8e4-49da-9ef5-35e4ebd6bc63
 Total jobs = 1
 Launching Job 1 out of 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=<number>
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=<number>
 In order to set a constant number of reducers:
   set mapreduce.job.reduces=<number>
 Failed to execute spark task, with exception
 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create
 spark client.)'
 FAILED: Execution Error, return code 1 from
 org.apache.hadoop.hive.ql.exec.spark.SparkTask
 I have added the spark-assembly jar to the Hive lib directory,
 and also in the Hive console using the add jar command, followed by these steps:
 set spark.home=/opt/spark-1.2.1/;
 add jar 
 /opt/spark-1.2.1/assembly/target/scala-2.10/spark-assembly-1.2.1-hadoop2.4.0.jar;
 set hive.execution.engine=spark;
 set spark.master=spark://xxx:7077;
 set spark.eventLog.enabled=true;
 set spark.executor.memory=512m;
 set spark.serializer=org.apache.spark.serializer.KryoSerializer;
 Can anyone suggest
 Thanks & Regards,
 Amithsha



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]

2015-03-18 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367107#comment-14367107
 ] 

Chao commented on HIVE-9697:


[~lirui]: Yes, MR doesn't use stats. ContentSummary is the file size, so maybe 
that can (partially) explain why MR is more optimistic?

I briefly looked at the code. Looks like hive.stats.collect.rawdatasize 
controls whether a virtual column for rawDataSize is added to a TableScanDesc. 
This is later used in TableScanOperator#gatherStats, to determine whether 
rawDataSize will be collected.
This property is set to true by default.

But I haven't found the relationship between the stats collected through
TableScanOperator#gatherStats and the Statistics used by map join. I will
investigate more later.

 Hive on Spark is not as aggressive as MR on map join [Spark Branch]
 ---

 Key: HIVE-9697
 URL: https://issues.apache.org/jira/browse/HIVE-9697
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xin Hao

 We have a finding from running some Big-Bench cases:
 when the same small-table size threshold is used, the Map Join operator will
 not be generated in stage plans for Hive on Spark, while it will be generated
 for Hive on MR.
 For example, When we run BigBench Q25, the meta info of one input ORC table 
 is as below:
 totalSize=1748955 (about 1.5M)
 rawDataSize=123050375 (about 120M)
 If we use the following parameter settings,
 set hive.auto.convert.join=true;
 set hive.mapjoin.smalltable.filesize=2500;
 set hive.auto.convert.join.noconditionaltask=true;
 set hive.auto.convert.join.noconditionaltask.size=100000000; (100M)
 Map Join will be enabled for Hive on MR mode, while will not be enabled for 
 Hive on Spark.
 We found that for Hive on MR, the HDFS file size for the table
 (ContentSummary.getLength(), which should approximate the value of 'totalSize')
 is compared with the 100M threshold (smaller than 100M), while for Hive on
 Spark 'rawDataSize' is compared with the 100M threshold (larger than 100M).
 That's why MapJoin is not enabled for Hive on Spark in this case, and as a
 result Hive on Spark gets much lower performance than Hive on MR here.
 When we set hive.auto.convert.join.noconditionaltask.size=150000000; (150M),
 MapJoin will be enabled for Hive on Spark mode, and Hive on Spark will have
 performance similar to Hive on MR.
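The size comparison described in this report can be sketched as follows. The config property and the two metrics (totalSize vs. rawDataSize) are from the report itself, but the decision logic here is a simplification for illustration, not the actual planner code:

```java
// Sketch of the map-join size check: per the report, MR compares the
// on-disk size (totalSize) against
// hive.auto.convert.join.noconditionaltask.size, while Spark compares the
// in-memory rawDataSize, so the same table can pass on MR and fail on Spark.
public class MapJoinSizeCheckSketch {
    static final long THRESHOLD = 100_000_000L; // 100M, as in the report

    public static boolean mapJoinOnMr(long totalSize) {
        return totalSize < THRESHOLD;
    }

    public static boolean mapJoinOnSpark(long rawDataSize) {
        return rawDataSize < THRESHOLD;
    }
}
```

With the Q25 table from the report (totalSize=1748955, rawDataSize=123050375), the MR check passes while the Spark check fails, which matches the observed plans.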



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-9962) JsonSerDe does not support reader schema different from data schema

2015-03-18 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam resolved HIVE-9962.
-
Resolution: Fixed

 JsonSerDe does not support reader schema different from data schema
 ---

 Key: HIVE-9962
 URL: https://issues.apache.org/jira/browse/HIVE-9962
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog, Serializers/Deserializers
Reporter: Johndee Burks
Assignee: Naveen Gangam
Priority: Minor

 To reproduce the limitation, do the following.
 Create two tables, the first with the full schema and the second with a
 partial schema.
 {code}
 add jar 
 /opt/cloudera/parcels/CDH/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
 CREATE TABLE json_full
 (autopolicy struct<is_active:boolean, policy_holder_name:string,
 policy_num:string, vehicle:struct<brand:struct<model:string, year:int>,
 price:double, vin:string>>)
 ROW FORMAT SERDE 'org.apache.hcatalog.data.JsonSerDe';
 CREATE TABLE json_part
 (autopolicy struct<is_active:boolean, policy_holder_name:string,
 policy_num:string, vehicle:struct<brand:struct<model:string, year:int>,
 price:double>>)
 ROW FORMAT SERDE 'org.apache.hcatalog.data.JsonSerDe';
 {code}
 The data for the table is below: 
 {code}
 {"autopolicy": {"policy_holder_name": "someone", "policy_num": "20141012",
 "is_active": true, "vehicle": {"brand": {"model": "Lexus", "year": 2012},
 "vin": "RANDOM123", "price": 23450.50}}}
 {code}
 I put that data into a file and load it into the tables like this: 
 {code}
 load data local inpath 'data.json' into table json_full;
 load data local inpath 'data.json' into table json_part;
 {code}
 Then do a select against each table: 
 {code}
 select * from json_full;
 select * from json_part;
 {code}
 The second select should fail with an error similar to that below: 
 {code}
 15/03/12 23:19:30 [main]: ERROR CliDriver: Failed with exception 
 java.io.IOException:java.lang.NullPointerException
 {code}
 The code that throws this error is below: 
 {code}
 private void populateRecord(List<Object> r, JsonToken token, JsonParser p,
     HCatSchema s) throws IOException {
   if (token != JsonToken.FIELD_NAME) {
     throw new IOException("Field name expected");
   }
   String fieldName = p.getText();
   int fpos;
   try {
     fpos = s.getPosition(fieldName);
   } catch (NullPointerException npe) {
     fpos = getPositionFromHiveInternalColumnName(fieldName);
     LOG.debug("NPE finding position for field [{}] in schema [{}]",
         fieldName, s);
     if (!fieldName.equalsIgnoreCase(getHiveInternalColumnName(fpos))) {
       LOG.error("Hive internal column name {} and position "
           + "encoding {} for the column name are at odds", fieldName, fpos);
       throw npe;
     }
     if (fpos == -1) {
       return; // unknown field, we return.
     }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9962) JsonSerDe does not support reader schema different from data schema

2015-03-18 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367211#comment-14367211
 ] 

Naveen Gangam commented on HIVE-9962:
-

I believe this has already been addressed via HIVE-6166 in the Hive 0.13
release. I have tested it, and it appears to be working. The SerDe class used
above, org.apache.hcatalog.data.JsonSerDe, no longer exists. Please use the
org.apache.hive.hcatalog.data.JsonSerDe class instead.

The table definition should look like this
{code}
CREATE TABLE json_part
(autopolicy struct<is_active:boolean, policy_holder_name:string,
policy_num:string, vehicle:struct<brand:struct<model:string, year:int>,
price:double>>)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
{code}

Closing this JIRA as duplicate. Please re-open if you have concerns.

 JsonSerDe does not support reader schema different from data schema
 ---

 Key: HIVE-9962
 URL: https://issues.apache.org/jira/browse/HIVE-9962
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog, Serializers/Deserializers
Reporter: Johndee Burks
Assignee: Naveen Gangam
Priority: Minor

 To reproduce the limitation, do the following.
 Create two tables, the first with the full schema and the second with a
 partial schema.
 {code}
 add jar 
 /opt/cloudera/parcels/CDH/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
 CREATE TABLE json_full
 (autopolicy struct<is_active:boolean, policy_holder_name:string,
 policy_num:string, vehicle:struct<brand:struct<model:string, year:int>,
 price:double, vin:string>>)
 ROW FORMAT SERDE 'org.apache.hcatalog.data.JsonSerDe';
 CREATE TABLE json_part
 (autopolicy struct<is_active:boolean, policy_holder_name:string,
 policy_num:string, vehicle:struct<brand:struct<model:string, year:int>,
 price:double>>)
 ROW FORMAT SERDE 'org.apache.hcatalog.data.JsonSerDe';
 {code}
 The data for the table is below: 
 {code}
 {"autopolicy": {"policy_holder_name": "someone", "policy_num": "20141012",
 "is_active": true, "vehicle": {"brand": {"model": "Lexus", "year": 2012},
 "vin": "RANDOM123", "price": 23450.50}}}
 {code}
 I put that data into a file and load it into the tables like this: 
 {code}
 load data local inpath 'data.json' into table json_full;
 load data local inpath 'data.json' into table json_part;
 {code}
 Then do a select against each table: 
 {code}
 select * from json_full;
 select * from json_part;
 {code}
 The second select should fail with an error similar to that below: 
 {code}
 15/03/12 23:19:30 [main]: ERROR CliDriver: Failed with exception 
 java.io.IOException:java.lang.NullPointerException
 {code}
 The code that throws this error is below: 
 {code}
 private void populateRecord(List<Object> r, JsonToken token, JsonParser p,
     HCatSchema s) throws IOException {
   if (token != JsonToken.FIELD_NAME) {
     throw new IOException("Field name expected");
   }
   String fieldName = p.getText();
   int fpos;
   try {
     fpos = s.getPosition(fieldName);
   } catch (NullPointerException npe) {
     fpos = getPositionFromHiveInternalColumnName(fieldName);
     LOG.debug("NPE finding position for field [{}] in schema [{}]",
         fieldName, s);
     if (!fieldName.equalsIgnoreCase(getHiveInternalColumnName(fpos))) {
       LOG.error("Hive internal column name {} and position "
           + "encoding {} for the column name are at odds", fieldName, fpos);
       throw npe;
     }
     if (fpos == -1) {
       return; // unknown field, we return.
     }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9523) For partitioned tables same optimizations should be available as for bucketed tables and vice versa: ①[Sort Merge] PARTITION Map join and ②BUCKET pruning

2015-03-18 Thread Maciek Kocon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciek Kocon updated HIVE-9523:
---
  Description: 
Logically and functionally, bucketing and partitioning are quite similar: both
provide a mechanism to segregate and separate the table's data based on its
content. Thanks to that, significant further optimisations like [partition]
PRUNING or [bucket] MAP JOIN are possible.
The difference seems to be imposed by design, where PARTITIONing is
open/explicit while BUCKETing is discrete/implicit.
Partitioning seems to be very common, if not a standard feature, in all
current RDBMS, while BUCKETING seems to be Hive-specific only.
In a way, BUCKETING could also be called hashing, or simply IMPLICIT
PARTITIONING.

Regardless of the fact that these two are recognised as two separate features 
available in Hive there should be nothing to prevent leveraging same existing 
query/join optimisations across the two.


①[Sort Merge] PARTITION Map join
Enable Bucket Map Join or better, the Sort Merge Bucket Map Join equivalent 
optimisations when PARTITIONING is used exclusively or in combination with 
BUCKETING.

For JOIN conditions where partitioning criteria are used respectively:
⋮ 
FROM TabA JOIN TabB
   ON TabA.partCol1 = TabB.partCol2
   AND TabA.partCol2 = TabB.partCol2

the optimizer could/should choose to treat it the same way as with bucketed 
tables: ⋮ 
FROM TabC
  JOIN TabD
 ON TabC.clusteredByCol1 = TabD.clusteredByCol2
   AND TabC.clusteredByCol2 = TabD.clusteredByCol2

and use either Bucket Map Join or better, the Sort Merge Bucket Map Join.

This is based on the fact that, in the same way buckets translate to separate
files, partitions essentially provide the same mapping.
When data locality is known, the optimizer could focus only on joining
corresponding partitions rather than whole data sets.

②BUCKET pruning
Enable partition PRUNING equivalent optimisation for queries on BUCKETED tables

Simplest example is for queries like:
SELECT … FROM x WHERE colA=123123
to read only the relevant bucket file rather than all file-buckets that belong 
to a table.
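The bucket pruning idea above can be sketched as follows. Hive's bucket assignment convention is (hash & Integer.MAX_VALUE) % numBuckets; the pruner method names and file-naming assumption below are illustrative, not Hive code:

```java
// Sketch of bucket pruning: for an equality predicate on the clustering
// column, compute the bucket id from the key's hash and read only the
// corresponding bucket file instead of every file-bucket of the table.
public class BucketPruneSketch {
    public static int bucketFor(int keyHash, int numBuckets) {
        return (keyHash & Integer.MAX_VALUE) % numBuckets;
    }

    // Bucket files are conventionally named 000000_0, 000001_0, ...
    public static String fileToRead(int keyHash, int numBuckets) {
        return String.format("%06d_0", bucketFor(keyHash, numBuckets));
    }
}
```

For the example query above (colA=123123 on a 32-bucket table, taking the int's own value as its hash), only bucket 19's file would need to be scanned.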

  was:
For JOIN conditions where partitioning criteria are used respectively:
⋮ 
FROM TabA JOIN TabB
   ON TabA.partCol1 = TabB.partCol2
   AND TabA.partCol2 = TabB.partCol2

the optimizer could/should choose to treat it the same way as with bucketed 
tables: ⋮ 
FROM TabC
  JOIN TabD
 ON TabC.clusteredByCol1 = TabD.clusteredByCol2
   AND TabC.clusteredByCol2 = TabD.clusteredByCol2

and use either Bucket Map Join or better, the Sort Merge Bucket Map Join.

This is based on fact that same way as buckets translate to separate files, the 
partitions essentially provide the same mapping.
When data locality is known the optimizer could focus only on joining 
corresponding partitions rather than whole data sets.

#side notes:
⦿ Currently Table DDL Syntax where Partitioning and Bucketing defined at the 
same time is allowed:
CREATE TABLE
 ⋮
PARTITIONED BY(…) CLUSTERED BY(…) INTO … BUCKETS;

But in this case optimizer never chooses to use Bucket Map Join or Sort Merge 
Bucket Map Join which defeats the purpose of creating BUCKETed tables in such 
scenarios. Should that be raised as a separate BUG?

⦿ Currently partitioning and bucketing are two separate things but serve same 
purpose - shouldn't the concept be merged (explicit/implicit partitions?)

Affects Version/s: 1.1.0
   1.0.0
  Summary: For partitioned tables same optimizations should be 
available as for bucketed tables and vice versa: ①[Sort Merge] PARTITION Map 
join and ②BUCKET pruning  (was: when columns on which tables are partitioned 
are used in the join condition same join optimizations as for bucketed tables 
should be applied)

 For partitioned tables same optimizations should be available as for bucketed 
 tables and vice versa: ①[Sort Merge] PARTITION Map join and ②BUCKET pruning
 -

 Key: HIVE-9523
 URL: https://issues.apache.org/jira/browse/HIVE-9523
 Project: Hive
  Issue Type: Improvement
  Components: Logical Optimizer, Physical Optimizer, SQL
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.1.0
Reporter: Maciek Kocon
  Labels: gsoc2015

 Logically and functionally, bucketing and partitioning are quite similar: 
 both provide a mechanism to segregate and separate the table's data based on 
 its content. Thanks to that, significant further optimisations like 
 [partition] PRUNING or [bucket] MAP JOIN are possible.
 The difference seems to be imposed by design: PARTITIONing is 
 open/explicit while BUCKETing is discrete/implicit.
 Partitioning seems to be very common if 

[jira] [Assigned] (HIVE-9828) Semantic analyzer does not capture view parent entity for tables referred in view with union all

2015-03-18 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar reassigned HIVE-9828:
-

Assignee: Prasad Mujumdar

 Semantic analyzer does not capture view parent entity for tables referred in 
 view with union all 
 -

 Key: HIVE-9828
 URL: https://issues.apache.org/jira/browse/HIVE-9828
 Project: Hive
  Issue Type: Bug
  Components: Parser
Affects Versions: 1.1.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 1.2.0

 Attachments: HIVE-9828.1-npf.patch, HIVE-9828.1-npf.patch


 Hive compiler adds tables used in a view definition in the input entity list, 
 with the view as parent entity for the table.
 In the case of a view with a union all query, this is not being done 
 properly. For example,
 {noformat}
 create view view1 as select t.id from (select tab1.id from db.tab1 union all 
 select tab2.id from db.tab2 ) t;
 {noformat}
 This query will capture tab1 and tab2 as read entity without view1 as parent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10008) Need to refactor itests for hbase metastore [hbase-metastore branch]

2015-03-18 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-10008:
--
Attachment: HIVE-10008.2.patch

Previous patch accidentally omitted one commit. 

 Need to refactor itests for hbase metastore [hbase-metastore branch]
 

 Key: HIVE-10008
 URL: https://issues.apache.org/jira/browse/HIVE-10008
 Project: Hive
  Issue Type: Task
  Components: Tests
Affects Versions: hbase-metastore-branch
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-10008.2.patch, HIVE-10008.patch


 Much of the infrastructure for the itest/hive-unit/.../metastore/hbase tests 
 is repeated in each test.  This needs to be factored out into a base class.
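A minimal sketch of the refactor the task describes (class and method names are illustrative, not the actual itest classes): the shared setup moves into an abstract base class so each metastore test only contributes its own cases.

```java
// Hypothetical sketch: shared mini-cluster / store initialization moves into
// an abstract base class so individual HBase-metastore tests stay small.
abstract class HBaseMetastoreTestBase {
    protected String store;

    protected void setUp() {
        store = "hbase-metastore"; // stands in for shared cluster/store init
    }
}

class ExampleTableTest extends HBaseMetastoreTestBase {
    boolean testStoreInitialized() {
        setUp();                 // reuse the shared infrastructure
        return store != null;    // a real test would exercise the store
    }

    public static void main(String[] args) {
        System.out.println(new ExampleTableTest().testStoreInitialized());
    }
}
```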





[jira] [Updated] (HIVE-10009) LazyObjectInspectorFactory is not thread safe [Spark Branch]

2015-03-18 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-10009:
---
Affects Version/s: spark-branch

 LazyObjectInspectorFactory is not thread safe [Spark Branch]
 

 Key: HIVE-10009
 URL: https://issues.apache.org/jira/browse/HIVE-10009
 Project: Hive
  Issue Type: Bug
Affects Versions: spark-branch
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: spark-branch


 LazyObjectInspectorFactory is not thread safe, which causes random failures 
 in multi-threaded environments such as Hive on Spark. We got exceptions like 
 the one below:
 {noformat}
 java.lang.RuntimeException: Map operator initialization failed: 
 java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector
  cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:55)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
   at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170)
   at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170)
   at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
   at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
   at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
   at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
   at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
   at org.apache.spark.scheduler.Task.run(Task.scala:64)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector
  cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:154)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:199)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:355)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:92)
   ... 16 more
 {noformat}
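A common way to remove this kind of race is to make the factory's cache atomic. The sketch below is illustrative only (class and method names are hypothetical, not Hive's actual API): `ConcurrentHashMap.computeIfAbsent` makes the lookup-or-create step atomic, so concurrent callers cannot observe a half-published entry the way they can with an unsynchronized `HashMap`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a thread-safe inspector-factory cache, keyed by a
// type-signature string; values stand in for object inspectors.
class InspectorCacheSketch {
    private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

    // computeIfAbsent performs the check-then-create atomically, unlike the
    // unsynchronized get/put pair that races on a plain HashMap.
    static Object getInspector(String typeSignature) {
        return CACHE.computeIfAbsent(typeSignature, sig -> buildInspector(sig));
    }

    private static Object buildInspector(String sig) {
        return new Object(); // placeholder for real inspector construction
    }

    public static void main(String[] args) {
        Object a = getInspector("struct<a:int>");
        Object b = getInspector("struct<a:int>");
        System.out.println(a == b); // same cached instance both times
    }
}
```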





[jira] [Assigned] (HIVE-9979) LLAP: LLAP Cached readers for StringDirectTreeReaders over-read data

2015-03-18 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-9979:
--

Assignee: Sergey Shelukhin  (was: Prasanth Jayachandran)

 LLAP: LLAP Cached readers for StringDirectTreeReaders over-read data
 

 Key: HIVE-9979
 URL: https://issues.apache.org/jira/browse/HIVE-9979
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Gopal V
Assignee: Sergey Shelukhin

 When the cache is enabled, queries throw various over-read exceptions.
 It looks like the batchSize changes as you read data: the end-of-stripe 
 batchSize is smaller than the default size (the super calls change it).
 {code}
 Caused by: java.io.EOFException: Can't finish byte read from uncompressed 
 stream DATA position: 262144 length: 262144 range: 0 offset: 46399488 limit: 
 46399488
 at 
 org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.commonReadByteArrays(RecordReaderImpl.java:1556)
 at 
 org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.readOrcByteArrays(RecordReaderImpl.java:1569)
 at 
 org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDirectTreeReader.nextVector(RecordReaderImpl.java:1691)
 at 
 org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.nextVector(RecordReaderImpl.java:1517)
 at 
 org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:115)
 at 
 org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:108)
 at 
 org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:35)
 at 
 org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:314)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:280)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:44)
 at 
 org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
 ... 4 more
 {code}





[jira] [Commented] (HIVE-9919) upgrade scripts don't work on some auto-created DBs due to absence of tables

2015-03-18 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367861#comment-14367861
 ] 

Sergey Shelukhin commented on HIVE-9919:


[~thejas] can you take a look at the new patch? thanks

 upgrade scripts don't work on some auto-created DBs due to absence of tables
 

 Key: HIVE-9919
 URL: https://issues.apache.org/jira/browse/HIVE-9919
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-9919.01.patch, HIVE-9919.patch


 DataNucleus in its infinite wisdom doesn't create all tables.





[jira] [Commented] (HIVE-9994) Hive query plan returns sensitive data to external applications

2015-03-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367882#comment-14367882
 ] 

Hive QA commented on HIVE-9994:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12705411/HIVE-9994.1.patch

{color:green}SUCCESS:{color} +1 7771 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3071/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3071/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3071/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12705411 - PreCommit-HIVE-TRUNK-Build

 Hive query plan returns sensitive data to external applications
 ---

 Key: HIVE-9994
 URL: https://issues.apache.org/jira/browse/HIVE-9994
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-9994.1.patch, HIVE-9994.2.patch, HIVE-9994.3.patch


 Some applications use the getQueryString() method from the QueryPlan class 
 to get the query that is being executed by Hive. The query string returned is 
 not redacted, so sensitive information ends up being logged in 
 Navigator.
 We need to return redacted data from the QueryPlan to prevent other 
 applications from logging sensitive data.
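A minimal sketch of what such redaction could look like, assuming a regex-based redactor (the pattern, class, and method names here are hypothetical, not Hive's actual implementation); note the null check on the public entry point:

```java
import java.util.regex.Pattern;

// Hypothetical sketch: strip sensitive literals from a query string before
// exposing it to external applications. The pattern is illustrative only.
class QueryRedactorSketch {
    private static final Pattern CARD_NUMBER =
            Pattern.compile("\\b\\d{4}-\\d{4}-\\d{4}-\\d{4}\\b");

    // Null-safe, since a public method may receive arbitrary input.
    static String redactLogString(String query) {
        if (query == null) {
            return null;
        }
        return CARD_NUMBER.matcher(query).replaceAll("####-####-####-####");
    }

    public static void main(String[] args) {
        System.out.println(
            redactLogString("SELECT * FROM t WHERE cc = '1234-5678-9012-3456'"));
    }
}
```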





[jira] [Commented] (HIVE-9994) Hive query plan returns sensitive data to external applications

2015-03-18 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367683#comment-14367683
 ] 

Xuefu Zhang commented on HIVE-9994:
---

Patch looks good. One question: do we need to check null for the input in 
redactLogString() as it's a public method?

 Hive query plan returns sensitive data to external applications
 ---

 Key: HIVE-9994
 URL: https://issues.apache.org/jira/browse/HIVE-9994
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-9994.1.patch


 Some applications use the getQueryString() method from the QueryPlan class 
 to get the query that is being executed by Hive. The query string returned is 
 not redacted, so sensitive information ends up being logged in 
 Navigator.
 We need to return redacted data from the QueryPlan to prevent other 
 applications from logging sensitive data.





[jira] [Updated] (HIVE-9480) Build UDF TRUNC to implement FIRST_DAY as compared with LAST_DAY

2015-03-18 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-9480:
-
Labels:   (was: TODOC1.2)

 Build UDF TRUNC to implement FIRST_DAY as compared with LAST_DAY
 

 Key: HIVE-9480
 URL: https://issues.apache.org/jira/browse/HIVE-9480
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou
 Fix For: 1.2.0

 Attachments: HIVE-9480.1.patch, HIVE-9480.3.patch, HIVE-9480.4.patch, 
 HIVE-9480.5.patch, HIVE-9480.6.patch, HIVE-9480.7.patch, HIVE-9480.8.patch, 
 HIVE-9480.9.patch


 Hive already supports the LAST_DAY UDF; in some cases FIRST_DAY is necessary 
 for date/timestamp-related computation. This JIRA tracks such an 
 implementation. We chose to implement TRUNC, a more standard way to get the 
 first day of a month, e.g., SELECT TRUNC('2009-12-12', 'MM'); will return 
 2009-12-01, and SELECT TRUNC('2009-12-12', 'YEAR'); will return 2009-01-01.
 Note that this TRUNC is not as feature-complete as Oracle's: only 'MM' and 
 'YEAR' are supported as formats, but it is a base for adding other formats.
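The MM/YEAR truncation semantics described above can be sketched outside Hive with java.time (this is an illustration of the behaviour, not the UDF's actual code):

```java
import java.time.LocalDate;

// Illustration of TRUNC(date, 'MM'|'YEAR') semantics using java.time.
class TruncDemo {
    static LocalDate trunc(LocalDate d, String fmt) {
        switch (fmt) {
            case "MM":   return d.withDayOfMonth(1); // first day of the month
            case "YEAR": return d.withDayOfYear(1);  // first day of the year
            default: throw new IllegalArgumentException("unsupported format: " + fmt);
        }
    }

    public static void main(String[] args) {
        System.out.println(trunc(LocalDate.parse("2009-12-12"), "MM"));   // 2009-12-01
        System.out.println(trunc(LocalDate.parse("2009-12-12"), "YEAR")); // 2009-01-01
    }
}
```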





[jira] [Updated] (HIVE-10002) fix yarn service registry not found in ut problem

2015-03-18 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-10002:
--
Attachment: HIVE-10002.1.patch

 fix yarn service registry not found in ut problem
 -

 Key: HIVE-10002
 URL: https://issues.apache.org/jira/browse/HIVE-10002
 Project: Hive
  Issue Type: Sub-task
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-10002.1.patch








[jira] [Commented] (HIVE-9956) use BigDecimal.valueOf instead of new in TestFileDump

2015-03-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366756#comment-14366756
 ] 

Hive QA commented on HIVE-9956:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12705278/HIVE-9956.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7770 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3068/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3068/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3068/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12705278 - PreCommit-HIVE-TRUNK-Build

 use BigDecimal.valueOf instead of new in TestFileDump
 -

 Key: HIVE-9956
 URL: https://issues.apache.org/jira/browse/HIVE-9956
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor
 Attachments: HIVE-9956.1.patch, HIVE-9956.1.patch


 TestFileDump builds a data row where one of the columns is a BigDecimal.
 The test adds the value 2.
 There are 2 ways to create a BigDecimal object:
 1. use new
 2. use valueOf
 In this particular case:
 1. new will create 2.222153
 2. valueOf will use the canonical String representation and the result will 
 be 2.
 We should probably use valueOf to create the BigDecimal object; 
 TestTimestampWritable and TestHCatStores already use valueOf.
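The difference is easy to see in isolation: new BigDecimal(double) takes the exact binary value of the double, producing a long decimal expansion, while BigDecimal.valueOf(double) goes through Double.toString and yields the canonical short form. A minimal demonstration:

```java
import java.math.BigDecimal;

// new BigDecimal(double) preserves the exact binary double value;
// BigDecimal.valueOf(double) uses Double.toString's canonical form.
class BigDecimalDemo {
    public static void main(String[] args) {
        System.out.println(new BigDecimal(2.2));      // long expansion, not exactly 2.2
        System.out.println(BigDecimal.valueOf(2.2));  // prints 2.2
    }
}
```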





[jira] [Updated] (HIVE-10003) MiniTez ut fail with missing configs

2015-03-18 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10003:
---
Attachment: HIVE-10003.1.patch

 MiniTez ut fail with missing configs
 

 Key: HIVE-10003
 URL: https://issues.apache.org/jira/browse/HIVE-10003
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gopal V
 Attachments: HIVE-10003.1.patch








[jira] [Updated] (HIVE-10003) MiniTez ut fail with missing configs

2015-03-18 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-10003:
--
Issue Type: Sub-task  (was: Bug)
Parent: HIVE-7926

 MiniTez ut fail with missing configs
 

 Key: HIVE-10003
 URL: https://issues.apache.org/jira/browse/HIVE-10003
 Project: Hive
  Issue Type: Sub-task
Reporter: Gunther Hagleitner
Assignee: Gopal V
 Attachments: HIVE-10003.1.patch








[jira] [Updated] (HIVE-10004) yarn service registry should be shim'd

2015-03-18 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-10004:
--
Issue Type: Sub-task  (was: Bug)
Parent: HIVE-7926

 yarn service registry should be shim'd
 --

 Key: HIVE-10004
 URL: https://issues.apache.org/jira/browse/HIVE-10004
 Project: Hive
  Issue Type: Sub-task
Reporter: Gunther Hagleitner
Assignee: Gopal V







[jira] [Updated] (HIVE-10004) yarn service registry should be an optional dependency

2015-03-18 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10004:
---
Summary: yarn service registry should be an optional dependency  (was: yarn 
service registry should be shim'd)

 yarn service registry should be an optional dependency
 --

 Key: HIVE-10004
 URL: https://issues.apache.org/jira/browse/HIVE-10004
 Project: Hive
  Issue Type: Sub-task
Reporter: Gunther Hagleitner
Assignee: Gopal V







[jira] [Updated] (HIVE-9277) Hybrid Hybrid Grace Hash Join

2015-03-18 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-9277:

Attachment: HIVE-9277.13.patch

WIP, uploading the 13th patch for testing.

 Hybrid Hybrid Grace Hash Join
 -

 Key: HIVE-9277
 URL: https://issues.apache.org/jira/browse/HIVE-9277
 Project: Hive
  Issue Type: New Feature
  Components: Physical Optimizer
Reporter: Wei Zheng
Assignee: Wei Zheng
  Labels: join
 Attachments: HIVE-9277.01.patch, HIVE-9277.02.patch, 
 HIVE-9277.03.patch, HIVE-9277.04.patch, HIVE-9277.05.patch, 
 HIVE-9277.06.patch, HIVE-9277.07.patch, HIVE-9277.08.patch, 
 HIVE-9277.13.patch, High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf


 We are proposing an enhanced hash join algorithm called _“hybrid hybrid grace 
 hash join”_.
 We can benefit from this feature as illustrated below:
 * The query will not fail even if the estimated memory requirement is 
 slightly wrong
 * Expensive garbage collection overhead can be avoided when the hash table 
 grows
 * Join execution can use a Map join operator even though the small table 
 doesn't fit in memory, since spilling some data from the build and probe 
 sides is still cheaper than having to shuffle the large fact table
 The design is based on Hadoop's parallel processing capability and the 
 significant amount of memory available.
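The core idea (partition both sides, join the partitions that fit in memory immediately, and defer the rest to a second pass instead of failing) can be sketched as follows; this is a toy model with integer keys, not Hive's implementation:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of a hybrid grace hash join: partition both inputs, hash-join the
// build partitions that fit in memory right away, and defer "spilled"
// partitions to a second pass rather than failing on a bad size estimate.
class HybridJoinSketch {
    static List<int[]> join(int[] build, int[] probe, int parts, int memBudget) {
        List<List<Integer>> buildParts = partition(build, parts);
        List<List<Integer>> probeParts = partition(probe, parts);
        List<int[]> out = new ArrayList<>();
        List<Integer> spilled = new ArrayList<>();
        for (int p = 0; p < parts; p++) {
            if (buildParts.get(p).size() <= memBudget) {
                hashJoin(buildParts.get(p), probeParts.get(p), out); // fits: join now
            } else {
                spilled.add(p); // would be written to disk in the real algorithm
            }
        }
        for (int p : spilled) { // second pass over deferred partitions
            hashJoin(buildParts.get(p), probeParts.get(p), out);
        }
        return out;
    }

    static List<List<Integer>> partition(int[] rows, int parts) {
        List<List<Integer>> ps = new ArrayList<>();
        for (int i = 0; i < parts; i++) ps.add(new ArrayList<>());
        for (int r : rows) ps.get(Math.floorMod(r, parts)).add(r);
        return ps;
    }

    static void hashJoin(List<Integer> b, List<Integer> pr, List<int[]> out) {
        Set<Integer> table = new HashSet<>(b);                   // build side
        for (int r : pr) if (table.contains(r)) out.add(new int[]{r, r});
    }

    public static void main(String[] args) {
        // Keys 2 and 3 match; memBudget 0 forces every partition to "spill",
        // yet both matches still come out of the second pass.
        System.out.println(join(new int[]{1, 2, 3}, new int[]{2, 3, 4}, 4, 0).size());
    }
}
```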





[jira] [Updated] (HIVE-9980) LLAP: Pass additional JVM args via Slider appConfig

2015-03-18 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-9980:
--
Attachment: HIVE-9980.3.patch

 LLAP: Pass additional JVM args via Slider appConfig
 ---

 Key: HIVE-9980
 URL: https://issues.apache.org/jira/browse/HIVE-9980
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Attachments: HIVE-9980.1.patch, HIVE-9980.2.patch, HIVE-9980.3.patch


 For profiling, enabling JMX remote ports, and attaching debuggers.





[jira] [Updated] (HIVE-10003) MiniTez ut fail with missing configs

2015-03-18 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10003:
---
Fix Version/s: llap

 MiniTez ut fail with missing configs
 

 Key: HIVE-10003
 URL: https://issues.apache.org/jira/browse/HIVE-10003
 Project: Hive
  Issue Type: Sub-task
Reporter: Gunther Hagleitner
Assignee: Siddharth Seth
 Fix For: llap

 Attachments: HIVE-10003.1.patch, HIVE-10003.2.patch








[jira] [Commented] (HIVE-9277) Hybrid Hybrid Grace Hash Join

2015-03-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368273#comment-14368273
 ] 

Hive QA commented on HIVE-9277:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12705464/HIVE-9277.13.patch

{color:green}SUCCESS:{color} +1 7772 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3074/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3074/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3074/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12705464 - PreCommit-HIVE-TRUNK-Build

 Hybrid Hybrid Grace Hash Join
 -

 Key: HIVE-9277
 URL: https://issues.apache.org/jira/browse/HIVE-9277
 Project: Hive
  Issue Type: New Feature
  Components: Physical Optimizer
Reporter: Wei Zheng
Assignee: Wei Zheng
  Labels: join
 Attachments: HIVE-9277.01.patch, HIVE-9277.02.patch, 
 HIVE-9277.03.patch, HIVE-9277.04.patch, HIVE-9277.05.patch, 
 HIVE-9277.06.patch, HIVE-9277.07.patch, HIVE-9277.08.patch, 
 HIVE-9277.13.patch, High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf


 We are proposing an enhanced hash join algorithm called _“hybrid hybrid grace 
 hash join”_.
 We can benefit from this feature as illustrated below:
 * The query will not fail even if the estimated memory requirement is 
 slightly wrong
 * Expensive garbage collection overhead can be avoided when the hash table 
 grows
 * Join execution can use a Map join operator even though the small table 
 doesn't fit in memory, since spilling some data from the build and probe 
 sides is still cheaper than having to shuffle the large fact table
 The design is based on Hadoop's parallel processing capability and the 
 significant amount of memory available.





[jira] [Commented] (HIVE-9980) LLAP: Pass additional JVM args via Slider appConfig

2015-03-18 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368069#comment-14368069
 ] 

Gopal V commented on HIVE-9980:
---

Fixed --help as well.

{code}
usage: llap
 -a,--args args java arguments to the llap instance
 -d,--directory directory   Temp directory for jars etc.
 -H,--helpPrint help information
 -i,--instances instances   Specify the number of instances to run this on
 -n,--name name Cluster name for YARN registry
{code}

The only crucial detail is that, to prevent the shell script from parsing the 
args itself, they need to be escaped in quotes with an extra leading space.

{code}
./dist/hive/bin/hive --service llap --instances 14 --name llap1 --args  
-agentpath:/opt/perf/libperfmap.so
{code}

 LLAP: Pass additional JVM args via Slider appConfig
 ---

 Key: HIVE-9980
 URL: https://issues.apache.org/jira/browse/HIVE-9980
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Attachments: HIVE-9980.1.patch, HIVE-9980.2.patch


 For profiling, enabling JMX remote ports, and attaching debuggers.





[jira] [Updated] (HIVE-9675) Support START TRANSACTION/COMMIT/ROLLBACK commands

2015-03-18 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-9675:
-
Attachment: HIVE-9675.6.patch

partial implementation; checkpoint to run full test suite

 Support START TRANSACTION/COMMIT/ROLLBACK commands
 --

 Key: HIVE-9675
 URL: https://issues.apache.org/jira/browse/HIVE-9675
 Project: Hive
  Issue Type: Bug
  Components: SQL, Transactions
Affects Versions: 0.14.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-9675.6.patch


 Hive 0.14 added support for insert/update/delete statements with ACID 
 semantics.  Hive 0.14 only supports auto-commit mode.  We need to add support 
 for START TRANSACTION/COMMIT/ROLLBACK commands so that the user can demarcate 
 transaction boundaries.





[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others

2015-03-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368156#comment-14368156
 ] 

Hive QA commented on HIVE-7018:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12705423/HIVE-7018.2.patch

{color:green}SUCCESS:{color} +1 7771 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3073/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3073/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3073/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12705423 - PreCommit-HIVE-TRUNK-Build

 Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but 
 not others
 -

 Key: HIVE-7018
 URL: https://issues.apache.org/jira/browse/HIVE-7018
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Yongzhi Chen
 Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch


 It appears that at least postgres and oracle do not have the LINK_TARGET_ID 
 column while mysql does.





[jira] [Assigned] (HIVE-9990) TestMultiSessionsHS2WithLocalClusterSpark is failing

2015-03-18 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu reassigned HIVE-9990:
--

Assignee: Ferdinand Xu

 TestMultiSessionsHS2WithLocalClusterSpark is failing
 

 Key: HIVE-9990
 URL: https://issues.apache.org/jira/browse/HIVE-9990
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 1.2.0
Reporter: Xuefu Zhang
Assignee: Ferdinand Xu

 At least sometimes. I can reproduce it with mvn test 
 -Dtest=TestMultiSessionsHS2WithLocalClusterSpark -Phadoop-2 consistently on 
 my local box (both trunk and spark branch).
 {code}
 ---
  T E S T S
 ---
 Running org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark
 Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 92.438 sec 
  FAILURE! - in 
 org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark
 testSparkQuery(org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark)
   Time elapsed: 21.514 sec   ERROR!
 java.util.concurrent.ExecutionException: java.sql.SQLException: Error while 
 processing statement: FAILED: Execution Error, return code 3 from 
 org.apache.hadoop.hive.ql.exec.spark.SparkTask
   at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296)
   at 
 org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:392)
   at 
 org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.verifyResult(TestMultiSessionsHS2WithLocalClusterSpark.java:244)
   at 
 org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testKvQuery(TestMultiSessionsHS2WithLocalClusterSpark.java:220)
   at 
 org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.access$000(TestMultiSessionsHS2WithLocalClusterSpark.java:53)
 {code}
 The error was also seen in HIVE-9934 test run.





[jira] [Updated] (HIVE-10013) NPE in LLAP logs in heartbeat

2015-03-18 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10013:

Issue Type: Sub-task  (was: Bug)
Parent: HIVE-7926

 NPE in LLAP logs in heartbeat
 -

 Key: HIVE-10013
 URL: https://issues.apache.org/jira/browse/HIVE-10013
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Siddharth Seth

 {noformat}
 2015-03-18 17:28:37,559 
 [TezTaskRunner_attempt_1424502260528_1294_1_00_25_0(container_1_1294_01_26_sershe_20150318172752_5ce4647e-177c-4b1e-8dfa-462230735854:1_Map
  1_25_0)] INFO org.apache.tez.runtime.task.TezTaskRunner: Encounted an error 
 while executing task: attempt_1424502260528_1294_1_00_25_0
 java.lang.NullPointerException
   at 
 org.apache.tez.runtime.task.TaskReporter$HeartbeatCallable.access$400(TaskReporter.java:120)
   at 
 org.apache.tez.runtime.task.TaskReporter.addEvents(TaskReporter.java:386)
   at 
 org.apache.tez.runtime.task.TezTaskRunner.addEvents(TezTaskRunner.java:278)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.sendTaskGeneratedEvents(LogicalIOProcessorRuntimeTask.java:596)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.close(LogicalIOProcessorRuntimeTask.java:355)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   at java.lang.Thread.run(Thread.java:745)
 2015-03-18 17:28:37,559 
 [TezTaskRunner_attempt_1424502260528_1294_1_00_25_0(container_1_1294_01_26_sershe_20150318172752_5ce4647e-177c-4b1e-8dfa-462230735854:1_Map
  1_25_0)] INFO org.apache.tez.runtime.task.TezTaskRunner: Ignoring the 
 following exception since a previous exception is already registered
 java.lang.NullPointerException
   at 
 org.apache.tez.runtime.task.TaskReporter$HeartbeatCallable.access$300(TaskReporter.java:120)
   at 
 org.apache.tez.runtime.task.TaskReporter.taskFailed(TaskReporter.java:382)
   at 
 org.apache.tez.runtime.task.TezTaskRunner.sendFailure(TezTaskRunner.java:260)
   at 
 org.apache.tez.runtime.task.TezTaskRunner.access$600(TezTaskRunner.java:52)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:227)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   at java.lang.Thread.run(Thread.java:745)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10006) RSC has memory leak while execute multi queries.[Spark Branch]

2015-03-18 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-10006:
-
Attachment: HIVE-10006.1-spark.patch

 RSC has memory leak while execute multi queries.[Spark Branch]
 --

 Key: HIVE-10006
 URL: https://issues.apache.org/jira/browse/HIVE-10006
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: 1.1.0
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Priority: Critical
  Labels: Spark-M5
 Attachments: HIVE-10006.1-spark.patch


 While executing queries with RSC, the MapWork/ReduceWork count grows 
 continuously and eventually leads to an OOM.





[jira] [Updated] (HIVE-10003) MiniTez ut fail with missing configs

2015-03-18 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10003:
---
Assignee: Gunther Hagleitner  (was: Siddharth Seth)

 MiniTez ut fail with missing configs
 

 Key: HIVE-10003
 URL: https://issues.apache.org/jira/browse/HIVE-10003
 Project: Hive
  Issue Type: Sub-task
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: llap

 Attachments: HIVE-10003.1.patch, HIVE-10003.2.patch








[jira] [Updated] (HIVE-10011) LLAP: NegativeArraySize exception on vector string reader

2015-03-18 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10011:
---
Summary: LLAP: NegativeArraySize exception on vector string reader  (was: 
LLAP: NegativeArraySize exception on some vector string reader)

 LLAP: NegativeArraySize exception on vector string reader
 -

 Key: HIVE-10011
 URL: https://issues.apache.org/jira/browse/HIVE-10011
 Project: Hive
  Issue Type: Sub-task
Reporter: Gopal V

 With some logging, I confirmed that the String length vectors contained junk 
 data and that the length field is overflowing.
 {code}
 Caused by: java.lang.NegativeArraySizeException
 at 
 org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.commonReadByteArrays(RecordReaderImpl.java:1550)
 at 
 org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.readOrcByteArrays(RecordReaderImpl.java:1569)
 at 
 org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDirectTreeReader.nextVector(RecordReaderImpl.java:1691)
 at 
 org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.nextVector(RecordReaderImpl.java:1517)
 at 
 org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:115)
 at 
 org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:108)
 at 
 org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:35)
 at 
 org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:314)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:272)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:44)
 at 
 org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
 ... 4 more
 {code}





[jira] [Updated] (HIVE-9984) JoinReorder's getOutputSize is exponential

2015-03-18 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-9984:
-
Fix Version/s: 1.2.0

 JoinReorder's getOutputSize is exponential
 --

 Key: HIVE-9984
 URL: https://issues.apache.org/jira/browse/HIVE-9984
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gopal V
 Fix For: 1.2.0

 Attachments: HIVE-9984.1.patch, HIVE-9984.2.patch


 Found by [~mmokhtar]. Causes major issues in large plans (50+ joins). Simple 
 fix would be to memoize the recursion. There should also be a flag to switch 
 this opt off.
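The memoization fix suggested above can be sketched as follows. This is a minimal illustration, not the actual JoinReorder code; the {{Node}} shape and the cost formula are hypothetical stand-ins for the real getOutputSize recursion:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: memoizing a recursive cost function over a join tree.
// Without the cache, a plan with shared subtrees revisits them
// exponentially often; with it, each node is computed once.
class JoinCostMemo {
    static class Node {
        final long size;          // base output size for a leaf (illustrative)
        final Node left, right;   // children; null for leaves
        Node(long size) { this(size, null, null); }
        Node(long size, Node l, Node r) { this.size = size; left = l; right = r; }
    }

    private final Map<Node, Long> memo = new HashMap<>();

    long outputSize(Node n) {
        Long cached = memo.get(n);
        if (cached != null) {
            return cached;        // reuse the previously computed subtree cost
        }
        long result = (n.left == null)
            ? n.size
            : outputSize(n.left) + outputSize(n.right);
        memo.put(n, result);
        return result;
    }
}
```

A flag to switch the optimization off would simply bypass the cache lookup.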





[jira] [Resolved] (HIVE-10003) MiniTez ut fail with missing configs

2015-03-18 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner resolved HIVE-10003.
---
Resolution: Fixed

Turned off llap mode in the Tez unit tests for now. Also committed Gopal's 
changes to how we set up the Tez env.

 MiniTez ut fail with missing configs
 

 Key: HIVE-10003
 URL: https://issues.apache.org/jira/browse/HIVE-10003
 Project: Hive
  Issue Type: Sub-task
Reporter: Gunther Hagleitner
Assignee: Siddharth Seth
 Attachments: HIVE-10003.1.patch, HIVE-10003.2.patch








[jira] [Updated] (HIVE-10005) remove some unnecessary branches from the inner loop

2015-03-18 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-10005:
--
Attachment: HIVE-10005.2.patch

 remove some unnecessary branches from the inner loop
 

 Key: HIVE-10005
 URL: https://issues.apache.org/jira/browse/HIVE-10005
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-10005.1.patch, HIVE-10005.2.patch


 Operator.forward is doing too much. There's no reason to do the done 
 check per row and update it inline. It's much more efficient to do that 
 when the event that completes an operator happens.
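The hot-loop change described above can be illustrated with a minimal sketch. The names below ({{RowSink}}, {{Forwarder}}) are illustrative, not Hive's actual Operator API:

```java
// Hedged sketch: hoisting a per-row "done" check out of the inner
// forward loop. In the fast variant, the check runs once per batch;
// the event that completes the operator is what flips the done state.
interface RowSink {
    void process(Object row);
    boolean isDone();
}

class Forwarder {
    // Before: branch on child.isDone() for every single row.
    static void forwardSlow(Object[] rows, RowSink child) {
        for (Object row : rows) {
            if (!child.isDone()) {   // evaluated once per row
                child.process(row);
            }
        }
    }

    // After: check once up front; the inner loop stays branch-free.
    static void forwardFast(Object[] rows, RowSink child) {
        if (child.isDone()) {
            return;                  // handled at the completion event
        }
        for (Object row : rows) {
            child.process(row);
        }
    }
}
```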





[jira] [Commented] (HIVE-10005) remove some unnecessary branches from the inner loop

2015-03-18 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368229#comment-14368229
 ] 

Gunther Hagleitner commented on HIVE-10005:
---

[~gopalv] can you take a look?

 remove some unnecessary branches from the inner loop
 

 Key: HIVE-10005
 URL: https://issues.apache.org/jira/browse/HIVE-10005
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-10005.1.patch, HIVE-10005.2.patch


 Operator.forward is doing too much. There's no reason to do the done 
 check per row and update it inline. It's much more efficient to do that 
 when the event that completes an operator happens.





[jira] [Commented] (HIVE-9997) minor tweaks for bytes mapjoin hash table

2015-03-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368066#comment-14368066
 ] 

Ashutosh Chauhan commented on HIVE-9997:


+1

 minor tweaks for bytes mapjoin hash table
 -

 Key: HIVE-9997
 URL: https://issues.apache.org/jira/browse/HIVE-9997
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-9997.patch


 From HIVE-7617





[jira] [Updated] (HIVE-9980) LLAP: Pass additional JVM args via Slider appConfig

2015-03-18 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-9980:
--
Attachment: HIVE-9980.2.patch

 LLAP: Pass additional JVM args via Slider appConfig
 ---

 Key: HIVE-9980
 URL: https://issues.apache.org/jira/browse/HIVE-9980
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Attachments: HIVE-9980.1.patch, HIVE-9980.2.patch


 For profiling, JMX remote ports and to attach debuggers to.





[jira] [Updated] (HIVE-9979) LLAP: LLAP Cached readers for StringDirectTreeReaders over-read data

2015-03-18 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-9979:
---
Attachment: HIVE-9979.patch

preliminary patch with some fixes

 LLAP: LLAP Cached readers for StringDirectTreeReaders over-read data
 

 Key: HIVE-9979
 URL: https://issues.apache.org/jira/browse/HIVE-9979
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Attachments: HIVE-9979.patch


 When the cache is enabled, queries throw different over-read exceptions.
 It looks like the batchSize changes as you read data; the end-of-stripe 
 batchSize is smaller than the default size (the super calls change it).
 {code}
 Caused by: java.io.EOFException: Can't finish byte read from uncompressed 
 stream DATA position: 262144 length: 262144 range: 0 offset: 46399488 limit: 
 46399488
 at 
 org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.commonReadByteArrays(RecordReaderImpl.java:1556)
 at 
 org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.readOrcByteArrays(RecordReaderImpl.java:1569)
 at 
 org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDirectTreeReader.nextVector(RecordReaderImpl.java:1691)
 at 
 org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.nextVector(RecordReaderImpl.java:1517)
 at 
 org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:115)
 at 
 org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:108)
 at 
 org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:35)
 at 
 org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:314)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:280)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:44)
 at 
 org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
 ... 4 more
 {code}





[jira] [Updated] (HIVE-10006) RSC has memory leak while execute multi queries.[Spark Branch]

2015-03-18 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-10006:
-
Attachment: HIVE-10006.2-spark.patch

 RSC has memory leak while execute multi queries.[Spark Branch]
 --

 Key: HIVE-10006
 URL: https://issues.apache.org/jira/browse/HIVE-10006
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: 1.1.0
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Priority: Critical
  Labels: Spark-M5
 Attachments: HIVE-10006.1-spark.patch, HIVE-10006.2-spark.patch


 While executing queries with RSC, the MapWork/ReduceWork count grows 
 continuously and eventually leads to an OOM.





[jira] [Commented] (HIVE-10005) remove some unnecessary branches from the inner loop

2015-03-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368467#comment-14368467
 ] 

Hive QA commented on HIVE-10005:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12705483/HIVE-10005.2.patch

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 7771 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_decode_name
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_special_char
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppr_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqual_corr_expr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_json_tuple
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_parse_url_tuple
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union35
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_elt
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_windowing
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3076/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3076/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3076/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12705483 - PreCommit-HIVE-TRUNK-Build

 remove some unnecessary branches from the inner loop
 

 Key: HIVE-10005
 URL: https://issues.apache.org/jira/browse/HIVE-10005
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-10005.1.patch, HIVE-10005.2.patch


 Operator.forward is doing too much. There's no reason to do the done 
 check per row and update it inline. It's much more efficient to do that 
 when the event that completes an operator happens.





[jira] [Commented] (HIVE-9970) Hive on spark

2015-03-18 Thread Amithsha (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368526#comment-14368526
 ] 

Amithsha commented on HIVE-9970:


The MySQL error was solved after updating the version, but Hive on Spark is 
still in an error state.
Hive version: 1.1.0
Spark version: 1.3.0

ERROR : Failed to execute spark task, with exception 
'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark 
client.)'
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
at 
org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:57)
at 
org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:116)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:113)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:95)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
java.util.concurrent.TimeoutException: Timed out waiting for client connection.
at com.google.common.base.Throwables.propagate(Throwables.java:156)
at 
org.apache.hive.spark.client.SparkClientImpl.init(SparkClientImpl.java:104)
at 
org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
at 
org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.init(RemoteHiveSparkClient.java:88)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:58)
at 
org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55)
... 6 more
Caused by: java.util.concurrent.ExecutionException: 
java.util.concurrent.TimeoutException: Timed out waiting for client connection.
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
at 
org.apache.hive.spark.client.SparkClientImpl.init(SparkClientImpl.java:94)
... 10 more
Caused by: java.util.concurrent.TimeoutException: Timed out waiting for client 
connection.
at org.apache.hive.spark.client.rpc.RpcServer$2.run(RpcServer.java:134)
at 
io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
at 
io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:123)
at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at java.lang.Thread.run(Thread.java:744)


[jira] [Updated] (HIVE-10005) remove some unnecessary branches from the inner loop

2015-03-18 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-10005:
--
Attachment: HIVE-10005.3.patch

 remove some unnecessary branches from the inner loop
 

 Key: HIVE-10005
 URL: https://issues.apache.org/jira/browse/HIVE-10005
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-10005.1.patch, HIVE-10005.2.patch, 
 HIVE-10005.3.patch


 Operator.forward is doing too much. There's no reason to do the done 
 check per row and update it inline. It's much more efficient to do that 
 when the event that completes an operator happens.





[jira] [Updated] (HIVE-9994) Hive query plan returns sensitive data to external applications

2015-03-18 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-9994:
--
Attachment: HIVE-9994.3.patch

 Hive query plan returns sensitive data to external applications
 ---

 Key: HIVE-9994
 URL: https://issues.apache.org/jira/browse/HIVE-9994
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-9994.1.patch, HIVE-9994.2.patch, HIVE-9994.3.patch


 Some applications use the getQueryString() method of the QueryPlan class to 
 get the query that is being executed by Hive. The query string returned is 
 not redacted, so it exposes sensitive information that ends up logged in 
 Navigator.
 We need to return redacted data from the QueryPlan to prevent other 
 applications from logging sensitive data.
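One generic way to return a redacted string can be sketched as below. This assumes sensitive values appear as quoted literals; it is illustrative only, not the actual patch (Hive's real redaction is pluggable and pattern-driven):

```java
import java.util.regex.Pattern;

// Hedged sketch: mask quoted literals in a query string before exposing
// it to callers, so logs never see the raw values.
class QueryRedactor {
    // Hypothetical rule: any single-quoted literal is considered sensitive.
    private static final Pattern STRING_LITERAL = Pattern.compile("'[^']*'");

    static String redact(String query) {
        return STRING_LITERAL.matcher(query).replaceAll("'***'");
    }
}
```

A getQueryString() override would then return {{redact(queryString)}} instead of the raw string.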





[jira] [Updated] (HIVE-10009) LazyObjectInspectorFactory is not thread safe [Spark Branch]

2015-03-18 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-10009:
---
Fix Version/s: spark-branch

 LazyObjectInspectorFactory is not thread safe [Spark Branch]
 

 Key: HIVE-10009
 URL: https://issues.apache.org/jira/browse/HIVE-10009
 Project: Hive
  Issue Type: Bug
Affects Versions: spark-branch
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: spark-branch


 LazyObjectInspectorFactory is not thread safe, which causes random failures 
 in multi-threaded environments such as Hive on Spark. We got exceptions like 
 the one below:
 {noformat}
 java.lang.RuntimeException: Map operator initialization failed: 
 java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector
  cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:55)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
   at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170)
   at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170)
   at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
   at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
   at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
   at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
   at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
   at org.apache.spark.scheduler.Task.run(Task.scala:64)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector
  cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:154)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:199)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:355)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:92)
   ... 16 more
 {noformat}
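The usual fix pattern for this class of bug, sketched generically (this is not the actual Hive patch; {{InspectorFactorySketch}} and its String key are hypothetical), is to back the factory's cache with a concurrent map so two threads can never race on entry creation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hedged sketch: a factory cache made thread safe. With a plain HashMap,
// concurrent callers can corrupt the table or observe half-built entries,
// which surfaces as seemingly random ClassCastExceptions downstream.
class InspectorFactorySketch {
    private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

    // computeIfAbsent runs the creator at most once per key, atomically,
    // even when many threads request the same inspector at the same time.
    static Object getInspector(String key) {
        return CACHE.computeIfAbsent(key, k -> new Object() {
            @Override public String toString() { return "inspector:" + k; }
        });
    }
}
```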





[jira] [Updated] (HIVE-10009) LazyObjectInspectorFactory is not thread safe [Spark Branch]

2015-03-18 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-10009:
---
Attachment: HIVE-10009.1-spark.patch

 LazyObjectInspectorFactory is not thread safe [Spark Branch]
 

 Key: HIVE-10009
 URL: https://issues.apache.org/jira/browse/HIVE-10009
 Project: Hive
  Issue Type: Bug
Affects Versions: spark-branch
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: spark-branch

 Attachments: HIVE-10009.1-spark.patch


 LazyObjectInspectorFactory is not thread safe, which causes random failures 
 in multi-threaded environments such as Hive on Spark. We got exceptions like 
 the one below:
 {noformat}
 java.lang.RuntimeException: Map operator initialization failed: 
 java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector
  cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:55)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
   at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170)
   at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170)
   at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
   at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
   at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
   at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
   at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
   at org.apache.spark.scheduler.Task.run(Task.scala:64)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector
  cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:154)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:199)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:355)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:92)
   ... 16 more
 {noformat}





[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others

2015-03-18 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367648#comment-14367648
 ] 

Yongzhi Chen commented on HIVE-7018:


This column was introduced in 0.10.0, only for MySQL, and the inconsistency 
worries some customers upgrading from 0.9 versions. Beyond the discomfort, the 
side effect is migration issues, as Chaoyu said. For better supportability, 
we should fix it in future releases.

 Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but 
 not others
 -

 Key: HIVE-7018
 URL: https://issues.apache.org/jira/browse/HIVE-7018
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Yongzhi Chen
 Attachments: HIVE-7018.1.patch


 It appears that at least postgres and oracle do not have the LINK_TARGET_ID 
 column while mysql does.





[jira] [Updated] (HIVE-9994) Hive query plan returns sensitive data to external applications

2015-03-18 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-9994:
--
Attachment: HIVE-9994.1.patch

 Hive query plan returns sensitive data to external applications
 ---

 Key: HIVE-9994
 URL: https://issues.apache.org/jira/browse/HIVE-9994
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-9994.1.patch


 Some applications use the getQueryString() method of the QueryPlan class to 
 get the query that is being executed by Hive. The query string returned is 
 not redacted, so it exposes sensitive information that ends up logged in 
 Navigator.
 We need to return redacted data from the QueryPlan to prevent other 
 applications from logging sensitive data.





[jira] [Commented] (HIVE-9919) upgrade scripts don't work on some auto-created DBs due to absence of tables

2015-03-18 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368032#comment-14368032
 ] 

Thejas M Nair commented on HIVE-9919:
-

+1

 upgrade scripts don't work on some auto-created DBs due to absence of tables
 

 Key: HIVE-9919
 URL: https://issues.apache.org/jira/browse/HIVE-9919
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-9919.01.patch, HIVE-9919.patch


 DataNucleus in its infinite wisdom doesn't create all tables.





[jira] [Updated] (HIVE-9980) LLAP: Pass additional JVM args via Slider appConfig

2015-03-18 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-9980:
--
Issue Type: Sub-task  (was: Bug)
Parent: HIVE-7926

 LLAP: Pass additional JVM args via Slider appConfig
 ---

 Key: HIVE-9980
 URL: https://issues.apache.org/jira/browse/HIVE-9980
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor

 For profiling, JMX remote ports and to attach debuggers to.





[jira] [Updated] (HIVE-9980) LLAP: Pass additional JVM args via Slider appConfig

2015-03-18 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-9980:
--
Attachment: HIVE-9980.1.patch

 LLAP: Pass additional JVM args via Slider appConfig
 ---

 Key: HIVE-9980
 URL: https://issues.apache.org/jira/browse/HIVE-9980
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Attachments: HIVE-9980.1.patch


 For profiling, enabling JMX remote ports, and attaching debuggers.





[jira] [Commented] (HIVE-10009) LazyObjectInspectorFactory is not thread safe [Spark Branch]

2015-03-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367757#comment-14367757
 ] 

Hive QA commented on HIVE-10009:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12705399/HIVE-10009.1-spark.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7644 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union31
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/789/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/789/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-789/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12705399 - PreCommit-HIVE-SPARK-Build

 LazyObjectInspectorFactory is not thread safe [Spark Branch]
 

 Key: HIVE-10009
 URL: https://issues.apache.org/jira/browse/HIVE-10009
 Project: Hive
  Issue Type: Bug
Affects Versions: spark-branch
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: spark-branch

 Attachments: HIVE-10009.1-spark.patch


 LazyObjectInspectorFactory is not thread safe, which causes intermittent 
 failures in multi-threaded environments such as Hive on Spark. We got 
 exceptions like the one below:
 {noformat}
 java.lang.RuntimeException: Map operator initialization failed: 
 java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector
  cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:55)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
   at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170)
   at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170)
   at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
   at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
   at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
   at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
   at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
   at org.apache.spark.scheduler.Task.run(Task.scala:64)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector
  cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:154)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:199)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:355)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:92)
   ... 16 more
 {noformat}
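The failure mode above is typical of an instance cache built on an unsynchronized map. A minimal sketch of the hazard and one common remedy; the class name and the Object stand-ins are illustrative, not Hive's actual factory code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InspectorCache {
    // Unsafe: two threads can interleave get/put on a plain HashMap,
    // corrupting the table or publishing a partially constructed value,
    // which surfaces later as seemingly random errors downstream.
    private static final Map<String, Object> UNSAFE = new HashMap<>();

    public static Object getUnsafe(String key) {
        Object v = UNSAFE.get(key);
        if (v == null) {
            v = new Object();   // stands in for building an object inspector
            UNSAFE.put(key, v);
        }
        return v;
    }

    // Safe: ConcurrentHashMap.computeIfAbsent builds each value at most
    // once per key and publishes it safely to all threads.
    private static final Map<String, Object> SAFE = new ConcurrentHashMap<>();

    public static Object getSafe(String key) {
        return SAFE.computeIfAbsent(key, k -> new Object());
    }
}
```

With the concurrent map, every caller observes the same fully constructed instance per key, which removes the race without coarse locking.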





[jira] [Commented] (HIVE-9994) Hive query plan returns sensitive data to external applications

2015-03-18 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367771#comment-14367771
 ] 

Xuefu Zhang commented on HIVE-9994:
---

+1

 Hive query plan returns sensitive data to external applications
 ---

 Key: HIVE-9994
 URL: https://issues.apache.org/jira/browse/HIVE-9994
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-9994.1.patch, HIVE-9994.2.patch, HIVE-9994.3.patch


 Some applications use the getQueryString() method of the QueryPlan class 
 to get the query being executed by Hive. The returned query string is 
 not redacted, so sensitive information ends up logged in Navigator.
 The QueryPlan should return a redacted query string to prevent other 
 applications from logging sensitive data.





[jira] [Updated] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others

2015-03-18 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-7018:
---
Attachment: HIVE-7018.2.patch

 Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but 
 not others
 -

 Key: HIVE-7018
 URL: https://issues.apache.org/jira/browse/HIVE-7018
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Yongzhi Chen
 Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch


 It appears that at least Postgres and Oracle do not have the LINK_TARGET_ID 
 column while MySQL does.





[jira] [Commented] (HIVE-9979) LLAP: LLAP Cached readers for StringDirectTreeReaders over-read data

2015-03-18 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367927#comment-14367927
 ] 

Sergey Shelukhin commented on HIVE-9979:


I'm not familiar enough with the code yet, but batchSize can be 0 here 
(probably by mistake). Not sure if it could be the cause.

 LLAP: LLAP Cached readers for StringDirectTreeReaders over-read data
 

 Key: HIVE-9979
 URL: https://issues.apache.org/jira/browse/HIVE-9979
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Gopal V
Assignee: Sergey Shelukhin

 When the cache is enabled, queries throw various over-read exceptions.
 It looks like the batchSize changes as data is read: at the end of a 
 stripe the batchSize is smaller than the default size (the super calls 
 change it).
 {code}
 Caused by: java.io.EOFException: Can't finish byte read from uncompressed 
 stream DATA position: 262144 length: 262144 range: 0 offset: 46399488 limit: 
 46399488
 at 
 org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.commonReadByteArrays(RecordReaderImpl.java:1556)
 at 
 org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.readOrcByteArrays(RecordReaderImpl.java:1569)
 at 
 org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDirectTreeReader.nextVector(RecordReaderImpl.java:1691)
 at 
 org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.nextVector(RecordReaderImpl.java:1517)
 at 
 org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:115)
 at 
 org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:108)
 at 
 org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:35)
 at 
 org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:314)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:280)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:44)
 at 
 org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
 ... 4 more
 {code}
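The stripe-boundary arithmetic the report describes can be sketched as follows: the last batch of a stripe holds fewer rows than the default batch size, so a reader that keeps assuming the default over-reads the stream. The class, method name, and sizes here are illustrative, not the ORC reader's actual code:

```java
public class BatchSizing {
    static final int DEFAULT_BATCH_SIZE = 1024;

    // The final batch of a stripe must be clamped to the rows actually
    // remaining; reading DEFAULT_BATCH_SIZE rows past that point runs
    // off the end of the stream (the EOFException above).
    static int nextBatchSize(long rowsRemainingInStripe) {
        return (int) Math.min(DEFAULT_BATCH_SIZE, rowsRemainingInStripe);
    }

    public static void main(String[] args) {
        System.out.println(nextBatchSize(50000)); // mid-stripe: full batch
        System.out.println(nextBatchSize(300));   // end of stripe: short batch
    }
}
```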





[jira] [Commented] (HIVE-9979) LLAP: LLAP Cached readers for StringDirectTreeReaders over-read data

2015-03-18 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367936#comment-14367936
 ] 

Sergey Shelukhin commented on HIVE-9979:


nm, looks like that would be handled where it fails

 LLAP: LLAP Cached readers for StringDirectTreeReaders over-read data
 

 Key: HIVE-9979
 URL: https://issues.apache.org/jira/browse/HIVE-9979
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Gopal V
Assignee: Sergey Shelukhin

 When the cache is enabled, queries throw various over-read exceptions.
 It looks like the batchSize changes as data is read: at the end of a 
 stripe the batchSize is smaller than the default size (the super calls 
 change it).
 {code}
 Caused by: java.io.EOFException: Can't finish byte read from uncompressed 
 stream DATA position: 262144 length: 262144 range: 0 offset: 46399488 limit: 
 46399488
 at 
 org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.commonReadByteArrays(RecordReaderImpl.java:1556)
 at 
 org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.readOrcByteArrays(RecordReaderImpl.java:1569)
 at 
 org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDirectTreeReader.nextVector(RecordReaderImpl.java:1691)
 at 
 org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.nextVector(RecordReaderImpl.java:1517)
 at 
 org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:115)
 at 
 org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:108)
 at 
 org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:35)
 at 
 org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:314)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:280)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:44)
 at 
 org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
 ... 4 more
 {code}





[jira] [Commented] (HIVE-9956) use BigDecimal.valueOf instead of new in TestFileDump

2015-03-18 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367945#comment-14367945
 ] 

Alexander Pivovarov commented on HIVE-9956:
---

I think neither of the failed tests is related to patch #1.

 use BigDecimal.valueOf instead of new in TestFileDump
 -

 Key: HIVE-9956
 URL: https://issues.apache.org/jira/browse/HIVE-9956
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor
 Attachments: HIVE-9956.1.patch, HIVE-9956.1.patch


 TestFileDump builds a data row where one of the columns is a BigDecimal.
 The test adds the value 2.
 There are two ways to create a BigDecimal object:
 1. use new
 2. use valueOf
 In this particular case:
 1. new will create 2.222153
 2. valueOf will use the canonical String representation and the result will 
 be 2.
 We should probably use valueOf to create the BigDecimal object.
 TestTimestampWritable and TestHCatStores use valueOf.
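The difference between the two construction paths is standard JDK behavior and can be shown directly (2.2 is used here just as an example value):

```java
import java.math.BigDecimal;

public class BigDecimalConstruction {
    public static void main(String[] args) {
        // new BigDecimal(double) converts the exact binary value of the
        // double argument, exposing its full decimal expansion.
        BigDecimal viaNew = new BigDecimal(2.2);

        // BigDecimal.valueOf(double) goes through Double.toString, which
        // yields the canonical short representation.
        BigDecimal viaValueOf = BigDecimal.valueOf(2.2);

        System.out.println(viaNew);      // long expansion: 2.2000000000000001...
        System.out.println(viaValueOf);  // 2.2
    }
}
```

This is why valueOf is the right choice when a test compares against a human-readable decimal string.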


