[jira] [Commented] (HIVE-10005) remove some unnecessary branches from the inner loop
[ https://issues.apache.org/jira/browse/HIVE-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366978#comment-14366978 ] Hive QA commented on HIVE-10005:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12705309/HIVE-10005.1.patch

{color:red}ERROR:{color} -1 due to 194 failed/errored test(s), 7770 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join17
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join19
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join21
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join24
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join26
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join29
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join30
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join31
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join33
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_nulls
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_stats2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_if_with_path_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_spark1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_spark2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_spark3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_spark4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_7
[jira] [Commented] (HIVE-10006) RSC has memory leak while execute multi queries.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366899#comment-14366899 ] Chengxiang Li commented on HIVE-10006: -- Root Cause: In RSC, when Spark calls CombineHiveInputFormat::getSplits to split the job into tasks in a thread called dag-scheduler-event-loop, the MapWork is added to a ThreadLocal map of dag-scheduler-event-loop and never removed. As dag-scheduler-event-loop is a long-lived daemon thread, all the MapWorks are held in the ThreadLocal map until the RSC JVM crashes or exits. Hive hits this issue in MR mode as well; it is just lucky that the thread which calls CombineHiveInputFormat::getSplits is TaskRunner, which is abandoned after the query finishes, so the Hive driver does not leak memory there. RSC has memory leak while execute multi queries.[Spark Branch] -- Key: HIVE-10006 URL: https://issues.apache.org/jira/browse/HIVE-10006 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 1.1.0 Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Critical Labels: Spark-M5 While executing queries with RSC, the MapWork/ReduceWork count increases all the time and eventually leads to OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
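The leak pattern described in the root-cause analysis can be reproduced in isolation. Below is a minimal sketch (class and method names are hypothetical, not Hive's actual code): per-query state parked in a ThreadLocal of a long-lived thread is retained until it is explicitly removed.

```java
import java.util.HashMap;
import java.util.Map;

public class ThreadLocalLeakSketch {
    // A ThreadLocal map that accumulates per-query state. If the owning
    // thread is long-lived (like dag-scheduler-event-loop), entries added
    // here are never reclaimed unless explicitly removed.
    private static final ThreadLocal<Map<String, Object>> WORK_CACHE =
        ThreadLocal.withInitial(HashMap::new);

    // Simulates split computation caching a MapWork keyed by query.
    static void cacheWork(String queryId, Object mapWork) {
        WORK_CACHE.get().put(queryId, mapWork);
    }

    // The fix: remove the entry (or clear the ThreadLocal) when a query ends.
    static void clearWork(String queryId) {
        WORK_CACHE.get().remove(queryId);
    }

    static int cachedCount() {
        return WORK_CACHE.get().size();
    }

    public static void main(String[] args) {
        cacheWork("query-1", new Object());
        cacheWork("query-2", new Object());
        // Without clearWork, a long-lived daemon thread holds both forever.
        clearWork("query-1");
        System.out.println(cachedCount());
    }
}
```

In MR mode the owning thread (TaskRunner) dies with the query, so the map becomes garbage anyway; in RSC the thread outlives every query, which is why explicit removal matters.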
[jira] [Updated] (HIVE-10005) remove some unnecessary branches from the inner loop
[ https://issues.apache.org/jira/browse/HIVE-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-10005: -- Attachment: HIVE-10005.1.patch remove some unnecessary branches from the inner loop Key: HIVE-10005 URL: https://issues.apache.org/jira/browse/HIVE-10005 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-10005.1.patch Operator.forward is doing too much. There's no reason to do the done check per row and update it inline; it's much more efficient to do that only when the event that completes an operator happens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
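The optimization the description is asking for can be sketched as follows (hypothetical operator shapes, not Hive's actual Operator class): instead of branching on a done flag for every forwarded row, the hot loop stays branch-free and the completion event handles the state change.

```java
public class ForwardSketch {
    interface RowSink { void process(int row); }

    static class CountingSink implements RowSink {
        int count;
        public void process(int row) { count++; }
    }

    // Before: a per-row branch inside the inner loop, evaluated on every
    // row even though "done" changes at most once.
    static void forwardWithBranch(int[] rows, RowSink sink, boolean[] done) {
        for (int row : rows) {
            if (done[0]) {
                return;
            }
            sink.process(row);
        }
    }

    // After: no branch in the loop. Whoever completes the operator handles
    // the transition (e.g. by swapping the sink out), so per-row work is
    // just the forward itself.
    static void forwardNoBranch(int[] rows, RowSink sink) {
        for (int row : rows) {
            sink.process(row);
        }
    }

    public static void main(String[] args) {
        int[] rows = {1, 2, 3, 4};
        CountingSink a = new CountingSink();
        CountingSink b = new CountingSink();
        forwardWithBranch(rows, a, new boolean[]{false});
        forwardNoBranch(rows, b);
        System.out.println(a.count + " " + b.count);
    }
}
```

Both variants forward the same rows while the operator is live; the difference is purely the per-row branch that the patch hoists out of the inner loop.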
[jira] [Commented] (HIVE-10003) MiniTez ut fail with missing configs
[ https://issues.apache.org/jira/browse/HIVE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366871#comment-14366871 ] Gopal V commented on HIVE-10003: [~hagleitn]: The LLAP mode is turned on for MiniTez tests as well. https://github.com/apache/hive/blob/llap/data/conf/tez/hive-site.xml#L43 MiniTez ut fail with missing configs Key: HIVE-10003 URL: https://issues.apache.org/jira/browse/HIVE-10003 Project: Hive Issue Type: Sub-task Reporter: Gunther Hagleitner Assignee: Gopal V Attachments: HIVE-10003.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10003) MiniTez ut fail with missing configs
[ https://issues.apache.org/jira/browse/HIVE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366904#comment-14366904 ] Gopal V commented on HIVE-10003: [~sseth]: I tried copying your default llap-daemon-site.xml from src/test/resources into the data/conf/tez directory to match the locations, but that doesn't work at all. It looks like this config file is not getting added to the MiniTez cluster in the test cases. I can take a look at this later, but I assigned it to you since you might know whether we can run anything with MiniTez without an LLAP daemon attached to it, even if I configure localhost in the hosts list. MiniTez ut fail with missing configs Key: HIVE-10003 URL: https://issues.apache.org/jira/browse/HIVE-10003 Project: Hive Issue Type: Sub-task Reporter: Gunther Hagleitner Assignee: Siddharth Seth Attachments: HIVE-10003.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10003) MiniTez ut fail with missing configs
[ https://issues.apache.org/jira/browse/HIVE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366889#comment-14366889 ] Gopal V commented on HIVE-10003: I had to rebase to reproduce this error, looks like this came in as part of HIVE-. MiniTez ut fail with missing configs Key: HIVE-10003 URL: https://issues.apache.org/jira/browse/HIVE-10003 Project: Hive Issue Type: Sub-task Reporter: Gunther Hagleitner Assignee: Siddharth Seth Attachments: HIVE-10003.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10003) MiniTez ut fail with missing configs
[ https://issues.apache.org/jira/browse/HIVE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10003: --- Assignee: Siddharth Seth (was: Gopal V) MiniTez ut fail with missing configs Key: HIVE-10003 URL: https://issues.apache.org/jira/browse/HIVE-10003 Project: Hive Issue Type: Sub-task Reporter: Gunther Hagleitner Assignee: Siddharth Seth Attachments: HIVE-10003.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9970) Hive on spark
[ https://issues.apache.org/jira/browse/HIVE-9970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366945#comment-14366945 ] Amithsha commented on HIVE-9970: Also, while using beeline I am getting this error:
{noformat}
2015-03-18 16:03:21,458 ERROR [pool-3-thread-8]: DataNucleus.Datastore (Log4JLogger.java:error(115)) - An exception was thrown while adding/validating class(es) : Specified key was too long; max key length is 767 bytes
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.Util.getInstance(Util.java:386)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1054)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4237)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4169)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2617)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2778)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2819)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2768)
at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:949)
at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:795)
at com.jolbox.bonecp.StatementHandle.execute(StatementHandle.java:254)
at org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:760)
at org.datanucleus.store.rdbms.table.TableImpl.createIndices(TableImpl.java:648)
at org.datanucleus.store.rdbms.table.TableImpl.validateIndices(TableImpl.java:593)
at org.datanucleus.store.rdbms.table.TableImpl.validateConstraints(TableImpl.java:390)
at org.datanucleus.store.rdbms.table.ClassTable.validateConstraints(ClassTable.java:3463)
at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3464)
at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:3190)
at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2841)
at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:122)
at org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:1605)
at org.datanucleus.store.AbstractStoreManager.addClass(AbstractStoreManager.java:954)
at org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:679)
at org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:408)
at org.datanucleus.store.rdbms.query.JDOQLQuery.compileQueryFull(JDOQLQuery.java:947)
at org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:370)
at org.datanucleus.store.query.Query.executeQuery(Query.java:1744)
at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672)
at org.datanucleus.store.query.Query.execute(Query.java:1654)
at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:221)
at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.ensureDbInit(MetaStoreDirectSql.java:172)
at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.<init>(MetaStoreDirectSql.java:130)
at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:275)
at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:238)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:56)
at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:65)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:579)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:557)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database_core(HiveMetaStore.java:933)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:907)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
[jira] [Updated] (HIVE-9664) Hive add jar command should be able to download and add jars from a repository
[ https://issues.apache.org/jira/browse/HIVE-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anant Nag updated HIVE-9664: Attachment: HIVE-9664.patch Hive add jar command should be able to download and add jars from a repository Key: HIVE-9664 URL: https://issues.apache.org/jira/browse/HIVE-9664 Project: Hive Issue Type: Improvement Affects Versions: 0.14.0 Reporter: Anant Nag Assignee: Anant Nag Labels: hive, patch Attachments: HIVE-9664.patch, HIVE-9664.patch, HIVE-9664.patch Currently, Hive's add jar command takes a local path to the dependency jar. This clutters the local file-system, as users may forget to remove the jar later. It would be nice if Hive supported a Gradle-like notation to download the jar from a repository. Example: add jar org:module:version It should also be backward compatible and take jars from the local file-system as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
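The proposed `org:module:version` notation could be told apart from a plain local path with a small coordinate parser. A sketch of the idea (names and heuristics are illustrative, not taken from the attached patch):

```java
public class JarCoordinate {
    final String org, module, version;

    JarCoordinate(String org, String module, String version) {
        this.org = org;
        this.module = module;
        this.version = version;
    }

    // Returns null when the argument looks like a plain file path, so
    // "add jar /path/to/x.jar" stays backward compatible.
    static JarCoordinate parse(String arg) {
        if (arg.contains("/") || arg.endsWith(".jar")) {
            return null; // treat as a local path
        }
        String[] parts = arg.split(":");
        if (parts.length != 3) {
            return null; // not a valid org:module:version coordinate
        }
        return new JarCoordinate(parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        JarCoordinate c = parse("org.apache.hive:hive-exec:1.1.0");
        System.out.println(c.org + " " + c.module + " " + c.version);
        System.out.println(parse("/tmp/udfs.jar") == null);
    }
}
```

With a parsed coordinate in hand, the command would resolve the artifact from a configured repository and fall back to the existing local-path behavior when `parse` returns null.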
[jira] [Commented] (HIVE-9970) Hive on spark
[ https://issues.apache.org/jira/browse/HIVE-9970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367048#comment-14367048 ] Amithsha commented on HIVE-9970: mysql version: Server version: 5.1.73-log Source distribution Hive on spark - Key: HIVE-9970 URL: https://issues.apache.org/jira/browse/HIVE-9970 Project: Hive Issue Type: Bug Reporter: Amithsha Hi all, I recently configured Spark 1.2.0, and my environment is Hadoop 2.6.0, Hive 1.1.0. I have tried Hive on Spark, and while executing an insert I am getting the following error:
{noformat}
Query ID = hadoop2_20150313162828_8764adad-a8e4-49da-9ef5-35e4ebd6bc63
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number
In order to limit the maximum number of reducers: set hive.exec.reducers.max=number
In order to set a constant number of reducers: set mapreduce.job.reduces=number
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
{noformat}
I have added the spark-assembly jar in the Hive lib, and also in the Hive console using the add jar command, followed by these steps:
set spark.home=/opt/spark-1.2.1/;
add jar /opt/spark-1.2.1/assembly/target/scala-2.10/spark-assembly-1.2.1-hadoop2.4.0.jar;
set hive.execution.engine=spark;
set spark.master=spark://xxx:7077;
set spark.eventLog.enabled=true;
set spark.executor.memory=512m;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
Can anyone suggest? Thanks Regards Amithsha -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367107#comment-14367107 ] Chao commented on HIVE-9697: [~lirui]: Yes, MR doesn't use stats. ContentSummary is the file size, so maybe that can (partially) explain why MR is more optimistic? I briefly looked at the code. It looks like hive.stats.collect.rawdatasize controls whether a virtual column for rawDataSize is added to a TableScanDesc. This is later used in TableScanOperator#gatherStats to determine whether rawDataSize will be collected. This property is set to true by default. But I haven't found the relationship between the stats collected through TableScanOperator#gatherStats and the Statistics used by map join. Will investigate more later. Hive on Spark is not as aggressive as MR on map join [Spark Branch] --- Key: HIVE-9697 URL: https://issues.apache.org/jira/browse/HIVE-9697 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao We have a finding from running some Big-Bench cases: when the same small-table size threshold is used, the Map Join operator will not be generated in stage plans for Hive on Spark, while it will be generated for Hive on MR. For example, when we run BigBench Q25, the meta info of one input ORC table is as below: totalSize=1748955 (about 1.5M) rawDataSize=123050375 (about 120M) If we use the following parameter settings: set hive.auto.convert.join=true; set hive.mapjoin.smalltable.filesize=2500; set hive.auto.convert.join.noconditionaltask=true; set hive.auto.convert.join.noconditionaltask.size=1; (100M) Map Join will be enabled in Hive on MR mode, but not in Hive on Spark. 
We found that for Hive on MR, the HDFS file size of the table (ContentSummary.getLength(), which should approximate the value of ‘totalSize’) is compared with the 100M threshold (smaller than 100M), while for Hive on Spark 'rawDataSize' is compared with the threshold (larger than 100M). That's why MapJoin is not enabled in Hive on Spark for this case, and as a result Hive on Spark gets much lower performance than Hive on MR here. When we set hive.auto.convert.join.noconditionaltask.size=15000; (150M), MapJoin is enabled in Hive on Spark mode, and Hive on Spark then has performance similar to Hive on MR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
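The discrepancy described above comes down to which size metric is compared against the same threshold. A sketch using the Q25 numbers from the report (the method name is hypothetical; the 100M value stands in for the configured noconditionaltask size):

```java
public class MapJoinSizeCheck {
    // Map join conversion happens when the small table's size metric is
    // below the configured threshold.
    static boolean convertToMapJoin(long tableSize, long threshold) {
        return tableSize < threshold;
    }

    public static void main(String[] args) {
        long totalSize   = 1_748_955L;    // ~1.5M: on-disk file size, what MR compares
        long rawDataSize = 123_050_375L;  // ~120M: uncompressed size, what Spark compares
        long threshold   = 100_000_000L;  // the 100M threshold from the report

        System.out.println(convertToMapJoin(totalSize, threshold));   // MR's comparison
        System.out.println(convertToMapJoin(rawDataSize, threshold)); // Spark's comparison
    }
}
```

Same table, same threshold: the MR-style comparison converts to a map join while the Spark-style comparison does not, which matches the observed plans.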
[jira] [Resolved] (HIVE-9962) JsonSerDe does not support reader schema different from data schema
[ https://issues.apache.org/jira/browse/HIVE-9962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam resolved HIVE-9962. - Resolution: Fixed JsonSerDe does not support reader schema different from data schema --- Key: HIVE-9962 URL: https://issues.apache.org/jira/browse/HIVE-9962 Project: Hive Issue Type: Improvement Components: HCatalog, Serializers/Deserializers Reporter: Johndee Burks Assignee: Naveen Gangam Priority: Minor To reproduce the limitation, do the following. Create two tables, the first with the full schema and the second with a partial schema.
{code}
add jar /opt/cloudera/parcels/CDH/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
CREATE TABLE json_full (autopolicy struct<is_active:boolean, policy_holder_name:string, policy_num:string, vehicle:struct<brand:struct<model:string, year:int>, price:double, vin:string>>) ROW FORMAT SERDE 'org.apache.hcatalog.data.JsonSerDe';
CREATE TABLE json_part (autopolicy struct<is_active:boolean, policy_holder_name:string, policy_num:string, vehicle:struct<brand:struct<model:string, year:int>, price:double>>) ROW FORMAT SERDE 'org.apache.hcatalog.data.JsonSerDe';
{code}
The data for the table is below:
{code}
{"autopolicy": {"policy_holder_name": "someone", "policy_num": "20141012", "is_active": true, "vehicle": {"brand": {"model": "Lexus", "year": 2012}, "vin": "RANDOM123", "price": 23450.50}}}
{code}
I put that data into a file and load it into the tables like this:
{code}
load data local inpath 'data.json' into table json_full;
load data local inpath 'data.json' into table json_part;
{code}
Then do a select against each table:
{code}
select * from json_full;
select * from json_part;
{code}
The second select should fail with an error similar to that below:
{code}
15/03/12 23:19:30 [main]: ERROR CliDriver: Failed with exception java.io.IOException:java.lang.NullPointerException
{code}
The code that throws this error is below:
{code}
private void populateRecord(List<Object> r, JsonToken token, JsonParser p, HCatSchema s) throws IOException {
  if (token != JsonToken.FIELD_NAME) {
    throw new IOException("Field name expected");
  }
  String fieldName = p.getText();
  int fpos;
  try {
    fpos = s.getPosition(fieldName);
  } catch (NullPointerException npe) {
    fpos = getPositionFromHiveInternalColumnName(fieldName);
    LOG.debug("NPE finding position for field [{}] in schema [{}]", fieldName, s);
    if (!fieldName.equalsIgnoreCase(getHiveInternalColumnName(fpos))) {
      LOG.error("Hive internal column name {} and position encoding {} for the column name are at odds", fieldName, fpos);
      throw npe;
    }
    if (fpos == -1) {
      return; // unknown field, we return.
    }
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9962) JsonSerDe does not support reader schema different from data schema
[ https://issues.apache.org/jira/browse/HIVE-9962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367211#comment-14367211 ] Naveen Gangam commented on HIVE-9962: - I believe this has already been addressed via HIVE-6166 in the Hive 0.13 release. I have tested it and it appears to be working. The SerDe class used above, org.apache.hcatalog.data.JsonSerDe, no longer exists. Please use the org.apache.hive.hcatalog.data.JsonSerDe class instead. The table definition should look like this:
{code}
CREATE TABLE json_part (autopolicy struct<is_active:boolean, policy_holder_name:string, policy_num:string, vehicle:struct<brand:struct<model:string, year:int>, price:double>>) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
{code}
Closing this JIRA as a duplicate. Please re-open if you have concerns. JsonSerDe does not support reader schema different from data schema --- Key: HIVE-9962 URL: https://issues.apache.org/jira/browse/HIVE-9962 Project: Hive Issue Type: Improvement Components: HCatalog, Serializers/Deserializers Reporter: Johndee Burks Assignee: Naveen Gangam Priority: Minor To reproduce the limitation, do the following. Create two tables, the first with the full schema and the second with a partial schema.
{code}
add jar /opt/cloudera/parcels/CDH/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
CREATE TABLE json_full (autopolicy struct<is_active:boolean, policy_holder_name:string, policy_num:string, vehicle:struct<brand:struct<model:string, year:int>, price:double, vin:string>>) ROW FORMAT SERDE 'org.apache.hcatalog.data.JsonSerDe';
CREATE TABLE json_part (autopolicy struct<is_active:boolean, policy_holder_name:string, policy_num:string, vehicle:struct<brand:struct<model:string, year:int>, price:double>>) ROW FORMAT SERDE 'org.apache.hcatalog.data.JsonSerDe';
{code}
The data for the table is below:
{code}
{"autopolicy": {"policy_holder_name": "someone", "policy_num": "20141012", "is_active": true, "vehicle": {"brand": {"model": "Lexus", "year": 2012}, "vin": "RANDOM123", "price": 23450.50}}}
{code}
I put that data into a file and load it into the tables like this:
{code}
load data local inpath 'data.json' into table json_full;
load data local inpath 'data.json' into table json_part;
{code}
Then do a select against each table:
{code}
select * from json_full;
select * from json_part;
{code}
The second select should fail with an error similar to that below:
{code}
15/03/12 23:19:30 [main]: ERROR CliDriver: Failed with exception java.io.IOException:java.lang.NullPointerException
{code}
The code that throws this error is below:
{code}
private void populateRecord(List<Object> r, JsonToken token, JsonParser p, HCatSchema s) throws IOException {
  if (token != JsonToken.FIELD_NAME) {
    throw new IOException("Field name expected");
  }
  String fieldName = p.getText();
  int fpos;
  try {
    fpos = s.getPosition(fieldName);
  } catch (NullPointerException npe) {
    fpos = getPositionFromHiveInternalColumnName(fieldName);
    LOG.debug("NPE finding position for field [{}] in schema [{}]", fieldName, s);
    if (!fieldName.equalsIgnoreCase(getHiveInternalColumnName(fpos))) {
      LOG.error("Hive internal column name {} and position encoding {} for the column name are at odds", fieldName, fpos);
      throw npe;
    }
    if (fpos == -1) {
      return; // unknown field, we return.
    }
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9523) For partitioned tables same optimizations should be available as for bucketed tables and vice versa: ①[Sort Merge] PARTITION Map join and ②BUCKET pruning
[ https://issues.apache.org/jira/browse/HIVE-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciek Kocon updated HIVE-9523: --- Description: Logically and functionally, bucketing and partitioning are quite similar - both provide a mechanism to segregate and separate the table's data based on its content. Thanks to that, significant further optimisations like [partition] PRUNING or [bucket] MAP JOIN are possible. The difference seems to be imposed by design, where PARTITIONing is open/explicit while BUCKETing is discrete/implicit. Partitioning seems to be very common, if not a standard feature, in all current RDBMSs, while BUCKETING seems to be HIVE-specific only. In a way, BUCKETING could also be called hashing or simply IMPLICIT PARTITIONING. Regardless of the fact that these two are recognised as two separate features available in Hive, there should be nothing to prevent leveraging the same existing query/join optimisations across the two. ①[Sort Merge] PARTITION Map join Enable Bucket Map Join or, better, the Sort Merge Bucket Map Join equivalent optimisations when PARTITIONING is used exclusively or in combination with BUCKETING. For JOIN conditions where partitioning criteria are used respectively: ⋮ FROM TabA JOIN TabB ON TabA.partCol1 = TabB.partCol2 AND TabA.partCol2 = TabB.partCol2 the optimizer could/should choose to treat it the same way as with bucketed tables: ⋮ FROM TabC JOIN TabD ON TabC.clusteredByCol1 = TabD.clusteredByCol2 AND TabC.clusteredByCol2 = TabD.clusteredByCol2 and use either Bucket Map Join or, better, the Sort Merge Bucket Map Join. This is based on the fact that, in the same way buckets translate to separate files, partitions essentially provide the same mapping. When data locality is known, the optimizer could focus only on joining corresponding partitions rather than whole data sets. 
②BUCKET pruning Enable a partition-PRUNING-equivalent optimisation for queries on BUCKETED tables. The simplest example is for queries like: SELECT … FROM x WHERE colA=123123 to read only the relevant bucket file rather than all the file-buckets that belong to the table. was: For JOIN conditions where partitioning criteria are used respectively: ⋮ FROM TabA JOIN TabB ON TabA.partCol1 = TabB.partCol2 AND TabA.partCol2 = TabB.partCol2 the optimizer could/should choose to treat it the same way as with bucketed tables: ⋮ FROM TabC JOIN TabD ON TabC.clusteredByCol1 = TabD.clusteredByCol2 AND TabC.clusteredByCol2 = TabD.clusteredByCol2 and use either Bucket Map Join or, better, the Sort Merge Bucket Map Join. This is based on the fact that, in the same way buckets translate to separate files, partitions essentially provide the same mapping. When data locality is known, the optimizer could focus only on joining corresponding partitions rather than whole data sets. #side notes: ⦿ Currently, table DDL syntax where partitioning and bucketing are defined at the same time is allowed: CREATE TABLE ⋮ PARTITIONED BY(…) CLUSTERED BY(…) INTO … BUCKETS; But in this case the optimizer never chooses to use Bucket Map Join or Sort Merge Bucket Map Join, which defeats the purpose of creating BUCKETed tables in such scenarios. Should that be raised as a separate BUG? ⦿ Currently partitioning and bucketing are two separate things but serve the same purpose - shouldn't the concepts be merged (explicit/implicit partitions)? 
Affects Version/s: 1.1.0 1.0.0 Summary: For partitioned tables same optimizations should be available as for bucketed tables and vice versa: ①[Sort Merge] PARTITION Map join and ②BUCKET pruning (was: when columns on which tables are partitioned are used in the join condition same join optimizations as for bucketed tables should be applied) For partitioned tables same optimizations should be available as for bucketed tables and vice versa: ①[Sort Merge] PARTITION Map join and ②BUCKET pruning - Key: HIVE-9523 URL: https://issues.apache.org/jira/browse/HIVE-9523 Project: Hive Issue Type: Improvement Components: Logical Optimizer, Physical Optimizer, SQL Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.1.0 Reporter: Maciek Kocon Labels: gsoc2015 Logically and functionally bucketing and partitioning are quite similar - both provide mechanism to segregate and separate the table's data based on its content. Thanks to that significant further optimisations like [partition] PRUNING or [bucket] MAP JOIN are possible. The difference seems to be imposed by design where the PARTITIONing is open/explicit while BUCKETing is discrete/implicit. Partitioning seems to be very common if
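The bucket-pruning idea in ② rests on the fact that a row's bucket is a pure function of the bucketing column, so an equality predicate identifies a single bucket file. A sketch of that selection step (illustrative code, not Hive's implementation; Hive's actual bucket hashing is type-dependent):

```java
public class BucketPruneSketch {
    // A bucketed table assigns each row to bucket (hash & Integer.MAX_VALUE)
    // % numBuckets; this sketch uses plain hashCode for illustration.
    static int bucketFor(Object key, int numBuckets) {
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 8;          // CLUSTERED BY(colA) INTO 8 BUCKETS
        int colA = 123123;           // predicate: WHERE colA = 123123
        int bucket = bucketFor(colA, numBuckets);
        // Only this one of the 8 bucket files needs to be scanned;
        // the other 7 can be pruned exactly like non-matching partitions.
        System.out.println(bucket >= 0 && bucket < numBuckets);
    }
}
```

This mirrors partition pruning: the predicate value maps deterministically to a storage unit, and every other unit can be skipped without reading it.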
[jira] [Assigned] (HIVE-9828) Semantic analyzer does not capture view parent entity for tables referred in view with union all
[ https://issues.apache.org/jira/browse/HIVE-9828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar reassigned HIVE-9828: - Assignee: Prasad Mujumdar Semantic analyzer does not capture view parent entity for tables referred in view with union all - Key: HIVE-9828 URL: https://issues.apache.org/jira/browse/HIVE-9828 Project: Hive Issue Type: Bug Components: Parser Affects Versions: 1.1.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 1.2.0 Attachments: HIVE-9828.1-npf.patch, HIVE-9828.1-npf.patch The Hive compiler adds tables used in a view definition to the input entity list, with the view as the parent entity for each table. In the case of a view with a union all query, this is not being done properly. For example, {noformat} create view view1 as select t.id from (select tab1.id from db.tab1 union all select tab2.id from db.tab2 ) t; {noformat} This query will capture tab1 and tab2 as read entities without view1 as parent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10008) Need to refactor itests for hbase metastore [hbase-metastore branch]
[ https://issues.apache.org/jira/browse/HIVE-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-10008: -- Attachment: HIVE-10008.2.patch Previous patch accidentally omitted one commit. Need to refactor itests for hbase metastore [hbase-metastore branch] Key: HIVE-10008 URL: https://issues.apache.org/jira/browse/HIVE-10008 Project: Hive Issue Type: Task Components: Tests Affects Versions: hbase-metastore-branch Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-10008.2.patch, HIVE-10008.patch Much of the infrastructure for the itest/hive-unit/.../metastore/hbase tests is repeated in each test. This needs to be factored out into a base class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10009) LazyObjectInspectorFactory is not thread safe [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-10009: --- Affects Version/s: spark-branch LazyObjectInspectorFactory is not thread safe [Spark Branch] Key: HIVE-10009 URL: https://issues.apache.org/jira/browse/HIVE-10009 Project: Hive Issue Type: Bug Affects Versions: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch LazyObjectInspectorFactory is not thread safe, which causes random failures in multi-threaded environments such as Hive on Spark. We got exceptions like the one below {noformat} java.lang.RuntimeException: Map operator initialization failed: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:55) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170) at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634) at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:154) at org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:199) at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:355) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:92) ... 16 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
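For reference, the usual remedy for a factory whose instance cache is an unsynchronized map is to make the cache lookup atomic per key. The sketch below is illustrative only (class and method names are assumptions, not Hive's LazyObjectInspectorFactory code): with `ConcurrentHashMap.computeIfAbsent`, concurrent callers can no longer observe a partially-constructed entry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical thread-safe instance cache for a factory. The creator
// function runs at most once per key mapping; concurrent readers either
// see the fully-created value or trigger/wait on its creation.
class CachedFactory<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> creator;

    CachedFactory(Function<K, V> creator) {
        this.creator = creator;
    }

    V get(K key) {
        // Atomic per key: no torn reads of a half-populated map
        return cache.computeIfAbsent(key, creator);
    }
}
```

The same instance is returned for repeated lookups of a key, which also preserves the caching behavior the factory is there for.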
[jira] [Assigned] (HIVE-9979) LLAP: LLAP Cached readers for StringDirectTreeReaders over-read data
[ https://issues.apache.org/jira/browse/HIVE-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-9979: -- Assignee: Sergey Shelukhin (was: Prasanth Jayachandran) LLAP: LLAP Cached readers for StringDirectTreeReaders over-read data Key: HIVE-9979 URL: https://issues.apache.org/jira/browse/HIVE-9979 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Gopal V Assignee: Sergey Shelukhin When the cache is enabled, queries throw different over-read exceptions. Looks like the batchSize changes as you read data; the end-of-stripe batchSize is smaller than the default size (the super calls change it). {code} Caused by: java.io.EOFException: Can't finish byte read from uncompressed stream DATA position: 262144 length: 262144 range: 0 offset: 46399488 limit: 46399488 at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.commonReadByteArrays(RecordReaderImpl.java:1556) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.readOrcByteArrays(RecordReaderImpl.java:1569) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDirectTreeReader.nextVector(RecordReaderImpl.java:1691) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.nextVector(RecordReaderImpl.java:1517) at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:115) at org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:108) at org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:35) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:314) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:280) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:44) at 
org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37) ... 4 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9919) upgrade scripts don't work on some auto-created DBs due to absence of tables
[ https://issues.apache.org/jira/browse/HIVE-9919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367861#comment-14367861 ] Sergey Shelukhin commented on HIVE-9919: [~thejas] can you take a look at the new patch? thanks upgrade scripts don't work on some auto-created DBs due to absence of tables Key: HIVE-9919 URL: https://issues.apache.org/jira/browse/HIVE-9919 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-9919.01.patch, HIVE-9919.patch DataNucleus in its infinite wisdom doesn't create all tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9994) Hive query plan returns sensitive data to external applications
[ https://issues.apache.org/jira/browse/HIVE-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367882#comment-14367882 ] Hive QA commented on HIVE-9994: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12705411/HIVE-9994.1.patch {color:green}SUCCESS:{color} +1 7771 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3071/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3071/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3071/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12705411 - PreCommit-HIVE-TRUNK-Build Hive query plan returns sensitive data to external applications --- Key: HIVE-9994 URL: https://issues.apache.org/jira/browse/HIVE-9994 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9994.1.patch, HIVE-9994.2.patch, HIVE-9994.3.patch Some applications are using the getQueryString() method from the QueryPlan class to get the query that is being executed by Hive. The query string returned is not redacted, and it is returning sensitive information that is logged in Navigator. We need to return redacted data from the QueryPlan to prevent other applications from logging sensitive data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9994) Hive query plan returns sensitive data to external applications
[ https://issues.apache.org/jira/browse/HIVE-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367683#comment-14367683 ] Xuefu Zhang commented on HIVE-9994: --- Patch looks good. One question: do we need to check null for the input in redactLogString() as it's a public method? Hive query plan returns sensitive data to external applications --- Key: HIVE-9994 URL: https://issues.apache.org/jira/browse/HIVE-9994 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9994.1.patch Some applications are using the getQueryString() method from the QueryPlan class to get the query that is being executed by Hive. The query string returned is not redacted, and it is returning sensitive information that is logged in Navigator. We need to return redacted data from the QueryPlan to prevent other applications from logging sensitive data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
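The null-check Xuefu raises would look like the sketch below. Only the method name redactLogString comes from the comment; the signature, the null-guard placement, and the example password pattern are assumptions for illustration, not Hive's actual redactor.

```java
// Hypothetical null-safe redaction wrapper. A public entry point guards
// against null before delegating to the (illustrative) pattern rewrite.
public class RedactionSketch {
    public static String redactLogString(String query) {
        if (query == null) {
            return null; // guard: public callers may pass null
        }
        // Example pattern only: mask anything that looks like a password
        return query.replaceAll("(?i)(password\\s*=\\s*)\\S+", "$1***");
    }
}
```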
[jira] [Updated] (HIVE-9480) Build UDF TRUNC to implement FIRST_DAY as compared with LAST_DAY
[ https://issues.apache.org/jira/browse/HIVE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-9480: - Labels: (was: TODOC1.2) Build UDF TRUNC to implement FIRST_DAY as compared with LAST_DAY Key: HIVE-9480 URL: https://issues.apache.org/jira/browse/HIVE-9480 Project: Hive Issue Type: Improvement Components: UDF Reporter: Xiaobing Zhou Assignee: Xiaobing Zhou Fix For: 1.2.0 Attachments: HIVE-9480.1.patch, HIVE-9480.3.patch, HIVE-9480.4.patch, HIVE-9480.5.patch, HIVE-9480.6.patch, HIVE-9480.7.patch, HIVE-9480.8.patch, HIVE-9480.9.patch Hive already supports the LAST_DAY UDF; in some cases, FIRST_DAY is necessary to do date/timestamp-related computation. This JIRA is to track such an implementation. We chose to implement TRUNC, a more standard way to get the first day of a month, e.g., SELECT TRUNC('2009-12-12', 'MM'); will return 2009-12-01, SELECT TRUNC('2009-12-12', 'YEAR'); will return 2009-01-01. BTW, this TRUNC is not as feature-complete as the Oracle one: only 'MM' and 'YEAR' are supported as formats; however, it's a base on which to add other formats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
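The date semantics described in the ticket ('MM' truncates to the first day of the month, 'YEAR' to the first day of the year) boil down to a few lines of java.time logic. This is just the truncation rule, not the actual Hive UDF implementation:

```java
import java.time.LocalDate;

// Sketch of TRUNC's date logic as described in the ticket; only the
// 'MM' and 'YEAR' formats are handled, matching the initial patch scope.
public class TruncSketch {
    public static LocalDate trunc(LocalDate d, String fmt) {
        switch (fmt) {
            case "MM":
                return d.withDayOfMonth(1);  // first day of the month
            case "YEAR":
                return d.withDayOfYear(1);   // first day of the year
            default:
                throw new IllegalArgumentException("Unsupported format: " + fmt);
        }
    }
}
```

`trunc(LocalDate.parse("2009-12-12"), "MM")` yields 2009-12-01 and `"YEAR"` yields 2009-01-01, matching the examples in the description.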
[jira] [Updated] (HIVE-10002) fix yarn service registry not found in ut problem
[ https://issues.apache.org/jira/browse/HIVE-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-10002: -- Attachment: HIVE-10002.1.patch fix yarn service registry not found in ut problem - Key: HIVE-10002 URL: https://issues.apache.org/jira/browse/HIVE-10002 Project: Hive Issue Type: Sub-task Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-10002.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9956) use BigDecimal.valueOf instead of new in TestFileDump
[ https://issues.apache.org/jira/browse/HIVE-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366756#comment-14366756 ] Hive QA commented on HIVE-9956: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12705278/HIVE-9956.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7770 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3068/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3068/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3068/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12705278 - PreCommit-HIVE-TRUNK-Build use BigDecimal.valueOf instead of new in TestFileDump - Key: HIVE-9956 URL: https://issues.apache.org/jira/browse/HIVE-9956 Project: Hive Issue Type: Bug Components: File Formats Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: HIVE-9956.1.patch, HIVE-9956.1.patch TestFileDump builds a data row where one of the columns is a BigDecimal. The test adds value 2. There are 2 ways to create a BigDecimal object: 1. use new 2. use valueOf In this particular case: 1. new will create 2.222153 2. valueOf will use the canonical String representation and the result will be 2. 
Probably we should use valueOf to create the BigDecimal object. TestTimestampWritable and TestHCatStores use valueOf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
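The difference the ticket describes is easy to reproduce: the `BigDecimal(double)` constructor preserves the exact binary value of the double, while `valueOf` goes through `Double.toString` and yields the canonical decimal form. (2.2 is used below as a stand-in for the test's actual value.)

```java
import java.math.BigDecimal;

public class BigDecimalDemo {
    public static void main(String[] args) {
        // Constructor: exact binary expansion of the double (long tail of digits)
        System.out.println(new BigDecimal(2.2));
        // valueOf: canonical String representation via Double.toString
        System.out.println(BigDecimal.valueOf(2.2)); // prints 2.2
    }
}
```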
[jira] [Updated] (HIVE-10003) MiniTez ut fail with missing configs
[ https://issues.apache.org/jira/browse/HIVE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10003: --- Attachment: HIVE-10003.1.patch MiniTez ut fail with missing configs Key: HIVE-10003 URL: https://issues.apache.org/jira/browse/HIVE-10003 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gopal V Attachments: HIVE-10003.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10003) MiniTez ut fail with missing configs
[ https://issues.apache.org/jira/browse/HIVE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-10003: -- Issue Type: Sub-task (was: Bug) Parent: HIVE-7926 MiniTez ut fail with missing configs Key: HIVE-10003 URL: https://issues.apache.org/jira/browse/HIVE-10003 Project: Hive Issue Type: Sub-task Reporter: Gunther Hagleitner Assignee: Gopal V Attachments: HIVE-10003.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10004) yarn service registry should be shim'd
[ https://issues.apache.org/jira/browse/HIVE-10004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-10004: -- Issue Type: Sub-task (was: Bug) Parent: HIVE-7926 yarn service registry should be shim'd -- Key: HIVE-10004 URL: https://issues.apache.org/jira/browse/HIVE-10004 Project: Hive Issue Type: Sub-task Reporter: Gunther Hagleitner Assignee: Gopal V -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10004) yarn service registry should be an optional dependency
[ https://issues.apache.org/jira/browse/HIVE-10004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10004: --- Summary: yarn service registry should be an optional dependency (was: yarn service registry should be shim'd) yarn service registry should be an optional dependency -- Key: HIVE-10004 URL: https://issues.apache.org/jira/browse/HIVE-10004 Project: Hive Issue Type: Sub-task Reporter: Gunther Hagleitner Assignee: Gopal V -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9277) Hybrid Hybrid Grace Hash Join
[ https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-9277: Attachment: HIVE-9277.13.patch WIP, uploading 13th patch for testing Hybrid Hybrid Grace Hash Join - Key: HIVE-9277 URL: https://issues.apache.org/jira/browse/HIVE-9277 Project: Hive Issue Type: New Feature Components: Physical Optimizer Reporter: Wei Zheng Assignee: Wei Zheng Labels: join Attachments: HIVE-9277.01.patch, HIVE-9277.02.patch, HIVE-9277.03.patch, HIVE-9277.04.patch, HIVE-9277.05.patch, HIVE-9277.06.patch, HIVE-9277.07.patch, HIVE-9277.08.patch, HIVE-9277.13.patch, High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf We are proposing an enhanced hash join algorithm called _“hybrid hybrid grace hash join”_. We can benefit from this feature as illustrated below: * The query will not fail even if the estimated memory requirement is slightly wrong * Expensive garbage collection overhead can be avoided when hash table grows * Join execution using a Map join operator even though the small table doesn't fit in memory as spilling some data from the build and probe sides will still be cheaper than having to shuffle the large fact table The design was based on Hadoop’s parallel processing capability and significant amount of memory available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9980) LLAP: Pass additional JVM args via Slider appConfig
[ https://issues.apache.org/jira/browse/HIVE-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-9980: -- Attachment: HIVE-9980.3.patch LLAP: Pass additional JVM args via Slider appConfig --- Key: HIVE-9980 URL: https://issues.apache.org/jira/browse/HIVE-9980 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Gopal V Assignee: Gopal V Priority: Minor Attachments: HIVE-9980.1.patch, HIVE-9980.2.patch, HIVE-9980.3.patch For profiling, JMX remote ports and to attach debuggers to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10003) MiniTez ut fail with missing configs
[ https://issues.apache.org/jira/browse/HIVE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10003: --- Fix Version/s: llap MiniTez ut fail with missing configs Key: HIVE-10003 URL: https://issues.apache.org/jira/browse/HIVE-10003 Project: Hive Issue Type: Sub-task Reporter: Gunther Hagleitner Assignee: Siddharth Seth Fix For: llap Attachments: HIVE-10003.1.patch, HIVE-10003.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9277) Hybrid Hybrid Grace Hash Join
[ https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368273#comment-14368273 ] Hive QA commented on HIVE-9277: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12705464/HIVE-9277.13.patch {color:green}SUCCESS:{color} +1 7772 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3074/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3074/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3074/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12705464 - PreCommit-HIVE-TRUNK-Build Hybrid Hybrid Grace Hash Join - Key: HIVE-9277 URL: https://issues.apache.org/jira/browse/HIVE-9277 Project: Hive Issue Type: New Feature Components: Physical Optimizer Reporter: Wei Zheng Assignee: Wei Zheng Labels: join Attachments: HIVE-9277.01.patch, HIVE-9277.02.patch, HIVE-9277.03.patch, HIVE-9277.04.patch, HIVE-9277.05.patch, HIVE-9277.06.patch, HIVE-9277.07.patch, HIVE-9277.08.patch, HIVE-9277.13.patch, High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf We are proposing an enhanced hash join algorithm called _“hybrid hybrid grace hash join”_. 
We can benefit from this feature as illustrated below: * The query will not fail even if the estimated memory requirement is slightly wrong * Expensive garbage collection overhead can be avoided when hash table grows * Join execution using a Map join operator even though the small table doesn't fit in memory as spilling some data from the build and probe sides will still be cheaper than having to shuffle the large fact table The design was based on Hadoop’s parallel processing capability and significant amount of memory available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
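The spill-instead-of-fail behavior the description lists can be sketched roughly as follows. This is illustrative Java only: the partition count, the in-memory/spilled split, and all names are assumptions, not Hive's implementation; in the real algorithm spilled build and probe partitions go to disk and are joined in a later pass.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of grace-style partitioning behind hybrid hash join:
// build rows are hash-partitioned; some partitions stay in memory, the
// rest are "spilled" (here just buffered) rather than failing the query.
class GraceHashJoinSketch {
    static final int NUM_PARTITIONS = 4;
    static final int IN_MEMORY = 2; // partitions kept as hash tables

    final List<Map<Integer, String>> inMem = new ArrayList<>();
    final List<List<int[]>> spilledBuild = new ArrayList<>(); // stand-in for disk

    GraceHashJoinSketch(List<int[]> buildRows) { // each row: [key, valueId]
        for (int i = 0; i < NUM_PARTITIONS; i++) {
            inMem.add(new HashMap<>());
            spilledBuild.add(new ArrayList<>());
        }
        for (int[] row : buildRows) {
            int p = Math.floorMod(row[0], NUM_PARTITIONS);
            if (p < IN_MEMORY) {
                inMem.get(p).put(row[0], "b" + row[1]); // joinable now
            } else {
                spilledBuild.get(p).add(row);           // joined in a later pass
            }
        }
    }

    // Probe rows hitting an in-memory partition join immediately; rows
    // for spilled partitions are deferred instead of aborting the query.
    String probe(int key) {
        int p = Math.floorMod(key, NUM_PARTITIONS);
        if (p < IN_MEMORY) {
            return inMem.get(p).get(key);
        }
        return "DEFERRED";
    }
}
```

This is why a slightly wrong memory estimate only costs a second pass over the spilled partitions rather than a failed query or a full shuffle of the fact table.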
[jira] [Commented] (HIVE-9980) LLAP: Pass additional JVM args via Slider appConfig
[ https://issues.apache.org/jira/browse/HIVE-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368069#comment-14368069 ] Gopal V commented on HIVE-9980: --- Fixed --help as well. {code}
usage: llap
 -a,--args args             java arguments to the llap instance
 -d,--directory directory   Temp directory for jars etc.
 -H,--help                  Print help information
 -i,--instances instances   Specify the number of instances to run this on
 -n,--name name             Cluster name for YARN registry
{code} The only crucial detail is that to prevent the shell script from parsing the args themselves, they need to be escaped in quotes with an extra prefix space. {code} ./dist/hive/bin/hive --service llap --instances 14 --name llap1 --args " -agentpath:/opt/perf/libperfmap.so" {code} LLAP: Pass additional JVM args via Slider appConfig --- Key: HIVE-9980 URL: https://issues.apache.org/jira/browse/HIVE-9980 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Gopal V Assignee: Gopal V Priority: Minor Attachments: HIVE-9980.1.patch, HIVE-9980.2.patch For profiling, JMX remote ports and to attach debuggers to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9675) Support START TRANSACTION/COMMIT/ROLLBACK commands
[ https://issues.apache.org/jira/browse/HIVE-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-9675: - Attachment: HIVE-9675.6.patch partial implementation; checkpoint to run full test suite Support START TRANSACTION/COMMIT/ROLLBACK commands -- Key: HIVE-9675 URL: https://issues.apache.org/jira/browse/HIVE-9675 Project: Hive Issue Type: Bug Components: SQL, Transactions Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-9675.6.patch Hive 0.14 added support for insert/update/delete statements with ACID semantics. Hive 0.14 only supports auto-commit mode. We need to add support for START TRANSACTION/COMMIT/ROLLBACK commands so that the user can demarcate transaction boundaries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others
[ https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368156#comment-14368156 ] Hive QA commented on HIVE-7018: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12705423/HIVE-7018.2.patch {color:green}SUCCESS:{color} +1 7771 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3073/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3073/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3073/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12705423 - PreCommit-HIVE-TRUNK-Build Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others - Key: HIVE-7018 URL: https://issues.apache.org/jira/browse/HIVE-7018 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Yongzhi Chen Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch It appears that at least postgres and oracle do not have the LINK_TARGET_ID column while mysql does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9990) TestMultiSessionsHS2WithLocalClusterSpark is failing
[ https://issues.apache.org/jira/browse/HIVE-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu reassigned HIVE-9990: -- Assignee: Ferdinand Xu TestMultiSessionsHS2WithLocalClusterSpark is failing Key: HIVE-9990 URL: https://issues.apache.org/jira/browse/HIVE-9990 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 1.2.0 Reporter: Xuefu Zhang Assignee: Ferdinand Xu At least sometimes. I can reproduce it with mvn test -Dtest=TestMultiSessionsHS2WithLocalClusterSpark -Phadoop-2 consistently on my local box (both trunk and spark branch). {code} --- T E S T S --- Running org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 92.438 sec FAILURE! - in org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark testSparkQuery(org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark) Time elapsed: 21.514 sec ERROR! java.util.concurrent.ExecutionException: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296) at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:392) at org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.verifyResult(TestMultiSessionsHS2WithLocalClusterSpark.java:244) at org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testKvQuery(TestMultiSessionsHS2WithLocalClusterSpark.java:220) at org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.access$000(TestMultiSessionsHS2WithLocalClusterSpark.java:53) {code} The error was also seen in HIVE-9934 test run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10013) NPE in LLAP logs in heartbeat
[ https://issues.apache.org/jira/browse/HIVE-10013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10013: Issue Type: Sub-task (was: Bug) Parent: HIVE-7926 NPE in LLAP logs in heartbeat - Key: HIVE-10013 URL: https://issues.apache.org/jira/browse/HIVE-10013 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Siddharth Seth {noformat} 2015-03-18 17:28:37,559 [TezTaskRunner_attempt_1424502260528_1294_1_00_25_0(container_1_1294_01_26_sershe_20150318172752_5ce4647e-177c-4b1e-8dfa-462230735854:1_Map 1_25_0)] INFO org.apache.tez.runtime.task.TezTaskRunner: Encounted an error while executing task: attempt_1424502260528_1294_1_00_25_0 java.lang.NullPointerException at org.apache.tez.runtime.task.TaskReporter$HeartbeatCallable.access$400(TaskReporter.java:120) at org.apache.tez.runtime.task.TaskReporter.addEvents(TaskReporter.java:386) at org.apache.tez.runtime.task.TezTaskRunner.addEvents(TezTaskRunner.java:278) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.sendTaskGeneratedEvents(LogicalIOProcessorRuntimeTask.java:596) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.close(LogicalIOProcessorRuntimeTask.java:355) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-03-18 17:28:37,559 [TezTaskRunner_attempt_1424502260528_1294_1_00_25_0(container_1_1294_01_26_sershe_20150318172752_5ce4647e-177c-4b1e-8dfa-462230735854:1_Map 1_25_0)] INFO org.apache.tez.runtime.task.TezTaskRunner: Ignoring the following exception since a previous exception is already registered java.lang.NullPointerException at org.apache.tez.runtime.task.TaskReporter$HeartbeatCallable.access$300(TaskReporter.java:120) at org.apache.tez.runtime.task.TaskReporter.taskFailed(TaskReporter.java:382) at org.apache.tez.runtime.task.TezTaskRunner.sendFailure(TezTaskRunner.java:260) at org.apache.tez.runtime.task.TezTaskRunner.access$600(TezTaskRunner.java:52) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:227) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10006) RSC has memory leak while execute multi queries.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-10006: - Attachment: HIVE-10006.1-spark.patch RSC has memory leak while execute multi queries.[Spark Branch] -- Key: HIVE-10006 URL: https://issues.apache.org/jira/browse/HIVE-10006 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 1.1.0 Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Critical Labels: Spark-M5 Attachments: HIVE-10006.1-spark.patch While executing queries with RSC, the MapWork/ReduceWork count increases all the time, which leads to OOM at the end. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10003) MiniTez ut fail with missing configs
[ https://issues.apache.org/jira/browse/HIVE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10003: --- Assignee: Gunther Hagleitner (was: Siddharth Seth) MiniTez ut fail with missing configs Key: HIVE-10003 URL: https://issues.apache.org/jira/browse/HIVE-10003 Project: Hive Issue Type: Sub-task Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: llap Attachments: HIVE-10003.1.patch, HIVE-10003.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10011) LLAP: NegativeArraySize exception on vector string reader
[ https://issues.apache.org/jira/browse/HIVE-10011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10011: --- Summary: LLAP: NegativeArraySize exception on vector string reader (was: LLAP: NegativeArraySize exception on some vector string reader) LLAP: NegativeArraySize exception on vector string reader - Key: HIVE-10011 URL: https://issues.apache.org/jira/browse/HIVE-10011 Project: Hive Issue Type: Sub-task Reporter: Gopal V With some logging, I confirmed that the String length vectors contained junk data; the length field is overflowing. {code} Caused by: java.lang.NegativeArraySizeException at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.commonReadByteArrays(RecordReaderImpl.java:1550) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.readOrcByteArrays(RecordReaderImpl.java:1569) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDirectTreeReader.nextVector(RecordReaderImpl.java:1691) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.nextVector(RecordReaderImpl.java:1517) at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:115) at org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:108) at org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:35) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:314) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:272) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:44) at org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37) ... 4 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9984) JoinReorder's getOutputSize is exponential
[ https://issues.apache.org/jira/browse/HIVE-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-9984: - Fix Version/s: 1.2.0 JoinReorder's getOutputSize is exponential -- Key: HIVE-9984 URL: https://issues.apache.org/jira/browse/HIVE-9984 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gopal V Fix For: 1.2.0 Attachments: HIVE-9984.1.patch, HIVE-9984.2.patch Found by [~mmokhtar]. Causes major issues in large plans (50+ joins). A simple fix would be to memoize the recursion. There should also be a flag to switch this optimization off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
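The memoization fix proposed above can be sketched as follows. This is an illustrative, self-contained example: the JoinNode class, the estimate method, and the worst-case cross-product cost model are assumptions of the sketch, not Hive's actual JoinReorder code.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Illustrative sketch of memoizing a recursive output-size estimate over a
// join tree. All names here (JoinNode, estimate) and the worst-case
// cross-product cost model are hypothetical, not Hive's actual code.
public class JoinSizeEstimator {
    public static final class JoinNode {
        final JoinNode left, right; // null for a leaf (base table)
        final long baseRows;        // row count for a leaf
        public JoinNode(long baseRows) { this.left = null; this.right = null; this.baseRows = baseRows; }
        public JoinNode(JoinNode left, JoinNode right) { this.left = left; this.right = right; this.baseRows = 0; }
    }

    // Memo keyed by node identity: each subtree is costed at most once.
    private final Map<JoinNode, Long> memo = new IdentityHashMap<>();

    // Without the memo, an estimator that recurses into both children on
    // every call repeats work for every enclosing join, which blows up in
    // deep plans (the 50+ join case mentioned above).
    public long estimate(JoinNode n) {
        Long cached = memo.get(n);
        if (cached != null) return cached;
        long size = (n.left == null)
            ? n.baseRows
            : estimate(n.left) * estimate(n.right); // toy worst-case model
        memo.put(n, size);
        return size;
    }
}
```

With the memo, each subtree is costed once per estimator instance instead of once per visit.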
[jira] [Resolved] (HIVE-10003) MiniTez ut fail with missing configs
[ https://issues.apache.org/jira/browse/HIVE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner resolved HIVE-10003. --- Resolution: Fixed Turned off LLAP mode in Tez unit tests for now. Also committed Gopal's changes to how we set up the Tez env. MiniTez ut fail with missing configs Key: HIVE-10003 URL: https://issues.apache.org/jira/browse/HIVE-10003 Project: Hive Issue Type: Sub-task Reporter: Gunther Hagleitner Assignee: Siddharth Seth Attachments: HIVE-10003.1.patch, HIVE-10003.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10005) remove some unnecessary branches from the inner loop
[ https://issues.apache.org/jira/browse/HIVE-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-10005: -- Attachment: HIVE-10005.2.patch remove some unnecessary branches from the inner loop Key: HIVE-10005 URL: https://issues.apache.org/jira/browse/HIVE-10005 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-10005.1.patch, HIVE-10005.2.patch Operator.forward is doing too much. There's no reason to do the done checking per row and update it inline. It's much more efficient to just do that when the event that completes an operator happens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10005) remove some unnecessary branches from the inner loop
[ https://issues.apache.org/jira/browse/HIVE-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368229#comment-14368229 ] Gunther Hagleitner commented on HIVE-10005: --- [~gopalv] can you take a look? remove some unnecessary branches from the inner loop Key: HIVE-10005 URL: https://issues.apache.org/jira/browse/HIVE-10005 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-10005.1.patch, HIVE-10005.2.patch Operator.forward is doing too much. There's no reason to do the done checking per row and update it inline. It's much more efficient to just do that when the event that completes an operator happens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9997) minor tweaks for bytes mapjoin hash table
[ https://issues.apache.org/jira/browse/HIVE-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368066#comment-14368066 ] Ashutosh Chauhan commented on HIVE-9997: +1 minor tweaks for bytes mapjoin hash table - Key: HIVE-9997 URL: https://issues.apache.org/jira/browse/HIVE-9997 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-9997.patch From HIVE-7617 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9980) LLAP: Pass additional JVM args via Slider appConfig
[ https://issues.apache.org/jira/browse/HIVE-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-9980: -- Attachment: HIVE-9980.2.patch LLAP: Pass additional JVM args via Slider appConfig --- Key: HIVE-9980 URL: https://issues.apache.org/jira/browse/HIVE-9980 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Gopal V Assignee: Gopal V Priority: Minor Attachments: HIVE-9980.1.patch, HIVE-9980.2.patch For profiling, JMX remote ports and to attach debuggers to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9979) LLAP: LLAP Cached readers for StringDirectTreeReaders over-read data
[ https://issues.apache.org/jira/browse/HIVE-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-9979: --- Attachment: HIVE-9979.patch Preliminary patch with some fixes. LLAP: LLAP Cached readers for StringDirectTreeReaders over-read data Key: HIVE-9979 URL: https://issues.apache.org/jira/browse/HIVE-9979 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Gopal V Assignee: Sergey Shelukhin Attachments: HIVE-9979.patch When the cache is enabled, queries throw different over-read exceptions. Looks like the batchSize changes as you read data; the end-of-stripe batchSize is smaller than the default size (the super calls change it). {code} Caused by: java.io.EOFException: Can't finish byte read from uncompressed stream DATA position: 262144 length: 262144 range: 0 offset: 46399488 limit: 46399488 at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.commonReadByteArrays(RecordReaderImpl.java:1556) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.readOrcByteArrays(RecordReaderImpl.java:1569) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDirectTreeReader.nextVector(RecordReaderImpl.java:1691) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.nextVector(RecordReaderImpl.java:1517) at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:115) at org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:108) at org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:35) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:314) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:280) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:44) at 
org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37) ... 4 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10006) RSC has memory leak while execute multi queries.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-10006: - Attachment: HIVE-10006.2-spark.patch RSC has memory leak while execute multi queries.[Spark Branch] -- Key: HIVE-10006 URL: https://issues.apache.org/jira/browse/HIVE-10006 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 1.1.0 Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Critical Labels: Spark-M5 Attachments: HIVE-10006.1-spark.patch, HIVE-10006.2-spark.patch While executing queries with RSC, the MapWork/ReduceWork number increases all the time, and leads to OOM at the end. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10005) remove some unnecessary branches from the inner loop
[ https://issues.apache.org/jira/browse/HIVE-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368467#comment-14368467 ] Hive QA commented on HIVE-10005: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12705483/HIVE-10005.2.patch {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 7771 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_decode_name org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_special_char org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppr_pushdown org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqual_corr_expr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_json_tuple org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_parse_url_tuple org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union35 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_elt org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_windowing {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3076/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3076/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3076/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12705483 - PreCommit-HIVE-TRUNK-Build remove some unnecessary branches from the inner loop Key: HIVE-10005 URL: https://issues.apache.org/jira/browse/HIVE-10005 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-10005.1.patch, HIVE-10005.2.patch Operator.forward is doing too much. There's no reason to do the done checking per row and update it inline. It's much more efficient to just do that when the event that completes an operator happens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9970) Hive on spark
[ https://issues.apache.org/jira/browse/HIVE-9970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368526#comment-14368526 ] Amithsha commented on HIVE-9970: The MySQL error was solved after updating the version, but Hive on Spark is still in an error state. Hive version 1.1.0, Spark 1.3.0. ERROR : Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)' org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client. at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:57) at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:116) at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:113) at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:95) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88) at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75) Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out waiting for client connection. at com.google.common.base.Throwables.propagate(Throwables.java:156) at org.apache.hive.spark.client.SparkClientImpl.init(SparkClientImpl.java:104) at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80) at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.init(RemoteHiveSparkClient.java:88) at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:58) at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55) ... 6 more Caused by: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out waiting for client connection. 
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37) at org.apache.hive.spark.client.SparkClientImpl.init(SparkClientImpl.java:94) ... 10 more Caused by: java.util.concurrent.TimeoutException: Timed out waiting for client connection. at org.apache.hive.spark.client.rpc.RpcServer$2.run(RpcServer.java:134) at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:123) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) at java.lang.Thread.run(Thread.java:744) ERROR : Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)' org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client. at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:57) at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:116) at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:113) at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:95) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88) at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75) Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out waiting for client connection. 
at com.google.common.base.Throwables.propagate(Throwables.java:156) at org.apache.hive.spark.client.SparkClientImpl.init(SparkClientImpl.java:104) at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80) at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.init(RemoteHiveSparkClient.java:88) at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:58) at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55) ... 6 more Caused by: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out waiting for client connection. at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37) at org.apache.hive.spark.client.SparkClientImpl.init(SparkClientImpl.java:94)
[jira] [Updated] (HIVE-10005) remove some unnecessary branches from the inner loop
[ https://issues.apache.org/jira/browse/HIVE-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-10005: -- Attachment: HIVE-10005.3.patch remove some unnecessary branches from the inner loop Key: HIVE-10005 URL: https://issues.apache.org/jira/browse/HIVE-10005 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-10005.1.patch, HIVE-10005.2.patch, HIVE-10005.3.patch Operator.forward is doing too much. There's no reason to do the done checking per row and update it inline. It's much more efficient to just do that when the event that completes an operator happens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
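The idea in HIVE-10005 — stop testing a done flag on every forwarded row, and instead react once to the event that completes an operator — can be sketched with a toy class. Everything below (Forwarder, onChildDone) is hypothetical and only illustrates the branch-hoisting pattern, not Hive's actual Operator.forward.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not Hive's actual Operator class) of moving a
// per-row "done" branch out of the inner forward loop. Instead of testing
// a flag on every row, the completion event prunes the child list once.
public class Forwarder {
    private final List<Forwarder> children = new ArrayList<>();
    long rowsSeen = 0;

    public void addChild(Forwarder child) { children.add(child); }

    // Hot path: no branch on a done flag per row; every child still in the
    // list is live by construction.
    public void forward(Object row) {
        rowsSeen++;
        for (Forwarder child : children) {
            child.forward(row);
        }
    }

    // Cold path: when a child signals completion, remove it once, rather
    // than re-checking child state on every forwarded row.
    public void onChildDone(Forwarder child) {
        children.remove(child);
    }
}
```

The per-row cost drops to a plain list walk; the branch runs only as often as operators complete.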
[jira] [Updated] (HIVE-9994) Hive query plan returns sensitive data to external applications
[ https://issues.apache.org/jira/browse/HIVE-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9994: -- Attachment: HIVE-9994.3.patch Hive query plan returns sensitive data to external applications --- Key: HIVE-9994 URL: https://issues.apache.org/jira/browse/HIVE-9994 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9994.1.patch, HIVE-9994.2.patch, HIVE-9994.3.patch Some applications are using the getQueryString() method from the QueryPlan class to get the query that is being executed by Hive. The query string returned is not redacted, and it is returning sensitive information that is logged in Navigator. We need to return redacted data from the QueryPlan to prevent other applications from logging sensitive data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
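For illustration, redacting literals before the query string leaves the plan can be as simple as a pattern rewrite. The class below is a hypothetical sketch, not the attached patch (Hive's actual redaction logic is not shown in this thread); it only masks single-quoted string literals.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of redacting string literals from a query before it
// is handed to external consumers (e.g. anything calling a getter like
// getQueryString()). Real redaction would be configurable; this only
// masks single-quoted literals.
public class QueryRedactor {
    private static final Pattern STRING_LITERAL = Pattern.compile("'[^']*'");

    public static String redact(String query) {
        // Replace every quoted literal with a fixed mask.
        return STRING_LITERAL.matcher(query).replaceAll("'***'");
    }
}
```

Returning the redacted form from the plan means downstream loggers never see the sensitive values.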
[jira] [Updated] (HIVE-10009) LazyObjectInspectorFactory is not thread safe [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-10009: --- Fix Version/s: spark-branch LazyObjectInspectorFactory is not thread safe [Spark Branch] Key: HIVE-10009 URL: https://issues.apache.org/jira/browse/HIVE-10009 Project: Hive Issue Type: Bug Affects Versions: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch LazyObjectInspectorFactory is not thread safe, which causes random failures in a multi-threaded environment such as Hive on Spark. We got exceptions like below {noformat} java.lang.RuntimeException: Map operator initialization failed: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:55) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170) at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634) at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:154) at org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:199) at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:355) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:92) ... 16 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10009) LazyObjectInspectorFactory is not thread safe [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-10009: --- Attachment: HIVE-10009.1-spark.patch LazyObjectInspectorFactory is not thread safe [Spark Branch] Key: HIVE-10009 URL: https://issues.apache.org/jira/browse/HIVE-10009 Project: Hive Issue Type: Bug Affects Versions: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10009.1-spark.patch LazyObjectInspectorFactory is not thread safe, which causes random failures in a multi-threaded environment such as Hive on Spark. We got exceptions like below {noformat} java.lang.RuntimeException: Map operator initialization failed: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:55) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170) at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634) at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:154) at org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:199) at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:355) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:92) ... 16 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
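The general fix pattern for a non-thread-safe caching factory like this — whatever the attached patch actually does — is to make the lookup-or-create step atomic. A generic sketch using ConcurrentHashMap.computeIfAbsent, with hypothetical names:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative fix pattern, not the actual Hive patch: a factory that
// caches one instance per key. A plain HashMap here races under
// concurrent callers (e.g. parallel Spark tasks) and can hand back a
// partially published or wrong instance; ConcurrentHashMap with
// computeIfAbsent makes lookup-or-create atomic.
public class CachedFactory<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> creator;

    public CachedFactory(Function<K, V> creator) {
        this.creator = creator;
    }

    public V get(K key) {
        // Atomic: at most one creator call per key, even under contention.
        return cache.computeIfAbsent(key, creator);
    }
}
```

Every caller then sees the same fully constructed instance for a given key, which is what an object-inspector cache needs.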
[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others
[ https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367648#comment-14367648 ] Yongzhi Chen commented on HIVE-7018: This column was introduced in 0.10.0 only for MySQL; this inconsistency worried some customers upgrading from 0.9 versions. The side effect, beyond the discomfort, is some migration issues, as Chaoyu said. For better supportability, we should fix it in future releases. Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others - Key: HIVE-7018 URL: https://issues.apache.org/jira/browse/HIVE-7018 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Yongzhi Chen Attachments: HIVE-7018.1.patch It appears that at least postgres and oracle do not have the LINK_TARGET_ID column while mysql does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9994) Hive query plan returns sensitive data to external applications
[ https://issues.apache.org/jira/browse/HIVE-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9994: -- Attachment: HIVE-9994.1.patch Hive query plan returns sensitive data to external applications --- Key: HIVE-9994 URL: https://issues.apache.org/jira/browse/HIVE-9994 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9994.1.patch Some applications are using the getQueryString() method from the QueryPlan class to get the query that is being executed by Hive. The query string returned is not redacted, and it is returning sensitive information that is logged in Navigator. We need to return redacted data from the QueryPlan to prevent other applications from logging sensitive data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9919) upgrade scripts don't work on some auto-created DBs due to absence of tables
[ https://issues.apache.org/jira/browse/HIVE-9919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368032#comment-14368032 ] Thejas M Nair commented on HIVE-9919: - +1 upgrade scripts don't work on some auto-created DBs due to absence of tables Key: HIVE-9919 URL: https://issues.apache.org/jira/browse/HIVE-9919 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-9919.01.patch, HIVE-9919.patch DataNucleus in its infinite wisdom doesn't create all tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9980) LLAP: Pass additional JVM args via Slider appConfig
[ https://issues.apache.org/jira/browse/HIVE-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-9980: -- Issue Type: Sub-task (was: Bug) Parent: HIVE-7926 LLAP: Pass additional JVM args via Slider appConfig --- Key: HIVE-9980 URL: https://issues.apache.org/jira/browse/HIVE-9980 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Gopal V Assignee: Gopal V Priority: Minor For profiling, JMX remote ports and to attach debuggers to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9980) LLAP: Pass additional JVM args via Slider appConfig
[ https://issues.apache.org/jira/browse/HIVE-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-9980: -- Attachment: HIVE-9980.1.patch LLAP: Pass additional JVM args via Slider appConfig --- Key: HIVE-9980 URL: https://issues.apache.org/jira/browse/HIVE-9980 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Gopal V Assignee: Gopal V Priority: Minor Attachments: HIVE-9980.1.patch For profiling, JMX remote ports and to attach debuggers to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10009) LazyObjectInspectorFactory is not thread safe [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367757#comment-14367757 ] Hive QA commented on HIVE-10009: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12705399/HIVE-10009.1-spark.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7644 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union31 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/789/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/789/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-789/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12705399 - PreCommit-HIVE-SPARK-Build LazyObjectInspectorFactory is not thread safe [Spark Branch] Key: HIVE-10009 URL: https://issues.apache.org/jira/browse/HIVE-10009 Project: Hive Issue Type: Bug Affects Versions: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10009.1-spark.patch LazyObjectInspectorFactory is not thread safe, which causes random failures in a multi-threaded environment such as Hive on Spark. 
We got exceptions like below {noformat} java.lang.RuntimeException: Map operator initialization failed: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:55) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170) at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634) at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:154) at org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:199) at 
org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:355) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:92) ... 16 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9994) Hive query plan returns sensitive data to external applications
[ https://issues.apache.org/jira/browse/HIVE-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367771#comment-14367771 ] Xuefu Zhang commented on HIVE-9994: --- +1 Hive query plan returns sensitive data to external applications --- Key: HIVE-9994 URL: https://issues.apache.org/jira/browse/HIVE-9994 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9994.1.patch, HIVE-9994.2.patch, HIVE-9994.3.patch Some applications are using the getQueryString() method from the QueryPlan class to get the query that is being executed by Hive. The query string returned is not redacted, and it is returning sensitive information that is logged in Navigator. We need to return redacted data from the QueryPlan to prevent other applications from logging sensitive data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others
[ https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-7018: --- Attachment: HIVE-7018.2.patch Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others - Key: HIVE-7018 URL: https://issues.apache.org/jira/browse/HIVE-7018 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Yongzhi Chen Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch It appears that at least postgres and oracle do not have the LINK_TARGET_ID column while mysql does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9979) LLAP: LLAP Cached readers for StringDirectTreeReaders over-read data
[ https://issues.apache.org/jira/browse/HIVE-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367927#comment-14367927 ] Sergey Shelukhin commented on HIVE-9979: I'm not familiar enough with the code yet, but batchSize can be 0 here (by mistake probably). Not sure if it could be the cause. LLAP: LLAP Cached readers for StringDirectTreeReaders over-read data Key: HIVE-9979 URL: https://issues.apache.org/jira/browse/HIVE-9979 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Gopal V Assignee: Sergey Shelukhin When the cache is enabled, queries throw different over-read exceptions. Looks like the batchSize changes as you read data; the end-of-stripe batchSize is smaller than the default size (the super calls change it). {code} Caused by: java.io.EOFException: Can't finish byte read from uncompressed stream DATA position: 262144 length: 262144 range: 0 offset: 46399488 limit: 46399488 at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.commonReadByteArrays(RecordReaderImpl.java:1556) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.readOrcByteArrays(RecordReaderImpl.java:1569) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDirectTreeReader.nextVector(RecordReaderImpl.java:1691) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.nextVector(RecordReaderImpl.java:1517) at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:115) at org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:108) at org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:35) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:314) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:280) at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:44) at org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37) ... 4 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
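A minimal, self-contained sketch of the failure mode described above (the names here are illustrative, not Hive's actual reader API): a reader that keeps consuming the default batch size runs past the end of the stream once the stripe's last batch is smaller than the default, which is consistent with the EOFException at the stripe boundary. Clamping each batch to the rows actually remaining avoids the over-read, mirroring how the super calls shrink batchSize at end of stripe.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchClampSketch {
    // Plans the batch sizes used to read `stripeRows` rows with a given
    // default batch size. The clamp on the final iteration is the key:
    // without it, the last read would consume a full default batch and
    // run past the end of the stripe's data stream.
    static List<Integer> planBatches(long stripeRows, int defaultBatchSize) {
        List<Integer> batches = new ArrayList<>();
        long read = 0;
        while (read < stripeRows) {
            // Shrink the batch to the rows remaining in the stripe.
            int batch = (int) Math.min(defaultBatchSize, stripeRows - read);
            batches.add(batch);
            read += batch;
        }
        return batches;
    }

    public static void main(String[] args) {
        // 10,000 rows with a default batch of 1024: nine full batches,
        // then a short tail batch of 784 rows.
        List<Integer> batches = planBatches(10_000, 1024);
        System.out.println(batches.get(batches.size() - 1)); // prints 784
    }
}
```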
[jira] [Commented] (HIVE-9979) LLAP: LLAP Cached readers for StringDirectTreeReaders over-read data
[ https://issues.apache.org/jira/browse/HIVE-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367936#comment-14367936 ]

Sergey Shelukhin commented on HIVE-9979:

Never mind; it looks like that would be handled where it fails.
[jira] [Commented] (HIVE-9956) use BigDecimal.valueOf instead of new in TestFileDump
[ https://issues.apache.org/jira/browse/HIVE-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367945#comment-14367945 ]

Alexander Pivovarov commented on HIVE-9956:

I think neither of the failed tests is related to patch #1.

use BigDecimal.valueOf instead of new in TestFileDump
Key: HIVE-9956
URL: https://issues.apache.org/jira/browse/HIVE-9956
Project: Hive
Issue Type: Bug
Components: File Formats
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor
Attachments: HIVE-9956.1.patch, HIVE-9956.1.patch

TestFileDump builds a data row in which one of the columns is a BigDecimal; the test adds the value 2. There are two ways to create the BigDecimal object:
1. use new
2. use valueOf

In this particular case:
1. new will create 2.222153
2. valueOf will use the canonical String representation, and the result will be 2.

We should probably use valueOf to create the BigDecimal object; TestTimestampWritable and TestHCatStores use valueOf.
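The constructor-vs-valueOf difference is easy to demonstrate. The value 2.2222 below is illustrative (the exact value from TestFileDump is truncated in this archive); the behavior shown is standard Java: new BigDecimal(double) captures the double's exact binary expansion, while BigDecimal.valueOf(double) goes through Double.toString and yields the canonical short form.

```java
import java.math.BigDecimal;

public class BigDecimalCtorVsValueOf {
    public static void main(String[] args) {
        double d = 2.2222; // illustrative value; not exactly representable in binary

        // The double constructor preserves the exact binary expansion of d,
        // producing a long decimal that is close to, but not exactly, 2.2222.
        BigDecimal viaNew = new BigDecimal(d);

        // valueOf uses Double.toString's canonical (shortest) representation.
        BigDecimal viaValueOf = BigDecimal.valueOf(d);

        System.out.println(viaNew);      // long expansion, not exactly 2.2222
        System.out.println(viaValueOf);  // prints "2.2222"
    }
}
```

This is why a dump-and-compare test like TestFileDump is sensitive to the choice: the constructor's expansion never round-trips to the literal written in the expected output, while valueOf does.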