Review Request 24962: HIVE-7730: Extend ReadEntity to add accessed columns from query
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24962/ ---

Review request for hive, Prasad Mujumdar and Szehon Ho.

Repository: hive-git

Description
---
External authorization models cannot get accessed columns from the query. Hive should store accessed columns in ReadEntity.

Diffs
---
ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java 7ed50b4
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b05d3b4

Diff: https://reviews.apache.org/r/24962/diff/

Testing
---

Thanks,
Xiaomeng Huang
[jira] [Created] (HIVE-7847) query orc partitioned table fail when table column type change
Zhichun Wu created HIVE-7847:
--------------------------------

Summary: query orc partitioned table fail when table column type change
Key: HIVE-7847
URL: https://issues.apache.org/jira/browse/HIVE-7847
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 0.11.0, 0.12.0, 0.13.0
Reporter: Zhichun Wu
Assignee: Zhichun Wu
Fix For: 0.14.0

I use the following script to test ORC column type change with a partitioned table on branch-0.13:

{code}
use test;
DROP TABLE if exists orc_change_type_staging;
DROP TABLE if exists orc_change_type;
CREATE TABLE orc_change_type_staging (
  id int
);
CREATE TABLE orc_change_type (
  id int
) PARTITIONED BY (`dt` string) stored as orc;
--- load staging table
LOAD DATA LOCAL INPATH '../hive/examples/files/int.txt' OVERWRITE INTO TABLE orc_change_type_staging;
--- populate orc hive table
INSERT OVERWRITE TABLE orc_change_type partition(dt='20140718') select * FROM orc_change_type_staging limit 1;
--- change column id from int to bigint
ALTER TABLE orc_change_type CHANGE id id bigint;
INSERT OVERWRITE TABLE orc_change_type partition(dt='20140719') select * FROM orc_change_type_staging limit 1;
SELECT id FROM orc_change_type where dt between '20140718' and '20140719';
{code}

It fails in the last query (SELECT id FROM orc_change_type where dt between '20140718' and '20140719') with the exception:

{code}
Error: java.io.IOException: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:256)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:171)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:197)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:183)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:344)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:122)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:254)
    ... 11 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:717)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1788)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2997)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:153)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:127)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
    ... 15 more
{code}

The value object is reused each time we deserialize a row, so reading fails when we start to process the next path with a different schema. Resetting the value object each time we finish reading one path solves this problem.
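The failure mode described in the report can be sketched in plain Java. This is an illustrative sketch, not Hive's actual classes: IntBox and LongBox are hypothetical stand-ins for Hadoop's IntWritable/LongWritable, showing why reusing a typed value object across files with different column schemas throws a ClassCastException, and why resetting the value per path avoids it.

```java
// Hypothetical sketch of the value-object reuse bug (names are illustrative).
public class ValueReuseSketch {
    // Stand-ins for org.apache.hadoop.io.IntWritable / LongWritable.
    static class IntBox { int v; }
    static class LongBox { long v; }

    // A reader that deserializes into a caller-supplied value object,
    // the way the ORC record reader reuses its value across next() calls.
    static long readLongColumn(Object value) {
        // The current file's schema says the column is bigint, so the
        // reader casts the reused value object; this throws if the object
        // was created while reading the earlier int-schema file.
        return ((LongBox) value).v;
    }

    public static void main(String[] args) {
        Object value = new IntBox();   // created for the int-schema partition
        try {
            readLongColumn(value);     // next partition has a bigint schema
            throw new AssertionError("expected ClassCastException");
        } catch (ClassCastException expected) {
            // The fix: reset the value object when switching paths.
            value = new LongBox();
            System.out.println("read ok: " + readLongColumn(value));
        }
    }
}
```

The attached patch applies the same idea inside Hive's reader: drop the stale value object at path boundaries so it is re-created for the new schema.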
[jira] [Commented] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106515#comment-14106515 ]

Xiaomeng Huang commented on HIVE-7730:
--------------------------------------
Thanks [~szehon], I have linked it to the review board.

Extend ReadEntity to add accessed columns from query
----------------------------------------------------
Key: HIVE-7730
URL: https://issues.apache.org/jira/browse/HIVE-7730
Project: Hive
Issue Type: Bug
Reporter: Xiaomeng Huang
Attachments: HIVE-7730.001.patch, HIVE-7730.002.patch

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the needed columns from the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put what you want into the class.-

Hive should store accessed columns in ReadEntity when HIVE_STATS_COLLECT_SCANCOLS (or a new confVar we could add) is set to true. Then an external authorization model can get the accessed columns when doing authorization at compile time, before execution. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer; the old authorization and AuthorizationModeV2 can then get accessed columns from ReadEntity too.

Here is the quick implementation in SemanticAnalyzer.analyzeInternal():

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO: after compile, we can put the accessed column list into the ReadEntity
// obtained from columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set to true
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7847) query orc partitioned table fail when table column type change
[ https://issues.apache.org/jira/browse/HIVE-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhichun Wu updated HIVE-7847:
-----------------------------
Attachment: HIVE-7847.1.patch
[jira] [Updated] (HIVE-7847) query orc partitioned table fail when table column type change
[ https://issues.apache.org/jira/browse/HIVE-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhichun Wu updated HIVE-7847:
-----------------------------
Status: Patch Available (was: Open)
Re: Review Request 24962: HIVE-7730: Extend ReadEntity to add accessed columns from query
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24962/#review51257 ---

Hi Xiaomeng, the patch looks good; I just had some style comments.

ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java
https://reviews.apache.org/r/24962/#comment89359
Can we make this final, and not have a setter? The caller can just add to the list. It'll make the code a bit simpler. Also, should it be a set?

ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
https://reviews.apache.org/r/24962/#comment89360
No need for the '== true' part.

ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
https://reviews.apache.org/r/24962/#comment89362
Can we indent this code block inside {}?

- Szehon Ho
[jira] [Commented] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106523#comment-14106523 ]

Szehon Ho commented on HIVE-7730:
---------------------------------
Thanks Xiaomeng, patch looks good overall, I put some minor comments on rb.
[jira] [Commented] (HIVE-7384) Research into reduce-side join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106532#comment-14106532 ]

Szehon Ho commented on HIVE-7384:
---------------------------------
Thanks [~lianhuiwang] for the information.

Research into reduce-side join [Spark Branch]
---------------------------------------------
Key: HIVE-7384
URL: https://issues.apache.org/jira/browse/HIVE-7384
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Xuefu Zhang
Assignee: Szehon Ho
Attachments: Hive on Spark Reduce Side Join.docx, sales_items.txt, sales_products.txt, sales_stores.txt

Hive's join operator is very sophisticated, especially for reduce-side join. While we expect that other types of join, such as map-side join and SMB map-side join, will work out of the box with our design, there may be some complication in reduce-side join, which extensively utilizes key tags and shuffle behavior. Our design principle prefers making the Hive implementation work out of the box as well, which might require new functionality from Spark. The task is to research this area, identifying requirements for the Spark community and the work to be done on Hive to make reduce-side join work. A design doc might be needed for this. For more information, please refer to the overall design doc on the wiki.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7654) A method to extrapolate columnStats for partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106528#comment-14106528 ]

Szehon Ho commented on HIVE-7654:
---------------------------------
Sorry if this is a dumb question, but I was curious: is it typical behavior to extrapolate in all cases? I can see it would be a good approximation in some cases, but would it ever be undesirable in others?

A method to extrapolate columnStats for partitions of a table
-------------------------------------------------------------
Key: HIVE-7654
URL: https://issues.apache.org/jira/browse/HIVE-7654
Project: Hive
Issue Type: New Feature
Reporter: pengcheng xiong
Assignee: pengcheng xiong
Priority: Minor
Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch, HIVE-7654.8.patch

In a PARTITIONED table, there are many partitions. For example:

{code}
create table if not exists loc_orc (
  state string,
  locid int,
  zip bigint
) partitioned by (year string) stored as orc;
{code}

We assume there are 4 partitions: partition(year='2000'), partition(year='2001'), partition(year='2002'), and partition(year='2003'). We can use the following command to compute statistics for columns state and locid of partition(year='2001'):

{code}
analyze table loc_orc partition(year='2001') compute statistics for columns state, locid;
{code}

We need to know the "aggregated" column stats for the whole table loc_orc. However, we may not have the column stats for some partitions (e.g., partition(year='2002')), and we may not have the column stats for some columns (e.g., zip bigint for partition(year='2001')). We propose a method to extrapolate the missing column stats for the partitions.

-- This message was sent by Atlassian JIRA (v6.2#6252)
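The idea in HIVE-7654 can be illustrated with a minimal sketch. This is an assumed simplification, not the method from the attached design doc: it estimates a table-level, count-like column statistic by averaging the partitions that do have stats and scaling to the total partition count.

```java
// Hypothetical sketch of column-stat extrapolation (illustrative only;
// the real HIVE-7654 method is described in the attached design doc).
public class StatsExtrapolationSketch {
    // Estimate a table-level count-like stat: average the stat over the
    // partitions that have it, then scale by the total partition count.
    static double extrapolate(double[] knownPartitionStats, int totalPartitions) {
        double sum = 0;
        for (double s : knownPartitionStats) {
            sum += s;
        }
        double avgPerPartition = sum / knownPartitionStats.length;
        return avgPerPartition * totalPartitions;
    }

    public static void main(String[] args) {
        // Stats exist for 3 of the 4 partitions (say years 2000, 2001, 2003);
        // the stat for year 2002 is missing and gets extrapolated.
        double estimate = extrapolate(new double[]{100, 120, 80}, 4);
        System.out.println(estimate); // 400.0
    }
}
```

This also hints at Szehon's question above: a uniform-average extrapolation can be a poor estimate when partitions are skewed, which is exactly the case where extrapolating in all cases might be undesirable.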
Re: Timeline for release of Hive 0.14
+1, Eugene and I are working on getting HIVE-5317 (insert, update, delete) done and would like to get it in.
Alan.

Nick Dimiduk <ndimi...@gmail.com>, August 20, 2014 at 12:27:
It'd be great to get HIVE-4765 included in 0.14. The proposed changes are a big improvement for us HBase folks. Would someone mind having a look in that direction?
Thanks, Nick

Thejas Nair <the...@hortonworks.com>, August 19, 2014 at 15:20:
+1. Sounds good to me. It's already almost 4 months since the last release; it is time to start preparing for the next one. Thanks for volunteering!

Vikram Dixit <vik...@hortonworks.com>, August 19, 2014 at 14:02:
Hi Folks,
I was thinking that it was about time we had a release of Hive 0.14, given our commitment to releasing Hive on a periodic basis. We could cut a branch and start working on a release in about 2 weeks, around September 5th (Friday). After branching, we can focus on stabilizing for the release and hopefully have an RC about 2 weeks after that. I would like to volunteer for the duties of release manager for this version if the community agrees.
Thanks,
Vikram.
[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106542#comment-14106542 ]

Chinna Rao Lalam commented on HIVE-7702:
----------------------------------------
Hi [~brocknoland], compared against MR, most of the differences are due to sort order only.

Start running .q file tests on spark [Spark Branch]
---------------------------------------------------
Key: HIVE-7702
URL: https://issues.apache.org/jira/browse/HIVE-7702
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Brock Noland
Assignee: Chinna Rao Lalam
Attachments: HIVE-7702-spark.patch, HIVE-7702.1-spark.patch

Spark can currently support only a few queries; however, there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests:
https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive

To generate the output file for test XXX.q, you'd do:

{noformat}
mvn clean install -DskipTests -Phadoop-2
cd itests
mvn clean install -DskipTests -Phadoop-2
cd qtest-spark
mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2
{noformat}

which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so:

{noformat}
mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2
{noformat}
[jira] [Commented] (HIVE-7821) StarterProject: enable groupby4.q
[ https://issues.apache.org/jira/browse/HIVE-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106546#comment-14106546 ]

Chinna Rao Lalam commented on HIVE-7821:
----------------------------------------
Hi [~brocknoland], I didn't know you had created this for Suhas. I am handling the group-by queries in the previous JIRA, so I assigned it to myself to avoid duplicate work. I don't mind; Suhas can work on this.

StarterProject: enable groupby4.q
---------------------------------
Key: HIVE-7821
URL: https://issues.apache.org/jira/browse/HIVE-7821
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Brock Noland
Assignee: Suhas Satish

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7794) Enable tests on Spark branch (4) [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chinna Rao Lalam reassigned HIVE-7794:
--------------------------------------
Assignee: Chinna Rao Lalam

Enable tests on Spark branch (4) [Spark Branch]
-----------------------------------------------
Key: HIVE-7794
URL: https://issues.apache.org/jira/browse/HIVE-7794
Project: Hive
Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Chinna Rao Lalam

This JIRA is to enable *most* of the tests below. If tests don't pass because of some unsupported feature, ensure that a JIRA exists and move on.

{noformat}
vector_cast_constant.q,\
vector_data_types.q,\
vector_decimal_aggregate.q,\
vector_left_outer_join.q,\
vector_string_concat.q,\
vectorization_12.q,\
vectorization_13.q,\
vectorization_14.q,\
vectorization_15.q,\
vectorization_9.q,\
vectorization_part_project.q,\
vectorization_short_regress.q,\
vectorized_mapjoin.q,\
vectorized_nested_mapjoin.q,\
vectorized_ptf.q,\
vectorized_shufflejoin.q,\
vectorized_timestamp_funcs.q
{noformat}
Re: Review Request 24962: HIVE-7730: Extend ReadEntity to add accessed columns from query
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24962/ ---

(Updated Aug. 22, 2014, 6:47 a.m.)

Review request for hive, Prasad Mujumdar and Szehon Ho.

Repository: hive-git

Description
---
External authorization models cannot get accessed columns from the query. Hive should store accessed columns in ReadEntity.

Diffs (updated)
---
ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java 7ed50b4
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b05d3b4

Diff: https://reviews.apache.org/r/24962/diff/

Testing
---

Thanks,
Xiaomeng Huang
Re: Review Request 24962: HIVE-7730: Extend ReadEntity to add accessed columns from query
On Aug. 22, 2014, 6:14 a.m., Szehon Ho wrote:
> ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java, line 54
> https://reviews.apache.org/r/24962/diff/1/?file=666753#file666753line54
> Can we make this final, and not have a setter? The caller can just add to the list. It'll make the code a bit simpler. Also, should it be a set?

Thanks, I think it is better as a list. I get the accessed columns from tableToColumnAccessMap, which is a Map<String, List<String>>; Hive's native authorization uses this list too. I get the column list via a table name and then set it on the ReadEntity directly, with no need to add each element in a loop, so it is necessary to have a setter. BTW, I can also add an API addAccessedColumn(String column) to add one column to this column list.

On Aug. 22, 2014, 6:14 a.m., Szehon Ho wrote:
> ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 9521
> https://reviews.apache.org/r/24962/diff/1/?file=666754#file666754line9521
> No need for the '== true' part.

Fixed, thanks.

On Aug. 22, 2014, 6:14 a.m., Szehon Ho wrote:
> ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 9539
> https://reviews.apache.org/r/24962/diff/1/?file=666754#file666754line9539
> Can we indent this code block inside {}?

Fixed, thanks.

- Xiaomeng
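The two API shapes being debated above can be sketched side by side. This is a hypothetical illustration, not Hive's actual ReadEntity code; the field and method names are made up for the example:

```java
// Hypothetical sketch of the ReadEntity API discussion: a final list the
// caller appends to (the reviewer's suggestion), plus a bulk-add method
// that covers the author's set-a-whole-table's-columns use case.
import java.util.ArrayList;
import java.util.List;

public class ReadEntitySketch {
    // Reviewer's suggestion: final list, no setter; callers mutate it.
    private final List<String> accessedColumns = new ArrayList<>();

    public List<String> getAccessedColumns() {
        return accessedColumns;
    }

    // Author's use case: copy a whole per-table column list (e.g. one value
    // from a Map<String, List<String>> tableToColumnAccessMap) in one call
    // instead of looping, without replacing the list reference.
    public void addAccessedColumns(List<String> columns) {
        accessedColumns.addAll(columns);
    }
}
```

With a final field plus addAccessedColumns, both sides get what they want: no setter that swaps the list out from under other holders, and no per-element loop at the call site.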
[jira] [Commented] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106554#comment-14106554 ]

Gopal V commented on HIVE-7832:
-------------------------------
Minor refactoring comments on RB. LGTM, +1 pending tests pass.

Do ORC dictionary check at a finer level and preserve encoding across stripes
-----------------------------------------------------------------------------
Key: HIVE-7832
URL: https://issues.apache.org/jira/browse/HIVE-7832
Project: Hive
Issue Type: Improvement
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Attachments: HIVE-7832.1.patch

Currently the ORC dictionary check happens while writing the stripe. Just before writing a stripe, if the ratio of dictionary entries to total non-null rows is greater than a threshold, the dictionary is discarded. Also, the decision of whether to use the dictionary is preserved across stripes. This sometimes leads to a costly O(log n) insertion cost for each stripe when there are too many distinct keys.

-- This message was sent by Atlassian JIRA (v6.2#6252)
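The dictionary check described above boils down to a ratio test. The sketch below is assumed logic for illustration, not the actual ORC writer code, and the threshold value is only an example (Hive exposes a configurable dictionary key-size threshold):

```java
// Hypothetical sketch of the ORC dictionary-vs-direct decision: keep the
// dictionary only while distinct entries stay a small fraction of the
// non-null rows written.
public class DictionaryCheckSketch {
    static boolean useDictionary(long dictionaryEntries, long nonNullRows,
                                 double threshold) {
        if (nonNullRows == 0) {
            return true; // nothing written yet; keep the dictionary for now
        }
        return ((double) dictionaryEntries / nonNullRows) <= threshold;
    }

    public static void main(String[] args) {
        // Mostly repeated keys: 100 distinct entries over 1000 rows.
        System.out.println(useDictionary(100, 1000, 0.8));  // true
        // Nearly all-distinct keys: the dictionary is discarded, which is
        // where the per-stripe O(log n) insertion cost the JIRA mentions
        // gets paid for no benefit.
        System.out.println(useDictionary(990, 1000, 0.8));  // false
    }
}
```

The improvement in the JIRA is about when and how often this check runs (at a finer level than once per stripe) and about remembering the outcome across stripes, not about changing the ratio test itself.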
[jira] [Commented] (HIVE-6245) HS2 creates DBs/Tables with wrong ownership when HMS setugi is true
[ https://issues.apache.org/jira/browse/HIVE-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106558#comment-14106558 ] Hive QA commented on HIVE-6245: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663248/HIVE-6245.4.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6116 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/451/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/451/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-451/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12663248 HS2 creates DBs/Tables with wrong ownership when HMS setugi is true --- Key: HIVE-6245 URL: https://issues.apache.org/jira/browse/HIVE-6245 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0, 0.13.0 Reporter: Chaoyu Tang Assignee: Venki Korukanti Attachments: HIVE-6245.2.patch.txt, HIVE-6245.3.patch.txt, HIVE-6245.4.patch, HIVE-6245.patch The case with the following settings is valid but does not work correctly in the current HS2: == hive.server2.authentication=NONE (or LDAP) hive.server2.enable.doAs=true hive.metastore.sasl.enabled=false hive.metastore.execute.setugi=true == Ideally, HS2 should be able to impersonate the logged-in user (from Beeline, or a JDBC application) and create DBs/Tables with the user's ownership. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Timeline for release of Hive 0.14
Release 0.14 should include HIVE-6586 https://issues.apache.org/jira/browse/HIVE-6586 (various fixes to HiveConf.java parameters). I'll do that as soon as possible. 72 jiras have the TODOC14 label now, although my own tally is 99. This is more than mere mortals can accomplish in a few weeks. Therefore I recommend that you all plead with your managers to allocate some tech-writer resources to Hive wikidocs for the 0.14.0 release. I'll send out a state-of-the-docs message in a separate thread. -- Lefty On Fri, Aug 22, 2014 at 2:28 AM, Alan Gates ga...@hortonworks.com wrote: +1, Eugene and I are working on getting HIVE-5317 (insert, update, delete) done and would like to get it in. Alan. Nick Dimiduk ndimi...@gmail.com August 20, 2014 at 12:27 It'd be great to get HIVE-4765 included in 0.14. The proposed changes are a big improvement for us HBase folks. Would someone mind having a look in that direction? Thanks, Nick Thejas Nair the...@hortonworks.com August 19, 2014 at 15:20 +1 Sounds good to me. It's already almost 4 months since the last release. It is time to start preparing for the next one. Thanks for volunteering! Vikram Dixit vik...@hortonworks.com August 19, 2014 at 14:02 Hi Folks, I was thinking that it was about time that we had a release of Hive 0.14 given our commitment to having a release of Hive on a periodic basis. We could cut a branch and start working on a release in, say, 2 weeks' time, around September 5th (Friday). After branching, we can focus on stabilizing for the release and hopefully have an RC in about 2 weeks post that. I would like to volunteer myself for the duties of the release manager for this version if the community agrees. Thanks, Vikram.
[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7832: - Attachment: HIVE-7832.2.patch Addressed Gopal's review comment. Do ORC dictionary check at a finer level and preserve encoding across stripes - Key: HIVE-7832 URL: https://issues.apache.org/jira/browse/HIVE-7832 Project: Hive Issue Type: Improvement Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7832.1.patch, HIVE-7832.2.patch Currently the ORC dictionary check happens while writing the stripe. Just before writing the stripe, if the ratio of dictionary entries to total non-null rows is greater than a threshold, the dictionary is discarded. Also, the decision of whether or not to use the dictionary is preserved across stripes. This sometimes leads to a costly insertion cost of O(log n) for each stripe when there are too many distinct keys. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup
[ https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6847: --- Attachment: HIVE-6847.3.patch Improve / fix bugs in Hive scratch dir setup Key: HIVE-6847 URL: https://issues.apache.org/jira/browse/HIVE-6847 Project: Hive Issue Type: Bug Components: CLI, HiveServer2 Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch, HIVE-6847.3.patch Currently, the Hive server creates the scratch directory and changes its permission to 777; however, this is not great with respect to security. We need to create user-specific scratch directories instead. Also refer to HIVE-6782 and the 1st iteration of the patch for the approach. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup
[ https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6847: --- Status: Patch Available (was: Open) Fixes test failures Improve / fix bugs in Hive scratch dir setup Key: HIVE-6847 URL: https://issues.apache.org/jira/browse/HIVE-6847 Project: Hive Issue Type: Bug Components: CLI, HiveServer2 Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch, HIVE-6847.3.patch Currently, the Hive server creates the scratch directory and changes its permission to 777; however, this is not great with respect to security. We need to create user-specific scratch directories instead. Also refer to HIVE-6782 and the 1st iteration of the patch for the approach. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup
[ https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6847: --- Status: Open (was: Patch Available) Improve / fix bugs in Hive scratch dir setup Key: HIVE-6847 URL: https://issues.apache.org/jira/browse/HIVE-6847 Project: Hive Issue Type: Bug Components: CLI, HiveServer2 Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch, HIVE-6847.3.patch Currently, the Hive server creates the scratch directory and changes its permission to 777; however, this is not great with respect to security. We need to create user-specific scratch directories instead. Also refer to HIVE-6782 and the 1st iteration of the patch for the approach. -- This message was sent by Atlassian JIRA (v6.2#6252)
State of the docs
The backlog of Hive wikidoc tasks is large, and keeps on growing. Jiras that need documentation: 25 for releases 0.10, 0.11, and 0.12 (only 17 have TODOC labels) 37 for release 0.13 (only 25 have TODOC13 label) 99 for release 0.14 (only 72 have TODOC14 label) Also: 5 doc tasks not associated with jiras 36 wish-list tasks (clarifications, new docs, improvements) ~10 tasks or projects not associated with email These numbers are probably inaccurate but they give the general idea. Lately I've been making progress at a rate of 1 or 2 per day. I could do more if I stopped monitoring the mailing lists, but then (a) we'd miss a fair number of doc tasks, and (b) at best I might manage 4 a day. Some doc tasks can and should be done by developers, but IMHO the bulk of these tasks should be handled by tech writers. My attempts to recruit more volunteers have failed so far, although I'll keep trying. Can we get some corporate support for the Hive wiki? -- Lefty
[jira] [Commented] (HIVE-7222) Support timestamp column statistics in ORC and extend PPD for timestamp
[ https://issues.apache.org/jira/browse/HIVE-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106613#comment-14106613 ] Hive QA commented on HIVE-7222: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663459/HIVE-7222.1.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 6116 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_analyze org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_analyze org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_part_project org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_timestamp_funcs {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/452/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/452/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-452/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12663459 Support timestamp column statistics in ORC and extend PPD for timestamp --- Key: HIVE-7222 URL: https://issues.apache.org/jira/browse/HIVE-7222 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Daniel Dai Labels: orcfile Attachments: HIVE-7222-1.patch, HIVE-7222.1.patch Add column statistics for timestamp columns in ORC. Also extend predicate pushdown to support timestamp column evaluation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-3421) Column Level Top K Values Statistics
[ https://issues.apache.org/jira/browse/HIVE-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106624#comment-14106624 ] wangmeng commented on HIVE-3421: This is very useful! I am waiting for the coming version. Column Level Top K Values Statistics Key: HIVE-3421 URL: https://issues.apache.org/jira/browse/HIVE-3421 Project: Hive Issue Type: New Feature Reporter: Feng Lu Assignee: Feng Lu Attachments: HIVE-3421.patch.1.txt, HIVE-3421.patch.2.txt, HIVE-3421.patch.3.txt, HIVE-3421.patch.4.txt, HIVE-3421.patch.5.txt, HIVE-3421.patch.6.txt, HIVE-3421.patch.7.txt, HIVE-3421.patch.8.txt, HIVE-3421.patch.9.txt, HIVE-3421.patch.txt Compute (estimate) top-k values statistics for each column, and put the most skewed column into skewed info, if the user hasn't specified skew. This feature depends on ListBucketing (create table skewed on) https://cwiki.apache.org/Hive/listbucketing.html. All column top-k can be added to skewed info, if in the future skewed info supports multiple independent columns. The TopK algorithm is based on this paper: http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7848) Refresh SparkContext when spark configuration changes
Chinna Rao Lalam created HIVE-7848: -- Summary: Refresh SparkContext when spark configuration changes Key: HIVE-7848 URL: https://issues.apache.org/jira/browse/HIVE-7848 Project: Hive Issue Type: Sub-task Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7848) Refresh SparkContext when spark configuration changes
[ https://issues.apache.org/jira/browse/HIVE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-7848: --- Description: Recreate the spark client if spark configurations are updated (through set command). Refresh SparkContext when spark configuration changes - Key: HIVE-7848 URL: https://issues.apache.org/jira/browse/HIVE-7848 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: spark-branch Recreate the spark client if spark configurations are updated (through set command). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7848) Refresh SparkContext when spark configuration changes
[ https://issues.apache.org/jira/browse/HIVE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-7848: --- Fix Version/s: spark-branch Refresh SparkContext when spark configuration changes - Key: HIVE-7848 URL: https://issues.apache.org/jira/browse/HIVE-7848 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: spark-branch -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: State of the docs
I've seen a few inaccuracies in the wiki about some properties. Just a few questions: how do I report them? To whom? For example, this page (1) says that the property hive.fetch.task.conversion has a default value of minimal. But that's wrong. (1) https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution Regards, Damien CAROL * tel : +33 (0)4 74 96 88 14 * fax : +33 (0)4 74 96 31 88 * email :dca...@blitzbs.com mailto:dca...@blitzbs.com BLITZ BUSINESS SERVICE On 22/08/2014 09:50, Lefty Leverenz wrote: The backlog of Hive wikidoc tasks is large, and keeps on growing. Jiras that need documentation: 25 for releases 0.10, 0.11, and 0.12 (only 17 have TODOC labels) 37 for release 0.13 (only 25 have TODOC13 label) 99 for release 0.14 (only 72 have TODOC14 label) Also: 5 doc tasks not associated with jiras 36 wish-list tasks (clarifications, new docs, improvements) ~10 tasks or projects not associated with email These numbers are probably inaccurate but they give the general idea. Lately I've been making progress at a rate of 1 or 2 per day. I could do more if I stopped monitoring the mailing lists, but then (a) we'd miss a fair number of doc tasks, and (b) at best I might manage 4 a day. Some doc tasks can and should be done by developers, but IMHO the bulk of these tasks should be handled by tech writers. My attempts to recruit more volunteers have failed so far, although I'll keep trying. Can we get some corporate support for the Hive wiki? -- Lefty
[jira] [Updated] (HIVE-6987) Metastore qop settings won't work with Hadoop-2.4
[ https://issues.apache.org/jira/browse/HIVE-6987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-6987: - Resolution: Duplicate Status: Resolved (was: Patch Available) Metastore qop settings won't work with Hadoop-2.4 - Key: HIVE-6987 URL: https://issues.apache.org/jira/browse/HIVE-6987 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Labels: patch Fix For: 0.14.0 Attachments: HIVE-6987.txt [HADOOP-10211|https://issues.apache.org/jira/browse/HADOOP-10211] made a backward incompatible change due to which the following hive call returns a null map: {code} Map<String, String> hadoopSaslProps = ShimLoader.getHadoopThriftAuthBridge().getHadoopSaslProperties(conf); {code} Metastore uses the underlying hadoop.rpc.protection values to set the qop between metastore client/server. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6987) Metastore qop settings won't work with Hadoop-2.4
[ https://issues.apache.org/jira/browse/HIVE-6987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106644#comment-14106644 ] Jason Dere commented on HIVE-6987: -- I think this is the same issue as HIVE-7620, which was recently committed to trunk. Marking as a duplicate. Sorry about that [~skrho], hope to see your next contribution soon. Metastore qop settings won't work with Hadoop-2.4 - Key: HIVE-6987 URL: https://issues.apache.org/jira/browse/HIVE-6987 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Labels: patch Fix For: 0.14.0 Attachments: HIVE-6987.txt [HADOOP-10211|https://issues.apache.org/jira/browse/HADOOP-10211] made a backward incompatible change due to which the following hive call returns a null map: {code} Map<String, String> hadoopSaslProps = ShimLoader.getHadoopThriftAuthBridge().getHadoopSaslProperties(conf); {code} Metastore uses the underlying hadoop.rpc.protection values to set the qop between metastore client/server. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7849) Support more generic predicate pushdown for hbase handler
Navis created HIVE-7849: --- Summary: Support more generic predicate pushdown for hbase handler Key: HIVE-7849 URL: https://issues.apache.org/jira/browse/HIVE-7849 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Navis Assignee: Navis Priority: Minor Currently, hbase handler supports AND conjugated filters only. This is the first try to support OR, NOT, IN, BETWEEN predicates for hbase. Mostly based on the work done by [~teddy.choi]. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7593) Instantiate SparkClient per user session [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-7593: --- Status: Open (was: Patch Available) Patch moved to HIVE-7848 Instantiate SparkClient per user session [Spark Branch] --- Key: HIVE-7593 URL: https://issues.apache.org/jira/browse/HIVE-7593 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chinna Rao Lalam Attachments: HIVE-7593-spark.patch, HIVE-7593.1-spark.patch SparkContext is the main class via which Hive talks to the Spark cluster. SparkClient encapsulates a SparkContext instance. Currently all user sessions share a single SparkClient instance in HiveServer2. While this is good enough for a POC, even for our first two milestones, this is not desirable for a multi-tenancy environment and gives the least flexibility to Hive users. Here is what we propose: 1. Have a SparkClient instance per user session. The SparkClient instance is created when the user executes the first query in the session. It will get destroyed when the user session ends. 2. The SparkClient is instantiated based on the spark configurations that are available to the user, including those defined at the global level and those overwritten by the user (thru set command, for instance). 3. Ideally, when the user changes any spark configuration during the session, the old SparkClient instance should be destroyed and a new one based on the new configurations is created. This may turn out to be a little hard, and thus it's a nice-to-have. If not implemented, we need to document that subsequent configuration changes will not take effect in the current session. Please note that there is a thread-safety issue on the Spark side where multiple SparkContext instances cannot coexist in the same JVM (SPARK-2243). We need to work with the Spark community to get this addressed. Besides the above functional requirements, avoiding potential issues is also a consideration.
For instance, sharing an SC among users is bad, as resources (such as jars for UDFs) will also be shared, which is problematic. On the other hand, one SC per job seems too expensive, as the resources need to be re-rendered even if there isn't any change. -- This message was sent by Atlassian JIRA (v6.2#6252)
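The per-session proposal (point 1 above) amounts to a lazily populated map from session to client, created on first use and torn down when the session ends. A minimal sketch follows; SparkClientStub and SessionClients are hypothetical stand-ins, not the real SparkClient or HiveServer2 session code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the real SparkClient.
class SparkClientStub {
    boolean closed = false;
    void close() { closed = true; }
}

// Hypothetical sketch of "one client per user session": the client is
// created lazily on the session's first query and closed when the
// session ends, so users don't share resources such as UDF jars.
class SessionClients {
    private final Map<String, SparkClientStub> clients = new HashMap<>();

    SparkClientStub clientFor(String sessionId) {
        return clients.computeIfAbsent(sessionId, id -> new SparkClientStub());
    }

    void endSession(String sessionId) {
        SparkClientStub c = clients.remove(sessionId);
        if (c != null) {
            c.close();
        }
    }
}
```

A real implementation would also need synchronization and would have to respect the SPARK-2243 limitation that multiple SparkContexts cannot coexist in one JVM.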
[jira] [Updated] (HIVE-7849) Support more generic predicate pushdown for hbase handler
[ https://issues.apache.org/jira/browse/HIVE-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7849: Attachment: HIVE-7849.1.patch.txt Running preliminary test. Need some more elaboration on interfaces, etc. Support more generic predicate pushdown for hbase handler - Key: HIVE-7849 URL: https://issues.apache.org/jira/browse/HIVE-7849 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-7849.1.patch.txt Currently, hbase handler supports AND conjugated filters only. This is the first try to support OR, NOT, IN, BETWEEN predicates for hbase. Mostly based on the work done by [~teddy.choi]. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7848) Refresh SparkContext when spark configuration changes
[ https://issues.apache.org/jira/browse/HIVE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-7848: --- Attachment: HIVE-7848-spark.patch Refresh SparkContext when spark configuration changes - Key: HIVE-7848 URL: https://issues.apache.org/jira/browse/HIVE-7848 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: spark-branch Attachments: HIVE-7848-spark.patch Recreate the spark client if spark configurations are updated (through set command). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7849) Support more generic predicate pushdown for hbase handler
[ https://issues.apache.org/jira/browse/HIVE-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7849: Status: Patch Available (was: Open) Support more generic predicate pushdown for hbase handler - Key: HIVE-7849 URL: https://issues.apache.org/jira/browse/HIVE-7849 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-7849.1.patch.txt Currently, hbase handler supports AND conjugated filters only. This is the first try to support OR, NOT, IN, BETWEEN predicates for hbase. Mostly based on the work done by [~teddy.choi]. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106654#comment-14106654 ] Hive QA commented on HIVE-7832: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663613/HIVE-7832.2.patch {color:red}ERROR:{color} -1 due to 28 failed/errored test(s), 6118 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_analyze org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_analyze org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_part_project org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_ptf org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_timestamp_funcs org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold org.apache.hadoop.hive.ql.io.orc.TestOrcFile.columnProjection[0] org.apache.hadoop.hive.ql.io.orc.TestOrcFile.columnProjection[1] org.apache.hadoop.hive.ql.io.orc.TestOrcFile.metaData[0] org.apache.hadoop.hive.ql.io.orc.TestOrcFile.metaData[1] org.apache.hadoop.hive.ql.io.orc.TestOrcFile.test1[1] org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testPredicatePushdown[1] org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testSeek[1] org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testSnappy[1] 
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testStringAndBinaryStatistics[1] org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testStripeLevelStats[1] org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testTimestamp[1] org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testUnionAndTimestamp[1] org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testWithoutIndex[1] org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testZeroCopySeek[1] org.apache.hadoop.hive.ql.io.orc.TestOrcNullOptimization.testMultiStripeWithNull {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/453/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/453/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-453/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 28 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663613 Do ORC dictionary check at a finer level and preserve encoding across stripes - Key: HIVE-7832 URL: https://issues.apache.org/jira/browse/HIVE-7832 Project: Hive Issue Type: Improvement Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7832.1.patch, HIVE-7832.2.patch Currently the ORC dictionary check happens while writing the stripe. Just before writing the stripe, if the ratio of dictionary entries to total non-null rows is greater than a threshold, the dictionary is discarded. Also, the decision of whether or not to use the dictionary is preserved across stripes. This sometimes leads to a costly insertion cost of O(log n) for each stripe when there are too many distinct keys. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7848) Refresh SparkContext when spark configuration changes
[ https://issues.apache.org/jira/browse/HIVE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106652#comment-14106652 ] Chinna Rao Lalam commented on HIVE-7848: Review comments : bq. 1) null out sparkSession after closing Taken care of in the new patch. bq. 2) null out the static SparkClient member variable when closed It is already taken care of in org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.close() Refresh SparkContext when spark configuration changes - Key: HIVE-7848 URL: https://issues.apache.org/jira/browse/HIVE-7848 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: spark-branch Attachments: HIVE-7848-spark.patch Recreate the spark client if spark configurations are updated (through set command). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7799: Attachment: HIVE-7799.1-spark.patch HiveBaseFunctionResultList uses RowContainer to store the collected map output rows; all rows should be added into the RowContainer before reading starts, since RowContainer does not support write-after-read. Removed the current lazy execution mode as it depends on RowContainer write-after-read. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused (it's not allowed to write once someone reads a row from it); I'm trying to figure out whether it's a Hive issue or specific to Hive on Spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
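The write-after-read constraint described in this issue can be illustrated with a tiny stand-in buffer that enforces the same rule; this is a hypothetical illustration of the contract, not the real RowContainer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for RowContainer's contract: once a row has
// been read, further writes are rejected. Violating this is what broke
// the lazy execution path in HiveBaseFunctionResultList.
class WriteOnceThenReadBuffer<T> {
    private final List<T> rows = new ArrayList<>();
    private boolean reading = false;
    private int cursor = 0;

    void addRow(T row) {
        if (reading) {
            throw new IllegalStateException("write after read not supported");
        }
        rows.add(row);
    }

    T next() {
        reading = true; // from here on, the buffer is read-only
        return cursor < rows.size() ? rows.get(cursor++) : null;
    }
}
```

The fix described in the attachment respects this contract by finishing all writes before any read begins, rather than changing the container itself.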
[jira] [Created] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files
Sathish created HIVE-7850: - Summary: Hive Query failed if the data type is array<string> with parquet files Key: HIVE-7850 URL: https://issues.apache.org/jira/browse/HIVE-7850 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1, 0.14.0 Reporter: Sathish * Created a parquet file from the Avro file which has one array data type and the rest are primitive types. Avro schema of the array data type, e.g.: {code} { "name": "action", "type": [ { "type": "array", "items": "string" }, null ] } {code} * Created an external Hive table with the array type as below, {code} create external table paraArray (action array<string>) partitioned by (partitionid int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; alter table paraArray add partition(partitionid=1) location '/testPara'; {code} * Run the following query (select action from paraArray limit 10) and the MapReduce jobs fail with the following exception.
{code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 8 more {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
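The ClassCastException in the trace above comes from an unconditional downcast: the array inspector assumes the value it receives is an ArrayWritable, while the Parquet reader handed it a dictionary-encoded binary value. A plain-Java sketch of the failure mode and a defensive variant follows; the classes below are stand-ins, not the real Hadoop Writable types or the actual ParquetHiveArrayInspector code:

```java
// Stand-in value types (illustrative; the real types are Hadoop Writables).
class BinaryValue {
    final byte[] bytes;
    BinaryValue(byte[] b) { bytes = b; }
}

class ArrayValue {
    final Object[] elems;
    ArrayValue(Object[] e) { elems = e; }
}

class ArrayInspectorSketch {
    // Unsafe version, mirroring the failing cast in the stack trace above:
    // blindly casting throws ClassCastException when data is a BinaryValue.
    static Object[] getListUnsafe(Object data) {
        return ((ArrayValue) data).elems;
    }

    // Defensive version: verify the runtime type before casting.
    static Object[] getListSafe(Object data) {
        if (data instanceof ArrayValue) {
            return ((ArrayValue) data).elems;
        }
        // How a non-array value should be handled (null, or wrapped as a
        // single-element list) depends on the desired semantics.
        return null;
    }
}
```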
[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array&lt;string&gt; with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sathish updated HIVE-7850: -- Description: * Created a parquet file from the Avro file which have 1 array data type and rest are primitive types. Avro Schema of the array data type. Eg: {code} { name : action, type : [ { type : array, items : string }, null ] } {code} * Created External Hive table with the Array type as below, {code} create external table paraArray (action Array) partitioned by (partitionid int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; alter table paraArray add partition(partitionid=1) location '/testPara'; {code} * Run the following query(select action from paraArray limit 10) and the Map reduce jobs are failing with the following exception. {code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at 
org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 8 more {code} This issue has long back posted on Parquet issues list and Since this is related to Parquet Hive serde, I have created the Hive issue here, The details and history of this information are as shown in the link here https://github.com/Parquet/parquet-mr/issues/281. was: * Created a parquet file from the Avro file which have 1 array data type and rest are primitive types. Avro Schema of the array data type. Eg: {code} { name : action, type : [ { type : array, items : string }, null ] } {code} * Created External Hive table with the Array type as below, {code} create external table paraArray (action Array) partitioned by (partitionid int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; alter table paraArray add partition(partitionid=1) location '/testPara'; {code} * Run the following query(select action from paraArray limit 10) and the Map reduce jobs are failing with the following exception. 
{code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at
[jira] [Updated] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7799: Status: Patch Available (was: Open) TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContrainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7799: Description: Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. 
was: Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContrainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. 
TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at
[jira] [Updated] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7799: Attachment: HIVE-7799.2-spark.patch Processing all inputs, storing the output rows in a RowContainer, and only then reading from it is not very efficient. We could just use a queue to store output rows, since ResultIterator only processes the next record when the next output row is requested. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at 
java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused (it does not allow writes once someone has read a row from it); I'm trying to figure out whether this is a general Hive issue or specific to Hive on Spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
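The queue-based approach described in the comment above (process an input record only when the consumer asks for the next output row) can be sketched as a lazy iterator. The names below are illustrative, not the actual HiveBaseFunctionResultList API:

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Queue;

// Hedged sketch of the queue-backed lazy pattern: instead of materializing
// all output rows before reading, process one input record per consumer
// request and buffer only the rows it produced.
class LazyResultIterator<I, O> implements Iterator<O> {
    private final Iterator<I> input;
    private final java.util.function.Function<I, java.util.List<O>> processor;
    private final Queue<O> pending = new ArrayDeque<>();

    LazyResultIterator(Iterator<I> input,
                       java.util.function.Function<I, java.util.List<O>> processor) {
        this.input = input;
        this.processor = processor;
    }

    @Override
    public boolean hasNext() {
        // Pull input records only until at least one output row is buffered.
        while (pending.isEmpty() && input.hasNext()) {
            pending.addAll(processor.apply(input.next()));
        }
        return !pending.isEmpty();
    }

    @Override
    public O next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return pending.poll();
    }
}
```

Because rows are enqueued and dequeued in the same pass, there is no write-after-read restriction to violate, unlike the RowContainer-based version.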
Review Request 24972: HIVE-7799 TRANSFORM failed in transform_ppr1.q
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24972/ --- Review request for hive, Brock Noland and Szehon Ho. Bugs: HIVE-7799 https://issues.apache.org/jira/browse/HIVE-7799 Repository: hive-git Description --- HiveBaseFunctionResultList uses RowContainer to store collected map output rows; all rows must be added into the RowContainer before reading from it starts, because RowContainer does not support writes after a read. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java 6568a76 Diff: https://reviews.apache.org/r/24972/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 24972: HIVE-7799 TRANSFORM failed in transform_ppr1.q
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24972/ --- (Updated Aug. 22, 2014, 9:47 a.m.) Review request for hive, Brock Noland and Szehon Ho. Changes --- We do not need RowContainer here for persistent storage support; since ResultIterator only processes new records on demand, a queue works fine. Bugs: HIVE-7799 https://issues.apache.org/jira/browse/HIVE-7799 Repository: hive-git Description --- HiveBaseFunctionResultList uses RowContainer to store collected map output rows; all rows must be added into the RowContainer before reading from it starts, because RowContainer does not support writes after a read. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java 6568a76 Diff: https://reviews.apache.org/r/24972/diff/ Testing --- Thanks, chengxiang li
[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array&lt;string&gt; with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sathish updated HIVE-7850: -- Fix Version/s: 0.14.0 Status: Patch Available (was: Open) Hive Query failed if the data type is arraystring with parquet files -- Key: HIVE-7850 URL: https://issues.apache.org/jira/browse/HIVE-7850 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1, 0.14.0 Reporter: Sathish Labels: parquet, serde Fix For: 0.14.0 * Created a parquet file from the Avro file which have 1 array data type and rest are primitive types. Avro Schema of the array data type. Eg: {code} { name : action, type : [ { type : array, items : string }, null ] } {code} * Created External Hive table with the Array type as below, {code} create external table paraArray (action Array) partitioned by (partitionid int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; alter table paraArray add partition(partitionid=1) location '/testPara'; {code} * Run the following query(select action from paraArray limit 10) and the Map reduce jobs are failing with the following exception. 
{code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 8 more {code} This issue has long back posted on Parquet issues list and Since this is related to Parquet Hive serde, I have created the Hive issue here, The details and history of this information are as shown in the link here https://github.com/Parquet/parquet-mr/issues/281. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array&lt;string&gt; with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sathish updated HIVE-7850: -- Status: Open (was: Patch Available) Hive Query failed if the data type is arraystring with parquet files -- Key: HIVE-7850 URL: https://issues.apache.org/jira/browse/HIVE-7850 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1, 0.14.0 Reporter: Sathish Labels: parquet, serde Fix For: 0.14.0 * Created a parquet file from the Avro file which have 1 array data type and rest are primitive types. Avro Schema of the array data type. Eg: {code} { name : action, type : [ { type : array, items : string }, null ] } {code} * Created External Hive table with the Array type as below, {code} create external table paraArray (action Array) partitioned by (partitionid int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; alter table paraArray add partition(partitionid=1) location '/testPara'; {code} * Run the following query(select action from paraArray limit 10) and the Map reduce jobs are failing with the following exception. 
{code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 8 more {code} This issue has long back posted on Parquet issues list and Since this is related to Parquet Hive serde, I have created the Hive issue here, The details and history of this information are as shown in the link here https://github.com/Parquet/parquet-mr/issues/281. -- This message was sent by Atlassian JIRA (v6.2#6252)
HIVE-7850 request to assign JIRA ticket
Hi all, I have a fix available for https://issues.apache.org/jira/browse/HIVE-7850. Can anyone grant me permission to submit the patch changes to this JIRA issue? JIRA username: vallurisathish. Regards, Sathish Valluri
[jira] [Commented] (HIVE-7833) Remove unwanted allocation in ORC RunLengthIntegerWriterV2 determine encoding function
[ https://issues.apache.org/jira/browse/HIVE-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106689#comment-14106689 ] Hive QA commented on HIVE-7833: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663497/HIVE-7833.2.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6115 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/454/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/454/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-454/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663497 Remove unwanted allocation in ORC RunLengthIntegerWriterV2 determine encoding function -- Key: HIVE-7833 URL: https://issues.apache.org/jira/browse/HIVE-7833 Project: Hive Issue Type: Improvement Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7833.1.patch, HIVE-7833.2.patch RunLengthIntegerWriterV2.determineEncoding() is used heavily. There are unwanted buffer allocation for every invocation of the function which are not required. -- This message was sent by Atlassian JIRA (v6.2#6252)
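The improvement described in HIVE-7833 (removing per-invocation buffer allocations in a hot function) follows a common hot-path pattern: hoist scratch arrays out of the method and reuse them across calls. Below is a hedged, illustrative sketch of that pattern only; the class, the field names, and the toy encoding decision are assumptions, not the actual RunLengthIntegerWriterV2 internals:

```java
// Illustrative sketch of the allocation-hoisting pattern: the scratch buffer
// is a reusable field instead of being reallocated on every call to the hot
// method. Not the actual ORC RunLengthIntegerWriterV2 code.
class EncodingChooser {
    private static final int MAX_SCOPE = 512;

    // Reused across calls instead of "long[] adjDeltas = new long[...]"
    // inside the method body, which would allocate on every invocation.
    private final long[] adjDeltas = new long[MAX_SCOPE];

    // Toy decision: if all adjacent deltas have the same sign, pick a
    // delta-style encoding (1); otherwise fall back to direct (0).
    int determineEncoding(long[] literals, int numLiterals) {
        for (int i = 1; i < numLiterals; i++) {
            adjDeltas[i - 1] = literals[i] - literals[i - 1];
        }
        boolean monotone = true;
        for (int i = 1; i < numLiterals - 1; i++) {
            if ((adjDeltas[i] >= 0) != (adjDeltas[0] >= 0)) {
                monotone = false;
                break;
            }
        }
        return monotone ? 1 : 0;
    }
}
```

Reusing a field this way is safe only because the writer is invoked from a single thread per stream; a shared instance would need per-thread buffers.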
[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106691#comment-14106691 ] Hive QA commented on HIVE-4629: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663483/HIVE-4629.7.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/455/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/455/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-455/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee 
/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-455/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriterV2.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/SerializationUtils.java' ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target accumulo-handler/target hwi/target common/target common/src/gen contrib/target service/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1619731. At revision 1619731. 
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12663483 HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Shreepadma Venugopalan Assignee: Dong Chen Attachments: HIVE-4629-no_thrift.1.patch, HIVE-4629.1.patch, HIVE-4629.2.patch, HIVE-4629.3.patch.txt, HIVE-4629.4.patch, HIVE-4629.5.patch, HIVE-4629.6.patch, HIVE-4629.7.patch HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106708#comment-14106708 ] Hive QA commented on HIVE-7799: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663637/HIVE-7799.2-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5980 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_null {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/79/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/79/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-79/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12663637 TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7851) Fix NPE in split generation on Tez 0.5
Gunther Hagleitner created HIVE-7851: Summary: Fix NPE in split generation on Tez 0.5 Key: HIVE-7851 URL: https://issues.apache.org/jira/browse/HIVE-7851 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7851) Fix NPE in split generation on Tez 0.5
[ https://issues.apache.org/jira/browse/HIVE-7851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7851: - Attachment: HIVE-7851.1.patch Fix NPE in split generation on Tez 0.5 -- Key: HIVE-7851 URL: https://issues.apache.org/jira/browse/HIVE-7851 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-7851.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-7851) Fix NPE in split generation on Tez 0.5
[ https://issues.apache.org/jira/browse/HIVE-7851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner resolved HIVE-7851. -- Resolution: Fixed Committed to tez branch. Fix NPE in split generation on Tez 0.5 -- Key: HIVE-7851 URL: https://issues.apache.org/jira/browse/HIVE-7851 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-7851.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7851) Fix NPE in split generation on Tez 0.5
[ https://issues.apache.org/jira/browse/HIVE-7851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7851: - Fix Version/s: tez-branch Fix NPE in split generation on Tez 0.5 -- Key: HIVE-7851 URL: https://issues.apache.org/jira/browse/HIVE-7851 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-7851.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
HIVE-7850 request to assign jira ticket
Hi All, I have a fix available for https://issues.apache.org/jira/browse/HIVE-7850. Can anyone grant me permission to submit the patch changes to this JIRA issue? Jira username : vallurisathish Regards Sathish Valluri
[jira] [Commented] (HIVE-7772) Add tests for order/sort/distribute/cluster by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106731#comment-14106731 ] Rui Li commented on HIVE-7772: -- Hi [~brocknoland], I tested other cases with latest code again but the error message I mentioned earlier still remains. These queries have a stage of {{Stats-Aggr Operator}}, which I think caused the error. It seems HIVE-7819 only avoids the exception but we still don't have a proper counter for spark task, so that {{CounterStatsAggregator.connect}} returns false and leads to the connection error I mentioned. I think we can include the tests in the patch for now and add more once spark counter is ready? Add tests for order/sort/distribute/cluster by query [Spark Branch] --- Key: HIVE-7772 URL: https://issues.apache.org/jira/browse/HIVE-7772 Project: Hive Issue Type: Test Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7772-spark.patch Now that these queries are supported, we should have tests to catch any problems we may have. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7828) TestCLIDriver.parquet_join.q is failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106737#comment-14106737 ] Hive QA commented on HIVE-7828: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663418/HIVE-7828.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6115 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/456/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/456/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-456/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663418 TestCLIDriver.parquet_join.q is failing on trunk Key: HIVE-7828 URL: https://issues.apache.org/jira/browse/HIVE-7828 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7828.patch The test is failing in the HiveQA tests of late. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array&lt;string&gt; with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sathish updated HIVE-7850: -- Attachment: HIVE-7850.patch This patch fixes the issue. Since we want to use this feature in the next release of Hive, requesting someone to review the patch changes and merge them to the main branch. Hive Query failed if the data type is array&lt;string&gt; with parquet files -- Key: HIVE-7850 URL: https://issues.apache.org/jira/browse/HIVE-7850 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.14.0, 0.13.1 Reporter: Sathish Labels: parquet, serde Fix For: 0.14.0 Attachments: HIVE-7850.patch * Created a Parquet file from an Avro file which has one array data type; the rest are primitive types. Avro schema of the array data type, e.g.: {code} { "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, null ] } {code} * Created an external Hive table with the array type as below: {code} create external table paraArray (action Array&lt;String&gt;) partitioned by (partitionid int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; alter table paraArray add partition(partitionid=1) location '/testPara'; {code} * Ran the following query (select action from paraArray limit 10); the MapReduce jobs fail with the following exception. 
{code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 8 more {code} This issue has long back posted on Parquet issues list and Since this is related to Parquet Hive serde, I have created the Hive issue here, The details and history of this information are as shown in the link here https://github.com/Parquet/parquet-mr/issues/281. -- This message was sent by Atlassian JIRA (v6.2#6252)
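The ClassCastException above comes from code that unconditionally casts the deserialized value to ArrayWritable. A minimal sketch of the defensive shape such an inspector could take, assuming made-up Writable stand-ins (these are NOT the real Hadoop/Parquet classes, and this is not necessarily the shape of the attached patch):

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the failure mode. The failing inspector cast its
// input to ArrayWritable unconditionally; when the reader hands back a single
// binary value instead (as for this Avro-derived schema), that cast throws
// ClassCastException. An instanceof check lets the inspector wrap the single
// value as a one-element list instead of failing.
class ArrayInspectorSketch {
    interface Writable {}

    static final class BinaryWritable implements Writable {
        final String value;
        BinaryWritable(String value) { this.value = value; }
    }

    static final class ArrayWritable implements Writable {
        final List<Writable> elements;
        ArrayWritable(List<Writable> elements) { this.elements = elements; }
    }

    static List<Writable> getList(Writable data) {
        if (data == null) {
            return null;                          // no row data
        }
        if (data instanceof ArrayWritable) {      // the expected case
            return ((ArrayWritable) data).elements;
        }
        return Collections.singletonList(data);   // single value: wrap, don't cast
    }
}
```

With this shape, a lone BinaryWritable yields a one-element list rather than a runtime failure in the map task.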
[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array&lt;string&gt; with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sathish updated HIVE-7850: -- Status: Patch Available (was: Open) Hive Query failed if the data type is array&lt;string&gt; with parquet files -- Key: HIVE-7850 URL: https://issues.apache.org/jira/browse/HIVE-7850 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1, 0.14.0 Reporter: Sathish Labels: parquet, serde Fix For: 0.14.0 Attachments: HIVE-7850.patch * Created a Parquet file from an Avro file which has one array data type; the rest are primitive types. Avro schema of the array data type, e.g.: {code} { "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, null ] } {code} * Created an external Hive table with the array type as below: {code} create external table paraArray (action Array&lt;String&gt;) partitioned by (partitionid int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; alter table paraArray add partition(partitionid=1) location '/testPara'; {code} * Ran the following query (select action from paraArray limit 10); the MapReduce jobs fail with the following exception. 
{code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 8 more {code} This issue has long back posted on Parquet issues list and Since this is related to Parquet Hive serde, I have created the Hive issue here, The details and history of this information are as shown in the link here https://github.com/Parquet/parquet-mr/issues/281. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7850) Hive Query failed if the data type is array&lt;string&gt; with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106775#comment-14106775 ] Sathish commented on HIVE-7850: --- Can someone look into this issue and provide comments or suggestions on the fix? I have provided the patch and am waiting for it to be merged to the main branch, as we want to use this Hive feature in our next release. Hive Query failed if the data type is array&lt;string&gt; with parquet files -- Key: HIVE-7850 URL: https://issues.apache.org/jira/browse/HIVE-7850 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.14.0, 0.13.1 Reporter: Sathish Labels: parquet, serde Fix For: 0.14.0 Attachments: HIVE-7850.patch * Created a Parquet file from an Avro file which has one array data type; the rest are primitive types. Avro schema of the array data type, e.g.: {code} { "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, null ] } {code} * Created an external Hive table with the array type as below: {code} create external table paraArray (action Array&lt;String&gt;) partitioned by (partitionid int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; alter table paraArray add partition(partitionid=1) location '/testPara'; {code} * Ran the following query (select action from paraArray limit 10); the MapReduce jobs fail with the following exception. 
{code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 8 more {code} This issue has long back posted on Parquet issues list and Since this is related to Parquet Hive serde, I have created the Hive issue here, The details and history of this information are as shown in the link here https://github.com/Parquet/parquet-mr/issues/281. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.
[ https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106795#comment-14106795 ] Hive QA commented on HIVE-7100: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663513/HIVE-7100.4.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6118 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/457/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/457/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-457/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663513 Users of hive should be able to specify skipTrash when dropping tables. --- Key: HIVE-7100 URL: https://issues.apache.org/jira/browse/HIVE-7100 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Ravi Prakash Assignee: Jayesh Attachments: HIVE-7100.1.patch, HIVE-7100.2.patch, HIVE-7100.3.patch, HIVE-7100.4.patch, HIVE-7100.patch Users of our clusters are often running up against their quota limits because of Hive tables. When they drop tables, they have to then manually delete the files from HDFS using skipTrash. This is cumbersome and unnecessary. 
We should enable users to skipTrash directly when dropping tables. We should also be able to provide this functionality without polluting SQL syntax. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7840) Generated hive-default.xml.template mistakenly refers to property names as keys
[ https://issues.apache.org/jira/browse/HIVE-7840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106843#comment-14106843 ] Hive QA commented on HIVE-7840: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663537/HIVE-7840.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6115 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/458/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/458/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-458/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663537 Generated hive-default.xml.template mistakenly refers to property names as keys --- Key: HIVE-7840 URL: https://issues.apache.org/jira/browse/HIVE-7840 Project: Hive Issue Type: Bug Reporter: Wilbur Yang Assignee: Wilbur Yang Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7840.patch When Hive is built with Maven, the default template for hive-site.xml (hive/packaging/target/apache-hive-0.14.0-SNAPSHOT-bin/apache-hive-0.14.0-SNAPSHOT-bin/conf/hive-default.xml.template) uses the key tag as opposed to the correct name tag. 
If a user were to create a custom hive-site.xml using this template, then it results in a rather confusing situation in which Hive logs that it has loaded hive-site.xml, but in reality none of those properties are registering correctly. *Wrong:* {quote} &lt;configuration&gt; ... &lt;property&gt; &lt;key&gt;hive.exec.script.wrapper&lt;/key&gt; &lt;value/&gt; &lt;description/&gt; &lt;/property&gt; ... {quote} *Right:* {quote} &lt;configuration&gt; ... &lt;property&gt; &lt;name&gt;hive.exec.script.wrapper&lt;/name&gt; &lt;value/&gt; &lt;description/&gt; &lt;/property&gt; ... {quote} -- This message was sent by Atlassian JIRA (v6.2#6252)
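To see why the wrong template fails silently: Hadoop-style configuration loaders look up the name child of each property element, so a key child is simply never consulted. A minimal sketch of that lookup using only JDK XML parsing (TemplateCheck and its lookup method are illustrative, not Hadoop's actual Configuration loader):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative stand-in for a Hadoop-style property loader: it scans each
// <property> for a <name> child. A <key> child is never looked at, so a
// property written with <key> silently fails to register.
class TemplateCheck {
    static String lookup(String xml, String wanted) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                NodeList names = p.getElementsByTagName("name");  // <key> is ignored
                if (names.getLength() > 0 && wanted.equals(names.item(0).getTextContent())) {
                    return p.getElementsByTagName("value").item(0).getTextContent();
                }
            }
        } catch (Exception e) {
            return null;   // malformed XML: nothing registers
        }
        return null;       // property never registered
    }
}
```

Against a template using key tags, lookup returns null for every property, which matches the observed behavior: the file loads without error but no setting takes effect.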
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106868#comment-14106868 ] Venki Korukanti commented on HIVE-7799: --- I think with the v2 patch we need unbounded memory as we store the results in Queue and sometime a single input record could generate more than one record (UDTF) or some operators (such as group by) flush after processing many input records. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106874#comment-14106874 ] Venki Korukanti commented on HIVE-7799: --- Let me look at the RowContainer and see if we can modify/extend it to support read/write like a queue with a persistent support. As far as I see we need persistent support. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} 
Basically, the cause is that RowContainer is misused (it's not allowed to write once someone has read a row from it); I'm trying to figure out whether it's a Hive issue or specific to Hive on Spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
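The "queue with persistent support" idea discussed above can be sketched as a FIFO that keeps a bounded number of rows in memory and spills overflow to a temp file, so writes remain legal after reads have started. SpillableRowQueue below is a hypothetical simplification under that assumption, not Hive's RowContainer or HiveKVResultCache:

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.List;

// Hypothetical spill-backed FIFO row queue. Oldest rows stay in a bounded
// in-memory deque; overflow is appended to a spill file. Appending preserves
// FIFO order, so add() stays legal even after poll() has been called.
class SpillableRowQueue implements Closeable {
    private final int memoryLimit;
    private final ArrayDeque<String> memory = new ArrayDeque<>();
    private final Path spill;
    private BufferedWriter spillWriter;
    private long spilled = 0;

    SpillableRowQueue(int memoryLimit) throws IOException {
        this.memoryLimit = memoryLimit;
        this.spill = Files.createTempFile("row-spill", ".txt");
    }

    void add(String row) throws IOException {
        if (spilled == 0 && memory.size() < memoryLimit) {
            memory.addLast(row);                 // fast path: fits in memory
        } else {                                 // once anything is on disk,
            if (spillWriter == null) {           // new rows must follow it
                spillWriter = Files.newBufferedWriter(spill, StandardCharsets.UTF_8);
            }
            spillWriter.write(row);
            spillWriter.newLine();
            spilled++;
        }
    }

    String poll() throws IOException {
        if (memory.isEmpty() && spilled > 0) {
            spillWriter.flush();
            spillWriter.close();
            spillWriter = null;
            List<String> rows = Files.readAllLines(spill, StandardCharsets.UTF_8);
            memory.addAll(rows);                 // reload spilled rows, oldest first
            Files.write(spill, new byte[0]);     // truncate the spill file
            spilled = 0;
        }
        return memory.pollFirst();               // null when fully drained
    }

    @Override public void close() throws IOException {
        if (spillWriter != null) spillWriter.close();
        Files.deleteIfExists(spill);
    }
}
```

Interleaving works because any row added while spilled rows exist is appended behind them; a real implementation would stream the spill file back instead of reloading it wholesale.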
[jira] [Created] (HIVE-7852) [CBO] Handle unary operators
Ashutosh Chauhan created HIVE-7852: -- Summary: [CBO] Handle unary operators Key: HIVE-7852 URL: https://issues.apache.org/jira/browse/HIVE-7852 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Currently, a query like select c1 from t1 where c2 = -6; throws an exception because the CBO path confuses the unary minus with the binary minus. -- This message was sent by Atlassian JIRA (v6.2#6252)
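The ambiguity can be illustrated with a toy resolver that keys on both the operator token and its operand count (MinusResolver is a made-up illustration, not Hive's SqlFunctionConverter):

```java
import java.util.List;

// Toy illustration of the unary/binary "-" ambiguity: the same token maps to
// two different operators depending on arity. A converter keyed only on the
// token text would pick the binary form for "where c2 = -6" and fail; keying
// on (token, child count) disambiguates.
class MinusResolver {
    enum Op { UNARY_MINUS, BINARY_MINUS, UNKNOWN }

    static Op resolve(String token, List<String> children) {
        if (!"-".equals(token)) {
            return Op.UNKNOWN;
        }
        return children.size() == 1 ? Op.UNARY_MINUS : Op.BINARY_MINUS;
    }
}
```

For the failing query, the AST node for -6 has a single child, so arity-aware resolution selects the unary operator.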
[jira] [Commented] (HIVE-7836) Ease-out denominator for multi-attribute join case in statistics annotation
[ https://issues.apache.org/jira/browse/HIVE-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106938#comment-14106938 ] Hive QA commented on HIVE-7836: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663533/HIVE-7836.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6115 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/459/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/459/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-459/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663533 Ease-out denominator for multi-attribute join case in statistics annotation --- Key: HIVE-7836 URL: https://issues.apache.org/jira/browse/HIVE-7836 Project: Hive Issue Type: Sub-task Components: Query Processor, Statistics Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7836.1.patch In cases where number of relations involved in join is less than the number of join attributes the denominator of join rule can get larger resulting in aggressive row count estimation. -- This message was sent by Atlassian JIRA (v6.2#6252)
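The aggressive denominator can be seen with a simplified cardinality model (an illustrative formula and ease-out strategy, not Hive's exact implementation): join rows are estimated as the product of input row counts divided by a denominator built from per-attribute max NDVs. With two 1000-row relations joined on three attributes, each with max NDV 100, the naive denominator collapses the estimate to a single row; one possible ease-out, dropping the largest NDV factor, keeps a more plausible estimate:

```java
import java.util.Arrays;

// Simplified multi-attribute join cardinality model. naive() multiplies the
// max NDV of every join attribute into the denominator; when there are more
// join attributes than relations this overshoots and crushes the estimate.
// easedOut() drops the largest NDV factor as one possible easing strategy.
class JoinEstimate {
    static long naive(long[] rowCounts, long[] maxNdvPerAttr) {
        long num = Arrays.stream(rowCounts).reduce(1L, (a, b) -> a * b);
        long denom = Arrays.stream(maxNdvPerAttr).reduce(1L, (a, b) -> a * b);
        return Math.max(1L, num / denom);
    }

    static long easedOut(long[] rowCounts, long[] maxNdvPerAttr) {
        long num = Arrays.stream(rowCounts).reduce(1L, (a, b) -> a * b);
        long largest = Arrays.stream(maxNdvPerAttr).max().orElse(1L);
        long denom = Arrays.stream(maxNdvPerAttr).reduce(1L, (a, b) -> a * b) / largest;
        return Math.max(1L, num / Math.max(1L, denom));
    }
}
```

Here naive gives 1,000,000 / 1,000,000 = 1 row, while the eased denominator of 10,000 yields 100 rows, which is far less aggressive.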
[jira] [Updated] (HIVE-7852) [CBO] Handle unary operators
[ https://issues.apache.org/jira/browse/HIVE-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7852: --- Attachment: h-7852.patch [CBO] Handle unary operators Key: HIVE-7852 URL: https://issues.apache.org/jira/browse/HIVE-7852 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: h-7852.patch Currently, query like select c1 from t1 where c2 = -6; throws exception because cbo path confuses unary -ve with binary -ve -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7852) [CBO] Handle unary operators
[ https://issues.apache.org/jira/browse/HIVE-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7852: --- Status: Patch Available (was: Open) [CBO] Handle unary operators Key: HIVE-7852 URL: https://issues.apache.org/jira/browse/HIVE-7852 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: h-7852.patch Currently, query like select c1 from t1 where c2 = -6; throws exception because cbo path confuses unary -ve with binary -ve -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 24981: Handle unary op.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24981/ --- Review request for hive and John Pullokkaran. Bugs: HIVE-7852 https://issues.apache.org/jira/browse/HIVE-7852 Repository: hive Description --- Handle unary op. Diffs - branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java 1619831 branches/cbo/ql/src/test/queries/clientpositive/cbo_correctness.q 1619831 branches/cbo/ql/src/test/results/clientpositive/cbo_correctness.q.out 1619831 Diff: https://reviews.apache.org/r/24981/diff/ Testing --- Added new test. Thanks, Ashutosh Chauhan
[jira] [Updated] (HIVE-7836) Ease-out denominator for multi-attribute join case in statistics annotation
[ https://issues.apache.org/jira/browse/HIVE-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7836: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Ease-out denominator for multi-attribute join case in statistics annotation --- Key: HIVE-7836 URL: https://issues.apache.org/jira/browse/HIVE-7836 Project: Hive Issue Type: Sub-task Components: Query Processor, Statistics Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.14.0 Attachments: HIVE-7836.1.patch In cases where number of relations involved in join is less than the number of join attributes the denominator of join rule can get larger resulting in aggressive row count estimation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7736) improve the columns stats update speed for all the partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7736: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Pengcheng! improve the columns stats update speed for all the partitions of a table Key: HIVE-7736 URL: https://issues.apache.org/jira/browse/HIVE-7736 Project: Hive Issue Type: Improvement Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7736.0.patch, HIVE-7736.1.patch, HIVE-7736.2.patch, HIVE-7736.3.patch, HIVE-7736.4.patch The current implementation of columns stats update for all the partitions of a table takes a long time when there are thousands of partitions. For example, on a given cluster, it took 600+ seconds to update all the partitions' columns stats for a table with 2 columns but 2000 partitions. ANALYZE TABLE src_stat_part partition (partitionId) COMPUTE STATISTICS for columns; We would like to improve the columns stats update speed for all the partitions of a table -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7848) Refresh SparkContext when spark configuration changes
[ https://issues.apache.org/jira/browse/HIVE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106988#comment-14106988 ] Chinna Rao Lalam commented on HIVE-7848: The 2nd review comment still needs to be addressed. Refresh SparkContext when spark configuration changes - Key: HIVE-7848 URL: https://issues.apache.org/jira/browse/HIVE-7848 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: spark-branch Attachments: HIVE-7848-spark.patch Recreate the spark client if spark configurations are updated (through the set command). -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24602: HIVE-7689 : Enable Postgres as METASTORE back-end
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24602/ --- (Updated Aug. 22, 2014, 3:50 p.m.) Review request for hive. Bugs: HIVE-7689 https://issues.apache.org/jira/browse/HIVE-7689 Repository: hive-git Description --- I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables these features: * LOCKS on postgres metastore * COMPACTION on postgres metastore * TRANSACTION on postgres metastore * fix metastore update script for postgres Diffs (updated) - metastore/scripts/upgrade/postgres/hive-txn-schema-0.13.0.postgres.sql 2ebd3b0 metastore/src/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java 524a7a4 metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java 06d8ac0 metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 063dee6 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 264052f ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java f636cff ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java db62721 ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java 4625d27 Diff: https://reviews.apache.org/r/24602/diff/ Testing --- Using the patched version in production. Concurrency enabled with DbTxnManager. Thanks, Damien Carol
[jira] [Updated] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-7689: --- Attachment: HIVE-7889.3.patch Rebased the patch and rewrote some parts. Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables LOCKS and COMPACTION, and fixes an error in STATS on the metastore. -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 24986: HIVE-7553: decouple the auxiliary jar loading from hive server2 starting phase
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24986/ --- Review request for hive. Bugs: HIVE-7553 https://issues.apache.org/jira/browse/HIVE-7553 Repository: hive-git Description --- HIVE-7553: decouple the auxiliary jar loading from hive server2 starting phase Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 15bc0a33b556b4be7a0a1fa671e5ee9a2a553fee hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java 93a03adeab7ba3c3c91344955d303e4252005239 hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatClient.java 7df84e997af4c626cf4fe92b22293c3165a5e6cc ql/src/java/org/apache/hadoop/hive/ql/exec/DefaultFetchFormatter.java 5924bcf1f55dc4c2dd06f312f929047b7df9de55 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 0c6a3d44ef1f796778768421dc02f8bf3ede6a8c ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionTask.java bd45df1a401d1adb009e953d08205c7d5c2d5de2 ql/src/java/org/apache/hadoop/hive/ql/exec/ListSinkOperator.java dcc19f70644c561e17df8c8660ca62805465f1d6 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 4cf4522ace1932a0cd7f8203a98e69a35e8e9e8e ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java eb2851b2c5fa52e0f555b3d8d1beea5d1ac3b225 ql/src/java/org/apache/hadoop/hive/ql/hooks/HookUtils.java 3f474f846c7af5f1f65f1c14f3ce51308f1279d4 ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughOutputFormat.java 0962cadce0d515e046371d0a816f4efd70b8eef7 ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java 9051ba6d80e619ddbb6c27bb161e1e7a5cdb08a5 ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 44f6198b55594e1394e9a1556603fe1c730fb438 ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 2f13ac2e30195a25844a25e9ec8a7c42ed99b75c ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java 46bf55d64eef0e994c103f3f2a16a81753aa48cf 
ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java d86df453cd7686627940ade62c0fd72f1636dd0b ql/src/java/org/apache/hadoop/hive/ql/parse/ParseUtils.java 0a1c660b4bbd46d8410e646270b23c99a4de8b7e ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 51838ae0b8abd4e040f180cad2375355fbfff621 ql/src/java/org/apache/hadoop/hive/ql/plan/AggregationDesc.java 17eeae1a3435fceb4b57325675c58b599e0973ea ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java 930acbc98e81f8d421cee1170659d8b7a427fe7d ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java 39f1793aaa5bed8a494883cac516ad314be951f4 ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessorFactory.java 70c76b1ca50dd0540e57125eee9e4aa347d085d9 ql/src/java/org/apache/hadoop/hive/ql/processors/HiveCommand.java ae532f6e6b1a2e626043865b7bb502377455e7e1 ql/src/java/org/apache/hadoop/hive/ql/processors/RefreshProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java fcfcf4227610279090a867cdeeb36dbc3d13d902 ql/src/java/org/apache/hadoop/hive/ql/stats/StatsFactory.java e247184b7d95c85fd3e12432e7eb75eb1e2a0b68 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBridge.java 959007a54b335bb0bdef0256f60e6cbc65798dc7 ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java ef0052f5763922d50986f127c416af5eaa6ae30d ql/src/test/resources/SessionStateTest-V1.jar PRE-CREATION ql/src/test/resources/SessionStateTest-V2.jar PRE-CREATION service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java b39d64dbec7b7d4beb622ce72c86ed1c8264f042 Diff: https://reviews.apache.org/r/24986/diff/ Testing --- Thanks, cheng xu
[jira] [Commented] (HIVE-7841) Case, When, Lead, Lag UDF is missing annotation
[ https://issues.apache.org/jira/browse/HIVE-7841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107054#comment-14107054 ] Hive QA commented on HIVE-7841: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663556/HIVE-7841.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6115 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_case org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_when org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/460/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/460/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-460/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663556 Case, When, Lead, Lag UDF is missing annotation --- Key: HIVE-7841 URL: https://issues.apache.org/jira/browse/HIVE-7841 Project: Hive Issue Type: Bug Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-7841.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7799: Status: Open (was: Patch Available) TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7654) A method to extrapolate columnStats for partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107096#comment-14107096 ] Ashutosh Chauhan commented on HIVE-7654: [~szehon], the API returns the # of partitions for which stats were found in the metastore. If the user of the API is not interested in extrapolated stats, she can compare the # of partitions requested with the # of partitions returned; if those two numbers aren't equal, she can choose to ignore the result of this API. An API for unaggregated stats already exists, which callers can use if they aren't interested in this extrapolation. A method to extrapolate columnStats for partitions of a table - Key: HIVE-7654 URL: https://issues.apache.org/jira/browse/HIVE-7654 Project: Hive Issue Type: New Feature Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch, HIVE-7654.8.patch In a PARTITIONED table, there are many partitions. For example, create table if not exists loc_orc ( state string, locid int, zip bigint ) partitioned by(year string) stored as orc; We assume there are 4 partitions, partition(year='2000'), partition(year='2001'), partition(year='2002') and partition(year='2003'). We can use the following command to compute statistics for columns state,locid of partition(year='2001') analyze table loc_orc partition(year='2001') compute statistics for columns state,locid; We need to know the “aggregated” column status for the whole table loc_orc. However, we may not have the column status for some partitions, e.g., partition(year='2002') and also we may not have the column status for some columns, e.g., zip bigint for partition(year='2001') We propose a method to extrapolate the missing column status for the partitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
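The caller-side guard Ashutosh describes — skip the aggregate when fewer partitions had stats than were requested, since the result may then include extrapolated values — could be sketched as below. The class and field names here are illustrative, not the actual Hive metastore API:

```java
// Sketch of the guard described above. "partsFound" stands in for the count
// of partitions for which real stats existed in the metastore; names are
// made up for the example, not taken from Hive.
public class AggrStatsGuard {
    static final class AggrStatsResult {
        final long partsFound;    // partitions that actually had stats
        final Object columnStats; // aggregated (possibly extrapolated) stats
        AggrStatsResult(long partsFound, Object columnStats) {
            this.partsFound = partsFound;
            this.columnStats = columnStats;
        }
    }

    /** Return the stats only if every requested partition had real stats. */
    static Object statsWithoutExtrapolation(AggrStatsResult r, int partsRequested) {
        return (r.partsFound == partsRequested) ? r.columnStats : null;
    }

    public static void main(String[] args) {
        AggrStatsResult full = new AggrStatsResult(4, "stats");
        AggrStatsResult partial = new AggrStatsResult(3, "stats");
        System.out.println(statsWithoutExtrapolation(full, 4));    // prints: stats
        System.out.println(statsWithoutExtrapolation(partial, 4)); // prints: null
    }
}
```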
[jira] [Updated] (HIVE-7663) OrcRecordUpdater needs to implement getStats
[ https://issues.apache.org/jira/browse/HIVE-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-7663: - Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Patch checked in. Thank you Owen for the review. OrcRecordUpdater needs to implement getStats Key: HIVE-7663 URL: https://issues.apache.org/jira/browse/HIVE-7663 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.14.0 Attachments: HIVE-7663.patch OrcRecordUpdater.getStats currently returns null. It needs to track the stats and return a valid value. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107106#comment-14107106 ] Chengxiang Li commented on HIVE-7799: - Thanks, [~venki387], it seems I missed something here; GROUP BY does need persistent storage support here. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused (it's not allowed to
write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7841) Case, When, Lead, Lag UDF is missing annotation
[ https://issues.apache.org/jira/browse/HIVE-7841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7841: --- Component/s: UDF Documentation Case, When, Lead, Lag UDF is missing annotation --- Key: HIVE-7841 URL: https://issues.apache.org/jira/browse/HIVE-7841 Project: Hive Issue Type: Bug Components: Documentation, UDF Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.14.0 Attachments: HIVE-7841.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7841) Case, When, Lead, Lag UDF is missing annotation
[ https://issues.apache.org/jira/browse/HIVE-7841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7841: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, John! Case, When, Lead, Lag UDF is missing annotation --- Key: HIVE-7841 URL: https://issues.apache.org/jira/browse/HIVE-7841 Project: Hive Issue Type: Bug Components: Documentation, UDF Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.14.0 Attachments: HIVE-7841.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7833) Remove unwanted allocation in ORC RunLengthIntegerWriterV2 determine encoding function
[ https://issues.apache.org/jira/browse/HIVE-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7833: - Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk Remove unwanted allocation in ORC RunLengthIntegerWriterV2 determine encoding function -- Key: HIVE-7833 URL: https://issues.apache.org/jira/browse/HIVE-7833 Project: Hive Issue Type: Improvement Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.14.0 Attachments: HIVE-7833.1.patch, HIVE-7833.2.patch RunLengthIntegerWriterV2.determineEncoding() is called heavily, and it performs unnecessary buffer allocations on every invocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
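The fix pattern behind HIVE-7833 — hoisting a per-call scratch allocation into a reusable field — can be sketched as follows. This is illustrative only, not the actual RunLengthIntegerWriterV2 code; the class, method body, and encoding ids are invented for the example:

```java
// Sketch: a hot method that used to allocate a scratch array on every call
// instead reuses a preallocated field, which is safe because the scratch
// space is fully overwritten before it is read on each invocation.
class EncodingChooserSketch {
    static final int RLE = 0, DELTA = 1; // fake encoding ids for the example

    // before the fix: long[] deltas = new long[512]; inside the method,
    // i.e. one garbage allocation per invocation of the hot path
    private final long[] deltas = new long[512]; // allocated once, reused

    int determineEncoding(long[] literals, int n) {
        boolean constantDelta = true;
        for (int i = 1; i < n; i++) {
            deltas[i] = literals[i] - literals[i - 1];
            if (i > 1 && deltas[i] != deltas[i - 1]) constantDelta = false;
        }
        return constantDelta ? RLE : DELTA;
    }

    public static void main(String[] args) {
        EncodingChooserSketch c = new EncodingChooserSketch();
        System.out.println(c.determineEncoding(new long[]{1, 2, 3, 4}, 4)); // prints: 0
        System.out.println(c.determineEncoding(new long[]{1, 2, 4, 8}, 4)); // prints: 1
    }
}
```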
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107139#comment-14107139 ] Venki Korukanti commented on HIVE-7799: --- Whenever {{ResultIterator.hasNext()}} or {{ResultIterator.next()}} is called we first serve records from RowContainer until all records in RowContainer are consumed. If there are no more records in RowContainer then we clear RowContainer and call {{processNextRecord}} or {{closeRecordProcessor}} in {{ResultIterator.hasNext()}} to get the next output record(s). So we start adding records only when RowContainer is empty (or cleared). I am trying to understand how we got into the situation where we are trying to write after reading has started. One scenario I can think of is if Spark has two threads like in producer-consumer. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
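The drain-then-refill contract Venki describes — serve buffered records from the RowContainer until it is empty, and only then clear and refill it — can be sketched with a small single-threaded iterator. This is an illustrative stand-in, not the actual HiveBaseFunctionResultList code; the processing step is faked and all names are made up:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Iterator;

// Sketch of the drain-then-refill pattern: hasNext()/next() first serve
// buffered rows; only when the buffer is fully drained is it refilled by
// processing the next input record. On a single thread, writes therefore
// never interleave with reads -- the reported NPE suggests two threads
// (producer/consumer) breaking that assumption.
class ResultIteratorSketch implements Iterator<String> {
    private final ArrayDeque<String> buffer = new ArrayDeque<>(); // stand-in for RowContainer
    private final Iterator<String> input;

    ResultIteratorSketch(Iterator<String> input) { this.input = input; }

    @Override
    public boolean hasNext() {
        while (buffer.isEmpty() && input.hasNext()) {
            // buffer is drained: safe to refill (the "clear then write" step)
            String record = input.next();
            buffer.add(record.toUpperCase()); // stand-in for processNextRecord()
        }
        return !buffer.isEmpty();
    }

    @Override
    public String next() { return buffer.poll(); }

    public static void main(String[] args) {
        ResultIteratorSketch it = new ResultIteratorSketch(Arrays.asList("a", "b").iterator());
        while (it.hasNext()) System.out.println(it.next()); // prints: A then B
    }
}
```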
[jira] [Commented] (HIVE-7654) A method to extrapolate columnStats for partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107145#comment-14107145 ] Szehon Ho commented on HIVE-7654: - Thanks for the detailed explanation. A method to extrapolate columnStats for partitions of a table - Key: HIVE-7654 URL: https://issues.apache.org/jira/browse/HIVE-7654 Project: Hive Issue Type: New Feature Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch, HIVE-7654.8.patch In a PARTITIONED table, there are many partitions. For example, create table if not exists loc_orc ( state string, locid int, zip bigint ) partitioned by(year string) stored as orc; We assume there are 4 partitions, partition(year='2000'), partition(year='2001'), partition(year='2002') and partition(year='2003'). We can use the following command to compute statistics for columns state,locid of partition(year='2001') analyze table loc_orc partition(year='2001') compute statistics for columns state,locid; We need to know the “aggregated” column status for the whole table loc_orc. However, we may not have the column status for some partitions, e.g., partition(year='2002') and also we may not have the column status for some columns, e.g., zip bigint for partition(year='2001') We propose a method to extrapolate the missing column status for the partitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24981: Handle unary op.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24981/#review51296 --- branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java https://reviews.apache.org/r/24981/#comment89437 It seems that the only change needed is to add unary plus and minus to the builder table: registerFunction("++", SqlStdOperatorTable.UNARY_PLUS, hToken(HiveParser.PLUS, "PLUS")); registerFunction("--", SqlStdOperatorTable.UNARY_MINUS, hToken(HiveParser.PLUS, "MINUS")); - John Pullokkaran On Aug. 22, 2014, 3:16 p.m., Ashutosh Chauhan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24981/ --- (Updated Aug. 22, 2014, 3:16 p.m.) Review request for hive and John Pullokkaran. Bugs: HIVE-7852 https://issues.apache.org/jira/browse/HIVE-7852 Repository: hive Description --- Handle unary op. Diffs - branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java 1619831 branches/cbo/ql/src/test/queries/clientpositive/cbo_correctness.q 1619831 branches/cbo/ql/src/test/results/clientpositive/cbo_correctness.q.out 1619831 Diff: https://reviews.apache.org/r/24981/diff/ Testing --- Added new test. Thanks, Ashutosh Chauhan
[jira] [Updated] (HIVE-7654) A method to extrapolate columnStats for partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7654: -- Attachment: HIVE-7654.9.patch A method to extrapolate columnStats for partitions of a table - Key: HIVE-7654 URL: https://issues.apache.org/jira/browse/HIVE-7654 Project: Hive Issue Type: New Feature Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch, HIVE-7654.8.patch, HIVE-7654.9.patch In a PARTITIONED table, there are many partitions. For example, create table if not exists loc_orc ( state string, locid int, zip bigint ) partitioned by(year string) stored as orc; We assume there are 4 partitions, partition(year='2000'), partition(year='2001'), partition(year='2002') and partition(year='2003'). We can use the following command to compute statistics for columns state,locid of partition(year='2001') analyze table loc_orc partition(year='2001') compute statistics for columns state,locid; We need to know the “aggregated” column status for the whole table loc_orc. However, we may not have the column status for some partitions, e.g., partition(year='2002') and also we may not have the column status for some columns, e.g., zip bigint for partition(year='2001') We propose a method to extrapolate the missing column status for the partitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7654) A method to extrapolate columnStats for partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7654: -- Status: Patch Available (was: Open) address the partial test case error A method to extrapolate columnStats for partitions of a table - Key: HIVE-7654 URL: https://issues.apache.org/jira/browse/HIVE-7654 Project: Hive Issue Type: New Feature Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch, HIVE-7654.8.patch, HIVE-7654.9.patch In a PARTITIONED table, there are many partitions. For example, create table if not exists loc_orc ( state string, locid int, zip bigint ) partitioned by(year string) stored as orc; We assume there are 4 partitions, partition(year='2000'), partition(year='2001'), partition(year='2002') and partition(year='2003'). We can use the following command to compute statistics for columns state,locid of partition(year='2001') analyze table loc_orc partition(year='2001') compute statistics for columns state,locid; We need to know the “aggregated” column status for the whole table loc_orc. However, we may not have the column status for some partitions, e.g., partition(year='2002') and also we may not have the column status for some columns, e.g., zip bigint for partition(year='2001') We propose a method to extrapolate the missing column status for the partitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7654) A method to extrapolate columnStats for partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7654: -- Status: Open (was: Patch Available) A method to extrapolate columnStats for partitions of a table - Key: HIVE-7654 URL: https://issues.apache.org/jira/browse/HIVE-7654 Project: Hive Issue Type: New Feature Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch, HIVE-7654.8.patch, HIVE-7654.9.patch In a PARTITIONED table, there are many partitions. For example, create table if not exists loc_orc ( state string, locid int, zip bigint ) partitioned by(year string) stored as orc; We assume there are 4 partitions, partition(year='2000'), partition(year='2001'), partition(year='2002') and partition(year='2003'). We can use the following command to compute statistics for columns state,locid of partition(year='2001') analyze table loc_orc partition(year='2001') compute statistics for columns state,locid; We need to know the “aggregated” column status for the whole table loc_orc. However, we may not have the column status for some partitions, e.g., partition(year='2002') and also we may not have the column status for some columns, e.g., zip bigint for partition(year='2001') We propose a method to extrapolate the missing column status for the partitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24498: A method to extrapolate the missing column status for the partitions.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24498/ --- (Updated Aug. 22, 2014, 5:45 p.m.) Review request for hive. Changes --- address the partial test case error Repository: hive-git Description --- We propose a method to extrapolate the missing column status for the partitions. Diffs (updated) - data/files/extrapolate_stats_full.txt PRE-CREATION data/files/extrapolate_stats_partial.txt PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 9489949 metastore/src/java/org/apache/hadoop/hive/metastore/IExtrapolatePartStatus.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/LinearExtrapolatePartStatus.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 767cffc metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java a9f4be2 metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 0364385 metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 4eba2b0 metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 78ab19a ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 8100b39 ql/src/test/queries/clientpositive/extrapolate_part_stats_full.q PRE-CREATION ql/src/test/queries/clientpositive/extrapolate_part_stats_partial.q PRE-CREATION ql/src/test/results/clientpositive/annotate_stats_part.q.out 10993c3 ql/src/test/results/clientpositive/extrapolate_part_stats_full.q.out PRE-CREATION ql/src/test/results/clientpositive/extrapolate_part_stats_partial.q.out PRE-CREATION Diff: https://reviews.apache.org/r/24498/diff/ Testing --- File Attachments HIVE-7654.0.patch https://reviews.apache.org/media/uploaded/files/2014/08/12/77b155b0-a417-4225-b6b7-4c8c6ce2b97d__HIVE-7654.0.patch Thanks, pengcheng xiong
Re: Review Request 24981: Handle unary op.
On Aug. 22, 2014, 5:36 p.m., John Pullokkaran wrote:

branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java, line 125
https://reviews.apache.org/r/24981/diff/1/?file=667267#file667267line125

It seems that the only change needed is to add unary plus and minus to the builder table:

registerFunction("++", SqlStdOperatorTable.UNARY_PLUS, hToken(HiveParser.PLUS, "+"));
registerFunction("--", SqlStdOperatorTable.UNARY_MINUS, hToken(HiveParser.MINUS, "-"));

I am not sure how this can work, since in both the map and the reverse lookup map, "+" is overloaded for both the unary and binary (+) operators. What you suggested can work only if we change the annotation description of GenericUDFOpNegative in Hive and of UnaryMinus in Optiq from "-" to "--".

- Ashutosh

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24981/#review51296 ---

On Aug. 22, 2014, 3:16 p.m., Ashutosh Chauhan wrote:

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24981/ ---

(Updated Aug. 22, 2014, 3:16 p.m.)

Review request for hive and John Pullokkaran.

Bugs: HIVE-7852
https://issues.apache.org/jira/browse/HIVE-7852

Repository: hive

Description
-------
Handle unary op.

Diffs
-----
- branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java 1619831
- branches/cbo/ql/src/test/queries/clientpositive/cbo_correctness.q 1619831
- branches/cbo/ql/src/test/results/clientpositive/cbo_correctness.q.out 1619831

Diff: https://reviews.apache.org/r/24981/diff/

Testing
-------
Added new test.

Thanks,
Ashutosh Chauhan
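Ashutosh's objection about the overloaded "+" key can be seen with a plain HashMap. This is an illustrative sketch only; the operator values are strings standing in for the real SqlStdOperatorTable entries, not Hive code.

```java
import java.util.HashMap;
import java.util.Map;

// If the unary and binary operators are both registered under the Hive name
// "+", the forward map (Hive name -> Optiq operator) keeps only one of them;
// the reverse map has the symmetric problem. Registering unary variants under
// distinct names ("++", "--") avoids the clash, but only if the Hive/Optiq
// operator descriptions are changed to match those names.
public class OperatorLookupClash {
    public static void main(String[] args) {
        Map<String, String> hiveToOptiq = new HashMap<>();
        hiveToOptiq.put("+", "PLUS");        // binary +
        hiveToOptiq.put("+", "UNARY_PLUS");  // silently replaces the binary mapping
        System.out.println(hiveToOptiq.size()); // 1 - only one entry survives

        // Distinct keys keep both operators resolvable:
        hiveToOptiq.put("+", "PLUS");
        hiveToOptiq.put("++", "UNARY_PLUS");
        System.out.println(hiveToOptiq.size()); // 2
    }
}
```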
[jira] [Updated] (HIVE-7828) TestCLIDriver.parquet_join.q is failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland updated HIVE-7828:
-------------------------------
Resolution: Fixed
Fix Version/s: 0.14.0
Status: Resolved (was: Patch Available)

Thank you very much for the review Alan! I have committed this to trunk!

TestCLIDriver.parquet_join.q is failing on trunk
------------------------------------------------
Key: HIVE-7828
URL: https://issues.apache.org/jira/browse/HIVE-7828
Project: Hive
Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
Fix For: 0.14.0
Attachments: HIVE-7828.patch

The test is failing in the HiveQA tests of late.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HIVE-7598) Potential null pointer dereference in MergeTask#closeJob()
[ https://issues.apache.org/jira/browse/HIVE-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107185#comment-14107185 ]

Hive QA commented on HIVE-7598:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663564/HIVE-7598.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6116 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/461/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/461/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-461/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663564

Potential null pointer dereference in MergeTask#closeJob()
----------------------------------------------------------
Key: HIVE-7598
URL: https://issues.apache.org/jira/browse/HIVE-7598
Project: Hive
Issue Type: Bug
Reporter: Ted Yu
Assignee: SUYEON LEE
Priority: Minor
Attachments: HIVE-7598.patch

The call to Utilities.mvFileToFinalPath() passes null as the second-to-last parameter, conf. That null gets passed to createEmptyBuckets(), which dereferences conf directly:

{code}
boolean isCompressed = conf.getCompressed();
TableDesc tableInfo = conf.getTableInfo();
{code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
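The bug pattern behind HIVE-7598 — a parameter that one caller passes as null and a callee dereferences unconditionally — can be shown with a minimal sketch. MergeConf and the method names below are hypothetical stand-ins, not Hive's real classes; the point is only the contrast between an unchecked dereference and an explicit argument check.

```java
// Hypothetical sketch of a defensive fix for a null-conf dereference.
public class NullGuardSketch {
    // Stand-in for the FileSinkDesc-style config object dereferenced in
    // createEmptyBuckets().
    static class MergeConf {
        boolean getCompressed() { return false; }
    }

    // Before the fix: a null conf surfaces as an opaque NPE deep in the callee.
    static boolean createEmptyBucketsUnsafe(MergeConf conf) {
        return conf.getCompressed();
    }

    // After the fix: validate the argument before dereferencing it, so the
    // failure names the actual contract violation at the call boundary.
    static boolean createEmptyBucketsSafe(MergeConf conf) {
        if (conf == null) {
            throw new IllegalArgumentException("conf must not be null");
        }
        return conf.getCompressed();
    }

    public static void main(String[] args) {
        try {
            createEmptyBucketsUnsafe(null);
        } catch (NullPointerException e) {
            System.out.println("opaque NPE without the guard");
        }
        try {
            createEmptyBucketsSafe(null);
        } catch (IllegalArgumentException e) {
            System.out.println("clear error with the guard: " + e.getMessage());
        }
    }
}
```

An alternative fix, of course, is for the caller to pass a real conf instead of null; which side owns the contract is what the attached patch decides.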
[jira] [Updated] (HIVE-7848) Refresh SparkContext when spark configuration changes
[ https://issues.apache.org/jira/browse/HIVE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chinna Rao Lalam updated HIVE-7848:
-----------------------------------
Attachment: HIVE-7848.1-spark.patch

Addressed both the comments.

Refresh SparkContext when spark configuration changes
-----------------------------------------------------
Key: HIVE-7848
URL: https://issues.apache.org/jira/browse/HIVE-7848
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
Fix For: spark-branch
Attachments: HIVE-7848-spark.patch, HIVE-7848.1-spark.patch

Recreate the spark client if spark configurations are updated (through the set command).

--
This message was sent by Atlassian JIRA
(v6.2#6252)
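The behavior the patch aims for — reuse the client while the spark.* settings are unchanged, rebuild it when a set command alters them — can be sketched as follows. All class and method names here are illustrative assumptions, not the code in HIVE-7848.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a config-aware client cache: remember the spark
// settings the current client was built with, and rebuild only on change.
public class SparkClientCache {
    private Map<String, String> activeConf = null;
    private String client = null;  // stand-in for the real Spark client/context
    private int rebuilds = 0;

    String getClient(Map<String, String> sparkConf) {
        if (client == null || !sparkConf.equals(activeConf)) {
            activeConf = new HashMap<>(sparkConf);  // defensive snapshot
            client = "client built with " + activeConf;
            rebuilds++;
        }
        return client;
    }

    int getRebuildCount() { return rebuilds; }

    public static void main(String[] args) {
        SparkClientCache cache = new SparkClientCache();
        Map<String, String> conf = new HashMap<>();
        conf.put("spark.master", "local");
        cache.getClient(conf);
        cache.getClient(conf);                    // unchanged conf: reuse
        conf.put("spark.executor.memory", "2g");  // "set" command changes conf
        cache.getClient(conf);                    // changed conf: rebuild
        System.out.println(cache.getRebuildCount()); // 2
    }
}
```

The defensive copy matters: comparing against a live reference to the session conf would always report "unchanged" after a set command mutates it in place.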
[jira] [Commented] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107200#comment-14107200 ]

Szehon Ho commented on HIVE-7850:
---------------------------------

Hi Satish, can you please fix the formatting? Indents are 2 spaces (Hive code is like that), and put a space after the comma, etc. Otherwise it looks good to me. But granted, I'm not an expert on the Parquet schema, so my only question is whether it is compatible with other tools? + [~jcoffey], [~rdblue] for comments (if any).

Hive Query failed if the data type is array<string> with parquet files
----------------------------------------------------------------------
Key: HIVE-7850
URL: https://issues.apache.org/jira/browse/HIVE-7850
Project: Hive
Issue Type: Bug
Components: Serializers/Deserializers
Affects Versions: 0.14.0, 0.13.1
Reporter: Sathish
Labels: parquet, serde
Fix For: 0.14.0
Attachments: HIVE-7850.patch

* Created a parquet file from an Avro file which has one array data type and the rest primitive types. Avro schema of the array data type, e.g.:
{code}
{ "name": "action", "type": [ { "type": "array", "items": "string" }, null ] }
{code}
* Created an external Hive table with the array type as below:
{code}
create external table paraArray (action Array<String>)
partitioned by (partitionid int)
row format serde 'parquet.hive.serde.ParquetHiveSerDe'
stored as
  inputformat 'parquet.hive.MapredParquetInputFormat'
  outputformat 'parquet.hive.MapredParquetOutputFormat'
location '/testPara';
alter table paraArray add partition(partitionid=1) location '/testPara';
{code}
* Ran the following query (select action from paraArray limit 10) and the map-reduce jobs fail with the following exception.
{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable
  at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
  at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
  at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
  at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
  at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
  at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
  at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
  at org.apache.hadoop.mapred.Child.main(Child.java:264)
]
  at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
  at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
  ... 8 more
{code}

This issue was posted long back on the Parquet issues list, and since it is related to the Parquet Hive serde, I have created the Hive issue here. The details and history are in the link here: https://github.com/Parquet/parquet-mr/issues/281.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HIVE-7848) Refresh SparkContext when spark configuration changes
[ https://issues.apache.org/jira/browse/HIVE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chinna Rao Lalam updated HIVE-7848:
-----------------------------------
Status: Patch Available (was: Open)

Refresh SparkContext when spark configuration changes
-----------------------------------------------------
Key: HIVE-7848
URL: https://issues.apache.org/jira/browse/HIVE-7848
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
Fix For: spark-branch
Attachments: HIVE-7848-spark.patch, HIVE-7848.1-spark.patch

Recreate the spark client if spark configurations are updated (through the set command).

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107206#comment-14107206 ]

Brock Noland commented on HIVE-4629:
------------------------------------

Hi Dong,

The latest patch fails to apply to HEAD and will need to be rebased.

bq. the latest patch still does not fulfill the comments about backward compatibility.

I am very sorry for the confusion, I believe the patch *does* meet the backward compatibility requirement.

bq. For client and service layer interface ICLIService, although it is not RPC and is not a public API of Hive, I think making it follow the single request/response struct mode is also good. Will make the new fetchResults method follow the single request/response struct model. Then remove those old fetchResults methods.

I do not feel this is required. The current patch works exactly as we requested! Thank you very much Dong!!

HS2 should support an API to retrieve query logs
------------------------------------------------
Key: HIVE-4629
URL: https://issues.apache.org/jira/browse/HIVE-4629
Project: Hive
Issue Type: Sub-task
Components: HiveServer2
Reporter: Shreepadma Venugopalan
Assignee: Dong Chen
Attachments: HIVE-4629-no_thrift.1.patch, HIVE-4629.1.patch, HIVE-4629.2.patch, HIVE-4629.3.patch.txt, HIVE-4629.4.patch, HIVE-4629.5.patch, HIVE-4629.6.patch, HIVE-4629.7.patch

HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HIVE-7840) Generated hive-default.xml.template mistakenly refers to property names as keys
[ https://issues.apache.org/jira/browse/HIVE-7840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland updated HIVE-7840:
-------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)

Thank you so much for this fix!! I have committed this to trunk!

Generated hive-default.xml.template mistakenly refers to property names as keys
-------------------------------------------------------------------------------
Key: HIVE-7840
URL: https://issues.apache.org/jira/browse/HIVE-7840
Project: Hive
Issue Type: Bug
Reporter: Wilbur Yang
Assignee: Wilbur Yang
Priority: Minor
Fix For: 0.14.0
Attachments: HIVE-7840.patch

When Hive is built with Maven, the default template for hive-site.xml (hive/packaging/target/apache-hive-0.14.0-SNAPSHOT-bin/apache-hive-0.14.0-SNAPSHOT-bin/conf/hive-default.xml.template) uses a <key> tag instead of the correct <name> tag. If a user creates a custom hive-site.xml from this template, the result is a rather confusing situation in which Hive logs that it has loaded hive-site.xml, but in reality none of the properties register correctly.

*Wrong:*
{quote}
<configuration>
  ...
  <property>
    <key>hive.exec.script.wrapper</key>
    <value/>
    <description/>
  </property>
  ...
{quote}

*Right:*
{quote}
<configuration>
  ...
  <property>
    <name>hive.exec.script.wrapper</name>
    <value/>
    <description/>
  </property>
  ...
{quote}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
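Why the wrong tag fails silently: Hadoop-style configuration loaders take each <property> element's <name> child as the key, so a <key> child is simply never read and the property vanishes without an error. A minimal standalone check (not Hive code; TemplateCheck and usableProperties are hypothetical names) makes the difference visible:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Counts <property> entries that a <name>-based loader would actually pick up.
public class TemplateCheck {
    static int usableProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            int usable = 0;
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                // Only a <name> child makes the property register; <key> is ignored.
                if (p.getElementsByTagName("name").getLength() > 0) {
                    usable++;
                }
            }
            return usable;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String broken = "<configuration><property>"
            + "<key>hive.exec.script.wrapper</key><value/></property></configuration>";
        String fixed = "<configuration><property>"
            + "<name>hive.exec.script.wrapper</name><value/></property></configuration>";
        System.out.println(usableProperties(broken)); // 0
        System.out.println(usableProperties(fixed));  // 1
    }
}
```

Both documents parse as well-formed XML, which is exactly why the template bug produces no warning at load time.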