Review Request 24962: HIVE-7730: Extend ReadEntity to add accessed columns from query

2014-08-22 Thread Xiaomeng Huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24962/
---

Review request for hive, Prasad Mujumdar and Szehon Ho.


Repository: hive-git


Description
---

The external authorization model cannot get accessed columns from a query. Hive 
should store accessed columns in the ReadEntity.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java 7ed50b4 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b05d3b4 

Diff: https://reviews.apache.org/r/24962/diff/


Testing
---


Thanks,

Xiaomeng Huang



[jira] [Created] (HIVE-7847) query orc partitioned table fail when table column type change

2014-08-22 Thread Zhichun Wu (JIRA)
Zhichun Wu created HIVE-7847:


 Summary: query orc partitioned table fail when table column type 
change
 Key: HIVE-7847
 URL: https://issues.apache.org/jira/browse/HIVE-7847
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.13.0, 0.12.0, 0.11.0
Reporter: Zhichun Wu
Assignee: Zhichun Wu
 Fix For: 0.14.0


I use the following script to test orc column type change with partitioned 
table on branch-0.13:

{code}
use test;
DROP TABLE if exists orc_change_type_staging;
DROP TABLE if exists orc_change_type;
CREATE TABLE orc_change_type_staging (
id int
);
CREATE TABLE orc_change_type (
id int
) PARTITIONED BY (`dt` string)
stored as orc;
--- load staging table
LOAD DATA LOCAL INPATH '../hive/examples/files/int.txt' OVERWRITE INTO TABLE 
orc_change_type_staging;
--- populate orc hive table
INSERT OVERWRITE TABLE orc_change_type partition(dt='20140718') select * FROM 
orc_change_type_staging limit 1;
--- change column id from int to bigint
ALTER TABLE orc_change_type CHANGE id id bigint;
INSERT OVERWRITE TABLE orc_change_type partition(dt='20140719') select * FROM 
orc_change_type_staging limit 1;
SELECT id FROM orc_change_type where dt between '20140718' and '20140719';
{code}

It fails in the last query (SELECT id FROM orc_change_type where dt between 
'20140718' and '20140719') with the exception:
{code}
Error: java.io.IOException: java.io.IOException: java.lang.ClassCastException: 
org.apache.hadoop.io.IntWritable cannot be cast to 
org.apache.hadoop.io.LongWritable
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:256)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:171)
at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:197)
at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:183)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.io.IOException: java.lang.ClassCastException: 
org.apache.hadoop.io.IntWritable cannot be cast to 
org.apache.hadoop.io.LongWritable
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:344)
at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:122)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:254)
... 11 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable 
cannot be cast to org.apache.hadoop.io.LongWritable
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:717)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1788)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2997)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:153)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:127)
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
... 15 more
{code}

The value object is reused each time we deserialize a row, so the read fails 
when we start to process the next path with a different schema. Resetting the 
value object each time we finish reading one path solves this problem.
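
To make the failure mode concrete, here is a minimal, self-contained sketch 
(hypothetical classes, not Hive's actual ORC reader) of why a value object 
created for an int path breaks on a later bigint path, and how re-creating 
the value per path avoids the cast:
{code}
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Writable;

public class ValueReuseSketch {

  /** Stand-in for a per-path reader whose column type is now bigint. */
  static class LongPathReader {
    Writable createValue() {
      return new LongWritable();
    }
    void next(Writable value) {
      // Mirrors the cast in LongTreeReader.next(): the reused value is
      // assumed to match the current path's schema.
      ((LongWritable) value).set(42L);
    }
  }

  public static void main(String[] args) {
    // Value created while reading the first path, whose column was int.
    Writable reused = new IntWritable(7);

    LongPathReader secondPath = new LongPathReader();
    try {
      secondPath.next(reused);  // IntWritable cannot be cast to LongWritable
    } catch (ClassCastException e) {
      System.out.println("reused value fails: " + e.getMessage());
    }

    // The proposed fix: reset (re-create) the value when a new path starts.
    Writable fresh = secondPath.createValue();
    secondPath.next(fresh);
    System.out.println("fresh value works: " + fresh);
  }
}
{code}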






[jira] [Commented] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-22 Thread Xiaomeng Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106515#comment-14106515
 ] 

Xiaomeng Huang commented on HIVE-7730:
--

Thanks [~szehon], I have linked it on the review board.

 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch, HIVE-7730.002.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 the hookContext (e.g. the needed columns from the query).-
 -So we should get the instance of HiveSemanticAnalyzerHookContext from the 
 configuration, extend HiveSemanticAnalyzerHookContext with a new 
 implementation, override HiveSemanticAnalyzerHookContext.update() and put 
 what you want into the class.-
 Hive should store accessed columns in ReadEntity when 
 HIVE_STATS_COLLECT_SCANCOLS (or a confVar we could add) is set to true.
 Then an external authorization model can get the accessed columns when doing 
 authorization during compilation, before execution. Maybe we will remove 
 columnAccessInfo from BaseSemanticAnalyzer; the old authorization and 
 AuthorizationModeV2 can get accessed columns from ReadEntity too.
 Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
     || HiveConf.getBoolVar(this.conf,
            HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
 }
 compiler.compile(pCtx, rootTasks, inputs, outputs);
 // TODO:
 // after compile, we can put the accessed column list into ReadEntity, taken
 // from columnAccessInfo, if HIVE_AUTHORIZATION_ENABLED is set to true
 {code}
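 For illustration, the extension could take roughly this shape. This is a 
 hedged sketch with illustrative names, not the actual patch; the final list 
 without a setter follows the style suggested later in the review thread.
 {code}
 import java.util.ArrayList;
 import java.util.List;

 /** Illustrative sketch only: a ReadEntity that carries the columns the
  *  query accessed, filled in after column-access analysis at compile time. */
 public class ReadEntitySketch {
   private final String tableName;
   private final List<String> accessedColumns = new ArrayList<String>();

   public ReadEntitySketch(String tableName) {
     this.tableName = tableName;
   }

   public String getTableName() {
     return tableName;
   }

   /** With a final list, callers add columns directly; no setter is needed. */
   public List<String> getAccessedColumns() {
     return accessedColumns;
   }
 }
 {code}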



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7847) query orc partitioned table fail when table column type change

2014-08-22 Thread Zhichun Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichun Wu updated HIVE-7847:
-

Attachment: HIVE-7847.1.patch

 query orc partitioned table fail when table column type change
 --

 Key: HIVE-7847
 URL: https://issues.apache.org/jira/browse/HIVE-7847
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.11.0, 0.12.0, 0.13.0
Reporter: Zhichun Wu
Assignee: Zhichun Wu
 Fix For: 0.14.0

 Attachments: HIVE-7847.1.patch



[jira] [Updated] (HIVE-7847) query orc partitioned table fail when table column type change

2014-08-22 Thread Zhichun Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhichun Wu updated HIVE-7847:
-

Status: Patch Available  (was: Open)

 query orc partitioned table fail when table column type change
 --

 Key: HIVE-7847
 URL: https://issues.apache.org/jira/browse/HIVE-7847
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.13.0, 0.12.0, 0.11.0
Reporter: Zhichun Wu
Assignee: Zhichun Wu
 Fix For: 0.14.0

 Attachments: HIVE-7847.1.patch



Re: Review Request 24962: HIVE-7730: Extend ReadEntity to add accessed columns from query

2014-08-22 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24962/#review51257
---


Hi Xiaomeng, patch looks good, just had some style comments.


ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java
https://reviews.apache.org/r/24962/#comment89359

Can we make this final, and not have a setter?  The caller can just add to 
the list.  It'll make the code a bit simpler.

Also, should it be a Set?



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
https://reviews.apache.org/r/24962/#comment89360

No need for '==true' part.



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
https://reviews.apache.org/r/24962/#comment89362

Can we indent this code block inside {}?


- Szehon Ho


On Aug. 22, 2014, 6:01 a.m., Xiaomeng Huang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24962/
 ---
 
 (Updated Aug. 22, 2014, 6:01 a.m.)
 
 
 Review request for hive, Prasad Mujumdar and Szehon Ho.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 The external authorization model cannot get accessed columns from a query. Hive 
 should store accessed columns in the ReadEntity.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java 7ed50b4 
   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b05d3b4 
 
 Diff: https://reviews.apache.org/r/24962/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Xiaomeng Huang
 




[jira] [Commented] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-22 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106523#comment-14106523
 ] 

Szehon Ho commented on HIVE-7730:
-

Thanks Xiaomeng, patch looks good overall, I put some minor comments on rb.

 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch, HIVE-7730.002.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 the hookContext (e.g. the needed columns from the query).-
 -So we should get the instance of HiveSemanticAnalyzerHookContext from the 
 configuration, extend HiveSemanticAnalyzerHookContext with a new 
 implementation, override HiveSemanticAnalyzerHookContext.update() and put 
 what you want into the class.-
 Hive should store accessed columns in ReadEntity when 
 HIVE_STATS_COLLECT_SCANCOLS (or a confVar we could add) is set to true.
 Then an external authorization model can get the accessed columns when doing 
 authorization during compilation, before execution. Maybe we will remove 
 columnAccessInfo from BaseSemanticAnalyzer; the old authorization and 
 AuthorizationModeV2 can get accessed columns from ReadEntity too.
 Here is the quick implementation in SemanticAnalyzer.analyzeInternal() below:
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
     || HiveConf.getBoolVar(this.conf,
            HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS) == true) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
 }
 compiler.compile(pCtx, rootTasks, inputs, outputs);
 // TODO:
 // after compile, we can put the accessed column list into ReadEntity, taken
 // from columnAccessInfo, if HIVE_AUTHORIZATION_ENABLED is set to true
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7384) Research into reduce-side join [Spark Branch]

2014-08-22 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106532#comment-14106532
 ] 

Szehon Ho commented on HIVE-7384:
-

Thanks [~lianhuiwang] for the information.

 Research into reduce-side join [Spark Branch]
 -

 Key: HIVE-7384
 URL: https://issues.apache.org/jira/browse/HIVE-7384
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Szehon Ho
 Attachments: Hive on Spark Reduce Side Join.docx, sales_items.txt, 
 sales_products.txt, sales_stores.txt


 Hive's join operator is very sophisticated, especially for reduce-side join. 
 While we expect that other types of join, such as map-side join and SMB 
 map-side join, will work out of the box with our design, there may be some 
 complications in reduce-side join, which extensively utilizes key tags and 
 shuffle behavior. Our design principle prefers making the Hive implementation 
 work out of the box as well, which might require new functionality from Spark. 
 The task is to research this area, identifying requirements for the Spark 
 community and the work to be done on Hive to make reduce-side join work.
 A design doc might be needed for this. For more information, please refer to 
 the overall design doc on the wiki.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7654) A method to extrapolate columnStats for partitions of a table

2014-08-22 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106528#comment-14106528
 ] 

Szehon Ho commented on HIVE-7654:
-

Sorry if this is maybe a dumb question, but I was curious: is it typical 
behavior to extrapolate in all cases? I can see it would be a good 
approximation in some cases, but would it ever be undesirable?

 A method to extrapolate columnStats for partitions of a table
 -

 Key: HIVE-7654
 URL: https://issues.apache.org/jira/browse/HIVE-7654
 Project: Hive
  Issue Type: New Feature
Reporter: pengcheng xiong
Assignee: pengcheng xiong
Priority: Minor
 Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, 
 HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch, 
 HIVE-7654.8.patch


 In a PARTITIONED table, there are many partitions. For example, 
 create table if not exists loc_orc (
   state string,
   locid int,
   zip bigint
 ) partitioned by(year string) stored as orc;
 We assume there are 4 partitions: partition(year='2000'), 
 partition(year='2001'), partition(year='2002') and partition(year='2003').
 We can use the following command to compute statistics for columns 
 state, locid of partition(year='2001'):
 analyze table loc_orc partition(year='2001') compute statistics for columns 
 state, locid;
 We need to know the “aggregated” column stats for the whole table loc_orc. 
 However, we may not have the column stats for some partitions, e.g., 
 partition(year='2002'), and we may not have the column stats for some 
 columns, e.g., zip bigint for partition(year='2001').
 We propose a method to extrapolate the missing column stats for the 
 partitions.
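 As a purely illustrative example of what extrapolation could mean here 
 (made-up numbers; not the method in the attached patches), one crude approach 
 scales a known aggregate by row coverage:
 {code}
 public class ExtrapolationSketch {
   public static void main(String[] args) {
     long knownRows = 1000000L;  // rows in partitions that have column stats
     long totalRows = 1500000L;  // rows across all partitions of loc_orc
     long knownNdv = 40000L;     // distinct values seen in known partitions

     // Assume the stat grows roughly linearly with data volume -- a strong
     // assumption, and exactly the kind of case Szehon's question above asks
     // about, where extrapolation may be undesirable.
     long extrapolatedNdv =
         (long) (knownNdv * ((double) totalRows / knownRows));
     System.out.println("extrapolated NDV = " + extrapolatedNdv);  // 60000
   }
 }
 {code}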



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Timeline for release of Hive 0.14

2014-08-22 Thread Alan Gates
+1, Eugene and I are working on getting HIVE-5317 (insert, update, 
delete) done and would like to get it in.


Alan.


Nick Dimiduk <ndimi...@gmail.com>
August 20, 2014 at 12:27
It'd be great to get HIVE-4765 included in 0.14. The proposed changes are a
big improvement for us HBase folks. Would someone mind having a look in
that direction?

Thanks,
Nick



Thejas Nair <the...@hortonworks.com>
August 19, 2014 at 15:20
+1
Sounds good to me.
It's already almost 4 months since the last release. It is time to
start preparing for the next one.
Thanks for volunteering!


Vikram Dixit <vik...@hortonworks.com>
August 19, 2014 at 14:02
Hi Folks,

I was thinking that it was about time that we had a release of hive 0.14
given our commitment to having a release of hive on a periodic basis. We
could cut a branch and start working on a release in say 2 weeks time
around September 5th (Friday). After branching, we can focus on stabilizing
for the release and hopefully have an RC in about 2 weeks post that. I
would like to volunteer myself for the duties of the release manager for
this version if the community agrees.

Thanks
Vikram.






[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]

2014-08-22 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106542#comment-14106542
 ] 

Chinna Rao Lalam commented on HIVE-7702:


Hi [~brocknoland],

Compared against MR, most of the time the differences are due to sorting order only.





 Start running .q file tests on spark [Spark Branch]
 ---

 Key: HIVE-7702
 URL: https://issues.apache.org/jira/browse/HIVE-7702
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Chinna Rao Lalam
 Attachments: HIVE-7702-spark.patch, HIVE-7702.1-spark.patch


 Spark can currently support only a few queries; however, there are some .q 
 file tests which will pass today. The basic idea is that we should get some 
 number of these actually working (10-20) so we can actually start testing the 
 project.
 A good starting point might be the udf*, varchar*, or alter* tests:
 https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive
 To generate the output file for test XXX.q, you'd do:
 {noformat}
 mvn clean install -DskipTests -Phadoop-2
 cd itests
 mvn clean install -DskipTests -Phadoop-2
 cd qtest-spark
 mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}
 which would generate XXX.q.out, which we can check in to source control as a 
 golden file.
 Multiple tests can be run at a given time like so:
 {noformat}
 mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7821) StarterProject: enable groupby4.q

2014-08-22 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106546#comment-14106546
 ] 

Chinna Rao Lalam commented on HIVE-7821:


Hi [~brocknoland],

I didn't know you had created this for Suhas. I am handling the group-by 
queries in the previous JIRA, so I assigned this to myself to avoid duplicate 
work. I don't mind letting Suhas work on this.


 StarterProject: enable groupby4.q
 -

 Key: HIVE-7821
 URL: https://issues.apache.org/jira/browse/HIVE-7821
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Suhas Satish





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HIVE-7794) Enable tests on Spark branch (4) [Sparch Branch]

2014-08-22 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam reassigned HIVE-7794:
--

Assignee: Chinna Rao Lalam

 Enable tests on Spark branch (4) [Sparch Branch]
 

 Key: HIVE-7794
 URL: https://issues.apache.org/jira/browse/HIVE-7794
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Chinna Rao Lalam

 This jira is to enable *most* of the tests below. If tests don't pass because 
 of some unsupported feature, ensure that a JIRA exists and move on.
 {noformat}
   vector_cast_constant.q,\
   vector_data_types.q,\
   vector_decimal_aggregate.q,\
   vector_left_outer_join.q,\
   vector_string_concat.q,\
   vectorization_12.q,\
   vectorization_13.q,\
   vectorization_14.q,\
   vectorization_15.q,\
   vectorization_9.q,\
   vectorization_part_project.q,\
   vectorization_short_regress.q,\
   vectorized_mapjoin.q,\
   vectorized_nested_mapjoin.q,\
   vectorized_ptf.q,\
   vectorized_shufflejoin.q,\
   vectorized_timestamp_funcs.q
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24962: HIVE-7730: Extend ReadEntity to add accessed columns from query

2014-08-22 Thread Xiaomeng Huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24962/
---

(Updated Aug. 22, 2014, 6:47 a.m.)


Review request for hive, Prasad Mujumdar and Szehon Ho.


Repository: hive-git


Description
---

The external authorization model cannot get accessed columns from a query. Hive 
should store accessed columns in the ReadEntity.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java 7ed50b4 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b05d3b4 

Diff: https://reviews.apache.org/r/24962/diff/


Testing
---


Thanks,

Xiaomeng Huang



Re: Review Request 24962: HIVE-7730: Extend ReadEntity to add accessed columns from query

2014-08-22 Thread Xiaomeng Huang


 On Aug. 22, 2014, 6:14 a.m., Szehon Ho wrote:
  ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java, line 54
  https://reviews.apache.org/r/24962/diff/1/?file=666753#file666753line54
 
  Can we make this final, and not have a setter?  The caller can just add 
  to the list.  It'll make the code a bit simpler.
  
  Also, should it be a Set?

Thanks, but I think it is better as a list. I get the accessed columns from 
tableToColumnAccessMap, which is a Map<String, List<String>>. Hive's native 
authorization uses this list too.
I get the column list via a table name and then set it on the ReadEntity 
directly, with no need to add each column in a loop, so it is necessary to 
have a setter.
BTW, I can also add an API addAccessedColumn(String column) to add one column 
to this column list.
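
For illustration only (the map accessor and method names here are assumed, 
not quoted from the patch), the two calling styles under discussion look like 
this, reusing the ReadEntitySketch shape from earlier in this digest:
{code}
import java.util.List;
import java.util.Map;

public class AccessedColumnsUsage {
  static void propagate(Map<String, List<String>> tableToColumnAccessMap,
                        ReadEntitySketch readEntity) {
    List<String> cols = tableToColumnAccessMap.get(readEntity.getTableName());
    if (cols == null) {
      return;
    }
    // Setter style (current patch): readEntity.setAccessedColumns(cols);
    // Final-list style (review suggestion): append to the existing list.
    readEntity.getAccessedColumns().addAll(cols);
  }
}
{code}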


 On Aug. 22, 2014, 6:14 a.m., Szehon Ho wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 9521
  https://reviews.apache.org/r/24962/diff/1/?file=666754#file666754line9521
 
  No need for '==true' part.

fixed. Thanks.


 On Aug. 22, 2014, 6:14 a.m., Szehon Ho wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 9539
  https://reviews.apache.org/r/24962/diff/1/?file=666754#file666754line9539
 
  Can we indent this code block inside {}?

fixed. thanks.


- Xiaomeng


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24962/#review51257
---


On Aug. 22, 2014, 6:47 a.m., Xiaomeng Huang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24962/
 ---
 
 (Updated Aug. 22, 2014, 6:47 a.m.)
 
 
 Review request for hive, Prasad Mujumdar and Szehon Ho.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 The external authorization model cannot get accessed columns from a query. Hive 
 should store accessed columns in the ReadEntity.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java 7ed50b4 
   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b05d3b4 
 
 Diff: https://reviews.apache.org/r/24962/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Xiaomeng Huang
 




[jira] [Commented] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes

2014-08-22 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106554#comment-14106554
 ] 

Gopal V commented on HIVE-7832:
---

Minor refactoring comments on RB.

LGTM +1, pending tests pass.

 Do ORC dictionary check at a finer level and preserve encoding across stripes
 -

 Key: HIVE-7832
 URL: https://issues.apache.org/jira/browse/HIVE-7832
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-7832.1.patch


 Currently the ORC dictionary check happens while writing the stripe. Just 
 before writing a stripe, if the ratio of dictionary entries to total non-null 
 rows is greater than the threshold, the dictionary is discarded. Also, the 
 decision of whether to use a dictionary is preserved across stripes. This 
 sometimes leads to a costly O(log n) insertion per stripe when there are 
 too many distinct keys.
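 For reference, a minimal sketch of the ratio test behind the dictionary 
 decision (the config key is real; the code is illustrative, not the writer 
 internals from the patch):
 {code}
 import java.util.HashSet;
 import java.util.Set;

 public class DictionaryCheckSketch {
   public static void main(String[] args) {
     // hive.exec.orc.dictionary.key.size.threshold, default 0.8
     double threshold = 0.8;

     String[] column = {"a", "b", "a", "c", "a", "b"};
     Set<String> dictionary = new HashSet<String>();
     int nonNullRows = 0;
     for (String v : column) {
       if (v != null) {
         nonNullRows++;
         dictionary.add(v);
       }
     }

     // Keep dictionary encoding only while distinct keys remain a small
     // fraction of the non-null rows; this JIRA moves the check to a finer
     // level and keeps the per-column decision consistent across stripes.
     boolean useDictionary =
         (double) dictionary.size() / nonNullRows <= threshold;
     System.out.println("use dictionary encoding = " + useDictionary);
   }
 }
 {code}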



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6245) HS2 creates DBs/Tables with wrong ownership when HMS setugi is true

2014-08-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106558#comment-14106558
 ] 

Hive QA commented on HIVE-6245:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663248/HIVE-6245.4.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6116 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/451/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/451/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-451/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663248

 HS2 creates DBs/Tables with wrong ownership when HMS setugi is true
 ---

 Key: HIVE-6245
 URL: https://issues.apache.org/jira/browse/HIVE-6245
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0, 0.13.0
Reporter: Chaoyu Tang
Assignee: Venki Korukanti
 Attachments: HIVE-6245.2.patch.txt, HIVE-6245.3.patch.txt, 
 HIVE-6245.4.patch, HIVE-6245.patch


 The case with the following settings is valid but does not work correctly in 
 the current HS2:
 ==
 hive.server2.authentication=NONE (or LDAP)
 hive.server2.enable.doAs=true
 hive.metastore.sasl.enabled=false
 hive.metastore.execute.setugi=true
 ==
 Ideally, HS2 should be able to impersonate the logged-in user (from Beeline, 
 or a JDBC application) and create DBs/Tables with the user's ownership.
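 For reference, the impersonation itself would use the plain Hadoop UGI API; a 
 minimal sketch (illustrative wrapper, not the patch) of running a call as the 
 logged-in end user:
 {code}
 import java.security.PrivilegedExceptionAction;

 import org.apache.hadoop.security.UserGroupInformation;

 public class DoAsSketch {
   /** Runs action as endUser, proxied by the service's login user. */
   public static <T> T runAs(String endUser, PrivilegedExceptionAction<T> action)
       throws Exception {
     UserGroupInformation proxyUser = UserGroupInformation.createProxyUser(
         endUser, UserGroupInformation.getLoginUser());
     return proxyUser.doAs(action);
   }
 }
 {code}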



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Timeline for release of Hive 0.14

2014-08-22 Thread Lefty Leverenz
Release 0.14 should include HIVE-6586
https://issues.apache.org/jira/browse/HIVE-6586 (various fixes to
HiveConf.java parameters).  I'll do that as soon as possible.

72 jiras have the TODOC14 label now, although my own tally is 99.  This is
more than mere mortals can accomplish in a few weeks.  Therefore I
recommend that you all plead with your managers to allocate some
tech-writer resources to Hive wikidocs for the 0.14.0 release.

I'll send out a state-of-the-docs message in a separate thread.

-- Lefty


On Fri, Aug 22, 2014 at 2:28 AM, Alan Gates ga...@hortonworks.com wrote:

 +1, Eugene and I are working on getting HIVE-5317 (insert, update, delete)
 done and would like to get it in.

 Alan.

  Nick Dimiduk <ndimi...@gmail.com>
  August 20, 2014 at 12:27
 It'd be great to get HIVE-4765 included in 0.14. The proposed changes are a
 big improvement for us HBase folks. Would someone mind having a look in
 that direction?

 Thanks,
 Nick



  Thejas Nair <the...@hortonworks.com>
  August 19, 2014 at 15:20
 +1
 Sounds good to me.
 It's already almost 4 months since the last release. It is time to
 start preparing for the next one.
 Thanks for volunteering!


  Vikram Dixit <vik...@hortonworks.com>
  August 19, 2014 at 14:02
 Hi Folks,

 I was thinking that it was about time that we had a release of hive 0.14
 given our commitment to having a release of hive on a periodic basis. We
 could cut a branch and start working on a release in say 2 weeks time
 around September 5th (Friday). After branching, we can focus on stabilizing
 for the release and hopefully have an RC in about 2 weeks post that. I
 would like to volunteer myself for the duties of the release manager for
 this version if the community agrees.

 Thanks
 Vikram.






[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes

2014-08-22 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7832:
-

Attachment: HIVE-7832.2.patch

Addressed Gopal's review comment.

 Do ORC dictionary check at a finer level and preserve encoding across stripes
 -

 Key: HIVE-7832
 URL: https://issues.apache.org/jira/browse/HIVE-7832
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-7832.1.patch, HIVE-7832.2.patch


 Currently the ORC dictionary check happens while writing the stripe. Just 
 before writing a stripe, if the ratio of dictionary entries to total non-null 
 rows is greater than the threshold, the dictionary is discarded. Also, the 
 decision of whether to use a dictionary is preserved across stripes. This 
 sometimes leads to a costly O(log n) insertion per stripe when there are 
 too many distinct keys.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup

2014-08-22 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6847:
---

Attachment: HIVE-6847.3.patch

 Improve / fix bugs in Hive scratch dir setup
 

 Key: HIVE-6847
 URL: https://issues.apache.org/jira/browse/HIVE-6847
 Project: Hive
  Issue Type: Bug
  Components: CLI, HiveServer2
Affects Versions: 0.14.0
Reporter: Vikram Dixit K
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch, HIVE-6847.3.patch


 Currently, the Hive server creates the scratch directory and changes its 
 permissions to 777; however, this is not great with respect to security. We 
 need to create user-specific scratch directories instead. Also refer to the 
 1st iteration of the patch on HIVE-6782 for the approach.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup

2014-08-22 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6847:
---

Status: Patch Available  (was: Open)

Fixes test failures

 Improve / fix bugs in Hive scratch dir setup
 

 Key: HIVE-6847
 URL: https://issues.apache.org/jira/browse/HIVE-6847
 Project: Hive
  Issue Type: Bug
  Components: CLI, HiveServer2
Affects Versions: 0.14.0
Reporter: Vikram Dixit K
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch, HIVE-6847.3.patch


 Currently, the Hive server creates the scratch directory and changes its 
 permissions to 777; however, this is not great with respect to security. We 
 need to create user-specific scratch directories instead. Also refer to the 
 1st iteration of the patch on HIVE-6782 for the approach.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup

2014-08-22 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6847:
---

Status: Open  (was: Patch Available)

 Improve / fix bugs in Hive scratch dir setup
 

 Key: HIVE-6847
 URL: https://issues.apache.org/jira/browse/HIVE-6847
 Project: Hive
  Issue Type: Bug
  Components: CLI, HiveServer2
Affects Versions: 0.14.0
Reporter: Vikram Dixit K
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch, HIVE-6847.3.patch


 Currently, the Hive server creates the scratch directory and changes its 
 permissions to 777; however, this is not great with respect to security. We 
 need to create user-specific scratch directories instead. Also refer to the 
 1st iteration of the patch on HIVE-6782 for the approach.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


State of the docs

2014-08-22 Thread Lefty Leverenz
The backlog of Hive wikidoc tasks is large, and keeps on growing.

Jiras that need documentation:

   25 for releases 0.10, 0.11, and 0.12  (only 17 have TODOC labels)
   37 for release 0.13  (only 25 have TODOC13 label)
   99 for release 0.14  (only 72 have TODOC14 label)

Also:

 5 doc tasks not associated with jiras
   36 wish-list tasks (clarifications, new docs, improvements)
 ~10 tasks or projects not associated with email

These numbers are probably inaccurate but they give the general idea.

Lately I've been making progress at a rate of 1 or 2 per day.  I could do
more if I stopped monitoring the mailing lists, but then (a) we'd miss a
fair number of doc tasks, and (b) at best I might manage 4 a day.

Some doc tasks can and should be done by developers, but IMHO the bulk of
these tasks should be handled by tech writers.  My attempts to recruit more
volunteers have failed so far, although I'll keep trying.  Can we get some
corporate support for the Hive wiki?

-- Lefty


[jira] [Commented] (HIVE-7222) Support timestamp column statistics in ORC and extend PPD for timestamp

2014-08-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106613#comment-14106613
 ] 

Hive QA commented on HIVE-7222:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663459/HIVE-7222.1.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 6116 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_part_project
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_timestamp_funcs
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/452/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/452/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-452/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663459

 Support timestamp column statistics in ORC and extend PPD for timestamp
 ---

 Key: HIVE-7222
 URL: https://issues.apache.org/jira/browse/HIVE-7222
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Daniel Dai
  Labels: orcfile
 Attachments: HIVE-7222-1.patch, HIVE-7222.1.patch


 Add column statistics for timestamp columns in ORC. Also extend predicate 
 pushdown to support timestamp column evaluation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-3421) Column Level Top K Values Statistics

2014-08-22 Thread wangmeng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106624#comment-14106624
 ] 

wangmeng commented on HIVE-3421:


This is very useful! I am waiting for the coming version.

 Column Level Top K Values Statistics
 

 Key: HIVE-3421
 URL: https://issues.apache.org/jira/browse/HIVE-3421
 Project: Hive
  Issue Type: New Feature
Reporter: Feng Lu
Assignee: Feng Lu
 Attachments: HIVE-3421.patch.1.txt, HIVE-3421.patch.2.txt, 
 HIVE-3421.patch.3.txt, HIVE-3421.patch.4.txt, HIVE-3421.patch.5.txt, 
 HIVE-3421.patch.6.txt, HIVE-3421.patch.7.txt, HIVE-3421.patch.8.txt, 
 HIVE-3421.patch.9.txt, HIVE-3421.patch.txt


 Compute (estimate) top-k value statistics for each column, and put the most 
 skewed column into the skewed info, if the user hasn't specified skew.
 This feature depends on ListBucketing (create table skewed on): 
 https://cwiki.apache.org/Hive/listbucketing.html.
 All columns' top-k values can be added to the skewed info, if in the future 
 skewed info supports multiple independent columns.
 The TopK algorithm is based on this paper:
 http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf
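 The cited tech report is assumed here to be the Space-Saving algorithm; as a 
 hedged illustration (not the code in the attached patches), a counter-based 
 top-k estimator looks like this:
 {code}
 import java.util.HashMap;
 import java.util.Map;

 public class TopKSketch {
   private final int k;
   private final Map<String, Long> counts = new HashMap<String, Long>();

   public TopKSketch(int k) {
     this.k = k;
   }

   public void offer(String value) {
     Long current = counts.get(value);
     if (current != null) {
       counts.put(value, current + 1);
     } else if (counts.size() < k) {
       counts.put(value, 1L);
     } else {
       // Evict the current minimum and take over its count + 1, which
       // bounds the overestimation error of the surviving counters.
       String minKey = null;
       long minCount = Long.MAX_VALUE;
       for (Map.Entry<String, Long> e : counts.entrySet()) {
         if (e.getValue() < minCount) {
           minKey = e.getKey();
           minCount = e.getValue();
         }
       }
       counts.remove(minKey);
       counts.put(value, minCount + 1);
     }
   }

   /** Approximate counts for (at most) the k most frequent values seen. */
   public Map<String, Long> estimates() {
     return counts;
   }
 }
 {code}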



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7848) Refresh SparkContext when spark configuration changes

2014-08-22 Thread Chinna Rao Lalam (JIRA)
Chinna Rao Lalam created HIVE-7848:
--

 Summary: Refresh SparkContext when spark configuration changes
 Key: HIVE-7848
 URL: https://issues.apache.org/jira/browse/HIVE-7848
 Project: Hive
  Issue Type: Sub-task
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7848) Refresh SparkContext when spark configuration changes

2014-08-22 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-7848:
---

Description: Recreate the spark client if spark configurations are updated 
(through set command).

 Refresh SparkContext when spark configuration changes
 -

 Key: HIVE-7848
 URL: https://issues.apache.org/jira/browse/HIVE-7848
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Fix For: spark-branch


 Recreate the spark client if spark configurations are updated (through set 
 command).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7848) Refresh SparkContext when spark configuration changes

2014-08-22 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-7848:
---

Fix Version/s: spark-branch

 Refresh SparkContext when spark configuration changes
 -

 Key: HIVE-7848
 URL: https://issues.apache.org/jira/browse/HIVE-7848
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Fix For: spark-branch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: State of the docs

2014-08-22 Thread Damien Carol

I have seen a few inaccuracies in the wiki about some properties.

Just a few questions: how do I report them? And to whom?

For example, this page (1) says that the property 
hive.fetch.task.conversion has a default value of minimal.

But that's wrong.

(1) 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution


Regards,

Damien CAROL

 * tél : +33 (0)4 74 96 88 14
 * fax : +33 (0)4 74 96 31 88
 * email : dca...@blitzbs.com

BLITZ BUSINESS SERVICE

On 22/08/2014 09:50, Lefty Leverenz wrote:

The backlog of Hive wikidoc tasks is large, and keeps on growing.

Jiras that need documentation:

25 for releases 0.10, 0.11, and 0.12  (only 17 have TODOC labels)
37 for release 0.13  (only 25 have TODOC13 label)
99 for release 0.14  (only 72 have TODOC14 label)

Also:

  5 doc tasks not associated with jiras
36 wish-list tasks (clarifications, new docs, improvements)
  ~10 tasks or projects not associated with email

These numbers are probably inaccurate but they give the general idea.

Lately I've been making progress at a rate of 1 or 2 per day.  I could do
more if I stopped monitoring the mailing lists, but then (a) we'd miss a
fair number of doc tasks, and (b) at best I might manage 4 a day.

Some doc tasks can and should be done by developers, but IMHO the bulk of
these tasks should be handled by tech writers.  My attempts to recruit more
volunteers have failed so far, although I'll keep trying.  Can we get some
corporate support for the Hive wiki?

-- Lefty





[jira] [Updated] (HIVE-6987) Metastore qop settings won't work with Hadoop-2.4

2014-08-22 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-6987:
-

Resolution: Duplicate
Status: Resolved  (was: Patch Available)

 Metastore qop settings won't work with Hadoop-2.4
 -

 Key: HIVE-6987
 URL: https://issues.apache.org/jira/browse/HIVE-6987
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Vaibhav Gumashta
  Labels: patch
 Fix For: 0.14.0

 Attachments: HIVE-6987.txt


  [HADOOP-10211|https://issues.apache.org/jira/browse/HADOOP-10211] made a 
 backward-incompatible change due to which the following Hive call returns a 
 null map:
 {code}
 Map<String, String> hadoopSaslProps = ShimLoader.getHadoopThriftAuthBridge()
     .getHadoopSaslProperties(conf);
 {code}
 Metastore uses the underlying hadoop.rpc.protection values to set the qop 
 between metastore client/server. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6987) Metastore qop settings won't work with Hadoop-2.4

2014-08-22 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106644#comment-14106644
 ] 

Jason Dere commented on HIVE-6987:
--

I think this is the same issue as HIVE-7620, which was recently committed to 
trunk. Marking as a duplicate.
Sorry about that [~skrho], hope to see your next contribution soon.

 Metastore qop settings won't work with Hadoop-2.4
 -

 Key: HIVE-6987
 URL: https://issues.apache.org/jira/browse/HIVE-6987
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Vaibhav Gumashta
  Labels: patch
 Fix For: 0.14.0

 Attachments: HIVE-6987.txt


  [HADOOP-10211|https://issues.apache.org/jira/browse/HADOOP-10211] made a 
 backward-incompatible change due to which the following Hive call returns a 
 null map:
 {code}
 Map<String, String> hadoopSaslProps = ShimLoader.getHadoopThriftAuthBridge()
     .getHadoopSaslProperties(conf);
 {code}
 Metastore uses the underlying hadoop.rpc.protection values to set the qop 
 between metastore client/server. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7849) Support more generic predicate pushdown for hbase handler

2014-08-22 Thread Navis (JIRA)
Navis created HIVE-7849:
---

 Summary: Support more generic predicate pushdown for hbase handler
 Key: HIVE-7849
 URL: https://issues.apache.org/jira/browse/HIVE-7849
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor


Currently, the HBase handler supports only AND-conjugated filters. This is a 
first attempt to support OR, NOT, IN, and BETWEEN predicates for HBase.
Mostly based on the work done by [~teddy.choi].
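
For a sense of what such pushdown could translate to on the HBase side, here 
is an illustrative sketch using the plain HBase client filter API of that era 
(not the handler code in the patch): WHERE key = 'a' OR key = 'b' becomes a 
MUST_PASS_ONE filter list.
{code}
import java.util.Arrays;

import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class OrPushdownSketch {
  /** key = 'a' OR key = 'b': MUST_PASS_ONE acts as a logical OR. */
  public static Filter keyEqualsAOrB() {
    return new FilterList(FilterList.Operator.MUST_PASS_ONE, Arrays.asList(
        (Filter) new RowFilter(CompareFilter.CompareOp.EQUAL,
            new BinaryComparator(Bytes.toBytes("a"))),
        new RowFilter(CompareFilter.CompareOp.EQUAL,
            new BinaryComparator(Bytes.toBytes("b")))));
  }
}
{code}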



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7593) Instantiate SparkClient per user session [Spark Branch]

2014-08-22 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-7593:
---

Status: Open  (was: Patch Available)

Patch moved to HIVE-7848

 Instantiate SparkClient per user session [Spark Branch]
 ---

 Key: HIVE-7593
 URL: https://issues.apache.org/jira/browse/HIVE-7593
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chinna Rao Lalam
 Attachments: HIVE-7593-spark.patch, HIVE-7593.1-spark.patch


 SparkContext is the main class via which Hive talks to the Spark cluster. 
 SparkClient encapsulates a SparkContext instance. Currently all user sessions 
 share a single SparkClient instance in HiveServer2. While this is good enough 
 for a POC, and even for our first two milestones, it is not desirable in a 
 multi-tenancy environment and gives the least flexibility to Hive users. Here 
 is what we propose (see the sketch after this list):
 1. Have a SparkClient instance per user session. The SparkClient instance is 
 created when the user executes the first query in the session. It will get 
 destroyed when the user session ends.
 2. The SparkClient is instantiated based on the spark configurations that are 
 available to the user, including those defined at the global level and those 
 overwritten by the user (through the set command, for instance).
 3. Ideally, when the user changes any spark configuration during the session, 
 the old SparkClient instance should be destroyed and a new one created based 
 on the new configurations. This may turn out to be a little hard, and thus 
 it's a nice-to-have. If not implemented, we need to document that 
 subsequent configuration changes will not take effect in the current session.
 Please note that there is a thread-safety issue on the Spark side where 
 multiple SparkContext instances cannot coexist in the same JVM (SPARK-2243). 
 We need to work with the Spark community to get this addressed.
 Besides the above functional requirements, avoiding potential issues is also a 
 consideration. For instance, sharing a SparkContext among users is bad, as 
 resources (such as jars for UDFs) will also be shared, which is problematic. 
 On the other hand, one SparkContext per job seems too expensive, as the 
 resources need to be re-rendered even when there isn't any change.
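 A hedged sketch of the per-session lifecycle described in points 1-3 above 
 (illustrative names, not Hive's actual classes):
 {code}
 import java.util.Properties;

 public class SessionSparkClientSketch {
   /** Stand-in for the client that wraps a SparkContext. */
   interface SparkClient {
     void close();
   }

   /** Stand-in factory so the sketch stays self-contained. */
   interface SparkClientFactory {
     SparkClient create(Properties sparkConf);
   }

   private final SparkClientFactory factory;
   private SparkClient client;  // one instance per user session

   public SessionSparkClientSketch(SparkClientFactory factory) {
     this.factory = factory;
   }

   /** Point 1: created lazily on the session's first query. */
   public synchronized SparkClient getClient(Properties sparkConf) {
     if (client == null) {
       client = factory.create(sparkConf);
     }
     return client;
   }

   /** Point 3: on a spark configuration change, destroy and re-create. */
   public synchronized void onConfigChange(Properties newConf) {
     closeSession();
     client = factory.create(newConf);
   }

   /** Destroyed when the user session ends. */
   public synchronized void closeSession() {
     if (client != null) {
       client.close();
       client = null;
     }
   }
 }
 {code}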



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7849) Support more generic predicate pushdown for hbase handler

2014-08-22 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7849:


Attachment: HIVE-7849.1.patch.txt

Running preliminary test. Need some more elaboration on interfaces, etc.

 Support more generic predicate pushdown for hbase handler
 -

 Key: HIVE-7849
 URL: https://issues.apache.org/jira/browse/HIVE-7849
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-7849.1.patch.txt


 Currently, the hbase handler supports AND-conjugated filters only. This is a 
 first attempt to support OR, NOT, IN, and BETWEEN predicates for hbase.
 Mostly based on the work done by [~teddy.choi].



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7848) Refresh SparkContext when spark configuration changes

2014-08-22 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-7848:
---

Attachment: HIVE-7848-spark.patch

 Refresh SparkContext when spark configuration changes
 -

 Key: HIVE-7848
 URL: https://issues.apache.org/jira/browse/HIVE-7848
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Fix For: spark-branch

 Attachments: HIVE-7848-spark.patch


 Recreate the spark client if spark configurations are updated (through the 
 set command).
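
Illustrative only (the types and factory are assumptions, not the attached 
patch): one way to tie the refresh to a change in the effective spark 
configuration map.
{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative only: recreate the client when the spark.* overrides change.
public class RefreshOnConfChange {
  interface SparkClient { void close(); }
  interface Factory { SparkClient create(Map<String, String> conf); }

  private final Factory factory;
  private SparkClient client;
  private Map<String, String> lastConf;

  RefreshOnConfChange(Factory factory) { this.factory = factory; }

  synchronized SparkClient getClient(Map<String, String> sparkConf) {
    if (client == null || !sparkConf.equals(lastConf)) {
      if (client != null) {
        client.close();  // tear down the client built from stale settings
      }
      client = factory.create(sparkConf);
      lastConf = new HashMap<String, String>(sparkConf);
    }
    return client;
  }
}
{code}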



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7849) Support more generic predicate pushdown for hbase handler

2014-08-22 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7849:


Status: Patch Available  (was: Open)

 Support more generic predicate pushdown for hbase handler
 -

 Key: HIVE-7849
 URL: https://issues.apache.org/jira/browse/HIVE-7849
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-7849.1.patch.txt


 Currently, the hbase handler supports AND-conjugated filters only. This is a 
 first attempt to support OR, NOT, IN, and BETWEEN predicates for hbase.
 Mostly based on the work done by [~teddy.choi].



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes

2014-08-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106654#comment-14106654
 ] 

Hive QA commented on HIVE-7832:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663613/HIVE-7832.2.patch

{color:red}ERROR:{color} -1 due to 28 failed/errored test(s), 6118 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_part_project
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_timestamp_funcs
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.columnProjection[0]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.columnProjection[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.metaData[0]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.metaData[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.test1[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testPredicatePushdown[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testSeek[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testSnappy[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testStringAndBinaryStatistics[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testStripeLevelStats[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testTimestamp[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testUnionAndTimestamp[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testWithoutIndex[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testZeroCopySeek[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcNullOptimization.testMultiStripeWithNull
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/453/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/453/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-453/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 28 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663613

 Do ORC dictionary check at a finer level and preserve encoding across stripes
 -

 Key: HIVE-7832
 URL: https://issues.apache.org/jira/browse/HIVE-7832
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-7832.1.patch, HIVE-7832.2.patch


 Currently the ORC dictionary check happens while writing the stripe. Just 
 before writing the stripe, if the ratio of dictionary entries to total 
 non-null rows is greater than the threshold, the dictionary is discarded. 
 Also, the decision of using a dictionary or not is preserved across stripes. 
 This sometimes leads to a costly O(log n) insertion cost per stripe when 
 there are too many distinct keys.
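
 An illustrative reduction of that ratio test; the names are not the actual 
 ORC WriterImpl fields:
 {code}
 // Illustrative ratio test, not the actual ORC WriterImpl code: decide per
 // stripe whether the dictionary is worth keeping before the stripe is written.
 public class DictionaryCheckSketch {
   static boolean keepDictionary(int dictionaryEntries, int nonNullRows,
                                 double threshold) {
     if (nonNullRows == 0) {
       return true;  // nothing written yet, keep the default encoding
     }
     double ratio = (double) dictionaryEntries / nonNullRows;
     return ratio <= threshold;  // too many distinct keys: fall back to direct
   }
 }
 {code}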



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7848) Refresh SparkContext when spark configuration changes

2014-08-22 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106652#comment-14106652
 ] 

Chinna Rao Lalam commented on HIVE-7848:


Review comments:

bq. 1) null out sparkSession after closing

Taken care of this in the new patch

bq. 2) null out the static SparkClient member variable when closed

It is already taken care of in 
org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.close()

 Refresh SparkContext when spark configuration changes
 -

 Key: HIVE-7848
 URL: https://issues.apache.org/jira/browse/HIVE-7848
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Fix For: spark-branch

 Attachments: HIVE-7848-spark.patch


 Recreate the spark client if spark configurations are updated (through the 
 set command).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-22 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7799:


Attachment: HIVE-7799.1-spark.patch

HiveBaseFunctionResultList uses RowContainer to store collected map output 
rows: all rows must be added into RowContainer before reading starts, because 
RowContainer does not support write after read. Removed the current lazy 
execution mode, as it depends on RowContainer write-after-read.

 TRANSFORM failed in transform_ppr1.q[Spark Branch]
 --

 Key: HIVE-7799
 URL: https://issues.apache.org/jira/browse/HIVE-7799
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
  Labels: Spark-M1
 Attachments: HIVE-7799.1-spark.patch


 Here is the exception:
 {noformat}
 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) 
 - Exception in task 0.0 in stage 1.0 (TID 0)
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82)
 at 
 scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at 
 org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}
 Basically, the cause is that RowContainer is misused (it's not allowed to 
 write once someone has read a row from it); I'm trying to figure out whether 
 it's a Hive issue or specific to Hive on Spark mode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files

2014-08-22 Thread Sathish (JIRA)
Sathish created HIVE-7850:
-

 Summary: Hive Query failed if the data type is array<string> with 
parquet files
 Key: HIVE-7850
 URL: https://issues.apache.org/jira/browse/HIVE-7850
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.1, 0.14.0
Reporter: Sathish


* Created a parquet file from the Avro file which has one array data type and 
the rest are primitive types. Avro schema of the array data type, e.g.: 
{code}
{ "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, 
"null" ] }
{code}
* Created an external Hive table with the Array<String> type as below, 
{code}
create external table paraArray (action Array<String>) partitioned by (partitionid int) 
row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 
'parquet.hive.MapredParquetInputFormat' outputformat 
'parquet.hive.MapredParquetOutputFormat' location '/testPara'; 
alter table paraArray add partition(partitionid=1) location '/testPara';
{code}
* Run the following query (select action from paraArray limit 10) and the Map 
reduce jobs are failing with the following exception.
{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row [Error getting row data with exception 
java.lang.ClassCastException: 
parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
org.apache.hadoop.io.ArrayWritable
at 
parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
]
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
... 8 more
{code}
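
A hedged sketch of the kind of defensive handling an array inspector could 
apply to avoid the ClassCastException; this is illustrative only, not the 
eventual HIVE-7850 patch:
{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Writable;

// Illustrative only, not the HIVE-7850 patch: tolerate a scalar writable
// (e.g. a dictionary-encoded binary) where an ArrayWritable is expected,
// instead of throwing ClassCastException.
public class LenientArrayInspector {
  public List<Writable> getList(Object data) {
    if (data == null) {
      return null;
    }
    if (data instanceof ArrayWritable) {
      return Arrays.asList(((ArrayWritable) data).get());
    }
    // a single value standing in for a one-element array
    return Collections.singletonList((Writable) data);
  }
}
{code}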



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files

2014-08-22 Thread Sathish (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathish updated HIVE-7850:
--

Description: 
* Created a parquet file from the Avro file which has one array data type and 
the rest are primitive types. Avro schema of the array data type, e.g.: 
{code}
{ "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, 
"null" ] }
{code}
* Created an external Hive table with the Array<String> type as below, 
{code}
create external table paraArray (action Array<String>) partitioned by (partitionid int) 
row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 
'parquet.hive.MapredParquetInputFormat' outputformat 
'parquet.hive.MapredParquetOutputFormat' location '/testPara'; 
alter table paraArray add partition(partitionid=1) location '/testPara';
{code}
* Run the following query (select action from paraArray limit 10) and the Map 
reduce jobs are failing with the following exception.
{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row [Error getting row data with exception 
java.lang.ClassCastException: 
parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
org.apache.hadoop.io.ArrayWritable
at 
parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
]
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
... 8 more
{code}


This issue was posted long back on the Parquet issues list, and since it is 
related to the Parquet Hive serde, I have created the Hive issue here. The 
details and history of this issue are shown in the link here: 
https://github.com/Parquet/parquet-mr/issues/281.

  was:
* Created a parquet file from the Avro file which has one array data type and 
the rest are primitive types. Avro schema of the array data type, e.g.: 
{code}
{ "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, 
"null" ] }
{code}
* Created an external Hive table with the Array<String> type as below, 
{code}
create external table paraArray (action Array<String>) partitioned by (partitionid int) 
row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 
'parquet.hive.MapredParquetInputFormat' outputformat 
'parquet.hive.MapredParquetOutputFormat' location '/testPara'; 
alter table paraArray add partition(partitionid=1) location '/testPara';
{code}
* Run the following query (select action from paraArray limit 10) and the Map 
reduce jobs are failing with the following exception.
{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row [Error getting row data with exception 
java.lang.ClassCastException: 
parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
org.apache.hadoop.io.ArrayWritable
at 
parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
]
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
at 

[jira] [Updated] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-22 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7799:


Status: Patch Available  (was: Open)

 TRANSFORM failed in transform_ppr1.q[Spark Branch]
 --

 Key: HIVE-7799
 URL: https://issues.apache.org/jira/browse/HIVE-7799
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
  Labels: Spark-M1
 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch


 Here is the exception:
 {noformat}
 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) 
 - Exception in task 0.0 in stage 1.0 (TID 0)
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82)
 at 
 scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at 
 org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}
 Basically, the cause is that RowContainer is misused (it's not allowed to 
 write once someone has read a row from it); I'm trying to figure out whether 
 it's a Hive issue or specific to Hive on Spark mode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-22 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7799:


Description: 
Here is the exception:
{noformat}
2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - 
Exception in task 0.0 in stage 1.0 (TID 0)
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82)
at 
scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at 
org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
{noformat}

Basically, the cause is that RowContainer is misused (it's not allowed to write 
once someone has read a row from it); I'm trying to figure out whether it's a 
Hive issue or specific to Hive on Spark mode.

  was:
Here is the exception:
{noformat}
2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - 
Exception in task 0.0 in stage 1.0 (TID 0)
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82)
at 
scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at 
org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
{noformat}

Basically, the cause is that RowContrainer is misused(it's not allowed to write 
once someone read row from it), i'm trying to figure out whether it's a hive 
issue or just in hive on spark mode.


 TRANSFORM failed in transform_ppr1.q[Spark Branch]
 --

 Key: HIVE-7799
 URL: https://issues.apache.org/jira/browse/HIVE-7799
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
  Labels: Spark-M1
 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch


 Here is the exception:
 {noformat}
 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) 
 - Exception in task 0.0 in stage 1.0 (TID 0)
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82)
 at 
 scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at 
 org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 

[jira] [Updated] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-22 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7799:


Attachment: HIVE-7799.2-spark.patch

Processing all inputs, storing output rows in RowContainer, and then reading 
from it is not very efficient. We could just use a queue to store output 
rows, since ResultIterator only processes the next record when the next output 
row is requested.
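
A minimal sketch of such a queue-based buffer, assuming rows only need to stay 
in memory for the current batch; the generic type and names are illustrative:
{code}
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative FIFO buffer in place of RowContainer. Unlike RowContainer,
// a queue has no write-after-read restriction: the producer may add rows
// at any time and the consumer drains them one by one.
public class RowBuffer<T> {
  private final Queue<T> rows = new ArrayDeque<T>();

  public void add(T row) { rows.add(row); }        // operator pipeline collects a row
  public boolean hasNext() { return !rows.isEmpty(); }
  public T next() { return rows.poll(); }          // called from ResultIterator.next()
}
{code}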

 TRANSFORM failed in transform_ppr1.q[Spark Branch]
 --

 Key: HIVE-7799
 URL: https://issues.apache.org/jira/browse/HIVE-7799
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
  Labels: Spark-M1
 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch


 Here is the exception:
 {noformat}
 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) 
 - Exception in task 0.0 in stage 1.0 (TID 0)
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82)
 at 
 scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at 
 org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}
 Basically, the cause is that RowContainer is misused (it's not allowed to 
 write once someone has read a row from it); I'm trying to figure out whether 
 it's a Hive issue or specific to Hive on Spark mode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 24972: HIVE-7799 TRANSFORM failed in transform_ppr1.q

2014-08-22 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24972/
---

Review request for hive, Brock Noland and Szehon Ho.


Bugs: HIVE-7799
https://issues.apache.org/jira/browse/HIVE-7799


Repository: hive-git


Description
---

HiveBaseFunctionResultList uses RowContainer to store collected map output 
rows: all rows must be added into RowContainer before reading starts, and 
RowContainer does not support write after read. 


Diffs
-

  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
 6568a76 

Diff: https://reviews.apache.org/r/24972/diff/


Testing
---


Thanks,

chengxiang li



Re: Review Request 24972: HIVE-7799 TRANSFORM failed in transform_ppr1.q

2014-08-22 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24972/
---

(Updated Aug. 22, 2014, 9:47 a.m.)


Review request for hive, Brock Noland and Szehon Ho.


Changes
---

We do not need RowContainer here for persistent storage support; as 
ResultIterator just processes each new record on demand, a queue should work fine.


Bugs: HIVE-7799
https://issues.apache.org/jira/browse/HIVE-7799


Repository: hive-git


Description
---

HiveBaseFunctionResultList uses RowContainer to store collected map output 
rows: all rows must be added into RowContainer before reading starts, and 
RowContainer does not support write after read. 


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
 6568a76 

Diff: https://reviews.apache.org/r/24972/diff/


Testing
---


Thanks,

chengxiang li



[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files

2014-08-22 Thread Sathish (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathish updated HIVE-7850:
--

Fix Version/s: 0.14.0
   Status: Patch Available  (was: Open)

 Hive Query failed if the data type is array<string> with parquet files
 --

 Key: HIVE-7850
 URL: https://issues.apache.org/jira/browse/HIVE-7850
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.1, 0.14.0
Reporter: Sathish
  Labels: parquet, serde
 Fix For: 0.14.0


 * Created a parquet file from the Avro file which has one array data type and 
 the rest are primitive types. Avro schema of the array data type, e.g.: 
 {code}
 { "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, 
 "null" ] }
 {code}
 * Created an external Hive table with the Array<String> type as below, 
 {code}
 create external table paraArray (action Array<String>) partitioned by (partitionid 
 int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as 
 inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 
 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; 
 alter table paraArray add partition(partitionid=1) location '/testPara';
 {code}
 * Run the following query (select action from paraArray limit 10) and the Map 
 reduce jobs are failing with the following exception.
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row [Error getting row data with exception 
 java.lang.ClassCastException: 
 parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
 org.apache.hadoop.io.ArrayWritable
 at 
 parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
 at org.apache.hadoop.mapred.Child.main(Child.java:264)
 ]
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 ... 8 more
 {code}
 This issue was posted long back on the Parquet issues list, and since it is 
 related to the Parquet Hive serde, I have created the Hive issue here. The 
 details and history of this issue are shown in the link here 
 https://github.com/Parquet/parquet-mr/issues/281.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files

2014-08-22 Thread Sathish (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathish updated HIVE-7850:
--

Status: Open  (was: Patch Available)

 Hive Query failed if the data type is array<string> with parquet files
 --

 Key: HIVE-7850
 URL: https://issues.apache.org/jira/browse/HIVE-7850
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.1, 0.14.0
Reporter: Sathish
  Labels: parquet, serde
 Fix For: 0.14.0


 * Created a parquet file from the Avro file which has one array data type and 
 the rest are primitive types. Avro schema of the array data type, e.g.: 
 {code}
 { "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, 
 "null" ] }
 {code}
 * Created an external Hive table with the Array<String> type as below, 
 {code}
 create external table paraArray (action Array<String>) partitioned by (partitionid 
 int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as 
 inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 
 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; 
 alter table paraArray add partition(partitionid=1) location '/testPara';
 {code}
 * Run the following query (select action from paraArray limit 10) and the Map 
 reduce jobs are failing with the following exception.
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row [Error getting row data with exception 
 java.lang.ClassCastException: 
 parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
 org.apache.hadoop.io.ArrayWritable
 at 
 parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
 at org.apache.hadoop.mapred.Child.main(Child.java:264)
 ]
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 ... 8 more
 {code}
 This issue was posted long back on the Parquet issues list, and since it is 
 related to the Parquet Hive serde, I have created the Hive issue here. The 
 details and history of this issue are shown in the link here 
 https://github.com/Parquet/parquet-mr/issues/281.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


HIVE-7850 request to assign jira ticket

2014-08-22 Thread Valluri, Sathish
Hi All,

 

I have a fix available for https://issues.apache.org/jira/browse/HIVE-7850.
Can anyone provide permissions for me to submit the patch changes to this
jira issue?

 

Jira username : vallurisathish

 

Regards

Sathish Valluri

 

 





[jira] [Commented] (HIVE-7833) Remove unwanted allocation in ORC RunLengthIntegerWriterV2 determine encoding function

2014-08-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106689#comment-14106689
 ] 

Hive QA commented on HIVE-7833:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663497/HIVE-7833.2.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6115 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/454/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/454/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-454/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663497

 Remove unwanted allocation in ORC RunLengthIntegerWriterV2 determine encoding 
 function
 --

 Key: HIVE-7833
 URL: https://issues.apache.org/jira/browse/HIVE-7833
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-7833.1.patch, HIVE-7833.2.patch


 RunLengthIntegerWriterV2.determineEncoding() is used heavily. There are 
 unwanted buffer allocations on every invocation of the function which are 
 not required.
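
 The usual fix pattern, sketched with illustrative names (not the actual ORC 
 writer fields): hoist the per-call scratch arrays into reused instance fields.
 {code}
 // Illustrative names, not the actual ORC writer fields: reuse scratch
 // arrays across determineEncoding() calls instead of allocating per call.
 public class EncodingChooserSketch {
   private static final int MAX_SCOPE = 512;             // illustrative batch size
   private final long[] adjDeltas = new long[MAX_SCOPE]; // allocated once, reused

   void determineEncoding(long[] literals, int numLiterals) {
     assert numLiterals <= MAX_SCOPE;
     for (int i = 1; i < numLiterals; i++) {
       adjDeltas[i - 1] = literals[i] - literals[i - 1]; // fill in place
     }
     // ... inspect adjDeltas to pick SHORT_REPEAT / DIRECT / PATCHED_BASE / DELTA ...
   }
 }
 {code}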



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs

2014-08-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106691#comment-14106691
 ] 

Hive QA commented on HIVE-4629:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663483/HIVE-4629.7.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/455/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/455/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-455/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-455/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriterV2.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/SerializationUtils.java'
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target 
shims/0.20S/target shims/0.23/target shims/aggregator/target 
shims/common/target shims/common-secure/target packaging/target 
hbase-handler/target testutils/target jdbc/target metastore/target 
itests/target itests/hcatalog-unit/target itests/test-serde/target 
itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target 
itests/hive-unit/target itests/custom-serde/target itests/util/target 
hcatalog/target hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/webhcat/svr/target 
hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target 
accumulo-handler/target hwi/target common/target common/src/gen contrib/target 
service/target serde/target beeline/target odbc/target cli/target 
ql/dependency-reduced-pom.xml ql/target
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1619731.

At revision 1619731.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663483

 HS2 should support an API to retrieve query logs
 

 Key: HIVE-4629
 URL: https://issues.apache.org/jira/browse/HIVE-4629
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Reporter: Shreepadma Venugopalan
Assignee: Dong Chen
 Attachments: HIVE-4629-no_thrift.1.patch, HIVE-4629.1.patch, 
 HIVE-4629.2.patch, HIVE-4629.3.patch.txt, HIVE-4629.4.patch, 
 HIVE-4629.5.patch, HIVE-4629.6.patch, HIVE-4629.7.patch


 HiveServer2 should support an API to retrieve query logs. This is 
 particularly relevant because HiveServer2 supports async execution but 
 doesn't provide a way to report progress. Providing an API to retrieve query 
 logs will help report progress to the client.
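
 A hypothetical client-side view of such an API; the method names below are 
 assumptions, since the Thrift surface was still under discussion on this 
 ticket.
 {code}
 import java.util.List;

 // Hypothetical client-side loop; these method names are assumptions, not
 // the eventual HiveServer2 Thrift API.
 public class QueryLogTailSketch {
   interface AsyncStatement {
     boolean isFinished();
     List<String> getQueryLog();  // assumed incremental log fetch
   }

   static void tail(AsyncStatement stmt) throws InterruptedException {
     while (!stmt.isFinished()) {
       for (String line : stmt.getQueryLog()) {
         System.out.println(line); // surface progress to the client
       }
       Thread.sleep(500);          // poll interval
     }
   }
 }
 {code}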



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106708#comment-14106708
 ] 

Hive QA commented on HIVE-7799:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663637/HIVE-7799.2-spark.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5980 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_null
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/79/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/79/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-79/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663637

 TRANSFORM failed in transform_ppr1.q[Spark Branch]
 --

 Key: HIVE-7799
 URL: https://issues.apache.org/jira/browse/HIVE-7799
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
  Labels: Spark-M1
 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch


 Here is the exception:
 {noformat}
 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) 
 - Exception in task 0.0 in stage 1.0 (TID 0)
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82)
 at 
 scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at 
 org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}
 Basically, the cause is that RowContainer is misused (it's not allowed to 
 write once someone has read a row from it); I'm trying to figure out whether 
 it's a Hive issue or specific to Hive on Spark mode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7851) Fix NPE in split generation on Tez 0.5

2014-08-22 Thread Gunther Hagleitner (JIRA)
Gunther Hagleitner created HIVE-7851:


 Summary: Fix NPE in split generation on Tez 0.5
 Key: HIVE-7851
 URL: https://issues.apache.org/jira/browse/HIVE-7851
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7851) Fix NPE in split generation on Tez 0.5

2014-08-22 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7851:
-

Attachment: HIVE-7851.1.patch

 Fix NPE in split generation on Tez 0.5
 --

 Key: HIVE-7851
 URL: https://issues.apache.org/jira/browse/HIVE-7851
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: tez-branch

 Attachments: HIVE-7851.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-7851) Fix NPE in split generation on Tez 0.5

2014-08-22 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner resolved HIVE-7851.
--

Resolution: Fixed

Committed to tez branch.

 Fix NPE in split generation on Tez 0.5
 --

 Key: HIVE-7851
 URL: https://issues.apache.org/jira/browse/HIVE-7851
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-7851.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7851) Fix NPE in split generation on Tez 0.5

2014-08-22 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7851:
-

Fix Version/s: tez-branch

 Fix NPE in split generation on Tez 0.5
 --

 Key: HIVE-7851
 URL: https://issues.apache.org/jira/browse/HIVE-7851
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: tez-branch

 Attachments: HIVE-7851.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


HIVE-7850 request to assign jira ticket

2014-08-22 Thread Valluri, Sathish


Hi All,



I have a fix available for https://issues.apache.org/jira/browse/HIVE-7850. 
Can anyone provide permissions for me to submit the patch changes to 
this jira issue?



Jira username : vallurisathish



Regards

Sathish Valluri







[jira] [Commented] (HIVE-7772) Add tests for order/sort/distribute/cluster by query [Spark Branch]

2014-08-22 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106731#comment-14106731
 ] 

Rui Li commented on HIVE-7772:
--

Hi [~brocknoland], I tested the other cases with the latest code again but the 
error message I mentioned earlier still remains. These queries have a 
{{Stats-Aggr Operator}} stage, which I think causes the error.
It seems HIVE-7819 only avoids the exception, but we still don't have a proper 
counter for spark tasks, so {{CounterStatsAggregator.connect}} returns 
false and leads to the connection error I mentioned.
I think we can include the tests in the patch for now and add more once the 
spark counter is ready?

 Add tests for order/sort/distribute/cluster by query [Spark Branch]
 ---

 Key: HIVE-7772
 URL: https://issues.apache.org/jira/browse/HIVE-7772
 Project: Hive
  Issue Type: Test
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-7772-spark.patch


 Now that these queries are supported, we should have tests to catch any 
 problems we may have.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7828) TestCLIDriver.parquet_join.q is failing on trunk

2014-08-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106737#comment-14106737
 ] 

Hive QA commented on HIVE-7828:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663418/HIVE-7828.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6115 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/456/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/456/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-456/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663418

 TestCLIDriver.parquet_join.q is failing on trunk
 

 Key: HIVE-7828
 URL: https://issues.apache.org/jira/browse/HIVE-7828
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-7828.patch


 The test is failing in the HiveQA tests of late.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files

2014-08-22 Thread Sathish (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathish updated HIVE-7850:
--

Attachment: HIVE-7850.patch

This patch fixes the issue. Since we want to use this feature in the next 
release of Hive, requesting someone to look into the patch changes and merge 
them to the main branch.

 Hive Query failed if the data type is array<string> with parquet files
 --

 Key: HIVE-7850
 URL: https://issues.apache.org/jira/browse/HIVE-7850
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.14.0, 0.13.1
Reporter: Sathish
  Labels: parquet, serde
 Fix For: 0.14.0

 Attachments: HIVE-7850.patch


 * Created a parquet file from the Avro file which has one array data type and 
 the rest are primitive types. Avro schema of the array data type, e.g.: 
 {code}
 { "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, 
 "null" ] }
 {code}
 * Created an external Hive table with the Array<String> type as below, 
 {code}
 create external table paraArray (action Array<String>) partitioned by (partitionid 
 int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as 
 inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 
 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; 
 alter table paraArray add partition(partitionid=1) location '/testPara';
 {code}
 * Run the following query (select action from paraArray limit 10) and the Map 
 reduce jobs are failing with the following exception.
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row [Error getting row data with exception 
 java.lang.ClassCastException: 
 parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
 org.apache.hadoop.io.ArrayWritable
 at 
 parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
 at org.apache.hadoop.mapred.Child.main(Child.java:264)
 ]
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 ... 8 more
 {code}
 This issue was posted long back on the Parquet issues list, and since it is 
 related to the Parquet Hive serde, I have created the Hive issue here. The 
 details and history of this issue are shown in the link here 
 https://github.com/Parquet/parquet-mr/issues/281.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files

2014-08-22 Thread Sathish (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathish updated HIVE-7850:
--

Status: Patch Available  (was: Open)

 Hive Query failed if the data type is array<string> with parquet files
 --

 Key: HIVE-7850
 URL: https://issues.apache.org/jira/browse/HIVE-7850
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.1, 0.14.0
Reporter: Sathish
  Labels: parquet, serde
 Fix For: 0.14.0

 Attachments: HIVE-7850.patch


 * Created a parquet file from the Avro file which have 1 array data type and 
 rest are primitive types. Avro Schema of the array data type. Eg: 
 {code}
 { name : action, type : [ { type : array, items : string }, 
 null ] }
 {code}
 * Created External Hive table with the Array type as below, 
 {code}
 create external table paraArray (action Array) partitioned by (partitionid 
 int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as 
 inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 
 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; 
 alter table paraArray add partition(partitionid=1) location '/testPara';
 {code}
 * Run the following query (select action from paraArray limit 10); the MapReduce 
 jobs fail with the following exception.
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row [Error getting row data with exception 
 java.lang.ClassCastException: 
 parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
 org.apache.hadoop.io.ArrayWritable
 at 
 parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
 at org.apache.hadoop.mapred.Child.main(Child.java:264)
 ]
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 ... 8 more
 {code}
 This issue was posted on the Parquet issues list long back, and since it is 
 related to the Parquet Hive serde, I have created the Hive issue here. The 
 details and history are in the link here: 
 https://github.com/Parquet/parquet-mr/issues/281.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files

2014-08-22 Thread Sathish (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106775#comment-14106775
 ] 

Sathish commented on HIVE-7850:
---

Can someone look into this issue and provide comments or suggestions on the fix? 
I have provided the patch and am waiting for it to be merged to the main branch, 
as we want to use this Hive feature in our next release.

 Hive Query failed if the data type is array<string> with parquet files
 --

 Key: HIVE-7850
 URL: https://issues.apache.org/jira/browse/HIVE-7850
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.14.0, 0.13.1
Reporter: Sathish
  Labels: parquet, serde
 Fix For: 0.14.0

 Attachments: HIVE-7850.patch


 * Created a parquet file from an Avro file which has one array data type; the 
 rest are primitive types. Avro schema of the array data type, e.g.: 
 {code}
 { "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, 
 "null" ] }
 {code}
 * Created External Hive table with the Array type as below, 
 {code}
 create external table paraArray (action Array<string>) partitioned by (partitionid 
 int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as 
 inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 
 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; 
 alter table paraArray add partition(partitionid=1) location '/testPara';
 {code}
 * Run the following query (select action from paraArray limit 10); the MapReduce 
 jobs fail with the following exception.
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row [Error getting row data with exception 
 java.lang.ClassCastException: 
 parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
 org.apache.hadoop.io.ArrayWritable
 at 
 parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
 at org.apache.hadoop.mapred.Child.main(Child.java:264)
 ]
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 ... 8 more
 {code}
 This issue was posted on the Parquet issues list long back, and since it is 
 related to the Parquet Hive serde, I have created the Hive issue here. The 
 details and history are in the link here: 
 https://github.com/Parquet/parquet-mr/issues/281.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.

2014-08-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106795#comment-14106795
 ] 

Hive QA commented on HIVE-7100:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663513/HIVE-7100.4.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6118 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/457/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/457/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-457/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663513

 Users of hive should be able to specify skipTrash when dropping tables.
 ---

 Key: HIVE-7100
 URL: https://issues.apache.org/jira/browse/HIVE-7100
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.13.0
Reporter: Ravi Prakash
Assignee: Jayesh
 Attachments: HIVE-7100.1.patch, HIVE-7100.2.patch, HIVE-7100.3.patch, 
 HIVE-7100.4.patch, HIVE-7100.patch


 Users of our clusters are often running up against their quota limits because 
 of Hive tables. When they drop tables, they then have to manually delete the 
 files from HDFS using skipTrash. This is cumbersome and unnecessary. We 
 should enable users to skipTrash directly when dropping tables.
 We should also be able to provide this functionality without polluting SQL 
 syntax.
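
A hedged sketch of one non-SQL approach follows: gate the permanent delete 
behind a config flag. The property name "hive.warehouse.data.skipTrash" is an 
assumption for illustration, not necessarily what the attached patches use.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public final class DropTableData {
  // Sketch only: delete a dropped table's directory, bypassing the trash
  // when the (hypothetical) skipTrash flag is set.
  public static void deleteTableDir(Configuration conf, Path tableDir)
      throws Exception {
    boolean skipTrash = conf.getBoolean("hive.warehouse.data.skipTrash", false);
    FileSystem fs = tableDir.getFileSystem(conf);
    if (skipTrash) {
      fs.delete(tableDir, true);   // permanent delete, frees quota immediately
    } else if (!Trash.moveToAppropriateTrash(fs, tableDir, conf)) {
      fs.delete(tableDir, true);   // trash unavailable; fall back to delete
    }
  }
}
{code}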



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7840) Generated hive-default.xml.template mistakenly refers to property names as keys

2014-08-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106843#comment-14106843
 ] 

Hive QA commented on HIVE-7840:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663537/HIVE-7840.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6115 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/458/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/458/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-458/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663537

 Generated hive-default.xml.template mistakenly refers to property names as 
 keys
 ---

 Key: HIVE-7840
 URL: https://issues.apache.org/jira/browse/HIVE-7840
 Project: Hive
  Issue Type: Bug
Reporter: Wilbur Yang
Assignee: Wilbur Yang
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-7840.patch


 When Hive is built with Maven, the default template for hive-site.xml 
 (hive/packaging/target/apache-hive-0.14.0-SNAPSHOT-bin/apache-hive-0.14.0-SNAPSHOT-bin/conf/hive-default.xml.template)
  uses the <key> tag as opposed to the correct <name> tag. If a user were to 
 create a custom hive-site.xml using this template, it results in a 
 rather confusing situation in which Hive logs that it has loaded 
 hive-site.xml, but in reality none of those properties register 
 correctly.
 *Wrong:*
 {quote}
 <configuration>
   ...
   <property>
     <key>hive.exec.script.wrapper</key>
     <value/>
     <description/>
   </property>
   ...
 {quote}
 *Right:*
 {quote}
 <configuration>
   ...
   <property>
     <name>hive.exec.script.wrapper</name>
     <value/>
     <description/>
   </property>
   ...
 {quote}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-22 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106868#comment-14106868
 ] 

Venki Korukanti commented on HIVE-7799:
---

I think with the v2 patch we would need unbounded memory, since we store the results 
in a Queue and sometimes a single input record can generate more than one output 
record (UDTF), or some operators (such as group by) flush only after processing many 
input records.

 TRANSFORM failed in transform_ppr1.q[Spark Branch]
 --

 Key: HIVE-7799
 URL: https://issues.apache.org/jira/browse/HIVE-7799
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
  Labels: Spark-M1
 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch


 Here is the exception:
 {noformat}
 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) 
 - Exception in task 0.0 in stage 1.0 (TID 0)
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82)
 at 
 scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at 
 org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}
 Basically, the cause is that RowContainer is misused (it's not allowed to 
 write once someone has read a row from it); I'm trying to figure out whether 
 it's a hive issue or specific to hive on spark mode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-22 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106874#comment-14106874
 ] 

Venki Korukanti commented on HIVE-7799:
---

Let me look at the RowContainer and see if we can modify/extend it to support 
read/write like a queue with persistence. As far as I can see, we need 
persistent support.

 TRANSFORM failed in transform_ppr1.q[Spark Branch]
 --

 Key: HIVE-7799
 URL: https://issues.apache.org/jira/browse/HIVE-7799
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
  Labels: Spark-M1
 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch


 Here is the exception:
 {noformat}
 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) 
 - Exception in task 0.0 in stage 1.0 (TID 0)
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82)
 at 
 scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at 
 org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}
 Basically, the cause is that RowContainer is misused (it's not allowed to 
 write once someone has read a row from it); I'm trying to figure out whether 
 it's a hive issue or specific to hive on spark mode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7852) [CBO] Handle unary operators

2014-08-22 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-7852:
--

 Summary: [CBO] Handle unary operators
 Key: HIVE-7852
 URL: https://issues.apache.org/jira/browse/HIVE-7852
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan


Currently, a query like select c1 from t1 where c2 = -6; throws an exception because 
the cbo path confuses the unary -ve with the binary -ve



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7836) Ease-out denominator for multi-attribute join case in statistics annotation

2014-08-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106938#comment-14106938
 ] 

Hive QA commented on HIVE-7836:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663533/HIVE-7836.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6115 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/459/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/459/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-459/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663533

 Ease-out denominator for multi-attribute join case in statistics annotation
 ---

 Key: HIVE-7836
 URL: https://issues.apache.org/jira/browse/HIVE-7836
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor, Statistics
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-7836.1.patch


 In cases where the number of relations involved in a join is less than the number 
 of join attributes, the denominator of the join rule can get larger, resulting in 
 aggressive row count estimation.
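
To see why, consider the textbook join-cardinality estimate (a hedged 
illustration; Hive's exact annotation rule may differ):
{noformat}
|T_1 \bowtie T_2| \approx \frac{|T_1| \cdot |T_2|}
                               {\prod_{i=1}^{k} \max(V(T_1, a_i), V(T_2, a_i))}
{noformat}
With two relations the numerator has two row-count factors, but with k = 3 join 
attributes the denominator has three max-NDV factors, so the estimate can 
collapse toward zero rows; easing out the denominator ties its growth to the 
number of relations instead.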



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7852) [CBO] Handle unary operators

2014-08-22 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7852:
---

Attachment: h-7852.patch

 [CBO] Handle unary operators
 

 Key: HIVE-7852
 URL: https://issues.apache.org/jira/browse/HIVE-7852
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: h-7852.patch


 Currently, a query like select c1 from t1 where c2 = -6; throws an exception 
 because the cbo path confuses the unary -ve with the binary -ve



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7852) [CBO] Handle unary operators

2014-08-22 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7852:
---

Status: Patch Available  (was: Open)

 [CBO] Handle unary operators
 

 Key: HIVE-7852
 URL: https://issues.apache.org/jira/browse/HIVE-7852
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: h-7852.patch


 Currently, a query like select c1 from t1 where c2 = -6; throws an exception 
 because the cbo path confuses the unary -ve with the binary -ve



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 24981: Handle unary op.

2014-08-22 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24981/
---

Review request for hive and John Pullokkaran.


Bugs: HIVE-7852
https://issues.apache.org/jira/browse/HIVE-7852


Repository: hive


Description
---

Handle unary op.


Diffs
-

  
branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java
 1619831 
  branches/cbo/ql/src/test/queries/clientpositive/cbo_correctness.q 1619831 
  branches/cbo/ql/src/test/results/clientpositive/cbo_correctness.q.out 1619831 

Diff: https://reviews.apache.org/r/24981/diff/


Testing
---

Added new test.


Thanks,

Ashutosh Chauhan



[jira] [Updated] (HIVE-7836) Ease-out denominator for multi-attribute join case in statistics annotation

2014-08-22 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7836:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.

 Ease-out denominator for multi-attribute join case in statistics annotation
 ---

 Key: HIVE-7836
 URL: https://issues.apache.org/jira/browse/HIVE-7836
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor, Statistics
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.14.0

 Attachments: HIVE-7836.1.patch


 In cases where the number of relations involved in a join is less than the number 
 of join attributes, the denominator of the join rule can get larger, resulting in 
 aggressive row count estimation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7736) improve the columns stats update speed for all the partitions of a table

2014-08-22 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7736:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Pengcheng!

 improve the columns stats update speed for all the partitions of a table
 

 Key: HIVE-7736
 URL: https://issues.apache.org/jira/browse/HIVE-7736
 Project: Hive
  Issue Type: Improvement
Reporter: pengcheng xiong
Assignee: pengcheng xiong
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-7736.0.patch, HIVE-7736.1.patch, HIVE-7736.2.patch, 
 HIVE-7736.3.patch, HIVE-7736.4.patch


 The current implementation of columns stats update for all the partitions of 
 a table takes a long time when there are thousands of partitions. 
 For example, on a given cluster, it took 600+ seconds to update all the 
 partitions' columns stats for a table with 2 columns but 2000 partitions.
 ANALYZE TABLE src_stat_part partition (partitionId) COMPUTE STATISTICS for 
 columns;
 We would like to improve the columns stats update speed for all the 
 partitions of a table



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7848) Refresh SparkContext when spark configuration changes

2014-08-22 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106988#comment-14106988
 ] 

Chinna Rao Lalam commented on HIVE-7848:


The 2nd review comment still needs to be addressed.

 Refresh SparkContext when spark configuration changes
 -

 Key: HIVE-7848
 URL: https://issues.apache.org/jira/browse/HIVE-7848
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Fix For: spark-branch

 Attachments: HIVE-7848-spark.patch


 Recreate the spark client if spark configurations are updated (through the set 
 command).
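
A hedged sketch of one way to implement this follows; SparkClientHolder, the 
"spark." prefix filtering, and createClient() are illustrative assumptions, 
not the attached patch.
{code}
import java.util.HashMap;
import java.util.Map;

public final class SparkClientHolder {
  private Map<String, String> lastSparkConf = new HashMap<String, String>();
  private Object client;   // stands in for the real spark client type

  public synchronized Object getClient(Map<String, String> hiveConf) {
    Map<String, String> sparkConf = new HashMap<String, String>();
    for (Map.Entry<String, String> e : hiveConf.entrySet()) {
      if (e.getKey().startsWith("spark.")) {
        sparkConf.put(e.getKey(), e.getValue());
      }
    }
    // Recreate only when a "set spark.*=..." actually changed something.
    if (client == null || !sparkConf.equals(lastSparkConf)) {
      client = createClient(sparkConf);
      lastSparkConf = sparkConf;
    }
    return client;
  }

  private Object createClient(Map<String, String> sparkConf) {
    return new Object();   // placeholder for the actual context creation
  }
}
{code}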



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24602: HIVE-7689 : Enable Postgres as METASTORE back-end

2014-08-22 Thread Damien Carol

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24602/
---

(Updated Aug. 22, 2014, 3:50 p.m.)


Review request for hive.


Bugs: HIVE-7689
https://issues.apache.org/jira/browse/HIVE-7689


Repository: hive-git


Description
---

I maintain a few patches to make the Metastore work with a Postgres back end in our 
production environment.
The main goal of this JIRA is to push these patches upstream.

This patch enables these features:
* LOCKS on the postgres metastore
* COMPACTION on the postgres metastore
* TRANSACTION on the postgres metastore
* fix for the metastore update script for postgres


Diffs (updated)
-

  metastore/scripts/upgrade/postgres/hive-txn-schema-0.13.0.postgres.sql 
2ebd3b0 
  
metastore/src/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
 524a7a4 
  metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java 
06d8ac0 
  metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 
063dee6 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 264052f 
  ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java 
f636cff 
  ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java 
db62721 
  ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java 4625d27 

Diff: https://reviews.apache.org/r/24602/diff/


Testing
---

Using patched version in production. Enable concurrency with DbTxnManager.


Thanks,

Damien Carol



[jira] [Updated] (HIVE-7689) Enable Postgres as METASTORE back-end

2014-08-22 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-7689:
---

Attachment: HIVE-7889.3.patch

Rebased the patch and rewrote some parts.

 Enable Postgres as METASTORE back-end
 -

 Key: HIVE-7689
 URL: https://issues.apache.org/jira/browse/HIVE-7689
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Minor
  Labels: metastore, postgres
 Fix For: 0.14.0

 Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch


 I maintain a few patches to make the Metastore work with a Postgres back end in our 
 production environment.
 The main goal of this JIRA is to push these patches upstream.
 This patch enables LOCKS, COMPACTION and fixes an error in STATS on the metastore.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 24986: HIVE-7553: decouple the auxiliary jar loading from hive server2 starting phase

2014-08-22 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24986/
---

Review request for hive.


Bugs: HIVE-7553
https://issues.apache.org/jira/browse/HIVE-7553


Repository: hive-git


Description
---

HIVE-7553: decouple the auxiliary jar loading from hive server2 starting phase


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
15bc0a33b556b4be7a0a1fa671e5ee9a2a553fee 
  hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java 
93a03adeab7ba3c3c91344955d303e4252005239 
  
hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatClient.java
 7df84e997af4c626cf4fe92b22293c3165a5e6cc 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DefaultFetchFormatter.java 
5924bcf1f55dc4c2dd06f312f929047b7df9de55 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 
0c6a3d44ef1f796778768421dc02f8bf3ede6a8c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionTask.java 
bd45df1a401d1adb009e953d08205c7d5c2d5de2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ListSinkOperator.java 
dcc19f70644c561e17df8c8660ca62805465f1d6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 
4cf4522ace1932a0cd7f8203a98e69a35e8e9e8e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java 
eb2851b2c5fa52e0f555b3d8d1beea5d1ac3b225 
  ql/src/java/org/apache/hadoop/hive/ql/hooks/HookUtils.java 
3f474f846c7af5f1f65f1c14f3ce51308f1279d4 
  ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughOutputFormat.java 
0962cadce0d515e046371d0a816f4efd70b8eef7 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java 
9051ba6d80e619ddbb6c27bb161e1e7a5cdb08a5 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 
44f6198b55594e1394e9a1556603fe1c730fb438 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 
2f13ac2e30195a25844a25e9ec8a7c42ed99b75c 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java
 46bf55d64eef0e994c103f3f2a16a81753aa48cf 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java 
d86df453cd7686627940ade62c0fd72f1636dd0b 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseUtils.java 
0a1c660b4bbd46d8410e646270b23c99a4de8b7e 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
51838ae0b8abd4e040f180cad2375355fbfff621 
  ql/src/java/org/apache/hadoop/hive/ql/plan/AggregationDesc.java 
17eeae1a3435fceb4b57325675c58b599e0973ea 
  ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java 
930acbc98e81f8d421cee1170659d8b7a427fe7d 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java 
39f1793aaa5bed8a494883cac516ad314be951f4 
  ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessorFactory.java 
70c76b1ca50dd0540e57125eee9e4aa347d085d9 
  ql/src/java/org/apache/hadoop/hive/ql/processors/HiveCommand.java 
ae532f6e6b1a2e626043865b7bb502377455e7e1 
  ql/src/java/org/apache/hadoop/hive/ql/processors/RefreshProcessor.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 
fcfcf4227610279090a867cdeeb36dbc3d13d902 
  ql/src/java/org/apache/hadoop/hive/ql/stats/StatsFactory.java 
e247184b7d95c85fd3e12432e7eb75eb1e2a0b68 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBridge.java 
959007a54b335bb0bdef0256f60e6cbc65798dc7 
  ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java 
ef0052f5763922d50986f127c416af5eaa6ae30d 
  ql/src/test/resources/SessionStateTest-V1.jar PRE-CREATION 
  ql/src/test/resources/SessionStateTest-V2.jar PRE-CREATION 
  service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
b39d64dbec7b7d4beb622ce72c86ed1c8264f042 

Diff: https://reviews.apache.org/r/24986/diff/


Testing
---


Thanks,

cheng xu



[jira] [Commented] (HIVE-7841) Case, When, Lead, Lag UDF is missing annotation

2014-08-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107054#comment-14107054
 ] 

Hive QA commented on HIVE-7841:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663556/HIVE-7841.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6115 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_case
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_when
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/460/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/460/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-460/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663556

 Case, When, Lead, Lag UDF is missing annotation
 ---

 Key: HIVE-7841
 URL: https://issues.apache.org/jira/browse/HIVE-7841
 Project: Hive
  Issue Type: Bug
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-7841.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-22 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7799:


Status: Open  (was: Patch Available)

 TRANSFORM failed in transform_ppr1.q[Spark Branch]
 --

 Key: HIVE-7799
 URL: https://issues.apache.org/jira/browse/HIVE-7799
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
  Labels: Spark-M1
 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch


 Here is the exception:
 {noformat}
 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) 
 - Exception in task 0.0 in stage 1.0 (TID 0)
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82)
 at 
 scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at 
 org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}
 Basically, the cause is that RowContainer is misused (it's not allowed to 
 write once someone has read a row from it); I'm trying to figure out whether 
 it's a hive issue or specific to hive on spark mode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7654) A method to extrapolate columnStats for partitions of a table

2014-08-22 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107096#comment-14107096
 ] 

Ashutosh Chauhan commented on HIVE-7654:


[~szehon], the api returns the # of partitions for which stats were found in the 
metastore. If a user of the api is not interested in extrapolated stats, then she 
can compare the # of partitions requested with the # of partitions returned, and if 
those two numbers aren't equal, she can choose to ignore the result of this api. 

An api for unaggregated stats already exists, which they can use if they aren't 
interested in this extrapolation.
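
A hedged usage sketch of that contract, with stand-in interfaces since the 
exact client API shape (getAggrColStatsFor, partsFound) is an assumption taken 
from this thread:
{code}
import java.util.List;

public final class AggrStatsCheck {
  interface StatsClient {   // stand-in for the metastore client
    AggrStats getAggrColStatsFor(String db, String tbl,
                                 List<String> cols, List<String> parts);
  }

  interface AggrStats {     // stand-in for the result struct
    long getPartsFound();
  }

  static AggrStats fetchExactOnly(StatsClient client, String db, String tbl,
                                  List<String> cols, List<String> parts) {
    AggrStats aggr = client.getAggrColStatsFor(db, tbl, cols, parts);
    if (aggr.getPartsFound() != parts.size()) {
      // Some partitions' stats were extrapolated: ignore the result and let
      // the caller fall back to the existing unaggregated-stats api.
      return null;
    }
    return aggr;
  }
}
{code}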

 A method to extrapolate columnStats for partitions of a table
 -

 Key: HIVE-7654
 URL: https://issues.apache.org/jira/browse/HIVE-7654
 Project: Hive
  Issue Type: New Feature
Reporter: pengcheng xiong
Assignee: pengcheng xiong
Priority: Minor
 Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, 
 HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch, 
 HIVE-7654.8.patch


 In a PARTITIONED table, there are many partitions. For example, 
 create table if not exists loc_orc (
   state string,
   locid int,
   zip bigint
 ) partitioned by(year string) stored as orc;
 We assume there are 4 partitions, partition(year='2000'), 
 partition(year='2001'), partition(year='2002') and partition(year='2003').
 We can use the following command to compute statistics for columns 
 state,locid of partition(year='2001')
 analyze table loc_orc partition(year='2001') compute statistics for columns 
 state,locid;
 We need to know the “aggregated” column status for the whole table loc_orc. 
 However, we may not have the column status for some partitions, e.g., 
 partition(year='2002') and also we may not have the column status for some 
 columns, e.g., zip bigint for partition(year='2001')
 We propose a method to extrapolate the missing column status for the 
 partitions.
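
As one concrete reading of "extrapolate", here is a hedged sketch of a linear 
fit over the partitions that do have a stat, in the spirit of the 
LinearExtrapolatePartStatus class in the posted diff; the patch's actual math 
may differ.
{code}
public final class LinearExtrapolation {
  /**
   * Fit y = a + b*x over the partitions that have the stat
   * (x = partition index, y = stat value), then evaluate at missingIndex.
   */
  public static double extrapolate(int[] knownIndexes, double[] knownValues,
                                   int missingIndex) {
    int n = knownIndexes.length;
    if (n == 0) {
      throw new IllegalArgumentException("no known stats to extrapolate from");
    }
    if (n == 1) {
      return knownValues[0];   // nothing to fit; reuse the lone value
    }
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (int i = 0; i < n; i++) {
      sx += knownIndexes[i];
      sy += knownValues[i];
      sxx += (double) knownIndexes[i] * knownIndexes[i];
      sxy += knownIndexes[i] * knownValues[i];
    }
    double b = (n * sxy - sx * sy) / (n * sxx - sx * sx);  // slope
    double a = (sy - b * sx) / n;                          // intercept
    return a + b * missingIndex;
  }
}
{code}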



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7663) OrcRecordUpdater needs to implement getStats

2014-08-22 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-7663:
-

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Patch checked in.  Thank you Owen for the review.

 OrcRecordUpdater needs to implement getStats
 

 Key: HIVE-7663
 URL: https://issues.apache.org/jira/browse/HIVE-7663
 Project: Hive
  Issue Type: Sub-task
  Components: Transactions
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.14.0

 Attachments: HIVE-7663.patch


 OrcRecordUpdater.getStats currently returns null.  It needs to track the 
 stats and return a valid value.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-22 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107106#comment-14107106
 ] 

Chengxiang Li commented on HIVE-7799:
-

Thanks, [~venki387], it seems I missed something here; group by does need 
persistent storage support here.

 TRANSFORM failed in transform_ppr1.q[Spark Branch]
 --

 Key: HIVE-7799
 URL: https://issues.apache.org/jira/browse/HIVE-7799
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
  Labels: Spark-M1
 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch


 Here is the exception:
 {noformat}
 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) 
 - Exception in task 0.0 in stage 1.0 (TID 0)
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82)
 at 
 scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at 
 org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}
 Basically, the cause is that RowContainer is misused (it's not allowed to 
 write once someone has read a row from it); I'm trying to figure out whether 
 it's a hive issue or specific to hive on spark mode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7841) Case, When, Lead, Lag UDF is missing annotation

2014-08-22 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7841:
---

Component/s: UDF
 Documentation

 Case, When, Lead, Lag UDF is missing annotation
 ---

 Key: HIVE-7841
 URL: https://issues.apache.org/jira/browse/HIVE-7841
 Project: Hive
  Issue Type: Bug
  Components: Documentation, UDF
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.14.0

 Attachments: HIVE-7841.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7841) Case, When, Lead, Lag UDF is missing annotation

2014-08-22 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7841:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, John!

 Case, When, Lead, Lag UDF is missing annotation
 ---

 Key: HIVE-7841
 URL: https://issues.apache.org/jira/browse/HIVE-7841
 Project: Hive
  Issue Type: Bug
  Components: Documentation, UDF
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.14.0

 Attachments: HIVE-7841.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7833) Remove unwanted allocation in ORC RunLengthIntegerWriterV2 determine encoding function

2014-08-22 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7833:
-

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk

 Remove unwanted allocation in ORC RunLengthIntegerWriterV2 determine encoding 
 function
 --

 Key: HIVE-7833
 URL: https://issues.apache.org/jira/browse/HIVE-7833
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.14.0

 Attachments: HIVE-7833.1.patch, HIVE-7833.2.patch


 RunLengthIntegerWriterV2.determineEncoding() is used heavily. It performs 
 unwanted buffer allocations on every invocation of the function which are not 
 required.
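
As a hedged illustration of the pattern (not the actual ORC patch), the hot 
method can reuse preallocated scratch arrays instead of allocating new buffers 
per call:
{code}
public final class EncodingChooser {
  private static final int MAX_SCOPE = 512;   // assumed upper bound on run size
  // Allocated once, reused by every determineEncoding() call.
  private final long[] adjDeltas = new long[MAX_SCOPE];
  private final long[] zigZagged = new long[MAX_SCOPE];

  void determineEncoding(long[] literals, int numLiterals) {
    // before: long[] adjDeltas = new long[numLiterals - 1];  (per call)
    for (int i = 1; i < numLiterals; i++) {
      adjDeltas[i - 1] = literals[i] - literals[i - 1];
    }
    for (int i = 0; i < numLiterals; i++) {
      zigZagged[i] = (literals[i] << 1) ^ (literals[i] >> 63);  // zigzag encode
    }
    // ... inspect adjDeltas/zigZagged to pick DELTA, DIRECT, etc. ...
  }
}
{code}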



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-22 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107139#comment-14107139
 ] 

Venki Korukanti commented on HIVE-7799:
---

Whenever {{ResultIterator.hasNext()}} or {{ResultIterator.next()}} is called, we 
first serve records from the RowContainer until all records in the RowContainer are 
consumed. If there are no more records in the RowContainer, then we clear the 
RowContainer and call {{processNextRecord}} or {{closeRecordProcessor}} in 
{{ResultIterator.hasNext()}} to get the next output record(s). So we start 
adding records only when the RowContainer is empty (or cleared). I am trying to 
understand how we got into the situation where we are trying to write after 
reading has started. One scenario I can think of is if Spark uses two threads, 
as in a producer-consumer setup.
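
To make the serve-then-refill contract concrete, here is a minimal sketch of 
that iterator shape, with an in-memory queue standing in for the RowContainer; 
class and method names are illustrative, not the actual 
HiveBaseFunctionResultList API.
{code}
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Queue;

abstract class BufferedResultIterator<IN, OUT> implements Iterator<OUT> {
  private final Queue<OUT> buffer = new ArrayDeque<OUT>();
  private final Iterator<IN> input;

  BufferedResultIterator(Iterator<IN> input) {
    this.input = input;
  }

  /** Process one input record; may add 0..n rows to the buffer. */
  protected abstract void processNextRecord(IN record, Queue<OUT> out);

  @Override
  public boolean hasNext() {
    // Refill only when the buffer is fully drained, mirroring the
    // "write only when RowContainer is empty" contract described above.
    while (buffer.isEmpty() && input.hasNext()) {
      processNextRecord(input.next(), buffer);
    }
    return !buffer.isEmpty();
  }

  @Override
  public OUT next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return buffer.poll();
  }

  @Override
  public void remove() {
    throw new UnsupportedOperationException();
  }
}
{code}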

 TRANSFORM failed in transform_ppr1.q[Spark Branch]
 --

 Key: HIVE-7799
 URL: https://issues.apache.org/jira/browse/HIVE-7799
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
  Labels: Spark-M1
 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch


 Here is the exception:
 {noformat}
 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) 
 - Exception in task 0.0 in stage 1.0 (TID 0)
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82)
 at 
 scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at 
 org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}
 Basically, the cause is that RowContainer is misused (it's not allowed to 
 write once someone has read a row from it); I'm trying to figure out whether 
 it's a hive issue or specific to hive on spark mode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7654) A method to extrapolate columnStats for partitions of a table

2014-08-22 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107145#comment-14107145
 ] 

Szehon Ho commented on HIVE-7654:
-

Thanks for the detailed explanation.

 A method to extrapolate columnStats for partitions of a table
 -

 Key: HIVE-7654
 URL: https://issues.apache.org/jira/browse/HIVE-7654
 Project: Hive
  Issue Type: New Feature
Reporter: pengcheng xiong
Assignee: pengcheng xiong
Priority: Minor
 Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, 
 HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch, 
 HIVE-7654.8.patch


 In a PARTITIONED table, there are many partitions. For example, 
 create table if not exists loc_orc (
   state string,
   locid int,
   zip bigint
 ) partitioned by(year string) stored as orc;
 We assume there are 4 partitions, partition(year='2000'), 
 partition(year='2001'), partition(year='2002') and partition(year='2003').
 We can use the following command to compute statistics for columns 
 state,locid of partition(year='2001')
 analyze table loc_orc partition(year='2001') compute statistics for columns 
 state,locid;
 We need to know the “aggregated” column status for the whole table loc_orc. 
 However, we may not have the column status for some partitions, e.g., 
 partition(year='2002') and also we may not have the column status for some 
 columns, e.g., zip bigint for partition(year='2001')
 We propose a method to extrapolate the missing column status for the 
 partitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24981: Handle unary op.

2014-08-22 Thread John Pullokkaran

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24981/#review51296
---



branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java
https://reviews.apache.org/r/24981/#comment89437

It seems that the only change needed is to add unary plus and minus to the builder 
table:
registerFunction("++", SqlStdOperatorTable.UNARY_PLUS, 
hToken(HiveParser.PLUS, "PLUS"));
registerFunction("--", SqlStdOperatorTable.UNARY_MINUS, 
hToken(HiveParser.PLUS, "MINUS"));


- John Pullokkaran


On Aug. 22, 2014, 3:16 p.m., Ashutosh Chauhan wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24981/
 ---
 
 (Updated Aug. 22, 2014, 3:16 p.m.)
 
 
 Review request for hive and John Pullokkaran.
 
 
 Bugs: HIVE-7852
 https://issues.apache.org/jira/browse/HIVE-7852
 
 
 Repository: hive
 
 
 Description
 ---
 
 Handle unary op.
 
 
 Diffs
 -
 
   
 branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java
  1619831 
   branches/cbo/ql/src/test/queries/clientpositive/cbo_correctness.q 1619831 
   branches/cbo/ql/src/test/results/clientpositive/cbo_correctness.q.out 
 1619831 
 
 Diff: https://reviews.apache.org/r/24981/diff/
 
 
 Testing
 ---
 
 Added new test.
 
 
 Thanks,
 
 Ashutosh Chauhan
 




[jira] [Updated] (HIVE-7654) A method to extrapolate columnStats for partitions of a table

2014-08-22 Thread pengcheng xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pengcheng xiong updated HIVE-7654:
--

Attachment: HIVE-7654.9.patch

 A method to extrapolate columnStats for partitions of a table
 -

 Key: HIVE-7654
 URL: https://issues.apache.org/jira/browse/HIVE-7654
 Project: Hive
  Issue Type: New Feature
Reporter: pengcheng xiong
Assignee: pengcheng xiong
Priority: Minor
 Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, 
 HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch, 
 HIVE-7654.8.patch, HIVE-7654.9.patch


 In a PARTITIONED table, there are many partitions. For example, 
 create table if not exists loc_orc (
   state string,
   locid int,
   zip bigint
 ) partitioned by(year string) stored as orc;
 We assume there are 4 partitions, partition(year='2000'), 
 partition(year='2001'), partition(year='2002') and partition(year='2003').
 We can use the following command to compute statistics for columns 
 state,locid of partition(year='2001')
 analyze table loc_orc partition(year='2001') compute statistics for columns 
 state,locid;
 We need to know the “aggregated” column status for the whole table loc_orc. 
 However, we may not have the column status for some partitions, e.g., 
 partition(year='2002') and also we may not have the column status for some 
 columns, e.g., zip bigint for partition(year='2001')
 We propose a method to extrapolate the missing column status for the 
 partitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7654) A method to extrapolate columnStats for partitions of a table

2014-08-22 Thread pengcheng xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pengcheng xiong updated HIVE-7654:
--

Status: Patch Available  (was: Open)

address the partial test case error

 A method to extrapolate columnStats for partitions of a table
 -

 Key: HIVE-7654
 URL: https://issues.apache.org/jira/browse/HIVE-7654
 Project: Hive
  Issue Type: New Feature
Reporter: pengcheng xiong
Assignee: pengcheng xiong
Priority: Minor
 Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, 
 HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch, 
 HIVE-7654.8.patch, HIVE-7654.9.patch


 In a PARTITIONED table, there are many partitions. For example, 
 create table if not exists loc_orc (
   state string,
   locid int,
   zip bigint
 ) partitioned by(year string) stored as orc;
 We assume there are 4 partitions, partition(year='2000'), 
 partition(year='2001'), partition(year='2002') and partition(year='2003').
 We can use the following command to compute statistics for columns 
 state,locid of partition(year='2001')
 analyze table loc_orc partition(year='2001') compute statistics for columns 
 state,locid;
 We need to know the “aggregated” column status for the whole table loc_orc. 
 However, we may not have the column status for some partitions, e.g., 
 partition(year='2002') and also we may not have the column status for some 
 columns, e.g., zip bigint for partition(year='2001')
 We propose a method to extrapolate the missing column status for the 
 partitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7654) A method to extrapolate columnStats for partitions of a table

2014-08-22 Thread pengcheng xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pengcheng xiong updated HIVE-7654:
--

Status: Open  (was: Patch Available)

 A method to extrapolate columnStats for partitions of a table
 -

 Key: HIVE-7654
 URL: https://issues.apache.org/jira/browse/HIVE-7654
 Project: Hive
  Issue Type: New Feature
Reporter: pengcheng xiong
Assignee: pengcheng xiong
Priority: Minor
 Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch, 
 HIVE-7654.1.patch, HIVE-7654.4.patch, HIVE-7654.6.patch, HIVE-7654.7.patch, 
 HIVE-7654.8.patch, HIVE-7654.9.patch


 In a PARTITIONED table, there are many partitions. For example, 
 create table if not exists loc_orc (
   state string,
   locid int,
   zip bigint
 ) partitioned by(year string) stored as orc;
 We assume there are 4 partitions, partition(year='2000'), 
 partition(year='2001'), partition(year='2002') and partition(year='2003').
 We can use the following command to compute statistics for columns 
 state,locid of partition(year='2001')
 analyze table loc_orc partition(year='2001') compute statistics for columns 
 state,locid;
 We need to know the “aggregated” column status for the whole table loc_orc. 
 However, we may not have the column status for some partitions, e.g., 
 partition(year='2002') and also we may not have the column status for some 
 columns, e.g., zip bigint for partition(year='2001')
 We propose a method to extrapolate the missing column status for the 
 partitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24498: A method to extrapolate the missing column status for the partitions.

2014-08-22 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24498/
---

(Updated Aug. 22, 2014, 5:45 p.m.)


Review request for hive.


Changes
---

address the partial test case error


Repository: hive-git


Description
---

We propose a method to extrapolate the missing column status for the partitions.


Diffs (updated)
-

  data/files/extrapolate_stats_full.txt PRE-CREATION 
  data/files/extrapolate_stats_partial.txt PRE-CREATION 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
9489949 
  
metastore/src/java/org/apache/hadoop/hive/metastore/IExtrapolatePartStatus.java 
PRE-CREATION 
  
metastore/src/java/org/apache/hadoop/hive/metastore/LinearExtrapolatePartStatus.java
 PRE-CREATION 
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
767cffc 
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java a9f4be2 
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 0364385 
  
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
 4eba2b0 
  
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
 78ab19a 
  ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 8100b39 
  ql/src/test/queries/clientpositive/extrapolate_part_stats_full.q PRE-CREATION 
  ql/src/test/queries/clientpositive/extrapolate_part_stats_partial.q 
PRE-CREATION 
  ql/src/test/results/clientpositive/annotate_stats_part.q.out 10993c3 
  ql/src/test/results/clientpositive/extrapolate_part_stats_full.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/extrapolate_part_stats_partial.q.out 
PRE-CREATION 

Diff: https://reviews.apache.org/r/24498/diff/


Testing
---


File Attachments


HIVE-7654.0.patch
  
https://reviews.apache.org/media/uploaded/files/2014/08/12/77b155b0-a417-4225-b6b7-4c8c6ce2b97d__HIVE-7654.0.patch


Thanks,

pengcheng xiong



Re: Review Request 24981: Handle unary op.

2014-08-22 Thread Ashutosh Chauhan


 On Aug. 22, 2014, 5:36 p.m., John Pullokkaran wrote:
  branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java,
   line 125
  https://reviews.apache.org/r/24981/diff/1/?file=667267#file667267line125
 
  It seems that the only change needed is to add unary plus and minus to the 
  builder table:
  registerFunction("++", SqlStdOperatorTable.UNARY_PLUS, 
  hToken(HiveParser.PLUS, "PLUS"));
  registerFunction("--", SqlStdOperatorTable.UNARY_MINUS, 
  hToken(HiveParser.PLUS, "MINUS"));

I am not sure how this can work, since in both the map and the reverse lookup map, 
"+" is overloaded for both the unary & binary (+) operators. 
What you suggested can work only if we change the annotation description of 
GenericUDFOpNegative in Hive & UnaryMinus in Optiq from "-" to "--".


- Ashutosh
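
For readers following along, a tiny hedged demo of the clash described above: 
a name-keyed lookup table can hold only one entry per display name, so 
overloading "+" for both arities silently drops one of them. Illustrative 
only; the real table lives in SqlFunctionConverter.
{code}
import java.util.HashMap;
import java.util.Map;

public final class OperatorTableDemo {
  public static void main(String[] args) {
    Map<String, String> reverseLookup = new HashMap<String, String>();
    reverseLookup.put("+", "BINARY_PLUS");
    reverseLookup.put("+", "UNARY_PLUS");     // silently replaces the binary entry
    System.out.println(reverseLookup.get("+"));  // prints UNARY_PLUS; binary lost
    // Distinct keys such as "++" / "--" avoid the clash, which is why the
    // review suggestion above registers the unary forms under new names.
  }
}
{code}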


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24981/#review51296
---


On Aug. 22, 2014, 3:16 p.m., Ashutosh Chauhan wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24981/
 ---
 
 (Updated Aug. 22, 2014, 3:16 p.m.)
 
 
 Review request for hive and John Pullokkaran.
 
 
 Bugs: HIVE-7852
 https://issues.apache.org/jira/browse/HIVE-7852
 
 
 Repository: hive
 
 
 Description
 ---
 
 Handle unary op.
 
 
 Diffs
 -
 
   
 branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java
  1619831 
   branches/cbo/ql/src/test/queries/clientpositive/cbo_correctness.q 1619831 
   branches/cbo/ql/src/test/results/clientpositive/cbo_correctness.q.out 
 1619831 
 
 Diff: https://reviews.apache.org/r/24981/diff/
 
 
 Testing
 ---
 
 Added new test.
 
 
 Thanks,
 
 Ashutosh Chauhan
 




[jira] [Updated] (HIVE-7828) TestCLIDriver.parquet_join.q is failing on trunk

2014-08-22 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7828:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Thank you very much for the review Alan! I have committed this to trunk!

 TestCLIDriver.parquet_join.q is failing on trunk
 

 Key: HIVE-7828
 URL: https://issues.apache.org/jira/browse/HIVE-7828
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: 0.14.0

 Attachments: HIVE-7828.patch


 The test has been failing in recent HiveQA runs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7598) Potential null pointer dereference in MergeTask#closeJob()

2014-08-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107185#comment-14107185
 ] 

Hive QA commented on HIVE-7598:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663564/HIVE-7598.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6116 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/461/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/461/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-461/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663564

 Potential null pointer dereference in MergeTask#closeJob()
 --

 Key: HIVE-7598
 URL: https://issues.apache.org/jira/browse/HIVE-7598
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: SUYEON LEE
Priority: Minor
 Attachments: HIVE-7598.patch


 Call to Utilities.mvFileToFinalPath() passes null as the second-to-last 
 parameter, conf. The null gets passed to createEmptyBuckets(), which 
 dereferences conf directly:
 {code}
 boolean isCompressed = conf.getCompressed();
 TableDesc tableInfo = conf.getTableInfo();
 {code}
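 A guard along these lines would avoid the NPE (a sketch of the failing 
 fragment in createEmptyBuckets(), not necessarily the attached patch):
 {code}
 // Defensive check before dereferencing conf; the field names follow the
 // snippet above, the handling itself is illustrative.
 if (conf == null) {
   throw new HiveException("FileSinkDesc 'conf' must not be null in createEmptyBuckets()");
 }
 boolean isCompressed = conf.getCompressed();
 TableDesc tableInfo = conf.getTableInfo();
 {code}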



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7848) Refresh SparkContext when spark configuration changes

2014-08-22 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-7848:
---

Attachment: HIVE-7848.1-spark.patch

Addressed both comments.

 Refresh SparkContext when spark configuration changes
 -

 Key: HIVE-7848
 URL: https://issues.apache.org/jira/browse/HIVE-7848
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Fix For: spark-branch

 Attachments: HIVE-7848-spark.patch, HIVE-7848.1-spark.patch


 Recreate the spark client if spark configurations are updated (through set 
 command).
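 A rough sketch of the intended behavior (names are illustrative, not the 
 attached patch):
 {code}
 import java.util.Map;
 import java.util.TreeMap;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaSparkContext;

 // Recreate the context only when the effective spark.* settings differ
 // from the ones the cached context was built with.
 public class SparkClientHolder {
   private SparkConf cachedConf;
   private JavaSparkContext context;

   public synchronized JavaSparkContext get(Map<String, String> sparkSettings) {
     SparkConf newConf = new SparkConf();
     for (Map.Entry<String, String> e : sparkSettings.entrySet()) {
       newConf.set(e.getKey(), e.getValue());
     }
     if (context == null || changed(cachedConf, newConf)) {
       if (context != null) {
         context.stop();  // tear down the stale context first
       }
       context = new JavaSparkContext(newConf);
       cachedConf = newConf;
     }
     return context;
   }

   private static boolean changed(SparkConf a, SparkConf b) {
     // Compare as maps so entry order does not matter.
     Map<String, String> ma = new TreeMap<>(), mb = new TreeMap<>();
     for (scala.Tuple2<String, String> t : a.getAll()) ma.put(t._1(), t._2());
     for (scala.Tuple2<String, String> t : b.getAll()) mb.put(t._1(), t._2());
     return !ma.equals(mb);
   }
 }
 {code}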



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files

2014-08-22 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107200#comment-14107200
 ] 

Szehon Ho commented on HIVE-7850:
-

Hi Satish, can you please fix the formatting?  Indents are 2 spaces (Hive code 
is like that), and put a space after the comma, etc.  

Otherwise it looks good to me.  But granted, I'm not an expert on Parquet 
schema, so my only question is whether it is compatible with other tools?  + 
[~jcoffey], [~rdblue] for comments (if any).

 Hive Query failed if the data type is array<string> with parquet files
 --

 Key: HIVE-7850
 URL: https://issues.apache.org/jira/browse/HIVE-7850
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.14.0, 0.13.1
Reporter: Sathish
  Labels: parquet, serde
 Fix For: 0.14.0

 Attachments: HIVE-7850.patch


 * Created a parquet file from the Avro file which has one array data type and 
 the rest are primitive types. Avro schema of the array data type, e.g.: 
 {code}
 { "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, 
 null ] }
 {code}
 * Created an external Hive table with the array type as below: 
 {code}
 create external table paraArray (action Array<string>) partitioned by (partitionid 
 int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as 
 inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 
 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; 
 alter table paraArray add partition(partitionid=1) location '/testPara';
 {code}
 * Run the following query (select action from paraArray limit 10); the 
 MapReduce jobs fail with the following exception.
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row [Error getting row data with exception 
 java.lang.ClassCastException: 
 parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
 org.apache.hadoop.io.ArrayWritable
 at 
 parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
 at org.apache.hadoop.mapred.Child.main(Child.java:264)
 ]
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 ... 8 more
 {code}
 This issue was posted on the Parquet issues list long back, and since it is 
 related to the Parquet Hive serde, I have created the Hive issue here. The 
 details and history are available at 
 https://github.com/Parquet/parquet-mr/issues/281.
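 The failure point suggests a defensive type check; a sketch of the idea 
 (illustrative only -- the attached patch may instead fix this at the schema 
 level):
 {code}
 import java.util.Arrays;
 import java.util.List;
 import org.apache.hadoop.io.ArrayWritable;
 import org.apache.hadoop.io.Writable;

 // A type-checked getList(): a dictionary-encoded binary then fails with
 // a clear message instead of a raw ClassCastException.
 public class SafeArrayInspector {
   public List<Writable> getList(Object data) {
     if (data == null) {
       return null;
     }
     if (!(data instanceof ArrayWritable)) {
       throw new UnsupportedOperationException(
           "Cannot inspect " + data.getClass().getName() + " as an array");
     }
     return Arrays.asList(((ArrayWritable) data).get());
   }
 }
 {code}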



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7848) Refresh SparkContext when spark configuration changes

2014-08-22 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-7848:
---

Status: Patch Available  (was: Open)

 Refresh SparkContext when spark configuration changes
 -

 Key: HIVE-7848
 URL: https://issues.apache.org/jira/browse/HIVE-7848
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Fix For: spark-branch

 Attachments: HIVE-7848-spark.patch, HIVE-7848.1-spark.patch


 Recreate the spark client if spark configurations are updated (through set 
 command).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs

2014-08-22 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107206#comment-14107206
 ] 

Brock Noland commented on HIVE-4629:


Hi Dong,

The latest patch fails to apply to HEAD and will need to be rebased.

bq. the latest patch still does not fulfill the comments about backward 
compatibility.

I am very sorry for the confusion; I believe the patch *does* meet the backward 
compatibility requirement.

bq. For client and service layer interface ICLIService, although it is not RPC 
and is not a public API of Hive, I think making it follow the single 
request/response struct mode is also good. Will make the new fetchResults 
method follow the single request/response struct model. Then remove those old 
fetchResults methods.

I do not feel this is required. The current patch works exactly as we requested!
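
For reference, the single request/response struct pattern under discussion 
looks roughly like this (hypothetical names, not Hive's actual Thrift API):
{code}
// One request struct and one response struct per RPC: new fields can be
// added to either struct later without changing the method signature.
public final class FetchResultsRequest {
  public String operationHandle;
  public short fetchType;   // e.g. 0 = query results, 1 = query logs
  public long maxRows;
}

public final class FetchResultsResponse {
  public java.util.List<String> rows;
  public boolean hasMoreRows;
}

public interface CLIServiceSketch {
  FetchResultsResponse fetchResults(FetchResultsRequest request);
}
{code}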

Thank you very much Dong!!

 HS2 should support an API to retrieve query logs
 

 Key: HIVE-4629
 URL: https://issues.apache.org/jira/browse/HIVE-4629
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Reporter: Shreepadma Venugopalan
Assignee: Dong Chen
 Attachments: HIVE-4629-no_thrift.1.patch, HIVE-4629.1.patch, 
 HIVE-4629.2.patch, HIVE-4629.3.patch.txt, HIVE-4629.4.patch, 
 HIVE-4629.5.patch, HIVE-4629.6.patch, HIVE-4629.7.patch


 HiveServer2 should support an API to retrieve query logs. This is 
 particularly relevant because HiveServer2 supports async execution but 
 doesn't provide a way to report progress. Providing an API to retrieve query 
 logs will help report progress to the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7840) Generated hive-default.xml.template mistakenly refers to property names as keys

2014-08-22 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7840:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thank you so much for this fix!! I have committed this to trunk!

 Generated hive-default.xml.template mistakenly refers to property names as 
 keys
 ---

 Key: HIVE-7840
 URL: https://issues.apache.org/jira/browse/HIVE-7840
 Project: Hive
  Issue Type: Bug
Reporter: Wilbur Yang
Assignee: Wilbur Yang
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-7840.patch


 When Hive is built with Maven, the default template for hive-site.xml 
 (hive/packaging/target/apache-hive-0.14.0-SNAPSHOT-bin/apache-hive-0.14.0-SNAPSHOT-bin/conf/hive-default.xml.template)
  uses a {{key}} tag as opposed to the correct {{name}} tag. If a user were to 
 create a custom hive-site.xml using this template, it results in a 
 rather confusing situation in which Hive logs that it has loaded 
 hive-site.xml, but in reality none of those properties are registered 
 correctly.
 *Wrong:*
 {quote}
 <configuration>
   ...
   <property>
     <key>hive.exec.script.wrapper</key>
     <value/>
     <description/>
   </property>
   ...
 {quote}
 *Right:*
 {quote}
 <configuration>
   ...
   <property>
     <name>hive.exec.script.wrapper</name>
     <value/>
     <description/>
   </property>
   ...
 {quote}
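 To see the symptom directly, one can load such a template with Hadoop's 
 Configuration (a minimal sketch; the file path is illustrative):
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;

 // Properties declared under <key> carry no <name>, so Configuration drops
 // them: the lookup below prints "null" for the wrong template and the
 // configured value for the corrected one.
 public class TemplateCheck {
   public static void main(String[] args) {
     Configuration conf = new Configuration(false);
     conf.addResource(new Path("file:///tmp/hive-site.xml"));
     System.out.println(conf.get("hive.exec.script.wrapper"));
   }
 }
 {code}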



--
This message was sent by Atlassian JIRA
(v6.2#6252)

