[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2014-08-08 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090350#comment-14090350
 ] 

Prasanth J commented on HIVE-4123:
--

Please go ahead and update the original description. 
At this point the only possible valid values are 0.11 and 0.12. As you had 
mentioned, if the parameter is not defined or is defined incorrectly, it will 
fall back to the default 0.12 encoding. 

bq. Is that accurate? Can releases be specified as 0.12.0 or 0.13.1?
Yes, that's accurate. HIVE-6002 was trying to add a patch number to the write 
version so that versions could be specified as 0.12.1, but I don't think it 
will be committed until the next major change to the ORC writer.
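
As a minimal sketch of the fallback behavior described above (the parameter 
name hive.exec.orc.write.format and the helper itself are assumptions for 
illustration, not the actual ORC writer source):

{code}
// Illustrative only: resolve the requested ORC write format, falling back
// to the default 0.12 encoding when the value is missing or unrecognized.
public static String resolveOrcWriteFormat(org.apache.hadoop.conf.Configuration conf) {
  String v = conf.get("hive.exec.orc.write.format");
  if ("0.11".equals(v) || "0.12".equals(v)) {
    return v;
  }
  return "0.12";
}
{code}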

 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Prasanth J
  Labels: TODOC12, orcfile
 Fix For: 0.12.0

 Attachments: HIVE-4123-8.patch, HIVE-4123.1.git.patch.txt, 
 HIVE-4123.2.git.patch.txt, HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, 
 HIVE-4123.5.txt, HIVE-4123.6.txt, HIVE-4123.7.txt, HIVE-4123.8.txt, 
 HIVE-4123.8.txt, HIVE-4123.patch.txt, ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs
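
As a toy illustration of the delta-encoding and longer-runs ideas listed above 
(the real ORC RLE v2 format, with its tighter bit packing, is considerably 
more involved; every name here is illustrative):

{code}
import java.io.DataOutput;
import java.io.IOException;

public final class DeltaRunSketch {
  // Encode a run of integers with a constant difference as
  // (base, delta, length); longer runs amortize the header cost.
  public static void encodeDeltaRun(long[] v, int start, int len, DataOutput out)
      throws IOException {
    long base = v[start];
    long delta = len > 1 ? v[start + 1] - v[start] : 0;
    for (int i = start + 2; i < start + len; i++) {
      if (v[i] - v[i - 1] != delta) {
        throw new IllegalArgumentException("not a fixed-delta run");
      }
    }
    out.writeLong(base);
    out.writeLong(delta);
    out.writeInt(len);
  }
}
{code}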



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7629) Problem in SMB Joins between two Parquet tables

2014-08-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090365#comment-14090365
 ] 

Hive QA commented on HIVE-7629:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660393/HIVE-7629.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5887 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/220/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/220/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-220/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660393

 Problem in SMB Joins between two Parquet tables
 ---

 Key: HIVE-7629
 URL: https://issues.apache.org/jira/browse/HIVE-7629
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.0
Reporter: Suma Shivaprasad
  Labels: Parquet
 Fix For: 0.14.0

 Attachments: HIVE-7629.patch


 The issue is clearly seen when two bucketed and sorted parquet tables with 
 different numbers of columns are involved in the join. The following 
 exception is seen:
 Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
 at java.util.ArrayList.rangeCheck(ArrayList.java:635)
 at java.util.ArrayList.get(ArrayList.java:411)
 at 
 org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:101)
 at 
 org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
 at 
 org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:79)
 at 
 org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
 at 
 org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
 at 
 org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 24497: HIVE-7629 - Map joins between two parquet tables failing

2014-08-08 Thread Suma Shivaprasad

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24497/
---

Review request for hive.


Bugs: HIVE-7629
https://issues.apache.org/jira/browse/HIVE-7629


Repository: hive-git


Description
---

Map joins between two parquet tables are failing because the Mapper tries to 
access the columns of the first (bigger) table while loading the second 
(smaller map-join) table. Fixed this by adding a guard on the 
column indexes passed by Hive.
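
A minimal sketch of the kind of guard described above (the class and method 
names are assumptions for illustration; the actual change is in the 
DataWritableReadSupport.java and ProjectionPusher.java diffs below):

import java.util.ArrayList;
import java.util.List;

// Sketch only: skip projected column indexes that fall outside the current
// table's schema, so reading the smaller table never indexes past its own
// column list.
final class ColumnIndexGuard {
  static List<String> selectColumns(List<String> tableColumns, List<Integer> indexes) {
    List<String> selected = new ArrayList<String>();
    for (Integer i : indexes) {
      if (i != null && i >= 0 && i < tableColumns.size()) { // the guard
        selected.add(tableColumns.get(i));
      }
    }
    return selected;
  }
}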


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java 
2f155f6 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
 d6be4bd 
  ql/src/test/queries/clientpositive/parquet_join.q PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_join.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/24497/diff/


Testing
---

parquet_join.q covers most types of joins between 2 parquet tables - Normal, 
Map join, SMB join


Thanks,

Suma Shivaprasad



[jira] [Commented] (HIVE-7629) Problem in SMB Joins between two Parquet tables

2014-08-08 Thread Suma Shivaprasad (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090371#comment-14090371
 ] 

Suma Shivaprasad commented on HIVE-7629:


Reviewboard request - https://reviews.apache.org/r/24497/

 Problem in SMB Joins between two Parquet tables
 ---

 Key: HIVE-7629
 URL: https://issues.apache.org/jira/browse/HIVE-7629
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.0
Reporter: Suma Shivaprasad
  Labels: Parquet
 Fix For: 0.14.0

 Attachments: HIVE-7629.patch


 The issue is clearly seen when two bucketed and sorted parquet tables with 
 different numbers of columns are involved in the join. The following 
 exception is seen:
 Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
 at java.util.ArrayList.rangeCheck(ArrayList.java:635)
 at java.util.ArrayList.get(ArrayList.java:411)
 at 
 org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:101)
 at 
 org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
 at 
 org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:79)
 at 
 org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
 at 
 org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
 at 
 org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 23674: Handle db qualified names consistently across all HiveQL statements

2014-08-08 Thread Thejas Nair


 On Aug. 5, 2014, 5:09 a.m., Thejas Nair wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java, line 607
  https://reviews.apache.org/r/23674/diff/2/?file=647687#file647687line607
 
  doesn't the default authorization mode support columns in show-grants ? 
  It is there in ShowGrantDesc
 
 
 Navis Ryu wrote:
 I've moved columns in ShowGrantDesc to PrivilegeObjectDesc, which seemed 
 neater, imho. Isn't it?

Yes, that's certainly better.


 On Aug. 5, 2014, 5:09 a.m., Thejas Nair wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java, line 
  2110
  https://reviews.apache.org/r/23674/diff/2/?file=647694#file647694line2110
 
  how about using BaseSemanticAnalyzer. getQualifiedTableName and having 
  this check for 'duplicate declaration' there ?
  
 
 
 Navis Ryu wrote:
 It's not TOK_TABNAME (which is TOK_FROM identifier? 
 (identifier|StringLiteral)?), so it didn't seem replaceable with 
 getQualifiedTableName().

Thanks for clarifying!


 On Aug. 5, 2014, 5:09 a.m., Thejas Nair wrote:
  ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HivePrivilegeObject.java,
   line 94
  https://reviews.apache.org/r/23674/diff/2/?file=647707#file647707line94
 
  Isn't it better to represent the columns as a set instead of list, as 
  multiple columns with same name in this object does not make sense ?
  Same for other places in this patch where columns has been changed from 
  a set to list.
 
 Navis Ryu wrote:
 HivePrivilegeObject compares columns by iteration. If the columns are not 
 ordered somehow, that doesn't seem like a valid comparison. I didn't have an 
 idea for comparing two column sets, so I just replaced it with a sorted 
 list, which felt easier than that. Any idea?

Sounds fine. Maybe we can do the copy and sort as part of the constructor of 
this HivePrivilegeObject, instead of relying on the argument being sorted. 
That is likely to avoid potential bugs, but it can be done as part of a 
separate jira.
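
A minimal sketch of that copy-and-sort idea (illustrative only, not the 
actual HivePrivilegeObject source; the class name here is made up):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: the constructor normalizes column order itself instead of relying
// on callers to pass a sorted list, so iteration-based comparison stays valid.
final class SortedColumns {
  private final List<String> columns;

  SortedColumns(List<String> cols) {
    List<String> copy = (cols == null)
        ? new ArrayList<String>() : new ArrayList<String>(cols);
    Collections.sort(copy);
    this.columns = Collections.unmodifiableList(copy);
  }

  List<String> getColumns() {
    return columns;
  }
}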


- Thejas


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23674/#review49498
---


On Aug. 1, 2014, 1:55 a.m., Navis Ryu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/23674/
 ---
 
 (Updated Aug. 1, 2014, 1:55 a.m.)
 
 
 Review request for hive and Thejas Nair.
 
 
 Bugs: HIVE-4064
 https://issues.apache.org/jira/browse/HIVE-4064
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Hive doesn't consistently handle db qualified names across all HiveQL 
 statements. While some HiveQL statements such as SELECT support DB qualified 
 names, others such as CREATE INDEX don't. 
 
 
 Diffs
 -
 
   
 itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/authorization/plugin/TestHiveAuthorizerCheckInvocation.java
  c91b15c 
   
 itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/CheckColumnAccessHook.java
  14fc430 
   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
 b74868b 
   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 
 5a56ced 
   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
 4f186f4 
   ql/src/java/org/apache/hadoop/hive/ql/Driver.java cba5cfa 
   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 40d910c 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 4cf4522 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java a7e50ad 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java ae87aac 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java
  11a6d07 
   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
 22945e3 
   ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnAccessInfo.java 939dc65 
   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
 67a3aa7 
   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g ab1188a 
   ql/src/java/org/apache/hadoop/hive/ql/parse/IndexUpdater.java 856ec2f 
   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 51838ae 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/authorization/HiveAuthorizationTaskFactoryImpl.java
  826bdf3 
   ql/src/java/org/apache/hadoop/hive/ql/plan/AlterIndexDesc.java 0318e4b 
   ql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableAlterPartDesc.java 
 cf67e16 
   ql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableSimpleDesc.java 
 541675c 
   ql/src/java/org/apache/hadoop/hive/ql/plan/PrivilegeObjectDesc.java 9417220 
   ql/src/java/org/apache/hadoop/hive/ql/plan/RenamePartitionDesc.java 1b5fb9e 
   ql/src/java/org/apache/hadoop/hive/ql/plan/ShowColumnsDesc.java fe6a91e 
   ql/src/java/org/apache/hadoop/hive/ql/plan/ShowGrantDesc.java aa88153 

[jira] [Commented] (HIVE-4064) Handle db qualified names consistently across all HiveQL statements

2014-08-08 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090422#comment-14090422
 ] 

Thejas M Nair commented on HIVE-4064:
-

[~navis] Can you also please upload the new patch to reviewboard?


 Handle db qualified names consistently across all HiveQL statements
 ---

 Key: HIVE-4064
 URL: https://issues.apache.org/jira/browse/HIVE-4064
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Navis
 Attachments: HIVE-4064-1.patch, HIVE-4064.1.patch.txt, 
 HIVE-4064.2.patch.txt, HIVE-4064.3.patch.txt, HIVE-4064.4.patch.txt, 
 HIVE-4064.5.patch.txt, HIVE-4064.6.patch.txt


 Hive doesn't consistently handle db qualified names across all HiveQL 
 statements. While some HiveQL statements such as SELECT support DB qualified 
 names, others such as CREATE INDEX don't. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 24498: A method to extrapolate the missing column status for the partitions.

2014-08-08 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24498/
---

Review request for hive.


Repository: hive-git


Description
---

We propose a method to extrapolate the missing column status for the partitions.
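
The description above is terse; purely as a guess at the general idea (and 
not the actual MetaStoreDirectSql change in the diff below), extrapolating a 
numeric per-partition column statistic could look like:

// Hypothetical sketch: linearly extrapolate a numeric column statistic for a
// partition with no stats from two partitions that have them. The class,
// method, and approach are assumptions, not the proposed patch.
final class StatsExtrapolation {
  static double extrapolateStat(int targetIdx, int idx1, double stat1,
                                int idx2, double stat2) {
    double slope = (stat2 - stat1) / (double) (idx2 - idx1);
    return stat1 + slope * (targetIdx - idx1);
  }
}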


Diffs
-

  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
43c412d 

Diff: https://reviews.apache.org/r/24498/diff/


Testing
---


Thanks,

pengcheng xiong



[jira] [Created] (HIVE-7657) Nullable union of 3 or more types is not recognized nullable

2014-08-08 Thread Arkadiusz Gasior (JIRA)
Arkadiusz Gasior created HIVE-7657:
--

 Summary: Nullable union of 3 or more types is not recognized 
nullable
 Key: HIVE-7657
 URL: https://issues.apache.org/jira/browse/HIVE-7657
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Arkadiusz Gasior


Handling a nullable union of 3 or more types causes serialization issues, as 
[null,long,string] is not recognized as nullable. The code potentially causing 
the issue might be in AvroSerdeUtils.java: 

{code}
  public static boolean isNullableType(Schema schema) {
    return schema.getType().equals(Schema.Type.UNION) &&
        schema.getTypes().size() == 2 &&
        (schema.getTypes().get(0).getType().equals(Schema.Type.NULL) ||
         schema.getTypes().get(1).getType().equals(Schema.Type.NULL));
    // [null, null] not allowed, so this check is ok.
  }
{code}
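
For comparison, a sketch of a check that would also treat unions of three or 
more branches (e.g. [null,long,string]) as nullable by looking for any NULL 
branch (illustrative only, not the committed fix):

{code}
import org.apache.avro.Schema;

  public static boolean isNullableType(Schema schema) {
    if (!schema.getType().equals(Schema.Type.UNION)) {
      return false;
    }
    // Nullable if any branch of the union is NULL, regardless of arity.
    for (Schema branch : schema.getTypes()) {
      if (branch.getType().equals(Schema.Type.NULL)) {
        return true;
      }
    }
    return false;
  }
{code}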



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7553) avoid the scheduling maintenance window for every jar change

2014-08-08 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-7553:
---

Attachment: HIVE-7553.pdf

Since I do not have posting privileges in the Hive wiki space, I have attached 
the original design document here. Could you please help review my design? 
Thanks in advance!

 avoid the scheduling maintenance window for every jar change
 

 Key: HIVE-7553
 URL: https://issues.apache.org/jira/browse/HIVE-7553
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-7553.pdf


 When a user needs to refresh an existing jar or add a new jar to HS2, HS2 
 has to be restarted. As HS2 is a service exposed to clients, this requires 
 scheduling a maintenance window for every jar change. It would be great if 
 we could avoid that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-08 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-7624:
-

Attachment: HIVE-7624.3-spark.patch

 Reduce operator initialization failed when running multiple MR query on spark
 -

 Key: HIVE-7624
 URL: https://issues.apache.org/jira/browse/HIVE-7624
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, 
 HIVE-7624.patch


 The following error occurs when I try to run a query with multiple reduce 
 works (M-R-R):
 {quote}
 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
 java.lang.RuntimeException: Reduce operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from 
 [0:_col0]
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
 …
 {quote}
 I suspect we're applying the reduce function in the wrong order.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-08 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-7624:
-

Status: Patch Available  (was: Open)

 Reduce operator initialization failed when running multiple MR query on spark
 -

 Key: HIVE-7624
 URL: https://issues.apache.org/jira/browse/HIVE-7624
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, 
 HIVE-7624.patch


 The following error occurs when I try to run a query with multiple reduce 
 works (M-R-R):
 {quote}
 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
 java.lang.RuntimeException: Reduce operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from 
 [0:_col0]
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
 …
 {quote}
 I suspect we're applying the reduce function in the wrong order.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-08 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090452#comment-14090452
 ] 

Rui Li commented on HIVE-7624:
--

Finally I found this is because we don't set the output collector for the RS 
(ReduceSink) in ExecReducer. While this is natural for MR, where ExecReducer 
shouldn't contain an RS, we have to do it for Spark. The added code just looks 
for the RS and sets the collector for it, so there shouldn't be any regression.
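
A rough sketch of what that looks like (class and method names assumed, not 
the actual patch):

{code}
import org.apache.hadoop.hive.ql.exec.Operator;
import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
import org.apache.hadoop.mapred.OutputCollector;

// Walk the operator tree rooted at the reducer and set the output collector
// on any ReduceSinkOperator found; MR's ExecReducer never expects to contain
// an RS, but Hive on Spark can.
private static void setReduceSinkCollectors(Operator<?> op, OutputCollector collector) {
  if (op instanceof ReduceSinkOperator) {
    op.setOutputCollector(collector);
  }
  if (op.getChildOperators() != null) {
    for (Operator<?> child : op.getChildOperators()) {
      setReduceSinkCollectors(child, collector);
    }
  }
}
{code}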

 Reduce operator initialization failed when running multiple MR query on spark
 -

 Key: HIVE-7624
 URL: https://issues.apache.org/jira/browse/HIVE-7624
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, 
 HIVE-7624.patch


 The following error occurs when I try to run a query with multiple reduce 
 works (M-R-R):
 {quote}
 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
 java.lang.RuntimeException: Reduce operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from 
 [0:_col0]
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
 …
 {quote}
 I suspect we're applying the reduce function in the wrong order.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7658) Hive search order for hive-site.xml when using --config option

2014-08-08 Thread James Spurin (JIRA)
James Spurin created HIVE-7658:
--

 Summary: Hive search order for hive-site.xml when using --config 
option
 Key: HIVE-7658
 URL: https://issues.apache.org/jira/browse/HIVE-7658
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.13.0
 Environment: -bash-3.2$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.9 (Tikanga)

Hive 0.13.0-mapr-1406
Subversion git://rhbuild/root/builds/opensource/node/ecosystem/dl/hive -r 
4ff8f8b4a8fc4862727108204399710ef7ee7abc
Compiled by root on Tue Jul 1 14:18:09 PDT 2014
From source with checksum 208afc25260342b51aefd2e0edf4c9d6
Reporter: James Spurin
Priority: Minor


When using the hive cli, the tool appears to favour a hive-site.xml file in the 
current working directory even if the --config option is used with a valid 
directory containing a hive-site.xml file.

I would have expected the directory specified with --config to take precedence 
in the CLASSPATH search order.

Here's an example -



/home/spurija/hive-site.xml =

<configuration>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/example1</value>
</property>
</configuration>



/tmp/hive/hive-site.xml =

<configuration>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/example2</value>
</property>
</configuration>



-bash-4.1$ diff /home/spurija/hive-site.xml /tmp/hive/hive-site.xml
23c23
< <value>/tmp/example1</value>
---
> <value>/tmp/example2</value>




{ check the value of scratchdir, should be example 1 }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ hive

Logging initialized using configuration in 
jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example1




{ run with a specified config, check the value of scratchdir, should be 
example2 … still reported as example1 }

-bash-4.1$ pwd
/home/spurija
-bash-4.1$ hive --config /tmp/hive

Logging initialized using configuration in 
jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example1




{ remove the local config, check the value of scratchdir, should be example2 … 
now correct }

-bash-4.1$ pwd
/home/spurija
-bash-4.1$ rm hive-site.xml
-bash-4.1$ hive --config /tmp/hive

Logging initialized using configuration in 
jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example2


Is this expected behavior or should it use the directory supplied with --config 
as the preferred configuration?
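
A quick way to see which copy wins (plain Java, relying only on standard 
classloader behavior; the class name is made up):

{code}
import java.io.IOException;
import java.net.URL;
import java.util.Enumeration;

// Lists every hive-site.xml visible on the CLASSPATH in lookup order; the
// first entry printed is the one a single getResource() call would return,
// which is how a copy in the working directory can shadow the one under the
// --config directory.
public class FindHiveSite {
  public static void main(String[] args) throws IOException {
    Enumeration<URL> urls =
        FindHiveSite.class.getClassLoader().getResources("hive-site.xml");
    while (urls.hasMoreElements()) {
      System.out.println(urls.nextElement());
    }
  }
}
{code}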




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7658) Hive search order for hive-site.xml when using --config option

2014-08-08 Thread James Spurin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Spurin updated HIVE-7658:
---

Environment: 
Red Hat Enterprise Linux Server release 5.9 (Tikanga)

Hive 0.13.0-mapr-1406
Subversion git://rhbuild/root/builds/opensource/node/ecosystem/dl/hive -r 
4ff8f8b4a8fc4862727108204399710ef7ee7abc
Compiled by root on Tue Jul 1 14:18:09 PDT 2014
From source with checksum 208afc25260342b51aefd2e0edf4c9d6

  was:
-bash-3.2$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.9 (Tikanga)

Hive 0.13.0-mapr-1406
Subversion git://rhbuild/root/builds/opensource/node/ecosystem/dl/hive -r 
4ff8f8b4a8fc4862727108204399710ef7ee7abc
Compiled by root on Tue Jul 1 14:18:09 PDT 2014
From source with checksum 208afc25260342b51aefd2e0edf4c9d6


 Hive search order for hive-site.xml when using --config option
 --

 Key: HIVE-7658
 URL: https://issues.apache.org/jira/browse/HIVE-7658
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.13.0
 Environment: Red Hat Enterprise Linux Server release 5.9 (Tikanga)
 Hive 0.13.0-mapr-1406
 Subversion git://rhbuild/root/builds/opensource/node/ecosystem/dl/hive -r 
 4ff8f8b4a8fc4862727108204399710ef7ee7abc
 Compiled by root on Tue Jul 1 14:18:09 PDT 2014
 From source with checksum 208afc25260342b51aefd2e0edf4c9d6
Reporter: James Spurin
Priority: Minor

 When using the hive cli, the tool appears to favour a hive-site.xml file in 
 the current working directory even if the --config option is used with a 
 valid directory containing a hive-site.xml file.
 I would have expected the directory specified with --config to take 
 precedence in the CLASSPATH search order.
 Here's an example -
 /home/spurija/hive-site.xml =
 <configuration>
 <property>
 <name>hive.exec.local.scratchdir</name>
 <value>/tmp/example1</value>
 </property>
 </configuration>
 /tmp/hive/hive-site.xml =
 <configuration>
 <property>
 <name>hive.exec.local.scratchdir</name>
 <value>/tmp/example2</value>
 </property>
 </configuration>
 -bash-4.1$ diff /home/spurija/hive-site.xml /tmp/hive/hive-site.xml
 23c23
 < <value>/tmp/example1</value>
 ---
 > <value>/tmp/example2</value>
 { check the value of scratchdir, should be example 1 }
 -bash-4.1$ pwd
 /home/spurija
 -bash-4.1$ hive
 Logging initialized using configuration in 
 jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
 hive> set hive.exec.local.scratchdir;
 hive.exec.local.scratchdir=/tmp/example1
 { run with a specified config, check the value of scratchdir, should be 
 example2 … still reported as example1 }
 -bash-4.1$ pwd
 /home/spurija
 -bash-4.1$ hive --config /tmp/hive
 Logging initialized using configuration in 
 jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
 hive> set hive.exec.local.scratchdir;
 hive.exec.local.scratchdir=/tmp/example1
 { remove the local config, check the value of scratchdir, should be example2 
 … now correct }
 -bash-4.1$ pwd
 /home/spurija
 -bash-4.1$ rm hive-site.xml
 -bash-4.1$ hive --config /tmp/hive
 Logging initialized using configuration in 
 jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
 hive> set hive.exec.local.scratchdir;
 hive.exec.local.scratchdir=/tmp/example2
 Is this expected behavior or should it use the directory supplied with 
 --config as the preferred configuration?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7658) Hive search order for hive-site.xml when using --config option

2014-08-08 Thread James Spurin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Spurin updated HIVE-7658:
---

Description: 
When using the hive cli, the tool appears to favour a hive-site.xml file in the 
current working directory even if the --config option is used with a valid 
directory containing a hive-site.xml file.

I would have expected the directory specified with --config to take precedence 
in the CLASSPATH search order.

Here's an example -



/home/spurija/hive-site.xml =

<configuration>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/example1</value>
</property>
</configuration>



/tmp/hive/hive-site.xml =

<configuration>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/example2</value>
</property>
</configuration>



-bash-4.1$ diff /home/spurija/hive-site.xml /tmp/hive/hive-site.xml
23c23
< <value>/tmp/example1</value>
---
> <value>/tmp/example2</value>




{ check the value of scratchdir, should be example 1 }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ hive

Logging initialized using configuration in 
jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example1




{ run with a specified config, check the value of scratchdir, should be 
example2 … still reported as example1 }

-bash-4.1$ pwd
/home/spurija
-bash-4.1$ hive --config /tmp/hive

Logging initialized using configuration in 
jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example1




{ remove the local config, check the value of scratchdir, should be example2 … 
now correct }

-bash-4.1$ pwd
/home/spurija
-bash-4.1$ rm hive-site.xml
-bash-4.1$ hive --config /tmp/hive

Logging initialized using configuration in 
jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example2


Is this expected behavior or should it use the directory supplied with --config 
as the preferred configuration?


  was:
When using the hive cl, the tool appears to favour a hive-site.xml file in the 
current working directory even if the --config option is used with a valid 
directory containing a hive-site.xml file.

I would have expected the directory specified with --config to take precedence 
in the CLASSPATH search order.

Here's an example -



/home/spurija/hive-site.xml =

<configuration>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/example1</value>
</property>
</configuration>



/tmp/hive/hive-site.xml =

<configuration>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/example2</value>
</property>
</configuration>



-bash-4.1$ diff /home/spurija/hive-site.xml /tmp/hive/hive-site.xml
23c23
< <value>/tmp/example1</value>
---
> <value>/tmp/example2</value>




{ check the value of scratchdir, should be example 1 }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ hive

Logging initialized using configuration in 
jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example1




{ run with a specified config, check the value of scratchdir, should be 
example2 … still reported as example1 }

-bash-4.1$ pwd
/home/spurija
-bash-4.1$ hive --config /tmp/hive

Logging initialized using configuration in 
jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example1




{ remove the local config, check the value of scratchdir, should be example2 … 
now correct }

-bash-4.1$ pwd
/home/spurija
-bash-4.1$ rm hive-site.xml
-bash-4.1$ hive --config /tmp/hive

Logging initialized using configuration in 
jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example2


Is this expected behavior or should it use the directory supplied with --config 
as the preferred configuration?



 Hive search order for hive-site.xml when using --config option
 --

 Key: HIVE-7658
 URL: https://issues.apache.org/jira/browse/HIVE-7658
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.13.0
 Environment: Red Hat Enterprise Linux Server release 5.9 (Tikanga)
 Hive 0.13.0-mapr-1406
 Subversion git://rhbuild/root/builds/opensource/node/ecosystem/dl/hive -r 
 4ff8f8b4a8fc4862727108204399710ef7ee7abc
 Compiled by root on Tue Jul 1 14:18:09 PDT 2014
 From source with checksum 208afc25260342b51aefd2e0edf4c9d6
Reporter: James Spurin
Priority: Minor

 When using the 

[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs

2014-08-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090499#comment-14090499
 ] 

Hive QA commented on HIVE-4629:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660420/HIVE-4629.6.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5875 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_load_hdfs_file_with_space_in_the_name
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/221/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/221/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-221/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660420

 HS2 should support an API to retrieve query logs
 

 Key: HIVE-4629
 URL: https://issues.apache.org/jira/browse/HIVE-4629
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-4629-no_thrift.1.patch, HIVE-4629.1.patch, 
 HIVE-4629.2.patch, HIVE-4629.3.patch.txt, HIVE-4629.4.patch, 
 HIVE-4629.5.patch, HIVE-4629.6.patch


 HiveServer2 should support an API to retrieve query logs. This is 
 particularly relevant because HiveServer2 supports async execution but 
 doesn't provide a way to report progress. Providing an API to retrieve query 
 logs will help report progress to the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-08 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-7624:
-

Attachment: HIVE-7624.4-spark.patch

 Reduce operator initialization failed when running multiple MR query on spark
 -

 Key: HIVE-7624
 URL: https://issues.apache.org/jira/browse/HIVE-7624
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, 
 HIVE-7624.4-spark.patch, HIVE-7624.patch


 The following error occurs when I try to run a query with multiple reduce 
 works (M-R-R):
 {quote}
 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
 java.lang.RuntimeException: Reduce operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from 
 [0:_col0]
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
 …
 {quote}
 I suspect we're applying the reduce function in the wrong order.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-08 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090509#comment-14090509
 ] 

Rui Li commented on HIVE-7624:
--

Some of the changes may bypass HIVE-7597, so I removed it.

 Reduce operator initialization failed when running multiple MR query on spark
 -

 Key: HIVE-7624
 URL: https://issues.apache.org/jira/browse/HIVE-7624
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, 
 HIVE-7624.4-spark.patch, HIVE-7624.patch


 The following error occurs when I try to run a query with multiple reduce 
 works (M-R-R):
 {quote}
 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
 java.lang.RuntimeException: Reduce operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from 
 [0:_col0]
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
 …
 {quote}
 I suspect we're applying the reduce function in the wrong order.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090525#comment-14090525
 ] 

Hive QA commented on HIVE-7624:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660582/HIVE-7624.3-spark.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5843 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/23/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/23/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-23/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660582

 Reduce operator initialization failed when running multiple MR query on spark
 -

 Key: HIVE-7624
 URL: https://issues.apache.org/jira/browse/HIVE-7624
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, 
 HIVE-7624.4-spark.patch, HIVE-7624.patch


 The following error occurs when I try to run a query with multiple reduce 
 works (M-R-R):
 {quote}
 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
 java.lang.RuntimeException: Reduce operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from 
 [0:_col0]
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
 …
 {quote}
 I suspect we're applying the reduce function in the wrong order.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types

2014-08-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090577#comment-14090577
 ] 

Hive QA commented on HIVE-5760:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660430/HIVE-5760.2.patch

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 5894 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_char_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_char_simple
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_varchar_simple
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/222/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/222/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-222/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660430

 Add vectorized support for CHAR/VARCHAR data types
 --

 Key: HIVE-5760
 URL: https://issues.apache.org/jira/browse/HIVE-5760
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Matt McCline
 Attachments: HIVE-5760.1.patch, HIVE-5760.2.patch


 Add support to allow queries referencing VARCHAR columns and expression 
 results to run efficiently in vectorized mode. This should re-use the code 
 for the STRING type to the extent possible and beneficial. Include unit tests 
 and end-to-end tests. Consider re-using or extending existing end-to-end 
 tests for vectorized string operations.
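
As a toy illustration of why VARCHAR can largely re-use the STRING 
vectorization path (the field names follow Hive's BytesColumnVector, but the 
helper itself is an assumption, and real code must also respect UTF-8 
character boundaries when truncating):

{code}
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;

// Toy sketch: VARCHAR batch data lives in a BytesColumnVector just like
// STRING; VARCHAR semantics mostly add a truncation to the declared maximum
// length. (Truncating by byte count ignores multi-byte characters.)
static void enforceVarcharMaxLength(BytesColumnVector col, int batchSize, int maxLen) {
  for (int i = 0; i < batchSize; i++) {
    if ((col.noNulls || !col.isNull[i]) && col.length[i] > maxLen) {
      col.length[i] = maxLen;
    }
  }
}
{code}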



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7647) Beeline does not honor --headerInterval and --color when executing with -e

2014-08-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090650#comment-14090650
 ] 

Hive QA commented on HIVE-7647:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660466/HIVE-7647.1.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5886 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/223/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/223/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-223/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660466

 Beeline does not honor --headerInterval and --color when executing with -e
 

 Key: HIVE-7647
 URL: https://issues.apache.org/jira/browse/HIVE-7647
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.14.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-7647.1.patch


 --showHeader is being honored
 [root@localhost ~]# beeline --showHeader=false -u 
 'jdbc:hive2://localhost:10000/default' -n hive -d 
 org.apache.hive.jdbc.HiveDriver -e "select * from sample_07 limit 10;"
 Connecting to jdbc:hive2://localhost:10000/default
 Connected to: Apache Hive (version 0.12.0-cdh5.0.1)
 Driver: Hive JDBC (version 0.12.0-cdh5.0.1)
 Transaction isolation: TRANSACTION_REPEATABLE_READ
 -hiveconf (No such file or directory)
 +----------+--------------------------------------+------------+---------+
 | 00-0000  | All Occupations                      | 135185230  | 42270   |
 | 11-0000  | Management occupations               | 6152650    | 100310  |
 | 11-1011  | Chief executives                     | 301930     | 160440  |
 | 11-1021  | General and operations managers      | 1697690    | 107970  |
 | 11-1031  | Legislators                          | 64650      | 37980   |
 | 11-2011  | Advertising and promotions managers  | 36100      | 94720   |
 | 11-2021  | Marketing managers                   | 166790     | 118160  |
 | 11-2022  | Sales managers                       | 333910     | 110390  |
 | 11-2031  | Public relations managers            | 51730      | 101220  |
 | 11-3011  | Administrative services managers     | 246930     | 79500   |
 +----------+--------------------------------------+------------+---------+
 10 rows selected (0.838 seconds)
 Beeline version 0.12.0-cdh5.1.0 by Apache Hive
 Closing: org.apache.hive.jdbc.HiveConnection
 --outputFormat is being honored.
 [root@localhost ~]# beeline --outputFormat=csv -u 
 'jdbc:hive2://localhost:10000/default' -n hive -d 
 org.apache.hive.jdbc.HiveDriver -e "select * from sample_07 limit 10;"
 Connecting to jdbc:hive2://localhost:10000/default
 Connected to: Apache Hive (version 0.12.0-cdh5.0.1)
 Driver: Hive JDBC (version 0.12.0-cdh5.0.1)
 Transaction isolation: TRANSACTION_REPEATABLE_READ
 'code','description','total_emp','salary'
 '00-0000','All Occupations','135185230','42270'
 '11-0000','Management occupations','6152650','100310'
 '11-1011','Chief executives','301930','160440'
 '11-1021','General and operations managers','1697690','107970'
 '11-1031','Legislators','64650','37980'
 '11-2011','Advertising and promotions managers','36100','94720'
 '11-2021','Marketing managers','166790','118160'
 '11-2022','Sales managers','333910','110390'
 '11-2031','Public relations managers','51730','101220'
 '11-3011','Administrative services managers','246930','79500'
 10 rows selected (0.664 seconds)
 Beeline version 0.12.0-cdh5.1.0 by Apache Hive
 Closing: org.apache.hive.jdbc.HiveConnection
 both --color and --headerInterval are being honored when executing using the 
 -f option (reads the query from a file rather than the command line; you 
 cannot really see the color here, but it uses the terminal colors)
 [root@localhost ~]# beeline --showheader=true --color=true --headerInterval=5 
 -u 

[jira] [Commented] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions

2014-08-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090656#comment-14090656
 ] 

Hive QA commented on HIVE-7223:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660470/HIVE-7223.2.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/224/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/224/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-224/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[ERROR] location: class org.apache.hadoop.hive.metastore.ObjectStore
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/metastore/src/java/org/apache/hadoop/hive/metastore/events/PreAddPartitionEvent.java:[24,55]
 package org.apache.hadoop.hive.metastore.partition.spec does not exist
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/metastore/src/java/org/apache/hadoop/hive/metastore/events/PreAddPartitionEvent.java:[34,11]
 cannot find symbol
[ERROR] symbol:   class PartitionSpecProxy
[ERROR] location: class 
org.apache.hadoop.hive.metastore.events.PreAddPartitionEvent
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/metastore/src/java/org/apache/hadoop/hive/metastore/events/PreAddPartitionEvent.java:[50,44]
 cannot find symbol
[ERROR] symbol:   class PartitionSpecProxy
[ERROR] location: class 
org.apache.hadoop.hive.metastore.events.PreAddPartitionEvent
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[323,43]
 cannot find symbol
[ERROR] symbol:   class PartitionSpec
[ERROR] location: interface 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.AsyncIface
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[4786,43]
 cannot find symbol
[ERROR] symbol:   class PartitionSpec
[ERROR] location: class 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.AsyncClient
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[4794,20]
 cannot find symbol
[ERROR] symbol:   class PartitionSpec
[ERROR] location: class 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.AsyncClient.add_partitions_pspec_call
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[4795,45]
 cannot find symbol
[ERROR] symbol:   class PartitionSpec
[ERROR] location: class 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.AsyncClient.add_partitions_pspec_call
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[5484,19]
 cannot find symbol
[ERROR] symbol:   class PartitionSpec
[ERROR] location: class 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.AsyncClient.get_partitions_pspec_call
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[5733,19]
 cannot find symbol
[ERROR] symbol:   class PartitionSpec
[ERROR] location: class 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.AsyncClient.get_part_specs_by_filter_call
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[1294,42]
 cannot find symbol
[ERROR] symbol:   class PartitionSpec
[ERROR] location: class 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.Client
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[1300,48]
 cannot find symbol
[ERROR] symbol:   class PartitionSpec
[ERROR] location: class 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.Client
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[1853,17]
 cannot find symbol
[ERROR] symbol:   class PartitionSpec
[ERROR] location: class 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.Client
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[1868,17]
 cannot find symbol
[ERROR] symbol:   class PartitionSpec
[ERROR] location: class 

[jira] [Commented] (HIVE-7649) Support column stats with temporary tables

2014-08-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090657#comment-14090657
 ] 

Hive QA commented on HIVE-7649:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660474/HIVE-7649.1.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/225/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/225/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-225/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-225/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 
'metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java'
Reverted 
'metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java'
Reverted 'metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java'
Reverted 'metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java'
Reverted 
'metastore/src/java/org/apache/hadoop/hive/metastore/events/AddPartitionEvent.java'
Reverted 
'metastore/src/java/org/apache/hadoop/hive/metastore/events/PreAddPartitionEvent.java'
Reverted 
'metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java'
Reverted 'metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java'
Reverted 
'metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java'
Reverted 
'metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java'
Reverted 
'metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java'
Reverted 'metastore/src/gen/thrift/gen-py/hive_metastore/ttypes.py'
Reverted 'metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py'
Reverted 
'metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote'
Reverted 'metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp'
Reverted 'metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp'
Reverted 'metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h'
Reverted 'metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h'
Reverted 
'metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp'
Reverted 'metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb'
Reverted 'metastore/src/gen/thrift/gen-rb/hive_metastore_types.rb'
Reverted 
'metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/AggrStats.java'
Reverted 
'metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ColumnStatistics.java'
Reverted 
'metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/PartitionsStatsRequest.java'
Reverted 
'metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ShowCompactResponse.java'
Reverted 
'metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/EnvironmentContext.java'
Reverted 
'metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/RequestPartsSpec.java'
Reverted 
'metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/AddPartitionsRequest.java'
Reverted 
'metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/HeartbeatTxnRangeResponse.java'
Reverted 
'metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ShowLocksResponse.java'
Reverted 

[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090689#comment-14090689
 ] 

Hive QA commented on HIVE-7624:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660590/HIVE-7624.4-spark.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5828 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/24/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/24/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-24/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660590

 Reduce operator initialization failed when running multiple MR query on spark
 -

 Key: HIVE-7624
 URL: https://issues.apache.org/jira/browse/HIVE-7624
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, 
 HIVE-7624.4-spark.patch, HIVE-7624.patch


 The following error occurs when I try to run a query with multiple reduce 
 works (M-R-R):
 {quote}
 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
 java.lang.RuntimeException: Reduce operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from 
 [0:_col0]
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
 …
 {quote}
 I suspect we're applying the reduce function in the wrong order.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7659) Unnecessary sort in query plan

2014-08-08 Thread Rui Li (JIRA)
Rui Li created HIVE-7659:


 Summary: Unnecessary sort in query plan
 Key: HIVE-7659
 URL: https://issues.apache.org/jira/browse/HIVE-7659
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Rui Li


For Hive on Spark.
Currently we rely on the sort order in RS to decide whether we need a sortByKey 
transformation. However a simple group by query will also have the sort order 
set to '+'.
Consider the query: select key from table group by key. The RS in the map work 
will have sort order set to '+', thus requiring a sortByKey shuffle.

To avoid the unnecessary sort, we should either use another way to decide 
whether a sort shuffle is required, or set the sort order only when a sort is 
really needed.
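
A minimal sketch of the decision being described (illustrative Java only: 
SparkShuffler, SortShuffler and GroupByShuffler are assumed names, not 
confirmed Hive classes; ReduceSinkDesc.getOrder() is the actual planner field):
{code}
SparkShuffler chooseShuffler(ReduceSinkDesc rsDesc) {
  // getOrder() holds one '+'/'-' per key column, and is set to "+" even for a
  // plain GROUP BY, which is exactly the problem described above.
  String order = rsDesc.getOrder();
  if (order != null && !order.isEmpty()) {
    return new SortShuffler();    // forces a sortByKey-style shuffle
  }
  return new GroupByShuffler();   // partition-only shuffle, no total order
}
{code}
Under the proposal above, the planner would instead check whether any 
downstream operator actually requires sorted input before choosing the sort 
shuffle.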



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7620) Hive metastore fails to start in secure mode due to java.lang.NoSuchFieldError: SASL_PROPS error

2014-08-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090713#comment-14090713
 ] 

Hive QA commented on HIVE-7620:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660498/HIVE-7620.2.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5886 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/226/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/226/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-226/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660498

 Hive metastore fails to start in secure mode due to 
 java.lang.NoSuchFieldError: SASL_PROPS error
 --

 Key: HIVE-7620
 URL: https://issues.apache.org/jira/browse/HIVE-7620
 Project: Hive
  Issue Type: Bug
  Components: Metastore
 Environment: Hadoop 2.5-snapshot with kerberos authentication on
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-7620.1.patch, HIVE-7620.2.patch


 When the Hive metastore is started in a Hadoop 2.5 cluster, it fails to start 
 with the following error
 {code}
 14/07/31 17:45:58 [main]: ERROR metastore.HiveMetaStore: Metastore Thrift 
 Server threw an exception...
 java.lang.NoSuchFieldError: SASL_PROPS
   at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S.getHadoopSaslProperties(HadoopThriftAuthBridge20S.java:126)
   at 
 org.apache.hadoop.hive.metastore.MetaStoreUtils.getMetaStoreSaslProperties(MetaStoreUtils.java:1483)
   at 
 org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5225)
   at 
 org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5152)
 {code}
 Changes in HADOOP-10451 to remove SaslRpcServer.SASL_PROPS are causing this 
 error.
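 One possible compatibility shim, sketched here as an assumption rather than 
 the committed fix, is to resolve the SASL properties reflectively so the same 
 jar runs against both old and new Hadoop:
 {code}
 import java.lang.reflect.Field;
 import java.lang.reflect.Method;
 import java.util.Map;

 import org.apache.hadoop.conf.Configuration;

 public class SaslPropsShim {
   @SuppressWarnings("unchecked")
   public static Map<String, String> getSaslProperties(Configuration conf)
       throws Exception {
     try {
       // Hadoop 2.4+ replaces the static map with a pluggable resolver:
       // SaslPropertiesResolver.getInstance(conf).getDefaultProperties()
       Class<?> resolverClass =
           Class.forName("org.apache.hadoop.security.SaslPropertiesResolver");
       Object resolver = resolverClass
           .getMethod("getInstance", Configuration.class)
           .invoke(null, conf);
       Method getDefaults = resolverClass.getMethod("getDefaultProperties");
       return (Map<String, String>) getDefaults.invoke(resolver);
     } catch (ClassNotFoundException e) {
       // Older Hadoop: fall back to the static field removed by HADOOP-10451.
       Class<?> saslRpcServer =
           Class.forName("org.apache.hadoop.security.SaslRpcServer");
       Field f = saslRpcServer.getField("SASL_PROPS");
       return (Map<String, String>) f.get(null);
     }
   }
 }
 {code}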



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7645) Hive CompactorMR job set NUM_BUCKETS mistake

2014-08-08 Thread Xiaoyu Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090728#comment-14090728
 ] 

Xiaoyu Wang commented on HIVE-7645:
---

This error should not be caused by this patch!

 Hive CompactorMR job set NUM_BUCKETS mistake
 

 Key: HIVE-7645
 URL: https://issues.apache.org/jira/browse/HIVE-7645
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.13.1
Reporter: Xiaoyu Wang
 Attachments: HIVE-7645.patch


 code:
 job.setInt(NUM_BUCKETS, sd.getBucketColsSize());
 should change to:
 job.setInt(NUM_BUCKETS, sd.getNumBuckets());



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6959) Enable Constant propagation optimizer for Hive Vectorization

2014-08-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090761#comment-14090761
 ] 

Hive QA commented on HIVE-6959:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660482/HIVE-6959.5.patch

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 5885 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_14
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_15
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_9
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/227/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/227/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-227/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660482

 Enable Constant propagation optimizer for Hive Vectorization
 

 Key: HIVE-6959
 URL: https://issues.apache.org/jira/browse/HIVE-6959
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-6959.1.patch, HIVE-6959.2.patch, HIVE-6959.4.patch, 
 HIVE-6959.5.patch


 HIVE-5771 covers Constant propagation optimizer for Hive. Now that HIVE-5771 
 is committed, we should remove any vectorization related code which 
 duplicates this feature. For example, a fn to be cleaned is 
 VectorizationContext::foldConstantsForUnaryExprs(). In addition to this 
 change, constant propagation should kick in when vectorization is enabled. 
 i.e. we need to lift the HIVE_VECTORIZATION_ENABLED restriction inside 
 ConstantPropagate::transform().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7653) Hive AvroSerDe does not support circular references in Schema

2014-08-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090766#comment-14090766
 ] 

Hive QA commented on HIVE-7653:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660500/HIVE-7653.1.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/228/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/228/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-228/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-228/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'ql/src/test/results/clientpositive/vectorization_9.q.out'
Reverted 'ql/src/test/results/clientpositive/vectorization_14.q.out'
Reverted 'ql/src/test/results/clientpositive/vector_decimal_math_funcs.q.out'
Reverted 'ql/src/test/results/clientpositive/vectorization_short_regress.q.out'
Reverted 'ql/src/test/results/clientpositive/vectorization_16.q.out'
Reverted 'ql/src/test/results/clientpositive/vectorized_parquet.q.out'
Reverted 'ql/src/test/results/clientpositive/vector_cast_constant.q.out'
Reverted 'ql/src/test/results/clientpositive/vector_elt.q.out'
Reverted 'ql/src/test/results/clientpositive/vectorization_div0.q.out'
Reverted 'ql/src/test/results/clientpositive/vector_coalesce.q.out'
Reverted 'ql/src/test/results/clientpositive/vectorization_15.q.out'
Reverted 'ql/src/test/results/clientpositive/vector_decimal_mapjoin.q.out'
Reverted 'ql/src/test/results/clientpositive/vector_between_in.q.out'
Reverted 'ql/src/test/results/clientpositive/vectorized_math_funcs.q.out'
Reverted 
'ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java'
Reverted 'ql/src/test/queries/clientpositive/vector_coalesce.q'
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java'
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagate.java'
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/ConstantVectorExpression.java'
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java'
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java'
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target 
shims/0.20S/target shims/0.23/target shims/aggregator/target 
shims/common/target shims/common-secure/target packaging/target 
hbase-handler/target testutils/target jdbc/target metastore/target 
itests/target itests/hcatalog-unit/target itests/test-serde/target 
itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target 
itests/hive-unit/target itests/custom-serde/target itests/util/target 
hcatalog/target hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/webhcat/svr/target 
hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target 
hwi/target common/target common/src/gen service/target contrib/target 
serde/target beeline/target odbc/target cli/target 
ql/dependency-reduced-pom.xml ql/target 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java.orig
+ svn update

Fetching external item into 

[jira] [Updated] (HIVE-7373) Hive should not remove trailing zeros for decimal numbers

2014-08-08 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-7373:
--

Attachment: HIVE-7373.1.patch

 Hive should not remove trailing zeros for decimal numbers
 -

 Key: HIVE-7373
 URL: https://issues.apache.org/jira/browse/HIVE-7373
 Project: Hive
  Issue Type: Bug
  Components: Types
Affects Versions: 0.13.0, 0.13.1
Reporter: Xuefu Zhang
Assignee: Sergio Peña
 Attachments: HIVE-7373.1.patch


 Currently Hive blindly removes trailing zeros of a decimal input number as a 
 sort of standardization. This is questionable in theory and problematic in 
 practice.
 1. In decimal context, the number 3.140 has a different semantic meaning from 
 the number 3.14. Removing the trailing zero loses that meaning.
 2. In an extreme case, 0.0 has (p, s) of (1, 1). Hive removes the trailing 
 zero, and the number then becomes 0, which has (p, s) of (1, 0). Thus, for a 
 decimal column of (1, 1), input such as 0.0, 0.00, and so on becomes NULL 
 because the column doesn't allow a decimal number with an integer part.
 Therefore, I propose Hive preserve the trailing zeros (up to what the scale 
 allows). With this, in the above example, 0.0, 0.00, and 0.0000 will all be 
 represented as 0.0 (precision=1, scale=1) internally.
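 As a standalone illustration (plain Java with java.math.BigDecimal, not 
 Hive's decimal implementation), stripping trailing zeros visibly changes the 
 (precision, scale) of a value:
 {code}
 import java.math.BigDecimal;

 public class TrailingZeroDemo {
   public static void main(String[] args) {
     BigDecimal d = new BigDecimal("3.140");
     // prints: 3.140 precision=4 scale=3
     System.out.println(d + " precision=" + d.precision()
         + " scale=" + d.scale());

     // Normalization discards the scale information, as Hive does today.
     BigDecimal stripped = d.stripTrailingZeros();
     // prints: 3.14 precision=3 scale=2
     System.out.println(stripped + " precision=" + stripped.precision()
         + " scale=" + stripped.scale());
   }
 }
 {code}
 The same normalization is what turns 0.0 (p=1, s=1) into 0 (p=1, s=0) in the 
 Hive behavior described above.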



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7373) Hive should not remove trailing zeros for decimal numbers

2014-08-08 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-7373:
--

Status: Patch Available  (was: In Progress)

 Hive should not remove trailing zeros for decimal numbers
 -

 Key: HIVE-7373
 URL: https://issues.apache.org/jira/browse/HIVE-7373
 Project: Hive
  Issue Type: Bug
  Components: Types
Affects Versions: 0.13.1, 0.13.0
Reporter: Xuefu Zhang
Assignee: Sergio Peña
 Attachments: HIVE-7373.1.patch


 Currently Hive blindly removes trailing zeros of a decimal input number as a 
 sort of standardization. This is questionable in theory and problematic in 
 practice.
 1. In decimal context, the number 3.140 has a different semantic meaning from 
 the number 3.14. Removing the trailing zero loses that meaning.
 2. In an extreme case, 0.0 has (p, s) of (1, 1). Hive removes the trailing 
 zero, and the number then becomes 0, which has (p, s) of (1, 0). Thus, for a 
 decimal column of (1, 1), input such as 0.0, 0.00, and so on becomes NULL 
 because the column doesn't allow a decimal number with an integer part.
 Therefore, I propose Hive preserve the trailing zeros (up to what the scale 
 allows). With this, in the above example, 0.0, 0.00, and 0.0000 will all be 
 represented as 0.0 (precision=1, scale=1) internally.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7373) Hive should not remove trailing zeros for decimal numbers

2014-08-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090818#comment-14090818
 ] 

Sergio Peña commented on HIVE-7373:
---

RB: https://reviews.apache.org/r/24467/

 Hive should not remove trailing zeros for decimal numbers
 -

 Key: HIVE-7373
 URL: https://issues.apache.org/jira/browse/HIVE-7373
 Project: Hive
  Issue Type: Bug
  Components: Types
Affects Versions: 0.13.0, 0.13.1
Reporter: Xuefu Zhang
Assignee: Sergio Peña
 Attachments: HIVE-7373.1.patch


 Currently Hive blindly removes trailing zeros of a decimal input number as a 
 sort of standardization. This is questionable in theory and problematic in 
 practice.
 1. In decimal context, the number 3.140 has a different semantic meaning from 
 the number 3.14. Removing the trailing zero loses that meaning.
 2. In an extreme case, 0.0 has (p, s) of (1, 1). Hive removes the trailing 
 zero, and the number then becomes 0, which has (p, s) of (1, 0). Thus, for a 
 decimal column of (1, 1), input such as 0.0, 0.00, and so on becomes NULL 
 because the column doesn't allow a decimal number with an integer part.
 Therefore, I propose Hive preserve the trailing zeros (up to what the scale 
 allows). With this, in the above example, 0.0, 0.00, and 0.0000 will all be 
 represented as 0.0 (precision=1, scale=1) internally.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Work started] (HIVE-7373) Hive should not remove trailing zeros for decimal numbers

2014-08-08 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-7373 started by Sergio Peña.

 Hive should not remove trailing zeros for decimal numbers
 -

 Key: HIVE-7373
 URL: https://issues.apache.org/jira/browse/HIVE-7373
 Project: Hive
  Issue Type: Bug
  Components: Types
Affects Versions: 0.13.0, 0.13.1
Reporter: Xuefu Zhang
Assignee: Sergio Peña

 Currently Hive blindly removes trailing zeros of a decimal input number as a 
 sort of standardization. This is questionable in theory and problematic in 
 practice.
 1. In decimal context, the number 3.140 has a different semantic meaning from 
 the number 3.14. Removing the trailing zero loses that meaning.
 2. In an extreme case, 0.0 has (p, s) of (1, 1). Hive removes the trailing 
 zero, and the number then becomes 0, which has (p, s) of (1, 0). Thus, for a 
 decimal column of (1, 1), input such as 0.0, 0.00, and so on becomes NULL 
 because the column doesn't allow a decimal number with an integer part.
 Therefore, I propose Hive preserve the trailing zeros (up to what the scale 
 allows). With this, in the above example, 0.0, 0.00, and 0.0000 will all be 
 represented as 0.0 (precision=1, scale=1) internally.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7446) Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables

2014-08-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090834#comment-14090834
 ] 

Hive QA commented on HIVE-7446:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660511/HIVE-7446.1.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5889 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/229/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/229/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-229/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660511

 Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables
 --

 Key: HIVE-7446
 URL: https://issues.apache.org/jira/browse/HIVE-7446
 Project: Hive
  Issue Type: New Feature
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
 Attachments: HIVE-7446.1.patch, HIVE-7446.patch


 HIVE-6806 adds native support for creating Hive tables stored as Avro. It 
 would be good to add support for ALTER TABLE .. ADD COLUMN to Avro-backed 
 tables.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7229) String is compared using equal in HiveMetaStore#HMSHandler#init()

2014-08-08 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7229:
---

Assignee: KangHS

 String is compared using equal in HiveMetaStore#HMSHandler#init()
 -

 Key: HIVE-7229
 URL: https://issues.apache.org/jira/browse/HIVE-7229
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: KangHS
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-7229.1.patch, HIVE-7229.patch, HIVE-7229.patch


 Around line 423:
 {code}
   if (partitionValidationRegex != null && partitionValidationRegex != "") {
     partitionValidationPattern = Pattern.compile(partitionValidationRegex);
 {code}
 partitionValidationRegex.isEmpty() can be used instead.
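 As a standalone illustration of the pitfall (plain Java, not the Hive code 
 itself), != compares object identity rather than contents:
 {code}
 public class EmptyStringDemo {
   public static void main(String[] args) {
     String regex = new String("");        // empty contents, distinct object
     System.out.println(regex != "");      // true  -- identity check passes
     System.out.println(regex.isEmpty());  // true  -- the intended check
   }
 }
 {code}
 A value read from configuration is generally not the interned "" literal, so 
 the identity check silently lets empty regexes through to Pattern.compile().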



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7229) String is compared using equal in HiveMetaStore#HMSHandler#init()

2014-08-08 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7229:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

 String is compared using equal in HiveMetaStore#HMSHandler#init()
 -

 Key: HIVE-7229
 URL: https://issues.apache.org/jira/browse/HIVE-7229
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: KangHS
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-7229.1.patch, HIVE-7229.patch, HIVE-7229.patch


 Around line 423:
 {code}
   if (partitionValidationRegex != null && partitionValidationRegex != "") {
     partitionValidationPattern = Pattern.compile(partitionValidationRegex);
 {code}
 partitionValidationRegex.isEmpty() can be used instead.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7600) ConstantPropagateProcFactory uses reference equality on Boolean

2014-08-08 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7600:
---

Assignee: KangHS

 ConstantPropagateProcFactory uses reference equality on Boolean
 ---

 Key: HIVE-7600
 URL: https://issues.apache.org/jira/browse/HIVE-7600
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: KangHS
 Attachments: HIVE-7600.patch


 shortcutFunction() has the following code:
 {code}
   if (c.getValue() == Boolean.FALSE) {
 {code}
 Boolean.FALSE.equals(c.getValue()) should be used instead.
 There're a few other occurrences of using reference equality on Boolean in 
 this class.
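 A standalone illustration (plain Java) of why the reference comparison is 
 fragile:
 {code}
 public class BooleanEqualityDemo {
   public static void main(String[] args) {
     Boolean cached = Boolean.valueOf(false); // the Boolean.FALSE singleton
     Boolean distinct = new Boolean(false);   // a separate instance

     System.out.println(cached == Boolean.FALSE);        // true, only by caching
     System.out.println(distinct == Boolean.FALSE);      // false -- identity differs
     System.out.println(Boolean.FALSE.equals(distinct)); // true  -- value comparison
   }
 }
 {code}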



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7600) ConstantPropagateProcFactory uses reference equality on Boolean

2014-08-08 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7600:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, KangHS!

 ConstantPropagateProcFactory uses reference equality on Boolean
 ---

 Key: HIVE-7600
 URL: https://issues.apache.org/jira/browse/HIVE-7600
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: KangHS
 Fix For: 0.14.0

 Attachments: HIVE-7600.patch


 shortcutFunction() has the following code:
 {code}
   if (c.getValue() == Boolean.FALSE) {
 {code}
 Boolean.FALSE.equals(c.getValue()) should be used instead.
 There're a few other occurrences of using reference equality on Boolean in 
 this class.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7636) cbo fails when no projection is required from aggregate function

2014-08-08 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7636:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to cbo branch.

 cbo fails when no projection is required from aggregate function
 

 Key: HIVE-7636
 URL: https://issues.apache.org/jira/browse/HIVE-7636
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: h-7636.patch


 select count (*) from t1 join t2 on t1.c1=t2.c2 fails



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7660) Hive to support qualify analytic filtering

2014-08-08 Thread Viji (JIRA)
Viji created HIVE-7660:
--

 Summary: Hive to support qualify analytic filtering
 Key: HIVE-7660
 URL: https://issues.apache.org/jira/browse/HIVE-7660
 Project: Hive
  Issue Type: New Feature
Reporter: Viji
Priority: Trivial


Currently, Hive does not support qualify analytic filtering. It would be useful 
if this feature were added in the future.

As a workaround, since it is just a filter, we can replace it with a subquery 
and filter.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7341) Support for Table replication across HCatalog instances

2014-08-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-7341:
---

Attachment: HIVE-7341.3.patch

Added documentation for MetadataSerializer and its subclass.

 Support for Table replication across HCatalog instances
 ---

 Key: HIVE-7341
 URL: https://issues.apache.org/jira/browse/HIVE-7341
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Fix For: 0.14.0

 Attachments: HIVE-7341.1.patch, HIVE-7341.2.patch, HIVE-7341.3.patch


 The HCatClient currently doesn't provide very much support for replicating 
 HCatTable definitions between 2 HCatalog Server (i.e. Hive metastore) 
 instances. 
 Systems similar to Apache Falcon might find the need to replicate partition 
 data between 2 clusters, and keep the HCatalog metadata in sync between the 
 two. This poses a couple of problems:
 # The definition of the source table might change (in column schema, I/O 
 formats, record-formats, serde-parameters, etc.) The system will need a way 
 to diff 2 tables and update the target-metastore with the changes. E.g. 
 {code}
 targetTable.resolve( sourceTable, targetTable.diff(sourceTable) );
 hcatClient.updateTableSchema(dbName, tableName, targetTable);
 {code}
 # The current {{HCatClient.addPartitions()}} API requires that the 
 partition's schema be derived from the table's schema, thereby requiring that 
 the table-schema be resolved *before* partitions with the new schema are 
 added to the table. This is problematic, because it introduces race 
 conditions when 2 partitions with differing column-schemas (e.g. right after 
 a schema change) are copied in parallel. This can be avoided if each 
 HCatAddPartitionDesc kept track of the partition's schema, in flight.
 # The source and target metastores might be running different/incompatible 
 versions of Hive. 
 The impending patch attempts to address these concerns (with some caveats).
 # {{HCatTable}} now has 
 ## a {{diff()}} method, to compare against another HCatTable instance
 ## a {{resolve(diff)}} method to copy over specified table-attributes from 
 another HCatTable
 ## a serialize/deserialize mechanism (via {{HCatClient.serializeTable()}} and 
 {{HCatClient.deserializeTable()}}), so that HCatTable instances constructed 
 in other class-loaders may be used for comparison
 # {{HCatPartition}} now provides finer-grained control over a Partition's 
 column-schema, StorageDescriptor settings, etc. This allows partitions to be 
 copied completely from source, with the ability to override specific 
 properties if required (e.g. location).
 # {{HCatClient.updateTableSchema()}} can now update the entire 
 table-definition, not just the column schema.
 # I've cleaned up and removed most of the redundancy between the HCatTable, 
 HCatCreateTableDesc and HCatCreateTableDesc.Builder. The prior API failed to 
 separate the table-attributes from the add-table-operation's attributes. By 
 providing fluent-interfaces in HCatTable, and composing an HCatTable instance 
 in HCatCreateTableDesc, the interfaces are cleaner(ish). The old setters are 
 deprecated, in favour of those in HCatTable. Likewise, HCatPartition and 
 HCatAddPartitionDesc.
 I'll post a patch for trunk shortly.
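 As a hypothetical end-to-end sketch of the flow described above (the 
 configuration objects and table names are placeholders; the client calls are 
 the ones this patch introduces or extends):
 {code}
 // Placeholder configs; in practice these point at the source and target
 // metastores respectively.
 HiveConf sourceConf = new HiveConf();
 HiveConf targetConf = new HiveConf();
 HCatClient sourceClient = HCatClient.create(sourceConf);
 HCatClient targetClient = HCatClient.create(targetConf);

 // Ship the source table definition across clusters/class-loaders as a string.
 String serialized = sourceClient.serializeTable(
     sourceClient.getTable("mydb", "mytable"));
 HCatTable sourceTable = targetClient.deserializeTable(serialized);

 // Compare, copy over the attributes that differ, and update the target.
 HCatTable targetTable = targetClient.getTable("mydb", "mytable");
 targetTable.resolve(sourceTable, targetTable.diff(sourceTable));
 targetClient.updateTableSchema("mydb", "mytable", targetTable);
 {code}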



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7341) Support for Table replication across HCatalog instances

2014-08-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-7341:
---

Status: Open  (was: Patch Available)

 Support for Table replication across HCatalog instances
 ---

 Key: HIVE-7341
 URL: https://issues.apache.org/jira/browse/HIVE-7341
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Fix For: 0.14.0

 Attachments: HIVE-7341.1.patch, HIVE-7341.2.patch, HIVE-7341.3.patch


 The HCatClient currently doesn't provide very much support for replicating 
 HCatTable definitions between 2 HCatalog Server (i.e. Hive metastore) 
 instances. 
 Systems similar to Apache Falcon might find the need to replicate partition 
 data between 2 clusters, and keep the HCatalog metadata in sync between the 
 two. This poses a couple of problems:
 # The definition of the source table might change (in column schema, I/O 
 formats, record-formats, serde-parameters, etc.) The system will need a way 
 to diff 2 tables and update the target-metastore with the changes. E.g. 
 {code}
 targetTable.resolve( sourceTable, targetTable.diff(sourceTable) );
 hcatClient.updateTableSchema(dbName, tableName, targetTable);
 {code}
 # The current {{HCatClient.addPartitions()}} API requires that the 
 partition's schema be derived from the table's schema, thereby requiring that 
 the table-schema be resolved *before* partitions with the new schema are 
 added to the table. This is problematic, because it introduces race 
 conditions when 2 partitions with differing column-schemas (e.g. right after 
 a schema change) are copied in parallel. This can be avoided if each 
 HCatAddPartitionDesc kept track of the partition's schema, in flight.
 # The source and target metastores might be running different/incompatible 
 versions of Hive. 
 The impending patch attempts to address these concerns (with some caveats).
 # {{HCatTable}} now has 
 ## a {{diff()}} method, to compare against another HCatTable instance
 ## a {{resolve(diff)}} method to copy over specified table-attributes from 
 another HCatTable
 ## a serialize/deserialize mechanism (via {{HCatClient.serializeTable()}} and 
 {{HCatClient.deserializeTable()}}), so that HCatTable instances constructed 
 in other class-loaders may be used for comparison
 # {{HCatPartition}} now provides finer-grained control over a Partition's 
 column-schema, StorageDescriptor settings, etc. This allows partitions to be 
 copied completely from source, with the ability to override specific 
 properties if required (e.g. location).
 # {{HCatClient.updateTableSchema()}} can now update the entire 
 table-definition, not just the column schema.
 # I've cleaned up and removed most of the redundancy between the HCatTable, 
 HCatCreateTableDesc and HCatCreateTableDesc.Builder. The prior API failed to 
 separate the table-attributes from the add-table-operation's attributes. By 
 providing fluent-interfaces in HCatTable, and composing an HCatTable instance 
 in HCatCreateTableDesc, the interfaces are cleaner(ish). The old setters are 
 deprecated, in favour of those in HCatTable. Likewise, HCatPartition and 
 HCatAddPartitionDesc.
 I'll post a patch for trunk shortly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7341) Support for Table replication across HCatalog instances

2014-08-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-7341:
---

Status: Patch Available  (was: Open)

 Support for Table replication across HCatalog instances
 ---

 Key: HIVE-7341
 URL: https://issues.apache.org/jira/browse/HIVE-7341
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Fix For: 0.14.0

 Attachments: HIVE-7341.1.patch, HIVE-7341.2.patch, HIVE-7341.3.patch


 The HCatClient currently doesn't provide very much support for replicating 
 HCatTable definitions between 2 HCatalog Server (i.e. Hive metastore) 
 instances. 
 Systems similar to Apache Falcon might find the need to replicate partition 
 data between 2 clusters, and keep the HCatalog metadata in sync between the 
 two. This poses a couple of problems:
 # The definition of the source table might change (in column schema, I/O 
 formats, record-formats, serde-parameters, etc.) The system will need a way 
 to diff 2 tables and update the target-metastore with the changes. E.g. 
 {code}
 targetTable.resolve( sourceTable, targetTable.diff(sourceTable) );
 hcatClient.updateTableSchema(dbName, tableName, targetTable);
 {code}
 # The current {{HCatClient.addPartitions()}} API requires that the 
 partition's schema be derived from the table's schema, thereby requiring that 
 the table-schema be resolved *before* partitions with the new schema are 
 added to the table. This is problematic, because it introduces race 
 conditions when 2 partitions with differing column-schemas (e.g. right after 
 a schema change) are copied in parallel. This can be avoided if each 
 HCatAddPartitionDesc kept track of the partition's schema, in flight.
 # The source and target metastores might be running different/incompatible 
 versions of Hive. 
 The impending patch attempts to address these concerns (with some caveats).
 # {{HCatTable}} now has 
 ## a {{diff()}} method, to compare against another HCatTable instance
 ## a {{resolve(diff)}} method to copy over specified table-attributes from 
 another HCatTable
 ## a serialize/deserialize mechanism (via {{HCatClient.serializeTable()}} and 
 {{HCatClient.deserializeTable()}}), so that HCatTable instances constructed 
 in other class-loaders may be used for comparison
 # {{HCatPartition}} now provides finer-grained control over a Partition's 
 column-schema, StorageDescriptor settings, etc. This allows partitions to be 
 copied completely from source, with the ability to override specific 
 properties if required (e.g. location).
 # {{HCatClient.updateTableSchema()}} can now update the entire 
 table-definition, not just the column schema.
 # I've cleaned up and removed most of the redundancy between the HCatTable, 
 HCatCreateTableDesc and HCatCreateTableDesc.Builder. The prior API failed to 
 separate the table-attributes from the add-table-operation's attributes. By 
 providing fluent-interfaces in HCatTable, and composing an HCatTable instance 
 in HCatCreateTableDesc, the interfaces are cleaner(ish). The old setters are 
 deprecated, in favour of those in HCatTable. Likewise, HCatPartition and 
 HCatAddPartitionDesc.
 I'll post a patch for trunk shortly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7661) Observed performance issues while sorting using Hive's Parallel Order by clause while retaining pre-existing sort order.

2014-08-08 Thread Vishal Kamath (JIRA)
Vishal Kamath created HIVE-7661:
---

 Summary: Observed performance issues while sorting using Hive's 
Parallel Order by clause while retaining pre-existing sort order.
 Key: HIVE-7661
 URL: https://issues.apache.org/jira/browse/HIVE-7661
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 0.12.0
 Environment: Cloudera 5.0
hive-0.12.0-cdh5.0.0
Red Hat Linux
Reporter: Vishal Kamath
 Fix For: 0.12.1


Improve Hive's sampling logic to accommodate use cases that require retaining 
the pre-existing sort order of the underlying source table. 

In order to support the parallel ORDER BY clause, Hive samples the source 
table based on the values provided for hive.optimize.sampling.orderby.number 
and hive.optimize.sampling.orderby.percent. 

This does work with reasonable performance when sorting is performed on 
columns having a random distribution of data, but it has severe performance 
issues when the pre-existing sort order must be retained. 

Let us try to understand this with an example. 

insert overwrite table lineitem_temp_report 
select 
  l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, 
l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, 
l_receiptdate, l_shipinstruct, l_shipmode, l_comment
from 
  lineitem
order by l_orderkey, l_partkey, l_suppkey;

Sample data set for the lineitem table. The first column represents 
l_orderkey and is sorted.
 
l_orderkey|l_partkey|l_suppkey|l_linenumber|l_quantity|l_extendedprice|l_discount|l_tax|l_returnflag|l_linestatus|l_shipdate|l_commitdate|l_receiptdate|l_shipinstruct|l_shipmode|l_comment
197|1771022|96040|2|8|8743.52|0.09|0.02|A|F|1995-04-17|1995-07-01|1995-04-27|DELIVER IN PERSON|SHIP|y blithely even deposits. blithely fina|
197|1558290|83306|3|17|22919.74|0.06|0.02|N|O|1995-08-02|1995-06-23|1995-08-03|COLLECT COD|REG AIR|ts. careful|
197|179355|29358|4|25|35858.75|0.04|0.01|N|F|1995-06-13|1995-05-23|1995-06-24|TAKE BACK RETURN|FOB|s-- quickly final accounts|
197|414653|39658|5|14|21946.82|0.09|0.01|R|F|1995-05-08|1995-05-24|1995-05-12|TAKE BACK RETURN|RAIL|use slyly slyly silent depo|
197|1058800|8821|6|1|1758.75|0.07|0.05|N|O|1995-07-15|1995-06-21|1995-08-11|COLLECT COD|RAIL| even, thin dependencies sno|
198|560609|60610|1|33|55096.14|0.07|0.02|N|O|1998-01-05|1998-03-20|1998-01-10|TAKE BACK RETURN|TRUCK|carefully caref|
198|152287|77289|2|20|26785.60|0.03|0.00|N|O|1998-01-15|1998-03-31|1998-01-25|DELIVER IN PERSON|FOB|carefully final escapades a|
224|1899665|74720|3|41|68247.37|0.07|0.04|A|F|1994-09-01|1994-09-15|1994-09-02|TAKE BACK RETURN|SHIP|after the furiou|


When we either sort on a presorted column or do a multi-column sort while 
trying to retain the sort order of the source table, we don't see an equal 
distribution of data across the reducers.

The source table lineitem has 600 million rows. Out of 100 reducers, 99 
complete in less than 40 seconds; the last reducer does the bulk of the work, 
processing nearly 570 million rows. 

So, let us understand what is going wrong here.

On a table having 600 million records with the orderkey column sorted, I 
created a temp table with 10% sampling:

insert overwrite table sampTempTbl (select * from lineitem tablesample (10 
percent) t);

select min(l_orderkey), max(l_orderkey) from sampTempTbl ;
12306309,142321700

whereas on the source table, the orderkey range (select min(l_orderkey), 
max(l_orderkey) from lineitem) is 1 and 600000000. 

So naturally the bulk of the records will be directed towards a single reducer. 

One way to work around this problem is to increase 
hive.optimize.sampling.orderby.number to a larger value (as close as possible 
to the number of rows in the input source table). But then we will have to 
provide a larger heap for Hive (in hive-env.sh), otherwise it will fail while 
creating the sampling data. With larger data volumes, it is not practical to 
sample the entire data set. 
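
To make the arithmetic behind the skew concrete, here is a toy, plain-Java 
illustration (no Hive APIs; the key range is taken from the example above, and 
the assumption that cutpoints fall at or below the sampled maximum is mine):
{code}
public class OrderBySkewDemo {
  public static void main(String[] args) {
    long minKey = 1L;               // smallest l_orderkey in the source table
    long maxKey = 600_000_000L;     // largest l_orderkey in the source table
    long sampledMax = 142_321_700L; // largest key seen in the 10% sample
    // Range-partitioner cutpoints are derived from the sample, so they all
    // fall at or below sampledMax; every key above it hits the last reducer.
    double share = 100.0 * (maxKey - sampledMax) / (maxKey - minKey + 1);
    System.out.printf("last reducer covers at least %.0f%% of the key range%n",
        share);
  }
}
{code}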



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions

2014-08-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-7223:
---

Status: Open  (was: Patch Available)

 Support generic PartitionSpecs in Metastore partition-functions
 ---

 Key: HIVE-7223
 URL: https://issues.apache.org/jira/browse/HIVE-7223
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog, Metastore
Affects Versions: 0.13.0, 0.12.0
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-7223.1.patch, HIVE-7223.2.patch


 Currently, the functions in the HiveMetaStore API that handle multiple 
 partitions do so using List<Partition>. E.g. 
 {code}
 public List<Partition> listPartitions(String db_name, String tbl_name, short 
 max_parts);
 public List<Partition> listPartitionsByFilter(String db_name, String 
 tbl_name, String filter, short max_parts);
 public int add_partitions(List<Partition> new_parts);
 {code}
 Partition objects are fairly heavyweight, since each Partition carries its 
 own copy of a StorageDescriptor, partition-values, etc. Tables with tens of 
 thousands of partitions take so long to have their partitions listed that the 
 client times out with default hive.metastore.client.socket.timeout. There is 
 the additional expense of serializing and deserializing metadata for large 
 sets of partitions, w.r.t time and heap-space. Reducing the thrift traffic 
 should help in this regard.
 In a date-partitioned table, all sub-partitions for a particular date are 
 *likely* (but not expected) to have:
 # The same base directory (e.g. {{/feeds/search/20140601/}})
 # Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}})
 # The same SerDe/StorageHandler/IOFormat classes
 # Sorting/Bucketing/SkewInfo settings
 In this “most likely” scenario (henceforth termed “normal”), it’s possible to 
 represent the partition-list (for a date) in a more condensed form: a list of 
 LighterPartition instances, all sharing a common StorageDescriptor whose 
 location points to the root directory. 
 We can go one better for the {{add_partitions()}} case: When adding all 
 partitions for a given date, the “normal” case affords us the ability to 
 specify the top-level date-directory, where sub-partitions can be inferred 
 from the HDFS directory-path.
 These extensions are hard to introduce at the metastore-level, since 
 partition-functions explicitly specify {{List<Partition>}} arguments. I 
 wonder if a {{PartitionSpec}} interface might help:
 {code}
 public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ... 
 ; 
 public int add_partitions( PartitionSpec new_parts ) throws … ;
 {code}
 where the PartitionSpec looks like:
 {code}
 public interface PartitionSpec {
   public List<Partition> getPartitions();
   public List<String> getPartNames();
   public Iterator<Partition> getPartitionIter();
   public Iterator<String> getPartNameIter();
 }
 {code}
 For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement 
 {{PartitionSpec}}, store a top-level directory, and return Partition 
 instances from sub-directory names, while storing a single StorageDescriptor 
 for all of them.
 Similarly, list_partitions() could return a List<PartitionSpec>, where each 
 PartitionSpec corresponds to a set of partitions that can share a 
 StorageDescriptor.
 By exposing iterator semantics, neither the client nor the metastore need 
 instantiate all partitions at once. That should help with memory requirements.
 In case no smart grouping is possible, we could just fall back on a 
 {{DefaultPartitionSpec}} which composes {{ListPartition}}, and is no worse 
 than status quo.
 PartitionSpec abstracts away how a set of partitions may be represented. A 
 tighter representation allows us to communicate metadata for a larger number 
 of Partitions, with less Thrift traffic.
 Given that Thrift doesn’t support polymorphism, we’d have to implement the 
 PartitionSpec as a Thrift Union of supported implementations. (We could 
 convert from the Thrift PartitionSpec to the appropriate Java PartitionSpec 
 sub-class.)
 Thoughts?
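
 For illustration, a minimal sketch of the directory-backed case described 
 above (hypothetical class body; assumes the Thrift-generated metastore 
 Partition/StorageDescriptor types and their deep-copy constructors):
 {code}
 import java.util.Arrays;
 import java.util.Iterator;
 import java.util.List;

 import org.apache.hadoop.hive.metastore.api.Partition;
 import org.apache.hadoop.hive.metastore.api.StorageDescriptor;

 // Illustrative sketch only -- not the attached patch.
 public class HDFSDirBasedPartitionSpec /* implements PartitionSpec */ {
   private final StorageDescriptor sharedSd;  // location = the date-level root dir
   private final List<String> subDirNames;    // e.g. [US, UK, IN]

   public HDFSDirBasedPartitionSpec(StorageDescriptor sharedSd,
                                    List<String> subDirNames) {
     this.sharedSd = sharedSd;
     this.subDirNames = subDirNames;
   }

   // Partitions are materialized lazily from sub-directory names, so neither
   // side ever holds one StorageDescriptor per partition in memory.
   public Iterator<Partition> getPartitionIter() {
     final Iterator<String> names = subDirNames.iterator();
     return new Iterator<Partition>() {
       public boolean hasNext() { return names.hasNext(); }
       public Partition next() {
         String name = names.next();
         StorageDescriptor sd = new StorageDescriptor(sharedSd); // Thrift deep copy
         sd.setLocation(sharedSd.getLocation() + "/" + name);
         Partition p = new Partition();
         p.setSd(sd);
         p.setValues(Arrays.asList(name));
         return p;
       }
       public void remove() { throw new UnsupportedOperationException(); }
     };
   }
   // getPartitions(), getPartNames() and getPartNameIter() would follow the
   // same pattern.
 }
 {code}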



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24085: HIVE-7446: Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables

2014-08-08 Thread Tom White

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24085/#review50049
---

Ship it!


Ship It!

- Tom White


On Aug. 8, 2014, midnight, Ashish Singh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24085/
 ---
 
 (Updated Aug. 8, 2014, midnight)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-7446
 https://issues.apache.org/jira/browse/HIVE-7446
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-7446: Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables
 
 
 Diffs
 -
 
   ql/src/test/queries/clientpositive/avro_add_column.q PRE-CREATION 
   ql/src/test/queries/clientpositive/avro_add_column2.q PRE-CREATION 
   ql/src/test/queries/clientpositive/avro_add_column3.q PRE-CREATION 
   ql/src/test/results/clientpositive/avro_add_column.q.out PRE-CREATION 
   ql/src/test/results/clientpositive/avro_add_column2.q.out PRE-CREATION 
   ql/src/test/results/clientpositive/avro_add_column3.q.out PRE-CREATION 
   serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java 
 915f01679183904d0d93b9b8a88dc1a64ac2af78 
   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java 
 722bdf9f8452fe8632db7d9167182310e467281d 
   serde/src/test/resources/avro-nested-struct.avsc 
 785af83cd01fe91626741b3d7659d8f515854774 
   serde/src/test/resources/avro-struct.avsc 
 313c74f6140615d2737ef1a49a2777656f35f4e3 
 
 Diff: https://reviews.apache.org/r/24085/diff/
 
 
 Testing
 ---
 
 qTests
 
 
 Thanks,
 
 Ashish Singh
 




[jira] [Updated] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions

2014-08-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-7223:
---

Attachment: (was: HIVE-7223.2.patch)

 Support generic PartitionSpecs in Metastore partition-functions
 ---

 Key: HIVE-7223
 URL: https://issues.apache.org/jira/browse/HIVE-7223
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog, Metastore
Affects Versions: 0.12.0, 0.13.0
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-7223.1.patch


 Currently, the functions in the HiveMetaStore API that handle multiple 
 partitions do so using List<Partition>. E.g. 
 {code}
 public List<Partition> listPartitions(String db_name, String tbl_name, short max_parts);
 public List<Partition> listPartitionsByFilter(String db_name, String tbl_name, String filter, short max_parts);
 public int add_partitions(List<Partition> new_parts);
 {code}
 Partition objects are fairly heavyweight, since each Partition carries its 
 own copy of a StorageDescriptor, partition-values, etc. Tables with tens of 
 thousands of partitions take so long to have their partitions listed that the 
 client times out with default hive.metastore.client.socket.timeout. There is 
 the additional expense of serializing and deserializing metadata for large 
 sets of partitions, w.r.t. time and heap-space. Reducing the Thrift traffic 
 should help in this regard.
 In a date-partitioned table, all sub-partitions for a particular date are 
 *likely* (though not guaranteed) to have:
 # The same base directory (e.g. {{/feeds/search/20140601/}})
 # Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}})
 # The same SerDe/StorageHandler/IOFormat classes
 # Sorting/Bucketing/SkewInfo settings
 In this “most likely” scenario (henceforth termed “normal”), it’s possible to 
 represent the partition-list (for a date) in a more condensed form: a list of 
 LighterPartition instances, all sharing a common StorageDescriptor whose 
 location points to the root directory. 
 We can go one better for the {{add_partitions()}} case: When adding all 
 partitions for a given date, the “normal” case affords us the ability to 
 specify the top-level date-directory, where sub-partitions can be inferred 
 from the HDFS directory-path.
 These extensions are hard to introduce at the metastore-level, since 
 partition-functions explicitly specify {{List<Partition>}} arguments. I 
 wonder if a {{PartitionSpec}} interface might help:
 {code}
 public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ...;
 public int add_partitions(PartitionSpec new_parts) throws ...;
 {code}
 where the PartitionSpec looks like:
 {code}
 public interface PartitionSpec {
   public List<Partition> getPartitions();
   public List<String> getPartNames();
   public Iterator<Partition> getPartitionIter();
   public Iterator<String> getPartNameIter();
 }
 {code}
 For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement 
 {{PartitionSpec}}, store a top-level directory, and return Partition 
 instances from sub-directory names, while storing a single StorageDescriptor 
 for all of them.
 Similarly, list_partitions() could return a List<PartitionSpec>, where each 
 PartitionSpec corresponds to a set of partitions that can share a 
 StorageDescriptor.
 By exposing iterator semantics, neither the client nor the metastore needs to 
 instantiate all partitions at once. That should help with memory requirements.
 In case no smart grouping is possible, we could just fall back on a 
 {{DefaultPartitionSpec}} which composes {{List<Partition>}}, and is no worse 
 than the status quo.
 PartitionSpec abstracts away how a set of partitions may be represented. A 
 tighter representation allows us to communicate metadata for a larger number 
 of Partitions, with less Thrift traffic.
 Given that Thrift doesn’t support polymorphism, we’d have to implement the 
 PartitionSpec as a Thrift Union of supported implementations. (We could 
 convert from the Thrift PartitionSpec to the appropriate Java PartitionSpec 
 sub-class.)
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions

2014-08-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-7223:
---

Status: Patch Available  (was: Open)

 Support generic PartitionSpecs in Metastore partition-functions
 ---

 Key: HIVE-7223
 URL: https://issues.apache.org/jira/browse/HIVE-7223
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog, Metastore
Affects Versions: 0.13.0, 0.12.0
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-7223.1.patch, HIVE-7223.2.patch


 Currently, the functions in the HiveMetaStore API that handle multiple 
 partitions do so using List<Partition>. E.g. 
 {code}
 public List<Partition> listPartitions(String db_name, String tbl_name, short max_parts);
 public List<Partition> listPartitionsByFilter(String db_name, String tbl_name, String filter, short max_parts);
 public int add_partitions(List<Partition> new_parts);
 {code}
 Partition objects are fairly heavyweight, since each Partition carries its 
 own copy of a StorageDescriptor, partition-values, etc. Tables with tens of 
 thousands of partitions take so long to have their partitions listed that the 
 client times out with default hive.metastore.client.socket.timeout. There is 
 the additional expense of serializing and deserializing metadata for large 
 sets of partitions, w.r.t. time and heap-space. Reducing the Thrift traffic 
 should help in this regard.
 In a date-partitioned table, all sub-partitions for a particular date are 
 *likely* (though not guaranteed) to have:
 # The same base directory (e.g. {{/feeds/search/20140601/}})
 # Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}})
 # The same SerDe/StorageHandler/IOFormat classes
 # Sorting/Bucketing/SkewInfo settings
 In this “most likely” scenario (henceforth termed “normal”), it’s possible to 
 represent the partition-list (for a date) in a more condensed form: a list of 
 LighterPartition instances, all sharing a common StorageDescriptor whose 
 location points to the root directory. 
 We can go one better for the {{add_partitions()}} case: When adding all 
 partitions for a given date, the “normal” case affords us the ability to 
 specify the top-level date-directory, where sub-partitions can be inferred 
 from the HDFS directory-path.
 These extensions are hard to introduce at the metastore-level, since 
 partition-functions explicitly specify {{List<Partition>}} arguments. I 
 wonder if a {{PartitionSpec}} interface might help:
 {code}
 public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ...;
 public int add_partitions(PartitionSpec new_parts) throws ...;
 {code}
 where the PartitionSpec looks like:
 {code}
 public interface PartitionSpec {
   public List<Partition> getPartitions();
   public List<String> getPartNames();
   public Iterator<Partition> getPartitionIter();
   public Iterator<String> getPartNameIter();
 }
 {code}
 For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement 
 {{PartitionSpec}}, store a top-level directory, and return Partition 
 instances from sub-directory names, while storing a single StorageDescriptor 
 for all of them.
 Similarly, list_partitions() could return a List<PartitionSpec>, where each 
 PartitionSpec corresponds to a set of partitions that can share a 
 StorageDescriptor.
 By exposing iterator semantics, neither the client nor the metastore needs to 
 instantiate all partitions at once. That should help with memory requirements.
 In case no smart grouping is possible, we could just fall back on a 
 {{DefaultPartitionSpec}} which composes {{List<Partition>}}, and is no worse 
 than the status quo.
 PartitionSpec abstracts away how a set of partitions may be represented. A 
 tighter representation allows us to communicate metadata for a larger number 
 of Partitions, with less Thrift traffic.
 Given that Thrift doesn’t support polymorphism, we’d have to implement the 
 PartitionSpec as a Thrift Union of supported implementations. (We could 
 convert from the Thrift PartitionSpec to the appropriate Java PartitionSpec 
 sub-class.)
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions

2014-08-08 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-7223:
---

Attachment: HIVE-7223.2.patch

Updated patch, with Thrift definitions updated, etc.

 Support generic PartitionSpecs in Metastore partition-functions
 ---

 Key: HIVE-7223
 URL: https://issues.apache.org/jira/browse/HIVE-7223
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog, Metastore
Affects Versions: 0.12.0, 0.13.0
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-7223.1.patch, HIVE-7223.2.patch


 Currently, the functions in the HiveMetaStore API that handle multiple 
 partitions do so using List<Partition>. E.g. 
 {code}
 public List<Partition> listPartitions(String db_name, String tbl_name, short max_parts);
 public List<Partition> listPartitionsByFilter(String db_name, String tbl_name, String filter, short max_parts);
 public int add_partitions(List<Partition> new_parts);
 {code}
 Partition objects are fairly heavyweight, since each Partition carries its 
 own copy of a StorageDescriptor, partition-values, etc. Tables with tens of 
 thousands of partitions take so long to have their partitions listed that the 
 client times out with default hive.metastore.client.socket.timeout. There is 
 the additional expense of serializing and deserializing metadata for large 
 sets of partitions, w.r.t. time and heap-space. Reducing the Thrift traffic 
 should help in this regard.
 In a date-partitioned table, all sub-partitions for a particular date are 
 *likely* (though not guaranteed) to have:
 # The same base directory (e.g. {{/feeds/search/20140601/}})
 # Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}})
 # The same SerDe/StorageHandler/IOFormat classes
 # Sorting/Bucketing/SkewInfo settings
 In this “most likely” scenario (henceforth termed “normal”), it’s possible to 
 represent the partition-list (for a date) in a more condensed form: a list of 
 LighterPartition instances, all sharing a common StorageDescriptor whose 
 location points to the root directory. 
 We can go one better for the {{add_partitions()}} case: When adding all 
 partitions for a given date, the “normal” case affords us the ability to 
 specify the top-level date-directory, where sub-partitions can be inferred 
 from the HDFS directory-path.
 These extensions are hard to introduce at the metastore-level, since 
 partition-functions explicitly specify {{List<Partition>}} arguments. I 
 wonder if a {{PartitionSpec}} interface might help:
 {code}
 public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ...;
 public int add_partitions(PartitionSpec new_parts) throws ...;
 {code}
 where the PartitionSpec looks like:
 {code}
 public interface PartitionSpec {
   public List<Partition> getPartitions();
   public List<String> getPartNames();
   public Iterator<Partition> getPartitionIter();
   public Iterator<String> getPartNameIter();
 }
 {code}
 For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement 
 {{PartitionSpec}}, store a top-level directory, and return Partition 
 instances from sub-directory names, while storing a single StorageDescriptor 
 for all of them.
 Similarly, list_partitions() could return a List<PartitionSpec>, where each 
 PartitionSpec corresponds to a set of partitions that can share a 
 StorageDescriptor.
 By exposing iterator semantics, neither the client nor the metastore needs to 
 instantiate all partitions at once. That should help with memory requirements.
 In case no smart grouping is possible, we could just fall back on a 
 {{DefaultPartitionSpec}} which composes {{List<Partition>}}, and is no worse 
 than the status quo.
 PartitionSpec abstracts away how a set of partitions may be represented. A 
 tighter representation allows us to communicate metadata for a larger number 
 of Partitions, with less Thrift traffic.
 Given that Thrift doesn’t support polymorphism, we’d have to implement the 
 PartitionSpec as a Thrift Union of supported implementations. (We could 
 convert from the Thrift PartitionSpec to the appropriate Java PartitionSpec 
 sub-class.)
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7446) Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables

2014-08-08 Thread Ashish Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090993#comment-14090993
 ] 

Ashish Kumar Singh commented on HIVE-7446:
--

The test errors above are not related to this patch.

 Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables
 --

 Key: HIVE-7446
 URL: https://issues.apache.org/jira/browse/HIVE-7446
 Project: Hive
  Issue Type: New Feature
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
 Attachments: HIVE-7446.1.patch, HIVE-7446.patch


 HIVE-6806 adds native support for creating hive table stored as Avro. It 
 would be good to add support to ALTER TABLE .. ADD COLUMN to Avro backed 
 tables.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7654) A method to extrapolate columnStats for partitions of a table

2014-08-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091001#comment-14091001
 ] 

Hive QA commented on HIVE-7654:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660487/HIVE-7654.0.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5871 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/230/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/230/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-230/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660487

 A method to extrapolate columnStats for partitions of a table
 -

 Key: HIVE-7654
 URL: https://issues.apache.org/jira/browse/HIVE-7654
 Project: Hive
  Issue Type: New Feature
Reporter: pengcheng xiong
Assignee: pengcheng xiong
Priority: Minor
 Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch


 In a PARTITIONED table, there are many partitions. For example, 
 create table if not exists loc_orc (
   state string,
   locid int,
   zip bigint
 ) partitioned by(year string) stored as orc;
 We assume there are 4 partitions, partition(year='2000'), 
 partition(year='2001'), partition(year='2002') and partition(year='2003').
 We can use the following command to compute statistics for columns 
 state,locid of partition(year='2001')
 analyze table loc_orc partition(year='2001') compute statistics for columns 
 state,locid;
 We need to know the “aggregated” column stats for the whole table loc_orc. 
 However, we may not have the column stats for some partitions, e.g., 
 partition(year='2002'), and we also may not have the column stats for some 
 columns, e.g., zip bigint for partition(year='2001').
 We propose a method to extrapolate the missing column stats for these 
 partitions.
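
 To make the idea concrete, here is a deliberately simple sketch of one 
 possible extrapolation rule (purely illustrative; the attached document 
 describes the actual proposal, and the method name here is made up):
 {code}
 // Purely illustrative -- extrapolate a table-level NDV estimate for a column
 // when some partitions lack column stats.
 static long extrapolateNdv(long[] partitionNdvs,  // NDV per partition with stats
                            long rowsWithStats,    // total rows in those partitions
                            long totalRows) {      // rows across all partitions
   long maxNdv = 0;
   for (long ndv : partitionNdvs) {
     maxNdv = Math.max(maxNdv, ndv);  // NDV of the union is at least the max
   }
   // Scale linearly by the fraction of rows covered, capped by the row count.
   double scale = rowsWithStats > 0 ? (double) totalRows / rowsWithStats : 1.0;
   return (long) Math.min(maxNdv * scale, (double) totalRows);
 }
 {code}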



--
This message was sent by Atlassian JIRA
(v6.2#6252)


RE: Key is null in map when OrcNewInputFormat is used as Input Format Class

2014-08-08 Thread John Zeng
Any update from anybody?  Should I file a bug?

Thanks

-Original Message-
From: John Zeng [mailto:john.z...@dataguise.com] 
Sent: Wednesday, August 6, 2014 10:17 AM
To: dev@hive.apache.org
Subject: Key is null in map when OrcNewInputFormat is used as Input Format Class

Dear OrcNewInputFormat owner,

When using OrcNewInputFormat as the input format class for my map reduce job, I 
find the key is always null in my map method, which gives me no way to get the 
row number. By comparison, with RCFileInputFormat (for RC files) the key in the 
map method returns the row number, so I know which row I am processing.

Is there any workaround for me to get the row number from my map method?  Of 
course, I can count the rows myself.  But that has two problems: #1 I have to 
assume the rows arrive in order; #2 I will get duplicated (and wrong) row 
numbers if a big input file causes multiple file splits (which will trigger my 
map method multiple times on different data nodes).  At this point, I am really 
seeking a better way to get the row number for each processed row in the map 
method.

Here is what I have in my map logs:

[2014-08-06 09:39:25 DEBUG com..hadoop.orcfile.OrcFileMap]: Mapper 
Input Key: (null)
[2014-08-06 09:39:25 DEBUG com..hadoop.orcfile.OrcFileMap]: Mapper 
Input Value: {Q8151, T9976, 69976, 8156756, 966798161, 
97898989898, Laura, laura...@gmail.com}

My map method is:

protected void map(Object key, Writable value, Context context)
        throws IOException, InterruptedException {
    logger.debug("Mapper Input Key: " + key);
    logger.debug("Mapper Input Value: " + value.toString());
    ...
}

Thanks

John
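
One possible workaround, sketched under the assumption that a stable unique 
per-row identifier is acceptable in place of a true row number: combine the 
split's path and start offset with a per-split counter (class and variable 
names below are made up).

import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class OrcRowIdMapper extends Mapper<NullWritable, Writable, NullWritable, Writable> {
    private String splitId;    // identifies this split's position in the file
    private long rowInSplit;   // rows seen so far within this split

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        FileSplit split = (FileSplit) context.getInputSplit();
        // Path plus byte offset uniquely identifies the split, even when a big
        // file is broken into several splits on different data nodes.
        splitId = split.getPath() + ":" + split.getStart();
        rowInSplit = 0;
    }

    @Override
    protected void map(NullWritable key, Writable value, Context context)
            throws IOException, InterruptedException {
        // Not a global row number, but unique and stable per row.
        String rowId = splitId + ":" + (rowInSplit++);
        // ... process value, tagging output with rowId as needed ...
    }
}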


[jira] [Updated] (HIVE-7662) CBO: changes to Cost Model

2014-08-08 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-7662:


Summary: CBO: changes to Cost Model  (was: CBO: changes to COst Model)

 CBO: changes to Cost Model
 --

 Key: HIVE-7662
 URL: https://issues.apache.org/jira/browse/HIVE-7662
 Project: Hive
  Issue Type: Sub-task
Reporter: Harish Butani
Assignee: Harish Butani

 - Model Join cost as Sum of Input sizes
 - Fix bug with NDV calculations. For now use Optiq's default formulas.
 - Model Cumulative cost to favor broad Plans over Deep Plans.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7662) CBO: changes to COst Model

2014-08-08 Thread Harish Butani (JIRA)
Harish Butani created HIVE-7662:
---

 Summary: CBO: changes to COst Model
 Key: HIVE-7662
 URL: https://issues.apache.org/jira/browse/HIVE-7662
 Project: Hive
  Issue Type: Sub-task
Reporter: Harish Butani
Assignee: Harish Butani


- Model Join cost as Sum of Input sizes
- Fix bug with NDV calculations. For now use Optiq's default formulas.
- Model Cumulative cost to favor broad Plans over Deep Plans.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7655) Reading of partitioned table stats slows down explain

2014-08-08 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-7655:


Issue Type: Sub-task  (was: Bug)
Parent: HIVE-5775

 Reading of partitioned table stats slows down explain
 -

 Key: HIVE-7655
 URL: https://issues.apache.org/jira/browse/HIVE-7655
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.13.1
Reporter: Mostafa Mokhtar
Assignee: Harish Butani
  Labels: hive
 Fix For: 0.14.0


 This defect is due to a regression introduced in 
 https://issues.apache.org/jira/browse/HIVE-7625, explain for queries that 
 touch partitioned tables is 10x slower.
 RelOptHiveTable.getRowCount calls listPartitionsWithAuthInfo which returns 
 the data from all partitions, listPartitionsByExpr should be used instead.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7662) CBO: changes to Cost Model

2014-08-08 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-7662:


Attachment: HIVE-7662.1.patch

 CBO: changes to Cost Model
 --

 Key: HIVE-7662
 URL: https://issues.apache.org/jira/browse/HIVE-7662
 Project: Hive
  Issue Type: Sub-task
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7662.1.patch


 - Model Join cost as Sum of Input sizes
 - Fix bug with NDV calculations. For now use Optiq's default formulas.
 - Model Cumulative cost to favor broad Plans over Deep Plans.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7649) Support column stats with temporary tables

2014-08-08 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-7649:
-

Attachment: HIVE-7649.2.patch

rebasing patch with trunk

 Support column stats with temporary tables
 --

 Key: HIVE-7649
 URL: https://issues.apache.org/jira/browse/HIVE-7649
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-7649.1.patch, HIVE-7649.2.patch


 Column stats currently not supported with temp tables, see if they can be 
 added.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24289: MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table)

2014-08-08 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24289/#review50047
---


Please add .q tests for these. Test for partitioned table with more than one 
partition column on variety of column types and variety of stats type.


ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
https://reviews.apache.org/r/24289/#comment87572

Include an example SQL statement that this task is meant for.



ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
https://reviews.apache.org/r/24289/#comment87575

Add a comment saying the grammar prohibits more than 1 column, so we are 
guaranteed to have only 1 element in these lists.



ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
https://reviews.apache.org/r/24289/#comment87576

Is clear() needed here?



ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
https://reviews.apache.org/r/24289/#comment87579

Add else {
    throw new SemanticException("Unknown stat");
}

and add it to all of the subsequent blocks.

You may also want to reconsider factoring some of this repetition into a private method.



ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
https://reviews.apache.org/r/24289/#comment87580

Add else {
    throw new Exception("Unsupported type");
}



ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
https://reviews.apache.org/r/24289/#comment87574

Copy-paste comments?



ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
https://reviews.apache.org/r/24289/#comment87573

Comments seem out of place. Copy-paste?



ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
https://reviews.apache.org/r/24289/#comment87563

throw new SemanticException("table " + tbl + " not found");



ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
https://reviews.apache.org/r/24289/#comment87564

if (colType == null) throw new SemanticException("col not found");



ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
https://reviews.apache.org/r/24289/#comment87565

There can be multiple partitioning columns, in which case this assert will 
fail. Don't think you want that.



ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
https://reviews.apache.org/r/24289/#comment87566

Instead of this for loop, you want to use Warehouse.makePartName(partSpec, 
false);



ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
https://reviews.apache.org/r/24289/#comment87567

throw SemanticException



ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
https://reviews.apache.org/r/24289/#comment87568

check colType != null



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java
https://reviews.apache.org/r/24289/#comment87562

I don't think this if block is required. Further, you need to add a 
HiveOperation corresponding to this new token.



ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsUpdateWork.java
https://reviews.apache.org/r/24289/#comment87571

Add comment like, work corresponding to statement:
alter table t1 partition (p1=c1,p2=c2), update...



ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsUpdateWork.java
https://reviews.apache.org/r/24289/#comment87569

This field doesn't seem to be used. Can be removed.



ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsUpdateWork.java
https://reviews.apache.org/r/24289/#comment87570

Good to implement this. Useful for debugging.


- Ashutosh Chauhan


On Aug. 5, 2014, 6:40 p.m., pengcheng xiong wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24289/
 ---
 
 (Updated Aug. 5, 2014, 6:40 p.m.)
 
 
 Review request for hive.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 This patch provides the ability to update certain stats without scanning any data 
 or hacking the backend db. It helps (esp. for CBO work) to set up 
 unit tests quickly and verify both CBO and the stats subsystem. It also helps 
 when experimenting with the system if you're just trying out hive/hadoop on a 
 small cluster. Finally, it gives you a quick and clean way to fix things when 
 something goes wrong wrt stats in your environment.
 Usage:
 ALTER TABLE table_name PARTITION partition_spec UPDATE STATISTICS FOR COLUMN 
 col_name SET col_statistics
 For example,
 ALTER TABLE src_x_int UPDATE STATISTICS FOR COLUMN key SET 
 ('numDVs'='101','highValue'='10001.0');
 ALTER TABLE src_p PARTITION(partitionId=1) UPDATE STATISTICS FOR COLUMN key 
 SET ('numDVs'='100','avgColLen'='1.0001');
 
 
 Diffs
 -
 
   
 

[jira] [Updated] (HIVE-7506) MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table)

2014-08-08 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7506:
---

Status: Open  (was: Patch Available)

Left comments on RB. Two major items:
* Add .q tests.
* Add new HiveOperation enum corresponding to new token type.

 MetadataUpdater: provide a mechanism to edit the statistics of a column in a 
 table (or a partition of a table)
 --

 Key: HIVE-7506
 URL: https://issues.apache.org/jira/browse/HIVE-7506
 Project: Hive
  Issue Type: New Feature
  Components: Statistics
Reporter: pengcheng xiong
Assignee: pengcheng xiong
Priority: Minor
 Attachments: HIVE-7506.1.patch, HIVE-7506.3.patch, HIVE-7506.4.patch, 
 HIVE-7506.patch

   Original Estimate: 252h
  Remaining Estimate: 252h

 Two motivations:
 (1) The Cost-based Optimizer (CBO) depends heavily on the statistics of a column 
 in a table (or a partition of a table). If we would like to test whether CBO 
 chooses the best plan under different statistics, it would be time-consuming 
 to load the whole table and create the statistics from the ground up.
 (2) As the database runs, the statistics of a column in a table (or a partition 
 of a table) may change. We need a mechanism to synchronize them. 
 We propose the following command to achieve that:
 ALTER TABLE table_name PARTITION partition_spec [COLUMN col_name] UPDATE 
 STATISTICS col_statistics [COMMENT col_comment]



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7506) MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table)

2014-08-08 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7506:
---

Component/s: (was: Database/Schema)
 Statistics

 MetadataUpdater: provide a mechanism to edit the statistics of a column in a 
 table (or a partition of a table)
 --

 Key: HIVE-7506
 URL: https://issues.apache.org/jira/browse/HIVE-7506
 Project: Hive
  Issue Type: New Feature
  Components: Statistics
Reporter: pengcheng xiong
Assignee: pengcheng xiong
Priority: Minor
 Attachments: HIVE-7506.1.patch, HIVE-7506.3.patch, HIVE-7506.4.patch, 
 HIVE-7506.patch

   Original Estimate: 252h
  Remaining Estimate: 252h

 Two motivations:
 (1) The Cost-based Optimizer (CBO) depends heavily on the statistics of a column 
 in a table (or a partition of a table). If we would like to test whether CBO 
 chooses the best plan under different statistics, it would be time-consuming 
 to load the whole table and create the statistics from the ground up.
 (2) As the database runs, the statistics of a column in a table (or a partition 
 of a table) may change. We need a mechanism to synchronize them. 
 We propose the following command to achieve that:
 ALTER TABLE table_name PARTITION partition_spec [COLUMN col_name] UPDATE 
 STATISTICS col_statistics [COMMENT col_comment]



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7663) OrcRecordUpdater needs to implement getStats

2014-08-08 Thread Alan Gates (JIRA)
Alan Gates created HIVE-7663:


 Summary: OrcRecordUpdater needs to implement getStats
 Key: HIVE-7663
 URL: https://issues.apache.org/jira/browse/HIVE-7663
 Project: Hive
  Issue Type: Sub-task
  Components: Transactions
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Alan Gates


OrcRecordUpdater.getStats currently returns null.  It needs to track the stats 
and return a valid value.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7618) TestDDLWithRemoteMetastoreSecondNamenode unit test failure

2014-08-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091126#comment-14091126
 ] 

Sushanth Sowmyan commented on HIVE-7618:


+1 on the patch as it currently stands.

While I am, in theory, in favour of adding this to the SH interface, I think we 
should hold off on that for now. I would rather open a discussion with the hive 
group at large about re-architecting StorageHandlers in general, trying to do 
the following:

a) Deprecation/removal of HiveOutputFormat/HiveRecordWriter in general, in 
favour of using M/R definitions of the same, and having Committer semantics 
included.
b) Rearchitecting/refactoring native Hive storage in a way that makes 
everything go through the SH interface, rather than having special-casing for 
SH and native
c) Support for notion of SH per partition, rather than SH per table
d) Notion of possible plan modifications by SH for any add-on tasks that are 
required.

And if we're making that many changes, it's likely that we will break SHs 
significantly at that time, and I'd rather do it once than have a constant 
stream of breakage.

I'd like to see us pursue that as a major initiative in the 0.15 timeframe, if 
possible. I'll shoot out a mail to the list on that regard.


 TestDDLWithRemoteMetastoreSecondNamenode unit test failure
 --

 Key: HIVE-7618
 URL: https://issues.apache.org/jira/browse/HIVE-7618
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-7618.1.patch, HIVE-7618.2.patch


 Looks like TestDDLWithRemoteMetastoreSecondNamenode started failing after 
 HIVE-6584 was committed.
 {noformat}
 TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode:272-createTableAndCheck:201-createTableAndCheck:219
  Table should be located in the second filesystem expected:[hdfs] but 
 was:[pfile]
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091127#comment-14091127
 ] 

Hive QA commented on HIVE-7616:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660507/HIVE-7616.04.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5886 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_nested_mapjoin
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/231/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/231/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-231/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660507

 pre-size mapjoin hashtable based on statistics
 --

 Key: HIVE-7616
 URL: https://issues.apache.org/jira/browse/HIVE-7616
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, 
 HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7655) Reading of partitioned table stats slows down explain

2014-08-08 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-7655:


Attachment: HIVE-7655.1.patch

 Reading of partitioned table stats slows down explain
 -

 Key: HIVE-7655
 URL: https://issues.apache.org/jira/browse/HIVE-7655
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.13.1
Reporter: Mostafa Mokhtar
Assignee: Harish Butani
  Labels: hive
 Fix For: 0.14.0

 Attachments: HIVE-7655.1.patch


 This defect is due to a regression introduced in 
 https://issues.apache.org/jira/browse/HIVE-7625, explain for queries that 
 touch partitioned tables is 10x slower.
 RelOptHiveTable.getRowCount calls listPartitionsWithAuthInfo which returns 
 the data from all partitions, listPartitionsByExpr should be used instead.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7663) OrcRecordUpdater needs to implement getStats

2014-08-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-7663:
-

Attachment: HIVE-7663.patch

This patch implements getRowCount.  It does not implement getRawSize, as 
that is very hard to calculate for update and delete.  But in those cases the 
rawSize isn't so important, as we can use the raw size of the base.
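
A minimal sketch of the shape of such an implementation (signatures 
simplified; illustrative only, not the attached patch):
{code}
import org.apache.hadoop.hive.serde2.SerDeStats;

// Illustrative sketch -- method signatures simplified.
public class StatsTrackingRecordUpdater {
  private long rowCountDelta = 0;
  private final SerDeStats stats = new SerDeStats();

  public void insert(Object row) {
    // ... write the insert event to the delta file ...
    rowCountDelta++;               // inserts add a row
  }

  public void delete(Object row) {
    // ... write the delete event to the delta file ...
    rowCountDelta--;               // deletes remove a row
  }

  public void update(Object row) {
    // ... write the update event; the net row count is unchanged ...
  }

  public SerDeStats getStats() {
    stats.setRowCount(rowCountDelta);
    // rawDataSize is not tracked here; per the comment above, the raw size
    // of the base can be used for update/delete.
    return stats;
  }
}
{code}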

 OrcRecordUpdater needs to implement getStats
 

 Key: HIVE-7663
 URL: https://issues.apache.org/jira/browse/HIVE-7663
 Project: Hive
  Issue Type: Sub-task
  Components: Transactions
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7663.patch


 OrcRecordUpdater.getStats currently returns null.  It needs to track the 
 stats and return a valid value.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7663) OrcRecordUpdater needs to implement getStats

2014-08-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-7663:
-

Status: Patch Available  (was: Open)

 OrcRecordUpdater needs to implement getStats
 

 Key: HIVE-7663
 URL: https://issues.apache.org/jira/browse/HIVE-7663
 Project: Hive
  Issue Type: Sub-task
  Components: Transactions
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7663.patch


 OrcRecordUpdater.getStats currently returns null.  It needs to track the 
 stats and return a valid value.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24474: HIVE-6959 Enable Constant propagation optimizer for Hive Vectorization

2014-08-08 Thread Hari Sankar Sivarama Subramaniyan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24474/
---

(Updated Aug. 8, 2014, 7:16 p.m.)


Review request for hive, Ashutosh Chauhan and Jitendra Pandey.


Bugs: HIVE-6959
https://issues.apache.org/jira/browse/HIVE-6959


Repository: hive-git


Description
---

Enable Constant propagation optimizer for Hive Vectorization


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 
535e4b3 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/ConstantVectorExpression.java
 9fd3853 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java
 eeb76d7 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagate.java 
b12d3a8 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
e778ba4 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java 
2329f52 
  ql/src/test/queries/clientpositive/vector_coalesce.q 052ab71 
  ql/src/test/results/clientpositive/tez/vector_cast_constant.q.out 9dac17b 
  ql/src/test/results/clientpositive/tez/vectorization_14.q.out 04c99f1 
  ql/src/test/results/clientpositive/tez/vectorization_15.q.out 1381695 
  ql/src/test/results/clientpositive/tez/vectorization_9.q.out 3d2645a 
  ql/src/test/results/clientpositive/tez/vectorization_short_regress.q.out 
2fa1bae 
  ql/src/test/results/clientpositive/vector_between_in.q.out 78e340b 
  ql/src/test/results/clientpositive/vector_cast_constant.q.out cdb13cb 
  ql/src/test/results/clientpositive/vector_coalesce.q.out 9561d47 
  ql/src/test/results/clientpositive/vector_decimal_mapjoin.q.out 71a3def 
  ql/src/test/results/clientpositive/vector_decimal_math_funcs.q.out 717e81a 
  ql/src/test/results/clientpositive/vector_elt.q.out ea0af62 
  ql/src/test/results/clientpositive/vectorization_14.q.out 3992bb1 
  ql/src/test/results/clientpositive/vectorization_15.q.out 1f48fea 
  ql/src/test/results/clientpositive/vectorization_16.q.out 38596e6 
  ql/src/test/results/clientpositive/vectorization_9.q.out c757b1f 
  ql/src/test/results/clientpositive/vectorization_div0.q.out b2321b4 
  ql/src/test/results/clientpositive/vectorization_short_regress.q.out 5b23850 
  ql/src/test/results/clientpositive/vectorized_math_funcs.q.out 181ab51 
  ql/src/test/results/clientpositive/vectorized_parquet.q.out 2e459a8 

Diff: https://reviews.apache.org/r/24474/diff/


Testing
---


Thanks,

Hari Sankar Sivarama Subramaniyan



[jira] [Updated] (HIVE-6959) Enable Constant propagation optimizer for Hive Vectorization

2014-08-08 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-6959:


Status: Patch Available  (was: Open)

 Enable Constant propagation optimizer for Hive Vectorization
 

 Key: HIVE-6959
 URL: https://issues.apache.org/jira/browse/HIVE-6959
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-6959.1.patch, HIVE-6959.2.patch, HIVE-6959.4.patch, 
 HIVE-6959.5.patch, HIVE-6959.6.patch


 HIVE-5771 covers Constant propagation optimizer for Hive. Now that HIVE-5771 
 is committed, we should remove any vectorization related code which 
 duplicates this feature. For example, a fn to be cleaned is 
 VectorizationContext::foldConstantsForUnaryExprs(). In addition to this 
 change, constant propagation should kick in when vectorization is enabled. 
 i.e. we need to lift the HIVE_VECTORIZATION_ENABLED restriction inside 
 ConstantPropagate::transform().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6959) Enable Constant propagation optimizer for Hive Vectorization

2014-08-08 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-6959:


Status: Open  (was: Patch Available)

 Enable Constant propagation optimizer for Hive Vectorization
 

 Key: HIVE-6959
 URL: https://issues.apache.org/jira/browse/HIVE-6959
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-6959.1.patch, HIVE-6959.2.patch, HIVE-6959.4.patch, 
 HIVE-6959.5.patch, HIVE-6959.6.patch


 HIVE-5771 covers Constant propagation optimizer for Hive. Now that HIVE-5771 
 is committed, we should remove any vectorization related code which 
 duplicates this feature. For example, a fn to be cleaned is 
 VectorizationContext::foldConstantsForUnaryExprs(). In addition to this 
 change, constant propagation should kick in when vectorization is enabled. 
 i.e. we need to lift the HIVE_VECTORIZATION_ENABLED restriction inside 
 ConstantPropagate::transform().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6959) Enable Constant propagation optimizer for Hive Vectorization

2014-08-08 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-6959:


Attachment: HIVE-6959.6.patch

updated the MiniTezCliDriver test results as well.

 Enable Constant propagation optimizer for Hive Vectorization
 

 Key: HIVE-6959
 URL: https://issues.apache.org/jira/browse/HIVE-6959
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-6959.1.patch, HIVE-6959.2.patch, HIVE-6959.4.patch, 
 HIVE-6959.5.patch, HIVE-6959.6.patch


 HIVE-5771 covers Constant propagation optimizer for Hive. Now that HIVE-5771 
 is committed, we should remove any vectorization related code which 
 duplicates this feature. For example, a fn to be cleaned is 
 VectorizationContext::foldConstantsForUnaryExprs(). In addition to this 
 change, constant propagation should kick in when vectorization is enabled. 
 i.e. we need to lift the HIVE_VECTORIZATION_ENABLED restriction inside 
 ConstantPropagate::transform().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-08 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091198#comment-14091198
 ] 

Szehon Ho commented on HIVE-7624:
-

Hi Li Rui, I think the patch looks reasonable.  Just had a comment and a 
question on the RB.  Thanks

 Reduce operator initialization failed when running multiple MR query on spark
 -

 Key: HIVE-7624
 URL: https://issues.apache.org/jira/browse/HIVE-7624
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, 
 HIVE-7624.4-spark.patch, HIVE-7624.patch


 The following error occurs when I try to run a query with multiple reduce 
 works (M-R-R):
 {quote}
 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
 java.lang.RuntimeException: Reduce operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from 
 [0:_col0]
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
 …
 {quote}
 I suspect we're applying the reduce function in wrong order.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24427: HIVE-7616 pre-size mapjoin hashtable based on statistics

2014-08-08 Thread Gunther Hagleitner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24427/#review50071
---



ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java
https://reviews.apache.org/r/24427/#comment87593

I'm not sure about this one. But shouldn't the condition be: 
joinConf.isBucketMapJoin()?


- Gunther Hagleitner


On Aug. 7, 2014, 11:53 p.m., Sergey Shelukhin wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24427/
 ---
 
 (Updated Aug. 7, 2014, 11:53 p.m.)
 
 
 Review request for hive, Gunther Hagleitner, Mostafa Mokhtar, and Prasanth_J.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 See jira
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8490558 
   ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java 
 cf64aa0 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/BytesBytesMultiHashMap.java
  cdb5dc5 
   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HashMapWrapper.java 
 5b3b770 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java
  629457c 
   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 6d292d0 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
 d42e1f7 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java 
 29d895a 
   ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java 44cb9c0 
 
 Diff: https://reviews.apache.org/r/24427/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Sergey Shelukhin
 




Re: Review Request 24427: HIVE-7616 pre-size mapjoin hashtable based on statistics

2014-08-08 Thread Gunther Hagleitner


 On Aug. 7, 2014, 12:04 a.m., Gunther Hagleitner wrote:
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java, 
  line 136
  https://reviews.apache.org/r/24427/diff/2/?file=654266#file654266line136
 
  curlies per coding standard
 
 Sergey Shelukhin wrote:
 added; next time I review your patch, I'll enforce C-style variable 
 declarations (all variables at the beginning of the block), which are also 
 part of the same Sun standard the Hive wiki page points to :P

I do that anyways.


- Gunther


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24427/#review49830
---


On Aug. 7, 2014, 11:53 p.m., Sergey Shelukhin wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24427/
 ---
 
 (Updated Aug. 7, 2014, 11:53 p.m.)
 
 
 Review request for hive, Gunther Hagleitner, Mostafa Mokhtar, and Prasanth_J.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 See jira
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8490558 
   ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java 
 cf64aa0 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/BytesBytesMultiHashMap.java
  cdb5dc5 
   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HashMapWrapper.java 
 5b3b770 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java
  629457c 
   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 6d292d0 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
 d42e1f7 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java 
 29d895a 
   ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java 44cb9c0 
 
 Diff: https://reviews.apache.org/r/24427/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Sergey Shelukhin
 




[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-08 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091284#comment-14091284
 ] 

Gunther Hagleitner commented on HIVE-7616:
--

Removing TODO in commit is fine by me. I've had one additional question about 
how to detect bucketed joins on the reviewboard.

For testing: can you add the expected key count to explain extended? That way 
you can verify correct behavior through the unit tests.

 pre-size mapjoin hashtable based on statistics
 --

 Key: HIVE-7616
 URL: https://issues.apache.org/jira/browse/HIVE-7616
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, 
 HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-08 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091287#comment-14091287
 ] 

Gunther Hagleitner commented on HIVE-7616:
--

other than these 2 things i am +1

 pre-size mapjoin hashtable based on statistics
 --

 Key: HIVE-7616
 URL: https://issues.apache.org/jira/browse/HIVE-7616
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, 
 HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-7655) Reading of partitioned table stats slows down explain

2014-08-08 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner resolved HIVE-7655.
--

Resolution: Fixed

Committed to branch. Thank you [~rhbutani]!

 Reading of partitioned table stats slows down explain
 -

 Key: HIVE-7655
 URL: https://issues.apache.org/jira/browse/HIVE-7655
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.13.1
Reporter: Mostafa Mokhtar
Assignee: Harish Butani
  Labels: hive
 Fix For: 0.14.0

 Attachments: HIVE-7655.1.patch


 This defect is due to a regression introduced in 
 https://issues.apache.org/jira/browse/HIVE-7625, explain for queries that 
 touch partitioned tables is 10x slower.
 RelOptHiveTable.getRowCount calls listPartitionsWithAuthInfo which returns 
 the data from all partitions, listPartitionsByExpr should be used instead.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7638) Disallow CREATE VIEW when created with a temporary table

2014-08-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091316#comment-14091316
 ] 

Hive QA commented on HIVE-7638:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660527/HIVE-7638.1.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5872 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/232/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/232/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-232/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660527

 Disallow CREATE VIEW when created with a temporary table
 

 Key: HIVE-7638
 URL: https://issues.apache.org/jira/browse/HIVE-7638
 Project: Hive
  Issue Type: Bug
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-7638.1.patch


 Follow-up item from HIVE-7090: don't allow a view to be created if the view 
 definition references a temporary table.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7664) VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU

2014-08-08 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created HIVE-7664:
-

 Summary: VectorizedBatchUtil.addRowToBatchFrom is not optimized 
for Vectorized execution and takes 25% CPU
 Key: HIVE-7664
 URL: https://issues.apache.org/jira/browse/HIVE-7664
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Mostafa Mokhtar
 Fix For: 0.14.0


In a group-by-heavy vectorized Reducer vertex, 25% of CPU is spent in 
VectorizedBatchUtil.addRowToBatchFrom().

Looking at the code of VectorizedBatchUtil.addRowToBatchFrom, it appears it 
wasn't optimized for vectorized processing.

addRowToBatchFrom is called for every row, and for each row and every column 
in the batch, getPrimitiveCategory is called to figure out the column's type; 
the column types are stored in a HashMap. For VectorGroupByOperator, column 
types won't change between batches, so they shouldn't be looked up for every 
row.

I recommend storing the column type in StructObjectInspector so that other 
components can leverage this optimization.

Also, addRowToBatchFrom executes a case statement for every row and every 
column to do type casting; I recommend encapsulating the type logic in 
templatized methods (a sketch of the type-caching idea follows).
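
A minimal sketch of the type-caching idea, under the assumption that all 
columns are primitive; the class name and structure are illustrative, not part 
of Hive.

{code}
import java.util.List;

import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
import org.apache.hadoop.hive.serde2.objectinspector.StructField;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;

final class ColumnTypeCache {
  private final PrimitiveCategory[] categories;

  // Resolve each column's category once per operator setup, not per row.
  ColumnTypeCache(StructObjectInspector rowOI) {
    List<? extends StructField> fields = rowOI.getAllStructFieldRefs();
    categories = new PrimitiveCategory[fields.size()];
    for (int i = 0; i < fields.size(); i++) {
      // Assumes primitive columns; a real patch would handle complex types.
      categories[i] = ((PrimitiveObjectInspector)
          fields.get(i).getFieldObjectInspector()).getPrimitiveCategory();
    }
  }

  PrimitiveCategory categoryOf(int column) {
    return categories[column];
  }
}
{code}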

{code}
Stack Trace                                                Sample Count   Percentage(%)
VectorizedBatchUtil.addRowToBatchFrom                                86          26.543
   AbstractPrimitiveObjectInspector.getPrimitiveCategory()           34          10.494
   LazyBinaryStructObjectInspector.getStructFieldData                25           7.716
   StandardStructObjectInspector.getStructFieldData                   4           1.235
{code}

The query used : 
{code}
select 
ss_sold_date_sk
from
store_sales
where
ss_sold_date between '1998-01-01' and '1998-06-01'
group by ss_item_sk , ss_customer_sk , ss_sold_date_sk
having sum(ss_list_price) > 50;
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-7662) CBO: changes to Cost Model

2014-08-08 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner resolved HIVE-7662.
--

Resolution: Fixed

Committed to branch. Thanks [~rhbutani]!

 CBO: changes to Cost Model
 --

 Key: HIVE-7662
 URL: https://issues.apache.org/jira/browse/HIVE-7662
 Project: Hive
  Issue Type: Sub-task
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7662.1.patch


 - Model Join cost as Sum of Input sizes
 - Fix bug with NDV calculations. For now use Optiq's default formulas.
 - Model Cummulative cost to favor broad Plans over Deep Plans.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7656) Bring tez-branch up-to the API changes made by TEZ-1372

2014-08-08 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-7656:
--

Attachment: HIVE-7656.1-tez.patch

 Bring tez-branch up-to the API changes made by TEZ-1372
 ---

 Key: HIVE-7656
 URL: https://issues.apache.org/jira/browse/HIVE-7656
 Project: Hive
  Issue Type: Sub-task
Affects Versions: tez-branch
Reporter: Gopal V
Assignee: Gopal V
 Fix For: tez-branch

 Attachments: HIVE-7656.1-tez.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7656) Bring tez-branch up-to the API changes made by TEZ-1372

2014-08-08 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-7656:
--

Fix Version/s: tez-branch
   Status: Patch Available  (was: Open)

 Bring tez-branch up-to the API changes made by TEZ-1372
 ---

 Key: HIVE-7656
 URL: https://issues.apache.org/jira/browse/HIVE-7656
 Project: Hive
  Issue Type: Sub-task
Affects Versions: tez-branch
Reporter: Gopal V
Assignee: Gopal V
 Fix For: tez-branch

 Attachments: HIVE-7656.1-tez.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7656) Bring tez-branch up-to the API changes made by TEZ-1372, TEZ-1386

2014-08-08 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-7656:
--

Summary: Bring tez-branch up-to the API changes made by TEZ-1372,  TEZ-1386 
 (was: Bring tez-branch up-to the API changes made by TEZ-1372)

 Bring tez-branch up-to the API changes made by TEZ-1372,  TEZ-1386
 --

 Key: HIVE-7656
 URL: https://issues.apache.org/jira/browse/HIVE-7656
 Project: Hive
  Issue Type: Sub-task
Affects Versions: tez-branch
Reporter: Gopal V
Assignee: Gopal V
 Fix For: tez-branch

 Attachments: HIVE-7656.1-tez.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7665) Create TestSparkCliDriver to run test in spark local mode

2014-08-08 Thread Szehon Ho (JIRA)
Szehon Ho created HIVE-7665:
---

 Summary: Create TestSparkCliDriver to run test in spark local mode
 Key: HIVE-7665
 URL: https://issues.apache.org/jira/browse/HIVE-7665
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Reporter: Szehon Ho
Assignee: Szehon Ho






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7656) Bring tez-branch up-to the API changes made by TEZ-1372

2014-08-08 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-7656:
--

Status: Open  (was: Patch Available)

 Bring tez-branch up-to the API changes made by TEZ-1372
 ---

 Key: HIVE-7656
 URL: https://issues.apache.org/jira/browse/HIVE-7656
 Project: Hive
  Issue Type: Sub-task
Affects Versions: tez-branch
Reporter: Gopal V
Assignee: Gopal V
 Fix For: tez-branch

 Attachments: HIVE-7656.1-tez.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7665) Create TestSparkCliDriver to run test in spark local mode

2014-08-08 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-7665:


Component/s: Spark

 Create TestSparkCliDriver to run test in spark local mode
 -

 Key: HIVE-7665
 URL: https://issues.apache.org/jira/browse/HIVE-7665
 Project: Hive
  Issue Type: Sub-task
  Components: Spark, Testing Infrastructure
Reporter: Szehon Ho
Assignee: Szehon Ho





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7656) Bring tez-branch up-to the API changes made by TEZ-1372

2014-08-08 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-7656:
--

Status: Patch Available  (was: Open)

 Bring tez-branch up-to the API changes made by TEZ-1372
 ---

 Key: HIVE-7656
 URL: https://issues.apache.org/jira/browse/HIVE-7656
 Project: Hive
  Issue Type: Sub-task
Affects Versions: tez-branch
Reporter: Gopal V
Assignee: Gopal V
 Fix For: tez-branch

 Attachments: HIVE-7656.1-tez.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7656) Bring tez-branch up-to the API changes made by TEZ-1372

2014-08-08 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-7656:
--

Summary: Bring tez-branch up-to the API changes made by TEZ-1372  (was: 
Bring tez-branch up-to the API changes made by TEZ-1372,  TEZ-1386)

 Bring tez-branch up-to the API changes made by TEZ-1372
 ---

 Key: HIVE-7656
 URL: https://issues.apache.org/jira/browse/HIVE-7656
 Project: Hive
  Issue Type: Sub-task
Affects Versions: tez-branch
Reporter: Gopal V
Assignee: Gopal V
 Fix For: tez-branch

 Attachments: HIVE-7656.1-tez.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7432) Remove deprecated Avro's Schema.parse usages

2014-08-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091386#comment-14091386
 ] 

Hive QA commented on HIVE-7432:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660529/HIVE-7432.2.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5886 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/233/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/233/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-233/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660529

 Remove deprecated Avro's Schema.parse usages
 

 Key: HIVE-7432
 URL: https://issues.apache.org/jira/browse/HIVE-7432
 Project: Hive
  Issue Type: Improvement
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
 Attachments: HIVE-7432.1.patch, HIVE-7432.2.patch, HIVE-7432.patch


 Schema.parse has been deprecated by Avro; however, it is still used in 
 multiple places in Hive.
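
For reference, the non-deprecated replacement in Avro's API is Schema.Parser; 
a minimal sketch (the wrapper class name is illustrative):

{code}
import org.apache.avro.Schema;

class AvroParseSketch {
  static Schema parse(String schemaJson) {
    // Replaces the deprecated static Schema.parse(schemaJson).
    return new Schema.Parser().parse(schemaJson);
  }
}
{code}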



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7432) Remove deprecated Avro's Schema.parse usages

2014-08-08 Thread Ashish Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091389#comment-14091389
 ] 

Ashish Kumar Singh commented on HIVE-7432:
--

Above test errors are not related to this patch.

 Remove deprecated Avro's Schema.parse usages
 

 Key: HIVE-7432
 URL: https://issues.apache.org/jira/browse/HIVE-7432
 Project: Hive
  Issue Type: Improvement
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
 Attachments: HIVE-7432.1.patch, HIVE-7432.2.patch, HIVE-7432.patch


 Schema.parse has been deprecated by Avro; however, it is still used in 
 multiple places in Hive.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7656) Bring tez-branch up-to the API changes made by TEZ-1372

2014-08-08 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7656:
-

Attachment: HIVE-7656.2.patch

Needed some more changes to work with TEZ-1386.

 Bring tez-branch up-to the API changes made by TEZ-1372
 ---

 Key: HIVE-7656
 URL: https://issues.apache.org/jira/browse/HIVE-7656
 Project: Hive
  Issue Type: Sub-task
Affects Versions: tez-branch
Reporter: Gopal V
Assignee: Gopal V
 Fix For: tez-branch

 Attachments: HIVE-7656.1-tez.patch, HIVE-7656.2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7666) Join selectivity calculation should use exponential back-off for conjunction predicates

2014-08-08 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created HIVE-7666:
-

 Summary: Join selectivity calculation should use exponential 
back-off for conjunction predicates 
 Key: HIVE-7666
 URL: https://issues.apache.org/jira/browse/HIVE-7666
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.13.1
Reporter: Mostafa Mokhtar
Assignee: Laljo John Pullokkaran
 Fix For: 0.14.0


Assuming dependency for predicates (number of column joins & filters) will 
almost always hurt us, as implied correlations do actually exist.

Currently HiveRelMdSelectivity.computeInnerJoinSelectivity uses log to 
smooth the selectivity of conjunction predicates, which results in 
sub-optimal plans.

The problem with log is that it still assumes dependency. For instance, in 
TPC-DS Q17, store_sales has 6 join predicates, which explains why store_sales 
is in the wrong place in the plan.

Change the algorithm to use exponential back-off:

ndv(pe0) * ndv(pe1)^(1/2) * ndv(pe2)^(1/4) * ndv(pe3)^(1/8)

As opposed to:

ndv(pe0) * log(ndv(pe1)) * log(ndv(pe2))

For TPC-DS Q17, store_sales has 6 inner join predicates; if we assume a 
selectivity of 0.7 for each join, then the join selectivity can end up being 
6.24285E-05, which is too low and eventually results in a sub-optimal plan.
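
A small numeric sketch of the two smoothing schemes applied to NDV-style 
factors; the values are illustrative, and this is not the 
HiveRelMdSelectivity code.

{code}
class BackoffSketch {
  // Exponential back-off: ndv(pe0) * ndv(pe1)^(1/2) * ndv(pe2)^(1/4) * ...
  static double exponentialBackoff(double[] ndv) {
    double combined = 1.0;
    double exponent = 1.0;
    for (double v : ndv) {
      combined *= Math.pow(v, exponent);
      exponent /= 2.0; // each extra predicate counts for half as much
    }
    return combined;
  }

  public static void main(String[] args) {
    double[] ndv = {1000, 1000, 1000};
    System.out.printf("back-off: %.1f%n", exponentialBackoff(ndv));
    // Log smoothing for comparison: ndv(pe0) * log(ndv(pe1)) * log(ndv(pe2))
    double logSmoothed = ndv[0] * Math.log(ndv[1]) * Math.log(ndv[2]);
    System.out.printf("log:      %.1f%n", logSmoothed);
  }
}
{code}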



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7656) Bring tez-branch up-to the API changes made by TEZ-1372

2014-08-08 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091440#comment-14091440
 ] 

Vikram Dixit K commented on HIVE-7656:
--

Committed to tez branch. Thanks Gopal!

 Bring tez-branch up-to the API changes made by TEZ-1372
 ---

 Key: HIVE-7656
 URL: https://issues.apache.org/jira/browse/HIVE-7656
 Project: Hive
  Issue Type: Sub-task
Affects Versions: tez-branch
Reporter: Gopal V
Assignee: Gopal V
 Fix For: tez-branch

 Attachments: HIVE-7656.1-tez.patch, HIVE-7656.2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-08 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-7616:
---

Attachment: HIVE-7616.05.patch

Added the expected key count to the explain plan and fixed the column stats 
names. I ran some tests to update the out files; let's see what else fails, 
and I'll update those as well.

 pre-size mapjoin hashtable based on statistics
 --

 Key: HIVE-7616
 URL: https://issues.apache.org/jira/browse/HIVE-7616
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, 
 HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.05.patch, HIVE-7616.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24427: HIVE-7616 pre-size mapjoin hashtable based on statistics

2014-08-08 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24427/
---

(Updated Aug. 8, 2014, 11:36 p.m.)


Review request for hive, Gunther Hagleitner, Mostafa Mokhtar, and Prasanth_J.


Repository: hive-git


Description
---

See jira


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8490558 
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java cf64aa0 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/BytesBytesMultiHashMap.java
 cdb5dc5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HashMapWrapper.java 
5b3b770 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java
 629457c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 6d292d0 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java 
29d895a 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java 44cb9c0 
  ql/src/java/org/apache/hadoop/hive/ql/plan/Statistics.java 4173ea4 
  ql/src/test/queries/clientpositive/mapjoin_mapjoin.q 3f36851 
  ql/src/test/results/clientpositive/bucket_map_join_1.q.out 63fb0d1 
  ql/src/test/results/clientpositive/bucket_map_join_2.q.out 21f2d5a 
  ql/src/test/results/clientpositive/bucketcontext_1.q.out 5212de3 
  ql/src/test/results/clientpositive/bucketcontext_2.q.out d86c430 
  ql/src/test/results/clientpositive/bucketcontext_3.q.out a536e8b 
  ql/src/test/results/clientpositive/bucketcontext_4.q.out 26c8720 
  ql/src/test/results/clientpositive/bucketcontext_5.q.out 2619cfb 
  ql/src/test/results/clientpositive/bucketcontext_6.q.out 4c42ca7 
  ql/src/test/results/clientpositive/bucketcontext_7.q.out 7e5afb5 
  ql/src/test/results/clientpositive/bucketcontext_8.q.out 243b67a 
  ql/src/test/results/clientpositive/bucketmapjoin1.q.out 10f1af4 
  ql/src/test/results/clientpositive/bucketmapjoin10.q.out f852cde 
  ql/src/test/results/clientpositive/bucketmapjoin11.q.out 97e80fb 
  ql/src/test/results/clientpositive/bucketmapjoin12.q.out e486ca5 
  ql/src/test/results/clientpositive/bucketmapjoin2.q.out 297412f 
  ql/src/test/results/clientpositive/bucketmapjoin3.q.out 7f307a0 
  ql/src/test/results/clientpositive/bucketmapjoin4.q.out f0f9aee 
  ql/src/test/results/clientpositive/bucketmapjoin5.q.out 79e1c3d 
  ql/src/test/results/clientpositive/bucketmapjoin8.q.out e504c9d 
  ql/src/test/results/clientpositive/bucketmapjoin9.q.out 18f350a 
  ql/src/test/results/clientpositive/bucketmapjoin_negative.q.out 751e32f 
  ql/src/test/results/clientpositive/bucketmapjoin_negative2.q.out 3eb70d1 
  ql/src/test/results/clientpositive/bucketmapjoin_negative3.q.out 34abe4f 
  ql/src/test/results/clientpositive/join26.q.out bf8cf57 
  ql/src/test/results/clientpositive/join32.q.out ff0d7cc 
  ql/src/test/results/clientpositive/join33.q.out ff0d7cc 
  ql/src/test/results/clientpositive/join34.q.out b52777a 
  ql/src/test/results/clientpositive/join35.q.out 20c69ea 
  ql/src/test/results/clientpositive/join_map_ppr.q.out 51fb6c6 
  ql/src/test/results/clientpositive/mapjoin_mapjoin.q.out 567b0ca 
  ql/src/test/results/clientpositive/sample8.q.out e0c0f9e 
  ql/src/test/results/clientpositive/smb_mapjoin_11.q.out d59b801 
  ql/src/test/results/clientpositive/sort_merge_join_desc_5.q.out ba8928b 
  ql/src/test/results/clientpositive/sort_merge_join_desc_6.q.out d51a54e 
  ql/src/test/results/clientpositive/sort_merge_join_desc_7.q.out fcb6367 
  ql/src/test/results/clientpositive/stats11.q.out c5531c5 
  ql/src/test/results/clientpositive/tez/mapjoin_mapjoin.q.out 9e90ec2 
  ql/src/test/results/clientpositive/transform_ppr1.q.out 6f908fa 
  ql/src/test/results/clientpositive/transform_ppr2.q.out 9285151 
  ql/src/test/results/clientpositive/union22.q.out 884c106 
  ql/src/test/results/clientpositive/union_ppr.q.out ee209c7 

Diff: https://reviews.apache.org/r/24427/diff/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Updated] (HIVE-7616) pre-size mapjoin hashtable based on statistics

2014-08-08 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-7616:
---

Attachment: HIVE-7616.06.patch

Restrict the explain change to Tez.

 pre-size mapjoin hashtable based on statistics
 --

 Key: HIVE-7616
 URL: https://issues.apache.org/jira/browse/HIVE-7616
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, 
 HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.05.patch, 
 HIVE-7616.06.patch, HIVE-7616.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24427: HIVE-7616 pre-size mapjoin hashtable based on statistics

2014-08-08 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24427/
---

(Updated Aug. 8, 2014, 11:42 p.m.)


Review request for hive, Gunther Hagleitner, Mostafa Mokhtar, and Prasanth_J.


Repository: hive-git


Description
---

See jira


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8490558 
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java cf64aa0 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/BytesBytesMultiHashMap.java
 cdb5dc5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HashMapWrapper.java 
5b3b770 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java
 629457c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 6d292d0 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java 
29d895a 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java 44cb9c0 
  ql/src/java/org/apache/hadoop/hive/ql/plan/Statistics.java 4173ea4 
  ql/src/test/queries/clientpositive/mapjoin_mapjoin.q 3f36851 
  ql/src/test/results/clientpositive/mapjoin_mapjoin.q.out 567b0ca 
  ql/src/test/results/clientpositive/tez/mapjoin_mapjoin.q.out 9e90ec2 

Diff: https://reviews.apache.org/r/24427/diff/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Updated] (HIVE-7617) optimize bytes mapjoin hash table read path wrt serialization, at least for common cases

2014-08-08 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-7617:
---

Attachment: HIVE-7617.prelim.patch

Preliminary patch after some experiments. I still need to run tests, and then 
perf tests too.

 optimize bytes mapjoin hash table read path wrt serialization, at least for 
 common cases
 

 Key: HIVE-7617
 URL: https://issues.apache.org/jira/browse/HIVE-7617
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-7617.prelim.patch


 The BytesBytes hash table stores keys in the byte array for a compact 
 representation; however, that means the straightforward implementation of 
 lookups serializes lookup keys to byte arrays, which is relatively expensive.
 We can either shortcut hashcode and compare for common types on the read path 
 (integral types, which would cover most real-world keys), or specialize the 
 hashtable and from BytesBytes... create LongBytes, StringBytes, or whatever. 
 The first option seems simpler for now.
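
To make the shortcut concrete, a sketch of hashing an integral key directly 
instead of serializing it to bytes first; the mixing constant is a generic 
64-bit finalizer, not the exact function Hive uses.

{code}
final class LongKeyHashSketch {
  // Direct path: no serialization, just mix the 64-bit key.
  static int hashLongKey(long key) {
    long h = key * 0x9E3779B97F4A7C15L; // Fibonacci-hashing constant
    h ^= (h >>> 32);
    return (int) h;
  }

  // The slow path this avoids: serialize to bytes, then hash the bytes.
  static byte[] serialize(long key) {
    byte[] buf = new byte[8];
    for (int i = 7; i >= 0; i--) {
      buf[i] = (byte) (key & 0xFF);
      key >>>= 8;
    }
    return buf;
  }
}
{code}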



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7366) getDatabase using direct sql

2014-08-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-7366:
---

Status: Patch Available  (was: Open)

 getDatabase using direct sql
 

 Key: HIVE-7366
 URL: https://issues.apache.org/jira/browse/HIVE-7366
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-7366.patch


 Given that get_database is easily one of the most frequent calls made on the 
 metastore, we should have the ability to bypass DataNucleus for that and use 
 direct SQL instead.
 This was something I did initially as part of debugging HIVE-7368, but given 
 the frequency of this call, I think it's useful to have it in mainline direct 
 SQL.
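
A hypothetical sketch of what the direct-SQL lookup could look like; the DBS 
table and column names follow the metastore schema, but this is a sketch, not 
the patch's actual query.

{code}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class DirectSqlSketch {
  // Bypasses DataNucleus with a single parameterized query.
  static String getDatabaseLocation(Connection conn, String dbName)
      throws SQLException {
    String sql = "select DB_LOCATION_URI from DBS where NAME = ?";
    PreparedStatement ps = conn.prepareStatement(sql);
    try {
      ps.setString(1, dbName);
      ResultSet rs = ps.executeQuery();
      return rs.next() ? rs.getString(1) : null;
    } finally {
      ps.close();
    }
  }
}
{code}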



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7366) getDatabase using direct sql

2014-08-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-7366:
---

Attachment: HIVE-7366.patch

Attaching patch.

 getDatabase using direct sql
 

 Key: HIVE-7366
 URL: https://issues.apache.org/jira/browse/HIVE-7366
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-7366.patch


 Given that get_database is easily one of the most frequent calls made on the 
 metastore, we should have the ability to bypass DataNucleus for that and use 
 direct SQL instead.
 This was something I did initially as part of debugging HIVE-7368, but given 
 the frequency of this call, I think it's useful to have it in mainline direct 
 SQL.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HIVE-7657) Nullable union of 3 or more types is not recognized nullable

2014-08-08 Thread Ashish Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Kumar Singh reassigned HIVE-7657:


Assignee: Ashish Kumar Singh

 Nullable union of 3 or more types is not recognized nullable
 

 Key: HIVE-7657
 URL: https://issues.apache.org/jira/browse/HIVE-7657
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Arkadiusz Gasior
Assignee: Ashish Kumar Singh
  Labels: avro

 Handling a nullable union of 3 or more types causes serialization issues, 
 as [null, long, string] is not recognized as nullable. The code likely 
 causing the issue is this check in AvroSerdeUtils.java: 
 {code}
   public static boolean isNullableType(Schema schema) {
     return schema.getType().equals(Schema.Type.UNION) &&
         schema.getTypes().size() == 2 &&
         (schema.getTypes().get(0).getType().equals(Schema.Type.NULL) ||
          schema.getTypes().get(1).getType().equals(Schema.Type.NULL));
     // [null, null] not allowed, so this check is ok.
   }
 {code}
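
A possible generalization, as a sketch only (not the committed fix): treat a 
union as nullable if any branch is NULL, regardless of how many branches it 
has.

{code}
import org.apache.avro.Schema;

public final class AvroNullableCheck {
  public static boolean isNullableType(Schema schema) {
    if (!schema.getType().equals(Schema.Type.UNION)) {
      return false;
    }
    // Nullable if any branch of the union is NULL, e.g. [null, long, string].
    for (Schema branch : schema.getTypes()) {
      if (branch.getType().equals(Schema.Type.NULL)) {
        return true;
      }
    }
    return false;
  }
}
{code}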



--
This message was sent by Atlassian JIRA
(v6.2#6252)

