[jira] [Created] (HIVE-7908) CBO: Handle Windowing functions part of expressions
Laljo John Pullokkaran created HIVE-7908:

Summary: CBO: Handle Windowing functions part of expressions
Key: HIVE-7908
URL: https://issues.apache.org/jira/browse/HIVE-7908
Project: Hive
Issue Type: Bug
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran

-- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24688: parallel order by clause on a string column fails with IOException: Split points are out of order
On Aug. 28, 2014, 6:05 a.m., Szehon Ho wrote:
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 1040
> https://reviews.apache.org/r/24688/diff/3/?file=669965#file669965line1040
> Yep, that's what I meant.

I think this option is not useful. Any number of reducers bigger than one, which is the default for order-by, will give better performance, so why don't we try with that?

- Navis

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24688/#review51747
---

On Aug. 27, 2014, 2:18 a.m., Navis Ryu wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24688/
---

(Updated Aug. 27, 2014, 2:18 a.m.)

Review request for hive.

Bugs: HIVE-7669
https://issues.apache.org/jira/browse/HIVE-7669

Repository: hive-git

Description
---
The source table has 600 million rows, and it has a string column l_shipinstruct which has 4 unique values (i.e. these 4 values are repeated across the 600 million rows). We are sorting it based on this string column l_shipinstruct, as shown in the HiveQL below, with the following parameters.
{code:sql}
set hive.optimize.sampling.orderby=true;
set hive.optimize.sampling.orderby.number=1000;
set hive.optimize.sampling.orderby.percent=0.1f;

insert overwrite table lineitem_temp_report
select
  l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity,
  l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus,
  l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode,
  l_comment
from lineitem
order by l_shipinstruct;
{code}

Stack Trace

Diagnostic Messages for this Task:
{noformat}
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ...
10 more
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
    at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42)
    at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37)
    ... 15 more
Caused by: java.io.IOException: Split points are out of order
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:96)
    ... 17 more
{noformat}

Diffs
---
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7f4afd9
common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41
ql/src/java/org/apache/hadoop/hive/ql/exec/HiveTotalOrderPartitioner.java 6c22362
ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java 166461a
ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java ef72039
ql/src/test/org/apache/hadoop/hive/ql/exec/TestPartitionKeySampler.java PRE-CREATION

Diff: https://reviews.apache.org/r/24688/diff/

Testing
---

Thanks,
Navis Ryu
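The failure above follows from how sampling interacts with a total-order partitioner: split points must be strictly increasing, but a sample drawn from a column with only 4 distinct values repeated across 600 million rows inevitably produces duplicate split keys. The sketch below is illustrative only (it is not Hive's or Hadoop's actual code), showing why a low-cardinality sort key trips the same "Split points are out of order" check seen in the stack trace:

```python
def split_points(sample, num_reducers):
    """Pick num_reducers - 1 evenly spaced keys from a sorted sample."""
    sample = sorted(sample)
    step = len(sample) / num_reducers
    return [sample[int(step * (i + 1))] for i in range(num_reducers - 1)]

def validate(points):
    """TotalOrderPartitioner-style check: split keys must strictly increase."""
    for prev, cur in zip(points, points[1:]):
        if cur <= prev:
            raise ValueError("Split points are out of order")

# A high-cardinality sample partitions fine.
validate(split_points(range(1000), 10))

# Only 4 distinct values repeated across the sample (like l_shipinstruct):
# evenly spaced picks collide, so validation fails like the trace above.
low_card = ["COLLECT COD", "DELIVER IN PERSON", "NONE", "TAKE BACK RETURN"] * 250
try:
    validate(split_points(low_card, 10))
    failed = False
except ValueError:
    failed = True
```

With 10 reducers over 1000 sampled keys of which only 4 are distinct, neighboring split points are guaranteed to repeat, which is why the patch under review tightens validation and sampling rather than the query itself.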
[jira] [Commented] (HIVE-7857) Hive query fails after Tez session times out
[ https://issues.apache.org/jira/browse/HIVE-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114901#comment-14114901 ]

Hive QA commented on HIVE-7857:
---

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12665141/HIVE-7857.2.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 6127 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_stats_counter
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/552/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/552/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-552/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12665141

Hive query fails after Tez session times out
---
Key: HIVE-7857
URL: https://issues.apache.org/jira/browse/HIVE-7857
Project: Hive
Issue Type: Bug
Components: Tez
Affects Versions: 0.14.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
Attachments: HIVE-7857.1.patch, HIVE-7857.2.patch

Originally reported by [~deepesh]

Steps to reproduce: Open the Hive CLI and ensure that HIVE_AUX_JARS_PATH has hcatalog-core.jar in the path. Keep it idle for more than 5 minutes (this is the default Tez session timeout); essentially, the Tez session should time out. Then run a Hive on Tez query; the query fails. Here is a sample CLI session:
{noformat}
hive> select from_unixtime(unix_timestamp(), dd-MMM-) from vectortab10korc limit 1;
Query ID = hrt_qa_20140626002525_6e964079-4031-406b-85ed-cda9c65dca22
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (application id: application_1403688364015_1930)
Map 1: -/-  Map 1: 0/1  Map 1: 0/1  Map 1: 0/1  Map 1: 0/1  Map 1: 0/1
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1403688364015_1930_1_00, diagnostics=[Task failed, taskId=task_1403688364015_1930_1_00_00, diagnostics=[
AttemptID:attempt_1403688364015_1930_1_00_00_0 Info:Container container_1403688364015_1930_01_02 COMPLETED with diagnostics set to [Resource hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar changed on src filesystem (expected 1403741969169, was 1403742347351)],
AttemptID:attempt_1403688364015_1930_1_00_00_1 Info:Container container_1403688364015_1930_01_03 COMPLETED with diagnostics set to [Resource hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar changed on src filesystem (expected 1403741969169, was 1403742347351)],
AttemptID:attempt_1403688364015_1930_1_00_00_2 Info:Container container_1403688364015_1930_01_04 COMPLETED with diagnostics set to [Resource hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar changed on src filesystem (expected 1403741969169, was 1403742347351)],
AttemptID:attempt_1403688364015_1930_1_00_00_3 Info:Container container_1403688364015_1930_01_05 COMPLETED with diagnostics set to [Resource hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar changed on src filesystem (expected 1403741969169, was 1403742347351)]],
Vertex failed as one or more tasks failed. failedTasks:1]
DAG failed due to vertex failure. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
{noformat}
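The repeated "changed on src filesystem" diagnostic above boils down to a localization consistency check: a recorded modification time for a session-local jar no longer matches what is actually on HDFS after the session was re-established. This is a hedged, illustrative sketch of that check (not Tez's actual code), reproducing the shape of the error message from the log:

```python
def check_resource(name, expected_mtime, actual_mtime):
    """Refuse to reuse a localized resource whose mtime changed."""
    if actual_mtime != expected_mtime:
        raise RuntimeError(
            "Resource %s changed on src filesystem (expected %d, was %d)"
            % (name, expected_mtime, actual_mtime))

# Matching timestamps pass silently.
check_resource("hive-hcatalog-core.jar", 1403741969169, 1403741969169)

# After the session re-opens, the re-uploaded jar has a newer mtime, so
# the container rejects it -- the timestamps here are the ones in the log.
try:
    check_resource("hive-hcatalog-core.jar", 1403741969169, 1403742347351)
    stale = False
except RuntimeError:
    stale = True
```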
[jira] [Commented] (HIVE-7497) Fix some default values in HiveConf
[ https://issues.apache.org/jira/browse/HIVE-7497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114902#comment-14114902 ]

Dong Chen commented on HIVE-7497:
---

[~vgumashta] Thanks for taking care of it. I'm OK with it; please go ahead. Thanks :)

Fix some default values in HiveConf
---
Key: HIVE-7497
URL: https://issues.apache.org/jira/browse/HIVE-7497
Project: Hive
Issue Type: Task
Reporter: Brock Noland
Assignee: Dong Chen
Labels: TODOC14
Fix For: 0.14.0
Attachments: HIVE-7497.1.patch, HIVE-7497.patch

HIVE-5160 resolves an env variable by calling System.getenv(). As long as the variable is not defined when you run the build, null is returned and the path is not placed in hive-default.template. However, if it is defined, it will populate hive-default.template with a path that differs based on the user running the build. We should use ${system:HIVE_CONF_DIR} instead.

-- This message was sent by Atlassian JIRA (v6.2#6252)
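The two behaviors contrasted in the issue above can be sketched as follows. This is an assumed, simplified model (not Hive's actual template generator): resolving the env variable while generating the template bakes the builder's value into hive-default.template, while a ${system:...} placeholder defers resolution until the configuration is loaded.

```python
def render_build_time(template, env):
    # HIVE-5160 behavior: resolve via the environment during the build;
    # an unset variable yields "None" in the generated template.
    return template.replace("__HIVE_CONF_DIR__", str(env.get("HIVE_CONF_DIR")))

def render_runtime_placeholder(template):
    # Proposed behavior: emit a ${system:HIVE_CONF_DIR} reference that
    # HiveConf expands only when the configuration is actually loaded.
    return template.replace("__HIVE_CONF_DIR__", "${system:HIVE_CONF_DIR}")

template = "hive.conf.dir=__HIVE_CONF_DIR__"

baked = render_build_time(template, {"HIVE_CONF_DIR": "/home/builder/conf"})
unset = render_build_time(template, {})
deferred = render_runtime_placeholder(template)
```

The build-time variant produces a template that depends on who ran the build (or "None" when the variable is unset), which is exactly the inconsistency the issue describes; the placeholder variant is stable across builds.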
Re: Review Request 24472: HIVE-7649: Support column stats with temporary tables
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24472/
---

(Updated Aug. 29, 2014, 6:13 a.m.)

Review request for hive and Prasanth_J.

Changes
---
Addressing review feedback from Prasanth

Bugs: HIVE-7649
https://issues.apache.org/jira/browse/HIVE-7649

Repository: hive-git

Description
---
Update SessionHiveMetastoreClient to get column stats to work for temp tables.

Diffs (updated)
---
metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 51c3f2c
ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java 4cf98d8
ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 3f8648b
ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 9798cf3
ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 7cb7c5e
ql/src/test/queries/clientnegative/temp_table_column_stats.q 9b7aa4a
ql/src/test/queries/clientpositive/temp_table_display_colstats_tbllvl.q PRE-CREATION
ql/src/test/results/clientnegative/temp_table_column_stats.q.out 486597a
ql/src/test/results/clientpositive/temp_table_display_colstats_tbllvl.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/24472/diff/

Testing
---

Thanks,
Jason Dere
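The approach in this review — a session-level metastore client that makes column stats work for temp tables — can be sketched roughly as below. Class and method names here are assumptions for illustration, not Hive's actual API: the session client intercepts stats calls for tables it knows are temporary and serves them from session-local state, delegating everything else to the remote metastore.

```python
class RemoteMetastore:
    """Stand-in for the shared metastore service."""
    def __init__(self):
        self.stats = {}
    def get_table_column_statistics(self, table, col):
        return self.stats.get((table, col))

class SessionMetastoreClient:
    """Wraps the remote client; temp-table stats never leave the session."""
    def __init__(self, remote):
        self.remote = remote
        self.temp_tables = set()
        self.temp_stats = {}
    def create_temp_table(self, table):
        self.temp_tables.add(table)
    def update_column_statistics(self, table, col, stats):
        if table in self.temp_tables:
            self.temp_stats[(table, col)] = stats   # session-local only
        else:
            self.remote.stats[(table, col)] = stats
    def get_table_column_statistics(self, table, col):
        if table in self.temp_tables:
            return self.temp_stats.get((table, col))
        return self.remote.get_table_column_statistics(table, col)

client = SessionMetastoreClient(RemoteMetastore())
client.create_temp_table("tmp_t")
client.update_column_statistics("tmp_t", "key", {"numDVs": 4})
```

The design point is that other sessions (and the remote metastore itself) never observe the temp table's stats, mirroring how temp-table metadata stays scoped to the creating session.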
[jira] [Updated] (HIVE-7908) CBO: Handle Windowing functions part of expressions
[ https://issues.apache.org/jira/browse/HIVE-7908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laljo John Pullokkaran updated HIVE-7908:
---
Attachment: HIVE-7908.patch

CBO: Handle Windowing functions part of expressions
---
Key: HIVE-7908
URL: https://issues.apache.org/jira/browse/HIVE-7908
Project: Hive
Issue Type: Bug
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
Attachments: HIVE-7908.patch

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7649) Support column stats with temporary tables
[ https://issues.apache.org/jira/browse/HIVE-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Dere updated HIVE-7649:
---
Attachment: HIVE-7649.4.patch

patch v4, changes per review comments from Prasanth

Support column stats with temporary tables
---
Key: HIVE-7649
URL: https://issues.apache.org/jira/browse/HIVE-7649
Project: Hive
Issue Type: Bug
Components: Statistics
Reporter: Jason Dere
Assignee: Jason Dere
Attachments: HIVE-7649.1.patch, HIVE-7649.2.patch, HIVE-7649.3.patch, HIVE-7649.4.patch

Column stats are currently not supported with temp tables; see if they can be added.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7908) CBO: Handle Windowing functions part of expressions
[ https://issues.apache.org/jira/browse/HIVE-7908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laljo John Pullokkaran updated HIVE-7908:
---
Status: Patch Available (was: Open)

CBO: Handle Windowing functions part of expressions
---
Key: HIVE-7908
URL: https://issues.apache.org/jira/browse/HIVE-7908
Project: Hive
Issue Type: Bug
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
Attachments: HIVE-7908.patch

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7775) enable sample8.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114919#comment-14114919 ]

Szehon Ho commented on HIVE-7775:
---

Hi Chengxiang, sorry, do you mind opening a new JIRA, as this one is already resolved? It's one JIRA per commit.

enable sample8.q.[Spark Branch]
---
Key: HIVE-7775
URL: https://issues.apache.org/jira/browse/HIVE-7775
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Fix For: spark-branch
Attachments: HIVE-7775.1-spark.patch, HIVE-7775.2-spark.patch, HIVE-7775.3-spark.additional.patch

sample8.q contains a join query; this qtest should be enabled after Hive on Spark supports the join operation.

-- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24688: parallel order by clause on a string column fails with IOException: Split points are out of order
On Aug. 28, 2014, 6:05 a.m., Szehon Ho wrote:
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 1040
> https://reviews.apache.org/r/24688/diff/3/?file=669965#file669965line1040
> Yep, that's what I meant.

Navis Ryu wrote:
> I think this option is not useful. Any number of reducers bigger than one, which is the default for order-by, will give better performance, so why don't we try with that?

You mean get rid of the error check? I was just trying to make this option easier to use; if we aren't going to expose it, I'm OK with that.

- Szehon

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24688/#review51747
---

On Aug. 27, 2014, 2:18 a.m., Navis Ryu wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24688/
---

(Updated Aug. 27, 2014, 2:18 a.m.)

Review request for hive.

Bugs: HIVE-7669
https://issues.apache.org/jira/browse/HIVE-7669

Repository: hive-git

Description
---
The source table has 600 million rows, and it has a string column l_shipinstruct which has 4 unique values (i.e. these 4 values are repeated across the 600 million rows). We are sorting it based on this string column l_shipinstruct, as shown in the HiveQL below, with the following parameters.
{code:sql}
set hive.optimize.sampling.orderby=true;
set hive.optimize.sampling.orderby.number=1000;
set hive.optimize.sampling.orderby.percent=0.1f;

insert overwrite table lineitem_temp_report
select
  l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity,
  l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus,
  l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode,
  l_comment
from lineitem
order by l_shipinstruct;
{code}

Stack Trace

Diagnostic Messages for this Task:
{noformat}
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ...
10 more
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
    at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42)
    at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37)
    ... 15 more
Caused by: java.io.IOException: Split points are out of order
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:96)
    ... 17 more
{noformat}

Diffs
---
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7f4afd9
common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41
ql/src/java/org/apache/hadoop/hive/ql/exec/HiveTotalOrderPartitioner.java 6c22362
ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java 166461a
ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java ef72039
ql/src/test/org/apache/hadoop/hive/ql/exec/TestPartitionKeySampler.java PRE-CREATION

Diff: https://reviews.apache.org/r/24688/diff/

Testing
---

Thanks,
Navis Ryu
[jira] [Commented] (HIVE-7907) Bring up tez branch to changes in TEZ-1038, TEZ-1500
[ https://issues.apache.org/jira/browse/HIVE-7907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114925#comment-14114925 ]

Gunther Hagleitner commented on HIVE-7907:
---

+1

Bring up tez branch to changes in TEZ-1038, TEZ-1500
---
Key: HIVE-7907
URL: https://issues.apache.org/jira/browse/HIVE-7907
Project: Hive
Issue Type: Sub-task
Components: Tez
Affects Versions: tez-branch
Reporter: Gopal V
Assignee: Gopal V
Attachments: HIVE-7907.1-tez.patch

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7870) Insert overwrite table query does not generate correct task plan [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Na Yang updated HIVE-7870:
---
Attachment: HIVE-7870.3-spark.patch

Insert overwrite table query does not generate correct task plan [Spark Branch]
---
Key: HIVE-7870
URL: https://issues.apache.org/jira/browse/HIVE-7870
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Na Yang
Assignee: Na Yang
Labels: Spark-M1
Attachments: HIVE-7870.1-spark.patch, HIVE-7870.2-spark.patch, HIVE-7870.3-spark.patch

Insert overwrite table query does not generate the correct task plan when the hive.optimize.union.remove and hive.merge.sparkfiles properties are ON.
{noformat}
set hive.optimize.union.remove=true;
set hive.merge.sparkfiles=true;

insert overwrite table outputTbl1
SELECT * FROM (
  select key, 1 as values from inputTbl1
  union all
  select * FROM (
    SELECT key, count(1) as values from inputTbl1 group by key
    UNION ALL
    SELECT key, 2 as values from inputTbl1
  ) a
) b;

select * from outputTbl1 order by key, values;
{noformat}

query result:
{noformat}
1 1
1 2
2 1
2 2
3 1
3 2
7 1
7 2
8 2
8 2
8 2
{noformat}

expected result:
{noformat}
1 1
1 1
1 2
2 1
2 1
2 2
3 1
3 1
3 2
7 1
7 1
7 2
8 1
8 1
8 2
8 2
8 2
{noformat}

The move work is not working properly, and some data goes missing during the move.

-- This message was sent by Atlassian JIRA (v6.2#6252)
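The expected result in the issue above follows directly from UNION ALL semantics. The sketch below derives it in plain Python; the contents of inputTbl1 are an assumption inferred from the expected output (keys 1, 2, 3, 7 once each and key 8 twice), not stated in the JIRA:

```python
from collections import Counter

input_keys = [1, 2, 3, 7, 8, 8]          # assumed inputTbl1 contents
counts = Counter(input_keys)

branch1 = [(k, 1) for k in input_keys]               # select key, 1
branch2 = sorted(counts.items())                     # count(1) group by key
branch3 = [(k, 2) for k in input_keys]               # select key, 2

# UNION ALL keeps every row from every branch; ORDER BY key, values.
expected = sorted(branch1 + branch2 + branch3)
```

Each key k occurring c times should contribute c rows of (k, 1), one row of (k, c), and c rows of (k, 2) — 17 rows in total for this input, matching the "expected result" block, while the buggy plan dropped rows during the merge/move step.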
[jira] [Commented] (HIVE-7907) Bring up tez branch to changes in TEZ-1038, TEZ-1500
[ https://issues.apache.org/jira/browse/HIVE-7907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114933#comment-14114933 ]

Gopal V commented on HIVE-7907:
---

Looks like this has to wait until 0.5.0-SNAPSHOT gets updated in the Apache snapshots repository.

Bring up tez branch to changes in TEZ-1038, TEZ-1500
---
Key: HIVE-7907
URL: https://issues.apache.org/jira/browse/HIVE-7907
Project: Hive
Issue Type: Sub-task
Components: Tez
Affects Versions: tez-branch
Reporter: Gopal V
Assignee: Gopal V
Attachments: HIVE-7907.1-tez.patch

-- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 25176: HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25176/
---

(Updated Aug. 29, 2014, 6:44 a.m.)

Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.

Bugs: HIVE-7870
https://issues.apache.org/jira/browse/HIVE-7870

Repository: hive-git

Description
---
HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch]

The cause of this problem is that during Spark/Tez task generation, the union file sink operator is cloned into two new file sink operators, and the linked FileSinkDesc info for those new file sink operators is missing. In addition, the two new file sink operators also need to be linked together.

Diffs
---
itests/src/test/resources/testconfiguration.properties 6393671
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 9c808d4
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 5ddc16d
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 379a39c
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290
ql/src/test/queries/clientpositive/union_remove_spark_1.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_10.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_11.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_15.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_16.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_17.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_18.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_19.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_2.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_20.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_21.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_24.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_25.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_3.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_4.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_5.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_6.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_7.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_8.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_9.q PRE-CREATION
ql/src/test/results/clientpositive/spark/sample8.q.out c7e333b
ql/src/test/results/clientpositive/spark/union10.q.out 20c681e
ql/src/test/results/clientpositive/spark/union18.q.out 3f37a0a
ql/src/test/results/clientpositive/spark/union19.q.out 6922fcd
ql/src/test/results/clientpositive/spark/union28.q.out 8bd5218
ql/src/test/results/clientpositive/spark/union29.q.out b9546ef
ql/src/test/results/clientpositive/spark/union3.q.out 3ae6536
ql/src/test/results/clientpositive/spark/union30.q.out 12717a1
ql/src/test/results/clientpositive/spark/union33.q.out b89757f
ql/src/test/results/clientpositive/spark/union4.q.out 6341cd9
ql/src/test/results/clientpositive/spark/union6.q.out 263d9f4
ql/src/test/results/clientpositive/spark/union_remove_spark_1.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_10.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_11.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_15.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_16.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_17.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_18.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_19.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_2.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_20.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_21.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_24.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_25.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_3.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_4.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_5.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_6.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_7.q.out PRE-CREATION
Re: Review Request 25176: HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25176/
---

(Updated Aug. 29, 2014, 6:44 a.m.)

Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.

Changes
---
1. add .q.out for TestCliDriver test for all new Spark .q tests
2. update existing .q.out files because of the plan change

Bugs: HIVE-7870
https://issues.apache.org/jira/browse/HIVE-7870

Repository: hive-git

Description
---
HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch]

The cause of this problem is that during Spark/Tez task generation, the union file sink operator is cloned into two new file sink operators, and the linked FileSinkDesc info for those new file sink operators is missing. In addition, the two new file sink operators also need to be linked together.

Diffs (updated)
---
itests/src/test/resources/testconfiguration.properties 6393671
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 9c808d4
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 5ddc16d
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 379a39c
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290
ql/src/test/queries/clientpositive/union_remove_spark_1.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_10.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_11.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_15.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_16.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_17.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_18.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_19.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_2.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_20.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_21.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_24.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_25.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_3.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_4.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_5.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_6.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_7.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_8.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_9.q PRE-CREATION
ql/src/test/results/clientpositive/spark/sample8.q.out c7e333b
ql/src/test/results/clientpositive/spark/union10.q.out 20c681e
ql/src/test/results/clientpositive/spark/union18.q.out 3f37a0a
ql/src/test/results/clientpositive/spark/union19.q.out 6922fcd
ql/src/test/results/clientpositive/spark/union28.q.out 8bd5218
ql/src/test/results/clientpositive/spark/union29.q.out b9546ef
ql/src/test/results/clientpositive/spark/union3.q.out 3ae6536
ql/src/test/results/clientpositive/spark/union30.q.out 12717a1
ql/src/test/results/clientpositive/spark/union33.q.out b89757f
ql/src/test/results/clientpositive/spark/union4.q.out 6341cd9
ql/src/test/results/clientpositive/spark/union6.q.out 263d9f4
ql/src/test/results/clientpositive/spark/union_remove_spark_1.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_10.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_11.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_15.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_16.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_17.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_18.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_19.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_2.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_20.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_21.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_24.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_25.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_3.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_4.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_5.q.out PRE-CREATION
Re: Review Request 15449: session/operation timeout for hiveserver2
On Aug. 28, 2014, 7:56 a.m., Lefty Leverenz wrote:

Navis Ryu wrote:
> Addressing previous comments, I've revised the validator to describe itself in the description. For the StringSet validator, the description of the conf will start with something like "Expects one of [textfile, sequencefile, rcfile, orc]." and for TimeValidator, it's "Expects a numeric value with timeunit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec)", etc. That is the reason why some part of the description is removed. Could you generate the template and see the result? (cd common; mvn clean package -Phadoop-2 -Pdist -DskipTests). If you don't like this, I'll revert it.

Navis, that is cool to the nth degree! I applied patch 15, generated a template file, and checked each parameter changed by the patch. All the "Expects" phrases look great.

However, non-numeric values are lowercase. For example, hive.exec.orc.encoding.strategy used to say the values are SPEED and COMPRESSION, but now it's "Expects one of [speed, compression]." Are all parameter values case-insensitive? If so, the Configuration Properties docs should mention it.

Two parameters still give units in their descriptions, although that seems to be deliberate:
- hive.server2.long.polling.timeout: "Time in milliseconds that HiveServer2 will wait, ..." (has a non-zero default value, in milliseconds)
- hive.support.quoted.identifiers: "Whether to use quoted identifier. 'none' or 'column' can be used." (goes on to explain what 'none' and 'column' mean)

- Lefty

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/#review51760
---

On Aug. 28, 2014, 2:31 a.m., Navis Ryu wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/
---

(Updated Aug. 28, 2014, 2:31 a.m.)

Review request for hive.
Bugs: HIVE-5799 https://issues.apache.org/jira/browse/HIVE-5799 Repository: hive-git

Description
---
Need a timeout facility for preventing resource leaks from unstable or misbehaving clients.

Diffs
-
common/src/java/org/apache/hadoop/hive/ant/GenHiveTemplate.java 4293b7c
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 74bb863
common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41
itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2SessionTimeout.java PRE-CREATION
service/src/java/org/apache/hive/service/cli/CLIService.java ff5de4a
service/src/java/org/apache/hive/service/cli/OperationState.java 3e15f0c
service/src/java/org/apache/hive/service/cli/operation/Operation.java 0d6436e
service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 2867301
service/src/java/org/apache/hive/service/cli/session/HiveSession.java 270e4a6
service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 84e1c7e
service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 4e5f595
service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java 39d2184
service/src/java/org/apache/hive/service/cli/session/SessionManager.java 17c1c7b
service/src/test/org/apache/hive/service/cli/CLIServiceTest.java d01e819

Diff: https://reviews.apache.org/r/15449/diff/

Testing
---
Confirmed in the local environment.

Thanks,
Navis Ryu
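The self-describing validator behavior discussed in this thread can be sketched roughly as follows. This is a hypothetical simplification for illustration, not Hive's actual `Validator` interface: a StringSet-style validator renders its allowed values as the "Expects one of [...]" prefix that GenHiveTemplate would prepend to a parameter's description, and matches values case-insensitively.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical simplification of the StringSet validator idea: the validator
// can describe its own constraint, so the config description template can be
// generated from it instead of being hand-written.
public class StringSetValidator {
    private final Set<String> expected = new LinkedHashSet<>();

    public StringSetValidator(String... values) {
        for (String v : values) {
            expected.add(v.toLowerCase()); // values are matched case-insensitively
        }
    }

    // Returns null when the value is valid, or an error message otherwise.
    public String validate(String value) {
        if (value != null && expected.contains(value.toLowerCase())) {
            return null;
        }
        return "Invalid value " + value + ". " + toDescription();
    }

    // The self-description that gets prepended to the parameter's doc text.
    public String toDescription() {
        return "Expects one of " + expected + ".";
    }

    public static void main(String[] args) {
        StringSetValidator v =
            new StringSetValidator("textfile", "sequencefile", "rcfile", "orc");
        System.out.println(v.toDescription());
        System.out.println(v.validate("ORC") == null);
    }
}
```

Because the allowed values are lowercased at construction time, the generated description naturally shows lowercase values, which matches the "Expects one of [speed, compression]" behavior Lefty observed.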
Re: Review Request 15449: session/operation timeout for hiveserver2
On Aug. 28, 2014, 7:56 a.m., Lefty Leverenz wrote: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 1523 https://reviews.apache.org/r/15449/diff/10/?file=670860#file670860line1523 Please restore (in seconds) to description and specify other time units that can be used, if any. Not an issue -- my mistake. On Aug. 28, 2014, 7:56 a.m., Lefty Leverenz wrote: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 1529 https://reviews.apache.org/r/15449/diff/10/?file=670860#file670860line1529 Please restore (in seconds) to description and specify other time units that can be used, if any. Not an issue -- my mistake. On Aug. 28, 2014, 7:56 a.m., Lefty Leverenz wrote: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 1601 https://reviews.apache.org/r/15449/diff/10/?file=670860#file670860line1601 Please add time unit information: Accepts time units like d/h/m/s/ms/us/ns. Not an issue -- my mistake. On Aug. 28, 2014, 7:56 a.m., Lefty Leverenz wrote: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 1604 https://reviews.apache.org/r/15449/diff/10/?file=670860#file670860line1604 Please add time unit information: Accepts time units like d/h/m/s/ms/us/ns. Not an issue -- my mistake. On Aug. 28, 2014, 7:56 a.m., Lefty Leverenz wrote: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 1607 https://reviews.apache.org/r/15449/diff/10/?file=670860#file670860line1607 Please add time unit information: Accepts time units like d/h/m/s/ms/us/ns. Not an issue -- my mistake. - Lefty --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/#review51760 --- On Aug. 28, 2014, 2:31 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/ --- (Updated Aug. 28, 2014, 2:31 a.m.) Review request for hive. 
Bugs: HIVE-5799 https://issues.apache.org/jira/browse/HIVE-5799 Repository: hive-git

Description
---
Need a timeout facility for preventing resource leaks from unstable or misbehaving clients.

Diffs
-
common/src/java/org/apache/hadoop/hive/ant/GenHiveTemplate.java 4293b7c
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 74bb863
common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41
itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2SessionTimeout.java PRE-CREATION
service/src/java/org/apache/hive/service/cli/CLIService.java ff5de4a
service/src/java/org/apache/hive/service/cli/OperationState.java 3e15f0c
service/src/java/org/apache/hive/service/cli/operation/Operation.java 0d6436e
service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 2867301
service/src/java/org/apache/hive/service/cli/session/HiveSession.java 270e4a6
service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 84e1c7e
service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 4e5f595
service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java 39d2184
service/src/java/org/apache/hive/service/cli/session/SessionManager.java 17c1c7b
service/src/test/org/apache/hive/service/cli/CLIServiceTest.java d01e819

Diff: https://reviews.apache.org/r/15449/diff/

Testing
---
Confirmed in the local environment.

Thanks,
Navis Ryu
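The "numeric value with time unit" convention the TimeValidator description promises (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec) can be illustrated with a rough sketch. The parsing below is a guess at the general idea for illustration only, not Hive's actual TimeValidator code:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of parsing values like "30s", "100ms", "1d" into a
// canonical nanosecond count, using the unit aliases listed in the
// TimeValidator description (d/day, h/hour, m/min, s/sec, ms/msec,
// us/usec, ns/nsec).
public class TimeSpec {
    public static long toNanos(String value) {
        String s = value.trim().toLowerCase();
        int i = 0;
        while (i < s.length() && Character.isDigit(s.charAt(i))) {
            i++;  // split the leading numeric part from the unit suffix
        }
        long num = Long.parseLong(s.substring(0, i));
        String unit = s.substring(i).trim();
        TimeUnit tu;
        switch (unit) {
            case "d":  case "day":  tu = TimeUnit.DAYS; break;
            case "h":  case "hour": tu = TimeUnit.HOURS; break;
            case "m":  case "min":  tu = TimeUnit.MINUTES; break;
            case "s":  case "sec":  tu = TimeUnit.SECONDS; break;
            case "ms": case "msec": tu = TimeUnit.MILLISECONDS; break;
            case "us": case "usec": tu = TimeUnit.MICROSECONDS; break;
            case "ns": case "nsec": tu = TimeUnit.NANOSECONDS; break;
            default: throw new IllegalArgumentException("Unknown time unit: " + unit);
        }
        return tu.toNanos(num);
    }

    public static void main(String[] args) {
        System.out.println(toNanos("30s"));   // 30 seconds expressed in nanoseconds
        System.out.println(toNanos("100ms")); // 100 milliseconds in nanoseconds
    }
}
```

Storing everything in one canonical unit internally is what lets the descriptions drop hard-coded "(in seconds)" phrases while still accepting any of the listed units.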
Re: Review Request 24472: HIVE-7649: Support column stats with temporary tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24472/#review51875 --- Ship it! Ship It! - Prasanth_J On Aug. 29, 2014, 6:13 a.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24472/ --- (Updated Aug. 29, 2014, 6:13 a.m.) Review request for hive and Prasanth_J. Bugs: HIVE-7649 https://issues.apache.org/jira/browse/HIVE-7649 Repository: hive-git Description --- Update SessionHiveMetastoreClient to get column stats to work for temp tables. Diffs - metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 51c3f2c ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java 4cf98d8 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 3f8648b ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 9798cf3 ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 7cb7c5e ql/src/test/queries/clientnegative/temp_table_column_stats.q 9b7aa4a ql/src/test/queries/clientpositive/temp_table_display_colstats_tbllvl.q PRE-CREATION ql/src/test/results/clientnegative/temp_table_column_stats.q.out 486597a ql/src/test/results/clientpositive/temp_table_display_colstats_tbllvl.q.out PRE-CREATION Diff: https://reviews.apache.org/r/24472/diff/ Testing --- Thanks, Jason Dere
[jira] [Commented] (HIVE-7649) Support column stats with temporary tables
[ https://issues.apache.org/jira/browse/HIVE-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14114955#comment-14114955 ] Prasanth J commented on HIVE-7649: -- LGTM, +1 Support column stats with temporary tables -- Key: HIVE-7649 URL: https://issues.apache.org/jira/browse/HIVE-7649 Project: Hive Issue Type: Bug Components: Statistics Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-7649.1.patch, HIVE-7649.2.patch, HIVE-7649.3.patch, HIVE-7649.4.patch Column stats are currently not supported with temp tables; see if they can be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 15449: session/operation timeout for hiveserver2
On Aug. 28, 2014, 7:56 a.m., Lefty Leverenz wrote: Navis Ryu wrote: Addressing previous comments, I've revised the validators to describe themselves in the description. For the StringSet validator, the description of the conf will start with something like "Expects one of [textfile, sequencefile, rcfile, orc]." and for TimeValidator it's "Expects a numeric value with a time unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec)", etc. That is why some parts of the descriptions were removed. Could you generate the template and see the result? (cd common; mvn clean package -Phadoop-2 -Pdist -DskipTests). If you don't like this, I'll revert it. Lefty Leverenz wrote: Navis, that is cool to the nth degree! I applied patch 15, generated a template file, and checked each parameter changed by the patch. All the "Expects" phrases look great. However, non-numeric values are lowercase. For example, hive.exec.orc.encoding.strategy used to say the values are SPEED and COMPRESSION, but now it's "Expects one of [speed, compression]". Are all parameter values case-insensitive? If so, the Configuration Properties docs should mention it. Two parameters still give units in their descriptions, although that seems to be deliberate:
- hive.server2.long.polling.timeout: "Time in milliseconds that HiveServer2 will wait, ..." (has a non-zero default value, in milliseconds)
- hive.support.quoted.identifiers: "Whether to use quoted identifier. 'none' or 'column' can be used." (goes on to explain what 'none' and 'column' mean)

bq. non-numeric values are lowercase
All values in StringSet are case-insensitive, but if you prefer uppercase strings in the description, that can be done.

bq. Two parameters still give units in their descriptions
I was considering a follow-up issue for applying time validators to the others, but I'll do that here.

- Navis --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/#review51760 --- On Aug. 28, 2014, 2:31 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/ --- (Updated Aug. 28, 2014, 2:31 a.m.) Review request for hive. Bugs: HIVE-5799 https://issues.apache.org/jira/browse/HIVE-5799 Repository: hive-git

Description
---
Need a timeout facility for preventing resource leaks from unstable or misbehaving clients.

Diffs
-
common/src/java/org/apache/hadoop/hive/ant/GenHiveTemplate.java 4293b7c
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 74bb863
common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41
itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2SessionTimeout.java PRE-CREATION
service/src/java/org/apache/hive/service/cli/CLIService.java ff5de4a
service/src/java/org/apache/hive/service/cli/OperationState.java 3e15f0c
service/src/java/org/apache/hive/service/cli/operation/Operation.java 0d6436e
service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 2867301
service/src/java/org/apache/hive/service/cli/session/HiveSession.java 270e4a6
service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 84e1c7e
service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 4e5f595
service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java 39d2184
service/src/java/org/apache/hive/service/cli/session/SessionManager.java 17c1c7b
service/src/test/org/apache/hive/service/cli/CLIServiceTest.java d01e819

Diff: https://reviews.apache.org/r/15449/diff/

Testing
---
Confirmed in the local environment.

Thanks,
Navis Ryu
[jira] [Commented] (HIVE-7904) Missing null check cause NPE when updating join column stats in statistics annotation
[ https://issues.apache.org/jira/browse/HIVE-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14114979#comment-14114979 ] Hive QA commented on HIVE-7904: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665131/HIVE-7904.1.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6127 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/553/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/553/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-553/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12665131 Missing null check cause NPE when updating join column stats in statistics annotation - Key: HIVE-7904 URL: https://issues.apache.org/jira/browse/HIVE-7904 Project: Hive Issue Type: Sub-task Components: Query Processor, Statistics Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Trivial Fix For: 0.13.0 Attachments: HIVE-7904.1.patch Column stats updation in join stats rule annotation can cause NPE if column stats is missing from one relation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7870) Insert overwrite table query does not generate correct task plan [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14114985#comment-14114985 ] Hive QA commented on HIVE-7870: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665274/HIVE-7870.3-spark.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6306 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/103/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/103/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-103/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12665274 Insert overwrite table query does not generate correct task plan [Spark Branch] --- Key: HIVE-7870 URL: https://issues.apache.org/jira/browse/HIVE-7870 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Na Yang Assignee: Na Yang Labels: Spark-M1 Attachments: HIVE-7870.1-spark.patch, HIVE-7870.2-spark.patch, HIVE-7870.3-spark.patch Insert overwrite table query does not generate correct task plan when hive.optimize.union.remove and hive.merge.sparkfiles properties are ON. 
{noformat}
set hive.optimize.union.remove=true;
set hive.merge.sparkfiles=true;

insert overwrite table outputTbl1
SELECT * FROM (
  select key, 1 as values from inputTbl1
  union all
  select * FROM (
    SELECT key, count(1) as values from inputTbl1 group by key
    UNION ALL
    SELECT key, 2 as values from inputTbl1
  ) a
) b;

select * from outputTbl1 order by key, values;
{noformat}
query result:
{noformat}
1 1
1 2
2 1
2 2
3 1
3 2
7 1
7 2
8 2
8 2
8 2
{noformat}
expected result:
{noformat}
1 1
1 1
1 2
2 1
2 1
2 2
3 1
3 1
3 2
7 1
7 1
7 2
8 1
8 1
8 2
8 2
8 2
{noformat}
The move task is not working properly, and some data go missing during the move. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 15449: session/operation timeout for hiveserver2
On Aug. 28, 2014, 7:56 a.m., Lefty Leverenz wrote: Navis Ryu wrote: Addressing previous comments, I've revised the validators to describe themselves in the description. For the StringSet validator, the description of the conf will start with something like "Expects one of [textfile, sequencefile, rcfile, orc]." and for TimeValidator it's "Expects a numeric value with a time unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec)", etc. That is why some parts of the descriptions were removed. Could you generate the template and see the result? (cd common; mvn clean package -Phadoop-2 -Pdist -DskipTests). If you don't like this, I'll revert it. Lefty Leverenz wrote: Navis, that is cool to the nth degree! I applied patch 15, generated a template file, and checked each parameter changed by the patch. All the "Expects" phrases look great. However, non-numeric values are lowercase. For example, hive.exec.orc.encoding.strategy used to say the values are SPEED and COMPRESSION, but now it's "Expects one of [speed, compression]". Are all parameter values case-insensitive? If so, the Configuration Properties docs should mention it. Two parameters still give units in their descriptions, although that seems to be deliberate:
- hive.server2.long.polling.timeout: "Time in milliseconds that HiveServer2 will wait, ..." (has a non-zero default value, in milliseconds)
- hive.support.quoted.identifiers: "Whether to use quoted identifier. 'none' or 'column' can be used." (goes on to explain what 'none' and 'column' mean)
Navis Ryu wrote: bq. non-numeric values are lowercase — All values in StringSet are case-insensitive, but if you prefer uppercase strings in the description, that can be done. bq. Two parameters still give units in their descriptions — I was considering a follow-up issue for applying time validators to the others, but I'll do that here.

bq. ... if you prefer uppercase strings in the description, that can be done.
No, lowercase is better. (If values appeared in uppercase, people would assume it's required.)

- Lefty --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/#review51760 --- On Aug. 28, 2014, 2:31 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/ --- (Updated Aug. 28, 2014, 2:31 a.m.) Review request for hive. Bugs: HIVE-5799 https://issues.apache.org/jira/browse/HIVE-5799 Repository: hive-git

Description
---
Need a timeout facility for preventing resource leaks from unstable or misbehaving clients.

Diffs
-
common/src/java/org/apache/hadoop/hive/ant/GenHiveTemplate.java 4293b7c
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 74bb863
common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41
itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2SessionTimeout.java PRE-CREATION
service/src/java/org/apache/hive/service/cli/CLIService.java ff5de4a
service/src/java/org/apache/hive/service/cli/OperationState.java 3e15f0c
service/src/java/org/apache/hive/service/cli/operation/Operation.java 0d6436e
service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 2867301
service/src/java/org/apache/hive/service/cli/session/HiveSession.java 270e4a6
service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 84e1c7e
service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 4e5f595
service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java 39d2184
service/src/java/org/apache/hive/service/cli/session/SessionManager.java 17c1c7b
service/src/test/org/apache/hive/service/cli/CLIServiceTest.java d01e819

Diff: https://reviews.apache.org/r/15449/diff/

Testing
---
Confirmed in the local environment.

Thanks,
Navis Ryu
[jira] [Updated] (HIVE-7904) Missing null check cause NPE when updating join column stats in statistics annotation
[ https://issues.apache.org/jira/browse/HIVE-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7904: - Resolution: Fixed Fix Version/s: (was: 0.13.0) 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk Missing null check cause NPE when updating join column stats in statistics annotation - Key: HIVE-7904 URL: https://issues.apache.org/jira/browse/HIVE-7904 Project: Hive Issue Type: Sub-task Components: Query Processor, Statistics Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Trivial Fix For: 0.14.0 Attachments: HIVE-7904.1.patch Updating column stats in the join stats annotation rule can cause an NPE if column stats are missing from one relation. -- This message was sent by Atlassian JIRA (v6.2#6252)
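The kind of guard HIVE-7904 describes can be sketched in the abstract. The classes and method below are hypothetical stand-ins, not Hive's actual statistics-annotation code; the point is only that a join-stats update must tolerate a relation that carries no column statistics instead of dereferencing a null entry:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the missing null check: when one side of a join has no
// column statistics for the join key, skip the stats update (here, signal
// with -1) rather than hit a NullPointerException.
public class JoinStatsSketch {
    static class ColStats {
        final long countDistinct;
        ColStats(long countDistinct) { this.countDistinct = countDistinct; }
    }

    // Joint distinct-value estimate for a join key, or -1 when stats are
    // missing on either side. Without the null guard, l.countDistinct or
    // r.countDistinct would throw an NPE.
    static long joinNdv(Map<String, ColStats> left, Map<String, ColStats> right, String col) {
        ColStats l = left.get(col);
        ColStats r = right.get(col);
        if (l == null || r == null) {  // the guard the patch adds
            return -1;
        }
        return Math.min(l.countDistinct, r.countDistinct);
    }

    public static void main(String[] args) {
        Map<String, ColStats> left = new HashMap<>();
        Map<String, ColStats> right = new HashMap<>();
        left.put("key", new ColStats(100));
        // right side has no stats for "key": returns -1 instead of throwing
        System.out.println(joinNdv(left, right, "key"));
    }
}
```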
[jira] [Updated] (HIVE-7627) FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7627: Attachment: HIVE-7627.1-spark.patch Use {{mapPartitionToPairWithContext()}} instead of {{mapPartitionToPair()}} to get access to TaskContext in HiveMapFunction/HiveReduceFunction. *NOTICE*: this patch depends on SPARK-2895; we need to update the Spark dependency to the latest Spark master build after SPARK-2895 is merged. FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch] - Key: HIVE-7627 URL: https://issues.apache.org/jira/browse/HIVE-7627 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7627.1-spark.patch Hive table statistics collection failed in FSStatsPublisher mode, with the following exception on the Spark executor side: {noformat} 14/08/05 16:46:24 WARN hdfs.DFSClient: DataStreamer Exception java.io.FileNotFoundException: ID mismatch. Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at 
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:525) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1442) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525) Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): ID mismatch. 
Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at
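The lease-ID mismatch above is symptomatic of two concurrent tasks writing the same fixed stats file name (tmpstats-0). A minimal sketch of the per-task naming idea follows; the class and method names are hypothetical illustrations (Hive's actual FSStatsPublisher differs), showing only why a task-specific identifier, such as one obtained from Spark's TaskContext, avoids the clash:

```java
// Hedged illustration: derive a stats temp-file name that is unique per task
// attempt, so concurrent tasks in one executor do not fight over a single
// HDFS lease on "tmpstats-0" as in the stack trace above.
public class StatsFileNames {
    static String statsFile(String dir, int partitionId, int attemptNumber) {
        return dir + "/tmpstats-" + partitionId + "_" + attemptNumber;
    }

    public static void main(String[] args) {
        // two concurrent tasks now get distinct files
        System.out.println(statsFile("/tmp/hive/-ext-1", 0, 0));
        System.out.println(statsFile("/tmp/hive/-ext-1", 1, 0));
    }
}
```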
[jira] [Commented] (HIVE-7775) enable sample8.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115002#comment-14115002 ] Chengxiang Li commented on HIVE-7775: - Oh, I got it. enable sample8.q.[Spark Branch] --- Key: HIVE-7775 URL: https://issues.apache.org/jira/browse/HIVE-7775 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Fix For: spark-branch Attachments: HIVE-7775.1-spark.patch, HIVE-7775.2-spark.patch, HIVE-7775.3-spark.additional.patch sample8.q contains a join query; this qtest should be enabled after Hive on Spark supports the join operation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7909) Fix samaple8.q automatic test failure[Spark Branch]
Chengxiang Li created HIVE-7909: --- Summary: Fix samaple8.q automatic test failure[Spark Branch] Key: HIVE-7909 URL: https://issues.apache.org/jira/browse/HIVE-7909 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7843) orc_analyze.q fails with an assertion in FileSinkOperator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115003#comment-14115003 ] Venki Korukanti commented on HIVE-7843: --- Linking SPARK-2895 which is adding support for accessing TaskContext within a function. orc_analyze.q fails with an assertion in FileSinkOperator [Spark Branch] Key: HIVE-7843 URL: https://issues.apache.org/jira/browse/HIVE-7843 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Labels: Spark-M1 Fix For: spark-branch {code} java.lang.AssertionError: data length is different from num of DP columns org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynPartDirectory(FileSinkOperator.java:809) org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:730) org.apache.hadoop.hive.ql.exec.FileSinkOperator.startGroup(FileSinkOperator.java:829) org.apache.hadoop.hive.ql.exec.Operator.defaultStartGroup(Operator.java:502) org.apache.hadoop.hive.ql.exec.Operator.startGroup(Operator.java:525) org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:198) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27) org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) scala.collection.Iterator$class.foreach(Iterator.scala:727) scala.collection.AbstractIterator.foreach(Iterator.scala:1157) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) 
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) org.apache.spark.scheduler.Task.run(Task.scala:54) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7909) Fix samaple8.q automatic test failure[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7909: Attachment: HIVE-7909.1-spark.patch Some stats changed in the explain part; it is now consistent with the sample8.q output in MR mode. Fix samaple8.q automatic test failure[Spark Branch] --- Key: HIVE-7909 URL: https://issues.apache.org/jira/browse/HIVE-7909 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7909.1-spark.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7909) Fix samaple8.q automatic test failure[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7909: Status: Patch Available (was: Open) Fix samaple8.q automatic test failure[Spark Branch] --- Key: HIVE-7909 URL: https://issues.apache.org/jira/browse/HIVE-7909 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7909.1-spark.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7775) enable sample8.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7775: Status: Open (was: Patch Available) enable sample8.q.[Spark Branch] --- Key: HIVE-7775 URL: https://issues.apache.org/jira/browse/HIVE-7775 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Fix For: spark-branch Attachments: HIVE-7775.1-spark.patch, HIVE-7775.2-spark.patch, HIVE-7775.3-spark.additional.patch sample8.q contains a join query; this qtest should be enabled after Hive on Spark supports the join operation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-7775) enable sample8.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li resolved HIVE-7775. - Resolution: Fixed enable sample8.q.[Spark Branch] --- Key: HIVE-7775 URL: https://issues.apache.org/jira/browse/HIVE-7775 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Fix For: spark-branch Attachments: HIVE-7775.1-spark.patch, HIVE-7775.2-spark.patch, HIVE-7775.3-spark.additional.patch sample8.q contains a join query; this qtest should be enabled after Hive on Spark supports the join operation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7775) enable sample8.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115006#comment-14115006 ] Chengxiang Li commented on HIVE-7775: - Hi Szehon, I've created HIVE-7909 to track it. enable sample8.q.[Spark Branch] --- Key: HIVE-7775 URL: https://issues.apache.org/jira/browse/HIVE-7775 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Fix For: spark-branch Attachments: HIVE-7775.1-spark.patch, HIVE-7775.2-spark.patch, HIVE-7775.3-spark.additional.patch sample8.q contains a join query; this qtest should be enabled after Hive on Spark supports the join operation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7909) Fix samaple8.q automatic test failure[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115030#comment-14115030 ] Hive QA commented on HIVE-7909: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665292/HIVE-7909.1-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6266 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/104/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/104/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-104/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12665292 Fix samaple8.q automatic test failure[Spark Branch] --- Key: HIVE-7909 URL: https://issues.apache.org/jira/browse/HIVE-7909 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7909.1-spark.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7557) When reduce is vectorized, dynpart_sort_opt_vectorization.q under Tez fails
[ https://issues.apache.org/jira/browse/HIVE-7557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7557: Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Matt McCline. When reduce is vectorized, dynpart_sort_opt_vectorization.q under Tez fails --- Key: HIVE-7557 URL: https://issues.apache.org/jira/browse/HIVE-7557 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline Fix For: 0.14.0 Attachments: HIVE-7557.1.patch Turned off dynpart_sort_opt_vectorization.q (Tez) since it fails when reduce is vectorized to get HIVE-7029 checked in. Stack trace: {code} Container released by application, AttemptID:attempt_1406747677386_0003_2_00_00_2 Info:Error: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) [Error getting row data with exception java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.LongColumnVector at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:168) at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:159) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.processVectors(ReduceRecordProcessor.java:481) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.processRows(ReduceRecordProcessor.java:371) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:291) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:165) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307) at org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:562) at 
java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:394) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:551) ] at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:188) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307) at org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:562) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:394) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:551) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) [Error getting row data with exception java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.LongColumnVector at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:168) at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:159) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.processVectors(ReduceRecordProcessor.java:481) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.processRows(ReduceRecordProcessor.java:371) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:291) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:165) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307) at org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:562) at 
java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:394) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:551) ] at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.processRows(ReduceRecordProcessor.java:382) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:291) at
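The HIVE-7557 failure above boils down to a writer that was bound to one column-vector type receiving a batch whose column is a different type. The following is an illustrative sketch only, with simplified stand-ins for Hive's column-vector classes (the class and method names below mirror but are not the real Hive implementations), showing how a writer that blindly downcasts to LongColumnVector throws exactly this ClassCastException when handed a DoubleColumnVector:

```java
// Simplified stand-ins for Hive's vectorized column types (illustration only).
abstract class ColumnVector {}

class LongColumnVector extends ColumnVector {
    long[] vector = new long[] {42L};
}

class DoubleColumnVector extends ColumnVector {
    double[] vector = new double[] {42.0};
}

class LongValueWriter {
    // Mirrors the blind downcast pattern in VectorExpressionWriterLong.writeValue:
    // the writer is chosen from the expected schema type, not the actual vector type.
    long read(ColumnVector col, int row) {
        return ((LongColumnVector) col).vector[row];
    }
}

public class CastMismatchDemo {
    public static void main(String[] args) {
        LongValueWriter writer = new LongValueWriter();
        System.out.println(writer.read(new LongColumnVector(), 0)); // fine: 42
        try {
            // Reduce side produced a double-typed column for this position.
            writer.read(new DoubleColumnVector(), 0);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in the trace above");
        }
    }
}
```

The fix in such cases is to make the planner and the vectorized reduce path agree on column types so the writer chosen matches the vector actually produced.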
[jira] [Commented] (HIVE-7803) Enable Hadoop speculative execution may cause corrupt output directory (dynamic partition)
[ https://issues.apache.org/jira/browse/HIVE-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115037#comment-14115037 ] Hive QA commented on HIVE-7803: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665172/HIVE-7803.2.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6127 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/554/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/554/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-554/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12665172 Enable Hadoop speculative execution may cause corrupt output directory (dynamic partition) -- Key: HIVE-7803 URL: https://issues.apache.org/jira/browse/HIVE-7803 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Environment: Reporter: Selina Zhang Assignee: Selina Zhang Priority: Critical Attachments: HIVE-7803.1.patch, HIVE-7803.2.patch One of our users reports intermittent failures due to attempt directories in the input paths. We found that with speculative execution turned on, two mappers tried to commit the task at the same time using the same committed-task path, which caused the corrupt output directory.
The original Pig script: {code} STORE AdvertiserDataParsedClean INTO '$DB_NAME.$ADVERTISER_META_TABLE_NAME' USING org.apache.hcatalog.pig.HCatStorer(); {code} Two mappers attempt_1405021984947_5394024_m_000523_0: KILLED attempt_1405021984947_5394024_m_000523_1: SUCCEEDED attempt_1405021984947_5394024_m_000523_0 was killed right after the commit. As a result, it created corrupt directory as /projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523/ containing part-m-00523 (from attempt_1405021984947_5394024_m_000523_0) and attempt_1405021984947_5394024_m_000523_1/part-m-00523 Namenode Audit log == 1. 2014-08-05 05:04:36,811 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 cmd=create src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0/part-m-00523 dst=null perm=user:group:rw-r- 2. 2014-08-05 05:04:53,112 INFO FSNamesystem.audit: ugi=* ip=ipaddress2 cmd=create src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1/part-m-00523 dst=null perm=user:group:rw-r- 3. 2014-08-05 05:05:13,001 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 cmd=rename src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523 perm=user:group:rwxr-x--- 4. 
2014-08-05 05:05:13,004 INFO FSNamesystem.audit: ugi=* ip=ipaddress2 cmd=rename src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523 perm=user:group:rwxr-x--- After consulting our Hadoop core team, we were told that some HCat code does not participate in the two-phase commit protocol, for example in FileRecordWriterContainer.close(): {code} for (Map.Entry<String, org.apache.hadoop.mapred.OutputCommitter> entry : baseDynamicCommitters.entrySet()) { org.apache.hadoop.mapred.TaskAttemptContext currContext = dynamicContexts.get(entry.getKey()); OutputCommitter baseOutputCommitter = entry.getValue(); if (baseOutputCommitter.needsTaskCommit(currContext)) {
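The core problem is that committing a task from the record writer's close() path means every attempt commits itself, so two speculative attempts of the same task can both rename into the same committed-task path. The sketch below is illustrative only (class and attempt names are hypothetical): it models the "exactly one attempt may commit a task" rule that the framework's two-phase commit protocol provides and that the close()-path commit bypasses, using a single compare-and-set gate:

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative model: the framework arbitrates commit so that only one
// attempt of a task may promote its output. Committing inside close()
// skips this arbitration, allowing both speculative attempts to "win".
public class CommitArbiter {
    private final AtomicReference<String> committed = new AtomicReference<>();

    // Returns true only for the first attempt that asks to commit this task.
    public boolean tryCommit(String attemptId) {
        return committed.compareAndSet(null, attemptId);
    }

    public static void main(String[] args) {
        CommitArbiter arbiter = new CommitArbiter();
        // Two speculative attempts of the same task reach the commit step.
        boolean first = arbiter.tryCommit("attempt_..._m_000523_0");
        boolean second = arbiter.tryCommit("attempt_..._m_000523_1");
        // Exactly one rename into the committed task path succeeds.
        System.out.println(first + " " + second); // true false
    }
}
```

In the audit log above, both attempts performed the rename (steps 3 and 4) because no such gate was consulted, producing the mixed-contents task directory.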
[jira] [Commented] (HIVE-7906) Missing Index on Hive metastore query
[ https://issues.apache.org/jira/browse/HIVE-7906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115040#comment-14115040 ] Hive QA commented on HIVE-7906: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665188/HIVE-456.patch.txt Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/556/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/556/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-556/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee 
/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-556/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1621263. At revision 1621263. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch patch: Only garbage was found in the patch input. patch: Only garbage was found in the patch input. patch: Only garbage was found in the patch input. The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12665188 Missing Index on Hive metastore query - Key: HIVE-7906 URL: https://issues.apache.org/jira/browse/HIVE-7906 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.1 Reporter: Chu Tong Attachments: HIVE-456.patch.txt When it comes to SELECT statement on a table with large number of partitions on Windows Azure DB, the query in the word document below causes major performance degradation. Adding this missing index to turn index scan into seek. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7908) CBO: Handle Windowing functions part of expressions
[ https://issues.apache.org/jira/browse/HIVE-7908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115041#comment-14115041 ] Hive QA commented on HIVE-7908: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665264/HIVE-7908.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/557/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/557/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-557/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee 
/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-557/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1621263. At revision 1621263. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12665264 CBO: Handle Windowing functions part of expressions --- Key: HIVE-7908 URL: https://issues.apache.org/jira/browse/HIVE-7908 Project: Hive Issue Type: Bug Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-7908.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5799) session/operation timeout for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5799: Attachment: HIVE-5799.16.patch.txt session/operation timeout for hiveserver2 - Key: HIVE-5799 URL: https://issues.apache.org/jira/browse/HIVE-5799 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-5799.1.patch.txt, HIVE-5799.10.patch.txt, HIVE-5799.11.patch.txt, HIVE-5799.12.patch.txt, HIVE-5799.13.patch.txt, HIVE-5799.14.patch.txt, HIVE-5799.15.patch.txt, HIVE-5799.16.patch.txt, HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt, HIVE-5799.5.patch.txt, HIVE-5799.6.patch.txt, HIVE-5799.7.patch.txt, HIVE-5799.8.patch.txt, HIVE-5799.9.patch.txt A timeout facility is needed to prevent resource leaks from unstable or misbehaving clients. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 15449: session/operation timeout for hiveserver2
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/ --- (Updated Aug. 29, 2014, 9:05 a.m.) Review request for hive. Changes --- Addressed comments Bugs: HIVE-5799 https://issues.apache.org/jira/browse/HIVE-5799 Repository: hive-git Description --- Need some timeout facility for preventing resource leakages from instable or bad clients. Diffs (updated) - common/src/java/org/apache/hadoop/hive/ant/GenHiveTemplate.java 4293b7c common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 74bb863 common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41 itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestRetryingHMSHandler.java 39e7005 itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2SessionTimeout.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 9e3481a metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 4e76236 metastore/src/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java 84e6dcd metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 063dee6 metastore/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 8287c60 ql/src/java/org/apache/hadoop/hive/ql/exec/AutoProgressor.java d7323cb ql/src/java/org/apache/hadoop/hive/ql/exec/Heartbeater.java 7fdb4e7 ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java 5b857e2 ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java afd7bcf ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 70047a2 ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java eb2851b ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java ebe9f92 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java 11434a0 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 46044d0 ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java f636cff 
ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java db62721 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 3211759 ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java f34b5ad ql/src/test/results/clientnegative/set_hiveconf_validation2.q.out 33f9360 service/src/java/org/apache/hadoop/hive/service/HiveServer.java 32729f2 service/src/java/org/apache/hive/service/cli/CLIService.java ff5de4a service/src/java/org/apache/hive/service/cli/OperationState.java 3e15f0c service/src/java/org/apache/hive/service/cli/operation/Operation.java 0d6436e service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 2867301 service/src/java/org/apache/hive/service/cli/session/HiveSession.java 270e4a6 service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 84e1c7e service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 4e5f595 service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java 7668904 service/src/java/org/apache/hive/service/cli/session/SessionManager.java 17c1c7b service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 86ed4b4 service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java 21d1563 service/src/test/org/apache/hive/service/cli/CLIServiceTest.java d01e819 Diff: https://reviews.apache.org/r/15449/diff/ Testing --- Confirmed in the local environment. Thanks, Navis Ryu
[jira] [Commented] (HIVE-7811) Compactions need to update table/partition stats
[ https://issues.apache.org/jira/browse/HIVE-7811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115039#comment-14115039 ] Hive QA commented on HIVE-7811: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665185/HIVE-7811.3.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/555/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/555/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-555/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee 
/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-555/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . Reverted 'hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/DynamicPartitionFileRecordWriterContainer.java' Reverted 'hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileOutputCommitterContainer.java' ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target accumulo-handler/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target + svn update Uql/src/test/results/clientpositive/tez/dynpart_sort_opt_vectorization.q.out Uql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java U ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java Fetching external item into 'hcatalog/src/test/e2e/harness' Updated external to revision 1621263. Updated to revision 1621263. 
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12665185 Compactions need to update table/partition stats Key: HIVE-7811 URL: https://issues.apache.org/jira/browse/HIVE-7811 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 0.13.1 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-7811.3.patch Compactions should trigger stats recalculation for columns that already have stats. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7669) parallel order by clause on a string column fails with IOException: Split points are out of order
[ https://issues.apache.org/jira/browse/HIVE-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7669: Attachment: HIVE-7669.4.patch.txt parallel order by clause on a string column fails with IOException: Split points are out of order - Key: HIVE-7669 URL: https://issues.apache.org/jira/browse/HIVE-7669 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor, SQL Affects Versions: 0.12.0 Environment: Hive 0.12.0-cdh5.0.0 OS: Redhat linux Reporter: Vishal Kamath Assignee: Navis Labels: orderby Attachments: HIVE-7669.1.patch.txt, HIVE-7669.2.patch.txt, HIVE-7669.3.patch.txt, HIVE-7669.4.patch.txt The source table has 600 Million rows and it has a String column l_shipinstruct which has 4 unique values. (Ie. these 4 values are repeated across the 600 million rows) We are sorting it based on this string column l_shipinstruct as shown in the below HiveQL with the following parameters. {code:sql} set hive.optimize.sampling.orderby=true; set hive.optimize.sampling.orderby.number=1000; set hive.optimize.sampling.orderby.percent=0.1f; insert overwrite table lineitem_temp_report select l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment from lineitem order by l_shipinstruct; {code} Stack Trace Diagnostic Messages for this Task: {noformat} Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at 
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 10 more Caused by: java.lang.IllegalArgumentException: Can't read partitions file at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116) at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42) at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37) ... 15 more Caused by: java.io.IOException: Split points are out of order at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:96) ... 17 more {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
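The root cause behind "Split points are out of order" is that TotalOrderPartitioner requires a strictly increasing sequence of split points, but sampling 1000 keys from a column with only 4 distinct values inevitably produces duplicates. The sketch below is illustrative only (the validation and dedup helpers are simplified stand-ins, not Hive's actual code, and the sample values are just example strings); the attached patches pursue this kind of fix, though the exact approach may differ:

```java
import java.util.Arrays;

// Illustration of the TotalOrderPartitioner invariant that fails here:
// sampled split points must be strictly increasing, so duplicated keys
// from a low-cardinality column trip the check.
public class SplitPointCheck {
    static void validate(String[] splitPoints) {
        for (int i = 1; i < splitPoints.length; i++) {
            if (splitPoints[i].compareTo(splitPoints[i - 1]) <= 0) {
                throw new IllegalArgumentException("Split points are out of order");
            }
        }
    }

    // One remedy: drop duplicates from the sorted samples before writing
    // the partition file (fewer split points means fewer reducers used).
    static String[] dedup(String[] sorted) {
        return Arrays.stream(sorted).distinct().toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Sorted samples of a 4-value column collapse to repeated keys.
        String[] sampled = {"COLLECT COD", "COLLECT COD", "DELIVER IN PERSON", "NONE", "NONE"};
        try {
            validate(sampled);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // same error as the stack trace
        }
        validate(dedup(sampled)); // passes once duplicates are removed
        System.out.println("ok after dedup");
    }
}
```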
Re: Review Request 24688: parallel order by clause on a string column fails with IOException: Split points are out of order
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24688/ --- (Updated Aug. 29, 2014, 9:08 a.m.) Review request for hive. Changes --- Removed the conf, as commented Bugs: HIVE-7669 https://issues.apache.org/jira/browse/HIVE-7669 Repository: hive-git Description --- The source table has 600 Million rows and it has a String column l_shipinstruct which has 4 unique values. (Ie. these 4 values are repeated across the 600 million rows) We are sorting it based on this string column l_shipinstruct as shown in the below HiveQL with the following parameters. {code:sql} set hive.optimize.sampling.orderby=true; set hive.optimize.sampling.orderby.number=1000; set hive.optimize.sampling.orderby.percent=0.1f; insert overwrite table lineitem_temp_report select l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment from lineitem order by l_shipinstruct; {code} Stack Trace Diagnostic Messages for this Task: {noformat} Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.reflect.InvocationTargetException at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 10 more Caused by: java.lang.IllegalArgumentException: Can't read partitions file at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116) at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42) at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37) ... 15 more Caused by: java.io.IOException: Split points are out of order at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:96) ... 17 more {noformat} Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 74bb863 common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41 ql/src/java/org/apache/hadoop/hive/ql/exec/HiveTotalOrderPartitioner.java 6c22362 ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java 166461a ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java ef72039 ql/src/test/org/apache/hadoop/hive/ql/exec/TestPartitionKeySampler.java PRE-CREATION Diff: https://reviews.apache.org/r/24688/diff/ Testing --- Thanks, Navis Ryu
[jira] [Resolved] (HIVE-7910) Enhance natural order scheduler to prevent downstream vertex from monopolizing the cluster resources
[ https://issues.apache.org/jira/browse/HIVE-7910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan resolved HIVE-7910.
    Resolution: Won't Fix

Apologies, this was meant for the Tez project. Closing this bug.

Enhance natural order scheduler to prevent downstream vertex from monopolizing the cluster resources

Key: HIVE-7910
URL: https://issues.apache.org/jira/browse/HIVE-7910
Project: Hive
Issue Type: Bug
Reporter: Rajesh Balamohan
Labels: performance

{noformat}
M2 M7 \ / (sg) \/ R3/ (b) \ / (b) \ / \ / M5 | R6
{noformat}

Please refer to the attachment (task runtime SVG). In this case, M5 got scheduled much earlier than R3 (R3 is shown in green in the diagram) and retained lots of containers, so R3 got fewer containers to work with. Attaching the output from the status monitor when the job ran: Map_5 has taken up almost all containers, whereas Reducer_3 got a fraction of the capacity.

{noformat}
Map_2: 1/1  Map_5: 0(+373)/1000  Map_7: 1/1  Reducer_3: 0/8000        Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 0/8000        Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 0(+1)/8000    Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 14(+7)/8000   Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 63(+14)/8000  Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 159(+22)/8000 Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 308(+29)/8000 Reducer_6: 0/1
...
{noformat}

Creating this JIRA as a placeholder for scheduler enhancement. One possibility could be to schedule fewer tasks in downstream vertices, based on the information available for the upstream vertex.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7910) Enhance natural order scheduler to prevent downstream vertex from monopolizing the cluster resources
Rajesh Balamohan created HIVE-7910:
--
Summary: Enhance natural order scheduler to prevent downstream vertex from monopolizing the cluster resources
Key: HIVE-7910
URL: https://issues.apache.org/jira/browse/HIVE-7910
Project: Hive
Issue Type: Bug
Reporter: Rajesh Balamohan

{noformat}
M2 M7 \ / (sg) \/ R3/ (b) \ / (b) \ / \ / M5 | R6
{noformat}

Please refer to the attachment (task runtime SVG). In this case, M5 got scheduled much earlier than R3 (R3 is shown in green in the diagram) and retained lots of containers, so R3 got fewer containers to work with. Attaching the output from the status monitor when the job ran: Map_5 has taken up almost all containers, whereas Reducer_3 got a fraction of the capacity.

{noformat}
Map_2: 1/1  Map_5: 0(+373)/1000  Map_7: 1/1  Reducer_3: 0/8000        Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 0/8000        Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 0(+1)/8000    Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 14(+7)/8000   Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 63(+14)/8000  Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 159(+22)/8000 Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 308(+29)/8000 Reducer_6: 0/1
...
{noformat}

Creating this JIRA as a placeholder for scheduler enhancement. One possibility could be to schedule fewer tasks in downstream vertices, based on the information available for the upstream vertex.

-- This message was sent by Atlassian JIRA (v6.2#6252)
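The placeholder idea above (throttle a vertex's tasks using progress information from the vertex it depends on) can be sketched as a simple slow-start policy, similar in spirit to what Tez's shuffle vertex manager exposes via min/max source-fraction settings. The method name and thresholds below are illustrative assumptions, not an actual Tez or Hive API.

```java
public class SlowStart {
    // Hypothetical slow-start policy: schedule no downstream tasks until a
    // minimum fraction of the upstream vertex has finished, then ramp up
    // linearly until a maximum fraction, at which point all tasks may run.
    public static int allowedDownstreamTasks(int totalDownstream,
                                             double upstreamDoneFraction,
                                             double minFraction,
                                             double maxFraction) {
        if (upstreamDoneFraction < minFraction) {
            return 0;                       // hold back: upstream barely started
        }
        if (upstreamDoneFraction >= maxFraction) {
            return totalDownstream;         // upstream nearly done: open the gates
        }
        double ramp = (upstreamDoneFraction - minFraction)
                    / (maxFraction - minFraction);
        return (int) (ramp * totalDownstream);
    }

    public static void main(String[] args) {
        // With min=0.25 and max=0.75 for an 8000-task reducer (like Reducer_3):
        System.out.println(allowedDownstreamTasks(8000, 0.10, 0.25, 0.75)); // 0
        System.out.println(allowedDownstreamTasks(8000, 0.50, 0.25, 0.75)); // 4000
        System.out.println(allowedDownstreamTasks(8000, 0.90, 0.25, 0.75)); // 8000
    }
}
```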
[jira] [Commented] (HIVE-7649) Support column stats with temporary tables
[ https://issues.apache.org/jira/browse/HIVE-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115083#comment-14115083 ] Hive QA commented on HIVE-7649: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665265/HIVE-7649.4.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6127 tests executed *Failed tests:* {noformat} org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/558/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/558/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-558/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12665265 Support column stats with temporary tables -- Key: HIVE-7649 URL: https://issues.apache.org/jira/browse/HIVE-7649 Project: Hive Issue Type: Bug Components: Statistics Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-7649.1.patch, HIVE-7649.2.patch, HIVE-7649.3.patch, HIVE-7649.4.patch Column stats currently not supported with temp tables, see if they can be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7669) parallel order by clause on a string column fails with IOException: Split points are out of order
[ https://issues.apache.org/jira/browse/HIVE-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115160#comment-14115160 ] Hive QA commented on HIVE-7669: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665301/HIVE-7669.4.patch.txt {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6128 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/559/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/559/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-559/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12665301 parallel order by clause on a string column fails with IOException: Split points are out of order - Key: HIVE-7669 URL: https://issues.apache.org/jira/browse/HIVE-7669 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor, SQL Affects Versions: 0.12.0 Environment: Hive 0.12.0-cdh5.0.0 OS: Redhat linux Reporter: Vishal Kamath Assignee: Navis Labels: orderby Attachments: HIVE-7669.1.patch.txt, HIVE-7669.2.patch.txt, HIVE-7669.3.patch.txt, HIVE-7669.4.patch.txt The source table has 600 Million rows and it has a String column l_shipinstruct which has 4 unique values. (Ie. 
these 4 values are repeated across the 600 million rows.) We sort on this string column l_shipinstruct, as shown in the HiveQL below, with the following parameters.

{code:sql}
set hive.optimize.sampling.orderby=true;
set hive.optimize.sampling.orderby.number=1000;
set hive.optimize.sampling.orderby.percent=0.1f;

insert overwrite table lineitem_temp_report
select
  l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity,
  l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus,
  l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode,
  l_comment
from lineitem
order by l_shipinstruct;
{code}

Stack Trace

Diagnostic Messages for this Task:
{noformat}
Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 10 more
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
	at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
	at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42)
	at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37)
	... 15 more
Caused by: java.io.IOException: Split points are out of order at
[jira] [Commented] (HIVE-5799) session/operation timeout for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115227#comment-14115227 ] Hive QA commented on HIVE-5799: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665300/HIVE-5799.16.patch.txt {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 6128 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_conf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt10 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_set_hiveconf_validation2 org.apache.hadoop.hive.ql.txn.compactor.TestInitiator.recoverFailedRemoteWorkers org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/560/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/560/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-560/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12665300

session/operation timeout for hiveserver2

Key: HIVE-5799
URL: https://issues.apache.org/jira/browse/HIVE-5799
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
Attachments: HIVE-5799.1.patch.txt, HIVE-5799.10.patch.txt, HIVE-5799.11.patch.txt, HIVE-5799.12.patch.txt, HIVE-5799.13.patch.txt, HIVE-5799.14.patch.txt, HIVE-5799.15.patch.txt, HIVE-5799.16.patch.txt, HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt, HIVE-5799.5.patch.txt, HIVE-5799.6.patch.txt, HIVE-5799.7.patch.txt, HIVE-5799.8.patch.txt, HIVE-5799.9.patch.txt

Need a timeout facility to prevent resource leaks from unstable or misbehaving clients.

-- This message was sent by Atlassian JIRA (v6.2#6252)
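The timeout facility requested here boils down to a background reaper that closes sessions idle longer than a configured threshold. The sketch below shows that core check in isolation; the Session class and method names are illustrative, not the actual HiveServer2 SessionManager API.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class SessionReaper {
    // Simplified stand-in for a server-side session; in HiveServer2 closing
    // it would also release operation handles, locks, and scratch dirs.
    static class Session {
        final String id;
        long lastAccessTime;   // millis; refreshed on each client call
        boolean closed;
        Session(String id, long lastAccessTime) {
            this.id = id;
            this.lastAccessTime = lastAccessTime;
        }
    }

    // Close every open session idle longer than timeoutMs; a real server
    // would run this periodically from a scheduled background thread.
    public static int closeIdleSessions(Collection<Session> sessions,
                                        long now, long timeoutMs) {
        int closedCount = 0;
        for (Session s : sessions) {
            if (!s.closed && now - s.lastAccessTime > timeoutMs) {
                s.closed = true;
                closedCount++;
            }
        }
        return closedCount;
    }

    public static void main(String[] args) {
        List<Session> sessions = Arrays.asList(
            new Session("s1", 0L),         // idle since t=0
            new Session("s2", 3_000_000L)  // recently active
        );
        // At t = 1 hour with a 30-minute timeout, only s1 is reaped.
        System.out.println(closeIdleSessions(sessions, 3_600_000L, 1_800_000L));
    }
}
```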
Re: Review Request 15449: session/operation timeout for hiveserver2
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/#review51882 --- Ship it! Only minor comments mostly on lines exceeding Checkstyle's configuration. common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90540 long line common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90541 long line common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90542 long line common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90543 long lines common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90544 long line common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90545 long line common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90546 long line common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90547 long line common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90548 long line common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90549 long lines common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90550 long line common/src/java/org/apache/hadoop/hive/conf/Validator.java https://reviews.apache.org/r/15449/#comment90551 missing @Override metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java https://reviews.apache.org/r/15449/#comment90552 long line metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java https://reviews.apache.org/r/15449/#comment90553 long line ql/src/java/org/apache/hadoop/hive/ql/exec/Heartbeater.java https://reviews.apache.org/r/15449/#comment90554 This comment needs 
updating ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java https://reviews.apache.org/r/15449/#comment90555 extra space before semicolon service/src/java/org/apache/hive/service/cli/operation/Operation.java https://reviews.apache.org/r/15449/#comment90556 + - can be changed to just -, maybe warrants a comment. - Lars Francke On Aug. 29, 2014, 9:05 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/ --- (Updated Aug. 29, 2014, 9:05 a.m.) Review request for hive. Bugs: HIVE-5799 https://issues.apache.org/jira/browse/HIVE-5799 Repository: hive-git Description --- Need some timeout facility for preventing resource leakages from instable or bad clients. Diffs - common/src/java/org/apache/hadoop/hive/ant/GenHiveTemplate.java 4293b7c common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 74bb863 common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41 itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestRetryingHMSHandler.java 39e7005 itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2SessionTimeout.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 9e3481a metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 4e76236 metastore/src/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java 84e6dcd metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 063dee6 metastore/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 8287c60 ql/src/java/org/apache/hadoop/hive/ql/exec/AutoProgressor.java d7323cb ql/src/java/org/apache/hadoop/hive/ql/exec/Heartbeater.java 7fdb4e7 ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java 5b857e2 ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java afd7bcf ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 70047a2 ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java eb2851b 
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java ebe9f92 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java 11434a0 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 46044d0 ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java f636cff ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java db62721
[jira] [Updated] (HIVE-7405) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7405: --- Status: Patch Available (was: Open) +1 Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic) -- Key: HIVE-7405 URL: https://issues.apache.org/jira/browse/HIVE-7405 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7405.1.patch, HIVE-7405.2.patch, HIVE-7405.3.patch, HIVE-7405.4.patch, HIVE-7405.5.patch, HIVE-7405.6.patch, HIVE-7405.7.patch, HIVE-7405.8.patch, HIVE-7405.9.patch, HIVE-7405.91.patch, HIVE-7405.92.patch, HIVE-7405.93.patch Vectorize the basic case that does not have any count distinct aggregation. Add a 4th processing mode in VectorGroupByOperator for reduce where each input VectorizedRowBatch has only values for one key at a time. Thus, the values in the batch can be aggregated quickly. -- This message was sent by Atlassian JIRA (v6.2#6252)
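The reduce-side mode described in HIVE-7405 is cheap precisely because each incoming batch carries values for only one grouping key: the aggregate degenerates into a tight loop over a column vector with no per-row hash lookup. A minimal self-contained sketch of that idea (illustrative names, not Hive's actual VectorGroupByOperator):

```java
public class OneKeyBatchAgg {
    // When a batch is known to hold values for a single key, a sum aggregate
    // is just a scan over the primitive array up to the batch's row count.
    public static long sumBatch(long[] values, int size) {
        long sum = 0;
        for (int i = 0; i < size; i++) {
            sum += values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // One batch, one key: aggregate the whole batch in a single pass.
        long[] batchForKeyA = {1, 2, 3, 4};
        System.out.println(sumBatch(batchForKeyA, batchForKeyA.length)); // 10
    }
}
```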
[jira] [Created] (HIVE-7911) Guaranteed ClassCastException in AccumuloRangeGenerator
Lars Francke created HIVE-7911: -- Summary: Guaranteed ClassCastException in AccumuloRangeGenerator Key: HIVE-7911 URL: https://issues.apache.org/jira/browse/HIVE-7911 Project: Hive Issue Type: Bug Reporter: Lars Francke AccumuloRangeGenerator has a typo where it should say {{WritableConstantFloatObjectInspector}} instead of {{WritableConstantDoubleObjectInspector}}. I've changed the method to avoid the multiple if-else statements as all that is expected is a {{PrimitiveObjectInspector}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
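The bug pattern described (an instanceof check for one concrete inspector paired with a cast to a different one) and the proposed fix (accept the shared interface and drop the per-type branches) can be shown with a self-contained analogue. This is deliberately not Hive's ObjectInspector API; the nested types are hypothetical stand-ins.

```java
public class InspectorRefactor {
    interface PrimitiveInspector { Object constantValue(); }
    static class FloatInspector implements PrimitiveInspector {
        public Object constantValue() { return 1.5f; }
    }
    static class DoubleInspector implements PrimitiveInspector {
        public Object constantValue() { return 2.5d; }
    }

    // Buggy shape (analogue of the reported typo): the branch tests for
    // FloatInspector but casts to DoubleInspector, so it throws a
    // ClassCastException on every float constant it is meant to handle.
    public static Object buggy(PrimitiveInspector oi) {
        if (oi instanceof FloatInspector) {
            return ((DoubleInspector) oi).constantValue();  // guaranteed CCE
        }
        return null;
    }

    // Fixed shape: every branch collapses into one call on the shared
    // interface, leaving no concrete cast to get wrong.
    public static Object fixed(PrimitiveInspector oi) {
        return oi.constantValue();
    }

    public static void main(String[] args) {
        System.out.println(fixed(new FloatInspector()));  // 1.5
        try {
            buggy(new FloatInspector());
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as the issue title says");
        }
    }
}
```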
[jira] [Assigned] (HIVE-7911) Guaranteed ClassCastException in AccumuloRangeGenerator
[ https://issues.apache.org/jira/browse/HIVE-7911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Francke reassigned HIVE-7911: -- Assignee: Lars Francke Guaranteed ClassCastException in AccumuloRangeGenerator --- Key: HIVE-7911 URL: https://issues.apache.org/jira/browse/HIVE-7911 Project: Hive Issue Type: Bug Reporter: Lars Francke Assignee: Lars Francke Attachments: HIVE-7911.1.patch AccumuloRangeGenerator has a typo where it should say {{WritableConstantFloatObjectInspector}} instead of {{WritableConstantDoubleObjectInspector}}. I've changed the method to avoid the multiple if-else statements as all that is expected is a {{PrimitiveObjectInspector}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7911) Guaranteed ClassCastException in AccumuloRangeGenerator
[ https://issues.apache.org/jira/browse/HIVE-7911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Francke updated HIVE-7911: --- Attachment: HIVE-7911.1.patch Guaranteed ClassCastException in AccumuloRangeGenerator --- Key: HIVE-7911 URL: https://issues.apache.org/jira/browse/HIVE-7911 Project: Hive Issue Type: Bug Reporter: Lars Francke Attachments: HIVE-7911.1.patch AccumuloRangeGenerator has a typo where it should say {{WritableConstantFloatObjectInspector}} instead of {{WritableConstantDoubleObjectInspector}}. I've changed the method to avoid the multiple if-else statements as all that is expected is a {{PrimitiveObjectInspector}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7911) Guaranteed ClassCastException in AccumuloRangeGenerator
[ https://issues.apache.org/jira/browse/HIVE-7911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Francke updated HIVE-7911: --- Status: Patch Available (was: Open) Guaranteed ClassCastException in AccumuloRangeGenerator --- Key: HIVE-7911 URL: https://issues.apache.org/jira/browse/HIVE-7911 Project: Hive Issue Type: Bug Reporter: Lars Francke Attachments: HIVE-7911.1.patch AccumuloRangeGenerator has a typo where it should say {{WritableConstantFloatObjectInspector}} instead of {{WritableConstantDoubleObjectInspector}}. I've changed the method to avoid the multiple if-else statements as all that is expected is a {{PrimitiveObjectInspector}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7551) expand spark accumulator to support hive counter [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115369#comment-14115369 ] Suhas Satish commented on HIVE-7551:

Assigning to myself after talking to Na. Is this for milestone Spark-M3, as the dependent JIRAs are labeled?

expand spark accumulator to support hive counter [Spark Branch]

Key: HIVE-7551
URL: https://issues.apache.org/jira/browse/HIVE-7551
Project: Hive
Issue Type: New Feature
Components: Spark
Reporter: Chengxiang Li
Assignee: Na Yang

Hive collects some operator statistics through counters; we need to support the MR/Tez counter counterpart through Spark accumulators. NO PRECOMMIT TESTS. This is for the spark branch only.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7551) expand spark accumulator to support hive counter [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suhas Satish reassigned HIVE-7551:
--
Assignee: Suhas Satish (was: Na Yang)

expand spark accumulator to support hive counter [Spark Branch]

Key: HIVE-7551
URL: https://issues.apache.org/jira/browse/HIVE-7551
Project: Hive
Issue Type: New Feature
Components: Spark
Reporter: Chengxiang Li
Assignee: Suhas Satish

Hive collects some operator statistics through counters; we need to support the MR/Tez counter counterpart through Spark accumulators. NO PRECOMMIT TESTS. This is for the spark branch only.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7775) enable sample8.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115380#comment-14115380 ] Suhas Satish commented on HIVE-7775:

What kind of join did Szehon enable? Does Hive on Spark support full outer join?

enable sample8.q.[Spark Branch]

Key: HIVE-7775
URL: https://issues.apache.org/jira/browse/HIVE-7775
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Fix For: spark-branch
Attachments: HIVE-7775.1-spark.patch, HIVE-7775.2-spark.patch, HIVE-7775.3-spark.additional.patch

sample8.q contains a join query; this qtest should be enabled once Hive on Spark supports the join operation.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7775) enable sample8.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115389#comment-14115389 ] Brock Noland commented on HIVE-7775:

Hi Suhas, The JIRA is here: HIVE-7815. Basically it's a non-parallel reduce-side join which supports full, left, right, and inner joins. Cheers!

enable sample8.q.[Spark Branch]

Key: HIVE-7775
URL: https://issues.apache.org/jira/browse/HIVE-7775
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Fix For: spark-branch
Attachments: HIVE-7775.1-spark.patch, HIVE-7775.2-spark.patch, HIVE-7775.3-spark.additional.patch

sample8.q contains a join query; this qtest should be enabled once Hive on Spark supports the join operation.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7405) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115387#comment-14115387 ] Hive QA commented on HIVE-7405: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665164/HIVE-7405.93.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6127 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/561/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/561/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-561/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12665164 Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic) -- Key: HIVE-7405 URL: https://issues.apache.org/jira/browse/HIVE-7405 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7405.1.patch, HIVE-7405.2.patch, HIVE-7405.3.patch, HIVE-7405.4.patch, HIVE-7405.5.patch, HIVE-7405.6.patch, HIVE-7405.7.patch, HIVE-7405.8.patch, HIVE-7405.9.patch, HIVE-7405.91.patch, HIVE-7405.92.patch, HIVE-7405.93.patch Vectorize the basic case that does not have any count distinct aggregation. Add a 4th processing mode in VectorGroupByOperator for reduce where each input VectorizedRowBatch has only values for one key at a time. 
Thus, the values in the batch can be aggregated quickly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7811) Compactions need to update table/partition stats
[ https://issues.apache.org/jira/browse/HIVE-7811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7811:
-
Attachment: HIVE-7811.4.patch

Compactions need to update table/partition stats

Key: HIVE-7811
URL: https://issues.apache.org/jira/browse/HIVE-7811
Project: Hive
Issue Type: Sub-task
Components: Transactions
Affects Versions: 0.13.1
Reporter: Eugene Koifman
Assignee: Eugene Koifman
Attachments: HIVE-7811.3.patch, HIVE-7811.4.patch

Compactions should trigger stats recalculation for columns that already have stats.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-7884) When partition filter containing single column from multiple partition is used in HCatInputFormat.setFilter, it returns empty set
[ https://issues.apache.org/jira/browse/HIVE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prafulla T resolved HIVE-7884.
--
Resolution: Invalid

We found that this was due to an error in our program that fetches from Hive. Resolving as Invalid.

When partition filter containing single column from multiple partition is used in HCatInputFormat.setFilter, it returns empty set

Key: HIVE-7884
URL: https://issues.apache.org/jira/browse/HIVE-7884
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Prafulla T

In one of my company's products, we use HCatInputFormat to import data from Hadoop/Hive into our database. We use HCatInputFormat.setFilter to pass partition filters based on partition columns, and we see the following issue in recent Hive: when a Hive table has multiple partition columns and the partition filter uses only a single column of them, we get an empty set instead of the rows from partitions that match the single column used in the filter. This used to work earlier (Hive 0.10.0 or 0.11.0); we experience this problem in Hive 0.13.0.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7911) Guaranteed ClassCastException in AccumuloRangeGenerator
[ https://issues.apache.org/jira/browse/HIVE-7911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115469#comment-14115469 ] Ashutosh Chauhan commented on HIVE-7911:

+1

Guaranteed ClassCastException in AccumuloRangeGenerator

Key: HIVE-7911
URL: https://issues.apache.org/jira/browse/HIVE-7911
Project: Hive
Issue Type: Bug
Reporter: Lars Francke
Assignee: Lars Francke
Attachments: HIVE-7911.1.patch

AccumuloRangeGenerator has a typo where it should say {{WritableConstantFloatObjectInspector}} instead of {{WritableConstantDoubleObjectInspector}}. I've changed the method to avoid the multiple if-else statements as all that is expected is a {{PrimitiveObjectInspector}}.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7902) Cleanup hbase-handler/pom.xml dependency list
[ https://issues.apache.org/jira/browse/HIVE-7902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7902: --- Resolution: Fixed Status: Resolved (was: Patch Available) Thank you very much for cleaning up my mess! :) I have committed this to trunk. Cleanup hbase-handler/pom.xml dependency list - Key: HIVE-7902 URL: https://issues.apache.org/jira/browse/HIVE-7902 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.13.0, 0.13.1 Reporter: Venki Korukanti Assignee: Venki Korukanti Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7902.1.patch Noticed an extra dependency {{hive-service}} when changing dependency version of {{hive-hbase-handler}} from 0.12.0 to 0.13.0 in a third party application. Tracing the log of hbase-handler/pom.xml file, it is added as part of ant to maven migration and not because of any specific functionality requirement. Dependency {{hive-service}} is not needed in {{hive-hbase-handler}} and can be removed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7649) Support column stats with temporary tables
[ https://issues.apache.org/jira/browse/HIVE-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7649: - Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks [~jdere]! Support column stats with temporary tables -- Key: HIVE-7649 URL: https://issues.apache.org/jira/browse/HIVE-7649 Project: Hive Issue Type: Bug Components: Statistics Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.14.0 Attachments: HIVE-7649.1.patch, HIVE-7649.2.patch, HIVE-7649.3.patch, HIVE-7649.4.patch Column stats currently not supported with temp tables, see if they can be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7911) Guaranteed ClassCastException in AccumuloRangeGenerator
[ https://issues.apache.org/jira/browse/HIVE-7911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115531#comment-14115531 ] Hive QA commented on HIVE-7911: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665356/HIVE-7911.1.patch {color:green}SUCCESS:{color} +1 6127 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/562/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/562/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-562/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12665356 Guaranteed ClassCastException in AccumuloRangeGenerator --- Key: HIVE-7911 URL: https://issues.apache.org/jira/browse/HIVE-7911 Project: Hive Issue Type: Bug Reporter: Lars Francke Assignee: Lars Francke Attachments: HIVE-7911.1.patch AccumuloRangeGenerator has a typo where it should say {{WritableConstantFloatObjectInspector}} instead of {{WritableConstantDoubleObjectInspector}}. I've changed the method to avoid the multiple if-else statements as all that is expected is a {{PrimitiveObjectInspector}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 25176: HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25176/#review51889 --- Hi Na, Thank you very much for the patch! I have one high-level question: it appears we created the union_remove_spark* files because we wanted to add an additional property to the union_remove .q files? Meaning, what is the delta between union_remove_spark_1.q and union_remove_? Cheers! - Brock Noland On Aug. 29, 2014, 6:44 a.m., Na Yang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25176/ --- (Updated Aug. 29, 2014, 6:44 a.m.) Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-7870 https://issues.apache.org/jira/browse/HIVE-7870 Repository: hive-git Description --- HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch] The cause of this problem is that during Spark/Tez task generation the union's FileSink operator is cloned into two new FileSink operators. The linked FileSinkDesc info for those new FileSink operators is missing. In addition, the two new FileSink operators also need to be linked together. 
Diffs - itests/src/test/resources/testconfiguration.properties 6393671 ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 9c808d4 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 5ddc16d ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 379a39c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 ql/src/test/queries/clientpositive/union_remove_spark_1.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_10.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_11.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_15.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_16.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_17.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_18.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_19.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_2.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_20.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_21.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_24.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_25.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_3.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_4.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_5.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_6.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_7.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_8.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_9.q PRE-CREATION ql/src/test/results/clientpositive/spark/sample8.q.out c7e333b ql/src/test/results/clientpositive/spark/union10.q.out 20c681e ql/src/test/results/clientpositive/spark/union18.q.out 
3f37a0a ql/src/test/results/clientpositive/spark/union19.q.out 6922fcd ql/src/test/results/clientpositive/spark/union28.q.out 8bd5218 ql/src/test/results/clientpositive/spark/union29.q.out b9546ef ql/src/test/results/clientpositive/spark/union3.q.out 3ae6536 ql/src/test/results/clientpositive/spark/union30.q.out 12717a1 ql/src/test/results/clientpositive/spark/union33.q.out b89757f ql/src/test/results/clientpositive/spark/union4.q.out 6341cd9 ql/src/test/results/clientpositive/spark/union6.q.out 263d9f4 ql/src/test/results/clientpositive/spark/union_remove_spark_1.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/union_remove_spark_10.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/union_remove_spark_11.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/union_remove_spark_15.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/union_remove_spark_16.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/union_remove_spark_17.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/union_remove_spark_18.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/union_remove_spark_19.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/union_remove_spark_2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/union_remove_spark_20.q.out PRE-CREATION
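The linking step the patch summary describes can be sketched as follows. The classes below are minimal stand-ins, not Hive's actual FileSinkDesc: when a union's FileSink operator is cloned into two during Spark/Tez task generation, each clone must carry the linked-FileSink metadata, and the clones must reference each other so downstream planning can gather all of the union's output directories.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for a file sink descriptor with linked-sink bookkeeping.
class SinkDesc {
    final String outputDir;
    List<SinkDesc> linkedSinks = new ArrayList<>();
    SinkDesc(String outputDir) { this.outputDir = outputDir; }
}

public class LinkClonedSinks {
    // After cloning, give both clones one shared linked-sink list so each can
    // see the other's output directory -- the bookkeeping that was missing.
    static void link(SinkDesc a, SinkDesc b) {
        List<SinkDesc> shared = new ArrayList<>();
        shared.add(a);
        shared.add(b);
        a.linkedSinks = shared;
        b.linkedSinks = shared;
    }
}
```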
[jira] [Commented] (HIVE-7803) Enable Hadoop speculative execution may cause corrupt output directory (dynamic partition)
[ https://issues.apache.org/jira/browse/HIVE-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115538#comment-14115538 ] Selina Zhang commented on HIVE-7803: The test failures do not seem related to this patch. Saw the same failures for HIVE-7890. Enable Hadoop speculative execution may cause corrupt output directory (dynamic partition) -- Key: HIVE-7803 URL: https://issues.apache.org/jira/browse/HIVE-7803 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Environment: Reporter: Selina Zhang Assignee: Selina Zhang Priority: Critical Attachments: HIVE-7803.1.patch, HIVE-7803.2.patch One of our users reports intermittent failures due to attempt directories in the input paths. We found that with speculative execution turned on, two mappers tried to commit the task at the same time using the same committed task path, which caused the corrupt output directory. The original Pig script: {code} STORE AdvertiserDataParsedClean INTO '$DB_NAME.$ADVERTISER_META_TABLE_NAME' USING org.apache.hcatalog.pig.HCatStorer(); {code} Two mappers attempt_1405021984947_5394024_m_000523_0: KILLED attempt_1405021984947_5394024_m_000523_1: SUCCEEDED attempt_1405021984947_5394024_m_000523_0 was killed right after the commit. As a result, it created a corrupt directory at /projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523/ containing both part-m-00523 (from attempt_1405021984947_5394024_m_000523_0) and attempt_1405021984947_5394024_m_000523_1/part-m-00523 Namenode Audit log == 1. 2014-08-05 05:04:36,811 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 cmd=create src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0/part-m-00523 dst=null perm=user:group:rw-r- 2. 
2014-08-05 05:04:53,112 INFO FSNamesystem.audit: ugi=* ip=ipaddress2 cmd=create src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1/part-m-00523 dst=null perm=user:group:rw-r- 3. 2014-08-05 05:05:13,001 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 cmd=rename src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523 perm=user:group:rwxr-x--- 4. 2014-08-05 05:05:13,004 INFO FSNamesystem.audit: ugi=* ip=ipaddress2 cmd=rename src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523 perm=user:group:rwxr-x--- After consulting our Hadoop core team, we were pointed to some HCat code that does not participate in the two-phase commit protocol, for example in FileRecordWriterContainer.close(): {code} for (Map.Entry<String, org.apache.hadoop.mapred.OutputCommitter> entry : baseDynamicCommitters.entrySet()) { org.apache.hadoop.mapred.TaskAttemptContext currContext = dynamicContexts.get(entry.getKey()); OutputCommitter baseOutputCommitter = entry.getValue(); if (baseOutputCommitter.needsTaskCommit(currContext)) { baseOutputCommitter.commitTask(currContext); } } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
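The race described above can be illustrated in miniature. This is a hedged sketch, not Hadoop's actual implementation: with speculative execution, two attempts of the same task may both reach their commit step; the two-phase commit protocol avoids corruption by letting the framework authorize exactly one attempt to commit, whereas calling commitTask directly from RecordWriter.close() lets both attempts rename into the same committed task path.

```java
import java.util.concurrent.atomic.AtomicReference;

public class CommitRace {
    // Stand-in for the coordinator deciding which attempt may commit its output.
    // compareAndSet ensures exactly one attempt wins, no matter how the two
    // speculative attempts interleave.
    static final AtomicReference<String> committed = new AtomicReference<>();

    // Returns true iff this attempt won the right to commit (first caller wins).
    static boolean tryCommit(String attemptId) {
        return committed.compareAndSet(null, attemptId);
    }
}
```

In the buggy path there is no such single authorizer: each attempt's close() checks needsTaskCommit and commits independently, which is how both attempt_..._0 and attempt_..._1 ended up renamed into the same task directory.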
[jira] [Updated] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.
[ https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated HIVE-7100: --- Status: Open (was: Patch Available) Users of hive should be able to specify skipTrash when dropping tables. --- Key: HIVE-7100 URL: https://issues.apache.org/jira/browse/HIVE-7100 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Ravi Prakash Assignee: Jayesh Attachments: HIVE-7100.1.patch, HIVE-7100.2.patch, HIVE-7100.3.patch, HIVE-7100.4.patch, HIVE-7100.patch Users of our clusters are often running up against their quota limits because of Hive tables. When they drop tables, they have to then manually delete the files from HDFS using skipTrash. This is cumbersome and unnecessary. We should enable users to skipTrash directly when dropping tables. We should also be able to provide this functionality without polluting SQL syntax. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.
[ https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated HIVE-7100: --- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7901) CLONE - pig -useHCatalog with embedded metastore fails to pass command line args to metastore (org.apache.hive.hcatalog version)
[ https://issues.apache.org/jira/browse/HIVE-7901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-7901: -- Attachment: hive-7901.01.patch I modified the original HIVE-6633 patch to put the changes in the right place, under apache/hive. This is a new patch for those changes based directly off the current hive trunk. CLONE - pig -useHCatalog with embedded metastore fails to pass command line args to metastore (org.apache.hive.hcatalog version) Key: HIVE-7901 URL: https://issues.apache.org/jira/browse/HIVE-7901 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Reporter: Sushanth Sowmyan Assignee: Eric Hanson Attachments: hive-7901.01.patch This fails because the embedded metastore can't connect to the database: the command line -D arguments passed to pig are not passed on to the metastore when the embedded metastore is created. Setting hive.metastore.uris to the empty string causes creation of an embedded metastore. pig -useHCatalog -Dhive.metastore.uris= -Djavax.jdo.option.ConnectionPassword=AzureSQLDBXYZ The goal is to allow a pig job submitted via WebHCat to specify a metastore to use via job arguments. That is not working because it is not possible to pass -Djavax.jdo.option.ConnectionPassword and other necessary arguments to the embedded metastore. -- This message was sent by Atlassian JIRA (v6.2#6252)
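The fix direction can be sketched as follows. Method and parameter names here are hypothetical, not the actual patch: when hive.metastore.uris is empty, the metastore runs in-process, so JVM system properties supplied via -D (for example javax.jdo.option.ConnectionPassword) must be copied into the metastore's configuration explicitly or they never reach it.

```java
import java.util.Properties;

public class EmbeddedMetastoreConf {
    // Overlay system properties matching a prefix (e.g. "javax.jdo.option.")
    // onto the metastore configuration, so -D arguments survive into the
    // embedded metastore instead of being silently dropped.
    static Properties overlaySystemProps(Properties conf, Properties sysProps, String prefix) {
        Properties merged = new Properties();
        merged.putAll(conf);
        for (String name : sysProps.stringPropertyNames()) {
            if (name.startsWith(prefix)) {
                merged.setProperty(name, sysProps.getProperty(name));
            }
        }
        return merged;
    }
}
```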
[jira] [Updated] (HIVE-7901) CLONE - pig -useHCatalog with embedded metastore fails to pass command line args to metastore (org.apache.hive.hcatalog version)
[ https://issues.apache.org/jira/browse/HIVE-7901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-7901: -- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7901) CLONE - pig -useHCatalog with embedded metastore fails to pass command line args to metastore (org.apache.hive.hcatalog version)
[ https://issues.apache.org/jira/browse/HIVE-7901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115550#comment-14115550 ] Eric Hanson commented on HIVE-7901: --- [~sushanth], please have a look and +1/commit if you think it's ready. Thanks! -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7909) Fix samaple8.q automatic test failure[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115549#comment-14115549 ] Szehon Ho commented on HIVE-7909: - +1 Fix samaple8.q automatic test failure[Spark Branch] --- Key: HIVE-7909 URL: https://issues.apache.org/jira/browse/HIVE-7909 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7909.1-spark.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7669) parallel order by clause on a string column fails with IOException: Split points are out of order
[ https://issues.apache.org/jira/browse/HIVE-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115559#comment-14115559 ] Szehon Ho commented on HIVE-7669: - Can you please add the license header, though? parallel order by clause on a string column fails with IOException: Split points are out of order - Key: HIVE-7669 URL: https://issues.apache.org/jira/browse/HIVE-7669 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor, SQL Affects Versions: 0.12.0 Environment: Hive 0.12.0-cdh5.0.0 OS: Redhat linux Reporter: Vishal Kamath Assignee: Navis Labels: orderby Attachments: HIVE-7669.1.patch.txt, HIVE-7669.2.patch.txt, HIVE-7669.3.patch.txt, HIVE-7669.4.patch.txt The source table has 600 Million rows and it has a String column l_shipinstruct which has 4 unique values. (Ie. these 4 values are repeated across the 600 million rows) We are sorting it based on this string column l_shipinstruct as shown in the below HiveQL with the following parameters. 
{code:sql} set hive.optimize.sampling.orderby=true; set hive.optimize.sampling.orderby.number=1000; set hive.optimize.sampling.orderby.percent=0.1f; insert overwrite table lineitem_temp_report select l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment from lineitem order by l_shipinstruct; {code} Stack Trace Diagnostic Messages for this Task: {noformat} Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 
10 more Caused by: java.lang.IllegalArgumentException: Can't read partitions file at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116) at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42) at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37) ... 15 more Caused by: java.io.IOException: Split points are out of order at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:96) ... 17 more {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
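The failure above has a simple shape: TotalOrderPartitioner requires the partition file's split points to be strictly increasing, but when a 600-million-row column holds only 4 distinct values, the sampled split points inevitably repeat, which is exactly the "Split points are out of order" check tripping. A hedged sketch of the problem and one fix direction (method names hypothetical, not the actual patch) is to deduplicate the sampled keys before writing the partition file:

```java
import java.util.Arrays;
import java.util.TreeSet;

public class SplitPoints {
    // Mirrors TotalOrderPartitioner's validation: each split point must be
    // strictly greater than the previous one.
    static boolean strictlyIncreasing(String[] keys) {
        for (int i = 1; i < keys.length; i++) {
            if (keys[i].compareTo(keys[i - 1]) <= 0) return false; // duplicate or out of order
        }
        return true;
    }

    // Deduplicate and sort sampled keys so the partition file is valid even
    // when the sampled column has very low cardinality.
    static String[] dedup(String[] sampled) {
        return new TreeSet<>(Arrays.asList(sampled)).toArray(new String[0]);
    }
}
```

Note that after deduplication there may be fewer split points than requested reducers, so the number of reducers must be capped accordingly.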
[jira] [Commented] (HIVE-7669) parallel order by clause on a string column fails with IOException: Split points are out of order
[ https://issues.apache.org/jira/browse/HIVE-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1411#comment-1411 ] Szehon Ho commented on HIVE-7669: - +1 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7912) Don't add is not null filter for partitioning column
Ashutosh Chauhan created HIVE-7912: -- Summary: Don't add is not null filter for partitioning column Key: HIVE-7912 URL: https://issues.apache.org/jira/browse/HIVE-7912 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan HIVE-7159 introduced an optimization that adds an is-not-null filter on inner-join columns, which is wasteful for a partitioning column. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7912) Don't add is not null filter for partitioning column
[ https://issues.apache.org/jira/browse/HIVE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7912: --- Attachment: HIVE-7912.patch Don't add is not null filter for partitioning column Key: HIVE-7912 URL: https://issues.apache.org/jira/browse/HIVE-7912 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7912.patch HIVE-7159 introduced an optimization that adds an is-not-null filter on inner-join columns, which is wasteful for a partitioning column. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7912) Don't add is not null filter for partitioning column
[ https://issues.apache.org/jira/browse/HIVE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7912: --- Status: Patch Available (was: Open) Don't add is not null filter for partitioning column Key: HIVE-7912 URL: https://issues.apache.org/jira/browse/HIVE-7912 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7912.patch HIVE-7159 introduced an optimization that adds an is-not-null filter on inner-join columns, which is wasteful for a partitioning column. -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 25194: Don't add is not null filter for partitioning column
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25194/ --- Review request for hive and Harish Butani. Bugs: HIVE-7912 https://issues.apache.org/jira/browse/HIVE-7912 Repository: hive-git Description --- Don't add is not null filter for partitioning column Diffs - ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 0106707 Diff: https://reviews.apache.org/r/25194/diff/ Testing --- Existing tests. Verified in debugger. Thanks, Ashutosh Chauhan
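The change under review can be sketched as follows. Names here are hypothetical and this is not the actual SemanticAnalyzer code: when synthesizing is-not-null predicates for inner-join keys (the HIVE-7159 optimization), skip keys that are partitioning columns, since partition values are never null and the extra predicate only costs evaluation time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class NotNullFilterSketch {
    // Generate the synthetic not-null predicates for a set of inner-join keys,
    // omitting partitioning columns: partition pruning already guarantees a
    // concrete, non-null partition value for every surviving row.
    static List<String> notNullPredicates(List<String> joinKeys, Set<String> partitionCols) {
        List<String> preds = new ArrayList<>();
        for (String key : joinKeys) {
            if (!partitionCols.contains(key)) {
                preds.add(key + " is not null");
            }
        }
        return preds;
    }
}
```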
Re: Review Request 25176: HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch]
On Aug. 29, 2014, 5:30 p.m., Brock Noland wrote: Hi Na, Thank you very much for the patch! I have one high-level question: it appears we created the union_remove_spark* files because we wanted to add an additional property to the union_remove .q files? Meaning, what is the delta between union_remove_spark_1.q and union_remove_? Cheers! Hi Brock, That is correct. The union_remove_spark* files include an extra config property, hive.merge.sparkfile, compared with the corresponding union_remove_* files. Apart from that extra config property, all other queries in each union_remove_spark* file are the same as the queries in the corresponding union_remove_* file. The hive.merge.sparkfile value is set according to the hive.merge.mapfile and hive.merge.mapredfile property values in the original union_remove_* file. Regarding the test results, we expect the union_remove_spark* queries to return the same data as the corresponding union_remove_* queries. Thanks, Na - Na --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25176/#review51889 --- On Aug. 29, 2014, 6:44 a.m., Na Yang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25176/ --- (Updated Aug. 29, 2014, 6:44 a.m.) Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-7870 https://issues.apache.org/jira/browse/HIVE-7870 Repository: hive-git Description --- HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch] The cause of this problem is that during Spark/Tez task generation the union's FileSink operator is cloned into two new FileSink operators. The linked FileSinkDesc info for those new FileSink operators is missing. In addition, the two new FileSink operators also need to be linked together. 
Diffs - itests/src/test/resources/testconfiguration.properties 6393671 ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 9c808d4 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 5ddc16d ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 379a39c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 ql/src/test/queries/clientpositive/union_remove_spark_1.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_10.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_11.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_15.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_16.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_17.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_18.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_19.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_2.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_20.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_21.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_24.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_25.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_3.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_4.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_5.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_6.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_7.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_8.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_9.q PRE-CREATION ql/src/test/results/clientpositive/spark/sample8.q.out c7e333b ql/src/test/results/clientpositive/spark/union10.q.out 20c681e ql/src/test/results/clientpositive/spark/union18.q.out 
3f37a0a ql/src/test/results/clientpositive/spark/union19.q.out 6922fcd ql/src/test/results/clientpositive/spark/union28.q.out 8bd5218 ql/src/test/results/clientpositive/spark/union29.q.out b9546ef ql/src/test/results/clientpositive/spark/union3.q.out 3ae6536 ql/src/test/results/clientpositive/spark/union30.q.out 12717a1 ql/src/test/results/clientpositive/spark/union33.q.out b89757f ql/src/test/results/clientpositive/spark/union4.q.out 6341cd9 ql/src/test/results/clientpositive/spark/union6.q.out 263d9f4 ql/src/test/results/clientpositive/spark/union_remove_spark_1.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/union_remove_spark_10.q.out PRE-CREATION
[jira] [Updated] (HIVE-7909) Fix samaple8.q automatic test failure[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-7909: Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to spark branch. Thanks Chengxiang. Fix samaple8.q automatic test failure[Spark Branch] --- Key: HIVE-7909 URL: https://issues.apache.org/jira/browse/HIVE-7909 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Fix For: 0.14.0 Attachments: HIVE-7909.1-spark.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7913) Simplify predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-7913: -- Description: (was: I noticed that the estimated number of rows in map joins is higher after the join than before the join, with column stats fetch ON or OFF. TPC-DS Q55 was a good example of this; the issue is that the current statistics give us enough information to estimate with strong confidence that the joins are one-to-many, not many-to-many. Joining store_sales x item on ss_item_sk = i_item_sk, we know that the NDV, min, and max values for both join columns match while the row counts differ; this pattern indicates a PK/FK relationship between store_sales and item. Yet when a filter is applied on item and reduces the number of rows from 462K to 7K, we estimate a many-to-many join between the filtered item and store_sales, and as a result the estimated number of rows coming out of the join is off by several orders of magnitude. Available information from the stats:
{code}
Table        Join column      NDV from describe  NDV actual  min        max
item         i_item_sk        439,501            462,000     1          462,000
date_dim     d_date_sk        65,332             73,049      2,415,022  2,488,070
store_sales  ss_item_sk      439,501            462,000     1          462,000
store_sales  ss_sold_date_sk  2,226              1,823       2,450,816  2,452,642
{code}
The same thing applies to store_sales and date_dim, with the caveat that the NDV, min, and max values don't match: date_dim has a bigger domain and accordingly a higher NDV count. For joining store_sales and item on ss_item_sk = i_item_sk, since both columns have the same NDV, min, and max values, we can safely conclude that selectivity on item will translate to similar selectivity on store_sales. This is not the case for joining store_sales and date_dim on ss_sold_date_sk = d_date_sk, since the domain of d_date_sk is much bigger than that of ss_sold_date_sk; differences in domain need to be taken into account when inferring selectivity onto store_sales.) 
Simplify predicates for CBO --- Key: HIVE-7913 URL: https://issues.apache.org/jira/browse/HIVE-7913 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran Fix For: 0.14.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7913) Simplify predicates for CBO
Mostafa Mokhtar created HIVE-7913: - Summary: Simplify predicates for CBO Key: HIVE-7913 URL: https://issues.apache.org/jira/browse/HIVE-7913 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Fix For: 0.14.0 I noticed that the estimated number of rows in Map joins is higher after the join than before the join, with column stats fetch either ON or OFF. TPC-DS Q55 was a good example of this; the issue is that the current statistics provide us enough information to estimate with strong confidence that the joins are one-to-many and not many-to-many. Joining store_sales x item on ss_item_sk = i_item_sk, we know that the NDV, min and max values for both join columns match while the row counts are different; this pattern indicates a PK/FK relationship between store_sales and item. Yet when a filter is applied on item and reduces the number of rows from 462K to 7K, we estimate a many-to-many join between the filtered item and store_sales, and as a result the estimated number of rows coming out of the join is off by several orders of magnitude. Available information from the stats:
{code}
Table        Join column      NDV from describe  NDV actual  min        max
item         i_item_sk        439,501            462,000     1          462,000
date_dim     d_date_sk        65,332             73,049      2,415,022  2,488,070
store_sales  ss_item_sk       439,501            462,000     1          462,000
store_sales  ss_sold_date_sk  2,226              1,823       2,450,816  2,452,642
{code}
The same applies to store_sales and date_dim, with the caveat that the NDV, min and max values don't match: date_dim has a bigger domain and accordingly a higher NDV count. For joining store_sales and item on ss_item_sk = i_item_sk, since both columns have the same NDV, min and max values, we can safely conclude that selectivity on item will translate to similar selectivity on store_sales.
This is not the case for joining store_sales and date_dim on ss_sold_date_sk = d_date_sk, since the domain of d_date_sk is much bigger than that of ss_sold_date_sk; differences in domain need to be taken into account when inferring selectivity onto store_sales.
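The one-to-many inference described above can be sketched roughly as follows. This is a minimal illustration, not Hive's actual estimator code; the function and field names are invented, and the stats come from the table quoted in the issue:

```python
# Hypothetical sketch of PK/FK-aware join cardinality estimation.
# Column stats are plain dicts with 'ndv', 'min', 'max' and 'rows' (assumed names).

def looks_like_pk_fk(fact_col, dim_col):
    """If NDV, min and max of both join columns match, treat it as a PK/FK join."""
    return (fact_col["ndv"] == dim_col["ndv"]
            and fact_col["min"] == dim_col["min"]
            and fact_col["max"] == dim_col["max"])

def join_row_estimate(fact_col, dim_col, dim_rows_after_filter):
    if looks_like_pk_fk(fact_col, dim_col):
        # One-to-many: selectivity on the dimension carries over to the fact side,
        # instead of assuming a many-to-many join.
        selectivity = dim_rows_after_filter / dim_col["rows"]
        return fact_col["rows"] * selectivity
    # Generic fallback: |R| * |S| / max(ndv) — the many-to-many style estimate.
    return (fact_col["rows"] * dim_rows_after_filter
            / max(fact_col["ndv"], dim_col["ndv"]))

# Stats quoted in the issue: item filtered from 462K to 7K rows should scale
# store_sales (600M rows) by roughly 7K/462K rather than blowing up the estimate.
ss_item = {"ndv": 439501, "min": 1, "max": 462000, "rows": 600_000_000}
i_item  = {"ndv": 439501, "min": 1, "max": 462000, "rows": 462_000}
est = join_row_estimate(ss_item, i_item, 7_000)
```

With these numbers the PK/FK path yields roughly 9.1M output rows, i.e. fewer rows after the join than before it, which matches the expectation stated in the issue.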
[jira] [Updated] (HIVE-7913) Simplify predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-7913: -- Assignee: Laljo John Pullokkaran Simplify predicates for CBO --- Key: HIVE-7913 URL: https://issues.apache.org/jira/browse/HIVE-7913 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran Fix For: 0.14.0 I noticed that the estimated number of rows in Map joins is higher after the join than before the join, with column stats fetch either ON or OFF. TPC-DS Q55 was a good example of this; the issue is that the current statistics provide us enough information to estimate with strong confidence that the joins are one-to-many and not many-to-many. Joining store_sales x item on ss_item_sk = i_item_sk, we know that the NDV, min and max values for both join columns match while the row counts are different; this pattern indicates a PK/FK relationship between store_sales and item. Yet when a filter is applied on item and reduces the number of rows from 462K to 7K, we estimate a many-to-many join between the filtered item and store_sales, and as a result the estimated number of rows coming out of the join is off by several orders of magnitude. Available information from the stats:
{code}
Table        Join column      NDV from describe  NDV actual  min        max
item         i_item_sk        439,501            462,000     1          462,000
date_dim     d_date_sk        65,332             73,049      2,415,022  2,488,070
store_sales  ss_item_sk       439,501            462,000     1          462,000
store_sales  ss_sold_date_sk  2,226              1,823       2,450,816  2,452,642
{code}
The same applies to store_sales and date_dim, with the caveat that the NDV, min and max values don't match: date_dim has a bigger domain and accordingly a higher NDV count. For joining store_sales and item on ss_item_sk = i_item_sk, since both columns have the same NDV, min and max values, we can safely conclude that selectivity on item will translate to similar selectivity on store_sales.
This is not the case for joining store_sales and date_dim on ss_sold_date_sk = d_date_sk, since the domain of d_date_sk is much bigger than that of ss_sold_date_sk; differences in domain need to be taken into account when inferring selectivity onto store_sales.
[jira] [Updated] (HIVE-7913) Simplify predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-7913: -- Description: Simplify predicates for disjunctive predicates so that they can get pushed down to the scan
{code}
select avg(ss_quantity)
      ,avg(ss_ext_sales_price)
      ,avg(ss_ext_wholesale_cost)
      ,sum(ss_ext_wholesale_cost)
from store_sales
    ,store
    ,customer_demographics
    ,household_demographics
    ,customer_address
    ,date_dim
where store.s_store_sk = store_sales.ss_store_sk
  and store_sales.ss_sold_date_sk = date_dim.d_date_sk
  and date_dim.d_year = 2001
  and ((store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
        and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
        and customer_demographics.cd_marital_status = 'M'
        and customer_demographics.cd_education_status = '4 yr Degree'
        and store_sales.ss_sales_price between 100.00 and 150.00
        and household_demographics.hd_dep_count = 3)
       or (store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
        and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
        and customer_demographics.cd_marital_status = 'D'
        and customer_demographics.cd_education_status = 'Primary'
        and store_sales.ss_sales_price between 50.00 and 100.00
        and household_demographics.hd_dep_count = 1)
       or (store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
        and customer_demographics.cd_demo_sk = ss_cdemo_sk
        and customer_demographics.cd_marital_status = 'U'
        and customer_demographics.cd_education_status = 'Advanced Degree'
        and store_sales.ss_sales_price between 150.00 and 200.00
        and household_demographics.hd_dep_count = 1))
  and ((store_sales.ss_addr_sk = customer_address.ca_address_sk
        and customer_address.ca_country = 'United States'
        and customer_address.ca_state in ('KY', 'GA', 'NM')
        and store_sales.ss_net_profit between 100 and 200)
       or (store_sales.ss_addr_sk = customer_address.ca_address_sk
        and customer_address.ca_country = 'United States'
        and customer_address.ca_state in ('MT', 'OR', 'IN')
        and store_sales.ss_net_profit between 150 and 300)
       or (store_sales.ss_addr_sk = customer_address.ca_address_sk
        and customer_address.ca_country = 'United States'
        and customer_address.ca_state in ('WI', 'MO', 'WV')
        and store_sales.ss_net_profit between 50 and 250));
{code}
Simplify predicates for CBO --- Key: HIVE-7913 URL: https://issues.apache.org/jira/browse/HIVE-7913 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran Fix For: 0.14.0
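The simplification requested in HIVE-7913 amounts to factoring conjuncts that appear in every disjunct out of the OR, so they surface at the AND level where the optimizer can push them to the scan. A minimal sketch of that factoring step (not Hive's actual rewrite code; predicates are modelled as plain strings for illustration):

```python
# Hypothetical common-conjunct factoring over a disjunction of conjunctions.
# Each disjunct is a set of conjunct strings; names are illustrative only.

def factor_common_conjuncts(disjuncts):
    """Return (common, residual): conjuncts shared by every disjunct,
    and each disjunct with the shared part removed."""
    common = set.intersection(*(set(d) for d in disjuncts))
    residual = [set(d) - common for d in disjuncts]
    return common, residual

# The three demographics disjuncts from the query above share two join conjuncts.
d1 = {"ss_hdemo_sk = hd_demo_sk", "cd_demo_sk = ss_cdemo_sk", "cd_marital_status = 'M'"}
d2 = {"ss_hdemo_sk = hd_demo_sk", "cd_demo_sk = ss_cdemo_sk", "cd_marital_status = 'D'"}
d3 = {"ss_hdemo_sk = hd_demo_sk", "cd_demo_sk = ss_cdemo_sk", "cd_marital_status = 'U'"}
common, residual = factor_common_conjuncts([d1, d2, d3])
```

Here the two join predicates are factored out of the OR, and the residual disjunction over `cd_marital_status` collapses to an IN list that can be pushed to the customer_demographics scan.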
[jira] [Commented] (HIVE-7811) Compactions need to update table/partition stats
[ https://issues.apache.org/jira/browse/HIVE-7811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115638#comment-14115638 ] Hive QA commented on HIVE-7811: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665364/HIVE-7811.4.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6128 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/563/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/563/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-563/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12665364 Compactions need to update table/partition stats Key: HIVE-7811 URL: https://issues.apache.org/jira/browse/HIVE-7811 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 0.13.1 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-7811.3.patch, HIVE-7811.4.patch Compactions should trigger stats recalculation for columns that already have stats.
[jira] [Commented] (HIVE-7902) Cleanup hbase-handler/pom.xml dependency list
[ https://issues.apache.org/jira/browse/HIVE-7902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115655#comment-14115655 ] Szehon Ho commented on HIVE-7902: - You committed to spark branch :) I just merged the same patch from spark to trunk, as was intended; hopefully that kept the history. Cleanup hbase-handler/pom.xml dependency list - Key: HIVE-7902 URL: https://issues.apache.org/jira/browse/HIVE-7902 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.13.0, 0.13.1 Reporter: Venki Korukanti Assignee: Venki Korukanti Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7902.1.patch Noticed an extra dependency {{hive-service}} when changing the dependency version of {{hive-hbase-handler}} from 0.12.0 to 0.13.0 in a third-party application. Tracing the log of the hbase-handler/pom.xml file, it was added as part of the Ant-to-Maven migration and not because of any specific functionality requirement. Dependency {{hive-service}} is not needed in {{hive-hbase-handler}} and can be removed.
[jira] [Commented] (HIVE-7902) Cleanup hbase-handler/pom.xml dependency list
[ https://issues.apache.org/jira/browse/HIVE-7902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115667#comment-14115667 ] Brock Noland commented on HIVE-7902: Shoot... Yes thank you very much. Cleanup hbase-handler/pom.xml dependency list - Key: HIVE-7902 URL: https://issues.apache.org/jira/browse/HIVE-7902 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.13.0, 0.13.1 Reporter: Venki Korukanti Assignee: Venki Korukanti Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7902.1.patch Noticed an extra dependency {{hive-service}} when changing the dependency version of {{hive-hbase-handler}} from 0.12.0 to 0.13.0 in a third-party application. Tracing the log of the hbase-handler/pom.xml file, it was added as part of the Ant-to-Maven migration and not because of any specific functionality requirement. Dependency {{hive-service}} is not needed in {{hive-hbase-handler}} and can be removed.
Re: Review Request 25176: HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch]
On Aug. 29, 2014, 5:30 p.m., Brock Noland wrote: Hi Na, Thank you very much for the patch! I have one high level question: It appears we created the union_remove_spark* files because we wanted to add an additional property to the union_remove .q file? Meaning what is the delta between union_remove_spark_1.q and union_remove_? Cheers! Na Yang wrote: Hi Brock, That is correct. The union_remove_spark* files include an extra config property hive.merge.sparkfile compared to the corresponding union_remove_* files. Except for that extra config property, all other queries in the union_remove_spark* files are the same as the queries in the union_remove_* files. The hive.merge.sparkfile value is set according to the hive.merge.mapfile and hive.merge.mapredfile property values in the original union_remove_* file. Regarding the test result, we expect the same data to be returned from the union_remove_spark* queries and the corresponding union_remove_* queries. Thanks, Na Hi, Thank you very much for the information! I think instead of adding the new union_remove_spark tests we should just add the hive.merge.sparkfile property to the union_remove q files. The extra property won't impact the existing tests other than an extra line of output. If instead we'd like to keep the union_remove_spark* tests then we'd need to add a check to QTestUtil that does not run spark files for MR: https://github.com/apache/hive/blob/trunk/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java#L431 as the tests are currently running for both spark and MR. As such, I think the first solution (just add the property to the existing tests) makes sense. Thoughts? - Brock --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25176/#review51889 --- On Aug. 29, 2014, 6:44 a.m., Na Yang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25176/ --- (Updated Aug. 29, 2014, 6:44 a.m.)
Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-7870 https://issues.apache.org/jira/browse/HIVE-7870 Repository: hive-git Description --- HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch] The cause of this problem is that during spark/tez task generation, the union file sink operator is cloned into two new filesink operators. The linkedfilesinkdesc info for those new filesink operators is missing. In addition, the two new filesink operators also need to be linked together. Diffs - itests/src/test/resources/testconfiguration.properties 6393671 ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 9c808d4 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 5ddc16d ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 379a39c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 ql/src/test/queries/clientpositive/union_remove_spark_1.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_10.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_11.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_15.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_16.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_17.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_18.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_19.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_2.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_20.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_21.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_24.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_25.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_3.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_4.q
PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_5.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_6.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_7.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_8.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_9.q PRE-CREATION ql/src/test/results/clientpositive/spark/sample8.q.out c7e333b ql/src/test/results/clientpositive/spark/union10.q.out 20c681e
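The fix described in the review request above — cloned union file sinks must carry the linked-file-sink metadata and must point at each other — can be sketched abstractly as follows. This is a hypothetical illustration, not the actual Java code in GenSparkUtils; all class and field names are invented:

```python
# Hypothetical model of cloning a union's file sink into two linked sinks.

class FileSink:
    def __init__(self, desc):
        self.desc = desc      # stands in for the linkedFileSinkDesc-style metadata
        self.linked = []      # sibling sinks writing into the same union output

def clone_union_sink(sink):
    """Clone a union file sink into two sinks, copying the descriptor and
    linking the clones to each other (the step the bug report says was missing)."""
    a = FileSink(dict(sink.desc))
    b = FileSink(dict(sink.desc))
    a.linked, b.linked = [b], [a]
    return a, b

original = FileSink({"dest": "/tmp/union_out"})
a, b = clone_union_sink(original)
```

Without the linking step, the downstream plan generation would see two unrelated sinks and produce an incorrect task plan, which is the symptom HIVE-7870 reports.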
[jira] [Updated] (HIVE-7913) Simplify predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-7913: -- Description: Simplify predicates for disjunctive predicates so that they can get pushed down to the scan. For TPC-DS query 13 we push down predicates of the following form: cd_marital_status in ('M','D','U'), etc.
{code}
select avg(ss_quantity)
      ,avg(ss_ext_sales_price)
      ,avg(ss_ext_wholesale_cost)
      ,sum(ss_ext_wholesale_cost)
from store_sales
    ,store
    ,customer_demographics
    ,household_demographics
    ,customer_address
    ,date_dim
where store.s_store_sk = store_sales.ss_store_sk
  and store_sales.ss_sold_date_sk = date_dim.d_date_sk
  and date_dim.d_year = 2001
  and ((store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
        and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
        and customer_demographics.cd_marital_status = 'M'
        and customer_demographics.cd_education_status = '4 yr Degree'
        and store_sales.ss_sales_price between 100.00 and 150.00
        and household_demographics.hd_dep_count = 3)
       or (store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
        and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
        and customer_demographics.cd_marital_status = 'D'
        and customer_demographics.cd_education_status = 'Primary'
        and store_sales.ss_sales_price between 50.00 and 100.00
        and household_demographics.hd_dep_count = 1)
       or (store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
        and customer_demographics.cd_demo_sk = ss_cdemo_sk
        and customer_demographics.cd_marital_status = 'U'
        and customer_demographics.cd_education_status = 'Advanced Degree'
        and store_sales.ss_sales_price between 150.00 and 200.00
        and household_demographics.hd_dep_count = 1))
  and ((store_sales.ss_addr_sk = customer_address.ca_address_sk
        and customer_address.ca_country = 'United States'
        and customer_address.ca_state in ('KY', 'GA', 'NM')
        and store_sales.ss_net_profit between 100 and 200)
       or (store_sales.ss_addr_sk = customer_address.ca_address_sk
        and customer_address.ca_country = 'United States'
        and customer_address.ca_state in ('MT', 'OR', 'IN')
        and store_sales.ss_net_profit between 150 and 300)
       or (store_sales.ss_addr_sk = customer_address.ca_address_sk
        and customer_address.ca_country = 'United States'
        and customer_address.ca_state in ('WI', 'MO', 'WV')
        and store_sales.ss_net_profit between 50 and 250));
{code}
was: Simplify predicates for disjunctive predicates so that they can get pushed down to the scan, followed by the same query as above.
Simplify predicates for CBO --- Key: HIVE-7913 URL:
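Deriving a scan-level IN list from a disjunction, as the issue proposes, is only safe if adding the derived filter cannot change the result. A small check of that equivalence, using sqlite3 as a stand-in engine (table name, values, and column subset are invented for illustration; the predicates mirror the customer_address branch of the query above):

```python
# Verify that the derived IN-list pre-filter is implied by the original
# disjunction, so pushing it to the scan preserves the result set.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ca_state TEXT, ss_net_profit REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("KY", 150), ("KY", 500), ("MT", 200), ("WI", 60), ("CA", 100)])

original = """
  (ca_state IN ('KY','GA','NM') AND ss_net_profit BETWEEN 100 AND 200)
  OR (ca_state IN ('MT','OR','IN') AND ss_net_profit BETWEEN 150 AND 300)
  OR (ca_state IN ('WI','MO','WV') AND ss_net_profit BETWEEN 50 AND 250)"""

# Derived scan-level filter: the union of the states named in each disjunct.
pushed = "ca_state IN ('KY','GA','NM','MT','OR','IN','WI','MO','WV')"

rows_orig = conn.execute(
    f"SELECT * FROM t WHERE {original} ORDER BY 1, 2").fetchall()
rows_push = conn.execute(
    f"SELECT * FROM t WHERE {pushed} AND ({original}) ORDER BY 1, 2").fetchall()
```

Because the IN list is a weakening of the disjunction, prepending it filters nothing that the original predicate would have kept, while letting the scan skip rows (here, the 'CA' row) before the expensive disjunction is evaluated.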
Re: Review Request 25176: HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch]
On Aug. 29, 2014, 5:30 p.m., Brock Noland wrote: Hi Na, Thank you very much for the patch! I have one high level question: It appears we created the union_remove_spark* files because we wanted to add an additional property to the union_remove .q file? Meaning what is the delta between union_remove_spark_1.q and union_remove_? Cheers! Na Yang wrote: Hi Brock, That is correct. The union_remove_spark* files include an extra config property hive.merge.sparkfile compared to the corresponding union_remove_* files. Except for that extra config property, all other queries in the union_remove_spark* files are the same as the queries in the union_remove_* files. The hive.merge.sparkfile value is set according to the hive.merge.mapfile and hive.merge.mapredfile property values in the original union_remove_* file. Regarding the test result, we expect the same data to be returned from the union_remove_spark* queries and the corresponding union_remove_* queries. Thanks, Na Brock Noland wrote: Hi, Thank you very much for the information! I think instead of adding the new union_remove_spark tests we should just add the hive.merge.sparkfile property to the union_remove q files. The extra property won't impact the existing tests other than an extra line of output. If instead we'd like to keep the union_remove_spark* tests then we'd need to add a check to QTestUtil that does not run spark files for MR: https://github.com/apache/hive/blob/trunk/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java#L431 as the tests are currently running for both spark and MR. As such, I think the first solution (just add the property to the existing tests) makes sense. Thoughts? Hi Brock, Thank you for your suggestion. I also prefer the first solution. Let me modify the existing union_remove q files and re-generate the .q.out files for both MR and Spark. Thanks, Na - Na --- This is an automatically generated e-mail.
To reply, visit: https://reviews.apache.org/r/25176/#review51889 --- On Aug. 29, 2014, 6:44 a.m., Na Yang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25176/ --- (Updated Aug. 29, 2014, 6:44 a.m.) Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-7870 https://issues.apache.org/jira/browse/HIVE-7870 Repository: hive-git Description --- HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch] The cause of this problem is that during spark/tez task generation, the union file sink operator is cloned into two new filesink operators. The linkedfilesinkdesc info for those new filesink operators is missing. In addition, the two new filesink operators also need to be linked together. Diffs - itests/src/test/resources/testconfiguration.properties 6393671 ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 9c808d4 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 5ddc16d ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 379a39c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 ql/src/test/queries/clientpositive/union_remove_spark_1.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_10.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_11.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_15.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_16.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_17.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_18.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_19.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_2.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_20.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_21.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_24.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_25.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_3.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_4.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_5.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_6.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_7.q PRE-CREATION
[jira] [Updated] (HIVE-7913) Simplify predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-7913: -- Description: Simplify predicates for disjunctive predicates so that they can get pushed down to the scan. For TPC-DS query 13 we push down predicates of the following form: cd_marital_status in ('M','D','U'), etc. {code} select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales ,store ,customer_demographics ,household_demographics ,customer_address ,date_dim where store.s_store_sk = store_sales.ss_store_sk and store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 2001 and((store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'M' and customer_demographics.cd_education_status = '4 yr Degree' and store_sales.ss_sales_price between 100.00 and 150.00 and household_demographics.hd_dep_count = 3 )or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'D' and customer_demographics.cd_education_status = 'Primary' and store_sales.ss_sales_price between 50.00 and 100.00 and household_demographics.hd_dep_count = 1 ) or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = ss_cdemo_sk and customer_demographics.cd_marital_status = 'U' and customer_demographics.cd_education_status = 'Advanced Degree' and store_sales.ss_sales_price between 150.00 and 200.00 and household_demographics.hd_dep_count = 1 )) and((store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('KY', 'GA', 'NM') and store_sales.ss_net_profit between 100 and 200 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and
customer_address.ca_country = 'United States' and customer_address.ca_state in ('MT', 'OR', 'IN') and store_sales.ss_net_profit between 150 and 300 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('WI', 'MO', 'WV') and store_sales.ss_net_profit between 50 and 250 )) ; {code} This is the plan currently generated without any predicate simplification {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 7 - Map 8 (BROADCAST_EDGE) Map 8 - Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE) Reducer 2 - Map 1 (SIMPLE_EDGE), Map 4 (BROADCAST_EDGE), Map 7 (SIMPLE_EDGE) Reducer 3 - Reducer 2 (SIMPLE_EDGE) DagName: mmokhtar_20140828155050_7059c24b-501b-4683-86c0-4f3c023f0b0e:1 Vertices: Map 1 Map Operator Tree: TableScan alias: customer_address Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: ca_address_sk (type: int), ca_state (type: string), ca_country (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string) Execution mode: vectorized Map 4 Map Operator Tree: TableScan alias: date_dim filterExpr: ((d_year = 2001) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ((d_year = 2001) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: d_date_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 18262 Data size: 20435178 Basic stats: 
COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce
[jira] [Updated] (HIVE-7913) Simplify filter predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-7913: -- Summary: Simplify filter predicates for CBO (was: Simplify predicates for CBO) Simplify filter predicates for CBO -- Key: HIVE-7913 URL: https://issues.apache.org/jira/browse/HIVE-7913 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran Fix For: 0.14.0 Simplify predicates for disjunctive predicates so that they can get pushed down to the scan. For TPC-DS query 13 we push down predicates of the following form: cd_marital_status in ('M','D','U'), etc. {code} select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales ,store ,customer_demographics ,household_demographics ,customer_address ,date_dim where store.s_store_sk = store_sales.ss_store_sk and store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 2001 and((store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'M' and customer_demographics.cd_education_status = '4 yr Degree' and store_sales.ss_sales_price between 100.00 and 150.00 and household_demographics.hd_dep_count = 3 )or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'D' and customer_demographics.cd_education_status = 'Primary' and store_sales.ss_sales_price between 50.00 and 100.00 and household_demographics.hd_dep_count = 1 ) or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = ss_cdemo_sk and customer_demographics.cd_marital_status = 'U' and customer_demographics.cd_education_status = 'Advanced Degree' and store_sales.ss_sales_price between 150.00 and 200.00 and
household_demographics.hd_dep_count = 1 )) and((store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('KY', 'GA', 'NM') and store_sales.ss_net_profit between 100 and 200 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('MT', 'OR', 'IN') and store_sales.ss_net_profit between 150 and 300 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('WI', 'MO', 'WV') and store_sales.ss_net_profit between 50 and 250 )) ; {code} This is the plan currently generated without any predicate simplification {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 7 - Map 8 (BROADCAST_EDGE) Map 8 - Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE) Reducer 2 - Map 1 (SIMPLE_EDGE), Map 4 (BROADCAST_EDGE), Map 7 (SIMPLE_EDGE) Reducer 3 - Reducer 2 (SIMPLE_EDGE) DagName: mmokhtar_20140828155050_7059c24b-501b-4683-86c0-4f3c023f0b0e:1 Vertices: Map 1 Map Operator Tree: TableScan alias: customer_address Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: ca_address_sk (type: int), ca_state (type: string), ca_country (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string) Execution mode: vectorized Map 4 Map Operator Tree: TableScan alias: date_dim filterExpr: ((d_year = 2001) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column 
stats: NONE Filter Operator predicate: ((d_year = 2001) and d_date_sk is not null)
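The simplification HIVE-7913 asks for can be sketched as follows. This is an illustrative Python sketch, not Hive's optimizer code: predicates are modeled as plain strings, and a disjunction as a list of per-branch conjunct sets. Conjuncts shared by every OR branch are implied by the whole disjunction, so they can be factored out and pushed to the scan; the leftover single-column equalities are what collapse into an IN predicate.

```python
# Hypothetical sketch of factoring a disjunction (not Hive internals).
def factor_disjunction(branches):
    """Split a disjunction of conjunct sets into (common, residual) parts."""
    # A conjunct present in every branch is implied by the whole OR.
    common = set.intersection(*branches)
    # What remains must still be evaluated per branch.
    residual = [b - common for b in branches]
    return common, residual

# Simplified fragment of the TPC-DS query 13 predicate above.
branches = [
    {"cd_demo_sk = ss_cdemo_sk", "cd_marital_status = 'M'"},
    {"cd_demo_sk = ss_cdemo_sk", "cd_marital_status = 'D'"},
    {"cd_demo_sk = ss_cdemo_sk", "cd_marital_status = 'U'"},
]
common, residual = factor_disjunction(branches)
print(common)  # {"cd_demo_sk = ss_cdemo_sk"} -- pushable below the OR
# The residual equalities on one column together imply:
#   cd_marital_status in ('M', 'D', 'U')
```

The derived IN predicate is what can then be pushed down to the customer_demographics scan.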
[jira] [Created] (HIVE-7914) Simplify join predicates for CBO to avoid cross products
Mostafa Mokhtar created HIVE-7914: - Summary: Simplify join predicates for CBO to avoid cross products Key: HIVE-7914 URL: https://issues.apache.org/jira/browse/HIVE-7914 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran Fix For: 0.14.0 Simplify disjunctive predicates so that they can be pushed down to the scan. For TPC-DS query 13 we could push down predicates of the form cd_marital_status in ('M','D','U'), etc.
{code}
select avg(ss_quantity)
       ,avg(ss_ext_sales_price)
       ,avg(ss_ext_wholesale_cost)
       ,sum(ss_ext_wholesale_cost)
from store_sales
     ,store
     ,customer_demographics
     ,household_demographics
     ,customer_address
     ,date_dim
where store.s_store_sk = store_sales.ss_store_sk
and store_sales.ss_sold_date_sk = date_dim.d_date_sk
and date_dim.d_year = 2001
and ((store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
  and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
  and customer_demographics.cd_marital_status = 'M'
  and customer_demographics.cd_education_status = '4 yr Degree'
  and store_sales.ss_sales_price between 100.00 and 150.00
  and household_demographics.hd_dep_count = 3)
 or (store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
  and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
  and customer_demographics.cd_marital_status = 'D'
  and customer_demographics.cd_education_status = 'Primary'
  and store_sales.ss_sales_price between 50.00 and 100.00
  and household_demographics.hd_dep_count = 1)
 or (store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
  and customer_demographics.cd_demo_sk = ss_cdemo_sk
  and customer_demographics.cd_marital_status = 'U'
  and customer_demographics.cd_education_status = 'Advanced Degree'
  and store_sales.ss_sales_price between 150.00 and 200.00
  and household_demographics.hd_dep_count = 1))
and ((store_sales.ss_addr_sk = customer_address.ca_address_sk
  and customer_address.ca_country = 'United States'
  and customer_address.ca_state in ('KY', 'GA', 'NM')
  and store_sales.ss_net_profit between 100 and 200)
 or (store_sales.ss_addr_sk = customer_address.ca_address_sk
  and customer_address.ca_country = 'United States'
  and customer_address.ca_state in ('MT', 'OR', 'IN')
  and store_sales.ss_net_profit between 150 and 300)
 or (store_sales.ss_addr_sk = customer_address.ca_address_sk
  and customer_address.ca_country = 'United States'
  and customer_address.ca_state in ('WI', 'MO', 'WV')
  and store_sales.ss_net_profit between 50 and 250));
{code}
This is the plan currently generated without any predicate simplification
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 7 <- Map 8 (BROADCAST_EDGE)
        Map 8 <- Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE)
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (BROADCAST_EDGE), Map 7 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
      DagName: mmokhtar_20140828155050_7059c24b-501b-4683-86c0-4f3c023f0b0e:1
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: customer_address
                  Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: ca_address_sk (type: int), ca_state (type: string), ca_country (type: string)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      sort order:
                      Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE
                      value expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string)
            Execution mode: vectorized
        Map 4
            Map Operator Tree:
                TableScan
                  alias: date_dim
                  filterExpr: ((d_year = 2001) and d_date_sk is not null) (type: boolean)
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: ((d_year = 2001) and d_date_sk is not null) (type: boolean)
                    Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: d_date_sk (type: int)
                      outputColumnNames: _col0
                      Statistics:
[jira] [Updated] (HIVE-7914) Simplify join predicates for CBO to avoid cross products
[ https://issues.apache.org/jira/browse/HIVE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-7914: -- Description: Simplify join predicates for disjunctive predicates to avoid cross products. For TPC-DS query 13 we generate cross products. The join predicates on (store_sales x customer_demographics), (store_sales x household_demographics) and (store_sales x customer_address) can be pulled up to avoid the cross products.
{code}
select avg(ss_quantity)
       ,avg(ss_ext_sales_price)
       ,avg(ss_ext_wholesale_cost)
       ,sum(ss_ext_wholesale_cost)
from store_sales
     ,store
     ,customer_demographics
     ,household_demographics
     ,customer_address
     ,date_dim
where store.s_store_sk = store_sales.ss_store_sk
and store_sales.ss_sold_date_sk = date_dim.d_date_sk
and date_dim.d_year = 2001
and ((store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
  and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
  and customer_demographics.cd_marital_status = 'M'
  and customer_demographics.cd_education_status = '4 yr Degree'
  and store_sales.ss_sales_price between 100.00 and 150.00
  and household_demographics.hd_dep_count = 3)
 or (store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
  and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
  and customer_demographics.cd_marital_status = 'D'
  and customer_demographics.cd_education_status = 'Primary'
  and store_sales.ss_sales_price between 50.00 and 100.00
  and household_demographics.hd_dep_count = 1)
 or (store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
  and customer_demographics.cd_demo_sk = ss_cdemo_sk
  and customer_demographics.cd_marital_status = 'U'
  and customer_demographics.cd_education_status = 'Advanced Degree'
  and store_sales.ss_sales_price between 150.00 and 200.00
  and household_demographics.hd_dep_count = 1))
and ((store_sales.ss_addr_sk = customer_address.ca_address_sk
  and customer_address.ca_country = 'United States'
  and customer_address.ca_state in ('KY', 'GA', 'NM')
  and store_sales.ss_net_profit between 100 and 200)
 or (store_sales.ss_addr_sk = customer_address.ca_address_sk
  and customer_address.ca_country = 'United States'
  and customer_address.ca_state in ('MT', 'OR', 'IN')
  and store_sales.ss_net_profit between 150 and 300)
 or (store_sales.ss_addr_sk = customer_address.ca_address_sk
  and customer_address.ca_country = 'United States'
  and customer_address.ca_state in ('WI', 'MO', 'WV')
  and store_sales.ss_net_profit between 50 and 250));
{code}
This is the plan currently generated without any predicate simplification
{code}
Warning: Map Join MAPJOIN[59][bigTable=?] in task 'Map 8' is a cross product
Warning: Map Join MAPJOIN[58][bigTable=?] in task 'Map 8' is a cross product
Warning: Shuffle Join JOIN[29][tables = [$hdt$_5, $hdt$_6]] in Stage 'Reducer 2' is a cross product
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 7 <- Map 8 (BROADCAST_EDGE)
        Map 8 <- Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE)
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (BROADCAST_EDGE), Map 7 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
      DagName: mmokhtar_20140828155050_7059c24b-501b-4683-86c0-4f3c023f0b0e:1
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: customer_address
                  Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: ca_address_sk (type: int), ca_state (type: string), ca_country (type: string)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      sort order:
                      Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE
                      value expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string)
            Execution mode: vectorized
        Map 4
            Map Operator Tree:
                TableScan
                  alias: date_dim
                  filterExpr: ((d_year = 2001) and d_date_sk is not null) (type: boolean)
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: ((d_year = 2001) and d_date_sk is not null) (type: boolean)
                    Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
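The pull-up HIVE-7914 describes can be sketched in the same spirit. This is an illustrative Python sketch, not Hive internals: an equi-join conjunct that appears in every branch of a disjunction is implied by the whole OR, so it can be hoisted above the OR and handed to the join planner as a join condition, turning the cross product into an equi-join. The `is_join_pred` test below is a deliberately crude stand-in.

```python
# Hypothetical sketch of join-predicate pull-up (not Hive's optimizer code).
def pull_up_common_join_keys(branches, is_join_pred):
    """Hoist join conjuncts shared by every OR branch above the disjunction."""
    common = set.intersection(*branches)
    # Keep only the conjuncts that look like join predicates.
    join_preds = {p for p in common if is_join_pred(p)}
    residual = [b - join_preds for b in branches]
    return join_preds, residual

# Simplified fragment of the (store_sales x customer_address) disjunction.
branches = [
    {"ss_addr_sk = ca_address_sk", "ca_state in ('KY','GA','NM')"},
    {"ss_addr_sk = ca_address_sk", "ca_state in ('MT','OR','IN')"},
    {"ss_addr_sk = ca_address_sk", "ca_state in ('WI','MO','WV')"},
]
join_preds, residual = pull_up_common_join_keys(
    branches, lambda p: "_sk = " in p)  # crude surrogate-key heuristic
print(join_preds)  # {'ss_addr_sk = ca_address_sk'} -- becomes the join condition
```

With the equality hoisted, the remaining per-branch state filters stay under the OR and no longer force a cross product between store_sales and customer_address.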
[jira] [Updated] (HIVE-7908) CBO: Handle Windowing functions part of expressions
[ https://issues.apache.org/jira/browse/HIVE-7908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7908: - Labels: cbo (was: ) CBO: Handle Windowing functions part of expressions --- Key: HIVE-7908 URL: https://issues.apache.org/jira/browse/HIVE-7908 Project: Hive Issue Type: Bug Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Labels: cbo Attachments: HIVE-7908.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7908) CBO: Handle Windowing functions part of expressions
[ https://issues.apache.org/jira/browse/HIVE-7908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7908: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to branch. Thanks [~jpullokkaran]! CBO: Handle Windowing functions part of expressions --- Key: HIVE-7908 URL: https://issues.apache.org/jira/browse/HIVE-7908 Project: Hive Issue Type: Bug Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Labels: cbo Attachments: HIVE-7908.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7901) CLONE - pig -useHCatalog with embedded metastore fails to pass command line args to metastore (org.apache.hive.hcatalog version)
[ https://issues.apache.org/jira/browse/HIVE-7901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115734#comment-14115734 ] Hive QA commented on HIVE-7901: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665374/hive-7901.01.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6127 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/564/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/564/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-564/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12665374 CLONE - pig -useHCatalog with embedded metastore fails to pass command line args to metastore (org.apache.hive.hcatalog version) Key: HIVE-7901 URL: https://issues.apache.org/jira/browse/HIVE-7901 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Reporter: Sushanth Sowmyan Assignee: Eric Hanson Attachments: hive-7901.01.patch This fails because the embedded metastore can't connect to the database because the command line -D arguments passed to pig are not getting passed to the metastore when the embedded metastore is created. Using hive.metastore.uris set to the empty string causes creation of an embedded metastore.
pig -useHCatalog -Dhive.metastore.uris= -Djavax.jdo.option.ConnectionPassword=AzureSQLDBXYZ The goal is to allow a pig job submitted via WebHCat to specify a metastore to use via job arguments. That is not working because it is not possible to pass -Djavax.jdo.option.ConnectionPassword and other necessary arguments to the embedded metastore. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 25176: HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch]
On Aug. 29, 2014, 5:30 p.m., Brock Noland wrote: Hi Na, Thank you very much for the patch! I have one high-level question: It appears we created the union_remove_spark* files because we wanted to add an additional property to the union_remove .q files? Meaning, what is the delta between union_remove_spark_1.q and union_remove_? Cheers! Na Yang wrote: Hi Brock, That is correct. The union_remove_spark* files include an extra config property, hive.merge.sparkfile, compared to the corresponding union_remove_* files. Except for that extra config property, all other queries in the union_remove_spark* files are the same as the queries in the union_remove_* files. The hive.merge.sparkfile value is set according to the hive.merge.mapfile and hive.merge.mapredfile property values in the original union_remove_* file. Regarding the test results, we expect the union_remove_spark* queries and the corresponding union_remove_* queries to return the same data. Thanks, Na Brock Noland wrote: Hi, Thank you very much for the information! I think instead of adding the new union_remove_spark tests we should just add the hive.merge.sparkfile property to the union_remove q files. The extra property won't impact the existing tests other than an extra line of output. If instead we'd like to keep the union_remove_spark* files then we'd need to add a check to QTestUtil that does not run spark files for MR: https://github.com/apache/hive/blob/trunk/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java#L431 as the tests are currently running for both spark and MR. As such, I think the first solution (just add the property to the existing tests) makes sense. Thoughts? Na Yang wrote: Hi Brock, Thank you for your suggestion. I also prefer the first solution. Let me modify the existing union_remove q files and re-generate the .q.out files for both MR and Spark. Thanks, Na Awesome, thanks!! - Brock --- This is an automatically generated e-mail.
To reply, visit: https://reviews.apache.org/r/25176/#review51889 --- On Aug. 29, 2014, 6:44 a.m., Na Yang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25176/ --- (Updated Aug. 29, 2014, 6:44 a.m.) Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-7870 https://issues.apache.org/jira/browse/HIVE-7870 Repository: hive-git Description --- HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch] The cause of this problem is during spark/tez task generation, the union file sink operator are cloned to two new filesink operator. The linkedfilesinkdesc info for those new filesink operators are missing. In addition, the two new filesink operators also need to be linked together. Diffs - itests/src/test/resources/testconfiguration.properties 6393671 ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 9c808d4 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 5ddc16d ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 379a39c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 ql/src/test/queries/clientpositive/union_remove_spark_1.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_10.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_11.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_15.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_16.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_17.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_18.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_19.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_2.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_20.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_21.q PRE-CREATION 
ql/src/test/queries/clientpositive/union_remove_spark_24.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_25.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_3.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_4.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_5.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_6.q PRE-CREATION
[jira] [Created] (HIVE-7915) Expose High and Low value in plan.ColStatistics
Harish Butani created HIVE-7915: --- Summary: Expose High and Low value in plan.ColStatistics Key: HIVE-7915 URL: https://issues.apache.org/jira/browse/HIVE-7915 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Priority: Minor These are being read from the Metastore but not populated in ColumnStatistics. One of the uses of this is HIVE-7905 -- This message was sent by Atlassian JIRA (v6.2#6252)
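One reason a cost-based planner wants the high/low values HIVE-7915 exposes: with a column's min/max known, the selectivity of a range predicate such as "ss_net_profit between 100 and 200" can be estimated by linear interpolation under a uniform-distribution assumption. The sketch below is an illustrative estimator, not Hive's actual implementation.

```python
# Hedged sketch: estimate range-predicate selectivity from column min/max,
# assuming values are roughly uniformly distributed over [col_low, col_high].
def range_selectivity(col_low, col_high, pred_low, pred_high):
    if col_high <= col_low:
        return 1.0  # degenerate stats: assume everything qualifies
    lo = max(pred_low, col_low)   # clamp predicate range to column domain
    hi = min(pred_high, col_high)
    if hi < lo:
        return 0.0  # predicate range misses the column's domain entirely
    return (hi - lo) / (col_high - col_low)

# e.g. a column spanning [0, 1000] filtered to "between 100 and 200":
print(range_selectivity(0, 1000, 100, 200))  # 0.1
```

Without the high/low values, an estimator like this must fall back to a fixed guess for range predicates, which is exactly the gap HIVE-7905 is after.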