[jira] [Commented] (HIVE-6500) Stats collection via filesystem

2015-06-15 Thread Damien Carol (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585683#comment-14585683
 ] 

Damien Carol commented on HIVE-6500:


[~ashutoshc] Did you miss the property *hive.stats.tmp.loc* in 
_common/src/java/org/apache/hadoop/hive/conf/HiveConf.java_?

 Stats collection via filesystem
 ---

 Key: HIVE-6500
 URL: https://issues.apache.org/jira/browse/HIVE-6500
 Project: Hive
  Issue Type: New Feature
  Components: Statistics
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.13.0

 Attachments: HIVE-6500.2.patch, HIVE-6500.3.patch, HIVE-6500.patch


 Recently, support for stats gathering via counters was [added | 
 https://issues.apache.org/jira/browse/HIVE-4632]. Although it is useful, it has 
 the following issues:
 * [Length of a counter group name is limited | 
 https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L340]
 * [Length of a counter name is limited | 
 https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L337]
 * [The number of distinct counter groups is limited | 
 https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L343]
 * [The number of distinct counters is limited | 
 https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L334]
 Although these limits are configurable, setting them to higher values 
 implies increased memory load on the AM and the job history server.
 Whether these limits make sense is [debatable | 
 https://issues.apache.org/jira/browse/MAPREDUCE-5680]; either way, it is desirable 
 that Hive not rely on the framework's counter feature, so that we can 
 evolve this feature without depending on framework support. Filesystem-based 
 stats collection is a step in that direction.
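 As a rough illustration of the approach described above (and not Hive's actual implementation), below is a minimal plain-Java sketch of filesystem-based stats publishing: each task writes its stats to a uniquely named file under a temporary directory (the role played by *hive.stats.tmp.loc*, mentioned in the comments above), and the client aggregates by reading every file. All class and method names here are hypothetical.
 {code}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;
import java.util.stream.Stream;

/** Hypothetical illustration: per-task stats files written under a tmp dir, summed by the client. */
public class FsStatsSketch {

  /** Each task writes its own row count to a uniquely named file (no counter limits involved). */
  static void publishRowCount(Path statsTmpDir, long rowCount) throws IOException {
    Files.createDirectories(statsTmpDir);
    Path taskFile = statsTmpDir.resolve("task-" + UUID.randomUUID() + ".stats");
    Files.write(taskFile, Long.toString(rowCount).getBytes(StandardCharsets.UTF_8));
  }

  /** The client aggregates by listing the directory and summing the per-task values. */
  static long aggregateRowCount(Path statsTmpDir) throws IOException {
    try (Stream<Path> files = Files.list(statsTmpDir)) {
      return files.mapToLong(p -> {
        try {
          return Long.parseLong(new String(Files.readAllBytes(p), StandardCharsets.UTF_8).trim());
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }).sum();
    }
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempDirectory("hive-stats-");  // stands in for hive.stats.tmp.loc
    publishRowCount(tmp, 100);                            // e.g. task 1
    publishRowCount(tmp, 250);                            // e.g. task 2
    System.out.println("aggregated numRows = " + aggregateRowCount(tmp));  // 350
  }
}
 {code}
 Because the number of files scales with the number of tasks rather than with counter names or groups, none of the counter limits listed above come into play.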



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10542) Full outer joins in tez produce incorrect results in certain cases

2015-06-15 Thread Goun Na (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585725#comment-14585725
 ] 

Goun Na commented on HIVE-10542:


No patch available for Hive 1.1?

 Full outer joins in tez produce incorrect results in certain cases
 --

 Key: HIVE-10542
 URL: https://issues.apache.org/jira/browse/HIVE-10542
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
Priority: Blocker
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-10542.1.patch, HIVE-10542.2.patch, 
 HIVE-10542.3.patch, HIVE-10542.4.patch, HIVE-10542.5.patch, 
 HIVE-10542.6.patch, HIVE-10542.7.patch, HIVE-10542.8.patch, HIVE-10542.9.patch


 If there are no records for one of the tables in a full outer join, we do 
 not read the other input and end up not producing rows that we should.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6500) Stats collection via filesystem

2015-06-15 Thread Damien Carol (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585678#comment-14585678
 ] 

Damien Carol commented on HIVE-6500:


[~leftylev] This JIRA added a new property, *hive.stats.tmp.loc*, that is NOT documented.
Also, this property is not added to hive-default.xml.

 Stats collection via filesystem
 ---

 Key: HIVE-6500
 URL: https://issues.apache.org/jira/browse/HIVE-6500
 Project: Hive
  Issue Type: New Feature
  Components: Statistics
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.13.0

 Attachments: HIVE-6500.2.patch, HIVE-6500.3.patch, HIVE-6500.patch


 Recently, support for stats gathering via counters was [added | 
 https://issues.apache.org/jira/browse/HIVE-4632]. Although it is useful, it has 
 the following issues:
 * [Length of a counter group name is limited | 
 https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L340]
 * [Length of a counter name is limited | 
 https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L337]
 * [The number of distinct counter groups is limited | 
 https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L343]
 * [The number of distinct counters is limited | 
 https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L334]
 Although these limits are configurable, setting them to higher values 
 implies increased memory load on the AM and the job history server.
 Whether these limits make sense is [debatable | 
 https://issues.apache.org/jira/browse/MAPREDUCE-5680]; either way, it is desirable 
 that Hive not rely on the framework's counter feature, so that we can 
 evolve this feature without depending on framework support. Filesystem-based 
 stats collection is a step in that direction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.

2015-06-15 Thread Elliot West (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliot West updated HIVE-10165:
---
Attachment: HIVE-10165.7.patch

 Improve hive-hcatalog-streaming extensibility and support updates and deletes.
 --

 Key: HIVE-10165
 URL: https://issues.apache.org/jira/browse/HIVE-10165
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Affects Versions: 1.2.0
Reporter: Elliot West
Assignee: Elliot West
  Labels: streaming_api
 Attachments: HIVE-10165.0.patch, HIVE-10165.4.patch, 
 HIVE-10165.5.patch, HIVE-10165.6.patch, HIVE-10165.7.patch, 
 mutate-system-overview.png


 h3. Overview
 I'd like to extend the 
 [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest]
  API so that it also supports the writing of record updates and deletes in 
 addition to the already supported inserts.
 h3. Motivation
 We have many Hadoop processes outside of Hive that merge changed facts into 
 existing datasets. Traditionally we achieve this by reading in a 
 ground-truth dataset and a modified dataset, grouping by a key, sorting by a 
 sequence and then applying a function to determine inserted, updated, and 
 deleted rows. However, in our current scheme we must rewrite all partitions 
 that may potentially contain changes. In practice the number of mutated 
 records is very small when compared with the records contained in a 
 partition. This approach results in a number of operational issues:
 * Excessive amount of write activity required for small data changes.
 * Downstream applications cannot robustly read these datasets while they are 
 being updated.
 * Due to the scale of the updates (hundreds of partitions), the scope for 
 contention is high. 
 I believe we can address this problem by instead writing only the changed 
 records to a Hive transactional table. This should drastically reduce the 
 amount of data that we need to write and also provide a means for managing 
 concurrent access to the data. Our existing merge processes can read and 
 retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to 
 an updated form of the hive-hcatalog-streaming API which will then have the 
 required data to perform an update or insert in a transactional manner. 
 h3. Benefits
 * Enables the creation of large-scale dataset merge processes  
 * Opens up Hive transactional functionality in an accessible manner to 
 processes that operate outside of Hive.
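 To make the proposed flow concrete, here is a small, hypothetical plain-Java sketch (not the actual hive-hcatalog-streaming API) of a merge process that retains each record's identifier and replays only the changed records through a mutating writer. The {{RecordId}}, {{MutationWriter}}, and {{ChangeEvent}} types are illustrative stand-ins for whatever the extended API ends up exposing.
 {code}
import java.util.List;

/** Hypothetical stand-in for ROW_ID / RecordIdentifier retained by the merge process. */
final class RecordId {
  final long transactionId;
  final int bucketId;
  final long rowId;
  RecordId(long transactionId, int bucketId, long rowId) {
    this.transactionId = transactionId;
    this.bucketId = bucketId;
    this.rowId = rowId;
  }
}

/** Hypothetical writer opened against one transaction of an ACID table. */
interface MutationWriter {
  void insert(Object record);
  void update(RecordId id, Object newRecord);
  void delete(RecordId id);
  void commit();
}

/** Output of the existing merge logic: the kind of change plus the affected record. */
final class ChangeEvent {
  enum Kind { INSERT, UPDATE, DELETE }
  final Kind kind;
  final RecordId id;
  final Object record;
  ChangeEvent(Kind kind, RecordId id, Object record) {
    this.kind = kind;
    this.id = id;
    this.record = record;
  }
}

class MergeProcess {
  /** Apply only the changed records, carrying the retained RecordId through to the writer. */
  static void applyChanges(List<ChangeEvent> changes, MutationWriter writer) {
    for (ChangeEvent c : changes) {
      switch (c.kind) {
        case INSERT: writer.insert(c.record); break;
        case UPDATE: writer.update(c.id, c.record); break;
        case DELETE: writer.delete(c.id); break;
      }
    }
    writer.commit();
  }
}
 {code}
 The point of the sketch is only that the changed records, not whole partitions, are written, and that the retained identifier is what lets updates and deletes be applied transactionally.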



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6500) Stats collection via filesystem

2015-06-15 Thread Damien Carol (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585874#comment-14585874
 ] 

Damien Carol commented on HIVE-6500:


Please ignore my last comment.

 Stats collection via filesystem
 ---

 Key: HIVE-6500
 URL: https://issues.apache.org/jira/browse/HIVE-6500
 Project: Hive
  Issue Type: New Feature
  Components: Statistics
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.13.0

 Attachments: HIVE-6500.2.patch, HIVE-6500.3.patch, HIVE-6500.patch


 Recently, support for stats gathering via counters was [added | 
 https://issues.apache.org/jira/browse/HIVE-4632]. Although it is useful, it has 
 the following issues:
 * [Length of a counter group name is limited | 
 https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L340]
 * [Length of a counter name is limited | 
 https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L337]
 * [The number of distinct counter groups is limited | 
 https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L343]
 * [The number of distinct counters is limited | 
 https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L334]
 Although these limits are configurable, setting them to higher values 
 implies increased memory load on the AM and the job history server.
 Whether these limits make sense is [debatable | 
 https://issues.apache.org/jira/browse/MAPREDUCE-5680]; either way, it is desirable 
 that Hive not rely on the framework's counter feature, so that we can 
 evolve this feature without depending on framework support. Filesystem-based 
 stats collection is a step in that direction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10989) HoS can't control number of map tasks for runtime skew join [Spark Branch]

2015-06-15 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585889#comment-14585889
 ] 

Xuefu Zhang commented on HIVE-10989:


Makes sense. +1

 HoS can't control number of map tasks for runtime skew join [Spark Branch]
 --

 Key: HIVE-10989
 URL: https://issues.apache.org/jira/browse/HIVE-10989
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-10989.1-spark.patch


 Flags {{hive.skewjoin.mapjoin.map.tasks}} and 
 {{hive.skewjoin.mapjoin.min.split}} are used to control the number of map 
 tasks for the map join of runtime skew join. They work well for MR but have 
 no effect for Spark.
 This makes runtime skew join less useful, i.e. we just end up with slow 
 mappers instead of slow reducers.
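 For reference, a hedged sketch of the arithmetic the two flags imply when sizing splits for the follow-up map join: cap the number of map tasks at {{hive.skewjoin.mapjoin.map.tasks}} while keeping each split at least {{hive.skewjoin.mapjoin.min.split}} bytes. This only illustrates the intended behavior; it is not the MR or Spark implementation, and the class and method names are hypothetical.
 {code}
/** Hypothetical sketch of how the two flags could bound the number of map tasks for the skew-join map join. */
public class SkewJoinSplitSizing {

  /**
   * @param totalInputBytes size of the skewed key's data
   * @param maxMapTasks     hive.skewjoin.mapjoin.map.tasks
   * @param minSplitBytes   hive.skewjoin.mapjoin.min.split
   * @return split size that yields at most maxMapTasks splits, each at least minSplitBytes
   */
  static long splitSize(long totalInputBytes, int maxMapTasks, long minSplitBytes) {
    long bytesPerTask = (totalInputBytes + maxMapTasks - 1) / maxMapTasks;  // ceiling division
    return Math.max(bytesPerTask, minSplitBytes);
  }

  public static void main(String[] args) {
    // 10 GB of skewed data, at most 100 tasks, 32 MB minimum split:
    long size = splitSize(10L << 30, 100, 32L << 20);
    System.out.println("split size = " + size + " bytes");
    System.out.println("map tasks  = " + ((10L << 30) + size - 1) / size);  // 100
  }
}
 {code}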



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10754) Pig+Hcatalog doesn't work properly since we need to clone the Job instance in HCatLoader

2015-06-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586087#comment-14586087
 ] 

Aihua Xu commented on HIVE-10754:
-

[~mithun] Sorry for the late reply; I have been busy with something else. It seems 
to be a Hadoop-version-related issue.

Would it be fair to update all the calls in HCatalog to use the new 
getInstance(), since the old constructor is deprecated anyway? If you agree, I will use this jira 
to do that and I will update the title to reflect it.
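For illustration, a minimal sketch of the kind of change being discussed: cloning the Job through the non-deprecated {{Job.getInstance()}} factory over a copied Configuration, so HCatLoader does not mutate the caller's instance. This is an assumption about the shape of the fix, not the actual patch; the helper name is hypothetical.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

/** Illustrative helper: clone a Job via the non-deprecated factory instead of mutating the caller's instance. */
public class JobCloneSketch {
  static Job cloneJob(Job original) throws IOException {
    // Copy the configuration so settings added while building the load plan
    // don't leak back into the caller's Job.
    Configuration copy = new Configuration(original.getConfiguration());
    return Job.getInstance(copy);  // replaces the deprecated `new Job(conf)` constructor
  }
}
{code}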

 Pig+Hcatalog doesn't work properly since we need to clone the Job instance in 
 HCatLoader
 

 Key: HIVE-10754
 URL: https://issues.apache.org/jira/browse/HIVE-10754
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 1.2.0
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-10754.patch


 {noformat}
 Create table tbl1 (key string, value string) stored as rcfile;
 Create table tbl2 (key string, value string);
 insert into tbl1 values( '1', '111');
 insert into tbl2 values('1', '2');
 {noformat}
 Pig script:
 {noformat}
 src_tbl1 = FILTER tbl1 BY (key == '1');
 prj_tbl1 = FOREACH src_tbl1 GENERATE
key as tbl1_key,
value as tbl1_value,
'333' as tbl1_v1;

 src_tbl2 = FILTER tbl2 BY (key == '1');
 prj_tbl2 = FOREACH src_tbl2 GENERATE
key as tbl2_key,
value as tbl2_value;

 dump prj_tbl1;
 dump prj_tbl2;
 result = JOIN prj_tbl1 BY (tbl1_key), prj_tbl2 BY (tbl2_key);
 prj_result = FOREACH result 
   GENERATE  prj_tbl1::tbl1_key AS key1,
 prj_tbl1::tbl1_value AS value1,
 prj_tbl1::tbl1_v1 AS v1,
 prj_tbl2::tbl2_key AS key2,
 prj_tbl2::tbl2_value AS value2;

 dump prj_result;
 {noformat}
 The expected result is (1,111,333,1,2), while the actual result is (1,2,333,1,2). We 
 need to clone the Job instance in HCatLoader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11004) PermGen OOM error in Hiveserver2

2015-06-15 Thread Martin Benson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Benson updated HIVE-11004:
-
Summary: PermGen OOM error in Hiveserver2  (was: PermGen)

 PermGen OOM error in Hiveserver2
 

 Key: HIVE-11004
 URL: https://issues.apache.org/jira/browse/HIVE-11004
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 1.1.0
 Environment: cdh 5.4
Reporter: Martin Benson
Priority: Critical

 Periodically, HiveServer2 becomes unresponsive, and the logs show the 
 following error:
 2:28:22.965 PM  ERROR  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat 
 Unexpected Exception
 java.lang.OutOfMemoryError: PermGen space
 2:28:22.969 PM  WARN   
 org.apache.hive.service.cli.thrift.ThriftCLIService 
 Error fetching results: 
 org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
 java.lang.RuntimeException: serious problem
   at 
 org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:343)
   at 
 org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:250)
   at 
 org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:656)
   at 
 org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:451)
   at 
 org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:672)
   at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
   at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
   at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
   at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: java.lang.RuntimeException: serious problem
   at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507)
   at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1655)
   at 
 org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:338)
   ... 13 more
 Caused by: java.lang.RuntimeException: serious problem
   at 
 org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$Context.waitForTasks(OrcInputFormat.java:478)
   at 
 org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:944)
   at 
 org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:969)
   at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:362)
   at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:294)
   at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445)
   ... 17 more
 Caused by: java.lang.OutOfMemoryError: PermGen space
 There does not appear to be an obvious trigger for this (other than the fact 
 that the error mentions ORC). If further details would be helpful in 
 diagnosing the issue please let me know and I'll supply them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10972) DummyTxnManager always locks the current database in shared mode, which is incorrect.

2015-06-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586157#comment-14586157
 ] 

Aihua Xu commented on HIVE-10972:
-

The test failures are not related to this patch.

 DummyTxnManager always locks the current database in shared mode, which is 
 incorrect.
 -

 Key: HIVE-10972
 URL: https://issues.apache.org/jira/browse/HIVE-10972
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 2.0.0
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-10972.2.patch, HIVE-10972.patch


 In DummyTxnManager [line 163 | 
 http://grepcode.com/file/repo1.maven.org/maven2/co.cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java#163],
  it always locks the current database. 
 That is not correct, since the current database can be db1 while the query 
 is {{select * from db2.tb1}}, which will then lock db1 unnecessarily.
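 A minimal sketch of the intended behavior (plain Java with hypothetical names, not the DummyTxnManager code): derive the database to lock from the table reference itself, and fall back to the current database only for unqualified table names.
 {code}
/** Hypothetical sketch: lock the database that actually owns the referenced table, not the session's current one. */
public class LockTargetSketch {

  /** e.g. currentDb = "db1", tableRef = "db2.tb1"  ->  lock "db2", not "db1". */
  static String databaseToLock(String currentDb, String tableRef) {
    int dot = tableRef.indexOf('.');
    return dot > 0
        ? tableRef.substring(0, dot)  // qualified table: use its own database
        : currentDb;                  // unqualified table: fall back to the current database
  }

  public static void main(String[] args) {
    System.out.println(databaseToLock("db1", "db2.tb1"));  // db2
    System.out.println(databaseToLock("db1", "tb1"));      // db1
  }
}
 {code}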



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.

2015-06-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586234#comment-14586234
 ] 

Hive QA commented on HIVE-10165:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12739615/HIVE-10165.7.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9085 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbort
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4268/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4268/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4268/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12739615 - PreCommit-HIVE-TRUNK-Build

 Improve hive-hcatalog-streaming extensibility and support updates and deletes.
 --

 Key: HIVE-10165
 URL: https://issues.apache.org/jira/browse/HIVE-10165
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Affects Versions: 1.2.0
Reporter: Elliot West
Assignee: Elliot West
  Labels: streaming_api
 Attachments: HIVE-10165.0.patch, HIVE-10165.4.patch, 
 HIVE-10165.5.patch, HIVE-10165.6.patch, HIVE-10165.7.patch, 
 mutate-system-overview.png


 h3. Overview
 I'd like to extend the 
 [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest]
  API so that it also supports the writing of record updates and deletes in 
 addition to the already supported inserts.
 h3. Motivation
 We have many Hadoop processes outside of Hive that merge changed facts into 
 existing datasets. Traditionally we achieve this by reading in a 
 ground-truth dataset and a modified dataset, grouping by a key, sorting by a 
 sequence and then applying a function to determine inserted, updated, and 
 deleted rows. However, in our current scheme we must rewrite all partitions 
 that may potentially contain changes. In practice the number of mutated 
 records is very small when compared with the records contained in a 
 partition. This approach results in a number of operational issues:
 * Excessive amount of write activity required for small data changes.
 * Downstream applications cannot robustly read these datasets while they are 
 being updated.
 * Due to the scale of the updates (hundreds of partitions), the scope for 
 contention is high. 
 I believe we can address this problem by instead writing only the changed 
 records to a Hive transactional table. This should drastically reduce the 
 amount of data that we need to write and also provide a means for managing 
 concurrent access to the data. Our existing merge processes can read and 
 retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to 
 an updated form of the hive-hcatalog-streaming API which will then have the 
 required data to perform an update or insert in a transactional manner. 
 h3. Benefits
 * Enables the creation of large-scale dataset merge processes  
 * Opens up Hive transactional functionality in an accessible manner to 
 processes that operate outside of Hive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10972) DummyTxnManager always locks the current database in shared mode, which is incorrect.

2015-06-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586162#comment-14586162
 ] 

Aihua Xu commented on HIVE-10972:
-

[~alangates] It seems you worked on the initial version. Can you also take a look 
at the change to see if it will cause any issues?

 DummyTxnManager always locks the current database in shared mode, which is 
 incorrect.
 -

 Key: HIVE-10972
 URL: https://issues.apache.org/jira/browse/HIVE-10972
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 2.0.0
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-10972.2.patch, HIVE-10972.patch


 In DummyTxnManager [line 163 | 
 http://grepcode.com/file/repo1.maven.org/maven2/co.cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java#163],
  it always locks the current database. 
 That is not correct, since the current database can be db1 while the query 
 is {{select * from db2.tb1}}, which will then lock db1 unnecessarily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others

2015-06-15 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586098#comment-14586098
 ] 

Chaoyu Tang commented on HIVE-7018:
---

[~ychena] Looks like the HMS upgrade test failed; do you know the reason?

 Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but 
 not others
 -

 Key: HIVE-7018
 URL: https://issues.apache.org/jira/browse/HIVE-7018
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Yongzhi Chen
 Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch, HIVE-7018.3.patch, 
 HIVE-7018.4.patch


 It appears that at least Postgres and Oracle do not have the LINK_TARGET_ID 
 column, while MySQL does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11012) LLAP: fix some tests in the branch and revert incorrectly committed changed out files (from HIVE-11014)

2015-06-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11012:

Summary: LLAP: fix some tests in the branch and revert incorrectly 
committed changed out files (from HIVE-11014)  (was: LLAP: fix some tests in 
the branch)

 LLAP: fix some tests in the branch and revert incorrectly committed changed 
 out files (from HIVE-11014)
 ---

 Key: HIVE-11012
 URL: https://issues.apache.org/jira/browse/HIVE-11012
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 I am assigning some new issues to people and fixing assorted issues 
 from HIVE-10997. So far I have fixed all the TestLocationQueries/MtQueries failures, 
 the list_bucket* Kryo exception, and some Tez NPEs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11014) LLAP: MiniTez vector_binary_join_groupby, vector_outer_join1, vector_outer_join2 and cbo_windowing tests have result changes compared to master

2015-06-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11014:

Summary: LLAP: MiniTez vector_binary_join_groupby, vector_outer_join1, 
vector_outer_join2 and cbo_windowing tests have result changes compared to 
master  (was: LLAP: MiniTez vector_binary_join_groupby test has result changes 
compared to master)

 LLAP: MiniTez vector_binary_join_groupby, vector_outer_join1, 
 vector_outer_join2 and cbo_windowing tests have result changes compared to 
 master
 ---

 Key: HIVE-11014
 URL: https://issues.apache.org/jira/browse/HIVE-11014
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Matt McCline





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11014) LLAP: MiniTez vector_binary_join_groupby, vector_outer_join1, vector_outer_join2 and cbo_windowing tests have result changes compared to master

2015-06-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587254#comment-14587254
 ] 

Sergey Shelukhin commented on HIVE-11014:
-

Feel free to create separate JIRAs if the changes are for different reasons.

 LLAP: MiniTez vector_binary_join_groupby, vector_outer_join1, 
 vector_outer_join2 and cbo_windowing tests have result changes compared to 
 master
 ---

 Key: HIVE-11014
 URL: https://issues.apache.org/jira/browse/HIVE-11014
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Matt McCline





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-11012) LLAP: fix some tests in the branch and revert incorrectly committed changed out files (from HIVE-11014)

2015-06-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-11012.
-
   Resolution: Fixed
Fix Version/s: llap

committed to branch

 LLAP: fix some tests in the branch and revert incorrectly committed changed 
 out files (from HIVE-11014)
 ---

 Key: HIVE-11012
 URL: https://issues.apache.org/jira/browse/HIVE-11012
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: llap


 I am assigning some new issues to people and fixing assorted issues 
 from HIVE-10997. So far I have fixed all the TestLocationQueries/MtQueries failures, 
 the list_bucket* Kryo exception, and some Tez NPEs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11018) Turn on cbo in more q files

2015-06-15 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-11018:

Attachment: HIVE-11018.patch

No code changes. Only test changes.

 Turn on cbo in more q files
 ---

 Key: HIVE-11018
 URL: https://issues.apache.org/jira/browse/HIVE-11018
 Project: Hive
  Issue Type: Task
  Components: Tests
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-11018.patch


 There are a few tests in which CBO was turned off for various reasons. Those 
 reasons don't exist anymore. For those tests, we should turn CBO on. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10233) Hive on LLAP: Memory manager

2015-06-15 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-10233:
-
Attachment: HIVE-10233-WIP-8.patch

Uploading the WIP-8 patch for join-only MM.

 Hive on LLAP: Memory manager
 

 Key: HIVE-10233
 URL: https://issues.apache.org/jira/browse/HIVE-10233
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: llap
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, 
 HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, 
 HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch


 We need a memory manager in LLAP/Tez to manage memory usage across 
 threads. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10991) CBO: Calcite Operator To Hive Operator (Calcite Return Path): NonBlockingOpDeDupProc did not kick in rcfile_merge2.q

2015-06-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-10991:
---
Attachment: HIVE-10991.patch

 CBO: Calcite Operator To Hive Operator (Calcite Return Path): 
 NonBlockingOpDeDupProc did not kick in rcfile_merge2.q
 

 Key: HIVE-10991
 URL: https://issues.apache.org/jira/browse/HIVE-10991
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-10991.patch


 NonBlockingOpDeDupProc did not kick in rcfile_merge2.q in return path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-10991) CBO: Calcite Operator To Hive Operator (Calcite Return Path): NonBlockingOpDeDupProc did not kick in rcfile_merge2.q

2015-06-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-10991:
--

Assignee: Jesus Camacho Rodriguez  (was: Pengcheng Xiong)

 CBO: Calcite Operator To Hive Operator (Calcite Return Path): 
 NonBlockingOpDeDupProc did not kick in rcfile_merge2.q
 

 Key: HIVE-10991
 URL: https://issues.apache.org/jira/browse/HIVE-10991
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-10991.patch


 NonBlockingOpDeDupProc did not kick in rcfile_merge2.q in return path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11005) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : Regression on the latest master

2015-06-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11005:
---
Assignee: Jesus Camacho Rodriguez

 CBO: Calcite Operator To Hive Operator (Calcite Return Path) : Regression on 
 the latest master
 --

 Key: HIVE-11005
 URL: https://issues.apache.org/jira/browse/HIVE-11005
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Jesus Camacho Rodriguez

 Tests cbo_join.q and cbo_views.q on the return path failed. Part of the stack 
 trace is:
 {code}
 2015-06-15 09:51:53,377 ERROR [main]: parse.CalcitePlanner 
 (CalcitePlanner.java:genOPTree(282)) - CBO failed, skipping CBO.
 java.lang.IndexOutOfBoundsException: index (0) must be less than size (0)
 at 
 com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305)
 at 
 com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284)
 at 
 com.google.common.collect.EmptyImmutableList.get(EmptyImmutableList.java:80)
 at 
 org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveInsertExchange4JoinRule.onMatch(HiveInsertExchange4JoinRule.java:101)
 at 
 org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:326)
 at 
 org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:515)
 at 
 org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:392)
 at 
 org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:255)
 at 
 org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:125)
 at 
 org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:207)
 at 
 org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:194)
 at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:888)
 at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:771)
 at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109)
 at 
 org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:876)
 at 
 org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10533) CBO (Calcite Return Path): Join to MultiJoin support for outer joins

2015-06-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-10533:
---
Attachment: HIVE-10533.03.patch

 CBO (Calcite Return Path): Join to MultiJoin support for outer joins
 

 Key: HIVE-10533
 URL: https://issues.apache.org/jira/browse/HIVE-10533
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-10533.01.patch, HIVE-10533.02.patch, 
 HIVE-10533.02.patch, HIVE-10533.03.patch, HIVE-10533.patch


 CBO return path: auto_join7.q can be used to reproduce the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11004) PermGen OOM error in Hiveserver2

2015-06-15 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586438#comment-14586438
 ] 

Mostafa Mokhtar commented on HIVE-11004:


[~martinbenson]
Try setting hive.orc.cache.stripe.details.size=-1 and restart HS2. 

 PermGen OOM error in Hiveserver2
 

 Key: HIVE-11004
 URL: https://issues.apache.org/jira/browse/HIVE-11004
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 1.1.0
 Environment: cdh 5.4
Reporter: Martin Benson
Priority: Critical

 Periodically, HiveServer2 becomes unresponsive, and the logs show the 
 following error:
 2:28:22.965 PM  ERROR  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat 
 Unexpected Exception
 java.lang.OutOfMemoryError: PermGen space
 2:28:22.969 PM  WARN   
 org.apache.hive.service.cli.thrift.ThriftCLIService 
 Error fetching results: 
 org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
 java.lang.RuntimeException: serious problem
   at 
 org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:343)
   at 
 org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:250)
   at 
 org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:656)
   at 
 org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:451)
   at 
 org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:672)
   at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
   at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
   at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
   at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: java.lang.RuntimeException: serious problem
   at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507)
   at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1655)
   at 
 org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:338)
   ... 13 more
 Caused by: java.lang.RuntimeException: serious problem
   at 
 org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$Context.waitForTasks(OrcInputFormat.java:478)
   at 
 org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:944)
   at 
 org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:969)
   at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:362)
   at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:294)
   at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445)
   ... 17 more
 Caused by: java.lang.OutOfMemoryError: PermGen space
 There does not appear to be an obvious trigger for this (other than the fact 
 that the error mentions ORC). If further details would be helpful in 
 diagnosing the issue please let me know and I'll supply them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11007) CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depends on the last SEL

2015-06-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11007:
---
Attachment: HIVE-11007.01.patch

 CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's 
 mapInputToDP should depends on the last SEL
 -

 Key: HIVE-11007
 URL: https://issues.apache.org/jira/browse/HIVE-11007
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-11007.01.patch


 In a dynamic partitioning case, for example, we may have the operator tree 
 TS0-SEL1-SEL2-FS3. The dpCtx's mapInputToDP is populated by SEL1 rather than 
 SEL2, which causes an error in the return path.
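 A small, hypothetical sketch of the intended selection logic (not Hive's planner code): walk the operator chain and keep the SEL that directly feeds the FileSink, so dpCtx would be populated from SEL2 rather than SEL1.
 {code}
import java.util.Arrays;
import java.util.List;

/** Hypothetical operator chain TS0 -> SEL1 -> SEL2 -> FS3; pick the SEL that feeds the FileSink. */
public class LastSelectSketch {
  static String lastSelectBeforeFileSink(List<String> chain) {
    String last = null;
    for (String op : chain) {
      if (op.startsWith("SEL")) {
        last = op;       // keep overwriting: the final SEL before FS wins
      } else if (op.startsWith("FS")) {
        return last;     // SEL2 here, the operator whose schema dpCtx should use
      }
    }
    return last;
  }

  public static void main(String[] args) {
    System.out.println(lastSelectBeforeFileSink(Arrays.asList("TS0", "SEL1", "SEL2", "FS3")));  // SEL2
  }
}
 {code}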



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10984) Lock table explicit lock command doesn't lock the database object.

2015-06-15 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10984:

Description: 
There is an issue in ZooKeeperHiveLockManager.java: when a table is locked 
explicitly, it doesn't lock the database object (which it does when the lock 
comes from a query).
The current implementation of ZooKeeperHiveLockManager will lock the object 
and its parents, but won't check the children when it tries to acquire a lock on 
an object. This allows the following scenario, which should not be 
permitted but currently goes through.

{noformat}
use default; 
lock table db1.tbl1 shared; 
lock database db1 exclusive;
{noformat}

Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable 
failure cases.


  was:
There is an issue in ZooKeeperHiveLockManager.java, in which when locking 
exclusively on an object we didn't check if the children are locked.

So the following should not be allowed.
{noformat}
use default; 
lock table lockneg2.tstsrcpart shared; 
lock database lockneg2 exclusive;
{noformat}

Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable 
failure cases.


 Lock table explicit lock command doesn't lock the database object.
 

 Key: HIVE-10984
 URL: https://issues.apache.org/jira/browse/HIVE-10984
 Project: Hive
  Issue Type: Bug
  Components: Locking
Reporter: Aihua Xu
Assignee: Aihua Xu

 There is an issue in ZooKeeperHiveLockManager.java: when a table is locked 
 explicitly, it doesn't lock the database object (which it does when the lock 
 comes from a query).
 The current implementation of ZooKeeperHiveLockManager will lock the 
 object and its parents, but won't check the children when it tries to acquire 
 a lock on an object. This allows the following scenario, which 
 should not be permitted but currently goes through.
 {noformat}
 use default; 
 lock table db1.tbl1 shared; 
 lock database db1 exclusive;
 {noformat}
 Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable 
 failure cases.
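 A minimal sketch of the behavior the fix implies (plain Java with hypothetical types, not the ZooKeeperHiveLockManager code): an explicit table lock should also produce a shared lock on the parent database, so that a later exclusive database lock conflicts as expected.
 {code}
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: an explicit table lock also yields a shared lock on its parent database. */
public class ExplicitLockSketch {
  enum Mode { SHARED, EXCLUSIVE }

  static final class LockRequest {
    final String object;
    final Mode mode;
    LockRequest(String object, Mode mode) { this.object = object; this.mode = mode; }
    @Override public String toString() { return object + ":" + mode; }
  }

  /** e.g. lockTable("db1", "tbl1", SHARED) -> [db1:SHARED, db1.tbl1:SHARED]. */
  static List<LockRequest> lockTable(String db, String table, Mode mode) {
    return Arrays.asList(
        new LockRequest(db, Mode.SHARED),           // parent database is always locked shared
        new LockRequest(db + "." + table, mode));   // the table gets the requested mode
  }

  public static void main(String[] args) {
    System.out.println(lockTable("db1", "tbl1", Mode.SHARED));
    // A subsequent "lock database db1 exclusive" now conflicts with db1:SHARED, as it should.
  }
}
 {code}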



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10884) Enable some beeline tests and turn on HIVE-4239 by default

2015-06-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10884:

Attachment: HIVE-10884.05.patch

Beeline tests weren't attempted. Attempting to remove the exclude from 
hivetest...

 Enable some beeline tests and turn on HIVE-4239 by default
 --

 Key: HIVE-10884
 URL: https://issues.apache.org/jira/browse/HIVE-10884
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-10884.01.patch, HIVE-10884.02.patch, 
 HIVE-10884.03.patch, HIVE-10884.04.patch, HIVE-10884.05.patch, 
 HIVE-10884.patch


 See comments in HIVE-4239.
 Beeline tests with parallelism need to be enabled to turn compilation 
 parallelism on by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-10884) Enable some beeline tests and turn on HIVE-4239 by default

2015-06-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10884:

Comment: was deleted

(was: It looks like the instrumentation needs to be updated to run beeline 
tests... )

 Enable some beeline tests and turn on HIVE-4239 by default
 --

 Key: HIVE-10884
 URL: https://issues.apache.org/jira/browse/HIVE-10884
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-10884.01.patch, HIVE-10884.02.patch, 
 HIVE-10884.03.patch, HIVE-10884.04.patch, HIVE-10884.05.patch, 
 HIVE-10884.patch


 See comments in HIVE-4239.
 Beeline tests with parallelism need to be enabled to turn compilation 
 parallelism on by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10884) Enable some beeline tests and turn on HIVE-4239 by default

2015-06-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586674#comment-14586674
 ] 

Sergey Shelukhin commented on HIVE-10884:
-

It looks like the instrumentation needs to be updated to run beeline tests... 

 Enable some beeline tests and turn on HIVE-4239 by default
 --

 Key: HIVE-10884
 URL: https://issues.apache.org/jira/browse/HIVE-10884
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-10884.01.patch, HIVE-10884.02.patch, 
 HIVE-10884.03.patch, HIVE-10884.04.patch, HIVE-10884.05.patch, 
 HIVE-10884.patch


 See comments in HIVE-4239.
 Beeline tests with parallelism need to be enabled to turn compilation 
 parallelism on by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11006) improve logging wrt ACID module

2015-06-15 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11006:
--
Attachment: HIVE-11006.patch

[~alangates] could you review please

 improve logging wrt ACID module
 ---

 Key: HIVE-11006
 URL: https://issues.apache.org/jira/browse/HIVE-11006
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 1.2.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-11006.patch


 especially around metastore DB operations (TxnHandler) which are retried or 
 fail for some reason.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10984) Lock table explicit lock command doesn't lock the database object.

2015-06-15 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10984:

Summary: Lock table explicit lock command doesn't lock the database 
object.  (was: When ZooKeeperHiveLockManager locks an object exclusively, it 
doesn't check the lock on the children.)

 Lock table explicit lock command doesn't lock the database object.
 

 Key: HIVE-10984
 URL: https://issues.apache.org/jira/browse/HIVE-10984
 Project: Hive
  Issue Type: Bug
  Components: Locking
Reporter: Aihua Xu
Assignee: Aihua Xu

 There is an issue in ZooKeeperHiveLockManager.java, in which when locking 
 exclusively on an object we didn't check if the children are locked.
 So the following should not be allowed.
 {noformat}
 use default; 
 lock table lockneg2.tstsrcpart shared; 
 lock database lockneg2 exclusive;
 {noformat}
 Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable 
 failure cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10991) CBO: Calcite Operator To Hive Operator (Calcite Return Path): NonBlockingOpDeDupProc did not kick in rcfile_merge2.q

2015-06-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586613#comment-14586613
 ] 

Hive QA commented on HIVE-10991:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12739659/HIVE-10991.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9008 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4269/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4269/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4269/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12739659 - PreCommit-HIVE-TRUNK-Build

 CBO: Calcite Operator To Hive Operator (Calcite Return Path): 
 NonBlockingOpDeDupProc did not kick in rcfile_merge2.q
 

 Key: HIVE-10991
 URL: https://issues.apache.org/jira/browse/HIVE-10991
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-10991.patch


 NonBlockingOpDeDupProc did not kick in rcfile_merge2.q in return path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11007) CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depends on the last SEL

2015-06-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11007:
---
Attachment: (was: HIVE-11007.01.patch)

 CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's 
 mapInputToDP should depends on the last SEL
 -

 Key: HIVE-11007
 URL: https://issues.apache.org/jira/browse/HIVE-11007
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-11007.01.patch


 In a dynamic partitioning case, for example, we may have the operator tree 
 TS0-SEL1-SEL2-FS3. The dpCtx's mapInputToDP is populated by SEL1 rather than 
 SEL2, which causes an error in the return path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10685) Alter table concatenate operator will cause duplicate data

2015-06-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-10685:
-
Attachment: (was: HIVE-10685.1.patch)

 Alter table concatenate operator will cause duplicate data
 --

 Key: HIVE-10685
 URL: https://issues.apache.org/jira/browse/HIVE-10685
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.3.0, 1.2.1
Reporter: guoliming
Assignee: guoliming
Priority: Critical
 Fix For: 1.2.0, 1.1.0

 Attachments: HIVE-10685.patch


 The orders table has 15 rows and is stored as ORC. 
 {noformat}
 hive> select count(*) from orders;
 OK
 15
 Time taken: 37.692 seconds, Fetched: 1 row(s)
 {noformat}
 The table contains 14 files; the size of each file is about 2.1 ~ 3.2 GB.
 After executing the command ALTER TABLE orders CONCATENATE;
 the table now has 1530115000 rows.
 My Hive version is 1.1.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10986) Check of fs.trash.interval in HiveMetaStore should be consistent with Trash.moveToAppropriateTrash()

2015-06-15 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-10986:
--
Attachment: HIVE-10986.patch

 Check of fs.trash.interval in HiveMetaStore should be consistent with 
 Trash.moveToAppropriateTrash()
 

 Key: HIVE-10986
 URL: https://issues.apache.org/jira/browse/HIVE-10986
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Affects Versions: 1.2.1
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-10986.patch


 This is a follow-up to HIVE-10629.
 Trash.moveToAppropriateTrash() reads fs.trash.interval from core-site.xml, but 
 HiveMetaStore checks HiveConf, which is a problem when the two disagree.
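 A hedged sketch of one way to make the check consistent (illustrative only, not the attached patch): read fs.trash.interval from the same Hadoop Configuration that Trash.moveToAppropriateTrash() itself consults, rather than from HiveConf.
 {code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

/** Illustrative helper: decide on trash vs. delete using the same conf that Trash itself reads. */
public class TrashCheckSketch {
  static boolean moveToTrashOrDelete(FileSystem fs, Path p, Configuration hadoopConf) throws IOException {
    // Consult fs.trash.interval from the Hadoop conf (core-site.xml), not from HiveConf,
    // so this check agrees with what Trash.moveToAppropriateTrash() will actually do.
    long trashIntervalMinutes = hadoopConf.getLong("fs.trash.interval", 0);
    if (trashIntervalMinutes > 0) {
      return Trash.moveToAppropriateTrash(fs, p, hadoopConf);
    }
    return fs.delete(p, true);  // trash disabled: delete recursively
  }
}
 {code}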



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11007) CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depends on the last SEL

2015-06-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11007:
---
Attachment: HIVE-11007.01.patch

 CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's 
 mapInputToDP should depends on the last SEL
 -

 Key: HIVE-11007
 URL: https://issues.apache.org/jira/browse/HIVE-11007
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-11007.01.patch


 In a dynamic partitioning case, for example, we may have the operator tree 
 TS0-SEL1-SEL2-FS3. The dpCtx's mapInputToDP is populated by SEL1 rather than 
 SEL2, which causes an error in the return path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10972) DummyTxnManager always locks the current database in shared mode, which is incorrect.

2015-06-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586753#comment-14586753
 ] 

Alan Gates commented on HIVE-10972:
---

Yes, I'll take a look.

 DummyTxnManager always locks the current database in shared mode, which is 
 incorrect.
 -

 Key: HIVE-10972
 URL: https://issues.apache.org/jira/browse/HIVE-10972
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 2.0.0
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-10972.2.patch, HIVE-10972.patch


 In DummyTxnManager [line 163 | 
 http://grepcode.com/file/repo1.maven.org/maven2/co.cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java#163],
  it always locks the current database. 
 That is not correct, since the current database can be db1 while the query 
 is {{select * from db2.tb1}}, which will then lock db1 unnecessarily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

2015-06-15 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-10996:
--
Assignee: Jesus Camacho Rodriguez

 Aggregation / Projection over Multi-Join Inner Query producing incorrect 
 results
 

 Key: HIVE-10996
 URL: https://issues.apache.org/jira/browse/HIVE-10996
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.0.0, 1.2.0, 1.1.0
Reporter: Gautam Kowshik
Assignee: Jesus Camacho Rodriguez
Priority: Minor
 Attachments: explain_q1.txt, explain_q2.txt


 We see the following problem on 1.1.0 and 1.2.0 but not on 0.13, which seems like 
 a regression.
 The following query (Q1) produces no results:
 {code}
 select s
 from (
   select last.*, action.st2, action.n
   from (
 select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
 last_stage_timestamp
 from (select * from purchase_history) purchase
 join (select * from cart_history) mevt
 on purchase.s = mevt.s
 where purchase.timestamp > mevt.timestamp
 group by purchase.s, purchase.timestamp
   ) last
   join (select * from events) action
   on last.s = action.s and last.last_stage_timestamp = action.timestamp
 ) list;
 {code}
 While this one (Q2) does produce results :
 {code}
 select *
 from (
   select last.*, action.st2, action.n
   from (
 select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
 last_stage_timestamp
 from (select * from purchase_history) purchase
 join (select * from cart_history) mevt
 on purchase.s = mevt.s
 where purchase.timestamp > mevt.timestamp
 group by purchase.s, purchase.timestamp
   ) last
   join (select * from events) action
   on last.s = action.s and last.last_stage_timestamp = action.timestamp
 ) list;
 1 21  20  Bob 1234
 1 31  30  Bob 1234
 3 51  50  Jeff 1234
 {code}
 The setup to test this is:
 {code}
 create table purchase_history (s string, product string, price double, 
 timestamp int);
 insert into purchase_history values ('1', 'Belt', 20.00, 21);
 insert into purchase_history values ('1', 'Socks', 3.50, 31);
 insert into purchase_history values ('3', 'Belt', 20.00, 51);
 insert into purchase_history values ('4', 'Shirt', 15.50, 59);
 create table cart_history (s string, cart_id int, timestamp int);
 insert into cart_history values ('1', 1, 10);
 insert into cart_history values ('1', 2, 20);
 insert into cart_history values ('1', 3, 30);
 insert into cart_history values ('1', 4, 40);
 insert into cart_history values ('3', 5, 50);
 insert into cart_history values ('4', 6, 60);
 create table events (s string, st2 string, n int, timestamp int);
 insert into events values ('1', 'Bob', 1234, 20);
 insert into events values ('1', 'Bob', 1234, 30);
 insert into events values ('1', 'Bob', 1234, 25);
 insert into events values ('2', 'Sam', 1234, 30);
 insert into events values ('3', 'Jeff', 1234, 50);
 insert into events values ('4', 'Ted', 1234, 60);
 {code}
 I realize select * and select s are not all that interesting in this context, 
 but what led us to this issue was that select count(distinct s) was not returning 
 results. The above queries are the simplified queries that produce the issue. 
 I will note that if I convert the inner join to a table and select from that 
 the issue does not appear.
 Update: Found that turning off  hive.optimize.remove.identity.project fixes 
 this issue. This optimization was introduced in 
 https://issues.apache.org/jira/browse/HIVE-8435
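 As noted in the update above, a session-level workaround until the underlying fix lands is to 
 disable the identity-project removal introduced by HIVE-8435:
 {code}
 set hive.optimize.remove.identity.project=false;
 {code}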



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

2015-06-15 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14586978#comment-14586978
 ] 

Laljo John Pullokkaran commented on HIVE-10996:
---

[~jcamachorodriguez] Could you take a look? It seems related to the DT removal in 
HIVE-8435.

 Aggregation / Projection over Multi-Join Inner Query producing incorrect 
 results
 

 Key: HIVE-10996
 URL: https://issues.apache.org/jira/browse/HIVE-10996
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.0.0, 1.2.0, 1.1.0
Reporter: Gautam Kowshik
Priority: Minor
 Attachments: explain_q1.txt, explain_q2.txt


 We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like 
 a regression.
 The following query (Q1) produces no results:
 {code}
 select s
 from (
   select last.*, action.st2, action.n
   from (
 select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
 last_stage_timestamp
 from (select * from purchase_history) purchase
 join (select * from cart_history) mevt
 on purchase.s = mevt.s
 where purchase.timestamp > mevt.timestamp
 group by purchase.s, purchase.timestamp
   ) last
   join (select * from events) action
   on last.s = action.s and last.last_stage_timestamp = action.timestamp
 ) list;
 {code}
 While this one (Q2) does produce results :
 {code}
 select *
 from (
   select last.*, action.st2, action.n
   from (
 select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
 last_stage_timestamp
 from (select * from purchase_history) purchase
 join (select * from cart_history) mevt
 on purchase.s = mevt.s
 where purchase.timestamp > mevt.timestamp
 group by purchase.s, purchase.timestamp
   ) last
   join (select * from events) action
   on last.s = action.s and last.last_stage_timestamp = action.timestamp
 ) list;
 1 21  20  Bob 1234
 1 31  30  Bob 1234
 3 51  50  Jeff 1234
 {code}
 The setup to test this is:
 {code}
 create table purchase_history (s string, product string, price double, 
 timestamp int);
 insert into purchase_history values ('1', 'Belt', 20.00, 21);
 insert into purchase_history values ('1', 'Socks', 3.50, 31);
 insert into purchase_history values ('3', 'Belt', 20.00, 51);
 insert into purchase_history values ('4', 'Shirt', 15.50, 59);
 create table cart_history (s string, cart_id int, timestamp int);
 insert into cart_history values ('1', 1, 10);
 insert into cart_history values ('1', 2, 20);
 insert into cart_history values ('1', 3, 30);
 insert into cart_history values ('1', 4, 40);
 insert into cart_history values ('3', 5, 50);
 insert into cart_history values ('4', 6, 60);
 create table events (s string, st2 string, n int, timestamp int);
 insert into events values ('1', 'Bob', 1234, 20);
 insert into events values ('1', 'Bob', 1234, 30);
 insert into events values ('1', 'Bob', 1234, 25);
 insert into events values ('2', 'Sam', 1234, 30);
 insert into events values ('3', 'Jeff', 1234, 50);
 insert into events values ('4', 'Ted', 1234, 60);
 {code}
 I realize select * and select s are not all that interesting in this context, 
 but what led us to this issue was that select count(distinct s) was not returning 
 results. The above queries are the simplified queries that produce the issue. 
 I will note that if I convert the inner join to a table and select from that 
 the issue does not appear.
 Update: Found that turning off  hive.optimize.remove.identity.project fixes 
 this issue. This optimization was introduced in 
 https://issues.apache.org/jira/browse/HIVE-8435



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

2015-06-15 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-10996:
--
Priority: Critical  (was: Minor)

 Aggregation / Projection over Multi-Join Inner Query producing incorrect 
 results
 

 Key: HIVE-10996
 URL: https://issues.apache.org/jira/browse/HIVE-10996
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.0.0, 1.2.0, 1.1.0
Reporter: Gautam Kowshik
Assignee: Jesus Camacho Rodriguez
Priority: Critical
 Attachments: explain_q1.txt, explain_q2.txt


 We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like 
 a regression.
 The following query (Q1) produces no results:
 {code}
 select s
 from (
   select last.*, action.st2, action.n
   from (
 select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
 last_stage_timestamp
 from (select * from purchase_history) purchase
 join (select * from cart_history) mevt
 on purchase.s = mevt.s
 where purchase.timestamp > mevt.timestamp
 group by purchase.s, purchase.timestamp
   ) last
   join (select * from events) action
   on last.s = action.s and last.last_stage_timestamp = action.timestamp
 ) list;
 {code}
 While this one (Q2) does produce results :
 {code}
 select *
 from (
   select last.*, action.st2, action.n
   from (
 select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
 last_stage_timestamp
 from (select * from purchase_history) purchase
 join (select * from cart_history) mevt
 on purchase.s = mevt.s
 where purchase.timestamp > mevt.timestamp
 group by purchase.s, purchase.timestamp
   ) last
   join (select * from events) action
   on last.s = action.s and last.last_stage_timestamp = action.timestamp
 ) list;
 1 21  20  Bob 1234
 1 31  30  Bob 1234
 3 51  50  Jeff 1234
 {code}
 The setup to test this is:
 {code}
 create table purchase_history (s string, product string, price double, 
 timestamp int);
 insert into purchase_history values ('1', 'Belt', 20.00, 21);
 insert into purchase_history values ('1', 'Socks', 3.50, 31);
 insert into purchase_history values ('3', 'Belt', 20.00, 51);
 insert into purchase_history values ('4', 'Shirt', 15.50, 59);
 create table cart_history (s string, cart_id int, timestamp int);
 insert into cart_history values ('1', 1, 10);
 insert into cart_history values ('1', 2, 20);
 insert into cart_history values ('1', 3, 30);
 insert into cart_history values ('1', 4, 40);
 insert into cart_history values ('3', 5, 50);
 insert into cart_history values ('4', 6, 60);
 create table events (s string, st2 string, n int, timestamp int);
 insert into events values ('1', 'Bob', 1234, 20);
 insert into events values ('1', 'Bob', 1234, 30);
 insert into events values ('1', 'Bob', 1234, 25);
 insert into events values ('2', 'Sam', 1234, 30);
 insert into events values ('3', 'Jeff', 1234, 50);
 insert into events values ('4', 'Ted', 1234, 60);
 {code}
 I realize select * and select s are not all that interesting in this context, 
 but what led us to this issue was that select count(distinct s) was not returning 
 results. The above queries are the simplified queries that produce the issue. 
 I will note that if I convert the inner join to a table and select from that 
 the issue does not appear.
 Update: Found that turning off  hive.optimize.remove.identity.project fixes 
 this issue. This optimization was introduced in 
 https://issues.apache.org/jira/browse/HIVE-8435



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-15 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14586996#comment-14586996
 ] 

Prasanth Jayachandran commented on HIVE-10940:
--

Patch mostly looks good, although it would be good to add some debug logging 
after each null check. Also, from a simple reference lookup we don't seem to be 
using the textual representation of the filter expression anywhere, so I don't think 
we need to set it. If we do need the text representation, we have methods in PlanUtils 
to do so.

[~ashutoshc]/[~gopalv] Any idea why we set the filter expression in text form 
to job conf?

 HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader 
 call
 -

 Key: HIVE-10940
 URL: https://issues.apache.org/jira/browse/HIVE-10940
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Attachments: HIVE-10940.patch


 {code}
 String filterText = filterExpr.getExprString();
 String filterExprSerialized = Utilities.serializeExpression(filterExpr);
 {code}
 the serializeExpression initializes Kryo and produces a new packed object for 
 every split.
 HiveInputFormat::getRecordReader -> pushProjectionAndFilters -> pushFilters.
 And Kryo is very slow to do this for a large filter clause.
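 For illustration only, a minimal sketch of the caching idea, assuming the standard 
 TableScanDesc/Utilities helpers (this is not the attached patch):
 {code}
 // Hedged sketch: push the filter into the JobConf once and reuse it, so
 // repeated getRecordReader calls do not re-run Kryo for every split.
 import org.apache.hadoop.hive.ql.exec.Utilities;
 import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
 import org.apache.hadoop.hive.ql.plan.TableScanDesc;
 import org.apache.hadoop.mapred.JobConf;

 public final class FilterPushdownSketch {
   static void pushFilterOnce(JobConf jobConf, ExprNodeGenericFuncDesc filterExpr) {
     if (jobConf.get(TableScanDesc.FILTER_EXPR_CONF_STR) != null) {
       return; // already pushed into this conf; skip re-serialization
     }
     jobConf.set(TableScanDesc.FILTER_TEXT_CONF_STR, filterExpr.getExprString());
     jobConf.set(TableScanDesc.FILTER_EXPR_CONF_STR,
         Utilities.serializeExpression(filterExpr));
   }
 }
 {code}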



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587018#comment-14587018
 ] 

Sergey Shelukhin commented on HIVE-10940:
-

The text representation is preserved for backward compat (if you mean the original 
one we used to serialize). Will add logging.

 HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader 
 call
 -

 Key: HIVE-10940
 URL: https://issues.apache.org/jira/browse/HIVE-10940
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Attachments: HIVE-10940.patch


 {code}
 String filterText = filterExpr.getExprString();
 String filterExprSerialized = Utilities.serializeExpression(filterExpr);
 {code}
 the serializeExpression initializes Kryo and produces a new packed object for 
 every split.
 HiveInputFormat::getRecordReader -> pushProjectionAndFilters -> pushFilters.
 And Kryo is very slow to do this for a large filter clause.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10107) Union All : Vertex missing stats resulting in OOM and in-efficient plans

2015-06-15 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-10107:
---
Fix Version/s: 1.2.1

 Union All : Vertex missing stats resulting in OOM and in-efficient plans
 

 Key: HIVE-10107
 URL: https://issues.apache.org/jira/browse/HIVE-10107
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong
 Fix For: 1.2.1


 Reducer Vertices sending data to a Union all edge are missing statistics and 
 as a result we either use very few reducers in the UNION ALL edge or decide 
 to broadcast the results of UNION ALL.
 Query
 {code}
 select 
 count(*) rowcount
 from
 (select 
 ss_item_sk, ss_ticket_number, ss_store_sk
 from
 store_sales a, store_returns b
 where
 a.ss_item_sk = b.sr_item_sk
 and a.ss_ticket_number = b.sr_ticket_number union all select 
 ss_item_sk, ss_ticket_number, ss_store_sk
 from
 store_sales c, store_returns d
 where
 c.ss_item_sk = d.sr_item_sk
 and c.ss_ticket_number = d.sr_ticket_number) t
 group by t.ss_store_sk , t.ss_item_sk , t.ss_ticket_number
 having rowcount > 1;
 {code}
 Plan snippet 
 {code}
  Edges:
 Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 
 (CONTAINS)
 Reducer 4 <- Union 3 (SIMPLE_EDGE)
 Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 
 (CONTAINS)
   Reducer 4
 Reduce Operator Tree:
   Group By Operator
 aggregations: count(VALUE._col0)
 keys: KEY._col0 (type: int), KEY._col1 (type: int), KEY._col2 
 (type: int)
 mode: mergepartial
 outputColumnNames: _col0, _col1, _col2, _col3
 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
 Column stats: COMPLETE
 Filter Operator
    predicate: (_col3 > 1) (type: boolean)
   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
 Column stats: COMPLETE
   Select Operator
 expressions: _col3 (type: bigint)
 outputColumnNames: _col0
 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
 Column stats: COMPLETE
 File Output Operator
   compressed: false
   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
 Column stats: COMPLETE
   table:
   input format: 
 org.apache.hadoop.mapred.TextInputFormat
   output format: 
 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   serde: 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
 Reducer 7
 Reduce Operator Tree:
   Merge Join Operator
 condition map:
  Inner Join 0 to 1
 keys:
   0 ss_item_sk (type: int), ss_ticket_number (type: int)
   1 sr_item_sk (type: int), sr_ticket_number (type: int)
 outputColumnNames: _col1, _col6, _col8, _col27, _col34
 Filter Operator
   predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: 
 boolean)
   Select Operator
 expressions: _col1 (type: int), _col8 (type: int), _col6 
 (type: int)
 outputColumnNames: _col0, _col1, _col2
 Group By Operator
   aggregations: count()
   keys: _col2 (type: int), _col0 (type: int), _col1 
 (type: int)
   mode: hash
   outputColumnNames: _col0, _col1, _col2, _col3
   Reduce Output Operator
 key expressions: _col0 (type: int), _col1 (type: 
 int), _col2 (type: int)
 sort order: +++
 Map-reduce partition columns: _col0 (type: int), 
 _col1 (type: int), _col2 (type: int)
 value expressions: _col3 (type: bigint)
 {code}
 The full explain plan 
 {code}
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 depends on stages: Stage-1
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 
 (CONTAINS)
 Reducer 4 <- Union 3 (SIMPLE_EDGE)
 Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 
 (CONTAINS)
   DagName: mmokhtar_20150214132727_95878ea1-ee6a-4b7e-bc86-843abd5cf664:7
   Vertices:
 Map 1
 Map Operator Tree:
 

[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

2015-06-15 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587114#comment-14587114
 ] 

Jesus Camacho Rodriguez commented on HIVE-10996:


This seems to be fixed in HIVE-9613. The fix was backported to 1.0, but not to 
1.1.

[~hagleitn], [~brocknoland], could we backport HIVE-9613 to 1.1 to solve this 
issue? Thanks

 Aggregation / Projection over Multi-Join Inner Query producing incorrect 
 results
 

 Key: HIVE-10996
 URL: https://issues.apache.org/jira/browse/HIVE-10996
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.0.0, 1.2.0, 1.1.0
Reporter: Gautam Kowshik
Assignee: Jesus Camacho Rodriguez
Priority: Critical
 Attachments: explain_q1.txt, explain_q2.txt


 We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like 
 a regression.
 The following query (Q1) produces no results:
 {code}
 select s
 from (
   select last.*, action.st2, action.n
   from (
 select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
 last_stage_timestamp
 from (select * from purchase_history) purchase
 join (select * from cart_history) mevt
 on purchase.s = mevt.s
 where purchase.timestamp > mevt.timestamp
 group by purchase.s, purchase.timestamp
   ) last
   join (select * from events) action
   on last.s = action.s and last.last_stage_timestamp = action.timestamp
 ) list;
 {code}
 While this one (Q2) does produce results :
 {code}
 select *
 from (
   select last.*, action.st2, action.n
   from (
 select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
 last_stage_timestamp
 from (select * from purchase_history) purchase
 join (select * from cart_history) mevt
 on purchase.s = mevt.s
 where purchase.timestamp > mevt.timestamp
 group by purchase.s, purchase.timestamp
   ) last
   join (select * from events) action
   on last.s = action.s and last.last_stage_timestamp = action.timestamp
 ) list;
 1 21  20  Bob 1234
 1 31  30  Bob 1234
 3 51  50  Jeff 1234
 {code}
 The setup to test this is:
 {code}
 create table purchase_history (s string, product string, price double, 
 timestamp int);
 insert into purchase_history values ('1', 'Belt', 20.00, 21);
 insert into purchase_history values ('1', 'Socks', 3.50, 31);
 insert into purchase_history values ('3', 'Belt', 20.00, 51);
 insert into purchase_history values ('4', 'Shirt', 15.50, 59);
 create table cart_history (s string, cart_id int, timestamp int);
 insert into cart_history values ('1', 1, 10);
 insert into cart_history values ('1', 2, 20);
 insert into cart_history values ('1', 3, 30);
 insert into cart_history values ('1', 4, 40);
 insert into cart_history values ('3', 5, 50);
 insert into cart_history values ('4', 6, 60);
 create table events (s string, st2 string, n int, timestamp int);
 insert into events values ('1', 'Bob', 1234, 20);
 insert into events values ('1', 'Bob', 1234, 30);
 insert into events values ('1', 'Bob', 1234, 25);
 insert into events values ('2', 'Sam', 1234, 30);
 insert into events values ('3', 'Jeff', 1234, 50);
 insert into events values ('4', 'Ted', 1234, 60);
 {code}
 I realize select * and select s are not all that interesting in this context, 
 but what led us to this issue was that select count(distinct s) was not returning 
 results. The above queries are the simplified queries that produce the issue. 
 I will note that if I convert the inner join to a table and select from that 
 the issue does not appear.
 Update: Found that turning off  hive.optimize.remove.identity.project fixes 
 this issue. This optimization was introduced in 
 https://issues.apache.org/jira/browse/HIVE-8435



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10937) LLAP: make ObjectCache for plans work properly in the daemon

2015-06-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10937:

Attachment: HIVE-10937.01.patch

simplified patch

 LLAP: make ObjectCache for plans work properly in the daemon
 

 Key: HIVE-10937
 URL: https://issues.apache.org/jira/browse/HIVE-10937
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: llap

 Attachments: HIVE-10937.01.patch, HIVE-10937.patch


 There's a perf hit otherwise, esp. when the stupid planner creates 1009 reducers of 
 4Mb each.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11006) improve logging wrt ACID module

2015-06-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587157#comment-14587157
 ] 

Alan Gates commented on HIVE-11006:
---

Will review.

 improve logging wrt ACID module
 ---

 Key: HIVE-11006
 URL: https://issues.apache.org/jira/browse/HIVE-11006
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 1.2.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-11006.patch


 especially around metastore DB operations (TxnHandler) which are retried or 
 fail for some reason.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10940:

Attachment: HIVE-10940.01.patch

 HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader 
 call
 -

 Key: HIVE-10940
 URL: https://issues.apache.org/jira/browse/HIVE-10940
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Attachments: HIVE-10940.01.patch, HIVE-10940.patch


 {code}
 String filterText = filterExpr.getExprString();
 String filterExprSerialized = Utilities.serializeExpression(filterExpr);
 {code}
 the serializeExpression initializes Kryo and produces a new packed object for 
 every split.
 HiveInputFormat::getRecordReader -> pushProjectionAndFilters -> pushFilters.
 And Kryo is very slow to do this for a large filter clause.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11006) improve logging wrt ACID module

2015-06-15 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587045#comment-14587045
 ] 

Eugene Koifman commented on HIVE-11006:
---

[~sushanth], could we get this into 1.2.1?  It's only logging changes but will 
make diagnostics easier.

 improve logging wrt ACID module
 ---

 Key: HIVE-11006
 URL: https://issues.apache.org/jira/browse/HIVE-11006
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 1.2.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-11006.patch


 especially around metastore DB operations (TxnHandler) which are retried or 
 fail for some reason.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-15 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587060#comment-14587060
 ] 

Prasanth Jayachandran commented on HIVE-10940:
--

+1

 HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader 
 call
 -

 Key: HIVE-10940
 URL: https://issues.apache.org/jira/browse/HIVE-10940
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Attachments: HIVE-10940.01.patch, HIVE-10940.patch


 {code}
 String filterText = filterExpr.getExprString();
 String filterExprSerialized = Utilities.serializeExpression(filterExpr);
 {code}
 the serializeExpression initializes Kryo and produces a new packed object for 
 every split.
 HiveInputFormat::getRecordReader -> pushProjectionAndFilters -> pushFilters.
 And Kryo is very slow to do this for a large filter clause.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-06-15 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587064#comment-14587064
 ] 

Laljo John Pullokkaran commented on HIVE-10841:
---

Committed to branch 1.0.

 [WHERE col is not null] does not work sometimes for queries with many JOIN 
 statements
 -

 Key: HIVE-10841
 URL: https://issues.apache.org/jira/browse/HIVE-10841
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0
Reporter: Alexander Pivovarov
Assignee: Laljo John Pullokkaran
 Fix For: 1.2.1

 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, 
 HIVE-10841.2.patch, HIVE-10841.patch


 The result from the following SELECT query is 3 rows but it should be 1 row.
 I checked it in MySQL - it returned 1 row.
 To reproduce the issue in Hive
 1. prepare tables
 {code}
 drop table if exists L;
 drop table if exists LA;
 drop table if exists FR;
 drop table if exists A;
 drop table if exists PI;
 drop table if exists acct;
 create table L as select 4436 id;
 create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
 create table FR as select 4436 loan_id;
 create table A as select 4748 id;
 create table PI as select 4415 id;
 create table acct as select 4748 aid, 10 acc_n, 122 brn;
 insert into table acct values(4748, null, null);
 insert into table acct values(4748, null, null);
 {code}
 2. run SELECT query
 {code}
 select
   acct.ACC_N,
   acct.brn
 FROM L
 JOIN LA ON L.id = LA.loan_id
 JOIN FR ON L.id = FR.loan_id
 JOIN A ON LA.aid = A.id
 JOIN PI ON PI.id = LA.pi_id
 JOIN acct ON A.id = acct.aid
 WHERE
   L.id = 4436
   and acct.brn is not null;
 {code}
 the result is 3 rows
 {code}
 10  122
 NULL  NULL
 NULL  NULL
 {code}
 but it should be 1 row
 {code}
 10  122
 {code}
 2.1 explain select ... output for hive-1.3.0 MR
 {code}
 STAGE DEPENDENCIES:
   Stage-12 is a root stage
   Stage-9 depends on stages: Stage-12
   Stage-0 depends on stages: Stage-9
 STAGE PLANS:
   Stage: Stage-12
 Map Reduce Local Work
    Alias -> Map Local Tables:
 a 
   Fetch Operator
 limit: -1
 acct 
   Fetch Operator
 limit: -1
 fr 
   Fetch Operator
 limit: -1
 l 
   Fetch Operator
 limit: -1
 pi 
   Fetch Operator
 limit: -1
    Alias -> Map Local Operator Tree:
 a 
   TableScan
 alias: a
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 acct 
   TableScan
 alias: acct
 Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: aid is not null (type: boolean)
   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 _col5 (type: int)
   1 id (type: int)
   2 aid (type: int)
 fr 
   TableScan
 alias: fr
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (loan_id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 l 
   TableScan
 alias: l
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: (id = 4436) (type: boolean)
   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 keys:
   0 4436 (type: int)
   1 4436 (type: int)
   2 4436 (type: int)
 pi 
   TableScan
 alias: pi
 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
 stats: NONE
 Filter Operator
   predicate: id is not null (type: boolean)
   Statistics: Num 

[jira] [Commented] (HIVE-10937) LLAP: make ObjectCache for plans work properly in the daemon

2015-06-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587071#comment-14587071
 ] 

Sergey Shelukhin commented on HIVE-10937:
-

I didn't get a repro of the issues reported on the cluster.

 LLAP: make ObjectCache for plans work properly in the daemon
 

 Key: HIVE-10937
 URL: https://issues.apache.org/jira/browse/HIVE-10937
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: llap

 Attachments: HIVE-10937.patch


 There's a perf hit otherwise, esp. when the stupid planner creates 1009 reducers of 
 4Mb each.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11010) Accumulo storage handler queries via HS2 fail

2015-06-15 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-11010:
--
Summary: Accumulo storage handler queries via HS2 fail  (was: Accumulo 
storage handler throws [usrname]@[principlaname] is not allowed to impersonate 
[username] via beeline on kerberized cluster)

 Accumulo storage handler queries via HS2 fail
 -

 Key: HIVE-11010
 URL: https://issues.apache.org/jira/browse/HIVE-11010
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
 Environment: Secure
Reporter: Takahiko Saito
Assignee: Josh Elser

 On Kerberized cluster, accumulo storage handler throws an error, 
 [usrname]@[principlaname] is not allowed to impersonate [username] 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-4239) Remove lock on compilation stage

2015-06-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-4239:
---
Attachment: HIVE-4239.07.patch

Rebased the patch. Another commit has refactored Hive out of the session 
class, so the issue with this change is moot.

 Remove lock on compilation stage
 

 Key: HIVE-4239
 URL: https://issues.apache.org/jira/browse/HIVE-4239
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Query Processor
Reporter: Carl Steinbach
Assignee: Sergey Shelukhin
 Attachments: HIVE-4239.01.patch, HIVE-4239.02.patch, 
 HIVE-4239.03.patch, HIVE-4239.04.patch, HIVE-4239.05.patch, 
 HIVE-4239.06.patch, HIVE-4239.07.patch, HIVE-4239.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11010) Accumulo storage handler queries via HS2 fail

2015-06-15 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587091#comment-14587091
 ] 

Josh Elser commented on HIVE-11010:
---

Thanks for filing this. I was doing some debugging with [~taksaito] on the 
AccumuloStorageHandler -- we loaded some data into both HBase and Accumulo, ran some 
Hive queries against both, and found that the Accumulo queries failed on the RPC 
handshake when run via HiveServer2 (but not in the local client). Short story: 
AccumuloStorageHandler queries with Kerberos enabled don't work with HiveServer2.

I think what was happening is that the additions to the AccumuloStorageHandler 
in HIVE-10857 don't work as expected because HS2 is going to be running with 
its own Kerberos credentials. I think we need to change how we set up the 
credentials inside of AccumuloStorageHandler so that it will work regardless of 
a local hive client or hs2 -- running a doAs with a PROXY instead of replacing 
the HS2 credentials.

The second half is that we'd need to make sure Accumulo itself is configured to 
allow HS2 to proxy on behalf of users -- not relevant for Hive code, but 
something to document for users to set up in Accumulo.
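For illustration only -- a minimal sketch of the proxy-user direction described above, using 
the standard Hadoop UserGroupInformation API (not the eventual patch):
{code}
// Hedged sketch: run the Accumulo connection setup as the end user via a
// Hadoop proxy-user doAs, instead of replacing HS2's own Kerberos credentials.
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public final class ProxyUserSketch {
  static <T> T runAsEndUser(String endUser, PrivilegedExceptionAction<T> action)
      throws Exception {
    UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
        endUser, UserGroupInformation.getLoginUser()); // HS2's own login identity
    return proxyUgi.doAs(action); // e.g. obtain the Accumulo connector here
  }
}
{code}
Accumulo itself would additionally need to be configured to allow the HS2 principal to 
impersonate end users, as mentioned above.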

 Accumulo storage handler queries via HS2 fail
 -

 Key: HIVE-11010
 URL: https://issues.apache.org/jira/browse/HIVE-11010
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0, 1.2.1
 Environment: Secure
Reporter: Takahiko Saito
Assignee: Josh Elser
 Fix For: 1.2.1


 On Kerberized cluster, accumulo storage handler throws an error, 
 [usrname]@[principlaname] is not allowed to impersonate [username] 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11010) Accumulo storage handler queries via HS2 fail

2015-06-15 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-11010:
--
Affects Version/s: 1.2.1

 Accumulo storage handler queries via HS2 fail
 -

 Key: HIVE-11010
 URL: https://issues.apache.org/jira/browse/HIVE-11010
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0, 1.2.1
 Environment: Secure
Reporter: Takahiko Saito
Assignee: Josh Elser
 Fix For: 1.2.1


 On Kerberized cluster, accumulo storage handler throws an error, 
 [usrname]@[principlaname] is not allowed to impersonate [username] 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

2015-06-15 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587171#comment-14587171
 ] 

Jesus Camacho Rodriguez commented on HIVE-10996:


I can reproduce the problem in 1.2. Still investigating the issue...

 Aggregation / Projection over Multi-Join Inner Query producing incorrect 
 results
 

 Key: HIVE-10996
 URL: https://issues.apache.org/jira/browse/HIVE-10996
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.0.0, 1.2.0, 1.1.0
Reporter: Gautam Kowshik
Assignee: Jesus Camacho Rodriguez
Priority: Critical
 Attachments: explain_q1.txt, explain_q2.txt


 We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like 
 a regression.
 The following query (Q1) produces no results:
 {code}
 select s
 from (
   select last.*, action.st2, action.n
   from (
 select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
 last_stage_timestamp
 from (select * from purchase_history) purchase
 join (select * from cart_history) mevt
 on purchase.s = mevt.s
 where purchase.timestamp > mevt.timestamp
 group by purchase.s, purchase.timestamp
   ) last
   join (select * from events) action
   on last.s = action.s and last.last_stage_timestamp = action.timestamp
 ) list;
 {code}
 While this one (Q2) does produce results :
 {code}
 select *
 from (
   select last.*, action.st2, action.n
   from (
 select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
 last_stage_timestamp
 from (select * from purchase_history) purchase
 join (select * from cart_history) mevt
 on purchase.s = mevt.s
 where purchase.timestamp > mevt.timestamp
 group by purchase.s, purchase.timestamp
   ) last
   join (select * from events) action
   on last.s = action.s and last.last_stage_timestamp = action.timestamp
 ) list;
 1 21  20  Bob 1234
 1 31  30  Bob 1234
 3 51  50  Jeff 1234
 {code}
 The setup to test this is:
 {code}
 create table purchase_history (s string, product string, price double, 
 timestamp int);
 insert into purchase_history values ('1', 'Belt', 20.00, 21);
 insert into purchase_history values ('1', 'Socks', 3.50, 31);
 insert into purchase_history values ('3', 'Belt', 20.00, 51);
 insert into purchase_history values ('4', 'Shirt', 15.50, 59);
 create table cart_history (s string, cart_id int, timestamp int);
 insert into cart_history values ('1', 1, 10);
 insert into cart_history values ('1', 2, 20);
 insert into cart_history values ('1', 3, 30);
 insert into cart_history values ('1', 4, 40);
 insert into cart_history values ('3', 5, 50);
 insert into cart_history values ('4', 6, 60);
 create table events (s string, st2 string, n int, timestamp int);
 insert into events values ('1', 'Bob', 1234, 20);
 insert into events values ('1', 'Bob', 1234, 30);
 insert into events values ('1', 'Bob', 1234, 25);
 insert into events values ('2', 'Sam', 1234, 30);
 insert into events values ('3', 'Jeff', 1234, 50);
 insert into events values ('4', 'Ted', 1234, 60);
 {code}
 I realize select * and select s are not all that interesting in this context, 
 but what led us to this issue was that select count(distinct s) was not returning 
 results. The above queries are the simplified queries that produce the issue. 
 I will note that if I convert the inner join to a table and select from that 
 the issue does not appear.
 Update: Found that turning off  hive.optimize.remove.identity.project fixes 
 this issue. This optimization was introduced in 
 https://issues.apache.org/jira/browse/HIVE-8435



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-15 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10999:
---
Attachment: HIVE-10999.1-spark.patch

 Upgrade Spark dependency to 1.4 [Spark Branch]
 --

 Key: HIVE-10999
 URL: https://issues.apache.org/jira/browse/HIVE-10999
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch


 Spark 1.4.0 is release. Let's update the dependency version from 1.3.1 to 
 1.4.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-15 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10999:
---
Comment: was deleted

(was: 

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12739493/HIVE-10999.1-spark.patch

{color:red}ERROR:{color} -1 due to 604 failed/errored test(s), 7420 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_auto_sortmerge_join_16
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketizedhiveinputformat
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_empty_dir_in_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_external_table_with_space_in_location_path
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap_auto
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_bucketed_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_merge
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_leftsemijoin_mr
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_parallel_orderby
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_quotedid_smb
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_remote_script
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_smb_mapjoin_8
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter_partitioned
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_truncate_column_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_uber_reduce
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_add_part_multiple
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join17
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18_multi_distinct

[jira] [Resolved] (HIVE-10915) ORC fails to read table with a 38Gb ORC file

2015-06-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-10915.
--
Resolution: Fixed

Fixed by HIVE-10685. Verified it against lineitem TPCH 1000 scale.

 ORC fails to read table with a 38Gb ORC file
 

 Key: HIVE-10915
 URL: https://issues.apache.org/jira/browse/HIVE-10915
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 1.3.0
Reporter: Gopal V

 {code}
 hive> set mapreduce.input.fileinputformat.split.maxsize=1;
 hive> set mapreduce.input.fileinputformat.split.maxsize=1;
 hive> alter table lineitem concatenate;
 ..
 hive> dfs -ls /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem;
 Found 12 items
 -rwxr-xr-x   3 gopal supergroup 41368976599 2015-06-03 15:49 
 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/00_0
 -rwxr-xr-x   3 gopal supergroup 36226719673 2015-06-03 15:48 
 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/01_0
 -rwxr-xr-x   3 gopal supergroup 27544042018 2015-06-03 15:50 
 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/02_0
 -rwxr-xr-x   3 gopal supergroup 23147063608 2015-06-03 15:44 
 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/03_0
 -rwxr-xr-x   3 gopal supergroup 21079035936 2015-06-03 15:44 
 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/04_0
 -rwxr-xr-x   3 gopal supergroup 13813961419 2015-06-03 15:43 
 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/05_0
 -rwxr-xr-x   3 gopal supergroup  8155299977 2015-06-03 15:40 
 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/06_0
 -rwxr-xr-x   3 gopal supergroup  6264478613 2015-06-03 15:40 
 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/07_0
 -rwxr-xr-x   3 gopal supergroup  4653393054 2015-06-03 15:40 
 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/08_0
 -rwxr-xr-x   3 gopal supergroup  3621672928 2015-06-03 15:39 
 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/09_0
 -rwxr-xr-x   3 gopal supergroup  1460919310 2015-06-03 15:38 
 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/10_0
 -rwxr-xr-x   3 gopal supergroup   485129789 2015-06-03 15:38 
 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/11_0
 {code}
 Errors without PPD
 Suspicious offsets in the stream information - 
 {code}
 Caused by: java.io.EOFException: Read past end of RLE integer from compressed 
 stream Stream for column 1 kind DATA position: 1608840 length: 1608840 range: 
 0 offset: 1608840 limit: 1608840 range 0 = 0 to 1608840 uncompressed: 36845 
 to 36845
 at 
 org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:56)
 at 
 org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:302)
 at 
 org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:346)
 at 
 org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:582)
 at 
 org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:2026)
 at 
 org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1070)
 ... 25 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10907) Hive on Tez: Classcast exception in some cases with SMB joins

2015-06-15 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-10907:
--
Affects Version/s: 1.0.0
   1.2.0

 Hive on Tez: Classcast exception in some cases with SMB joins
 -

 Key: HIVE-10907
 URL: https://issues.apache.org/jira/browse/HIVE-10907
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.0.0, 1.2.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: 1.2.1

 Attachments: HIVE-10907.1.patch, HIVE-10907.2.patch, 
 HIVE-10907.3.patch, HIVE-10907.4.patch


 In cases where there is a mix of map-side work and reduce-side work, we get a 
 ClassCastException because we assume homogeneity in the code. We need to fix 
 this correctly; for now this is a workaround.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11014) LLAP: MiniTez vector_binary_join_groupby test has result changes compared to master

2015-06-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587205#comment-14587205
 ] 

Sergey Shelukhin commented on HIVE-11014:
-

Note: right now there are some incorrect changes committed there; I'm going to 
commit the master version again.

 LLAP: MiniTez vector_binary_join_groupby test has result changes compared to 
 master
 ---

 Key: HIVE-11014
 URL: https://issues.apache.org/jira/browse/HIVE-11014
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Matt McCline





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9248) Vectorization : Tez Reduce vertex not getting vectorized when GROUP BY is Hash mode

2015-06-15 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587209#comment-14587209
 ] 

Jason Dere commented on HIVE-9248:
--

+1

 Vectorization : Tez Reduce vertex not getting vectorized when GROUP BY is 
 Hash mode
 ---

 Key: HIVE-9248
 URL: https://issues.apache.org/jira/browse/HIVE-9248
 Project: Hive
  Issue Type: Bug
  Components: Tez, Vectorization
Affects Versions: 0.14.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Attachments: HIVE-9248.01.patch, HIVE-9248.02.patch, 
 HIVE-9248.03.patch, HIVE-9248.04.patch, HIVE-9248.05.patch, HIVE-9248.06.patch


 Under Tez and Vectorization, ReduceWork is not getting vectorized unless its 
 GROUP BY operator is MergePartial. Add valid cases where GROUP BY is Hash 
 (and presumably there are downstream reducers that will do MergePartial).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)