[jira] [Commented] (HIVE-11669) OrcFileDump service should support directories

2015-08-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718115#comment-14718115
 ] 

Hive QA commented on HIVE-11669:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12752835/HIVE-11669.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9380 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5095/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5095/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5095/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12752835 - PreCommit-HIVE-TRUNK-Build

 OrcFileDump service should support directories
 --

 Key: HIVE-11669
 URL: https://issues.apache.org/jira/browse/HIVE-11669
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.3.0, 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Attachments: HIVE-11669.1.patch


 orcfiledump service does not support directories. If a directory is specified, 
 the program should iterate through all the files in the directory and perform 
 the file dump on each.
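The requested behavior amounts to expanding a directory argument into its member files before running the per-file dump on each one. A minimal local-filesystem sketch of that expansion (the real orcfiledump code works against Hadoop's FileSystem API rather than java.io, and the class and method names here are illustrative, not from the patch):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class FileDumpWalker {
    // Expands the dump target: a plain file is returned as-is, a
    // directory is expanded to the regular files directly inside it.
    static List<File> expand(File target) {
        List<File> files = new ArrayList<>();
        if (target.isDirectory()) {
            File[] children = target.listFiles();
            if (children != null) {
                for (File child : children) {
                    if (child.isFile()) {
                        files.add(child);
                    }
                }
            }
        } else {
            files.add(target);
        }
        return files;
    }
}
```

The dump loop then simply iterates the returned list instead of assuming a single file.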



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11538) Add an option to skip init script while running tests

2015-08-28 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-11538:
--
Labels: TODOC2.0  (was: )

 Add an option to skip init script while running tests
 -

 Key: HIVE-11538
 URL: https://issues.apache.org/jira/browse/HIVE-11538
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
  Labels: TODOC2.0
 Fix For: 2.0.0

 Attachments: HIVE-11538.2.patch, HIVE-11538.3.patch, HIVE-11538.patch


 {{q_test_init.sql}} has grown over time, and now takes a substantial amount 
 of time to run. When debugging a particular query that doesn't need such 
 initialization, this delay is an annoyance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10175) DynamicPartitionPruning lacks a fast-path exit for large IN() queries

2015-08-28 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10175:
---
Fix Version/s: (was: 2.0.0)
   (was: 1.3.0)

 DynamicPartitionPruning lacks a fast-path exit for large IN() queries
 -

 Key: HIVE-10175
 URL: https://issues.apache.org/jira/browse/HIVE-10175
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer, Tez
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Attachments: HIVE-10175.1.patch, HIVE-10175.profile.html


 TezCompiler::runDynamicPartitionPruning() & ppr.PartitionPruner() call the 
 graph walker even if all tables provided to the optimizer are unpartitioned 
 (or temporary) tables.
 This makes it extremely slow, as it will walk & inspect a large/complex 
 FilterOperator later in the pipeline.
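The fast-path being asked for is a cheap pre-check: if no table in the query could benefit from dynamic partition pruning, skip the expensive operator-graph walk entirely. A simplified, self-contained sketch of that guard (the real check lives inside TezCompiler; the types and names below are stand-ins, not Hive's):

```java
import java.util.List;

public class PruningFastPath {
    // Minimal stand-in for table metadata.
    static class Table {
        final boolean partitioned;
        final boolean temporary;
        Table(boolean partitioned, boolean temporary) {
            this.partitioned = partitioned;
            this.temporary = temporary;
        }
    }

    // Returns true only if at least one table could actually benefit
    // from dynamic partition pruning; otherwise the caller can skip
    // the graph walk over the (possibly huge) filter expressions.
    static boolean needsPartitionPruning(List<Table> tables) {
        for (Table t : tables) {
            if (t.partitioned && !t.temporary) {
                return true;
            }
        }
        return false;
    }
}
```

The guard is O(number of tables), so it costs almost nothing compared to walking a large IN() filter.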



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker

2015-08-28 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11652:
---
Attachment: HIVE-11652.02.patch

 Avoid expensive call to removeAll in DefaultGraphWalker
 ---

 Key: HIVE-11652
 URL: https://issues.apache.org/jira/browse/HIVE-11652
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 1.3.0, 2.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, 
 HIVE-11652.patch


 When the plan is too large, the removeAll call in DefaultGraphWalker (line 
 140) can take a very long time, as it has to scan the list looking for each 
 of the nodes. We get rid of this call by rewriting the logic in the walker.
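The cost comes from List.removeAll(Collection) doing one membership test per element, where each test on a list is itself a linear scan, so the call degrades to quadratic on large plans. The actual HIVE-11652 patch restructures the walker itself; this standalone sketch only illustrates why the original call is expensive and the standard set-based workaround:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RemoveAllCost {
    // Quadratic: removeAll(list) calls list.contains() once per element,
    // and contains() on an ArrayList is a linear scan.
    static List<Integer> removeViaList(List<Integer> nodes, List<Integer> done) {
        List<Integer> copy = new ArrayList<>(nodes);
        copy.removeAll(done);
        return copy;
    }

    // Near-linear: the same operation with the "done" nodes in a
    // HashSet, so each membership test is O(1) on average.
    static List<Integer> removeViaSet(List<Integer> nodes, List<Integer> done) {
        Set<Integer> doneSet = new HashSet<>(done);
        List<Integer> copy = new ArrayList<>(nodes);
        copy.removeIf(doneSet::contains);
        return copy;
    }
}
```

Both variants produce the same result; only the membership-test cost differs.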



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HIVE-10175) DynamicPartitionPruning lacks a fast-path exit for large IN() queries

2015-08-28 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reopened HIVE-10175:


Reverting this patch as this needs to cycle-break & fold out the SyntheticJoin 
predicates even if the target table is unpartitioned.

 DynamicPartitionPruning lacks a fast-path exit for large IN() queries
 -

 Key: HIVE-10175
 URL: https://issues.apache.org/jira/browse/HIVE-10175
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer, Tez
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-10175.1.patch, HIVE-10175.profile.html


 TezCompiler::runDynamicPartitionPruning() & ppr.PartitionPruner() call the 
 graph walker even if all tables provided to the optimizer are unpartitioned 
 (or temporary) tables.
 This makes it extremely slow, as it will walk & inspect a large/complex 
 FilterOperator later in the pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10175) DynamicPartitionPruning lacks a fast-path exit for large IN() queries

2015-08-28 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718285#comment-14718285
 ] 

Gopal V commented on HIVE-10175:


The issue is only visible in TPC-H q21, which hit the following error in the 
nightly runs.

{code}
Caused by: java.lang.RuntimeException: Cannot find ExprNodeEvaluator for the 
exprNodeDesc = RS[4]
at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.get(ExprNodeEvaluatorFactory.java:57)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.init(ExprNodeGenericFuncEvaluator.java:100)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.get(ExprNodeEvaluatorFactory.java:51)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.init(ExprNodeGenericFuncEvaluator.java:100)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.get(ExprNodeEvaluatorFactory.java:51)
at 
org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:59)
... 21 more
{code}

 DynamicPartitionPruning lacks a fast-path exit for large IN() queries
 -

 Key: HIVE-10175
 URL: https://issues.apache.org/jira/browse/HIVE-10175
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer, Tez
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-10175.1.patch, HIVE-10175.profile.html


 TezCompiler::runDynamicPartitionPruning() & ppr.PartitionPruner() call the 
 graph walker even if all tables provided to the optimizer are unpartitioned 
 (or temporary) tables.
 This makes it extremely slow, as it will walk & inspect a large/complex 
 FilterOperator later in the pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-10175) DynamicPartitionPruning lacks a fast-path exit for large IN() queries

2015-08-28 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V resolved HIVE-10175.

   Resolution: Fixed
Fix Version/s: 2.0.0
   1.3.0

Pushed to master and branch-1, thanks [~jcamachorodriguez]!

 DynamicPartitionPruning lacks a fast-path exit for large IN() queries
 -

 Key: HIVE-10175
 URL: https://issues.apache.org/jira/browse/HIVE-10175
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer, Tez
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-10175.1.patch, HIVE-10175.profile.html


 TezCompiler::runDynamicPartitionPruning() & ppr.PartitionPruner() call the 
 graph walker even if all tables provided to the optimizer are unpartitioned 
 (or temporary) tables.
 This makes it extremely slow, as it will walk & inspect a large/complex 
 FilterOperator later in the pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11629) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix the filter expressions for full outer join and right outer join

2015-08-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718368#comment-14718368
 ] 

Hive QA commented on HIVE-11629:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12752839/HIVE-11629.02.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9376 tests executed
*Failed tests:*
{noformat}
TestJdbcWithLocalClusterSpark - did not produce a TEST-*.xml file
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5097/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5097/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5097/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12752839 - PreCommit-HIVE-TRUNK-Build

 CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix the filter 
 expressions for full outer join and right outer join
 --

 Key: HIVE-11629
 URL: https://issues.apache.org/jira/browse/HIVE-11629
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-11629.01.patch, HIVE-11629.02.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11634) Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)

2015-08-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718181#comment-14718181
 ] 

Hive QA commented on HIVE-11634:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12752837/HIVE-11634.4.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9381 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pointlookup
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pointlookup2
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5096/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5096/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5096/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12752837 - PreCommit-HIVE-TRUNK-Build

 Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)
 --

 Key: HIVE-11634
 URL: https://issues.apache.org/jira/browse/HIVE-11634
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-11634.1.patch, HIVE-11634.2.patch, 
 HIVE-11634.3.patch, HIVE-11634.4.patch


 Currently, we do not support partition pruning for the following scenario:
 {code}
 create table pcr_t1 (key int, value string) partitioned by (ds string);
 insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src 
 where key < 20 order by key;
 insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src 
 where key < 20 order by key;
 insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src 
 where key < 20 order by key;
 explain extended select ds from pcr_t1 where struct(ds, key) in 
 (struct('2000-04-08',1), struct('2000-04-09',2));
 {code}
 If we run the above query, we see that all the partitions of table pcr_t1 are 
 present in the filter predicate, whereas we could prune partition 
 (ds='2000-04-10').
 The optimization is to rewrite the above query into the following:
 {code}
 explain extended select ds from pcr_t1 where (struct(ds)) IN 
 (struct('2000-04-08'), struct('2000-04-09')) and struct(ds, key) in 
 (struct('2000-04-08',1), struct('2000-04-09',2));
 {code}
 The predicate (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09')) 
 is used by the partition pruner to prune partitions which otherwise would 
 not be pruned.
 This is an extension of the idea presented in HIVE-11573.
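The synthetic predicate is derived by projecting just the partition-column positions out of the IN() literal tuples. A self-contained sketch of that derivation on plain strings (the real rewrite operates on Hive expression trees; this class and its names are illustrative only):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class StructInPruning {
    // Given the literal tuples of an IN (STRUCT(partcol, nonpartcol), ...)
    // predicate and the index of the partition column within each tuple,
    // derive the distinct partition values. These feed a synthetic
    // "partcol IN (...)" predicate that the partition pruner understands.
    static List<String> partitionValues(List<String[]> inTuples, int partColIdx) {
        Set<String> values = new LinkedHashSet<>();  // dedupe, keep order
        for (String[] tuple : inTuples) {
            values.add(tuple[partColIdx]);
        }
        return new ArrayList<>(values);
    }
}
```

For the example above, the tuples ('2000-04-08', 1) and ('2000-04-09', 2) with ds at index 0 yield the two ds values used in the rewritten struct(ds) IN (...) predicate.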



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11544) LazyInteger should avoid throwing NumberFormatException

2015-08-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718576#comment-14718576
 ] 

Hive QA commented on HIVE-11544:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12752834/HIVE-11544.4.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9380 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5098/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5098/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5098/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12752834 - PreCommit-HIVE-TRUNK-Build

 LazyInteger should avoid throwing NumberFormatException
 ---

 Key: HIVE-11544
 URL: https://issues.apache.org/jira/browse/HIVE-11544
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.14.0, 1.2.0, 1.3.0, 2.0.0
Reporter: William Slacum
Assignee: Gopal V
Priority: Minor
  Labels: Performance
 Attachments: HIVE-11544.1.patch, HIVE-11544.2.patch, 
 HIVE-11544.3.patch, HIVE-11544.4.patch


 {{LazyInteger#parseInt}} will throw a {{NumberFormatException}} under these 
 conditions:
 # bytes are null
 # radix is invalid
 # length is 0
 # the string is '+' or '-'
 # {{LazyInteger#parse}} throws a {{NumberFormatException}}
 Most of the time, such as in {{LazyInteger#init}} and {{LazyByte#init}}, the 
 exception is caught and swallowed, and {{isNull}} is set to {{true}}.
 This is generally a bad workflow: exception creation is a performance 
 bottleneck, and repeating it for many rows in a query can have drastic 
 performance consequences.
 It would be better if this method returned an {{OptionalInteger}}, which 
 would provide similar functionality with higher throughput.
 I've tested against 0.14.0 and saw that the logic is unchanged in 1.2.0, so 
 I've marked those as affected. Any version in between would also suffer from 
 this.
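The exception-free alternative the description asks for can be sketched with the JDK's OptionalInt (the ticket says "OptionalInteger"; OptionalInt is the standard type). This is an illustrative, decimal-only reimplementation of the idea, not the code from the attached patch:

```java
import java.util.OptionalInt;

public class SafeParse {
    // Parses a decimal integer from a byte range without ever throwing:
    // any malformed input yields OptionalInt.empty(), which callers can
    // map to isNull = true without paying for exception construction.
    static OptionalInt parseInt(byte[] bytes, int start, int length) {
        if (bytes == null || length <= 0) {
            return OptionalInt.empty();
        }
        int i = start;
        int end = start + length;
        boolean negative = false;
        if (bytes[i] == '+' || bytes[i] == '-') {
            negative = bytes[i] == '-';
            i++;
        }
        if (i == end) {              // bare "+" or "-"
            return OptionalInt.empty();
        }
        long result = 0;             // widen to detect int overflow
        for (; i < end; i++) {
            int digit = bytes[i] - '0';
            if (digit < 0 || digit > 9) {
                return OptionalInt.empty();
            }
            result = result * 10 + digit;
            if (result > (long) Integer.MAX_VALUE + 1) {
                return OptionalInt.empty();
            }
        }
        long signed = negative ? -result : result;
        if (signed < Integer.MIN_VALUE || signed > Integer.MAX_VALUE) {
            return OptionalInt.empty();
        }
        return OptionalInt.of((int) signed);
    }
}
```

Every failure path is a cheap early return, so per-row cost stays flat even on dirty data.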



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10021) Alter index rebuild statements submitted through HiveServer2 fail when Sentry is enabled

2015-08-28 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720007#comment-14720007
 ] 

Ashutosh Chauhan commented on HIVE-10021:
-

+1 pending tests

 Alter index rebuild statements submitted through HiveServer2 fail when 
 Sentry is enabled
 --

 Key: HIVE-10021
 URL: https://issues.apache.org/jira/browse/HIVE-10021
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Indexing
Affects Versions: 0.13.1, 2.0.0
 Environment: CDH 5.3.2
Reporter: Richard Williams
Assignee: Aihua Xu
 Attachments: HIVE-10021.2.patch, HIVE-10021.patch


 When HiveServer2 is configured to authorize submitted queries and statements 
 through Sentry, any attempt to issue an alter index rebuild statement fails 
 with a SemanticException caused by a NullPointerException. This occurs 
 regardless of whether the index is a compact or bitmap index.
 The root cause appears to be that the static createRootTask function in 
 org.apache.hadoop.hive.ql.optimizer.IndexUtils creates a new 
 org.apache.hadoop.hive.ql.Driver object to compile the index builder query, 
 and this new Driver object, unlike the one HiveServer2 uses to compile the 
 submitted statement, never has its userName field initialized with the 
 submitting user's username. Adding null checks to the Sentry code is 
 insufficient to solve this problem, because Sentry needs the userName to 
 determine whether or not the submitting user should be able to execute the 
 index rebuild statement.
 Example stack trace from the HiveServer2 logs:
 {noformat}
 FAILED: NullPointerException null
 java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
   at 
 java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
   at org.apache.hadoop.security.Groups.getGroups(Groups.java:161)
   at 
 org.apache.sentry.provider.common.HadoopGroupMappingService.getGroups(HadoopGroupMappingService.java:46)
   at 
 org.apache.sentry.binding.hive.authz.HiveAuthzBinding.getGroups(HiveAuthzBinding.java:370)
   at 
 org.apache.sentry.binding.hive.HiveAuthzBindingHook.postAnalyze(HiveAuthzBindingHook.java:314)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:440)
   at 
 org.apache.hadoop.hive.ql.optimizer.IndexUtils.createRootTask(IndexUtils.java:258)
   at 
 org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler.getIndexBuilderMapRedTask(CompactIndexHandler.java:149)
   at 
 org.apache.hadoop.hive.ql.index.TableBasedIndexHandler.generateIndexBuildTaskList(TableBasedIndexHandler.java:67)
   at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.getIndexBuilderMapRed(DDLSemanticAnalyzer.java:1171)
   at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterIndexRebuild(DDLSemanticAnalyzer.java:1117)
   at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:410)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:204)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1026)
   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1019)
   at 
 org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:100)
   at 
 org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:173)
   at 
 org.apache.hive.service.cli.session.HiveSessionImpl.runOperationWithLogCapture(HiveSessionImpl.java:715)
   at 
 org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:370)
   at 
 org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:357)
   at 
 org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:238)
   at 
 org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:393)
   at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1373)
   at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1358)
   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
   at org.apache.thrift.server.TServlet.doPost(TServlet.java:83)
   at 
 org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:99)
   at 

[jira] [Commented] (HIVE-11617) Explain plan for multiple lateral views is very slow

2015-08-28 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720001#comment-14720001
 ] 

Ashutosh Chauhan commented on HIVE-11617:
-

FYI : [~jcamachorodriguez] , [~hsubramaniyan]

 Explain plan for multiple lateral views is very slow
 

 Key: HIVE-11617
 URL: https://issues.apache.org/jira/browse/HIVE-11617
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-11617.patch, HIVE-11617.patch


 The following explain job will be very slow or may never finish if many 
 lateral views are involved. High CPU usage is also observed.
 {noformat}
 CREATE TABLE `t1`(`pattern` array<int>);
   
 explain select * from t1 
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1;
 {noformat}
 From jstack, the job is busy with a preorder tree traversal. 
 {noformat}
 at java.util.regex.Matcher.getTextLength(Matcher.java:1234)
 at java.util.regex.Matcher.reset(Matcher.java:308)
 at java.util.regex.Matcher.init(Matcher.java:228)
 at java.util.regex.Pattern.matcher(Pattern.java:1088)
 at org.apache.hadoop.hive.ql.lib.RuleRegExp.cost(RuleRegExp.java:67)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:72)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
 at 
 org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:56)
 at 
 org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
 at 
 org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
 at 
 org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
 at 
 org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
 at 
 org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
 at 
 org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
 at 
 org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
 at 
 org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
 at 
 org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
 at 
 org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
 at 
 org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
 at 
 org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
 at 
 org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
 at 
 org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
 at 
 org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
 at 
 org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
 at 
 org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
 at 
 org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
 at 
 

[jira] [Commented] (HIVE-11617) Explain plan for multiple lateral views is very slow

2015-08-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14719966#comment-14719966
 ] 

Hive QA commented on HIVE-11617:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12752844/HIVE-11617.patch

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 9380 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join33
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join32
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join32_lessSize
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join33
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_multiinsert
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5099/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5099/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5099/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12752844 - PreCommit-HIVE-TRUNK-Build

 Explain plan for multiple lateral views is very slow
 

 Key: HIVE-11617
 URL: https://issues.apache.org/jira/browse/HIVE-11617
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-11617.patch, HIVE-11617.patch


 The following explain job will be very slow or may never finish if many 
 lateral views are involved. High CPU usage is also observed.
 {noformat}
 CREATE TABLE `t1`(`pattern` array<int>);
   
 explain select * from t1 
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1
 lateral view explode(pattern) tbl1 as col1;
 {noformat}
 From jstack, the job is busy with a preorder tree traversal. 
 {noformat}
 at java.util.regex.Matcher.getTextLength(Matcher.java:1234)
 at java.util.regex.Matcher.reset(Matcher.java:308)
 at java.util.regex.Matcher.init(Matcher.java:228)
 at java.util.regex.Pattern.matcher(Pattern.java:1088)
 at org.apache.hadoop.hive.ql.lib.RuleRegExp.cost(RuleRegExp.java:67)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:72)
 at 
 

[jira] [Updated] (HIVE-10021) Alter index rebuild statements submitted through HiveServer2 fail when Sentry is enabled

2015-08-28 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10021:

Attachment: HIVE-10021.2.patch

Rather than passing the userName around, use the one saved in the SessionState.
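That fix pattern, reading the user from per-session state instead of threading it through every constructor, can be sketched with a ThreadLocal holder. Hive's real SessionState is far more elaborate; the class and method names below are illustrative stand-ins:

```java
public class SessionUser {
    // Simplified stand-in for Hive's SessionState: one user per thread.
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    // Called once when the session begins handling a statement.
    static void start(String userName) {
        CURRENT_USER.set(userName);
    }

    // Any component created later on this thread (e.g. a nested Driver
    // compiling an index-rebuild query) can recover the submitting user
    // without it being passed through every constructor.
    static String getUserName() {
        return CURRENT_USER.get();
    }
}
```

Because the nested Driver runs on the same thread as the session, it sees the same user as the outer compile, which is exactly what the Sentry hook needs.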

 Alter index rebuild statements submitted through HiveServer2 fail when 
 Sentry is enabled
 --

 Key: HIVE-10021
 URL: https://issues.apache.org/jira/browse/HIVE-10021
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Indexing
Affects Versions: 0.13.1, 2.0.0
 Environment: CDH 5.3.2
Reporter: Richard Williams
Assignee: Aihua Xu
 Attachments: HIVE-10021.2.patch, HIVE-10021.patch


 When HiveServer2 is configured to authorize submitted queries and statements 
 through Sentry, any attempt to issue an alter index rebuild statement fails 
 with a SemanticException caused by a NullPointerException. This occurs 
 regardless of whether the index is a compact or bitmap index. 
 The root cause appears to be that the static createRootTask method in 
 org.apache.hadoop.hive.ql.optimizer.IndexUtils creates a new 
 org.apache.hadoop.hive.ql.Driver object to compile the index-builder query, 
 and this new Driver, unlike the one HiveServer2 used to compile the 
 submitted statement, never has its userName field initialized with the 
 submitting user's username. Adding null checks to the Sentry code is not 
 sufficient, because Sentry needs the userName to decide whether the 
 submitting user is allowed to execute the index rebuild statement.
 Example stack trace from the HiveServer2 logs:
 {noformat}
 FAILED: NullPointerException null
 java.lang.NullPointerException
   at java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
   at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
   at org.apache.hadoop.security.Groups.getGroups(Groups.java:161)
   at org.apache.sentry.provider.common.HadoopGroupMappingService.getGroups(HadoopGroupMappingService.java:46)
   at org.apache.sentry.binding.hive.authz.HiveAuthzBinding.getGroups(HiveAuthzBinding.java:370)
   at org.apache.sentry.binding.hive.HiveAuthzBindingHook.postAnalyze(HiveAuthzBindingHook.java:314)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:440)
   at org.apache.hadoop.hive.ql.optimizer.IndexUtils.createRootTask(IndexUtils.java:258)
   at org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler.getIndexBuilderMapRedTask(CompactIndexHandler.java:149)
   at org.apache.hadoop.hive.ql.index.TableBasedIndexHandler.generateIndexBuildTaskList(TableBasedIndexHandler.java:67)
   at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.getIndexBuilderMapRed(DDLSemanticAnalyzer.java:1171)
   at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterIndexRebuild(DDLSemanticAnalyzer.java:1117)
   at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:410)
   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:204)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1026)
   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1019)
   at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:100)
   at org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:173)
   at org.apache.hive.service.cli.session.HiveSessionImpl.runOperationWithLogCapture(HiveSessionImpl.java:715)
   at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:370)
   at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:357)
   at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:238)
   at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:393)
   at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1373)
   at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1358)
   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
   at org.apache.thrift.server.TServlet.doPost(TServlet.java:83)
   at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:99)

[jira] [Commented] (HIVE-11668) make sure directsql calls pre-query init when needed

2015-08-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720972#comment-14720972
 ] 

Hive QA commented on HIVE-11668:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12753065/HIVE-11668.01.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5109/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5109/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5109/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/common/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/common/target/tmp/conf
 [copy] Copying 10 files to 
/data/hive-ptest/working/apache-github-source-source/common/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
hive-common ---
[INFO] Compiling 21 source files to 
/data/hive-ptest/working/apache-github-source-source/common/target/test-classes
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/common/src/test/org/apache/hadoop/hive/common/TestValidReadTxnList.java:
 
/data/hive-ptest/working/apache-github-source-source/common/src/test/org/apache/hadoop/hive/common/TestValidReadTxnList.java
 uses or overrides a deprecated API.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/common/src/test/org/apache/hadoop/hive/common/TestValidReadTxnList.java:
 Recompile with -Xlint:deprecation for details.
[INFO] 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-common ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-common ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-github-source-source/common/target/hive-common-2.0.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
hive-common ---
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-common ---
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/common/target/hive-common-2.0.0-SNAPSHOT.jar
 to 
/home/hiveptest/.m2/repository/org/apache/hive/hive-common/2.0.0-SNAPSHOT/hive-common-2.0.0-SNAPSHOT.jar
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/common/pom.xml to 
/home/hiveptest/.m2/repository/org/apache/hive/hive-common/2.0.0-SNAPSHOT/hive-common-2.0.0-SNAPSHOT.pom
[INFO] 
[INFO] 
[INFO] Building Hive Serde 2.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-serde ---
[INFO] Deleting 
/data/hive-ptest/working/apache-github-source-source/serde/target
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/serde 
(includes = [datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ 
hive-serde ---
[INFO] 
[INFO] --- build-helper-maven-plugin:1.8:add-source (add-source) @ hive-serde 
---
[INFO] Source directory: 
/data/hive-ptest/working/apache-github-source-source/serde/src/gen/protobuf/gen-java
 added.
[INFO] Source directory: 
/data/hive-ptest/working/apache-github-source-source/serde/src/gen/thrift/gen-javabean
 added.
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-serde ---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ 
hive-serde ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-github-source-source/serde/src/main/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-serde ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-serde ---
[INFO] Compiling 405 source files to 
/data/hive-ptest/working/apache-github-source-source/serde/target/classes
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/serde/src/java/org/apache/hadoop/hive/serde2/SerDe.java:
 Some input files use or override a deprecated API.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/serde/src/java/org/apache/hadoop/hive/serde2/SerDe.java:
 Recompile with -Xlint:deprecation for details.
[WARNING] 

[jira] [Updated] (HIVE-11669) OrcFileDump service should support directories

2015-08-28 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-11669:
--
Labels: TODOC1.3  (was: )

 OrcFileDump service should support directories
 --

 Key: HIVE-11669
 URL: https://issues.apache.org/jira/browse/HIVE-11669
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.3.0, 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
  Labels: TODOC1.3
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-11669.1.patch


 orcfiledump service does not support directories. If directory is specified 
 then the program should iterate through all the files in the directory and 
 perform file dump.
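The behavior described above can be sketched with plain java.nio: a file argument maps to itself, a directory argument expands to every regular file underneath it, each of which can then be dumped in turn. This is a simplified illustration; the real tool works through Hadoop's FileSystem API, and the class and method names here are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OrcDumpPaths {
    // Expands an input path: a file maps to itself, a directory to every
    // regular file underneath it, so each can be dumped in turn.
    public static List<Path> expand(Path input) throws IOException {
        if (!Files.isDirectory(input)) {
            return Collections.singletonList(input);
        }
        try (Stream<Path> stream = Files.walk(input)) {
            return stream.filter(Files::isRegularFile)
                         .sorted()
                         .collect(Collectors.toList());
        }
    }
}
```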



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11320) ACID enable predicate pushdown for insert-only delta file

2015-08-28 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720895#comment-14720895
 ] 

Eugene Koifman commented on HIVE-11320:
---

I'm not sure I follow, can you explain

 ACID enable predicate pushdown for insert-only delta file
 -

 Key: HIVE-11320
 URL: https://issues.apache.org/jira/browse/HIVE-11320
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 1.3.0

 Attachments: HIVE-11320.patch


 Given an ACID table T against which some Insert/Update/Delete operations 
 have been executed but no Major Compaction, the table will have some number 
 of delta files (and possibly base files).
 Given a query: select * from T where c1 = 5;
 the OrcRawRecordMerger constructor currently disables predicate pushdown in 
 ORC for the delta files via eventOptions.searchArgument(null, null);
 When a delta file is known to contain only Insert events, we can safely push 
 the predicate.
 ORC maintains stats in its footer with counts of the insert/update/delete 
 events in the file; these can be used to determine that a given delta file 
 contains only Insert events. See OrcRecordUpdater.parseAcidStats().
 This will enable PPD for Streaming Ingest (HIVE-5687) use cases, which by 
 definition generate only Insert events.
 PPD for deltas with arbitrary event types is achievable but more 
 complicated, and will be addressed separately.
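The decision described above reduces to a check on the footer counters: push the predicate only when the delta recorded no update/delete events. The AcidStats stand-in below is hypothetical; in Hive the counts would come from the ORC footer via parseAcidStats():

```java
public class DeltaPpd {
    // Stand-in for the insert/update/delete counters kept in the ORC footer.
    public static class AcidStats {
        final long inserts, updates, deletes;
        public AcidStats(long inserts, long updates, long deletes) {
            this.inserts = inserts;
            this.updates = updates;
            this.deletes = deletes;
        }
    }

    // A predicate can be pushed into a delta only if every event in it is an
    // Insert; any update/delete event means rows may be masked later in the
    // merge, so filtering early could change results.
    public static boolean canPushPredicate(AcidStats stats) {
        return stats.updates == 0 && stats.deletes == 0;
    }
}
```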





[jira] [Updated] (HIVE-11689) minor changes to ORC split generation

2015-08-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11689:

Attachment: HIVE-11689.patch

Patch

 minor changes to ORC split generation
 -

 Key: HIVE-11689
 URL: https://issues.apache.org/jira/browse/HIVE-11689
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-11689.patch


 There are two changes that would help future work on split PPD into HBase 
 metastore. 
 1) Move non-HDFS split strategy determination logic into main thread from 
 threadpool.
 2) Instead of iterating thru the futures and waiting, use CompletionService 
 to get futures in order of completion. That might be useful by itself.
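Point 2 can be sketched with java.util.concurrent's ExecutorCompletionService, which yields futures in completion order instead of submission order; the Integer tasks below are placeholders for the split-generation work:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompletionOrderDemo {
    // Consumes results in the order tasks finish, not the order they were
    // submitted, so a slow task never blocks consumption of fast ones.
    public static List<Integer> runAll(List<Callable<Integer>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            CompletionService<Integer> ecs = new ExecutorCompletionService<>(pool);
            for (Callable<Integer> t : tasks) {
                ecs.submit(t);
            }
            List<Integer> results = new ArrayList<>();
            for (int i = 0; i < tasks.size(); i++) {
                results.add(ecs.take().get()); // blocks until the *next finished* task
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Plain Future iteration would wait on the first-submitted task even if later ones finished earlier; the CompletionService queue removes that head-of-line blocking.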





[jira] [Commented] (HIVE-10924) add support for MERGE statement

2015-08-28 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720975#comment-14720975
 ] 

Lefty Leverenz commented on HIVE-10924:
---

Nice doc, [~ekoifman].  But the MERGE statement has a THEN THEN typo.

 add support for MERGE statement
 ---

 Key: HIVE-10924
 URL: https://issues.apache.org/jira/browse/HIVE-10924
 Project: Hive
  Issue Type: New Feature
  Components: Query Planning, Query Processor, Transactions
Affects Versions: 1.2.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman

 add support for 
 MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ...





[jira] [Commented] (HIVE-11510) Metatool updateLocation warning on views

2015-08-28 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720974#comment-14720974
 ] 

Wei Zheng commented on HIVE-11510:
--

Test failure not related.

 Metatool updateLocation warning on views
 

 Key: HIVE-11510
 URL: https://issues.apache.org/jira/browse/HIVE-11510
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema
Affects Versions: 0.14.0
Reporter: Eric Czech
Assignee: Wei Zheng
 Attachments: HIVE-11510.1.patch, HIVE-11510.2.patch


 If views are present in a Hive database, issuing a 'hive metatool 
 -updateLocation' command will produce warnings like this:
 ...
 Warning: Found records with bad LOCATION in SDS table.. 
 bad location URI: null
 bad location URI: null
 bad location URI: null
 
 Based on the Metatool source code, it appears there would be a bad location 
 URI: null message for every view, and that this happens simply because the 
 SDS table in the Hive metastore schema has a LOCATION column that is NULL 
 only for views.
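Given that diagnosis, a plausible shape for the fix is to treat a NULL location as "nothing to validate" rather than as a bad URI. The method below is a hypothetical illustration, not the actual Metatool code:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

public class LocationScan {
    // Flags only genuinely malformed locations. Views carry no storage
    // location, so their NULL entries are skipped instead of warned about.
    public static List<String> findBadLocations(List<String> locations) {
        List<String> bad = new ArrayList<>();
        for (String loc : locations) {
            if (loc == null) {
                continue; // view row in SDS: expected, not an error
            }
            try {
                URI u = new URI(loc);
                if (u.getScheme() == null) {
                    bad.add(loc); // e.g. missing hdfs:// prefix
                }
            } catch (URISyntaxException e) {
                bad.add(loc);
            }
        }
        return bad;
    }
}
```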





[jira] [Updated] (HIVE-11678) Add AggregateProjectMergeRule

2015-08-28 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-11678:

Attachment: HIVE-11678.2.patch

With updated golden files.

 Add AggregateProjectMergeRule
 -

 Key: HIVE-11678
 URL: https://issues.apache.org/jira/browse/HIVE-11678
 Project: Hive
  Issue Type: New Feature
  Components: CBO, Logical Optimizer
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-11678.2.patch, HIVE-11678.patch


 This will help to get rid of extra projects on top of Aggregation, thus 
 compacting query plan.





[jira] [Commented] (HIVE-11595) refactor ORC footer reading to make it usable from outside

2015-08-28 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720978#comment-14720978
 ] 

Lefty Leverenz commented on HIVE-11595:
---

Any doc needed for this?

 refactor ORC footer reading to make it usable from outside
 --

 Key: HIVE-11595
 URL: https://issues.apache.org/jira/browse/HIVE-11595
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 2.0.0

 Attachments: HIVE-10595.patch, HIVE-11595.01.patch, 
 HIVE-11595.02.patch, HIVE-11595.03.patch, HIVE-11595.04.patch


 If the ORC footer is read from cache, we want to parse it without having a 
 reader, opening a file, etc. I thought it would be as simple as protobuf 
 parseFrom(bytes), but apparently there's a bunch of stuff going on there. It 
 needs to be accessible via something like parseFrom(ByteBuffer) or similar.





[jira] [Updated] (HIVE-11689) minor flow changes to ORC split generation

2015-08-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11689:

Summary: minor flow changes to ORC split generation  (was: minor changes to 
ORC split generation)

 minor flow changes to ORC split generation
 --

 Key: HIVE-11689
 URL: https://issues.apache.org/jira/browse/HIVE-11689
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-11689.patch


 There are two changes that would help future work on split PPD into HBase 
 metastore. 
 1) Move non-HDFS split strategy determination logic into main thread from 
 threadpool.
 2) Instead of iterating thru the futures and waiting, use CompletionService 
 to get futures in order of completion. That might be useful by itself.





[jira] [Commented] (HIVE-11689) minor flow changes to ORC split generation

2015-08-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720898#comment-14720898
 ] 

Sergey Shelukhin commented on HIVE-11689:
-

[~prasanth_j] can you review https://reviews.apache.org/r/37917/

 minor flow changes to ORC split generation
 --

 Key: HIVE-11689
 URL: https://issues.apache.org/jira/browse/HIVE-11689
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-11689.patch


 There are two changes that would help future work on split PPD into HBase 
 metastore. 
 1) Move non-HDFS split strategy determination logic into main thread from 
 threadpool.
 2) Instead of iterating thru the futures and waiting, use CompletionService 
 to get futures in order of completion. That might be useful by itself.





[jira] [Commented] (HIVE-11320) ACID enable predicate pushdown for insert-only delta file

2015-08-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720886#comment-14720886
 ] 

Sergey Shelukhin commented on HIVE-11320:
-

I wonder if it would also apply to stripe elimination during split generation...

 ACID enable predicate pushdown for insert-only delta file
 -

 Key: HIVE-11320
 URL: https://issues.apache.org/jira/browse/HIVE-11320
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 1.3.0

 Attachments: HIVE-11320.patch


 Given an ACID table T against which some Insert/Update/Delete operations 
 have been executed but no Major Compaction, the table will have some number 
 of delta files (and possibly base files).
 Given a query: select * from T where c1 = 5;
 the OrcRawRecordMerger constructor currently disables predicate pushdown in 
 ORC for the delta files via eventOptions.searchArgument(null, null);
 When a delta file is known to contain only Insert events, we can safely push 
 the predicate.
 ORC maintains stats in its footer with counts of the insert/update/delete 
 events in the file; these can be used to determine that a given delta file 
 contains only Insert events. See OrcRecordUpdater.parseAcidStats().
 This will enable PPD for Streaming Ingest (HIVE-5687) use cases, which by 
 definition generate only Insert events.
 PPD for deltas with arbitrary event types is achievable but more 
 complicated, and will be addressed separately.





[jira] [Commented] (HIVE-11670) Strip out password information from TezSessionState configuration

2015-08-28 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720918#comment-14720918
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-11670:
--

The failures are unrelated to the change.

Thanks
Hari

 Strip out password information from TezSessionState configuration
 -

 Key: HIVE-11670
 URL: https://issues.apache.org/jira/browse/HIVE-11670
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-11670.1.patch


 Remove password information from configuration copy that is sent to Yarn/Tez. 
 We don't need it there. The config entries can potentially be visible to 
 other users.
 HIVE-10508 fixed this in certain places; however, when I initiated a session 
 via the Hive CLI, I could still see the password information.
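The fix described above amounts to dropping password-like entries from the configuration copy before it leaves the process. A sketch over a plain map follows; the key-matching pattern is an assumption, since the actual patch presumably uses Hive's own list of sensitive configuration entries:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class ConfigScrubber {
    // Assumed heuristic: any key containing "password" is sensitive.
    private static final Pattern SENSITIVE =
        Pattern.compile(".*password.*", Pattern.CASE_INSENSITIVE);

    // Returns a copy of the config with password-like entries removed, so
    // the copy can be shipped to an external service (e.g. Yarn/Tez)
    // without leaking credentials.
    public static Map<String, String> stripPasswords(Map<String, String> conf) {
        Map<String, String> copy = new HashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (!SENSITIVE.matcher(e.getKey()).matches()) {
                copy.put(e.getKey(), e.getValue());
            }
        }
        return copy;
    }
}
```

Scrubbing a copy, rather than the live configuration, keeps the session itself working while ensuring the exported view carries no credentials.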





[jira] [Commented] (HIVE-11510) Metatool updateLocation warning on views

2015-08-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720936#comment-14720936
 ] 

Hive QA commented on HIVE-11510:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12753071/HIVE-11510.2.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9380 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5107/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5107/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5107/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12753071 - PreCommit-HIVE-TRUNK-Build

 Metatool updateLocation warning on views
 

 Key: HIVE-11510
 URL: https://issues.apache.org/jira/browse/HIVE-11510
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema
Affects Versions: 0.14.0
Reporter: Eric Czech
Assignee: Wei Zheng
 Attachments: HIVE-11510.1.patch, HIVE-11510.2.patch


 If views are present in a Hive database, issuing a 'hive metatool 
 -updateLocation' command will produce warnings like this:
 ...
 Warning: Found records with bad LOCATION in SDS table.. 
 bad location URI: null
 bad location URI: null
 bad location URI: null
 
 Based on the Metatool source code, it appears there would be a bad location 
 URI: null message for every view, and that this happens simply because the 
 SDS table in the Hive metastore schema has a LOCATION column that is NULL 
 only for views.





[jira] [Commented] (HIVE-11684) Implement limit pushdown through outer join in CBO

2015-08-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720969#comment-14720969
 ] 

Hive QA commented on HIVE-11684:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12753056/HIVE-11684.patch

{color:red}ERROR:{color} -1 due to 125 failed/errored test(s), 9381 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguitycheck
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ansi_sql_arithmetic
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_pad_convert
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas_colname
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_logical
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_resolution
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_interval_arithmetic
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_vc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_literal_double
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_literal_ints
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_literal_string
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_macro
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch_threshold
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_num_op_type_conv
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pcr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_vc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quote2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_reduce_deduplicate_extended
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_regex_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_str_to_map
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_temp_table_windowing_expressions
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_type_cast_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_type_widening
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_context_aware
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_example_add
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_in_file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_isnull_isnotnull
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_reflect2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_top_level
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_date_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_cast
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_round
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_elt
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_if_expr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_interval_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_interval_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_string_concat
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_17
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_decimal_date
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_div0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_short_regress

[jira] [Commented] (HIVE-11669) OrcFileDump service should support directories

2015-08-28 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720981#comment-14720981
 ] 

Lefty Leverenz commented on HIVE-11669:
---

Doc note:  This should be documented, with version information, in the ORC File 
Dump Utility section of the ORC doc.

* [ORC Files -- ORC File Dump Utility | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-ORCFileDumpUtility]

By the way, I don't see anything about the file dump utility in the ORC project 
documentation (https://orc.apache.org/docs/).

 OrcFileDump service should support directories
 --

 Key: HIVE-11669
 URL: https://issues.apache.org/jira/browse/HIVE-11669
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.3.0, 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
  Labels: TODOC1.3
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-11669.1.patch


 orcfiledump service does not support directories. If directory is specified 
 then the program should iterate through all the files in the directory and 
 perform file dump.





[jira] [Comment Edited] (HIVE-11689) minor flow changes to ORC split generation

2015-08-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720898#comment-14720898
 ] 

Sergey Shelukhin edited comment on HIVE-11689 at 8/29/15 1:50 AM:
--

[~prasanth_j] can you review https://reviews.apache.org/r/37917/

the code in determineSplitStrategy is entirely old, except for the 
list-of-files variable being renamed


was (Author: sershe):
[~prasanth_j] can you review https://reviews.apache.org/r/37917/

 minor flow changes to ORC split generation
 --

 Key: HIVE-11689
 URL: https://issues.apache.org/jira/browse/HIVE-11689
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-11689.patch


 There are two changes that would help future work on split PPD into HBase 
 metastore. 
 1) Move non-HDFS split strategy determination logic into main thread from 
 threadpool.
 2) Instead of iterating thru the futures and waiting, use CompletionService 
 to get futures in order of completion. That might be useful by itself.





[jira] [Updated] (HIVE-10924) add support for MERGE statement

2015-08-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-10924:
--
Component/s: Transactions

 add support for MERGE statement
 ---

 Key: HIVE-10924
 URL: https://issues.apache.org/jira/browse/HIVE-10924
 Project: Hive
  Issue Type: New Feature
  Components: Query Planning, Query Processor, Transactions
Affects Versions: 1.2.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman

 add support for 
 MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ...





[jira] [Commented] (HIVE-10615) LLAP: Invalid containerId prefix

2015-08-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720281#comment-14720281
 ] 

Siddharth Seth commented on HIVE-10615:
---

That sounds about right. The ContainerId string format was changed in Hadoop 2.6, 
IIRC. Having multiple versions of the jar in the classpath will cause such 
issues, and running against clusters with a different version of the client libs 
could also cause such problems.

 LLAP: Invalid containerId prefix
 

 Key: HIVE-10615
 URL: https://issues.apache.org/jira/browse/HIVE-10615
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran

 I encountered this error when I ran a simple query in llap mode today. 
 {code}org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
 java.lang.IllegalArgumentException: Invalid ContainerId prefix: 
   at 
 org.apache.hadoop.yarn.api.records.ContainerId.fromString(ContainerId.java:211)
   at 
 org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:178)
   at 
 org.apache.tez.dag.app.TezTaskCommunicatorImpl$TezTaskUmbilicalProtocolImpl.heartbeat(TezTaskCommunicatorImpl.java:311)
   at 
 org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator$LlapTaskUmbilicalProtocolImpl.heartbeat(LlapTaskCommunicator.java:398)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.ipc.WritableRpcEngine$Server$WritableRpcInvoker.call(WritableRpcEngine.java:514)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
   at org.apache.hadoop.ipc.Client.call(Client.java:1468)
   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
   at 
 org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:244)
   at com.sun.proxy.$Proxy14.heartbeat(Unknown Source)
   at 
 org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.heartbeat(LlapTaskReporter.java:256)
   at 
 org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:184)
   at 
 org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:126)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 15/05/05 15:24:22 [Task-Executor-0] INFO task.TezTaskRunner : Interrupted 
 while waiting for task to complete. Interrupting task
 15/05/05 15:24:22 [TezTaskRunner_attempt_1430816501738_0034_1_00_00_0] 
 INFO task.TezTaskRunner : Encounted an error while executing task: 
 attempt_1430816501738_0034_1_00_00_0
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052)
   at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
   at 
 java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.initialize(LogicalIOProcessorRuntimeTask.java:218)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:177)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
   at 

[jira] [Updated] (HIVE-11672) Hive Streaming API handles bucketing incorrectly

2015-08-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11672:
--
Component/s: Transactions
 Hive
 HCatalog

 Hive Streaming API handles bucketing incorrectly
 

 Key: HIVE-11672
 URL: https://issues.apache.org/jira/browse/HIVE-11672
 Project: Hive
  Issue Type: Bug
  Components: HCatalog, Hive, Transactions
Affects Versions: 1.2.1
Reporter: Raj Bains
Assignee: Roshan Naik
Priority: Critical

 Hive Streaming API allows clients to get a random bucket and then insert data 
 into it. However, this leads to incorrect bucketing, as Hive expects data to be 
 distributed into buckets based on a hash function applied to the bucket key. 
 The data is currently inserted randomly by the clients, who have no way of
 # Knowing what bucket a row (tuple) belongs to
 # Asking for a specific bucket
 There are optimizations such as Sort Merge Join and Bucket Map Join that rely 
 on the data being correctly distributed across buckets, and these will produce 
 incorrect read results if the data is not distributed correctly.
 There are two obvious design choices:
 # Hive Streaming API fixes this internally by distributing the data correctly
 # Hive Streaming API exposes the data distribution scheme to the clients and 
 allows them to distribute the data correctly
 The first option would mean every client thread writes to many buckets, causing 
 many small files in each bucket and too many open connections; this does not 
 seem feasible. The second option pushes more functionality into the client of 
 the Hive Streaming API, but can maintain high throughput and write good-sized 
 ORC files. This option seems preferable.
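 For context on the expected distribution: Hive routes a row to bucket hash(bucketKey) mod numBuckets. Below is a minimal stand-in for that contract; Hive's own hashing utilities compute the hash differently, and only the shape of the computation is shown here.

```java
// Illustrative stand-in for Hive's bucketing contract: a row's bucket is a
// deterministic hash of the bucket key modulo the bucket count. Hive's real
// hash function differs; this sketch only illustrates the shape.
public class BucketSketch {
  public static int bucketFor(Object bucketKey, int numBuckets) {
    // mask the sign bit so the modulo result is always in [0, numBuckets)
    return (bucketKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
  }

  public static void main(String[] args) {
    // the same key always lands in the same bucket
    System.out.println(bucketFor("some-key", 4));
  }
}
```

 A client that knows this mapping could route each row to the writer that owns its bucket, which is essentially design choice #2 above.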





[jira] [Resolved] (HIVE-10615) LLAP: Invalid containerId prefix

2015-08-28 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved HIVE-10615.
---
Resolution: Not A Problem

 LLAP: Invalid containerId prefix
 

 Key: HIVE-10615
 URL: https://issues.apache.org/jira/browse/HIVE-10615
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran


[jira] [Commented] (HIVE-10978) Document fs.trash.interval wrt Hive and HDFS Encryption

2015-08-28 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720235#comment-14720235
 ] 

Eugene Koifman commented on HIVE-10978:
---

I think mentioning this in the Drop Table/Partition section is a good idea, but 
the most critical part is that fs.trash.interval has to be set in core-site.xml 
(i.e. the Hadoop config file), not in hive-site.xml or at the CLI.

 Document fs.trash.interval wrt Hive and HDFS Encryption
 ---

 Key: HIVE-10978
 URL: https://issues.apache.org/jira/browse/HIVE-10978
 Project: Hive
  Issue Type: Bug
  Components: Documentation, Security
Affects Versions: 1.2.0
Reporter: Eugene Koifman
Priority: Critical
  Labels: TODOC1.2

 This should be documented in the 1.2.1 Release Notes.
 When HDFS is encrypted (TDE is enabled), DROP TABLE and DROP PARTITION have 
 unexpected behavior when the Hadoop Trash feature is enabled.
 The latter is enabled by setting fs.trash.interval > 0 in core-site.xml.
 When Trash is enabled, the data files for the table should be moved to the 
 Trash bin. If the table is inside an Encryption Zone, this move operation 
 is not allowed.
 There are 2 ways to deal with this:
 1. Use PURGE, as in DROP TABLE blah PURGE. This skips the Trash bin even if 
 enabled.
 2. Set fs.trash.interval = 0. It is critical that this config change is made 
 in core-site.xml. Setting it in hive-site.xml may lead to very strange 
 behavior where the table metadata is deleted but the data files remain, which 
 will lead to data corruption if a table with the same name is later created.
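 For reference, the Trash feature described above is controlled by a standard Hadoop setting in core-site.xml (value in minutes; 0 disables Trash):

```xml
<!-- core-site.xml: Hadoop Trash retention, in minutes. This must live in the
     Hadoop config; setting it in hive-site.xml does not take effect for the
     HDFS trash behavior described above. -->
<property>
  <name>fs.trash.interval</name>
  <value>360</value>
</property>
```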





[jira] [Updated] (HIVE-11672) Hive Streaming API handles bucketing incorrectly

2015-08-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11672:
--
Fix Version/s: (was: 1.2.2)

 Hive Streaming API handles bucketing incorrectly
 

 Key: HIVE-11672
 URL: https://issues.apache.org/jira/browse/HIVE-11672
 Project: Hive
  Issue Type: Bug
  Components: HCatalog, Hive, Transactions
Affects Versions: 1.2.1
Reporter: Raj Bains
Assignee: Roshan Naik
Priority: Critical






[jira] [Commented] (HIVE-11670) Strip out password information from TezSessionState configuration

2015-08-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720230#comment-14720230
 ] 

Hive QA commented on HIVE-11670:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12752863/HIVE-11670.1.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9378 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5101/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5101/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5101/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12752863 - PreCommit-HIVE-TRUNK-Build

 Strip out password information from TezSessionState configuration
 -

 Key: HIVE-11670
 URL: https://issues.apache.org/jira/browse/HIVE-11670
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-11670.1.patch


 Remove password information from the configuration copy that is sent to 
 Yarn/Tez. We don't need it there, and the config entries can potentially be 
 visible to other users.
 HIVE-10508 had a fix that removed this in certain places; however, when I 
 initiated a session via the Hive CLI, I could still see the password 
 information.
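 A hypothetical sketch of the kind of scrubbing described (the actual patch operates on Hadoop's Configuration object; a plain map is used here to stay self-contained): drop any entry whose key looks password-related before handing the copy to Yarn/Tez.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: copy the configuration, skipping any key that looks
// password-related, so secrets never reach the copy shipped to Yarn/Tez.
public class StripPasswords {
  public static Map<String, String> scrub(Map<String, String> conf) {
    Map<String, String> copy = new HashMap<>();
    for (Map.Entry<String, String> e : conf.entrySet()) {
      if (!e.getKey().toLowerCase().contains("password")) {
        copy.put(e.getKey(), e.getValue());
      }
    }
    return copy;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("javax.jdo.option.ConnectionPassword", "secret");
    conf.put("hive.execution.engine", "tez");
    System.out.println(scrub(conf).keySet());
  }
}
```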





[jira] [Commented] (HIVE-10615) LLAP: Invalid containerId prefix

2015-08-28 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720248#comment-14720248
 ] 

Daniel Dai commented on HIVE-10615:
---

It turns out there is a hadoop-2.5.jar in my classpath. So this is no longer an 
issue for me.

 LLAP: Invalid containerId prefix
 

 Key: HIVE-10615
 URL: https://issues.apache.org/jira/browse/HIVE-10615
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran


[jira] [Commented] (HIVE-10615) LLAP: Invalid containerId prefix

2015-08-28 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720275#comment-14720275
 ] 

Daniel Dai commented on HIVE-10615:
---

Note my cluster is on Hadoop 2.7.1.

 LLAP: Invalid containerId prefix
 

 Key: HIVE-10615
 URL: https://issues.apache.org/jira/browse/HIVE-10615
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran


[jira] [Resolved] (HIVE-11682) LLAP: Merge master into branch

2015-08-28 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-11682.
--
   Resolution: Fixed
Fix Version/s: llap

 LLAP: Merge master into branch
 --

 Key: HIVE-11682
 URL: https://issues.apache.org/jira/browse/HIVE-11682
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Fix For: llap








[jira] [Commented] (HIVE-11510) Metatool updateLocation warning on views

2015-08-28 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720352#comment-14720352
 ] 

Wei Zheng commented on HIVE-11510:
--

[~sushanth] Can you review?

 Metatool updateLocation warning on views
 

 Key: HIVE-11510
 URL: https://issues.apache.org/jira/browse/HIVE-11510
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema
Affects Versions: 0.14.0
Reporter: Eric Czech
Assignee: Wei Zheng
 Attachments: HIVE-11510.1.patch


 If views are present in a Hive database, issuing a 'hive metatool 
 -updateLocation' command will result in an error like this:
 ...
 Warning: Found records with bad LOCATION in SDS table.. 
 bad location URI: null
 bad location URI: null
 bad location URI: null
 
 Based on the source code for Metatool, it looks like there would be a 
 bad location URI: null message for every view, and this appears to happen 
 simply because the SDS table in the Hive schema has a LOCATION column that 
 is NULL only for views.





[jira] [Updated] (HIVE-4198) Move HCatalog code into Hive

2015-08-28 Thread Mohit Sabharwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohit Sabharwal updated HIVE-4198:
--
Fix Version/s: 0.11.0

 Move HCatalog code into Hive
 

 Key: HIVE-4198
 URL: https://issues.apache.org/jira/browse/HIVE-4198
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.11.0


 The HCatalog code needs to be moved into Hive.





[jira] [Updated] (HIVE-4198) Move HCatalog code into Hive

2015-08-28 Thread Mohit Sabharwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohit Sabharwal updated HIVE-4198:
--
Fix Version/s: (was: 0.11.0)

 Move HCatalog code into Hive
 

 Key: HIVE-4198
 URL: https://issues.apache.org/jira/browse/HIVE-4198
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0
Reporter: Alan Gates
Assignee: Alan Gates

 The HCatalog code needs to be moved into Hive.





[jira] [Commented] (HIVE-11668) make sure directsql calls pre-query init when needed

2015-08-28 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720412#comment-14720412
 ] 

Sushanth Sowmyan commented on HIVE-11668:
-

This patch will still have an issue, as observed by [~wzheng] earlier today:

{noformat}
Caused by: org.datanucleus.api.jdo.exceptions.TransactionNotActiveException: 
Transaction is not active. You either need to define a transaction around this, 
or run your PersistenceManagerFactory with 'NontransactionalRead' and 
'NontransactionalWrite' set to 'true'
FailedObject:org.datanucleus.exceptions.TransactionNotActiveException: 
Transaction is not active. You either need to define a transaction around this, 
or run your PersistenceManagerFactory with 'NontransactionalRead' and 
'NontransactionalWrite' set to 'true'
at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:396)
at org.datanucleus.api.jdo.JDOTransaction.rollback(JDOTransaction.java:186)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.ensureDbInit(MetaStoreDirectSql.java:196)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.init(MetaStoreDirectSql.java:137)
at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:335)
at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:286)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.hive.metastore.RawStoreProxy.init(RawStoreProxy.java:57)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:601)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:579)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:632)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:468)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.init(RetryingHMSHandler.java:66)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5815)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:203)
at 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.init(SessionHiveMetaStoreClient.java:74)
... 19 more
{noformat}

The issue here is this: earlier, the runDbCheck() function instantiated a 
transaction if one wasn't already open. So, as long as we were determining the 
db type via runDbCheck(), we were opening the txn as a side-effect (ugh). 
Now that we determine the product name from the JDBC provider, runDbCheck() is 
never called, and thus the txn is never opened.

You need the following in your chain, hopefully in a more sane place than in 
runDbCheck():

{noformat}
 Transaction tx = pm.currentTransaction();
+if (!tx.isActive()) {
+  tx.begin();
+}
{noformat}
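A minimal, self-contained sketch of that guard, assuming the fix really is just "begin if not active" (the Transaction interface below is a tiny stand-in for javax.jdo.Transaction, not Hive's actual code):

```java
// Sketch: begin a transaction only when one is not already active, so the
// direct-SQL path never runs a read outside a transaction.
public class TxnGuard {
  // Stand-in for javax.jdo.Transaction; only the two methods we need.
  interface Transaction {
    boolean isActive();
    void begin();
  }

  // Returns true if this call opened the transaction, i.e. the caller now
  // owns it and is responsible for commit/rollback.
  static boolean ensureActive(Transaction tx) {
    if (!tx.isActive()) {
      tx.begin();
      return true;
    }
    return false; // already open; ownership stays with the outer scope
  }
}
```

The returned flag matters: whoever opened the transaction as a side effect must also close it, which is exactly the bookkeeping the old runDbCheck() path hid.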



 make sure directsql calls pre-query init when needed
 

 Key: HIVE-11668
 URL: https://issues.apache.org/jira/browse/HIVE-11668
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-11668.patch


 See comments in HIVE-11123



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11636) NPE in stats conversion with HBase metastore

2015-08-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720364#comment-14720364
 ] 

Sergey Shelukhin commented on HIVE-11636:
-

Thanks!

 NPE in stats conversion with HBase metastore
 

 Key: HIVE-11636
 URL: https://issues.apache.org/jira/browse/HIVE-11636
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: hbase-metastore-branch

 Attachments: HIVE-11636.01.patch, HIVE-11636.patch


 NO PRECOMMIT TESTS
 {noformat}
 2015-08-24T20:37:22,285 ERROR [main]: ql.Driver 
 (SessionState.java:printError(963)) - FAILED: NullPointerException null
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.stats.StatsUtils.convertColStats(StatsUtils.java:740)
 at 
 org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:731)
 at 
 org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:186)
 at 
 org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:139)
 at 
 org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:127)
 at 
 org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:110)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
 at 
 org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:56)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
 at 
 org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
 at 
 org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsAnnotation(TezCompiler.java:249)
 at 
 org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:123)
 at 
 org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10219)
 at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:212)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:240)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:310)
 {noformat}
 Fails after importing some databases from regular metastore and running TPCDS 
 Q27.
 Simple select-where-limit query (not FetchTask) appears to run fine.
 With standalone Hbase metastore (might be the same issue):
 {noformat}
 2015-08-25 14:41:04,793 ERROR [pool-6-thread-53] server.TThreadPoolServer: 
 Thrift error occurred during processing of message.
 org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is 
 unset! Struct:AggrStats(colStats:null, partsFound:0)
 at 
 org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:393)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java)
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
 at 
 org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
 at 
 org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1655)
 at 
 org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 

[jira] [Commented] (HIVE-11217) CTAS statements throws error, when the table is stored as ORC File format and select clause has NULL/VOID type column

2015-08-28 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720387#comment-14720387
 ] 

Prasanth Jayachandran commented on HIVE-11217:
--

OrcProto is a generated file and should never be modified. Also, I don't think 
VOID is a valid supported type in Hive: 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types
IMO, this should be handled at the Hive level: Hive should not pass down any 
type other than the supported ones. 

 CTAS statements throws error, when the table is stored as ORC File format and 
 select clause has NULL/VOID type column 
 --

 Key: HIVE-11217
 URL: https://issues.apache.org/jira/browse/HIVE-11217
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.13.1
Reporter: Gaurav Kohli
Assignee: Yongzhi Chen
Priority: Minor
 Attachments: HIVE-11217.1.patch, HIVE-11217.2.patch


 If you try to use a create-table-as-select (CTAS) statement to create an ORC 
 file format based table, then you can't use NULL as a column value in the 
 select clause: 
 CREATE TABLE empty (x int);
 CREATE TABLE orc_table_with_null 
 STORED AS ORC 
 AS 
 SELECT 
 x,
 null
 FROM empty;
 Error: 
 {quote}
 347084 [main] ERROR hive.ql.exec.DDLTask  - 
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.IllegalArgumentException: Unknown primitive type VOID
   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:643)
   at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4242)
   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:285)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:952)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:269)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:431)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:367)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:464)
   at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:474)
   at 
 org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
   at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:323)
   at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:284)
   at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39)
   at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:227)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: java.lang.IllegalArgumentException: Unknown primitive type VOID
   at 
 org.apache.hadoop.hive.ql.io.orc.OrcStruct.createObjectInspector(OrcStruct.java:530)
   at 
 org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcStructInspector.init(OrcStruct.java:195)
   at 
 org.apache.hadoop.hive.ql.io.orc.OrcStruct.createObjectInspector(OrcStruct.java:534)
   at 
 org.apache.hadoop.hive.ql.io.orc.OrcSerde.initialize(OrcSerde.java:106)
   at 
 org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:519)
   at 
 

[jira] [Commented] (HIVE-11123) Fix how to confirm the RDBMS product name at Metastore.

2015-08-28 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720402#comment-14720402
 ] 

Sushanth Sowmyan commented on HIVE-11123:
-

Also, this patch broke Hive running against MySQL and potentially other DBs; I 
will follow up with comments on HIVE-11668. Testing against Derby alone in 
unit-test mode is problematic. Sorry I didn't catch this before it was committed.

 Fix how to confirm the RDBMS product name at Metastore.
 ---

 Key: HIVE-11123
 URL: https://issues.apache.org/jira/browse/HIVE-11123
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.2.0
 Environment: PostgreSQL
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-11123.1.patch, HIVE-11123.2.patch, 
 HIVE-11123.3.patch, HIVE-11123.4.patch, HIVE-11123.4a.patch


 I use PostgreSQL for the Hive Metastore, and I saw the following messages in 
 the PostgreSQL log.
 {code}
  2015-06-26 10:58:15.488 JST ERROR:  syntax error at or near "@@" at character 5
  2015-06-26 10:58:15.488 JST STATEMENT:  SET @@session.sql_mode=ANSI_QUOTES
  2015-06-26 10:58:15.489 JST ERROR:  relation "v$instance" does not exist at character 21
  2015-06-26 10:58:15.489 JST STATEMENT:  SELECT version FROM v$instance
  2015-06-26 10:58:15.490 JST ERROR:  column "version" does not exist at character 10
  2015-06-26 10:58:15.490 JST STATEMENT:  SELECT @@version
 {code}
 When the Hive CLI or Beeline in embedded mode is run, these messages are 
 output to the PostgreSQL log.
 The queries are issued from MetaStoreDirectSql#determineDbType. If we use 
 MetaStoreDirectSql#getProductName instead, we do not need to issue them.
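 A hedged sketch of that idea, using only the standard JDBC API (this is not 
 Hive's actual MetaStoreDirectSql code; class and method names are 
 illustrative): the driver's DatabaseMetaData already knows the product name, 
 so no vendor-specific probe query ever reaches the database.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.SQLException;

public class DbProductName {
  // Ask the JDBC driver for the product name instead of firing probe queries
  // like "SET @@session.sql_mode=ANSI_QUOTES" or "SELECT version FROM v$instance".
  static String productName(Connection conn) {
    try {
      DatabaseMetaData md = conn.getMetaData();
      return md.getDatabaseProductName().toLowerCase(); // e.g. "postgresql"
    } catch (SQLException e) {
      return "unknown"; // fall back rather than probing with vendor SQL
    }
  }
}
```

 Because getMetaData() is answered by the driver, nothing vendor-specific is 
 sent to the server, which is exactly why the PostgreSQL log stays clean.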



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11510) Metatool updateLocation warning on views

2015-08-28 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-11510:
-
Attachment: HIVE-11510.1.patch

 Metatool updateLocation warning on views
 

 Key: HIVE-11510
 URL: https://issues.apache.org/jira/browse/HIVE-11510
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema
Affects Versions: 0.14.0
Reporter: Eric Czech
Assignee: Wei Zheng
 Attachments: HIVE-11510.1.patch


 If views are present in a hive database, issuing a 'hive metatool 
 -updateLocation' command will result in an error like this:
 ...
 Warning: Found records with bad LOCATION in SDS table.. 
 bad location URI: null
 bad location URI: null
 bad location URI: null
 
 Based on the Metatool source code, it looks like there would be a 
 'bad location URI: null' message for every view, and this appears to happen 
 simply because the SDS table in the Hive schema has a LOCATION column that is 
 NULL only for views.
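 A hedged sketch of the check a fix presumably wants (the names and the URI 
 heuristic below are illustrative, not the actual Metatool code): a NULL 
 location is expected for views, since they have no storage, and should only 
 be flagged for real tables or partitions.

```java
public class LocationCheck {
  // Decide whether an SDS LOCATION value should be reported as bad.
  static boolean isBadLocation(String location, boolean isView) {
    if (location == null) {
      return !isView;      // NULL is normal for views, bad for anything else
    }
    // Crude URI sanity check, purely for the sketch: expect a scheme like hdfs://
    return !location.contains("://");
  }
}
```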



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-08-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720380#comment-14720380
 ] 

Hive QA commented on HIVE-11587:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12752866/HIVE-11587.01.patch

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 9380 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join29
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_9
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_correlationoptimizer1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_inner_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_join_nulls
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_leftsemi_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_nullsafe_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join0
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5102/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5102/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5102/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12752866 - PreCommit-HIVE-TRUNK-Build

 Fix memory estimates for mapjoin hashtable
 --

 Key: HIVE-11587
 URL: https://issues.apache.org/jira/browse/HIVE-11587
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Wei Zheng
 Attachments: HIVE-11587.01.patch


 Due to legacy from the in-memory mapjoin and conservative planning, the memory 
 estimation code for the mapjoin hashtable is currently not very good. It 
 allocates the probe erring on the side of more memory, not taking data into 
 account, because unlike the probe the data is free to resize; so it's better 
 for perf to allocate a big probe and hope for the best with regard to future 
 data size. This is not true for the hybrid case.
 There's code to cap the initial allocation based on memory available 
 (memUsage argument), but due to some code rot, the memory estimates from 
 planning are not even passed to hashtable anymore (there used to be two 
 config settings, hashjoin size fraction by itself, or hashjoin size fraction 
 for group by case), so it never caps the memory anymore below 1 Gb. 
 Initial capacity is estimated from input key count, and in hybrid join cache 
 can exceed Java memory due to number of segments.
 There needs to be a review and fix of all this code.
 Suggested improvements:
 1) Make sure initialCapacity argument from Hybrid case is correct given the 
 number of segments. See how it's calculated from keys for regular case; it 
 needs to be adjusted accordingly for hybrid case if not done already.
 1.5) Note that, knowing the number of rows, the maximum capacity one will 
 ever need for probe size (in longs) is row count (assuming key per row, i.e. 
 maximum possible number of keys) divided by load factor, plus some very small 
 number to round up. That is for flat case. For hybrid case it may be more 
 complex due to skew, but that is still a good upper bound for the total probe 
 capacity of all segments.
 2) Rename memUsage to maxProbeSize, or something, make sure it's passed 
 correctly based on estimates 

[jira] [Commented] (HIVE-11217) CTAS statements throws error, when the table is stored as ORC File format and select clause has NULL/VOID type column

2015-08-28 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720391#comment-14720391
 ] 

Prasanth Jayachandran commented on HIVE-11217:
--

For insert queries, Hive maps null values to the destination column types. For 
CTAS, maybe it should default to some type (string?).

 CTAS statements throws error, when the table is stored as ORC File format and 
 select clause has NULL/VOID type column 
 --

 Key: HIVE-11217
 URL: https://issues.apache.org/jira/browse/HIVE-11217
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.13.1
Reporter: Gaurav Kohli
Assignee: Yongzhi Chen
Priority: Minor
 Attachments: HIVE-11217.1.patch, HIVE-11217.2.patch


 If you try to use a create-table-as-select (CTAS) statement to create an ORC 
 file format based table, then you can't use NULL as a column value in the 
 select clause: 
 CREATE TABLE empty (x int);
 CREATE TABLE orc_table_with_null 
 STORED AS ORC 
 AS 
 SELECT 
 x,
 null
 FROM empty;
 Error: 
 {quote}
 347084 [main] ERROR hive.ql.exec.DDLTask  - 
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.IllegalArgumentException: Unknown primitive type VOID
   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:643)
   at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4242)
   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:285)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:952)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:269)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:431)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:367)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:464)
   at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:474)
   at 
 org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
   at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:323)
   at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:284)
   at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39)
   at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:227)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: java.lang.IllegalArgumentException: Unknown primitive type VOID
   at 
 org.apache.hadoop.hive.ql.io.orc.OrcStruct.createObjectInspector(OrcStruct.java:530)
   at 
 org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcStructInspector.init(OrcStruct.java:195)
   at 
 org.apache.hadoop.hive.ql.io.orc.OrcStruct.createObjectInspector(OrcStruct.java:534)
   at 
 org.apache.hadoop.hive.ql.io.orc.OrcSerde.initialize(OrcSerde.java:106)
   at 
 org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:519)
   at 
 org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:345)
   at 
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:292)
   at 
 

[jira] [Commented] (HIVE-11504) Predicate pushing down doesn't work for float type for Parquet

2015-08-28 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-11504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720752#comment-14720752
 ] 

Sergio Peña commented on HIVE-11504:


Thanks [~Ferd] for the patch.

I committed the patch from HIVE-11618 that uses INTEGER/LONG. Could you take a 
look at the current commit and make the changes for FLOAT/DOUBLE? I think that 
patch looks easier.

 Predicate pushing down doesn't work for float type for Parquet
 --

 Key: HIVE-11504
 URL: https://issues.apache.org/jira/browse/HIVE-11504
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-11504.1.patch, HIVE-11504.1.patch, 
 HIVE-11504.2.patch, HIVE-11504.2.patch, HIVE-11504.3.patch, HIVE-11504.patch


 The predicate builder should use the PrimitiveTypeName type on the Parquet 
 side to construct the predicate leaf, instead of the type provided by 
 PredicateLeaf.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker

2015-08-28 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720762#comment-14720762
 ] 

Jesus Camacho Rodriguez commented on HIVE-11652:


[~hsubramaniyan]/[~ashutoshc], could you take a look? Thanks

 Avoid expensive call to removeAll in DefaultGraphWalker
 ---

 Key: HIVE-11652
 URL: https://issues.apache.org/jira/browse/HIVE-11652
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 1.3.0, 2.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, 
 HIVE-11652.patch


 When the plan is too large, the removeAll call in DefaultGraphWalker (line 
 140) will take a very long time, as it has to go through the list looking for 
 each of the nodes. We try to get rid of this call by rewriting the logic in 
 the walker.
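 A small illustration of why that call is expensive and what a rewrite can do 
 instead (a sketch under the usual ArrayList/HashSet cost assumptions, not the 
 actual walker code): removeAll tests every list element against the argument 
 collection, and when that collection is itself a list each test is a linear 
 scan, giving O(n*k); tracking dispatched nodes in a HashSet makes each test O(1).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class WalkerSketch {
  // Instead of toWalk.removeAll(dispatched), filter once against a hash set:
  // one pass over toWalk, O(1) membership test per node.
  static List<String> remaining(List<String> toWalk, Set<String> dispatched) {
    List<String> rest = new ArrayList<>();
    for (String node : toWalk) {
      if (!dispatched.contains(node)) {
        rest.add(node);
      }
    }
    return rest;
  }
}
```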



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-08-28 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-11587:
-
Attachment: HIVE-11587.02.patch

Attach patch 2 for testing.

 Fix memory estimates for mapjoin hashtable
 --

 Key: HIVE-11587
 URL: https://issues.apache.org/jira/browse/HIVE-11587
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Wei Zheng
 Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch


 Due to legacy from the in-memory mapjoin and conservative planning, the memory 
 estimation code for the mapjoin hashtable is currently not very good. It 
 allocates the probe erring on the side of more memory, not taking data into 
 account, because unlike the probe the data is free to resize; so it's better 
 for perf to allocate a big probe and hope for the best with regard to future 
 data size. This is not true for the hybrid case.
 There's code to cap the initial allocation based on memory available 
 (memUsage argument), but due to some code rot, the memory estimates from 
 planning are not even passed to hashtable anymore (there used to be two 
 config settings, hashjoin size fraction by itself, or hashjoin size fraction 
 for group by case), so it never caps the memory anymore below 1 Gb. 
 Initial capacity is estimated from input key count, and in hybrid join cache 
 can exceed Java memory due to number of segments.
 There needs to be a review and fix of all this code.
 Suggested improvements:
 1) Make sure initialCapacity argument from Hybrid case is correct given the 
 number of segments. See how it's calculated from keys for regular case; it 
 needs to be adjusted accordingly for hybrid case if not done already.
 1.5) Note that, knowing the number of rows, the maximum capacity one will 
 ever need for probe size (in longs) is row count (assuming key per row, i.e. 
 maximum possible number of keys) divided by load factor, plus some very small 
 number to round up. That is for flat case. For hybrid case it may be more 
 complex due to skew, but that is still a good upper bound for the total probe 
 capacity of all segments.
 2) Rename memUsage to maxProbeSize, or something, make sure it's passed 
 correctly based on estimates that take into account both probe and data size, 
 esp. in hybrid case.
 3) Make sure that memory estimation for hybrid case also doesn't come up with 
 numbers that are too small, like 1-byte hashtable. I am not very familiar 
 with that code but it has happened in the past.
 Other issues we have seen:
 4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you 
 should not allocate large array in advance. Even if some estimate passes 
 500Mb or 40Mb or whatever, it doesn't make sense to allocate that.
 5) For hybrid, don't pre-allocate WBs - only allocate on write.
 6) Change everywhere rounding up to power of two is used to rounding down, at 
 least for hybrid case (?)
 I wanted to put all of these items in single JIRA so we could keep track of 
 fixing all of them.
 I think there are JIRAs for some of these already, feel free to link them to 
 this one.
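 The bound in item 1.5 is simple arithmetic; a hedged sketch (the class and 
 method names are illustrative, not Hive's actual estimator):

```java
public class ProbeBound {
  // Upper bound on probe capacity (in longs) for the flat case: one key per
  // row is the worst case, so rowCount / loadFactor slots always suffice,
  // rounded up to the next whole slot.
  static long maxProbeCapacity(long rowCount, double loadFactor) {
    return (long) Math.ceil(rowCount / loadFactor);
  }
}
```

 For the hybrid case this is still a valid upper bound on the total capacity 
 across all segments, even though skew may shift capacity between segments.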



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11657) HIVE-2573 introduces some issues during metastore init (and CLI init)

2015-08-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11657:

Summary: HIVE-2573 introduces some issues during metastore init (and CLI 
init)  (was: HIVE-2573 introduces some issues)

 HIVE-2573 introduces some issues during metastore init (and CLI init)
 -

 Key: HIVE-11657
 URL: https://issues.apache.org/jira/browse/HIVE-11657
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: HIVE-11657.patch


 HIVE-2573 introduced a static 'reload functions' call.
 It has a few problems:
 1) When the metastore client is initialized using an externally supplied 
 config (i.e. Hive.get(HiveConf)), the call is still made during static init 
 using the main service config. In my case, even though I have uris in the 
 supplied config to connect to the remote MS (which eventually happens), the 
 static call creates an ObjectStore, which is undesirable.
 2) It breaks compatibility: old metastores do not support this call, so new 
 clients will fail, and there's no workaround like not using the new feature, 
 because the static call is always made.
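 For problem 1, the general fix shape is to defer the work from static 
 initialization to first use with the caller's config. A hedged sketch of that 
 pattern (LazyFunctions and load() are illustrative stand-ins, not Hive's 
 actual classes):

```java
public class LazyFunctions {
  private volatile Object reloaded; // stands in for the reloaded-functions state

  // Double-checked lazy init: nothing runs at class-load time, and the first
  // caller's config is the one actually used; later calls see the cached result.
  Object get(Object conf) {
    Object r = reloaded;
    if (r == null) {
      synchronized (this) {
        r = reloaded;
        if (r == null) {
          r = load(conf); // runs with the caller's config, on first use only
          reloaded = r;
        }
      }
    }
    return r;
  }

  Object load(Object conf) { return "loaded-with-" + conf; }
}
```

 This addresses the externally-supplied-config case; the compatibility problem 
 in 2) still needs the call itself to be optional or version-gated.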



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-08-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720789#comment-14720789
 ] 

Sergey Shelukhin commented on HIVE-11587:
-

Left some feedback. Mostly, such internals as WBs (write buffers) should not 
be exposed externally.

 Fix memory estimates for mapjoin hashtable
 --

 Key: HIVE-11587
 URL: https://issues.apache.org/jira/browse/HIVE-11587
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Wei Zheng
 Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch


 Due to the legacy in in-memory mapjoin and conservative planning, the memory 
 estimation code for mapjoin hashtable is currently not very good. It 
 allocates the probe erring on the side of more memory, not taking data into 
 account because unlike the probe, it's free to resize, so it's better for 
 perf to allocate big probe and hope for the best with regard to future data 
 size. It is not true for hybrid case.
 There's code to cap the initial allocation based on memory available 
 (memUsage argument), but due to some code rot, the memory estimates from 
 planning are not even passed to hashtable anymore (there used to be two 
 config settings, hashjoin size fraction by itself, or hashjoin size fraction 
 for group by case), so it never caps the memory anymore below 1 Gb. 
 Initial capacity is estimated from input key count, and in hybrid join cache 
 can exceed Java memory due to number of segments.
 There needs to be a review and fix of all this code.
 Suggested improvements:
 1) Make sure initialCapacity argument from Hybrid case is correct given the 
 number of segments. See how it's calculated from keys for regular case; it 
 needs to be adjusted accordingly for hybrid case if not done already.
 1.5) Note that, knowing the number of rows, the maximum capacity one will 
 ever need for probe size (in longs) is row count (assuming key per row, i.e. 
 maximum possible number of keys) divided by load factor, plus some very small 
 number to round up. That is for flat case. For hybrid case it may be more 
 complex due to skew, but that is still a good upper bound for the total probe 
 capacity of all segments.
 2) Rename memUsage to maxProbeSize, or something, make sure it's passed 
 correctly based on estimates that take into account both probe and data size, 
 esp. in hybrid case.
 3) Make sure that memory estimation for hybrid case also doesn't come up with 
 numbers that are too small, like 1-byte hashtable. I am not very familiar 
 with that code but it has happened in the past.
 Other issues we have seen:
 4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you 
 should not allocate large array in advance. Even if some estimate passes 
 500Mb or 40Mb or whatever, it doesn't make sense to allocate that.
 5) For hybrid, don't pre-allocate WBs - only allocate on write.
 6) Change everywhere rounding up to power of two is used to rounding down, at 
 least for hybrid case (?)
 I wanted to put all of these items in single JIRA so we could keep track of 
 fixing all of them.
 I think there are JIRAs for some of these already, feel free to link them to 
 this one.
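
A minimal sketch of the upper bound from item 1.5, assuming at most one key per row; this is illustrative Java, not actual Hive code, and the helper name and the +1 rounding slack are made up for the example:

```java
// Hypothetical helper, not a Hive API: the probe never needs more slots (in
// longs) than rowCount / loadFactor, rounded up, since row count bounds the
// number of distinct keys. For the hybrid case this is still an upper bound
// on the total probe capacity across all segments.
public class ProbeSizing {
    static long maxProbeCapacity(long rowCount, float loadFactor) {
        // ceil(rowCount / loadFactor), plus one slot of slack for rounding
        return (long) Math.ceil(rowCount / (double) loadFactor) + 1;
    }

    public static void main(String[] args) {
        // 1M rows at the common 0.75 load factor
        System.out.println(maxProbeCapacity(1_000_000L, 0.75f)); // prints 1333335
    }
}
```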





[jira] [Commented] (HIVE-10021) Alter index rebuild statements submitted through HiveServer2 fail when Sentry is enabled

2015-08-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720785#comment-14720785
 ] 

Hive QA commented on HIVE-10021:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12753007/HIVE-10021.2.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9380 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5105/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5105/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5105/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12753007 - PreCommit-HIVE-TRUNK-Build

 Alter index rebuild statements submitted through HiveServer2 fail when 
 Sentry is enabled
 --

 Key: HIVE-10021
 URL: https://issues.apache.org/jira/browse/HIVE-10021
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Indexing
Affects Versions: 0.13.1, 2.0.0
 Environment: CDH 5.3.2
Reporter: Richard Williams
Assignee: Aihua Xu
 Attachments: HIVE-10021.2.patch, HIVE-10021.patch


 When HiveServer2 is configured to authorize submitted queries and statements 
 through Sentry, any attempt to issue an alter index rebuild statement fails 
 with a SemanticException caused by a NullPointerException. This occurs 
 regardless of whether the index is a compact or bitmap index. 
 The root cause of the problem appears to be the fact that the static 
 createRootTask function in org.apache.hadoop.hive.ql.optimizer.IndexUtils 
 creates a new 
 org.apache.hadoop.hive.ql.Driver object to compile the index builder query, 
 and this new Driver object, unlike the one used by HiveServer2 to compile the 
 submitted statement, is used without having its userName field initialized 
 with the submitting user's username. Adding null checks to the Sentry code is 
 insufficient to solve this problem, because Sentry needs the userName to 
 determine whether or not the submitting user should be able to execute the 
 index rebuild statement.
 Example stack trace from the HiveServer2 logs:
 {noformat}
 FAILED: NullPointerException null
 java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
   at 
 java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
   at org.apache.hadoop.security.Groups.getGroups(Groups.java:161)
   at 
 org.apache.sentry.provider.common.HadoopGroupMappingService.getGroups(HadoopGroupMappingService.java:46)
   at 
 org.apache.sentry.binding.hive.authz.HiveAuthzBinding.getGroups(HiveAuthzBinding.java:370)
   at 
 org.apache.sentry.binding.hive.HiveAuthzBindingHook.postAnalyze(HiveAuthzBindingHook.java:314)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:440)
   at 
 org.apache.hadoop.hive.ql.optimizer.IndexUtils.createRootTask(IndexUtils.java:258)
   at 
 org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler.getIndexBuilderMapRedTask(CompactIndexHandler.java:149)
   at 
 org.apache.hadoop.hive.ql.index.TableBasedIndexHandler.generateIndexBuildTaskList(TableBasedIndexHandler.java:67)
   at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.getIndexBuilderMapRed(DDLSemanticAnalyzer.java:1171)
   at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterIndexRebuild(DDLSemanticAnalyzer.java:1117)
   at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:410)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:204)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1026)
   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1019)
   at 
 org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:100)
   at 
 org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:173)
   at 
 

[jira] [Commented] (HIVE-11595) refactor ORC footer reading to make it usable from outside

2015-08-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720790#comment-14720790
 ] 

Sergey Shelukhin commented on HIVE-11595:
-

Will commit after HiveQA

 refactor ORC footer reading to make it usable from outside
 --

 Key: HIVE-11595
 URL: https://issues.apache.org/jira/browse/HIVE-11595
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-10595.patch, HIVE-11595.01.patch, 
 HIVE-11595.02.patch, HIVE-11595.03.patch, HIVE-11595.04.patch


 If ORC footer is read from cache, we want to parse it without having the 
 reader, opening a file, etc. I thought it would be as simple as protobuf 
 parseFrom bytes, but apparently there's a bunch of stuff going on there. It 
 needs to be accessible via something like parseFrom(ByteBuffer), or similar.





[jira] [Resolved] (HIVE-10113) LLAP: reducers running in LLAP starve out map retries

2015-08-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-10113.
-
   Resolution: Done
Fix Version/s: llap

That was fixed elsewhere long ago

 LLAP: reducers running in LLAP starve out map retries
 -

 Key: HIVE-10113
 URL: https://issues.apache.org/jira/browse/HIVE-10113
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Siddharth Seth
 Fix For: llap


 When query 17 is run, some mappers from Map 1 currently fail (due to the unwrap 
 issue, and also due to HIVE-10112).
 This query has 1000+ reducers; if they are run in LLAP, they all queue up, 
 and the query locks up.
 If only the mappers run in LLAP, the query completes.





[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker

2015-08-28 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720810#comment-14720810
 ] 

Ashutosh Chauhan commented on HIVE-11652:
-

+1 LGTM. While you're committing, it would be good to add comments describing the role 
of the following data structures in the walker:
 
* opStack
* opQueue
* toWalk

 Avoid expensive call to removeAll in DefaultGraphWalker
 ---

 Key: HIVE-11652
 URL: https://issues.apache.org/jira/browse/HIVE-11652
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 1.3.0, 2.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, 
 HIVE-11652.patch


 When the plan is too large, the removeAll call in DefaultGraphWalker (line 
 140) will take very long as it will have to go through the list looking for 
 each of the nodes. We try to get rid of this call by rewriting the logic in 
 the walker.





[jira] [Updated] (HIVE-11688) OrcRawRecordMerger does not close primary reader if not fully consumed

2015-08-28 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam updated HIVE-11688:
---
Attachment: HIVE-11688.patch

 OrcRawRecordMerger does not close primary reader if not fully consumed
 --

 Key: HIVE-11688
 URL: https://issues.apache.org/jira/browse/HIVE-11688
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Sudheesh Katkam
Assignee: Sudheesh Katkam
  Labels: orc
 Attachments: HIVE-11688.patch


 If {{OrcRawRecordMerger#close}} is called before fully reading an orc file, 
 the {{primary}} reader is not closed. 
 The {{primary}} reader is assigned using {{readers.pollFirstEntry()}} which 
 deletes the reader from {{readers}}, and currently the 
 {{OrcRawRecordMerger#close}} method only closes readers in the map.
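
A simplified sketch of the fix direction, under the assumption that the merger keeps its readers in a TreeMap and takes the primary via pollFirstEntry(); this is illustrative code, not the actual OrcRawRecordMerger:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: pollFirstEntry() removes the "primary" reader
// from the map, so close() must close that saved reference in addition to
// whatever readers are still left in the map.
public class MergerSketch {
    private final TreeMap<Integer, Closeable> readers = new TreeMap<>();
    private Closeable primary;

    void add(int key, Closeable r) { readers.put(key, r); }

    void start() {
        Map.Entry<Integer, Closeable> e = readers.pollFirstEntry();
        primary = (e == null) ? null : e.getValue();
    }

    void close() {
        try {
            if (primary != null) {
                primary.close(); // previously missed when not fully consumed
            }
            for (Closeable r : readers.values()) {
                r.close();
            }
        } catch (IOException e) {
            // a sketch: real code would also suppress/collect per-reader failures
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] closed = new boolean[2];
        MergerSketch m = new MergerSketch();
        m.add(0, () -> closed[0] = true);
        m.add(1, () -> closed[1] = true);
        m.start();
        m.close();
        System.out.println(closed[0] && closed[1]); // prints true
    }
}
```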





[jira] [Updated] (HIVE-11657) HIVE-2573 introduces some issues

2015-08-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11657:

Attachment: HIVE-11657.patch

This changes the reloadFunctions call to be done once globally, but on the 
object, so that it is done after the proper config is set. 

It also improves retry logic to not retry on some non-recoverable errors, like 
a missing method.
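
A minimal sketch of that "once globally, but on the object" pattern, with made-up names; the real patch presumably differs in detail:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch, not the actual Hive code: the one-time
// reloadFunctions() call is moved off the static initializer and onto the
// instance path, so it runs after the caller-supplied config is in place,
// while still happening only once per process.
public class LazyReload {
    private static final AtomicBoolean reloaded = new AtomicBoolean(false);
    static int reloadCalls = 0; // test hook; stands in for the real side effect

    void ensureFunctionsReloaded() {
        // first instance to get here wins; later calls are no-ops
        if (reloaded.compareAndSet(false, true)) {
            reloadCalls++; // the real code would call reloadFunctions() here
        }
    }

    public static void main(String[] args) {
        new LazyReload().ensureFunctionsReloaded();
        new LazyReload().ensureFunctionsReloaded();
        System.out.println(LazyReload.reloadCalls); // prints 1
    }
}
```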

[~gopalv] [~ashutoshc] can you take a look

 HIVE-2573 introduces some issues
 

 Key: HIVE-11657
 URL: https://issues.apache.org/jira/browse/HIVE-11657
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: HIVE-11657.patch


 HIVE-2573 introduced static reload functions call.
 It has a few problems:
 1) When metastore client is initialized using an externally supplied config 
 (i.e. Hive.get(HiveConf)), it still gets called during static init using the 
 main service config. In my case, even though I have uris in the supplied 
 config to connect to remote MS (which eventually happens), the static call 
 creates objectstore, which is undesirable.
 2) It breaks compat - old metastores do not support this call, so new clients 
 will fail, and there's no workaround (such as simply not using the new feature) because the 
 static call is always made.





[jira] [Commented] (HIVE-11688) OrcRawRecordMerger does not close primary reader if not fully consumed

2015-08-28 Thread Sudheesh Katkam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720723#comment-14720723
 ] 

Sudheesh Katkam commented on HIVE-11688:


Review board [link|https://reviews.apache.org/r/37909/].

 OrcRawRecordMerger does not close primary reader if not fully consumed
 --

 Key: HIVE-11688
 URL: https://issues.apache.org/jira/browse/HIVE-11688
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Sudheesh Katkam
  Labels: orc
 Attachments: HIVE-11688.patch


 If {{OrcRawRecordMerger#close}} is called before fully reading an orc file, 
 the {{primary}} reader is not closed. 
 The {{primary}} reader is assigned using {{readers.pollFirstEntry()}} which 
 deletes the reader from {{readers}}, and currently the 
 {{OrcRawRecordMerger#close}} method only closes readers in the map.





[jira] [Updated] (HIVE-11510) Metatool updateLocation warning on views

2015-08-28 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-11510:
-
Attachment: HIVE-11510.2.patch

Thanks [~sushanth]. Updated patch as suggested.

 Metatool updateLocation warning on views
 

 Key: HIVE-11510
 URL: https://issues.apache.org/jira/browse/HIVE-11510
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema
Affects Versions: 0.14.0
Reporter: Eric Czech
Assignee: Wei Zheng
 Attachments: HIVE-11510.1.patch, HIVE-11510.2.patch


 If views are present in a hive database, issuing a 'hive metatool 
 -updateLocation' command will result in an error like this:
 ...
 Warning: Found records with bad LOCATION in SDS table.. 
 bad location URI: null
 bad location URI: null
 bad location URI: null
 
 Based on the source code for Metatool, it looks like there would then be a 
 bad location URI: null message for every view and it also appears this is 
 happening simply because the 'sds' table in the hive schema has a column 
 called location that is NULL only for views.





[jira] [Updated] (HIVE-11668) make sure directsql calls pre-query init when needed

2015-08-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11668:

Attachment: HIVE-11668.01.patch

Cleans up the behavior in initial checks - makes sure tx is always open, and 
that it only commits a tx when it has opened it.

 make sure directsql calls pre-query init when needed
 

 Key: HIVE-11668
 URL: https://issues.apache.org/jira/browse/HIVE-11668
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-11668.01.patch, HIVE-11668.patch


 See comments in HIVE-11123





[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker

2015-08-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720660#comment-14720660
 ] 

Hive QA commented on HIVE-11652:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12752973/HIVE-11652.02.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9380 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5104/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5104/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5104/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12752973 - PreCommit-HIVE-TRUNK-Build

 Avoid expensive call to removeAll in DefaultGraphWalker
 ---

 Key: HIVE-11652
 URL: https://issues.apache.org/jira/browse/HIVE-11652
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 1.3.0, 2.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, 
 HIVE-11652.patch


 When the plan is too large, the removeAll call in DefaultGraphWalker (line 
 140) will take very long as it will have to go through the list looking for 
 each of the nodes. We try to get rid of this call by rewriting the logic in 
 the walker.





[jira] [Commented] (HIVE-11678) Add AggregateProjectMergeRule

2015-08-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720503#comment-14720503
 ] 

Hive QA commented on HIVE-11678:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12752925/HIVE-11678.patch

{color:red}ERROR:{color} -1 due to 232 failed/errored test(s), 9380 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binarysortable_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_gby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_subq_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_subq_not_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_subq_not_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_count
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas_colname
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_database
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_precision
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_distinct_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_logical
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fetch_aggregation
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_gby_star
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby5_map
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby5_map_skew
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_cube1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_distinct_samekey
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_position
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_resolution
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_rollup1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_having2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_self_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_update
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_auto_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_compression
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join18
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join18_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join31
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_partition_metadataonly
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_multiskew_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadata_only_queries
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadataonly1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multiMapJoin2

[jira] [Commented] (HIVE-11668) make sure directsql calls pre-query init when needed

2015-08-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720534#comment-14720534
 ] 

Sergey Shelukhin commented on HIVE-11668:
-

Hmm...

 make sure directsql calls pre-query init when needed
 

 Key: HIVE-11668
 URL: https://issues.apache.org/jira/browse/HIVE-11668
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-11668.patch


 See comments in HIVE-11123





[jira] [Commented] (HIVE-11668) make sure directsql calls pre-query init when needed

2015-08-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720537#comment-14720537
 ] 

Sergey Shelukhin commented on HIVE-11668:
-

Actually SQL helpers already ensure txn is always there in all other cases. 
It's just the DB init that requires it. I guess it rolls back on failure so 
there should be a txn


 make sure directsql calls pre-query init when needed
 

 Key: HIVE-11668
 URL: https://issues.apache.org/jira/browse/HIVE-11668
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-11668.patch


 See comments in HIVE-11123





[jira] [Updated] (HIVE-11510) Metatool updateLocation warns on views

2015-08-28 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-11510:
-
Summary: Metatool updateLocation warns on views  (was: Metatool 
updateLocation fails on views)

 Metatool updateLocation warns on views
 --

 Key: HIVE-11510
 URL: https://issues.apache.org/jira/browse/HIVE-11510
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema
Affects Versions: 0.14.0
Reporter: Eric Czech
Assignee: Wei Zheng

 If views are present in a hive database, issuing a 'hive metatool 
 -updateLocation' command will result in an error like this:
 ...
 Warning: Found records with bad LOCATION in SDS table.. 
 bad location URI: null
 bad location URI: null
 bad location URI: null
 
 Based on the source code for Metatool, it looks like there would then be a 
 bad location URI: null message for every view and it also appears this is 
 happening simply because the 'sds' table in the hive schema has a column 
 called location that is NULL only for views.





[jira] [Updated] (HIVE-11510) Metatool updateLocation warning on views

2015-08-28 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-11510:
-
Summary: Metatool updateLocation warning on views  (was: Metatool 
updateLocation warns on views)

 Metatool updateLocation warning on views
 

 Key: HIVE-11510
 URL: https://issues.apache.org/jira/browse/HIVE-11510
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema
Affects Versions: 0.14.0
Reporter: Eric Czech
Assignee: Wei Zheng

 If views are present in a hive database, issuing a 'hive metatool 
 -updateLocation' command will result in an error like this:
 ...
 Warning: Found records with bad LOCATION in SDS table.. 
 bad location URI: null
 bad location URI: null
 bad location URI: null
 
 Based on the source code for Metatool, it looks like there would then be a 
 bad location URI: null message for every view and it also appears this is 
 happening simply because the 'sds' table in the hive schema has a column 
 called location that is NULL only for views.





[jira] [Resolved] (HIVE-11621) Fix TestMiniTezCliDriver test failures when HBase Metastore is used

2015-08-28 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved HIVE-11621.
---
Resolution: Fixed

Patch 2 committed.  Thanks Daniel.

 Fix TestMiniTezCliDriver test failures when HBase Metastore is used
 ---

 Key: HIVE-11621
 URL: https://issues.apache.org/jira/browse/HIVE-11621
 Project: Hive
  Issue Type: Sub-task
  Components: HBase Metastore
Affects Versions: hbase-metastore-branch
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: hbase-metastore-branch

 Attachments: HIVE-11621.1.patch, HIVE-11621.2.patch


 As a first step, fix hbase-metastore unit tests with TestMiniTezCliDriver, so 
 we can test LLAP and hbase-metastore together.





[jira] [Resolved] (HIVE-11654) After HIVE-10289, HBase metastore tests failing

2015-08-28 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved HIVE-11654.
---
   Resolution: Fixed
Fix Version/s: hbase-metastore-branch

Patch committed.  Thanks Daniel.

 After HIVE-10289, HBase metastore tests failing
 ---

 Key: HIVE-11654
 URL: https://issues.apache.org/jira/browse/HIVE-11654
 Project: Hive
  Issue Type: Bug
  Components: HBase Metastore
Reporter: Alan Gates
Assignee: Daniel Dai
Priority: Blocker
 Fix For: hbase-metastore-branch

 Attachments: HIVE-11654.1.patch


 After the latest merge from trunk a number of the HBase unit tests are 
 failing.





[jira] [Commented] (HIVE-11357) ACID enable predicate pushdown for insert-only delta file 2

2015-08-28 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720326#comment-14720326
 ] 

Alan Gates commented on HIVE-11357:
---

+1

 ACID enable predicate pushdown for insert-only delta file 2
 ---

 Key: HIVE-11357
 URL: https://issues.apache.org/jira/browse/HIVE-11357
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-11357.2.patch, HIVE-11357.patch


 HIVE-11320 missed a case.  That fix enabled PPD for insert-only delta files 
 when a base file is present.  It won't work if only delta files are present.
 see {{OrcInputFormat.getReader(InputSplit inputSplit, Options options)}}
 which only calls {{setSearchArgument()}} if there is a base file.





[jira] [Updated] (HIVE-11595) refactor ORC footer reading to make it usable from outside

2015-08-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11595:

Attachment: HIVE-11595.04.patch

Some more small fixes from metastore branch. ORC allocates a new buffer so 
patch 03 code works on normal path, but in case of non-0 position it breaks 
(i.e. when footer comes from HBase response w/o copy). This fixes these issues.

 refactor ORC footer reading to make it usable from outside
 --

 Key: HIVE-11595
 URL: https://issues.apache.org/jira/browse/HIVE-11595
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-10595.patch, HIVE-11595.01.patch, 
 HIVE-11595.02.patch, HIVE-11595.03.patch, HIVE-11595.04.patch


 If ORC footer is read from cache, we want to parse it without having the 
 reader, opening a file, etc. I thought it would be as simple as protobuf 
 parseFrom bytes, but apparently there's a bunch of stuff going on there. It 
 needs to be accessible via something like parseFrom(ByteBuffer), or similar.





[jira] [Commented] (HIVE-11595) refactor ORC footer reading to make it usable from outside

2015-08-28 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720332#comment-14720332
 ] 

Prasanth Jayachandran commented on HIVE-11595:
--

+1 on the new change

 refactor ORC footer reading to make it usable from outside
 --

 Key: HIVE-11595
 URL: https://issues.apache.org/jira/browse/HIVE-11595
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-10595.patch, HIVE-11595.01.patch, 
 HIVE-11595.02.patch, HIVE-11595.03.patch, HIVE-11595.04.patch


 If ORC footer is read from cache, we want to parse it without having the 
 reader, opening a file, etc. I thought it would be as simple as protobuf 
 parseFrom bytes, but apparently there's a bunch of stuff going on there. It 
 needs to be accessible via something like parseFrom(ByteBuffer), or similar.





[jira] [Commented] (HIVE-10924) add support for MERGE statement

2015-08-28 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720334#comment-14720334
 ] 

Eugene Koifman commented on HIVE-10924:
---

h3. Feature design notes
Hive supports [multi-insert 
statement|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingdataintoHiveTablesfromqueries].
  The idea is that you can execute a select statement and split the result 
stream into several streams that write to multiple targets.

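As a reference point, here is a minimal sketch of the existing multi-insert form (the table and column names are hypothetical, for illustration only):

{code:SQL}
FROM src
INSERT INTO TABLE dest1 SELECT key, val WHERE val > 0
INSERT INTO TABLE dest2 SELECT key, val WHERE val <= 0;
{code}

A single scan of src feeds both INSERT branches, each filtered by its own WHERE clause.
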
This matches very closely what the MERGE statement needs to do.
When modeling MERGE as a multi-insert, we'd split the stream into two streams, one 
for the insert part and one for the update part, but write both results to the 
same table.  Section 14.12 of ISO/IEC 9075-2:2011(E) (SQL 2011) defines the MERGE 
statement.

Suppose we have tables 
{code:SQL}
CREATE TABLE target(a int, b int, c int);
CREATE TABLE source(x int, y int, z int);
{code}

Then an example that covers most possibilities might look like this:
{code:SQL}
MERGE INTO target 
USING source ON b = y
WHEN MATCHED AND c + 1 + z > 0
THEN UPDATE SET a = 1, c = z
WHEN NOT MATCHED AND z IS NULL
THEN INSERT(a,b) VALUES(z, 7)
{code}
\\
\\
It is interpreted as follows:
\\
\\
|| Line || Statement Part || Notes ||
| 1 | {code:SQL} MERGE INTO target {code} | Specifies the table being modified |
| 2 | {code:SQL} USING source {code} | Specifies the source of the data, which 
may be a table or an expression such as SELECT … FROM … |
| 3 | {code:SQL} ON b = y {code} | Interpreted exactly like the ON clause of a 
JOIN between source and target. |
| 4 | {code:SQL} WHEN MATCHED {code} | Applies if expr in ON is true |
| 5 | {code:SQL} AND c + 1 + z > 0 {code} | Additional predicate to test 
before performing the action. |
| 6 | {code:SQL} THEN UPDATE SET a = 1, c = z {code} | May be UPDATE or 
DELETE.  The latter deletes the row from target.  The SET clause is exactly like 
in a regular UPDATE statement. |
| 7 | {code:SQL} WHEN NOT MATCHED {code} | Applies if expr in ON is false |
| 8 | {code:SQL} AND z IS NULL {code} | Additional predicate to test before 
performing the action. |
| 9 | {code:SQL} THEN INSERT(a,b) VALUES(z, 7){code} | Insert to perform on 
target. |
\\
\\

Then the equivalent _multi-insert statement_ looks like this:
\\
\\
|| Statement Part ||  Reference to previous table ||
| {code:SQL} FROM (SELECT * FROM target RIGHT OUTER JOIN SOURCE ON b = y) 
{code} | Lines 1 - 3 | 
| {code:SQL} INSERT INTO target(a,c) SELECT 1, z {code} | This represents the 
update part of merge; Line 6 |
| {code:SQL} WHERE c + 1 + z > 0 {code} | Line 5 |
| {code:SQL} AND b = y {code} | Only include ‘matched’ rows; Line 4 |
| {code:SQL} INSERT INTO target(a,b) SELECT z, 7 {code} | This represents the 
‘insert’ part of merge; Line 9 |
| {code:SQL} WHERE z IS NULL {code} | Line 8 |
| {code:SQL} AND a IS NULL AND b IS NULL AND c IS NULL; {code} | Only include 
‘not matched’ rows; Line 7 |

h4. Some caveats
# Current multi-insert doesn’t support writing to the same table more than 
once.  Can we fix this?
# This requires the same change as for multi-statement txn, that is to support 
multiple delta files per transaction. (HIVE-11030)
# Requires annotating each insert (of multi-insert) with whether it’s doing 
update/delete or insert


Since Hive can already (almost) compile an operator pipeline for such a 
_multi-insert statement_, support for MERGE doesn't require additional operators.
Also, Update/Delete are actually compiled into Insert statements.

 add support for MERGE statement
 ---

 Key: HIVE-10924
 URL: https://issues.apache.org/jira/browse/HIVE-10924
 Project: Hive
  Issue Type: New Feature
  Components: Query Planning, Query Processor, Transactions
Affects Versions: 1.2.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman

 add support for 
 MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11684) Implement limit pushdown through outer join in CBO

2015-08-28 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11684:
---
Component/s: CBO

 Implement limit pushdown through outer join in CBO
 --

 Key: HIVE-11684
 URL: https://issues.apache.org/jira/browse/HIVE-11684
 Project: Hive
  Issue Type: New Feature
  Components: CBO
Affects Versions: 2.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11684) Implement limit pushdown through outer join in CBO

2015-08-28 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11684:
---
Attachment: HIVE-11684.patch

 Implement limit pushdown through outer join in CBO
 --

 Key: HIVE-11684
 URL: https://issues.apache.org/jira/browse/HIVE-11684
 Project: Hive
  Issue Type: New Feature
  Components: CBO
Affects Versions: 2.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11684.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Commented] (HIVE-11668) make sure directsql calls pre-query init when needed

2015-08-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720427#comment-14720427
 ] 

Sergey Shelukhin commented on HIVE-11668:
-

Hmm.

 make sure directsql calls pre-query init when needed
 

 Key: HIVE-11668
 URL: https://issues.apache.org/jira/browse/HIVE-11668
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-11668.patch


 See comments in HIVE-11123



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11510) Metatool updateLocation warning on views

2015-08-28 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720434#comment-14720434
 ] 

Sushanth Sowmyan commented on HIVE-11510:
-

With the current patch, the metastore will do a LOG.debug for every single null 
record, which can produce a lot of output and slow that process down considerably.

Would it be possible to simply update the UpdateMStorageDescriptorTblURIRetVal 
class with an int numNullRecords, initialized to zero and incremented each time 
you get a null? Also, in that case, I imagine we shouldn't add that location to 
badRecords, since that would bloat the size of badRecords unnecessarily. After 
we do that, we can then do a single log line in 
HiveMetaTool.printTblURIUpdateSummary along with the other statistics, 
mentioning how many null records we found and noting that this is okay if the 
user has that many indexes/views.

 Metatool updateLocation warning on views
 

 Key: HIVE-11510
 URL: https://issues.apache.org/jira/browse/HIVE-11510
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema
Affects Versions: 0.14.0
Reporter: Eric Czech
Assignee: Wei Zheng
 Attachments: HIVE-11510.1.patch


 If views are present in a hive database, issuing a 'hive metatool 
 -updateLocation' command will result in an error like this:
 ...
 Warning: Found records with bad LOCATION in SDS table.. 
 bad location URI: null
 bad location URI: null
 bad location URI: null
 
 Based on the source code for Metatool, it looks like there would then be a 
 'bad location URI: null' message for every view, and it also appears this is 
 happening simply because the 'sds' table in the hive schema has a column 
 called location that is NULL only for views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11684) Implement limit pushdown through outer join in CBO

2015-08-28 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11684:
---
Attachment: (was: HIVE-11684.patch)

 Implement limit pushdown through outer join in CBO
 --

 Key: HIVE-11684
 URL: https://issues.apache.org/jira/browse/HIVE-11684
 Project: Hive
  Issue Type: New Feature
  Components: CBO
Affects Versions: 2.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11553) use basic file metadata cache in ETLSplitStrategy-related paths

2015-08-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720813#comment-14720813
 ] 

Sergey Shelukhin commented on HIVE-11553:
-

Actually, I misread some code. Something is easier to do than I thought. I may 
yet update this patch. It does work now, still :)

 use basic file metadata cache in ETLSplitStrategy-related paths
 ---

 Key: HIVE-11553
 URL: https://issues.apache.org/jira/browse/HIVE-11553
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: hbase-metastore-branch

 Attachments: HIVE-11553.01.patch, HIVE-11553.02.patch, 
 HIVE-11553.patch


 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11660) LLAP: TestTaskExecutorService is flaky

2015-08-28 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-11660:
--
Attachment: HIVE-11660.1.txt

Attaching patch to fix the tests. Have run 100 iterations of both on a Linux 
box - where the failures are normally seen - with all of them passing.

There were some real bugs causing TestLlapTaskSchedulerService to fail: the 
last allocateTaskRequest for a dag could have ended up being ignored.
Also, in TaskScheduler, the waitQueue can be improved - filed a separate jira 
for this.

[~sershe] - please review.

 LLAP: TestTaskExecutorService is flaky
 --

 Key: HIVE-11660
 URL: https://issues.apache.org/jira/browse/HIVE-11660
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Siddharth Seth
 Attachments: HIVE-11660.1.txt


 {noformat}
 java.lang.Exception: test timed out after 1 milliseconds
   at sun.misc.Unsafe.park(Native Method)
   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
   at 
 org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService$TaskExecutorServiceForTest$InternalCompletionListenerForTest.awaitCompletion(TestTaskExecutorService.java:244)
   at 
 org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService$TaskExecutorServiceForTest$InternalCompletionListenerForTest.access$000(TestTaskExecutorService.java:208)
   at 
 org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService.testWaitQueuePreemption(TestTaskExecutorService.java:168)
 {noformat}
 Cannot repro locally. See HIVE-11642



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11660) LLAP: TestTaskExecutorService is flaky

2015-08-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720833#comment-14720833
 ] 

Siddharth Seth commented on HIVE-11660:
---

On the TaskExecutor side, this mainly moves some code around - removing the 
scheduled task from the waitQueue now happens in the same synchronized block 
instead of in separate synchronized blocks.

 LLAP: TestTaskExecutorService is flaky
 --

 Key: HIVE-11660
 URL: https://issues.apache.org/jira/browse/HIVE-11660
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Siddharth Seth
 Attachments: HIVE-11660.1.txt


 {noformat}
 java.lang.Exception: test timed out after 1 milliseconds
   at sun.misc.Unsafe.park(Native Method)
   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
   at 
 org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService$TaskExecutorServiceForTest$InternalCompletionListenerForTest.awaitCompletion(TestTaskExecutorService.java:244)
   at 
 org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService$TaskExecutorServiceForTest$InternalCompletionListenerForTest.access$000(TestTaskExecutorService.java:208)
   at 
 org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService.testWaitQueuePreemption(TestTaskExecutorService.java:168)
 {noformat}
 Cannot repro locally. See HIVE-11642



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11660) LLAP: TestTaskExecutorService is flaky

2015-08-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720859#comment-14720859
 ] 

Sergey Shelukhin commented on HIVE-11660:
-

can you post rb

 LLAP: TestTaskExecutorService is flaky
 --

 Key: HIVE-11660
 URL: https://issues.apache.org/jira/browse/HIVE-11660
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Siddharth Seth
 Attachments: HIVE-11660.1.txt


 {noformat}
 java.lang.Exception: test timed out after 1 milliseconds
   at sun.misc.Unsafe.park(Native Method)
   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
   at 
 org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService$TaskExecutorServiceForTest$InternalCompletionListenerForTest.awaitCompletion(TestTaskExecutorService.java:244)
   at 
 org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService$TaskExecutorServiceForTest$InternalCompletionListenerForTest.access$000(TestTaskExecutorService.java:208)
   at 
 org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService.testWaitQueuePreemption(TestTaskExecutorService.java:168)
 {noformat}
 Cannot repro locally. See HIVE-11642



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker

2015-08-28 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720870#comment-14720870
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-11652:
--

+1, this should nullify HIVE-11341.

Thanks
Hari

 Avoid expensive call to removeAll in DefaultGraphWalker
 ---

 Key: HIVE-11652
 URL: https://issues.apache.org/jira/browse/HIVE-11652
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 1.3.0, 2.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, 
 HIVE-11652.patch


 When the plan is too large, the removeAll call in DefaultGraphWalker (line 
 140) will take very long as it will have to go through the list looking for 
 each of the nodes. We try to get rid of this call by rewriting the logic in 
 the walker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11595) refactor ORC footer reading to make it usable from outside

2015-08-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720883#comment-14720883
 ] 

Hive QA commented on HIVE-11595:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12753038/HIVE-11595.04.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9380 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5106/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5106/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5106/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12753038 - PreCommit-HIVE-TRUNK-Build

 refactor ORC footer reading to make it usable from outside
 --

 Key: HIVE-11595
 URL: https://issues.apache.org/jira/browse/HIVE-11595
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-10595.patch, HIVE-11595.01.patch, 
 HIVE-11595.02.patch, HIVE-11595.03.patch, HIVE-11595.04.patch


 If ORC footer is read from cache, we want to parse it without having the 
 reader, opening a file, etc. I thought it would be as simple as protobuf 
 parseFrom bytes, but apparently there's a bunch of stuff going on there. It 
 needs to be accessible via something like parseFrom(ByteBuffer), or similar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)