[jira] [Commented] (HIVE-11375) Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)

2015-08-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700277#comment-14700277
 ] 

Ashutosh Chauhan commented on HIVE-11375:
-

Yeah.. that seems like a bug to me.

 Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)
 --

 Key: HIVE-11375
 URL: https://issues.apache.org/jira/browse/HIVE-11375
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 2.0.0
Reporter: Mariusz Sakowski
Assignee: Aihua Xu
 Fix For: 2.0.0

 Attachments: HIVE-11375.2.patch, HIVE-11375.patch


 When running query like this:
 {code}explain select * from test where (val is not null and val <> 0);{code}
 hive will simplify the expression in parentheses and omit the is not null check:
 {code}
   Filter Operator
  predicate: (val <> 0) (type: boolean)
 {code}
 which is fine.
 but if we negate the condition using the NOT operator:
 {code}explain select * from test where not (val is not null and val <> 0);{code}
 hive will also simplify the expression, but now it breaks things:
 {code}
   Filter Operator
  predicate: (not (val <> 0)) (type: boolean)
 {code}
 because the valid predicate should be *val == 0 or val is null*, while the 
 predicate above is equivalent to *val == 0* only, filtering away rows where val is null.
 A simple example:
 {code}
 CREATE TABLE example (
 val bigint
 );
 INSERT INTO example VALUES (1), (NULL), (0);
 -- returns 2 rows - NULL and 0
 select * from example where (val is null or val == 0);
 -- returns 1 row - 0
 select * from example where not (val is not null and val <> 0);
 {code}
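To make the NULL semantics concrete, here is a hedged Python sketch (not Hive code) that emulates SQL three-valued logic; the helper functions are illustrative stand-ins, with None playing the role of SQL NULL:

```python
# Illustrative three-valued logic: None stands in for SQL NULL.

def sql_not(x):
    return None if x is None else not x  # NOT NULL is NULL in SQL

def sql_and(a, b):
    if a is False or b is False:   # FALSE dominates AND
        return False
    if a is None or b is None:     # then NULL
        return None
    return True

def neq_zero(val):
    return None if val is None else val != 0  # val <> 0 is NULL for NULL input

def keeps_row(val):
    # A WHERE clause keeps a row only when the predicate evaluates to TRUE
    return sql_not(sql_and(val is not None, neq_zero(val))) is True

rows = [1, None, 0]
kept = [v for v in rows if keeps_row(v)]                    # correct negation
buggy = [v for v in rows if sql_not(neq_zero(v)) is True]   # simplified (not (val <> 0))
```

With the full predicate, `kept` retains the NULL and 0 rows, matching the two-row result above, while the simplified `buggy` version wrongly drops the NULL row.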



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11526) LLAP: implement LLAP UI as a separate service

2015-08-17 Thread Kai Sasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700723#comment-14700723
 ] 

Kai Sasaki commented on HIVE-11526:
---

[~sershe] Thank you so much for the detailed information. Now I can figure out where 
we should start.
I'll break down this JIRA into a few tasks, as you suggested.

* Create LLAP Monitor Daemon
* Running script and Slider integration
* Selecting metrics
* Polish the UI


 LLAP: implement LLAP UI as a separate service
 -

 Key: HIVE-11526
 URL: https://issues.apache.org/jira/browse/HIVE-11526
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Kai Sasaki

 The specifics are vague at this point. 
 Hadoop metrics can be output, as well as metrics we collect and output in 
 jmx, as well as those we collect per fragment and log right now. 
 This service can do LLAP-specific views, and per-query aggregation.
 [~gopalv] may have some information on how to reuse existing solutions for 
 part of the work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10810) Document Beeline/CLI changes

2015-08-17 Thread Shannon Ladymon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700767#comment-14700767
 ] 

Shannon Ladymon commented on HIVE-10810:


Thanks!  I'll watch this jira.

 Document Beeline/CLI changes
 

 Key: HIVE-10810
 URL: https://issues.apache.org/jira/browse/HIVE-10810
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Reporter: Xuefu Zhang
Assignee: Ferdinand Xu





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11383) Upgrade Hive to Calcite 1.4

2015-08-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699300#comment-14699300
 ] 

Hive QA commented on HIVE-11383:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12750765/HIVE-11383.9.patch

{color:red}ERROR:{color} -1 due to 22 failed/errored test(s), 9370 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_join0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_subq_exists
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_subq_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_cond_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_views
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dynpart_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join13
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithPerformanceMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithVerboseMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithPerformanceMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithVerboseMode
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4984/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4984/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4984/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 22 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12750765 - PreCommit-HIVE-TRUNK-Build

 Upgrade Hive to Calcite 1.4
 ---

 Key: HIVE-11383
 URL: https://issues.apache.org/jira/browse/HIVE-11383
 Project: Hive
  Issue Type: Bug
Reporter: Julian Hyde
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11383.1.patch, HIVE-11383.2.patch, 
 HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, 
 HIVE-11383.4.patch, HIVE-11383.5.patch, HIVE-11383.6.patch, 
 HIVE-11383.7.patch, HIVE-11383.8.patch, HIVE-11383.8.patch, HIVE-11383.9.patch


 CLEAR LIBRARY CACHE
 Upgrade Hive to Calcite 1.4.0-incubating.
 There is currently a snapshot release, which is close to what will be in 1.4. 
 I have checked that Hive compiles against the new snapshot, fixing one issue. 
 The patch is attached.
 The next step is to validate that Hive runs against the new Calcite, and post any 
 issues to the Calcite list or log Calcite JIRA cases. [~jcamachorodriguez], 
 can you please do that?
 [~pxiong], I gather you are dependent on CALCITE-814, which will be fixed in 
 the new Calcite version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11424) Improve HivePreFilteringRule performance

2015-08-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699392#comment-14699392
 ] 

Hive QA commented on HIVE-11424:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12750767/HIVE-11424.01.patch

{color:red}ERROR:{color} -1 due to 35 failed/errored test(s), 9370 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_deep_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_auto_join1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_logical
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_multiskew_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_oneskew_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pcr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_gby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_gby2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_gby_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_udf_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_vc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_views
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_pcr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_gby_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join3
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithPerformanceMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithVerboseMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithPerformanceMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithVerboseMode
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4985/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4985/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4985/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 35 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12750767 - PreCommit-HIVE-TRUNK-Build

 Improve HivePreFilteringRule performance
 

 Key: HIVE-11424
 URL: https://issues.apache.org/jira/browse/HIVE-11424
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11424.01.patch, HIVE-11424.01.patch, 
 HIVE-11424.patch


 We create a rule that will transform OR clauses into IN clauses (when 
 possible).
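As a rough illustration of this kind of rewrite (a sketch only, not the actual HivePreFilteringRule or Calcite code; the function and its string output are invented for illustration):

```python
def or_to_in(predicates):
    """Collapse OR-ed equality terms, given as (column, value) pairs, into one
    IN clause when every term compares the same column; otherwise return None
    and leave the original OR untouched."""
    columns = {col for col, _ in predicates}
    if len(columns) != 1:
        return None  # mixed columns: the rewrite does not apply
    col = columns.pop()
    values = ", ".join(str(v) for _, v in predicates)
    return f"{col} IN ({values})"

clause = or_to_in([("val", 1), ("val", 2), ("val", 3)])  # rewritable
mixed = or_to_in([("a", 1), ("b", 2)])                   # not rewritable
```

A single IN clause is both cheaper to evaluate repeatedly and easier for later rules to reason about than a chain of OR-ed equalities.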



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11591) change thrift generation to use undated annotations

2015-08-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11591:

Attachment: HIVE-11591.WIP.patch

Attaching the patch. We can commit it now and presumably get the benefit when 
someone upgrades to 0.9.3 and re-gens the files, or we can wait and 
regen/commit after the 0.9.3 upgrade.

 change thrift generation to use undated annotations
 ---

 Key: HIVE-11591
 URL: https://issues.apache.org/jira/browse/HIVE-11591
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-11591.WIP.patch


 Thrift has added class annotations to generated classes; these contain the 
 generation date. Because of this, all the generated Java thrift files change on 
 every re-gen, even if you only make a small change that should not affect a 
 bazillion files. 
 This depends on upgrading to Thrift 0.9.3, which doesn't exist yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11341) Avoid expensive resizing of ASTNode tree

2015-08-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700514#comment-14700514
 ] 

Hive QA commented on HIVE-11341:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12750916/HIVE-11341.8.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 9370 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join24
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_limit
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby1_map
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby1_map_skew
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby3_map
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby3_map_multi_distinct
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_empty
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_reorder2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union23
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_1
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4990/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4990/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4990/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12750916 - PreCommit-HIVE-TRUNK-Build

 Avoid expensive resizing of ASTNode tree 
 -

 Key: HIVE-11341
 URL: https://issues.apache.org/jira/browse/HIVE-11341
 Project: Hive
  Issue Type: Bug
  Components: Hive, Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-11341.1.patch, HIVE-11341.2.patch, 
 HIVE-11341.3.patch, HIVE-11341.4.patch, HIVE-11341.5.patch, 
 HIVE-11341.6.patch, HIVE-11341.7.patch, HIVE-11341.8.patch


 {code}
 Stack Trace                                                                                Sample Count  Percentage(%)
 parse.BaseSemanticAnalyzer.analyze(ASTNode, Context)                                              1,605  90
   parse.CalcitePlanner.analyzeInternal(ASTNode)                                                   1,605  90
     parse.SemanticAnalyzer.analyzeInternal(ASTNode, SemanticAnalyzer$PlannerContext)              1,605  90
       parse.CalcitePlanner.genOPTree(ASTNode, SemanticAnalyzer$PlannerContext)                    1,604  90
         parse.SemanticAnalyzer.genOPTree(ASTNode, SemanticAnalyzer$PlannerContext)                1,604  90
           parse.SemanticAnalyzer.genPlan(QB)                                                      1,604  90
             parse.SemanticAnalyzer.genPlan(QB, boolean)                                           1,604  90
               parse.SemanticAnalyzer.genBodyPlan(QB, Operator, Map)                               1,604  90
                 parse.SemanticAnalyzer.genFilterPlan(ASTNode, QB, Operator, Map, boolean)         1,603  90
                   parse.SemanticAnalyzer.genFilterPlan(QB, ASTNode, Operator, boolean)            1,603  90
                     parse.SemanticAnalyzer.genExprNodeDesc(ASTNode, RowResolver, boolean)         1,603  90
                       parse.SemanticAnalyzer.genExprNodeDesc(ASTNode, RowResolver, TypeCheckCtx)  1,603  90
                         parse.SemanticAnalyzer.genAllExprNodeDesc(ASTNode, RowResolver, TypeCheckCtx)  1,603  90
                           parse.TypeCheckProcFactory.genExprNode(ASTNode, TypeCheckCtx)           1,603  90
                             parse.TypeCheckProcFactory.genExprNode(ASTNode, TypeCheckCtx, TypeCheckProcFactory)  1,603  90
                               lib.DefaultGraphWalker.startWalking(Collection, HashMap)            1,579  89
                                 lib.DefaultGraphWalker.walk(Node)                                 1,571  89
                                   java.util.ArrayList.removeAll(Collection)                       1,433  81
 {code}
 
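The profile above bottoms out in java.util.ArrayList.removeAll, which rescans the remaining list on every call. A hedged Python sketch (invented names, not Hive's actual DefaultGraphWalker) of why that pattern is quadratic and how a queue-style worklist avoids the rescan:

```python
from collections import deque

def drain_with_removeall(nodes):
    """Mimics the ArrayList.removeAll pattern: every processed batch forces a
    full rescan of the remaining worklist, quadratic behavior overall."""
    to_walk = list(nodes)
    passes = 0
    while to_walk:
        batch_set = set(to_walk[:10])
        to_walk = [n for n in to_walk if n not in batch_set]  # O(n) rescan per batch
        passes += 1
    return passes

def drain_with_deque(nodes):
    """Queue-style draining: each node is removed exactly once in O(1)."""
    to_walk = deque(nodes)
    passes = 0
    while to_walk:
        for _ in range(min(10, len(to_walk))):
            to_walk.popleft()
        passes += 1
    return passes
```

Both drain 100 nodes in 10 batches, but the list version touches every remaining element on each batch, which is where the 81% of samples above would go.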

[jira] [Commented] (HIVE-10304) Add deprecation message to HiveCLI

2015-08-17 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700516#comment-14700516
 ] 

Ferdinand Xu commented on HIVE-10304:
-

I am working on the smoke test and waiting for the fix of HIVE-11579. I think 
we can begin the process of merging beeline-cli into master in 1~2 weeks if the 
smoke test goes well.

 Add deprecation message to HiveCLI
 --

 Key: HIVE-10304
 URL: https://issues.apache.org/jira/browse/HIVE-10304
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Affects Versions: 1.1.0
Reporter: Szehon Ho
Assignee: Szehon Ho
  Labels: TODOC1.2
 Attachments: HIVE-10304.2.patch, HIVE-10304.3.patch, HIVE-10304.patch


 As Beeline is now the recommended command line tool for Hive, we should add a 
 message to HiveCLI to indicate that it is deprecated and redirect users to 
 Beeline.  
 This is not a suggestion to remove HiveCLI for now, just a helpful pointer so 
 users know that attention is focused on Beeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11579) Invoke the set command will close standard error output[beeline-cli]

2015-08-17 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700593#comment-14700593
 ] 

Xuefu Zhang commented on HIVE-11579:


[~Ferd], thanks for looking into this. I'm not quite sure of the root cause. 
The error output is part of the client, so why is it closed when the statement 
is closed? I'm wondering if we should fix that part instead.

 Invoke the set command will close standard error output[beeline-cli]
 

 Key: HIVE-11579
 URL: https://issues.apache.org/jira/browse/HIVE-11579
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-11579-beeline-cli.patch


 We can easily reproduce the bug with the following steps:
 {code}
 hive> set system:xx=yy;
 hive> lss;
 hive> 
 {code}
 The error output disappears because the error output stream is closed when 
 the Hive statement is closed.
 This bug also occurs upstream when using embedded mode, as the new 
 CLI does.
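A hedged Python sketch of the suspected embedded-mode failure (not Hive code): when client and server run in one process, the statement's result stream and the client's error stream can be the same object, so closing one closes both:

```python
import io

# client_err models the CLI's standard error stream.
client_err = io.StringIO()

# Non-embedded mode: the server side has its own stream, so closing the
# statement's stream leaves the client's stderr usable.
server_res = io.StringIO()
server_res.close()
assert not client_err.closed

# Embedded mode: both names refer to the very same stream object, so closing
# the statement also closes the client's stderr; later error output is lost.
server_res = client_err
server_res.close()
```

This is why the symptom only shows up in embedded mode: with separate processes the two streams are distinct objects.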



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11552) implement basic methods for getting/putting file metadata

2015-08-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700458#comment-14700458
 ] 

Sergey Shelukhin commented on HIVE-11552:
-

I was going to re-gen the thrift files after the master merge to make the patch 
less huge, but due to the issue discussed in HIVE-11591 the re-gen is still going 
to touch all the Java files. The nogen patch is ready for review; I'll attach a 
new generated patch after the merge.

 implement basic methods for getting/putting file metadata
 -

 Key: HIVE-11552
 URL: https://issues.apache.org/jira/browse/HIVE-11552
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: hbase-metastore-branch

 Attachments: HIVE-11552.01.patch, HIVE-11552.nogen.patch, 
 HIVE-11552.nogen.patch, HIVE-11552.patch


 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11591) change thrift generation to use undated annotations

2015-08-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11591:

Description: 
Thrift has added class annotations to generated classes; these contain the 
generation date. Because of this, all the generated Java thrift files change on 
every re-gen, even if you only make a small change that should not affect a 
bazillion files. We should use undated annotations to avoid this problem.

This depends on upgrading to Thrift 0.9.3, which doesn't exist yet.

  was:
Thrift has added class annotations to generated classes; these contain 
generation date. Because of this, all the Java thrift files change on every 
re-gen, even if you only make a small change that should not affect bazillion 
files. 

This depends on upgrading to Thrift 0.9.3, which doesn't exist yet.


 change thrift generation to use undated annotations
 ---

 Key: HIVE-11591
 URL: https://issues.apache.org/jira/browse/HIVE-11591
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-11591.WIP.patch


 Thrift has added class annotations to generated classes; these contain the 
 generation date. Because of this, all the generated Java thrift files change on 
 every re-gen, even if you only make a small change that should not affect a 
 bazillion files. We should use undated annotations to avoid this problem.
 This depends on upgrading to Thrift 0.9.3, which doesn't exist yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-11579) Invoke the set command will close standard error output[beeline-cli]

2015-08-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-11579:
---
Comment: was deleted

(was: When I was debugging, I saw that the error output is closed after invoking 
the closeClientOperation method in the HiveStatement class. The main purpose of 
this method is to send a request to the server to close the handler. You can see 
that the error output is redirected to the standard error output. When the 
SQLOperation closes a driver, it closes the driver with resStream closed. 
In non-embedded mode this works well, since the two output streams are not the 
same one, while it fails in embedded mode.)

 Invoke the set command will close standard error output[beeline-cli]
 

 Key: HIVE-11579
 URL: https://issues.apache.org/jira/browse/HIVE-11579
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-11579-beeline-cli.patch


 We can easily reproduce the bug with the following steps:
 {code}
 hive> set system:xx=yy;
 hive> lss;
 hive> 
 {code}
 The error output disappears because the error output stream is closed when 
 the Hive statement is closed.
 This bug also occurs upstream when using embedded mode, as the new 
 CLI does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11579) Invoke the set command will close standard error output[beeline-cli]

2015-08-17 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700689#comment-14700689
 ] 

Xuefu Zhang commented on HIVE-11579:


[~Ferd], is there any resource leak if the statement isn't closed? The problem 
here seems to be that the error output shouldn't be closed when a statement is 
closed. If we fix this by not closing the statement, I'm not sure what 
inadvertent consequences there might be. Any further thoughts?

 Invoke the set command will close standard error output[beeline-cli]
 

 Key: HIVE-11579
 URL: https://issues.apache.org/jira/browse/HIVE-11579
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-11579-beeline-cli.patch


 We can easily reproduce the bug with the following steps:
 {code}
 hive> set system:xx=yy;
 hive> lss;
 hive> 
 {code}
 The error output disappears because the error output stream is closed when 
 the Hive statement is closed.
 This bug also occurs upstream when using embedded mode, as the new 
 CLI does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11589) Invalid value such as '-1' should be checked for 'hive.txn.timeout'.

2015-08-17 Thread Takahiko Saito (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takahiko Saito updated HIVE-11589:
--
Description: 
When an user accidentally set an invalid value such as '-1' for 
'hive.txn.timeout', the query simply fails throwing 'NoSuchLockException'
{noformat}
2015-08-16 23:25:43,149 ERROR [HiveServer2-Background-Pool: Thread-206]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(159)) - 
NoSuchLockException(message:No such lock: 40)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1710)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.unlock(TxnHandler.java:501)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.unlock(HiveMetaStore.java:5571)
at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy7.unlock(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.unlock(HiveMetaStoreClient.java:1876)
at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
at com.sun.proxy.$Proxy8.unlock(Unknown Source)
at 
org.apache.hadoop.hive.ql.lockmgr.DbLockManager.unlock(DbLockManager.java:134)
at 
org.apache.hadoop.hive.ql.lockmgr.DbLockManager.releaseLocks(DbLockManager.java:153)
at 
org.apache.hadoop.hive.ql.Driver.releaseLocksAndCommitOrRollback(Driver.java:1038)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1208)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054)
at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
at 
org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
at 
org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at 
org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}

A better way to handle such an invalid value is to check it beforehand 
instead of throwing NoSuchLockException.
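A minimal sketch of the suggested up-front check, in Python with invented names (Hive's actual configuration API differs):

```python
def validate_txn_timeout(conf):
    """Reject non-numeric or non-positive hive.txn.timeout values at config
    time, instead of letting locks expire and fail later with
    NoSuchLockException."""
    raw = conf.get("hive.txn.timeout", "300")
    try:
        seconds = int(str(raw).rstrip("sS"))  # accept plain seconds or a trailing 's'
    except ValueError:
        raise ValueError(f"hive.txn.timeout must be numeric, got {raw!r}")
    if seconds <= 0:
        raise ValueError(f"hive.txn.timeout must be positive, got {raw!r}")
    return seconds

timeout = validate_txn_timeout({"hive.txn.timeout": "300s"})
# validate_txn_timeout({"hive.txn.timeout": "-1"}) raises ValueError up front
```

Failing at configuration time gives the user an actionable message instead of a confusing lock-manager stack trace mid-query.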

  was:
When an user accidentally set an invalid value such as '-1' for 
'hive.txn.timeout', the query simply fails throwing 'NoSuchLockException'
{noformat}
2015-08-16 23:25:43,149 ERROR [HiveServer2-Background-Pool: Thread-206]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(159)) - 
NoSuchLockException(message:No such lock: 40)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1710)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.unlock(TxnHandler.java:501)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.unlock(HiveMetaStore.java:5571)
at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy7.unlock(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.unlock(HiveMetaStoreClient.java:1876)
at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
at com.sun.proxy.$Proxy8.unlock(Unknown Source)
at 
org.apache.hadoop.hive.ql.lockmgr.DbLockManager.unlock(DbLockManager.java:134)
at 

[jira] [Commented] (HIVE-11555) Beeline sends password in clear text if we miss -ssl=true flag in the connect string

2015-08-17 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700412#comment-14700412
 ] 

Thejas M Nair commented on HIVE-11555:
--

HIVE-11581 addresses this to an extent, but it is applicable only when 
ZooKeeper HA mode is enabled.


 Beeline sends password in clear text if we miss -ssl=true flag in the connect 
 string
 

 Key: HIVE-11555
 URL: https://issues.apache.org/jira/browse/HIVE-11555
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Affects Versions: 1.2.0
Reporter: bharath v

 {code}
 I used tcpdump to display the network traffic: 
 [root@fe01 ~]# beeline 
 Beeline version 0.13.1-cdh5.3.2 by Apache Hive 
 beeline> !connect jdbc:hive2://fe01.sectest.poc:1/default 
 Connecting to jdbc:hive2://fe01.sectest.poc:1/default 
 Enter username for jdbc:hive2://fe01.sectest.poc:1/default: tdaranyi 
 Enter password for jdbc:hive2://fe01.sectest.poc:1/default: * 
 (I entered cleartext as the password) 
 The tcpdump in a different window 
 tdara...@fe01.sectest.poc:~$ sudo tcpdump -n -X -i lo port 1 
 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode 
 listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes 
 (...) 
 10:25:16.329974 IP 192.168.32.102.54322 > 192.168.32.102.ndmp: Flags [P.], 
 seq 11:35, ack 1, win 512, options [nop,nop,TS val 2412851969 ecr 
 2412851969], length 24 
 0x: 4500 004c 3dd3 4000 4006 3abc c0a8 2066 E..L=.@.@.:f 
 0x0010: c0a8 2066 d432 2710 714c 0edc b45c 9268 ...f.2'.qL...\.h 
 0x0020: 8018 0200 c25b  0101 080a 8fd1 3301 .[3. 
 0x0030: 8fd1 3301 0500  1300 7464 6172 616e ..3...tdaran 
 0x0040: 7969 0063 6c65 6172 7465 7874 yi.cleartext 
 (...) 
 {code}
 We rely on the user-supplied configuration to decide whether to open an SSL 
 socket or a plain one. Instead, we could negotiate this information with HS2 
 and connect accordingly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11552) implement basic methods for getting/putting file metadata

2015-08-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11552:

Attachment: HIVE-11552.01.patch

 implement basic methods for getting/putting file metadata
 -

 Key: HIVE-11552
 URL: https://issues.apache.org/jira/browse/HIVE-11552
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: hbase-metastore-branch

 Attachments: HIVE-11552.01.patch, HIVE-11552.nogen.patch, 
 HIVE-11552.nogen.patch, HIVE-11552.patch


 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-11588) merge master into branch

2015-08-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700326#comment-14700326
 ] 

Sergey Shelukhin edited comment on HIVE-11588 at 8/17/15 10:20 PM:
---

Need to pick up HIVE-11542, also log4j-related fixes for tests


was (Author: sershe):
Need to pick up HIVE-11542

 merge master into branch
 

 Key: HIVE-11588
 URL: https://issues.apache.org/jira/browse/HIVE-11588
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: hbase-metastore-branch


 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-11588) merge master into branch

2015-08-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-11588.
-
Resolution: Fixed

pushed to branch

 merge master into branch
 

 Key: HIVE-11588
 URL: https://issues.apache.org/jira/browse/HIVE-11588
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: hbase-metastore-branch


 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11515) Still some possible race condition in DynamicPartitionPruner

2015-08-17 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700569#comment-14700569
 ] 

Siddharth Seth commented on HIVE-11515:
---

[~navis] - Do you have more details on how the query fails in the situations 
where you see this occur? Is there an exception, does the query hang, or is 
there some other manifestation?

I think the patch primarily moves the registerForVertexNotifications call into 
the constructor - which is fine. However, the condition where an event is 
received before registering for vertex success notifications is already handled 
in checkForSourceCompletion:
{code}
int expectedEvents = numExpectedEventsPerSource.get(name).getValue();
if (expectedEvents < 0) {
  // Expected events not updated yet - vertex SUCCESS notification not 
received.
  return;
} else {
{code}

Even if all the events arrive before registering for the notification, they 
will be processed once prune() is finally called and a notification is received.

On the Tez side, Tez makes sure to send events only after the Initializer (the 
HiveSplitGenerator) has been constructed.
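The guard described above can be sketched as follows. This is a minimal Python illustration of the pattern, not the actual Hive/Tez code; the class and member names (`SourceTracker`, `expected_events`, and so on) are assumptions made for the example:

```python
# Illustrative sketch of the guard pattern in checkForSourceCompletion:
# events that arrive before the vertex SUCCESS notification only bump a
# counter, because the expected-event count still holds its -1 sentinel;
# completion is re-checked once the count becomes known.
class SourceTracker:
    def __init__(self):
        self.expected_events = -1   # sentinel: SUCCESS notification not received yet
        self.received_events = 0
        self.completed = False

    def on_event(self):
        self.received_events += 1
        self._check_for_completion()

    def on_vertex_success(self, expected_events):
        self.expected_events = expected_events
        self._check_for_completion()

    def _check_for_completion(self):
        if self.expected_events < 0:
            # Expected events not updated yet - vertex SUCCESS notification
            # not received; nothing to decide yet.
            return
        if self.received_events >= self.expected_events:
            self.completed = True
```

Either ordering ends in the completed state: events first and then the SUCCESS notification, or the notification first and then the events.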

 Still some possible race condition in DynamicPartitionPruner
 

 Key: HIVE-11515
 URL: https://issues.apache.org/jira/browse/HIVE-11515
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Tez
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-11515.1.patch.txt


 Even after HIVE-9976, I could see a race condition in DPP sometimes. It is 
 hard to reproduce, but it seems related to the fact that prune() is called by 
 a thread pool. With some delay in the queue, events from fast tasks arrive 
 before prune() is called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10810) Document Beeline/CLI changes

2015-08-17 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700655#comment-14700655
 ] 

Ferdinand Xu commented on HIVE-10810:
-

Hi [~sladymon], this is the jira tracking the 
wiki(https://cwiki.apache.org/confluence/display/Hive/Replacing+the+Implementation+of+Hive+CLI+Using+Beeline
 ). 

Yours,
Ferd

 Document Beeline/CLI changes
 

 Key: HIVE-10810
 URL: https://issues.apache.org/jira/browse/HIVE-10810
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Reporter: Xuefu Zhang
Assignee: Ferdinand Xu





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-11245) LLAP: Fix the LLAP to ORC APIs

2015-08-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-11245.
-
   Resolution: Done
Fix Version/s: llap

All the encoded-path-related code has been moved to the orc.encoded package as 
described above. The main ORC package doesn't have any encoded dependencies 
anymore. 

 LLAP: Fix the LLAP to ORC APIs
 --

 Key: HIVE-11245
 URL: https://issues.apache.org/jira/browse/HIVE-11245
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Sergey Shelukhin
Priority: Blocker
 Fix For: llap


 Currently the LLAP branch has refactored the ORC code to have different code 
 paths depending on whether the data is coming from the cache or a FileSystem.
 We need to introduce a concept of a DataSource that is responsible for 
 getting the necessary bytes regardless of whether they are coming from a 
 FileSystem, an in-memory cache, or both.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11579) Invoke the set command will close standard error output[beeline-cli]

2015-08-17 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700649#comment-14700649
 ] 

Ferdinand Xu commented on HIVE-11579:
-

While debugging, I see that the error output is closed after invoking the 
closeClientOperation method in the HiveStatement class. The main purpose of 
this method is to send a request to the server to close the operation handle. 
You can see that the error output is redirected to the standard error output. 
When the SQLOperation closes a driver, it closes the driver with resStream 
closed. In non-embedded mode this works well, since the two output streams are 
not the same one, but it fails in embedded mode.
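The embedded vs. non-embedded difference can be illustrated with a small Python sketch. This is not the Hive code: `io.StringIO` stands in for the error stream, and the function names are made up for the example.

```python
import io

# In embedded mode the client and the server-side operation share one
# underlying error stream, so closing it on the server side also breaks
# the client's stderr. In non-embedded mode the two streams are distinct
# objects, so closing one is harmless.

def embedded_mode_write_after_close():
    shared_err = io.StringIO()          # one stream, referenced by both sides
    client_err = shared_err
    shared_err.close()                  # "server" closes it on closeClientOperation
    try:
        client_err.write("error\n")     # client's stderr is now unusable
        return True
    except ValueError:                  # StringIO raises ValueError once closed
        return False

def non_embedded_mode_write_after_close():
    server_err = io.StringIO()
    client_err = io.StringIO()          # separate stream on the client side
    server_err.close()
    client_err.write("error\n")         # unaffected by the server-side close
    return True
```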

 Invoke the set command will close standard error output[beeline-cli]
 

 Key: HIVE-11579
 URL: https://issues.apache.org/jira/browse/HIVE-11579
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-11579-beeline-cli.patch


 We can easily reproduce the bug with the following steps:
 {code}
 hive> set system:xx=yy;
 hive> lss;
 hive> 
 {code}
 The error output disappears because the error output stream is closed when 
 closing the Hive statement.
 This bug also occurs upstream when using the embedded mode that the new 
 CLI uses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Commented] (HIVE-10697) ObjectInspectorConvertors#UnionConvertor does a faulty conversion

2015-08-17 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700683#comment-14700683
 ] 

Swarnim Kulkarni commented on HIVE-10697:
-

[~hsubramaniyan] Would you mind reviewing the patch?

 ObjectInspectorConvertors#UnionConvertor does a faulty conversion
 -

 Key: HIVE-10697
 URL: https://issues.apache.org/jira/browse/HIVE-10697
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni
 Attachments: HIVE-10697.1.patch.txt


 Currently the UnionConvertor in the ObjectInspectorConvertors class has an 
 issue in its convert method, where it attempts to convert the 
 objectinspector itself instead of converting the field [1]. This should be 
 changed to convert the field itself; otherwise it can result in a 
 ClassCastException as shown below:
 {code}
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazy.objectinspector.LazyUnionObjectInspector 
 cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyString
   at 
 org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector.getPrimitiveWritableObject(LazyStringObjectInspector.java:51)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:391)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:338)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$UnionConverter.convert(ObjectInspectorConverters.java:456)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:395)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$MapConverter.convert(ObjectInspectorConverters.java:539)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:395)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
   ... 9 more
 {code}
 [1] 
 https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java#L466
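 The wrong-target bug can be sketched in a few lines of Python. All names here are made up for the illustration; this is not the Hive code, only the shape of the mistake:

```python
# The converter must be applied to the union field's value, not to the
# object inspector that merely describes the value's type.
class TextConverter:
    def convert(self, value):
        return str(value)

class UnionField:
    def __init__(self, inspector, value):
        self.inspector = inspector      # describes the value's type
        self.value = value              # the actual data to convert

def convert_union_broken(field, converter):
    # Bug: converts the inspector object itself (in Java this surfaces
    # as the ClassCastException shown in the stack trace above).
    return converter.convert(field.inspector)

def convert_union_fixed(field, converter):
    # Fix: convert the field's value.
    return converter.convert(field.value)
```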



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11375) Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)

2015-08-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700322#comment-14700322
 ] 

Hive QA commented on HIVE-11375:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12750897/HIVE-11375.2.patch

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 9371 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_cond_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_when
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_testxpath2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_oneskew_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rand_partitionpruner3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_select_unquote_not
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_isnull_isnotnull
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_size
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_udf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_filter_join_breaktask
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_udf
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_filter_join_breaktask
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithPerformanceMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithVerboseMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithPerformanceMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithVerboseMode
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4989/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4989/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4989/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12750897 - PreCommit-HIVE-TRUNK-Build

 Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)
 --

 Key: HIVE-11375
 URL: https://issues.apache.org/jira/browse/HIVE-11375
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 2.0.0
Reporter: Mariusz Sakowski
Assignee: Aihua Xu
 Fix For: 2.0.0

 Attachments: HIVE-11375.2.patch, HIVE-11375.patch


 When running a query like this:
 {code}explain select * from test where (val is not null and val <> 0);{code}
 hive will simplify the expression in parentheses and omit the is not null check:
 {code}
   Filter Operator
 predicate: (val <> 0) (type: boolean)
 {code}
 which is fine.
 but if we negate the condition using the NOT operator:
 {code}explain select * from test where not (val is not null and val <> 
 0);{code}
 hive will also simplify things, but now it breaks the query:
 {code}
   Filter Operator
 predicate: (not (val <> 0)) (type: boolean)
 {code}
 because the valid predicate should be *val == 0 or val is null*, while the 
 above is equivalent to *val == 0* only, filtering away rows where val is null.
 simple example:
 {code}
 CREATE TABLE example (
 val bigint
 );
 INSERT INTO example VALUES (1), (NULL), (0);
 -- returns 2 rows - NULL and 0
 select * from example where (val is null or val == 0);
 -- returns 1 row - 0
 select * from example where not (val is not null and val <> 0);
 {code}
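 The same behavior can be checked outside Hive with a small sketch of SQL three-valued logic. This is a Python illustration only: `None` stands in for SQL NULL, and the helper names are made up for the example.

```python
# SQL three-valued logic, with None standing in for NULL.
def sql_not(x):
    return None if x is None else (not x)

def sql_and(a, b):
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def ne_zero(val):                 # SQL "val <> 0": NULL when val is NULL
    return None if val is None else (val != 0)

def correct_predicate(val):       # NOT (val IS NOT NULL AND val <> 0)
    return sql_not(sql_and(val is not None, ne_zero(val)))

def simplified_predicate(val):    # Hive's broken simplification: NOT (val <> 0)
    return sql_not(ne_zero(val))

# A WHERE clause keeps a row only when the predicate is True (not NULL).
rows = [1, None, 0]
kept_correct = [v for v in rows if correct_predicate(v) is True]
kept_broken = [v for v in rows if simplified_predicate(v) is True]
# kept_correct keeps the NULL row and 0; kept_broken drops the NULL row.
```

 The broken simplification evaluates to NULL (not False) on the NULL row, so the row is silently filtered out, which matches the one-row result shown above.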



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11317) ACID: Improve transaction Abort logic due to timeout

2015-08-17 Thread Shannon Ladymon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700477#comment-14700477
 ] 

Shannon Ladymon commented on HIVE-11317:


Doc note: I have added/updated documentation for 
*hive.timedout.txn.reaper.start* and *hive.timedout.txn.reaper.interval* to the 
following pages in the wiki:
* [Hive Transactions - Configuration | 
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration]
* [Configuration Properties - Transactions and Compactor | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-TransactionsandCompactor]

If it looks okay, we can remove the TODOC1.3 label.

 ACID: Improve transaction Abort logic due to timeout
 

 Key: HIVE-11317
 URL: https://issues.apache.org/jira/browse/HIVE-11317
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
  Labels: TODOC1.3, triage
 Fix For: 1.3.0

 Attachments: HIVE-11317.2.patch, HIVE-11317.3.patch, 
 HIVE-11317.4.patch, HIVE-11317.5.patch, HIVE-11317.6.patch, HIVE-11317.patch


 The logic to abort transactions that have stopped heartbeating is in
 TxnHandler.timeOutTxns().
 This is only called when DbTxnManager.getValidTxns() is called.
 So if there are a lot of txns that need to be timed out and there are no 
 SQL clients talking to the system, there is nothing to abort dead 
 transactions, and thus compaction can't clean them up, so garbage accumulates 
 in the system.
 Also, the streaming api doesn't call DbTxnManager at all.
 We need to move this logic into the Initiator (or some other metastore-side 
 thread).
 Also, make sure it is broken up into multiple small(er) transactions against 
 the metastore DB.
 Also move timeOutLocks() there as well.
 See about adding a TXNS.COMMENT field which can be used for "Auto aborted due 
 to timeout", for example.
 The symptom of this is that the system keeps showing more and more open 
 transactions that don't seem to ever go away (and have no locks associated 
 with them)
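 The proposed metastore-side reaper boils down to scanning for open transactions whose last heartbeat is older than the timeout. A minimal sketch, assuming a simple `{txn_id: last_heartbeat}` map (illustrative only, not the Hive implementation):

```python
# A reaper like this runs on its own schedule, so dead transactions get
# aborted even when no SQL client ever calls getValidTxns().
def find_timed_out_txns(txns, now, timeout):
    """Return the ids of open transactions whose heartbeat is stale."""
    return sorted(txn_id for txn_id, last_heartbeat in txns.items()
                  if now - last_heartbeat > timeout)

open_txns = {1: 100, 2: 990, 3: 50}          # txn_id -> last heartbeat time
to_abort = find_timed_out_txns(open_txns, now=1000, timeout=300)
# txns 1 and 3 have not heartbeat within the last 300 time units
```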



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11542) port fileId support on shims and splits from llap branch

2015-08-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700315#comment-14700315
 ] 

Sergey Shelukhin commented on HIVE-11542:
-

Will remove HdfsUtils for now, I guess

 port fileId support on shims and splits from llap branch
 

 Key: HIVE-11542
 URL: https://issues.apache.org/jira/browse/HIVE-11542
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: hbase-metastore-branch, 2.0.0

 Attachments: HIVE-11542.patch


 This is helpful for any kind of file-based cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11588) merge master into branch

2015-08-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700326#comment-14700326
 ] 

Sergey Shelukhin commented on HIVE-11588:
-

Need to pick up HIVE-11542

 merge master into branch
 

 Key: HIVE-11588
 URL: https://issues.apache.org/jira/browse/HIVE-11588
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: hbase-metastore-branch


 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-11590) AvroDeserializer is very chatty

2015-08-17 Thread Swarnim Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swarnim Kulkarni reassigned HIVE-11590:
---

Assignee: Swarnim Kulkarni

 AvroDeserializer is very chatty
 ---

 Key: HIVE-11590
 URL: https://issues.apache.org/jira/browse/HIVE-11590
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni

 It seems like AvroDeserializer is currently very chatty, logging tons 
 of messages at INFO level in the MapReduce logs. It would be helpful to push 
 some of these down to DEBUG level to keep the logs clean.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11592) ORC metadata section can sometimes exceed protobuf message size limit

2015-08-17 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11592:
-
Attachment: HIVE-11592.1.patch

Here is an initial patch based on Sergey's suggestion. [~gopalv]/[~sershe] can 
someone take a look at this patch?

 ORC metadata section can sometimes exceed protobuf message size limit
 -

 Key: HIVE-11592
 URL: https://issues.apache.org/jira/browse/HIVE-11592
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.3.0, 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Attachments: HIVE-11592.1.patch


 If there are too many small stripes and with many columns, the overhead for 
 storing metadata (column stats) can exceed the default protobuf message size 
 of 64MB. Reading such files will throw the following exception
 {code}
 Exception in thread "main" 
 com.google.protobuf.InvalidProtocolBufferException: Protocol message was too 
 large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase 
 the size limit.
 at 
 com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
 at 
 com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
 at 
 com.google.protobuf.CodedInputStream.readRawBytes(CodedInputStream.java:811)
 at 
 com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:329)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.<init>(OrcProto.java:1331)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.<init>(OrcProto.java:1281)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1374)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1369)
 at 
 com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.<init>(OrcProto.java:4887)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.<init>(OrcProto.java:4803)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4990)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4985)
 at 
 com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.<init>(OrcProto.java:12925)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.<init>(OrcProto.java:12872)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12961)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12956)
 at 
 com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.<init>(OrcProto.java:13599)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.<init>(OrcProto.java:13546)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13635)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13630)
 at 
 com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
 at 
 com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
 at 
 com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
 at 
 com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.parseFrom(OrcProto.java:13746)
 at 
 org.apache.hadoop.hive.ql.io.orc.ReaderImpl$MetaInfoObjExtractor.<init>(ReaderImpl.java:468)
 at 
 org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:314)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:228)
 at org.apache.hadoop.hive.ql.io.orc.FileDump.main(FileDump.java:67)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 {code}
 The only solution for this is to programmatically increase the 
 CodedInputStream size limit. We should make this configurable via hive config 
 so that the orc file is 

[jira] [Updated] (HIVE-11504) Predicate pushing down doesn't work for float type for Parquet

2015-08-17 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-11504:

Attachment: HIVE-11504.3.patch

Add float type to the predicate leaf.

 Predicate pushing down doesn't work for float type for Parquet
 --

 Key: HIVE-11504
 URL: https://issues.apache.org/jira/browse/HIVE-11504
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-11504.1.patch, HIVE-11504.2.patch, 
 HIVE-11504.2.patch, HIVE-11504.3.patch, HIVE-11504.patch


 Predicate builder should use PrimitiveTypeName type in parquet side to 
 construct predicate leaf instead of the type provided by PredicateLeaf.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11515) Still some possible race condition in DynamicPartitionPruner

2015-08-17 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700470#comment-14700470
 ] 

Gunther Hagleitner commented on HIVE-11515:
---

Thanks [~navis]. [~wzheng]/[~sseth] can you help me review this - you know the 
code too?

 Still some possible race condition in DynamicPartitionPruner
 

 Key: HIVE-11515
 URL: https://issues.apache.org/jira/browse/HIVE-11515
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Tez
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-11515.1.patch.txt


 Even after HIVE-9976, I could see a race condition in DPP sometimes. It is 
 hard to reproduce, but it seems related to the fact that prune() is called by 
 a thread pool. With some delay in the queue, events from fast tasks arrive 
 before prune() is called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11504) Predicate pushing down doesn't work for float type for Parquet

2015-08-17 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700642#comment-14700642
 ] 

Ferdinand Xu commented on HIVE-11504:
-

Hi [~spena], can you help me review this patch? Thanks!

 Predicate pushing down doesn't work for float type for Parquet
 --

 Key: HIVE-11504
 URL: https://issues.apache.org/jira/browse/HIVE-11504
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-11504.1.patch, HIVE-11504.2.patch, 
 HIVE-11504.2.patch, HIVE-11504.3.patch, HIVE-11504.patch


 Predicate builder should use PrimitiveTypeName type in parquet side to 
 construct predicate leaf instead of the type provided by PredicateLeaf.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11579) Invoke the set command will close standard error output[beeline-cli]

2015-08-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700670#comment-14700670
 ] 

Hive QA commented on HIVE-11579:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12750815/HIVE-11579-beeline-cli.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9234 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_join0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_8
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-BEELINE-Build/19/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-BEELINE-Build/19/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-BEELINE-Build-19/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12750815 - PreCommit-HIVE-BEELINE-Build

 Invoke the set command will close standard error output[beeline-cli]
 

 Key: HIVE-11579
 URL: https://issues.apache.org/jira/browse/HIVE-11579
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-11579-beeline-cli.patch


 We can easily reproduce the bug with the following steps:
 {code}
 hive> set system:xx=yy;
 hive> lss;
 hive> 
 {code}
 The error output disappears because the error output stream is closed when 
 closing the Hive statement.
 This bug also occurs upstream when using the embedded mode that the new 
 CLI uses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10304) Add deprecation message to HiveCLI

2015-08-17 Thread Shannon Ladymon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700373#comment-14700373
 ] 

Shannon Ladymon commented on HIVE-10304:


Can I get an update on the current status of the deprecation of HiveCLI?  I 
would like to add more information to the [Hive wiki | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli] about the 
deprecation, including a Hive version number for when it was/will be deprecated.

 Add deprecation message to HiveCLI
 --

 Key: HIVE-10304
 URL: https://issues.apache.org/jira/browse/HIVE-10304
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Affects Versions: 1.1.0
Reporter: Szehon Ho
Assignee: Szehon Ho
  Labels: TODOC1.2
 Attachments: HIVE-10304.2.patch, HIVE-10304.3.patch, HIVE-10304.patch


 As Beeline is now the recommended command line tool to Hive, we should add a 
 message to HiveCLI to indicate that it is deprecated and redirect them to 
 Beeline.  
 This is not suggesting to remove HiveCLI for now, but just a helpful 
 direction for user to know the direction to focus attention in Beeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11581) HiveServer2 should store connection params in ZK when using dynamic service discovery for simpler client connection string.

2015-08-17 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700419#comment-14700419
 ] 

Thejas M Nair commented on HIVE-11581:
--

It also helps in making the use of Hive more secure. As discussed in 
HIVE-11555, admins can enable the SSL option, so that clear-text passwords are 
not sent, by setting connection parameters in ZooKeeper.


 HiveServer2 should store connection params in ZK when using dynamic service 
 discovery for simpler client connection string.
 ---

 Key: HIVE-11581
 URL: https://issues.apache.org/jira/browse/HIVE-11581
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 1.3.0, 2.0.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta

 Currently, the client needs to specify several parameters based on which an 
 appropriate connection is created with the server. In case of dynamic service 
 discovery, when multiple HS2 instances are running, it is much more usable 
 for the server to add its config parameters to ZK, which the driver can use 
 to configure the connection, instead of the jdbc/odbc user adding those in 
 the connection string.
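As a hedged illustration (host names hypothetical), a ZooKeeper-discovery JDBC URL already has the shape sketched below; this issue proposes that the remaining per-server parameters come from the server's znode, so the client only needs the quorum and namespace:

```java
// Sketch of a ZooKeeper-discovery JDBC URL (host names hypothetical).
// The driver resolves a live HiveServer2 instance from the quorum; with
// this issue, per-server params would come from the server's znode instead
// of being appended here by the user.
public class ZkUrlDemo {
  public static String discoveryUrl(String zkQuorum, String namespace) {
    return "jdbc:hive2://" + zkQuorum
        + "/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=" + namespace;
  }

  public static void main(String[] args) {
    System.out.println(discoveryUrl("zk1:2181,zk2:2181,zk3:2181", "hiveserver2"));
  }
}
```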





[jira] [Commented] (HIVE-11454) Using select columns vs. select * results in long running query

2015-08-17 Thread Russell Pierce (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700444#comment-14700444
 ] 

Russell Pierce commented on HIVE-11454:
---

How sure are you that you are looking at a JDBC issue?  On 0.13.1, HiveServer1 
was still around, and I'm seeing slower query performance for select columns 
than for select * without JDBC being involved (as far as I know, and I'm a 
novice here, so I might be wrong).  The slowdown, as far as I can tell, is 
associated with the time to start the Hadoop job... that is, select columns 
triggers a Hadoop job but select * does not.

 Using select columns vs. select * results in long running query
 ---

 Key: HIVE-11454
 URL: https://issues.apache.org/jira/browse/HIVE-11454
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.13.1
Reporter: Steven Hawkins

 Originally logged as https://issues.jboss.org/browse/TEIID-3580
 When using the JDBC jars for Hive 0.13.1 running on HDP 2.1, some queries 
 executed against table 'default.sample_07' take approximately 20-30 seconds 
 to return.
 The Hive JDBC jars for version 0.13.1 can be found here:
 https://github.com/vchintal/hive-jdbc-jars-archive
 SELECT g_0.code, g_0.description, g_0.total_emp, g_0.salary FROM sample_07 
 AS g_0, run from a standalone JDBC project, results in a 20+ second delay.  
 However, SELECT * FROM sample_07 has no delay.  The same 500 rows are 
 returned either way.





[jira] [Commented] (HIVE-11317) ACID: Improve transaction Abort logic due to timeout

2015-08-17 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700482#comment-14700482
 ] 

Lefty Leverenz commented on HIVE-11317:
---

Just curious:  Why do the TimeValidators for these new parameters have 
MILLISECONDS while their default values are in seconds?  Other parameters match 
the defaults to the TimeValidator unit.

{code}
+HIVE_TIMEDOUT_TXN_REAPER_START("hive.timedout.txn.reaper.start", "100s",
+  new TimeValidator(TimeUnit.MILLISECONDS), "Time delay of 1st reaper run after metastore start"),
+HIVE_TIMEDOUT_TXN_REAPER_INTERVAL("hive.timedout.txn.reaper.interval", "180s",
+  new TimeValidator(TimeUnit.MILLISECONDS), "Time interval describing how often the reaper runs"),
{code}
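A quick sketch of the mismatch being asked about (a hypothetical parser, not HiveConf's actual implementation): a suffixed default like "100s" carries its own unit, so a MILLISECONDS validator and a seconds-suffixed default can coexist, with the validator's unit applying only to unsuffixed values.

```java
import java.util.concurrent.TimeUnit;

// Minimal sketch (not HiveConf's real parser): a suffixed value such as
// "100s" names its own unit, so the validator's MILLISECONDS unit would
// only govern values written without a suffix.
public class TimeDefaultDemo {
  // Convert "100s" / "180s" style values, or bare numbers, to milliseconds.
  public static long toMillis(String value) {
    if (value.endsWith("s")) {
      long seconds = Long.parseLong(value.substring(0, value.length() - 1));
      return TimeUnit.MILLISECONDS.convert(seconds, TimeUnit.SECONDS);
    }
    return Long.parseLong(value); // unsuffixed: assume the validator's unit
  }

  public static void main(String[] args) {
    System.out.println(toMillis("100s")); // 100000
    System.out.println(toMillis("180s")); // 180000
  }
}
```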


 ACID: Improve transaction Abort logic due to timeout
 

 Key: HIVE-11317
 URL: https://issues.apache.org/jira/browse/HIVE-11317
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
  Labels: TODOC1.3, triage
 Fix For: 1.3.0

 Attachments: HIVE-11317.2.patch, HIVE-11317.3.patch, 
 HIVE-11317.4.patch, HIVE-11317.5.patch, HIVE-11317.6.patch, HIVE-11317.patch


 the logic to abort transactions that have stopped heartbeating is in
 TxnHandler.timeOutTxns().
 This is only called when DbTxnManager.getValidTxns() is called.
 So if there are a lot of txns that need to be timed out and there are no 
 SQL clients talking to the system, there is nothing to abort dead 
 transactions, and thus compaction can't clean them up, so garbage accumulates 
 in the system.
 Also, the streaming api doesn't call DbTxnManager at all.
 Need to move this logic into Initiator (or some other metastore-side thread).
 Also, make sure it is broken up into multiple small(er) transactions against 
 the metastore DB.
 Also move timeOutLocks() there as well.
 See about adding a TXNS.COMMENT field which can be used for "Auto aborted due 
 to timeout", for example.
 The symptom of this is that the system keeps showing more and more open 
 transactions that don't seem to ever go away (and have no locks associated 
 with them).





[jira] [Updated] (HIVE-11592) ORC metadata section can sometimes exceed protobuf message size limit

2015-08-17 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11592:
-
Description: 
If there are too many small stripes and with many columns, the overhead for 
storing metadata (column stats) can exceed the default protobuf message size of 
64MB. Reading such files will throw the following exception
{code}
Exception in thread "main" com.google.protobuf.InvalidProtocolBufferException: 
Protocol message was too large.  May be malicious.  Use 
CodedInputStream.setSizeLimit() to increase the size limit.
at 
com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
at 
com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
at 
com.google.protobuf.CodedInputStream.readRawBytes(CodedInputStream.java:811)
at 
com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:329)
at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.init(OrcProto.java:1331)
at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.init(OrcProto.java:1281)
at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1374)
at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1369)
at 
com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.init(OrcProto.java:4887)
at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.init(OrcProto.java:4803)
at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4990)
at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4985)
at 
com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.init(OrcProto.java:12925)
at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.init(OrcProto.java:12872)
at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12961)
at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12956)
at 
com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.init(OrcProto.java:13599)
at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.init(OrcProto.java:13546)
at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13635)
at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13630)
at 
com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
at 
org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.parseFrom(OrcProto.java:13746)
at 
org.apache.hadoop.hive.ql.io.orc.ReaderImpl$MetaInfoObjExtractor.init(ReaderImpl.java:468)
at 
org.apache.hadoop.hive.ql.io.orc.ReaderImpl.init(ReaderImpl.java:314)
at 
org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:228)
at org.apache.hadoop.hive.ql.io.orc.FileDump.main(FileDump.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{code}

The only solution for this is to programmatically increase the 
CodedInputStream size limit. We should make this configurable via a Hive 
config so that the ORC file is readable. Alternatively, we can keep increasing 
the size limit until parsing succeeds.
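The "keep increasing the size limit until parsing succeeds" fallback can be sketched generically as follows. A stub parser stands in for the actual protobuf parse; the real code would call CodedInputStream.setSizeLimit(limit) before each OrcProto.Metadata.parseFrom attempt.

```java
// Generic sketch of the "keep increasing the limit until parsing succeeds"
// fallback. A stub parser stands in for the real protobuf parse; actual code
// would call CodedInputStream.setSizeLimit(limit) before each
// OrcProto.Metadata.parseFrom attempt.
public class GrowingLimitDemo {
  public interface Parser { void parse(int sizeLimit); }

  // Double the limit until the parse succeeds; rethrow once the cap is hit.
  public static int parseWithGrowingLimit(Parser p, int startLimit, int cap) {
    int limit = startLimit;
    while (true) {
      try {
        p.parse(limit);
        return limit; // the limit that finally worked
      } catch (RuntimeException e) {
        if (limit >= cap) throw e;
        limit = (int) Math.min((long) limit * 2, cap);
      }
    }
  }

  public static void main(String[] args) {
    final int needed = 200 << 20;  // pretend the metadata needs a 200MB limit
    int used = parseWithGrowingLimit(lim -> {
      if (lim < needed) throw new RuntimeException("Protocol message was too large");
    }, 64 << 20, 1 << 30);         // start at protobuf's 64MB default
    System.out.println(used >> 20); // prints 256
  }
}
```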

  was:
If there are too many small stripes and with many columns, the overhead for 
storing metadata (column stats) can exceed the default protobuf message size of 
64MB. Reading such files will throw the following exception
{code}
Exception in thread "main" com.google.protobuf.InvalidProtocolBufferException: 
Protocol message was too large.  May be malicious.  Use 
CodedInputStream.setSizeLimit() to increase the size limit.
at 
com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
at 

[jira] [Updated] (HIVE-10697) ObjectInspectorConvertors#UnionConvertor does a faulty conversion

2015-08-17 Thread Swarnim Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swarnim Kulkarni updated HIVE-10697:

Attachment: HIVE-10697.1.patch.txt

Patch attached.

 ObjectInspectorConvertors#UnionConvertor does a faulty conversion
 -

 Key: HIVE-10697
 URL: https://issues.apache.org/jira/browse/HIVE-10697
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni
 Attachments: HIVE-10697.1.patch.txt


 Currently the UnionConvertor in the ObjectInspectorConvertors class has an 
 issue with the convert method where it attempts to convert the 
 objectinspector itself instead of converting the field [1]. This should be 
 changed to convert the field itself. The current behavior can result in a 
 ClassCastException as shown below:
 {code}
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazy.objectinspector.LazyUnionObjectInspector 
 cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyString
   at 
 org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector.getPrimitiveWritableObject(LazyStringObjectInspector.java:51)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:391)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:338)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$UnionConverter.convert(ObjectInspectorConverters.java:456)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:395)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$MapConverter.convert(ObjectInspectorConverters.java:539)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:395)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
   ... 9 more
 {code}
 [1] 
 https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java#L466
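A minimal model of the fix described above, using stand-in types rather than Hive's real ObjectInspector classes: the union converter should select the field converter by the union's tag and convert the field value, never the inspector object itself.

```java
import java.util.List;

// Minimal model of the fix (stand-in types, not Hive's ObjectInspector
// classes): pick the field converter by the union's tag and convert the
// field *value*, never the inspector object itself.
public class UnionConvertDemo {
  public interface Converter { Object convert(Object input); }

  public static class Union {
    final int tag; final Object field;
    public Union(int tag, Object field) { this.tag = tag; this.field = field; }
  }

  public static class UnionConverter implements Converter {
    private final List<Converter> fieldConverters;
    public UnionConverter(List<Converter> fieldConverters) {
      this.fieldConverters = fieldConverters;
    }
    @Override public Object convert(Object input) {
      Union u = (Union) input;
      return fieldConverters.get(u.tag).convert(u.field); // convert the field
    }
  }

  public static Object convertSample(int tag, Object field) {
    Converter toUpper = in -> ((String) in).toUpperCase();
    Converter negate = in -> -((Integer) in);
    return new UnionConverter(List.of(toUpper, negate))
        .convert(new Union(tag, field));
  }

  public static void main(String[] args) {
    System.out.println(convertSample(0, "abc")); // ABC
    System.out.println(convertSample(1, 5));     // -5
  }
}
```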





[jira] [Commented] (HIVE-10697) ObjectInspectorConvertors#UnionConvertor does a faulty conversion

2015-08-17 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700682#comment-14700682
 ] 

Swarnim Kulkarni commented on HIVE-10697:
-

RB: https://reviews.apache.org/r/37563/

 ObjectInspectorConvertors#UnionConvertor does a faulty conversion
 -

 Key: HIVE-10697
 URL: https://issues.apache.org/jira/browse/HIVE-10697
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni
 Attachments: HIVE-10697.1.patch.txt


 Currently the UnionConvertor in the ObjectInspectorConvertors class has an 
 issue with the convert method where it attempts to convert the 
 objectinspector itself instead of converting the field [1]. This should be 
 changed to convert the field itself. The current behavior can result in a 
 ClassCastException as shown below:
 {code}
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazy.objectinspector.LazyUnionObjectInspector 
 cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyString
   at 
 org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector.getPrimitiveWritableObject(LazyStringObjectInspector.java:51)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:391)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:338)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$UnionConverter.convert(ObjectInspectorConverters.java:456)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:395)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$MapConverter.convert(ObjectInspectorConverters.java:539)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:395)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
   ... 9 more
 {code}
 [1] 
 https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java#L466





[jira] [Updated] (HIVE-11383) Upgrade Hive to Calcite 1.4

2015-08-17 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11383:
---
Attachment: HIVE-11383.10.patch

 Upgrade Hive to Calcite 1.4
 ---

 Key: HIVE-11383
 URL: https://issues.apache.org/jira/browse/HIVE-11383
 Project: Hive
  Issue Type: Bug
Reporter: Julian Hyde
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11383.1.patch, HIVE-11383.10.patch, 
 HIVE-11383.2.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, 
 HIVE-11383.3.patch, HIVE-11383.4.patch, HIVE-11383.5.patch, 
 HIVE-11383.6.patch, HIVE-11383.7.patch, HIVE-11383.8.patch, 
 HIVE-11383.8.patch, HIVE-11383.9.patch


 CLEAR LIBRARY CACHE
 Upgrade Hive to Calcite 1.4.0-incubating.
 There is currently a snapshot release, which is close to what will be in 1.4. 
 I have checked that Hive compiles against the new snapshot, fixing one issue. 
 The patch is attached.
 Next step is to validate that Hive runs against the new Calcite, and post any 
 issues to the Calcite list or log Calcite Jira cases. [~jcamachorodriguez], 
 can you please do that?
 [~pxiong], I gather you are dependent on CALCITE-814, which will be fixed in 
 the new Calcite version.





[jira] [Commented] (HIVE-11513) AvroLazyObjectInspector could handle empty data better

2015-08-17 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699580#comment-14699580
 ] 

Swarnim Kulkarni commented on HIVE-11513:
-

[~xuefuz] Are we good to merge this?

 AvroLazyObjectInspector could handle empty data better
 --

 Key: HIVE-11513
 URL: https://issues.apache.org/jira/browse/HIVE-11513
 Project: Hive
  Issue Type: Improvement
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni
 Attachments: HIVE-11513.1.patch.txt, HIVE-11513.2.patch.txt, 
 HIVE-11513.3.patch.txt


 Currently in the AvroLazyObjectInspector, it looks like we only handle the 
 case when the data sent to deserialize is null [1]. It would be nice to 
 handle the case when it is empty as well.
 [1] 
 https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroLazyObjectInspector.java#L226-L228
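A minimal sketch of the suggested guard (method name hypothetical; the real Avro decode is replaced by a stand-in): treat an empty payload the same way as null instead of handing it to the Avro decoder.

```java
// Sketch of the suggested guard (method name hypothetical; the real Avro
// decode is replaced by a stand-in): treat empty bytes like null instead of
// handing them to the Avro decoder.
public class EmptyDataGuardDemo {
  public static Object deserialize(byte[] data) {
    if (data == null || data.length == 0) {
      return null; // nothing to decode
    }
    return "decoded:" + data.length; // stand-in for the Avro deserialization
  }

  public static void main(String[] args) {
    System.out.println(deserialize(null));              // null
    System.out.println(deserialize(new byte[0]));       // null
    System.out.println(deserialize(new byte[] {1, 2})); // decoded:2
  }
}
```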





[jira] [Updated] (HIVE-11579) Invoke the set command will close standard error output[beeline-cli]

2015-08-17 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-11579:

Attachment: HIVE-11579-beeline-cli.patch

For embedded mode, we should not close the statement. [~xuefuz] any suggestions 
for this?

 Invoke the set command will close standard error output[beeline-cli]
 

 Key: HIVE-11579
 URL: https://issues.apache.org/jira/browse/HIVE-11579
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-11579-beeline-cli.patch


 We can easily reproduce the bug by the following steps:
 {code}
 hive> set system:xx=yy;
 hive> lss;
 hive> 
 {code}
 The error output disappears because the err output stream is closed when 
 closing the Hive statement.
 This bug also occurs upstream when using the embedded mode, as the new 
 CLI does.





[jira] [Commented] (HIVE-11579) Invoke the set command will close standard error output[beeline-cli]

2015-08-17 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700775#comment-14700775
 ] 

Ferdinand Xu commented on HIVE-11579:
-

This will cause operation handler leakage on the OperationManager side. 
The general process is as follows:
{noformat}
HiveStatement#close -> HiveStatement#closeClientOperation -> 
ThriftCLIService#closeOperation 
-> CLIService#closeOperation -> HiveSessionImpl#closeOperation -> 
HiveCommandOperation#close 
-> HiveCommandOperation#teardownIO
{noformat}
And you can see how the IO is initialized in Operation:
{noformat}
  private void setupSessionIO(SessionState sessionState) {
    try {
      LOG.info("Putting temp output to file " + 
sessionState.getTmpOutputFile().toString());
      sessionState.in = null; // hive server's session input stream is not used
      // open a per-session file in auto-flush mode for writing temp results
      sessionState.out = new PrintStream(new 
FileOutputStream(sessionState.getTmpOutputFile()), true, "UTF-8");
      // TODO: for hadoop jobs, progress is printed out to session.err,
      // we should find a way to feed back job progress to client
      sessionState.err = new PrintStream(System.err, true, "UTF-8");
    } catch (IOException e) {
      LOG.error("Error in creating temp output file ", e);
      try {
        sessionState.in = null;
        sessionState.out = new PrintStream(System.out, true, "UTF-8");
        sessionState.err = new PrintStream(System.err, true, "UTF-8");
      } catch (UnsupportedEncodingException ee) {
        LOG.error("Error creating PrintStream", e);
        ee.printStackTrace();
        sessionState.out = null;
        sessionState.err = null;
      }
    }
  }
{noformat}

And how it closed:

{noformat}
  private void tearDownSessionIO() {
IOUtils.cleanup(LOG, parentSession.getSessionState().out);
IOUtils.cleanup(LOG, parentSession.getSessionState().err);
  }
{noformat}
Another way to get rid of this is to add a temp err file, like the temp output 
file, instead of using the System.err file descriptor at the session level. 
But then no errors would be printed on the server side.
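The temp-err-file alternative can be sketched as follows (names hypothetical, mirroring setupSessionIO's temp output file): closing the per-session stream then never touches the process-wide System.err descriptor, at the cost noted above of nothing appearing on the server console.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.nio.file.Files;

// Sketch of the alternative (names hypothetical): write session errors to a
// per-session temp file, mirroring the temp output file, so closing the
// stream never closes the process-wide System.err descriptor.
public class TmpErrFileDemo {
  public static String writeAndReadErr(String msg) {
    try {
      File tmpErr = File.createTempFile("hive-session-", ".err");
      tmpErr.deleteOnExit();
      try (PrintStream err = new PrintStream(new FileOutputStream(tmpErr), true, "UTF-8")) {
        err.println(msg); // goes to the per-session file, not System.err
      } // safe to close: System.err is untouched
      return Files.readAllLines(tmpErr.toPath()).get(0);
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(writeAndReadErr("command not found: lss"));
  }
}
```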

 Invoke the set command will close standard error output[beeline-cli]
 

 Key: HIVE-11579
 URL: https://issues.apache.org/jira/browse/HIVE-11579
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-11579-beeline-cli.patch


 We can easily reproduce the bug by the following steps:
 {code}
 hive> set system:xx=yy;
 hive> lss;
 hive> 
 {code}
 The error output disappears because the err output stream is closed when 
 closing the Hive statement.
 This bug also occurs upstream when using the embedded mode, as the new 
 CLI does.





[jira] [Updated] (HIVE-11542) port fileId support on shims and splits from llap branch

2015-08-17 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-11542:
--
Labels: TODOC2.0  (was: )

 port fileId support on shims and splits from llap branch
 

 Key: HIVE-11542
 URL: https://issues.apache.org/jira/browse/HIVE-11542
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
  Labels: TODOC2.0
 Fix For: 2.0.0

 Attachments: HIVE-11542.patch


 This is helpful for any kind of file-based cache.





[jira] [Commented] (HIVE-10304) Add deprecation message to HiveCLI

2015-08-17 Thread Shannon Ladymon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700771#comment-14700771
 ] 

Shannon Ladymon commented on HIVE-10304:


Thanks for the info!

 Add deprecation message to HiveCLI
 --

 Key: HIVE-10304
 URL: https://issues.apache.org/jira/browse/HIVE-10304
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Affects Versions: 1.1.0
Reporter: Szehon Ho
Assignee: Szehon Ho
  Labels: TODOC1.2
 Attachments: HIVE-10304.2.patch, HIVE-10304.3.patch, HIVE-10304.patch


 As Beeline is now the recommended command line tool for Hive, we should add a 
 message to HiveCLI to indicate that it is deprecated and redirect users to 
 Beeline.  
 This is not suggesting removing HiveCLI for now, but just a helpful pointer 
 so users know to focus their attention on Beeline.





[jira] [Commented] (HIVE-11592) ORC metadata section can sometimes exceed protobuf message size limit

2015-08-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700724#comment-14700724
 ] 

Hive QA commented on HIVE-11592:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12750949/HIVE-11592.1.patch

{color:green}SUCCESS:{color} +1 9370 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4991/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4991/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4991/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12750949 - PreCommit-HIVE-TRUNK-Build

 ORC metadata section can sometimes exceed protobuf message size limit
 -

 Key: HIVE-11592
 URL: https://issues.apache.org/jira/browse/HIVE-11592
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.3.0, 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Attachments: HIVE-11592.1.patch


 If there are too many small stripes and with many columns, the overhead for 
 storing metadata (column stats) can exceed the default protobuf message size 
 of 64MB. Reading such files will throw the following exception
 {code}
 Exception in thread "main" 
 com.google.protobuf.InvalidProtocolBufferException: Protocol message was too 
 large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase 
 the size limit.
 at 
 com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
 at 
 com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
 at 
 com.google.protobuf.CodedInputStream.readRawBytes(CodedInputStream.java:811)
 at 
 com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:329)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.init(OrcProto.java:1331)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.init(OrcProto.java:1281)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1374)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1369)
 at 
 com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.init(OrcProto.java:4887)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.init(OrcProto.java:4803)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4990)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4985)
 at 
 com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.init(OrcProto.java:12925)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.init(OrcProto.java:12872)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12961)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12956)
 at 
 com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.init(OrcProto.java:13599)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.init(OrcProto.java:13546)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13635)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13630)
 at 
 com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
 at 
 com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
 at 
 com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
 at 
 com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.parseFrom(OrcProto.java:13746)
 at 
 org.apache.hadoop.hive.ql.io.orc.ReaderImpl$MetaInfoObjExtractor.init(ReaderImpl.java:468)
 at 
 org.apache.hadoop.hive.ql.io.orc.ReaderImpl.init(ReaderImpl.java:314)
 at 
 

[jira] [Commented] (HIVE-11542) port fileId support on shims and splits from llap branch

2015-08-17 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700754#comment-14700754
 ] 

Lefty Leverenz commented on HIVE-11542:
---

Doc note:  This adds *hive.orc.splits.include.fileid* to HiveConf.java, so it 
needs to be documented in the ORC section of Configuration Properties for 
release 2.0.0.

* [Configuration Properties -- ORC File Format | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-ORCFileFormat]

(Typo alert:  the parameter description says "thaty" for "that", which would 
be nice to fix sometime.)

 port fileId support on shims and splits from llap branch
 

 Key: HIVE-11542
 URL: https://issues.apache.org/jira/browse/HIVE-11542
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
  Labels: TODOC2.0
 Fix For: 2.0.0

 Attachments: HIVE-11542.patch


 This is helpful for any kind of file-based cache.





[jira] [Commented] (HIVE-11383) Upgrade Hive to Calcite 1.4

2015-08-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699625#comment-14699625
 ] 

Hive QA commented on HIVE-11383:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12750799/HIVE-11383.10.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 9370 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join13
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithPerformanceMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithVerboseMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithPerformanceMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithVerboseMode
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4986/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4986/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4986/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12750799 - PreCommit-HIVE-TRUNK-Build

 Upgrade Hive to Calcite 1.4
 ---

 Key: HIVE-11383
 URL: https://issues.apache.org/jira/browse/HIVE-11383
 Project: Hive
  Issue Type: Bug
Reporter: Julian Hyde
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11383.1.patch, HIVE-11383.10.patch, 
 HIVE-11383.2.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, 
 HIVE-11383.3.patch, HIVE-11383.4.patch, HIVE-11383.5.patch, 
 HIVE-11383.6.patch, HIVE-11383.7.patch, HIVE-11383.8.patch, 
 HIVE-11383.8.patch, HIVE-11383.9.patch


 CLEAR LIBRARY CACHE
 Upgrade Hive to Calcite 1.4.0-incubating.
 There is currently a snapshot release, which is close to what will be in 1.4. 
 I have checked that Hive compiles against the new snapshot, fixing one issue. 
 The patch is attached.
 Next step is to validate that Hive runs against the new Calcite, and post any 
 issues to the Calcite list or log Calcite Jira cases. [~jcamachorodriguez], 
 can you please do that.
 [~pxiong], I gather you are dependent on CALCITE-814, which will be fixed in 
 the new Calcite version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11580) ThriftUnionObjectInspector#toString throws NPE

2015-08-17 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-11580:
---
Attachment: HIVE-11580.1.patch

 ThriftUnionObjectInspector#toString throws NPE
 --

 Key: HIVE-11580
 URL: https://issues.apache.org/jira/browse/HIVE-11580
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: HIVE-11580.1.patch


 ThriftUnionObjectInspector uses toString from StructObjectInspector, which 
 accesses uninitialized member variable fields.





[jira] [Commented] (HIVE-3725) Add support for pulling HBase columns with prefixes

2015-08-17 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699912#comment-14699912
 ] 

Swarnim Kulkarni commented on HIVE-3725:


[~leftylev] Mind giving me access to the wiki so I can document this and a 
couple others on the wiki?

 Add support for pulling HBase columns with prefixes
 ---

 Key: HIVE-3725
 URL: https://issues.apache.org/jira/browse/HIVE-3725
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.9.0
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni
  Labels: TODOC12
 Fix For: 0.12.0

 Attachments: HIVE-3725.1.patch.txt, HIVE-3725.2.patch.txt, 
 HIVE-3725.3.patch.txt, HIVE-3725.4.patch.txt, HIVE-3725.patch.3.txt


 The current HBase-Hive integration supports reading many values from the same 
 row by specifying a column family, and specifying just the column family can 
 pull in all qualifiers within the family.
 We should add support for specifying a prefix for the qualifier, so that all 
 columns starting with the prefix are automatically pulled in. Wildcard 
 support would be ideal.
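The requested behavior can be sketched in a few lines (the dict-based row model and the function name are illustrative, not part of the actual HBase handler):

```python
def select_qualifiers(row, prefix):
    # Keep only the columns whose qualifier starts with the given prefix.
    # `row` models an HBase row as a dict of qualifier -> value; the real
    # handler would operate on HBase Result objects instead.
    return {q: v for q, v in row.items() if q.startswith(prefix)}

row = {"tags_color": "red", "tags_size": "L", "price": "10"}
print(select_qualifiers(row, "tags_"))  # {'tags_color': 'red', 'tags_size': 'L'}
```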





[jira] [Commented] (HIVE-11569) Use PreOrderOnceWalker where feasible

2015-08-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699931#comment-14699931
 ] 

Ashutosh Chauhan commented on HIVE-11569:
-

Failures are unrelated. [~jcamachorodriguez] Can you take a look?

 Use PreOrderOnceWalker where feasible
 -

 Key: HIVE-11569
 URL: https://issues.apache.org/jira/browse/HIVE-11569
 Project: Hive
  Issue Type: Improvement
  Components: Logical Optimizer
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-11569.patch, HIVE-11569.patch


 Because of its early exit criteria it has better performance than 
 {{PreOrderWalker}} 
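The early-exit idea can be sketched as a pre-order traversal that remembers visited nodes, so a shared subtree is walked only once (an illustrative sketch, not Hive's actual walker code):

```python
def pre_order_once(node, visit, children, seen=None):
    # Pre-order traversal that visits each node at most once: on
    # re-encountering an already-seen node (a shared subtree), it exits
    # early instead of re-walking the subtree -- the source of the
    # speedup over a plain pre-order walk.
    if seen is None:
        seen = set()
    if id(node) in seen:
        return
    seen.add(id(node))
    visit(node)
    for child in children(node):
        pre_order_once(child, visit, children, seen)

shared = {"name": "shared", "children": []}
root = {"name": "root",
        "children": [{"name": "a", "children": [shared]},
                     {"name": "b", "children": [shared]}]}
order = []
pre_order_once(root, lambda n: order.append(n["name"]), lambda n: n["children"])
print(order)  # ['root', 'a', 'shared', 'b'] -- 'shared' is visited only once
```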





[jira] [Commented] (HIVE-3725) Add support for pulling HBase columns with prefixes

2015-08-17 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699947#comment-14699947
 ] 

Lefty Leverenz commented on HIVE-3725:
--

Happy to, [~swarnim], but you need a Confluence username.

See [How to get permission to edit | 
https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-Howtogetpermissiontoedit].

 Add support for pulling HBase columns with prefixes
 ---

 Key: HIVE-3725
 URL: https://issues.apache.org/jira/browse/HIVE-3725
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.9.0
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni
  Labels: TODOC12
 Fix For: 0.12.0

 Attachments: HIVE-3725.1.patch.txt, HIVE-3725.2.patch.txt, 
 HIVE-3725.3.patch.txt, HIVE-3725.4.patch.txt, HIVE-3725.patch.3.txt


 The current HBase-Hive integration supports reading many values from the same 
 row by specifying a column family, and specifying just the column family can 
 pull in all qualifiers within the family.
 We should add support for specifying a prefix for the qualifier, so that all 
 columns starting with the prefix are automatically pulled in. Wildcard 
 support would be ideal.





[jira] [Commented] (HIVE-11513) AvroLazyObjectInspector could handle empty data better

2015-08-17 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699946#comment-14699946
 ] 

Xuefu Zhang commented on HIVE-11513:


+1

 AvroLazyObjectInspector could handle empty data better
 --

 Key: HIVE-11513
 URL: https://issues.apache.org/jira/browse/HIVE-11513
 Project: Hive
  Issue Type: Improvement
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni
 Attachments: HIVE-11513.1.patch.txt, HIVE-11513.2.patch.txt, 
 HIVE-11513.3.patch.txt


 Currently in the AvroLazyObjectInspector, it looks like we only handle the 
 case when the data sent to deserialize is null[1]. It would be nice to 
 handle the case when it is empty.
 [1] 
 https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroLazyObjectInspector.java#L226-L228





[jira] [Commented] (HIVE-3725) Add support for pulling HBase columns with prefixes

2015-08-17 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699952#comment-14699952
 ] 

Swarnim Kulkarni commented on HIVE-3725:


Thanks [~leftylev]. My confluence username is swarnim

 Add support for pulling HBase columns with prefixes
 ---

 Key: HIVE-3725
 URL: https://issues.apache.org/jira/browse/HIVE-3725
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.9.0
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni
  Labels: TODOC12
 Fix For: 0.12.0

 Attachments: HIVE-3725.1.patch.txt, HIVE-3725.2.patch.txt, 
 HIVE-3725.3.patch.txt, HIVE-3725.4.patch.txt, HIVE-3725.patch.3.txt


 The current HBase-Hive integration supports reading many values from the same 
 row by specifying a column family, and specifying just the column family can 
 pull in all qualifiers within the family.
 We should add support for specifying a prefix for the qualifier, so that all 
 columns starting with the prefix are automatically pulled in. Wildcard 
 support would be ideal.





[jira] [Commented] (HIVE-11572) Datanucleus loads Log4j1.x Logger from AppClassLoader

2015-08-17 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699889#comment-14699889
 ] 

Prasanth Jayachandran commented on HIVE-11572:
--

These test failures are handled in HIVE-11575.

 Datanucleus loads Log4j1.x Logger from AppClassLoader
 -

 Key: HIVE-11572
 URL: https://issues.apache.org/jira/browse/HIVE-11572
 Project: Hive
  Issue Type: Sub-task
  Components: Logging
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Fix For: 2.0.0

 Attachments: HIVE-11572.patch


 As part of HIVE-11304, we moved from Log4j1.x to Log4j2. But DataNucleus log 
 messages get logged to the console when launching the Hive CLI. The reason is 
 that DataNucleus is trying to load the Log4j1.x Logger by traversing its class 
 loader. Although we use the log4j-1.2-api bridge, we are loading the 
 log4j-1.2.16 jar that was pulled in by ZooKeeper. We should make sure that 
 there is no log4j-1.2.16 in the DataNucleus classloader hierarchy (classpath).
 The DataNucleus logger has this:
 {code}
 NucleusLogger.class.getClassLoader().loadClass("org.apache.log4j.Logger");
 loggerClass = org.datanucleus.util.Log4JLogger.class;
 {code}





[jira] [Commented] (HIVE-11569) Use PreOrderOnceWalker where feasible

2015-08-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699920#comment-14699920
 ] 

Hive QA commented on HIVE-11569:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12750831/HIVE-11569.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9370 tests executed
*Failed tests:*
{noformat}
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithPerformanceMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithVerboseMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithPerformanceMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithVerboseMode
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4987/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4987/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4987/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12750831 - PreCommit-HIVE-TRUNK-Build

 Use PreOrderOnceWalker where feasible
 -

 Key: HIVE-11569
 URL: https://issues.apache.org/jira/browse/HIVE-11569
 Project: Hive
  Issue Type: Improvement
  Components: Logical Optimizer
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-11569.patch, HIVE-11569.patch


 Because of its early exit criteria it has better performance than 
 {{PreOrderWalker}} 





[jira] [Commented] (HIVE-3725) Add support for pulling HBase columns with prefixes

2015-08-17 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699959#comment-14699959
 ] 

Lefty Leverenz commented on HIVE-3725:
--

Done.  Welcome to the Hive wiki team, Swarnim.

 Add support for pulling HBase columns with prefixes
 ---

 Key: HIVE-3725
 URL: https://issues.apache.org/jira/browse/HIVE-3725
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.9.0
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni
  Labels: TODOC12
 Fix For: 0.12.0

 Attachments: HIVE-3725.1.patch.txt, HIVE-3725.2.patch.txt, 
 HIVE-3725.3.patch.txt, HIVE-3725.4.patch.txt, HIVE-3725.patch.3.txt


 The current HBase-Hive integration supports reading many values from the same 
 row by specifying a column family, and specifying just the column family can 
 pull in all qualifiers within the family.
 We should add support for specifying a prefix for the qualifier, so that all 
 columns starting with the prefix are automatically pulled in. Wildcard 
 support would be ideal.





[jira] [Updated] (HIVE-11581) HiveServer2 should store connection params in ZK when using dynamic service discovery for simpler client connection string.

2015-08-17 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-11581:

Affects Version/s: 2.0.0
   1.3.0

 HiveServer2 should store connection params in ZK when using dynamic service 
 discovery for simpler client connection string.
 ---

 Key: HIVE-11581
 URL: https://issues.apache.org/jira/browse/HIVE-11581
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 1.3.0, 2.0.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta

 Currently, the client needs to specify several parameters based on which an 
 appropriate connection is created with the server. In the case of dynamic 
 service discovery, when multiple HS2 instances are running, it is much more 
 usable for the server to add its config parameters to ZK, which the driver 
 can then use to configure the connection, instead of the JDBC/ODBC user 
 adding those to the connection string.





[jira] [Updated] (HIVE-11569) Use PreOrderOnceWalker where feasible

2015-08-17 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-11569:

Attachment: HIVE-11569.patch

 Use PreOrderOnceWalker where feasible
 -

 Key: HIVE-11569
 URL: https://issues.apache.org/jira/browse/HIVE-11569
 Project: Hive
  Issue Type: Improvement
  Components: Logical Optimizer
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-11569.patch, HIVE-11569.patch


 Because of its early exit criteria it has better performance than 
 {{PreOrderWalker}} 





[jira] [Updated] (HIVE-11383) Upgrade Hive to Calcite 1.4

2015-08-17 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11383:
---
Attachment: HIVE-11383.9.patch

 Upgrade Hive to Calcite 1.4
 ---

 Key: HIVE-11383
 URL: https://issues.apache.org/jira/browse/HIVE-11383
 Project: Hive
  Issue Type: Bug
Reporter: Julian Hyde
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11383.1.patch, HIVE-11383.2.patch, 
 HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, 
 HIVE-11383.4.patch, HIVE-11383.5.patch, HIVE-11383.6.patch, 
 HIVE-11383.7.patch, HIVE-11383.8.patch, HIVE-11383.8.patch, HIVE-11383.9.patch


 CLEAR LIBRARY CACHE
 Upgrade Hive to Calcite 1.4.0-incubating.
 There is currently a snapshot release, which is close to what will be in 1.4. 
 I have checked that Hive compiles against the new snapshot, fixing one issue. 
 The patch is attached.
 Next step is to validate that Hive runs against the new Calcite, and post any 
 issues to the Calcite list or log Calcite Jira cases. [~jcamachorodriguez], 
 can you please do that.
 [~pxiong], I gather you are dependent on CALCITE-814, which will be fixed in 
 the new Calcite version.





[jira] [Updated] (HIVE-11424) Improve HivePreFilteringRule performance

2015-08-17 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11424:
---
Attachment: HIVE-11424.01.patch

 Improve HivePreFilteringRule performance
 

 Key: HIVE-11424
 URL: https://issues.apache.org/jira/browse/HIVE-11424
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11424.01.patch, HIVE-11424.01.patch, 
 HIVE-11424.patch


 We create a rule that will transform OR clauses into IN clauses (when 
 possible).





[jira] [Commented] (HIVE-11579) Invoke the set command will close standard error output[beeline-cli]

2015-08-17 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699156#comment-14699156
 ] 

Ferdinand Xu commented on HIVE-11579:
-

I ran into this error while doing some smoke tests. The root cause is that 
the standard error output is closed when the Hive Statement is closed. I can 
also reproduce this bug using the upstream code.

 Invoke the set command will close standard error output[beeline-cli]
 

 Key: HIVE-11579
 URL: https://issues.apache.org/jira/browse/HIVE-11579
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu

 We can easily reproduce the bug with the following steps:
 {code}
 hive> set system:xx=yy;
 hive> lss;
 hive> 
 {code}
 The error output disappears because the stderr output stream is closed when 
 the Hive statement is closed.





[jira] [Commented] (HIVE-11579) Invoke the set command will close standard error output[beeline-cli]

2015-08-17 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699158#comment-14699158
 ] 

Ferdinand Xu commented on HIVE-11579:
-

We can also reproduce this error using Beeline, with a package built from 
upstream.

 Invoke the set command will close standard error output[beeline-cli]
 

 Key: HIVE-11579
 URL: https://issues.apache.org/jira/browse/HIVE-11579
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu

 We can easily reproduce the bug with the following steps:
 {code}
 hive> set system:xx=yy;
 hive> lss;
 hive> 
 {code}
 The error output disappears because the stderr output stream is closed when 
 the Hive statement is closed.





[jira] [Updated] (HIVE-11375) Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)

2015-08-17 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-11375:

Attachment: HIVE-11375.2.patch

 Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)
 --

 Key: HIVE-11375
 URL: https://issues.apache.org/jira/browse/HIVE-11375
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 2.0.0
Reporter: Mariusz Sakowski
Assignee: Aihua Xu
 Fix For: 2.0.0

 Attachments: HIVE-11375.2.patch, HIVE-11375.patch


 When running a query like this:
 {code}explain select * from test where (val is not null and val <> 0);{code}
 Hive simplifies the expression in parentheses and omits the IS NOT NULL check:
 {code}
   Filter Operator
 predicate: (val <> 0) (type: boolean)
 {code}
 which is fine.
 But if we negate the condition using the NOT operator:
 {code}explain select * from test where not (val is not null and val <> 0);{code}
 Hive also simplifies the expression, but this time it breaks the query:
 {code}
   Filter Operator
 predicate: (not (val <> 0)) (type: boolean)
 {code}
 The valid predicate would be *val == 0 or val is null*, while the predicate 
 above is equivalent to *val == 0* only, filtering away rows where val is null.
 Simple example:
 {code}
 CREATE TABLE example (
 val bigint
 );
 INSERT INTO example VALUES (1), (NULL), (0);
 -- returns 2 rows - NULL and 0
 select * from example where (val is null or val == 0);
 -- returns 1 row - 0
 select * from example where not (val is not null and val <> 0);
 {code}
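The incorrect simplification comes down to SQL's three-valued logic. A small sketch (modeling NULL as Python's None; helper names are illustrative) shows that the original predicate keeps the NULL row while the rewritten one drops it:

```python
def sql_not(x):
    # SQL three-valued NOT: NOT NULL is NULL
    return None if x is None else (not x)

def sql_and(a, b):
    # SQL three-valued AND: FALSE dominates, otherwise NULL propagates
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def ne_zero(val):
    # val <> 0 under SQL semantics: NULL compared to anything is NULL
    return None if val is None else (val != 0)

rows = [1, None, 0]
# Correct predicate: NOT (val IS NOT NULL AND val <> 0) -- keeps NULL and 0
correct = [v for v in rows if sql_not(sql_and(v is not None, ne_zero(v))) is True]
# Broken rewrite: NOT (val <> 0) -- NULL yields NULL, so the NULL row is dropped
broken = [v for v in rows if sql_not(ne_zero(v)) is True]
print(correct, broken)  # [None, 0] [0]
```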





[jira] [Commented] (HIVE-11580) ThriftUnionObjectInspector#toString throws NPE

2015-08-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700084#comment-14700084
 ] 

Hive QA commented on HIVE-11580:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12750851/HIVE-11580.1.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9370 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_acid
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithPerformanceMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithVerboseMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithPerformanceMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithVerboseMode
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4988/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4988/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4988/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12750851 - PreCommit-HIVE-TRUNK-Build

 ThriftUnionObjectInspector#toString throws NPE
 --

 Key: HIVE-11580
 URL: https://issues.apache.org/jira/browse/HIVE-11580
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-11580.1.patch


 ThriftUnionObjectInspector uses toString from StructObjectInspector, which 
 accesses uninitialized member variable fields.





[jira] [Updated] (HIVE-11341) Avoid expensive resizing of ASTNode tree

2015-08-17 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-11341:
-
Attachment: HIVE-11341.8.patch

 Avoid expensive resizing of ASTNode tree 
 -

 Key: HIVE-11341
 URL: https://issues.apache.org/jira/browse/HIVE-11341
 Project: Hive
  Issue Type: Bug
  Components: Hive, Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-11341.1.patch, HIVE-11341.2.patch, 
 HIVE-11341.3.patch, HIVE-11341.4.patch, HIVE-11341.5.patch, 
 HIVE-11341.6.patch, HIVE-11341.7.patch, HIVE-11341.8.patch


 {code}
 Stack Trace                                                                       Sample Count  Percentage(%)
 parse.BaseSemanticAnalyzer.analyze(ASTNode, Context)                              1,605         90
   parse.CalcitePlanner.analyzeInternal(ASTNode)                                   1,605         90
     parse.SemanticAnalyzer.analyzeInternal(ASTNode, SemanticAnalyzer$PlannerContext)  1,605     90
       parse.CalcitePlanner.genOPTree(ASTNode, SemanticAnalyzer$PlannerContext)    1,604         90
         parse.SemanticAnalyzer.genOPTree(ASTNode, SemanticAnalyzer$PlannerContext)  1,604       90
           parse.SemanticAnalyzer.genPlan(QB)                                      1,604         90
             parse.SemanticAnalyzer.genPlan(QB, boolean)                           1,604         90
               parse.SemanticAnalyzer.genBodyPlan(QB, Operator, Map)               1,604         90
                 parse.SemanticAnalyzer.genFilterPlan(ASTNode, QB, Operator, Map, boolean)  1,603  90
                   parse.SemanticAnalyzer.genFilterPlan(QB, ASTNode, Operator, boolean)  1,603   90
                     parse.SemanticAnalyzer.genExprNodeDesc(ASTNode, RowResolver, boolean)  1,603  90
                       parse.SemanticAnalyzer.genExprNodeDesc(ASTNode, RowResolver, TypeCheckCtx)  1,603  90
                         parse.SemanticAnalyzer.genAllExprNodeDesc(ASTNode, RowResolver, TypeCheckCtx)  1,603  90
                           parse.TypeCheckProcFactory.genExprNode(ASTNode, TypeCheckCtx)  1,603  90
                             parse.TypeCheckProcFactory.genExprNode(ASTNode, TypeCheckCtx, TypeCheckProcFactory)  1,603  90
                               lib.DefaultGraphWalker.startWalking(Collection, HashMap)  1,579   89
                                 lib.DefaultGraphWalker.walk(Node)                 1,571         89
                                   java.util.ArrayList.removeAll(Collection)       1,433         81
                                     java.util.ArrayList.batchRemove(Collection, boolean)  1,433  81
                                       java.util.ArrayList.contains(Object)        1,228         69
                                         java.util.ArrayList.indexOf(Object)       1,228         69
 {code}
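The bottom of this profile is a quadratic pattern: removeAll over an ArrayList performs one linear contains() scan per element. An illustrative sketch of the cost difference (not Hive code):

```python
def remove_all_list(items, to_remove):
    # Mirrors ArrayList.removeAll with a list argument: each membership
    # check is a linear scan, O(n*m) overall -- the hot path above.
    return [x for x in items if x not in to_remove]

def remove_all_set(items, to_remove):
    # Same result with a hash set: O(n + m) membership checks.
    rm = set(to_remove)
    return [x for x in items if x not in rm]

items = list(range(10))
to_remove = [2, 4, 6]
print(remove_all_list(items, to_remove))  # [0, 1, 3, 5, 7, 8, 9]
print(remove_all_list(items, to_remove) == remove_all_set(items, to_remove))  # True
```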





[jira] [Updated] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-08-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11587:

Description: 
Due to the legacy in in-memory mapjoin and conservative planning, the memory 
estimation code for mapjoin hashtable is currently not very good. It allocates 
the probe erring on the side of more memory, not taking data into account 
because unlike the probe, it's free to resize, so it's better for perf to 
allocate big and hope for the best on data size. It is not true for hybrid case.
There's code to cap the initial allocation based on memory available (memUsage 
argument), but due to some code rot, the memory estimates from planning are not 
even passed to hashtable anymore (there used to be two config settings, 
hashjoin size fraction by itself, or hashjoin size fraction for group by case), 
so it never caps the memory anymore below 1 Gb. 
Initial capacity is estimated from input key count, and in hybrid join cache 
can exceed Java memory due to number of segments.

There needs to be a review and fix of all this code.
Suggested improvements:
1) Make sure initialCapacity argument from Hybrid case is correct given the 
number of segments. See how it's calculated from keys for regular case; it 
needs to be adjusted accordingly for hybrid case if not done already.
1.5) Note that, knowing the number of rows, the maximum capacity one will ever 
need for probe size (in longs) is row count (assuming key per row, i.e. maximum 
possible number of keys) divided by load factor, plus some very small number to 
round up. That is for flat case. For hybrid case it may be more complex due to 
skew, but that is still a good upper bound for the total probe capacity of all 
segments.
2) Rename memSize to maxProbeSize, or something, make sure it's passed 
correctly based on estimates that take into account both probe and data size, 
esp. in hybrid case.
3) Make sure that memory estimation for hybrid case also doesn't come up with 
numbers that are too small, like 1-byte hashtable. I am not very familiar with 
that code but it has happened in the past.

Other issues we have seen:
4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you 
should not allocate large array in advance. Even if some estimate passes 500Mb 
or 40Mb of whatever, it doesn't make sense to allocate that.
5) For hybrid, don't pre-allocate WBs - only pre-allocate on write.
6) Change everywhere rounding up to power of two is used to rounding down, at 
least for hybrid case (?)

I wanted to put all of this in single JIRA so we could keep track of fixing all 
of this.
I think there are JIRAs for some of this already, feel free to link them to 
this one.
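The upper bound in point 1.5 is simple arithmetic; a hedged sketch (the function name and the default load factor are illustrative, not Hive's actual values):

```python
import math

def max_probe_capacity(row_count, load_factor=0.75):
    # Upper bound on probe capacity in slots: at most one key per row,
    # divided by the load factor, rounded up (point 1.5 above).
    return math.ceil(row_count / load_factor)

print(max_probe_capacity(1_000_000))  # 1333334
```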

  was:
Due to the legacy in in-memory mapjoin and conservative planning, the memory 
estimation code for mapjoin hashtable is currently not very good. It allocates 
the probe erring on the side of more memory, not taking data into account 
because unlike the probe, it's free to resize, so it's better for perf to 
allocate big and hope for the best on data size. It is not true for hybrid case.
There's code to cap the initial allocation based on memory available (memSize 
argument), but due to some code rot, the memory estimates from planning are not 
even passed to hashtable anymore (there used to be two config settings, 
hashjoin size fraction by itself, or hashjoin size fraction for group by case), 
so it never caps the memory anymore below 1 Gb. 
Initial capacity is estimated from input key count, and in hybrid join cache 
can exceed Java memory due to number of segments.

There needs to be a review and fix of all this code.
Suggested improvements:
1) Make sure initialCapacity argument from Hybrid case is correct given the 
number of segments. See how it's calculated from keys for regular case; it 
needs to be adjusted accordingly for hybrid case if not done already.
2) Rename memSize to maxProbeSize, or something, make sure it's passed 
correctly based on estimates that take into account both probe and data size, 
esp. in hybrid case.
3) Cap single write buffer size to 8-16Mb.
4) For hybrid, don't pre-allocate WBs - only pre-allocate on write.
5) Change everywhere rounding up to power of two is used to rounding down, at 
least for hybrid case (?)



 Fix memory estimates for mapjoin hashtable
 --

 Key: HIVE-11587
 URL: https://issues.apache.org/jira/browse/HIVE-11587
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Wei Zheng

 Due to the legacy in in-memory mapjoin and conservative planning, the memory 
 estimation code for mapjoin hashtable is currently not very good. It 
 allocates the probe erring on the side of more memory, not taking data into 
 account because unlike the probe, it's free to resize, so it's better for 
 perf to 

[jira] [Updated] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-08-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11587:

Description: 
Due to the legacy in in-memory mapjoin and conservative planning, the memory 
estimation code for mapjoin hashtable is currently not very good. It allocates 
the probe erring on the side of more memory, not taking data into account 
because unlike the probe, it's free to resize, so it's better for perf to 
allocate big probe and hope for the best with regard to future data size. It is 
not true for hybrid case.
There's code to cap the initial allocation based on memory available (memUsage 
argument), but due to some code rot, the memory estimates from planning are not 
even passed to hashtable anymore (there used to be two config settings, 
hashjoin size fraction by itself, or hashjoin size fraction for group by case), 
so it never caps the memory anymore below 1 Gb. 
Initial capacity is estimated from input key count, and in hybrid join cache 
can exceed Java memory due to number of segments.

There needs to be a review and fix of all this code.
Suggested improvements:
1) Make sure initialCapacity argument from Hybrid case is correct given the 
number of segments. See how it's calculated from keys for regular case; it 
needs to be adjusted accordingly for hybrid case if not done already.
1.5) Note that, knowing the number of rows, the maximum capacity one will ever 
need for probe size (in longs) is row count (assuming key per row, i.e. maximum 
possible number of keys) divided by load factor, plus some very small number to 
round up. That is for flat case. For hybrid case it may be more complex due to 
skew, but that is still a good upper bound for the total probe capacity of all 
segments.
2) Rename memSize to maxProbeSize, or something, make sure it's passed 
correctly based on estimates that take into account both probe and data size, 
esp. in hybrid case.
3) Make sure that memory estimation for hybrid case also doesn't come up with 
numbers that are too small, like 1-byte hashtable. I am not very familiar with 
that code but it has happened in the past.

Other issues we have seen:
4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you 
should not allocate large array in advance. Even if some estimate passes 500Mb 
or 40Mb or whatever, it doesn't make sense to allocate that.
5) For hybrid, don't pre-allocate WBs - only pre-allocate on write.
6) Change everywhere rounding up to power of two is used to rounding down, at 
least for hybrid case (?)

I wanted to put all of this in single JIRA so we could keep track of fixing all 
of this.
I think there are JIRAs for some of this already, feel free to link them to 
this one.
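Item 1.5 above can be sketched numerically. The following is a hedged Python illustration of the stated upper bound (function name and load factor are hypothetical choices for the sketch, not Hive code): with at most one key per row, row count divided by the load factor, rounded up, bounds the probe capacity ever needed.

```python
import math

def max_probe_capacity(row_count, load_factor=0.75):
    """Upper bound on probe slots ever needed (flat case): at most one key
    per row, divided by the load factor, rounded up."""
    return math.ceil(row_count / load_factor)

# With 3M rows and a 0.75 load factor, 4M slots is always enough,
# even if every row carries a distinct key.
bound = max_probe_capacity(3_000_000)
assert bound == 4_000_000
assert bound >= 3_000_000  # never smaller than the key count itself
```

As the description notes, for the hybrid case this is still a valid upper bound on the *total* probe capacity across all segments, even if per-segment sizing is skewed.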

  was:
Due to the legacy in in-memory mapjoin and conservative planning, the memory 
estimation code for mapjoin hashtable is currently not very good. It allocates 
the probe erring on the side of more memory, not taking data into account 
because unlike the probe, it's free to resize, so it's better for perf to 
allocate big and hope for the best on data size. It is not true for hybrid case.
There's code to cap the initial allocation based on memory available (memUsage 
argument), but due to some code rot, the memory estimates from planning are not 
even passed to hashtable anymore (there used to be two config settings, 
hashjoin size fraction by itself, or hashjoin size fraction for group by case), 
so it never caps the memory anymore below 1 Gb. 
Initial capacity is estimated from input key count, and in hybrid join cache 
can exceed Java memory due to number of segments.

There needs to be a review and fix of all this code.
Suggested improvements:
1) Make sure initialCapacity argument from Hybrid case is correct given the 
number of segments. See how it's calculated from keys for regular case; it 
needs to be adjusted accordingly for hybrid case if not done already.
1.5) Note that, knowing the number of rows, the maximum capacity one will ever 
need for probe size (in longs) is row count (assuming key per row, i.e. maximum 
possible number of keys) divided by load factor, plus some very small number to 
round up. That is for flat case. For hybrid case it may be more complex due to 
skew, but that is still a good upper bound for the total probe capacity of all 
segments.
2) Rename memSize to maxProbeSize, or something, make sure it's passed 
correctly based on estimates that take into account both probe and data size, 
esp. in hybrid case.
3) Make sure that memory estimation for hybrid case also doesn't come up with 
numbers that are too small, like 1-byte hashtable. I am not very familiar with 
that code but it has happened in the past.

Other issues we have seen:
4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you 
should not allocate large array in advance. Even if some estimate passes 500Mb 
or 40Mb or whatever, 

[jira] [Updated] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-08-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11587:

Description: 
Due to the legacy in in-memory mapjoin and conservative planning, the memory 
estimation code for mapjoin hashtable is currently not very good. It allocates 
the probe erring on the side of more memory, not taking data into account 
because unlike the probe, it's free to resize, so it's better for perf to 
allocate big probe and hope for the best with regard to future data size. It is 
not true for hybrid case.
There's code to cap the initial allocation based on memory available (memUsage 
argument), but due to some code rot, the memory estimates from planning are not 
even passed to hashtable anymore (there used to be two config settings, 
hashjoin size fraction by itself, or hashjoin size fraction for group by case), 
so it never caps the memory anymore below 1 Gb. 
Initial capacity is estimated from input key count, and in hybrid join cache 
can exceed Java memory due to number of segments.

There needs to be a review and fix of all this code.
Suggested improvements:
1) Make sure initialCapacity argument from Hybrid case is correct given the 
number of segments. See how it's calculated from keys for regular case; it 
needs to be adjusted accordingly for hybrid case if not done already.
1.5) Note that, knowing the number of rows, the maximum capacity one will ever 
need for probe size (in longs) is row count (assuming key per row, i.e. maximum 
possible number of keys) divided by load factor, plus some very small number to 
round up. That is for flat case. For hybrid case it may be more complex due to 
skew, but that is still a good upper bound for the total probe capacity of all 
segments.
2) Rename memUsage to maxProbeSize, or something, make sure it's passed 
correctly based on estimates that take into account both probe and data size, 
esp. in hybrid case.
3) Make sure that memory estimation for hybrid case also doesn't come up with 
numbers that are too small, like 1-byte hashtable. I am not very familiar with 
that code but it has happened in the past.

Other issues we have seen:
4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you 
should not allocate large array in advance. Even if some estimate passes 500Mb 
or 40Mb or whatever, it doesn't make sense to allocate that.
5) For hybrid, don't pre-allocate WBs - only pre-allocate on write.
6) Change everywhere rounding up to power of two is used to rounding down, at 
least for hybrid case (?)

I wanted to put all of this in single JIRA so we could keep track of fixing all 
of this.
I think there are JIRAs for some of this already, feel free to link them to 
this one.

  was:
Due to the legacy in in-memory mapjoin and conservative planning, the memory 
estimation code for mapjoin hashtable is currently not very good. It allocates 
the probe erring on the side of more memory, not taking data into account 
because unlike the probe, it's free to resize, so it's better for perf to 
allocate big probe and hope for the best with regard to future data size. It is 
not true for hybrid case.
There's code to cap the initial allocation based on memory available (memUsage 
argument), but due to some code rot, the memory estimates from planning are not 
even passed to hashtable anymore (there used to be two config settings, 
hashjoin size fraction by itself, or hashjoin size fraction for group by case), 
so it never caps the memory anymore below 1 Gb. 
Initial capacity is estimated from input key count, and in hybrid join cache 
can exceed Java memory due to number of segments.

There needs to be a review and fix of all this code.
Suggested improvements:
1) Make sure initialCapacity argument from Hybrid case is correct given the 
number of segments. See how it's calculated from keys for regular case; it 
needs to be adjusted accordingly for hybrid case if not done already.
1.5) Note that, knowing the number of rows, the maximum capacity one will ever 
need for probe size (in longs) is row count (assuming key per row, i.e. maximum 
possible number of keys) divided by load factor, plus some very small number to 
round up. That is for flat case. For hybrid case it may be more complex due to 
skew, but that is still a good upper bound for the total probe capacity of all 
segments.
2) Rename memSize to maxProbeSize, or something, make sure it's passed 
correctly based on estimates that take into account both probe and data size, 
esp. in hybrid case.
3) Make sure that memory estimation for hybrid case also doesn't come up with 
numbers that are too small, like 1-byte hashtable. I am not very familiar with 
that code but it has happened in the past.

Other issues we have seen:
4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you 
should not allocate large array in advance. Even if some estimate passes 

[jira] [Updated] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-08-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11587:

Description: 
Due to the legacy in in-memory mapjoin and conservative planning, the memory 
estimation code for mapjoin hashtable is currently not very good. It allocates 
the probe erring on the side of more memory, not taking data into account 
because unlike the probe, it's free to resize, so it's better for perf to 
allocate big probe and hope for the best with regard to future data size. It is 
not true for hybrid case.
There's code to cap the initial allocation based on memory available (memUsage 
argument), but due to some code rot, the memory estimates from planning are not 
even passed to hashtable anymore (there used to be two config settings, 
hashjoin size fraction by itself, or hashjoin size fraction for group by case), 
so it never caps the memory anymore below 1 Gb. 
Initial capacity is estimated from input key count, and in hybrid join cache 
can exceed Java memory due to number of segments.

There needs to be a review and fix of all this code.
Suggested improvements:
1) Make sure initialCapacity argument from Hybrid case is correct given the 
number of segments. See how it's calculated from keys for regular case; it 
needs to be adjusted accordingly for hybrid case if not done already.
1.5) Note that, knowing the number of rows, the maximum capacity one will ever 
need for probe size (in longs) is row count (assuming key per row, i.e. maximum 
possible number of keys) divided by load factor, plus some very small number to 
round up. That is for flat case. For hybrid case it may be more complex due to 
skew, but that is still a good upper bound for the total probe capacity of all 
segments.
2) Rename memUsage to maxProbeSize, or something, make sure it's passed 
correctly based on estimates that take into account both probe and data size, 
esp. in hybrid case.
3) Make sure that memory estimation for hybrid case also doesn't come up with 
numbers that are too small, like 1-byte hashtable. I am not very familiar with 
that code but it has happened in the past.

Other issues we have seen:
4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you 
should not allocate large array in advance. Even if some estimate passes 500Mb 
or 40Mb or whatever, it doesn't make sense to allocate that.
5) For hybrid, don't pre-allocate WBs - only pre-allocate on write.
6) Change everywhere rounding up to power of two is used to rounding down, at 
least for hybrid case (?)

I wanted to put all of this in single JIRA so we could keep track of fixing all 
of this.
I think there are JIRAs for some of this already, feel free to link them to 
this one.

  was:
Due to the legacy in in-memory mapjoin and conservative planning, the memory 
estimation code for mapjoin hashtable is currently not very good. It allocates 
the probe erring on the side of more memory, not taking data into account 
because unlike the probe, it's free to resize, so it's better for perf to 
allocate big probe and hope for the best with regard to future data size. It is 
not true for hybrid case.
There's code to cap the initial allocation based on memory available (memUsage 
argument), but due to some code rot, the memory estimates from planning are not 
even passed to hashtable anymore (there used to be two config settings, 
hashjoin size fraction by itself, or hashjoin size fraction for group by case), 
so it never caps the memory anymore below 1 Gb. 
Initial capacity is estimated from input key count, and in hybrid join cache 
can exceed Java memory due to number of segments.

There needs to be a review and fix of all this code.
Suggested improvements:
1) Make sure initialCapacity argument from Hybrid case is correct given the 
number of segments. See how it's calculated from keys for regular case; it 
needs to be adjusted accordingly for hybrid case if not done already.
1.5) Note that, knowing the number of rows, the maximum capacity one will ever 
need for probe size (in longs) is row count (assuming key per row, i.e. maximum 
possible number of keys) divided by load factor, plus some very small number to 
round up. That is for flat case. For hybrid case it may be more complex due to 
skew, but that is still a good upper bound for the total probe capacity of all 
segments.
2) Rename memUsage to maxProbeSize, or something, make sure it's passed 
correctly based on estimates that take into account both probe and data size, 
esp. in hybrid case.
3) Make sure that memory estimation for hybrid case also doesn't come up with 
numbers that are too small, like 1-byte hashtable. I am not very familiar with 
that code but it has happened in the past.

Other issues we have seen:
4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you 
should not allocate large array in advance. Even if some estimate passes 

[jira] [Commented] (HIVE-11375) Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)

2015-08-17 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700256#comment-14700256
 ] 

Aihua Xu commented on HIVE-11375:
-

[~ashutoshc] I notice that GenericUDFOPEqualOrGreaterThan's flip() is 
GenericUDFOPEqualOrLessThan() rather than GenericUDFOPLessThan(). The same 
holds for the other ops. Do you know what flip() here is really meant to do? 
Is that a bug or intentional?
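The distinction the question raises can be sketched outside Hive (a hypothetical Python illustration, not the Hive API; the FLIP/NEGATE tables and check() helper are invented for the sketch): "flipping" a comparison by swapping its operands maps >= to <=, whereas logically negating it maps >= to <.

```python
import operator

# Operand-swap ("flip") vs. logical negation of comparison operators.
FLIP = {">=": "<=", "<=": ">=", ">": "<", "<": ">", "=": "=", "<>": "<>"}
NEGATE = {">=": "<", "<=": ">", ">": "<=", "<": ">=", "=": "<>", "<>": "="}

OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
       "<": operator.lt, "=": operator.eq, "<>": operator.ne}

def check(op, a, b):
    return OPS[op](a, b)

# flip: a >= b is equivalent to b <= a (operands swapped, same truth value)
assert all(check(">=", a, b) == check(FLIP[">="], b, a)
           for a in range(3) for b in range(3))
# negate: NOT (a >= b) is equivalent to a < b (same operand order)
assert all((not check(">=", a, b)) == check(NEGATE[">="], a, b)
           for a in range(3) for b in range(3))
```

So a flip() that returns the EqualOrLessThan operator is consistent with operand reordering; using it as a negation would be a bug.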

 Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)
 --

 Key: HIVE-11375
 URL: https://issues.apache.org/jira/browse/HIVE-11375
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 2.0.0
Reporter: Mariusz Sakowski
Assignee: Aihua Xu
 Fix For: 2.0.0

 Attachments: HIVE-11375.2.patch, HIVE-11375.patch


 When running a query like this:
 {code}explain select * from test where (val is not null and val <> 0);{code}
 hive will simplify the expression in parentheses and omit the is not null check:
 {code}
   Filter Operator
 predicate: (val <> 0) (type: boolean)
 {code}
 which is fine.
 but if we negate the condition using the NOT operator:
 {code}explain select * from test where not (val is not null and val <> 
 0);{code}
 hive will also simplify things, but now it will break stuff:
 {code}
   Filter Operator
 predicate: (not (val <> 0)) (type: boolean)
 {code}
 because the valid predicate should be *val == 0 or val is null*, while the 
 row above is equivalent to *val == 0* only, filtering away rows where val is null.
 simple example:
 {code}
 CREATE TABLE example (
 val bigint
 );
 INSERT INTO example VALUES (1), (NULL), (0);
 -- returns 2 rows - NULL and 0
 select * from example where (val is null or val == 0);
 -- returns 1 row - 0
 select * from example where not (val is not null and val <> 0);
 {code}
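The failure mode can be reproduced outside Hive with a small three-valued-logic simulation (a hedged Python sketch, not Hive code; None stands in for SQL NULL, and the helper names are invented for the sketch). WHERE keeps a row only when the predicate is True, so dropping the IS NOT NULL conjunct before negating loses the NULL row:

```python
# SQL three-valued logic with None as NULL: comparisons with NULL yield NULL,
# and WHERE keeps a row only when the predicate evaluates to True (not None).

def ne(x, y):                 # SQL x <> y
    return None if x is None or y is None else x != y

def is_not_null(x):
    return x is not None

def and3(a, b):               # SQL AND over {True, False, None}
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def not3(a):                  # SQL NOT
    return None if a is None else not a

rows = [1, None, 0]

# Correct: NOT (val IS NOT NULL AND val <> 0) keeps NULL and 0.
correct = [v for v in rows if not3(and3(is_not_null(v), ne(v, 0))) is True]
# Buggy simplification: NOT (val <> 0) silently drops the NULL row.
buggy = [v for v in rows if not3(ne(v, 0)) is True]

assert correct == [None, 0]
assert buggy == [0]
```

The negation must be pushed through with NULL handling intact: NOT (val IS NOT NULL AND val <> 0) simplifies to (val IS NULL OR val = 0), not to NOT (val <> 0).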



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-08-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11587:

Description: 
Due to the legacy in in-memory mapjoin and conservative planning, the memory 
estimation code for mapjoin hashtable is currently not very good. It allocates 
the probe erring on the side of more memory, not taking data into account 
because unlike the probe, it's free to resize, so it's better for perf to 
allocate big probe and hope for the best with regard to future data size. It is 
not true for hybrid case.
There's code to cap the initial allocation based on memory available (memUsage 
argument), but due to some code rot, the memory estimates from planning are not 
even passed to hashtable anymore (there used to be two config settings, 
hashjoin size fraction by itself, or hashjoin size fraction for group by case), 
so it never caps the memory anymore below 1 Gb. 
Initial capacity is estimated from input key count, and in hybrid join cache 
can exceed Java memory due to number of segments.

There needs to be a review and fix of all this code.
Suggested improvements:
1) Make sure initialCapacity argument from Hybrid case is correct given the 
number of segments. See how it's calculated from keys for regular case; it 
needs to be adjusted accordingly for hybrid case if not done already.
1.5) Note that, knowing the number of rows, the maximum capacity one will ever 
need for probe size (in longs) is row count (assuming key per row, i.e. maximum 
possible number of keys) divided by load factor, plus some very small number to 
round up. That is for flat case. For hybrid case it may be more complex due to 
skew, but that is still a good upper bound for the total probe capacity of all 
segments.
2) Rename memUsage to maxProbeSize, or something, make sure it's passed 
correctly based on estimates that take into account both probe and data size, 
esp. in hybrid case.
3) Make sure that memory estimation for hybrid case also doesn't come up with 
numbers that are too small, like 1-byte hashtable. I am not very familiar with 
that code but it has happened in the past.

Other issues we have seen:
4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you 
should not allocate large array in advance. Even if some estimate passes 500Mb 
or 40Mb or whatever, it doesn't make sense to allocate that.
5) For hybrid, don't pre-allocate WBs - only allocate on write.
6) Change everywhere rounding up to power of two is used to rounding down, at 
least for hybrid case (?)

I wanted to put all of these items in single JIRA so we could keep track of 
fixing all of them.
I think there are JIRAs for some of these already, feel free to link them to 
this one.
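Items 4) and 5) above can be sketched together (a hypothetical Python illustration of the policy, with invented names; the real write buffers are Hive-internal Java code): cap each write buffer at a fixed maximum regardless of the planner's estimate, and allocate only when a write actually needs a buffer.

```python
# Sketch of items 4) and 5): cap each write buffer at a fixed maximum
# (8-16 MB suggested above) and allocate lazily on first write.

WB_MAX_BYTES = 8 * 1024 * 1024  # cap from the suggestion above; tunable

class WriteBuffers:
    def __init__(self, estimated_bytes):
        # Never trust a huge estimate: cap the per-buffer size.
        self.buffer_size = min(estimated_bytes, WB_MAX_BYTES)
        self.buffers = []   # nothing pre-allocated up front (item 5)
        self.offset = 0

    def write(self, data: bytes):
        # Allocate lazily, one capped buffer at a time.
        if not self.buffers or self.offset + len(data) > self.buffer_size:
            self.buffers.append(bytearray(self.buffer_size))
            self.offset = 0
        self.buffers[-1][self.offset:self.offset + len(data)] = data
        self.offset += len(data)

wbs = WriteBuffers(estimated_bytes=500 * 1024 * 1024)  # 500 MB estimate
assert wbs.buffer_size == WB_MAX_BYTES                 # capped, not 500 MB
assert wbs.buffers == []                               # no allocation yet
wbs.write(b"row-data")
assert len(wbs.buffers) == 1                           # allocated on first write
```

The point of the cap is that even a 500 MB estimate only ever costs one capped buffer at a time, and an empty hashtable costs nothing.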

  was:
Due to the legacy in in-memory mapjoin and conservative planning, the memory 
estimation code for mapjoin hashtable is currently not very good. It allocates 
the probe erring on the side of more memory, not taking data into account 
because unlike the probe, it's free to resize, so it's better for perf to 
allocate big probe and hope for the best with regard to future data size. It is 
not true for hybrid case.
There's code to cap the initial allocation based on memory available (memUsage 
argument), but due to some code rot, the memory estimates from planning are not 
even passed to hashtable anymore (there used to be two config settings, 
hashjoin size fraction by itself, or hashjoin size fraction for group by case), 
so it never caps the memory anymore below 1 Gb. 
Initial capacity is estimated from input key count, and in hybrid join cache 
can exceed Java memory due to number of segments.

There needs to be a review and fix of all this code.
Suggested improvements:
1) Make sure initialCapacity argument from Hybrid case is correct given the 
number of segments. See how it's calculated from keys for regular case; it 
needs to be adjusted accordingly for hybrid case if not done already.
1.5) Note that, knowing the number of rows, the maximum capacity one will ever 
need for probe size (in longs) is row count (assuming key per row, i.e. maximum 
possible number of keys) divided by load factor, plus some very small number to 
round up. That is for flat case. For hybrid case it may be more complex due to 
skew, but that is still a good upper bound for the total probe capacity of all 
segments.
2) Rename memUsage to maxProbeSize, or something, make sure it's passed 
correctly based on estimates that take into account both probe and data size, 
esp. in hybrid case.
3) Make sure that memory estimation for hybrid case also doesn't come up with 
numbers that are too small, like 1-byte hashtable. I am not very familiar with 
that code but it has happened in the past.

Other issues we have seen:
4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you 
should not allocate large array in advance. Even if some estimate 

[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-08-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700257#comment-14700257
 ] 

Sergey Shelukhin commented on HIVE-11587:
-

[~mmokhtar] [~mmccline] [~gopalv] fyi

 Fix memory estimates for mapjoin hashtable
 --

 Key: HIVE-11587
 URL: https://issues.apache.org/jira/browse/HIVE-11587
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Wei Zheng

 Due to the legacy in in-memory mapjoin and conservative planning, the memory 
 estimation code for mapjoin hashtable is currently not very good. It 
 allocates the probe erring on the side of more memory, not taking data into 
 account because unlike the probe, it's free to resize, so it's better for 
 perf to allocate big probe and hope for the best with regard to future data 
 size. It is not true for hybrid case.
 There's code to cap the initial allocation based on memory available 
 (memUsage argument), but due to some code rot, the memory estimates from 
 planning are not even passed to hashtable anymore (there used to be two 
 config settings, hashjoin size fraction by itself, or hashjoin size fraction 
 for group by case), so it never caps the memory anymore below 1 Gb. 
 Initial capacity is estimated from input key count, and in hybrid join cache 
 can exceed Java memory due to number of segments.
 There needs to be a review and fix of all this code.
 Suggested improvements:
 1) Make sure initialCapacity argument from Hybrid case is correct given the 
 number of segments. See how it's calculated from keys for regular case; it 
 needs to be adjusted accordingly for hybrid case if not done already.
 1.5) Note that, knowing the number of rows, the maximum capacity one will 
 ever need for probe size (in longs) is row count (assuming key per row, i.e. 
 maximum possible number of keys) divided by load factor, plus some very small 
 number to round up. That is for flat case. For hybrid case it may be more 
 complex due to skew, but that is still a good upper bound for the total probe 
 capacity of all segments.
 2) Rename memUsage to maxProbeSize, or something, make sure it's passed 
 correctly based on estimates that take into account both probe and data size, 
 esp. in hybrid case.
 3) Make sure that memory estimation for hybrid case also doesn't come up with 
 numbers that are too small, like 1-byte hashtable. I am not very familiar 
 with that code but it has happened in the past.
 Other issues we have seen:
 4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you 
 should not allocate large array in advance. Even if some estimate passes 
 500Mb or 40Mb or whatever, it doesn't make sense to allocate that.
 5) For hybrid, don't pre-allocate WBs - only allocate on write.
 6) Change everywhere rounding up to power of two is used to rounding down, at 
 least for hybrid case (?)
 I wanted to put all of these items in single JIRA so we could keep track of 
 fixing all of them.
 I think there are JIRAs for some of these already, feel free to link them to 
 this one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11585) Explicitly set pmf.setDetachAllOnCommit on metastore unless configured otherwise

2015-08-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700267#comment-14700267
 ] 

Sergey Shelukhin commented on HIVE-11585:
-

There may be implications of that in metastore code... IIRC all the MBlah 
objects become invalid when detached, which can be an epic PITA. Fixing this 
will need some testing.

 Explicitly set pmf.setDetachAllOnCommit on metastore unless configured 
 otherwise
 

 Key: HIVE-11585
 URL: https://issues.apache.org/jira/browse/HIVE-11585
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan

 datanucleus.detachAllOnCommit has a default value of false. However, we've 
 observed a number of objects (especially FieldSchema objects) being retained  
 that causes us OOM issues on the metastore. Hive should prefer using a 
 default of datanucleus.detachAllOnCommit as true, unless otherwise explicitly 
 overridden by users.
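Until such a default lands, the property described above can be overridden explicitly (a sketch using the standard Hadoop-style hive-site.xml property syntax; the property name is taken from the description above, and the description text is illustrative):

```xml
<property>
  <name>datanucleus.detachAllOnCommit</name>
  <value>true</value>
  <description>Detach all persistent objects on commit so the metastore
    does not retain FieldSchema (and similar) objects across
    transactions.</description>
</property>
```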



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8398) ExprNodeColumnDesc cannot be cast to ExprNodeConstantDesc

2015-08-17 Thread Akshay Goyal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay Goyal reassigned HIVE-8398:
--

Assignee: Akshay Goyal  (was: Zhichun Wu)

 ExprNodeColumnDesc cannot be cast to ExprNodeConstantDesc
 -

 Key: HIVE-8398
 URL: https://issues.apache.org/jira/browse/HIVE-8398
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0, 0.14.0
Reporter: Zhichun Wu
Assignee: Akshay Goyal
 Attachments: HIVE-8398.2.patch, HIVE-8398.patch


 The following explain statement would fail in hive 0.13 and trunk
 with an "ExprNodeColumnDesc cannot be cast to ExprNodeConstantDesc" exception:
 {code}
 create table test.t2( key string, value int);
 explain select
sum(u.value) value
 from test.t2 u
 group by u.key
 having sum(u.value) > 30;
 {code}
 The full stack trace:
 {code}
 java.lang.ClassCastException: 
 org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc cannot be cast to 
 org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc
 at 
 org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1067)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
 at 
 org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:184)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:9561)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:9517)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:9488)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:2314)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:2295)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genHavingPlan(SemanticAnalyzer.java:2139)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8170)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8133)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:8963)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9216)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 at 
 org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:422)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
 at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 {code}
 I think it's due to HIVE-3107. HIVE-3107 introduces an alternate mapping for 
 a column in RowResolver. While mapping the having clause in 
 TypeCheckProcFactory, it first maps value to col_1 (the output of the 
 groupby clause), which has type ExprNodeColumnDesc (before HIVE-3107, value 
 was not recognized). When it comes to u.value, it finds that u is a table 
 alias but fails to cast nodeOutputs\[1\] to ExprNodeConstantDesc. 
 Here I think we can use the text attribute of the expr node as the colAlias 
 instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8398) ExprNodeColumnDesc cannot be cast to ExprNodeConstantDesc

2015-08-17 Thread Akshay Goyal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699093#comment-14699093
 ] 

Akshay Goyal commented on HIVE-8398:


It seems this has been fixed as part of HIVE-9867. The explain statement 
mentioned by [~wzc1989] in the description now passes on the latest master.

 ExprNodeColumnDesc cannot be cast to ExprNodeConstantDesc
 -

 Key: HIVE-8398
 URL: https://issues.apache.org/jira/browse/HIVE-8398
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0, 0.14.0
Reporter: Zhichun Wu
Assignee: Akshay Goyal
 Attachments: HIVE-8398.2.patch, HIVE-8398.patch


 The following explain statement would fail in hive 0.13 and trunk
 with an "ExprNodeColumnDesc cannot be cast to ExprNodeConstantDesc" exception:
 {code}
 create table test.t2( key string, value int);
 explain select
sum(u.value) value
 from test.t2 u
 group by u.key
 having sum(u.value) > 30;
 {code}
 The full stack trace:
 {code}
 java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc cannot be cast to org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc
 at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1067)
 at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
 at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
 at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132)
 at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
 at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:184)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:9561)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:9517)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:9488)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:2314)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:2295)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genHavingPlan(SemanticAnalyzer.java:2139)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8170)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8133)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:8963)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9216)
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64)
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:422)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
 at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 {code}
 I think it's due to HIVE-3107. HIVE-3107 introduces an alternate mapping for a 
 column in RowResolver. While mapping the having clause in 
 TypeCheckProcFactory, it first maps value to col_1 (the output of the group-by 
 clause), which has type ExprNodeColumnDesc (before HIVE-3107, value was 
 not recognized). When it comes to u.value, it finds that u is a table 
 alias but fails to cast nodeOutputs\[1\] to ExprNodeConstantDesc.  
 Here I think we can use the text attribute of the expr node as colAlias 
 instead.
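A toy Python model (not Hive's actual classes) may make the cast failure described above concrete: the resolver yields a column node for the plain alias "value", so code that blindly treats the child of a qualified reference like u.value as a constant fails. All class and function names here are illustrative assumptions, not Hive APIs.

```python
# Toy sketch of the HAVING-clause lookup failure; names are hypothetical.

class ExprNodeColumnDesc:
    """Stands in for Hive's column expression node."""
    def __init__(self, name):
        self.name = name

class ExprNodeConstantDesc:
    """Stands in for Hive's constant expression node."""
    def __init__(self, value):
        self.value = value

def resolve(node_outputs):
    # Mimics the cast in TypeCheckProcFactory: assume the second child of
    # a qualified reference (u.value) is a constant field name.
    child = node_outputs[1]
    if not isinstance(child, ExprNodeConstantDesc):
        raise TypeError(
            "ExprNodeColumnDesc cannot be cast to ExprNodeConstantDesc")
    return child.value

# After HIVE-3107, "value" alone already resolves to a column node, so for
# "u.value" the second child arrives as a column, not a constant, and the
# cast blows up:
try:
    resolve([ExprNodeColumnDesc("u"), ExprNodeColumnDesc("value")])
except TypeError as e:
    print("fails:", e)

# Using the expression's text as the column alias (the suggested fix)
# would hand the resolver a constant instead:
print(resolve([ExprNodeColumnDesc("u"), ExprNodeConstantDesc("value")]))
```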



[jira] [Commented] (HIVE-11429) Increase default JDBC result set fetch size (# rows it fetches in one RPC call) to 1000 from 50

2015-08-17 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700179#comment-14700179
 ] 

Thejas M Nair commented on HIVE-11429:
--

The test failures are not related.

 Increase default JDBC result set fetch size (# rows it fetches in one RPC 
 call) to 1000 from 50
 ---

 Key: HIVE-11429
 URL: https://issues.apache.org/jira/browse/HIVE-11429
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.2.1
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Attachments: HIVE-11429.1.patch


 This is in addition to HIVE-10982 which plans to make the fetch size 
 customizable. This just bumps the default to 1000.
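A back-of-the-envelope sketch of why the default matters: the number of fetch round trips scales inversely with the fetch size, so raising it from 50 to 1000 cuts RPC calls for a large result set by 20x. The function below is purely illustrative arithmetic, not part of the Hive JDBC driver.

```python
import math

def rpc_calls(total_rows, fetch_size):
    """Fetch RPCs needed to pull total_rows at a given fetch size (each
    call returns up to fetch_size rows; the extra end-of-data probe some
    drivers issue is ignored for simplicity)."""
    return math.ceil(total_rows / fetch_size)

# Pulling 100,000 rows: the old default of 50 vs. the proposed 1000.
print(rpc_calls(100_000, 50))    # 2000 round trips
print(rpc_calls(100_000, 1000))  # 100 round trips
```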



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11575) Fix test failures in master due to log4j changes

2015-08-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700188#comment-14700188
 ] 

Sergey Shelukhin commented on HIVE-11575:
-

+1

 Fix test failures in master due to log4j changes
 

 Key: HIVE-11575
 URL: https://issues.apache.org/jira/browse/HIVE-11575
 Project: Hive
  Issue Type: Sub-task
  Components: Logging
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Fix For: 2.0.0

 Attachments: HIVE-11575.patch


 Some of the recent commits related to HIVE-11304 are causing test failures in 
 master. The following tests are failing:
 {code}
 org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithVerboseMode
 org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithPerformanceMode
 org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithVerboseMode
 org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithPerformanceMode
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11530) push limit thru outer join

2015-08-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700218#comment-14700218
 ] 

Sergey Shelukhin commented on HIVE-11530:
-

Done! Thanks for working on this.

 push limit thru outer join
 --

 Key: HIVE-11530
 URL: https://issues.apache.org/jira/browse/HIVE-11530
 Project: Hive
  Issue Type: Improvement
  Components: Logical Optimizer
Reporter: Sergey Shelukhin
Assignee: Yuya OZAWA

 When the query has a left or right outer join with limit, we can push the 
 limit into the left/right side of the join.
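The pushdown above is safe because every row of the preserved side of an outer join produces at least one output row, so the first N output rows can come from at most the first N preserved-side rows. A small Python sketch (not Hive's optimizer, just the relational argument) demonstrates this for a left outer join:

```python
from collections import defaultdict
from itertools import islice

def left_outer_join(left, right):
    """Streaming left outer join on the first field. Every left row yields
    at least one output row, which is what makes the limit pushdown safe."""
    index = defaultdict(list)
    for r in right:
        index[r[0]].append(r)
    for l in left:
        matches = index.get(l[0])
        if matches:
            for r in matches:
                yield l + r
        else:
            yield l + (None, None)  # unmatched left row padded with NULLs

left = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
right = [(1, "x"), (3, "y")]

n = 2
# Truncating the preserved (left) side to n rows before the join gives the
# same first n output rows as joining the full input:
limited = list(islice(left_outer_join(left[:n], right), n))
full = list(islice(left_outer_join(left, right), n))
assert limited == full
print(limited)  # [(1, 'a', 1, 'x'), (2, 'b', None, None)]
```

Note the same argument does not hold for the non-preserved side: limiting the right input of a left outer join could turn matched rows into NULL-padded ones.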



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11530) push limit thru outer join

2015-08-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11530:

Assignee: Yuya OZAWA

 push limit thru outer join
 --

 Key: HIVE-11530
 URL: https://issues.apache.org/jira/browse/HIVE-11530
 Project: Hive
  Issue Type: Improvement
  Components: Logical Optimizer
Reporter: Sergey Shelukhin
Assignee: Yuya OZAWA

 When the query has a left or right outer join with limit, we can push the 
 limit into the left/right side of the join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11375) Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)

2015-08-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700221#comment-14700221
 ] 

Ashutosh Chauhan commented on HIVE-11375:
-

Instead of negativeUDFMapping, another possibility would be to use GenericUDF::flip()?

 Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)
 --

 Key: HIVE-11375
 URL: https://issues.apache.org/jira/browse/HIVE-11375
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 2.0.0
Reporter: Mariusz Sakowski
Assignee: Aihua Xu
 Fix For: 2.0.0

 Attachments: HIVE-11375.2.patch, HIVE-11375.patch


 When running a query like this:
 {code}explain select * from test where (val is not null and val <> 0);{code}
 hive will simplify the expression in parentheses and omit the is not null check:
 {code}
   Filter Operator
 predicate: (val <> 0) (type: boolean)
 {code}
 which is fine.
 but if we negate the condition using the NOT operator:
 {code}explain select * from test where not (val is not null and val <> 0);{code}
 hive will also simplify things, but now it breaks:
 {code}
   Filter Operator
 predicate: (not (val <> 0)) (type: boolean)
 {code}
 because the valid predicate should be *val == 0 or val is null*, while the 
 above is equivalent to *val == 0* only, filtering away rows where val is null.
 a simple example:
 {code}
 CREATE TABLE example (
 val bigint
 );
 INSERT INTO example VALUES (1), (NULL), (0);
 -- returns 2 rows - NULL and 0
 select * from example where (val is null or val == 0);
 -- returns 1 row - 0
 select * from example where not (val is not null and val <> 0);
 {code}
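The behavior in the example above follows from SQL's three-valued logic, which can be sketched in Python (None standing in for NULL / UNKNOWN) to show why dropping the IS NOT NULL check changes the result:

```python
# Sketch of SQL three-valued logic: None models NULL, and any comparison
# involving NULL evaluates to UNKNOWN (also modeled as None).

def sql_ne(a, b):
    """val <> 0 semantics: UNKNOWN when either side is NULL."""
    return None if a is None or b is None else a != b

def sql_not(x):
    """NOT UNKNOWN is UNKNOWN."""
    return None if x is None else not x

def sql_and(a, b):
    """FALSE dominates AND; otherwise UNKNOWN propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

rows = [1, None, 0]

# WHERE not (val is not null and val <> 0) -- a row survives only on TRUE
correct = [v for v in rows
           if sql_not(sql_and(v is not None, sql_ne(v, 0))) is True]

# Hive's broken simplification: not (val <> 0)
broken = [v for v in rows if sql_not(sql_ne(v, 0)) is True]

print(correct)  # [None, 0] -- matches the 2-row result
print(broken)   # [0]       -- the NULL row is filtered away
```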



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11579) Invoking the set command will close standard error output [beeline-cli]

2015-08-17 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-11579:

Description: 
We can easily reproduce the bug with the following steps:
{code}
hive> set system:xx=yy;
hive> lss;
hive>
{code}
The error output disappears because the err output stream is closed when the 
Hive statement is closed.
This bug also occurs upstream when using embedded mode, as the new CLI does.

  was:
We can easily reproduce the bug with the following steps:
{code}
hive> set system:xx=yy;
hive> lss;
hive>
{code}
The error output disappears because the err output stream is closed when the 
Hive statement is closed.


 Invoking the set command will close standard error output [beeline-cli]
 

 Key: HIVE-11579
 URL: https://issues.apache.org/jira/browse/HIVE-11579
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu

 We can easily reproduce the bug with the following steps:
 {code}
 hive> set system:xx=yy;
 hive> lss;
 hive>
 {code}
 The error output disappears because the err output stream is closed when 
 the Hive statement is closed.
 This bug also occurs upstream when using embedded mode, as the new 
 CLI does.
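A toy Python sketch (not Beeline's actual classes; all names are illustrative) of the failure mode described above: a statement wrapper that closes a shared error stream on close(), so later commands have nowhere to report errors.

```python
import io

class Statement:
    """Hypothetical stand-in for a Hive statement holding a shared stream."""
    def __init__(self, err):
        self.err = err
    def close(self):
        # Bug: closes a stream it does not own. After this, anything else
        # sharing `err` loses its error output.
        self.err.close()

err = io.StringIO()   # stands in for the process-wide stderr
stmt = Statement(err)
stmt.close()

# A later command ("lss") tries to report an error, but the stream is gone:
try:
    print("lss: command not found", file=err)
except ValueError as e:
    print("error output lost:", e)
```

The usual fix for this shape of bug is to close only streams the statement created itself, or to wrap shared streams in a close-shielding proxy.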



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)