[jira] [Commented] (HIVE-8839) Support alter table .. add/replace columns cascade
[ https://issues.apache.org/jira/browse/HIVE-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217613#comment-14217613 ] Lefty Leverenz commented on HIVE-8839: -- Does this also need to be documented in the Alter Table section of the DDL doc? * [DDL -- AlterTable/Partition/Column | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/Partition/Column] * [DDL -- Add/Replace Columns | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Add/ReplaceColumns] By the way, CASCADE is only mentioned twice in the DDL doc, in Drop Database and Drop Partitions. But Drop Partitions says "For tables that are protected by NO DROP CASCADE, ..." although NO DROP CASCADE isn't documented anywhere for creating or altering tables. So it looks like more doc updates are needed. * [DDL -- Drop Database | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DropDatabase] * [DDL -- Drop Partitions | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DropPartitions] Support alter table .. add/replace columns cascade Key: HIVE-8839 URL: https://issues.apache.org/jira/browse/HIVE-8839 Project: Hive Issue Type: Improvement Components: SQL Environment: Reporter: Chaoyu Tang Assignee: Chaoyu Tang Labels: TODOC15 Fix For: 0.15.0 Attachments: HIVE-8839.1.patch, HIVE-8839.2.patch, HIVE-8839.2.patch, HIVE-8839.patch We often run into issues like HIVE-6131 which are due to inconsistent column descriptors between table and partitions after alter table. HIVE-8441/HIVE-7971 provided the flexibility to alter the table at the partition level, but in most cases we need to change the table and partitions at the same time. In addition, alter table is usually required prior to alter table partition .., since querying table partition data also goes through the table.
Instead of doing that in two steps, here we provide a convenient DDL like alter table ... cascade to cascade table changes to partitions as well. The changes are limited to add/replace columns and changing a column's name, datatype, position, and comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
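The behavior described above can be sketched in HiveQL; the table, columns, and partition key here are hypothetical illustrations, but the CASCADE keyword matches the syntax discussed in this issue:

```sql
-- Hypothetical partitioned table for illustration.
CREATE TABLE sales (id INT, amount DOUBLE)
PARTITIONED BY (dt STRING);

-- Without CASCADE: only the table-level column descriptor changes;
-- existing partitions keep their old schema, which can lead to the
-- kind of inconsistency described in HIVE-6131.
ALTER TABLE sales ADD COLUMNS (region STRING);

-- With CASCADE: the change is propagated to the metadata of all
-- existing partitions as well, keeping table and partitions consistent.
ALTER TABLE sales ADD COLUMNS (region STRING) CASCADE;
```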
[jira] [Commented] (HIVE-8122) Make use of SearchArgument classes for Parquet SERDE
[ https://issues.apache.org/jira/browse/HIVE-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217619#comment-14217619 ] Lefty Leverenz commented on HIVE-8122: -- Doc question: This doesn't need any wiki documentation, does it? * [Parquet | https://cwiki.apache.org/confluence/display/Hive/Parquet] Make use of SearchArgument classes for Parquet SERDE Key: HIVE-8122 URL: https://issues.apache.org/jira/browse/HIVE-8122 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Fix For: 0.15.0 Attachments: HIVE-8122.1.patch, HIVE-8122.2.patch, HIVE-8122.3.patch, HIVE-8122.4.patch, HIVE-8122.patch ParquetSerde could be much cleaner if we used SearchArgument and associated classes like ORC does: https://github.com/apache/hive/blob/trunk/serde/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgument.java
[jira] [Commented] (HIVE-8863) Cannot drop table with uppercase name after compute statistics for columns
[ https://issues.apache.org/jira/browse/HIVE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217631#comment-14217631 ] Prabhu Joseph commented on HIVE-8863: - Hi Juan, I reproduced the issue on my machine with Hive 0.12. I am using Derby for the metastore in server mode. Earlier, when I dropped the table I received a different exception. drop table Test; The error I got was: Caused by: MetaException(message:java.lang.RuntimeException: commitTransaction was called but openTransactionCalls = 0. This probably indicates that there are unbalanced calls to openTransaction/commitTransaction) After applying the HIVE-4996 patch, I receive the same exception you reported. drop table Test; NestedThrowablesStackTrace: java.sql.BatchUpdateException: DELETE on table 'TBLS' caused a violation of foreign key constraint 'TAB_COL_STATS_FK1' for key (19). But with or without the patch, drop table test; [small 't'] works, so there is definitely an issue. Cannot drop table with uppercase name after compute statistics for columns Key: HIVE-8863 URL: https://issues.apache.org/jira/browse/HIVE-8863 Project: Hive Issue Type: Bug Components: Metastore Reporter: Juan Yu Create a table with an uppercase name, Test, then run analyze table Test compute statistics for columns col1 After this, you cannot drop the table with drop table Test; You get: NestedThrowablesStackTrace: java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails (hive2.TAB_COL_STATS, CONSTRAINT TAB_COL_STATS_FK FOREIGN KEY (TBL_ID) REFERENCES TBLS (TBL_ID)) The workaround is to use the lowercase table name: drop table test;
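For reference, the reproduction steps from the report, written out as a sketch (the column type is an assumption; the report only names col1):

```sql
-- Mixed-case table name is what triggers the bug.
CREATE TABLE Test (col1 INT);

-- Stores column statistics; per the report, the stats rows end up
-- keyed in a way that breaks the later delete.
ANALYZE TABLE Test COMPUTE STATISTICS FOR COLUMNS col1;

-- Fails with the foreign key violation on TAB_COL_STATS_FK.
DROP TABLE Test;

-- Reported workaround: use the lowercase name.
DROP TABLE test;
```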
[jira] [Commented] (HIVE-8122) Make use of SearchArgument classes for Parquet SERDE
[ https://issues.apache.org/jira/browse/HIVE-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217657#comment-14217657 ] Ferdinand Xu commented on HIVE-8122: Hi [~leftylev], I don't think this jira needs any wiki documentation. Thanks! Make use of SearchArgument classes for Parquet SERDE Key: HIVE-8122 URL: https://issues.apache.org/jira/browse/HIVE-8122 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Fix For: 0.15.0 Attachments: HIVE-8122.1.patch, HIVE-8122.2.patch, HIVE-8122.3.patch, HIVE-8122.4.patch, HIVE-8122.patch ParquetSerde could be much cleaner if we used SearchArgument and associated classes like ORC does: https://github.com/apache/hive/blob/trunk/serde/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgument.java
[jira] [Commented] (HIVE-8910) Refactoring of PassThroughOutputFormat
[ https://issues.apache.org/jira/browse/HIVE-8910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217659#comment-14217659 ] Hive QA commented on HIVE-8910: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12682367/HIVE-8910.1.patch.txt {color:red}ERROR:{color} -1 due to 54 failed/errored test(s), 6648 tests executed *Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_index
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_database_drop
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fileformat_mix
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auth
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_empty
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_file_format
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_multiple
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_self_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_unused
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_update
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_auto_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_compression
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_rc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_binary_search
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compression
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_creation
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_serde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_skewtable
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_stale
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_stale_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input45
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_indexes_edge_cases
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_indexes_syntax
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap_auto
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_create_insert_outputformat
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fileformat_void_output
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_index_compact_entry_limit
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_index_compact_size_limit
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1841/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1841/console Test logs:
Re: Review Request 27699: HIVE-8435
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27699/ --- (Updated Nov. 19, 2014, 11:51 a.m.) Review request for hive and Ashutosh Chauhan. Changes --- Only observed changes in results in infer_bucket_sort.q, multiMapJoin1.q, windowing.q, and in Tez mrr.q (change of order of results). The rest are changes in the plans. Ashutosh, can you check? Repository: hive-git Description (updated) --- HIVE-8435 Patch with the most conservative approach of project remover optimization. Diffs (updated) - accumulo-handler/src/test/results/positive/accumulo_queries.q.out 254eeaba4b8d633c63c706c0c74bb1165089 common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a8411c9edb2f2db84cf2540deb20133c36152103 hbase-handler/src/test/results/positive/hbase_queries.q.out b1e7936738b1121c14132909178646290ee8b4d5 ql/src/java/org/apache/hadoop/hive/ql/exec/SelectOperator.java 95d2d76c80aa59b62e9464f704523d921302d401 ql/src/java/org/apache/hadoop/hive/ql/optimizer/IdentityProjectRemover.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 5be0e4540a6843c6b40cb5c22db6e90e1f0da922 ql/src/test/results/clientpositive/annotate_stats_groupby.q.out 718b43c6e0fc2c28981f8caf0f38c1360e69837d ql/src/test/results/clientpositive/auto_join0.q.out 9261ce02f3cfcfd9f048f15fe7357846bb386c31 ql/src/test/results/clientpositive/auto_join10.q.out 3d2bcc216dea80522002f149e5777a73ca52fe5b ql/src/test/results/clientpositive/auto_join11.q.out 8dbad6724475b71dc53d1198c77e36dfe752484e ql/src/test/results/clientpositive/auto_join12.q.out 037116c2c6994fe8bc7bccbb89950a13854cd9af ql/src/test/results/clientpositive/auto_join13.q.out 0cb9b4ffc460121584887f395eb1697bd53013c3 ql/src/test/results/clientpositive/auto_join16.q.out f96bae3590f5e26b059458650dd508b3dd4b1235 ql/src/test/results/clientpositive/auto_join18.q.out 0de3f2a2c8ca5646071fb852c838337b76aab9f9 ql/src/test/results/clientpositive/auto_join18_multi_distinct.q.out 
46559a746f51fa3ad516629220bcf0f31bef685a ql/src/test/results/clientpositive/auto_join24.q.out 1fa3e6ea54f809c529d4ec7b50d5d5191284939f ql/src/test/results/clientpositive/auto_join26.q.out d494d95785283b7083820d0defaadb351f783085 ql/src/test/results/clientpositive/auto_join27.q.out c16992f2bed4de9dd23dcfbe004825f37abbe56e ql/src/test/results/clientpositive/auto_join30.q.out 608ca22323e3b4f1900dd5077a7aecf54d8a8ca2 ql/src/test/results/clientpositive/auto_join31.q.out b0df20270ba3dbb9115c529c50aaca5d13d57a95 ql/src/test/results/clientpositive/auto_join32.q.out bc2d56c0199133e84efd213dff1538173f1686c7 ql/src/test/results/clientpositive/auto_smb_mapjoin_14.q.out 2583d9a50d4a07db50dca7f88c6db141c392a3b8 ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out 5a7f174a52d60028f524a7aac14a9b326d060af8 ql/src/test/results/clientpositive/auto_sortmerge_join_10.q.out 7606dd2adcd43ca410e66e0c8f1799084fa4f39e ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out 8372a6312a2fe85fd78f0c6da0665164b49b320c ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out 3c30a315d9028fda114def015e41a6171341153a ql/src/test/results/clientpositive/auto_sortmerge_join_14.q.out 69bd43af9a8210b19cbea17181f90bf707d93e85 ql/src/test/results/clientpositive/auto_sortmerge_join_15.q.out 10b20d84eb06a30ed3655e346431bc52dfb486fe ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out 72242bbd713baa216d41c40749f9c732271102cb ql/src/test/results/clientpositive/auto_sortmerge_join_3.q.out 35fa02fa60f6c50d6acf55ed3fae1570a644c1e1 ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out 4fea70d4e47bbd75530e92f5b2a8be2edd66bdbd ql/src/test/results/clientpositive/auto_sortmerge_join_5.q.out 1904cc246729a8d3fd2cd1815e563b50e261da6a ql/src/test/results/clientpositive/auto_sortmerge_join_6.q.out e5e2a6a770d5064df944c69576d81d07b1d95c77 ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out abb1db4a87e6b8e820ff7df53d21a4036254b098 
ql/src/test/results/clientpositive/auto_sortmerge_join_8.q.out 9226dc6b2929c2b185f5904bd607a7b18e356dca ql/src/test/results/clientpositive/auto_sortmerge_join_9.q.out 1a7fdf9650f3e5650400ecc24177637856701536 ql/src/test/results/clientpositive/bucket_map_join_1.q.out b194a2be3e39c0294df14c00fe69c6d6f9283702 ql/src/test/results/clientpositive/bucket_map_join_2.q.out 07c887854179e333e4c68d02c247216b1c06dee7 ql/src/test/results/clientpositive/bucketcontext_1.q.out 0ea304dbff38d878d271930cda22b852f0175329 ql/src/test/results/clientpositive/bucketcontext_2.q.out e961f062d1cf21d058566c6c9c6a73db16a3454e ql/src/test/results/clientpositive/bucketcontext_3.q.out 1de62119c2909f4ff49dcec0a50843df7a00419a
[jira] [Updated] (HIVE-8536) Enable SkewJoinResolver for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-8536: - Summary: Enable SkewJoinResolver for spark [Spark Branch] (was: Enable runtime skew join optimization for spark [Spark Branch]) Enable SkewJoinResolver for spark [Spark Branch] Key: HIVE-8536 URL: https://issues.apache.org/jira/browse/HIVE-8536 Project: Hive Issue Type: Improvement Components: Spark Reporter: Rui Li Assignee: Rui Li Sub-task of HIVE-8406
[jira] [Created] (HIVE-8913) Make SparkMapJoinResolver handle runtime skew join [Spark Branch]
Rui Li created HIVE-8913: Summary: Make SparkMapJoinResolver handle runtime skew join [Spark Branch] Key: HIVE-8913 URL: https://issues.apache.org/jira/browse/HIVE-8913 Project: Hive Issue Type: Improvement Components: Spark Reporter: Rui Li Sub-task of HIVE-8406. Now we have {{SparkMapJoinResolver}} in place, but at the moment it doesn't handle the map join tasks created by the upstream SkewJoinResolver, i.e. those wrapped in a ConditionalTask. We have to implement this part for runtime skew join to work on spark. To do so, we can borrow logic from {{MapJoinResolver}}.
[jira] [Updated] (HIVE-8435) Add identity project remover optimization
[ https://issues.apache.org/jira/browse/HIVE-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez updated HIVE-8435: -- Attachment: HIVE-8435.09.patch This patch is the same as .08, but it contains the changes in the test result files. Add identity project remover optimization - Key: HIVE-8435 URL: https://issues.apache.org/jira/browse/HIVE-8435 Project: Hive Issue Type: New Feature Components: Logical Optimizer Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0 Reporter: Ashutosh Chauhan Assignee: Jesús Camacho Rodríguez Attachments: HIVE-8435.02.patch, HIVE-8435.03.patch, HIVE-8435.03.patch, HIVE-8435.04.patch, HIVE-8435.05.patch, HIVE-8435.05.patch, HIVE-8435.06.patch, HIVE-8435.07.patch, HIVE-8435.08.patch, HIVE-8435.09.patch, HIVE-8435.1.patch, HIVE-8435.patch In some cases there is an identity project in the plan which is useless. Better to optimize it away to avoid evaluating it at runtime without any benefit.
[jira] [Commented] (HIVE-8435) Add identity project remover optimization
[ https://issues.apache.org/jira/browse/HIVE-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217971#comment-14217971 ] Hive QA commented on HIVE-8435: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12682415/HIVE-8435.09.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6648 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1842/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1842/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1842/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12682415 - PreCommit-HIVE-TRUNK-Build Add identity project remover optimization - Key: HIVE-8435 URL: https://issues.apache.org/jira/browse/HIVE-8435 Project: Hive Issue Type: New Feature Components: Logical Optimizer Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0 Reporter: Ashutosh Chauhan Assignee: Jesús Camacho Rodríguez Attachments: HIVE-8435.02.patch, HIVE-8435.03.patch, HIVE-8435.03.patch, HIVE-8435.04.patch, HIVE-8435.05.patch, HIVE-8435.05.patch, HIVE-8435.06.patch, HIVE-8435.07.patch, HIVE-8435.08.patch, HIVE-8435.09.patch, HIVE-8435.1.patch, HIVE-8435.patch In some cases there is an identity project in the plan which is useless. Better to optimize it away to avoid evaluating it at runtime without any benefit.
[jira] [Commented] (HIVE-8839) Support alter table .. add/replace columns cascade
[ https://issues.apache.org/jira/browse/HIVE-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218012#comment-14218012 ] Chaoyu Tang commented on HIVE-8839: --- Thanks [~szehon] and [~jdere] for reviewing the patch and getting it committed. [~leftylev] I have requested write permission to the Apache Hive Wiki. Once I get it, I will update https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterColumn with CASCADE|RESTRICT in the alter table syntax. We currently do not support the partition protection mode in alter table; maybe we could consider adding that to alter table partition as another enhancement? Support alter table .. add/replace columns cascade Key: HIVE-8839 URL: https://issues.apache.org/jira/browse/HIVE-8839 Project: Hive Issue Type: Improvement Components: SQL Environment: Reporter: Chaoyu Tang Assignee: Chaoyu Tang Labels: TODOC15 Fix For: 0.15.0 Attachments: HIVE-8839.1.patch, HIVE-8839.2.patch, HIVE-8839.2.patch, HIVE-8839.patch We often run into issues like HIVE-6131 which are due to inconsistent column descriptors between table and partitions after alter table. HIVE-8441/HIVE-7971 provided the flexibility to alter the table at the partition level, but in most cases we need to change the table and partitions at the same time. In addition, alter table is usually required prior to alter table partition .., since querying table partition data also goes through the table. Instead of doing that in two steps, here we provide a convenient DDL like alter table ... cascade to cascade table changes to partitions as well. The changes are limited to add/replace columns and changing a column's name, datatype, position, and comment.
Urgent help: Not able to connect to hiveserver2 through beeline remote client
Hi All, I am trying to connect to hiveserver2 through the beeline remote client. I followed the steps below, but I am not able to connect through the remote client with HiveServer2 in TCP transport mode with SASL authentication:
1. I made the required configuration changes and started hiveserver2 in authorization mode, following https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization#SQLStandardBasedHiveAuthorization-Configuration
2. I started the server with: hive --service hiveserver2 --hiveconf hive.security.authorization.enabled=true --hiveconf hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator --hiveconf hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory --hiveconf hive.metastore.uris=' '
3. The server started successfully; I verified this in the server log file. Per the document, it should be running in standard authorization mode. I then tried to connect using the beeline steps at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.5/bk_dataintegration/content/ch_using-hive-clients-examples.html
4. When I run beeline and then !connect jdbc:hive2://localhost:1/default it asks for a username and password, and after that it hangs. Sometimes the server log shows an Out Of Memory error.
I am able to connect to hiveserver2 through beeline in embedded mode. I searched older conversations for similar issues and found a similar discussion at http://mail-archives.apache.org/mod_mbox/hive-user/201407.mbox/browser but no solution is mentioned there. Any pointer on this would be a great help. Thanks and Regards, Ravi
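For comparison, a typical remote beeline connection session looks like the following; the host and port are assumptions (10000 is the default HiveServer2 TCP port), not values confirmed in this thread:

```
$ beeline
beeline> !connect jdbc:hive2://localhost:10000/default
Enter username for jdbc:hive2://localhost:10000/default: ...
Enter password for jdbc:hive2://localhost:10000/default: ...
```

A hang right after the credential prompts often points at a transport or authentication mismatch between client and server, so checking that the URL's port and the server's configured transport mode agree is a reasonable first step.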
[jira] [Commented] (HIVE-8863) Cannot drop table with uppercase name after compute statistics for columns
[ https://issues.apache.org/jira/browse/HIVE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218031#comment-14218031 ] Chaoyu Tang commented on HIVE-8863: --- [~j...@cloudera.com] Which database are you using? I am not able to reproduce the issue with trunk code against PostgreSQL. Cannot drop table with uppercase name after compute statistics for columns Key: HIVE-8863 URL: https://issues.apache.org/jira/browse/HIVE-8863 Project: Hive Issue Type: Bug Components: Metastore Reporter: Juan Yu Create a table with an uppercase name, Test, then run analyze table Test compute statistics for columns col1 After this, you cannot drop the table with drop table Test; You get: NestedThrowablesStackTrace: java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails (hive2.TAB_COL_STATS, CONSTRAINT TAB_COL_STATS_FK FOREIGN KEY (TBL_ID) REFERENCES TBLS (TBL_ID)) The workaround is to use the lowercase table name: drop table test;
[jira] [Updated] (HIVE-8861) Use hiveconf:test.data.dir instead of hardcoded path
[ https://issues.apache.org/jira/browse/HIVE-8861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-8861: -- Resolution: Won't Fix Status: Resolved (was: Patch Available) Use hiveconf:test.data.dir instead of hardcoded path Key: HIVE-8861 URL: https://issues.apache.org/jira/browse/HIVE-8861 Project: Hive Issue Type: Test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Trivial Attachments: HIVE-8861.patch When loading the test schema to a standalone cluster, I got this error: {noformat} FAILED: SemanticException Line 3:23 Invalid path ''/home/jxiang/data/files/cbo_t1.txt'' {noformat} We should use hiveconf:test.data.dir instead of ../../data/files
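A sketch of the proposed substitution; the LOAD DATA form and table name are assumptions inferred from the error message, not taken from the patch:

```sql
-- Hardcoded relative path: resolves against the working directory, so it
-- only works inside the source tree and causes the SemanticException above
-- when the schema is loaded on a standalone cluster.
LOAD DATA LOCAL INPATH '../../data/files/cbo_t1.txt' INTO TABLE cbo_t1;

-- Using the hiveconf variable instead: substituted at parse time from
-- whatever --hiveconf test.data.dir=... the test harness passes in.
LOAD DATA LOCAL INPATH '${hiveconf:test.data.dir}/cbo_t1.txt' INTO TABLE cbo_t1;
```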
[jira] [Updated] (HIVE-8435) Add identity project remover optimization
[ https://issues.apache.org/jira/browse/HIVE-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8435: --- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, [~jcamachorodriguez] Add identity project remover optimization - Key: HIVE-8435 URL: https://issues.apache.org/jira/browse/HIVE-8435 Project: Hive Issue Type: New Feature Components: Logical Optimizer Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0 Reporter: Ashutosh Chauhan Assignee: Jesús Camacho Rodríguez Fix For: 0.15.0 Attachments: HIVE-8435.02.patch, HIVE-8435.03.patch, HIVE-8435.03.patch, HIVE-8435.04.patch, HIVE-8435.05.patch, HIVE-8435.05.patch, HIVE-8435.06.patch, HIVE-8435.07.patch, HIVE-8435.08.patch, HIVE-8435.09.patch, HIVE-8435.1.patch, HIVE-8435.patch In some cases there is an identity project in plan which is useless. Better to optimize it away to avoid evaluating it without any benefit at runtime.
[jira] [Commented] (HIVE-8910) Refactoring of PassThroughOutputFormat
[ https://issues.apache.org/jira/browse/HIVE-8910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218046#comment-14218046 ] Ashutosh Chauhan commented on HIVE-8910: I agree, it's more complicated than it needs to be. Let's simplify! Refactoring of PassThroughOutputFormat --- Key: HIVE-8910 URL: https://issues.apache.org/jira/browse/HIVE-8910 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-8910.1.patch.txt It's overly complicated just for doing a simple wrapping of an output format. Before things get worse, we should refactor this code.
[jira] [Updated] (HIVE-8836) Enable automatic tests with remote spark client.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-8836: Attachment: HIVE-8836.1-spark.patch Uploading the patch as a reference since I will be on leave for a week. Enable automatic tests with remote spark client.[Spark Branch] -- Key: HIVE-8836 URL: https://issues.apache.org/jira/browse/HIVE-8836 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M3 Attachments: HIVE-8836.1-spark.patch In a real production environment, the remote spark client will mostly be used to submit spark jobs for Hive, so we should enable automatic tests with the remote spark client to make sure Hive features work with it.
[jira] [Updated] (HIVE-8836) Enable automatic tests with remote spark client.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-8836: Assignee: Rui Li (was: Chengxiang Li) Enable automatic tests with remote spark client.[Spark Branch] -- Key: HIVE-8836 URL: https://issues.apache.org/jira/browse/HIVE-8836 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Rui Li Labels: Spark-M3 Attachments: HIVE-8836.1-spark.patch In a real production environment, the remote spark client will mostly be used to submit spark jobs for Hive, so we should enable automatic tests with the remote spark client to make sure Hive features work with it.
[jira] [Updated] (HIVE-8868) SparkSession and SparkClient mapping[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-8868: Assignee: Rui Li (was: Chengxiang Li) SparkSession and SparkClient mapping[Spark Branch] -- Key: HIVE-8868 URL: https://issues.apache.org/jira/browse/HIVE-8868 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Rui Li Labels: Spark-M3, TODOC-SPARK Attachments: HIVE-8868.1-spark.patch, HIVE-8868.2-spark.patch There should be a separate spark context for each user session; currently we share a singleton local spark context across all user sessions with local spark, and create a remote spark context for each spark job with a spark cluster. To bind one spark context to each user session, we may construct the spark client on session open. One thing to note: is SparkSession::conf consistent with Context::getConf?
[jira] [Created] (HIVE-8914) HDFSCleanup thread holds reference to FileSystem
shanyu zhao created HIVE-8914: - Summary: HDFSCleanup thread holds reference to FileSystem Key: HIVE-8914 URL: https://issues.apache.org/jira/browse/HIVE-8914 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.13.1 Reporter: shanyu zhao Assignee: shanyu zhao WebHCat server has a long running cleanup thread (HDFSCleanup) which holds a reference to FileSystem. Because of this reference, many FileSystem related objects (e.g. MetricsSystemImpl) cannot be garbage collected. Sometimes this causes OOM exception. Since the cleanup is done every 12 hours by default, we can simply recreate a FileSystem when we need to use it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8914) HDFSCleanup thread holds reference to FileSystem
[ https://issues.apache.org/jira/browse/HIVE-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shanyu zhao updated HIVE-8914: -- Attachment: HIVE-8914.patch Patch attached. HDFSCleanup thread holds reference to FileSystem Key: HIVE-8914 URL: https://issues.apache.org/jira/browse/HIVE-8914 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.13.1 Reporter: shanyu zhao Assignee: shanyu zhao Attachments: HIVE-8914.patch WebHCat server has a long running cleanup thread (HDFSCleanup) which holds a reference to FileSystem. Because of this reference, many FileSystem related objects (e.g. MetricsSystemImpl) cannot be garbage collected. Sometimes this causes OOM exception. Since the cleanup is done every 12 hours by default, we can simply recreate a FileSystem when we need to use it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
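The fix direction described in HIVE-8914 above (recreate the FileSystem per cleanup pass instead of pinning one for the lifetime of the thread) can be sketched as follows. This is a hedged, self-contained illustration of the resource-lifetime pattern only: `Resource`, `open()`, and `deleteExpired()` are hypothetical stand-ins for Hadoop's `FileSystem` and WebHCat's cleanup logic, not the actual patch.

```java
// Sketch: a long-lived cleanup thread should not cache one heavyweight
// resource forever. Acquire a fresh instance per pass and release it right
// after, so nothing is pinned between the (default 12-hour) passes.
public class CleanupSketch {
    // Hypothetical stand-in for org.apache.hadoop.fs.FileSystem.
    interface Resource extends AutoCloseable {
        void deleteExpired();
        @Override void close();  // narrowed: no checked exception
    }

    static int passes = 0;

    // Hypothetical factory, standing in for FileSystem.get(conf).
    static Resource open() {
        return new Resource() {
            public void deleteExpired() { passes++; }
            public void close() { /* release references eagerly */ }
        };
    }

    // One cleanup pass: the resource lives only for the duration of the pass.
    static void cleanupOnce() {
        try (Resource r = open()) {  // try-with-resources closes it even on error
            r.deleteExpired();
        }
    }

    public static void main(String[] args) {
        cleanupOnce();
        cleanupOnce();
        System.out.println(passes);  // 2
    }
}
```

Since the cleanup runs only every few hours, the cost of re-opening the resource each pass is negligible compared with keeping `MetricsSystemImpl` and friends uncollectable.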
[jira] [Commented] (HIVE-8863) Cannot drop table with uppercase name after compute statistics for columns
[ https://issues.apache.org/jira/browse/HIVE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218115#comment-14218115 ] Juan Yu commented on HIVE-8863: --- I tested with both MySQL and PostgreSQL, but I am not using the trunk version. Cannot drop table with uppercase name after compute statistics for columns Key: HIVE-8863 URL: https://issues.apache.org/jira/browse/HIVE-8863 Project: Hive Issue Type: Bug Components: Metastore Reporter: Juan Yu Create a table with the uppercase name Test, then run analyze table Test compute statistics for columns col1. After this, you cannot drop the table with drop table Test; you get the error: NestedThrowablesStackTrace: java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails (hive2.TAB_COL_STATS, CONSTRAINT TAB_COL_STATS_FK FOREIGN KEY (TBL_ID) REFERENCES TBLS (TBL_ID)) The workaround is to use the lowercase table name: drop table test; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6914) parquet-hive cannot write nested map (map value is map)
[ https://issues.apache.org/jira/browse/HIVE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218143#comment-14218143 ] Mickael Lacour commented on HIVE-6914: -- [~spena], [~brocknoland], [~rdblue] I will redo this patch using the path available for HIVE-8359. I think there is a link with this one too HIVE-8909. With the previous patch I can read and write parquet complex nested types. So maybe it will be better to add my qtests to HIVE-8909 and fix the writing bug ? What do you think ? parquet-hive cannot write nested map (map value is map) --- Key: HIVE-6914 URL: https://issues.apache.org/jira/browse/HIVE-6914 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.12.0, 0.13.0 Reporter: Tongjie Chen Labels: parquet, serialization Attachments: HIVE-6914.1.patch, HIVE-6914.2.patch // table schema (identical for both plain text version and parquet version) desc hive desc text_mmap; m map // sample nested map entry {level1:{level2_key1:value1,level2_key2:value2}} The following query will fail, insert overwrite table parquet_mmap select * from text_mmap; Caused by: parquet.io.ParquetEncodingException: This should be an ArrayWritable or MapWritable: org.apache.hadoop.hive.ql.io.parquet.writable.BinaryWritable@f2f8106 at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:85) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeArray(DataWritableWriter.java:118) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:80) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:82) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:55) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59) at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31) at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:115) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:81) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:37) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:77) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:90) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:622) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540) ... 9 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Hive-0.14 - Build # 735 - Still Failing
Changes for Build #696 [rohini] PIG-4186: Fix e2e run against new build of pig and some enhancements (rohini) Changes for Build #697 Changes for Build #698 Changes for Build #699 Changes for Build #700 Changes for Build #701 Changes for Build #702 Changes for Build #703 [daijy] HIVE-8484: HCatalog throws an exception if Pig job is of type 'fetch' (Lorand Bendig via Daniel Dai) Changes for Build #704 [gunther] HIVE-8781: Nullsafe joins are busted on Tez (Gunther Hagleitner, reviewed by Prasanth J) Changes for Build #705 [gunther] HIVE-8760: Pass a copy of HiveConf to hooks (Gunther Hagleitner, reviewed by Gopal V) Changes for Build #706 [thejas] HIVE-8772 : zookeeper info logs are always printed from beeline with service discovery mode (Thejas Nair, reviewed by Vaibhav Gumashta) Changes for Build #707 [gunther] HIVE-8782: HBase handler doesn't compile with hadoop-1 (Jimmy Xiang, reviewed by Xuefu and Sergey) Changes for Build #708 Changes for Build #709 [thejas] HIVE-8785 : HiveServer2 LogDivertAppender should be more selective for beeline getLogs (Thejas Nair, reviewed by Gopal V) Changes for Build #710 [vgumashta] HIVE-8764: Windows: HiveServer2 TCP SSL cannot recognize localhost (Vaibhav Gumashta reviewed by Thejas Nair) Changes for Build #711 [gunther] HIVE-8768: CBO: Fix filter selectivity for 'in clause' '' (Laljo John Pullokkaran via Gunther Hagleitner) Changes for Build #712 [gunther] HIVE-8794: Hive on Tez leaks AMs when killed before first dag is run (Gunther Hagleitner, reviewed by Gopal V) Changes for Build #713 [gunther] HIVE-8798: Some Oracle deadlocks not being caught in TxnHandler (Alan Gates via Gunther Hagleitner) Changes for Build #714 [gunther] HIVE-8800: Update release notes and notice for hive .14 (Gunther Hagleitner, reviewed by Prasanth J) [gunther] HIVE-8799: boatload of missing apache headers (Gunther Hagleitner, reviewed by Thejas M Nair) Changes for Build #715 [gunther] Preparing for release 0.14.0 Changes for Build #716 [gunther] 
Preparing for release 0.14.0 [gunther] Preparing for release 0.14.0 Changes for Build #717 Changes for Build #718 Changes for Build #719 Changes for Build #720 [gunther] HIVE-8811: Dynamic partition pruning can result in NPE during query compilation (Gunther Hagleitner, reviewed by Gopal V) Changes for Build #721 [gunther] HIVE-8805: CBO skipped due to SemanticException: Line 0:-1 Both left and right aliases encountered in JOIN 'avg_cs_ext_discount_amt' (Laljo John Pullokkaran via Gunther Hagleitner) [sershe] HIVE-8715 : Hive 14 upgrade scripts can fail for statistics if database was created using auto-create ADDENDUM (Sergey Shelukhin, reviewed by Ashutosh Chauhan and Gunther Hagleitner) Changes for Build #722 Changes for Build #723 Changes for Build #724 [gunther] HIVE-8845: Switch to Tez 0.5.2 (Gunther Hagleitner, reviewed by Gopal V) Changes for Build #725 [sershe] HIVE-8295 : Add batch retrieve partition objects for metastore direct sql (Selina Zhang and Sergey Shelukhin, reviewed by Ashutosh Chauhan) Changes for Build #726 Changes for Build #727 [gunther] HIVE-8873: Switch to calcite 0.9.2 (Gunther Hagleitner, reviewed by Gopal V) Changes for Build #728 [thejas] HIVE-8830 : hcatalog process don't exit because of non daemon thread (Thejas Nair, reviewed by Eugene Koifman, Sushanth Sowmyan) Changes for Build #729 Changes for Build #730 Changes for Build #731 Changes for Build #732 Changes for Build #733 Changes for Build #734 Changes for Build #735 No tests ran. The Apache Jenkins build system has built Hive-0.14 (build #735) Status: Still Failing Check console output at https://builds.apache.org/job/Hive-0.14/735/ to view the results.
[jira] [Commented] (HIVE-8639) Convert SMBJoin to MapJoin [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218168#comment-14218168 ] Chinna Rao Lalam commented on HIVE-8639: Hi [~brocknoland], I am investigating the test failures. I need some time for this issue; if folks free up, they can take it over. Convert SMBJoin to MapJoin [Spark Branch] - Key: HIVE-8639 URL: https://issues.apache.org/jira/browse/HIVE-8639 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Szehon Ho Assignee: Chinna Rao Lalam HIVE-8202 supports auto-conversion of SMB Join. However, if the tables are partitioned, there could be a slowdown, as each mapper would need to get a very small chunk of a partition which has a single key. Thus, in some scenarios it's beneficial to convert SMB join to map join. The task is to research and support the conversion from SMB join to map join for the Spark execution engine. See the MapReduce equivalent in SortMergeJoinResolver. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8850) ObjectStore:: rollbackTransaction() should set the transaction status to TXN_STATUS.ROLLBACK irrespective of whether it is active or not
[ https://issues.apache.org/jira/browse/HIVE-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218230#comment-14218230 ] Chaoyu Tang commented on HIVE-8850: --- [~sushanth] Thanks for the deep insight and analysis into the cause of the unbalanced openTransaction/commitTransaction calls. I think HIVE-8891 is slightly different from the issue here: it just ensures that the PersistenceManager cache is cleaned after rollback to avoid the NucleusObjectNotFoundException, and should not be involved in the nested txn count issue. Because rollbackTransaction always resets openTransactionCalls to 0, even if you set the transactionStatus to TXN_STATUS.ROLLBACK unconditionally in rollbackTransaction, in the nested transaction example you gave above (e.g. sqldirect fallback to jdo) the nested openTransaction following rollbackTransaction will set it back to TXN_STATUS.OPEN, so the patch provided here would still not solve the issue for this example, is that right? ObjectStore:: rollbackTransaction() should set the transaction status to TXN_STATUS.ROLLBACK irrespective of whether it is active or not Key: HIVE-8850 URL: https://issues.apache.org/jira/browse/HIVE-8850 Project: Hive Issue Type: Bug Components: Metastore Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-8850.1.patch We can run into issues as described below: A Hive script adds 2800 partitions to a table, and during this it can get a SQLState 08S01 [Communication Link Error], and bonecp kills all the connections in the pool. The partitions are added and a create table statement executes (Metering_IngestedData_Compressed). The map job finishes successfully, and while moving the table to the hive warehouse, ObjectStore.java's commitTransaction() raises the error: commitTransaction was called but openTransactionCalls = 0. 
This probably indicates that there are unbalanced calls to openTransaction/commitTransaction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
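The nested-transaction behavior Chaoyu describes above can be modeled in a few lines. This is a toy sketch, not the real ObjectStore code: it only mirrors the bookkeeping named in the discussion (openTransactionCalls, transactionStatus) to show why an unconditional ROLLBACK status is still forgotten once rollback also resets the nesting count.

```java
// Toy model of ObjectStore's transaction bookkeeping. rollbackTransaction()
// sets status to ROLLBACK but also resets the nested open count to 0, so a
// later nested openTransaction() sees count == 0 and flips status back to
// OPEN, losing the rollback marker - the scenario described in the comment.
public class TxnSketch {
    enum Status { NO_STATE, OPEN, COMMITED, ROLLBACK }

    int openTransactionCalls = 0;
    Status transactionStatus = Status.NO_STATE;

    void openTransaction() {
        openTransactionCalls++;
        if (openTransactionCalls == 1) transactionStatus = Status.OPEN;
    }

    void rollbackTransaction() {
        transactionStatus = Status.ROLLBACK;  // the proposed unconditional set
        openTransactionCalls = 0;             // ...but the count is reset too
    }

    public static void main(String[] args) {
        TxnSketch os = new TxnSketch();
        os.openTransaction();        // outer txn
        os.openTransaction();        // nested (e.g. direct-SQL path)
        os.rollbackTransaction();    // direct SQL fails, rolls back
        os.openTransaction();        // nested retry via JDO: count was 0, so...
        System.out.println(os.transactionStatus);  // OPEN - rollback forgotten
    }
}
```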
[jira] [Commented] (HIVE-6914) parquet-hive cannot write nested map (map value is map)
[ https://issues.apache.org/jira/browse/HIVE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218250#comment-14218250 ] Sergio Peña commented on HIVE-6914: --- Hi [~mickaellcr], It sounds good if you use the patch from HIVE-8359 for this bug. Regarding adding the qtests to HIVE-8909, I think that ticket is meant to fix the reading part of different nested types formats generated by Thrift and Avro tools (it does not touch the writing part); so I think it should be good to have these writing tests separated from the reading tests. parquet-hive cannot write nested map (map value is map) --- Key: HIVE-6914 URL: https://issues.apache.org/jira/browse/HIVE-6914 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.12.0, 0.13.0 Reporter: Tongjie Chen Labels: parquet, serialization Attachments: HIVE-6914.1.patch, HIVE-6914.2.patch // table schema (identical for both plain text version and parquet version) desc hive desc text_mmap; m map // sample nested map entry {level1:{level2_key1:value1,level2_key2:value2}} The following query will fail, insert overwrite table parquet_mmap select * from text_mmap; Caused by: parquet.io.ParquetEncodingException: This should be an ArrayWritable or MapWritable: org.apache.hadoop.hive.ql.io.parquet.writable.BinaryWritable@f2f8106 at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:85) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeArray(DataWritableWriter.java:118) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:80) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:82) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:55) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59) at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31) at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:115) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:81) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:37) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:77) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:90) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:622) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540) ... 9 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8909) Hive doesn't correctly read Parquet nested types
[ https://issues.apache.org/jira/browse/HIVE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218252#comment-14218252 ] Sergio Peña commented on HIVE-8909: --- [~rdblue], is this ticket related to the different nested types found in this document? https://github.com/rdblue/incubator-parquet-format/blob/PARQUET-113-add-list-and-map-spec/LogicalTypes.md Hive doesn't correctly read Parquet nested types Key: HIVE-8909 URL: https://issues.apache.org/jira/browse/HIVE-8909 Project: Hive Issue Type: Bug Reporter: Ryan Blue Assignee: Ryan Blue Attachments: HIVE-8909-1.patch Parquet's Avro and Thrift object models don't produce the same parquet type representation for lists and maps that Hive does. In the Parquet community, we've defined what should be written and backward-compatibility rules for existing data written by parquet-avro and parquet-thrift in PARQUET-113. We need to implement those rules in the Hive Converter classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8909) Hive doesn't correctly read Parquet nested types
[ https://issues.apache.org/jira/browse/HIVE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218286#comment-14218286 ] Ryan Blue commented on HIVE-8909: - Yes. It implements the rules for reading lists in existing data:
1. If the repeated field is not a group, then its type is the element type and elements are required.
2. If the repeated field is a group with multiple fields, then its type is the element type and elements are required.
3. If the repeated field is a group with one field and is named either array or uses the LIST-annotated group's name with _tuple appended, then the repeated type is the element type and elements are required.
4. Otherwise, the repeated field's type is the element type with the repeated field's repetition.
It also structures the converters to match the other projects. LIST and MAP will use ElementConverter and KeyValueConverter, and the list version supports these rules while matching the ArrayWritable structure expected by the SerDe (confirmed by tests that pass in both trunk and this patch). Repeated groups that aren't annotated are deserialized into lists as before, but I changed this to put less work on the DataWritableGroupConverter that is now called StructConverter. Struct needs to support repeated inner groups, but rather than keeping a second array of objects, it passes its start() and end() calls to the repeated children converters, which use them to add the correct object to the struct. It's an easier-to-follow method that produces the same result. (By all means, please verify this!) Hive doesn't correctly read Parquet nested types Key: HIVE-8909 URL: https://issues.apache.org/jira/browse/HIVE-8909 Project: Hive Issue Type: Bug Reporter: Ryan Blue Assignee: Ryan Blue Attachments: HIVE-8909-1.patch Parquet's Avro and Thrift object models don't produce the same parquet type representation for lists and maps that Hive does. 
In the Parquet community, we've defined what should be written and backward-compatibility rules for existing data written by parquet-avro and parquet-thrift in PARQUET-113. We need to implement those rules in the Hive Converter classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
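The four backward-compatibility rules Ryan lists above reduce to a small decision function. The sketch below is illustrative only: it works on hypothetical booleans and names rather than the real parquet Type objects the Hive converters inspect, but the branch order matches the quoted rules.

```java
// Self-contained sketch of the PARQUET-113 list backward-compat rules quoted
// above: given properties of a LIST's repeated field, decide which rule
// applies (and therefore where the element type lives).
public class ListRuleSketch {
    /**
     * Returns which rule (1-4) applies to the repeated field.
     * Rules 1-3: the repeated field itself is the element type;
     * rule 4: the repeated group's single child is the element type.
     */
    static int rule(boolean repeatedIsGroup, int fieldCount,
                    String repeatedName, String listName) {
        if (!repeatedIsGroup) return 1;        // not a group: it IS the element
        if (fieldCount > 1) return 2;          // multi-field group is the element
        if (repeatedName.equals("array")
            || repeatedName.equals(listName + "_tuple")) {
            return 3;                          // legacy parquet-avro/thrift names
        }
        return 4;                              // otherwise the single field is it
    }

    public static void main(String[] args) {
        System.out.println(rule(false, 0, "element", "my_list"));       // 1
        System.out.println(rule(true, 2, "element", "my_list"));        // 2
        System.out.println(rule(true, 1, "array", "my_list"));          // 3
        System.out.println(rule(true, 1, "my_list_tuple", "my_list"));  // 3
        System.out.println(rule(true, 1, "list", "my_list"));           // 4
    }
}
```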
[jira] [Commented] (HIVE-7073) Implement Binary in ParquetSerDe
[ https://issues.apache.org/jira/browse/HIVE-7073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218291#comment-14218291 ] Brock Noland commented on HIVE-7073: Instead of outputting raw binary to the out file, perhaps we should call hex as defined in HIVE-2482? Implement Binary in ParquetSerDe Key: HIVE-7073 URL: https://issues.apache.org/jira/browse/HIVE-7073 Project: Hive Issue Type: Sub-task Reporter: David Chen Assignee: Ferdinand Xu Attachments: HIVE-7073.1.patch, HIVE-7073.patch The ParquetSerDe currently does not support the BINARY data type. This ticket is to implement the BINARY data type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8359) Map containing null values are not correctly written in Parquet files
[ https://issues.apache.org/jira/browse/HIVE-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8359: --- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Thank you so much Sergio, Ryan and Mickael!! I have committed this contribution to trunk! Map containing null values are not correctly written in Parquet files - Key: HIVE-8359 URL: https://issues.apache.org/jira/browse/HIVE-8359 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.13.1 Reporter: Frédéric TERRAZZONI Assignee: Sergio Peña Fix For: 0.15.0 Attachments: HIVE-8359.1.patch, HIVE-8359.2.patch, HIVE-8359.4.patch, HIVE-8359.5.patch, map_null_val.avro Tried to write a map<string,string> column in a Parquet file. The table should contain: {code} {key3:val3,key4:null} {key3:val3,key4:null} {key1:null,key2:val2} {key3:val3,key4:null} {key3:val3,key4:null} {code} ... and when you run a query like {code}SELECT * from mytable{code} we can see that the table is corrupted: {code} {key3:val3} {key4:val3} {key3:val2} {key4:val3} {key1:val3} {code} I've not been able to read the Parquet file in our software afterwards, and consequently I suspect it to be corrupted. For those who are interested, I generated this Parquet table from an Avro file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8909) Hive doesn't correctly read Parquet nested types
[ https://issues.apache.org/jira/browse/HIVE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218323#comment-14218323 ] Brock Noland commented on HIVE-8909: FYI - I think this patch will need a rebase post HIVE-6914. Additionally once ready, please click {{Submit Patch}} to have the patch tested. Hive doesn't correctly read Parquet nested types Key: HIVE-8909 URL: https://issues.apache.org/jira/browse/HIVE-8909 Project: Hive Issue Type: Bug Reporter: Ryan Blue Assignee: Ryan Blue Attachments: HIVE-8909-1.patch Parquet's Avro and Thrift object models don't produce the same parquet type representation for lists and maps that Hive does. In the Parquet community, we've defined what should be written and backward-compatibility rules for existing data written by parquet-avro and parquet-thrift in PARQUET-113. We need to implement those rules in the Hive Converter classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8854) Guava dependency conflict between hive driver and remote spark context[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HIVE-8854: - Status: Patch Available (was: Open) Guava dependency conflict between hive driver and remote spark context[Spark Branch] Key: HIVE-8854 URL: https://issues.apache.org/jira/browse/HIVE-8854 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Marcelo Vanzin Labels: Spark-M3 Attachments: HIVE-8854.1-spark.patch, hive-dirver-classloader-info.output Hive driver would load guava 11.0.2 from hadoop/tez, while remote spark context depends on guava 14.0.1, It should be JobMetrics deserialize failed on Hive driver side since Absent is used in Metrics, here is the hive driver log: {noformat} java.lang.IllegalAccessError: tried to access method com.google.common.base.Optional.init()V from class com.google.common.base.Absent at com.google.common.base.Absent.init(Absent.java:35) at com.google.common.base.Absent.clinit(Absent.java:33) at sun.misc.Unsafe.ensureClassInitialized(Native Method) at sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(UnsafeFieldAccessorFactory.java:43) at sun.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:140) at java.lang.reflect.Field.acquireFieldAccessor(Field.java:1057) at java.lang.reflect.Field.getFieldAccessor(Field.java:1038) at java.lang.reflect.Field.getLong(Field.java:591) at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1663) at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.init(ObjectStreamClass.java:468) at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365) at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602) at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136) at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104) at scala.util.Try$.apply(Try.scala:161) at akka.serialization.Serialization.deserialize(Serialization.scala:98) at akka.remote.serialization.MessageContainerSerializer.fromBinary(MessageContainerSerializer.scala:63) at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104) at scala.util.Try$.apply(Try.scala:161) at akka.serialization.Serialization.deserialize(Serialization.scala:98) at akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23) at akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:58) at akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:58) at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:76) at 
akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at
[jira] [Updated] (HIVE-8829) Upgrade to Thrift 0.9.2
[ https://issues.apache.org/jira/browse/HIVE-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8829: --- Resolution: Fixed Status: Resolved (was: Patch Available) Thank you Prasad for the patch and Vaibhav for the report! I have committed this to trunk! Upgrade to Thrift 0.9.2 --- Key: HIVE-8829 URL: https://issues.apache.org/jira/browse/HIVE-8829 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Vaibhav Gumashta Assignee: Prasad Mujumdar Labels: HiveServer2, metastore Fix For: 0.15.0 Attachments: HIVE-8829.1.patch, HIVE-8829.1.patch Apache Thrift 0.9.2 was released recently (https://thrift.apache.org/download). It has a fix for THRIFT-2660, a bug that can cause HS2 (tcp mode) and Metastore processes to go OOM on getting a non-thrift request when they use the SASL transport. The reason ([thrift code|https://github.com/apache/thrift/blob/0.9.x/lib/java/src/org/apache/thrift/transport/TSaslTransport.java#L177]):
{code}
protected SaslResponse receiveSaslMessage() throws TTransportException {
  underlyingTransport.readAll(messageHeader, 0, messageHeader.length);
  byte statusByte = messageHeader[0];
  byte[] payload = new byte[EncodingUtils.decodeBigEndian(messageHeader, STATUS_BYTES)];
  underlyingTransport.readAll(payload, 0, payload.length);
  NegotiationStatus status = NegotiationStatus.byValue(statusByte);
  if (status == null) {
    sendAndThrowMessage(NegotiationStatus.ERROR, "Invalid status " + statusByte);
  } else if (status == NegotiationStatus.BAD || status == NegotiationStatus.ERROR) {
    try {
      String remoteMessage = new String(payload, "UTF-8");
      throw new TTransportException("Peer indicated failure: " + remoteMessage);
    } catch (UnsupportedEncodingException e) {
      throw new TTransportException(e);
    }
  }
{code}
Basically, since there are no message format checks / size checks before creating the byte array, on getting a non-SASL message this creates a huge byte array from some garbage size. 
For HS2, an attempt was made to fix it here: HIVE-6468, which never went in. I think for 0.15.0 it's best to upgrade to Thrift 0.9.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
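The failure mode in the quoted Thrift code is that the four length bytes of a SASL frame are taken from whatever arrives, so a stray non-SASL request (say an HTTP "GET " line) decodes to a huge array size. The sketch below mirrors the `decodeBigEndian` arithmetic; the size cap and the `checkedPayloadLength` helper are hypothetical and only illustrate the kind of sanity check the upgrade brings, not Thrift 0.9.2's exact code.

```java
// Sketch: decode a SASL frame header (1 status byte + 4 big-endian length
// bytes) and reject implausible payload sizes before allocating.
public class SaslFrameSketch {
    static final int MAX_SASL_PAYLOAD = 64 * 1024;  // hypothetical cap

    // Same arithmetic as Thrift's EncodingUtils.decodeBigEndian(buf, off).
    static int decodeBigEndian(byte[] buf, int off) {
        return ((buf[off] & 0xff) << 24) | ((buf[off + 1] & 0xff) << 16)
             | ((buf[off + 2] & 0xff) << 8) | (buf[off + 3] & 0xff);
    }

    // Hypothetical guard: validate before new byte[len].
    static int checkedPayloadLength(byte[] header) {
        int len = decodeBigEndian(header, 1);  // byte 0 is the status byte
        if (len < 0 || len > MAX_SASL_PAYLOAD) {
            throw new IllegalStateException("implausible SASL payload size: " + len);
        }
        return len;
    }

    public static void main(String[] args) {
        // A plausible frame: status byte, then length 16.
        byte[] ok = {0x01, 0, 0, 0, 16};
        System.out.println(checkedPayloadLength(ok));  // 16

        // The first 5 bytes of an HTTP "GET /" request misread as SASL:
        // bytes 'E','T',' ','/' decode to 0x4554202F, over a gigabyte.
        byte[] http = {'G', 'E', 'T', ' ', '/'};
        try {
            checkedPayloadLength(http);
        } catch (IllegalStateException e) {
            System.out.println("rejected");  // instead of a 1 GB allocation
        }
    }
}
```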
[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join
[ https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218342#comment-14218342 ] Sergio Peña commented on HIVE-8745: --- [~leftylev] I believe you added a documentation statement for the HIVE-7373 fix; but this patch reverts the trailing zeroes fix, so you might want to revert that documentation statement as well. Joins on decimal keys return different results whether they are run as reduce join or map join -- Key: HIVE-8745 URL: https://issues.apache.org/jira/browse/HIVE-8745 Project: Hive Issue Type: Bug Components: Types Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Jason Dere Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8745.1.patch, HIVE-8745.2.patch, HIVE-8745.3.patch, join_test.q See attached .q file to reproduce. The difference seems to be whether trailing 0s are considered the same value or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8854) Guava dependency conflict between hive driver and remote spark context[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-8854: Attachment: HIVE-8854.1-spark.patch Not sure why precommit test not picking up, attaching again. Guava dependency conflict between hive driver and remote spark context[Spark Branch] Key: HIVE-8854 URL: https://issues.apache.org/jira/browse/HIVE-8854 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Marcelo Vanzin Labels: Spark-M3 Attachments: HIVE-8854.1-spark.patch, HIVE-8854.1-spark.patch, hive-dirver-classloader-info.output Hive driver would load guava 11.0.2 from hadoop/tez, while remote spark context depends on guava 14.0.1, It should be JobMetrics deserialize failed on Hive driver side since Absent is used in Metrics, here is the hive driver log: {noformat} java.lang.IllegalAccessError: tried to access method com.google.common.base.Optional.init()V from class com.google.common.base.Absent at com.google.common.base.Absent.init(Absent.java:35) at com.google.common.base.Absent.clinit(Absent.java:33) at sun.misc.Unsafe.ensureClassInitialized(Native Method) at sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(UnsafeFieldAccessorFactory.java:43) at sun.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:140) at java.lang.reflect.Field.acquireFieldAccessor(Field.java:1057) at java.lang.reflect.Field.getFieldAccessor(Field.java:1038) at java.lang.reflect.Field.getLong(Field.java:591) at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1663) at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.init(ObjectStreamClass.java:468) at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365) at 
java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136) at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104) at scala.util.Try$.apply(Try.scala:161) at akka.serialization.Serialization.deserialize(Serialization.scala:98) at akka.remote.serialization.MessageContainerSerializer.fromBinary(MessageContainerSerializer.scala:63) at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104) at scala.util.Try$.apply(Try.scala:161) at akka.serialization.Serialization.deserialize(Serialization.scala:98) at akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23) at akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:58) at akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:58) at 
akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:76) at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
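A quick way to diagnose a conflict like the one described above is to ask the JVM where a given class was actually loaded from. This is a generic, illustrative probe, not part of the Hive patch; run it with the driver's classpath to see whether com.google.common.base.Optional resolves to the guava 11 jar from hadoop/tez or the guava 14 jar from spark.

```java
public class GuavaProbe {
    // Returns the jar/location a class was loaded from, or a marker when the
    // class sits on the bootstrap classpath or is missing entirely.
    static String locationOf(String className) {
        try {
            Class<?> cls = Class.forName(className);
            java.security.CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "bootstrap classpath" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        System.out.println("Optional: " + locationOf("com.google.common.base.Optional"));
        System.out.println("Absent:   " + locationOf("com.google.common.base.Absent"));
    }
}
```

Printing both classes matters here: the IllegalAccessError arises when Absent (guava 14) and Optional (guava 11) come from different jars.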
[jira] [Commented] (HIVE-8850) ObjectStore:: rollbackTransaction() should set the transaction status to TXN_STATUS.ROLLBACK irrespective of whether it is active or not
[ https://issues.apache.org/jira/browse/HIVE-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218356#comment-14218356 ] Sushanth Sowmyan commented on HIVE-8850: Yeah, I agree HIVE-8891 is different, and is necessary as well - I called you guys in here so that I have more people familiar with this looking at this as well. :) And yes, the patch provided here would still not solve the issue. It's a good first step in that it solves the issue of commitTransaction after a rollbackTransaction where the connection is invalidated in the background by something else, such as bonecp, but it does not yet solve the issue of an openTransaction after a rollbackTransaction in a nested scope. ObjectStore:: rollbackTransaction() should set the transaction status to TXN_STATUS.ROLLBACK irrespective of whether it is active or not Key: HIVE-8850 URL: https://issues.apache.org/jira/browse/HIVE-8850 Project: Hive Issue Type: Bug Components: Metastore Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-8850.1.patch We can run into issues as described below: A Hive script adds 2800 partitions to a table and during this it can hit a SQLState 08S01 [Communication Link Error], and bonecp kills all the connections in the pool. The partitions are added and a create table statement executes (Metering_IngestedData_Compressed). The map job finishes successfully, and while moving the table to the hive warehouse the ObjectStore.java commitTransaction() raises the error: commitTransaction was called but openTransactionCalls = 0. This probably indicates that there are unbalanced calls to openTransaction/commitTransaction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
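The bookkeeping at issue can be modeled with a small counter sketch. This is illustrative, not ObjectStore's actual code: once rollback() runs, the transaction status is recorded as ROLLBACK unconditionally, and a stale commit() on the empty nesting is detected instead of silently succeeding.

```java
// Illustrative model of nested openTransaction/commitTransaction/rollbackTransaction
// bookkeeping; the class and method names are invented for this sketch.
public class TxnTracker {
    enum Status { NO_STATE, OPEN, ROLLBACK }

    private int openCalls = 0;
    private Status status = Status.NO_STATE;

    public void open() { openCalls++; status = Status.OPEN; }

    // Mirrors the error quoted in the description when calls are unbalanced.
    public void commit() {
        if (openCalls <= 0) {
            throw new IllegalStateException(
                "commitTransaction was called but openTransactionCalls = 0");
        }
        openCalls--;
    }

    // The point of the issue title: record ROLLBACK even when no transaction
    // is currently active, so later callers can see the failed state.
    public void rollback() { openCalls = 0; status = Status.ROLLBACK; }

    public Status status() { return status; }
}
```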
[jira] [Assigned] (HIVE-8638) Implement bucket map join optimization [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HIVE-8638: - Assignee: Jimmy Xiang (was: Na Yang) Implement bucket map join optimization [Spark Branch] - Key: HIVE-8638 URL: https://issues.apache.org/jira/browse/HIVE-8638 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Na Yang Assignee: Jimmy Xiang In the hive-on-mr implementation, bucket map join optimization has to depend on the map join hint, while in the hive-on-tez implementation a join can be automatically converted to a bucket map join if certain conditions are met, such as: 1. the optimization flag hive.convert.join.bucket.mapjoin.tez is ON 2. all join tables are bucketed, and each small table's bucket number is divisible by the big table's bucket number 3. bucket columns == join columns In the hive-on-spark implementation, it is ideal to have bucket map join auto-conversion support: when all the required criteria are met, a join can be automatically converted to a bucket map join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
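The three conditions above can be folded into a single eligibility check. A minimal sketch with invented names, following the divisibility direction exactly as stated in the description (this is not Hive's actual planner code):

```java
public class BucketMapJoinCheck {
    // flagOn                  -- hive.convert.join.bucket.mapjoin.tez (or a spark analogue)
    // bigTableBuckets         -- bucket count of the big table
    // smallTableBuckets       -- bucket counts of the small tables
    // bucketColsEqualJoinCols -- condition 3 above
    static boolean eligible(boolean flagOn, int bigTableBuckets,
                            int[] smallTableBuckets, boolean bucketColsEqualJoinCols) {
        if (!flagOn || !bucketColsEqualJoinCols || bigTableBuckets <= 0) return false;
        for (int n : smallTableBuckets) {
            // condition 2: each small table's bucket number divisible by the big table's
            if (n <= 0 || n % bigTableBuckets != 0) return false;
        }
        return true;
    }
}
```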
[jira] [Created] (HIVE-8915) Log file explosion due to non-existence of COMPACTION_QUEUE table
Sushanth Sowmyan created HIVE-8915: -- Summary: Log file explosion due to non-existence of COMPACTION_QUEUE table Key: HIVE-8915 URL: https://issues.apache.org/jira/browse/HIVE-8915 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0, 0.15.0, 0.14.1 Reporter: Sushanth Sowmyan I hit an issue with a fresh set up of hive in a vm, where I did not have db tables as specified by hive-txn-schema-0.14.0.mysql.sql created. On metastore startup, I got an endless loop of errors being populated to the log file, which caused the log file to grow to 1.7GB in 5 minutes, with 950k copies of the same error stack trace in it before I realized what was happening and killed it. We should either have a delay of sorts to make sure we don't endlessly respin on that error so quickly, or we should error out and fail if we're not able to start. The stack trace in question is as follows: {noformat} 2014-11-19 01:44:57,654 ERROR compactor.Cleaner (Cleaner.java:run(143)) - Caught an exception in the main loop of compactor cleaner, MetaException(message:Unable to connect to transaction database com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'hive.COMPACTION_QUEUE' doesn't exist at sun.reflect.GeneratedConstructorAccessor20.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at com.mysql.jdbc.Util.handleNewInstance(Util.java:411) at com.mysql.jdbc.Util.getInstance(Util.java:386) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529) at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2619) at 
com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2569) at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1524) at com.jolbox.bonecp.StatementHandle.executeQuery(StatementHandle.java:464) at org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findReadyToClean(CompactionTxnHandler.java:266) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.run(Cleaner.java:86) ) at org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findReadyToClean(CompactionTxnHandler.java:291) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.run(Cleaner.java:86) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
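The "delay of sorts" suggested above is essentially capped exponential backoff between retries. A minimal sketch of the delay schedule (invented names, not the Cleaner's real loop):

```java
public class BackoffLoop {
    static final long INITIAL_DELAY_MS = 1000;       // first retry after 1s
    static final long MAX_DELAY_MS = 5 * 60 * 1000;  // cap at 5 minutes

    // Doubles the delay on each consecutive failure, up to the cap.
    static long nextDelayMs(long currentMs) {
        if (currentMs <= 0) return INITIAL_DELAY_MS;
        return Math.min(currentMs * 2, MAX_DELAY_MS);
    }

    // Shape of the daemon loop (doWork/log are placeholders):
    //   long delay = 0;
    //   while (running) {
    //       try { doWork(); delay = 0; }   // reset the delay after a success
    //       catch (Exception e) {
    //           log(e);                    // one log entry per failure
    //           delay = nextDelayMs(delay);
    //           Thread.sleep(delay);       // instead of respinning immediately
    //       }
    //   }
}
```

With this schedule, a persistent error like the missing COMPACTION_QUEUE table produces a handful of log entries per hour instead of hundreds of thousands.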
[jira] [Commented] (HIVE-8915) Log file explosion due to non-existence of COMPACTION_QUEUE table
[ https://issues.apache.org/jira/browse/HIVE-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218371#comment-14218371 ] Sushanth Sowmyan commented on HIVE-8915: [~alangates], you might be interested in this issue. Log file explosion due to non-existence of COMPACTION_QUEUE table - Key: HIVE-8915 URL: https://issues.apache.org/jira/browse/HIVE-8915 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0, 0.15.0, 0.14.1 Reporter: Sushanth Sowmyan I hit an issue with a fresh set up of hive in a vm, where I did not have db tables as specified by hive-txn-schema-0.14.0.mysql.sql created. On metastore startup, I got an endless loop of errors being populated to the log file, which caused the log file to grow to 1.7GB in 5 minutes, with 950k copies of the same error stack trace in it before I realized what was happening and killed it. We should either have a delay of sorts to make sure we don't endlessly respin on that error so quickly, or we should error out and fail if we're not able to start. 
The stack trace in question is as follows: {noformat} 2014-11-19 01:44:57,654 ERROR compactor.Cleaner (Cleaner.java:run(143)) - Caught an exception in the main loop of compactor cleaner, MetaException(message:Unable to connect to transaction database com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'hive.COMPACTION_QUEUE' doesn't exist at sun.reflect.GeneratedConstructorAccessor20.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at com.mysql.jdbc.Util.handleNewInstance(Util.java:411) at com.mysql.jdbc.Util.getInstance(Util.java:386) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529) at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2619) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2569) at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1524) at com.jolbox.bonecp.StatementHandle.executeQuery(StatementHandle.java:464) at org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findReadyToClean(CompactionTxnHandler.java:266) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.run(Cleaner.java:86) ) at org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findReadyToClean(CompactionTxnHandler.java:291) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.run(Cleaner.java:86) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8848) data loading from text files or text file processing doesn't handle nulls correctly
[ https://issues.apache.org/jira/browse/HIVE-8848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-8848: --- Status: Patch Available (was: Open) data loading from text files or text file processing doesn't handle nulls correctly --- Key: HIVE-8848 URL: https://issues.apache.org/jira/browse/HIVE-8848 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Attachments: HIVE-8848.patch I am not sure how nulls are supposed to be stored in text tables, but after loading some data with null or NULL strings, or x00 characters, we get a bunch of annoying log messages from LazyPrimitive saying that the data is not in INT format and was converted to null, with the data being null (the string saying null, I assume, from the code). Either the load should store them as nulls, or there should be some defined way to load nulls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
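One "defined way to load nulls" is to compare each raw field against a configured null marker before attempting type conversion; Hive's text serde conventionally uses \N for this (the serialization.null.format serde property). The sketch below illustrates that shape only; the class and method names are invented, not LazyPrimitive's actual code.

```java
public class NullAwareParser {
    private final String nullFormat; // e.g. "\\N", per serialization.null.format

    public NullAwareParser(String nullFormat) { this.nullFormat = nullFormat; }

    // Returns null for the configured null marker or unparseable input, rather
    // than emitting a conversion warning for every bad row.
    public Integer parseInt(String field) {
        if (field == null || field.equals(nullFormat)) return null;
        try {
            return Integer.valueOf(field.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```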
[jira] [Commented] (HIVE-8739) handle Derby errors with joins and filters in Direct SQL in a Derby-specific path
[ https://issues.apache.org/jira/browse/HIVE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218385#comment-14218385 ] Sergey Shelukhin commented on HIVE-8739: It also affects Oracle handle Derby errors with joins and filters in Direct SQL in a Derby-specific path - Key: HIVE-8739 URL: https://issues.apache.org/jira/browse/HIVE-8739 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.15.0 Attachments: HIVE-8739.01.patch, HIVE-8739.02.patch, HIVE-8739.patch, HIVE-8739.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8739) handle Derby and Oracle errors with joins and filters in Direct SQL in a invalid-DB-specific path
[ https://issues.apache.org/jira/browse/HIVE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-8739: --- Summary: handle Derby and Oracle errors with joins and filters in Direct SQL in a invalid-DB-specific path (was: handle Derby errors with joins and filters in Direct SQL in a Derby-specific path) handle Derby and Oracle errors with joins and filters in Direct SQL in a invalid-DB-specific path - Key: HIVE-8739 URL: https://issues.apache.org/jira/browse/HIVE-8739 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.15.0 Attachments: HIVE-8739.01.patch, HIVE-8739.02.patch, HIVE-8739.patch, HIVE-8739.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8876) incorrect upgrade script for Oracle (13-14)
[ https://issues.apache.org/jira/browse/HIVE-8876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-8876: --- Fix Version/s: 0.14.1 incorrect upgrade script for Oracle (13-14) Key: HIVE-8876 URL: https://issues.apache.org/jira/browse/HIVE-8876 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Critical Fix For: 0.15.0, 0.14.1 Attachments: HIVE-8876.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8876) incorrect upgrade script for Oracle (13-14)
[ https://issues.apache.org/jira/browse/HIVE-8876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218393#comment-14218393 ] Sergey Shelukhin commented on HIVE-8876: committed to 14 incorrect upgrade script for Oracle (13-14) Key: HIVE-8876 URL: https://issues.apache.org/jira/browse/HIVE-8876 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Critical Fix For: 0.15.0, 0.14.1 Attachments: HIVE-8876.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-8915) Log file explosion due to non-existence of COMPACTION_QUEUE table
[ https://issues.apache.org/jira/browse/HIVE-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned HIVE-8915: Assignee: Alan Gates Log file explosion due to non-existence of COMPACTION_QUEUE table - Key: HIVE-8915 URL: https://issues.apache.org/jira/browse/HIVE-8915 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0, 0.15.0, 0.14.1 Reporter: Sushanth Sowmyan Assignee: Alan Gates I hit an issue with a fresh set up of hive in a vm, where I did not have db tables as specified by hive-txn-schema-0.14.0.mysql.sql created. On metastore startup, I got an endless loop of errors being populated to the log file, which caused the log file to grow to 1.7GB in 5 minutes, with 950k copies of the same error stack trace in it before I realized what was happening and killed it. We should either have a delay of sorts to make sure we don't endlessly respin on that error so quickly, or we should error out and fail if we're not able to start. 
The stack trace in question is as follows: {noformat} 2014-11-19 01:44:57,654 ERROR compactor.Cleaner (Cleaner.java:run(143)) - Caught an exception in the main loop of compactor cleaner, MetaException(message:Unable to connect to transaction database com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'hive.COMPACTION_QUEUE' doesn't exist at sun.reflect.GeneratedConstructorAccessor20.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at com.mysql.jdbc.Util.handleNewInstance(Util.java:411) at com.mysql.jdbc.Util.getInstance(Util.java:386) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529) at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2619) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2569) at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1524) at com.jolbox.bonecp.StatementHandle.executeQuery(StatementHandle.java:464) at org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findReadyToClean(CompactionTxnHandler.java:266) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.run(Cleaner.java:86) ) at org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findReadyToClean(CompactionTxnHandler.java:291) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.run(Cleaner.java:86) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Hive-0.14 - Build # 736 - Still Failing
Changes for Build #696 [rohini] PIG-4186: Fix e2e run against new build of pig and some enhancements (rohini) Changes for Build #697 Changes for Build #698 Changes for Build #699 Changes for Build #700 Changes for Build #701 Changes for Build #702 Changes for Build #703 [daijy] HIVE-8484: HCatalog throws an exception if Pig job is of type 'fetch' (Lorand Bendig via Daniel Dai) Changes for Build #704 [gunther] HIVE-8781: Nullsafe joins are busted on Tez (Gunther Hagleitner, reviewed by Prasanth J) Changes for Build #705 [gunther] HIVE-8760: Pass a copy of HiveConf to hooks (Gunther Hagleitner, reviewed by Gopal V) Changes for Build #706 [thejas] HIVE-8772 : zookeeper info logs are always printed from beeline with service discovery mode (Thejas Nair, reviewed by Vaibhav Gumashta) Changes for Build #707 [gunther] HIVE-8782: HBase handler doesn't compile with hadoop-1 (Jimmy Xiang, reviewed by Xuefu and Sergey) Changes for Build #708 Changes for Build #709 [thejas] HIVE-8785 : HiveServer2 LogDivertAppender should be more selective for beeline getLogs (Thejas Nair, reviewed by Gopal V) Changes for Build #710 [vgumashta] HIVE-8764: Windows: HiveServer2 TCP SSL cannot recognize localhost (Vaibhav Gumashta reviewed by Thejas Nair) Changes for Build #711 [gunther] HIVE-8768: CBO: Fix filter selectivity for 'in clause' '' (Laljo John Pullokkaran via Gunther Hagleitner) Changes for Build #712 [gunther] HIVE-8794: Hive on Tez leaks AMs when killed before first dag is run (Gunther Hagleitner, reviewed by Gopal V) Changes for Build #713 [gunther] HIVE-8798: Some Oracle deadlocks not being caught in TxnHandler (Alan Gates via Gunther Hagleitner) Changes for Build #714 [gunther] HIVE-8800: Update release notes and notice for hive .14 (Gunther Hagleitner, reviewed by Prasanth J) [gunther] HIVE-8799: boatload of missing apache headers (Gunther Hagleitner, reviewed by Thejas M Nair) Changes for Build #715 [gunther] Preparing for release 0.14.0 Changes for Build #716 [gunther] 
Preparing for release 0.14.0 [gunther] Preparing for release 0.14.0 Changes for Build #717 Changes for Build #718 Changes for Build #719 Changes for Build #720 [gunther] HIVE-8811: Dynamic partition pruning can result in NPE during query compilation (Gunther Hagleitner, reviewed by Gopal V) Changes for Build #721 [gunther] HIVE-8805: CBO skipped due to SemanticException: Line 0:-1 Both left and right aliases encountered in JOIN 'avg_cs_ext_discount_amt' (Laljo John Pullokkaran via Gunther Hagleitner) [sershe] HIVE-8715 : Hive 14 upgrade scripts can fail for statistics if database was created using auto-create ADDENDUM (Sergey Shelukhin, reviewed by Ashutosh Chauhan and Gunther Hagleitner) Changes for Build #722 Changes for Build #723 Changes for Build #724 [gunther] HIVE-8845: Switch to Tez 0.5.2 (Gunther Hagleitner, reviewed by Gopal V) Changes for Build #725 [sershe] HIVE-8295 : Add batch retrieve partition objects for metastore direct sql (Selina Zhang and Sergey Shelukhin, reviewed by Ashutosh Chauhan) Changes for Build #726 Changes for Build #727 [gunther] HIVE-8873: Switch to calcite 0.9.2 (Gunther Hagleitner, reviewed by Gopal V) Changes for Build #728 [thejas] HIVE-8830 : hcatalog process don't exit because of non daemon thread (Thejas Nair, reviewed by Eugene Koifman, Sushanth Sowmyan) Changes for Build #729 Changes for Build #730 Changes for Build #731 Changes for Build #732 Changes for Build #733 Changes for Build #734 Changes for Build #735 Changes for Build #736 [sershe] HIVE-8876 : incorrect upgrade script for Oracle (13-14) (Sergey Shelukhin, reviewed by Ashutosh Chauhan) No tests ran. The Apache Jenkins build system has built Hive-0.14 (build #736) Status: Still Failing Check console output at https://builds.apache.org/job/Hive-0.14/736/ to view the results.
[jira] [Created] (HIVE-8916) Handle user@domain username under LDAP authentication
Mohit Sabharwal created HIVE-8916: - Summary: Handle user@domain username under LDAP authentication Key: HIVE-8916 URL: https://issues.apache.org/jira/browse/HIVE-8916 Project: Hive Issue Type: Bug Components: Authentication Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal If LDAP is configured with multiple domains for authentication, users can be in different domains. Currently, LdapAuthenticationProviderImpl blindly appends the domain configured in hive.server2.authentication.ldap.Domain to the username, which limits users to that domain. However, under multi-domain authentication, the username may already include the domain (ex: u...@domain.foo.com). We should not append a domain if one is already present. Also, if the username already includes the domain, the rest of Hive and the authorization providers still expect the short name (user and not u...@domain.foo.com) for looking up privilege rules, etc. As such, any domain info in the username should be stripped off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
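The two behaviors requested above — append a domain only when none is present, and strip the domain to recover the short name — can be sketched as a pair of helpers. The class and method names are illustrative, not the actual LdapAuthenticationProviderImpl code.

```java
public class LdapNames {
    // Append the configured domain only when the login name has no domain part.
    static String qualify(String user, String configuredDomain) {
        if (user.contains("@") || configuredDomain == null || configuredDomain.isEmpty()) {
            return user;
        }
        return user + "@" + configuredDomain;
    }

    // Short name used for privilege-rule lookups: everything before the '@'.
    static String shortName(String user) {
        int at = user.indexOf('@');
        return at < 0 ? user : user.substring(0, at);
    }
}
```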
[jira] [Commented] (HIVE-8854) Guava dependency conflict between hive driver and remote spark context[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218453#comment-14218453 ] Hive QA commented on HIVE-8854: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12681608/HIVE-8854.1-spark.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 7181 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/393/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/393/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-393/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12681608 - PreCommit-HIVE-SPARK-Build Guava dependency conflict between hive driver and remote spark context[Spark Branch] Key: HIVE-8854 URL: https://issues.apache.org/jira/browse/HIVE-8854 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Marcelo Vanzin Labels: Spark-M3 Attachments: HIVE-8854.1-spark.patch, HIVE-8854.1-spark.patch, hive-dirver-classloader-info.output Hive driver would load guava 11.0.2 from hadoop/tez, while remote spark context depends on guava 14.0.1, It should be JobMetrics deserialize failed on Hive driver side since Absent is used in Metrics, here is the hive driver log: {noformat} java.lang.IllegalAccessError: tried to access method com.google.common.base.Optional.init()V from class com.google.common.base.Absent at com.google.common.base.Absent.init(Absent.java:35) at com.google.common.base.Absent.clinit(Absent.java:33) at sun.misc.Unsafe.ensureClassInitialized(Native Method) at sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(UnsafeFieldAccessorFactory.java:43) at sun.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:140) at java.lang.reflect.Field.acquireFieldAccessor(Field.java:1057) at java.lang.reflect.Field.getFieldAccessor(Field.java:1038) at java.lang.reflect.Field.getLong(Field.java:591) at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1663) at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.init(ObjectStreamClass.java:468) at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365) at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136) at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
Review Request 28255: HIVE-8916 : Handle user@domain username under LDAP authentication
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28255/ --- Review request for hive. Bugs: HIVE-8916 https://issues.apache.org/jira/browse/HIVE-8916 Repository: hive-git Description --- HIVE-8916 : Handle user@domain username under LDAP authentication If LDAP is configured with multiple domains for authentication, users can be in different domains. Currently, LdapAuthenticationProviderImpl blindly appends the domain configured in hive.server2.authentication.ldap.Domain to the username, which limits users to that domain. However, under multi-domain authentication, the username may already include the domain (ex: u...@domain.foo.com). We should not append a domain if one is already present. Also, if the username already includes the domain, the rest of Hive and the authorization providers still expect the short name (user and not u...@domain.foo.com) for looking up privilege rules, etc. As such, any domain info in the username should be stripped off. Diffs - service/src/java/org/apache/hive/service/ServiceUtils.java PRE-CREATION service/src/java/org/apache/hive/service/auth/LdapAuthenticationProviderImpl.java d075761d079f8a18d7d317483783fe3b801e00d5 service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 3a8ae70d8bd31c9958ea6ae00a2d01c315c80615 Diff: https://reviews.apache.org/r/28255/diff/ Testing --- Configured HS2 for LDAP authentication: <property> <name>hive.server2.authentication</name> <value>LDAP</value> </property> <property> <name>hive.server2.authentication.ldap.url</name> <value>ldap://foo.ldap.server.com</value> </property> <property> <name>hive.server2.authentication.ldap.Domain</name> <value>foo.ldap.domain.com</value> </property> Ran beeline with user names with and without the ldap domain; in both cases authentication works. 
Before the change, authentication failed if domain was present in username: beeline -u jdbc:hive2://localhost:1 -n u...@foo.ldap.domain.com -p TestPassword --debug beeline -u jdbc:hive2://localhost:1 -n user -p TestPassword --debug Thanks, Mohit Sabharwal
[jira] [Updated] (HIVE-8916) Handle user@domain username under LDAP authentication
[ https://issues.apache.org/jira/browse/HIVE-8916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohit Sabharwal updated HIVE-8916: -- Attachment: HIVE-8916.patch Handle user@domain username under LDAP authentication - Key: HIVE-8916 URL: https://issues.apache.org/jira/browse/HIVE-8916 Project: Hive Issue Type: Bug Components: Authentication Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Attachments: HIVE-8916.patch If LDAP is configured with multiple domains for authentication, users can be in different domains. Currently, LdapAuthenticationProviderImpl blindly appends the domain configured hive.server2.authentication.ldap.Domain to the username, which limits user to that domain. However, under multi-domain authentication, the username may already include the domain (ex: u...@domain.foo.com). We should not append a domain if one is already present. Also, if username already includes the domain, rest of Hive and authorization providers still expects the short name (user and not u...@domain.foo.com) for looking up privilege rules, etc. As such, any domain info in the username should be stripped off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8916) Handle user@domain username under LDAP authentication
[ https://issues.apache.org/jira/browse/HIVE-8916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohit Sabharwal updated HIVE-8916: -- Status: Patch Available (was: Open) Handle user@domain username under LDAP authentication - Key: HIVE-8916 URL: https://issues.apache.org/jira/browse/HIVE-8916 Project: Hive Issue Type: Bug Components: Authentication Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Attachments: HIVE-8916.patch If LDAP is configured with multiple domains for authentication, users can be in different domains. Currently, LdapAuthenticationProviderImpl blindly appends the domain configured hive.server2.authentication.ldap.Domain to the username, which limits user to that domain. However, under multi-domain authentication, the username may already include the domain (ex: u...@domain.foo.com). We should not append a domain if one is already present. Also, if username already includes the domain, rest of Hive and authorization providers still expects the short name (user and not u...@domain.foo.com) for looking up privilege rules, etc. As such, any domain info in the username should be stripped off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8854) Guava dependency conflict between hive driver and remote spark context[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218496#comment-14218496 ] Szehon Ho commented on HIVE-8854: - +1, test failures dont look related Guava dependency conflict between hive driver and remote spark context[Spark Branch] Key: HIVE-8854 URL: https://issues.apache.org/jira/browse/HIVE-8854 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Marcelo Vanzin Labels: Spark-M3 Attachments: HIVE-8854.1-spark.patch, HIVE-8854.1-spark.patch, hive-dirver-classloader-info.output Hive driver would load guava 11.0.2 from hadoop/tez, while remote spark context depends on guava 14.0.1, It should be JobMetrics deserialize failed on Hive driver side since Absent is used in Metrics, here is the hive driver log: {noformat} java.lang.IllegalAccessError: tried to access method com.google.common.base.Optional.init()V from class com.google.common.base.Absent at com.google.common.base.Absent.init(Absent.java:35) at com.google.common.base.Absent.clinit(Absent.java:33) at sun.misc.Unsafe.ensureClassInitialized(Native Method) at sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(UnsafeFieldAccessorFactory.java:43) at sun.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:140) at java.lang.reflect.Field.acquireFieldAccessor(Field.java:1057) at java.lang.reflect.Field.getFieldAccessor(Field.java:1038) at java.lang.reflect.Field.getLong(Field.java:591) at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1663) at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.init(ObjectStreamClass.java:468) at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365) at 
java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136) at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104) at scala.util.Try$.apply(Try.scala:161) at akka.serialization.Serialization.deserialize(Serialization.scala:98) at akka.remote.serialization.MessageContainerSerializer.fromBinary(MessageContainerSerializer.scala:63) at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104) at scala.util.Try$.apply(Try.scala:161) at akka.serialization.Serialization.deserialize(Serialization.scala:98) at akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23) at akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:58) at akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:58) at 
akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:76) at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at
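The IllegalAccessError in the log above is the classic symptom of two Guava versions on the same classpath (11.0.2 from hadoop/tez vs. 14.0.1 from Spark). A hedged, stdlib-only sketch of one way to diagnose such conflicts, printing which jar a class was actually loaded from (the class and method names here are illustrative, not part of any Hive patch):

```java
import java.security.CodeSource;

public class ClassOrigin {
    // Returns the jar/path a class was loaded from, or a placeholder for
    // bootstrap classes, which have no CodeSource.
    public static String originOf(Class<?> cls) {
        CodeSource cs = cls.getProtectionDomain().getCodeSource();
        return cs == null ? "bootstrap/unknown" : cs.getLocation().toString();
    }

    public static void main(String[] args) {
        // In a real Hive driver you would pass com.google.common.base.Absent
        // here to see whether it came from the guava-11 or guava-14 jar.
        System.out.println(originOf(ClassOrigin.class));
    }
}
```

Printing the origins of com.google.common.base.Optional and com.google.common.base.Absent side by side would make a version mismatch like the one in this stack trace immediately visible.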
[jira] [Commented] (HIVE-8893) Implement whitelist for builtin UDFs to avoid untrusted code execution in multiuser mode
[ https://issues.apache.org/jira/browse/HIVE-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218512#comment-14218512 ] Szehon Ho commented on HIVE-8893: - Hi Prasad, sorry about that, I was looking at the patch again and the whitespace is still there on the latest patch; I didn't notice it. Also, I took a look and the check for an empty black and white list in setupBlockedUdfs() is inconsistent. While we are changing this, can we also use the Guava splitter with the omitEmptyStrings() option for this situation, so the logic is cleaner? Again, sorry I didn't look that closely before. Implement whitelist for builtin UDFs to avoid untrusted code execution in multiuser mode --- Key: HIVE-8893 URL: https://issues.apache.org/jira/browse/HIVE-8893 Project: Hive Issue Type: Bug Components: Authorization, HiveServer2, SQL Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.15.0 Attachments: HIVE-8893.3.patch, HIVE-8893.4.patch, HIVE-8893.5.patch UDFs like reflect() or java_method() enable executing a Java method as a UDF. While this offers a lot of flexibility in standalone mode, it can become a security loophole in a secure multiuser environment. For example, in HiveServer2 one can execute any available Java code with user hive's credentials. We need a whitelist and blacklist to restrict builtin UDFs in HiveServer2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
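The review suggestion above is to parse the comma-separated white/blacklist with Guava's Splitter (trimming whitespace and omitting empty strings) so both lists are handled uniformly. A stdlib-only sketch of that behavior, assuming a hypothetical helper name (parseUdfList is illustrative, not from the patch):

```java
import java.util.ArrayList;
import java.util.List;

public class UdfListParser {
    // Splits a comma-separated UDF list, trimming whitespace and dropping
    // empty entries, mirroring Splitter.on(",").trimResults().omitEmptyStrings().
    public static List<String> parseUdfList(String raw) {
        List<String> result = new ArrayList<>();
        if (raw == null) {
            return result;
        }
        for (String part : raw.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }
}
```

With this, an empty property value and a ragged list like "reflect,, java_method " both yield consistent results, which is the cleanup the review asks for.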
Re: Review Request 28255: HIVE-8916 : Handle user@domain username under LDAP authentication
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28255/#review62235 --- Hi Mohit, looks great, just one suggestion on rb for your consideration. service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java https://reviews.apache.org/r/28255/#comment104245 Will it be simpler to use a regex like [^\@]+ to find this? - Szehon Ho On Nov. 19, 2014, 8:49 p.m., Mohit Sabharwal wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28255/ --- (Updated Nov. 19, 2014, 8:49 p.m.) Review request for hive. Bugs: HIVE-8916 https://issues.apache.org/jira/browse/HIVE-8916 Repository: hive-git Description --- HIVE-8916 : Handle user@domain username under LDAP authentication If LDAP is configured with multiple domains for authentication, users can be in different domains. Currently, LdapAuthenticationProviderImpl blindly appends the domain configured hive.server2.authentication.ldap.Domain to the username, which limits user to that domain. However, under multi-domain authentication, the username may already include the domain (ex: u...@domain.foo.com). We should not append a domain if one is already present. Also, if username already includes the domain, rest of Hive and authorization providers still expects the short name (user and not u...@domain.foo.com) for looking up privilege rules, etc. As such, any domain info in the username should be stripped off.
Diffs - service/src/java/org/apache/hive/service/ServiceUtils.java PRE-CREATION service/src/java/org/apache/hive/service/auth/LdapAuthenticationProviderImpl.java d075761d079f8a18d7d317483783fe3b801e00d5 service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 3a8ae70d8bd31c9958ea6ae00a2d01c315c80615 Diff: https://reviews.apache.org/r/28255/diff/ Testing --- Configured HS2 for LDAP authentication:

<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://foo.ldap.server.com</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.Domain</name>
  <value>foo.ldap.domain.com</value>
</property>

Ran beeline with user names with and without ldap domain, in both cases authentication works. Before the change, authentication failed if domain was present in username: beeline -u jdbc:hive2://localhost:1 -n u...@foo.ldap.domain.com -p TestPassword --debug beeline -u jdbc:hive2://localhost:1 -n user -p TestPassword --debug Thanks, Mohit Sabharwal
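The behavior this review request describes (append the configured domain only when the username has none, and strip any domain before privilege lookups) can be sketched in plain Java. The class and method names below are illustrative, not the actual ServiceUtils or LdapAuthenticationProviderImpl code:

```java
public class LdapNameUtils {
    // Append the configured domain only if the username does not already
    // contain one, so multi-domain users keep their own domain.
    public static String withDomain(String user, String configuredDomain) {
        if (user.indexOf('@') >= 0 || configuredDomain == null || configuredDomain.isEmpty()) {
            return user;
        }
        return user + "@" + configuredDomain;
    }

    // Strip a trailing @domain so the rest of Hive and the authorization
    // providers see the short name when looking up privilege rules.
    public static String shortName(String user) {
        int at = user.indexOf('@');
        return at < 0 ? user : user.substring(0, at);
    }
}
```

shortName is the indexOf-based equivalent of matching the leading [^\@]+ suggested in the review comment; either form yields the short name.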
[jira] [Commented] (HIVE-8854) Guava dependency conflict between hive driver and remote spark context[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218532#comment-14218532 ] Hive QA commented on HIVE-8854: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12682460/HIVE-8854.1-spark.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 7181 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/394/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/394/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-394/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12682460 - PreCommit-HIVE-SPARK-Build Guava dependency conflict between hive driver and remote spark context[Spark Branch] Key: HIVE-8854 URL: https://issues.apache.org/jira/browse/HIVE-8854 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Marcelo Vanzin Labels: Spark-M3 Attachments: HIVE-8854.1-spark.patch, HIVE-8854.1-spark.patch, hive-dirver-classloader-info.output Hive driver would load guava 11.0.2 from hadoop/tez, while remote spark context depends on guava 14.0.1, It should be JobMetrics deserialize failed on Hive driver side since Absent is used in Metrics, here is the hive driver log: {noformat} java.lang.IllegalAccessError: tried to access method com.google.common.base.Optional.init()V from class com.google.common.base.Absent at com.google.common.base.Absent.init(Absent.java:35) at com.google.common.base.Absent.clinit(Absent.java:33) at sun.misc.Unsafe.ensureClassInitialized(Native Method) at sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(UnsafeFieldAccessorFactory.java:43) at sun.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:140) at java.lang.reflect.Field.acquireFieldAccessor(Field.java:1057) at java.lang.reflect.Field.getFieldAccessor(Field.java:1038) at java.lang.reflect.Field.getLong(Field.java:591) at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1663) at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.init(ObjectStreamClass.java:468) at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365) at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136) at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
[jira] [Commented] (HIVE-8916) Handle user@domain username under LDAP authentication
[ https://issues.apache.org/jira/browse/HIVE-8916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218529#comment-14218529 ] Szehon Ho commented on HIVE-8916: - Thanks for this fix! Left a comment on the review-board. Handle user@domain username under LDAP authentication - Key: HIVE-8916 URL: https://issues.apache.org/jira/browse/HIVE-8916 Project: Hive Issue Type: Bug Components: Authentication Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Attachments: HIVE-8916.patch If LDAP is configured with multiple domains for authentication, users can be in different domains. Currently, LdapAuthenticationProviderImpl blindly appends the domain configured hive.server2.authentication.ldap.Domain to the username, which limits user to that domain. However, under multi-domain authentication, the username may already include the domain (ex: u...@domain.foo.com). We should not append a domain if one is already present. Also, if username already includes the domain, rest of Hive and authorization providers still expects the short name (user and not u...@domain.foo.com) for looking up privilege rules, etc. As such, any domain info in the username should be stripped off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8863) Cannot drop table with uppercase name after compute statistics for columns
[ https://issues.apache.org/jira/browse/HIVE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-8863: -- Status: Patch Available (was: Open) Cannot drop table with uppercase name after compute statistics for columns Key: HIVE-8863 URL: https://issues.apache.org/jira/browse/HIVE-8863 Project: Hive Issue Type: Bug Components: Metastore Reporter: Juan Yu Assignee: Chaoyu Tang Attachments: HIVE-8863.patch Create a table with uppercase name Test, run analyze table Test compute statistics for columns col1 After this, you cannot drop the table by drop table Test; Got error: NestedThrowablesStackTrace: java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails (hive2.TAB_COL_STATS, CONSTRAINT TAB_COL_STATS_FK FOREIGN KEY (TBL_ID) REFERENCES TBLS (TBL_ID)) workaround is to use lowercase table name drop table test; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-8863) Cannot drop table with uppercase name after compute statistics for columns
[ https://issues.apache.org/jira/browse/HIVE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang reassigned HIVE-8863: - Assignee: Chaoyu Tang Cannot drop table with uppercase name after compute statistics for columns Key: HIVE-8863 URL: https://issues.apache.org/jira/browse/HIVE-8863 Project: Hive Issue Type: Bug Components: Metastore Reporter: Juan Yu Assignee: Chaoyu Tang Attachments: HIVE-8863.patch Create a table with uppercase name Test, run analyze table Test compute statistics for columns col1 After this, you cannot drop the table by drop table Test; Got error: NestedThrowablesStackTrace: java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails (hive2.TAB_COL_STATS, CONSTRAINT TAB_COL_STATS_FK FOREIGN KEY (TBL_ID) REFERENCES TBLS (TBL_ID)) workaround is to use lowercase table name drop table test; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8863) Cannot drop table with uppercase name after compute statistics for columns
[ https://issues.apache.org/jira/browse/HIVE-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-8863: -- Attachment: HIVE-8863.patch This case-sensitivity issue occurs not only with tables but also with databases. For example, drop table TESTDB.test will also fail after computing statistics. Uploaded a patch with tests. Cannot drop table with uppercase name after compute statistics for columns Key: HIVE-8863 URL: https://issues.apache.org/jira/browse/HIVE-8863 Project: Hive Issue Type: Bug Components: Metastore Reporter: Juan Yu Attachments: HIVE-8863.patch Create a table with uppercase name Test, run analyze table Test compute statistics for columns col1 After this, you cannot drop the table by drop table Test; Got error: NestedThrowablesStackTrace: java.sql.BatchUpdateException: Cannot delete or update a parent row: a foreign key constraint fails (hive2.TAB_COL_STATS, CONSTRAINT TAB_COL_STATS_FK FOREIGN KEY (TBL_ID) REFERENCES TBLS (TBL_ID)) workaround is to use lowercase table name drop table test; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
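The foreign-key failure above happens because the column-statistics row was stored under a different identifier case than the row the drop tries to delete. A minimal sketch of the usual remedy for this class of bug, normalizing identifiers before any metastore comparison; the helper name is hypothetical and not necessarily what HIVE-8863.patch does:

```java
public class IdentifierUtils {
    // Hive identifiers are case-insensitive; normalize to lowercase before
    // any metastore lookup so "Test" and "test" resolve to the same row.
    public static String normalizeIdentifier(String name) {
        return name == null ? null : name.trim().toLowerCase();
    }
}
```

Normalizing both the database and table name this way at every write and lookup keeps TAB_COL_STATS and TBLS consistent regardless of the case the user typed.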
[jira] [Updated] (HIVE-8854) Guava dependency conflict between hive driver and remote spark context[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-8854: Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Committed to spark branch. Thanks Marcelo! Guava dependency conflict between hive driver and remote spark context[Spark Branch] Key: HIVE-8854 URL: https://issues.apache.org/jira/browse/HIVE-8854 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Marcelo Vanzin Labels: Spark-M3 Fix For: spark-branch Attachments: HIVE-8854.1-spark.patch, HIVE-8854.1-spark.patch, hive-dirver-classloader-info.output Hive driver would load guava 11.0.2 from hadoop/tez, while remote spark context depends on guava 14.0.1, It should be JobMetrics deserialize failed on Hive driver side since Absent is used in Metrics, here is the hive driver log: {noformat} java.lang.IllegalAccessError: tried to access method com.google.common.base.Optional.init()V from class com.google.common.base.Absent at com.google.common.base.Absent.init(Absent.java:35) at com.google.common.base.Absent.clinit(Absent.java:33) at sun.misc.Unsafe.ensureClassInitialized(Native Method) at sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(UnsafeFieldAccessorFactory.java:43) at sun.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:140) at java.lang.reflect.Field.acquireFieldAccessor(Field.java:1057) at java.lang.reflect.Field.getFieldAccessor(Field.java:1038) at java.lang.reflect.Field.getLong(Field.java:591) at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1663) at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.init(ObjectStreamClass.java:468) at 
java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365) at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136) at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104) at scala.util.Try$.apply(Try.scala:161) at akka.serialization.Serialization.deserialize(Serialization.scala:98) at akka.remote.serialization.MessageContainerSerializer.fromBinary(MessageContainerSerializer.scala:63) at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104) at scala.util.Try$.apply(Try.scala:161) at akka.serialization.Serialization.deserialize(Serialization.scala:98) at akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23) at akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:58) at 
akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:58) at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:76) at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at
[jira] [Commented] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez
[ https://issues.apache.org/jira/browse/HIVE-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218546#comment-14218546 ] Gunther Hagleitner commented on HIVE-8888: -- Don't think the test failures are related. [~prasanth_j] thoughts? I'm +1 on the last patch. Mapjoin with LateralViewJoin generates wrong plan in Tez Key: HIVE-8888 URL: https://issues.apache.org/jira/browse/HIVE-8888 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-8888.1.patch, HIVE-8888.2.patch, HIVE-8888.3.patch, HIVE-8888.4.patch Queries like these {code} with sub1 as (select aid, avalue from expod1 lateral view explode(av) avs as avalue ), sub2 as (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue) select sub1.aid, sub1.avalue, sub2.bvalue from sub1,sub2 where sub1.aid=sub2.bid; {code} generates twice the number of rows in Tez when compared to MR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez
[ https://issues.apache.org/jira/browse/HIVE-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218549#comment-14218549 ] Prasanth J commented on HIVE-8888: -- [~hagleitn] Even I don't think the test failure is related. The code changes should not affect TestCliDriver tests. I ran the test locally and it ran successfully. Also can we have this for 0.14.1? Mapjoin with LateralViewJoin generates wrong plan in Tez Key: HIVE-8888 URL: https://issues.apache.org/jira/browse/HIVE-8888 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-8888.1.patch, HIVE-8888.2.patch, HIVE-8888.3.patch, HIVE-8888.4.patch Queries like these {code} with sub1 as (select aid, avalue from expod1 lateral view explode(av) avs as avalue ), sub2 as (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue) select sub1.aid, sub1.avalue, sub2.bvalue from sub1,sub2 where sub1.aid=sub2.bid; {code} generates twice the number of rows in Tez when compared to MR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7948) Add an E2E test to verify fix for HIVE-7155
[ https://issues.apache.org/jira/browse/HIVE-7948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218560#comment-14218560 ] Eugene Koifman commented on HIVE-7948: -- there is currently only 1 webhcat-site.xml. This patch modifies it to set the templeton.mapper.memory.mb such that every (at least most) job will fail. So basically I think this patch will break a lot of other tests. Add an E2E test to verify fix for HIVE-7155 Key: HIVE-7948 URL: https://issues.apache.org/jira/browse/HIVE-7948 Project: Hive Issue Type: Test Components: Tests, WebHCat Reporter: Aswathy Chellammal Sreekumar Assignee: Aswathy Chellammal Sreekumar Priority: Minor Attachments: HIVE-7948.1.patch, HIVE-7948.patch E2E Test to verify webhcat property templeton.mapper.memory.mb correctly overrides mapreduce.map.memory.mb. The feature was added as part of HIVE-7155. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8917) HIVE-5679 adds two thread safety problems
[ https://issues.apache.org/jira/browse/HIVE-8917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218567#comment-14218567 ] Brock Noland commented on HIVE-8917: FYI [~sershe] HIVE-5679 adds two thread safety problems - Key: HIVE-8917 URL: https://issues.apache.org/jira/browse/HIVE-8917 Project: Hive Issue Type: Bug Reporter: Brock Noland HIVE-5679 adds two static {{SimpleDateFormat}} objects and {{SimpleDateFormat}} is not thread safe. These should be converted to thread locals. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8917) HIVE-5679 adds two thread safety problems
Brock Noland created HIVE-8917: -- Summary: HIVE-5679 adds two thread safety problems Key: HIVE-8917 URL: https://issues.apache.org/jira/browse/HIVE-8917 Project: Hive Issue Type: Bug Reporter: Brock Noland HIVE-5679 adds two static {{SimpleDateFormat}} objects and {{SimpleDateFormat}} is not thread safe. These should be converted to thread locals. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
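The thread-local conversion suggested above typically looks like the following sketch; the field name and pattern are illustrative, since the actual formats added by HIVE-5679 may differ:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class SafeDateFormat {
    // SimpleDateFormat is not thread safe, so give each thread its own
    // instance instead of sharing one static object across threads.
    private static final ThreadLocal<SimpleDateFormat> DATE_FORMAT =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    public static String format(Date d) {
        return DATE_FORMAT.get().format(d);
    }
}
```

Each caller sees its own SimpleDateFormat, so concurrent format() calls can no longer corrupt each other's internal Calendar state.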
[jira] [Created] (HIVE-8918) Beeline terminal cannot be initialized due to jline2 change
Sergio Peña created HIVE-8918: - Summary: Beeline terminal cannot be initialized due to jline2 change Key: HIVE-8918 URL: https://issues.apache.org/jira/browse/HIVE-8918 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña I fetched the latest changes from trunk, and I got the following error when attempting to execute beeline: {noformat} ERROR] Terminal initialization failed; falling back to unsupported java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at jline.TerminalFactory.create(TerminalFactory.java:101) at jline.TerminalFactory.get(TerminalFactory.java:158) at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:73) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Exception in thread main java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:101) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at 
org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} I executed the following command: {noformat} hive --service beeline -u jdbc:hive2://localhost:1 -n sergio {noformat} The commit before the jline2 is working fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8918) Beeline terminal cannot be initialized due to jline2 change
[ https://issues.apache.org/jira/browse/HIVE-8918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218574#comment-14218574 ] Sergio Peña commented on HIVE-8918: --- FYI [~Ferd]. You worked on moving jline2, so you might have some ideas about what is happening. Beeline terminal cannot be initialized due to jline2 change --- Key: HIVE-8918 URL: https://issues.apache.org/jira/browse/HIVE-8918 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña I fetched the latest changes from trunk, and I got the following error when attempting to execute beeline: {noformat} ERROR] Terminal initialization failed; falling back to unsupported java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at jline.TerminalFactory.create(TerminalFactory.java:101) at jline.TerminalFactory.get(TerminalFactory.java:158) at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:73) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Exception in thread main java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:101) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} I executed the following command: {noformat} hive --service beeline -u jdbc:hive2://localhost:1 -n sergio {noformat} The commit before the jline2 is working fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-8730) schemaTool failure when date partition has non-date value
[ https://issues.apache.org/jira/browse/HIVE-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang reassigned HIVE-8730: - Assignee: Chaoyu Tang schemaTool failure when date partition has non-date value - Key: HIVE-8730 URL: https://issues.apache.org/jira/browse/HIVE-8730 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.0 Environment: CDH5.2 Reporter: Johndee Burks Assignee: Chaoyu Tang Priority: Minor If there is a non-date value in the PART_KEY_VAL column within the PARTITION_KEY_VALS table in the metastore db, this will cause the HIVE-5700 script to fail. The failure will be picked up by the schemaTool, causing the upgrade to fail. A classic example of a value that can be present without users really being aware is __HIVE_DEFAULT_PARTITION__, which is filled in by Hive automatically when doing dynamic partitioning and a value is not present in the source data for the partition column. The reason for the failure is that the upgrade script does not account for non-date values.
What it is currently:
{code}
UPDATE PARTITION_KEY_VALS
  INNER JOIN PARTITIONS ON PARTITION_KEY_VALS.PART_ID = PARTITIONS.PART_ID
  INNER JOIN PARTITION_KEYS ON PARTITION_KEYS.TBL_ID = PARTITIONS.TBL_ID
    AND PARTITION_KEYS.INTEGER_IDX = PARTITION_KEY_VALS.INTEGER_IDX
    AND PARTITION_KEYS.PKEY_TYPE = 'date'
SET PART_KEY_VAL = IFNULL(DATE_FORMAT(cast(PART_KEY_VAL as date),'%Y-%m-%d'), PART_KEY_VAL);
{code}
What it should be to avoid the issue:
{code}
UPDATE PARTITION_KEY_VALS
  INNER JOIN PARTITIONS ON PARTITION_KEY_VALS.PART_ID = PARTITIONS.PART_ID
  INNER JOIN PARTITION_KEYS ON PARTITION_KEYS.TBL_ID = PARTITIONS.TBL_ID
    AND PARTITION_KEYS.INTEGER_IDX = PARTITION_KEY_VALS.INTEGER_IDX
    AND PARTITION_KEYS.PKEY_TYPE = 'date'
    AND PART_KEY_VAL != '__HIVE_DEFAULT_PARTITION__'
SET PART_KEY_VAL = IFNULL(DATE_FORMAT(cast(PART_KEY_VAL as date),'%Y-%m-%d'), PART_KEY_VAL);
{code}
== Metastore DB
{code}
mysql> select * from PARTITION_KEY_VALS;
+---------+----------------------------+-------------+
| PART_ID | PART_KEY_VAL               | INTEGER_IDX |
+---------+----------------------------+-------------+
|     171 | 2099-12-31                 |           0 |
|     172 | __HIVE_DEFAULT_PARTITION__ |           0 |
|     184 | 2099-12-01                 |           0 |
|     185 | 2099-12-30                 |           0 |
+---------+----------------------------+-------------+
{code}
== stdout.log
{code}
0: jdbc:mysql://10.16.8.121:3306/metastore> !autocommit on
0: jdbc:mysql://10.16.8.121:3306/metastore> SELECT 'Upgrading MetaStore schema from 0.12.0 to 0.13.0' AS ' '
+---------------------------------------------------+
|                                                   |
+---------------------------------------------------+
| Upgrading MetaStore schema from 0.12.0 to 0.13.0  |
+---------------------------------------------------+
0: jdbc:mysql://10.16.8.121:3306/metastore> SELECT ' HIVE-5700 enforce single date format for partition column storage ' AS ' '
+---------------------------------------------------------------------+
|                                                                     |
+---------------------------------------------------------------------+
|  HIVE-5700 enforce single date format for partition column storage  |
+---------------------------------------------------------------------+
0: jdbc:mysql://10.16.8.121:3306/metastore> UPDATE PARTITION_KEY_VALS INNER JOIN PARTITIONS ON PARTITION_KEY_VALS.PART_ID = PARTITIONS.PART_ID INNER JOIN PARTITION_KEYS ON PARTITION_KEYS.TBL_ID = PARTITIONS.TBL_ID AND PARTITION_KEYS.INTEGER_IDX = PARTITION_KEY_VALS.INTEGER_IDX AND PARTITION_KEYS.PKEY_TYPE = 'date' SET PART_KEY_VAL = IFNULL(DATE_FORMAT(cast(PART_KEY_VAL as date),'%Y-%m-%d'),
PART_KEY_VAL) {code} == stderr.log {code} exec /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop/bin/hadoop jar /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib/hive-cli-0.13.1-cdh5.2.0.jar org.apache.hive.beeline.HiveSchemaTool -verbose -dbType mysql -upgradeSchema Connecting to jdbc:mysql://10.16.8.121:3306/metastore?useUnicode=true&characterEncoding=UTF-8 Connected to: MySQL (version 5.1.73) Driver: MySQL-AB JDBC Driver (version mysql-connector-java-5.1.17-SNAPSHOT ( Revision: ${bzr.revision-id} )) Transaction isolation: TRANSACTION_READ_COMMITTED Autocommit status: true 1 row selected (0.025 seconds) 1 row selected (0.004 seconds) Closing: 0: jdbc:mysql://10.16.8.121:3306/metastore?useUnicode=true&characterEncoding=UTF-8 org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore state would be
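The effect of the fixed upgrade statement can also be sketched outside SQL. Below is a minimal Python sketch (the helper name and logic are illustrative only; the real fix is the extra WHERE clause in the corrected script): values that parse as dates are rewritten to the canonical form, while anything else, such as the dynamic-partition sentinel, is left untouched.

```python
from datetime import datetime

HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"

def normalize_part_key_val(val):
    """Mimic the fixed HIVE-5700 update (hypothetical helper):
    only values that parse as dates are rewritten to %Y-%m-%d."""
    if val == HIVE_DEFAULT_PARTITION:
        # Corresponds to the added `PART_KEY_VAL != '__HIVE_DEFAULT_PARTITION__'`
        return val
    try:
        return datetime.strptime(val, "%Y-%m-%d").strftime("%Y-%m-%d")
    except ValueError:
        # Corresponds to the IFNULL(...) fallback: keep the original value
        return val
```

Run against the sample PARTITION_KEY_VALS rows above, every row survives unchanged instead of failing the cast on the sentinel value.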
[jira] [Updated] (HIVE-6421) abs() should preserve precision/scale of decimal input
[ https://issues.apache.org/jira/browse/HIVE-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-6421: - Attachment: HIVE-6421.2.patch Re-upload to run precommit tests. abs() should preserve precision/scale of decimal input -- Key: HIVE-6421 URL: https://issues.apache.org/jira/browse/HIVE-6421 Project: Hive Issue Type: Bug Components: UDF Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6421.1.txt, HIVE-6421.2.patch
{noformat}
hive> describe dec1;
OK
c1	decimal(10,2)	None
hive> explain select c1, abs(c1) from dec1;
...
Select Operator expressions: c1 (type: decimal(10,2)), abs(c1) (type: decimal(38,18))
{noformat}
Given that abs() is a GenericUDF, it should be possible for the return type precision/scale to match the input precision/scale. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
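The reasoning behind this request can be checked with Python's decimal module. The sketch below (a hypothetical helper, not Hive's GenericUDFAbs code) demonstrates the claim: the absolute value of a decimal(p, s) value always fits back into decimal(p, s), so the return type can mirror the input type rather than widening to decimal(38,18).

```python
from decimal import Decimal

def abs_decimal(value, precision, scale):
    """Absolute value of a decimal, verifying the result still fits
    in the input's (precision, scale) -- abs never adds digits."""
    result = abs(Decimal(value))
    t = result.as_tuple()
    # digit count and fractional digits never exceed the input type
    assert len(t.digits) <= precision and -t.exponent <= scale
    return result
```

Since negation can never increase the digit count, the assertion holds for any in-range input, which is exactly why the UDF can preserve the input type.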
[jira] [Updated] (HIVE-8909) Hive doesn't correctly read Parquet nested types
[ https://issues.apache.org/jira/browse/HIVE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue updated HIVE-8909: Attachment: HIVE-8909-2.patch Rebased patch on Sergio's changes. This didn't conflict except for the change to ArrayWritableGroupConverter, which was removed (so any change would conflict). Hive doesn't correctly read Parquet nested types Key: HIVE-8909 URL: https://issues.apache.org/jira/browse/HIVE-8909 Project: Hive Issue Type: Bug Reporter: Ryan Blue Assignee: Ryan Blue Attachments: HIVE-8909-1.patch, HIVE-8909-2.patch Parquet's Avro and Thrift object models don't produce the same parquet type representation for lists and maps that Hive does. In the Parquet community, we've defined what should be written and backward-compatibility rules for existing data written by parquet-avro and parquet-thrift in PARQUET-113. We need to implement those rules in the Hive Converter classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8909) Hive doesn't correctly read Parquet nested types
[ https://issues.apache.org/jira/browse/HIVE-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue updated HIVE-8909: Affects Version/s: 0.13.1 Status: Patch Available (was: Open) Hive doesn't correctly read Parquet nested types Key: HIVE-8909 URL: https://issues.apache.org/jira/browse/HIVE-8909 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Ryan Blue Assignee: Ryan Blue Attachments: HIVE-8909-1.patch, HIVE-8909-2.patch Parquet's Avro and Thrift object models don't produce the same parquet type representation for lists and maps that Hive does. In the Parquet community, we've defined what should be written and backward-compatibility rules for existing data written by parquet-avro and parquet-thrift in PARQUET-113. We need to implement those rules in the Hive Converter classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 28145: HIVE-8883 - Investigate test failures on auto_join30.q [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/ --- (Updated Nov. 19, 2014, 11:35 p.m.) Review request for hive, Jimmy Xiang and Szehon Ho. Bugs: HIVE-8883 https://issues.apache.org/jira/browse/HIVE-8883 Repository: hive-git Description --- This test fails with the following stack trace: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor 
task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at 
org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) ... 17 more auto_join27.q and auto_join31.q seem to fail with the same error. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java 2895d80 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 141ae6f Diff: https://reviews.apache.org/r/28145/diff/ Testing --- Tested with auto_join30.q, auto_join31.q, and auto_join27.q. They now generate correct results. Thanks, Chao Sun
[jira] [Updated] (HIVE-8883) Investigate test failures on auto_join30.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-8883: --- Attachment: HIVE-8883.3-spark.patch This patch solves two issues when a MapJoinOperator is in a ReduceWork: 1. Like in SparkMapRecordHandler, in SparkReduceRecordHandler, we also need to initialize all the dummy operators associated with the MJ operator, and close them at the end; 2. In HashTableLoader, the currentInputPath will be null, since it's only set in a MapWork. It looks hard to pass the path info between MapWork and ReduceWork. Currently, if this is the case, we just pass null to {{getBucketFileName}}, which will treat it as a non-bucket join case. This should be fine since for a bucket join the MJ operator will never be in a ReduceWork. Investigate test failures on auto_join30.q [Spark Branch] - Key: HIVE-8883 URL: https://issues.apache.org/jira/browse/HIVE-8883 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Fix For: spark-branch Attachments: HIVE-8883.1-spark.patch, HIVE-8883.2-spark.patch, HIVE-8883.3-spark.patch This test fails with the following stack trace: {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at
org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null at
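The null-path fallback described in the HIVE-8883 patch comment can be sketched as follows. This is a hypothetical illustration only (the name mirrors getBucketFileName, but this is not Hive's actual implementation): a missing input path, as happens when the MapJoin runs inside a ReduceWork, is treated as the non-bucketed case.

```python
def get_bucket_file_name(current_input_path):
    """Hypothetical sketch of the fallback: inside a ReduceWork there
    is no current input file, so a None path is handled as the
    non-bucket join case instead of raising a NullPointerException."""
    if current_input_path is None:
        # non-bucket join: the returned name is never matched against
        # bucket files, so any fixed placeholder is safe here
        return ""
    return current_input_path.rsplit("/", 1)[-1]
```

This is safe per the comment above because, for a bucket join, the MapJoin operator will never be placed in a ReduceWork.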
Re: Review Request 28145: HIVE-8883 - Investigate test failures on auto_join30.q [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/#review62285 --- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java https://reviews.apache.org/r/28145/#comment104311 We don't need this any more? - Jimmy Xiang On Nov. 19, 2014, 11:35 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/ --- (Updated Nov. 19, 2014, 11:35 p.m.) Review request for hive, Jimmy Xiang and Szehon Ho. Bugs: HIVE-8883 https://issues.apache.org/jira/browse/HIVE-8883 Repository: hive-git Description --- This test fails with the following stack trace: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at 
org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) ... 17 more
[jira] [Updated] (HIVE-8893) Implement whitelist for builtin UDFs to avoid untrusted code execution in multiuser mode
[ https://issues.apache.org/jira/browse/HIVE-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-8893: -- Attachment: HIVE-8893.6.patch Updated patch that addresses review feedback. Implement whitelist for builtin UDFs to avoid untrusted code execution in multiuser mode --- Key: HIVE-8893 URL: https://issues.apache.org/jira/browse/HIVE-8893 Project: Hive Issue Type: Bug Components: Authorization, HiveServer2, SQL Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.15.0 Attachments: HIVE-8893.3.patch, HIVE-8893.4.patch, HIVE-8893.5.patch, HIVE-8893.6.patch UDFs like reflect() or java_method() enable executing a Java method as a UDF. While this offers a lot of flexibility in the standalone mode, it can become a security loophole in a secure multiuser environment. For example, in HiveServer2 one can execute any available Java code with user hive's credentials. We need a whitelist and blacklist to restrict builtin UDFs in HiveServer2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
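The whitelist/blacklist policy the issue asks for can be sketched in a few lines. This is an assumed semantics (the function name and the empty-whitelist-means-allow-all rule are illustrative, not Hive's actual configuration behavior): a UDF is runnable only if it passes an optional whitelist and is not explicitly blacklisted.

```python
def udf_allowed(name, whitelist, blacklist):
    """Hypothetical policy check: blacklist wins over whitelist;
    an empty whitelist is treated as 'allow everything not blacklisted'."""
    name = name.lower()
    if name in blacklist:
        return False
    return not whitelist or name in whitelist
```

Under this policy, an administrator could blacklist reflect() and java_method() in HiveServer2 while leaving ordinary builtins untouched.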
[jira] [Updated] (HIVE-8850) ObjectStore:: rollbackTransaction() and getHelper class needs to be looked into further.
[ https://issues.apache.org/jira/browse/HIVE-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-8850: Summary: ObjectStore:: rollbackTransaction() and getHelper class needs to be looked into further. (was: ObjectStore:: rollbackTransaction() should set the transaction status to TXN_STATUS.ROLLBACK irrespective of whether it is active or not) ObjectStore:: rollbackTransaction() and getHelper class needs to be looked into further. Key: HIVE-8850 URL: https://issues.apache.org/jira/browse/HIVE-8850 Project: Hive Issue Type: Bug Components: Metastore Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-8850.1.patch We can run into issues as described below: A Hive script adds 2800 partitions to a table, and during this it can get a SQLState 08S01 [Communication Link Error], after which BoneCP kills all the connections in the pool. The partitions are added and a create table statement executes (Metering_IngestedData_Compressed). The map job finishes successfully, and while moving the table to the Hive warehouse, ObjectStore.java's commitTransaction() raises the error: commitTransaction was called but openTransactionCalls = 0. This probably indicates that there are unbalanced calls to openTransaction/commitTransaction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
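The error message quoted above comes from nested-transaction bookkeeping: opens and commits must balance. A minimal sketch of that counter discipline (assumed semantics, not the real ObjectStore code) shows why a connection drop that resets state mid-flight can later surface as "openTransactionCalls = 0":

```python
class TxnTracker:
    """Hypothetical sketch of openTransaction/commitTransaction
    bookkeeping: each commit must be matched by a prior open."""
    def __init__(self):
        self.open_calls = 0

    def open_transaction(self):
        self.open_calls += 1

    def commit_transaction(self):
        if self.open_calls == 0:
            raise RuntimeError(
                "commitTransaction was called but openTransactionCalls = 0")
        self.open_calls -= 1
```

If rollbackTransaction() (or a connection failure) zeroes the counter while a caller still believes a transaction is open, the next commit trips exactly this check.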
[jira] [Created] (HIVE-8919) Fix FileUtils.copy() method to call distcp only for HDFS files (not local files)
Sergio Peña created HIVE-8919: - Summary: Fix FileUtils.copy() method to call distcp only for HDFS files (not local files) Key: HIVE-8919 URL: https://issues.apache.org/jira/browse/HIVE-8919 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña When loading a big file (> 32Mb) from the local filesystem to the HDFS filesystem, Hive fails because the local filesystem cannot load the 'distcp' class. The 'distcp' class is used only by the HDFS filesystem. We should use distcp only when copying files within the HDFS filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HIVE-8919) Fix FileUtils.copy() method to call distcp only for HDFS files (not local files)
[ https://issues.apache.org/jira/browse/HIVE-8919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-8919 started by Sergio Peña. - Fix FileUtils.copy() method to call distcp only for HDFS files (not local files) Key: HIVE-8919 URL: https://issues.apache.org/jira/browse/HIVE-8919 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña When loading a big file (> 32Mb) from the local filesystem to the HDFS filesystem, Hive fails because the local filesystem cannot load the 'distcp' class. The 'distcp' class is used only by the HDFS filesystem. We should use distcp only when copying files within the HDFS filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
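The check HIVE-8919 calls for can be sketched as a scheme test on both endpoints. This is a hedged illustration only (the helper name and the 32 MB threshold constant are assumptions for this sketch, not Hive's FileUtils API): distcp is considered only when both source and destination are HDFS.

```python
from urllib.parse import urlparse

def needs_distcp(src_uri, dst_uri, size_bytes, threshold=32 * 1024 * 1024):
    """Hypothetical decision helper: use distcp only for large copies
    where both endpoints live in HDFS; otherwise fall back to a
    plain filesystem copy."""
    both_hdfs = (urlparse(src_uri).scheme == "hdfs"
                 and urlparse(dst_uri).scheme == "hdfs")
    return both_hdfs and size_bytes > threshold
```

With this guard, a large local-to-HDFS load takes the plain copy path and never tries to instantiate the distcp class on the local filesystem.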
Re: Review Request 28145: HIVE-8883 - Investigate test failures on auto_join30.q [Spark Branch]
On Nov. 19, 2014, 11:50 p.m., Jimmy Xiang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java, line 74 https://reviews.apache.org/r/28145/diff/3/?file=770558#file770558line74 We don't need this any more? I was thinking about cleaning it and then restoring the code in the non-staged map join JIRA. But, after talking with Szehon, I decided to keep it anyway. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/#review62285 --- On Nov. 19, 2014, 11:35 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/ --- (Updated Nov. 19, 2014, 11:35 p.m.) Review request for hive, Jimmy Xiang and Szehon Ho. Bugs: HIVE-8883 https://issues.apache.org/jira/browse/HIVE-8883 Repository: hive-git Description --- This test fails with the following stack trace: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at
Re: Review Request 28145: HIVE-8883 - Investigate test failures on auto_join30.q [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/ --- (Updated Nov. 19, 2014, 11:57 p.m.) Review request for hive, Jimmy Xiang and Szehon Ho. Bugs: HIVE-8883 https://issues.apache.org/jira/browse/HIVE-8883 Repository: hive-git Description --- This test fails with the following stack trace: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor 
task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at 
org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) ... 17 more auto_join27.q and auto_join31.q seem to fail with the same error. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java 2895d80 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 141ae6f Diff: https://reviews.apache.org/r/28145/diff/ Testing --- Tested with auto_join30.q, auto_join31.q, and auto_join27.q. They now generate correct results. Thanks, Chao Sun
[jira] [Updated] (HIVE-8883) Investigate test failures on auto_join30.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-8883: --- Attachment: HIVE-8883.4-spark.patch Investigate test failures on auto_join30.q [Spark Branch] - Key: HIVE-8883 URL: https://issues.apache.org/jira/browse/HIVE-8883 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Fix For: spark-branch Attachments: HIVE-8883.1-spark.patch, HIVE-8883.2-spark.patch, HIVE-8883.3-spark.patch, HIVE-8883.4-spark.patch This test fails with the following stack trace: {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null 
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) ... 17 more {noformat} {{auto_join27.q}} and {{auto_join31.q}} seem to fail with the same error. -- This message was sent by
[jira] [Updated] (HIVE-8919) Fix FileUtils.copy() method to call distcp only for HDFS files (not local files)
[ https://issues.apache.org/jira/browse/HIVE-8919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8919: -- Status: Patch Available (was: In Progress) Fix FileUtils.copy() method to call distcp only for HDFS files (not local files) Key: HIVE-8919 URL: https://issues.apache.org/jira/browse/HIVE-8919 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: CDH-23392.1.patch When loading a big file (> 32Mb) from the local filesystem to the HDFS filesystem, Hive fails because the local filesystem cannot load the 'distcp' class. The 'distcp' class is used only by the HDFS filesystem. We should use distcp only when copying files within the HDFS filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8919) Fix FileUtils.copy() method to call distcp only for HDFS files (not local files)
[ https://issues.apache.org/jira/browse/HIVE-8919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8919: -- Attachment: CDH-23392.1.patch Fix FileUtils.copy() method to call distcp only for HDFS files (not local files) Key: HIVE-8919 URL: https://issues.apache.org/jira/browse/HIVE-8919 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: CDH-23392.1.patch When loading a big file (> 32Mb) from the local filesystem to the HDFS filesystem, Hive fails because the local filesystem cannot load the 'distcp' class. The 'distcp' class is used only by the HDFS filesystem. We should use distcp only when copying files within the HDFS filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
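[Editor's note] The fix described in HIVE-8919 amounts to checking the filesystem scheme before falling back to distcp. A minimal sketch of that decision follows; the class and method names here are illustrative stand-ins, not the actual Hive FileUtils API:

```java
// Hypothetical sketch of the scheme check described above: only consider
// distcp when both source and destination are HDFS, and the file is large
// enough to justify it. distcp is an HDFS tool; a local filesystem copy
// should never try to load it.
public class FileCopyHelper {
    // Threshold mirroring the "big file" case from the report (32 MB).
    static final long DISTCP_THRESHOLD = 32L * 1024 * 1024;

    static boolean shouldUseDistCp(String srcScheme, String dstScheme, long fileSize) {
        boolean bothHdfs = "hdfs".equals(srcScheme) && "hdfs".equals(dstScheme);
        return bothHdfs && fileSize > DISTCP_THRESHOLD;
    }

    public static void main(String[] args) {
        // Large HDFS-to-HDFS copy: distcp is appropriate.
        System.out.println(shouldUseDistCp("hdfs", "hdfs", 64L * 1024 * 1024));
        // Local source: fall back to a plain copy regardless of size.
        System.out.println(shouldUseDistCp("file", "hdfs", 64L * 1024 * 1024));
    }
}
```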
Re: Review Request 28145: HIVE-8883 - Investigate test failures on auto_join30.q [Spark Branch]
On Nov. 19, 2014, 11:50 p.m., Jimmy Xiang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java, line 74 https://reviews.apache.org/r/28145/diff/3/?file=770558#file770558line74 We don't need this any more? Chao Sun wrote: I was thinking about cleaning it and then restoring the code in the non-staged map join JIRA. But, after talking with Szehon, I decided to keep it anyway. I see. Perhaps, you can move it around in the non-staged map join JIRA. - Jimmy --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/#review62285 --- On Nov. 19, 2014, 11:57 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/ --- (Updated Nov. 19, 2014, 11:57 p.m.) Review request for hive, Jimmy Xiang and Szehon Ho. Bugs: HIVE-8883 https://issues.apache.org/jira/browse/HIVE-8883 Repository: hive-git Description --- This test fails with the following stack trace: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at 
org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at
[jira] [Commented] (HIVE-8266) create function using resource statement compilation should include resource URI entity
[ https://issues.apache.org/jira/browse/HIVE-8266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218743#comment-14218743 ] Prasad Mujumdar commented on HIVE-8266: --- [~leftylev] That's correct, it's not changing any user experience. It doesn't need a doc change. Thanks! create function using resource statement compilation should include resource URI entity - Key: HIVE-8266 URL: https://issues.apache.org/jira/browse/HIVE-8266 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.13.1 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.15.0 Attachments: HIVE-8266.2.patch, HIVE-8266.3.patch The compiler adds the function name and db name as write entities for the create function using resource statement. We should also include the resource URI path in the write entity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-4009) CLI Tests fail randomly due to MapReduce LocalJobRunner race condition
[ https://issues.apache.org/jira/browse/HIVE-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland reopened HIVE-4009: CLI Tests fail randomly due to MapReduce LocalJobRunner race condition -- Key: HIVE-4009 URL: https://issues.apache.org/jira/browse/HIVE-4009 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4009-0.patch Hadoop has a race condition MAPREDUCE-5001 which causes tests to fail randomly when using LocalJobRunner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-4009) CLI Tests fail randomly due to MapReduce LocalJobRunner race condition
[ https://issues.apache.org/jira/browse/HIVE-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218748#comment-14218748 ] Brock Noland commented on HIVE-4009: I've seen this again. Time to fix it. CLI Tests fail randomly due to MapReduce LocalJobRunner race condition -- Key: HIVE-4009 URL: https://issues.apache.org/jira/browse/HIVE-4009 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4009-0.patch Hadoop has a race condition MAPREDUCE-5001 which causes tests to fail randomly when using LocalJobRunner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-4009) CLI Tests fail randomly due to MapReduce LocalJobRunner race condition
[ https://issues.apache.org/jira/browse/HIVE-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4009: --- Attachment: HIVE-4009.patch CLI Tests fail randomly due to MapReduce LocalJobRunner race condition -- Key: HIVE-4009 URL: https://issues.apache.org/jira/browse/HIVE-4009 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4009-0.patch, HIVE-4009.patch Hadoop has a race condition MAPREDUCE-5001 which causes tests to fail randomly when using LocalJobRunner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-4009) CLI Tests fail randomly due to MapReduce LocalJobRunner race condition
[ https://issues.apache.org/jira/browse/HIVE-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218751#comment-14218751 ] Brock Noland commented on HIVE-4009: To be clear, although MAPREDUCE-5001 improves the situation in that an exception is not thrown, it's still possible for LJR to return null and fail. This happens on hosts which are very busy. Let's just skip the racy status section of code when in local mode. CLI Tests fail randomly due to MapReduce LocalJobRunner race condition -- Key: HIVE-4009 URL: https://issues.apache.org/jira/browse/HIVE-4009 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4009-0.patch, HIVE-4009.patch Hadoop has a race condition MAPREDUCE-5001 which causes tests to fail randomly when using LocalJobRunner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
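[Editor's note] The suggestion in the comment above — skip the racy status-polling code when running under LocalJobRunner — can be sketched with a simple guard. This is an illustrative shape only, not the actual HIVE-4009 patch; `isLocalMode` and the status supplier are hypothetical stand-ins for Hive's real job-monitoring calls:

```java
import java.util.function.Supplier;

// Illustrative guard for the race described above: when the runtime is
// LocalJobRunner, skip status polling entirely instead of risking a null
// status on a busy host. Names here are hypothetical stand-ins.
public class JobMonitor {
    /** Returns a progress string, avoiding the racy status call in local mode. */
    static String progress(boolean isLocalMode, Supplier<String> fetchStatus) {
        if (isLocalMode) {
            return "local job: status polling skipped";
        }
        String status = fetchStatus.get();
        // Even in cluster mode, tolerate a null status rather than failing.
        return status != null ? status : "status unavailable";
    }

    public static void main(String[] args) {
        System.out.println(progress(true, () -> null));
        System.out.println(progress(false, () -> "map 50% reduce 0%"));
    }
}
```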
Re: Review Request 28145: HIVE-8883 - Investigate test failures on auto_join30.q [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/#review62295 --- Ship it! Ship It! - Szehon Ho On Nov. 19, 2014, 11:57 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28145/ --- (Updated Nov. 19, 2014, 11:57 p.m.) Review request for hive, Jimmy Xiang and Szehon Ho. Bugs: HIVE-8883 https://issues.apache.org/jira/browse/HIVE-8883 Repository: hive-git Description --- This test fails with the following stack trace: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) ... 17 more auto_join27.q and auto_join31.q seem to fail with the same error. Diffs -
[jira] [Commented] (HIVE-8893) Implement whitelist for builtin UDFs to avoid untrusted code execution in multiuser mode
[ https://issues.apache.org/jira/browse/HIVE-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218759#comment-14218759 ] Szehon Ho commented on HIVE-8893: - Thanks! This looks great, will commit it once tests pass. Implement whitelist for builtin UDFs to avoid untrusted code execution in multiuser mode --- Key: HIVE-8893 URL: https://issues.apache.org/jira/browse/HIVE-8893 Project: Hive Issue Type: Bug Components: Authorization, HiveServer2, SQL Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.15.0 Attachments: HIVE-8893.3.patch, HIVE-8893.4.patch, HIVE-8893.5.patch, HIVE-8893.6.patch UDFs like reflect() or java_method() enable executing a java method as a UDF. While this offers a lot of flexibility in standalone mode, it can become a security loophole in a secure multiuser environment. For example, in HiveServer2 one can execute any available java code with user hive's credentials. We need a whitelist and blacklist to restrict builtin UDFs in HiveServer2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
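[Editor's note] A whitelist/blacklist gate of the kind proposed in HIVE-8893 is conceptually a set lookup performed before a UDF is resolved. The sketch below is hypothetical and does not reflect the actual patch or Hive's configuration property names; it only illustrates the precedence rule (blacklist wins, empty whitelist means unrestricted):

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of builtin-UDF gating: a UDF is allowed only if the
// blacklist does not contain it and the whitelist (when non-empty) does.
public class UdfGate {
    static boolean isAllowed(String udfName, Set<String> whitelist, Set<String> blacklist) {
        String name = udfName.toLowerCase(Locale.ROOT);
        if (blacklist.contains(name)) {
            return false; // blacklist always wins
        }
        // An empty whitelist means "no restriction" in this sketch.
        return whitelist.isEmpty() || whitelist.contains(name);
    }
}
```

With a blacklist of {"reflect", "java_method"}, ordinary UDFs like upper() remain usable while the risky reflection-based ones are refused.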
[jira] [Commented] (HIVE-8883) Investigate test failures on auto_join30.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218763#comment-14218763 ] Szehon Ho commented on HIVE-8883: - Thanks Chao, +1 on latest patch Investigate test failures on auto_join30.q [Spark Branch] - Key: HIVE-8883 URL: https://issues.apache.org/jira/browse/HIVE-8883 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Fix For: spark-branch Attachments: HIVE-8883.1-spark.patch, HIVE-8883.2-spark.patch, HIVE-8883.3-spark.patch, HIVE-8883.4-spark.patch This test fails with the following stack trace: {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at 
org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-11-14 17:05:09,206 ERROR [Executor task launch worker-4]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(285)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_0},value:{_col0:0}} at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:328) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:276) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:48) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:96) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:186) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: null at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:319) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:257) ... 17 more {noformat} {{auto_join27.q}} and {{auto_join31.q}} seem to
[jira] [Commented] (HIVE-8889) JDBC Driver ResultSet.getXXXXXX(String columnLabel) methods Broken
[ https://issues.apache.org/jira/browse/HIVE-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218769#comment-14218769 ] G Lingle commented on HIVE-8889: yep, I don't use the AS name; my code is like below. Try this: {code} String sql = "select * from src"; ResultSet res = stmt.executeQuery(sql); while (res.next()) { System.out.println("key: " + res.getString("key")); } {code} When it runs, an exception is thrown on the res.getString() call. When I step into the code I see that normalizedColumnNames contains entries of the form table name.column name. In the example above you'd see src.key and src.value in the normalizedColumnNames list; neither of those matches the requested column name "key", so the exception is thrown. HTH and Thanks for the prompt response, G JDBC Driver ResultSet.getXX(String columnLabel) methods Broken -- Key: HIVE-8889 URL: https://issues.apache.org/jira/browse/HIVE-8889 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: G Lingle Assignee: Chaoyu Tang Priority: Critical Using hive-jdbc-0.13.1-cdh5.2.0.jar. All of the get-by-column-label methods of HiveBaseResultSet are now broken. They don't take just the column label as they should; instead you have to pass in table name.column name. This requirement doesn't conform to the java ResultSet API, which specifies: columnLabel - the label for the column specified with the SQL AS clause. If the SQL AS clause was not specified, then the label is the name of the column. Looking at the code, it seems that the problem is that the findColumn() method is looking in normalizedColumnNames instead of columnNames. Another annoying issue with the code is that the SQLException thrown gives no indication of what the problem is; it should at least say that the column name wasn't found in the description string. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
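[Editor's note] The behavior described above can be reproduced with a small model of the label lookup: if the driver matches requested labels only against fully qualified names like "src.key", a bare "key" never matches. One plausible fix is to also compare the unqualified part, as sketched below. This is hypothetical code, not the actual HiveBaseResultSet implementation; the real driver would throw SQLException, unchecked here to keep the sketch self-contained:

```java
import java.util.List;
import java.util.Locale;

// Model of the findColumn() lookup discussed above. The driver stores
// normalized names like "src.key"; this sketch falls back to comparing the
// part after the last dot so a bare label such as "key" still resolves.
public class ColumnLookup {
    /** Returns the 1-based column index for a label, or throws with a clear message. */
    static int findColumn(String label, List<String> normalizedColumnNames) {
        String wanted = label.toLowerCase(Locale.ROOT);
        for (int i = 0; i < normalizedColumnNames.size(); i++) {
            String name = normalizedColumnNames.get(i).toLowerCase(Locale.ROOT);
            String unqualified = name.substring(name.lastIndexOf('.') + 1);
            if (name.equals(wanted) || unqualified.equals(wanted)) {
                return i + 1;
            }
        }
        // Per the report, the exception should say which label was not found.
        throw new IllegalArgumentException("Could not find column with label: " + label);
    }
}
```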
[jira] [Commented] (HIVE-8918) Beeline terminal cannot be initialized due to jline2 change
[ https://issues.apache.org/jira/browse/HIVE-8918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218771#comment-14218771 ] Ferdinand Xu commented on HIVE-8918: Have you done the steps mentioned in HIVE-8609 (which will be documented) before starting Beeline? One thing that needs to be documented is that users should back up and remove the jline-0.9.94.jar file under the path $HADOOP_HOME/share/hadoop/yarn/lib/jline-0.9.94.jar, which conflicts with Beeline's dependency, before using Beeline. Once YARN-2815 is resolved, the jline-0.9.94.jar will be removed. Beeline terminal cannot be initialized due to jline2 change --- Key: HIVE-8918 URL: https://issues.apache.org/jira/browse/HIVE-8918 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña I fetched the latest changes from trunk, and I got the following error when attempting to execute beeline: {noformat} [ERROR] Terminal initialization failed; falling back to unsupported java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at jline.TerminalFactory.create(TerminalFactory.java:101) at jline.TerminalFactory.get(TerminalFactory.java:158) at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:73) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Exception in thread main java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at
org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:101) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} I executed the following command: {noformat} hive --service beeline -u jdbc:hive2://localhost:1 -n sergio {noformat} The commit before the jline2 change works fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8889) JDBC Driver ResultSet.getXXXXXX(String columnLabel) methods Broken
[ https://issues.apache.org/jira/browse/HIVE-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218780#comment-14218780 ] G Lingle commented on HIVE-8889: We'd been using this code for over a year and it was working fine before upgrading to the cdh5 release. JDBC Driver ResultSet.getXX(String columnLabel) methods Broken -- Key: HIVE-8889 URL: https://issues.apache.org/jira/browse/HIVE-8889 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: G Lingle Assignee: Chaoyu Tang Priority: Critical Using hive-jdbc-0.13.1-cdh5.2.0.jar. All of the get-by-column-label methods of HiveBaseResultSet are now broken. They don't take just the column label as they should; instead you have to pass in table name.column name. This requirement doesn't conform to the java ResultSet API, which specifies: columnLabel - the label for the column specified with the SQL AS clause. If the SQL AS clause was not specified, then the label is the name of the column. Looking at the code, it seems that the problem is that the findColumn() method is looking in normalizedColumnNames instead of columnNames. Another annoying issue with the code is that the SQLException thrown gives no indication of what the problem is; it should at least say that the column name wasn't found in the description string. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8920) SplitSparkWorkResolver doesn't work with UnionWork
Chao created HIVE-8920: -- Summary: SplitSparkWorkResolver doesn't work with UnionWork Key: HIVE-8920 URL: https://issues.apache.org/jira/browse/HIVE-8920 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao The following query will not work: {code} from (select * from table0 union all select * from table1) s insert overwrite table table3 select s.x, count(1) group by s.x insert overwrite table table4 select s.y, count(1) group by s.y; {code} Currently, the plan for this query, before SplitSparkWorkResolver, looks like this:
{noformat}
M1    M2
  \   / \
   U3    R5
   |
   R4
{noformat}
{{SplitSparkWorkResolver#splitBaseWork}} assumes that the {{childWork}} is a ReduceWork, but in this case, for M2 the childWork can be the UnionWork U3, so the code will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
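The failure mode described above can be illustrated with a minimal sketch. These are hypothetical stand-in classes that only mirror the shape of Hive's work hierarchy (BaseWork, ReduceWork, UnionWork), not the actual SparkWork code: an unconditional cast to ReduceWork throws once the child is a UnionWork such as U3, whereas branching on the runtime type handles both shapes.

```java
// Hypothetical stand-ins for Hive's work graph; not the real classes.
abstract class BaseWork {
    final String name;
    BaseWork(String name) { this.name = name; }
}
class ReduceWork extends BaseWork { ReduceWork(String n) { super(n); } }
class UnionWork  extends BaseWork { UnionWork(String n)  { super(n); } }

public class SplitSketch {
    // Buggy pattern: assume every child of a split work is a ReduceWork.
    static ReduceWork connectAssumingReduce(BaseWork childWork) {
        return (ReduceWork) childWork; // ClassCastException when childWork is U3
    }

    // Safer pattern: branch on the actual child type.
    static String connectCheckingType(BaseWork childWork) {
        if (childWork instanceof ReduceWork) return "reconnect reduce " + childWork.name;
        if (childWork instanceof UnionWork)  return "reconnect union "  + childWork.name;
        throw new IllegalStateException("unexpected child work: " + childWork.name);
    }
}
```

In the plan above, M2 has two children, U3 (a UnionWork) and R5 (a ReduceWork), so any resolver that splits M2 has to tolerate both.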
[jira] [Updated] (HIVE-8920) SplitSparkWorkResolver doesn't work with UnionWork [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-8920: --- Summary: SplitSparkWorkResolver doesn't work with UnionWork [Spark Branch] (was: SplitSparkWorkResolver doesn't work with UnionWork) SplitSparkWorkResolver doesn't work with UnionWork [Spark Branch] - Key: HIVE-8920 URL: https://issues.apache.org/jira/browse/HIVE-8920 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao The following query will not work: {code} from (select * from table0 union all select * from table1) s insert overwrite table table3 select s.x, count(1) group by s.x insert overwrite table table4 select s.y, count(1) group by s.y; {code} Currently, the plan for this query, before SplitSparkWorkResolver, looks like this:
{noformat}
M1    M2
  \   / \
   U3    R5
   |
   R4
{noformat}
{{SplitSparkWorkResolver#splitBaseWork}} assumes that the {{childWork}} is a ReduceWork, but in this case, for M2 the childWork can be the UnionWork U3, so the code will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8888) Mapjoin with LateralViewJoin generates wrong plan in Tez
[ https://issues.apache.org/jira/browse/HIVE-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218809#comment-14218809 ] Prasanth J commented on HIVE-8888: -- Committed to trunk. Mapjoin with LateralViewJoin generates wrong plan in Tez Key: HIVE-8888 URL: https://issues.apache.org/jira/browse/HIVE-8888 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0 Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.15.0 Attachments: HIVE-8888.1.patch, HIVE-8888.2.patch, HIVE-8888.3.patch, HIVE-8888.4.patch Queries like this {code} with sub1 as (select aid, avalue from expod1 lateral view explode(av) avs as avalue), sub2 as (select bid, bvalue from expod2 lateral view explode(bv) bvs as bvalue) select sub1.aid, sub1.avalue, sub2.bvalue from sub1, sub2 where sub1.aid=sub2.bid; {code} generate twice the number of rows in Tez compared to MR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)