[jira] [Commented] (HIVE-8915) Log file explosion due to non-existence of COMPACTION_QUEUE table

2015-05-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528428#comment-14528428
 ] 

Hive QA commented on HIVE-8915:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12729641/HIVE-8915.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 8895 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3733/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3733/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3733/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12729641 - PreCommit-HIVE-TRUNK-Build

 Log file explosion due to non-existence of COMPACTION_QUEUE table
 -

 Key: HIVE-8915
 URL: https://issues.apache.org/jira/browse/HIVE-8915
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0, 0.15.0, 0.14.1
Reporter: Sushanth Sowmyan
Assignee: Alan Gates
 Attachments: HIVE-8915.patch


 I hit an issue with a fresh setup of hive in a vm, where the db tables 
 specified by hive-txn-schema-0.14.0.mysql.sql had not been created.
 On metastore startup, I got an endless loop of errors being written to the 
 log file, which caused the log file to grow to 1.7GB in 5 minutes, with 950k 
 copies of the same error stack trace in it before I realized what was 
 happening and killed it. We should either add a delay of some sort to make sure 
 we don't endlessly respin on that error so quickly, or we should error out 
 and fail if we're not able to start.
 The stack trace in question is as follows:
 {noformat}
 2014-11-19 01:44:57,654 ERROR compactor.Cleaner
 (Cleaner.java:run(143)) - Caught an exception in the main loop of
 compactor cleaner, MetaException(message:Unable to connect to
 transaction database
 com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table
 'hive.COMPACTION_QUEUE' doesn't exist
 at sun.reflect.GeneratedConstructorAccessor20.newInstance(Unknown Source)
 at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
 at com.mysql.jdbc.Util.getInstance(Util.java:386)
 at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052)
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597)
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529)
 at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990)
 at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151)
 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2619)
 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2569)
 at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1524)
 at com.jolbox.bonecp.StatementHandle.executeQuery(StatementHandle.java:464)
 at 
 org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findReadyToClean(CompactionTxnHandler.java:266)
 at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.run(Cleaner.java:86)
 )
 at 
 org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findReadyToClean(CompactionTxnHandler.java:291)
 at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.run(Cleaner.java:86)
 {noformat}
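The respin-delay idea suggested above can be sketched as a simple exponential backoff between retries of the failing loop. This is a hypothetical illustration, not the actual patch; the class and method names are invented:

```java
// Hypothetical sketch: exponential backoff for a background loop that keeps
// hitting the same persistent error (e.g. a missing COMPACTION_QUEUE table),
// so the log is not flooded with hundreds of thousands of identical traces.
public class BackoffSketch {
    // Returns the wait in milliseconds before the next retry: doubles on each
    // consecutive failure, capped at maxMs.
    public static long nextDelayMs(int consecutiveFailures, long baseMs, long maxMs) {
        if (consecutiveFailures <= 0) return 0L;
        long delay = baseMs;
        for (int i = 1; i < consecutiveFailures && delay < maxMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMs);
    }

    public static void main(String[] args) {
        // With a 1s base and a 5-minute cap, the sleep grows quickly and then
        // plateaus instead of spinning at full speed forever.
        for (int f = 1; f <= 12; f++) {
            System.out.println(f + " failures -> sleep "
                + nextDelayMs(f, 1000, 300_000) + " ms");
        }
    }
}
```

With a cap like this, a misconfigured metastore would log on the order of one trace per few minutes instead of 950k traces in five.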



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-05-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528348#comment-14528348
 ] 

Hive QA commented on HIVE-9736:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12730299/HIVE-9736.6.patch

{color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 8895 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc
org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3731/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3731/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3731/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 24 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12730299 - PreCommit-HIVE-TRUNK-Build

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, 
 HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have 1 associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.
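The batching idea can be sketched as follows. This is a hypothetical simulation, not Hive's code: `fetchStatuses` stands in for a bulk namenode call such as `FileSystem.listStatus(Path[])`, and only the call-count arithmetic is the point:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the batching idea: instead of issuing one metadata
// call per partition directory, group the directories and issue one call per
// batch, cutting namenode round-trips from n to ceil(n / batchSize).
public class BatchingSketch {
    static int remoteCalls = 0; // counts simulated round-trips to the namenode

    // Simulated bulk call: one round-trip returns statuses for many paths.
    static List<String> fetchStatuses(List<String> paths) {
        remoteCalls++;
        return new ArrayList<>(paths); // stand-in for FileStatus objects
    }

    // Authorize all partition directories, batchSize at a time.
    public static int authorizeAll(List<String> dirs, int batchSize) {
        int authorized = 0;
        for (int i = 0; i < dirs.size(); i += batchSize) {
            List<String> batch = dirs.subList(i, Math.min(i + batchSize, dirs.size()));
            authorized += fetchStatuses(batch).size();
        }
        return authorized;
    }

    public static void main(String[] args) {
        List<String> dirs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            dirs.add("/warehouse/my_table/dt=20150101/region=" + i);
        }
        int n = authorizeAll(dirs, 100);
        System.out.println(n + " dirs authorized in " + remoteCalls + " calls");
    }
}
```

For a drop of a dt partition with many region subdirectories, this turns per-directory round-trips into a handful of bulk calls.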





[jira] [Updated] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)

2015-05-05 Thread Reuben Kuhnert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuben Kuhnert updated HIVE-10190:
--
Attachment: HIVE-10190.09.patch

 CBO: AST mode checks for TABLESAMPLE with 
 AST.toString().contains(TOK_TABLESPLITSAMPLE)
 -

 Key: HIVE-10190
 URL: https://issues.apache.org/jira/browse/HIVE-10190
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Reuben Kuhnert
Priority: Trivial
  Labels: perfomance
 Attachments: HIVE-10190-querygen.py, HIVE-10190.01.patch, 
 HIVE-10190.02.patch, HIVE-10190.03.patch, HIVE-10190.04.patch, 
 HIVE-10190.05.patch, HIVE-10190.05.patch, HIVE-10190.06.patch, 
 HIVE-10190.07.patch, HIVE-10190.08.patch, HIVE-10190.09.patch


 {code}
 public static boolean validateASTForUnsupportedTokens(ASTNode ast) {
   String astTree = ast.toStringTree();
   // if any of following tokens are present in AST, bail out
   String[] tokens = { "TOK_CHARSETLITERAL", "TOK_TABLESPLITSAMPLE" };
   for (String token : tokens) {
     if (astTree.contains(token)) {
       return false;
     }
   }
   return true;
 }
 {code}
 This is an issue for a SQL query whose AST string form (~700kb) is far bigger 
 than its text form.
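One direction for avoiding the giant intermediate string is to walk the tree and compare node token types directly, bailing out on the first hit. This is a hypothetical sketch, not the committed patch: `Node` stands in for Hive's ASTNode, and real code would compare `node.getType()` against token ids rather than strings:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: depth-first scan of an AST for banned tokens, without
// ever serializing the whole tree via toStringTree().
public class AstScanSketch {
    static class Node {
        final String type;
        final List<Node> children = new ArrayList<>();
        Node(String type) { this.type = type; }
        Node add(Node c) { children.add(c); return this; }
    }

    // Returns true if any node in the tree has a banned token type.
    // Short-circuits on the first match, so no giant string is ever built.
    public static boolean containsToken(Node n, String... banned) {
        for (String b : banned) {
            if (n.type.equals(b)) return true;
        }
        for (Node c : n.children) {
            if (containsToken(c, banned)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Node root = new Node("TOK_QUERY")
            .add(new Node("TOK_FROM").add(new Node("TOK_TABLESPLITSAMPLE")));
        System.out.println(
            containsToken(root, "TOK_CHARSETLITERAL", "TOK_TABLESPLITSAMPLE"));
    }
}
```

Besides skipping the ~700kb string, this also avoids false positives from token names appearing inside string literals of the serialized tree.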





[jira] [Resolved] (HIVE-10552) hive 1.1.0 rename column fails: Invalid method name: 'alter_table_with_cascade'

2015-05-05 Thread David Watzke (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Watzke resolved HIVE-10552.
-
Resolution: Invalid

You're right, I haven't done that. Thanks for the tip. I don't have time right 
now to test it out but let's close this bug and I'll reopen it in case it 
doesn't help - but it will ;)

 hive 1.1.0 rename column fails: Invalid method name: 
 'alter_table_with_cascade'
 ---

 Key: HIVE-10552
 URL: https://issues.apache.org/jira/browse/HIVE-10552
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema
Affects Versions: 1.1.0
 Environment: centos 6.6, cloudera 5.3.3
Reporter: David Watzke
Assignee: Chaoyu Tang
Priority: Blocker

 Hi,
 we're trying out hive 1.1.0 with cloudera 5.3.3, and since hive 1.0.0 there's 
 (what appears to be) a regression.
 This ALTER command that renames a table column used to work fine in older 
 versions, but in hive 1.1.0 it throws this error:
 hive> CREATE TABLE test_change (a int, b int, c int);
 OK
 Time taken: 2.303 seconds
 hive> ALTER TABLE test_change CHANGE a a1 INT;
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. Invalid method 
 name: 'alter_table_with_cascade'





[jira] [Commented] (HIVE-10521) TxnHandler.timeOutTxns only times out some of the expired transactions

2015-05-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528353#comment-14528353
 ] 

Hive QA commented on HIVE-10521:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12729817/HIVE-10521.3.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3732/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3732/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3732/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-3732/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 3f72f81 HIVE-5545 : HCatRecord getInteger method returns String 
when used on Partition columns of type INT (Sushanth Sowmyan, reviewed by Jason 
Dere)
+ git clean -f -d
+ git checkout master
Already on 'master'
+ git reset --hard origin/master
HEAD is now at 3f72f81 HIVE-5545 : HCatRecord getInteger method returns String 
when used on Partition columns of type INT (Sushanth Sowmyan, reviewed by Jason 
Dere)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12729817 - PreCommit-HIVE-TRUNK-Build

 TxnHandler.timeOutTxns only times out some of the expired transactions
 --

 Key: HIVE-10521
 URL: https://issues.apache.org/jira/browse/HIVE-10521
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0, 1.0.0, 1.1.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-10521.2.patch, HIVE-10521.3.patch, HIVE-10521.patch


 {code}
   for (int i = 0; i < 20 && rs.next(); i++) deadTxns.add(rs.getLong(1));
   // We don't care whether all of the transactions get deleted or not,
   // if some didn't it most likely means someone else deleted them in the interim
   if (deadTxns.size() > 0) abortTxns(dbConn, deadTxns);
 {code}
 While it makes sense to limit the number of transactions aborted in one pass 
 (since this gets translated to an IN clause), we should still make sure all 
 are timed out.  Also, 20 seems pretty small as a batch size.
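The suggested behavior, keep batching until every expired transaction is aborted, can be sketched like this. Names are illustrative, not the actual TxnHandler code:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: abort expired transactions in fixed-size batches until
// none remain, rather than aborting at most one batch per pass. The batch cap
// keeps the generated SQL IN clause bounded; the outer loop guarantees every
// expired transaction is eventually timed out.
public class TimeoutSketch {
    // Drains "expired" in batches of batchSize; returns the batches that a
    // real implementation would hand to abortTxns(dbConn, batch) one by one.
    public static List<List<Long>> timeOutAll(Deque<Long> expired, int batchSize) {
        List<List<Long>> batches = new ArrayList<>();
        while (!expired.isEmpty()) {
            List<Long> batch = new ArrayList<>(batchSize);
            while (batch.size() < batchSize && !expired.isEmpty()) {
                batch.add(expired.poll());
            }
            batches.add(batch);
        }
        return batches;
    }

    public static void main(String[] args) {
        Deque<Long> expired = new ArrayDeque<>();
        for (long txn = 1; txn <= 45; txn++) expired.add(txn);
        // 45 expired txns with a batch cap of 20 -> batches of 20, 20, 5,
        // all of them aborted, instead of only the first 20.
        List<List<Long>> batches = timeOutAll(expired, 20);
        System.out.println(batches.size() + " batches");
    }
}
```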





[jira] [Commented] (HIVE-10454) Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified.

2015-05-05 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528436#comment-14528436
 ] 

Aihua Xu commented on HIVE-10454:
-

It makes sense. It seems I misunderstood Xuefu's point before. I will resolve 
this as won't-fix then.

 Query against partitioned table in strict mode failed with No partition 
 predicate found even if partition predicate is specified.
 ---

 Key: HIVE-10454
 URL: https://issues.apache.org/jira/browse/HIVE-10454
 Project: Hive
  Issue Type: Bug
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-10454.2.patch, HIVE-10454.patch


 The following queries fail:
 {noformat}
 create table t1 (c1 int) PARTITIONED BY (c2 string);
 set hive.mapred.mode=strict;
 select * from t1 where t1.c2 > to_date(date_add(from_unixtime( 
 unix_timestamp() ),1));
 {noformat}
 The query failed with "No partition predicate found for alias 't1'".





[jira] [Commented] (HIVE-10526) CBO (Calcite Return Path): HiveCost epsilon comparison should take row count in to account

2015-05-05 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529502#comment-14529502
 ] 

Laljo John Pullokkaran commented on HIVE-10526:
---

Uploaded a modified patch last week.
For some reason the QA run didn't kick in.

 CBO (Calcite Return Path): HiveCost epsilon comparison should take row count 
 in to account
 --

 Key: HIVE-10526
 URL: https://issues.apache.org/jira/browse/HIVE-10526
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: 0.12.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 1.2.0

 Attachments: HIVE-10526.1.patch, HIVE-10526.patch








[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-05-05 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9392:
--
Attachment: (was: HIVE-9392.01.patch)

 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to 
 column names having duplicated fqColumnName
 

 Key: HIVE-9392
 URL: https://issues.apache.org/jira/browse/HIVE-9392
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth Jayachandran
Priority: Critical
 Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch


 In JoinStatsRule.process the join column statistics are stored in the HashMap 
 joinedColStats. The key used, ColStatistics.fqColName, is 
 duplicated between join columns in the same vertex; as a result distinctVals 
 ends up having duplicated values, which negatively affects the join 
 cardinality estimation.
 The duplicate keys are usually named KEY.reducesinkkey0.
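The overwrite mechanism can be illustrated in isolation. The NDV numbers here are hypothetical; only the HashMap semantics matter:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the bug mechanism: when two different join
// columns in the same vertex both report the fully-qualified name
// "KEY.reducesinkkey0", the second put() silently overwrites the first entry,
// so one column's NDV gets used for both and the cardinality estimate skews.
public class DupKeySketch {
    public static void main(String[] args) {
        Map<String, Long> joinedColStats = new HashMap<>();
        joinedColStats.put("KEY.reducesinkkey0", 1_000_000L); // NDV of column A
        joinedColStats.put("KEY.reducesinkkey0", 10L);        // column B clobbers A
        System.out.println(joinedColStats.size());            // one entry, not two
        System.out.println(joinedColStats.get("KEY.reducesinkkey0"));
    }
}
```

A fix along these lines would need keys that stay unique per column (e.g. qualified by table alias), so distinct columns never collide in the map.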





[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-05-05 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9392:
--
Attachment: HIVE-9392.3.patch






[jira] [Commented] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-05-05 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529626#comment-14529626
 ] 

Pengcheng Xiong commented on HIVE-9392:
---

Renamed the patch to get a QA run.






[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others

2015-05-05 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529676#comment-14529676
 ] 

Thejas M Nair commented on HIVE-7018:
-

I think the change here was in the right direction; however, it breaks the 
preferred way to upgrade hive (using schematool). This is a release blocker for 
1.2.0.
A patch to revert the changes here has been uploaded to HIVE-10614. I think 
we should go ahead with that, and reopen this jira after it is committed. Once 
the schematool/beeline breakage is fixed, this change can go back into hive.


 Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but 
 not others
 -

 Key: HIVE-7018
 URL: https://issues.apache.org/jira/browse/HIVE-7018
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Yongzhi Chen
 Fix For: 1.2.0

 Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch


 It appears that at least postgres and oracle do not have the LINK_TARGET_ID 
 column while mysql does.





[jira] [Commented] (HIVE-9534) incorrect result set for query that projects a windowed aggregate

2015-05-05 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529716#comment-14529716
 ] 

Chaoyu Tang commented on HIVE-9534:
---

Oracle 11.2 treats "avg(distinct tsint.csint) over ()" as an analytic function 
instead of an aggregate function, so the query returns 4 rows of 2.5. 
Note there is no order by clause or window clause inside the parentheses of 
"over". Could you try a query like "select avg(distinct tsint.csint) over (order 
by rnum rows between 1 preceding and 1 following) from tsint" to see if it 
works in Oracle 12c? It did not work in 11.2.

 incorrect result set for query that projects a windowed aggregate
 -

 Key: HIVE-9534
 URL: https://issues.apache.org/jira/browse/HIVE-9534
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: N Campbell
Assignee: Chaoyu Tang

 Result set returned by Hive has one row instead of 5
 {code}
 select avg(distinct tsint.csint) over () from tsint 
 create table  if not exists TSINT (RNUM int , CSINT smallint)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
  STORED AS TEXTFILE;
 0|\N
 1|-1
 2|0
 3|1
 4|10
 {code}





[jira] [Commented] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly

2015-05-05 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529572#comment-14529572
 ] 

Sushanth Sowmyan commented on HIVE-10565:
-

Hi Matt, who would be the ideal person to review this patch?

 LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT 
 OUTER JOIN repeated key correctly
 

 Key: HIVE-10565
 URL: https://issues.apache.org/jira/browse/HIVE-10565
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Affects Versions: 1.2.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 1.2.0, 1.3.0

 Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, 
 HIVE-10565.03.patch, HIVE-10565.04.patch, HIVE-10565.05.patch, 
 HIVE-10565.06.patch


 Filtering can knock out some of the rows for a repeated key, but those 
 knocked out rows need to be included in the LEFT OUTER JOIN result and are 
 currently not when only some rows are filtered out.





[jira] [Updated] (HIVE-10614) schemaTool upgrade from 0.14.0 to 1.3.0 causes failure

2015-05-05 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-10614:
-
Attachment: HIVE-10614.1.patch

This happens because schematool runs via beeline, and when there is a ";" in the 
command, beeline interprets it as the command terminator. Stored procedures use 
";" as the delimiter between statements, so the entire stored procedure does not 
get sent to mysql as a single command, hence the above error. I am 
uploading a patch to back out the fix for HIVE-7018 for now. Once we have the 
fix for HIVE-7018 working with schematool, we can add it back; that can be done 
via a follow-up jira. cc-ing [~sushanth], [~thejas] for reviewing the change.

Thanks
Hari

 schemaTool upgrade from 0.14.0 to 1.3.0 causes failure
 --

 Key: HIVE-10614
 URL: https://issues.apache.org/jira/browse/HIVE-10614
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
Priority: Critical
 Attachments: HIVE-10614.1.patch


 ./schematool -dbType mysql -upgradeSchemaFrom 0.14.0 -verbose
 {code}
 ++--+
 | 
|
 ++--+
 |  HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from 
 Mysql for other DBs do not have it   |
 ++--+
 1 row selected (0.004 seconds)
 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS 
 RM_TLBS_LINKID
 No rows affected (0.005 seconds)
 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS 
 RM_PARTITIONS_LINKID
 No rows affected (0.006 seconds)
 0: jdbc:mysql://node-1.example.com/hive> DROP PROCEDURE IF EXISTS RM_LINKID
 No rows affected (0.002 seconds)
 0: jdbc:mysql://node-1.example.com/hive> CREATE PROCEDURE RM_TLBS_LINKID() 
 BEGIN IF EXISTS (SELECT * FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE 
 `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` = 'LINK_TARGET_ID') THEN ALTER TABLE 
 `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; ALTER TABLE `TBLS` DROP KEY `TBLS_N51` ; 
 ALTER TABLE `TBLS` DROP COLUMN `LINK_TARGET_ID` ; END IF; END
 Error: You have an error in your SQL syntax; check the manual that 
 corresponds to your MySQL server version for the right syntax to use near '' 
 at line 1 (state=42000,code=1064)
 Closing: 0: jdbc:mysql://node-1.example.com/hive?createDatabaseIfNotExist=true
 org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore 
 state would be inconsistent !!
 org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore 
 state would be inconsistent !!
   at 
 org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:229)
   at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:468)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 Caused by: java.io.IOException: Schema script failed, errorcode 2
   at 
 org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:355)
   at 
 org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326)
   at 
 org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:224)
 {code}
 Looks like HIVE-7018 introduced a stored procedure as part of the mysql upgrade 
 script, and it is causing issues with schematool upgrade.





[jira] [Commented] (HIVE-9743) Incorrect result set for vectorized left outer join

2015-05-05 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529775#comment-14529775
 ] 

Matt McCline commented on HIVE-9743:


[~vikram.dixit] Ok, SMB removed.  I think this one is good to go as soon as the 
Apache tests pass.

 Incorrect result set for vectorized left outer join
 ---

 Key: HIVE-9743
 URL: https://issues.apache.org/jira/browse/HIVE-9743
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.14.0
Reporter: N Campbell
Assignee: Matt McCline
 Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, 
 HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, 
 HIVE-9743.06.patch, HIVE-9743.08.patch, HIVE-9743.09.patch


 This query is supposed to return 3 rows and will when run without Tez but 
 returns 2 rows when run with Tez.
 select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left 
 outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 < 15 )
 tjoin1.rnum   tjoin1.c1   tjoin1.c2   c2j2
 1 20  25  null
 2 null  50  null
 instead of
 tjoin1.rnum   tjoin1.c1   tjoin1.c2   c2j2
 0 10  15  null
 1 20  25  null
 2 null  50  null
 create table  if not exists TJOIN1 (RNUM int , C1 int, C2 int)
  STORED AS orc ;
 0|10|15
 1|20|25
 2|\N|50
 create table  if not exists TJOIN2 (RNUM int , C1 int, C2 char(2))
 ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
  STORED AS TEXTFILE ;
 0|10|BB
 1|15|DD
 2|\N|EE
 3|10|FF
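The expected LEFT OUTER JOIN semantics can be sketched in plain Java. This is a hypothetical model of the quoted query, not Hive's vectorized code; -1 stands in for the \N key, and the residual filter on tjoin1.c2 is assumed to be a less-than comparison:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the semantics at issue: a left row whose ON-condition
// filter knocks it out of the match must still be emitted exactly once, with
// NULL on the right side. It must never be dropped, even when the same key
// repeats and only some of its rows are filtered out.
public class LeftOuterSketch {
    // left rows as {rnum, c1, c2}; right keys as c1 values (null models \N)
    public static List<String> leftOuterJoin(int[][] left, Integer[] rightKeys) {
        List<String> out = new ArrayList<>();
        for (int[] l : left) {
            boolean matched = false;
            if (l[2] < 15) { // residual filter from the ON clause (assumed)
                for (Integer r : rightKeys) {
                    if (r != null && r == l[1]) {
                        out.add(l[0] + "," + l[1] + "," + l[2] + ",matched");
                        matched = true;
                    }
                }
            }
            if (!matched) out.add(l[0] + "," + l[1] + "," + l[2] + ",null");
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] tjoin1 = { {0, 10, 15}, {1, 20, 25}, {2, -1, 50} }; // -1 models \N
        Integer[] tjoin2c1 = { 10, 15, null, 10 };
        // All three left rows appear; rows removed by the filter carry NULL.
        for (String row : leftOuterJoin(tjoin1, tjoin2c1)) System.out.println(row);
    }
}
```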





[jira] [Updated] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch

2015-05-05 Thread Peter Slawski (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Slawski updated HIVE-10538:
-
Attachment: HIVE-10538.2.patch

I've attached the second revision of the patch, which updates the failed 
Spark qtests.

 Fix NPE in FileSinkOperator from hashcode mismatch
 --

 Key: HIVE-10538
 URL: https://issues.apache.org/jira/browse/HIVE-10538
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.0.0, 1.2.0
Reporter: Peter Slawski
Assignee: Peter Slawski
Priority: Critical
 Fix For: 1.2.0, 1.3.0

 Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, 
 HIVE-10538.1.patch, HIVE-10538.2.patch


 A NullPointerException occurs in FileSinkOperator when using bucketed 
 tables and distribute by with multiFileSpray enabled. The following query 
 reproduces the issue:
 {code}
 set hive.enforce.bucketing = true;
 set hive.exec.reducers.max = 20;
 create table bucket_a(key int, value_a string) clustered by (key) into 256 
 buckets;
 create table bucket_b(key int, value_b string) clustered by (key) into 256 
 buckets;
 create table bucket_ab(key int, value_a string, value_b string) clustered by 
 (key) into 256 buckets;
 -- Insert data into bucket_a and bucket_b
 insert overwrite table bucket_ab
 select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key 
 = b.key) distribute by key;
 {code}
 The following stack trace is logged.
 {code}
 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer 
 (ExecReducer.java:reduce(255)) - 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row (tag=0) {key:{},value:{_col0:113,_col1:val_113}}
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
   ... 8 more
 {code}
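As a hedged illustration of the failure mode (a toy model, not the actual FileSinkOperator code; slot counts and key values are invented), the NPE arises when an offset derived from a key's hashcode lands on a writer slot that was never opened:

```python
class WriterPool:
    """Toy model of multi-file spray: a fixed array of writer slots,
    only some of which were actually opened (hypothetical simplification)."""

    def __init__(self, total_files, opened):
        self.writers = [object() if i in opened else None
                        for i in range(total_files)]

    def find_writer_offset(self, key_hash):
        # If the hash used here disagrees with the hash used when the
        # writers were opened, the offset can point at an empty slot.
        return key_hash % len(self.writers)

    def process(self, key_hash):
        writer = self.writers[self.find_writer_offset(key_hash)]
        if writer is None:
            # The Java code dereferences the slot directly, which is
            # where the NullPointerException surfaces.
            raise RuntimeError("no writer opened for offset %d"
                               % self.find_writer_offset(key_hash))
        return writer

pool = WriterPool(total_files=4, opened={0, 2})
pool.process(2)        # offset 2 was opened: fine
try:
    pool.process(113)  # 113 % 4 == 1: slot never opened
except RuntimeError as e:
    print(e)
```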





[jira] [Commented] (HIVE-9451) Add max size of column dictionaries to ORC metadata

2015-05-05 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529566#comment-14529566
 ] 

Sushanth Sowmyan commented on HIVE-9451:


Hi, given the previous +1 pending tests, and that the tests have now run, do 
the results look okay to commit?

 Add max size of column dictionaries to ORC metadata
 ---

 Key: HIVE-9451
 URL: https://issues.apache.org/jira/browse/HIVE-9451
 Project: Hive
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Owen O'Malley
  Labels: ORC
 Fix For: 1.2.0

 Attachments: HIVE-9451.patch, HIVE-9451.patch


 To predict the amount of memory required to read an ORC file we need to know 
 the size of the dictionaries for the columns that we are reading. I propose 
 adding the number of bytes for each column's dictionary to the stripe's 
 column statistics. The file's column statistics would have the maximum 
 dictionary size for each column.
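The proposed aggregation can be sketched as follows. This is a hypothetical illustration only: the column names and byte counts are made up, and the real change would live in ORC's column-statistics metadata:

```python
# Per-stripe dictionary sizes in bytes, per column (made-up numbers).
stripe_dict_sizes = [
    {"name": 40_000, "city": 1_200},   # stripe 1
    {"name": 55_000, "city": 900},     # stripe 2
]

# File-level statistic: the maximum dictionary size seen for each column,
# which bounds the memory needed to read any one stripe's dictionary.
file_max = {}
for stripe in stripe_dict_sizes:
    for col, size in stripe.items():
        file_max[col] = max(file_max.get(col, 0), size)

print(file_max)  # {'name': 55000, 'city': 1200}
```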





[jira] [Commented] (HIVE-9534) incorrect result set for query that projects a windowed aggregate

2015-05-05 Thread N Campbell (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529608#comment-14529608
 ] 

N Campbell commented on HIVE-9534:
--

Re your comment about Oracle:

select avg(distinct tsint.csint) over () from tsint

For the rows null, -1, 0, 1, 10, Oracle Database 12c Enterprise Edition 
(12.1.0.2.0) returns 2.5, 2.5, 2.5, 2.5, 2.5.





 incorrect result set for query that projects a windowed aggregate
 -

 Key: HIVE-9534
 URL: https://issues.apache.org/jira/browse/HIVE-9534
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: N Campbell
Assignee: Chaoyu Tang

 Result set returned by Hive has one row instead of 5
 {code}
 select avg(distinct tsint.csint) over () from tsint 
 create table  if not exists TSINT (RNUM int , CSINT smallint)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
  STORED AS TEXTFILE;
 0|\N
 1|-1
 2|0
 3|1
 4|10
 {code}
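The expected semantics can be checked with a small sketch (a hedged toy model: an unbounded `over ()` window emits the aggregate once per input row, and the aggregate itself ignores NULLs):

```python
csint = [None, -1, 0, 1, 10]   # the TSINT rows

# avg(distinct ...) ignores NULLs and duplicates.
distinct_vals = {v for v in csint if v is not None}
avg_distinct = sum(distinct_vals) / len(distinct_vals)   # (-1+0+1+10)/4 = 2.5

# An unbounded OVER () window emits the aggregate once per input row,
# so the correct result set has 5 rows, not 1.
result = [avg_distinct for _ in csint]
print(result)  # [2.5, 2.5, 2.5, 2.5, 2.5]
```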





[jira] [Updated] (HIVE-10617) LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to partitioned locking

2015-05-05 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10617:

Summary: LLAP: fix allocator concurrency rarely causing spurious failure to 
allocate due to partitioned locking  (was: fix allocator concurrency rarely 
causing spurious failure to allocate due to partitioned locking)

 LLAP: fix allocator concurrency rarely causing spurious failure to allocate 
 due to partitioned locking
 

 Key: HIVE-10617
 URL: https://issues.apache.org/jira/browse/HIVE-10617
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin

 See HIVE-10482 and the comment in code.
 Simple case - thread can reserve memory from manager and bounce between 
 checking arena 1 and arena 2 for memory as other threads allocate and 
 deallocate from respective arenas in reverse order, making it look like 
 there's no memory. More importantly this can happen when buddy blocks are 
 split when lots of stuff is allocated.
 This can be solved either with some form of helping (esp. for split case) or 
 by making allocator an actor (or set of actors, one per 1-N arenas that 
 they would own), to satisfy alloc requests more deterministically (and also 
 get rid of most sync).





[jira] [Commented] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch

2015-05-05 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529690#comment-14529690
 ] 

Prasanth Jayachandran commented on HIVE-10538:
--

The result difference seems to be an expected change due to the hashcode 
difference. [~petersla] Can you upload an updated patch after running the tests 
again with the -Dtest.output.overwrite=true option? That will overwrite the 
q.out files.

 Fix NPE in FileSinkOperator from hashcode mismatch
 --

 Key: HIVE-10538
 URL: https://issues.apache.org/jira/browse/HIVE-10538
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.0.0, 1.2.0
Reporter: Peter Slawski
Assignee: Peter Slawski
Priority: Critical
 Fix For: 1.2.0, 1.3.0

 Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, 
 HIVE-10538.1.patch


 A NullPointerException occurs in FileSinkOperator when using bucketed 
 tables and distribute by with multiFileSpray enabled. The following query 
 reproduces the issue:
 {code}
 set hive.enforce.bucketing = true;
 set hive.exec.reducers.max = 20;
 create table bucket_a(key int, value_a string) clustered by (key) into 256 
 buckets;
 create table bucket_b(key int, value_b string) clustered by (key) into 256 
 buckets;
 create table bucket_ab(key int, value_a string, value_b string) clustered by 
 (key) into 256 buckets;
 -- Insert data into bucket_a and bucket_b
 insert overwrite table bucket_ab
 select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key 
 = b.key) distribute by key;
 {code}
 The following stack trace is logged.
 {code}
 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer 
 (ExecReducer.java:reduce(255)) - 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row (tag=0) {key:{},value:{_col0:113,_col1:val_113}}
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
   ... 8 more
 {code}





[jira] [Updated] (HIVE-9743) Incorrect result set for vectorized left outer join

2015-05-05 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-9743:
---
Attachment: HIVE-9743.09.patch

 Incorrect result set for vectorized left outer join
 ---

 Key: HIVE-9743
 URL: https://issues.apache.org/jira/browse/HIVE-9743
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.14.0
Reporter: N Campbell
Assignee: Matt McCline
 Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, 
 HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, 
 HIVE-9743.06.patch, HIVE-9743.08.patch, HIVE-9743.09.patch


 This query is supposed to return 3 rows, and does when run without Tez, but 
 returns only 2 rows when run with Tez.
 select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left 
 outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 < 15 )
 tjoin1.rnum   tjoin1.c1   tjoin1.c2   c2j2
 1 20  25  null
 2 null  50  null
 instead of
 tjoin1.rnum   tjoin1.c1   tjoin1.c2   c2j2
 0 10  15  null
 1 20  25  null
 2 null  50  null
 create table  if not exists TJOIN1 (RNUM int , C1 int, C2 int)
  STORED AS orc ;
 0|10|15
 1|20|25
 2|\N|50
 create table  if not exists TJOIN2 (RNUM int , C1 int, C2 char(2))
 ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
  STORED AS TEXTFILE ;
 0|10|BB
 1|15|DD
 2|\N|EE
 3|10|FF





[jira] [Commented] (HIVE-10591) Support limited integer type promotion in ORC

2015-05-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529491#comment-14529491
 ] 

Hive QA commented on HIVE-10591:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12730595/HIVE-10591.2.patch

{color:red}ERROR:{color} -1 due to 53 failed/errored test(s), 8901 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_all_non_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_tmp_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_no_match
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_non_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_update_delete
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_acid
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_all_non_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_tmp_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_where_no_match
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_where_non_partitioned
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_non_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_tmp_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_no_match
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_non_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_update_delete
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_all_non_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_tmp_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_where_no_match
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_where_non_partitioned
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc
org.apache.hadoop.hive.ql.TestTxnCommands2.testBucketizedInputFormat
org.apache.hadoop.hive.ql.TestTxnCommands2.testDeleteIn
org.apache.hadoop.hive.ql.TestTxnCommands2.testUpdateMixedCase
org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend
org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.majorCompactAfterAbort

[jira] [Commented] (HIVE-10506) CBO (Calcite Return Path): Disallow return path to be enable if CBO is off

2015-05-05 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529489#comment-14529489
 ] 

Laljo John Pullokkaran commented on HIVE-10506:
---

+1

 CBO (Calcite Return Path): Disallow return path to be enable if CBO is off
 --

 Key: HIVE-10506
 URL: https://issues.apache.org/jira/browse/HIVE-10506
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Fix For: 1.2.0

 Attachments: HIVE-10506.01.patch, HIVE-10506.patch


 If hive.cbo.enable=false and hive.cbo.returnpath=true, some optimizations 
 would still kick in. Customer environments could easily end up in this 
 scenario; we should prevent it.





[jira] [Reopened] (HIVE-10564) webhcat should use webhcat-site.xml properties for controller job submission

2015-05-05 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reopened HIVE-10564:
---

Unfortunately this has unexpected side effects.
Every time a job is submitted, various properties are passed on the command 
line using -Dfoo=bar.

This change causes the AppConfig Configuration object to accumulate the union 
of all these properties, so job N+1 includes properties that belong to 
previous jobs.

For example, if you run a job with -Dtempleton.statusdir=TestSqoop_1 and then 
another job that does not specify statusdir, the second job will write to 
TestSqoop_1.

This will cause a major problem.
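A minimal sketch of the leak (a hypothetical simplification, not the actual AppConfig code):

```python
# One config object shared across submissions accumulates every job's
# -D overrides instead of starting from a fresh base per job.
shared_config = {}

def submit_job(overrides):
    shared_config.update(overrides)   # buggy: mutates shared state
    return dict(shared_config)        # effective config for this job

job1 = submit_job({"templeton.statusdir": "TestSqoop_1"})
job2 = submit_job({})                 # job 2 never sets statusdir...
print(job2["templeton.statusdir"])    # ...yet inherits "TestSqoop_1"
```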



 webhcat should use webhcat-site.xml properties for controller job submission
 

 Key: HIVE-10564
 URL: https://issues.apache.org/jira/browse/HIVE-10564
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
  Labels: TODOC1.2
 Fix For: 1.2.0

 Attachments: HIVE-10564.1.patch


 webhcat should use webhcat-site.xml in the configuration for the 
 TempletonController map-only job that it launches. This will allow users to 
 set any MR/HDFS properties that they want to see used for the controller job.
 NO PRECOMMIT TESTS





[jira] [Commented] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-05-05 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529623#comment-14529623
 ] 

Pengcheng Xiong commented on HIVE-9392:
---

[~mmokhtar], could you please take a look? Thanks.

 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to 
 column names having duplicated fqColumnName
 

 Key: HIVE-9392
 URL: https://issues.apache.org/jira/browse/HIVE-9392
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth Jayachandran
Priority: Critical
 Attachments: HIVE-9392.01.patch, HIVE-9392.1.patch, HIVE-9392.2.patch


 In JoinStatsRule.process the join column statistics are stored in the 
 HashMap joinedColStats. The key used, ColStatistics.fqColName, is duplicated 
 between join columns in the same vertex; as a result distinctVals ends up 
 having duplicated values, which negatively affects the join cardinality 
 estimation.
 The duplicate keys are usually named KEY.reducesinkkey0.
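The collision can be illustrated with a plain dict (a hedged toy model; the NDV numbers are invented):

```python
joined_col_stats = {}

# Two different join columns in the same vertex end up with the same
# fully-qualified name, so the second stat silently overwrites the first.
col_stats = [
    ("KEY.reducesinkkey0", 1000),  # NDV of the left join column
    ("KEY.reducesinkkey0", 10),    # NDV of the right join column
]
for fq_col_name, ndv in col_stats:
    joined_col_stats[fq_col_name] = ndv

print(len(joined_col_stats))  # 1 -- one NDV lost, skewing cardinality
```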





[jira] [Updated] (HIVE-10617) LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to partitioned locking

2015-05-05 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10617:

Description: 
See HIVE-10482 and the comment in code. Right now this is worked around by 
retrying.
Simple case - thread can reserve memory from manager and bounce between 
checking arena 1 and arena 2 for memory as other threads allocate and 
deallocate from respective arenas in reverse order, making it look like there's 
no memory. More importantly this can happen when buddy blocks are split when 
lots of stuff is allocated.

This can be solved either with some form of helping (esp. for split case) or by 
making allocator an actor (or set of actors, one per 1-N arenas that they 
would own), to satisfy alloc requests more deterministically (and also get rid 
of most sync).

  was:
See HIVE-10482 and the comment in code.
Simple case - thread can reserve memory from manager and bounce between 
checking arena 1 and arena 2 for memory as other threads allocate and 
deallocate from respective arenas in reverse order, making it look like there's 
no memory. More importantly this can happen when buddy blocks are split when 
lots of stuff is allocated.

This can be solved either with some form of helping (esp. for split case) or by 
making allocator an actor (or set of actors, one per 1-N arenas that they 
would own), to satisfy alloc requests more deterministically (and also get rid 
of most sync).


 LLAP: fix allocator concurrency rarely causing spurious failure to allocate 
 due to partitioned locking
 

 Key: HIVE-10617
 URL: https://issues.apache.org/jira/browse/HIVE-10617
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin

 See HIVE-10482 and the comment in code. Right now this is worked around by 
 retrying.
 Simple case - thread can reserve memory from manager and bounce between 
 checking arena 1 and arena 2 for memory as other threads allocate and 
 deallocate from respective arenas in reverse order, making it look like 
 there's no memory. More importantly this can happen when buddy blocks are 
 split when lots of stuff is allocated.
 This can be solved either with some form of helping (esp. for split case) or 
 by making allocator an actor (or set of actors, one per 1-N arenas that 
 they would own), to satisfy alloc requests more deterministically (and also 
 get rid of most sync).





[jira] [Assigned] (HIVE-10617) LLAP: fix allocator concurrency rarely causing spurious failure to allocate due to partitioned locking

2015-05-05 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-10617:
---

Assignee: Sergey Shelukhin

 LLAP: fix allocator concurrency rarely causing spurious failure to allocate 
 due to partitioned locking
 

 Key: HIVE-10617
 URL: https://issues.apache.org/jira/browse/HIVE-10617
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 See HIVE-10482 and the comment in code. Right now this is worked around by 
 retrying.
 Simple case - thread can reserve memory from manager and bounce between 
 checking arena 1 and arena 2 for memory as other threads allocate and 
 deallocate from respective arenas in reverse order, making it look like 
 there's no memory. More importantly this can happen when buddy blocks are 
 split when lots of stuff is allocated.
 This can be solved either with some form of helping (esp. for split case) or 
 by making allocator an actor (or set of actors, one per 1-N arenas that 
 they would own), to satisfy alloc requests more deterministically (and also 
 get rid of most sync).





[jira] [Commented] (HIVE-10595) Dropping a table can cause NPEs in the compactor

2015-05-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529725#comment-14529725
 ] 

Eugene Koifman commented on HIVE-10595:
---

I'm not sure I understand how this works.
The Initiator (if the table/partition is no longer there) will not add 
anything to the compaction queue, so there is nothing for the Worker/Cleaner 
to do in this case.

How will the data in TXNS, COMPLETED_TXN_COMPONENTS, and TXN_COMPONENTS that 
relates to these tables get cleaned up?

 Dropping a table can cause NPEs in the compactor
 

 Key: HIVE-10595
 URL: https://issues.apache.org/jira/browse/HIVE-10595
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.14.0, 1.0.0, 1.1.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-10595.patch


 Reproduction:
 # start metastore with compactor off
 # insert enough entries in a table to trigger a compaction
 # drop the table
 # stop metastore
 # restart metastore with compactor on
 Result:  NPE in the compactor threads.  I suspect this would also happen if 
 the inserts and drops were done in between a run of the compactor, but I 
 haven't proven it.





[jira] [Commented] (HIVE-10482) LLAP: AssertionError cannot allocate when reading from orc

2015-05-05 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529493#comment-14529493
 ] 

Sergey Shelukhin commented on HIVE-10482:
-

This happens when BuddyAllocator has one block of memory larger than the 
target allocation. When memory is reserved and several threads go to allocate, 
they start from the target size and then try to split larger sizes. If several 
threads try to split the block at the same time, one will split and re-add the 
remainder to lower-level lists (e.g. 768k out of a 1Mb block, after using 
256k, will be added as one 512k block and one 256k block), but once the split 
is done, the others are still waiting on the lock for the 1Mb-block list and 
will never again look at lower-level lists.
There are several ways to fix this:
1. Add some form of "helping" so that threads provide blocks to other threads 
after a split. This is very complex (many special cases), may have perf 
overhead in the common case, and in the general case may not solve similar 
issues, e.g. with multiple arenas, where we examine full arena 1, then go to 
non-full arena 2, meanwhile someone allocates from 2 and deallocates to 1, so 
we are screwed again.
2. Make the allocator use an actor-like model (removing all sync and having an 
allocator thread that serves a request queue).
3. Add a retry loop that retries as long as any changes have happened since 
the last attempt.
Not sure yet whether 2 or 3 is best.
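The first failure mode can be reproduced deterministically in a single thread with a toy model of the partitioned free lists (a hedged simplification; sizes are in KB and the block names are invented):

```python
# Free lists partitioned by block size; one 1024 KB block is free.
free_lists = {256: [], 512: [], 1024: [("B", 1024)]}

# "Thread A": takes the 1024 KB block for a 256 KB allocation and
# re-adds the 768 KB remainder to the lower-level lists.
free_lists[1024].pop()
free_lists[512].append(("B-hi", 512))
free_lists[256].append(("B-mid", 256))

# "Thread B" was blocked on the 1024 KB list's lock; when it rechecks
# that list it sees nothing, although 768 KB is free at lower levels
# it will never revisit.
print(free_lists[1024])   # []
free_kb = sum(size for lst in free_lists.values() for _, size in lst)
print(free_kb)            # 768
```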


 LLAP: AssertionError cannot allocate when reading from orc
 -

 Key: HIVE-10482
 URL: https://issues.apache.org/jira/browse/HIVE-10482
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Sergey Shelukhin
 Fix For: llap


 This was from a run of tpch query 1. [~sershe] - not sure if you've already 
 seen this. Creating a jira so that it doesn't get lost.
 {code}
 2015-04-24 13:11:54,180 
 [TezTaskRunner_attempt_1429683757595_0326_4_00_000199_0(container_1_0326_01_003216_sseth_20150424131137_8ec6200c-77c8-43ea-a6a3-a0ab1da6e1ac:4_Map
  1_199_0)] ERROR org.apache.hadoop.hive.ql.exec.tez.TezProcessor: 
 org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
 java.io.IOException: java.lang.AssertionError: Cannot allocate
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74)
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:314)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
 at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:329)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
 at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: java.io.IOException: 
 java.lang.AssertionError: Cannot allocate
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
 at 
 org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137)
 at 
 

[jira] [Updated] (HIVE-10526) CBO (Calcite Return Path): HiveCost epsilon comparison should take row count in to account

2015-05-05 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10526:

Attachment: HIVE-10526.2.patch

Reuploading .1.patch as .2.patch

 CBO (Calcite Return Path): HiveCost epsilon comparison should take row count 
 in to account
 --

 Key: HIVE-10526
 URL: https://issues.apache.org/jira/browse/HIVE-10526
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: 0.12.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 1.2.0

 Attachments: HIVE-10526.1.patch, HIVE-10526.2.patch, HIVE-10526.patch








[jira] [Updated] (HIVE-6679) HiveServer2 should support configurable the server side socket timeout and keepalive for various transports types where applicable

2015-05-05 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6679:
---
Affects Version/s: 1.2.0

 HiveServer2 should support configurable the server side socket timeout and 
 keepalive for various transports types where applicable
 --

 Key: HIVE-6679
 URL: https://issues.apache.org/jira/browse/HIVE-6679
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0
Reporter: Prasad Mujumdar
Assignee: Navis
  Labels: TODOC1.0, TODOC15
 Fix For: 1.2.0

 Attachments: HIVE-6679.1.patch.txt, HIVE-6679.2.patch.txt, 
 HIVE-6679.3.patch, HIVE-6679.4.patch, HIVE-6679.5.patch, HIVE-6679.6.patch


  HiveServer2 should support a configurable server-side socket read timeout 
 and TCP keep-alive option. The metastore server already supports this (and so 
 does the old Hive server). 
 We now have multiple client connectivity options like Kerberos, Delegation 
 Token (Digest-MD5), Plain SASL, Plain SASL with SSL and raw sockets. The 
 configuration should be applicable to all types (if possible).
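The two options the issue asks to make configurable can be sketched with plain java.net calls. This is a minimal illustration, not HiveServer2's actual transport code; the class name and the 20-second value are made up for the example.

```java
import java.net.ServerSocket;
import java.net.Socket;

// Minimal sketch (not HiveServer2's actual code): apply a server-side read
// timeout and TCP keep-alive to a socket accepted from a loopback connection.
public class SocketOptionsDemo {
    // Configures an accepted socket and returns its effective soTimeout.
    static int configure(int readTimeoutMs) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {  // ephemeral port
            try (Socket client = new Socket("localhost", server.getLocalPort());
                 Socket accepted = server.accept()) {
                accepted.setSoTimeout(readTimeoutMs); // read() throws SocketTimeoutException after this
                accepted.setKeepAlive(true);          // enable TCP keep-alive probes
                return accepted.getSoTimeout();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("soTimeout=" + configure(20_000));
    }
}
```

In a real server the timeout and keep-alive values would come from configuration and be applied uniformly across the transport types listed above, where the transport exposes the underlying socket.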





[jira] [Updated] (HIVE-6679) HiveServer2 should support configurable the server side socket timeout and keepalive for various transports types where applicable

2015-05-05 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6679:
---
Affects Version/s: 1.1.0
   1.0.0

 HiveServer2 should support configurable the server side socket timeout and 
 keepalive for various transports types where applicable
 --

 Key: HIVE-6679
 URL: https://issues.apache.org/jira/browse/HIVE-6679
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0
Reporter: Prasad Mujumdar
Assignee: Navis
  Labels: TODOC1.0, TODOC15
 Fix For: 1.2.0

 Attachments: HIVE-6679.1.patch.txt, HIVE-6679.2.patch.txt, 
 HIVE-6679.3.patch, HIVE-6679.4.patch, HIVE-6679.5.patch, HIVE-6679.6.patch


  HiveServer2 should support a configurable server-side socket read timeout 
 and TCP keep-alive option. The metastore server already supports this (and so 
 does the old Hive server). 
 We now have multiple client connectivity options like Kerberos, Delegation 
 Token (Digest-MD5), Plain SASL, Plain SASL with SSL and raw sockets. The 
 configuration should be applicable to all types (if possible).





[jira] [Updated] (HIVE-6679) HiveServer2 should support configurable the server side socket timeout and keepalive for various transports types where applicable

2015-05-05 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6679:
---
Fix Version/s: (was: 1.1.0)
   1.2.0

 HiveServer2 should support configurable the server side socket timeout and 
 keepalive for various transports types where applicable
 --

 Key: HIVE-6679
 URL: https://issues.apache.org/jira/browse/HIVE-6679
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0
Reporter: Prasad Mujumdar
Assignee: Navis
  Labels: TODOC1.0, TODOC15
 Fix For: 1.2.0

 Attachments: HIVE-6679.1.patch.txt, HIVE-6679.2.patch.txt, 
 HIVE-6679.3.patch, HIVE-6679.4.patch, HIVE-6679.5.patch, HIVE-6679.6.patch


  HiveServer2 should support a configurable server-side socket read timeout 
 and TCP keep-alive option. The metastore server already supports this (and so 
 does the old Hive server). 
 We now have multiple client connectivity options like Kerberos, Delegation 
 Token (Digest-MD5), Plain SASL, Plain SASL with SSL and raw sockets. The 
 configuration should be applicable to all types (if possible).





[jira] [Resolved] (HIVE-10482) LLAP: AsertionError cannot allocate when reading from orc

2015-05-05 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-10482.
-
Resolution: Fixed

committed a workaround

 LLAP: AsertionError cannot allocate when reading from orc
 -

 Key: HIVE-10482
 URL: https://issues.apache.org/jira/browse/HIVE-10482
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Sergey Shelukhin
 Fix For: llap


 This was from a run of tpch query 1. [~sershe] - not sure if you've already 
 seen this. Creating a jira so that it doesn't get lost.
 {code}
 2015-04-24 13:11:54,180 
 [TezTaskRunner_attempt_1429683757595_0326_4_00_000199_0(container_1_0326_01_003216_sseth_20150424131137_8ec6200c-77c8-43ea-a6a3-a0ab1da6e1ac:4_Map
  1_199_0)] ERROR org.apache.hadoop.hive.ql.exec.tez.TezProcessor: 
 org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
 java.io.IOException: java.lang.AssertionError: Cannot allocate
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74)
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:314)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
 at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:329)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
 at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: java.io.IOException: 
 java.lang.AssertionError: Cannot allocate
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
 at 
 org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137)
 at 
 org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62)
 ... 16 more
 Caused by: java.io.IOException: java.lang.AssertionError: Cannot allocate
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:257)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:209)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:147)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:97)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
 ... 22 more
 Caused by: java.lang.AssertionError: Cannot allocate
 at 
 org.apache.hadoop.hive.ql.io.orc.InStream.readEncodedStream(InStream.java:761)
 at 
 org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:441)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:294)
 at 
 

[jira] [Commented] (HIVE-10614) schemaTool upgrade from 0.14.0 to 1.3.0 causes failure

2015-05-05 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529684#comment-14529684
 ] 

Thejas M Nair commented on HIVE-10614:
--

+1 for the current patch; it will work with the 1.2 branch. We need another one for 
master (that also has a similar change for hive-schema-1.3.0.mysql.sql)


 schemaTool upgrade from 0.14.0 to 1.3.0 causes failure
 --

 Key: HIVE-10614
 URL: https://issues.apache.org/jira/browse/HIVE-10614
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
Priority: Critical
 Attachments: HIVE-10614.1.patch


 ./schematool -dbType mysql -upgradeSchemaFrom 0.14.0 -verbose
 {code}
 ++--+
 | 
|
 ++--+
 |  HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from 
 Mysql for other DBs do not have it   |
 ++--+
 1 row selected (0.004 seconds)
 0: jdbc:mysql://node-1.example.com/hive DROP PROCEDURE IF EXISTS 
 RM_TLBS_LINKID
 No rows affected (0.005 seconds)
 0: jdbc:mysql://node-1.example.com/hive DROP PROCEDURE IF EXISTS 
 RM_PARTITIONS_LINKID
 No rows affected (0.006 seconds)
 0: jdbc:mysql://node-1.example.com/hive DROP PROCEDURE IF EXISTS RM_LINKID
 No rows affected (0.002 seconds)
 0: jdbc:mysql://node-1.example.com/hive CREATE PROCEDURE RM_TLBS_LINKID() 
 BEGIN IF EXISTS (SELECT * FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE 
 `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` = 'LINK_TARGET_ID') THEN ALTER TABLE 
 `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; ALTER TABLE `TBLS` DROP KEY `TBLS_N51` ; 
 ALTER TABLE `TBLS` DROP COLUMN `LINK_TARGET_ID` ; END IF; END
 Error: You have an error in your SQL syntax; check the manual that 
 corresponds to your MySQL server version for the right syntax to use near '' 
 at line 1 (state=42000,code=1064)
 Closing: 0: jdbc:mysql://node-1.example.com/hive?createDatabaseIfNotExist=true
 org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore 
 state would be inconsistent !!
 org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore 
 state would be inconsistent !!
   at 
 org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:229)
   at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:468)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 Caused by: java.io.IOException: Schema script failed, errorcode 2
   at 
 org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:355)
   at 
 org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326)
   at 
 org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:224)
 {code}
 Looks like HIVE-7018 introduced a stored procedure as part of the MySQL upgrade 
 script, and it is causing issues with the schematool upgrade.





[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-05-05 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9392:
--
Attachment: HIVE-9392.4.patch

 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to 
 column names having duplicated fqColumnName
 

 Key: HIVE-9392
 URL: https://issues.apache.org/jira/browse/HIVE-9392
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong
Priority: Critical
 Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch, 
 HIVE-9392.4.patch


 In JoinStatsRule.process the join column statistics are stored in the HashMap 
 joinedColStats, keyed by ColStatistics.fqColName. That key is duplicated 
 across join columns in the same vertex, so distinctVals ends up containing 
 duplicated values, which negatively affects the join cardinality estimation.
 The duplicate keys are usually named KEY.reducesinkkey0.
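The failure mode above boils down to HashMap semantics: a second put() with the same key silently overwrites the first. A small sketch (method and key names are illustrative, not Hive's actual code) shows why qualifying the key keeps both sides' statistics:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the HIVE-9392 failure mode: when two join columns flatten to the
// same key, the second put() overwrites the first, so one side's NDV is lost.
public class StatsKeyCollision {
    static int entryCount(String keyA, String keyB) {
        Map<String, Long> joinedColStats = new HashMap<>();
        joinedColStats.put(keyA, 1_000_000L); // NDV of one join side
        joinedColStats.put(keyB, 50L);        // same key -> overwrites; distinct key -> coexists
        return joinedColStats.size();
    }

    public static void main(String[] args) {
        // Both columns named KEY.reducesinkkey0: only one entry survives.
        System.out.println(entryCount("KEY.reducesinkkey0", "KEY.reducesinkkey0"));
        // Qualified with a table alias, both statistics are kept.
        System.out.println(entryCount("a.KEY.reducesinkkey0", "b.KEY.reducesinkkey0"));
    }
}
```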





[jira] [Updated] (HIVE-10620) ZooKeeperHiveLock overrides equal() method but not hashcode()

2015-05-05 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-10620:
---
Description: ZooKeeperHiveLock overrides the public boolean equals(Object 
o) method but does not for public int hashCode(). It violates the Java contract 
and may cause unexpected results.  (was: ZooKeeperHiveLock overrides the public 
boolean equals(Object o) method but does not for public int hashCode(). It 
violates the Java contract that equal and may cause unexpected results.)

 ZooKeeperHiveLock overrides equal() method but not hashcode()
 -

 Key: HIVE-10620
 URL: https://issues.apache.org/jira/browse/HIVE-10620
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang

 ZooKeeperHiveLock overrides the public boolean equals(Object o) method but 
 does not for public int hashCode(). It violates the Java contract and may 
 cause unexpected results.
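The contract violation bites whenever such a lock object is used as a hash-based collection key: equal objects must produce equal hash codes, or lookups miss. A hedged sketch with a hypothetical stand-in class (not the real ZooKeeperHiveLock) showing the corrected pattern:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a ZooKeeper-path-based lock key. With equals()
// alone, two equal keys get different identity hash codes and HashMap lookups
// miss; overriding hashCode() consistently with equals() fixes that.
class ZkLockKey {
    final String zNodePath;
    ZkLockKey(String zNodePath) { this.zNodePath = zNodePath; }
    @Override public boolean equals(Object o) {
        return o instanceof ZkLockKey && ((ZkLockKey) o).zNodePath.equals(zNodePath);
    }
    @Override public int hashCode() { return Objects.hash(zNodePath); } // the missing piece
}

public class LockContract {
    public static void main(String[] args) {
        Map<ZkLockKey, String> held = new HashMap<>();
        held.put(new ZkLockKey("/hive_zookeeper_namespace/db.t1"), "EXCLUSIVE");
        // With hashCode() overridden, an equal-but-distinct key finds the entry.
        System.out.println(held.get(new ZkLockKey("/hive_zookeeper_namespace/db.t1")));
    }
}
```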





[jira] [Updated] (HIVE-10620) ZooKeeperHiveLock overrides equal() method but not hashcode()

2015-05-05 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-10620:
---
Attachment: HIVE-10620.patch

[~szehon] [~ashutoshc] could you review the code? Thanks

 ZooKeeperHiveLock overrides equal() method but not hashcode()
 -

 Key: HIVE-10620
 URL: https://issues.apache.org/jira/browse/HIVE-10620
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
 Attachments: HIVE-10620.patch


 ZooKeeperHiveLock overrides the public boolean equals(Object o) method but 
 does not for public int hashCode(). It violates the Java contract and may 
 cause unexpected results.





[jira] [Assigned] (HIVE-8769) Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected)

2015-05-05 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-8769:
-

Assignee: Pengcheng Xiong  (was: Prasanth Jayachandran)

 Physical optimizer : Incorrect CE results in a shuffle join instead of a Map 
 join (PK/FK pattern not detected)
 --

 Key: HIVE-8769
 URL: https://issues.apache.org/jira/browse/HIVE-8769
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong
 Fix For: 1.2.0


 TPC-DS Q82 is running slower than hive 13 because the join type is not 
 correct.
 The estimate for item x inventory x date_dim is 227 million rows, while the 
 actual is 3K rows.
 Hive 13 finishes in  753  seconds.
 Hive 14 finishes in  1,267  seconds.
 Hive 14 + force map join finished in 431 seconds.
 Query
 {code}
 select  i_item_id
,i_item_desc
,i_current_price
  from item, inventory, date_dim, store_sales
  where i_current_price between 30 and 30+30
  and inv_item_sk = i_item_sk
  and d_date_sk=inv_date_sk
  and d_date between '2002-05-30' and '2002-07-30'
  and i_manufact_id in (437,129,727,663)
  and inv_quantity_on_hand between 100 and 500
  and ss_item_sk = i_item_sk
  group by i_item_id,i_item_desc,i_current_price
  order by i_item_id
  limit 100
 {code}
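The connection between the inflated estimate and the join choice can be sketched in a few lines. This is illustrative only (the threshold logic and values are assumptions, not Hive's actual cost model): a planner broadcasts the small side only when its estimated size fits under a limit, so a 227M-row estimate for what is really a 3K-row side forces a shuffle join.

```java
// Illustrative only: why cardinality estimation drives the join choice.
// Threshold and row width are made-up numbers, not Hive configuration.
public class JoinChoice {
    static String pickJoin(long estimatedRows, long rowWidthBytes, long mapJoinThresholdBytes) {
        // Broadcast (map) join only if the estimated small side fits in memory.
        return estimatedRows * rowWidthBytes <= mapJoinThresholdBytes ? "map join" : "shuffle join";
    }

    public static void main(String[] args) {
        long threshold = 10L * 1024 * 1024;                          // hypothetical 10 MB limit
        System.out.println(pickJoin(227_000_000L, 100, threshold));  // inflated estimate -> shuffle join
        System.out.println(pickJoin(3_000L, 100, threshold));        // actual cardinality -> map join
    }
}
```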
 Plan 
 {code}
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Map 7 - Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE)
 Reducer 4 - Map 3 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE)
 Reducer 5 - Reducer 4 (SIMPLE_EDGE)
 Reducer 6 - Reducer 5 (SIMPLE_EDGE)
   DagName: mmokhtar_20141106005353_7a2eb8df-12ff-4fe9-89b4-30f1e4e3fb90:1
   Vertices:
 Map 1 
 Map Operator Tree:
 TableScan
   alias: item
   filterExpr: ((i_current_price BETWEEN 30 AND 60 and 
 (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: 
 boolean)
   Statistics: Num rows: 462000 Data size: 663862160 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: ((i_current_price BETWEEN 30 AND 60 and 
 (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: 
 boolean)
 Statistics: Num rows: 115500 Data size: 34185680 Basic 
 stats: COMPLETE Column stats: COMPLETE
 Select Operator
   expressions: i_item_sk (type: int), i_item_id (type: 
 string), i_item_desc (type: string), i_current_price (type: float)
   outputColumnNames: _col0, _col1, _col2, _col3
   Statistics: Num rows: 115500 Data size: 33724832 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Reduce Output Operator
 key expressions: _col0 (type: int)
 sort order: +
 Map-reduce partition columns: _col0 (type: int)
 Statistics: Num rows: 115500 Data size: 33724832 
 Basic stats: COMPLETE Column stats: COMPLETE
 value expressions: _col1 (type: string), _col2 (type: 
 string), _col3 (type: float)
 Execution mode: vectorized
 Map 2 
 Map Operator Tree:
 TableScan
   alias: date_dim
   filterExpr: (d_date BETWEEN '2002-05-30' AND '2002-07-30' 
 and d_date_sk is not null) (type: boolean)
   Statistics: Num rows: 73049 Data size: 81741831 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: (d_date BETWEEN '2002-05-30' AND '2002-07-30' 
 and d_date_sk is not null) (type: boolean)
 Statistics: Num rows: 36524 Data size: 3579352 Basic 
 stats: COMPLETE Column stats: COMPLETE
 Select Operator
   expressions: d_date_sk (type: int)
   outputColumnNames: _col0
   Statistics: Num rows: 36524 Data size: 146096 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Reduce Output Operator
 key expressions: _col0 (type: int)
 sort order: +
 Map-reduce partition columns: _col0 (type: int)
 Statistics: Num rows: 36524 Data size: 146096 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Select Operator
 expressions: _col0 (type: int)
 outputColumnNames: _col0
 Statistics: Num rows: 36524 Data size: 146096 Basic 
 stats: 

[jira] [Commented] (HIVE-9451) Add max size of column dictionaries to ORC metadata

2015-05-05 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529570#comment-14529570
 ] 

Prasanth Jayachandran commented on HIVE-9451:
-

No, the test failures look related. [~owen.omalley] Can you take a look at 
the test failures? I am assuming these are all related to file size differences.

 Add max size of column dictionaries to ORC metadata
 ---

 Key: HIVE-9451
 URL: https://issues.apache.org/jira/browse/HIVE-9451
 Project: Hive
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Owen O'Malley
  Labels: ORC
 Fix For: 1.2.0

 Attachments: HIVE-9451.patch, HIVE-9451.patch


 To predict the amount of memory required to read an ORC file we need to know 
 the size of the dictionaries for the columns that we are reading. I propose 
 adding the number of bytes for each column's dictionary to the stripe's 
 column statistics. The file's column statistics would have the maximum 
 dictionary size for each column.
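The proposed metadata makes the memory bound a simple sum. A hedged sketch (field names and sizes are invented for illustration, not ORC's actual footer layout): with a maximum dictionary size recorded per column, a reader can bound its memory up front over just the columns it will read.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch of the proposal: sum the recorded max dictionary sizes
// of only the columns being read to bound reader memory. Names/sizes made up.
public class OrcDictMemory {
    static long estimateBytes(Map<String, Long> maxDictBytes, List<String> readCols) {
        return readCols.stream().mapToLong(c -> maxDictBytes.getOrDefault(c, 0L)).sum();
    }

    public static void main(String[] args) {
        Map<String, Long> maxDictBytes = Map.of(
            "i_item_id", 4_200_000L,
            "i_item_desc", 96_000_000L,
            "i_current_price", 0L);  // numeric column: no dictionary
        // Reading two columns needs far less than the whole file's dictionaries.
        System.out.println(estimateBytes(maxDictBytes, List.of("i_item_id", "i_current_price")));
    }
}
```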





[jira] [Commented] (HIVE-6679) HiveServer2 should support configurable the server side socket timeout and keepalive for various transports types where applicable

2015-05-05 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529631#comment-14529631
 ] 

Thejas M Nair commented on HIVE-6679:
-

+1 .
Just a minor comment. Can you also update the description of 
HIVE_SERVER2_TCP_SOCKET_BLOCKING_TIMEOUT to say that it's applicable only in 
binary mode, and for http mode, the equivalent is 
hive.server2.thrift.http.max.idle.time?

 HiveServer2 should support configurable the server side socket timeout and 
 keepalive for various transports types where applicable
 --

 Key: HIVE-6679
 URL: https://issues.apache.org/jira/browse/HIVE-6679
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0
Reporter: Prasad Mujumdar
Assignee: Navis
  Labels: TODOC1.0, TODOC15
 Fix For: 1.2.0

 Attachments: HIVE-6679.1.patch.txt, HIVE-6679.2.patch.txt, 
 HIVE-6679.3.patch, HIVE-6679.4.patch, HIVE-6679.5.patch, HIVE-6679.6.patch


  HiveServer2 should support a configurable server-side socket read timeout 
 and TCP keep-alive option. The metastore server already supports this (and so 
 does the old Hive server). 
 We now have multiple client connectivity options like Kerberos, Delegation 
 Token (Digest-MD5), Plain SASL, Plain SASL with SSL and raw sockets. The 
 configuration should be applicable to all types (if possible).





[jira] [Updated] (HIVE-10616) TypeInfoUtils doesn't handle DECIMAL with just precision specified

2015-05-05 Thread Thomas Friedrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Friedrich updated HIVE-10616:

Attachment: HIVE-10616.1.patch

 TypeInfoUtils doesn't handle DECIMAL with just precision specified
 --

 Key: HIVE-10616
 URL: https://issues.apache.org/jira/browse/HIVE-10616
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 1.0.0
Reporter: Thomas Friedrich
Assignee: Thomas Friedrich
Priority: Minor
 Attachments: HIVE-10616.1.patch


 The parseType method in TypeInfoUtils doesn't handle decimal types with just 
 precision specified, although that's a valid type definition. 
 As a result, TypeInfoUtils.getTypeInfoFromTypeString will always return 
 decimal(10,0) for any decimal(precision) string. 
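The intended behavior can be sketched with a small type-string parser. This is a simplified stand-in, not the actual TypeInfoUtils code: it accepts decimal(p,s), decimal(p), and bare decimal, defaulting scale to 0 and precision to 10 only when they are genuinely absent.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified stand-in for the expected parsing: decimal(p) is valid and means
// scale 0, rather than being ignored and falling back to decimal(10,0).
public class DecimalTypeParse {
    static final Pattern DECIMAL =
        Pattern.compile("decimal(?:\\((\\d+)(?:,(\\d+))?\\))?");

    static String parse(String type) {
        Matcher m = DECIMAL.matcher(type);
        if (!m.matches()) throw new IllegalArgumentException(type);
        int precision = m.group(1) != null ? Integer.parseInt(m.group(1)) : 10; // default precision
        int scale = m.group(2) != null ? Integer.parseInt(m.group(2)) : 0;      // default scale
        return "decimal(" + precision + "," + scale + ")";
    }

    public static void main(String[] args) {
        System.out.println(parse("decimal(5)"));   // precision only -> scale defaults to 0
        System.out.println(parse("decimal(7,2)")); // both specified
        System.out.println(parse("decimal"));      // bare type -> both defaults
    }
}
```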





[jira] [Commented] (HIVE-10614) schemaTool upgrade from 0.14.0 to 1.3.0 causes failure

2015-05-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529657#comment-14529657
 ] 

Hive QA commented on HIVE-10614:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12730668/HIVE-10614.1.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-METASTORE-Test/43/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-METASTORE-Test/43/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-METASTORE-Test-43/

This message is automatically generated.

ATTACHMENT ID: 12730668 - PreCommit-HIVE-METASTORE-Test

 schemaTool upgrade from 0.14.0 to 1.3.0 causes failure
 --

 Key: HIVE-10614
 URL: https://issues.apache.org/jira/browse/HIVE-10614
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
Priority: Critical
 Attachments: HIVE-10614.1.patch


 ./schematool -dbType mysql -upgradeSchemaFrom 0.14.0 -verbose
 {code}
 ++--+
 | 
|
 ++--+
 |  HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from 
 Mysql for other DBs do not have it   |
 ++--+
 1 row selected (0.004 seconds)
 0: jdbc:mysql://node-1.example.com/hive DROP PROCEDURE IF EXISTS 
 RM_TLBS_LINKID
 No rows affected (0.005 seconds)
 0: jdbc:mysql://node-1.example.com/hive DROP PROCEDURE IF EXISTS 
 RM_PARTITIONS_LINKID
 No rows affected (0.006 seconds)
 0: jdbc:mysql://node-1.example.com/hive DROP PROCEDURE IF EXISTS RM_LINKID
 No rows affected (0.002 seconds)
 0: jdbc:mysql://node-1.example.com/hive CREATE PROCEDURE RM_TLBS_LINKID() 
 BEGIN IF EXISTS (SELECT * FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE 
 `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` = 'LINK_TARGET_ID') THEN ALTER TABLE 
 `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; ALTER TABLE `TBLS` DROP KEY `TBLS_N51` ; 
 ALTER TABLE `TBLS` DROP COLUMN `LINK_TARGET_ID` ; END IF; END
 Error: You have an error in your SQL syntax; check the manual that 
 corresponds to your MySQL server version for the right syntax to use near '' 
 at line 1 (state=42000,code=1064)
 Closing: 0: jdbc:mysql://node-1.example.com/hive?createDatabaseIfNotExist=true
 org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore 
 state would be inconsistent !!
 org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore 
 state would be inconsistent !!
   at 
 org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:229)
   at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:468)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 Caused by: java.io.IOException: Schema script failed, errorcode 2
   at 
 org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:355)
   at 
 org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326)
   at 
 org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:224)
 {code}
 Looks like HIVE-7018 introduced a stored procedure as part of the MySQL upgrade 
 script, and it is causing issues with the schematool upgrade.





[jira] [Updated] (HIVE-10614) schemaTool upgrade from 0.14.0 to 1.3.0 causes failure

2015-05-05 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-10614:
-
Attachment: HIVE-10614.1.master.patch

[~thejas] Thanks for the review, added HIVE-10614.1.master.patch for the master 
branch


 schemaTool upgrade from 0.14.0 to 1.3.0 causes failure
 --

 Key: HIVE-10614
 URL: https://issues.apache.org/jira/browse/HIVE-10614
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
Priority: Critical
 Attachments: HIVE-10614.1.master.patch, HIVE-10614.1.patch


 ./schematool -dbType mysql -upgradeSchemaFrom 0.14.0 -verbose
 {code}
 ++--+
 | 
|
 ++--+
 |  HIVE-7018 Remove Table and Partition tables column LINK_TARGET_ID from 
 Mysql for other DBs do not have it   |
 ++--+
 1 row selected (0.004 seconds)
 0: jdbc:mysql://node-1.example.com/hive DROP PROCEDURE IF EXISTS 
 RM_TLBS_LINKID
 No rows affected (0.005 seconds)
 0: jdbc:mysql://node-1.example.com/hive DROP PROCEDURE IF EXISTS 
 RM_PARTITIONS_LINKID
 No rows affected (0.006 seconds)
 0: jdbc:mysql://node-1.example.com/hive DROP PROCEDURE IF EXISTS RM_LINKID
 No rows affected (0.002 seconds)
 0: jdbc:mysql://node-1.example.com/hive CREATE PROCEDURE RM_TLBS_LINKID() 
 BEGIN IF EXISTS (SELECT * FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE 
 `TABLE_NAME` = 'TBLS' AND `COLUMN_NAME` = 'LINK_TARGET_ID') THEN ALTER TABLE 
 `TBLS` DROP FOREIGN KEY `TBLS_FK3` ; ALTER TABLE `TBLS` DROP KEY `TBLS_N51` ; 
 ALTER TABLE `TBLS` DROP COLUMN `LINK_TARGET_ID` ; END IF; END
 Error: You have an error in your SQL syntax; check the manual that 
 corresponds to your MySQL server version for the right syntax to use near '' 
 at line 1 (state=42000,code=1064)
 Closing: 0: jdbc:mysql://node-1.example.com/hive?createDatabaseIfNotExist=true
 org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore 
 state would be inconsistent !!
 org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore 
 state would be inconsistent !!
   at 
 org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:229)
   at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:468)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 Caused by: java.io.IOException: Schema script failed, errorcode 2
   at 
 org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:355)
   at 
 org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326)
   at 
 org.apache.hive.beeline.HiveSchemaTool.doUpgrade(HiveSchemaTool.java:224)
 {code}
 Looks like HIVE-7018 introduced a stored procedure as part of the MySQL upgrade 
 script, and it is causing issues with the schematool upgrade.





[jira] [Commented] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch

2015-05-05 Thread Peter Slawski (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529707#comment-14529707
 ] 

Peter Slawski commented on HIVE-10538:
--

Great, I've been working on just that. I'll be able to post an updated patch 
tomorrow.

 Fix NPE in FileSinkOperator from hashcode mismatch
 --

 Key: HIVE-10538
 URL: https://issues.apache.org/jira/browse/HIVE-10538
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.0.0, 1.2.0
Reporter: Peter Slawski
Assignee: Peter Slawski
Priority: Critical
 Fix For: 1.2.0, 1.3.0

 Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, 
 HIVE-10538.1.patch


 A NullPointerException occurs in FileSinkOperator when using bucketed 
 tables and distribute by with multiFileSpray enabled. The following query 
 snippet reproduces this issue:
 {code}
 set hive.enforce.bucketing = true;
 set hive.exec.reducers.max = 20;
 create table bucket_a(key int, value_a string) clustered by (key) into 256 
 buckets;
 create table bucket_b(key int, value_b string) clustered by (key) into 256 
 buckets;
 create table bucket_ab(key int, value_a string, value_b string) clustered by 
 (key) into 256 buckets;
 -- Insert data into bucket_a and bucket_b
 insert overwrite table bucket_ab
 select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key 
 = b.key) distribute by key;
 {code}
 The following stack trace is logged.
 {code}
 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer 
 (ExecReducer.java:reduce(255)) - 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row (tag=0) {key:{},value:{_col0:113,_col1:val_113}}
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
   ... 8 more
 {code}
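The mechanics behind this NPE can be sketched as follows. This is a hypothetical toy (invented class and method names, not Hive's code) showing how two operators that hash the same key differently leave the sink looking up a writer slot that was never created:

```java
import java.util.HashMap;
import java.util.Map;

public class WriterOffsetSketch {
    // Hash used when rows are routed to reducers (hypothetical stand-in
    // for the ReduceSinkOperator side, which multiplies by 31).
    public static int routingHash(int key) {
        return key * 31;
    }

    // Hash used when the sink picks a writer (hypothetical stand-in for
    // the FileSinkOperator side, which omits the factor).
    public static int sinkHash(int key) {
        return key;
    }

    public static Object findWriter(Map<Integer, Object> writers, int key, int numFiles) {
        // When the two hashes disagree, the offset computed here points at
        // a slot no writer was created for, so the lookup returns null and
        // the first use of the "writer" throws an NPE.
        return writers.get(Math.abs(sinkHash(key)) % numFiles);
    }

    public static void main(String[] args) {
        int numFiles = 20;  // mirrors hive.exec.reducers.max above
        int key = 113;      // the row key seen in the stack trace
        Map<Integer, Object> writers = new HashMap<>();
        writers.put(Math.abs(routingHash(key)) % numFiles, new Object());
        System.out.println(findWriter(writers, key, numFiles)); // null
    }
}
```

In the sketch, the routing side reserves offset `routingHash(113) % 20` while the sink probes `sinkHash(113) % 20`, so the lookup misses.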



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly

2015-05-05 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-10565:

Attachment: HIVE-10565.07.patch

 LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT 
 OUTER JOIN repeated key correctly
 

 Key: HIVE-10565
 URL: https://issues.apache.org/jira/browse/HIVE-10565
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Affects Versions: 1.2.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 1.2.0, 1.3.0

 Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, 
 HIVE-10565.03.patch, HIVE-10565.04.patch, HIVE-10565.05.patch, 
 HIVE-10565.06.patch, HIVE-10565.07.patch


 Filtering can knock out some of the rows for a repeated key, but those 
 knocked-out rows need to be included in the LEFT OUTER JOIN result; currently 
 they are not when only some of the rows for the key are filtered out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10539) set default value of hive.repl.task.factory

2015-05-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529819#comment-14529819
 ] 

Hive QA commented on HIVE-10539:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12730349/HIVE-10539.3.patch

{color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 8900 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc
org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3745/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3745/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3745/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 24 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12730349 - PreCommit-HIVE-TRUNK-Build

 set default value of hive.repl.task.factory
 ---

 Key: HIVE-10539
 URL: https://issues.apache.org/jira/browse/HIVE-10539
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-10539.1.patch, HIVE-10539.2.patch, 
 HIVE-10539.3.patch


 hive.repl.task.factory does not have a default value set. It should be set to 
 org.apache.hive.hcatalog.api.repl.exim.EximReplicationTaskFactory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10618) Fix invocation of toString on byteArray in VerifyFast (250, 254)

2015-05-05 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10618:
---
Attachment: rb33877.patch

patch #1

 Fix invocation of toString on byteArray in VerifyFast (250, 254)
 

 Key: HIVE-10618
 URL: https://issues.apache.org/jira/browse/HIVE-10618
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor
 Attachments: rb33877.patch


 Arrays.toString(byteArray) can be used to convert byte[] to string



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9743) Incorrect result set for vectorized left outer join

2015-05-05 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529560#comment-14529560
 ] 

Vikram Dixit K commented on HIVE-9743:
--

That seems to be because, with SMB, there is full delegation to the base 
class. I am not sure if we need the SMB changes at all.

 Incorrect result set for vectorized left outer join
 ---

 Key: HIVE-9743
 URL: https://issues.apache.org/jira/browse/HIVE-9743
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.14.0
Reporter: N Campbell
Assignee: Matt McCline
 Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, 
 HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, 
 HIVE-9743.06.patch, HIVE-9743.08.patch


 This query is supposed to return 3 rows and will when run without Tez but 
 returns 2 rows when run with Tez.
 select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left 
 outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2 > 15 )
 tjoin1.rnum   tjoin1.c1   tjoin1.c2   c2j2
 1 20  25  null
 2 null  50  null
 instead of
 tjoin1.rnum   tjoin1.c1   tjoin1.c2   c2j2
 0 10  15  null
 1 20  25  null
 2 null  50  null
 create table  if not exists TJOIN1 (RNUM int , C1 int, C2 int)
  STORED AS orc ;
 0|10|15
 1|20|25
 2|\N|50
 create table  if not exists TJOIN2 (RNUM int , C1 int, C2 char(2))
 ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
  STORED AS TEXTFILE ;
 0|10|BB
 1|15|DD
 2|\N|EE
 3|10|FF



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9451) Add max size of column dictionaries to ORC metadata

2015-05-05 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529580#comment-14529580
 ] 

Sushanth Sowmyan commented on HIVE-9451:


Okay, thanks for the update. Will wait to hear more. :)

 Add max size of column dictionaries to ORC metadata
 ---

 Key: HIVE-9451
 URL: https://issues.apache.org/jira/browse/HIVE-9451
 Project: Hive
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Owen O'Malley
  Labels: ORC
 Fix For: 1.2.0

 Attachments: HIVE-9451.patch, HIVE-9451.patch


 To predict the amount of memory required to read an ORC file we need to know 
 the size of the dictionaries for the columns that we are reading. I propose 
 adding the number of bytes for each column's dictionary to the stripe's 
 column statistics. The file's column statistics would have the maximum 
 dictionary size for each column.
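As a rough illustration of how such metadata might be consumed, here is a hypothetical sketch (invented names and sizes, not an ORC API) that bounds a reader's dictionary memory by summing the proposed per-column maxima over the columns actually read:

```java
import java.util.HashMap;
import java.util.Map;

public class DictMemoryEstimate {
    // If the file's column statistics carried the maximum dictionary size
    // per column, a reader could bound dictionary memory as the sum over
    // the columns it actually reads.
    public static long estimate(Map<String, Long> maxDictBytes, String... readCols) {
        long total = 0;
        for (String col : readCols) {
            total += maxDictBytes.getOrDefault(col, 0L);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> maxDictBytes = new HashMap<>();
        maxDictBytes.put("l_returnflag", 64L);     // hypothetical sizes
        maxDictBytes.put("l_shipinstruct", 4096L);
        System.out.println(estimate(maxDictBytes, "l_returnflag", "l_shipinstruct")); // 4160
    }
}
```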



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-10547) CBO (Calcite Return Path) : genFileSinkPlan uses wrong partition col to create FS

2015-05-05 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong resolved HIVE-10547.

Resolution: Fixed

 CBO (Calcite Return Path) : genFileSinkPlan uses wrong partition col to 
 create FS
 -

 Key: HIVE-10547
 URL: https://issues.apache.org/jira/browse/HIVE-10547
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Fix For: 1.2.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-05-05 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9392:
--
Attachment: HIVE-9392.01.patch

After discussing with [~jpullokkaran], we believe this patch will solve the 
problem. We have already tried TPC-DS queries 70 and 89 to confirm.
 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to 
 column names having duplicated fqColumnName
 

 Key: HIVE-9392
 URL: https://issues.apache.org/jira/browse/HIVE-9392
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth Jayachandran
Priority: Critical
 Attachments: HIVE-9392.01.patch, HIVE-9392.1.patch, HIVE-9392.2.patch


 In JoinStatsRule.process the join column statistics are stored in the HashMap 
 joinedColStats. The key used, which is ColStatistics.fqColName, is 
 duplicated between join columns in the same vertex; as a result distinctVals 
 ends up having duplicated values, which negatively affects the join 
 cardinality estimation.
 The duplicate keys are usually named KEY.reducesinkkey0.
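 The clobbering is easy to reproduce with a plain map; the sketch below is 
 illustrative only (invented method names, not the actual JoinStatsRule code):

```java
import java.util.HashMap;
import java.util.Map;

public class DuplicateFqColNameSketch {
    // Column statistics keyed by fqColName, as in JoinStatsRule.process.
    public static Map<String, Long> joinStats(long ndvA, long ndvB) {
        Map<String, Long> joinedColStats = new HashMap<>();
        // Two distinct join columns in the same vertex share the generated
        // name "KEY.reducesinkkey0", so the second put clobbers the first.
        joinedColStats.put("KEY.reducesinkkey0", ndvA);
        joinedColStats.put("KEY.reducesinkkey0", ndvB);
        return joinedColStats;
    }

    public static void main(String[] args) {
        Map<String, Long> stats = joinStats(1000L, 5L);
        // Only one entry survives, so the wrong NDV feeds the cardinality
        // estimate for both join columns.
        System.out.println(stats.size());                    // 1
        System.out.println(stats.get("KEY.reducesinkkey0")); // 5
    }
}
```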



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch

2015-05-05 Thread Peter Slawski (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529673#comment-14529673
 ] 

Peter Slawski commented on HIVE-10538:
--

The Spark driver failures are caused by this change. This would be expected if 
a row's hashcode affects its ordering in Spark. This patch makes it so that 
the HiveKey hashcode output from ReduceSinkOperator is no longer always 
multiplied by 31 (as explained previously).

Also, for at least those failed qtests, the row ordering/output in the expected 
output differs across MapRed, Tez, and Spark. So, the execution engine affects 
ordering.

From 
[spark/groupby_complex_types_multi_single_reducer.q.out#L221|https://github.com/apache/hive/blob/master/ql/src/test/results/clientpositive/spark/groupby_complex_types_multi_single_reducer.q.out#L221]
{code}
POSTHOOK: query: SELECT DEST2.* FROM DEST2
POSTHOOK: type: QUERY
POSTHOOK: Input: default@dest2
#### A masked pattern was here ####
{120:val_120}   2
{129:val_129}   2
{160:val_160}   1
{26:val_26} 2
{27:val_27} 1
{288:val_288}   2
{298:val_298}   3
{30:val_30} 1
{311:val_311}   3
{74:val_74} 1
{code}
From 
[groupby_complex_types_multi_single_reducer.q.out#L240|https://github.com/apache/hive/blob/master/ql/src/test/results/clientpositive/groupby_complex_types_multi_single_reducer.q.out#L240]
{code}
POSTHOOK: query: SELECT DEST2.* FROM DEST2
POSTHOOK: type: QUERY
POSTHOOK: Input: default@dest2
#### A masked pattern was here ####
{0:val_0}   3
{10:val_10} 1
{100:val_100}   2
{103:val_103}   2
{104:val_104}   2
{105:val_105}   1
{11:val_11} 1
{111:val_111}   1
{113:val_113}   2
{114:val_114}   1
{code}
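The effect of the factor of 31 on row placement can be sketched with a Hadoop-style partitioner; this is a hypothetical stand-in, not the actual HiveKey or partitioner code:

```java
public class HashPartitionSketch {
    // Hadoop-style partitioner: partition = (hash & Integer.MAX_VALUE) % reducers.
    public static int partitionFor(int hash, int numReducers) {
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 20;
        int h = "113".hashCode();
        // Whether or not the hashcode is multiplied by 31 decides which
        // reducer the key lands on, and therefore the order in which rows
        // show up in the final output.
        System.out.println(partitionFor(h, numReducers));
        System.out.println(partitionFor(h * 31, numReducers));
    }
}
```

With the sketch's parameters the two partitions differ, which is consistent with the expected-output ordering varying once the multiply-by-31 behavior changed.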

 Fix NPE in FileSinkOperator from hashcode mismatch
 --

 Key: HIVE-10538
 URL: https://issues.apache.org/jira/browse/HIVE-10538
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.0.0, 1.2.0
Reporter: Peter Slawski
Assignee: Peter Slawski
Priority: Critical
 Fix For: 1.2.0, 1.3.0

 Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, 
 HIVE-10538.1.patch


 A NullPointerException occurs in FileSinkOperator when using bucketed 
 tables and DISTRIBUTE BY with multiFileSpray enabled. The following query 
 snippet reproduces the issue:
 {code}
 set hive.enforce.bucketing = true;
 set hive.exec.reducers.max = 20;
 create table bucket_a(key int, value_a string) clustered by (key) into 256 
 buckets;
 create table bucket_b(key int, value_b string) clustered by (key) into 256 
 buckets;
 create table bucket_ab(key int, value_a string, value_b string) clustered by 
 (key) into 256 buckets;
 -- Insert data into bucket_a and bucket_b
 insert overwrite table bucket_ab
 select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key 
 = b.key) distribute by key;
 {code}
 The following stack trace is logged.
 {code}
 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer 
 (ExecReducer.java:reduce(255)) - 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row (tag=0) {key:{},value:{_col0:113,_col1:val_113}}
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
   ... 8 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10619) Fix ConcurrentHashMap.get in MetadataListStructObjectInspector.getInstance (52)

2015-05-05 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10619:
---
Attachment: rb33878.patch

patch #1

 Fix ConcurrentHashMap.get in MetadataListStructObjectInspector.getInstance 
 (52)
 ---

 Key: HIVE-10619
 URL: https://issues.apache.org/jira/browse/HIVE-10619
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor
 Attachments: rb33878.patch


 cached.get(columnNames) should be replaced with cached.get(key) in the code 
 block below:
 {code}
   cached = new ConcurrentHashMap<List<List<String>>,
 MetadataListStructObjectInspector>();
   public static MetadataListStructObjectInspector getInstance(
   List<String> columnNames) {
 ArrayList<List<String>> key = new ArrayList<List<String>>(1);
 key.add(columnNames);
 MetadataListStructObjectInspector result = cached.get(columnNames);
 {code}
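 The miss is a plain key-type mismatch: the map is keyed by List<List<String>>, 
 so probing it with the raw List<String> can never hit. A minimal sketch 
 (illustrative names, not the Hive source):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

public class CachedLookupSketch {
    // Cache keyed by the wrapped List<List<String>> type, as in the
    // snippet above.
    public static final ConcurrentHashMap<List<List<String>>, String> cached =
        new ConcurrentHashMap<>();

    public static void main(String[] args) {
        List<String> columnNames = Arrays.asList("key", "value");
        List<List<String>> key = new ArrayList<>(1);
        key.add(columnNames);
        cached.put(key, "inspector");

        // Buggy lookup: a List<String> never equals a non-empty
        // List<List<String>> key, so this always misses.
        System.out.println(cached.get(columnNames)); // null
        // Fixed lookup, as the report suggests: use the wrapped key.
        System.out.println(cached.get(key));         // inspector
    }
}
```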



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10526) CBO (Calcite Return Path): HiveCost epsilon comparison should take row count in to account

2015-05-05 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529584#comment-14529584
 ] 

Sushanth Sowmyan commented on HIVE-10526:
-

I don't see this picked up in the test commit queue, and it's possible it'll 
fail out saying it's already processed this file, so I'm going to re-upload 
.1.patch as .2.patch and manually submit this into the queue.

 CBO (Calcite Return Path): HiveCost epsilon comparison should take row count 
 in to account
 --

 Key: HIVE-10526
 URL: https://issues.apache.org/jira/browse/HIVE-10526
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: 0.12.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 1.2.0

 Attachments: HIVE-10526.1.patch, HIVE-10526.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10617) LLAP: allocator occasionally has a spurious failure to allocate due to partitioned locking and has to retry

2015-05-05 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10617:

Summary: LLAP: allocator occasionally has a spurious failure to allocate 
due to partitioned locking and has to retry  (was: LLAP: fix allocator 
concurrency rarely causing spurious failure to allocate due to partitioned 
locking)

 LLAP: allocator occasionally has a spurious failure to allocate due to 
 partitioned locking and has to retry
 -

 Key: HIVE-10617
 URL: https://issues.apache.org/jira/browse/HIVE-10617
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 See HIVE-10482 and the comment in code. Right now this is worked around by 
 retrying.
 Simple case: a thread can reserve memory from the manager and bounce between 
 checking arena 1 and arena 2 for memory as other threads allocate and 
 deallocate from the respective arenas in reverse order, making it look like 
 there's no memory. More importantly, this can happen when buddy blocks are 
 split while lots of stuff is allocated.
 This can be solved either with some form of helping (esp. for split case) or 
 by making allocator an actor (or set of actors, one per 1-N arenas that 
 they would own), to satisfy alloc requests more deterministically (and also 
 get rid of most sync).
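 A deterministic toy model of the "bouncing" failure described above (not the 
 LLAP allocator itself) shows how the total free memory can suffice while 
 every per-arena check fails:

```java
public class PartitionedAllocSketch {
    // Free bytes per arena; with partitioned locking each arena is
    // checked independently, never the total.
    public static int[] arenaFree = {0, 64};

    // Try each arena once; between the two checks, simulate another
    // thread deallocating into arena 0 and allocating from arena 1.
    public static boolean tryAllocate(int size) {
        for (int i = 0; i < arenaFree.length; i++) {
            if (i == 1) {
                arenaFree[0] += 64;
                arenaFree[1] -= 64;
            }
            if (arenaFree[i] >= size) {
                arenaFree[i] -= size;
                return true;
            }
        }
        // 64 bytes were free the whole time, yet every per-arena check
        // failed: a spurious failure that forces the caller to retry.
        return false;
    }

    public static void main(String[] args) {
        System.out.println(tryAllocate(64)); // false
    }
}
```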



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10609) Vectorization : Q64 fails with ClassCastException

2015-05-05 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529972#comment-14529972
 ] 

Matt McCline commented on HIVE-10609:
-

This doesn't fail on my combined build of HIVE-9743 and HIVE-10565.  Will 
verify again when those JIRAs go in.

 Vectorization : Q64 fails with ClassCastException
 -

 Key: HIVE-10609
 URL: https://issues.apache.org/jira/browse/HIVE-10609
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Matt McCline
 Fix For: 1.2.0


 TPC-DS Q64 fails with ClassCastException.
 Query
 {code}
 select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number 
 ,cs1.b_streen_name ,cs1.b_city
  ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city 
 ,cs1.c_zip ,cs1.syear ,cs1.cnt
  ,cs1.s1 ,cs1.s2 ,cs1.s3
  ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt
 from
 (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
 store_name
  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
 ,ad1.ca_street_name as b_streen_name
  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
 c_street_number
  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
 as c_zip
  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
 as cnt
  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
 ,sum(ss_coupon_amt) as s3
   FROM   store_sales
 JOIN store_returns ON store_sales.ss_item_sk = 
 store_returns.sr_item_sk and store_sales.ss_ticket_number = 
 store_returns.sr_ticket_number
 JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
 JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
 JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
 JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
 JOIN store ON store_sales.ss_store_sk = store.s_store_sk
 JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= 
 cd1.cd_demo_sk
 JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = 
 cd2.cd_demo_sk
 JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
 JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = 
 hd1.hd_demo_sk
 JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = 
 hd2.hd_demo_sk
 JOIN customer_address ad1 ON store_sales.ss_addr_sk = 
 ad1.ca_address_sk
 JOIN customer_address ad2 ON customer.c_current_addr_sk = 
 ad2.ca_address_sk
 JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
 JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
 JOIN item ON store_sales.ss_item_sk = item.i_item_sk
 JOIN
  (select cs_item_sk
 ,sum(cs_ext_list_price) as 
 sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
   from catalog_sales JOIN catalog_returns
   ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
 and catalog_sales.cs_order_number = catalog_returns.cr_order_number
   group by cs_item_sk
   having 
 sum(cs_ext_list_price) > 2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit))
  cs_ui
 ON store_sales.ss_item_sk = cs_ui.cs_item_sk
   WHERE  
  cd1.cd_marital_status <> cd2.cd_marital_status and
  i_color in ('maroon','burnished','dim','steel','navajo','chocolate') 
 and
  i_current_price between 35 and 35 + 10 and
  i_current_price between 35 + 1 and 35 + 15
 group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number
,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number
,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year 
 ,d3.d_year
 ) cs1
 JOIN
 (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
 store_name
  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
 ,ad1.ca_street_name as b_streen_name
  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
 c_street_number
  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
 as c_zip
  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
 as cnt
  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
 ,sum(ss_coupon_amt) as s3
   FROM   store_sales
 JOIN store_returns ON store_sales.ss_item_sk = 
 store_returns.sr_item_sk and store_sales.ss_ticket_number = 
 store_returns.sr_ticket_number
 JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
 JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
 JOIN date_dim d2 ON 

[jira] [Commented] (HIVE-10482) LLAP: AssertionError cannot allocate when reading from orc

2015-05-05 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529476#comment-14529476
 ] 

Sergey Shelukhin commented on HIVE-10482:
-

I found the issue; it's not clear how to fix it yet, though.

 LLAP: AssertionError cannot allocate when reading from orc
 -

 Key: HIVE-10482
 URL: https://issues.apache.org/jira/browse/HIVE-10482
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Sergey Shelukhin
 Fix For: llap


 This was from a run of tpch query 1. [~sershe] - not sure if you've already 
 seen this. Creating a jira so that it doesn't get lost.
 {code}
 2015-04-24 13:11:54,180 
 [TezTaskRunner_attempt_1429683757595_0326_4_00_000199_0(container_1_0326_01_003216_sseth_20150424131137_8ec6200c-77c8-43ea-a6a3-a0ab1da6e1ac:4_Map
  1_199_0)] ERROR org.apache.hadoop.hive.ql.exec.tez.TezProcessor: 
 org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
 java.io.IOException: java.lang.AssertionError: Cannot allocate
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74)
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:314)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
 at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:329)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
 at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: java.io.IOException: 
 java.lang.AssertionError: Cannot allocate
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
 at 
 org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137)
 at 
 org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62)
 ... 16 more
 Caused by: java.io.IOException: java.lang.AssertionError: Cannot allocate
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:257)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:209)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:147)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:97)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
 ... 22 more
 Caused by: java.lang.AssertionError: Cannot allocate
 at 
 org.apache.hadoop.hive.ql.io.orc.InStream.readEncodedStream(InStream.java:761)
 at 
 org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:441)
 at 
 

[jira] [Commented] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly

2015-05-05 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529578#comment-14529578
 ] 

Vikram Dixit K commented on HIVE-10565:
---

I am reviewing this one.

 LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT 
 OUTER JOIN repeated key correctly
 

 Key: HIVE-10565
 URL: https://issues.apache.org/jira/browse/HIVE-10565
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Affects Versions: 1.2.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 1.2.0, 1.3.0

 Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, 
 HIVE-10565.03.patch, HIVE-10565.04.patch, HIVE-10565.05.patch, 
 HIVE-10565.06.patch


 Filtering can knock out some of the rows for a repeated key, but those 
 knocked-out rows need to be included in the LEFT OUTER JOIN result; currently 
 they are not when only some of the rows for the key are filtered out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10604) update webhcat-default.xml with 1.2 version numbers

2015-05-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529633#comment-14529633
 ] 

Hive QA commented on HIVE-10604:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12730316/HIVE-10604.patch

{color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 8900 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc
org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3743/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3743/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3743/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 24 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12730316 - PreCommit-HIVE-TRUNK-Build

 update webhcat-default.xml with 1.2 version numbers
 ---

 Key: HIVE-10604
 URL: https://issues.apache.org/jira/browse/HIVE-10604
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Reporter: Eugene Koifman
Assignee: Eugene Koifman
Priority: Minor
 Fix For: 1.2.0

 Attachments: HIVE-10604.patch


 no precommit tests



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10620) ZooKeeperHiveLock overrides equal() method but not hashcode()

2015-05-05 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529931#comment-14529931
 ] 

Ashutosh Chauhan commented on HIVE-10620:
-

+1

 ZooKeeperHiveLock overrides equal() method but not hashcode()
 -

 Key: HIVE-10620
 URL: https://issues.apache.org/jira/browse/HIVE-10620
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
 Attachments: HIVE-10620.patch


 ZooKeeperHiveLock overrides the public boolean equals(Object o) method but 
 not public int hashCode(). This violates the Java equals/hashCode contract 
 and may cause unexpected results.
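A minimal sketch of why this matters (LockKey below is a hypothetical stand-in, not the actual ZooKeeperHiveLock code): hash-based collections locate entries by hashCode() before calling equals(), so overriding only equals() breaks lookups.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a lock identified by its ZooKeeper path.
class LockKey {
    private final String zPath;

    LockKey(String zPath) { this.zPath = zPath; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return zPath.equals(((LockKey) o).zPath);
    }

    // Must be overridden together with equals(): equal objects are required
    // to produce equal hash codes, otherwise HashSet/HashMap lookups that
    // probe buckets by hashCode() can miss an "equal" entry.
    @Override
    public int hashCode() {
        return Objects.hash(zPath);
    }

    public static void main(String[] args) {
        Set<LockKey> held = new HashSet<>();
        held.add(new LockKey("/hive/locks/db.tbl"));
        // Finds the entry because equals() and hashCode() agree.
        System.out.println(held.contains(new LockKey("/hive/locks/db.tbl")));
    }
}
```

Without the hashCode() override, the contains() call above would usually return false, because the two equal instances would inherit distinct identity hash codes from Object.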



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8065) Support HDFS encryption functionality on Hive

2015-05-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529892#comment-14529892
 ] 

Eugene Koifman commented on HIVE-8065:
--

How come the move restriction is not an issue for something like  Insert 
Overwrite tableEZ1 select * from tableEZ2 inner join tableEZ3?

 Support HDFS encryption functionality on Hive
 -

 Key: HIVE-8065
 URL: https://issues.apache.org/jira/browse/HIVE-8065
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.13.1
Reporter: Sergio Peña
Assignee: Sergio Peña
  Labels: Hive-Scrum

 The new encryption support on HDFS makes Hive incompatible and unusable when 
 this feature is used.
 HDFS encryption is designed so that a user can configure different encryption 
 zones (or directories) for multi-tenant environments. An encryption zone has 
 an exclusive encryption key, such as AES-128 or AES-256. For security 
 compliance, HDFS does not allow moving/renaming files between encryption 
 zones; renames are allowed only inside the same encryption zone. A copy is 
 allowed between encryption zones.
 See HDFS-6134 for more details about the HDFS encryption design.
 Hive currently uses a scratch directory (like /tmp/$user/$random). This 
 scratch directory is used for the output of intermediate data (between MR 
 jobs) and for the final output of the Hive query, which is later moved to the 
 table directory location.
 If Hive tables are in different encryption zones than the scratch directory, 
 then Hive won't be able to rename those files/directories, which makes Hive 
 unusable.
 To handle this problem, we can change the scratch directory of the 
 query/statement to be inside the same encryption zone as the table directory 
 location. This way, the renaming process will succeed.
 Also, for statements that move files between encryption zones (i.e. LOAD 
 DATA), a copy may be executed instead of a rename. This will cause overhead 
 when copying large data files, but it won't break the encryption on Hive.
 Another security point to consider is joins in selects. If Hive joins 
 tables with different encryption key strengths, then the results of the 
 select might break the security compliance of the tables. Say two tables with 
 128-bit and 256-bit encryption are joined; the temporary results might be 
 stored in the 128-bit encryption zone, which conflicts with the 256-bit 
 encryption of the other table.
 To fix this, Hive should be able to select the scratch directory that is more 
 secured/encrypted in order to store the intermediate data temporarily with no 
 compliance issues.
 For instance:
 {noformat}
 SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id;
 {noformat}
 - This should use a scratch directory (or staging directory) inside the 
 table-aes256 table location.
 {noformat}
 INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1;
 {noformat}
 - This should use a scratch directory inside the table-aes1 location.
 {noformat}
 FROM table-unencrypted
 INSERT OVERWRITE TABLE table-aes128 SELECT id, name
 INSERT OVERWRITE TABLE table-aes256 SELECT id, name
 {noformat}
 - This should use a scratch directory on each of the tables locations.
 - The first SELECT will have its scratch directory on table-aes128 directory.
 - The second SELECT will have its scratch directory on table-aes256 directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10621) serde typeinfo equals methods are not symmetric

2015-05-05 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10621:
---
Attachment: rb33880.patch

patch #1

 serde typeinfo equals methods are not symmetric
 ---

 Key: HIVE-10621
 URL: https://issues.apache.org/jira/browse/HIVE-10621
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor
 Attachments: rb33880.patch


 A correct equals() implementation should start with:
 {code}
 if (this == other) {
   return true;
 }
 if (other == null || getClass() != other.getClass()) {
   return false;
 }
 {code}
 The equals() implementations in DecimalTypeInfo, PrimitiveTypeInfo, 
 VarcharTypeInfo, CharTypeInfo, and HiveDecimalWritable instead start with:
 {code}
 if (other == null || !(other instanceof class_name)) {
   return false;
 }
 {code}
 - First, the null check is redundant: instanceof already returns false for 
 null.
 - Second, the other instanceof class_name check is not symmetric.
 The contract of equals() requires that a.equals(b) is true if and only if 
 b.equals(a) is true. The current implementation violates this contract.
 For example, DecimalTypeInfo instanceof PrimitiveTypeInfo is true, but 
 PrimitiveTypeInfo instanceof DecimalTypeInfo is false.
 See more details here 
 http://stackoverflow.com/questions/6518534/equals-method-overrides-equals-in-superclass-and-may-not-be-symmetric
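The asymmetry can be reproduced with two stand-in classes (Parent/Child below are hypothetical, not the actual Hive TypeInfo hierarchy): an instanceof-based equals() accepts subclasses in one direction only.

```java
// Demonstrates the broken symmetry; run main() to see both directions.
class SymmetryDemo {
    public static void main(String[] args) {
        Parent p = new Parent(1);
        Child c = new Child(1);
        System.out.println(p.equals(c)); // true: c instanceof Parent
        System.out.println(c.equals(p)); // false: p is not instanceof Child
    }
}

class Parent {
    final int v;
    Parent(int v) { this.v = v; }

    @Override
    public boolean equals(Object other) {
        // instanceof-based check: silently accepts subclass instances.
        if (!(other instanceof Parent)) return false;
        return v == ((Parent) other).v;
    }

    @Override
    public int hashCode() { return v; }
}

class Child extends Parent {
    Child(int v) { super(v); }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Child)) return false;
        return v == ((Child) other).v;
    }
}
```

Replacing each instanceof test with a getClass() comparison makes both calls return false, restoring a.equals(b) == b.equals(a).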



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10618) Fix invocation of toString on byteArray in VerifyFast (250, 254)

2015-05-05 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529958#comment-14529958
 ] 

Prasanth Jayachandran commented on HIVE-10618:
--

+1

 Fix invocation of toString on byteArray in VerifyFast (250, 254)
 

 Key: HIVE-10618
 URL: https://issues.apache.org/jira/browse/HIVE-10618
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor
 Attachments: rb33877.patch


 Arrays.toString(byteArray) can be used to convert a byte[] to a String.
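For illustration: arrays do not override toString(), so calling toString() on a byte[] yields the identity string (e.g. [B@1b6d3586), while Arrays.toString renders the contents.

```java
import java.util.Arrays;

class ByteArrayToString {
    public static void main(String[] args) {
        byte[] bytes = { 1, 2, 3 };
        // Inherited Object.toString(): identity string like "[B@1b6d3586".
        System.out.println(bytes.toString().startsWith("[B@"));
        // Arrays.toString(byte[]): the element values.
        System.out.println(Arrays.toString(bytes)); // [1, 2, 3]
    }
}
```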



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10592) ORC file dump in JSON format

2015-05-05 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529907#comment-14529907
 ] 

Prasanth Jayachandran commented on HIVE-10592:
--

Added multifile support in the new patch. The output will now look like
{code}./bin/hive --orcfiledump --json --pretty 
file:///app/warehouse/alltypes_bloom/00_0 
file:///app/warehouse/alltypes_orc/00_0{code}
{code}
{"orcFileDumps": [
  {
    "fileName": "file:\/\/\/app\/warehouse\/alltypes_bloom\/00_0",
    "fileVersion": "0.12",
    "writerVersion": "HIVE_8732",
    "numberOfRows": 3,
    "compression": "ZLIB",
    ...
  },
  {
    "fileName": "file:\/\/\/app\/warehouse\/alltypes_orc\/00_0",
    "fileVersion": "0.12",
    "writerVersion": "HIVE_8732",
    "numberOfRows": 2,
    "compression": "ZLIB",
    ...
  }
]}
{code}

 ORC file dump in JSON format
 

 Key: HIVE-10592
 URL: https://issues.apache.org/jira/browse/HIVE-10592
 Project: Hive
  Issue Type: New Feature
Affects Versions: 1.3.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Attachments: HIVE-10592.1.patch, HIVE-10592.2.patch, 
 HIVE-10592.3.patch, HIVE-10592.4.patch


 ORC file dump uses a custom format. It would be useful to dump ORC metadata 
 in JSON format so that other tools can be built on top of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10621) serde typeinfo equals methods are not symmetric

2015-05-05 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10621:
---
Description: 
A correct equals() implementation should start with:
{code}
if (this == other) {
  return true;
}
if (other == null || getClass() != other.getClass()) {
  return false;
}
{code}
The equals() implementations in DecimalTypeInfo, PrimitiveTypeInfo, 
VarcharTypeInfo, CharTypeInfo, and HiveDecimalWritable instead start with:
{code}
if (other == null || !(other instanceof class_name)) {
  return false;
}
{code}
- First, the null check is redundant: instanceof already returns false for 
null.
- Second, the other instanceof class_name check is not symmetric.

The contract of equals() requires that a.equals(b) is true if and only if 
b.equals(a) is true. The current implementation violates this contract.
For example, DecimalTypeInfo instanceof PrimitiveTypeInfo is true, but 
PrimitiveTypeInfo instanceof DecimalTypeInfo is false.

See more details here 
http://stackoverflow.com/questions/6518534/equals-method-overrides-equals-in-superclass-and-may-not-be-symmetric

 serde typeinfo equals methods are not symmetric
 ---

 Key: HIVE-10621
 URL: https://issues.apache.org/jira/browse/HIVE-10621
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor

 A correct equals() implementation should start with:
 {code}
 if (this == other) {
   return true;
 }
 if (other == null || getClass() != other.getClass()) {
   return false;
 }
 {code}
 The equals() implementations in DecimalTypeInfo, PrimitiveTypeInfo, 
 VarcharTypeInfo, CharTypeInfo, and HiveDecimalWritable instead start with:
 {code}
 if (other == null || !(other instanceof class_name)) {
   return false;
 }
 {code}
 - First, the null check is redundant: instanceof already returns false for 
 null.
 - Second, the other instanceof class_name check is not symmetric.
 The contract of equals() requires that a.equals(b) is true if and only if 
 b.equals(a) is true. The current implementation violates this contract.
 For example, DecimalTypeInfo instanceof PrimitiveTypeInfo is true, but 
 PrimitiveTypeInfo instanceof DecimalTypeInfo is false.
 See more details here 
 http://stackoverflow.com/questions/6518534/equals-method-overrides-equals-in-superclass-and-may-not-be-symmetric



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10563) MiniTezCliDriver tests ordering issues

2015-05-05 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-10563:
-
Attachment: HIVE-10563.2.patch

 MiniTezCliDriver tests ordering issues
 --

 Key: HIVE-10563
 URL: https://issues.apache.org/jira/browse/HIVE-10563
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10563.1.patch, HIVE-10563.2.patch


 There are a bunch of tests related to TestMiniTezCliDriver which give 
 ordering issues when run on CentOS/Windows/OS X.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10607) Combination of ReducesinkDedup + TopN optimization yields incorrect result if there are multiple GBY in reducer

2015-05-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529903#comment-14529903
 ] 

Hive QA commented on HIVE-10607:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12730369/HIVE-10607.patch

{color:red}ERROR:{color} -1 due to 25 failed/errored test(s), 8900 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_load_data_to_encrypted_tables
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_select_read_only_encrypted_tbl
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_droppartition
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_sba_drop_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_alterpart_loc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3746/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3746/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3746/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 25 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12730369 - PreCommit-HIVE-TRUNK-Build

 Combination of ReducesinkDedup + TopN optimization yields incorrect result if 
 there are multiple GBY in reducer
 ---

 Key: HIVE-10607
 URL: https://issues.apache.org/jira/browse/HIVE-10607
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Tez
Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.1.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-10607.patch


 {code:sql}
 select ctinyint, count(cdouble) from (select ctinyint, cdouble from 
 alltypesorc group by ctinyint, cdouble) t1 group by ctinyint order by 
 ctinyint limit 20;
 {code}
 This gives a different result set depending on which set of optimizations is 
 on. In particular, in the .q test environment the following two invocations 
 will give you different result sets:
 {code}
 *   mvn test -Phadoop-2 -Dtest.output.overwrite=true 
 -Dtest=TestMiniTezCliDriver -Dqfile=test.q 
 -Dhive.optimize.reducededuplication.min.reducer=1 
 -Dhive.limit.pushdown.memory.usage=0.3f
 *   mvn test -Phadoop-2 

[jira] [Updated] (HIVE-10592) ORC file dump in JSON format

2015-05-05 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-10592:
-
Attachment: HIVE-10592.4.patch

 ORC file dump in JSON format
 

 Key: HIVE-10592
 URL: https://issues.apache.org/jira/browse/HIVE-10592
 Project: Hive
  Issue Type: New Feature
Affects Versions: 1.3.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Attachments: HIVE-10592.1.patch, HIVE-10592.2.patch, 
 HIVE-10592.3.patch, HIVE-10592.4.patch


 ORC file dump uses a custom format. It would be useful to dump ORC metadata 
 in JSON format so that other tools can be built on top of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8890) HiveServer2 dynamic service discovery: use persistent ephemeral nodes curator recipe

2015-05-05 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528575#comment-14528575
 ] 

Thejas M Nair commented on HIVE-8890:
-

+1
Sorry about the delay in reviewing updated patch!


 HiveServer2 dynamic service discovery: use persistent ephemeral nodes curator 
 recipe
 

 Key: HIVE-8890
 URL: https://issues.apache.org/jira/browse/HIVE-8890
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.14.0, 1.0.0, 1.1.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
Priority: Critical
 Fix For: 1.2.0

 Attachments: HIVE-8890.1.patch, HIVE-8890.2.patch, HIVE-8890.3.patch, 
 HIVE-8890.4.patch


 Using this recipe gives better reliability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)

2015-05-05 Thread Reuben Kuhnert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuben Kuhnert updated HIVE-10190:
--
Attachment: HIVE-10190.10.patch

 CBO: AST mode checks for TABLESAMPLE with 
 AST.toString().contains(TOK_TABLESPLITSAMPLE)
 -

 Key: HIVE-10190
 URL: https://issues.apache.org/jira/browse/HIVE-10190
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Reuben Kuhnert
Priority: Trivial
  Labels: perfomance
 Attachments: HIVE-10190-querygen.py, HIVE-10190.01.patch, 
 HIVE-10190.02.patch, HIVE-10190.03.patch, HIVE-10190.04.patch, 
 HIVE-10190.05.patch, HIVE-10190.05.patch, HIVE-10190.06.patch, 
 HIVE-10190.07.patch, HIVE-10190.08.patch, HIVE-10190.09.patch, 
 HIVE-10190.10.patch


 {code}
 public static boolean validateASTForUnsupportedTokens(ASTNode ast) {
   String astTree = ast.toStringTree();
   // if any of following tokens are present in AST, bail out
   String[] tokens = { "TOK_CHARSETLITERAL", "TOK_TABLESPLITSAMPLE" };
   for (String token : tokens) {
     if (astTree.contains(token)) {
       return false;
     }
   }
   return true;
 }
 {code}
 This is an issue for a SQL query whose AST form is bigger than its text form 
 (~700kb).
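One way to avoid materializing the whole tree as a String is to walk the nodes and compare token types directly. The Node class below is a simplified stand-in for Hive's ASTNode (whose real API differs), so this is only a sketch of the approach:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for an AST node; the real ASTNode API differs.
class Node {
    final String tokenType;
    final List<Node> children = new ArrayList<>();

    Node(String tokenType) { this.tokenType = tokenType; }

    Node add(Node child) { children.add(child); return child; }

    // Recursive search: stops at the first unsupported token instead of
    // serializing the entire (possibly ~700kb) tree into one String.
    static boolean containsToken(Node node, String... unsupported) {
        for (String token : unsupported) {
            if (node.tokenType.equals(token)) {
                return true;
            }
        }
        for (Node child : node.children) {
            if (containsToken(child, unsupported)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node ast = new Node("TOK_QUERY");
        ast.add(new Node("TOK_FROM")).add(new Node("TOK_TABLESPLITSAMPLE"));
        // Validation would bail out because an unsupported token was found.
        System.out.println(
            containsToken(ast, "TOK_CHARSETLITERAL", "TOK_TABLESPLITSAMPLE"));
    }
}
```

The traversal also avoids false positives that substring matching can produce when one token name is a prefix of another.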



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10597) Relative path doesn't work with CREATE TABLE LOCATION 'relative/path'

2015-05-05 Thread Reuben Kuhnert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuben Kuhnert updated HIVE-10597:
--
Attachment: HIVE-10597.02.patch

 Relative path doesn't work with CREATE TABLE LOCATION 'relative/path'
 -

 Key: HIVE-10597
 URL: https://issues.apache.org/jira/browse/HIVE-10597
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Reuben Kuhnert
Assignee: Reuben Kuhnert
Priority: Minor
 Attachments: HIVE-10597.01.patch, HIVE-10597.02.patch


 {code}
 0: jdbc:hive2://a2110.halxg.cloudera.com:1000 CREATE EXTERNAL TABLE IF NOT 
 EXISTS mydb.employees3 like mydb.employees LOCATION 'data/stock';
 Error: Error while processing statement: FAILED: Execution Error, return code 
 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
 MetaException(message:java.lang.NullPointerException) (state=08S01,code=1)
 0: jdbc:hive2://a2110.halxg.cloudera.com:1000 CREATE EXTERNAL TABLE IF NOT 
 EXISTS mydb.employees3 like mydb.employees LOCATION '/user/hive/data/stock';
 No rows affected (0.369 seconds)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)

2015-05-05 Thread Reuben Kuhnert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuben Kuhnert updated HIVE-10190:
--
Attachment: HIVE-10190.10.patch

 CBO: AST mode checks for TABLESAMPLE with 
 AST.toString().contains(TOK_TABLESPLITSAMPLE)
 -

 Key: HIVE-10190
 URL: https://issues.apache.org/jira/browse/HIVE-10190
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Reuben Kuhnert
Priority: Trivial
  Labels: perfomance
 Attachments: HIVE-10190-querygen.py, HIVE-10190.01.patch, 
 HIVE-10190.02.patch, HIVE-10190.03.patch, HIVE-10190.04.patch, 
 HIVE-10190.05.patch, HIVE-10190.05.patch, HIVE-10190.06.patch, 
 HIVE-10190.07.patch, HIVE-10190.08.patch, HIVE-10190.09.patch, 
 HIVE-10190.10.patch


 {code}
 public static boolean validateASTForUnsupportedTokens(ASTNode ast) {
   String astTree = ast.toStringTree();
   // if any of following tokens are present in AST, bail out
   String[] tokens = { "TOK_CHARSETLITERAL", "TOK_TABLESPLITSAMPLE" };
   for (String token : tokens) {
     if (astTree.contains(token)) {
       return false;
     }
   }
   return true;
 }
 {code}
 This is an issue for a SQL query whose AST form is bigger than its text form 
 (~700kb).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10594) Remote Spark client doesn't use Kerberos keytab to authenticate [Spark Branch]

2015-05-05 Thread Bruce Nelson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528640#comment-14528640
 ] 

Bruce Nelson commented on HIVE-10594:
-

I have confirmed the issue with a few more specifics:
1. Confirmed using CDH 5.4.0 with Kerberos, OpenLDAP/SSSD and Sentry (no 
impersonation).
2. The problem is seen even if beeline is run on the HS2 server.
3. Unless the hive/hs2 host princ@DOMAIN runs kinit, setting 
hive.execution.engine=spark will result in a failed SQL execution. Once the 
hive principal runs kinit, the Hive on Spark query succeeds.
4. The problem is specific to HS2: it must be able to find the TGT cache for 
the hive principal in the default or KRB5CCNAME location, or Hive on Spark 
will fail.

 Remote Spark client doesn't use Kerberos keytab to authenticate [Spark Branch]
 --

 Key: HIVE-10594
 URL: https://issues.apache.org/jira/browse/HIVE-10594
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 1.1.0
Reporter: Chao Sun

 Reporting a problem found by one of the HoS users.
 Currently, if a user is running Beeline on a different host than HS2 and 
 didn't do kinit on the HS2 host, they may get the following error:
 {code}
 2015-04-29 15:49:34,614 INFO org.apache.hive.spark.client.SparkClientImpl: 
 15/04/29 15:49:34 WARN UserGroupInformation: PriviledgedActionException 
 as:hive (auth:KERBEROS) cause:java.io.IOException: 
 javax.security.sasl.SaslException: GSS initiate failed [Caused by 
 GSSException: No valid credentials provided (Mechanism level: Failed to find 
 any Kerberos tgt)]
 2015-04-29 15:49:34,652 INFO org.apache.hive.spark.client.SparkClientImpl: 
 Exception in thread main java.io.IOException: Failed on local exception: 
 java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed 
 [Caused by GSSException: No valid credentials provided (Mechanism level: 
 Failed to find any Kerberos tgt)]; Host Details : local host is: 
 secure-hos-1.ent.cloudera.com/10.20.77.79; destination host is: 
 secure-hos-1.ent.cloudera.com:8032;
 2015-04-29 15:49:34,653 INFO org.apache.hive.spark.client.SparkClientImpl:
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
 2015-04-29 15:49:34,653 INFO org.apache.hive.spark.client.SparkClientImpl:
   at org.apache.hadoop.ipc.Client.call(Client.java:1472)
 2015-04-29 15:49:34,654 INFO org.apache.hive.spark.client.SparkClientImpl:
   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
 2015-04-29 15:49:34,654 INFO org.apache.hive.spark.client.SparkClientImpl:
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
 2015-04-29 15:49:34,654 INFO org.apache.hive.spark.client.SparkClientImpl:
   at com.sun.proxy.$Proxy11.getClusterMetrics(Unknown Source)
 2015-04-29 15:49:34,655 INFO org.apache.hive.spark.client.SparkClientImpl:
   at 
 org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:202)
 2015-04-29 15:49:34,655 INFO org.apache.hive.spark.client.SparkClientImpl:
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 2015-04-29 15:49:34,655 INFO org.apache.hive.spark.client.SparkClientImpl:
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 2015-04-29 15:49:34,656 INFO org.apache.hive.spark.client.SparkClientImpl:
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 2015-04-29 15:49:34,656 INFO org.apache.hive.spark.client.SparkClientImpl:
   at java.lang.reflect.Method.invoke(Method.java:606)
 2015-04-29 15:49:34,656 INFO org.apache.hive.spark.client.SparkClientImpl:
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
 2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl:
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl:
   at com.sun.proxy.$Proxy12.getClusterMetrics(Unknown Source)
 2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl:
   at 
 org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:461)
 2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl:
   at 
 org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:91)
 2015-04-29 15:49:34,657 INFO org.apache.hive.spark.client.SparkClientImpl:
   at 
 org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:91)
 2015-04-29 15:49:34,657 INFO 

[jira] [Updated] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)

2015-05-05 Thread Reuben Kuhnert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuben Kuhnert updated HIVE-10190:
--
Attachment: (was: HIVE-10190.10.patch)

 CBO: AST mode checks for TABLESAMPLE with 
 AST.toString().contains(TOK_TABLESPLITSAMPLE)
 -

 Key: HIVE-10190
 URL: https://issues.apache.org/jira/browse/HIVE-10190
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Reuben Kuhnert
Priority: Trivial
  Labels: perfomance
 Attachments: HIVE-10190-querygen.py, HIVE-10190.01.patch, 
 HIVE-10190.02.patch, HIVE-10190.03.patch, HIVE-10190.04.patch, 
 HIVE-10190.05.patch, HIVE-10190.05.patch, HIVE-10190.06.patch, 
 HIVE-10190.07.patch, HIVE-10190.08.patch, HIVE-10190.09.patch, 
 HIVE-10190.10.patch


 {code}
 public static boolean validateASTForUnsupportedTokens(ASTNode ast) {
   String astTree = ast.toStringTree();
   // if any of following tokens are present in AST, bail out
   String[] tokens = { "TOK_CHARSETLITERAL", "TOK_TABLESPLITSAMPLE" };
   for (String token : tokens) {
     if (astTree.contains(token)) {
       return false;
     }
   }
   return true;
 }
 {code}
 This is an issue for a SQL query whose AST form is bigger than its text form 
 (~700kb).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10576) add jar command does not work with Windows OS

2015-05-05 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528580#comment-14528580
 ] 

Thejas M Nair commented on HIVE-10576:
--

+1
Thanks Hari!

 add jar command does not work with Windows OS
 -

 Key: HIVE-10576
 URL: https://issues.apache.org/jira/browse/HIVE-10576
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10576.1.patch, HIVE-10576.2.patch, 
 HIVE-10576.3.patch


 Steps to reproduce this issue in Windows OS:
 hadoop.cmd fs -mkdir -p /tmp/testjars
 hadoop.cmd fs -copyFromLocal hive-hcatalog-core-*.jar  /tmp/testjars
 from hive cli:
 add jar hdfs:///tmp/testjars/hive-hcatalog-core-*.jar;
 add jar D:\hdp\hive-1.2.0.2.3.0.0-1737\hcatalog\share\hcatalog\hive-hcatal
 og-core-1.2.0.2.3.0.0-1737.jar;
 {code}
 hive add jar hdfs:///tmp/testjars/hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar;
 converting to local 
 hdfs:///tmp/testjars/hive-hcatalog-core-1.2.0.2.3.0.0-1737.j
 ar
 Illegal character in opaque part at index 2: 
 C:\Users\hadoopqa\AppData\Local\Tem
 p\cf0c70a4-f8e5-43ae-8c94-aa528f90887d_resources\hive-hcatalog-core-1.2.0.2.3.0.
 0-1737.jar
 Query returned non-zero code: 1, cause: java.net.URISyntaxException: Illegal 
 cha
 racter in opaque part at index 2: 
 C:\Users\hadoopqa\AppData\Local\Temp\cf0c70a4-
 f8e5-43ae-8c94-aa528f90887d_resources\hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar
 hive add jar 
 D:\hdp\hive-1.2.0.2.3.0.0-1737\hcatalog\share\hcatalog\hive-hcatal
 og-core-1.2.0.2.3.0.0-1737.jar;
 Illegal character in opaque part at index 2: 
 D:\hdp\hive-1.2.0.2.3.0.0-1737\hcat
 alog\share\hcatalog\hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar
 Query returned non-zero code: 1, cause: java.net.URISyntaxException: Illegal 
 cha
 racter in opaque part at index 2: 
 D:\hdp\hive-1.2.0.2.3.0.0-1737\hcatalog\share\
 hcatalog\hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar
 {code}
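The root cause is visible with plain java.net.URI: a Windows path's drive-letter colon is parsed as a URI scheme separator, and the backslash that follows is rejected. A minimal reproduction of the error in the report above (not Hive's actual code path):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class WindowsPathSketch {
    public static void main(String[] args) {
        // A Windows path handed straight to java.net.URI is parsed as scheme
        // "D" followed by an opaque part, and the backslash at index 2 is
        // rejected, which is exactly the error in the bug report.
        try {
            URI uri = new URI("D:\\hdp\\hive\\hive-hcatalog-core.jar");
            System.out.println("parsed: " + uri);
        } catch (URISyntaxException e) {
            System.out.println(e.getMessage());
        }
    }
}
```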



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10576) add jar command does not work with Windows OS

2015-05-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528591#comment-14528591
 ] 

Hive QA commented on HIVE-10576:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12730319/HIVE-10576.3.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 8895 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend
org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testTempTable
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3734/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3734/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3734/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12730319 - PreCommit-HIVE-TRUNK-Build

 add jar command does not work with Windows OS
 -

 Key: HIVE-10576
 URL: https://issues.apache.org/jira/browse/HIVE-10576
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10576.1.patch, HIVE-10576.2.patch, 
 HIVE-10576.3.patch


 Steps to reproduce this issue in Windows OS:
 hadoop.cmd fs -mkdir -p /tmp/testjars
 hadoop.cmd fs -copyFromLocal hive-hcatalog-core-*.jar  /tmp/testjars
 from hive cli:
 add jar hdfs:///tmp/testjars/hive-hcatalog-core-*.jar;
 add jar D:\hdp\hive-1.2.0.2.3.0.0-1737\hcatalog\share\hcatalog\hive-hcatal
 og-core-1.2.0.2.3.0.0-1737.jar;
 {code}
 hive add jar hdfs:///tmp/testjars/hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar;
 converting to local 
 hdfs:///tmp/testjars/hive-hcatalog-core-1.2.0.2.3.0.0-1737.j
 ar
 Illegal character in opaque part at index 2: 
 C:\Users\hadoopqa\AppData\Local\Tem
 p\cf0c70a4-f8e5-43ae-8c94-aa528f90887d_resources\hive-hcatalog-core-1.2.0.2.3.0.
 0-1737.jar
 Query returned non-zero code: 1, cause: java.net.URISyntaxException: Illegal 
 cha
 racter in opaque part at index 2: 
 C:\Users\hadoopqa\AppData\Local\Temp\cf0c70a4-
 f8e5-43ae-8c94-aa528f90887d_resources\hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar
 hive add jar 
 D:\hdp\hive-1.2.0.2.3.0.0-1737\hcatalog\share\hcatalog\hive-hcatal
 og-core-1.2.0.2.3.0.0-1737.jar;
 Illegal character in opaque part at index 2: 
 D:\hdp\hive-1.2.0.2.3.0.0-1737\hcat
 alog\share\hcatalog\hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar
 Query returned non-zero code: 1, cause: java.net.URISyntaxException: Illegal 
 cha
 racter in opaque part at index 2: 
 D:\hdp\hive-1.2.0.2.3.0.0-1737\hcatalog\share\
 hcatalog\hive-hcatalog-core-1.2.0.2.3.0.0-1737.jar
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10608) Fix useless 'if' statement in RetryingMetaStoreClient (135)

2015-05-05 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529430#comment-14529430
 ] 

Szehon Ho commented on HIVE-10608:
--

+1

 Fix useless 'if' statement in RetryingMetaStoreClient (135)
 ---

 Key: HIVE-10608
 URL: https://issues.apache.org/jira/browse/HIVE-10608
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor
 Attachments: rb33861.patch


 The if statement below is useless because it ends with a stray semicolon:
 {code}
   } catch (MetaException e) {
     if (e.getMessage().matches("(?s).*(IO|TTransport)Exception.*"));
     caughtException = e;
   }
 {code}
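Because the `if` body is an empty statement, `caughtException = e;` runs unconditionally. A sketch of the presumably intended behavior, with the stray semicolon removed (a simplified stand-in, not the actual RetryingMetaStoreClient code):

```java
public class RetrySketch {
    // Simplified stand-in for the retry loop's exception handling: only
    // transport-flavored failures should be treated as retryable.
    static boolean isRetryable(Exception e) {
        // No trailing ';' after the if, so the condition actually guards
        // the statement it was meant to guard.
        if (e.getMessage().matches("(?s).*(IO|TTransport)Exception.*")) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isRetryable(new Exception("wrapped IOException: connection reset"))); // true
        System.out.println(isRetryable(new Exception("semantic analysis error")));               // false
    }
}
```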



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10213) MapReduce jobs using dynamic-partitioning fail on commit.

2015-05-05 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529452#comment-14529452
 ] 

Sushanth Sowmyan commented on HIVE-10213:
-

This patch set off some warning flags for me with regard to the traditional 
M-R use case, but that's because it's been a while since I looked at this piece 
of code. The traditional M-R use case is still fine, because 
DynamicPartitionFileRecordWriterContainer.close() will register an appropriate 
TaskCommitterProxy, and the commit on the OutputCommitter will be called in the 
same process scope, so it remains okay. The pig-based optimizations also 
continue to be okay, since the singleton retains it in memory.

+1, and I'm okay with committing this patch as-is; tests have already run on 
it, and this section of code has not changed since then.

 MapReduce jobs using dynamic-partitioning fail on commit.
 -

 Key: HIVE-10213
 URL: https://issues.apache.org/jira/browse/HIVE-10213
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-10213.1.patch


 I recently ran into a problem in {{TaskCommitContextRegistry}}, when using 
 dynamic-partitions.
 Consider a MapReduce program that reads HCatRecords from a table (using 
 HCatInputFormat), and then writes to another table (with identical schema), 
 using HCatOutputFormat. The Map-task fails with the following exception:
 {code}
 Error: java.io.IOException: No callback registered for 
 TaskAttemptID:attempt_1426589008676_509707_m_00_0@hdfs://crystalmyth.myth.net:8020/user/mithunr/mythdb/target/_DYN0.6784154320609959/grid=__HIVE_DEFAULT_PARTITION__/dt=__HIVE_DEFAULT_PARTITION__
 at 
 org.apache.hive.hcatalog.mapreduce.TaskCommitContextRegistry.commitTask(TaskCommitContextRegistry.java:56)
 at 
 org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitTask(FileOutputCommitterContainer.java:139)
 at org.apache.hadoop.mapred.Task.commit(Task.java:1163)
 at org.apache.hadoop.mapred.Task.done(Task.java:1025)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 {code}
 {{TaskCommitContextRegistry::commitTask()}} uses call-backs registered from 
 {{DynamicPartitionFileRecordWriter}}. But when {{HCatInputFormat}} and 
 {{HCatOutputFormat}} are both used in the same job, the 
 {{DynamicPartitionFileRecordWriter}} might only be exercised in the Reducer.
 I'm relaxing the IOException and logging a warning message instead of just 
 failing.
 (I'll post the fix shortly.)
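A minimal sketch of the proposed relaxation, using a hypothetical registry class rather than the real TaskCommitContextRegistry: a missing callback is logged as a warning instead of raising IOException.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CommitRegistrySketch {
    private final Map<String, Runnable> callbacks = new ConcurrentHashMap<>();

    void register(String taskAttemptId, Runnable onCommit) {
        callbacks.put(taskAttemptId, onCommit);
    }

    // Before the fix, a missing callback raised IOException ("No callback
    // registered for TaskAttemptID:..."). The relaxed version warns and
    // returns, since a map-only task in a job that writes dynamic partitions
    // only in the reducer may legitimately have registered nothing.
    void commitTask(String taskAttemptId) {
        Runnable cb = callbacks.remove(taskAttemptId);
        if (cb == null) {
            System.err.println("WARN: no callback registered for "
                    + taskAttemptId + "; skipping commit");
            return;
        }
        cb.run();
    }

    public static void main(String[] args) {
        CommitRegistrySketch registry = new CommitRegistrySketch();
        registry.commitTask("attempt_m_000000_0"); // warns, does not throw
        registry.register("attempt_r_000000_0", () -> System.out.println("committed"));
        registry.commitTask("attempt_r_000000_0"); // runs the callback
    }
}
```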



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9743) Incorrect result set for vectorized left outer join

2015-05-05 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-9743:
---
Attachment: (was: HIVE-9743.07.patch)

 Incorrect result set for vectorized left outer join
 ---

 Key: HIVE-9743
 URL: https://issues.apache.org/jira/browse/HIVE-9743
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.14.0
Reporter: N Campbell
Assignee: Matt McCline
 Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, 
 HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, HIVE-9743.06.patch


 This query is supposed to return 3 rows and will when run without Tez but 
 returns 2 rows when run with Tez.
 select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left 
 outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2  15 )
 tjoin1.rnum   tjoin1.c1   tjoin1.c2   c2j2
 1 20  25  null
 2 null  50  null
 instead of
 tjoin1.rnum   tjoin1.c1   tjoin1.c2   c2j2
 0 10  15  null
 1 20  25  null
 2 null  50  null
 create table  if not exists TJOIN1 (RNUM int , C1 int, C2 int)
  STORED AS orc ;
 0|10|15
 1|20|25
 2|\N|50
 create table  if not exists TJOIN2 (RNUM int , C1 int, C2 char(2))
 ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
  STORED AS TEXTFILE ;
 0|10|BB
 1|15|DD
 2|\N|EE
 3|10|FF



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8065) Support HDFS encryption functionality on Hive

2015-05-05 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529461#comment-14529461
 ] 

Brock Noland commented on HIVE-8065:


bq. have you considered creating a single encrypted staging dir for all queries 
to use instead of creating new ones under the table namespace? (this could be 
owned by Hive and encrypted with Hive's key). If so, why did you choose the 
current design?

This approach does not work since you cannot move files across encryption zones.

 Support HDFS encryption functionality on Hive
 -

 Key: HIVE-8065
 URL: https://issues.apache.org/jira/browse/HIVE-8065
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.13.1
Reporter: Sergio Peña
Assignee: Sergio Peña
  Labels: Hive-Scrum

 The new encryption support on HDFS makes Hive incompatible and unusable when 
 this feature is used.
 HDFS encryption is designed so that a user can configure different 
 encryption zones (or directories) for multi-tenant environments. An 
 encryption zone has an exclusive encryption key, such as AES-128 or AES-256. 
 For security compliance, HDFS does not allow moving/renaming files 
 between encryption zones; renames are allowed only inside the same encryption 
 zone. A copy is allowed between encryption zones.
 See HDFS-6134 for more details about HDFS encryption design.
 Hive currently uses a scratch directory (like /tmp/$user/$random). This 
 scratch directory is used for the output of intermediate data (between MR 
 jobs) and for the final output of the hive query which is later moved to the 
 table directory location.
 If Hive tables are in different encryption zones than the scratch directory, 
 then Hive won't be able to rename those files/directories, making 
 Hive unusable.
 To handle this problem, we can change the scratch directory of the 
 query/statement to be inside the same encryption zone of the table directory 
 location. This way, the renaming process will be successful. 
 Also, for statements that move files between encryption zones (i.e. LOAD 
 DATA), a copy may be executed instead of a rename. This will cause an 
 overhead when copying large data files, but it won't break the encryption on 
 Hive.
 Another security aspect to consider is join selects. If Hive joins 
 tables with different encryption key strengths, then the results of 
 the select might break the security compliance of the tables. Say two 
 tables with 128-bit and 256-bit encryption are joined; the temporary 
 results might be stored in the 128-bit encryption zone, which conflicts 
 with the compliance of the table encrypted with 256 bits.
 To fix this, Hive should be able to select the scratch directory that is most 
 secured/encrypted, in order to store the intermediate data temporarily with no 
 compliance issues.
 For instance:
 {noformat}
 SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id;
 {noformat}
 - This should use a scratch directory (or staging directory) inside the 
 table-aes256 table location.
 {noformat}
 INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1;
 {noformat}
 - This should use a scratch directory inside the table-aes1 location.
 {noformat}
 FROM table-unencrypted
 INSERT OVERWRITE TABLE table-aes128 SELECT id, name
 INSERT OVERWRITE TABLE table-aes256 SELECT id, name
 {noformat}
 - This should use a scratch directory on each of the tables locations.
 - The first SELECT will have its scratch directory on table-aes128 directory.
 - The second SELECT will have its scratch directory on table-aes256 directory.
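The "pick the most secured scratch directory" rule above can be sketched as choosing the input zone with the strongest key. The `Zone` record and bit lengths here are illustrative assumptions; Hive's real logic would consult the HDFS encryption-zone APIs.

```java
import java.util.Comparator;
import java.util.List;

public class ZonePick {
    // Hypothetical view of a table's encryption zone: its path plus the key
    // bit length (0 would mean unencrypted).
    record Zone(String path, int keyBits) {}

    // Stage intermediate data inside the most strongly encrypted input zone,
    // so temporary results never land somewhere weaker than any input table.
    static Zone strongest(List<Zone> inputs) {
        return inputs.stream()
                     .max(Comparator.comparingInt(Zone::keyBits))
                     .orElseThrow();
    }

    public static void main(String[] args) {
        Zone pick = strongest(List.of(
                new Zone("/warehouse/table-aes128", 128),
                new Zone("/warehouse/table-aes256", 256)));
        System.out.println(pick.path()); // /warehouse/table-aes256
    }
}
```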



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9743) Incorrect result set for vectorized left outer join

2015-05-05 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-9743:
---
Attachment: HIVE-9743.08.patch

 Incorrect result set for vectorized left outer join
 ---

 Key: HIVE-9743
 URL: https://issues.apache.org/jira/browse/HIVE-9743
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.14.0
Reporter: N Campbell
Assignee: Matt McCline
 Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, 
 HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, 
 HIVE-9743.06.patch, HIVE-9743.08.patch


 This query is supposed to return 3 rows and will when run without Tez but 
 returns 2 rows when run with Tez.
 select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left 
 outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2  15 )
 tjoin1.rnum   tjoin1.c1   tjoin1.c2   c2j2
 1 20  25  null
 2 null  50  null
 instead of
 tjoin1.rnum   tjoin1.c1   tjoin1.c2   c2j2
 0 10  15  null
 1 20  25  null
 2 null  50  null
 create table  if not exists TJOIN1 (RNUM int , C1 int, C2 int)
  STORED AS orc ;
 0|10|15
 1|20|25
 2|\N|50
 create table  if not exists TJOIN2 (RNUM int , C1 int, C2 char(2))
 ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
  STORED AS TEXTFILE ;
 0|10|BB
 1|15|DD
 2|\N|EE
 3|10|FF



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10615) LLAP: Invalid containerId prefix

2015-05-05 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529473#comment-14529473
 ] 

Prasanth Jayachandran commented on HIVE-10615:
--

[~sseth] fyi..

 LLAP: Invalid containerId prefix
 

 Key: HIVE-10615
 URL: https://issues.apache.org/jira/browse/HIVE-10615
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran

 I encountered this error when I ran a simple query in llap mode today. 
 {code}org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
 java.lang.IllegalArgumentException: Invalid ContainerId prefix: 
   at 
 org.apache.hadoop.yarn.api.records.ContainerId.fromString(ContainerId.java:211)
   at 
 org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:178)
   at 
 org.apache.tez.dag.app.TezTaskCommunicatorImpl$TezTaskUmbilicalProtocolImpl.heartbeat(TezTaskCommunicatorImpl.java:311)
   at 
 org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator$LlapTaskUmbilicalProtocolImpl.heartbeat(LlapTaskCommunicator.java:398)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.ipc.WritableRpcEngine$Server$WritableRpcInvoker.call(WritableRpcEngine.java:514)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
   at org.apache.hadoop.ipc.Client.call(Client.java:1468)
   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
   at 
 org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:244)
   at com.sun.proxy.$Proxy14.heartbeat(Unknown Source)
   at 
 org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.heartbeat(LlapTaskReporter.java:256)
   at 
 org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:184)
   at 
 org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:126)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 15/05/05 15:24:22 [Task-Executor-0] INFO task.TezTaskRunner : Interrupted 
 while waiting for task to complete. Interrupting task
 15/05/05 15:24:22 [TezTaskRunner_attempt_1430816501738_0034_1_00_00_0] 
 INFO task.TezTaskRunner : Encounted an error while executing task: 
 attempt_1430816501738_0034_1_00_00_0
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052)
   at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
   at 
 java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.initialize(LogicalIOProcessorRuntimeTask.java:218)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:177)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 

[jira] [Updated] (HIVE-10610) hive command fails to get hadoop version

2015-05-05 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10610:

Assignee: Shwetha G S

 hive command fails to get hadoop version
 

 Key: HIVE-10610
 URL: https://issues.apache.org/jira/browse/HIVE-10610
 Project: Hive
  Issue Type: Bug
Reporter: Shwetha G S
Assignee: Shwetha G S
 Attachments: HIVE-10610.patch


 NO PRECOMMIT TESTS
 If debug level logging is enabled, hive command fails with the following 
 exception:
 {noformat}
 apache-hive-1.2.0-SNAPSHOT-bin$ ./bin/hive
 Unable to determine Hadoop version information from 13:54:07,683
 'hadoop version' returned:
 2015-05-05 13:54:08,014 DEBUG - [main:] ~ version: 2.5.0-cdh5.3.3 
 (VersionInfo:171) Hadoop 2.5.0-cdh5.3.3 Subversion 
 http://github.com/cloudera/hadoop -r 82a65209d6e9e4a2b41fdbcd8190c7ea38730627 
 Compiled by jenkins on 2015-04-08T22:00Z Compiled with protoc 2.5.0 From 
 source with checksum 1531e104cdad7489656f44875f3334b This command was run 
 using 
 /Users/sshivalingamurthy/installs/hadoop-2.5.0-cdh5.3.3/share/hadoop/common/hadoop-common-2.5.0-cdh5.3.3.jar
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9743) Incorrect result set for vectorized left outer join

2015-05-05 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529477#comment-14529477
 ] 

Matt McCline commented on HIVE-9743:


[~vikram.dixit] I removed the annotations and the MR 
vector_left_outer_join3.q.out and fiddled with environment variables so that it 
now has Sorted Merge Bucket Map Join operators; Tez has the Merge Join 
Operator, as you said.

The original LEFT OUTER JOIN problem does not repro with 
vector_left_outer_join3.q, though.

 Incorrect result set for vectorized left outer join
 ---

 Key: HIVE-9743
 URL: https://issues.apache.org/jira/browse/HIVE-9743
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.14.0
Reporter: N Campbell
Assignee: Matt McCline
 Attachments: HIVE-9743.01.patch, HIVE-9743.02.patch, 
 HIVE-9743.03.patch, HIVE-9743.04.patch, HIVE-9743.05.patch, 
 HIVE-9743.06.patch, HIVE-9743.08.patch


 This query is supposed to return 3 rows and will when run without Tez but 
 returns 2 rows when run with Tez.
 select tjoin1.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 left 
 outer join tjoin2 on ( tjoin1.c1 = tjoin2.c1 and tjoin1.c2  15 )
 tjoin1.rnum   tjoin1.c1   tjoin1.c2   c2j2
 1 20  25  null
 2 null  50  null
 instead of
 tjoin1.rnum   tjoin1.c1   tjoin1.c2   c2j2
 0 10  15  null
 1 20  25  null
 2 null  50  null
 create table  if not exists TJOIN1 (RNUM int , C1 int, C2 int)
  STORED AS orc ;
 0|10|15
 1|20|25
 2|\N|50
 create table  if not exists TJOIN2 (RNUM int , C1 int, C2 char(2))
 ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
  STORED AS TEXTFILE ;
 0|10|BB
 1|15|DD
 2|\N|EE
 3|10|FF



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9508) MetaStore client socket connection should have a lifetime

2015-05-05 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528854#comment-14528854
 ] 

Vaibhav Gumashta commented on HIVE-9508:


Failures are unrelated. Will commit shortly.

 MetaStore client socket connection should have a lifetime
 -

 Key: HIVE-9508
 URL: https://issues.apache.org/jira/browse/HIVE-9508
 Project: Hive
  Issue Type: Sub-task
  Components: CLI, Metastore
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
  Labels: metastore, rolling_upgrade
 Fix For: 1.2.0

 Attachments: HIVE-9508.1.patch, HIVE-9508.2.patch, HIVE-9508.3.patch, 
 HIVE-9508.4.patch, HIVE-9508.5.patch, HIVE-9508.6.patch


 Currently HiveMetaStoreClient (or SessionHMSC) stays connected to one Metastore 
 server until the connection is closed or there is a problem. I would like to 
 introduce the concept of a MetaStore client socket lifetime: the MS client 
 will reconnect once the socket lifetime is reached. This will help during 
 rolling upgrades of the Metastore.
 When there are multiple Metastore servers behind a VIP (load balancer), it is 
 easy to take one server out of rotation, wait 10+ minutes for all existing 
 connections to die down (if the lifetime is, say, 5 minutes), and then 
 update the server.
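The lifetime check itself is simple; a sketch under the assumption that the client records its connect time and tests it before each call (names are hypothetical, not the patch's actual code):

```java
public class LifetimeSketch {
    private final long lifetimeMillis;
    private long connectedAtMillis;

    LifetimeSketch(long lifetimeMillis) {
        this.lifetimeMillis = lifetimeMillis;
    }

    void connect(long nowMillis) {
        connectedAtMillis = nowMillis;
    }

    // Checked before each RPC: once the socket has outlived the configured
    // lifetime the client reconnects, landing on whichever server the load
    // balancer currently routes to.
    boolean shouldReconnect(long nowMillis) {
        return nowMillis - connectedAtMillis >= lifetimeMillis;
    }

    public static void main(String[] args) {
        LifetimeSketch client = new LifetimeSketch(5 * 60 * 1000L); // 5-minute lifetime
        client.connect(0L);
        System.out.println(client.shouldReconnect(60_000L));       // false, 1 min in
        System.out.println(client.shouldReconnect(6 * 60_000L));   // true, past 5 min
    }
}
```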



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10607) Combination of ReducesinkDedup + TopN optimization yields incorrect result if there are multiple GBY in reducer

2015-05-05 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528939#comment-14528939
 ] 

Ashutosh Chauhan commented on HIVE-10607:
-

yeah.. it will be good to have this in 1.2 too. [~sushanth] is that OK ?

 Combination of ReducesinkDedup + TopN optimization yields incorrect result if 
 there are multiple GBY in reducer
 ---

 Key: HIVE-10607
 URL: https://issues.apache.org/jira/browse/HIVE-10607
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Tez
Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.1.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-10607.patch


 {code:sql}
 select ctinyint, count(cdouble) from (select ctinyint, cdouble from 
 alltypesorc group by ctinyint, cdouble) t1 group by ctinyint order by 
 ctinyint limit 20;
 {code}
 This gives different result set depending on which set of optimizations are 
 on. In particular in .q test environment following two invocations will give 
 you different result set:
 {code}
 *   mvn test -Phadoop-2 -Dtest.output.overwrite=true 
 -Dtest=TestMiniTezCliDriver -Dqfile=test.q 
 -Dhive.optimize.reducededuplication.min.reducer=1 
 -Dhive.limit.pushdown.memory.usage=0.3f
 *   mvn test -Phadoop-2 -Dtest.output.overwrite=true 
 -Dtest=TestMiniTezCliDriver -Dqfile=test.q 
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10542) Full outer joins in tez produce incorrect results in certain cases

2015-05-05 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-10542:
--
Attachment: HIVE-10542.6.patch

 Full outer joins in tez produce incorrect results in certain cases
 --

 Key: HIVE-10542
 URL: https://issues.apache.org/jira/browse/HIVE-10542
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
Priority: Blocker
 Attachments: HIVE-10542.1.patch, HIVE-10542.2.patch, 
 HIVE-10542.3.patch, HIVE-10542.4.patch, HIVE-10542.5.patch, HIVE-10542.6.patch


 If there are no records for one of the tables in the full outer join, we do 
 not read the other input and end up not producing rows that we should.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9845) HCatSplit repeats information making input split data size huge

2015-05-05 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529007#comment-14529007
 ] 

Mithun Radhakrishnan commented on HIVE-9845:


Here's the updated patch. Sorry for the delay.

 HCatSplit repeats information making input split data size huge
 ---

 Key: HIVE-9845
 URL: https://issues.apache.org/jira/browse/HIVE-9845
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Rohini Palaniswamy
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9845.1.patch, HIVE-9845.3.patch, HIVE-9845.4.patch, 
 HIVE-9845.5.patch


 Pig on Tez jobs with larger tables hit PIG-4443. Running on HDFS data which 
 has even triple the number of splits (100K+ splits and tasks) does not hit 
 that issue.
 {code}
 HCatBaseInputFormat.java:
   // Call getSplits on the InputFormat, create an
   // HCatSplit for each underlying split.
   // NumSplits is 0 for our purposes.
   org.apache.hadoop.mapred.InputSplit[] baseSplits =
       inputFormat.getSplits(jobConf, 0);
   for (org.apache.hadoop.mapred.InputSplit split : baseSplits) {
     splits.add(new HCatSplit(partitionInfo, split, allCols));
   }
 {code}
 Each hcatSplit duplicates partition schema and table schema.
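One way to avoid that duplication is for every split to share a single schema reference that is serialized once (e.g. in the job conf) and re-attached on the task side. A hypothetical sketch of the idea, not the actual HCatSplit fix:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Hypothetical split wrapper: rather than embedding a full copy of the
    // table/partition schema in every split, all splits hold one shared
    // reference to it.
    record Schema(String columns) {}
    record Split(String path, Schema shared) {}

    public static void main(String[] args) {
        Schema schema = new Schema("rnum int, c1 int, c2 int"); // single copy
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            splits.add(new Split("/data/part-" + i, schema));
        }
        // Every split points at the same Schema object, not a clone of it,
        // so the serialized job metadata carries the schema only once.
        System.out.println(splits.get(0).shared() == splits.get(2).shared()); // true
    }
}
```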



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10611) Mini tez tests wait for 5 minutes before shutting down

2015-05-05 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529024#comment-14529024
 ] 

Ashutosh Chauhan commented on HIVE-10611:
-

+1 Thanks, [~vikram.dixit] for doing this.

 Mini tez tests wait for 5 minutes before shutting down
 --

 Key: HIVE-10611
 URL: https://issues.apache.org/jira/browse/HIVE-10611
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.3.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-10611.1.patch


 Currently, at shutdown, the tez mini cluster waits for the session to close 
 before shutting down the cluster. This ends up taking 5 minutes - the default 
 value. We can shut down the session explicitly to alleviate this situation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others

2015-05-05 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529366#comment-14529366
 ] 

Thejas M Nair commented on HIVE-7018:
-

This change breaks schematool upgrade - See HIVE-10614

 Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but 
 not others
 -

 Key: HIVE-7018
 URL: https://issues.apache.org/jira/browse/HIVE-7018
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Yongzhi Chen
 Fix For: 1.2.0

 Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch


 It appears that at least postgres and oracle do not have the LINK_TARGET_ID 
 column while mysql does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8065) Support HDFS encryption functionality on Hive

2015-05-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529378#comment-14529378
 ] 

Eugene Koifman commented on HIVE-8065:
--

[~spena], when implementing this, have you considered creating a single 
encrypted staging dir for all queries to use instead of creating new ones under 
the table namespace? (This could be owned by Hive and encrypted with Hive's 
key.) If so, why did you choose the current design?

Some possible issues with the current design:
- It requires write permission on the table dir.
- delete-on-exit (on the staging dir) is not completely reliable as far as I 
know; this may leave files around.
- In a query like SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE 
t1.id == t2.id, when the staging dir is created under table-aes256, someone 
who has a key for this EZ may read data (in theory at least) that came from 
table-aes128 even if they don't have a key for the EZ which contains 
table-aes128.

thanks

 Support HDFS encryption functionality on Hive
 -

 Key: HIVE-8065
 URL: https://issues.apache.org/jira/browse/HIVE-8065
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.13.1
Reporter: Sergio Peña
Assignee: Sergio Peña
  Labels: Hive-Scrum

 The new encryption support on HDFS makes Hive incompatible and unusable when 
 this feature is used.
 HDFS encryption is designed so that a user can configure different 
 encryption zones (or directories) for multi-tenant environments. Each 
 encryption zone has an exclusive encryption key, such as AES-128 or AES-256. 
 For security compliance, HDFS does not allow moving/renaming files between 
 encryption zones; renames are allowed only inside the same encryption zone. 
 A copy is allowed between encryption zones.
 See HDFS-6134 for more details about the HDFS encryption design.
 Hive currently uses a scratch directory (like /tmp/$user/$random). This 
 scratch directory is used for the output of intermediate data (between MR 
 jobs) and for the final output of the Hive query, which is later moved to 
 the table directory location.
 If Hive tables are in different encryption zones than the scratch directory, 
 then Hive won't be able to rename those files/directories, which makes Hive 
 unusable.
 To handle this problem, we can change the scratch directory of the 
 query/statement to be inside the same encryption zone as the table directory 
 location. This way, the renaming process will succeed.
 Also, for statements that move files between encryption zones (i.e. LOAD 
 DATA), a copy may be executed instead of a rename. This will cause overhead 
 when copying large data files, but it won't break encryption in Hive.
 Another security consideration is join selects. If Hive joins tables with 
 different encryption key strengths, then the results of the select might 
 break the security compliance of the tables. Say two tables with 128-bit and 
 256-bit encryption are joined; the temporary results might be stored in the 
 128-bit encryption zone, which conflicts with the compliance of the table 
 encrypted with 256 bits.
 To fix this, Hive should select the most secured/encrypted scratch directory 
 in order to store the intermediate data temporarily with no compliance 
 issues.
 For instance:
 {noformat}
 SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id;
 {noformat}
 - This should use a scratch directory (or staging directory) inside the 
 table-aes256 table location.
 {noformat}
 INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1;
 {noformat}
 - This should use a scratch directory inside the table-aes1 location.
 {noformat}
 FROM table-unencrypted
 INSERT OVERWRITE TABLE table-aes128 SELECT id, name
 INSERT OVERWRITE TABLE table-aes256 SELECT id, name
 {noformat}
 - This should use a scratch directory on each of the tables locations.
 - The first SELECT will have its scratch directory on table-aes128 directory.
 - The second SELECT will have its scratch directory on table-aes256 directory.
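The strongest-zone selection described above can be sketched as follows. This is an illustrative sketch only, not Hive's actual implementation: the TableDir type and its keyBits field are hypothetical stand-ins for the real encryption-zone metadata.

```java
import java.util.Comparator;
import java.util.List;

public class StagingDirChooser {
    // Hypothetical stand-in for a table location plus its encryption-zone key size.
    static final class TableDir {
        final String path;
        final int keyBits; // 0 = unencrypted
        TableDir(String path, int keyBits) { this.path = path; this.keyBits = keyBits; }
    }

    // Pick the staging location under the table whose zone uses the strongest key.
    static String chooseStagingDir(List<TableDir> tables) {
        return tables.stream()
                .max(Comparator.comparingInt(t -> t.keyBits))
                .map(t -> t.path + "/.hive-staging")
                .orElseThrow(() -> new IllegalArgumentException("no tables in query"));
    }

    public static void main(String[] args) {
        List<TableDir> query = List.of(
                new TableDir("/warehouse/table-aes128", 128),
                new TableDir("/warehouse/table-aes256", 256));
        // prints /warehouse/table-aes256/.hive-staging
        System.out.println(chooseStagingDir(query));
    }
}
```

This mirrors the rule in the examples above: the aes128/aes256 join stages under the aes256 table's location.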



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)

2015-05-05 Thread Reuben Kuhnert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuben Kuhnert updated HIVE-10190:
--
Attachment: HIVE-10190.09.patch

 CBO: AST mode checks for TABLESAMPLE with 
 AST.toString().contains(TOK_TABLESPLITSAMPLE)
 -

 Key: HIVE-10190
 URL: https://issues.apache.org/jira/browse/HIVE-10190
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Reuben Kuhnert
Priority: Trivial
  Labels: perfomance
 Attachments: HIVE-10190-querygen.py, HIVE-10190.01.patch, 
 HIVE-10190.02.patch, HIVE-10190.03.patch, HIVE-10190.04.patch, 
 HIVE-10190.05.patch, HIVE-10190.05.patch, HIVE-10190.06.patch, 
 HIVE-10190.07.patch, HIVE-10190.08.patch, HIVE-10190.09.patch


 {code}
 public static boolean validateASTForUnsupportedTokens(ASTNode ast) {
   String astTree = ast.toStringTree();
   // if any of following tokens are present in AST, bail out
   String[] tokens = { TOK_CHARSETLITERAL, TOK_TABLESPLITSAMPLE };
   for (String token : tokens) {
     if (astTree.contains(token)) {
       return false;
     }
   }
   return true;
 }
 {code}
 This is an issue for a SQL query which is bigger in AST form than in text 
 (~700kb).
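The String-based check quoted above serializes the entire tree before searching it, which is costly when the AST renders to ~700 KB of text. Walking the nodes directly avoids materializing that string. The sketch below is illustrative only; the Node type is a hypothetical stand-in for Hive's ASTNode.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Set;

public class AstTokenCheck {
    // Hypothetical stand-in for Hive's ASTNode.
    static final class Node {
        final String tokenType;
        final List<Node> children;
        Node(String tokenType, List<Node> children) { this.tokenType = tokenType; this.children = children; }
    }

    // Iterative depth-first walk; bail out as soon as a banned token is seen,
    // without ever serializing the tree to a String.
    static boolean containsUnsupportedToken(Node root, Set<String> banned) {
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (banned.contains(n.tokenType)) return true;
            n.children.forEach(stack::push);
        }
        return false;
    }

    public static void main(String[] args) {
        Node leaf = new Node("TOK_TABLESPLITSAMPLE", List.of());
        Node root = new Node("TOK_QUERY", List.of(new Node("TOK_FROM", List.of(leaf))));
        // prints true
        System.out.println(containsUnsupportedToken(root, Set.of("TOK_CHARSETLITERAL", "TOK_TABLESPLITSAMPLE")));
    }
}
```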



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-05-05 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528696#comment-14528696
 ] 

Sushanth Sowmyan commented on HIVE-9736:


+1 : Have looked through patch and it makes sense. Tests pass, and I trust 
Chris' judgement on this for a more detailed verification. :) 

Will commit to master and branch-1.2

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, 
 HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have 1 associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.
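The batching idea can be sketched generically: group the partition directories and issue one namenode call per group instead of one per directory. The helper below only shows the grouping step and is an illustrative sketch; a real implementation could pass each batch to Hadoop's FileSystem.listStatus(Path[]) overload, which is elided here.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedAuthCheck {
    // Split a list of items into consecutive batches of at most batchSize.
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            out.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> partitionDirs = List.of("region=a", "region=b", "region=c", "region=d", "region=e");
        // 3 batched calls instead of 5 individual listStatus() calls
        System.out.println(batches(partitionDirs, 2).size());
    }
}
```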



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)

2015-05-05 Thread Reuben Kuhnert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuben Kuhnert updated HIVE-10190:
--
Attachment: (was: HIVE-10190.10.patch)

 CBO: AST mode checks for TABLESAMPLE with 
 AST.toString().contains(TOK_TABLESPLITSAMPLE)
 -

 Key: HIVE-10190
 URL: https://issues.apache.org/jira/browse/HIVE-10190
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Reuben Kuhnert
Priority: Trivial
  Labels: perfomance
 Attachments: HIVE-10190-querygen.py, HIVE-10190.01.patch, 
 HIVE-10190.02.patch, HIVE-10190.03.patch, HIVE-10190.04.patch, 
 HIVE-10190.05.patch, HIVE-10190.05.patch, HIVE-10190.06.patch, 
 HIVE-10190.07.patch, HIVE-10190.08.patch


 {code}
 public static boolean validateASTForUnsupportedTokens(ASTNode ast) {
   String astTree = ast.toStringTree();
   // if any of following tokens are present in AST, bail out
   String[] tokens = { TOK_CHARSETLITERAL, TOK_TABLESPLITSAMPLE };
   for (String token : tokens) {
     if (astTree.contains(token)) {
       return false;
     }
   }
   return true;
 }
 {code}
 This is an issue for a SQL query which is bigger in AST form than in text 
 (~700kb).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)

2015-05-05 Thread Reuben Kuhnert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuben Kuhnert updated HIVE-10190:
--
Attachment: (was: HIVE-10190.09.patch)

 CBO: AST mode checks for TABLESAMPLE with 
 AST.toString().contains(TOK_TABLESPLITSAMPLE)
 -

 Key: HIVE-10190
 URL: https://issues.apache.org/jira/browse/HIVE-10190
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Reuben Kuhnert
Priority: Trivial
  Labels: perfomance
 Attachments: HIVE-10190-querygen.py, HIVE-10190.01.patch, 
 HIVE-10190.02.patch, HIVE-10190.03.patch, HIVE-10190.04.patch, 
 HIVE-10190.05.patch, HIVE-10190.05.patch, HIVE-10190.06.patch, 
 HIVE-10190.07.patch, HIVE-10190.08.patch


 {code}
 public static boolean validateASTForUnsupportedTokens(ASTNode ast) {
   String astTree = ast.toStringTree();
   // if any of following tokens are present in AST, bail out
   String[] tokens = { TOK_CHARSETLITERAL, TOK_TABLESPLITSAMPLE };
   for (String token : tokens) {
     if (astTree.contains(token)) {
       return false;
     }
   }
   return true;
 }
 {code}
 This is an issue for a SQL query which is bigger in AST form than in text 
 (~700kb).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7375) Add option in test infra to compile in other profiles (like hadoop-1)

2015-05-05 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528733#comment-14528733
 ] 

Xuefu Zhang commented on HIVE-7375:
---

+1

 Add option in test infra to compile in other profiles (like hadoop-1)
 -

 Key: HIVE-7375
 URL: https://issues.apache.org/jira/browse/HIVE-7375
 Project: Hive
  Issue Type: Test
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-7375.2.patch, HIVE-7375.patch


 As we are seeing some commits breaking hadoop-1 compilation due to lack of 
 pre-commit coverage, it might be nice to add an option in the test infra to 
 compile on optional profiles as a pre-step before testing on the main profile.
 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10563) MiniTezCliDriver tests ordering issues

2015-05-05 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528756#comment-14528756
 ] 

Ashutosh Chauhan commented on HIVE-10563:
-

The comment directive {{-- SORT_QUERY_RESULTS}} was invented for this exact 
use case. Please use that instead of adding ORDER BY to queries. If anything 
else (i.e. other than the query result set) is order-sensitive, then use 
{{-- SORT_BEFORE_DIFF}}.
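For reference, such a directive is placed at the top of the .q test file as a comment, for example:

```sql
-- SORT_QUERY_RESULTS
SELECT key, value FROM src;
```

The test driver then sorts the query's result rows before diffing against the expected output, so nondeterministic row order no longer fails the test.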

 MiniTezCliDriver tests ordering issues
 --

 Key: HIVE-10563
 URL: https://issues.apache.org/jira/browse/HIVE-10563
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10563.1.patch


 There are a bunch of tests related to TestMiniTezCliDriver that exhibit 
 ordering issues when run on CentOS/Windows/OS X.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9845) HCatSplit repeats information making input split data size huge

2015-05-05 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528830#comment-14528830
 ] 

Mithun Radhakrishnan commented on HIVE-9845:


I'll upload a new patch shortly.

 HCatSplit repeats information making input split data size huge
 ---

 Key: HIVE-9845
 URL: https://issues.apache.org/jira/browse/HIVE-9845
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Rohini Palaniswamy
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9845.1.patch, HIVE-9845.3.patch, HIVE-9845.4.patch


 Pig on Tez jobs with larger tables hit PIG-4443. Running on HDFS data that 
 has even triple the number of splits (100K+ splits and tasks) does not hit 
 that issue.
 {code}
 HCatBaseInputFormat.java:
   // Call getSplits on the InputFormat; create an
   // HCatSplit for each underlying split.
   // numSplits is 0 for our purposes.
   org.apache.hadoop.mapred.InputSplit[] baseSplits =
       inputFormat.getSplits(jobConf, 0);
   for (org.apache.hadoop.mapred.InputSplit split : baseSplits) {
     splits.add(new HCatSplit(partitionInfo, split, allCols));
   }
 {code}
 Each hcatSplit duplicates partition schema and table schema.
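One way to avoid the duplication described above is to intern the schema once and let each split carry only a reference to it. The sketch below is illustrative only, with simplified stand-in types (a String schema and an index-based reference) rather than HCatalog's real classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitSchemaInterning {
    // Stand-in for a split that references a shared schema by index
    // instead of carrying its own serialized copy.
    static final class Split {
        final String location;
        final int schemaRef;
        Split(String location, int schemaRef) { this.location = location; this.schemaRef = schemaRef; }
    }

    final List<String> schemas = new ArrayList<>();          // each distinct schema stored once
    final Map<String, Integer> interned = new HashMap<>();   // schema -> index in `schemas`

    Split makeSplit(String location, String schema) {
        int ref = interned.computeIfAbsent(schema, s -> {
            schemas.add(s);
            return schemas.size() - 1;
        });
        return new Split(location, ref);
    }

    public static void main(String[] args) {
        SplitSchemaInterning pool = new SplitSchemaInterning();
        Split a = pool.makeSplit("/part=1", "id:int,name:string");
        Split b = pool.makeSplit("/part=2", "id:int,name:string");
        // prints 1 true  -- one stored schema, both splits share it
        System.out.println(pool.schemas.size() + " " + (a.schemaRef == b.schemaRef));
    }
}
```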



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-05-05 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528724#comment-14528724
 ] 

Chris Nauroth commented on HIVE-9736:
-

[~sushanth], thank you for your review and the commit!

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Fix For: 1.2.0

 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, 
 HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have 1 associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

