[jira] [Commented] (HIVE-13563) Hive Streaming does not honor orc.compress.size and orc.stripe.size table properties
[ https://issues.apache.org/jira/browse/HIVE-13563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259451#comment-15259451 ] Hive QA commented on HIVE-13563: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12800332/HIVE-13563.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 46 failed/errored test(s), 9935 tests executed

*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-auto_join1.q-vector_complex_join.q-vectorization_limit.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_decimal_2.q-explainuser_1.q-explainuser_3.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_varchar_4.q-smb_cache.q-tez_join_hash.q-and-8-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nomore_ambiguous_table_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_regexp_extract
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern3
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern4
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_nonkey_groupby
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_selectDistinctStarNeg_2
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_subquery_shared_alias
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udtf_not_supported1
org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote.org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote
org.apache.hadoop.hive.metastore.TestFilterHooks.org.apache.hadoop.hive.metastore.TestFilterHooks
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testAddPartitions
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testFetchingPartitionsWithDifferentSchemas
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hadoop.hive.metastore.TestMetaStoreEndFunctionListener.testEndFunctionListener
org.apache.hadoop.hive.metastore.TestMetaStoreEventListenerOnlyOnCommit.testEventStatus
org.apache.hadoop.hive.metastore.TestMetaStoreInitListener.testMetaStoreInitListener
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.org.apache.hadoop.hive.metastore.TestMetaStoreMetrics
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAddPartitionWithUnicode
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAddPartitionWithValidPartVal
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithCommas
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithUnicode
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithValidCharacters
org.apache.hadoop.hive.metastore.TestRetryingHMSHandler.testRetryingHMSHandler
org.apache.hadoop.hive.ql.security.TestClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestFolderPermissions.org.apache.hadoop.hive.ql.security.TestFolderPermissions
org.apache.hadoop.hive.ql.security.TestMetastoreAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener.org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener
org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure
org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.testSaslWithHiveMetaStore
org.apache.hive.hcatalog.listener.TestDbNotificationListener.sqlInsertPartition
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.org.apache.hive.minikdc.TestJdbcWithDBTokenStore
{noformat}
[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles
[ https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259412#comment-15259412 ] Rui Li commented on HIVE-13572: --- Here's the time (in seconds) spent copying 183 files in my local test:
||W/O patch||Patch v1||Patch v2||
|16.36|3.6|0.72|
> Redundant setting full file status in Hive::copyFiles > - > > Key: HIVE-13572 > URL: https://issues.apache.org/jira/browse/HIVE-13572 > Project: Hive > Issue Type: Bug >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-13572.1.patch, HIVE-13572.2.patch > > > We set full file status in each copy-file thread. I think it's redundant and > hurts performance when we have multiple files to copy. > {code} > if (inheritPerms) { > ShimLoader.getHadoopShims().setFullFileStatus(conf, > fullDestStatus, destFs, destf); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
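The quoted {code} snippet runs once per copy-file thread. A minimal sketch of the idea behind the patch, with hypothetical helper names (copyOneFile and applyFullFileStatus stand in for Hive's actual FileSystem/shim calls): copy the files in parallel first, then set the full file status once on the destination instead of once per file.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only, not the actual Hive patch. Counters stand in
// for real file-system work so the behavior is observable.
public class CopyFiles {
    static AtomicInteger copies = new AtomicInteger();
    static AtomicInteger statusCalls = new AtomicInteger();

    static void copyAll(List<String> srcs, String destDir, boolean inheritPerms) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String src : srcs) {
            pool.submit(() -> copyOneFile(src, destDir)); // no per-file status call here
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (inheritPerms) {
            applyFullFileStatus(destDir); // once per destination dir, not per file
        }
    }

    static void copyOneFile(String src, String destDir) { copies.incrementAndGet(); }
    static void applyFullFileStatus(String destDir) { statusCalls.incrementAndGet(); }
}
```

Hoisting the status call out of the per-file threads is consistent with the timing table above: the copy itself parallelizes, while the redundant per-file setFullFileStatus calls were serializing on the NameNode.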
[jira] [Commented] (HIVE-13343) Need to disable hybrid grace hash join in llap mode except for dynamically partitioned hash join
[ https://issues.apache.org/jira/browse/HIVE-13343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259356#comment-15259356 ] Vikram Dixit K commented on HIVE-13343: --- Addressed comments. Created RB. > Need to disable hybrid grace hash join in llap mode except for dynamically > partitioned hash join > > > Key: HIVE-13343 > URL: https://issues.apache.org/jira/browse/HIVE-13343 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.1.0 >Reporter: Vikram Dixit K >Assignee: Vikram Dixit K > Attachments: HIVE-13343.1.patch, HIVE-13343.2.patch, > HIVE-13343.3.patch > > > Due to performance reasons, we should disable use of hybrid grace hash join > in llap when dynamic partition hash join is not used. With dynamic partition > hash join, we need hybrid grace hash join due to the possibility of skews.
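For context, the two features involved can be toggled per session. Assuming the configuration property names used in Hive 2.x (worth verifying against HiveConf, since this issue changes the default behavior in LLAP mode):

```sql
-- Hybrid grace hash join, the feature this issue disables in LLAP mode:
SET hive.mapjoin.hybridgrace.hashtable=false;
-- Dynamically partitioned hash join, the one case where hybrid grace
-- remains needed because of possible skews:
SET hive.optimize.dynamic.partition.hashjoin=true;
```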
[jira] [Commented] (HIVE-13610) Hive exec module won't compile with IBM JDK
[ https://issues.apache.org/jira/browse/HIVE-13610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259351#comment-15259351 ] Pan Yuxuan commented on HIVE-13610: --- [~sershe] Updated the patch, please take a look, thanks for your kind review. > Hive exec module won't compile with IBM JDK > --- > > Key: HIVE-13610 > URL: https://issues.apache.org/jira/browse/HIVE-13610 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 > Environment: Hive 1.1.0 + IBM JDK 1.7 + ppc64 architecture >Reporter: Pan Yuxuan >Assignee: Pan Yuxuan > Attachments: HIVE-13610.1.patch, HIVE-13610.patch > > > org.apache.hadoop.hive.ql.debug.Utils explicitly imports > com.sun.management.HotSpotDiagnosticMXBean, which is not supported by the IBM JDK. > So we can make HotSpotDiagnosticMXBean a runtime dependency rather than a compile-time one.
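One common way to turn such a compile-time dependency into a runtime one is reflection: look the HotSpot-only class up by name and fail gracefully on JVMs (e.g. the IBM JDK) that don't provide it. A hedged sketch of the technique, not the actual patch; the class and method names are the real JDK ones, the HeapDump wrapper is hypothetical:

```java
import java.lang.management.ManagementFactory;
import java.lang.reflect.Method;

public class HeapDump {
    // Trigger a heap dump without importing com.sun.management.HotSpotDiagnosticMXBean,
    // so this class compiles and loads on JVMs that lack that class.
    // Returns false when the MXBean is unavailable or the dump fails.
    public static boolean dumpHeap(String path, boolean liveOnly) {
        try {
            Class<?> clazz = Class.forName("com.sun.management.HotSpotDiagnosticMXBean");
            Object bean = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic", clazz);
            Method dump = clazz.getMethod("dumpHeap", String.class, boolean.class);
            dump.invoke(bean, path, liveOnly);
            return true;
        } catch (Exception e) {
            // Class absent (non-HotSpot JVM) or the dump itself failed.
            return false;
        }
    }
}
```

The compile-time import disappears entirely; on a HotSpot JVM the behavior is unchanged, and on the IBM JDK the method degrades to a no-op instead of a compile failure.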
[jira] [Updated] (HIVE-13610) Hive exec module won't compile with IBM JDK
[ https://issues.apache.org/jira/browse/HIVE-13610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pan Yuxuan updated HIVE-13610: -- Attachment: HIVE-13610.1.patch > Hive exec module won't compile with IBM JDK > --- > > Key: HIVE-13610 > URL: https://issues.apache.org/jira/browse/HIVE-13610 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 > Environment: Hive 1.1.0 + IBM JDK 1.7 + ppc64 architecture >Reporter: Pan Yuxuan >Assignee: Pan Yuxuan > Attachments: HIVE-13610.1.patch, HIVE-13610.patch > > > org.apache.hadoop.hive.ql.debug.Utils explicitly imports > com.sun.management.HotSpotDiagnosticMXBean, which is not supported by the IBM JDK. > So we can make HotSpotDiagnosticMXBean a runtime dependency rather than a compile-time one.
[jira] [Updated] (HIVE-13343) Need to disable hybrid grace hash join in llap mode except for dynamically partitioned hash join
[ https://issues.apache.org/jira/browse/HIVE-13343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-13343: -- Attachment: HIVE-13343.3.patch > Need to disable hybrid grace hash join in llap mode except for dynamically > partitioned hash join > > > Key: HIVE-13343 > URL: https://issues.apache.org/jira/browse/HIVE-13343 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.1.0 >Reporter: Vikram Dixit K >Assignee: Vikram Dixit K > Attachments: HIVE-13343.1.patch, HIVE-13343.2.patch, > HIVE-13343.3.patch > > > Due to performance reasons, we should disable use of hybrid grace hash join > in llap when dynamic partition hash join is not used. With dynamic partition > hash join, we need hybrid grace hash join due to the possibility of skews.
[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles
[ https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259315#comment-15259315 ] Ashutosh Chauhan commented on HIVE-13572: - I see. Do you have numbers on how these two approaches compare? Just trying to figure out how much improvement we are getting. > Redundant setting full file status in Hive::copyFiles > - > > Key: HIVE-13572 > URL: https://issues.apache.org/jira/browse/HIVE-13572 > Project: Hive > Issue Type: Bug >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-13572.1.patch, HIVE-13572.2.patch > > > We set full file status in each copy-file thread. I think it's redundant and > hurts performance when we have multiple files to copy. > {code} > if (inheritPerms) { > ShimLoader.getHadoopShims().setFullFileStatus(conf, > fullDestStatus, destFs, destf); > } > {code}
[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles
[ https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259306#comment-15259306 ] Rui Li commented on HIVE-13572: --- Just created an RB for the v2 patch. The changes to SessionState and the shim aim to move more work into the thread pool and make the synchronization more efficient; they don't fix a bug. > Redundant setting full file status in Hive::copyFiles > - > > Key: HIVE-13572 > URL: https://issues.apache.org/jira/browse/HIVE-13572 > Project: Hive > Issue Type: Bug >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-13572.1.patch, HIVE-13572.2.patch > > > We set full file status in each copy-file thread. I think it's redundant and > hurts performance when we have multiple files to copy. > {code} > if (inheritPerms) { > ShimLoader.getHadoopShims().setFullFileStatus(conf, > fullDestStatus, destFs, destf); > } > {code}
[jira] [Updated] (HIVE-13395) Lost Update problem in ACID
[ https://issues.apache.org/jira/browse/HIVE-13395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-13395: -- Attachment: HIVE-13395.11.patch patch 11 (not final) reworks the implementation to get WriteSet information from the TxnHandler.addDynamicPartitions() call rather than from lock information (where applicable). addDynamicPartitions() knows exactly which partitions have been written to. This increases concurrency dramatically. For example, "update T set a = 7 where b = 17". Suppose b is not a partition column and the table has 10K partitions. Suppose further that only 5 partitions match "b=17". We currently lock all the existing partitions, but the true WriteSet is the 5 partitions actually modified. Further optimizations in HIVE-13622 are useful but not absolutely required. > Lost Update problem in ACID > --- > > Key: HIVE-13395 > URL: https://issues.apache.org/jira/browse/HIVE-13395 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.2.0, 2.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-13395.11.patch, HIVE-13395.6.patch, > HIVE-13395.7.patch, HIVE-13395.8.patch > > > ACID users can run into the Lost Update problem. > In Hive 1.2, Driver.recordValidTxns() (which records the snapshot to use for > the query) is called in Driver.compile(). > Now suppose two concurrent "update T set x = x + 1" are executed. (For > simplicity assume there is exactly 1 row in T.) > What can happen is that both compile at the same time (more precisely before > acquireLocksAndOpenTxn() in runInternal() is called) and thus will lock in > the same snapshot, say the value of x = 7 in this snapshot. > Now 1 will get the lock on the row, the second will block. > Now 1 makes x = 8 and commits. > Now 2 proceeds and makes x = 8 again since in its snapshot x is still 7. 
> This specific issue is solved in Hive 1.3/2.0 (HIVE-11077, which is a large > patch that deals with multi-statement txns) by moving recordValidTxns() after > locks are acquired, which reduces the likelihood of this but doesn't eliminate > the problem. > > Even in the 1.3 version of the code, you could have the same issue. Assume the > same 2 queries: > Both start a txn, say txnid 9 and 10. Say 10 gets the lock first, 9 blocks. > 10 updates the row (so x = 8) and thus ReaderKey.currentTransactionId=10. > 10 commits. > Now 9 can proceed and it will get a snapshot that includes 10, i.e. it will > see x = 8 and it will write x = 9, but it will set > ReaderKey.currentTransactionId = 9. Thus when the merge logic runs, it will see > x = 8 is the later version of this row, i.e. a lost update. > The problem is that locks alone are insufficient for an MVCC architecture. > > At a lower level the Row ID has (originalTransactionId, rowid, bucket id, > currentTransactionId) and since on update/delete we do a table scan, we could > check that we are about to write a row with currentTransactionId < > (currentTransactionId of the row we've read) and fail the query. Currently, > currentTransactionId is not surfaced at the higher level where this check can be > made. > This would not work (efficiently) longer term where we want to support fast > update on user-defined PK via streaming ingest. > Also, this would not work with multi-statement txns since in that case we'd > lock in the snapshot at the start of the txn, but then the 2nd, 3rd etc. queries > would use the same snapshot and the locks for these queries would be acquired > after the snapshot is locked in, so this would be the same situation as pre > HIVE-11077. > > > A more robust solution (commonly used with MVCC) is to keep track of the start > and commit time (logical counter) of each transaction to detect if two txns > overlap. The 2nd part is to keep track of the write-set, i.e. 
which data (rows, > partitions, or whatever the appropriate level of granularity is) were modified by > any txn, and if 2 txns overlap in time and wrote the same element, abort the later > one. This is called the first-committer-wins rule. This requires a MS DB schema > change. > It would be most convenient to use the same sequence for txnId, start and > commit time (in which case txnid=start time). In this case we'd need to add > 1 field to the TXNS table. The complication here is that we'll be using elements > of the sequence faster, and they are used as part of the file name of delta and > base dirs and are currently limited to 7 digits, which can be exceeded. So this > would require some thought on handling upgrade/migration. > Also, write-set tracking requires either an additional metastore table or > keeping info in HIVE_LOCKS around longer with a new state. > > In the short term, on the SQL side of things we could (in auto-commit mode only) > acquire the locks first
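The first-committer-wins rule described above can be illustrated with a toy tracker (illustrative only; Hive's real implementation lives in TxnHandler and metastore tables). Each transaction records a start time, a commit time on the shared logical counter, and the set of elements (e.g. partitions) it wrote; a committing transaction must abort if an already-committed transaction overlapped it in time and wrote one of the same elements.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WriteSetTracker {
    static class Txn {
        final long start;
        long commit = -1;                        // -1 while still open
        final Set<String> writeSet = new HashSet<>();
        Txn(long start) { this.start = start; }
    }

    private final List<Txn> committed = new ArrayList<>();
    private long clock = 0;                      // single logical counter for start/commit

    Txn begin() { return new Txn(++clock); }

    /** Returns true if the commit succeeds, false if the txn must abort. */
    boolean commit(Txn t) {
        for (Txn other : committed) {
            // "other" overlapped t in time if it committed after t started.
            boolean overlaps = other.commit > t.start;
            if (overlaps && !Collections.disjoint(other.writeSet, t.writeSet)) {
                return false;                    // the first committer already won
            }
        }
        t.commit = ++clock;
        committed.add(t);
        return true;
    }
}
```

Note how this matches the description's suggestion of one sequence for txnId, start, and commit time: begin() and commit() draw from the same counter, so overlap detection is a single comparison.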
[jira] [Updated] (HIVE-13622) WriteSet tracking optimizations
[ https://issues.apache.org/jira/browse/HIVE-13622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-13622: -- Description: HIVE-13395 solves the lost update problem with some inefficiencies. 1. TxnHandler.OperationType is currently derived from LockType. This doesn't distinguish between Update and Delete but would be useful. See comments in TxnHandler. Should be able to pass in Insert/Update/Delete info from the client into TxnHandler. 2. TxnHandler.addDynamicPartitions() should know the OperationType as well from the client. It currently extrapolates it from TXN_COMPONENTS. This works but requires extra SQL statements and is thus less performant. It will not work for multi-stmt txns. See comments in the code. 3. TxnHandler.checkLock(): see more comments around "isPartOfDynamicPartitionInsert". If TxnHandler knew whether it is being called as part of an op running with dynamic partitions, it could be more efficient. In that case we don't have to write to TXN_COMPONENTS at all during lock acquisition. Conversely, if not running with DynPart, we can kill the current txn on lock grant rather than wait until commit time. All of these require some Thrift changes. Once done, re-enable TestDbTxnHandler2.testWriteSetTracking11() was: HIVE-13395 solves the lost update problem with some inefficiencies. 1. TxnHandler.OperationType is currently derived from LockType. This doesn't distinguish between Update and Delete but would be useful. See comments in TxnHandler. Should be able to pass in Insert/Update/Delete info from the client into TxnHandler. 2. TxnHandler.addDynamicPartitions() should know the OperationType as well from the client. It currently extrapolates it from TXN_COMPONENTS. This works but requires extra SQL statements and is thus less performant. It will not work for multi-stmt txns. See comments in the code. 3. TxnHandler.checkLock(): see more comments around "isPartOfDynamicPartitionInsert". 
If TxnHandler knew whether it is being called as part of an op running with dynamic partitions, it could be more efficient. In that case we don't have to write to TXN_COMPONENTS at all during lock acquisition. Conversely, if not running with DynPart, we can kill the current txn on lock grant rather than wait until commit time. All of these require some Thrift changes. > WriteSet tracking optimizations > --- > > Key: HIVE-13622 > URL: https://issues.apache.org/jira/browse/HIVE-13622 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0, 2.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > > HIVE-13395 solves the lost update problem with some inefficiencies. > 1. TxnHandler.OperationType is currently derived from LockType. This doesn't > distinguish between Update and Delete but would be useful. See comments in > TxnHandler. Should be able to pass in Insert/Update/Delete info from the client > into TxnHandler. > 2. TxnHandler.addDynamicPartitions() should know the OperationType as well > from the client. It currently extrapolates it from TXN_COMPONENTS. This > works but requires extra SQL statements and is thus less performant. It will > not work for multi-stmt txns. See comments in the code. > 3. TxnHandler.checkLock(): see more comments around > "isPartOfDynamicPartitionInsert". If TxnHandler knew whether it is being > called as part of an op running with dynamic partitions, it could be more > efficient. In that case we don't have to write to TXN_COMPONENTS at all > during lock acquisition. Conversely, if not running with DynPart, we > can kill the current txn on lock grant rather than wait until commit time. > All of these require some Thrift changes. > Once done, re-enable TestDbTxnHandler2.testWriteSetTracking11()
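Items 1 and 2 above amount to passing the DML operation type explicitly from the client instead of deriving it server-side. A hypothetical sketch of what that interface change could look like (toy names and return value; the real change requires Thrift API updates, as the description notes):

```java
import java.util.List;

public class TxnOps {
    // The description notes that deriving this from LockType cannot tell
    // Update and Delete apart; an explicit enum from the client can.
    enum OperationType { INSERT, UPDATE, DELETE }

    // Toy stand-in for TxnHandler.addDynamicPartitions(): with the operation
    // type passed in, no extra SQL against TXN_COMPONENTS is needed to
    // recover it, and the sketch just records what it was told.
    static String addDynamicPartitions(long txnId, String table,
                                       List<String> parts, OperationType op) {
        return "txn=" + txnId + " op=" + op + " " + table + "/" + parts;
    }
}
```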
[jira] [Commented] (HIVE-9675) Support START TRANSACTION/COMMIT/ROLLBACK commands
[ https://issues.apache.org/jira/browse/HIVE-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259292#comment-15259292 ] Eugene Koifman commented on HIVE-9675: -- make sure to fix HIVE-13622 first (the part about knowing in TxnHandler if the query is using Dynamic Partitions) > Support START TRANSACTION/COMMIT/ROLLBACK commands > -- > > Key: HIVE-9675 > URL: https://issues.apache.org/jira/browse/HIVE-9675 > Project: Hive > Issue Type: New Feature > Components: SQL, Transactions >Affects Versions: 0.14.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > > Hive 0.14 added support for insert/update/delete statements with ACID > semantics. Hive 0.14 only supports auto-commit mode. We need to add support > for START TRANSACTION/COMMIT/ROLLBACK commands so that the user can demarcate > transaction boundaries.
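The commands requested would let a user demarcate a multi-statement transaction, for example (proposed syntax, not yet supported at the time of this issue):

```sql
START TRANSACTION;
UPDATE t SET x = x + 1 WHERE k = 1;
DELETE FROM t WHERE k = 2;
COMMIT;   -- or ROLLBACK;
```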
[jira] [Commented] (HIVE-13395) Lost Update problem in ACID
[ https://issues.apache.org/jira/browse/HIVE-13395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259289#comment-15259289 ] Eugene Koifman commented on HIVE-13395: --- HIVE-13622 covers some optimizations for this. > Lost Update problem in ACID > --- > > Key: HIVE-13395 > URL: https://issues.apache.org/jira/browse/HIVE-13395 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.2.0, 2.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-13395.6.patch, HIVE-13395.7.patch, > HIVE-13395.8.patch > > > ACID users can run into the Lost Update problem. > In Hive 1.2, Driver.recordValidTxns() (which records the snapshot to use for > the query) is called in Driver.compile(). > Now suppose two concurrent "update T set x = x + 1" are executed. (For > simplicity assume there is exactly 1 row in T.) > What can happen is that both compile at the same time (more precisely before > acquireLocksAndOpenTxn() in runInternal() is called) and thus will lock in > the same snapshot, say the value of x = 7 in this snapshot. > Now 1 will get the lock on the row, the second will block. > Now 1 makes x = 8 and commits. > Now 2 proceeds and makes x = 8 again since in its snapshot x is still 7. > This specific issue is solved in Hive 1.3/2.0 (HIVE-11077, which is a large > patch that deals with multi-statement txns) by moving recordValidTxns() after > locks are acquired, which reduces the likelihood of this but doesn't eliminate > the problem. > > Even in the 1.3 version of the code, you could have the same issue. Assume the > same 2 queries: > Both start a txn, say txnid 9 and 10. Say 10 gets the lock first, 9 blocks. > 10 updates the row (so x = 8) and thus ReaderKey.currentTransactionId=10. > 10 commits. > Now 9 can proceed and it will get a snapshot that includes 10, i.e. it will > see x = 8 and it will write x = 9, but it will set > ReaderKey.currentTransactionId = 9. 
Thus when the merge logic runs, it will see > x = 8 is the later version of this row, i.e. a lost update. > The problem is that locks alone are insufficient for an MVCC architecture. > > At a lower level the Row ID has (originalTransactionId, rowid, bucket id, > currentTransactionId) and since on update/delete we do a table scan, we could > check that we are about to write a row with currentTransactionId < > (currentTransactionId of the row we've read) and fail the query. Currently, > currentTransactionId is not surfaced at the higher level where this check can be > made. > This would not work (efficiently) longer term where we want to support fast > update on user-defined PK via streaming ingest. > Also, this would not work with multi-statement txns since in that case we'd > lock in the snapshot at the start of the txn, but then the 2nd, 3rd etc. queries > would use the same snapshot and the locks for these queries would be acquired > after the snapshot is locked in, so this would be the same situation as pre > HIVE-11077. > > > A more robust solution (commonly used with MVCC) is to keep track of the start > and commit time (logical counter) of each transaction to detect if two txns > overlap. The 2nd part is to keep track of the write-set, i.e. which data (rows, > partitions, or whatever the appropriate level of granularity is) were modified by > any txn, and if 2 txns overlap in time and wrote the same element, abort the later > one. This is called the first-committer-wins rule. This requires a MS DB schema > change. > It would be most convenient to use the same sequence for txnId, start and > commit time (in which case txnid=start time). In this case we'd need to add > 1 field to the TXNS table. The complication here is that we'll be using elements > of the sequence faster, and they are used as part of the file name of delta and > base dirs and are currently limited to 7 digits, which can be exceeded. So this > would require some thought on handling upgrade/migration. 
> Also, write-set tracking requires either an additional metastore table or > keeping info in HIVE_LOCKS around longer with a new state. > > In the short term, on the SQL side of things we could (in auto-commit mode only) > acquire the locks first and then open the txn AND update these locks with the txn > id. > This implies another Thrift change to pass in the lockId to openTxn. > The same would not work for the Streaming API since it opens several txns at once > and then acquires locks for each. > (Not sure if that is an issue or not since Streaming only does Insert.) > Either way this feels hacky. > > Here is one simple example why we need Write-Set tracking for multi-statement > txns. > Consider transactions T ~1~ and T ~2~: > T ~1~: r ~1~\[x] -> w ~1~\[y] -> c ~1~ > T ~2~: w ~2~\[x] -> w ~2~\[y] -> c ~2~ > Suppose the order of operations
[jira] [Updated] (HIVE-13622) WriteSet tracking optimizations
[ https://issues.apache.org/jira/browse/HIVE-13622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-13622: -- Description: HIVE-13395 solves the lost update problem with some inefficiencies. 1. TxnHandler.OperationType is currently derived from LockType. This doesn't distinguish between Update and Delete but would be useful. See comments in TxnHandler. Should be able to pass in Insert/Update/Delete info from the client into TxnHandler. 2. TxnHandler.addDynamicPartitions() should know the OperationType as well from the client. It currently extrapolates it from TXN_COMPONENTS. This works but requires extra SQL statements and is thus less performant. It will not work for multi-stmt txns. See comments in the code. 3. TxnHandler.checkLock(): see more comments around "isPartOfDynamicPartitionInsert". If TxnHandler knew whether it is being called as part of an op running with dynamic partitions, it could be more efficient. In that case we don't have to write to TXN_COMPONENTS at all during lock acquisition. Conversely, if not running with DynPart, we can kill the current txn on lock grant rather than wait until commit time. All of these require some Thrift changes. Once done, re-enable TestDbTxnHandler2.testWriteSetTracking11() was: HIVE-13395 solves the lost update problem with some inefficiencies. 1. TxnHandler.OperationType is currently derived from LockType. This doesn't distinguish between Update and Delete but would be useful. See comments in TxnHandler. Should be able to pass in Insert/Update/Delete info from the client into TxnHandler. 2. TxnHandler.addDynamicPartitions() should know the OperationType as well from the client. It currently extrapolates it from TXN_COMPONENTS. This works but requires extra SQL statements and is thus less performant. It will not work for multi-stmt txns. See comments in the code. 3. TxnHandler.checkLock(): see more comments around "isPartOfDynamicPartitionInsert". 
If TxnHandler knew whether it is being called as part of an op running with dynamic partitions, it could be more efficient. In that case we don't have to write to TXN_COMPONENTS at all during lock acquisition. Conversely, if not running with DynPart, we can kill the current txn on lock grant rather than wait until commit time. > WriteSet tracking optimizations > --- > > Key: HIVE-13622 > URL: https://issues.apache.org/jira/browse/HIVE-13622 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0, 2.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > > HIVE-13395 solves the lost update problem with some inefficiencies. > 1. TxnHandler.OperationType is currently derived from LockType. This doesn't > distinguish between Update and Delete but would be useful. See comments in > TxnHandler. Should be able to pass in Insert/Update/Delete info from the client > into TxnHandler. > 2. TxnHandler.addDynamicPartitions() should know the OperationType as well > from the client. It currently extrapolates it from TXN_COMPONENTS. This > works but requires extra SQL statements and is thus less performant. It will > not work for multi-stmt txns. See comments in the code. > 3. TxnHandler.checkLock(): see more comments around > "isPartOfDynamicPartitionInsert". If TxnHandler knew whether it is being > called as part of an op running with dynamic partitions, it could be more > efficient. In that case we don't have to write to TXN_COMPONENTS at all > during lock acquisition. Conversely, if not running with DynPart, we > can kill the current txn on lock grant rather than wait until commit time. > All of these require some Thrift changes.
[jira] [Updated] (HIVE-13621) compute stats in certain cases fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-13621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-13621: -- Attachment: HIVE-13621.1.patch > compute stats in certain cases fails with NPE > - > > Key: HIVE-13621 > URL: https://issues.apache.org/jira/browse/HIVE-13621 > Project: Hive > Issue Type: Bug > Components: HBase Metastore, Metastore >Affects Versions: 2.1.0, 2.0.1 >Reporter: Vikram Dixit K >Assignee: Vikram Dixit K > Attachments: HIVE-13621.1.patch > > > {code} > FAILED: NullPointerException null > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatistics(StatsUtils.java:693) > at > org.apache.hadoop.hive.ql.stats.StatsUtils.convertColStats(StatsUtils.java:739) > at > org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:728) > at > org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:183) > at > org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:136) > at > org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:124){code}
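The stack trace points at StatsUtils.getColStatistics dereferencing column statistics that the metastore never returned. A hypothetical sketch of the kind of null guard such a fix typically adds (illustrative names, not the actual patch): treat a missing stats entry as "no stats" instead of unboxing a null.

```java
import java.util.HashMap;
import java.util.Map;

public class ColStats {
    // Toy stand-in for per-column stats fetched from the metastore;
    // columns without computed stats simply have no entry.
    static Map<String, Long> colStats = new HashMap<>();

    // Returns the distinct-value count for a column, or -1 when stats
    // are missing. Unboxing the raw map value would NPE on absent columns.
    static long getCountDistinct(String col) {
        Long ndv = colStats.get(col);       // may be null
        return (ndv == null) ? -1L : ndv;   // guard instead of blind unboxing
    }
}
```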
[jira] [Updated] (HIVE-13621) compute stats in certain cases fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-13621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-13621: -- Status: Patch Available (was: Open) > compute stats in certain cases fails with NPE > - > > Key: HIVE-13621 > URL: https://issues.apache.org/jira/browse/HIVE-13621 > Project: Hive > Issue Type: Bug > Components: HBase Metastore, Metastore >Affects Versions: 2.1.0, 2.0.1 >Reporter: Vikram Dixit K >Assignee: Vikram Dixit K > Attachments: HIVE-13621.1.patch > > > {code} > FAILED: NullPointerException null > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatistics(StatsUtils.java:693) > at > org.apache.hadoop.hive.ql.stats.StatsUtils.convertColStats(StatsUtils.java:739) > at > org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:728) > at > org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:183) > at > org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:136) > at > org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:124){code}
[jira] [Updated] (HIVE-13620) Merge llap branch work to master
[ https://issues.apache.org/jira/browse/HIVE-13620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-13620: -- Attachment: llap_master_diff.txt Attaching diff of llap branch with master. > Merge llap branch work to master > > > Key: HIVE-13620 > URL: https://issues.apache.org/jira/browse/HIVE-13620 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: llap_master_diff.txt > > > Would like to try to merge the llap branch work for HIVE-12991 into the > master branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13563) Hive Streaming does not honor orc.compress.size and orc.stripe.size table properties
[ https://issues.apache.org/jira/browse/HIVE-13563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259235#comment-15259235 ] Wei Zheng commented on HIVE-13563: -- ping [~owen.omalley].. > Hive Streaming does not honor orc.compress.size and orc.stripe.size table > properties > > > Key: HIVE-13563 > URL: https://issues.apache.org/jira/browse/HIVE-13563 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Labels: TODOC2.1 > Attachments: HIVE-13563.1.patch > > > According to the doc: > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-HiveQLSyntax > One should be able to specify tblproperties for many ORC options. > But the settings for orc.compress.size and orc.stripe.size don't take effect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
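The intended behavior here — a table property such as orc.compress.size overriding the configured default when the streaming writer is built — can be sketched as below. The names (DEFAULT_COMPRESS_SIZE, resolveCompressSize) are stand-ins for illustration, not Hive's actual writer-options API:

```java
import java.util.Properties;

// Hypothetical sketch of honoring a tblproperty over a built-in default.
public class OrcPropsSketch {
    static final long DEFAULT_COMPRESS_SIZE = 262144;  // assumed 256 KB default buffer

    // Prefer the table property when present; otherwise fall back to the default.
    static long resolveCompressSize(Properties tableProps) {
        String v = tableProps.getProperty("orc.compress.size");
        return (v != null) ? Long.parseLong(v) : DEFAULT_COMPRESS_SIZE;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("orc.compress.size", "8192");
        System.out.println(resolveCompressSize(props));           // 8192
        System.out.println(resolveCompressSize(new Properties())); // 262144
    }
}
```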
[jira] [Commented] (HIVE-13568) Add UDFs to support column-masking
[ https://issues.apache.org/jira/browse/HIVE-13568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259229#comment-15259229 ] Gunther Hagleitner commented on HIVE-13568: --- Some more comments on RB. Getting closer. I've looked through the test failures above. They look unrelated. > Add UDFs to support column-masking > -- > > Key: HIVE-13568 > URL: https://issues.apache.org/jira/browse/HIVE-13568 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Madhan Neethiraj >Assignee: Madhan Neethiraj > Attachments: HIVE-13568.1.patch, HIVE-13568.1.patch, > HIVE-13568.2.patch > > > HIVE-13125 added support to provide column-masking and row-filtering during > select via HiveAuthorizer interface. This JIRA is to track the addition of UDFs that > can be used by HiveAuthorizer implementations to mask column values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
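For readers unfamiliar with column masking: a typical masking function of the kind this issue tracks replaces most of a value while leaving a suffix visible. A self-contained sketch (not Hive's actual UDF code — the name and rules are assumed for illustration):

```java
// Hypothetical "show last 4" mask: letters and digits before the last four
// characters become 'x'; separators stay readable.
public class MaskSketch {
    static String maskShowLast4(String s) {
        if (s == null) return null;
        StringBuilder out = new StringBuilder(s.length());
        int visibleFrom = Math.max(0, s.length() - 4);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Mask alphanumerics in the hidden prefix; keep everything else.
            out.append(i >= visibleFrom || !Character.isLetterOrDigit(c) ? c : 'x');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(maskShowLast4("1234-5678-9012-3456")); // xxxx-xxxx-xxxx-3456
    }
}
```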
[jira] [Updated] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC
[ https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-9660: --- Attachment: HIVE-9660.11.patch Rebased again, removed the writer version (which was also the cause of some test failures) > store end offset of compressed data for RG in RowIndex in ORC > - > > Key: HIVE-9660 > URL: https://issues.apache.org/jira/browse/HIVE-9660 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, > HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, > HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, > HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.10.patch, > HIVE-9660.10.patch, HIVE-9660.11.patch, HIVE-9660.patch, HIVE-9660.patch > > > Right now the end offset is estimated, which in some cases results in tons of > extra data being read. > We can add a separate array to RowIndex (positions_v2?) that stores number of > compressed buffers for each RG, or end offset, or something, to remove this > estimation magic -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12878) Support Vectorization for TEXTFILE and other formats
[ https://issues.apache.org/jira/browse/HIVE-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259218#comment-15259218 ] Sergey Shelukhin commented on HIVE-12878: - Left some more comments on RB > Support Vectorization for TEXTFILE and other formats > > > Key: HIVE-12878 > URL: https://issues.apache.org/jira/browse/HIVE-12878 > Project: Hive > Issue Type: New Feature > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12878.01.patch, HIVE-12878.02.patch, > HIVE-12878.03.patch, HIVE-12878.04.patch, HIVE-12878.05.patch, > HIVE-12878.06.patch, HIVE-12878.07.patch, HIVE-12878.08.patch, > HIVE-12878.09.patch > > > Support vectorizing when the input format is TEXTFILE and other formats for > better Map Vertex performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13560) Adding Omid as connection manager for HBase Metastore
[ https://issues.apache.org/jira/browse/HIVE-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259212#comment-15259212 ] Alan Gates commented on HIVE-13560: --- In conf/tez/hive-site.xml did you mean to stomp the hive.orc.splits.ms.footer.cache.enabled value? Why take out the option to test Tephra, since we haven't taken out the Tephra connector? HBaseStore.java, line 451, why did you change the catch from IOException to Exception? I can't see any other changes in the code that should require this. I don't understand why you removed the transactions from getPartitionsByExpr. I think removing the transactions around the get/putFileMetadata is fine, but we should explicitly comment that these operations are outside of the transactions and why. It's not clear to me that you need OmidHBaseConnection.transaction to be a thread local variable. HBaseReadWrite is already a thread local in HBaseStore, so you should be guaranteed that there's an OmidHBaseConnection per thread. > Adding Omid as connection manager for HBase Metastore > - > > Key: HIVE-13560 > URL: https://issues.apache.org/jira/browse/HIVE-13560 > Project: Hive > Issue Type: Improvement > Components: HBase Metastore >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-13560.1.patch, HIVE-13560.2.patch, > HIVE-13560.3.patch > > > Adding Omid as a transaction manager to HBase Metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
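The threading point in the review above — that a plain field is already thread-confined when the enclosing object is itself thread-local — can be illustrated with a small standalone sketch. Class names here are stand-ins, not the patch's code:

```java
// If the enclosing Connection is ThreadLocal (as HBaseReadWrite is per-thread
// in HBaseStore), each thread gets its own instance, so an ordinary field on
// it never needs its own ThreadLocal wrapper.
public class ThreadLocalSketch {
    static class Connection {
        Object transaction;  // plain field; confined because Connection is per-thread
    }

    static final ThreadLocal<Connection> CONN = ThreadLocal.withInitial(Connection::new);

    public static void main(String[] args) throws InterruptedException {
        CONN.get().transaction = "txn-main";
        Thread t = new Thread(() -> {
            // This thread sees a fresh Connection, not main's.
            System.out.println(CONN.get().transaction);  // null
        });
        t.start();
        t.join();
        System.out.println(CONN.get().transaction);      // txn-main
    }
}
```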
[jira] [Commented] (HIVE-10176) skip.header.line.count causes values to be skipped when performing insert values
[ https://issues.apache.org/jira/browse/HIVE-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259209#comment-15259209 ] Hive QA commented on HIVE-10176: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12800836/HIVE-10176.14.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/91/testReport Console output: http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/91/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-91/ Messages: {noformat} This message was trimmed, see log for full details [INFO] [INFO] [INFO] Building Spark Remote Client 2.1.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ spark-client --- [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/spark-client/target [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/spark-client (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ spark-client --- [INFO] [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ spark-client --- [INFO] [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ spark-client --- [INFO] Using 'UTF-8' encoding to copy filtered resources. 
[INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-github-source-source/spark-client/src/main/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ spark-client --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ spark-client --- [INFO] Compiling 28 source files to /data/hive-ptest/working/apache-github-source-source/spark-client/target/classes [WARNING] /data/hive-ptest/working/apache-github-source-source/spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java: /data/hive-ptest/working/apache-github-source-source/spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java uses or overrides a deprecated API. [WARNING] /data/hive-ptest/working/apache-github-source-source/spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java: Recompile with -Xlint:deprecation for details. [WARNING] /data/hive-ptest/working/apache-github-source-source/spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcDispatcher.java: Some input files use unchecked or unsafe operations. [WARNING] /data/hive-ptest/working/apache-github-source-source/spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcDispatcher.java: Recompile with -Xlint:unchecked for details. [INFO] [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ spark-client --- [INFO] Using 'UTF-8' encoding to copy filtered resources. 
[INFO] Copying 1 resource [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ spark-client --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/spark-client/target/tmp [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/spark-client/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/spark-client/target/tmp/conf [copy] Copying 15 files to /data/hive-ptest/working/apache-github-source-source/spark-client/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ spark-client --- [INFO] Compiling 5 source files to /data/hive-ptest/working/apache-github-source-source/spark-client/target/test-classes [INFO] [INFO] --- maven-dependency-plugin:2.8:copy (copy-guava-14) @ spark-client --- [INFO] Configured Artifact: com.google.guava:guava:14.0.1:jar [INFO] Copying guava-14.0.1.jar to /data/hive-ptest/working/apache-github-source-source/spark-client/target/dependency/guava-14.0.1.jar [INFO] [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ spark-client --- [INFO] Tests are skipped. [INFO] [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ spark-client --- [INFO] Building jar: /data/hive-ptest/working/apache-github-source-source/spark-client/target/spark-client-2.1.0-SNAPSHOT.jar [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ spark-client --- [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ spark-client ---
[jira] [Commented] (HIVE-13421) Propagate job progress in operation status
[ https://issues.apache.org/jira/browse/HIVE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259206#comment-15259206 ] Hive QA commented on HIVE-13421: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12800477/HIVE-13421.05.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/89/testReport Console output: http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/89/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-89/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee 
/data/hive-ptest/logs/PreCommit-HIVE-MASTER-Build-89/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! -d apache-github-source-source ]] + cd apache-github-source-source + git fetch origin >From https://github.com/apache/hive af4766d..a3502d0 branch-2.0 -> origin/branch-2.0 1548501..815499a master -> origin/master + git reset --hard HEAD HEAD is now at 1548501 HIVE-13541: Pass view's ColumnAccessInfo to HiveAuthorizer (Pengcheng Xiong, reviewed by Ashutosh Chauhan) + git clean -f -d + git checkout master Already on 'master' Your branch is behind 'origin/master' by 3 commits, and can be fast-forwarded. + git reset --hard origin/master HEAD is now at 815499a HIVE-13585: Add counter metric for direct sql failures (Mohit Sabharwal, reviewed by Aihua Xu, Sergey Shelukhin) + git merge --ff-only origin/master Already up-to-date. + git gc + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12800477 - PreCommit-HIVE-MASTER-Build > Propagate job progress in operation status > -- > > Key: HIVE-13421 > URL: https://issues.apache.org/jira/browse/HIVE-13421 > Project: Hive > Issue Type: Improvement >Reporter: Rajat Khandelwal >Assignee: Rajat Khandelwal > Fix For: 2.1.0 > > Attachments: HIVE-13421.01.patch, HIVE-13421.02.patch, > HIVE-13421.03.patch, HIVE-13421.04.patch, HIVE-13421.05.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13603) Fix ptest unit tests broken by HIVE13505
[ https://issues.apache.org/jira/browse/HIVE-13603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259204#comment-15259204 ] Hive QA commented on HIVE-13603: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12800459/HIVE-13603.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 81 failed/errored test(s), 9953 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nomore_ambiguous_table_col org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_regexp_extract org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge2 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge9 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge_diff_fs org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate 
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join2 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join3 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join4 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join5 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern3 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern4 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_nonkey_groupby org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_selectDistinctStarNeg_2 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_subquery_shared_alias org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udtf_not_supported1 org.apache.hadoop.hive.llap.daemon.impl.comparator.TestShortestJobFirstComparator.testWaitQueueComparatorWithinDagPriority org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testForcedLocalityPreemption org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote.org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote org.apache.hadoop.hive.metastore.TestFilterHooks.org.apache.hadoop.hive.metastore.TestFilterHooks org.apache.hadoop.hive.metastore.TestMetaStoreEndFunctionListener.testEndFunctionListener org.apache.hadoop.hive.metastore.TestMetaStoreEventListenerOnlyOnCommit.testEventStatus org.apache.hadoop.hive.metastore.TestMetaStoreInitListener.testMetaStoreInitListener org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.org.apache.hadoop.hive.metastore.TestMetaStoreMetrics org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAddPartitionWithUnicode org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAddPartitionWithValidPartVal 
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithCommas org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithUnicode org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithValidCharacters org.apache.hadoop.hive.metastore.TestRetryingHMSHandler.testRetryingHMSHandler org.apache.hadoop.hive.ql.security.TestClientSideAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestExtendedAcls.org.apache.hadoop.hive.ql.security.TestExtendedAcls org.apache.hadoop.hive.ql.security.TestFolderPermissions.org.apache.hadoop.hive.ql.security.TestFolderPermissions org.apache.hadoop.hive.ql.security.TestMetastoreAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener.org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener
[jira] [Commented] (HIVE-13615) nomore_ambiguous_table_col.q is failing on master
[ https://issues.apache.org/jira/browse/HIVE-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259199#comment-15259199 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-13615: -- I think this is a more generic issue that was exposed by the recent parser changes in HIVE-13290. In general, if we use any of the nonreserved keywords in IdentifiersParser.g as an alias in statements like those below, we will run into an error: {code} FROM src rely INSERT OVERWRITE TABLE ambiguous SELECT rely.key, rely.value WHERE rely.value < 'val_100'; FROM src key INSERT OVERWRITE TABLE ambiguous SELECT key.key, key.value WHERE key.value < 'val_100'; FROM src uri INSERT OVERWRITE TABLE ambiguous SELECT uri.key, uri.value WHERE uri.value < 'val_100'; {code} > nomore_ambiguous_table_col.q is failing on master > - > > Key: HIVE-13615 > URL: https://issues.apache.org/jira/browse/HIVE-13615 > Project: Hive > Issue Type: Test > Components: Parser >Affects Versions: 2.1.0 >Reporter: Ashutosh Chauhan > > Fails with: > FAILED: ParseException line 3:9 cannot recognize input near 'src' 'key' > 'INSERT' in from source 0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13585) Add counter metric for direct sql failures
[ https://issues.apache.org/jira/browse/HIVE-13585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-13585: Resolution: Fixed Fix Version/s: 2.1.0 Status: Resolved (was: Patch Available) Committed to master. Thanks Mohit for the work. > Add counter metric for direct sql failures > -- > > Key: HIVE-13585 > URL: https://issues.apache.org/jira/browse/HIVE-13585 > Project: Hive > Issue Type: Bug >Reporter: Mohit Sabharwal >Assignee: Mohit Sabharwal > Fix For: 2.1.0 > > Attachments: HIVE-13585.patch > > > In case of direct sql failure, metastore query falls back to DataNucleus. > It'd be good to record how often this happens as a metrics counter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12887) Handle ORC schema on read with fewer columns than file schema (after Schema Evolution changes)
[ https://issues.apache.org/jira/browse/HIVE-12887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-12887: Fix Version/s: 2.0.1 > Handle ORC schema on read with fewer columns than file schema (after Schema > Evolution changes) > -- > > Key: HIVE-12887 > URL: https://issues.apache.org/jira/browse/HIVE-12887 > Project: Hive > Issue Type: Bug > Components: ORC >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Fix For: 1.3.0, 2.1.0, 2.0.1 > > Attachments: HIVE-12887.01.patch, HIVE-12887.02.patch > > > Exception caused by reading after column removal. > {code} > Caused by: java.lang.IndexOutOfBoundsException: Index: 10, Size: 10 > at java.util.ArrayList.rangeCheck(ArrayList.java:653) > at java.util.ArrayList.get(ArrayList.java:429) > at java.util.Collections$UnmodifiableList.get(Collections.java:1309) > at > org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240) > at > org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.(TreeReaderFactory.java:2053) > at > org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2481) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.(RecordReaderImpl.java:216) > at > org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:598) > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.(OrcRawRecordMerger.java:179) > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.(OrcRawRecordMerger.java:222) > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:442) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1285) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1165) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13619) Bucket map join plan is incorrect
[ https://issues.apache.org/jira/browse/HIVE-13619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259174#comment-15259174 ] Vikram Dixit K commented on HIVE-13619: --- Yes. The method is named findSingleUpstreamOperatorJoinAccounted. We are expecting only one instance of the operator type to be returned. It is not trying to find one specific operator in the list. > Bucket map join plan is incorrect > - > > Key: HIVE-13619 > URL: https://issues.apache.org/jira/browse/HIVE-13619 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 2.0.0, 2.1.0 >Reporter: Vikram Dixit K >Assignee: Vikram Dixit K > Attachments: HIVE-13619.1.patch > > > Same as HIVE-12992. Missed a single line check. TPCDS query 4 with bucketing > can produce this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12887) Handle ORC schema on read with fewer columns than file schema (after Schema Evolution changes)
[ https://issues.apache.org/jira/browse/HIVE-12887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259168#comment-15259168 ] Matt McCline commented on HIVE-12887: - Committed to branch-2.0 > Handle ORC schema on read with fewer columns than file schema (after Schema > Evolution changes) > -- > > Key: HIVE-12887 > URL: https://issues.apache.org/jira/browse/HIVE-12887 > Project: Hive > Issue Type: Bug > Components: ORC >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Fix For: 1.3.0, 2.1.0, 2.0.1 > > Attachments: HIVE-12887.01.patch, HIVE-12887.02.patch > > > Exception caused by reading after column removal. > {code} > Caused by: java.lang.IndexOutOfBoundsException: Index: 10, Size: 10 > at java.util.ArrayList.rangeCheck(ArrayList.java:653) > at java.util.ArrayList.get(ArrayList.java:429) > at java.util.Collections$UnmodifiableList.get(Collections.java:1309) > at > org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240) > at > org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.(TreeReaderFactory.java:2053) > at > org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2481) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.(RecordReaderImpl.java:216) > at > org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:598) > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.(OrcRawRecordMerger.java:179) > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.(OrcRawRecordMerger.java:222) > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:442) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1285) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1165) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249) > {code} -- This message was sent by 
Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13619) Bucket map join plan is incorrect
[ https://issues.apache.org/jira/browse/HIVE-13619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259158#comment-15259158 ] Sergey Shelukhin commented on HIVE-13619: - The method expecting a single thing is ok with just taking the first of many things? > Bucket map join plan is incorrect > - > > Key: HIVE-13619 > URL: https://issues.apache.org/jira/browse/HIVE-13619 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 2.0.0, 2.1.0 >Reporter: Vikram Dixit K >Assignee: Vikram Dixit K > Attachments: HIVE-13619.1.patch > > > Same as HIVE-12992. Missed a single line check. TPCDS query 4 with bucketing > can produce this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
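The distinction being debated in this exchange — whether "find the single X" means "take the first match" or "require exactly one match and fail otherwise" — can be made concrete with a generic sketch. The second form turns an ambiguous plan into a loud error instead of a silently arbitrary pick. This is hypothetical code, not the Hive method itself:

```java
import java.util.List;

// Two readings of "find the single matching element".
public class SingleMatch {
    // Reading 1: first match wins; extra matches are silently ignored.
    static <T> T firstMatch(List<T> items, java.util.function.Predicate<T> p) {
        for (T t : items) if (p.test(t)) return t;
        return null;
    }

    // Reading 2: exactly one match is required; ambiguity fails fast.
    static <T> T uniqueMatch(List<T> items, java.util.function.Predicate<T> p) {
        T found = null;
        for (T t : items) {
            if (p.test(t)) {
                if (found != null) throw new IllegalStateException("more than one match");
                found = t;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 4, 6);
        System.out.println(firstMatch(xs, x -> x % 2 == 0));   // 2
        System.out.println(uniqueMatch(xs, x -> x % 2 != 0));  // 1
    }
}
```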
[jira] [Commented] (HIVE-12799) Always use Schema Evolution for ACID
[ https://issues.apache.org/jira/browse/HIVE-12799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259152#comment-15259152 ] Matt McCline commented on HIVE-12799: - Also committed to branch-2.0 > Always use Schema Evolution for ACID > > > Key: HIVE-12799 > URL: https://issues.apache.org/jira/browse/HIVE-12799 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Labels: TODOC1.3, TODOC2.1 > Fix For: 2.1.0, 2.0.1 > > Attachments: HIVE-12799.01.patch, HIVE-12799.02.patch, > HIVE-12799.03.patch, HIVE-12799.04.patch > > > Always use Schema Evolution for ACID -- ignore hive.exec.schema.evolution > setting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12799) Always use Schema Evolution for ACID
[ https://issues.apache.org/jira/browse/HIVE-12799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-12799: Fix Version/s: 2.0.1 > Always use Schema Evolution for ACID > > > Key: HIVE-12799 > URL: https://issues.apache.org/jira/browse/HIVE-12799 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Labels: TODOC1.3, TODOC2.1 > Fix For: 2.1.0, 2.0.1 > > Attachments: HIVE-12799.01.patch, HIVE-12799.02.patch, > HIVE-12799.03.patch, HIVE-12799.04.patch > > > Always use Schema Evolution for ACID -- ignore hive.exec.schema.evolution > setting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13241) LLAP: Incremental Caching marks some small chunks as "incomplete CB"
[ https://issues.apache.org/jira/browse/HIVE-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259145#comment-15259145 ] Sergey Shelukhin commented on HIVE-13241: - Committed the description change in the addendum patch > LLAP: Incremental Caching marks some small chunks as "incomplete CB" > > > Key: HIVE-13241 > URL: https://issues.apache.org/jira/browse/HIVE-13241 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Sergey Shelukhin > Labels: TODOC2.1 > Fix For: 2.1.0 > > Attachments: HIVE-13241.01.patch, HIVE-13241.patch > > > Run #3 of a query with 1 node still has cache misses. > {code} > LLAP IO Summary > -- > VERTICES ROWGROUPS META_HIT META_MISS DATA_HIT DATA_MISS ALLOCATION > USED TOTAL_IO > -- > Map 111 1116 01.65GB93.61MB 0B >0B32.72s > -- > {code} > {code} > 2016-03-08T21:05:39,417 INFO > [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: > encoded.EncodedReaderImpl > (EncodedReaderImpl.java:prepareRangesForCompressedRead(695)) - Locking > 0x1c44401d(1) due to reuse > 2016-03-08T21:05:39,417 INFO > [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: > encoded.EncodedReaderImpl > (EncodedReaderImpl.java:prepareRangesForCompressedRead(701)) - Adding an > already-uncompressed buffer 0x1c44401d(2) > 2016-03-08T21:05:39,417 INFO > [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: > encoded.EncodedReaderImpl > (EncodedReaderImpl.java:prepareRangesForCompressedRead(695)) - Locking > 0x4e51b032(1) due to reuse > 2016-03-08T21:05:39,417 INFO > [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: > encoded.EncodedReaderImpl > (EncodedReaderImpl.java:prepareRangesForCompressedRead(701)) - Adding an > already-uncompressed buffer 0x4e51b032(2) > 2016-03-08T21:05:39,418 INFO > [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: > encoded.EncodedReaderImpl > (EncodedReaderImpl.java:addOneCompressionBuffer(1161)) - Found CB at 1373931, > chunk length 86587, total 86590, 
compressed > 2016-03-08T21:05:39,418 INFO > [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: > encoded.EncodedReaderImpl > (EncodedReaderImpl.java:addIncompleteCompressionBuffer(1241)) - Replacing > data range [1373931, 1408408), size: 34474(!) type: direct (and 0 previous > chunks) with incomplete CB start: 1373931 end: 1408408 in the buffers > 2016-03-08T21:05:39,418 INFO > [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: > encoded.EncodedReaderImpl > (EncodedReaderImpl.java:createRgColumnStreamData(441)) - Getting data for > column 7 RG 14 stream DATA at 1460521, 319811 index position 0: compressed > [1626961, 1780332) > {code} > {code} > 2016-03-08T21:05:38,925 INFO > [IO-Elevator-Thread-7[attempt_1455662455106_2688_3_00_01_0]]: > encoded.OrcEncodedDataReader (OrcEncodedDataReader.java:readFileData(878)) - > Disk ranges after disk read (file 5372745, base offset 3): [{start: 18986 > end: 20660 cache buffer: 0x660faf7c(1)}, {start: 20660 end: 35775 cache > buffer: 0x1dcb1d97(1)}, {start: 318852 end: 422353 cache buffer: > 0x6c7f9a05(1)}, {start: 1148616 end: 1262468 cache buffer: 0x196e1d41(1)}, > {start: 1262468 end: 1376342 cache buffer: 0x201255f(1)}, {data range > [1376342, 1410766), size: 34424 type: direct}, {start: 1631359 end: 1714694 > cache buffer: 0x47e3a72d(1)}, {start: 1714694 end: 1785770 cache buffer: > 0x57dca266(1)}, {start: 4975035 end: 5095215 cache buffer: 0x3e3139c9(1)}, > {start: 5095215 end: 5197863 cache buffer: 0x3511c88d(1)}, {start: 7448387 > end: 7572268 cache buffer: 0x6f11dbcd(1)}, {start: 7572268 end: 7696182 cache > buffer: 0x5d6c9bdb(1)}, {data range [7696182, 7710537), size: 14355 type: > direct}, {start: 8235756 end: 8345367 cache buffer: 0x6a241ece(1)}, {start: > 8345367 end: 8455009 cache buffer: 0x51caf6a7(1)}, {data range [8455009, > 8497906), size: 42897 type: direct}, {start: 9035815 end: 9159708 cache > buffer: 0x306480e0(1)}, {start: 9159708 end: 9283629 cache buffer: > 0x9ef7774(1)}, {data 
range [9283629, 9297965), size: 14336 type: direct}, > {start: 9989884 end: 10113731 cache buffer: 0x43f7cae9(1)}, {start: 10113731 > end: 10237589 cache buffer: 0x458e63fe(1)}, {data range [10237589, 10252034), > size: 14445 type: direct}, {start: 11897896 end: 12021787 cache buffer: > 0x51f9982f(1)}, {start: 12021787 end: 12145656 cache
[jira] [Commented] (HIVE-13142) Make HiveSplitGenerator usable independent of Tez
[ https://issues.apache.org/jira/browse/HIVE-13142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259137#comment-15259137 ] Jason Dere commented on HIVE-13142: --- At least some of this may have been done as part of the cleanup in HIVE-13594. > Make HiveSplitGenerator usable independent of Tez > - > > Key: HIVE-13142 > URL: https://issues.apache.org/jira/browse/HIVE-13142 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth > > Already exists in the branch, but a bunch of cleanup is required. The branch > contains code which makes some fields non-final, and a separate set method > instead of the constructor. Should be simplified so that it can be > constructed independently of Tez -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13596) HS2 should be able to get UDFs on demand from metastore
[ https://issues.apache.org/jira/browse/HIVE-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259139#comment-15259139 ] Sergey Shelukhin commented on HIVE-13596: - RB at https://reviews.apache.org/r/46715/ > HS2 should be able to get UDFs on demand from metastore > --- > > Key: HIVE-13596 > URL: https://issues.apache.org/jira/browse/HIVE-13596 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13596.01.patch, HIVE-13596.02.patch, > HIVE-13596.patch > > > When multiple HS2s are run, creating a permanent fn is only executed on one > of them, and the other HS2s don't get the new function. Unlike say with > tables, where we always get stuff from db on demand, fns are registered at > certain points in the code and if the new one is not registered, it will not > be available. > We should restore the pre-HIVE-2573 behavior of being able to refresh the > UDFs on demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13596) HS2 should be able to get UDFs on demand from metastore
[ https://issues.apache.org/jira/browse/HIVE-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13596: Attachment: HIVE-13596.02.patch Updated the patch to add it to the system registry instead, after some discussion. I changed the synchronized methods to use a lock object without any logic changes wrt locking, and then took the MS call (and resource acquisition) out of the lock to avoid bottlenecking on the system registry. > HS2 should be able to get UDFs on demand from metastore > --- > > Key: HIVE-13596 > URL: https://issues.apache.org/jira/browse/HIVE-13596 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13596.01.patch, HIVE-13596.02.patch, > HIVE-13596.patch > > > When multiple HS2s are run, creating a permanent fn is only executed on one > of them, and the other HS2s don't get the new function. Unlike say with > tables, where we always get stuff from db on demand, fns are registered at > certain points in the code and if the new one is not registered, it will not > be available. > We should restore the pre-HIVE-2573 behavior of being able to refresh the > UDFs on demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
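The locking change described in the comment above — replacing synchronized methods with an explicit lock object and moving the metastore call (and resource acquisition) outside the critical section — can be sketched as follows. This is an illustrative stand-in only: the class and method names are hypothetical, not Hive's actual Registry API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of narrowing a critical section; not Hive's real classes.
public class RegistryLockDemo {

    static class FunctionRegistry {
        private final ReentrantLock lock = new ReentrantLock(); // replaces method-level synchronized
        private final Map<String, String> functions = new HashMap<>();

        // Stand-in for the slow metastore round trip.
        private String fetchFromMetastore(String name) {
            return "class-for-" + name;
        }

        String getFunction(String name) {
            lock.lock();
            try {
                String cls = functions.get(name);
                if (cls != null) {
                    return cls; // fast path: short critical section, no remote call
                }
            } finally {
                lock.unlock();
            }
            // The expensive call runs OUTSIDE the lock, so concurrent lookups of
            // other functions are not bottlenecked on the system registry.
            String fetched = fetchFromMetastore(name);
            lock.lock();
            try {
                functions.putIfAbsent(name, fetched); // another thread may have won the race
                return functions.get(name);
            } finally {
                lock.unlock();
            }
        }
    }

    public static void main(String[] args) {
        FunctionRegistry reg = new FunctionRegistry();
        System.out.println(reg.getFunction("my_udf"));
    }
}
```

The trade-off is the classic one: the metastore may occasionally be called twice for the same function under contention, but `putIfAbsent` keeps the registry consistent and no lookup ever blocks behind a remote call.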
[jira] [Resolved] (HIVE-13522) regexp_extract.q hangs on master
[ https://issues.apache.org/jira/browse/HIVE-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley resolved HIVE-13522. -- Resolution: Fixed Assignee: Owen O'Malley (was: Ashutosh Chauhan) Fix Version/s: 2.1.0 This got closed as part of HIVE-12159. > regexp_extract.q hangs on master > > > Key: HIVE-13522 > URL: https://issues.apache.org/jira/browse/HIVE-13522 > Project: Hive > Issue Type: Bug > Components: Tests >Reporter: Ashutosh Chauhan >Assignee: Owen O'Malley >Priority: Blocker > Fix For: 2.1.0 > > Attachments: HIVE-13522.patch, jstack_regexp_extract.txt > > > Disable to unblock Hive QA runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13611) add jar causes beeline not to output log messages
[ https://issues.apache.org/jira/browse/HIVE-13611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-13611: --- Description: After adding a jar in beeline warning messages and job log output are no longer shown. This only occurs if you use short connection strings (e.g. jdbc:hive2://). Example below: {code} 0: jdbc:hive2://nightly55-1.gce.cloudera.com:> !connect jdbc:hive2:// Connecting to jdbc:hive2:// Enter username for jdbc:hive2://: hive Enter password for jdbc:hive2://: Connected to: Apache Hive (version 1.1.0-cdh5.5.4) Driver: Hive JDBC (version 1.1.0-cdh5.5.4) Transaction isolation: TRANSACTION_REPEATABLE_READ 1: jdbc:hive2://> select count(*) from sample_07 limit 1; INFO : Number of reduce tasks determined at compile time: 1 INFO : In order to change the average load for a reducer (in bytes): INFO : set hive.exec.reducers.bytes.per.reducer= INFO : In order to limit the maximum number of reducers: INFO : set hive.exec.reducers.max= INFO : In order to set a constant number of reducers: INFO : set mapreduce.job.reduces= INFO : number of splits:1 INFO : Submitting tokens for job: job_1461621650734_0020 INFO : The url to track the job: http://nightly55-1.gce.cloudera.com:8088/proxy/application_1461621650734_0020/ INFO : Starting Job = job_1461621650734_0020, Tracking URL = http://nightly55-1.gce.cloudera.com:8088/proxy/application_1461621650734_0020/ INFO : Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1461621650734_0020 INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 INFO : 2016-04-26 01:36:04,297 Stage-1 map = 0%, reduce = 0% INFO : 2016-04-26 01:36:11,802 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.52 sec INFO : 2016-04-26 01:36:19,419 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.25 sec INFO : MapReduce Total cumulative CPU time: 3 seconds 250 msec INFO : Ended Job = job_1461621650734_0020 +--+--+ | _c0 | +--+--+ | 823 | +--+--+ 1 row selected (25.908 
seconds) 1: jdbc:hive2://> add jar hdfs://some_nn.com/tmp/somedir/some_jar.jar 1: jdbc:hive2://> ; converting to local hdfs://some_nn.com/tmp/somedir/some_jar.jar Added [/tmp/93ca63a2-5019-4f37-b9b4-75f1740b53c8_resources/some_jar.jar] to class path Added resources: [hdfs://some_nn.com/tmp/somedir/some_jar.jar] No rows affected (0.179 seconds) 1: jdbc:hive2://> select count(*) from sample_07 limit 1; +--+--+ | _c0 | +--+--+ | 823 | +--+--+ 1: jdbc:hive2://> {code} was: After adding a jar in beeline warning messages and job log output are no longer shown. This only occurs if you use short connection strings (e.g. jdbc:hive2://). Example below: 0: jdbc:hive2://nightly55-1.gce.cloudera.com:> !connect jdbc:hive2:// Connecting to jdbc:hive2:// Enter username for jdbc:hive2://: hive Enter password for jdbc:hive2://: Connected to: Apache Hive (version 1.1.0-cdh5.5.4) Driver: Hive JDBC (version 1.1.0-cdh5.5.4) Transaction isolation: TRANSACTION_REPEATABLE_READ 1: jdbc:hive2://> select count(*) from sample_07 limit 1; INFO : Number of reduce tasks determined at compile time: 1 INFO : In order to change the average load for a reducer (in bytes): INFO : set hive.exec.reducers.bytes.per.reducer= INFO : In order to limit the maximum number of reducers: INFO : set hive.exec.reducers.max= INFO : In order to set a constant number of reducers: INFO : set mapreduce.job.reduces= INFO : number of splits:1 INFO : Submitting tokens for job: job_1461621650734_0020 INFO : The url to track the job: http://nightly55-1.gce.cloudera.com:8088/proxy/application_1461621650734_0020/ INFO : Starting Job = job_1461621650734_0020, Tracking URL = http://nightly55-1.gce.cloudera.com:8088/proxy/application_1461621650734_0020/ INFO : Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1461621650734_0020 INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 INFO : 2016-04-26 01:36:04,297 Stage-1 map = 0%, reduce = 0% INFO : 2016-04-26 01:36:11,802 Stage-1 map = 100%, 
reduce = 0%, Cumulative CPU 1.52 sec INFO : 2016-04-26 01:36:19,419 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.25 sec INFO : MapReduce Total cumulative CPU time: 3 seconds 250 msec INFO : Ended Job = job_1461621650734_0020 +--+--+ | _c0 | +--+--+ | 823 | +--+--+ 1 row selected (25.908 seconds) 1: jdbc:hive2://> add jar hdfs://some_nn.com/tmp/somedir/some_jar.jar 1: jdbc:hive2://> ; converting to local hdfs://some_nn.com/tmp/somedir/some_jar.jar Added [/tmp/93ca63a2-5019-4f37-b9b4-75f1740b53c8_resources/some_jar.jar] to class path Added resources: [hdfs://some_nn.com/tmp/somedir/some_jar.jar] No rows affected (0.179 seconds) 1: jdbc:hive2://> select count(*) from sample_07 limit 1; +--+--+ | _c0 | +--+--+ | 823 | +--+--+ 1: jdbc:hive2://> > add jar causes beeline not to output log messages >
[jira] [Updated] (HIVE-13619) Bucket map join plan is incorrect
[ https://issues.apache.org/jira/browse/HIVE-13619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-13619: -- Status: Patch Available (was: Open) > Bucket map join plan is incorrect > - > > Key: HIVE-13619 > URL: https://issues.apache.org/jira/browse/HIVE-13619 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 2.0.0, 2.1.0 >Reporter: Vikram Dixit K >Assignee: Vikram Dixit K > Attachments: HIVE-13619.1.patch > > > Same as HIVE-12992. Missed a single line check. TPCDS query 4 with bucketing > can produce this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13619) Bucket map join plan is incorrect
[ https://issues.apache.org/jira/browse/HIVE-13619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-13619: -- Attachment: HIVE-13619.1.patch > Bucket map join plan is incorrect > - > > Key: HIVE-13619 > URL: https://issues.apache.org/jira/browse/HIVE-13619 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 2.0.0, 2.1.0 >Reporter: Vikram Dixit K >Assignee: Vikram Dixit K > Attachments: HIVE-13619.1.patch > > > Same as HIVE-12992. Missed a single line check. TPCDS query 4 with bucketing > can produce this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13619) Bucket map join plan is incorrect
[ https://issues.apache.org/jira/browse/HIVE-13619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-13619: -- Description: Same as HIVE-12992. Missed a single line check. TPCDS query 4 with bucketing can produce this issue. (was: Same as HIVE-12992. Missed a single line check.) > Bucket map join plan is incorrect > - > > Key: HIVE-13619 > URL: https://issues.apache.org/jira/browse/HIVE-13619 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 2.0.0, 2.1.0 >Reporter: Vikram Dixit K >Assignee: Vikram Dixit K > > Same as HIVE-12992. Missed a single line check. TPCDS query 4 with bucketing > can produce this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13408) Issue appending HIVE_QUERY_ID without checking if the prefix already exists
[ https://issues.apache.org/jira/browse/HIVE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13408: Target Version/s: (was: 2.1.0, 2.0.1) > Issue appending HIVE_QUERY_ID without checking if the prefix already exists > --- > > Key: HIVE-13408 > URL: https://issues.apache.org/jira/browse/HIVE-13408 > Project: Hive > Issue Type: Bug > Components: Shims >Affects Versions: 2.0.0 >Reporter: Vikram Dixit K >Assignee: Vikram Dixit K > Attachments: HIVE-13408.1.patch, HIVE-13408.2.patch > > > {code} > We are resetting the hadoop caller context to HIVE_QUERY_ID:HIVE_QUERY_ID: > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13618) Trailing spaces in partition column will be treated differently
[ https://issues.apache.org/jira/browse/HIVE-13618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13618: --- Summary: Trailing spaces in partition column will be treated differently (was: trailing spaces in partition column will be treated differently) > Trailing spaces in partition column will be treated differently > --- > > Key: HIVE-13618 > URL: https://issues.apache.org/jira/browse/HIVE-13618 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > > We store the partition spec value in the metastore. In MySQL (and Derby, I > think), the trailing space is ignored. That is, if you have a partition > column "col" (type varchar or string) with value "a " and then select from > the table where col = "a", it will return a row. However, in Postgres and Oracle, > the trailing space is not ignored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
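The two comparison behaviors described above can be demonstrated with plain Java string equality, which, like Postgres and Oracle, treats trailing spaces as significant. The helper mimicking MySQL/Derby-style pad-insensitive comparison assumes Java 11+ for `String.stripTrailing()`:

```java
public class TrailingSpaceDemo {
    // Mimics MySQL/Derby-style comparison, which effectively ignores trailing
    // spaces, by stripping them before comparing (requires Java 11+).
    static boolean padInsensitiveEquals(String a, String b) {
        return a.stripTrailing().equals(b.stripTrailing());
    }

    public static void main(String[] args) {
        String stored = "a ";  // partition value persisted with a trailing space
        String probe  = "a";   // value from ... WHERE col = 'a'
        // Postgres/Oracle-style strict comparison: no match.
        System.out.println(stored.equals(probe));                // false
        // MySQL/Derby-style pad-insensitive comparison: match.
        System.out.println(padInsensitiveEquals(stored, probe)); // true
    }
}
```

The bug described in the issue follows directly: the same stored partition value matches a predicate on one metastore backend and misses on another, so the value needs to be normalized (or compared consistently) before it reaches the database.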
[jira] [Updated] (HIVE-13408) Issue appending HIVE_QUERY_ID without checking if the prefix already exists
[ https://issues.apache.org/jira/browse/HIVE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13408: Resolution: Invalid Status: Resolved (was: Patch Available) This is actually broken by HIVE-12254 (that is not committed yet), according to [~vikram.dixit]; should be included there. > Issue appending HIVE_QUERY_ID without checking if the prefix already exists > --- > > Key: HIVE-13408 > URL: https://issues.apache.org/jira/browse/HIVE-13408 > Project: Hive > Issue Type: Bug > Components: Shims >Affects Versions: 2.0.0 >Reporter: Vikram Dixit K >Assignee: Vikram Dixit K > Attachments: HIVE-13408.1.patch, HIVE-13408.2.patch > > > {code} > We are resetting the hadoop caller context to HIVE_QUERY_ID:HIVE_QUERY_ID: > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12254) Improve logging with yarn/hdfs
[ https://issues.apache.org/jira/browse/HIVE-12254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259108#comment-15259108 ] Sergey Shelukhin commented on HIVE-12254: - +1 conditional on also including the fix for HIVE-13408 :) > Improve logging with yarn/hdfs > -- > > Key: HIVE-12254 > URL: https://issues.apache.org/jira/browse/HIVE-12254 > Project: Hive > Issue Type: Bug > Components: Shims >Affects Versions: 1.2.1 >Reporter: Vikram Dixit K >Assignee: Vikram Dixit K > Attachments: HIVE-12254.1.patch, HIVE-12254.2.patch > > > In extension to HIVE-12249, adding info for Yarn/HDFS as well. Both > HIVE-12249 and HDFS-9184 are required (and upgraded in hive for the HDFS > issue) before this can be resolved. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13463) Fix ImportSemanticAnalyzer to allow for different src/dst filesystems
[ https://issues.apache.org/jira/browse/HIVE-13463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13463: Resolution: Fixed Fix Version/s: 2.0.1 2.1.0 Status: Resolved (was: Patch Available) Committed to master and branch-2.0. Thanks for the patch > Fix ImportSemanticAnalyzer to allow for different src/dst filesystems > - > > Key: HIVE-13463 > URL: https://issues.apache.org/jira/browse/HIVE-13463 > Project: Hive > Issue Type: Bug > Components: Import/Export >Affects Versions: 2.0.0 >Reporter: Zach York >Assignee: Zach York > Fix For: 2.1.0, 2.0.1 > > Attachments: HIVE-13463-1.patch, HIVE-13463-2.patch, > HIVE-13463-3.patch, HIVE-13463-4.patch, HIVE-13463.4.patch, HIVE-13463.patch > > > In ImportSemanticAnalyzer, there is an assumption that the src filesystem for > import and the final location are on the same filesystem. Therefore the check > for emptiness and getExternalTmpLocation will be looking on the wrong > filesystem and will cause an error. The output path should be fed into > getExternalTmpLocation to get a temporary file on the correct filesystem. The > check for emptiness should use the output filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
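The principle behind the fix above — derive the staging location from the destination path rather than from the source (or default) filesystem — can be sketched with plain `java.net.URI`. The helper name and staging-directory name here are illustrative, not Hive's actual `getExternalTmpLocation` implementation:

```java
import java.net.URI;

public class StagingDirDemo {
    // Hypothetical helper: place the staging directory on the SAME filesystem
    // as the final output path. Resolving an absolute path against the output
    // URI keeps its scheme and authority (i.e., its filesystem).
    static URI stagingFor(URI outputPath) {
        return outputPath.resolve("/.hive-staging_import");
    }

    public static void main(String[] args) {
        URI src = URI.create("s3a://backups/export/t1/");      // import source
        URI dst = URI.create("hdfs://nn1:8020/warehouse/t1/"); // final table location
        // The staging dir lands on hdfs://nn1:8020, not on the s3a source,
        // so the emptiness check and the final move both stay on one filesystem.
        System.out.println(stagingFor(dst)); // hdfs://nn1:8020/.hive-staging_import
    }
}
```

In the real code the equivalent step is asking Hadoop for `outputPath.getFileSystem(conf)` instead of the default filesystem; the URI resolution above is just a dependency-free way to show why the scheme and authority must come from the destination.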
[jira] [Commented] (HIVE-13617) LLAP: support non-vectorized execution in IO
[ https://issues.apache.org/jira/browse/HIVE-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259047#comment-15259047 ] Sergey Shelukhin commented on HIVE-13617: - Yes :) > LLAP: support non-vectorized execution in IO > > > Key: HIVE-13617 > URL: https://issues.apache.org/jira/browse/HIVE-13617 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > > Two approaches - a separate decoding path, into rows instead of VRBs; or > decoding VRBs into rows on a higher level (the original LlapInputFormat). I > think the latter might be better - it's not a hugely important path, and perf > in non-vectorized case is not the best anyway, so it's better to make do with > much less new code and architectural disruption. > Some ORC patches in progress introduce an easy to reuse (or so I hope, > anyway) VRB-to-row conversion, so we should just use that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13603) Fix ptest unit tests broken by HIVE13505
[ https://issues.apache.org/jira/browse/HIVE-13603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259030#comment-15259030 ] Siddharth Seth commented on HIVE-13603: --- Oh well. Just so happens that the current test run is this patch. Will wait for it. > Fix ptest unit tests broken by HIVE13505 > > > Key: HIVE-13603 > URL: https://issues.apache.org/jira/browse/HIVE-13603 > Project: Hive > Issue Type: Task >Affects Versions: 2.1.0 >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-13603.01.patch > > > HIVE-13505 broke some unit tests in the ptest2 framework, which need to be > fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13603) Fix ptest unit tests broken by HIVE13505
[ https://issues.apache.org/jira/browse/HIVE-13603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259023#comment-15259023 ] Siddharth Seth commented on HIVE-13603: --- The patch on HIVE-13505 itself works. It's been running on the precommit boxes for a while. Unfortunately I did not run the ptest2 tests along with that patch, and noticed ptest test failures while making other changes. This just updates the template files to remove the verification of TestDummy. I'll commit this shortly, and remove it from the precommit queue. Thanks for the review. > Fix ptest unit tests broken by HIVE13505 > > > Key: HIVE-13603 > URL: https://issues.apache.org/jira/browse/HIVE-13603 > Project: Hive > Issue Type: Task >Affects Versions: 2.1.0 >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-13603.01.patch > > > HIVE-13505 broke some unit tests in the ptest2 framework, which need to be > fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13617) LLAP: support non-vectorized execution in IO
[ https://issues.apache.org/jira/browse/HIVE-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258969#comment-15258969 ] Lefty Leverenz commented on HIVE-13617: --- Aha, HIVE-11417 discusses VectorizedRowBatch. Is that it? > LLAP: support non-vectorized execution in IO > > > Key: HIVE-13617 > URL: https://issues.apache.org/jira/browse/HIVE-13617 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > > Two approaches - a separate decoding path, into rows instead of VRBs; or > decoding VRBs into rows on a higher level (the original LlapInputFormat). I > think the latter might be better - it's not a hugely important path, and perf > in non-vectorized case is not the best anyway, so it's better to make do with > much less new code and architectural disruption. > Some ORC patches in progress introduce an easy to reuse (or so I hope, > anyway) VRB-to-row conversion, so we should just use that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13354) Add ability to specify Compaction options per table and per request
[ https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-13354: - Summary: Add ability to specify Compaction options per table and per request (was: Add ability to specify Compaction options per table) > Add ability to specify Compaction options per table and per request > --- > > Key: HIVE-13354 > URL: https://issues.apache.org/jira/browse/HIVE-13354 > Project: Hive > Issue Type: Improvement >Affects Versions: 1.3.0, 2.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Labels: TODOC2.1 > > Currently there are a few options that determine when automatic compaction is > triggered. They are specified once for the warehouse. > This doesn't make sense - some table may be more important and need to be > compacted more often. > We should allow specifying these on per table basis. > Also, compaction is an MR job launched from within the metastore. There is > currently no way to control job parameters (like memory, for example) except > to specify it in hive-site.xml for metastore which means they are site wide. > Should add a way to specify these per table (perhaps even per compaction if > launched via ALTER TABLE) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
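The override order the issue asks for — a table-level property taking precedence over the warehouse-wide hive-site.xml default — amounts to a simple layered lookup. The property key below is made up for illustration; it is not an actual Hive configuration name:

```java
import java.util.HashMap;
import java.util.Map;

public class CompactionConfDemo {
    // Hypothetical resolution: a table property, if present, overrides the
    // site-wide default from hive-site.xml.
    static String resolve(Map<String, String> tblProps,
                          Map<String, String> siteConf, String key) {
        return tblProps.getOrDefault(key, siteConf.get(key));
    }

    public static void main(String[] args) {
        Map<String, String> site = new HashMap<>();
        site.put("compactor.delta.threshold", "10"); // warehouse-wide default

        Map<String, String> hotTable = new HashMap<>();
        hotTable.put("compactor.delta.threshold", "2"); // important table compacts sooner

        System.out.println(resolve(hotTable, site, "compactor.delta.threshold"));        // 2
        System.out.println(resolve(new HashMap<>(), site, "compactor.delta.threshold")); // 10
    }
}
```

A per-request (ALTER TABLE ... COMPACT) setting would simply add one more layer ahead of the table properties in the same lookup chain.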
[jira] [Commented] (HIVE-13617) LLAP: support non-vectorized execution in IO
[ https://issues.apache.org/jira/browse/HIVE-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258959#comment-15258959 ] Lefty Leverenz commented on HIVE-13617: --- Acronym clarification: What's a VRB when it's at home? Google fun: vanadium redox battery, variable reenlistment bonus, victim row-buffer, vodka red bull, Virginia regional ballet, etc. (VRB-to-ROW is a flight from Vero Beach to Roswell.) > LLAP: support non-vectorized execution in IO > > > Key: HIVE-13617 > URL: https://issues.apache.org/jira/browse/HIVE-13617 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > > Two approaches - a separate decoding path, into rows instead of VRBs; or > decoding VRBs into rows on a higher level (the original LlapInputFormat). I > think the latter might be better - it's not a hugely important path, and perf > in non-vectorized case is not the best anyway, so it's better to make do with > much less new code and architectural disruption. > Some ORC patches in progress introduce an easy to reuse (or so I hope, > anyway) VRB-to-row conversion, so we should just use that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11417) Create shims for the row by row read path that is backed by VectorizedRowBatch
[ https://issues.apache.org/jira/browse/HIVE-11417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-11417: - Attachment: HIVE-11417.patch Fixed the spurious changes that Matt found in the patch. > Create shims for the row by row read path that is backed by VectorizedRowBatch > -- > > Key: HIVE-11417 > URL: https://issues.apache.org/jira/browse/HIVE-11417 > Project: Hive > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.1.0 > > Attachments: HIVE-11417.patch, HIVE-11417.patch, HIVE-11417.patch, > HIVE-11417.patch > > > I'd like to make the default path for reading and writing ORC files to be > vectorized. To ensure that Hive can still read row by row, we'll need shims > to support the old API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13607) Change website references to HQL/HiveQL to SQL
[ https://issues.apache.org/jira/browse/HIVE-13607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-13607: -- Attachment: HIVE-13607.2.patch Note to self: turn on spell checking in vim. Thanks [~leftylev] for catching that. Posting a new patch with driver spelled correctly. > Change website references to HQL/HiveQL to SQL > -- > > Key: HIVE-13607 > URL: https://issues.apache.org/jira/browse/HIVE-13607 > Project: Hive > Issue Type: Improvement > Components: Website >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-13607.2.patch, HIVE-13607.patch > > > When it started Hive's SQL dialect was far enough from standard SQL that the > developers called it HQL or HiveQL. > Over the years Hive's SQL dialect has matured. It still has some oddities > but it is explicitly pushing towards SQL 2011 conformance. Calling the > language anything but SQL now is confusing for users. > In addition to changing the website I propose to make changes in the wiki. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13596) HS2 should be able to get UDFs on demand from metastore
[ https://issues.apache.org/jira/browse/HIVE-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1525#comment-1525 ] Jason Dere commented on HIVE-13596: --- Ok, looking at the code again, I see why checkFunctionClass() is no longer called - there is a separate registerToSessionRegistry() method to add a UDF from the system registry to the session registry. It's not bad to have it in the session registry, since the UDF does eventually need to get added there, though it is a bit different from the UDFs added via reloadFunctions() since those are added to the system registry. > HS2 should be able to get UDFs on demand from metastore > --- > > Key: HIVE-13596 > URL: https://issues.apache.org/jira/browse/HIVE-13596 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13596.01.patch, HIVE-13596.patch > > > When multiple HS2s are run, creating a permanent fn is only executed on one > of them, and the other HS2s don't get the new function. Unlike say with > tables, where we always get stuff from db on demand, fns are registered at > certain points in the code and if the new one is not registered, it will not > be available. > We should restore the pre-HIVE-2573 behavior of being able to refresh the > UDFs on demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
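The session-vs-system registry relationship described in the comment above — session lookups falling back to a shared system registry, with on-demand UDFs landing in the session level — can be sketched as a two-level lookup. This is a simplified illustration; Hive's real Registry class is considerably more involved:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified two-level function registry, for illustration only.
public class RegistryChainDemo {

    static class Registry {
        private final Map<String, String> fns = new HashMap<>();
        private final Registry parent; // shared system registry, or null

        Registry(Registry parent) { this.parent = parent; }

        void register(String name, String cls) { fns.put(name, cls); }

        // A session lookup falls back to the system registry, so functions
        // loaded at the system level and those fetched on demand into the
        // session level are both visible to the query.
        String lookup(String name) {
            String cls = fns.get(name);
            return (cls == null && parent != null) ? parent.lookup(name) : cls;
        }
    }

    public static void main(String[] args) {
        Registry system = new Registry(null);
        Registry session = new Registry(system);
        system.register("reloaded_fn", "com.example.A");   // system-level load
        session.register("on_demand_fn", "com.example.B"); // fetched on demand
        System.out.println(session.lookup("reloaded_fn"));  // com.example.A
        System.out.println(session.lookup("on_demand_fn")); // com.example.B
    }
}
```

This also shows the asymmetry Jason points out: a function registered only in one session's level is invisible to other sessions, whereas a system-level registration is shared.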
[jira] [Commented] (HIVE-12887) Handle ORC schema on read with fewer columns than file schema (after Schema Evolution changes)
[ https://issues.apache.org/jira/browse/HIVE-12887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258887#comment-15258887 ] Sergey Shelukhin commented on HIVE-12887: - [~mmccline] ping?? > Handle ORC schema on read with fewer columns than file schema (after Schema > Evolution changes) > -- > > Key: HIVE-12887 > URL: https://issues.apache.org/jira/browse/HIVE-12887 > Project: Hive > Issue Type: Bug > Components: ORC >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Fix For: 1.3.0, 2.1.0 > > Attachments: HIVE-12887.01.patch, HIVE-12887.02.patch > > > Exception caused by reading after column removal. > {code} > Caused by: java.lang.IndexOutOfBoundsException: Index: 10, Size: 10 > at java.util.ArrayList.rangeCheck(ArrayList.java:653) > at java.util.ArrayList.get(ArrayList.java:429) > at java.util.Collections$UnmodifiableList.get(Collections.java:1309) > at > org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240) > at > org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.(TreeReaderFactory.java:2053) > at > org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2481) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.(RecordReaderImpl.java:216) > at > org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:598) > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.(OrcRawRecordMerger.java:179) > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.(OrcRawRecordMerger.java:222) > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:442) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1285) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1165) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249) > {code} -- This message was sent by Atlassian 
JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13617) LLAP: support non-vectorized execution in IO
[ https://issues.apache.org/jira/browse/HIVE-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13617: Description: Two approaches - a separate decoding path, into rows instead of VRBs; or decoding VRBs into rows on a higher level (the original LlapInputFormat). I think the latter might be better - it's not a hugely important path, and perf in non-vectorized case is not the best anyway, so it's better to make do with much less new code and architectural disruption. Some ORC patches in progress introduce an easy to reuse (or so I hope, anyway) VRB-to-row conversion, so we should just use that. was: Two approaches - a separate decoding path, into rows instead of VRBs; or decoding VRBs into rows. I think the latter might be better - it's not a hugely important path, and perf in non-vectorized case is not the best anyway, so it's better to make do with much less new code and architectural disruption. Some ORC patches in progress introduce an easy to reuse (or so I hope, anyway) VRB-to-row conversion, so we should just use that. > LLAP: support non-vectorized execution in IO > > > Key: HIVE-13617 > URL: https://issues.apache.org/jira/browse/HIVE-13617 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > > Two approaches - a separate decoding path, into rows instead of VRBs; or > decoding VRBs into rows on a higher level (the original LlapInputFormat). I > think the latter might be better - it's not a hugely important path, and perf > in non-vectorized case is not the best anyway, so it's better to make do with > much less new code and architectural disruption. > Some ORC patches in progress introduce an easy to reuse (or so I hope, > anyway) VRB-to-row conversion, so we should just use that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
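The "decode VRBs into rows on a higher level" approach favored in the description above can be sketched with a simplified stand-in for a vectorized row batch: the reader keeps producing columnar batches, and a thin adapter emits row-shaped records for the non-vectorized path. The `MiniBatch` type here is a toy model, not Hive's actual `VectorizedRowBatch`:

```java
public class VrbToRowDemo {

    // Toy stand-in for a vectorized row batch: one long column and one
    // string column, with a row count.
    static class MiniBatch {
        int size;
        long[] longCol;
        String[] strCol;
    }

    // Convert a columnar batch into row-shaped Object[] records -- the
    // adapter a non-vectorized consumer would sit behind.
    static Object[][] toRows(MiniBatch b) {
        Object[][] rows = new Object[b.size][];
        for (int i = 0; i < b.size; i++) {
            rows[i] = new Object[] { b.longCol[i], b.strCol[i] };
        }
        return rows;
    }

    public static void main(String[] args) {
        MiniBatch b = new MiniBatch();
        b.size = 2;
        b.longCol = new long[] { 1L, 2L };
        b.strCol = new String[] { "x", "y" };
        Object[][] rows = toRows(b);
        System.out.println(rows.length + " rows; first = ["
            + rows[0][0] + ", " + rows[0][1] + "]");
    }
}
```

Keeping the conversion at this layer matches the reasoning in the issue: the IO path stays purely vectorized, and the small amount of new code lives in the adapter rather than in a parallel decoding pipeline.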
[jira] [Comment Edited] (HIVE-13617) LLAP: support non-vectorized execution in IO
[ https://issues.apache.org/jira/browse/HIVE-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258879#comment-15258879 ] Sergey Shelukhin edited comment on HIVE-13617 at 4/26/16 8:42 PM: -- [~prasanth_j] [~hagleitn] fyi [~owen.omalley] what is the final JIRA that adds the vrb-to-row conversion? We would like to reuse that code after it goes in. Is it HIVE-11417? was (Author: sershe): [~prasanth_j] [~hagleitn] fyi [~owen.omalley] what is the final JIRA that adds the vrb-to-row conversion? We would like to reuse that code after it goes in. Is it hive-11417? > LLAP: support non-vectorized execution in IO > > > Key: HIVE-13617 > URL: https://issues.apache.org/jira/browse/HIVE-13617 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > > Two approaches - a separate decoding path, into rows instead of VRBs; or > decoding VRBs into rows. I think the latter might be better - it's not a > hugely important path, and perf in non-vectorized case is not the best > anyway, so it's better to make do with much less new code and architectural > disruption. > Some ORC patches in progress introduce an easy to reuse (or so I hope, > anyway) VRB-to-row conversion, so we should just use that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-13617) LLAP: support non-vectorized execution in IO
[ https://issues.apache.org/jira/browse/HIVE-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258879#comment-15258879 ] Sergey Shelukhin edited comment on HIVE-13617 at 4/26/16 8:42 PM: -- [~prasanth_j] [~hagleitn] fyi [~owen.omalley] what is the final JIRA that adds the vrb-to-row conversion? We would like to reuse that code after it goes in. Is it hive-11417? was (Author: sershe): [~prasanth_j] [~hagleitn] fyi [~owen.omalley] what is the final JIRA that adds the vrb-to-row conversion? We would like to reuse that code after it goes in > LLAP: support non-vectorized execution in IO > > > Key: HIVE-13617 > URL: https://issues.apache.org/jira/browse/HIVE-13617 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > > Two approaches - a separate decoding path, into rows instead of VRBs; or > decoding VRBs into rows. I think the latter might be better - it's not a > hugely important path, and perf in non-vectorized case is not the best > anyway, so it's better to make do with much less new code and architectural > disruption. > Some ORC patches in progress introduce an easy to reuse (or so I hope, > anyway) VRB-to-row conversion, so we should just use that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13617) LLAP: support non-vectorized execution in IO
[ https://issues.apache.org/jira/browse/HIVE-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258879#comment-15258879 ] Sergey Shelukhin commented on HIVE-13617: - [~prasanth_j] [~hagleitn] fyi [~owen.omalley] what is the final JIRA that adds the vrb-to-row conversion? We would like to reuse that code after it goes in > LLAP: support non-vectorized execution in IO > > > Key: HIVE-13617 > URL: https://issues.apache.org/jira/browse/HIVE-13617 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > > Two approaches - a separate decoding path, into rows instead of VRBs; or > decoding VRBs into rows. I think the latter might be better - it's not a > hugely important path, and perf in non-vectorized case is not the best > anyway, so it's better to make do with much less new code and architectural > disruption. > Some ORC patches in progress introduce an easy to reuse (or so I hope, > anyway) VRB-to-row conversion, so we should just use that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11417) Create shims for the row by row read path that is backed by VectorizedRowBatch
[ https://issues.apache.org/jira/browse/HIVE-11417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-11417: - Attachment: HIVE-11417.patch This is the same patch, generated without the -C parameter that causes git to find file moves. > Create shims for the row by row read path that is backed by VectorizedRowBatch > -- > > Key: HIVE-11417 > URL: https://issues.apache.org/jira/browse/HIVE-11417 > Project: Hive > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.1.0 > > Attachments: HIVE-11417.patch, HIVE-11417.patch, HIVE-11417.patch > > > I'd like to make the default path for reading and writing ORC files to be > vectorized. To ensure that Hive can still read row by row, we'll need shims > to support the old API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13609) Fix UDTFs to allow local fetch task to fetch rows forwarded by GenericUDTF.close()
[ https://issues.apache.org/jira/browse/HIVE-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258842#comment-15258842 ] Ashutosh Chauhan commented on HIVE-13609: - +1 LGTM pending tests > Fix UDTFs to allow local fetch task to fetch rows forwarded by > GenericUDTF.close() > -- > > Key: HIVE-13609 > URL: https://issues.apache.org/jira/browse/HIVE-13609 > Project: Hive > Issue Type: Bug >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-13609.1.patch > > > From [~ashutoshc]'s comments in HIVE-13586, attempt to fix whatever is > causing the local fetch task to not get the rows forwarded by UDTF close(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13509) HCatalog getSplits should ignore the partition with invalid path
[ https://issues.apache.org/jira/browse/HIVE-13509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258780#comment-15258780 ] Mithun Radhakrishnan commented on HIVE-13509: - Yes, sir. +1. > HCatalog getSplits should ignore the partition with invalid path > > > Key: HIVE-13509 > URL: https://issues.apache.org/jira/browse/HIVE-13509 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Attachments: HIVE-13509.1.patch, HIVE-13509.2.patch, HIVE-13509.patch > > > It is quite common that there is a discrepancy between partition directory > and its HMS metadata, simply because the directory could be added/deleted > externally using hdfs shell command. Technically it should be fixed by MSCK > and alter table .. add/drop command etc, but sometimes it might not be > practical especially in a multi-tenant env. This discrepancy does not cause > any problem to Hive, Hive returns no rows for a partition with an invalid > (e.g. non-existing) path, but it fails the Pig load with HCatLoader, because > the HCatBaseInputFormat getSplits throws an error when getting a split for a > non-existing path. The error message might look like: > {code} > Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does > not exist: > hdfs://xyz.com:8020/user/hive/warehouse/xyz/date=2016-01-01/country=BR > at > org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287) > at > org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229) > at > org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315) > at > org.apache.hive.hcatalog.mapreduce.HCatBaseInputFormat.getSplits(HCatBaseInputFormat.java:162) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
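The improvement described in HIVE-13509 amounts to filtering out partitions whose directories no longer exist before computing splits, instead of letting the first missing path abort the whole job. A minimal sketch of that idea follows; SplitFilter and the existence predicate are stand-ins for illustration, not the actual HCatBaseInputFormat code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the HIVE-13509 fix: drop partitions whose
// directory is gone, so getSplits() never asks the FileSystem to list
// a non-existing path. The exists predicate stands in for an HDFS
// FileSystem.exists() call.
public class SplitFilter {
    public static List<String> validPartitions(List<String> paths,
                                               Predicate<String> exists) {
        List<String> valid = new ArrayList<>();
        for (String p : paths) {
            if (exists.test(p)) {
                valid.add(p);  // keep: directory is present
            }
            // else: skip silently — the partition simply contributes no
            // rows, matching Hive's own behavior for invalid paths
        }
        return valid;
    }
}
```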
[jira] [Commented] (HIVE-13609) Fix UDTFs to allow local fetch task to fetch rows forwarded by GenericUDTF.close()
[ https://issues.apache.org/jira/browse/HIVE-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258755#comment-15258755 ] Ashutosh Chauhan commented on HIVE-13609: - [~jdere] Can you create a RB for this? > Fix UDTFs to allow local fetch task to fetch rows forwarded by > GenericUDTF.close() > -- > > Key: HIVE-13609 > URL: https://issues.apache.org/jira/browse/HIVE-13609 > Project: Hive > Issue Type: Bug >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-13609.1.patch > > > From [~ashutoshc]'s comments in HIVE-13586, attempt to fix whatever is > causing the local fetch task to not get the rows forwarded by UDTF close(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles
[ https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258743#comment-15258743 ] Ashutosh Chauhan commented on HIVE-13572: - Can you create a RB for this? Also I see changes related to sessionstate and in shims. Was there any bug you encountered there? > Redundant setting full file status in Hive::copyFiles > - > > Key: HIVE-13572 > URL: https://issues.apache.org/jira/browse/HIVE-13572 > Project: Hive > Issue Type: Bug >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-13572.1.patch, HIVE-13572.2.patch > > > We set full file status in each copy-file thread. I think it's redundant and > hurts performance when we have multiple files to copy. > {code} > if (inheritPerms) { > ShimLoader.getHadoopShims().setFullFileStatus(conf, > fullDestStatus, destFs, destf); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
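The redundancy HIVE-13572 describes is a per-file operation repeated inside every copy thread when a single call on the destination after all copies would do. The toy sketch below contrasts the two shapes; setStatus is a hypothetical stand-in counter, not the real HadoopShims.setFullFileStatus call.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the HIVE-13572 change: hoist the destination-status call out
// of the per-file copy loop so it runs once per destination, not once
// per file.
public class CopyFiles {
    static final AtomicInteger statusCalls = new AtomicInteger();

    public static void setStatus(String destDir) { statusCalls.incrementAndGet(); }

    // Redundant shape: one status call per copied file.
    public static void copyPerFile(List<String> files, String destDir) {
        for (String f : files) { /* copy f */ setStatus(destDir); }
    }

    // Proposed shape: copy everything, then set status once.
    public static void copyOnce(List<String> files, String destDir) {
        for (String f : files) { /* copy f */ }
        setStatus(destDir);
    }
}
```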
[jira] [Updated] (HIVE-13510) Dynamic partitioning doesn’t work when remote metastore is used
[ https://issues.apache.org/jira/browse/HIVE-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Illya Yalovyy updated HIVE-13510: - Attachment: HIVE-13510.2.patch It seems like my patch was never picked up by jenkins, so I'm uploading it again. > Dynamic partitioning doesn’t work when remote metastore is used > --- > > Key: HIVE-13510 > URL: https://issues.apache.org/jira/browse/HIVE-13510 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.1.0 > Environment: Hadoop 2.7.1 >Reporter: Illya Yalovyy >Assignee: Illya Yalovyy >Priority: Critical > Attachments: HIVE-13510.1.patch, HIVE-13510.2.patch > > > *Steps to reproduce:* > # Configure remote metastore (hive.metastore.uris) > # Create table t1 (a string); > # Create table t2 (a string) partitioned by (b string); > # set hive.exec.dynamic.partition.mode=nonstrict; > # Insert overwrite table t2 partition (b) select a,a from t1; > *Result:* > {noformat} > FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: > org.apache.thrift.TApplicationException: getMetaConf failed: unknown result > 16/04/13 15:04:51 [c679e424-2501-4347-8146-cf1b1cae217c main]: ERROR > ql.Driver: FAILED: SemanticException > org.apache.hadoop.hive.ql.metadata.HiveException: > org.apache.thrift.TApplicationException: getMetaConf failed: unknown result > org.apache.hadoop.hive.ql.parse.SemanticException: > org.apache.hadoop.hive.ql.metadata.HiveException: > org.apache.thrift.TApplicationException: getMetaConf failed: unknown result > at > org.apache.hadoop.hive.ql.plan.DynamicPartitionCtx.(DynamicPartitionCtx.java:84) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6550) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:9315) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9204) > at > 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10071) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9949) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10607) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:358) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10618) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:233) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:245) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:476) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:318) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1192) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1287) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1118) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1106) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:236) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:339) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:748) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:721) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:648) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: 
org.apache.hadoop.hive.ql.metadata.HiveException: > org.apache.thrift.TApplicationException: getMetaConf failed: unknown result > at org.apache.hadoop.hive.ql.metadata.Hive.getMetaConf(Hive.java:3493) > at > org.apache.hadoop.hive.ql.plan.DynamicPartitionCtx.(DynamicPartitionCtx.java:82) > ... 29 more > Caused by: org.apache.thrift.TApplicationException: getMetaConf failed: > unknown result > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_getMetaConf(ThriftHiveMetastore.java:666) > at >
[jira] [Updated] (HIVE-13596) HS2 should be able to get UDFs on demand from metastore
[ https://issues.apache.org/jira/browse/HIVE-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13596: Attachment: HIVE-13596.01.patch > HS2 should be able to get UDFs on demand from metastore > --- > > Key: HIVE-13596 > URL: https://issues.apache.org/jira/browse/HIVE-13596 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13596.01.patch, HIVE-13596.patch > > > When multiple HS2s are run, creating a permanent fn is only executed on one > of them, and the other HS2s don't get the new function. Unlike say with > tables, where we always get stuff from db on demand, fns are registered at > certain points in the code and if the new one is not registered, it will not > be available. > We should restore the pre-HIVE-2573 behavior of being able to refresh the > UDFs on demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13596) HS2 should be able to get UDFs on demand from metastore
[ https://issues.apache.org/jira/browse/HIVE-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258724#comment-15258724 ] Sergey Shelukhin commented on HIVE-13596: - {noformat} Should this be a settable option (as opposed to always on)? And why default to false? {noformat} It's settable per session. Defaulting to false because that was the current behavior for a while. {noformat} Which Registry is performing the UDF lookup, the system registry or the session registry? If it is the system registry, then we may run into HIVE-6672 again. checkFunctionClass() (removed in your patch) was added for this purpose. {noformat} Session registry. I removed the method because it was unused... {noformat} If the functions are being looked up/added to the session registry, then this may not be an issue because every session would need to lookup UDF/load JARs. Actually I see that the permanent UDFs by Hive.reloadFunctions() (at initialize time) are added to the system registry .. I suspect Hive probably has class loading issues if we ever use "RELOAD FUNCTIONS" to pick up new UDFs, since Hive no longer seems to be calling checkFunctionClass(). {noformat} Hmm... should this be done in the system registry then? Does it hurt to have them in the session registry? Also, does checkFunctionClass need to be reinstated here or in a separate JIRA. {noformat} public Registry(boolean isNative, HiveConf conf): conf needs a null check before it's used {noformat} Implied that it's not used for a native registry; I will add an explicit check. > HS2 should be able to get UDFs on demand from metastore > --- > > Key: HIVE-13596 > URL: https://issues.apache.org/jira/browse/HIVE-13596 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13596.patch > > > When multiple HS2s are run, creating a permanent fn is only executed on one > of them, and the other HS2s don't get the new function. 
Unlike say with > tables, where we always get stuff from db on demand, fns are registered at > certain points in the code and if the new one is not registered, it will not > be available. > We should restore the pre-HIVE-2573 behavior of being able to refresh the > UDFs on demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13568) Add UDFs to support column-masking
[ https://issues.apache.org/jira/browse/HIVE-13568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258723#comment-15258723 ] Hive QA commented on HIVE-13568: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12800706/HIVE-13568.2.patch {color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 48 failed/errored test(s), 9959 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file TestMiniTezCliDriver-cte_4.q-schema_evol_text_nonvec_mapwork_table.q-vector_groupby_reduce.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nomore_ambiguous_table_col org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_regexp_extract org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern3 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern4 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_nonkey_groupby org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_selectDistinctStarNeg_2 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_subquery_shared_alias org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udtf_not_supported1 org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote.org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote org.apache.hadoop.hive.metastore.TestFilterHooks.org.apache.hadoop.hive.metastore.TestFilterHooks org.apache.hadoop.hive.metastore.TestMetaStoreEndFunctionListener.testEndFunctionListener org.apache.hadoop.hive.metastore.TestMetaStoreEventListenerOnlyOnCommit.testEventStatus 
org.apache.hadoop.hive.metastore.TestMetaStoreInitListener.testMetaStoreInitListener org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.org.apache.hadoop.hive.metastore.TestMetaStoreMetrics org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAddPartitionWithValidPartVal org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithCommas org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithUnicode org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithValidCharacters org.apache.hadoop.hive.metastore.TestRetryingHMSHandler.testRetryingHMSHandler org.apache.hadoop.hive.metastore.hbase.TestHBaseImport.org.apache.hadoop.hive.metastore.hbase.TestHBaseImport org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.concurrencyFalse org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testLockTimeout org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testUpdate org.apache.hadoop.hive.ql.security.TestClientSideAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestExtendedAcls.org.apache.hadoop.hive.ql.security.TestExtendedAcls org.apache.hadoop.hive.ql.security.TestFolderPermissions.org.apache.hadoop.hive.ql.security.TestFolderPermissions org.apache.hadoop.hive.ql.security.TestMetastoreAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener.org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable 
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccessWithReadOnly org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.testSaslWithHiveMetaStore org.apache.hive.hcatalog.listener.TestDbNotificationListener.cleanupNotifs org.apache.hive.hcatalog.listener.TestDbNotificationListener.dropDatabase
[jira] [Commented] (HIVE-13596) HS2 should be able to get UDFs on demand from metastore
[ https://issues.apache.org/jira/browse/HIVE-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258709#comment-15258709 ] Jason Dere commented on HIVE-13596: --- - Should this be a settable option (as opposed to always on)? And why default to false? - Which Registry is performing the UDF lookup, the system registry or the session registry? If it is the system registry, then we may run into HIVE-6672 again. checkFunctionClass() (removed in your patch) was added for this purpose. If the functions are being looked up/added to the session registry, then this may not be an issue because every session would need to lookup UDF/load JARs. Actually I see that the permanent UDFs by Hive.reloadFunctions() (at initialize time) are added to the system registry .. I suspect Hive probably has class loading issues if we ever use "RELOAD FUNCTIONS" to pick up new UDFs, since Hive no longer seems to be calling checkFunctionClass(). - public Registry(boolean isNative, HiveConf conf): - conf needs a null check before it's used - private FunctionInfo getQualifiedFunctionInfo(): - Wrap if/then with braces - private boolean refreshFunctionInfoFromMetastore(String functionName) - line 629: wrap if/then with braces > HS2 should be able to get UDFs on demand from metastore > --- > > Key: HIVE-13596 > URL: https://issues.apache.org/jira/browse/HIVE-13596 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13596.patch > > > When multiple HS2s are run, creating a permanent fn is only executed on one > of them, and the other HS2s don't get the new function. Unlike say with > tables, where we always get stuff from db on demand, fns are registered at > certain points in the code and if the new one is not registered, it will not > be available. > We should restore the pre-HIVE-2573 behavior of being able to refresh the > UDFs on demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
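The review point about Registry(boolean isNative, HiveConf conf) — that conf needs a guard before use — can be sketched as below. This is a toy model of the discussion, with HiveConf replaced by a plain Map; the real Registry does far more, and the exact check Sergey adds may differ.

```java
import java.util.Map;

// Hypothetical sketch of the null-conf guard discussed in the HIVE-13596
// review: the native (system) registry never touches conf, so null is
// tolerated there, while a session registry must have one.
public class Registry {
    private final boolean isNative;
    private final Map<String, String> conf;  // stand-in for HiveConf

    public Registry(boolean isNative, Map<String, String> conf) {
        if (!isNative && conf == null) {
            throw new IllegalArgumentException("session registry requires a conf");
        }
        this.isNative = isNative;
        this.conf = conf;
    }

    public boolean hasConf() { return conf != null; }
}
```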
[jira] [Updated] (HIVE-13509) HCatalog getSplits should ignore the partition with invalid path
[ https://issues.apache.org/jira/browse/HIVE-13509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-13509: --- Attachment: HIVE-13509.2.patch Thanks [~mithun] for reviewing the patch. I fixed #2. Please take a look. > HCatalog getSplits should ignore the partition with invalid path > > > Key: HIVE-13509 > URL: https://issues.apache.org/jira/browse/HIVE-13509 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Attachments: HIVE-13509.1.patch, HIVE-13509.2.patch, HIVE-13509.patch > > > It is quite common that there is a discrepancy between partition directory > and its HMS metadata, simply because the directory could be added/deleted > externally using hdfs shell command. Technically it should be fixed by MSCK > and alter table .. add/drop command etc, but sometimes it might not be > practical especially in a multi-tenant env. This discrepancy does not cause > any problem to Hive, Hive returns no rows for a partition with an invalid > (e.g. non-existing) path, but it fails the Pig load with HCatLoader, because > the HCatBaseInputFormat getSplits throws an error when getting a split for a > non-existing path. The error message might look like: > {code} > Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does > not exist: > hdfs://xyz.com:8020/user/hive/warehouse/xyz/date=2016-01-01/country=BR > at > org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287) > at > org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229) > at > org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315) > at > org.apache.hive.hcatalog.mapreduce.HCatBaseInputFormat.getSplits(HCatBaseInputFormat.java:162) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-13610) Hive exec module won't compile with IBM JDK
[ https://issues.apache.org/jira/browse/HIVE-13610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258667#comment-15258667 ] Sergey Shelukhin edited comment on HIVE-13610 at 4/26/16 6:40 PM: -- DUMP_HEAP_METHOD.invoke needs a null check. Also the exception could be logged in case of error (or at least the message from it). Otherwise, looks good. was (Author: sershe): DUMP_HEAP_METHOD.invoke needs a null check. Otherwise, looks good. > Hive exec module won't compile with IBM JDK > --- > > Key: HIVE-13610 > URL: https://issues.apache.org/jira/browse/HIVE-13610 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 > Environment: Hive 1.1.0 + IBM JDK 1.7 +ppc64 architecture >Reporter: Pan Yuxuan >Assignee: Pan Yuxuan > Attachments: HIVE-13610.patch > > > org.apache.hadoop.hive.ql.debug.Utils explicitly import > com.sun.management.HotSpotDiagnosticMXBean which is not supported by IBM JDK. > So we can make HotSpotDiagnosticMXBean as runtime but not compile. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13610) Hive exec module won't compile with IBM JDK
[ https://issues.apache.org/jira/browse/HIVE-13610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258667#comment-15258667 ] Sergey Shelukhin commented on HIVE-13610: - DUMP_HEAP_METHOD.invoke needs a null check. Otherwise, looks good. > Hive exec module won't compile with IBM JDK > --- > > Key: HIVE-13610 > URL: https://issues.apache.org/jira/browse/HIVE-13610 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 > Environment: Hive 1.1.0 + IBM JDK 1.7 +ppc64 architecture >Reporter: Pan Yuxuan >Assignee: Pan Yuxuan > Attachments: HIVE-13610.patch > > > org.apache.hadoop.hive.ql.debug.Utils explicitly import > com.sun.management.HotSpotDiagnosticMXBean which is not supported by IBM JDK. > So we can make HotSpotDiagnosticMXBean as runtime but not compile. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
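The pattern under review — looking up the HotSpot-only class reflectively so the module still compiles on IBM JDK, then null-checking before invoke and logging the failure — can be sketched as follows. This is not the actual Utils code; the helper name and structure are illustrative.

```java
import java.lang.reflect.Method;

// Sketch of the HIVE-13610 approach: resolve HotSpot-specific reflection
// targets at runtime instead of compile time. A failed lookup returns
// null (and logs the message, per the review comment), so callers must
// null-check the cached Method before invoking it.
public class ReflectiveLookup {
    public static Method lookup(String className, String methodName,
                                Class<?>... params) {
        try {
            return Class.forName(className).getMethod(methodName, params);
        } catch (Throwable t) {
            // e.g. on IBM JDK, where com.sun.management classes are absent
            System.err.println("Reflective lookup failed: " + t.getMessage());
            return null;
        }
    }
}
```

A caller would cache `lookup("com.sun.management.HotSpotDiagnosticMXBean", "dumpHeap", String.class, boolean.class)` in a static field and skip the heap dump when it is null — the missing null check the comment points out.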
[jira] [Commented] (HIVE-13615) nomore_ambiguous_table_col.q is failing on master
[ https://issues.apache.org/jira/browse/HIVE-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258658#comment-15258658 ] Ashutosh Chauhan commented on HIVE-13615: - In addition to that error message has changed for -ve test cases of =nonkey_groupby.q,subquery_shared_alias.q,clustern3.q,clustern4.q,udtf_not_supported1.q,selectDistinctStarNeg_2.q We are losing line number and char position in error message. We should try to restore previous behavior. > nomore_ambiguous_table_col.q is failing on master > - > > Key: HIVE-13615 > URL: https://issues.apache.org/jira/browse/HIVE-13615 > Project: Hive > Issue Type: Test > Components: Parser >Affects Versions: 2.1.0 >Reporter: Ashutosh Chauhan > > Fails with: > FAILED: ParseException line 3:9 cannot recognize input near 'src' 'key' > 'INSERT' in from source 0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13596) HS2 should be able to get UDFs on demand from metastore
[ https://issues.apache.org/jira/browse/HIVE-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13596: Attachment: HIVE-13596.patch This restores some of the functionality, with a config flag, and accounting for other registry changes. [~jdere] can you take a look? > HS2 should be able to get UDFs on demand from metastore > --- > > Key: HIVE-13596 > URL: https://issues.apache.org/jira/browse/HIVE-13596 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > Attachments: HIVE-13596.patch > > > When multiple HS2s are run, creating a permanent fn is only executed on one > of them, and the other HS2s don't get the new function. Unlike say with > tables, where we always get stuff from db on demand, fns are registered at > certain points in the code and if the new one is not registered, it will not > be available. > We should restore the pre-HIVE-2573 behavior of being able to refresh the > UDFs on demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13596) HS2 should be able to get UDFs on demand from metastore
[ https://issues.apache.org/jira/browse/HIVE-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13596: Assignee: Sergey Shelukhin Status: Patch Available (was: Open) > HS2 should be able to get UDFs on demand from metastore > --- > > Key: HIVE-13596 > URL: https://issues.apache.org/jira/browse/HIVE-13596 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13596.patch > > > When multiple HS2s are run, creating a permanent fn is only executed on one > of them, and the other HS2s don't get the new function. Unlike say with > tables, where we always get stuff from db on demand, fns are registered at > certain points in the code and if the new one is not registered, it will not > be available. > We should restore the pre-HIVE-2573 behavior of being able to refresh the > UDFs on demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11417) Create shims for the row by row read path that is backed by VectorizedRowBatch
[ https://issues.apache.org/jira/browse/HIVE-11417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-11417: - Attachment: HIVE-11417.patch This patch moves the ReaderImpl and RecordReaderImpl to the orc module after removing the row by row API. The row by row API is emulated in ql with child classes. > Create shims for the row by row read path that is backed by VectorizedRowBatch > -- > > Key: HIVE-11417 > URL: https://issues.apache.org/jira/browse/HIVE-11417 > Project: Hive > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.1.0 > > Attachments: HIVE-11417.patch, HIVE-11417.patch > > > I'd like to make the default path for reading and writing ORC files to be > vectorized. To ensure that Hive can still read row by row, we'll need shims > to support the old API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13596) HS2 should be able to get UDFs on demand from metastore
[ https://issues.apache.org/jira/browse/HIVE-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13596: Description: When multiple HS2s are run, creating a permanent fn is only executed on one of them, and the other HS2s don't get the new function. Unlike say with tables, where we always get stuff from db on demand, fns are registered at certain points in the code and if the new one is not registered, it will not be available. We should restore the pre-HIVE-2573 behavior of being able to refresh the UDFs on demand. was: When multiple HS2s are run, creating a permanent fn is only executed on one of them, and the other HS2s don't get the new function. Unlike say with tables, where we always get stuff from db on demand, fns are registered at certain points in the code and if the new one is not registered, it will not be available. We could change the code to refresh the udf by name if it's missing, similar to getting a table or whatever; or we could refresh UDFs when a session is started in multi-HS2 case, or at some other convenient point. > HS2 should be able to get UDFs on demand from metastore > --- > > Key: HIVE-13596 > URL: https://issues.apache.org/jira/browse/HIVE-13596 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > > When multiple HS2s are run, creating a permanent fn is only executed on one > of them, and the other HS2s don't get the new function. Unlike say with > tables, where we always get stuff from db on demand, fns are registered at > certain points in the code and if the new one is not registered, it will not > be available. > We should restore the pre-HIVE-2573 behavior of being able to refresh the > UDFs on demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13596) HS2 should be able to get UDFs on demand from metastore
[ https://issues.apache.org/jira/browse/HIVE-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13596: Summary: HS2 should be able to get UDFs on demand from metastore (was: HS2 should refresh UDFs more frequently(?), at least in multi-HS2 case) > HS2 should be able to get UDFs on demand from metastore > --- > > Key: HIVE-13596 > URL: https://issues.apache.org/jira/browse/HIVE-13596 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > > When multiple HS2s are run, creating a permanent fn is only executed on one > of them, and the other HS2s don't get the new function. Unlike say with > tables, where we always get stuff from db on demand, fns are registered at > certain points in the code and if the new one is not registered, it will not > be available. > We could change the code to refresh the udf by name if it's missing, similar > to getting a table or whatever; or we could refresh UDFs when a session is > started in multi-HS2 case, or at some other convenient point. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10176) skip.header.line.count causes values to be skipped when performing insert values
[ https://issues.apache.org/jira/browse/HIVE-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladyslav Pavlenko updated HIVE-10176: -- Attachment: HIVE-10176.14.patch OK, I have done what you asked. Regarding the question from RB, I found a related ticket: https://issues.apache.org/jira/browse/HIVE-11996. The table doesn't have a "line.delimiter" property or an equivalent, and I haven't found a workaround for this problem. > skip.header.line.count causes values to be skipped when performing insert > values > > > Key: HIVE-10176 > URL: https://issues.apache.org/jira/browse/HIVE-10176 > Project: Hive > Issue Type: Bug >Affects Versions: 1.0.0, 1.2.1 >Reporter: Wenbo Wang >Assignee: Vladyslav Pavlenko > Fix For: 2.0.0 > > Attachments: HIVE-10176.1.patch, HIVE-10176.10.patch, > HIVE-10176.11.patch, HIVE-10176.12.patch, HIVE-10176.13.patch, > HIVE-10176.14.patch, HIVE-10176.2.patch, HIVE-10176.3.patch, > HIVE-10176.4.patch, HIVE-10176.5.patch, HIVE-10176.6.patch, > HIVE-10176.7.patch, HIVE-10176.8.patch, HIVE-10176.9.patch, data > > > When inserting values into tables with TBLPROPERTIES > ("skip.header.line.count"="1") the first value listed is also skipped. > create table test (row int, name string) TBLPROPERTIES > ("skip.header.line.count"="1"); > load data local inpath '/root/data' into table test; > insert into table test values (1, 'a'), (2, 'b'), (3, 'c'); > (1, 'a') isn't inserted into the table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
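The intended guard behind this bug report can be illustrated with a minimal sketch (not Hive's actual reader code): header lines should only be skipped when reading a table's own data files, never when reading rows staged from an INSERT VALUES clause. The `fromTableFile` flag here is a hypothetical marker for that distinction.

```java
import java.util.Arrays;
import java.util.List;

// Minimal illustration of the skip.header.line.count guard.
public class HeaderSkipGuard {

  // `fromTableFile`: true for LOAD DATA / scans of the table's files,
  // false for rows coming from a VALUES clause.
  static List<String> read(List<String> lines, int headerCount, boolean fromTableFile) {
    int skip = fromTableFile ? Math.min(headerCount, lines.size()) : 0;
    return lines.subList(skip, lines.size());
  }

  public static void main(String[] args) {
    List<String> values = Arrays.asList("1,a", "2,b", "3,c");
    // Without the guard, (1, 'a') is dropped; with it, all rows survive.
    System.out.println(read(values, 1, false)); // [1,a, 2,b, 3,c]
  }
}
```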
[jira] [Updated] (HIVE-13178) Enhance ORC Schema Evolution to handle more standard data type conversions
[ https://issues.apache.org/jira/browse/HIVE-13178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-13178: Attachment: HIVE-13178.092.patch HIVE-12159 went in -- rebase. > Enhance ORC Schema Evolution to handle more standard data type conversions > -- > > Key: HIVE-13178 > URL: https://issues.apache.org/jira/browse/HIVE-13178 > Project: Hive > Issue Type: Bug > Components: Hive, ORC >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-13178.01.patch, HIVE-13178.02.patch, > HIVE-13178.03.patch, HIVE-13178.04.patch, HIVE-13178.05.patch, > HIVE-13178.06.patch, HIVE-13178.07.patch, HIVE-13178.08.patch, > HIVE-13178.09.patch, HIVE-13178.091.patch, HIVE-13178.092.patch > > > Currently, SHORT -> INT -> BIGINT is supported. > Handle the ORC data type conversions permitted by the implicit conversion allowed by > the TypeInfoUtils.implicitConvertible method. >* STRING_GROUP -> DOUBLE >* STRING_GROUP -> DECIMAL >* DATE_GROUP -> STRING >* NUMERIC_GROUP -> STRING >* STRING_GROUP -> STRING_GROUP >* >* // Upward from "lower" type to "higher" numeric type: >* BYTE -> SHORT -> INT -> BIGINT -> FLOAT -> DOUBLE -> DECIMAL -- This message was sent by Atlassian JIRA (v6.3.4#6332)
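The "upward" numeric promotion chain listed in that issue can be captured as an ordered enum check. This is a self-contained illustration of the rule, not Hive's TypeInfoUtils implementation, and it covers only the numeric chain, not the string/date group conversions.

```java
// Sketch of the numeric promotion rule:
// BYTE -> SHORT -> INT -> BIGINT -> FLOAT -> DOUBLE -> DECIMAL.
public class OrcNumericPromotion {
  enum NumericType { BYTE, SHORT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL }

  // Conversion is permitted only from a "lower" type to a "higher" one.
  static boolean implicitConvertible(NumericType from, NumericType to) {
    return from.ordinal() <= to.ordinal();
  }

  public static void main(String[] args) {
    System.out.println(implicitConvertible(NumericType.SHORT, NumericType.BIGINT)); // true
    System.out.println(implicitConvertible(NumericType.DOUBLE, NumericType.INT));   // false
  }
}
```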
[jira] [Updated] (HIVE-13541) Pass view's ColumnAccessInfo to HiveAuthorizer
[ https://issues.apache.org/jira/browse/HIVE-13541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13541: --- Resolution: Fixed Status: Resolved (was: Patch Available) Double-checked the test case failures. The MiniTez ones are not reproducible. The MR ones are not reproducible. The Negative ones are existing failures. Pushed to master. Thanks [~ashutoshc] for the review. > Pass view's ColumnAccessInfo to HiveAuthorizer > -- > > Key: HIVE-13541 > URL: https://issues.apache.org/jira/browse/HIVE-13541 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.1.0 > > Attachments: HIVE-13541.01.patch, HIVE-13541.02.patch > > > Right now, only table's ColumnAccessInfo is passed to HiveAuthorizer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13541) Pass view's ColumnAccessInfo to HiveAuthorizer
[ https://issues.apache.org/jira/browse/HIVE-13541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13541: --- Affects Version/s: 2.0.0 > Pass view's ColumnAccessInfo to HiveAuthorizer > -- > > Key: HIVE-13541 > URL: https://issues.apache.org/jira/browse/HIVE-13541 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.1.0 > > Attachments: HIVE-13541.01.patch, HIVE-13541.02.patch > > > Right now, only table's ColumnAccessInfo is passed to HiveAuthorizer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13509) HCatalog getSplits should ignore the partition with invalid path
[ https://issues.apache.org/jira/browse/HIVE-13509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258527#comment-15258527 ] Mithun Radhakrishnan commented on HIVE-13509: - bq. ... with Google Guava's {{Iterators.filter()}}. Actually, please ignore comment#3, above. I was trying to avoid checking {{ignoreInvalidPath}} multiple times. I tried writing it out myself (to illustrate), and saw that the call to {{fs.makeQualified()}} implies that we'll need to use both {{Iterators.filter()}} and {{Iterators.transform}}, at which point, it's no longer short and sweet. Please fix #2 above, and I will +1. Also, thanks for adding tests. > HCatalog getSplits should ignore the partition with invalid path > > > Key: HIVE-13509 > URL: https://issues.apache.org/jira/browse/HIVE-13509 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Attachments: HIVE-13509.1.patch, HIVE-13509.patch > > > It is quite common that there is the discrepancy between partition directory > and its HMS metadata, simply because the directory could be added/deleted > externally using hdfs shell command. Technically it should be fixed by MSCK > and alter table .. add/drop command etc, but sometimes it might not be > practical especially in a multi-tenant env. This discrepancy does not cause > any problem to Hive, Hive returns no rows for a partition with an invalid > (e.g. non-existing) path, but it fails the Pig load with HCatLoader, because > the HCatBaseInputFormat getSplits throws an error when getting a split for a > non-existing path. 
The error message might look like: > {code} > Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does > not exist: > hdfs://xyz.com:8020/user/hive/warehouse/xyz/date=2016-01-01/country=BR > at > org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287) > at > org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229) > at > org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315) > at > org.apache.hive.hcatalog.mapreduce.HCatBaseInputFormat.getSplits(HCatBaseInputFormat.java:162) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13560) Adding Omid as connection manager for HBase Metastore
[ https://issues.apache.org/jira/browse/HIVE-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-13560: -- Attachment: HIVE-13560.3.patch > Adding Omid as connection manager for HBase Metastore > - > > Key: HIVE-13560 > URL: https://issues.apache.org/jira/browse/HIVE-13560 > Project: Hive > Issue Type: Improvement > Components: HBase Metastore >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-13560.1.patch, HIVE-13560.2.patch, > HIVE-13560.3.patch > > > Adding Omid as a transaction manager to HBase Metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13588) NPE is thrown from MapredLocalTask.executeInChildVM
[ https://issues.apache.org/jira/browse/HIVE-13588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258483#comment-15258483 ] Yongzhi Chen commented on HIVE-13588: - The new PATCH LGTM +1 > NPE is thrown from MapredLocalTask.executeInChildVM > --- > > Key: HIVE-13588 > URL: https://issues.apache.org/jira/browse/HIVE-13588 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Attachments: HIVE-13588.1.patch, HIVE-13588.patch, HIVE-13588.patch > > > NPE was thrown out from MapredLocalTask.executeInChildVM in running some > queries with CLI, see error below: > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInChildVM(MapredLocalTask.java:321) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.execute(MapredLocalTask.java:148) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:172) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1868) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1595) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1346) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1117) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1105) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:236) > [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187) > 
[hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) > [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:782) > [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:721) > [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:648) > [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.7.0_45] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > ~[?:1.7.0_45] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.7.0_45] > {code} > It is because the operationLog is only applicable to HS2 but CLI, therefore > it might not be set (null) > It is related to HIVE-13183 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13615) nomore_ambiguous_table_col.q is failing on master
[ https://issues.apache.org/jira/browse/HIVE-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258453#comment-15258453 ] Ashutosh Chauhan commented on HIVE-13615: - [~hsubramaniyan] May be related to recent parser changes. Can you take a look? > nomore_ambiguous_table_col.q is failing on master > - > > Key: HIVE-13615 > URL: https://issues.apache.org/jira/browse/HIVE-13615 > Project: Hive > Issue Type: Test > Components: Parser >Affects Versions: 2.1.0 >Reporter: Ashutosh Chauhan > > Fails with: > FAILED: ParseException line 3:9 cannot recognize input near 'src' 'key' > 'INSERT' in from source 0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13509) HCatalog getSplits should ignore the partition with invalid path
[ https://issues.apache.org/jira/browse/HIVE-13509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258454#comment-15258454 ] Mithun Radhakrishnan commented on HIVE-13509: - Reviewing your patch now. On the face of it, it looks good. Looking at it a little more closely... A couple of observations: # {{hcat.input.ignore.invalid.path}} is well-named, and would make sense to anyone who'd want to override the default. (I thought we'd go with {{hcat.input.allow.invalid.path=true}}, but your version is better.) # Consider replacing {{(pathString == null || pathString.trim().isEmpty())}} with {{StringUtils.isBlank(pathString)}}. # Nitpick: Consider replacing the loop at {{HCatBaseInputFormat.java:Line#335}} with Google Guava's {{Iterators.filter()}}. Then, depending on whether {{ignoreInvalidPath}} is set, the erstwhile loop at Line#329 will either loop on {{paths}} or on {{filteredPaths}}. This will be more readable. > HCatalog getSplits should ignore the partition with invalid path > > > Key: HIVE-13509 > URL: https://issues.apache.org/jira/browse/HIVE-13509 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Attachments: HIVE-13509.1.patch, HIVE-13509.patch > > > It is quite common that there is the discrepancy between partition directory > and its HMS metadata, simply because the directory could be added/deleted > externally using hdfs shell command. Technically it should be fixed by MSCK > and alter table .. add/drop command etc, but sometimes it might not be > practical especially in a multi-tenant env. This discrepancy does not cause > any problem to Hive, Hive returns no rows for a partition with an invalid > (e.g. non-existing) path, but it fails the Pig load with HCatLoader, because > the HCatBaseInputFormat getSplits throws an error when getting a split for a > non-existing path. 
The error message might look like: > {code} > Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does > not exist: > hdfs://xyz.com:8020/user/hive/warehouse/xyz/date=2016-01-01/country=BR > at > org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287) > at > org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229) > at > org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315) > at > org.apache.hive.hcatalog.mapreduce.HCatBaseInputFormat.getSplits(HCatBaseInputFormat.java:162) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
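The filtering the review suggests can be sketched as below, written with JDK streams instead of Guava's {{Iterators.filter()}}/{{Iterators.transform()}} so it is self-contained; `exists` is a hypothetical stand-in for the FileSystem existence check done after {{fs.makeQualified()}}.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Sketch: keep only non-blank, existing partition paths when the
// hcat.input.ignore.invalid.path flag is set.
public class InvalidPathFilter {

  static boolean isBlank(String s) {            // StringUtils.isBlank equivalent
    return s == null || s.trim().isEmpty();
  }

  static List<String> validPaths(List<String> paths, boolean ignoreInvalidPath,
                                 Predicate<String> exists) {
    if (!ignoreInvalidPath) {
      return paths;                             // old behavior: keep everything
    }
    return paths.stream()
        .filter(p -> !isBlank(p) && exists.test(p))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> paths = Arrays.asList("/warehouse/t/p=1", "  ", "/warehouse/t/gone");
    // Pretend only the first path exists on HDFS.
    System.out.println(validPaths(paths, true, "/warehouse/t/p=1"::equals));
  }
}
```

As the review notes, the downstream split-building loop would then iterate the filtered list only when the flag is set.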
[jira] [Commented] (HIVE-13241) LLAP: Incremental Caching marks some small chunks as "incomplete CB"
[ https://issues.apache.org/jira/browse/HIVE-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258434#comment-15258434 ] Sergey Shelukhin commented on HIVE-13241: - Sorry, I will do a follow-up patch > LLAP: Incremental Caching marks some small chunks as "incomplete CB" > > > Key: HIVE-13241 > URL: https://issues.apache.org/jira/browse/HIVE-13241 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Sergey Shelukhin > Labels: TODOC2.1 > Fix For: 2.1.0 > > Attachments: HIVE-13241.01.patch, HIVE-13241.patch > > > Run #3 of a query with 1 node still has cache misses. > {code} > LLAP IO Summary > -- > VERTICES ROWGROUPS META_HIT META_MISS DATA_HIT DATA_MISS ALLOCATION > USED TOTAL_IO > -- > Map 111 1116 01.65GB93.61MB 0B >0B32.72s > -- > {code} > {code} > 2016-03-08T21:05:39,417 INFO > [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: > encoded.EncodedReaderImpl > (EncodedReaderImpl.java:prepareRangesForCompressedRead(695)) - Locking > 0x1c44401d(1) due to reuse > 2016-03-08T21:05:39,417 INFO > [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: > encoded.EncodedReaderImpl > (EncodedReaderImpl.java:prepareRangesForCompressedRead(701)) - Adding an > already-uncompressed buffer 0x1c44401d(2) > 2016-03-08T21:05:39,417 INFO > [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: > encoded.EncodedReaderImpl > (EncodedReaderImpl.java:prepareRangesForCompressedRead(695)) - Locking > 0x4e51b032(1) due to reuse > 2016-03-08T21:05:39,417 INFO > [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: > encoded.EncodedReaderImpl > (EncodedReaderImpl.java:prepareRangesForCompressedRead(701)) - Adding an > already-uncompressed buffer 0x4e51b032(2) > 2016-03-08T21:05:39,418 INFO > [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: > encoded.EncodedReaderImpl > (EncodedReaderImpl.java:addOneCompressionBuffer(1161)) - Found CB at 1373931, > chunk length 86587, total 86590, compressed > 
2016-03-08T21:05:39,418 INFO > [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: > encoded.EncodedReaderImpl > (EncodedReaderImpl.java:addIncompleteCompressionBuffer(1241)) - Replacing > data range [1373931, 1408408), size: 34474(!) type: direct (and 0 previous > chunks) with incomplete CB start: 1373931 end: 1408408 in the buffers > 2016-03-08T21:05:39,418 INFO > [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: > encoded.EncodedReaderImpl > (EncodedReaderImpl.java:createRgColumnStreamData(441)) - Getting data for > column 7 RG 14 stream DATA at 1460521, 319811 index position 0: compressed > [1626961, 1780332) > {code} > {code} > 2016-03-08T21:05:38,925 INFO > [IO-Elevator-Thread-7[attempt_1455662455106_2688_3_00_01_0]]: > encoded.OrcEncodedDataReader (OrcEncodedDataReader.java:readFileData(878)) - > Disk ranges after disk read (file 5372745, base offset 3): [{start: 18986 > end: 20660 cache buffer: 0x660faf7c(1)}, {start: 20660 end: 35775 cache > buffer: 0x1dcb1d97(1)}, {start: 318852 end: 422353 cache buffer: > 0x6c7f9a05(1)}, {start: 1148616 end: 1262468 cache buffer: 0x196e1d41(1)}, > {start: 1262468 end: 1376342 cache buffer: 0x201255f(1)}, {data range > [1376342, 1410766), size: 34424 type: direct}, {start: 1631359 end: 1714694 > cache buffer: 0x47e3a72d(1)}, {start: 1714694 end: 1785770 cache buffer: > 0x57dca266(1)}, {start: 4975035 end: 5095215 cache buffer: 0x3e3139c9(1)}, > {start: 5095215 end: 5197863 cache buffer: 0x3511c88d(1)}, {start: 7448387 > end: 7572268 cache buffer: 0x6f11dbcd(1)}, {start: 7572268 end: 7696182 cache > buffer: 0x5d6c9bdb(1)}, {data range [7696182, 7710537), size: 14355 type: > direct}, {start: 8235756 end: 8345367 cache buffer: 0x6a241ece(1)}, {start: > 8345367 end: 8455009 cache buffer: 0x51caf6a7(1)}, {data range [8455009, > 8497906), size: 42897 type: direct}, {start: 9035815 end: 9159708 cache > buffer: 0x306480e0(1)}, {start: 9159708 end: 9283629 cache buffer: > 0x9ef7774(1)}, {data range [9283629, 
9297965), size: 14336 type: direct}, > {start: 9989884 end: 10113731 cache buffer: 0x43f7cae9(1)}, {start: 10113731 > end: 10237589 cache buffer: 0x458e63fe(1)}, {data range [10237589, 10252034), > size: 14445 type: direct}, {start: 11897896 end: 12021787 cache buffer: > 0x51f9982f(1)}, {start: 12021787 end: 12145656 cache buffer: 0x23df01b3(1)},
[jira] [Updated] (HIVE-13588) NPE is thrown from MapredLocalTask.executeInChildVM
[ https://issues.apache.org/jira/browse/HIVE-13588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-13588: --- Attachment: HIVE-13588.1.patch Yeah, it is not necessary to reset the operationLog when tasks are not run in parallel. I revised the patch. Thanks [~ychena] > NPE is thrown from MapredLocalTask.executeInChildVM > --- > > Key: HIVE-13588 > URL: https://issues.apache.org/jira/browse/HIVE-13588 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Attachments: HIVE-13588.1.patch, HIVE-13588.patch, HIVE-13588.patch > > > NPE was thrown out from MapredLocalTask.executeInChildVM in running some > queries with CLI, see error below: > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInChildVM(MapredLocalTask.java:321) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.execute(MapredLocalTask.java:148) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:172) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1868) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1595) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1346) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1117) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1105) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:236) > [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] 
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187) > [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) > [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:782) > [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:721) > [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:648) > [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.7.0_45] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > ~[?:1.7.0_45] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.7.0_45] > {code} > It is because the operationLog is only applicable to HS2 but CLI, therefore > it might not be set (null) > It is related to HIVE-13183 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
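The shape of the fix can be illustrated with a minimal null guard, not the actual MapredLocalTask code: in CLI there is no HS2 session, so the operation log can legitimately be null and must be checked before use. The `OperationLog` class here is a hypothetical stand-in.

```java
// Minimal illustration of the NPE and its guard.
public class OperationLogGuard {

  // Hypothetical stand-in for the real OperationLog.
  static class OperationLog {
    final String file;
    OperationLog(String file) { this.file = file; }
  }

  // Without the null check, dereferencing operationLog NPEs under CLI.
  static String childLogFile(OperationLog operationLog) {
    if (operationLog == null) {
      return null;   // CLI path: no per-operation log to forward to the child VM
    }
    return operationLog.file;
  }

  public static void main(String[] args) {
    System.out.println(childLogFile(null));                        // null, no NPE
    System.out.println(childLogFile(new OperationLog("/tmp/op"))); // /tmp/op
  }
}
```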
[jira] [Commented] (HIVE-13537) Update slf4j version to 1.7.10
[ https://issues.apache.org/jira/browse/HIVE-13537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258383#comment-15258383 ] Hive QA commented on HIVE-13537: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12800455/HIVE-13537.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 62 failed/errored test(s), 9940 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file TestMiniTezCliDriver-vector_distinct_2.q-tez_joins_explain.q-cte_mat_1.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-vectorized_parquet.q-vector_decimal_aggregate.q-tez_self_join.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nomore_ambiguous_table_col org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge2 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge9 
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge_diff_fs org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join2 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join3 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join4 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join5 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern3 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern4 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_nonkey_groupby org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_selectDistinctStarNeg_2 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_subquery_shared_alias org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udtf_not_supported1 org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote.org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote org.apache.hadoop.hive.metastore.TestFilterHooks.org.apache.hadoop.hive.metastore.TestFilterHooks org.apache.hadoop.hive.metastore.TestMetaStoreEndFunctionListener.testEndFunctionListener org.apache.hadoop.hive.metastore.TestMetaStoreEventListenerOnlyOnCommit.testEventStatus org.apache.hadoop.hive.metastore.TestMetaStoreInitListener.testMetaStoreInitListener org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.org.apache.hadoop.hive.metastore.TestMetaStoreMetrics org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAddPartitionWithUnicode org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAddPartitionWithValidPartVal 
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithCommas org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithUnicode org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithValidCharacters org.apache.hadoop.hive.metastore.TestRetryingHMSHandler.testRetryingHMSHandler org.apache.hadoop.hive.metastore.hbase.TestHBaseImport.org.apache.hadoop.hive.metastore.hbase.TestHBaseImport org.apache.hadoop.hive.ql.security.TestClientSideAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestExtendedAcls.org.apache.hadoop.hive.ql.security.TestExtendedAcls org.apache.hadoop.hive.ql.security.TestFolderPermissions.org.apache.hadoop.hive.ql.security.TestFolderPermissions org.apache.hadoop.hive.ql.security.TestMetastoreAuthorizationProvider.testSimplePrivileges
[jira] [Commented] (HIVE-13588) NPE is thrown from MapredLocalTask.executeInChildVM
[ https://issues.apache.org/jira/browse/HIVE-13588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258373#comment-15258373 ] Yongzhi Chen commented on HIVE-13588: - [~ctang.ma], Does your fix need changing location of tskRun.setOperationLog(OperationLog.getCurrentOperationLog()); ? From HIVE-9120, it looks like the setOperationLog is not needed when hive.exec.parallel is false. > NPE is thrown from MapredLocalTask.executeInChildVM > --- > > Key: HIVE-13588 > URL: https://issues.apache.org/jira/browse/HIVE-13588 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Attachments: HIVE-13588.patch, HIVE-13588.patch > > > NPE was thrown out from MapredLocalTask.executeInChildVM in running some > queries with CLI, see error below: > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInChildVM(MapredLocalTask.java:321) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.execute(MapredLocalTask.java:148) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:172) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1868) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1595) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1346) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1117) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1105) > [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at > 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:236) > [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187) > [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) > [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:782) > [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:721) > [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:648) > [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.7.0_45] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > ~[?:1.7.0_45] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.7.0_45] > {code} > It is because the operationLog is only applicable to HS2 but CLI, therefore > it might not be set (null) > It is related to HIVE-13183 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12159) Create vectorized readers for the complex types
[ https://issues.apache.org/jira/browse/HIVE-12159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258293#comment-15258293 ] ASF GitHub Bot commented on HIVE-12159: --- Github user omalley closed the pull request at: https://github.com/apache/hive/pull/68 > Create vectorized readers for the complex types > --- > > Key: HIVE-12159 > URL: https://issues.apache.org/jira/browse/HIVE-12159 > Project: Hive > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.1.0 > > Attachments: HIVE-12159.patch, HIVE-12159.patch, HIVE-12159.patch, > HIVE-12159.patch, HIVE-12159.patch, HIVE-12159.patch, HIVE-12159.patch, > HIVE-12159.patch, HIVE-12159.patch > > > We need vectorized readers for the complex types. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12159) Create vectorized readers for the complex types
[ https://issues.apache.org/jira/browse/HIVE-12159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258289#comment-15258289 ] Owen O'Malley commented on HIVE-12159: -- The failures from Jenkins are not related. I just committed this. Thanks for the reviews, Matt & Prasanth. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12159) Create vectorized readers for the complex types
[ https://issues.apache.org/jira/browse/HIVE-12159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-12159: - Resolution: Fixed Fix Version/s: 2.1.0 Status: Resolved (was: Patch Available) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13525) HoS hangs when job is empty
[ https://issues.apache.org/jira/browse/HIVE-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258256#comment-15258256 ] Rui Li commented on HIVE-13525: --- {{TestSparkClient.testMetricsCollection}} failure is related. The problem is that if {{RemoteDriver}} considers a job done as soon as future#get returns, we may send the JobResult before the listener can handle the TaskEnd event and send the metrics. On the client side, the job handle is removed once the JobResult is received, which means the late-arriving metrics are simply discarded. I'll think about how to solve this; any ideas are welcome. > HoS hangs when job is empty > --- > > Key: HIVE-13525 > URL: https://issues.apache.org/jira/browse/HIVE-13525 > Project: Hive > Issue Type: Bug >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-13525.1.patch, HIVE-13525.2.patch > > > Observed in local tests. This should be the cause of HIVE-13402. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
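The race described in the comment above can be sketched in a few lines. This is a hypothetical illustration, not the actual RemoteDriver code: the "driver" thread treats the job as finished as soon as `future.get()` returns, while the metrics are published by a separate listener thread (standing in for the TaskEnd handler). Also waiting on the listener before replying closes the window.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the race: future.get() returning does not mean the listener
// thread has delivered its metrics yet. A latch makes the driver wait
// for both before it would send the JobResult.
public class ListenerRace {
    static String runJobAndCollectMetrics() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch metricsSent = new CountDownLatch(1);
        StringBuffer metrics = new StringBuffer();

        Future<?> job = pool.submit(() -> { /* the job itself: finishes fast */ });

        pool.submit(() -> {                 // the "TaskEnd" listener
            try {
                Thread.sleep(50);           // fires after the job completes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            metrics.append("taskEndMetrics");
            metricsSent.countDown();
        });

        job.get();            // job done -- metrics may still be missing here
        metricsSent.await();  // fix: also wait for the listener before replying
        pool.shutdown();
        return metrics.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runJobAndCollectMetrics()); // prints taskEndMetrics
    }
}
```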
[jira] [Updated] (HIVE-10867) ArrayIndexOutOfBoundsException LazyBinaryUtils.byteArrayToLong with Hive on Tez
[ https://issues.apache.org/jira/browse/HIVE-10867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alina Abramova updated HIVE-10867: -- Attachment: HIVE-10867.patch I created this patch based on https://issues.apache.org/jira/browse/HIVE-9517 and I see that that fix works for Tez too. > ArrayIndexOutOfBoundsException LazyBinaryUtils.byteArrayToLong with Hive on > Tez > --- > > Key: HIVE-10867 > URL: https://issues.apache.org/jira/browse/HIVE-10867 > Project: Hive > Issue Type: Bug > Components: Hive, Tez >Affects Versions: 0.14.0, 1.0.0 > Environment: Hortonworks distribution 2.2.4-2 > Hive 0.14.0 > Tez 0.5.2.2.2.4.2-2 on cluster > Tez 0.7.0 in local setup >Reporter: Per Ullberg >Assignee: Alina Abramova > Attachments: HIVE-10867.patch > > > Hi, > The following query runs fine on the map reduce engine but when setting > hive.execution.engine to tez it produces an ArrayIndexOutOfBoundsException. > Query > {code} > create external table table_1 (id string, date string, amount bigint); > insert into table table_1 values (305,'2013-03-02',3790); > create external table table_2 (id string); > insert into table table_2 VALUES (305); > create external table table_3 (id string, date_3 string, amount_3 bigint); > insert into table table_3 values (305,'2013-03-01',-1600); > create external table table_4 (id bigint, str_4 string, amount_4 bigint); > create table table_5 > as > SELECT > c.diff > FROM ( > SELECT > id AS id, > date AS create_date, > -amount AS diff > FROM table_1 > UNION ALL > SELECT > p.id AS id, > p.str_4 AS create_date, > -p.amount_4 AS diff > FROM table_4 p > UNION ALL > SELECT > id, > create_date, > diff > FROM ( > SELECT > i.id AS id, > tp.date_3 AS create_date, > cast(amount_3 as double) AS diff > FROM table_3 tp > INNER JOIN table_2 i ON cast(tp.id as string) = cast(i.id as string) > ) fees > ) c > INNER JOIN table_2 i ON cast(c.id as string) = cast(i.id as string); > {code} > Results with map reduce engine: > {code} > hive> select * from 
table_5; > OK > -1600.0 > -3790.0 > Time taken: 0.061 seconds, Fetched: 2 row(s) > {code} > Exception with tez engine: > {code} > Status: Failed > Vertex failed, vertexName=Reducer 4, vertexId=vertex_1432809678493_0891_4_06, > diagnostics=[Task failed, taskId=task_1432809678493_0891_4_06_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running > task:java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) > {"key":{"reducesinkkey0":"305"},"value":{"_col1":-1600.0}} > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row (tag=0) > {"key":{"reducesinkkey0":"305"},"value":{"_col1":-1600.0}} > at > 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:337) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:218) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:168) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163) > ... 13 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 6 > at >
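The `ArrayIndexOutOfBoundsException` in the trace above comes from decoding a fixed-width value past the end of a serialized row buffer. The following is a hypothetical, simplified decoder (not Hive's actual `LazyBinaryUtils` implementation) showing how a bounds check turns the crash into a diagnosable error when the writer produced fewer bytes than the reader expects, as can happen when UNION ALL branches disagree on a column's type.

```java
// Simplified big-endian long decoder with an explicit bounds check.
// Hive's real LazyBinaryUtils.byteArrayToLong lacks such a check and
// throws ArrayIndexOutOfBoundsException when the buffer is too short.
public class ByteArrayToLong {
    static long byteArrayToLong(byte[] bytes, int offset, int width) {
        if (offset < 0 || width < 0 || offset + width > bytes.length) {
            throw new IllegalStateException("need " + width + " bytes at offset "
                + offset + " but buffer has " + bytes.length);
        }
        long v = 0;
        for (int i = 0; i < width; i++) {
            v = (v << 8) | (bytes[offset + i] & 0xFFL); // big-endian decode
        }
        return v;
    }

    public static void main(String[] args) {
        byte[] buf = {0, 0, 0, 0, 0, 0, 0, 42};
        System.out.println(byteArrayToLong(buf, 0, 8)); // prints 42
    }
}
```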
[jira] [Updated] (HIVE-10867) ArrayIndexOutOfBoundsException LazyBinaryUtils.byteArrayToLong with Hive on Tez
[ https://issues.apache.org/jira/browse/HIVE-10867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alina Abramova updated HIVE-10867: -- Affects Version/s: 1.0.0 Status: Patch Available (was: In Progress)
[jira] [Assigned] (HIVE-10867) ArrayIndexOutOfBoundsException LazyBinaryUtils.byteArrayToLong with Hive on Tez
[ https://issues.apache.org/jira/browse/HIVE-10867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alina Abramova reassigned HIVE-10867: - Assignee: Alina Abramova
[jira] [Work started] (HIVE-10867) ArrayIndexOutOfBoundsException LazyBinaryUtils.byteArrayToLong with Hive on Tez
[ https://issues.apache.org/jira/browse/HIVE-10867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-10867 started by Alina Abramova. -