[jira] [Commented] (HIVE-13563) Hive Streaming does not honor orc.compress.size and orc.stripe.size table properties

2016-04-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259451#comment-15259451
 ] 

Hive QA commented on HIVE-13563:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12800332/HIVE-13563.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 46 failed/errored test(s), 9935 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-auto_join1.q-vector_complex_join.q-vectorization_limit.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_decimal_2.q-explainuser_1.q-explainuser_3.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_varchar_4.q-smb_cache.q-tez_join_hash.q-and-8-more 
- did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nomore_ambiguous_table_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_regexp_extract
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern3
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern4
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_nonkey_groupby
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_selectDistinctStarNeg_2
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_subquery_shared_alias
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udtf_not_supported1
org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote.org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote
org.apache.hadoop.hive.metastore.TestFilterHooks.org.apache.hadoop.hive.metastore.TestFilterHooks
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testAddPartitions
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testFetchingPartitionsWithDifferentSchemas
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hadoop.hive.metastore.TestMetaStoreEndFunctionListener.testEndFunctionListener
org.apache.hadoop.hive.metastore.TestMetaStoreEventListenerOnlyOnCommit.testEventStatus
org.apache.hadoop.hive.metastore.TestMetaStoreInitListener.testMetaStoreInitListener
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.org.apache.hadoop.hive.metastore.TestMetaStoreMetrics
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAddPartitionWithUnicode
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAddPartitionWithValidPartVal
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithCommas
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithUnicode
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithValidCharacters
org.apache.hadoop.hive.metastore.TestRetryingHMSHandler.testRetryingHMSHandler
org.apache.hadoop.hive.ql.security.TestClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestFolderPermissions.org.apache.hadoop.hive.ql.security.TestFolderPermissions
org.apache.hadoop.hive.ql.security.TestMetastoreAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener.org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener
org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure
org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.testSaslWithHiveMetaStore
org.apache.hive.hcatalog.listener.TestDbNotificationListener.sqlInsertPartition
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.org.apache.hive.minikdc.TestJdbcWithDBTokenStore

[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-26 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259412#comment-15259412
 ] 

Rui Li commented on HIVE-13572:
---

Here's the time (in seconds) spent copying 183 files in my local test:
||W/O patch||Patch v1||Patch v2||
|16.36|3.6|0.72|

> Redundant setting full file status in Hive::copyFiles
> -
>
> Key: HIVE-13572
> URL: https://issues.apache.org/jira/browse/HIVE-13572
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-13572.1.patch, HIVE-13572.2.patch
>
>
> We set full file status in each copy-file thread. I think it's redundant and 
> hurts performance when we have multiple files to copy.
> {code}
> if (inheritPerms) {
>   ShimLoader.getHadoopShims().setFullFileStatus(conf, 
> fullDestStatus, destFs, destf);
> }
> {code}
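
For illustration only, a minimal sketch of that direction (an assumption, not the attached patch): perform the plain copies in the thread pool and apply the full file status once on the destination afterwards. The shim call and variable names follow the snippet above; the helper itself is hypothetical.
{code}
// Sketch only -- not the attached patch. Run the per-file copies in parallel
// without touching permissions, then set the full file status once on the
// destination directory after all copies finish (when inheritPerms is on).
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.shims.HadoopShims;
import org.apache.hadoop.hive.shims.ShimLoader;

public final class CopyFilesSketch {
  static void copyFiles(final Configuration conf, final FileSystem srcFs, List<Path> srcPaths,
      final FileSystem destFs, final Path destf, boolean inheritPerms,
      HadoopShims.HdfsFileStatus fullDestStatus, ExecutorService pool) throws Exception {
    List<Future<Void>> futures = new ArrayList<Future<Void>>();
    for (final Path srcP : srcPaths) {
      futures.add(pool.submit(new Callable<Void>() {
        @Override
        public Void call() throws Exception {
          // copy only; no per-file permission work inside the thread
          FileUtil.copy(srcFs, srcP, destFs, destf, false, true, conf);
          return null;
        }
      }));
    }
    for (Future<Void> f : futures) {
      f.get(); // surface any copy failure
    }
    if (inheritPerms) {
      // applied once for the destination instead of once per copied file
      ShimLoader.getHadoopShims().setFullFileStatus(conf, fullDestStatus, destFs, destf);
    }
  }
}
{code}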



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13343) Need to disable hybrid grace hash join in llap mode except for dynamically partitioned hash join

2016-04-26 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259356#comment-15259356
 ] 

Vikram Dixit K commented on HIVE-13343:
---

Addressed comments. Created RB.

> Need to disable hybrid grace hash join in llap mode except for dynamically 
> partitioned hash join
> 
>
> Key: HIVE-13343
> URL: https://issues.apache.org/jira/browse/HIVE-13343
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-13343.1.patch, HIVE-13343.2.patch, 
> HIVE-13343.3.patch
>
>
> For performance reasons, we should disable the use of hybrid grace hash join 
> in llap when dynamic partition hash join is not used. With dynamic partition 
> hash join, we need hybrid grace hash join due to the possibility of skews.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13610) Hive exec module won't compile with IBM JDK

2016-04-26 Thread Pan Yuxuan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259351#comment-15259351
 ] 

Pan Yuxuan commented on HIVE-13610:
---

[~sershe]
Updated the patch. Please take a look; thanks for your kind review.

> Hive exec module won't compile with IBM JDK
> ---
>
> Key: HIVE-13610
> URL: https://issues.apache.org/jira/browse/HIVE-13610
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
> Environment: Hive 1.1.0 + IBM JDK 1.7 +ppc64 architecture
>Reporter: Pan Yuxuan
>Assignee: Pan Yuxuan
> Attachments: HIVE-13610.1.patch, HIVE-13610.patch
>
>
> org.apache.hadoop.hive.ql.debug.Utils explicitly imports 
> com.sun.management.HotSpotDiagnosticMXBean, which is not supported by the IBM JDK.
> So we can make HotSpotDiagnosticMXBean a runtime dependency rather than a compile-time one.
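
For illustration, one common way to keep such a class compiling on any JDK is to reach the HotSpotDiagnostic bean through the platform MBeanServer by name. This is only a sketch under the assumption that the MXBean is used to trigger heap dumps; the attached patch may take a different approach (e.g. reflection).
{code}
// Illustrative sketch (not necessarily the attached patch): trigger a heap dump
// through the platform MBeanServer by name, so the class compiles without a
// com.sun.management import and only fails at runtime on JVMs lacking the bean.
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public final class HeapDumpSketch {
  private static final String HOTSPOT_BEAN_NAME = "com.sun.management:type=HotSpotDiagnostic";

  public static void dumpHeap(String outputFile, boolean liveObjectsOnly) throws Exception {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    // On the IBM JDK this bean is absent and invoke() throws, but the code still compiles.
    server.invoke(new ObjectName(HOTSPOT_BEAN_NAME), "dumpHeap",
        new Object[] { outputFile, Boolean.valueOf(liveObjectsOnly) },
        new String[] { "java.lang.String", "boolean" });
  }
}
{code}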



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13610) Hive exec module won't compile with IBM JDK

2016-04-26 Thread Pan Yuxuan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pan Yuxuan updated HIVE-13610:
--
Attachment: HIVE-13610.1.patch

> Hive exec module won't compile with IBM JDK
> ---
>
> Key: HIVE-13610
> URL: https://issues.apache.org/jira/browse/HIVE-13610
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
> Environment: Hive 1.1.0 + IBM JDK 1.7 +ppc64 architecture
>Reporter: Pan Yuxuan
>Assignee: Pan Yuxuan
> Attachments: HIVE-13610.1.patch, HIVE-13610.patch
>
>
> org.apache.hadoop.hive.ql.debug.Utils explicitly imports 
> com.sun.management.HotSpotDiagnosticMXBean, which is not supported by the IBM JDK.
> So we can make HotSpotDiagnosticMXBean a runtime dependency rather than a compile-time one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13343) Need to disable hybrid grace hash join in llap mode except for dynamically partitioned hash join

2016-04-26 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-13343:
--
Attachment: HIVE-13343.3.patch

> Need to disable hybrid grace hash join in llap mode except for dynamically 
> partitioned hash join
> 
>
> Key: HIVE-13343
> URL: https://issues.apache.org/jira/browse/HIVE-13343
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-13343.1.patch, HIVE-13343.2.patch, 
> HIVE-13343.3.patch
>
>
> For performance reasons, we should disable the use of hybrid grace hash join 
> in llap when dynamic partition hash join is not used. With dynamic partition 
> hash join, we need hybrid grace hash join due to the possibility of skews.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259315#comment-15259315
 ] 

Ashutosh Chauhan commented on HIVE-13572:
-

I see. Do you have numbers on how these two approaches compare?  Just trying 
to figure out how much improvement we are getting.

> Redundant setting full file status in Hive::copyFiles
> -
>
> Key: HIVE-13572
> URL: https://issues.apache.org/jira/browse/HIVE-13572
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-13572.1.patch, HIVE-13572.2.patch
>
>
> We set full file status in each copy-file thread. I think it's redundant and 
> hurts performance when we have multiple files to copy.
> {code}
> if (inheritPerms) {
>   ShimLoader.getHadoopShims().setFullFileStatus(conf, 
> fullDestStatus, destFs, destf);
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-26 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259306#comment-15259306
 ] 

Rui Li commented on HIVE-13572:
---

Just created RB for the v2 patch. The changes to SessionState and the shim are meant to 
push more work to the thread pool and make the synchronization more efficient; they are 
not bug fixes.

> Redundant setting full file status in Hive::copyFiles
> -
>
> Key: HIVE-13572
> URL: https://issues.apache.org/jira/browse/HIVE-13572
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-13572.1.patch, HIVE-13572.2.patch
>
>
> We set full file status in each copy-file thread. I think it's redundant and 
> hurts performance when we have multiple files to copy.
> {code}
> if (inheritPerms) {
>   ShimLoader.getHadoopShims().setFullFileStatus(conf, 
> fullDestStatus, destFs, destf);
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13395) Lost Update problem in ACID

2016-04-26 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13395:
--
Attachment: HIVE-13395.11.patch

Patch 11 (not final) reworks the implementation to get WriteSet information 
from the TxnHandler.addDynamicPartitions() call rather than from lock information (where 
applicable).  addDynamicPartitions() knows exactly which partitions have been written to, 
which increases concurrency dramatically.
For example, take "update T set a = 7 where b = 17".  Suppose b is not a partition 
column, the table has 10K partitions, and only 5 partitions match "b=17".  
We currently lock all the existing partitions, but the true WriteSet is the 
5 partitions actually modified.

Further optimizations in HIVE-13622 are useful but not absolutely required.
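
To make the write-set bookkeeping concrete, the following is a schematic sketch of the first-committer-wins check this issue describes: abort a committing transaction if any transaction that committed after it started wrote to one of the same partitions. The class and collection names are invented for illustration; they are not Hive's metastore schema or the attached patch.
{code}
// Schematic sketch of first-committer-wins over partition-level write sets.
// WriteSetEntry and the collections here are illustrative only.
import java.util.Collections;
import java.util.List;
import java.util.Set;

public final class WriteSetSketch {
  static final class WriteSetEntry {
    final long commitTime;          // logical counter from the same sequence as txn ids
    final Set<String> partitions;   // partitions the committed txn actually wrote
    WriteSetEntry(long commitTime, Set<String> partitions) {
      this.commitTime = commitTime;
      this.partitions = partitions;
    }
  }

  /** Returns true if the txn that started at startTime and wrote myWriteSet may commit. */
  static boolean canCommit(long startTime, Set<String> myWriteSet,
      List<WriteSetEntry> committedWriteSets) {
    for (WriteSetEntry e : committedWriteSets) {
      boolean overlapsInTime = e.commitTime > startTime;    // committed after we started
      boolean overlapsInData = !Collections.disjoint(e.partitions, myWriteSet);
      if (overlapsInTime && overlapsInData) {
        return false; // the earlier committer wins; abort this txn
      }
    }
    return true;
  }
}
{code}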

> Lost Update problem in ACID
> ---
>
> Key: HIVE-13395
> URL: https://issues.apache.org/jira/browse/HIVE-13395
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13395.11.patch, HIVE-13395.6.patch, 
> HIVE-13395.7.patch, HIVE-13395.8.patch
>
>
> ACID users can run into the Lost Update problem.
> In Hive 1.2, Driver.recordValidTxns() (which records the snapshot to use for 
> the query) is called in Driver.compile().
> Now suppose two concurrent "update T set x = x + 1" statements are executed.  (for 
> simplicity assume there is exactly 1 row in T)
> What can happen is that both compile at the same time (more precisely before 
> acquireLocksAndOpenTxn() in runInternal() is called) and thus will lock in 
> the same snapshot, say the value of x = 7 in this snapshot.
> Now 1 will get the lock on the row, the second will block.  
> Now 1, makes x = 8 and commits.
> Now 2 proceeds and makes x = 8 again since in its snapshot x is still 7.
> This specific issue is solved in Hive 1.3/2.0 (HIVE-11077 which is a large 
> patch that deals with multi-statement txns) by moving recordValidTxns() after 
> locks are acquired which reduces the likelihood of this but doesn't eliminate 
> the problem.
> 
> Even in the 1.3 version of the code, you could have the same issue.  Assume the 
> same 2 queries:
> Both start a txn, say txnid 9 and 10.  Say 10 gets the lock first, 9 blocks.
> 10 updates the row (so x = 8) and thus ReaderKey.currentTransactionId=10.
> 10 commits.
> Now 9 can proceed and it will get a snapshot that includes 10, i.e. it will 
> see x = 8 and it will write x = 9, but it will set 
> ReaderKey.currentTransactionId = 9.  Thus when merge logic runs, it will see 
> x = 8 is the later version of this row, i.e. lost update.
> The problem is that locks alone are insufficient for MVCC architecture.  
> 
> At a lower level, Row ID has (originalTransactionId, rowid, bucket id, 
> currentTransactionId) and since on update/delete we do a table scan, we could 
> check that we are about to write a row with currentTransactionId < 
> (currentTransactionId of row we've read) and fail the query.  Currently, 
> currentTransactionId is not surfaced at higher level where this check can be 
> made.
> This would not work (efficiently) longer term where we want to support fast 
> update on user-defined PK via streaming ingest.
> Also, this would not work with multi statement txns since in that case we'd 
> lock in the snapshot at the start of the txn, but then 2nd, 3rd etc queries 
> would use the same snapshot and the locks for these queries would be acquired 
> after the snapshot is locked in so this would be the same situation as pre 
> HIVE-11077.
> 
>  
> A more robust solution (commonly used with MVCC) is to keep track of start 
> and commit time (logical counter) of each transaction to detect if two txns 
> overlap.  The 2nd part is to keep track of write-set, i.e. which data (rows, 
> partitions, whatever appropriate level of granularity is) were modified by 
> any txn and if 2 txns overlap in time and wrote the same element, abort the later 
> one.  This is called the first-committer-wins rule.  This requires a MS DB schema 
> change
> It would be most convenient to use the same sequence for txnId, start and 
> commit time (in which case txnid=start time).  In this case we'd need to add 
> 1 field to the TXNS table.  The complication here is that we'll be using elements 
> of the sequence faster and they are used as part of file name of delta and 
> base dir and currently limited to 7 digits which can be exceeded.  So this 
> would require some thought to handling upgrade/migration.
> Also, write-set tracking requires either additional metastore table or 
> keeping info in HIVE_LOCKS around longer with new state.
> 
> In the short term, on SQL side of things we could (in auto commit mode only)
> acquire the locks first 

[jira] [Updated] (HIVE-13622) WriteSet tracking optimizations

2016-04-26 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13622:
--
Description: 
HIVE-13395 solves the lost update problem with some inefficiencies.

1. TxnHandler.OperationType is currently derived from LockType.  This doesn't 
distinguish between Update and Delete, but that distinction would be useful.  See comments in 
TxnHandler.  Should be able to pass in Insert/Update/Delete info from the client 
into TxnHandler.
2. TxnHandler.addDynamicPartitions() should know the OperationType as well from 
the client.  It currently extrapolates it from TXN_COMPONENTS.  This works but 
requires extra SQL statements and is thus less performant.  It will not work for 
multi-stmt txns.  See comments in the code.
3. TxnHandler.checkLock(): see more comments around 
"isPartOfDynamicPartitionInsert".  If TxnHandler knew whether it is being 
called as part of an op running with dynamic partitions, it could be more 
efficient.  In that case we don't have to write to TXN_COMPONENTS at all during 
lock acquisition.  Conversely, if not running with DynPart, we can kill the 
current txn on lock grant rather than wait until commit time.

All of these require some Thrift changes.

Once done, re-enable TestDbTxnHandler2.testWriteSetTracking11()

  was:
HIVE-13395 solves the lost update problem with some inefficiencies.

1. TxnHandler.OperationType is currently derived from LockType.  This doesn't 
distinguish between Update and Delete, but that distinction would be useful.  See comments in 
TxnHandler.  Should be able to pass in Insert/Update/Delete info from the client 
into TxnHandler.
2. TxnHandler.addDynamicPartitions() should know the OperationType as well from 
the client.  It currently extrapolates it from TXN_COMPONENTS.  This works but 
requires extra SQL statements and is thus less performant.  It will not work for 
multi-stmt txns.  See comments in the code.
3. TxnHandler.checkLock(): see more comments around 
"isPartOfDynamicPartitionInsert".  If TxnHandler knew whether it is being 
called as part of an op running with dynamic partitions, it could be more 
efficient.  In that case we don't have to write to TXN_COMPONENTS at all during 
lock acquisition.  Conversely, if not running with DynPart, we can kill the 
current txn on lock grant rather than wait until commit time.

All of these require some Thrift changes.


> WriteSet tracking optimizations
> ---
>
> Key: HIVE-13622
> URL: https://issues.apache.org/jira/browse/HIVE-13622
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> HIVE-13395 solves the lost update problem with some inefficiencies.
> 1. TxnHandler.OperationType is currently derived from LockType.  This doesn't 
>  distinguish between Update and Delete, but that distinction would be useful.  See comments in 
> TxnHandler.  Should be able to pass in Insert/Update/Delete info from the client 
> into TxnHandler.
> 2. TxnHandler.addDynamicPartitions() should know the OperationType as well 
> from the client.  It currently extrapolates it from TXN_COMPONENTS.  This 
> works but requires extra SQL statements and is thus less performant.  It will 
> not work for multi-stmt txns.  See comments in the code.
> 3. TxnHandler.checkLock(): see more comments around 
> "isPartOfDynamicPartitionInsert".  If TxnHandler knew whether it is being 
> called as part of an op running with dynamic partitions, it could be more 
> efficient.  In that case we don't have to write to TXN_COMPONENTS at all 
> during lock acquisition.  Conversely, if not running with DynPart, we 
> can kill the current txn on lock grant rather than wait until commit time.
> All of these require some Thrift changes.
> Once done, re-enable TestDbTxnHandler2.testWriteSetTracking11()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9675) Support START TRANSACTION/COMMIT/ROLLBACK commands

2016-04-26 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259292#comment-15259292
 ] 

Eugene Koifman commented on HIVE-9675:
--

make sure to fix HIVE-13622 first (the part about knowing in TxnHandler if the 
query is using Dynamic Partitions)

> Support START TRANSACTION/COMMIT/ROLLBACK commands
> --
>
> Key: HIVE-9675
> URL: https://issues.apache.org/jira/browse/HIVE-9675
> Project: Hive
>  Issue Type: New Feature
>  Components: SQL, Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> Hive 0.14 added support for insert/update/delete statements with ACID 
> semantics.  Hive 0.14 only supports auto-commit mode.  We need to add support 
> for START TRANSACTION/COMMIT/ROLLBACK commands so that the user can demarcate 
> transaction boundaries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13395) Lost Update problem in ACID

2016-04-26 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259289#comment-15259289
 ] 

Eugene Koifman commented on HIVE-13395:
---

HIVE-13622 covers some optimizations for this

> Lost Update problem in ACID
> ---
>
> Key: HIVE-13395
> URL: https://issues.apache.org/jira/browse/HIVE-13395
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13395.6.patch, HIVE-13395.7.patch, 
> HIVE-13395.8.patch
>
>
> ACID users can run into the Lost Update problem.
> In Hive 1.2, Driver.recordValidTxns() (which records the snapshot to use for 
> the query) is called in Driver.compile().
> Now suppose two concurrent "update T set x = x + 1" statements are executed.  (for 
> simplicity assume there is exactly 1 row in T)
> What can happen is that both compile at the same time (more precisely before 
> acquireLocksAndOpenTxn() in runInternal() is called) and thus will lock in 
> the same snapshot, say the value of x = 7 in this snapshot.
> Now 1 will get the lock on the row, the second will block.  
> Now 1, makes x = 8 and commits.
> Now 2 proceeds and makes x = 8 again since in its snapshot x is still 7.
> This specific issue is solved in Hive 1.3/2.0 (HIVE-11077 which is a large 
> patch that deals with multi-statement txns) by moving recordValidTxns() after 
> locks are acquired which reduces the likelihood of this but doesn't eliminate 
> the problem.
> 
> Even in the 1.3 version of the code, you could have the same issue.  Assume the 
> same 2 queries:
> Both start a txn, say txnid 9 and 10.  Say 10 gets the lock first, 9 blocks.
> 10 updates the row (so x = 8) and thus ReaderKey.currentTransactionId=10.
> 10 commits.
> Now 9 can proceed and it will get a snapshot that includes 10, i.e. it will 
> see x = 8 and it will write x = 9, but it will set 
> ReaderKey.currentTransactionId = 9.  Thus when merge logic runs, it will see 
> x = 8 is the later version of this row, i.e. lost update.
> The problem is that locks alone are insufficient for MVCC architecture.  
> 
> At a lower level, Row ID has (originalTransactionId, rowid, bucket id, 
> currentTransactionId) and since on update/delete we do a table scan, we could 
> check that we are about to write a row with currentTransactionId < 
> (currentTransactionId of row we've read) and fail the query.  Currently, 
> currentTransactionId is not surfaced at higher level where this check can be 
> made.
> This would not work (efficiently) longer term where we want to support fast 
> update on user-defined PK via streaming ingest.
> Also, this would not work with multi statement txns since in that case we'd 
> lock in the snapshot at the start of the txn, but then 2nd, 3rd etc queries 
> would use the same snapshot and the locks for these queries would be acquired 
> after the snapshot is locked in so this would be the same situation as pre 
> HIVE-11077.
> 
>  
> A more robust solution (commonly used with MVCC) is to keep track of start 
> and commit time (logical counter) of each transaction to detect if two txns 
> overlap.  The 2nd part is to keep track of write-set, i.e. which data (rows, 
> partitions, whatever appropriate level of granularity is) were modified by 
> any txn and if 2 txns overlap in time and wrote the same element, abort the later 
> one.  This is called the first-committer-wins rule.  This requires a MS DB schema 
> change
> It would be most convenient to use the same sequence for txnId, start and 
> commit time (in which case txnid=start time).  In this case we'd need to add 
> 1 field to the TXNS table.  The complication here is that we'll be using elements 
> of the sequence faster and they are used as part of file name of delta and 
> base dir and currently limited to 7 digits which can be exceeded.  So this 
> would require some thought to handling upgrade/migration.
> Also, write-set tracking requires either additional metastore table or 
> keeping info in HIVE_LOCKS around longer with new state.
> 
> In the short term, on SQL side of things we could (in auto commit mode only)
> acquire the locks first and then open the txn AND update these locks with txn 
> id.
> This implies another Thrift change to pass in lockId to openTxn.
> The same would not work for Streaming API since it opens several txns at once 
> and then acquires locks for each.
> (Not sure if that's an issue or not since Streaming only does Insert).
> Either way this feels hacky.
> 
> Here is one simple example why we need Write-Set tracking for multi-statement 
> txns
> Consider transactions T ~1~ and T ~2~:
> T ~1~: r ~1~\[x] -> w ~1~\[y] -> c ~1~ 
> T ~2~: w ~2~\[x] -> w ~2~\[y] -> c ~2~  
> Suppose the order of operations 

[jira] [Updated] (HIVE-13622) WriteSet tracking optimizations

2016-04-26 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13622:
--
Description: 
HIVE-13395 solves the lost update problem with some inefficiencies.

1. TxnHandler.OperationType is currently derived from LockType.  This doesn't 
distinguish between Update and Delete, but that distinction would be useful.  See comments in 
TxnHandler.  Should be able to pass in Insert/Update/Delete info from the client 
into TxnHandler.
2. TxnHandler.addDynamicPartitions() should know the OperationType as well from 
the client.  It currently extrapolates it from TXN_COMPONENTS.  This works but 
requires extra SQL statements and is thus less performant.  It will not work for 
multi-stmt txns.  See comments in the code.
3. TxnHandler.checkLock(): see more comments around 
"isPartOfDynamicPartitionInsert".  If TxnHandler knew whether it is being 
called as part of an op running with dynamic partitions, it could be more 
efficient.  In that case we don't have to write to TXN_COMPONENTS at all during 
lock acquisition.  Conversely, if not running with DynPart, we can kill the 
current txn on lock grant rather than wait until commit time.

All of these require some Thrift changes.

  was:
HIVE-13395 solves the lost update problem with some inefficiencies.

1. TxnHandler.OperationType is currently derived from LockType.  This doesn't 
distinguish between Update and Delete, but that distinction would be useful.  See comments in 
TxnHandler.  Should be able to pass in Insert/Update/Delete info from the client 
into TxnHandler.
2. TxnHandler.addDynamicPartitions() should know the OperationType as well from 
the client.  It currently extrapolates it from TXN_COMPONENTS.  This works but 
requires extra SQL statements and is thus less performant.  It will not work for 
multi-stmt txns.  See comments in the code.
3. TxnHandler.checkLock(): see more comments around 
"isPartOfDynamicPartitionInsert".  If TxnHandler knew whether it is being 
called as part of an op running with dynamic partitions, it could be more 
efficient.  In that case we don't have to write to TXN_COMPONENTS at all during 
lock acquisition.  Conversely, if not running with DynPart, we can kill the 
current txn on lock grant rather than wait until commit time.


> WriteSet tracking optimizations
> ---
>
> Key: HIVE-13622
> URL: https://issues.apache.org/jira/browse/HIVE-13622
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> HIVE-13395 solves the lost update problem with some inefficiencies.
> 1. TxnHandler.OperationType is currently derived from LockType.  This doesn't 
>  distinguish between Update and Delete, but that distinction would be useful.  See comments in 
> TxnHandler.  Should be able to pass in Insert/Update/Delete info from the client 
> into TxnHandler.
> 2. TxnHandler.addDynamicPartitions() should know the OperationType as well 
> from the client.  It currently extrapolates it from TXN_COMPONENTS.  This 
> works but requires extra SQL statements and is thus less performant.  It will 
> not work for multi-stmt txns.  See comments in the code.
> 3. TxnHandler.checkLock(): see more comments around 
> "isPartOfDynamicPartitionInsert".  If TxnHandler knew whether it is being 
> called as part of an op running with dynamic partitions, it could be more 
> efficient.  In that case we don't have to write to TXN_COMPONENTS at all 
> during lock acquisition.  Conversely, if not running with DynPart, we 
> can kill the current txn on lock grant rather than wait until commit time.
> All of these require some Thrift changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13621) compute stats in certain cases fails with NPE

2016-04-26 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-13621:
--
Attachment: HIVE-13621.1.patch

> compute stats in certain cases fails with NPE
> -
>
> Key: HIVE-13621
> URL: https://issues.apache.org/jira/browse/HIVE-13621
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Metastore, Metastore
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-13621.1.patch
>
>
> {code}
> FAILED: NullPointerException null
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatistics(StatsUtils.java:693)
>   at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.convertColStats(StatsUtils.java:739)
>   at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:728)
>   at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:183)
>   at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:136)
>   at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:124){code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13621) compute stats in certain cases fails with NPE

2016-04-26 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-13621:
--
Status: Patch Available  (was: Open)

> compute stats in certain cases fails with NPE
> -
>
> Key: HIVE-13621
> URL: https://issues.apache.org/jira/browse/HIVE-13621
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Metastore, Metastore
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-13621.1.patch
>
>
> {code}
> FAILED: NullPointerException null
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatistics(StatsUtils.java:693)
>   at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.convertColStats(StatsUtils.java:739)
>   at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:728)
>   at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:183)
>   at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:136)
>   at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:124){code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13620) Merge llap branch work to master

2016-04-26 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-13620:
--
Attachment: llap_master_diff.txt

Attaching diff of llap branch with master.

> Merge llap branch work to master
> 
>
> Key: HIVE-13620
> URL: https://issues.apache.org/jira/browse/HIVE-13620
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: llap_master_diff.txt
>
>
> Would like to try to merge the llap branch work for HIVE-12991 into the 
> master branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13563) Hive Streaming does not honor orc.compress.size and orc.stripe.size table properties

2016-04-26 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259235#comment-15259235
 ] 

Wei Zheng commented on HIVE-13563:
--

ping [~owen.omalley]..

> Hive Streaming does not honor orc.compress.size and orc.stripe.size table 
> properties
> 
>
> Key: HIVE-13563
> URL: https://issues.apache.org/jira/browse/HIVE-13563
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>  Labels: TODOC2.1
> Attachments: HIVE-13563.1.patch
>
>
> According to the doc:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-HiveQLSyntax
> One should be able to specify tblproperties for many ORC options.
> But the settings for orc.compress.size and orc.stripe.size don't take effect.
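
As a sketch of the expected behavior only (not the attached patch), the streaming side would need to read those two table properties and feed them into the ORC writer options rather than relying on the global defaults. The helper below is hypothetical; only the property names and the WriterOptions calls come from the issue and the ORC writer API.
{code}
// Sketch of the expected behavior (not the attached patch): pass orc.compress.size
// and orc.stripe.size from the table properties into the ORC WriterOptions.
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;

public final class OrcStreamingOptionsSketch {
  static OrcFile.WriterOptions fromTableProperties(Configuration conf, Properties tblProps) {
    OrcFile.WriterOptions opts = OrcFile.writerOptions(conf);
    String stripeSize = tblProps.getProperty("orc.stripe.size");
    if (stripeSize != null) {
      opts.stripeSize(Long.parseLong(stripeSize));      // honor orc.stripe.size
    }
    String compressSize = tblProps.getProperty("orc.compress.size");
    if (compressSize != null) {
      opts.bufferSize(Integer.parseInt(compressSize));  // honor orc.compress.size
    }
    return opts;
  }
}
{code}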



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13568) Add UDFs to support column-masking

2016-04-26 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259229#comment-15259229
 ] 

Gunther Hagleitner commented on HIVE-13568:
---

Some more comments on RB. Getting closer. I've looked through the test failures 
above. They look unrelated.

> Add UDFs to support column-masking
> --
>
> Key: HIVE-13568
> URL: https://issues.apache.org/jira/browse/HIVE-13568
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Madhan Neethiraj
>Assignee: Madhan Neethiraj
> Attachments: HIVE-13568.1.patch, HIVE-13568.1.patch, 
> HIVE-13568.2.patch
>
>
> HIVE-13125 added support to provide column-masking and row-filtering during 
> select via the HiveAuthorizer interface. This JIRA is to track the addition of UDFs that 
> can be used by HiveAuthorizer implementations to mask column values.
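
For a rough idea of what such a function can look like (illustrative only; the UDFs in the attached patches will differ), here is a trivial masking UDF written against Hive's simple UDF API, keeping the last 4 characters and masking the rest:
{code}
// Illustrative only -- the UDFs added by HIVE-13568 will differ.
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

@Description(name = "mask_show_last4_sketch",
    value = "_FUNC_(str) - masks all but the last 4 characters of str")
public class UDFMaskShowLast4Sketch extends UDF {
  public Text evaluate(Text value) {
    if (value == null) {
      return null;
    }
    String s = value.toString();
    StringBuilder masked = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      masked.append(i < s.length() - 4 ? 'x' : s.charAt(i));
    }
    return new Text(masked.toString());
  }
}
{code}
A HiveAuthorizer implementation could then rewrite a protected column reference into a call to a function like this during query compilation.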



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC

2016-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-9660:
---
Attachment: HIVE-9660.11.patch

Rebased again, removed the writer version (which was also the cause of some 
test failures)

> store end offset of compressed data for RG in RowIndex in ORC
> -
>
> Key: HIVE-9660
> URL: https://issues.apache.org/jira/browse/HIVE-9660
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9660.01.patch, HIVE-9660.02.patch, 
> HIVE-9660.03.patch, HIVE-9660.04.patch, HIVE-9660.05.patch, 
> HIVE-9660.06.patch, HIVE-9660.07.patch, HIVE-9660.07.patch, 
> HIVE-9660.08.patch, HIVE-9660.09.patch, HIVE-9660.10.patch, 
> HIVE-9660.10.patch, HIVE-9660.11.patch, HIVE-9660.patch, HIVE-9660.patch
>
>
> Right now the end offset is estimated, which in some cases results in tons of 
> extra data being read.
> We can add a separate array to RowIndex (positions_v2?) that stores number of 
> compressed buffers for each RG, or end offset, or something, to remove this 
> estimation magic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12878) Support Vectorization for TEXTFILE and other formats

2016-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259218#comment-15259218
 ] 

Sergey Shelukhin commented on HIVE-12878:
-

Left some more comments on RB

> Support Vectorization for TEXTFILE and other formats
> 
>
> Key: HIVE-12878
> URL: https://issues.apache.org/jira/browse/HIVE-12878
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12878.01.patch, HIVE-12878.02.patch, 
> HIVE-12878.03.patch, HIVE-12878.04.patch, HIVE-12878.05.patch, 
> HIVE-12878.06.patch, HIVE-12878.07.patch, HIVE-12878.08.patch, 
> HIVE-12878.09.patch
>
>
> Support vectorizing when the input format is TEXTFILE and other formats for 
> better Map Vertex performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13560) Adding Omid as connection manager for HBase Metastore

2016-04-26 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259212#comment-15259212
 ] 

Alan Gates commented on HIVE-13560:
---

In conf/tez/hive-site.xml did you mean to stomp the 
hive.orc.splits.ms.footer.cache.enabled value?

Why take out the option to test Tephra, since we haven't taken out the Tephra 
connector?

HBaseStore.java, line 451, why did you change the catch from IOException to 
Exception?  I can't see any other changes in the code that should require this.

I don't understand why you removed the transactions from getPartitionsByExpr.

I think removing the transactions around the get/putFileMetadata is fine, but 
we should explicitly comment that these operations are outside of the 
transactions and why.

It's not clear to me that you need OmidHBaseConnection.transaction to be a 
thread local variable.  HBaseReadWrite is already a thread local in HBaseStore, 
so you should be guaranteed that there's an OmidHBaseConnection per thread.
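
(For reference, a generic sketch of the thread-local pattern being discussed; the names are stand-ins, not Hive's code: if the per-thread holder is itself a ThreadLocal, anything it owns is already per-thread and needs no extra ThreadLocal of its own.)
{code}
// Generic sketch only. Holder stands in for HBaseReadWrite, Connection for
// OmidHBaseConnection; one Holder (and thus one Connection) exists per thread.
public final class PerThreadHolderSketch {
  private static final ThreadLocal<Holder> HOLDER = new ThreadLocal<Holder>() {
    @Override
    protected Holder initialValue() {
      return new Holder(new Connection());
    }
  };

  static final class Connection { }

  static final class Holder {
    private final Connection conn;   // plain field: already per-thread via HOLDER
    Holder(Connection conn) { this.conn = conn; }
    Connection connection() { return conn; }
  }

  static Connection currentConnection() {
    return HOLDER.get().connection();
  }
}
{code}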





> Adding Omid as connection manager for HBase Metastore
> -
>
> Key: HIVE-13560
> URL: https://issues.apache.org/jira/browse/HIVE-13560
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-13560.1.patch, HIVE-13560.2.patch, 
> HIVE-13560.3.patch
>
>
> Adding Omid as a transaction manager to HBase Metastore. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10176) skip.header.line.count causes values to be skipped when performing insert values

2016-04-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259209#comment-15259209
 ] 

Hive QA commented on HIVE-10176:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12800836/HIVE-10176.14.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/91/testReport
Console output: 
http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/91/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-91/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[INFO] 
[INFO] 
[INFO] Building Spark Remote Client 2.1.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ spark-client ---
[INFO] Deleting 
/data/hive-ptest/working/apache-github-source-source/spark-client/target
[INFO] Deleting 
/data/hive-ptest/working/apache-github-source-source/spark-client (includes = 
[datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ 
spark-client ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ spark-client 
---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ 
spark-client ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-github-source-source/spark-client/src/main/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ spark-client ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ spark-client 
---
[INFO] Compiling 28 source files to 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/classes
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java:
 
/data/hive-ptest/working/apache-github-source-source/spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java
 uses or overrides a deprecated API.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java:
 Recompile with -Xlint:deprecation for details.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcDispatcher.java:
 Some input files use unchecked or unsafe operations.
[WARNING] 
/data/hive-ptest/working/apache-github-source-source/spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcDispatcher.java:
 Recompile with -Xlint:unchecked for details.
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
spark-client ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ spark-client ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/tmp/conf
 [copy] Copying 15 files to 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
spark-client ---
[INFO] Compiling 5 source files to 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/test-classes
[INFO] 
[INFO] --- maven-dependency-plugin:2.8:copy (copy-guava-14) @ spark-client ---
[INFO] Configured Artifact: com.google.guava:guava:14.0.1:jar
[INFO] Copying guava-14.0.1.jar to 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/dependency/guava-14.0.1.jar
[INFO] 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ spark-client ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ spark-client ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-github-source-source/spark-client/target/spark-client-2.1.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
spark-client ---
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ spark-client ---

[jira] [Commented] (HIVE-13421) Propagate job progress in operation status

2016-04-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259206#comment-15259206
 ] 

Hive QA commented on HIVE-13421:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12800477/HIVE-13421.05.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/89/testReport
Console output: 
http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/89/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-89/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-MASTER-Build-89/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   af4766d..a3502d0  branch-2.0 -> origin/branch-2.0
   1548501..815499a  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 1548501 HIVE-13541: Pass view's ColumnAccessInfo to 
HiveAuthorizer (Pengcheng Xiong, reviewed by Ashutosh Chauhan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 3 commits, and can be fast-forwarded.
+ git reset --hard origin/master
HEAD is now at 815499a HIVE-13585: Add counter metric for direct sql failures 
(Mohit Sabharwal, reviewed by Aihua Xu, Sergey Shelukhin)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12800477 - PreCommit-HIVE-MASTER-Build

> Propagate job progress in operation status
> --
>
> Key: HIVE-13421
> URL: https://issues.apache.org/jira/browse/HIVE-13421
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Fix For: 2.1.0
>
> Attachments: HIVE-13421.01.patch, HIVE-13421.02.patch, 
> HIVE-13421.03.patch, HIVE-13421.04.patch, HIVE-13421.05.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13603) Fix ptest unit tests broken by HIVE13505

2016-04-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259204#comment-15259204
 ] 

Hive QA commented on HIVE-13603:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12800459/HIVE-13603.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 81 failed/errored test(s), 9953 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nomore_ambiguous_table_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_regexp_extract
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge9
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge_diff_fs
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join5
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern3
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern4
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_nonkey_groupby
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_selectDistinctStarNeg_2
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_subquery_shared_alias
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udtf_not_supported1
org.apache.hadoop.hive.llap.daemon.impl.comparator.TestShortestJobFirstComparator.testWaitQueueComparatorWithinDagPriority
org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testForcedLocalityPreemption
org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote.org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote
org.apache.hadoop.hive.metastore.TestFilterHooks.org.apache.hadoop.hive.metastore.TestFilterHooks
org.apache.hadoop.hive.metastore.TestMetaStoreEndFunctionListener.testEndFunctionListener
org.apache.hadoop.hive.metastore.TestMetaStoreEventListenerOnlyOnCommit.testEventStatus
org.apache.hadoop.hive.metastore.TestMetaStoreInitListener.testMetaStoreInitListener
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.org.apache.hadoop.hive.metastore.TestMetaStoreMetrics
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAddPartitionWithUnicode
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAddPartitionWithValidPartVal
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithCommas
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithUnicode
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithValidCharacters
org.apache.hadoop.hive.metastore.TestRetryingHMSHandler.testRetryingHMSHandler
org.apache.hadoop.hive.ql.security.TestClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestExtendedAcls.org.apache.hadoop.hive.ql.security.TestExtendedAcls
org.apache.hadoop.hive.ql.security.TestFolderPermissions.org.apache.hadoop.hive.ql.security.TestFolderPermissions
org.apache.hadoop.hive.ql.security.TestMetastoreAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener.org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener

[jira] [Commented] (HIVE-13615) nomore_ambiguous_table_col.q is failing on master

2016-04-26 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259199#comment-15259199
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-13615:
--

I think this is a more generic issue that got exposed by the recent parser 
changes in HIVE-13290. In general, if we use any of the nonreserved keywords 
in IdentifiersParser.g as an alias in statements like the ones below, we will run 
into an error:

{code}
FROM src rely 
INSERT OVERWRITE TABLE ambiguous SELECT rely.key, rely.value WHERE rely.value < 
'val_100';

FROM src key
INSERT OVERWRITE TABLE ambiguous SELECT key.key, key.value WHERE key.value < 
'val_100';

 FROM src uri
 INSERT OVERWRITE TABLE ambiguous SELECT uri.key, uri.value WHERE uri.value < 
'val_100';
{code}

> nomore_ambiguous_table_col.q is failing on master
> -
>
> Key: HIVE-13615
> URL: https://issues.apache.org/jira/browse/HIVE-13615
> Project: Hive
>  Issue Type: Test
>  Components: Parser
>Affects Versions: 2.1.0
>Reporter: Ashutosh Chauhan
>
> Fails with:
> FAILED: ParseException line 3:9 cannot recognize input near 'src' 'key' 
> 'INSERT' in from source 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13585) Add counter metric for direct sql failures

2016-04-26 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-13585:

   Resolution: Fixed
Fix Version/s: 2.1.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks Mohit for the work.

> Add counter metric for direct sql failures
> --
>
> Key: HIVE-13585
> URL: https://issues.apache.org/jira/browse/HIVE-13585
> Project: Hive
>  Issue Type: Bug
>Reporter: Mohit Sabharwal
>Assignee: Mohit Sabharwal
> Fix For: 2.1.0
>
> Attachments: HIVE-13585.patch
>
>
> In case of a direct SQL failure, the metastore query falls back to DataNucleus. 
> It'd be good to record how often this happens as a metrics counter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12887) Handle ORC schema on read with fewer columns than file schema (after Schema Evolution changes)

2016-04-26 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-12887:

Fix Version/s: 2.0.1

> Handle ORC schema on read with fewer columns than file schema (after Schema 
> Evolution changes)
> --
>
> Key: HIVE-12887
> URL: https://issues.apache.org/jira/browse/HIVE-12887
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 1.3.0, 2.1.0, 2.0.1
>
> Attachments: HIVE-12887.01.patch, HIVE-12887.02.patch
>
>
> Exception caused by reading after column removal.
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 10, Size: 10
>   at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>   at java.util.ArrayList.get(ArrayList.java:429)
>   at java.util.Collections$UnmodifiableList.get(Collections.java:1309)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240)
>   at 
> org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.(TreeReaderFactory.java:2053)
>   at 
> org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2481)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.(RecordReaderImpl.java:216)
>   at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:598)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.(OrcRawRecordMerger.java:179)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.(OrcRawRecordMerger.java:222)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:442)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1285)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1165)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13619) Bucket map join plan is incorrect

2016-04-26 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259174#comment-15259174
 ] 

Vikram Dixit K commented on HIVE-13619:
---

Yes. The method is named findSingleUpstreamOperatorJoinAccounted. We are 
expecting only one instance of the operator type to be returned. It is not 
trying to find one specific operator in the list.
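
To illustrate that intent, a simplified sketch with a stand-in Node type (not 
Hive's actual Operator classes): the search walks upstream and returns the first 
operator of the requested type, on the assumption that only one such operator 
exists on that path.

{code}
// Simplified sketch of "find the single upstream operator of a given type".
// Node stands in for Hive's Operator class; this is not the real API.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class Node {
  final String type;
  final List<Node> parents;
  Node(String type, List<Node> parents) { this.type = type; this.parents = parents; }
}

class UpstreamSearch {
  // Returns the first upstream operator matching the type; the caller assumes
  // there is only one such instance, so no specific operator is searched for.
  static Node findSingleUpstream(Node start, String wantedType) {
    Deque<Node> pending = new ArrayDeque<>(start.parents);
    while (!pending.isEmpty()) {
      Node n = pending.pop();
      if (n.type.equals(wantedType)) {
        return n;
      }
      pending.addAll(n.parents);
    }
    return null;
  }
}
{code}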

> Bucket map join plan is incorrect
> -
>
> Key: HIVE-13619
> URL: https://issues.apache.org/jira/browse/HIVE-13619
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-13619.1.patch
>
>
> Same as HIVE-12992. Missed a single line check. TPCDS query 4 with bucketing 
> can produce this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12887) Handle ORC schema on read with fewer columns than file schema (after Schema Evolution changes)

2016-04-26 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259168#comment-15259168
 ] 

Matt McCline commented on HIVE-12887:
-

Committed to branch-2.0

> Handle ORC schema on read with fewer columns than file schema (after Schema 
> Evolution changes)
> --
>
> Key: HIVE-12887
> URL: https://issues.apache.org/jira/browse/HIVE-12887
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 1.3.0, 2.1.0, 2.0.1
>
> Attachments: HIVE-12887.01.patch, HIVE-12887.02.patch
>
>
> Exception caused by reading after column removal.
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 10, Size: 10
>   at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>   at java.util.ArrayList.get(ArrayList.java:429)
>   at java.util.Collections$UnmodifiableList.get(Collections.java:1309)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240)
>   at 
> org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.(TreeReaderFactory.java:2053)
>   at 
> org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2481)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.(RecordReaderImpl.java:216)
>   at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:598)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.(OrcRawRecordMerger.java:179)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.(OrcRawRecordMerger.java:222)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:442)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1285)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1165)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13619) Bucket map join plan is incorrect

2016-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259158#comment-15259158
 ] 

Sergey Shelukhin commented on HIVE-13619:
-

The method expecting a single thing is ok with just taking the first of many 
things?

> Bucket map join plan is incorrect
> -
>
> Key: HIVE-13619
> URL: https://issues.apache.org/jira/browse/HIVE-13619
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-13619.1.patch
>
>
> Same as HIVE-12992. Missed a single line check. TPCDS query 4 with bucketing 
> can produce this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12799) Always use Schema Evolution for ACID

2016-04-26 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259152#comment-15259152
 ] 

Matt McCline commented on HIVE-12799:
-

Also committed to branch-2.0

> Always use Schema Evolution for ACID
> 
>
> Key: HIVE-12799
> URL: https://issues.apache.org/jira/browse/HIVE-12799
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
>  Labels: TODOC1.3, TODOC2.1
> Fix For: 2.1.0, 2.0.1
>
> Attachments: HIVE-12799.01.patch, HIVE-12799.02.patch, 
> HIVE-12799.03.patch, HIVE-12799.04.patch
>
>
> Always use Schema Evolution for ACID -- ignore hive.exec.schema.evolution 
> setting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12799) Always use Schema Evolution for ACID

2016-04-26 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-12799:

Fix Version/s: 2.0.1

> Always use Schema Evolution for ACID
> 
>
> Key: HIVE-12799
> URL: https://issues.apache.org/jira/browse/HIVE-12799
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
>  Labels: TODOC1.3, TODOC2.1
> Fix For: 2.1.0, 2.0.1
>
> Attachments: HIVE-12799.01.patch, HIVE-12799.02.patch, 
> HIVE-12799.03.patch, HIVE-12799.04.patch
>
>
> Always use Schema Evolution for ACID -- ignore hive.exec.schema.evolution 
> setting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13241) LLAP: Incremental Caching marks some small chunks as "incomplete CB"

2016-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259145#comment-15259145
 ] 

Sergey Shelukhin commented on HIVE-13241:
-

Committed the description change in the addendum patch

> LLAP: Incremental Caching marks some small chunks as "incomplete CB"
> 
>
> Key: HIVE-13241
> URL: https://issues.apache.org/jira/browse/HIVE-13241
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.1
> Fix For: 2.1.0
>
> Attachments: HIVE-13241.01.patch, HIVE-13241.patch
>
>
> Run #3 of a query with 1 node still has cache misses.
> {code}
> LLAP IO Summary
> --
>   VERTICES ROWGROUPS  META_HIT  META_MISS  DATA_HIT  DATA_MISS  ALLOCATION
>  USED  TOTAL_IO
> --
>  Map 111  1116  01.65GB93.61MB  0B
>0B32.72s
> --
> {code}
> {code}
> 2016-03-08T21:05:39,417 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:prepareRangesForCompressedRead(695)) - Locking 
> 0x1c44401d(1) due to reuse
> 2016-03-08T21:05:39,417 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:prepareRangesForCompressedRead(701)) - Adding an 
> already-uncompressed buffer 0x1c44401d(2)
> 2016-03-08T21:05:39,417 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:prepareRangesForCompressedRead(695)) - Locking 
> 0x4e51b032(1) due to reuse
> 2016-03-08T21:05:39,417 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:prepareRangesForCompressedRead(701)) - Adding an 
> already-uncompressed buffer 0x4e51b032(2)
> 2016-03-08T21:05:39,418 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:addOneCompressionBuffer(1161)) - Found CB at 1373931, 
> chunk length 86587, total 86590, compressed
> 2016-03-08T21:05:39,418 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:addIncompleteCompressionBuffer(1241)) - Replacing 
> data range [1373931, 1408408), size: 34474(!) type: direct (and 0 previous 
> chunks) with incomplete CB start: 1373931 end: 1408408 in the buffers
> 2016-03-08T21:05:39,418 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:createRgColumnStreamData(441)) - Getting data for 
> column 7 RG 14 stream DATA at 1460521, 319811 index position 0: compressed 
> [1626961, 1780332)
> {code}
> {code}
> 2016-03-08T21:05:38,925 INFO  
> [IO-Elevator-Thread-7[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.OrcEncodedDataReader (OrcEncodedDataReader.java:readFileData(878)) - 
> Disk ranges after disk read (file 5372745, base offset 3): [{start: 18986 
> end: 20660 cache buffer: 0x660faf7c(1)}, {start: 20660 end: 35775 cache 
> buffer: 0x1dcb1d97(1)}, {start: 318852 end: 422353 cache buffer: 
> 0x6c7f9a05(1)}, {start: 1148616 end: 1262468 cache buffer: 0x196e1d41(1)}, 
> {start: 1262468 end: 1376342 cache buffer: 0x201255f(1)}, {data range 
> [1376342, 1410766), size: 34424 type: direct}, {start: 1631359 end: 1714694 
> cache buffer: 0x47e3a72d(1)}, {start: 1714694 end: 1785770 cache buffer: 
> 0x57dca266(1)}, {start: 4975035 end: 5095215 cache buffer: 0x3e3139c9(1)}, 
> {start: 5095215 end: 5197863 cache buffer: 0x3511c88d(1)}, {start: 7448387 
> end: 7572268 cache buffer: 0x6f11dbcd(1)}, {start: 7572268 end: 7696182 cache 
> buffer: 0x5d6c9bdb(1)}, {data range [7696182, 7710537), size: 14355 type: 
> direct}, {start: 8235756 end: 8345367 cache buffer: 0x6a241ece(1)}, {start: 
> 8345367 end: 8455009 cache buffer: 0x51caf6a7(1)}, {data range [8455009, 
> 8497906), size: 42897 type: direct}, {start: 9035815 end: 9159708 cache 
> buffer: 0x306480e0(1)}, {start: 9159708 end: 9283629 cache buffer: 
> 0x9ef7774(1)}, {data range [9283629, 9297965), size: 14336 type: direct}, 
> {start: 9989884 end: 10113731 cache buffer: 0x43f7cae9(1)}, {start: 10113731 
> end: 10237589 cache buffer: 0x458e63fe(1)}, {data range [10237589, 10252034), 
> size: 14445 type: direct}, {start: 11897896 end: 12021787 cache buffer: 
> 0x51f9982f(1)}, {start: 12021787 end: 12145656 cache 

[jira] [Commented] (HIVE-13142) Make HiveSplitGenerator usable independent of Tez

2016-04-26 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259137#comment-15259137
 ] 

Jason Dere commented on HIVE-13142:
---

At least some of this may have been done as part of the cleanup in HIVE-13594.

> Make HiveSplitGenerator usable independent of Tez
> -
>
> Key: HIVE-13142
> URL: https://issues.apache.org/jira/browse/HIVE-13142
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>
> Already exists in the branch, but a bunch of cleanup is required. The branch 
> contains code which makes some fields non-final, and a separate set method 
> instead of the constructor. Should be simplified so that it can be 
> constructed independently of Tez



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13596) HS2 should be able to get UDFs on demand from metastore

2016-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259139#comment-15259139
 ] 

Sergey Shelukhin commented on HIVE-13596:
-

RB at https://reviews.apache.org/r/46715/

> HS2 should be able to get UDFs on demand from metastore
> ---
>
> Key: HIVE-13596
> URL: https://issues.apache.org/jira/browse/HIVE-13596
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13596.01.patch, HIVE-13596.02.patch, 
> HIVE-13596.patch
>
>
> When multiple HS2s are run, creating a permanent fn is only executed on one 
> of them, and the other HS2s don't get the new function. Unlike say with 
> tables, where we always get stuff from db on demand, fns are registered at 
> certain points in the code and if the new one is not registered, it will not 
> be available. 
> We should restore the pre-HIVE-2573 behavior of being able to refresh the 
> UDFs on demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13596) HS2 should be able to get UDFs on demand from metastore

2016-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13596:

Attachment: HIVE-13596.02.patch

Updated the patch to add it to the system registry instead, after some 
discussion. I changed the synchronized methods to use a lock object without any 
logic changes wrt locking, and then took the MS call (and resource acquisition) 
out of the lock to avoid bottlenecking on the system registry.
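
In outline, the locking change looks roughly like the sketch below (hypothetical 
names, not the actual Registry code): the lock only guards the map, while the 
metastore call and resource acquisition happen outside it.

{code}
// Sketch: do the slow metastore lookup outside the lock, register under the lock.
// Names are illustrative, not the real FunctionRegistry/Registry API.
import java.util.HashMap;
import java.util.Map;

class SystemRegistrySketch {
  private final Object lock = new Object();          // replaces method-level synchronized
  private final Map<String, Object> functions = new HashMap<>();

  interface MetastoreLookup { Object fetchFunction(String name) throws Exception; }

  Object getOrLoad(String name, MetastoreLookup metastore) throws Exception {
    synchronized (lock) {
      Object fn = functions.get(name);
      if (fn != null) {
        return fn;
      }
    }
    // Metastore call (and resource download) outside the lock, so other
    // sessions are not bottlenecked on the system registry.
    Object fetched = metastore.fetchFunction(name);
    synchronized (lock) {
      functions.putIfAbsent(name, fetched);
      return functions.get(name);
    }
  }
}
{code}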

> HS2 should be able to get UDFs on demand from metastore
> ---
>
> Key: HIVE-13596
> URL: https://issues.apache.org/jira/browse/HIVE-13596
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13596.01.patch, HIVE-13596.02.patch, 
> HIVE-13596.patch
>
>
> When multiple HS2s are run, creating a permanent fn is only executed on one 
> of them, and the other HS2s don't get the new function. Unlike say with 
> tables, where we always get stuff from db on demand, fns are registered at 
> certain points in the code and if the new one is not registered, it will not 
> be available. 
> We should restore the pre-HIVE-2573 behavior of being able to refresh the 
> UDFs on demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-13522) regexp_extract.q hangs on master

2016-04-26 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HIVE-13522.
--
   Resolution: Fixed
 Assignee: Owen O'Malley  (was: Ashutosh Chauhan)
Fix Version/s: 2.1.0

This got closed as part of HIVE-12159.

> regexp_extract.q hangs on master
> 
>
> Key: HIVE-13522
> URL: https://issues.apache.org/jira/browse/HIVE-13522
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Ashutosh Chauhan
>Assignee: Owen O'Malley
>Priority: Blocker
> Fix For: 2.1.0
>
> Attachments: HIVE-13522.patch, jstack_regexp_extract.txt
>
>
> Disable to unblock Hive QA runs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13611) add jar causes beeline not to output log messages

2016-04-26 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-13611:
---
Description: 
After adding a jar in beeline, warning messages and job log output are no longer 
shown. This only occurs if you use short connection strings (e.g. 
jdbc:hive2://). Example below:
{code}
0: jdbc:hive2://nightly55-1.gce.cloudera.com:> !connect jdbc:hive2://
Connecting to jdbc:hive2://
Enter username for jdbc:hive2://: hive
Enter password for jdbc:hive2://: 
Connected to: Apache Hive (version 1.1.0-cdh5.5.4)
Driver: Hive JDBC (version 1.1.0-cdh5.5.4)
Transaction isolation: TRANSACTION_REPEATABLE_READ
1: jdbc:hive2://> select count(*) from sample_07 limit 1;
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1461621650734_0020
INFO  : The url to track the job: 
http://nightly55-1.gce.cloudera.com:8088/proxy/application_1461621650734_0020/
INFO  : Starting Job = job_1461621650734_0020, Tracking URL = 
http://nightly55-1.gce.cloudera.com:8088/proxy/application_1461621650734_0020/
INFO  : Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill 
job_1461621650734_0020
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of 
reducers: 1
INFO  : 2016-04-26 01:36:04,297 Stage-1 map = 0%,  reduce = 0%
INFO  : 2016-04-26 01:36:11,802 Stage-1 map = 100%,  reduce = 0%, Cumulative 
CPU 1.52 sec
INFO  : 2016-04-26 01:36:19,419 Stage-1 map = 100%,  reduce = 100%, Cumulative 
CPU 3.25 sec
INFO  : MapReduce Total cumulative CPU time: 3 seconds 250 msec
INFO  : Ended Job = job_1461621650734_0020
+--+--+
| _c0  |
+--+--+
| 823  |
+--+--+
1 row selected (25.908 seconds)
1: jdbc:hive2://> add jar hdfs://some_nn.com/tmp/somedir/some_jar.jar 
1: jdbc:hive2://> ;
converting to local hdfs://some_nn.com/tmp/somedir/some_jar.jar
Added [/tmp/93ca63a2-5019-4f37-b9b4-75f1740b53c8_resources/some_jar.jar] to 
class path
Added resources: [hdfs://some_nn.com/tmp/somedir/some_jar.jar]
No rows affected (0.179 seconds)
1: jdbc:hive2://> select count(*) from sample_07 limit 1;
+--+--+
| _c0  |
+--+--+
| 823  |
+--+--+
1: jdbc:hive2://> 
{code}

  was:
After adding a jar in beeline warning messages and job log ouptut are no longer 
shown. This only occurs if you use short connection strings (e.g. 
jdbc:hive2://). Example below:

0: jdbc:hive2://nightly55-1.gce.cloudera.com:> !connect jdbc:hive2://
Connecting to jdbc:hive2://
Enter username for jdbc:hive2://: hive
Enter password for jdbc:hive2://: 
Connected to: Apache Hive (version 1.1.0-cdh5.5.4)
Driver: Hive JDBC (version 1.1.0-cdh5.5.4)
Transaction isolation: TRANSACTION_REPEATABLE_READ
1: jdbc:hive2://> select count(*) from sample_07 limit 1;
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1461621650734_0020
INFO  : The url to track the job: 
http://nightly55-1.gce.cloudera.com:8088/proxy/application_1461621650734_0020/
INFO  : Starting Job = job_1461621650734_0020, Tracking URL = 
http://nightly55-1.gce.cloudera.com:8088/proxy/application_1461621650734_0020/
INFO  : Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill 
job_1461621650734_0020
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of 
reducers: 1
INFO  : 2016-04-26 01:36:04,297 Stage-1 map = 0%,  reduce = 0%
INFO  : 2016-04-26 01:36:11,802 Stage-1 map = 100%,  reduce = 0%, Cumulative 
CPU 1.52 sec
INFO  : 2016-04-26 01:36:19,419 Stage-1 map = 100%,  reduce = 100%, Cumulative 
CPU 3.25 sec
INFO  : MapReduce Total cumulative CPU time: 3 seconds 250 msec
INFO  : Ended Job = job_1461621650734_0020
+--+--+
| _c0  |
+--+--+
| 823  |
+--+--+
1 row selected (25.908 seconds)
1: jdbc:hive2://> add jar hdfs://some_nn.com/tmp/somedir/some_jar.jar 
1: jdbc:hive2://> ;
converting to local hdfs://some_nn.com/tmp/somedir/some_jar.jar
Added [/tmp/93ca63a2-5019-4f37-b9b4-75f1740b53c8_resources/some_jar.jar] to 
class path
Added resources: [hdfs://some_nn.com/tmp/somedir/some_jar.jar]
No rows affected (0.179 seconds)
1: jdbc:hive2://> select count(*) from sample_07 limit 1;
+--+--+
| _c0  |
+--+--+
| 823  |
+--+--+
1: jdbc:hive2://> 


> add jar causes beeline not to output log messages
> 

[jira] [Updated] (HIVE-13619) Bucket map join plan is incorrect

2016-04-26 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-13619:
--
Status: Patch Available  (was: Open)

> Bucket map join plan is incorrect
> -
>
> Key: HIVE-13619
> URL: https://issues.apache.org/jira/browse/HIVE-13619
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-13619.1.patch
>
>
> Same as HIVE-12992. Missed a single line check. TPCDS query 4 with bucketing 
> can produce this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13619) Bucket map join plan is incorrect

2016-04-26 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-13619:
--
Attachment: HIVE-13619.1.patch

> Bucket map join plan is incorrect
> -
>
> Key: HIVE-13619
> URL: https://issues.apache.org/jira/browse/HIVE-13619
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-13619.1.patch
>
>
> Same as HIVE-12992. Missed a single line check. TPCDS query 4 with bucketing 
> can produce this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13619) Bucket map join plan is incorrect

2016-04-26 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-13619:
--
Description: Same as HIVE-12992. Missed a single line check. TPCDS query 4 
with bucketing can produce this issue.  (was: Same as HIVE-12992. Missed a 
single line check.)

> Bucket map join plan is incorrect
> -
>
> Key: HIVE-13619
> URL: https://issues.apache.org/jira/browse/HIVE-13619
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
>
> Same as HIVE-12992. Missed a single line check. TPCDS query 4 with bucketing 
> can produce this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13408) Issue appending HIVE_QUERY_ID without checking if the prefix already exists

2016-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13408:

Target Version/s:   (was: 2.1.0, 2.0.1)

> Issue appending HIVE_QUERY_ID without checking if the prefix already exists
> ---
>
> Key: HIVE-13408
> URL: https://issues.apache.org/jira/browse/HIVE-13408
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-13408.1.patch, HIVE-13408.2.patch
>
>
> {code}
> We are resetting the hadoop caller context to HIVE_QUERY_ID:HIVE_QUERY_ID:
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13618) Trailing spaces in partition column will be treated differently

2016-04-26 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13618:
---
Summary: Trailing spaces in partition column will be treated differently  
(was: trailing spaces in partition column will be treated differently)

> Trailing spaces in partition column will be treated differently
> ---
>
> Key: HIVE-13618
> URL: https://issues.apache.org/jira/browse/HIVE-13618
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> We store the partition spec value in the metastore. In MySQL (and Derby, I 
> think), the trailing space is ignored. That is, if you have a partition 
> column "col" (type varchar or string) with value "a " and then select from 
> the table where col = "a", it will return the row. However, in PostgreSQL and 
> Oracle, the trailing space is not ignored. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13408) Issue appending HIVE_QUERY_ID without checking if the prefix already exists

2016-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13408:

Resolution: Invalid
Status: Resolved  (was: Patch Available)

This is actually broken by HIVE-12254 (which is not committed yet), according to 
[~vikram.dixit]; the fix should be included there.

> Issue appending HIVE_QUERY_ID without checking if the prefix already exists
> ---
>
> Key: HIVE-13408
> URL: https://issues.apache.org/jira/browse/HIVE-13408
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-13408.1.patch, HIVE-13408.2.patch
>
>
> {code}
> We are resetting the hadoop caller context to HIVE_QUERY_ID:HIVE_QUERY_ID:
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12254) Improve logging with yarn/hdfs

2016-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259108#comment-15259108
 ] 

Sergey Shelukhin commented on HIVE-12254:
-

+1 conditional on also including the fix for HIVE-13408 :)

> Improve logging with yarn/hdfs
> --
>
> Key: HIVE-12254
> URL: https://issues.apache.org/jira/browse/HIVE-12254
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 1.2.1
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-12254.1.patch, HIVE-12254.2.patch
>
>
> As an extension to HIVE-12249, this adds info for YARN/HDFS as well. Both 
> HIVE-12249 and HDFS-9184 are required (and the HDFS dependency upgraded in Hive) 
> before this can be resolved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13463) Fix ImportSemanticAnalyzer to allow for different src/dst filesystems

2016-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13463:

   Resolution: Fixed
Fix Version/s: 2.0.1
   2.1.0
   Status: Resolved  (was: Patch Available)

Committed to master and branch-2.0. Thanks for the patch

> Fix ImportSemanticAnalyzer to allow for different src/dst filesystems
> -
>
> Key: HIVE-13463
> URL: https://issues.apache.org/jira/browse/HIVE-13463
> Project: Hive
>  Issue Type: Bug
>  Components: Import/Export
>Affects Versions: 2.0.0
>Reporter: Zach York
>Assignee: Zach York
> Fix For: 2.1.0, 2.0.1
>
> Attachments: HIVE-13463-1.patch, HIVE-13463-2.patch, 
> HIVE-13463-3.patch, HIVE-13463-4.patch, HIVE-13463.4.patch, HIVE-13463.patch
>
>
> In ImportSemanticAnalyzer, there is an assumption that the source filesystem for 
> the import and the final location are on the same filesystem. Therefore the check 
> for emptiness and getExternalTmpLocation will look at the wrong 
> filesystem and cause an error. The output path should be fed into 
> getExternalTmpLocation to get a temporary file on the correct filesystem, and the 
> check for emptiness should use the output filesystem.
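
The gist of that description, sketched with the standard Hadoop FileSystem API 
(class and variable names here are illustrative, not the ImportSemanticAnalyzer 
code): resolve the filesystem from the destination path so the emptiness check 
runs on the same filesystem as the final table location.

{code}
// Sketch: use the destination path's own FileSystem for the emptiness check.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;

class ImportTargetCheck {
  static boolean isDestinationEmpty(Path destPath, Configuration conf) throws IOException {
    FileSystem destFs = destPath.getFileSystem(conf);  // destination FS, not the import's source FS
    if (!destFs.exists(destPath)) {
      return true;
    }
    return destFs.listStatus(destPath).length == 0;
  }
}
{code}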



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13617) LLAP: support non-vectorized execution in IO

2016-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259047#comment-15259047
 ] 

Sergey Shelukhin commented on HIVE-13617:
-

Yes :)

> LLAP: support non-vectorized execution in IO
> 
>
> Key: HIVE-13617
> URL: https://issues.apache.org/jira/browse/HIVE-13617
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> Two approaches - a separate decoding path, into rows instead of VRBs; or 
> decoding VRBs into rows on a higher level (the original LlapInputFormat). I 
> think the latter might be better - it's not a hugely important path, and perf 
> in non-vectorized case is not the best anyway, so it's better to make do with 
> much less new code and architectural disruption. 
> Some ORC patches in progress introduce an easy to reuse (or so I hope, 
> anyway) VRB-to-row conversion, so we should just use that.
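
For context, turning a VectorizedRowBatch back into rows is conceptually a loop 
like the sketch below. It handles only a single long column (honoring the 
selected vector, repeating values, and nulls); a real conversion has to cover 
every ColumnVector subtype, which is why reusing the ORC work makes sense.

{code}
// Much-simplified VRB-to-row sketch for one LongColumnVector.
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

import java.util.ArrayList;
import java.util.List;

class VrbToRows {
  static List<Object[]> toRows(VectorizedRowBatch batch) {
    List<Object[]> rows = new ArrayList<>();
    LongColumnVector col0 = (LongColumnVector) batch.cols[0];
    for (int i = 0; i < batch.size; i++) {
      // With selectedInUse, only the indexes listed in batch.selected are live rows.
      int row = batch.selectedInUse ? batch.selected[i] : i;
      int idx = col0.isRepeating ? 0 : row;
      Object value = (!col0.noNulls && col0.isNull[idx]) ? null : col0.vector[idx];
      rows.add(new Object[] { value });
    }
    return rows;
  }
}
{code}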



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13603) Fix ptest unit tests broken by HIVE13505

2016-04-26 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259030#comment-15259030
 ] 

Siddharth Seth commented on HIVE-13603:
---

Oh well. Just so happens that the current test run is this patch. Will wait for 
it.

> Fix ptest unit tests broken by HIVE13505
> 
>
> Key: HIVE-13603
> URL: https://issues.apache.org/jira/browse/HIVE-13603
> Project: Hive
>  Issue Type: Task
>Affects Versions: 2.1.0
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-13603.01.patch
>
>
> HIVE-13505 broke some unit tests in the ptest2 framework, which need to be 
> fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13603) Fix ptest unit tests broken by HIVE13505

2016-04-26 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259023#comment-15259023
 ] 

Siddharth Seth commented on HIVE-13603:
---

The patch on HIVE-13505 itself works. It's been running on the precommit boxes 
for a while. Unfortunately I did not run the ptest2 tests along with that 
patch, and noticed ptest test failures while making other changes. This just 
updates the template files to remove the verification of TestDummy. I'll commit 
this shortly, and remove it from the precommit queue.
Thanks for the review.

> Fix ptest unit tests broken by HIVE13505
> 
>
> Key: HIVE-13603
> URL: https://issues.apache.org/jira/browse/HIVE-13603
> Project: Hive
>  Issue Type: Task
>Affects Versions: 2.1.0
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-13603.01.patch
>
>
> HIVE-13505 broke some unit tests in the ptest2 framework, which need to be 
> fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13617) LLAP: support non-vectorized execution in IO

2016-04-26 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258969#comment-15258969
 ] 

Lefty Leverenz commented on HIVE-13617:
---

Aha, HIVE-11417 discusses VectorizedRowBatch.  Is that it?

> LLAP: support non-vectorized execution in IO
> 
>
> Key: HIVE-13617
> URL: https://issues.apache.org/jira/browse/HIVE-13617
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> Two approaches - a separate decoding path, into rows instead of VRBs; or 
> decoding VRBs into rows on a higher level (the original LlapInputFormat). I 
> think the latter might be better - it's not a hugely important path, and perf 
> in non-vectorized case is not the best anyway, so it's better to make do with 
> much less new code and architectural disruption. 
> Some ORC patches in progress introduce an easy to reuse (or so I hope, 
> anyway) VRB-to-row conversion, so we should just use that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13354) Add ability to specify Compaction options per table and per request

2016-04-26 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13354:
-
Summary: Add ability to specify Compaction options per table and per 
request  (was: Add ability to specify Compaction options per table)

> Add ability to specify Compaction options per table and per request
> ---
>
> Key: HIVE-13354
> URL: https://issues.apache.org/jira/browse/HIVE-13354
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>  Labels: TODOC2.1
>
> Currently there are a few options that determine when automatic compaction is 
> triggered.  They are specified once for the whole warehouse.
> This doesn't make sense - some tables may be more important and need to be 
> compacted more often.
> We should allow specifying these on a per-table basis.
> Also, compaction is an MR job launched from within the metastore.  There is 
> currently no way to control job parameters (like memory, for example) except 
> to specify them in hive-site.xml for the metastore, which means they are site wide.
> We should add a way to specify these per table (perhaps even per compaction if 
> launched via ALTER TABLE).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13617) LLAP: support non-vectorized execution in IO

2016-04-26 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258959#comment-15258959
 ] 

Lefty Leverenz commented on HIVE-13617:
---

Acronym clarification:  What's a VRB when it's at home?

Google fun:  vanadium redox battery, variable reenlistment bonus, victim 
row-buffer, vodka red bull, Virginia regional ballet, etc.  (VRB-to-ROW is a 
flight from Vero Beach to Roswell.)

> LLAP: support non-vectorized execution in IO
> 
>
> Key: HIVE-13617
> URL: https://issues.apache.org/jira/browse/HIVE-13617
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> Two approaches - a separate decoding path, into rows instead of VRBs; or 
> decoding VRBs into rows on a higher level (the original LlapInputFormat). I 
> think the latter might be better - it's not a hugely important path, and perf 
> in non-vectorized case is not the best anyway, so it's better to make do with 
> much less new code and architectural disruption. 
> Some ORC patches in progress introduce an easy to reuse (or so I hope, 
> anyway) VRB-to-row conversion, so we should just use that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11417) Create shims for the row by row read path that is backed by VectorizedRowBatch

2016-04-26 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-11417:
-
Attachment: HIVE-11417.patch

Fixed the spurious changes that Matt found in the patch.

> Create shims for the row by row read path that is backed by VectorizedRowBatch
> --
>
> Key: HIVE-11417
> URL: https://issues.apache.org/jira/browse/HIVE-11417
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.1.0
>
> Attachments: HIVE-11417.patch, HIVE-11417.patch, HIVE-11417.patch, 
> HIVE-11417.patch
>
>
> I'd like to make the default path for reading and writing ORC files to be 
> vectorized. To ensure that Hive can still read row by row, we'll need shims 
> to support the old API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13607) Change website references to HQL/HiveQL to SQL

2016-04-26 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-13607:
--
Attachment: HIVE-13607.2.patch

Note to self: turn on spell checking in vim

Thanks [~leftylev] for catching that.  Posting a new patch with driver spelled 
correctly.  

> Change website references to HQL/HiveQL to SQL
> --
>
> Key: HIVE-13607
> URL: https://issues.apache.org/jira/browse/HIVE-13607
> Project: Hive
>  Issue Type: Improvement
>  Components: Website
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-13607.2.patch, HIVE-13607.patch
>
>
> When it started, Hive's SQL dialect was far enough from standard SQL that the 
> developers called it HQL or HiveQL. 
> Over the years Hive's SQL dialect has matured.  It still has some oddities, 
> but it is explicitly pushing towards SQL 2011 conformance.  Calling the 
> language anything but SQL now is confusing for users.
> In addition to changing the website, I propose to make changes in the wiki.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13596) HS2 should be able to get UDFs on demand from metastore

2016-04-26 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1525#comment-1525
 ] 

Jason Dere commented on HIVE-13596:
---

Ok, looking at the code again, I see why checkFunctionClass() is no longer 
called - there is a separate registerToSessionRegistry() method to add a UDF 
from the system registry to the session registry. It's not bad to have it in 
the session registry, since the UDF does eventually need to get added there, 
though it is a bit different from the UDFs added via reloadFunctions() since 
those are added to the system registry.

> HS2 should be able to get UDFs on demand from metastore
> ---
>
> Key: HIVE-13596
> URL: https://issues.apache.org/jira/browse/HIVE-13596
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13596.01.patch, HIVE-13596.patch
>
>
> When multiple HS2s are run, creating a permanent fn is only executed on one 
> of them, and the other HS2s don't get the new function. Unlike say with 
> tables, where we always get stuff from db on demand, fns are registered at 
> certain points in the code and if the new one is not registered, it will not 
> be available. 
> We should restore the pre-HIVE-2573 behavior of being able to refresh the 
> UDFs on demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12887) Handle ORC schema on read with fewer columns than file schema (after Schema Evolution changes)

2016-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258887#comment-15258887
 ] 

Sergey Shelukhin commented on HIVE-12887:
-

[~mmccline] ping??

> Handle ORC schema on read with fewer columns than file schema (after Schema 
> Evolution changes)
> --
>
> Key: HIVE-12887
> URL: https://issues.apache.org/jira/browse/HIVE-12887
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 1.3.0, 2.1.0
>
> Attachments: HIVE-12887.01.patch, HIVE-12887.02.patch
>
>
> Exception caused by reading after column removal.
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 10, Size: 10
>   at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>   at java.util.ArrayList.get(ArrayList.java:429)
>   at java.util.Collections$UnmodifiableList.get(Collections.java:1309)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240)
>   at 
> org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.(TreeReaderFactory.java:2053)
>   at 
> org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2481)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.(RecordReaderImpl.java:216)
>   at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:598)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.(OrcRawRecordMerger.java:179)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.(OrcRawRecordMerger.java:222)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:442)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1285)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1165)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13617) LLAP: support non-vectorized execution in IO

2016-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13617:

Description: 
Two approaches - a separate decoding path, into rows instead of VRBs; or 
decoding VRBs into rows on a higher level (the original LlapInputFormat). I 
think the latter might be better - it's not a hugely important path, and perf 
in non-vectorized case is not the best anyway, so it's better to make do with 
much less new code and architectural disruption. 

Some ORC patches in progress introduce an easy to reuse (or so I hope, anyway) 
VRB-to-row conversion, so we should just use that.



  was:
Two approaches - a separate decoding path, into rows instead of VRBs; or 
decoding VRBs into rows. I think the latter might be better - it's not a hugely 
important path, and perf in non-vectorized case is not the best anyway, so it's 
better to make do with much less new code and architectural disruption. 

Some ORC patches in progress introduce an easy to reuse (or so I hope, anyway) 
VRB-to-row conversion, so we should just use that.




> LLAP: support non-vectorized execution in IO
> 
>
> Key: HIVE-13617
> URL: https://issues.apache.org/jira/browse/HIVE-13617
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> Two approaches - a separate decoding path, into rows instead of VRBs; or 
> decoding VRBs into rows on a higher level (the original LlapInputFormat). I 
> think the latter might be better - it's not a hugely important path, and perf 
> in non-vectorized case is not the best anyway, so it's better to make do with 
> much less new code and architectural disruption. 
> Some ORC patches in progress introduce an easy to reuse (or so I hope, 
> anyway) VRB-to-row conversion, so we should just use that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-13617) LLAP: support non-vectorized execution in IO

2016-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258879#comment-15258879
 ] 

Sergey Shelukhin edited comment on HIVE-13617 at 4/26/16 8:42 PM:
--

[~prasanth_j] [~hagleitn] fyi

[~owen.omalley] what is the final JIRA that adds the vrb-to-row conversion? We 
would like to reuse that code after it goes in. Is it HIVE-11417?


was (Author: sershe):
[~prasanth_j] [~hagleitn] fyi

[~owen.omalley] what is the final JIRA that adds the vrb-to-row conversion? We 
would like to reuse that code after it goes in. Is it hive-11417?

> LLAP: support non-vectorized execution in IO
> 
>
> Key: HIVE-13617
> URL: https://issues.apache.org/jira/browse/HIVE-13617
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> Two approaches - a separate decoding path, into rows instead of VRBs; or 
> decoding VRBs into rows. I think the latter might be better - it's not a 
> hugely important path, and perf in non-vectorized case is not the best 
> anyway, so it's better to make do with much less new code and architectural 
> disruption. 
> Some ORC patches in progress introduce an easy to reuse (or so I hope, 
> anyway) VRB-to-row conversion, so we should just use that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-13617) LLAP: support non-vectorized execution in IO

2016-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258879#comment-15258879
 ] 

Sergey Shelukhin edited comment on HIVE-13617 at 4/26/16 8:42 PM:
--

[~prasanth_j] [~hagleitn] fyi

[~owen.omalley] what is the final JIRA that adds the vrb-to-row conversion? We 
would like to reuse that code after it goes in. Is it hive-11417?


was (Author: sershe):
[~prasanth_j] [~hagleitn] fyi

[~owen.omalley] what is the final JIRA that adds the vrb-to-row conversion? We 
would like to reuse that code after it goes in

> LLAP: support non-vectorized execution in IO
> 
>
> Key: HIVE-13617
> URL: https://issues.apache.org/jira/browse/HIVE-13617
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> Two approaches - a separate decoding path, into rows instead of VRBs; or 
> decoding VRBs into rows. I think the latter might be better - it's not a 
> hugely important path, and perf in non-vectorized case is not the best 
> anyway, so it's better to make do with much less new code and architectural 
> disruption. 
> Some ORC patches in progress introduce an easy to reuse (or so I hope, 
> anyway) VRB-to-row conversion, so we should just use that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13617) LLAP: support non-vectorized execution in IO

2016-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258879#comment-15258879
 ] 

Sergey Shelukhin commented on HIVE-13617:
-

[~prasanth_j] [~hagleitn] fyi

[~owen.omalley] what is the final JIRA that adds the vrb-to-row conversion? We 
would like to reuse that code after it goes in

> LLAP: support non-vectorized execution in IO
> 
>
> Key: HIVE-13617
> URL: https://issues.apache.org/jira/browse/HIVE-13617
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> Two approaches - a separate decoding path, into rows instead of VRBs; or 
> decoding VRBs into rows. I think the latter might be better - it's not a 
> hugely important path, and perf in non-vectorized case is not the best 
> anyway, so it's better to make do with much less new code and architectural 
> disruption. 
> Some ORC patches in progress introduce an easy to reuse (or so I hope, 
> anyway) VRB-to-row conversion, so we should just use that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11417) Create shims for the row by row read path that is backed by VectorizedRowBatch

2016-04-26 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-11417:
-
Attachment: HIVE-11417.patch

This is the same patch, generated without the -C parameter that causes git to 
find file moves.

> Create shims for the row by row read path that is backed by VectorizedRowBatch
> --
>
> Key: HIVE-11417
> URL: https://issues.apache.org/jira/browse/HIVE-11417
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.1.0
>
> Attachments: HIVE-11417.patch, HIVE-11417.patch, HIVE-11417.patch
>
>
> I'd like to make the default path for reading and writing ORC files to be 
> vectorized. To ensure that Hive can still read row by row, we'll need shims 
> to support the old API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13609) Fix UDTFs to allow local fetch task to fetch rows forwarded by GenericUDTF.close()

2016-04-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258842#comment-15258842
 ] 

Ashutosh Chauhan commented on HIVE-13609:
-

+1 LGTM pending tests

> Fix UDTFs to allow local fetch task to fetch rows forwarded by 
> GenericUDTF.close()
> --
>
> Key: HIVE-13609
> URL: https://issues.apache.org/jira/browse/HIVE-13609
> Project: Hive
>  Issue Type: Bug
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-13609.1.patch
>
>
> From [~ashutoshc]'s comments in HIVE-13586, attempt to fix whatever is 
> causing the local fetch task to not get the rows forwarded by UDTF close().
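
For illustration, a minimal UDTF that forwards its only output row from close(); 
rows like this are exactly what the local fetch task is dropping. This is a 
sketch of the pattern involved, not the fix itself.

{code}
// Minimal UDTF sketch: emits a single count row from close(); rows forwarded
// here are the ones the local fetch task currently fails to pick up.
import java.util.Arrays;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class CountRowsUDTF extends GenericUDTF {
  private long count = 0;

  @Override
  public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
    return ObjectInspectorFactory.getStandardStructObjectInspector(
        Arrays.asList("cnt"),
        Arrays.asList((ObjectInspector) PrimitiveObjectInspectorFactory.javaLongObjectInspector));
  }

  @Override
  public void process(Object[] args) throws HiveException {
    count++;                          // nothing is forwarded per input row
  }

  @Override
  public void close() throws HiveException {
    forward(new Object[] { count });  // forwarded only from close()
  }
}
{code}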



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13509) HCatalog getSplits should ignore the partition with invalid path

2016-04-26 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258780#comment-15258780
 ] 

Mithun Radhakrishnan commented on HIVE-13509:
-

Yes, sir. +1.

> HCatalog getSplits should ignore the partition with invalid path
> 
>
> Key: HIVE-13509
> URL: https://issues.apache.org/jira/browse/HIVE-13509
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-13509.1.patch, HIVE-13509.2.patch, HIVE-13509.patch
>
>
> It is quite common for there to be a discrepancy between a partition directory 
> and its HMS metadata, simply because the directory could be added/deleted 
> externally using HDFS shell commands. Technically it should be fixed with MSCK 
> and ALTER TABLE ... ADD/DROP PARTITION commands etc., but sometimes that might 
> not be practical, especially in a multi-tenant environment. This discrepancy does 
> not cause any problem for Hive, which returns no rows for a partition with an 
> invalid (e.g. non-existing) path, but it fails a Pig load with HCatLoader, because 
> HCatBaseInputFormat.getSplits throws an error when getting a split for a 
> non-existing path. The error message might look like:
> {code}
> Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does 
> not exist: 
> hdfs://xyz.com:8020/user/hive/warehouse/xyz/date=2016-01-01/country=BR
>   at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
>   at 
> org.apache.hive.hcatalog.mapreduce.HCatBaseInputFormat.getSplits(HCatBaseInputFormat.java:162)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
> {code}
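
A sketch of the proposed behavior, using the standard Hadoop FileSystem API 
(class and method names are illustrative, not the HCatBaseInputFormat change 
itself): filter out partition directories that no longer exist before computing 
splits.

{code}
// Sketch: skip partitions whose directories are missing instead of failing getSplits.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class PartitionPathFilter {
  static List<Path> keepExistingPartitions(List<Path> partitionDirs, Configuration conf)
      throws IOException {
    List<Path> valid = new ArrayList<>();
    for (Path dir : partitionDirs) {
      FileSystem fs = dir.getFileSystem(conf);
      if (fs.exists(dir)) {
        valid.add(dir);
      }
      // else: the HMS metadata points at a missing directory; skip it, matching
      // Hive's own behavior of returning no rows for such a partition.
    }
    return valid;
  }
}
{code}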



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13609) Fix UDTFs to allow local fetch task to fetch rows forwarded by GenericUDTF.close()

2016-04-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258755#comment-15258755
 ] 

Ashutosh Chauhan commented on HIVE-13609:
-

[~jdere] Can you create a RB for this?

> Fix UDTFs to allow local fetch task to fetch rows forwarded by 
> GenericUDTF.close()
> --
>
> Key: HIVE-13609
> URL: https://issues.apache.org/jira/browse/HIVE-13609
> Project: Hive
>  Issue Type: Bug
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-13609.1.patch
>
>
> From [~ashutoshc]'s comments in HIVE-13586, attempt to fix whatever is 
> causing the local fetch task to not get the rows forwarded by UDTF close().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258743#comment-15258743
 ] 

Ashutosh Chauhan commented on HIVE-13572:
-

Can you create a RB for this? Also, I see changes related to SessionState and the 
shims. Was there any bug you encountered there?

> Redundant setting full file status in Hive::copyFiles
> -
>
> Key: HIVE-13572
> URL: https://issues.apache.org/jira/browse/HIVE-13572
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-13572.1.patch, HIVE-13572.2.patch
>
>
> We set full file status in each copy-file thread. I think it's redundant and 
> hurts performance when we have multiple files to copy.
> {code}
> if (inheritPerms) {
>   ShimLoader.getHadoopShims().setFullFileStatus(conf, 
> fullDestStatus, destFs, destf);
> }
> {code}
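
One way to picture the point being made, as a hedged sketch (the interfaces and 
names are hypothetical, not the Hive::copyFiles code): run the per-file copies in 
a thread pool and apply the inherited permissions once on the destination 
afterwards, rather than in every copy thread.

{code}
// Sketch: copy files in parallel, then set the full file status once per
// destination instead of once per copy thread. Illustrative only.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelCopySketch {
  interface CopyTask { void copyOneFile() throws Exception; }
  interface PermissionSetter { void setFullFileStatusOnDest() throws Exception; }

  static void copyAll(List<CopyTask> tasks, PermissionSetter perms, boolean inheritPerms)
      throws Exception {
    List<Callable<Void>> calls = new ArrayList<>();
    for (CopyTask t : tasks) {
      calls.add(() -> { t.copyOneFile(); return null; });
    }
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      for (Future<Void> f : pool.invokeAll(calls)) {
        f.get();                        // surface copy failures
      }
    } finally {
      pool.shutdown();
    }
    if (inheritPerms) {
      perms.setFullFileStatusOnDest();  // once, after all copies complete
    }
  }
}
{code}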



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13510) Dynamic partitioning doesn’t work when remote metastore is used

2016-04-26 Thread Illya Yalovyy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Illya Yalovyy updated HIVE-13510:
-
Attachment: HIVE-13510.2.patch

It seems like my patch was never picked up by Jenkins, so I'm uploading it 
again.

> Dynamic partitioning doesn’t work when remote metastore is used
> ---
>
> Key: HIVE-13510
> URL: https://issues.apache.org/jira/browse/HIVE-13510
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.0
> Environment: Hadoop 2.7.1
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
>Priority: Critical
> Attachments: HIVE-13510.1.patch, HIVE-13510.2.patch
>
>
> *Steps to reproduce:*
> # Configure remote metastore (hive.metastore.uris)
> # Create table t1 (a string);
> # Create table t2 (a string) partitioned by (b string);
> # set hive.exec.dynamic.partition.mode=nonstrict;
> # Insert overwrite table t2 partition (b) select a,a from t1;
> *Result:*
> {noformat}
> FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: getMetaConf failed: unknown result
> 16/04/13 15:04:51 [c679e424-2501-4347-8146-cf1b1cae217c main]: ERROR 
> ql.Driver: FAILED: SemanticException 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: getMetaConf failed: unknown result
> org.apache.hadoop.hive.ql.parse.SemanticException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: getMetaConf failed: unknown result
> at 
> org.apache.hadoop.hive.ql.plan.DynamicPartitionCtx.(DynamicPartitionCtx.java:84)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6550)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:9315)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9204)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10071)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9949)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10607)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:358)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10618)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:233)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:245)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:476)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:318)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1192)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1287)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1118)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1106)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:236)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:339)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:748)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:721)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:648)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: getMetaConf failed: unknown result
> at org.apache.hadoop.hive.ql.metadata.Hive.getMetaConf(Hive.java:3493)
> at 
> org.apache.hadoop.hive.ql.plan.DynamicPartitionCtx.(DynamicPartitionCtx.java:82)
> ... 29 more
> Caused by: org.apache.thrift.TApplicationException: getMetaConf failed: 
> unknown result
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_getMetaConf(ThriftHiveMetastore.java:666)
> at 
> 

[jira] [Updated] (HIVE-13596) HS2 should be able to get UDFs on demand from metastore

2016-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13596:

Attachment: HIVE-13596.01.patch

> HS2 should be able to get UDFs on demand from metastore
> ---
>
> Key: HIVE-13596
> URL: https://issues.apache.org/jira/browse/HIVE-13596
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13596.01.patch, HIVE-13596.patch
>
>
> When multiple HS2s are run, creating a permanent fn is only executed on one 
> of them, and the other HS2s don't get the new function. Unlike say with 
> tables, where we always get stuff from db on demand, fns are registered at 
> certain points in the code and if the new one is not registered, it will not 
> be available. 
> We should restore the pre-HIVE-2573 behavior of being able to refresh the 
> UDFs on demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13596) HS2 should be able to get UDFs on demand from metastore

2016-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258724#comment-15258724
 ] 

Sergey Shelukhin commented on HIVE-13596:
-

{noformat}
Should this be a settable option (as opposed to always on)? And why default 
to false?
{noformat}
It's settable per session. Defaulting to false because that was the current 
behavior for a while.

{noformat}
Which Registry is performing the UDF lookup, the system registry or the 
session registry? If it is the system registry, then we may run into HIVE-6672 
again. checkFunctionClass() (removed in your patch) was added for this purpose.
{noformat}
Session registry. I removed the method because it was unused... 

{noformat}
If the functions are being looked up/added to the session registry, then 
this may not be an issue, because every session would need to look up the UDF 
and load JARs. Actually, I see that the permanent UDFs registered by 
Hive.reloadFunctions() (at initialize time) are added to the system registry... 
I suspect Hive probably has class loading issues if we ever use "RELOAD 
FUNCTIONS" to pick up new UDFs, since Hive no longer seems to be calling 
checkFunctionClass().
{noformat}
Hmm... should this be done in the system registry then? Does it hurt to have 
them in the session registry? 
Also, does checkFunctionClass need to be reinstated here, or in a separate JIRA?

{noformat}
public Registry(boolean isNative, HiveConf conf):
conf needs a null check before it's used
{noformat}
It's implied that it's not used for a native registry; I will add an explicit check.
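
For illustration, a minimal sketch of the kind of explicit check being discussed; the 
field layout is an assumption and this is not the actual Registry class.

{code}
// Sketch only: Object stands in for HiveConf so the snippet is self-contained.
public class RegistrySketch {
  private final boolean isNative;
  private final Object conf;

  public RegistrySketch(boolean isNative, Object conf) {
    // A native (system) registry never reads conf, so null is acceptable there;
    // a session registry does, so fail fast instead of hitting an NPE later.
    if (!isNative && conf == null) {
      throw new IllegalArgumentException("conf must not be null for a session registry");
    }
    this.isNative = isNative;
    this.conf = conf;
  }
}
{code}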


> HS2 should be able to get UDFs on demand from metastore
> ---
>
> Key: HIVE-13596
> URL: https://issues.apache.org/jira/browse/HIVE-13596
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13596.patch
>
>
> When multiple HS2s are run, creating a permanent fn is only executed on one 
> of them, and the other HS2s don't get the new function. Unlike say with 
> tables, where we always get stuff from db on demand, fns are registered at 
> certain points in the code and if the new one is not registered, it will not 
> be available. 
> We should restore the pre-HIVE-2573 behavior of being able to refresh the 
> UDFs on demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13568) Add UDFs to support column-masking

2016-04-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258723#comment-15258723
 ] 

Hive QA commented on HIVE-13568:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12800706/HIVE-13568.2.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 48 failed/errored test(s), 9959 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-cte_4.q-schema_evol_text_nonvec_mapwork_table.q-vector_groupby_reduce.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nomore_ambiguous_table_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_regexp_extract
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern3
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern4
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_nonkey_groupby
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_selectDistinctStarNeg_2
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_subquery_shared_alias
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udtf_not_supported1
org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote.org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote
org.apache.hadoop.hive.metastore.TestFilterHooks.org.apache.hadoop.hive.metastore.TestFilterHooks
org.apache.hadoop.hive.metastore.TestMetaStoreEndFunctionListener.testEndFunctionListener
org.apache.hadoop.hive.metastore.TestMetaStoreEventListenerOnlyOnCommit.testEventStatus
org.apache.hadoop.hive.metastore.TestMetaStoreInitListener.testMetaStoreInitListener
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.org.apache.hadoop.hive.metastore.TestMetaStoreMetrics
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAddPartitionWithValidPartVal
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithCommas
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithUnicode
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithValidCharacters
org.apache.hadoop.hive.metastore.TestRetryingHMSHandler.testRetryingHMSHandler
org.apache.hadoop.hive.metastore.hbase.TestHBaseImport.org.apache.hadoop.hive.metastore.hbase.TestHBaseImport
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.concurrencyFalse
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testLockTimeout
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testUpdate
org.apache.hadoop.hive.ql.security.TestClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestExtendedAcls.org.apache.hadoop.hive.ql.security.TestExtendedAcls
org.apache.hadoop.hive.ql.security.TestFolderPermissions.org.apache.hadoop.hive.ql.security.TestFolderPermissions
org.apache.hadoop.hive.ql.security.TestMetastoreAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener.org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener
org.apache.hadoop.hive.ql.security.TestStorageBasedClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropDatabase
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropView
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbFailure
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadDbSuccess
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccess
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableSuccessWithReadOnly
org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.testSaslWithHiveMetaStore
org.apache.hive.hcatalog.listener.TestDbNotificationListener.cleanupNotifs
org.apache.hive.hcatalog.listener.TestDbNotificationListener.dropDatabase

[jira] [Commented] (HIVE-13596) HS2 should be able to get UDFs on demand from metastore

2016-04-26 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258709#comment-15258709
 ] 

Jason Dere commented on HIVE-13596:
---

- Should this be a settable option (as opposed to always on)? And why default 
to false?

- Which Registry is performing the UDF lookup, the system registry or the 
session registry? If it is the system registry, then we may run into HIVE-6672 
again. checkFunctionClass() (removed in your patch) was added for this purpose.
If the functions are being looked up/added to the session registry, then 
this may not be an issue, because every session would need to look up the UDF 
and load JARs. Actually, I see that the permanent UDFs registered by 
Hive.reloadFunctions() (at initialize time) are added to the system registry... 
I suspect Hive probably has class loading issues if we ever use "RELOAD 
FUNCTIONS" to pick up new UDFs, since Hive no longer seems to be calling 
checkFunctionClass().

- public Registry(boolean isNative, HiveConf conf):
  - conf needs a null check before it's used

- private FunctionInfo getQualifiedFunctionInfo():
  - Wrap if/then with braces

- private boolean refreshFunctionInfoFromMetastore(String functionName)
  - line 629: wrap if/then with braces

> HS2 should be able to get UDFs on demand from metastore
> ---
>
> Key: HIVE-13596
> URL: https://issues.apache.org/jira/browse/HIVE-13596
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13596.patch
>
>
> When multiple HS2s are run, creating a permanent fn is only executed on one 
> of them, and the other HS2s don't get the new function. Unlike say with 
> tables, where we always get stuff from db on demand, fns are registered at 
> certain points in the code and if the new one is not registered, it will not 
> be available. 
> We should restore the pre-HIVE-2573 behavior of being able to refresh the 
> UDFs on demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13509) HCatalog getSplits should ignore the partition with invalid path

2016-04-26 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-13509:
---
Attachment: HIVE-13509.2.patch

Thanks [~mithun] for reviewing the patch. I fixed #2. Please take a look.

> HCatalog getSplits should ignore the partition with invalid path
> 
>
> Key: HIVE-13509
> URL: https://issues.apache.org/jira/browse/HIVE-13509
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-13509.1.patch, HIVE-13509.2.patch, HIVE-13509.patch
>
>
> It is quite common for there to be a discrepancy between a partition directory 
> and its HMS metadata, simply because the directory could be added or deleted 
> externally using HDFS shell commands. Technically this should be fixed with 
> MSCK, alter table .. add/drop commands, etc., but sometimes that is not 
> practical, especially in a multi-tenant environment. The discrepancy does not 
> cause any problem for Hive, which returns no rows for a partition with an 
> invalid (e.g. non-existing) path, but it fails a Pig load with HCatLoader, 
> because HCatBaseInputFormat.getSplits throws an error when getting a split for 
> a non-existing path. The error message might look like:
> {code}
> Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does 
> not exist: 
> hdfs://xyz.com:8020/user/hive/warehouse/xyz/date=2016-01-01/country=BR
>   at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
>   at 
> org.apache.hive.hcatalog.mapreduce.HCatBaseInputFormat.getSplits(HCatBaseInputFormat.java:162)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
> {code}
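
To make the intent concrete, here is a hedged sketch (not the actual 
HCatBaseInputFormat code) of filtering out partitions whose directories no longer 
exist before splits are computed; the property name hcat.input.ignore.invalid.path is 
taken from the review discussion elsewhere in this thread.

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: drop partition paths that do not exist when the flag is set.
public final class InvalidPathFilterSketch {
  public static List<Path> filterExisting(Configuration conf, List<Path> paths)
      throws IOException {
    boolean ignoreInvalid = conf.getBoolean("hcat.input.ignore.invalid.path", false);
    if (!ignoreInvalid) {
      return paths; // old behavior: a missing path will fail later in getSplits
    }
    List<Path> existing = new ArrayList<Path>();
    for (Path p : paths) {
      FileSystem fs = p.getFileSystem(conf);
      if (fs.exists(fs.makeQualified(p))) {
        existing.add(p);
      }
    }
    return existing;
  }
}
{code}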



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-13610) Hive exec module won't compile with IBM JDK

2016-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258667#comment-15258667
 ] 

Sergey Shelukhin edited comment on HIVE-13610 at 4/26/16 6:40 PM:
--

DUMP_HEAP_METHOD.invoke needs a null check. Also the exception could be logged 
in case of error (or at least the message from it). Otherwise, looks good.


was (Author: sershe):
DUMP_HEAP_METHOD.invoke needs a null check. Otherwise, looks good.

> Hive exec module won't compile with IBM JDK
> ---
>
> Key: HIVE-13610
> URL: https://issues.apache.org/jira/browse/HIVE-13610
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
> Environment: Hive 1.1.0 + IBM JDK 1.7 +ppc64 architecture
>Reporter: Pan Yuxuan
>Assignee: Pan Yuxuan
> Attachments: HIVE-13610.patch
>
>
> org.apache.hadoop.hive.ql.debug.Utils explicitly imports 
> com.sun.management.HotSpotDiagnosticMXBean, which is not supported by the IBM 
> JDK. We can make HotSpotDiagnosticMXBean a runtime dependency rather than a 
> compile-time one.
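
A rough sketch of the runtime-only approach, assuming names similar to those in the 
review comment above (DUMP_HEAP_METHOD); this is not the actual 
org.apache.hadoop.hive.ql.debug.Utils implementation.

{code}
import java.lang.management.ManagementFactory;
import java.lang.reflect.Method;

// Sketch only: resolve the HotSpot-specific MXBean reflectively so the class
// compiles on JDKs (such as the IBM JDK) that do not ship com.sun.management.*.
public final class HeapDumpSketch {
  private static final Object DIAGNOSTIC_BEAN;
  private static final Method DUMP_HEAP_METHOD;

  static {
    Object bean = null;
    Method dump = null;
    try {
      Class<?> clazz = Class.forName("com.sun.management.HotSpotDiagnosticMXBean");
      bean = ManagementFactory.newPlatformMXBeanProxy(
          ManagementFactory.getPlatformMBeanServer(),
          "com.sun.management:type=HotSpotDiagnostic", clazz);
      dump = clazz.getMethod("dumpHeap", String.class, boolean.class);
    } catch (Exception e) {
      // Log the message rather than swallowing it; on non-HotSpot JDKs this is expected.
      System.err.println("Heap dump support unavailable: " + e.getMessage());
    }
    DIAGNOSTIC_BEAN = bean;
    DUMP_HEAP_METHOD = dump;
  }

  public static void dumpHeap(String path, boolean liveOnly) {
    if (DUMP_HEAP_METHOD == null) {
      return; // null check before invoke, as requested in the review
    }
    try {
      DUMP_HEAP_METHOD.invoke(DIAGNOSTIC_BEAN, path, liveOnly);
    } catch (Exception e) {
      System.err.println("Heap dump failed: " + e.getMessage());
    }
  }
}
{code}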



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13610) Hive exec module won't compile with IBM JDK

2016-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258667#comment-15258667
 ] 

Sergey Shelukhin commented on HIVE-13610:
-

DUMP_HEAP_METHOD.invoke needs a null check. Otherwise, looks good.

> Hive exec module won't compile with IBM JDK
> ---
>
> Key: HIVE-13610
> URL: https://issues.apache.org/jira/browse/HIVE-13610
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
> Environment: Hive 1.1.0 + IBM JDK 1.7 +ppc64 architecture
>Reporter: Pan Yuxuan
>Assignee: Pan Yuxuan
> Attachments: HIVE-13610.patch
>
>
> org.apache.hadoop.hive.ql.debug.Utils explicitly imports 
> com.sun.management.HotSpotDiagnosticMXBean, which is not supported by the IBM 
> JDK. We can make HotSpotDiagnosticMXBean a runtime dependency rather than a 
> compile-time one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13615) nomore_ambiguous_table_col.q is failing on master

2016-04-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258658#comment-15258658
 ] 

Ashutosh Chauhan commented on HIVE-13615:
-

In addition to that, the error message has changed for the negative test cases 
nonkey_groupby.q, subquery_shared_alias.q, clustern3.q, clustern4.q, 
udtf_not_supported1.q, and selectDistinctStarNeg_2.q. We are losing the line 
number and character position in the error message; we should try to restore 
the previous behavior.

> nomore_ambiguous_table_col.q is failing on master
> -
>
> Key: HIVE-13615
> URL: https://issues.apache.org/jira/browse/HIVE-13615
> Project: Hive
>  Issue Type: Test
>  Components: Parser
>Affects Versions: 2.1.0
>Reporter: Ashutosh Chauhan
>
> Fails with:
> FAILED: ParseException line 3:9 cannot recognize input near 'src' 'key' 
> 'INSERT' in from source 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13596) HS2 should be able to get UDFs on demand from metastore

2016-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13596:

Attachment: HIVE-13596.patch

This restores some of the functionality, with a config flag, and accounting for 
other registry changes. [~jdere] can you take a look?

> HS2 should be able to get UDFs on demand from metastore
> ---
>
> Key: HIVE-13596
> URL: https://issues.apache.org/jira/browse/HIVE-13596
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: HIVE-13596.patch
>
>
> When multiple HS2s are run, creating a permanent fn is only executed on one 
> of them, and the other HS2s don't get the new function. Unlike say with 
> tables, where we always get stuff from db on demand, fns are registered at 
> certain points in the code and if the new one is not registered, it will not 
> be available. 
> We should restore the pre-HIVE-2573 behavior of being able to refresh the 
> UDFs on demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13596) HS2 should be able to get UDFs on demand from metastore

2016-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13596:

Assignee: Sergey Shelukhin
  Status: Patch Available  (was: Open)

> HS2 should be able to get UDFs on demand from metastore
> ---
>
> Key: HIVE-13596
> URL: https://issues.apache.org/jira/browse/HIVE-13596
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13596.patch
>
>
> When multiple HS2s are run, creating a permanent fn is only executed on one 
> of them, and the other HS2s don't get the new function. Unlike say with 
> tables, where we always get stuff from db on demand, fns are registered at 
> certain points in the code and if the new one is not registered, it will not 
> be available. 
> We should restore the pre-HIVE-2573 behavior of being able to refresh the 
> UDFs on demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11417) Create shims for the row by row read path that is backed by VectorizedRowBatch

2016-04-26 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-11417:
-
Attachment: HIVE-11417.patch

This patch moves the ReaderImpl and RecordReaderImpl to the orc module after 
removing the row by row API. The row by row API is emulated in ql with child 
classes.

> Create shims for the row by row read path that is backed by VectorizedRowBatch
> --
>
> Key: HIVE-11417
> URL: https://issues.apache.org/jira/browse/HIVE-11417
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.1.0
>
> Attachments: HIVE-11417.patch, HIVE-11417.patch
>
>
> I'd like to make the default path for reading and writing ORC files 
> vectorized. To ensure that Hive can still read row by row, we'll need shims 
> to support the old API.
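
A hedged sketch of the shim idea (not the HIVE-11417 patch): wrap a reader that 
produces batches and expose it row by row. The types here are generic stand-ins, not 
the ORC or Hive classes.

{code}
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Sketch only: batches stands in for a vectorized (batch-at-a-time) reader.
final class RowByRowShim<R> implements Iterator<R> {
  private final Iterator<List<R>> batches;
  private Iterator<R> current = Collections.emptyIterator();

  RowByRowShim(Iterator<List<R>> batches) {
    this.batches = batches;
  }

  @Override
  public boolean hasNext() {
    // Pull the next batch whenever the current one is exhausted.
    while (!current.hasNext() && batches.hasNext()) {
      current = batches.next().iterator();
    }
    return current.hasNext();
  }

  @Override
  public R next() {
    return current.next();
  }

  @Override
  public void remove() {
    throw new UnsupportedOperationException();
  }
}
{code}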



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13596) HS2 should be able to get UDFs on demand from metastore

2016-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13596:

Description: 
When multiple HS2s are run, creating a permanent fn is only executed on one of 
them, and the other HS2s don't get the new function. Unlike say with tables, 
where we always get stuff from db on demand, fns are registered at certain 
points in the code and if the new one is not registered, it will not be 
available. 
We should restore the pre-HIVE-2573 behavior of being able to refresh the UDFs 
on demand.

  was:
When multiple HS2s are run, creating a permanent fn is only executed on one of 
them, and the other HS2s don't get the new function. Unlike say with tables, 
where we always get stuff from db on demand, fns are registered at certain 
points in the code and if the new one is not registered, it will not be 
available. 
We could change the code to refresh the udf by name if it's missing, similar to 
getting a table or whatever; or we could refresh UDFs when a session is started 
in multi-HS2 case, or at some other convenient point.
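
For illustration only, a rough sketch of the on-demand fallback described above; the 
class and method names are assumptions, not the HIVE-13596 patch.

{code}
import java.util.HashMap;
import java.util.Map;

// Sketch only: if a function is missing from the session registry, try the
// metastore once and cache whatever comes back.
public class OnDemandFunctionLookupSketch {
  private final Map<String, Object> sessionRegistry = new HashMap<String, Object>();

  public Object getFunctionInfo(String name) {
    Object info = sessionRegistry.get(name);
    if (info == null) {
      info = fetchFromMetastore(name);      // consult the metastore on a miss
      if (info != null) {
        sessionRegistry.put(name, info);    // register so later lookups stay local
      }
    }
    return info;
  }

  private Object fetchFromMetastore(String name) {
    // Placeholder: the real code would ask the metastore client for the function
    // definition and build the registry entry (load JARs, etc.).
    return null;
  }
}
{code}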


> HS2 should be able to get UDFs on demand from metastore
> ---
>
> Key: HIVE-13596
> URL: https://issues.apache.org/jira/browse/HIVE-13596
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> When multiple HS2s are run, creating a permanent fn is only executed on one 
> of them, and the other HS2s don't get the new function. Unlike say with 
> tables, where we always get stuff from db on demand, fns are registered at 
> certain points in the code and if the new one is not registered, it will not 
> be available. 
> We should restore the pre-HIVE-2573 behavior of being able to refresh the 
> UDFs on demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13596) HS2 should be able to get UDFs on demand from metastore

2016-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13596:

Summary: HS2 should be able to get UDFs on demand from metastore  (was: HS2 
should refresh UDFs more frequently(?), at least in multi-HS2 case)

> HS2 should be able to get UDFs on demand from metastore
> ---
>
> Key: HIVE-13596
> URL: https://issues.apache.org/jira/browse/HIVE-13596
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> When multiple HS2s are run, creating a permanent fn is only executed on one 
> of them, and the other HS2s don't get the new function. Unlike say with 
> tables, where we always get stuff from db on demand, fns are registered at 
> certain points in the code and if the new one is not registered, it will not 
> be available. 
> We could change the code to refresh the udf by name if it's missing, similar 
> to getting a table or whatever; or we could refresh UDFs when a session is 
> started in multi-HS2 case, or at some other convenient point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10176) skip.header.line.count causes values to be skipped when performing insert values

2016-04-26 Thread Vladyslav Pavlenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladyslav Pavlenko updated HIVE-10176:
--
Attachment: HIVE-10176.14.patch

OK, I did what you asked. Regarding the question from RB, I found a related ticket: 
https://issues.apache.org/jira/browse/HIVE-11996. The table doesn't have a 
"line.delimiter" property (or an equivalent), and I haven't found a workaround for 
this problem.

> skip.header.line.count causes values to be skipped when performing insert 
> values
> 
>
> Key: HIVE-10176
> URL: https://issues.apache.org/jira/browse/HIVE-10176
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.0.0, 1.2.1
>Reporter: Wenbo Wang
>Assignee: Vladyslav Pavlenko
> Fix For: 2.0.0
>
> Attachments: HIVE-10176.1.patch, HIVE-10176.10.patch, 
> HIVE-10176.11.patch, HIVE-10176.12.patch, HIVE-10176.13.patch, 
> HIVE-10176.14.patch, HIVE-10176.2.patch, HIVE-10176.3.patch, 
> HIVE-10176.4.patch, HIVE-10176.5.patch, HIVE-10176.6.patch, 
> HIVE-10176.7.patch, HIVE-10176.8.patch, HIVE-10176.9.patch, data
>
>
> When inserting values into tables with TBLPROPERTIES 
> ("skip.header.line.count"="1"), the first value listed is also skipped. 
> create table test (row int, name string) TBLPROPERTIES 
> ("skip.header.line.count"="1"); 
> load data local inpath '/root/data' into table test;
> insert into table test values (1, 'a'), (2, 'b'), (3, 'c');
> (1, 'a') isn't inserted into the table. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13178) Enhance ORC Schema Evolution to handle more standard data type conversions

2016-04-26 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13178:

Attachment: HIVE-13178.092.patch

HIVE-12159 went in -- rebase.

> Enhance ORC Schema Evolution to handle more standard data type conversions
> --
>
> Key: HIVE-13178
> URL: https://issues.apache.org/jira/browse/HIVE-13178
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-13178.01.patch, HIVE-13178.02.patch, 
> HIVE-13178.03.patch, HIVE-13178.04.patch, HIVE-13178.05.patch, 
> HIVE-13178.06.patch, HIVE-13178.07.patch, HIVE-13178.08.patch, 
> HIVE-13178.09.patch, HIVE-13178.091.patch, HIVE-13178.092.patch
>
>
> Currently, SHORT -> INT -> BIGINT is supported.
> Handle the ORC data type conversions permitted by implicit conversion, as 
> allowed by the TypeInfoUtils.implicitConvertible method:
>*   STRING_GROUP -> DOUBLE
>*   STRING_GROUP -> DECIMAL
>*   DATE_GROUP -> STRING
>*   NUMERIC_GROUP -> STRING
>*   STRING_GROUP -> STRING_GROUP
>*
>*   // Upward from "lower" type to "higher" numeric type:
>*   BYTE -> SHORT -> INT -> BIGINT -> FLOAT -> DOUBLE -> DECIMAL
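
As a toy illustration of the "upward" numeric order listed above (the real check lives 
in TypeInfoUtils.implicitConvertible, not here):

{code}
import java.util.Arrays;
import java.util.List;

// Sketch only: true if a value written as 'from' can be widened to 'to'.
public final class NumericWideningSketch {
  private static final List<String> ORDER =
      Arrays.asList("BYTE", "SHORT", "INT", "BIGINT", "FLOAT", "DOUBLE", "DECIMAL");

  public static boolean widensTo(String from, String to) {
    int i = ORDER.indexOf(from);
    int j = ORDER.indexOf(to);
    return i >= 0 && j >= 0 && i <= j;
  }
}
{code}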



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13541) Pass view's ColumnAccessInfo to HiveAuthorizer

2016-04-26 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13541:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Double-checked the test case failures: the MiniTez and MR ones are not 
reproducible, and the negative ones are existing failures. Pushed to master. 
Thanks [~ashutoshc] for the review.

> Pass view's ColumnAccessInfo to HiveAuthorizer
> --
>
> Key: HIVE-13541
> URL: https://issues.apache.org/jira/browse/HIVE-13541
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-13541.01.patch, HIVE-13541.02.patch
>
>
> RIght now, only table's ColumnAccessInfo is passed to HiveAuthorizer



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13541) Pass view's ColumnAccessInfo to HiveAuthorizer

2016-04-26 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13541:
---
Affects Version/s: 2.0.0

> Pass view's ColumnAccessInfo to HiveAuthorizer
> --
>
> Key: HIVE-13541
> URL: https://issues.apache.org/jira/browse/HIVE-13541
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-13541.01.patch, HIVE-13541.02.patch
>
>
> RIght now, only table's ColumnAccessInfo is passed to HiveAuthorizer



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13509) HCatalog getSplits should ignore the partition with invalid path

2016-04-26 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258527#comment-15258527
 ] 

Mithun Radhakrishnan commented on HIVE-13509:
-

bq. ... with Google Guava's {{Iterators.filter()}}.
Actually, please ignore comment#3, above. 

I was trying to avoid checking {{ignoreInvalidPath}} multiple times. I tried 
writing it out myself (to illustrate), and saw that the call to 
{{fs.makeQualified()}} implies that we'll need to use both 
{{Iterators.filter()}} and {{Iterators.transform()}}, at which point it's no 
longer short and sweet. 
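
To make that point concrete, a rough illustration (the names are illustrative, not the 
patch) of how the filter ends up paired with a transform once each path must be 
qualified first:

{code}
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import com.google.common.base.Function;
import com.google.common.base.Predicate;
import com.google.common.collect.Iterators;

// Sketch only: qualify each path, then keep the ones that exist.
public final class GuavaFilterSketch {
  public static Iterator<Path> existingPaths(final Configuration conf, List<Path> paths) {
    Iterator<Path> qualified = Iterators.transform(paths.iterator(),
        new Function<Path, Path>() {
          @Override
          public Path apply(Path p) {
            try {
              return p.getFileSystem(conf).makeQualified(p);
            } catch (IOException e) {
              throw new RuntimeException(e);
            }
          }
        });
    return Iterators.filter(qualified, new Predicate<Path>() {
      @Override
      public boolean apply(Path p) {
        try {
          return p.getFileSystem(conf).exists(p);
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      }
    });
  }
}
{code}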

Please fix #2 above, and I will +1.

Also, thanks for adding tests.

> HCatalog getSplits should ignore the partition with invalid path
> 
>
> Key: HIVE-13509
> URL: https://issues.apache.org/jira/browse/HIVE-13509
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-13509.1.patch, HIVE-13509.patch
>
>
> It is quite common for there to be a discrepancy between a partition directory 
> and its HMS metadata, simply because the directory could be added or deleted 
> externally using HDFS shell commands. Technically this should be fixed with 
> MSCK, alter table .. add/drop commands, etc., but sometimes that is not 
> practical, especially in a multi-tenant environment. The discrepancy does not 
> cause any problem for Hive, which returns no rows for a partition with an 
> invalid (e.g. non-existing) path, but it fails a Pig load with HCatLoader, 
> because HCatBaseInputFormat.getSplits throws an error when getting a split for 
> a non-existing path. The error message might look like:
> {code}
> Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does 
> not exist: 
> hdfs://xyz.com:8020/user/hive/warehouse/xyz/date=2016-01-01/country=BR
>   at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
>   at 
> org.apache.hive.hcatalog.mapreduce.HCatBaseInputFormat.getSplits(HCatBaseInputFormat.java:162)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13560) Adding Omid as connection manager for HBase Metastore

2016-04-26 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-13560:
--
Attachment: HIVE-13560.3.patch

> Adding Omid as connection manager for HBase Metastore
> -
>
> Key: HIVE-13560
> URL: https://issues.apache.org/jira/browse/HIVE-13560
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-13560.1.patch, HIVE-13560.2.patch, 
> HIVE-13560.3.patch
>
>
> Adding Omid as a transaction manager to HBase Metastore. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13588) NPE is thrown from MapredLocalTask.executeInChildVM

2016-04-26 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258483#comment-15258483
 ] 

Yongzhi Chen commented on HIVE-13588:
-

The new PATCH LGTM +1

> NPE is thrown from MapredLocalTask.executeInChildVM
> ---
>
> Key: HIVE-13588
> URL: https://issues.apache.org/jira/browse/HIVE-13588
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-13588.1.patch, HIVE-13588.patch, HIVE-13588.patch
>
>
> An NPE was thrown from MapredLocalTask.executeInChildVM when running some 
> queries with the CLI; see the error below:
> {code}
>   java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInChildVM(MapredLocalTask.java:321)
>  [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.execute(MapredLocalTask.java:148)
>  [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:172) 
> [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) 
> [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1868) 
> [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1595) 
> [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1346) 
> [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1117) 
> [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1105) 
> [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:236) 
> [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187) 
> [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) 
> [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:782) 
> [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:721) 
> [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:648) 
> [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.7.0_45]
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
> ~[?:1.7.0_45]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.7.0_45]
> {code}
> This is because the operationLog is only applicable to HS2, not the CLI, so 
> it might not be set (null).
> It is related to HIVE-13183.
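
A minimal sketch of the null guard implied by the description, assuming a nullable 
operationLog reference; OperationLogLike is a stand-in, not Hive's OperationLog API.

{code}
// Sketch only: in CLI mode the operation log is never set, so guard before using it.
final class ChildVmLoggingSketch {
  interface OperationLogLike {
    void write(String line);
  }

  private final OperationLogLike operationLog; // null when running from the CLI

  ChildVmLoggingSketch(OperationLogLike operationLog) {
    this.operationLog = operationLog;
  }

  void logToOperationLog(String line) {
    if (operationLog != null) { // prevents the NPE seen at MapredLocalTask.java:321
      operationLog.write(line);
    }
  }
}
{code}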



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13615) nomore_ambiguous_table_col.q is failing on master

2016-04-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258453#comment-15258453
 ] 

Ashutosh Chauhan commented on HIVE-13615:
-

[~hsubramaniyan] May be related to recent parser changes. Can you take a look?

> nomore_ambiguous_table_col.q is failing on master
> -
>
> Key: HIVE-13615
> URL: https://issues.apache.org/jira/browse/HIVE-13615
> Project: Hive
>  Issue Type: Test
>  Components: Parser
>Affects Versions: 2.1.0
>Reporter: Ashutosh Chauhan
>
> Fails with:
> FAILED: ParseException line 3:9 cannot recognize input near 'src' 'key' 
> 'INSERT' in from source 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13509) HCatalog getSplits should ignore the partition with invalid path

2016-04-26 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258454#comment-15258454
 ] 

Mithun Radhakrishnan commented on HIVE-13509:
-

Reviewing your patch now. On the face of it, it looks good. Looking at it a 
little more closely...

A couple of observations:
# {{hcat.input.ignore.invalid.path}} is well-named, and would make sense to 
anyone who'd want to override the default. (I thought we'd go with 
{{hcat.input.allow.invalid.path=true}}, but your version is better.)
# Consider replacing {{(pathString == null || pathString.trim().isEmpty())}} 
with {{StringUtils.isBlank(pathString)}}.
# Nitpick: Consider replacing the loop at {{HCatBaseInputFormat.java:Line#335}} 
with Google Guava's {{Iterators.filter()}}. Then, depending on whether 
{{ignoreInvalidPath}} is set, the erstwhile loop at Line#329 will either loop 
on {{paths}} or on {{filteredPaths}}. This will be more readable.

> HCatalog getSplits should ignore the partition with invalid path
> 
>
> Key: HIVE-13509
> URL: https://issues.apache.org/jira/browse/HIVE-13509
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-13509.1.patch, HIVE-13509.patch
>
>
> It is quite common for there to be a discrepancy between a partition directory 
> and its HMS metadata, simply because the directory could be added or deleted 
> externally using HDFS shell commands. Technically this should be fixed with 
> MSCK, alter table .. add/drop commands, etc., but sometimes that is not 
> practical, especially in a multi-tenant environment. The discrepancy does not 
> cause any problem for Hive, which returns no rows for a partition with an 
> invalid (e.g. non-existing) path, but it fails a Pig load with HCatLoader, 
> because HCatBaseInputFormat.getSplits throws an error when getting a split for 
> a non-existing path. The error message might look like:
> {code}
> Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does 
> not exist: 
> hdfs://xyz.com:8020/user/hive/warehouse/xyz/date=2016-01-01/country=BR
>   at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
>   at 
> org.apache.hive.hcatalog.mapreduce.HCatBaseInputFormat.getSplits(HCatBaseInputFormat.java:162)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13241) LLAP: Incremental Caching marks some small chunks as "incomplete CB"

2016-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258434#comment-15258434
 ] 

Sergey Shelukhin commented on HIVE-13241:
-

Sorry, I will do a follow-up patch

> LLAP: Incremental Caching marks some small chunks as "incomplete CB"
> 
>
> Key: HIVE-13241
> URL: https://issues.apache.org/jira/browse/HIVE-13241
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.1
> Fix For: 2.1.0
>
> Attachments: HIVE-13241.01.patch, HIVE-13241.patch
>
>
> Run #3 of a query with 1 node still has cache misses.
> {code}
> LLAP IO Summary
> --
>   VERTICES ROWGROUPS  META_HIT  META_MISS  DATA_HIT  DATA_MISS  ALLOCATION
>  USED  TOTAL_IO
> --
>  Map 111  1116  01.65GB93.61MB  0B
>0B32.72s
> --
> {code}
> {code}
> 2016-03-08T21:05:39,417 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:prepareRangesForCompressedRead(695)) - Locking 
> 0x1c44401d(1) due to reuse
> 2016-03-08T21:05:39,417 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:prepareRangesForCompressedRead(701)) - Adding an 
> already-uncompressed buffer 0x1c44401d(2)
> 2016-03-08T21:05:39,417 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:prepareRangesForCompressedRead(695)) - Locking 
> 0x4e51b032(1) due to reuse
> 2016-03-08T21:05:39,417 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:prepareRangesForCompressedRead(701)) - Adding an 
> already-uncompressed buffer 0x4e51b032(2)
> 2016-03-08T21:05:39,418 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:addOneCompressionBuffer(1161)) - Found CB at 1373931, 
> chunk length 86587, total 86590, compressed
> 2016-03-08T21:05:39,418 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:addIncompleteCompressionBuffer(1241)) - Replacing 
> data range [1373931, 1408408), size: 34474(!) type: direct (and 0 previous 
> chunks) with incomplete CB start: 1373931 end: 1408408 in the buffers
> 2016-03-08T21:05:39,418 INFO  
> [IO-Elevator-Thread-9[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.EncodedReaderImpl 
> (EncodedReaderImpl.java:createRgColumnStreamData(441)) - Getting data for 
> column 7 RG 14 stream DATA at 1460521, 319811 index position 0: compressed 
> [1626961, 1780332)
> {code}
> {code}
> 2016-03-08T21:05:38,925 INFO  
> [IO-Elevator-Thread-7[attempt_1455662455106_2688_3_00_01_0]]: 
> encoded.OrcEncodedDataReader (OrcEncodedDataReader.java:readFileData(878)) - 
> Disk ranges after disk read (file 5372745, base offset 3): [{start: 18986 
> end: 20660 cache buffer: 0x660faf7c(1)}, {start: 20660 end: 35775 cache 
> buffer: 0x1dcb1d97(1)}, {start: 318852 end: 422353 cache buffer: 
> 0x6c7f9a05(1)}, {start: 1148616 end: 1262468 cache buffer: 0x196e1d41(1)}, 
> {start: 1262468 end: 1376342 cache buffer: 0x201255f(1)}, {data range 
> [1376342, 1410766), size: 34424 type: direct}, {start: 1631359 end: 1714694 
> cache buffer: 0x47e3a72d(1)}, {start: 1714694 end: 1785770 cache buffer: 
> 0x57dca266(1)}, {start: 4975035 end: 5095215 cache buffer: 0x3e3139c9(1)}, 
> {start: 5095215 end: 5197863 cache buffer: 0x3511c88d(1)}, {start: 7448387 
> end: 7572268 cache buffer: 0x6f11dbcd(1)}, {start: 7572268 end: 7696182 cache 
> buffer: 0x5d6c9bdb(1)}, {data range [7696182, 7710537), size: 14355 type: 
> direct}, {start: 8235756 end: 8345367 cache buffer: 0x6a241ece(1)}, {start: 
> 8345367 end: 8455009 cache buffer: 0x51caf6a7(1)}, {data range [8455009, 
> 8497906), size: 42897 type: direct}, {start: 9035815 end: 9159708 cache 
> buffer: 0x306480e0(1)}, {start: 9159708 end: 9283629 cache buffer: 
> 0x9ef7774(1)}, {data range [9283629, 9297965), size: 14336 type: direct}, 
> {start: 9989884 end: 10113731 cache buffer: 0x43f7cae9(1)}, {start: 10113731 
> end: 10237589 cache buffer: 0x458e63fe(1)}, {data range [10237589, 10252034), 
> size: 14445 type: direct}, {start: 11897896 end: 12021787 cache buffer: 
> 0x51f9982f(1)}, {start: 12021787 end: 12145656 cache buffer: 0x23df01b3(1)}, 

[jira] [Updated] (HIVE-13588) NPE is thrown from MapredLocalTask.executeInChildVM

2016-04-26 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-13588:
---
Attachment: HIVE-13588.1.patch

Yeah, it is not necessary to reset the operationLog when tasks are not run in 
parallel. I revised the patch. Thanks [~ychena]

> NPE is thrown from MapredLocalTask.executeInChildVM
> ---
>
> Key: HIVE-13588
> URL: https://issues.apache.org/jira/browse/HIVE-13588
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-13588.1.patch, HIVE-13588.patch, HIVE-13588.patch
>
>
> An NPE was thrown from MapredLocalTask.executeInChildVM when running some 
> queries with the CLI; see the error below:
> {code}
>   java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInChildVM(MapredLocalTask.java:321)
>  [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.execute(MapredLocalTask.java:148)
>  [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:172) 
> [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) 
> [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1868) 
> [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1595) 
> [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1346) 
> [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1117) 
> [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1105) 
> [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:236) 
> [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187) 
> [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) 
> [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:782) 
> [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:721) 
> [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:648) 
> [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.7.0_45]
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
> ~[?:1.7.0_45]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.7.0_45]
> {code}
> This is because the operationLog is only applicable to HS2, not the CLI, so 
> it might not be set (null).
> It is related to HIVE-13183.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13537) Update slf4j version to 1.7.10

2016-04-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258383#comment-15258383
 ] 

Hive QA commented on HIVE-13537:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12800455/HIVE-13537.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 62 failed/errored test(s), 9940 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_distinct_2.q-tez_joins_explain.q-cte_mat_1.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vectorized_parquet.q-vector_decimal_aggregate.q-tez_self_join.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nomore_ambiguous_table_col
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge9
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_orc_merge_diff_fs
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join5
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern3
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_clustern4
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_nonkey_groupby
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_selectDistinctStarNeg_2
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_subquery_shared_alias
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udtf_not_supported1
org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote.org.apache.hadoop.hive.metastore.TestAuthzApiEmbedAuthorizerInRemote
org.apache.hadoop.hive.metastore.TestFilterHooks.org.apache.hadoop.hive.metastore.TestFilterHooks
org.apache.hadoop.hive.metastore.TestMetaStoreEndFunctionListener.testEndFunctionListener
org.apache.hadoop.hive.metastore.TestMetaStoreEventListenerOnlyOnCommit.testEventStatus
org.apache.hadoop.hive.metastore.TestMetaStoreInitListener.testMetaStoreInitListener
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.org.apache.hadoop.hive.metastore.TestMetaStoreMetrics
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAddPartitionWithUnicode
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAddPartitionWithValidPartVal
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithCommas
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithUnicode
org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation.testAppendPartitionWithValidCharacters
org.apache.hadoop.hive.metastore.TestRetryingHMSHandler.testRetryingHMSHandler
org.apache.hadoop.hive.metastore.hbase.TestHBaseImport.org.apache.hadoop.hive.metastore.hbase.TestHBaseImport
org.apache.hadoop.hive.ql.security.TestClientSideAuthorizationProvider.testSimplePrivileges
org.apache.hadoop.hive.ql.security.TestExtendedAcls.org.apache.hadoop.hive.ql.security.TestExtendedAcls
org.apache.hadoop.hive.ql.security.TestFolderPermissions.org.apache.hadoop.hive.ql.security.TestFolderPermissions
org.apache.hadoop.hive.ql.security.TestMetastoreAuthorizationProvider.testSimplePrivileges

[jira] [Commented] (HIVE-13588) NPE is thrown from MapredLocalTask.executeInChildVM

2016-04-26 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258373#comment-15258373
 ] 

Yongzhi Chen commented on HIVE-13588:
-

[~ctang.ma], does your fix need to change the location of 
tskRun.setOperationLog(OperationLog.getCurrentOperationLog());?
From HIVE-9120, it looks like the setOperationLog call is not needed when 
hive.exec.parallel is false. 

> NPE is thrown from MapredLocalTask.executeInChildVM
> ---
>
> Key: HIVE-13588
> URL: https://issues.apache.org/jira/browse/HIVE-13588
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-13588.patch, HIVE-13588.patch
>
>
> An NPE was thrown from MapredLocalTask.executeInChildVM when running some 
> queries with the CLI; see the error below:
> {code}
>   java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInChildVM(MapredLocalTask.java:321)
>  [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.execute(MapredLocalTask.java:148)
>  [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:172) 
> [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) 
> [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1868) 
> [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1595) 
> [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1346) 
> [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1117) 
> [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1105) 
> [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:236) 
> [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187) 
> [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) 
> [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:782) 
> [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:721) 
> [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:648) 
> [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.7.0_45]
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
> ~[?:1.7.0_45]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.7.0_45]
> {code}
> This is because the operationLog is only applicable to HS2, not the CLI, so 
> it might not be set (null).
> It is related to HIVE-13183.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12159) Create vectorized readers for the complex types

2016-04-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258293#comment-15258293
 ] 

ASF GitHub Bot commented on HIVE-12159:
---

Github user omalley closed the pull request at:

https://github.com/apache/hive/pull/68


> Create vectorized readers for the complex types
> ---
>
> Key: HIVE-12159
> URL: https://issues.apache.org/jira/browse/HIVE-12159
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.1.0
>
> Attachments: HIVE-12159.patch, HIVE-12159.patch, HIVE-12159.patch, 
> HIVE-12159.patch, HIVE-12159.patch, HIVE-12159.patch, HIVE-12159.patch, 
> HIVE-12159.patch, HIVE-12159.patch
>
>
> We need vectorized readers for the complex types.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12159) Create vectorized readers for the complex types

2016-04-26 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258289#comment-15258289
 ] 

Owen O'Malley commented on HIVE-12159:
--

The failures from Jenkins are not related. I just committed this. Thanks for 
the reviews, Matt & Prasanth.

> Create vectorized readers for the complex types
> ---
>
> Key: HIVE-12159
> URL: https://issues.apache.org/jira/browse/HIVE-12159
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.1.0
>
> Attachments: HIVE-12159.patch, HIVE-12159.patch, HIVE-12159.patch, 
> HIVE-12159.patch, HIVE-12159.patch, HIVE-12159.patch, HIVE-12159.patch, 
> HIVE-12159.patch, HIVE-12159.patch
>
>
> We need vectorized readers for the complex types.





[jira] [Updated] (HIVE-12159) Create vectorized readers for the complex types

2016-04-26 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-12159:
-
   Resolution: Fixed
Fix Version/s: 2.1.0
   Status: Resolved  (was: Patch Available)

> Create vectorized readers for the complex types
> ---
>
> Key: HIVE-12159
> URL: https://issues.apache.org/jira/browse/HIVE-12159
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.1.0
>
> Attachments: HIVE-12159.patch, HIVE-12159.patch, HIVE-12159.patch, 
> HIVE-12159.patch, HIVE-12159.patch, HIVE-12159.patch, HIVE-12159.patch, 
> HIVE-12159.patch, HIVE-12159.patch
>
>
> We need vectorized readers for the complex types.





[jira] [Commented] (HIVE-13525) HoS hangs when job is empty

2016-04-26 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258256#comment-15258256
 ] 

Rui Li commented on HIVE-13525:
---

The {{TestSparkClient.testMetricsCollection}} failure is related. The problem is 
that if {{RemoteDriver}} considers a job done as soon as future#get returns, it 
may send the JobResult before the listener has handled the TaskEnd event and 
sent the metrics. On the client side, the job handle is removed once the 
JobResult is received, so metrics that arrive later are simply discarded.
I'll think about how to solve this; any ideas are welcome.
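
To make the ordering concrete, here is a toy model of the race (all class and 
method names are illustrative stubs, not the actual hive spark-client API):

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the race: if JobResult is delivered before the TaskEnd
// metrics, the client has already dropped the job handle and the metrics
// are silently discarded.
public class MetricsRaceSketch {
  static final class JobHandleStub {
    volatile Object metrics;   // last metrics received for the job, if any
  }

  private final Map<String, JobHandleStub> handles = new ConcurrentHashMap<>();

  void onJobResult(String jobId) {
    // Client removes the handle as soon as the result arrives...
    handles.remove(jobId);
  }

  void onTaskMetrics(String jobId, Object metrics) {
    JobHandleStub handle = handles.get(jobId);
    if (handle == null) {
      // ...so metrics arriving after the result have nowhere to go.
      // A fix must either delay JobResult until the listener has sent the
      // TaskEnd metrics, or keep the handle alive until metrics are drained.
      return;
    }
    handle.metrics = metrics;
  }
}
{code}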

> HoS hangs when job is empty
> ---
>
> Key: HIVE-13525
> URL: https://issues.apache.org/jira/browse/HIVE-13525
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-13525.1.patch, HIVE-13525.2.patch
>
>
> Observed in local tests. This should be the cause of HIVE-13402.





[jira] [Updated] (HIVE-10867) ArrayIndexOutOfBoundsException LazyBinaryUtils.byteArrayToLong with Hive on Tez

2016-04-26 Thread Alina Abramova (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alina Abramova updated HIVE-10867:
--
Attachment: HIVE-10867.patch

I created this patch based on https://issues.apache.org/jira/browse/HIVE-9517.
I see that the same fix works for Tez too.
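
For context, a self-contained sketch (not Hive's actual LazyBinaryUtils source, 
and the serialized layout of the failing row is an assumption) of how a fixed 
8-byte read over a shorter serialized value surfaces as the 
{{ArrayIndexOutOfBoundsException: 6}} seen in the trace below:

{code}
// Illustrative only: mimics a fixed-width long read over a buffer that is
// shorter than 8 bytes, e.g. when one UNION ALL branch wrote a narrower
// type than the reader expects.
public class ByteArrayToLongSketch {
  static long byteArrayToLong(byte[] bytes, int start) {
    long value = 0;
    for (int i = 0; i < 8; i++) {             // always consumes 8 bytes
      value = (value << 8) | (bytes[start + i] & 0xFFL);
    }
    return value;
  }

  public static void main(String[] args) {
    byte[] onlySixBytes = new byte[6];        // value serialized with fewer bytes
    byteArrayToLong(onlySixBytes, 0);         // throws ArrayIndexOutOfBoundsException: 6
  }
}
{code}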

> ArrayIndexOutOfBoundsException LazyBinaryUtils.byteArrayToLong with Hive on 
> Tez
> ---
>
> Key: HIVE-10867
> URL: https://issues.apache.org/jira/browse/HIVE-10867
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Tez
>Affects Versions: 0.14.0, 1.0.0
> Environment: Hortonworks distribution 2.2.4-2
> Hive 0.14.0
> Tez 0.5.2.2.2.4.2-2 on cluster
> Tez 0.7.0 in local setup
>Reporter: Per Ullberg
>Assignee: Alina Abramova
> Attachments: HIVE-10867.patch
>
>
> Hi, 
> The following query runs fine on the MapReduce engine, but when setting 
> hive.execution.engine to tez it produces an ArrayIndexOutOfBoundsException.
> Query
> {code}
> create external table table_1 (id string, date string, amount bigint);
> insert into table table_1 values (305,'2013-03-02',3790);
> create external table table_2 (id string);
> insert into table table_2 VALUES (305);
> create external table table_3 (id string, date_3 string, amount_3 bigint);
> insert into table table_3 values (305,'2013-03-01',-1600);
> create external table table_4 (id bigint, str_4 string, amount_4 bigint);
> create table table_5
> as
>   SELECT
> c.diff
>   FROM (
> SELECT
>   id AS id,
>   date AS create_date,
>   -amount AS diff
> FROM table_1
> UNION ALL
> SELECT
>   p.id AS id,
>   p.str_4 AS create_date,
>   -p.amount_4 AS diff
> FROM table_4 p
> UNION ALL
> SELECT
>   id,
>   create_date,
>   diff
> FROM (
>   SELECT
> i.id AS id,
> tp.date_3 AS create_date,
> cast(amount_3 as double) AS diff
>   FROM table_3 tp
>   INNER JOIN table_2 i ON cast(tp.id as string) = cast(i.id as string)
> ) fees
>   ) c
> INNER JOIN table_2 i ON cast(c.id as string) = cast(i.id as string);
> {code}
> Results with map reduce engine:
> {code}
> hive> select * from table_5;
> OK
> -1600.0
> -3790.0
> Time taken: 0.061 seconds, Fetched: 2 row(s)
> {code}
> Exception with tez engine:
> {code}
> Status: Failed
> Vertex failed, vertexName=Reducer 4, vertexId=vertex_1432809678493_0891_4_06, 
> diagnostics=[Task failed, taskId=task_1432809678493_0891_4_06_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) 
> {"key":{"reducesinkkey0":"305"},"value":{"_col1":-1600.0}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{"reducesinkkey0":"305"},"value":{"_col1":-1600.0}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:337)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:218)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:168)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
>   ... 13 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 6
>   at 
> 

[jira] [Updated] (HIVE-10867) ArrayIndexOutOfBoundsException LazyBinaryUtils.byteArrayToLong with Hive on Tez

2016-04-26 Thread Alina Abramova (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alina Abramova updated HIVE-10867:
--
Affects Version/s: 1.0.0
   Status: Patch Available  (was: In Progress)

> ArrayIndexOutOfBoundsException LazyBinaryUtils.byteArrayToLong with Hive on 
> Tez
> ---
>
> Key: HIVE-10867
> URL: https://issues.apache.org/jira/browse/HIVE-10867
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Tez
>Affects Versions: 1.0.0, 0.14.0
> Environment: Hortonworks distribution 2.2.4-2
> Hive 0.14.0
> Tez 0.5.2.2.2.4.2-2 on cluster
> Tez 0.7.0 in local setup
>Reporter: Per Ullberg
>Assignee: Alina Abramova
> Attachments: HIVE-10867.patch
>
>
> Hi, 
> The following query runs fine on the MapReduce engine, but when setting 
> hive.execution.engine to tez it produces an ArrayIndexOutOfBoundsException.
> Query
> {code}
> create external table table_1 (id string, date string, amount bigint);
> insert into table table_1 values (305,'2013-03-02',3790);
> create external table table_2 (id string);
> insert into table table_2 VALUES (305);
> create external table table_3 (id string, date_3 string, amount_3 bigint);
> insert into table table_3 values (305,'2013-03-01',-1600);
> create external table table_4 (id bigint, str_4 string, amount_4 bigint);
> create table table_5
> as
>   SELECT
> c.diff
>   FROM (
> SELECT
>   id AS id,
>   date AS create_date,
>   -amount AS diff
> FROM table_1
> UNION ALL
> SELECT
>   p.id AS id,
>   p.str_4 AS create_date,
>   -p.amount_4 AS diff
> FROM table_4 p
> UNION ALL
> SELECT
>   id,
>   create_date,
>   diff
> FROM (
>   SELECT
> i.id AS id,
> tp.date_3 AS create_date,
> cast(amount_3 as double) AS diff
>   FROM table_3 tp
>   INNER JOIN table_2 i ON cast(tp.id as string) = cast(i.id as string)
> ) fees
>   ) c
> INNER JOIN table_2 i ON cast(c.id as string) = cast(i.id as string);
> {code}
> Results with map reduce engine:
> {code}
> hive> select * from table_5;
> OK
> -1600.0
> -3790.0
> Time taken: 0.061 seconds, Fetched: 2 row(s)
> {code}
> Exception with tez engine:
> {code}
> Status: Failed
> Vertex failed, vertexName=Reducer 4, vertexId=vertex_1432809678493_0891_4_06, 
> diagnostics=[Task failed, taskId=task_1432809678493_0891_4_06_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) 
> {"key":{"reducesinkkey0":"305"},"value":{"_col1":-1600.0}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{"reducesinkkey0":"305"},"value":{"_col1":-1600.0}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:337)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:218)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:168)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
>   ... 13 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 6
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.byteArrayToLong(LazyBinaryUtils.java:84)
>   at 
> 

[jira] [Assigned] (HIVE-10867) ArrayIndexOutOfBoundsException LazyBinaryUtils.byteArrayToLong with Hive on Tez

2016-04-26 Thread Alina Abramova (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alina Abramova reassigned HIVE-10867:
-

Assignee: Alina Abramova

> ArrayIndexOutOfBoundsException LazyBinaryUtils.byteArrayToLong with Hive on 
> Tez
> ---
>
> Key: HIVE-10867
> URL: https://issues.apache.org/jira/browse/HIVE-10867
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Tez
>Affects Versions: 0.14.0
> Environment: Hortonworks distribution 2.2.4-2
> Hive 0.14.0
> Tez 0.5.2.2.2.4.2-2 on cluster
> Tez 0.7.0 in local setup
>Reporter: Per Ullberg
>Assignee: Alina Abramova
>
> Hi, 
> The following query runs fine on the MapReduce engine, but when setting 
> hive.execution.engine to tez it produces an ArrayIndexOutOfBoundsException.
> Query
> {code}
> create external table table_1 (id string, date string, amount bigint);
> insert into table table_1 values (305,'2013-03-02',3790);
> create external table table_2 (id string);
> insert into table table_2 VALUES (305);
> create external table table_3 (id string, date_3 string, amount_3 bigint);
> insert into table table_3 values (305,'2013-03-01',-1600);
> create external table table_4 (id bigint, str_4 string, amount_4 bigint);
> create table table_5
> as
>   SELECT
> c.diff
>   FROM (
> SELECT
>   id AS id,
>   date AS create_date,
>   -amount AS diff
> FROM table_1
> UNION ALL
> SELECT
>   p.id AS id,
>   p.str_4 AS create_date,
>   -p.amount_4 AS diff
> FROM table_4 p
> UNION ALL
> SELECT
>   id,
>   create_date,
>   diff
> FROM (
>   SELECT
> i.id AS id,
> tp.date_3 AS create_date,
> cast(amount_3 as double) AS diff
>   FROM table_3 tp
>   INNER JOIN table_2 i ON cast(tp.id as string) = cast(i.id as string)
> ) fees
>   ) c
> INNER JOIN table_2 i ON cast(c.id as string) = cast(i.id as string);
> {code}
> Results with map reduce engine:
> {code}
> hive> select * from table_5;
> OK
> -1600.0
> -3790.0
> Time taken: 0.061 seconds, Fetched: 2 row(s)
> {code}
> Exception with tez engine:
> {code}
> Status: Failed
> Vertex failed, vertexName=Reducer 4, vertexId=vertex_1432809678493_0891_4_06, 
> diagnostics=[Task failed, taskId=task_1432809678493_0891_4_06_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) 
> {"key":{"reducesinkkey0":"305"},"value":{"_col1":-1600.0}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{"reducesinkkey0":"305"},"value":{"_col1":-1600.0}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:337)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:218)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:168)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
>   ... 13 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 6
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.byteArrayToLong(LazyBinaryUtils.java:84)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryDouble.init(LazyBinaryDouble.java:43)
>   at 
> 

[jira] [Work started] (HIVE-10867) ArrayIndexOutOfBoundsException LazyBinaryUtils.byteArrayToLong with Hive on Tez

2016-04-26 Thread Alina Abramova (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-10867 started by Alina Abramova.
-
> ArrayIndexOutOfBoundsException LazyBinaryUtils.byteArrayToLong with Hive on 
> Tez
> ---
>
> Key: HIVE-10867
> URL: https://issues.apache.org/jira/browse/HIVE-10867
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Tez
>Affects Versions: 0.14.0
> Environment: Hortonworks distribution 2.2.4-2
> Hive 0.14.0
> Tez 0.5.2.2.2.4.2-2 on cluster
> Tez 0.7.0 in local setup
>Reporter: Per Ullberg
>Assignee: Alina Abramova
>
> Hi, 
> The following query runs fine on the MapReduce engine, but when setting 
> hive.execution.engine to tez it produces an ArrayIndexOutOfBoundsException.
> Query
> {code}
> create external table table_1 (id string, date string, amount bigint);
> insert into table table_1 values (305,'2013-03-02',3790);
> create external table table_2 (id string);
> insert into table table_2 VALUES (305);
> create external table table_3 (id string, date_3 string, amount_3 bigint);
> insert into table table_3 values (305,'2013-03-01',-1600);
> create external table table_4 (id bigint, str_4 string, amount_4 bigint);
> create table table_5
> as
>   SELECT
> c.diff
>   FROM (
> SELECT
>   id AS id,
>   date AS create_date,
>   -amount AS diff
> FROM table_1
> UNION ALL
> SELECT
>   p.id AS id,
>   p.str_4 AS create_date,
>   -p.amount_4 AS diff
> FROM table_4 p
> UNION ALL
> SELECT
>   id,
>   create_date,
>   diff
> FROM (
>   SELECT
> i.id AS id,
> tp.date_3 AS create_date,
> cast(amount_3 as double) AS diff
>   FROM table_3 tp
>   INNER JOIN table_2 i ON cast(tp.id as string) = cast(i.id as string)
> ) fees
>   ) c
> INNER JOIN table_2 i ON cast(c.id as string) = cast(i.id as string);
> {code}
> Results with map reduce engine:
> {code}
> hive> select * from table_5;
> OK
> -1600.0
> -3790.0
> Time taken: 0.061 seconds, Fetched: 2 row(s)
> {code}
> Exception with tez engine:
> {code}
> Status: Failed
> Vertex failed, vertexName=Reducer 4, vertexId=vertex_1432809678493_0891_4_06, 
> diagnostics=[Task failed, taskId=task_1432809678493_0891_4_06_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) 
> {"key":{"reducesinkkey0":"305"},"value":{"_col1":-1600.0}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{"reducesinkkey0":"305"},"value":{"_col1":-1600.0}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:337)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:218)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:168)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
>   ... 13 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 6
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.byteArrayToLong(LazyBinaryUtils.java:84)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryDouble.init(LazyBinaryDouble.java:43)
>   at 
> 
