[jira] [Created] (HIVE-20858) Serializer is not correctly initialized with configuration in Utilities.createEmptyBuckets()

2018-11-01 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-20858:


 Summary: Serializer is not correctly initialized with 
configuration in Utilities.createEmptyBuckets()
 Key: HIVE-20858
 URL: https://issues.apache.org/jira/browse/HIVE-20858
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.1.0
Reporter: Wei Zheng
Assignee: Wei Zheng
 Attachments: HIVE-20858.1.patch





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20800) Use "posix" for property tarLongFileMode for maven-assembly-plugin

2018-10-24 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-20800:


 Summary: Use "posix" for property tarLongFileMode for 
maven-assembly-plugin
 Key: HIVE-20800
 URL: https://issues.apache.org/jira/browse/HIVE-20800
 Project: Hive
  Issue Type: Improvement
  Components: Build Infrastructure
Affects Versions: 3.1.0
Reporter: Wei Zheng
Assignee: Wei Zheng
 Fix For: 4.0.0


Came across this error when building Hive using "mvn clean install -DskipTests":

{code}

[INFO] Building tar: 
/Users/wei/apache/hive/standalone-metastore/target/apache-hive-standalone-metastore-4.0.0-SNAPSHOT-src.tar.gz
[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] Hive Storage API 2.7.0-SNAPSHOT  SUCCESS [  5.656 s]
[INFO] Hive 4.0.0-SNAPSHOT  SUCCESS [  0.779 s]
[INFO] Hive Classifications ... SUCCESS [  0.908 s]
[INFO] Hive Shims Common .. SUCCESS [  3.217 s]
[INFO] Hive Shims 0.23  SUCCESS [  7.102 s]
[INFO] Hive Shims Scheduler ... SUCCESS [  2.069 s]
[INFO] Hive Shims . SUCCESS [  1.905 s]
[INFO] Hive Common  SUCCESS [  8.185 s]
[INFO] Hive Service RPC ... SUCCESS [  3.603 s]
[INFO] Hive Serde . SUCCESS [  7.438 s]
[INFO] Hive Standalone Metastore .. FAILURE [  0.576 s]
[INFO] Hive Standalone Metastore Common Code .. SKIPPED
[INFO] Hive Metastore . SKIPPED
[INFO] Hive Vector-Code-Gen Utilities . SKIPPED
[INFO] Hive Llap Common ... SKIPPED
[INFO] Hive Llap Client ... SKIPPED
[INFO] Hive Llap Tez .. SKIPPED
[INFO] Hive Spark Remote Client ... SKIPPED
[INFO] Hive Metastore Server .. SKIPPED
[INFO] Hive Query Language  SKIPPED
[INFO] Hive Llap Server ... SKIPPED
[INFO] Hive Service ... SKIPPED
[INFO] Hive Accumulo Handler .. SKIPPED
[INFO] Hive JDBC .. SKIPPED
[INFO] Hive Beeline ... SKIPPED
[INFO] Hive CLI ... SKIPPED
[INFO] Hive Contrib ... SKIPPED
[INFO] Hive Druid Handler . SKIPPED
[INFO] Hive HBase Handler . SKIPPED
[INFO] Hive JDBC Handler .. SKIPPED
[INFO] Hive HCatalog .. SKIPPED
[INFO] Hive HCatalog Core . SKIPPED
[INFO] Hive HCatalog Pig Adapter .. SKIPPED
[INFO] Hive HCatalog Server Extensions  SKIPPED
[INFO] Hive HCatalog Webhcat Java Client .. SKIPPED
[INFO] Hive HCatalog Webhcat .. SKIPPED
[INFO] Hive HCatalog Streaming  SKIPPED
[INFO] Hive HPL/SQL ... SKIPPED
[INFO] Hive Streaming . SKIPPED
[INFO] Hive Llap External Client .. SKIPPED
[INFO] Hive Shims Aggregator .. SKIPPED
[INFO] Hive Kryo Registrator .. SKIPPED
[INFO] Hive TestUtils . SKIPPED
[INFO] Hive Kafka Storage Handler . SKIPPED
[INFO] Hive Packaging . SKIPPED
[INFO] Hive Metastore Tools ... SKIPPED
[INFO] Hive Metastore Tools common libraries .. SKIPPED
[INFO] Hive metastore benchmarks .. SKIPPED
[INFO] Hive Upgrade Acid .. SKIPPED
[INFO] Hive Pre Upgrade Acid 4.0.0-SNAPSHOT ... SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 42.026 s
[INFO] Finished at: 2018-10-24T15:34:40-07:00
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-assembly-plugin:3.1.0:single (assemble) on 
project hive-standalone-metastore: Execution assemble of goal 
org.apache.maven.plugins:maven-assembly-plugin:3.1.0:single failed: group id 
'74715970' is too big ( > 2097151 ). Use STAR or POSIX extensions 
{code}

[jira] [Created] (HIVE-17361) Support LOAD DATA for transactional tables

2017-08-19 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-17361:


 Summary: Support LOAD DATA for transactional tables
 Key: HIVE-17361
 URL: https://issues.apache.org/jira/browse/HIVE-17361
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Reporter: Wei Zheng
Assignee: Wei Zheng


LOAD DATA has not been supported since ACID was introduced. We need to fill this gap 
between ACID tables and regular Hive tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-16963) rely on AcidUtils.getAcidState() for read path

2017-06-26 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-16963:


 Summary: rely on AcidUtils.getAcidState() for read path
 Key: HIVE-16963
 URL: https://issues.apache.org/jira/browse/HIVE-16963
 Project: Hive
  Issue Type: Sub-task
Reporter: Wei Zheng
Assignee: Wei Zheng


This is to make MM tables more consistent with full ACID tables. It is also a 
prerequisite for INSERT OVERWRITE support for MM tables (refer to HIVE-14988).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-16850) Only open a new transaction when there's no currently opened transaction

2017-06-07 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-16850:


 Summary: Only open a new transaction when there's no currently 
opened transaction
 Key: HIVE-16850
 URL: https://issues.apache.org/jira/browse/HIVE-16850
 Project: Hive
  Issue Type: Sub-task
Reporter: Wei Zheng
Assignee: Wei Zheng






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16819) Add MM test for temporary table

2017-06-02 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-16819:


 Summary: Add MM test for temporary table
 Key: HIVE-16819
 URL: https://issues.apache.org/jira/browse/HIVE-16819
 Project: Hive
  Issue Type: Sub-task
Reporter: Wei Zheng
Assignee: Wei Zheng






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16817) Restore CTAS tests in mm_all.q

2017-06-02 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-16817:


 Summary: Restore CTAS tests in mm_all.q
 Key: HIVE-16817
 URL: https://issues.apache.org/jira/browse/HIVE-16817
 Project: Hive
  Issue Type: Sub-task
Reporter: Wei Zheng
Assignee: Wei Zheng


In an earlier ACID integration patch CTAS was not supported. (Previously I used a 
different approach in which I created a new data operation type for INSERT ONLY, 
which errored out in TxnHandler. Later I changed that to INSERT, which works fine.) 
Now that CTAS is working, the corresponding tests should be restored.

Note we still have the same limitations for MM tables as regular Hive tables do 
in general:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect(CTAS)

CTAS has these restrictions:
* The target table cannot be a partitioned table.
* The target table cannot be an external table.
* The target table cannot be a list bucketing table.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16810) Fix an export/import bug due to ACID integration

2017-06-01 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-16810:


 Summary: Fix an export/import bug due to ACID integration
 Key: HIVE-16810
 URL: https://issues.apache.org/jira/browse/HIVE-16810
 Project: Hive
  Issue Type: Sub-task
Affects Versions: hive-14535
Reporter: Wei Zheng
Assignee: Wei Zheng






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16760) Update errata.txt for HIVE-16743

2017-05-25 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-16760:


 Summary: Update errata.txt for HIVE-16743
 Key: HIVE-16760
 URL: https://issues.apache.org/jira/browse/HIVE-16760
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.0.0, hive-14535
Reporter: Wei Zheng
Assignee: Wei Zheng


Refer to:
https://issues.apache.org/jira/browse/HIVE-16743?focusedCommentId=16024139&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16024139



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16753) Add tests that cover createValidReadTxnList and createValidCompactTxnList in TxnUtils.java

2017-05-24 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-16753:


 Summary: Add tests that cover createValidReadTxnList and 
createValidCompactTxnList in TxnUtils.java
 Key: HIVE-16753
 URL: https://issues.apache.org/jira/browse/HIVE-16753
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Wei Zheng
Assignee: Wei Zheng


Both are critical methods used in ACID paths, but there are no corresponding 
tests for them.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16743) BitSet set() is not correctly used in TxnUtils.createValidCompactTxnList()

2017-05-23 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-16743:


 Summary: BitSet set() is not correctly used in 
TxnUtils.createValidCompactTxnList()
 Key: HIVE-16743
 URL: https://issues.apache.org/jira/browse/HIVE-16743
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Wei Zheng
Assignee: Wei Zheng


The second line here is problematic:
{code}
BitSet bitSet = new BitSet(exceptions.length);
bitSet.set(0, bitSet.length()); // for ValidCompactorTxnList, everything in 
exceptions is aborted
{code}
For example, suppose exceptions' length is 2. The first line declares a BitSet with 
an initial capacity of 2, but that is not the logical size of the BitSet, so 
bitSet.length() still returns 0.

The intention of the second line is to set all the bits to true. That is not 
achieved, because bitSet.set(0, bitSet.length()) is equivalent to bitSet.set(0, 0), 
which is a no-op.
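
For illustration, a minimal standalone sketch of the no-op versus the intended 
behavior (not necessarily the exact patch; the point is that the range has to be 
set by exceptions.length):
{code}
import java.util.BitSet;

public class BitSetRangeDemo {
  public static void main(String[] args) {
    long[] exceptions = {5L, 7L};               // two aborted txn ids, purely for illustration

    BitSet broken = new BitSet(exceptions.length);
    broken.set(0, broken.length());             // length() is 0 here, so nothing gets set
    System.out.println(broken.cardinality());   // prints 0

    BitSet fixed = new BitSet(exceptions.length);
    fixed.set(0, exceptions.length);            // sets bits [0, 2), i.e. every entry
    System.out.println(fixed.cardinality());    // prints 2
  }
}
{code}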



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16728) Fix some regression caused by HIVE-14879

2017-05-22 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-16728:


 Summary: Fix some regression caused by HIVE-14879
 Key: HIVE-16728
 URL: https://issues.apache.org/jira/browse/HIVE-16728
 Project: Hive
  Issue Type: Sub-task
Affects Versions: hive-14535
Reporter: Wei Zheng
Assignee: Wei Zheng


HIVE-14879 integrates ACID logic with MM table. But it broke some existing ACID 
tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16565) Improve how the open transactions and aborted transactions are deserialized in ValidReadTxnList.readFromString

2017-05-01 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-16565:


 Summary: Improve how the open transactions and aborted 
transactions are deserialized in ValidReadTxnList.readFromString
 Key: HIVE-16565
 URL: https://issues.apache.org/jira/browse/HIVE-16565
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Wei Zheng
Assignee: Wei Zheng


This is a follow-up of HIVE-16534.

In ValidReadTxnList.writeToString, we write out the open and aborted 
transactions as two sorted lists. We can take advantage of that and merge the 
two sorted lists when reading them back in readFromString. Note that 
the aborted bits should also be handled properly during the merge.
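
A rough sketch of the idea, using illustrative names rather than the actual Hive 
classes: since both lists come in sorted, one linear merge pass can rebuild the 
combined exceptions array and the aborted bits at the same time.
{code}
import java.util.Arrays;
import java.util.BitSet;

public class TxnListMergeSketch {
  /** Merges two sorted txn-id arrays; bit i of abortedBits marks result[i] as aborted. */
  static long[] merge(long[] open, long[] aborted, BitSet abortedBits) {
    long[] exceptions = new long[open.length + aborted.length];
    int i = 0, j = 0, k = 0;
    while (i < open.length && j < aborted.length) {
      if (open[i] <= aborted[j]) {
        exceptions[k++] = open[i++];
      } else {
        abortedBits.set(k);                    // remember that this slot holds an aborted txn
        exceptions[k++] = aborted[j++];
      }
    }
    while (i < open.length) exceptions[k++] = open[i++];
    while (j < aborted.length) { abortedBits.set(k); exceptions[k++] = aborted[j++]; }
    return exceptions;
  }

  public static void main(String[] args) {
    BitSet abortedBits = new BitSet();
    long[] merged = merge(new long[]{3, 8}, new long[]{5, 9}, abortedBits);
    System.out.println(Arrays.toString(merged) + " aborted at " + abortedBits); // [3, 5, 8, 9] aborted at {1, 3}
  }
}
{code}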



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16534) Add capability to tell aborted transactions apart from open transactions in ValidTxnList

2017-04-25 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-16534:


 Summary: Add capability to tell aborted transactions apart from 
open transactions in ValidTxnList
 Key: HIVE-16534
 URL: https://issues.apache.org/jira/browse/HIVE-16534
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Reporter: Wei Zheng
Assignee: Wei Zheng


Currently in ValidReadTxnList, open transactions and aborted transactions are 
stored together in one array. That makes it impossible to extract just aborted 
transactions or open transactions.

For ValidCompactorTxnList this is fine, since we only store aborted 
transactions but no open transactions.
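
One possible shape for this, as a hedged sketch with made-up names (not necessarily 
what the patch does): keep the single sorted exceptions array, but carry a parallel 
BitSet marking which entries are aborted.
{code}
import java.util.Arrays;
import java.util.BitSet;

/** Illustrative only: a snapshot that can tell aborted txns apart from open ones. */
public class ValidTxnSnapshotSketch {
  private final long[] exceptions;   // sorted; holds both open and aborted txn ids
  private final BitSet abortedBits;  // bit i set => exceptions[i] is aborted

  ValidTxnSnapshotSketch(long[] exceptions, BitSet abortedBits) {
    this.exceptions = exceptions;
    this.abortedBits = abortedBits;
  }

  boolean isTxnAborted(long txnId) {
    int idx = Arrays.binarySearch(exceptions, txnId);
    return idx >= 0 && abortedBits.get(idx);
  }

  public static void main(String[] args) {
    BitSet abortedBits = new BitSet();
    abortedBits.set(1);                                   // txn 7 (index 1) is aborted
    ValidTxnSnapshotSketch snap = new ValidTxnSnapshotSketch(new long[]{5, 7, 9}, abortedBits);
    System.out.println(snap.isTxnAborted(7));             // true
    System.out.println(snap.isTxnAborted(5));             // false: 5 is open, not aborted
  }
}
{code}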



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16092) Generate and use universal mmId instead of per db/table

2017-03-02 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-16092:


 Summary: Generate and use universal mmId instead of per db/table
 Key: HIVE-16092
 URL: https://issues.apache.org/jira/browse/HIVE-16092
 Project: Hive
  Issue Type: Sub-task
Reporter: Wei Zheng
Assignee: Wei Zheng


To facilitate replacing it later with txnId.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16063) instead of explicitly specifying mmWriteId during compilation phase, it should only be generated whenever needed during runtime

2017-02-28 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-16063:


 Summary: instead of explicitly specifying mmWriteId during 
compilation phase, it should only be generated whenever needed during runtime
 Key: HIVE-16063
 URL: https://issues.apache.org/jira/browse/HIVE-16063
 Project: Hive
  Issue Type: Sub-task
Reporter: Wei Zheng
Assignee: Wei Zheng


For the ACID transaction logic to work with MM tables, the first thing is to make 
the ID usage logic consistent. ACID stores the valid txn list in VALID_TXNS_KEY.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16028) Fail UPDATE/DELETE/MERGE queries when Ranger authorization manager is used

2017-02-23 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-16028:


 Summary: Fail UPDATE/DELETE/MERGE queries when Ranger 
authorization manager is used
 Key: HIVE-16028
 URL: https://issues.apache.org/jira/browse/HIVE-16028
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Affects Versions: 2.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng


This is a follow-up of HIVE-15891. In that jira an error-out logic was added, 
but its assumption is wrong: it assumed we need to do row filtering/column masking 
for every entry in a non-empty list of tables returned by 
applyRowFilterAndColumnMasking, whereas on the Ranger side, 
RangerHiveAuthorizer#applyRowFilterAndColumnMasking unconditionally returns a 
list of tables regardless of whether row filtering/column masking is actually 
applicable to them.

The fix for Hive, for now, is to move the error-out logic to after we have 
determined that there is no replacement text for the query. But ideally we should 
consider modifying the Ranger logic to only return tables that need to be masked.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-15999) Fix flakiness in TestDbTxnManager2

2017-02-21 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-15999:


 Summary: Fix flakiness in TestDbTxnManager2
 Key: HIVE-15999
 URL: https://issues.apache.org/jira/browse/HIVE-15999
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 2.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng


Right now there is test flakiness wrt. TestDbTxnManager2. The error is like 
this:
{code}
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.checkExpectedLocks
 Error Details
Table/View 'TXNS' already exists in Schema 'APP'.
{code}
The failure is due to the HiveConf used in the tests being polluted by some test. 
E.g. in testDummyTxnManagerOnAcidTable(), the conf entry HIVE_TXN_MANAGER is set to 
"org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager" but never switched back.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-15934) Downgrade Maven surefire plugin from 2.19.1 to 2.18.1

2017-02-15 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-15934:


 Summary: Downgrade Maven surefire plugin from 2.19.1 to 2.18.1
 Key: HIVE-15934
 URL: https://issues.apache.org/jira/browse/HIVE-15934
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng


Surefire 2.19.1 has an issue (https://issues.apache.org/jira/browse/SUREFIRE-1255) 
which causes debugging sessions to abort after a short period of time. Many IntelliJ 
users have seen this, although it looks fine for Eclipse users. Version 2.18.1 works 
fine.

We'd better make the change so as not to impact development for IntelliJ users. We 
can upgrade again once the root cause is figured out.

cc [~kgyrtkirk] [~ashutoshc]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-15891) Detect query rewrite scenario for UPDATE/DELETE/MERGE and fail fast

2017-02-13 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-15891:


 Summary: Detect query rewrite scenario for UPDATE/DELETE/MERGE and 
fail fast
 Key: HIVE-15891
 URL: https://issues.apache.org/jira/browse/HIVE-15891
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 2.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng


Currently the ACID UpdateDeleteSemanticAnalyzer directly manipulates the AST, which 
differs from the general approach of modifying the token stream, and thus causes an 
AST mismatch if any rewrite happens after UpdateDeleteSemanticAnalyzer.

The long-term solution is to rework the AST handling logic in 
UpdateDeleteSemanticAnalyzer to make it consistent with the general approach.

For now, this ticket detects the error-prone cases and fails early.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-15774) Ensure DbLockManager backward compatibility for non-ACID resources

2017-01-31 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-15774:


 Summary: Ensure DbLockManager backward compatibility for non-ACID 
resources
 Key: HIVE-15774
 URL: https://issues.apache.org/jira/browse/HIVE-15774
 Project: Hive
  Issue Type: Bug
  Components: Hive, Transactions
Reporter: Wei Zheng
Assignee: Wei Zheng


In pre-ACID days, users performed operations such as INSERT with either 
ZooKeeperHiveLockManager or no lock manager at all. If their workflow is designed 
to take advantage of no locking and they handle concurrency control themselves, 
this works well with good performance.
With ACID, if users enable transactions (i.e. use DbTxnManager & DbLockManager), 
then for all operations different types of locks will be acquired accordingly by 
DbLockManager, even for non-ACID resources. This may impact the performance of some 
workflows designed for pre-ACID use cases.
A viable solution would be to differentiate the locking mode for ACID and non-ACID 
resources, so that DbLockManager keeps its current behavior for ACID tables but can 
acquire a less strict lock type for non-ACID resources, thus avoiding the 
performance loss for those workflows.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-15681) Pull specified version of jetty for Hive

2017-01-20 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-15681:


 Summary: Pull specified version of jetty for Hive
 Key: HIVE-15681
 URL: https://issues.apache.org/jira/browse/HIVE-15681
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Wei Zheng
Assignee: Wei Zheng






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15644) Collect JVM metrics via JvmPauseMonitor

2017-01-16 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-15644:


 Summary: Collect JVM metrics via JvmPauseMonitor
 Key: HIVE-15644
 URL: https://issues.apache.org/jira/browse/HIVE-15644
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.2.0
Reporter: Wei Zheng


Similar to what Hadoop's JvmMetrics is doing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15628) Add more logs for hybrid grace hash join during the initial hash table loading

2017-01-13 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-15628:


 Summary: Add more logs for hybrid grace hash join during the 
initial hash table loading
 Key: HIVE-15628
 URL: https://issues.apache.org/jira/browse/HIVE-15628
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng


This can be useful for debugging memory issues.
Metrics that could be added:
1. Log memory usage after, say, every 50 MB of data has been loaded
2. Add a counter for the number of write buffers already allocated
3. Log a snapshot of the partitions (memory usage for each of them)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15623) Use customized version of netty for llap

2017-01-13 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-15623:


 Summary: Use customized version of netty for llap
 Key: HIVE-15623
 URL: https://issues.apache.org/jira/browse/HIVE-15623
 Project: Hive
  Issue Type: Bug
  Components: llap
Affects Versions: 2.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15622) Remove HWI component from Hive

2017-01-13 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-15622:


 Summary: Remove HWI component from Hive
 Key: HIVE-15622
 URL: https://issues.apache.org/jira/browse/HIVE-15622
 Project: Hive
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15621) Remove use of JvmPauseMonitor in LLAP

2017-01-13 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-15621:


 Summary: Remove use of JvmPauseMonitor in LLAP
 Key: HIVE-15621
 URL: https://issues.apache.org/jira/browse/HIVE-15621
 Project: Hive
  Issue Type: Bug
  Components: llap
Affects Versions: 2.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15589) Remove redundant test from TestDbTxnManager.testHeartbeater

2017-01-11 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-15589:


 Summary: Remove redundant test from 
TestDbTxnManager.testHeartbeater
 Key: HIVE-15589
 URL: https://issues.apache.org/jira/browse/HIVE-15589
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng


Case 1 claims there's no delay for the heartbeat startup, but the actual logic is: 
when the delay is specified as 0, we unconditionally set the delay to 
HiveConf.ConfVars.HIVE_TXN_TIMEOUT / 2. So case 1 is not needed, as it's covered 
by case 2.
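
The delay logic described above, as a standalone sketch (the timeout value is 
illustrative; the real one comes from HiveConf):
{code}
import java.util.concurrent.TimeUnit;

public class HeartbeatDelayDemo {
  // Illustrative stand-in for HiveConf.ConfVars.HIVE_TXN_TIMEOUT (e.g. 300 seconds).
  static final long TXN_TIMEOUT_MS = TimeUnit.SECONDS.toMillis(300);

  /** Mirrors the behavior described above: a requested delay of 0 is never honored literally. */
  static long effectiveInitialDelayMs(long requestedDelayMs) {
    return requestedDelayMs == 0 ? TXN_TIMEOUT_MS / 2 : requestedDelayMs;
  }

  public static void main(String[] args) {
    System.out.println(effectiveInitialDelayMs(0));       // 150000 -- case 1 ends up identical to case 2
    System.out.println(effectiveInitialDelayMs(60_000));  // 60000
  }
}
{code}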



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource

2016-12-12 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-15421:


 Summary: Assumption in exception handling can be wrong in 
DagUtils.localizeResource
 Key: HIVE-15421
 URL: https://issues.apache.org/jira/browse/HIVE-15421
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 2.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng


In localizeResource, once we get an IOException we always assume it is due to 
another thread writing the same file. But that is not always the case. Even 
without interference from other threads, copyFromLocalFile may still throw an 
IOException (RemoteException) in a specific environment, for example in a 
kerberized HDFS encryption zone where the TGT has expired.

We'd better fail early with a different message to avoid confusion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-06 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-15376:


 Summary: Improve heartbeater scheduling for transactions
 Key: HIVE-15376
 URL: https://issues.apache.org/jira/browse/HIVE-15376
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 2.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15362) Add the missing fields for 2.2.0 upgrade scripts

2016-12-05 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-15362:


 Summary: Add the missing fields for 2.2.0 upgrade scripts
 Key: HIVE-15362
 URL: https://issues.apache.org/jira/browse/HIVE-15362
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 2.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng


The 2.2.0 upgrade scripts were cut on 05/25/16, while HIVE-13354 (which added 
some fields to the upgrade scripts) was committed to master on 05/27/16 without 
any merge conflict. So we accidentally missed those fields for 2.2.0.

cc [~ekoifman]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15267) Make query length calculation logic more accurate in TxnUtils.needNewQuery()

2016-11-22 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-15267:


 Summary: Make query length calculation logic more accurate in 
TxnUtils.needNewQuery()
 Key: HIVE-15267
 URL: https://issues.apache.org/jira/browse/HIVE-15267
 Project: Hive
  Issue Type: Bug
  Components: Hive, Transactions
Affects Versions: 2.1.0, 1.2.1
Reporter: Wei Zheng
Assignee: Wei Zheng


HIVE-15181 received the following review comment, which this ticket will address:
{code}
in TxnUtils.needNewQuery() "sizeInBytes / 1024 > queryMemoryLimit" doesn't do 
the right thing.
If the user sets METASTORE_DIRECT_SQL_MAX_QUERY_LENGTH to 1K, they most likely 
want each SQL string to be at most 1K.
But if sizeInBytes=2047, this still returns false.
It should include length of "suffix" in computation of sizeInBytes
Along the same lines: the check for max query length is done after each batch 
is already added to the query. Suppose there are 1000 9-digit txn IDs in each 
IN(...). That's, conservatively, 18KB of text. So the length of each query is 
increasing in 18KB chunks. 
I think the check for query length should be done for each item in IN clause.
If some DB has a limit on query length of X, then any query > X will fail. So I 
think this must ensure not to produce any queries > X, even by 1 char.
For example, case 3.1 of the UT generates a query of almost 4000 characters - 
this is clearly > 1KB.
{code}
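
A rough sketch of the per-item check the comment asks for (illustrative names and 
string layout, not Hive's actual buildQueryWithINClause): decide before appending 
each IN-clause element whether the finished query, including its suffix, would 
still fit under the limit, and start a new query otherwise.
{code}
import java.util.ArrayList;
import java.util.List;

public class InClauseBatcherSketch {
  /** Splits "prefix IN (id, id, ...) suffix" so that no single query exceeds maxLenChars. */
  static List<String> buildQueries(String prefix, String suffix, long[] ids, int maxLenChars) {
    List<String> queries = new ArrayList<>();
    StringBuilder buf = null;
    for (long id : ids) {
      String item = String.valueOf(id);
      // Per-item check, done BEFORE appending: would ", item" plus the eventual ") suffix" overflow?
      if (buf == null || buf.length() + 2 + item.length() + 2 + suffix.length() > maxLenChars) {
        if (buf != null) queries.add(buf.append(") ").append(suffix).toString());
        buf = new StringBuilder(prefix).append(" IN (");   // assumes a single item always fits
      } else {
        buf.append(", ");
      }
      buf.append(item);
    }
    if (buf != null) queries.add(buf.append(") ").append(suffix).toString());
    return queries;
  }

  public static void main(String[] args) {
    long[] ids = {100000001L, 100000002L, 100000003L, 100000004L};
    for (String q : buildQueries("select * from T where id", "and 1=1", ids, 60)) {
      System.out.println(q.length() + " chars: " + q);   // each query stays under 60 chars
    }
  }
}
{code}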



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15265) support snapshot isolation for MM tables

2016-11-22 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-15265:


 Summary: support snapshot isolation for MM tables
 Key: HIVE-15265
 URL: https://issues.apache.org/jira/browse/HIVE-15265
 Project: Hive
  Issue Type: Sub-task
Reporter: Wei Zheng


Since MM tables use the incremental "delta" insertion mechanism via ACID, it makes 
sense for MM tables to support snapshot isolation as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15181) buildQueryWithINClause didn't properly handle multiples of ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE

2016-11-10 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-15181:


 Summary: buildQueryWithINClause didn't properly handle multiples 
of ConfVars.METASTORE_DIRECT_SQL_MAX_ELEMENTS_IN_CLAUSE
 Key: HIVE-15181
 URL: https://issues.apache.org/jira/browse/HIVE-15181
 Project: Hive
  Issue Type: Bug
  Components: Hive, Transactions
Affects Versions: 2.1.0, 1.2.1
Reporter: Wei Zheng
Assignee: Wei Zheng






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15099) PTFOperator.PTFInvocation didn't properly reset the input partition

2016-10-31 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-15099:


 Summary: PTFOperator.PTFInvocation didn't properly reset the input 
partition
 Key: HIVE-15099
 URL: https://issues.apache.org/jira/browse/HIVE-15099
 Project: Hive
  Issue Type: Bug
  Components: Hive, PTF-Windowing
Affects Versions: 2.1.0, 1.2.1, 1.3.0, 2.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng


There is an issue with PTFOperator.PTFInvocation where the inputPart is not 
reset properly. The inputPart has been closed and its content (member variables) 
has been cleaned up, but since the reference itself is not nullified, it gets 
reused in the next round and causes an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15087) integrate MM tables into ACID: replace "hivecommit" property with ACID property

2016-10-27 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-15087:


 Summary: integrate MM tables into ACID: replace "hivecommit" 
property with ACID property
 Key: HIVE-15087
 URL: https://issues.apache.org/jira/browse/HIVE-15087
 Project: Hive
  Issue Type: Sub-task
Reporter: Wei Zheng
Assignee: Wei Zheng


Previously declared DDL
{code}
create table t1 (key int, key2 int)  tblproperties("hivecommit"="true");
{code}
should be replaced with:
{code}
create table t1 (key int, key2 int)  tblproperties("transactional"="true", 
"transactional_properties"="insert_only");
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14479) Add some join tests for acid table

2016-08-08 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-14479:


 Summary: Add some join tests for acid table
 Key: HIVE-14479
 URL: https://issues.apache.org/jira/browse/HIVE-14479
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 2.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14447) Set HIVE_TRANSACTIONAL_TABLE_SCAN to the correct job conf for FetchOperator

2016-08-05 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-14447:


 Summary: Set HIVE_TRANSACTIONAL_TABLE_SCAN to the correct job conf 
for FetchOperator
 Key: HIVE-14447
 URL: https://issues.apache.org/jira/browse/HIVE-14447
 Project: Hive
  Issue Type: Bug
  Components: Hive, Transactions
Affects Versions: 1.3.0, 2.2.0, 2.1.1
Reporter: Wei Zheng
Assignee: Wei Zheng






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14446) Disable bloom filter for hybrid grace hash join when row count exceeds certain limit

2016-08-05 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-14446:


 Summary: Disable bloom filter for hybrid grace hash join when row 
count exceeds certain limit
 Key: HIVE-14446
 URL: https://issues.apache.org/jira/browse/HIVE-14446
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.3.0, 2.2.0, 2.1.1
Reporter: Wei Zheng
Assignee: Wei Zheng


When the row count exceeds a certain limit, it doesn't make sense to generate a 
bloom filter, since its size would be a few hundred MB or even a few GB.
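
For a sense of scale, the standard bloom filter sizing formula 
m = -n*ln(p) / (ln 2)^2 already lands in that range (a back-of-the-envelope 
sketch, not Hive's exact sizing code):
{code}
public class BloomSizeEstimate {
  /** Bits needed for n entries at false-positive rate p: m = -n * ln(p) / (ln 2)^2. */
  static long bloomBits(long n, double p) {
    return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
  }

  public static void main(String[] args) {
    // ~1 billion rows at a 5% false-positive rate is already ~740 MiB; 5 billion rows is ~3.6 GiB.
    for (long n : new long[]{1_000_000_000L, 5_000_000_000L}) {
      System.out.printf("n=%d -> %.0f MiB%n", n, bloomBits(n, 0.05) / 8.0 / (1 << 20));
    }
  }
}
{code}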



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14400) Handle concurrent insert with dynamic partition

2016-08-01 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-14400:


 Summary: Handle concurrent insert with dynamic partition
 Key: HIVE-14400
 URL: https://issues.apache.org/jira/browse/HIVE-14400
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng


When multiple users concurrently issue insert statements on the same partition, a 
query may not see the partition at the time it is issued, but then discover that 
the partition is actually there when it tries to add that partition to the 
metastore, and thus get an AlreadyExistsException because some earlier query just 
created it (a race condition).

For example, imagine such a table is created:
{code}
create table T (name char(50)) partitioned by (ds string) clustered by (name) 
into 2 buckets stored as orc tblproperties('transactional'='true');
{code}
and the following two queries are launched at the same time, from different 
sessions:
{code}
insert into table T partition (ds) values ('Bob', 'today'); -- creates the 
partition 'today'
insert into table T partition (ds) values ('Joe', 'today'); -- will fail with 
AlreadyExistsException
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14381) Handle null value in WindowingTableFunction.WindowingIterator.next()

2016-07-29 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-14381:


 Summary: Handle null value in 
WindowingTableFunction.WindowingIterator.next()
 Key: HIVE-14381
 URL: https://issues.apache.org/jira/browse/HIVE-14381
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Affects Versions: 2.1.0, 1.3.0, 2.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14339) Fix UT failure for acid_globallimit.q

2016-07-26 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-14339:


 Summary: Fix UT failure for acid_globallimit.q
 Key: HIVE-14339
 URL: https://issues.apache.org/jira/browse/HIVE-14339
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.1.0, 1.3.0, 2.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14311) No need to schedule Heartbeat task if the query doesn't require locks

2016-07-22 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-14311:


 Summary: No need to schedule Heartbeat task if the query doesn't 
require locks
 Key: HIVE-14311
 URL: https://issues.apache.org/jira/browse/HIVE-14311
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 1.3.0, 2.2.0, 2.1.1
Reporter: Wei Zheng
Assignee: Wei Zheng


Otherwise the heartbeat task will just stay there and never be cleaned up, which 
may eventually cause OOM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14061) Add unit test for kerberos support in Hive streaming

2016-06-20 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-14061:


 Summary: Add unit test for kerberos support in Hive streaming
 Key: HIVE-14061
 URL: https://issues.apache.org/jira/browse/HIVE-14061
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 2.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13972) Resolve class dependency issue introduced by HIVE-13354

2016-06-07 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13972:


 Summary: Resolve class dependency issue introduced by HIVE-13354
 Key: HIVE-13972
 URL: https://issues.apache.org/jira/browse/HIVE-13972
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 1.3.0, 2.1.0, 2.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng
Priority: Blocker


HIVE-13354 moved the helper class StringableMap from 
ql/txn/compactor/CompactorMR.java to metastore/txn/TxnUtils.java.

This introduced a dependency from the ql package to the metastore package, which 
is not allowed and fails in a real cluster.

Instead of moving it to metastore, it should be moved to the common package.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13961) ACID: Major compaction fails to include the original bucket files if there's no delta directory

2016-06-07 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13961:


 Summary: ACID: Major compaction fails to include the original 
bucket files if there's no delta directory
 Key: HIVE-13961
 URL: https://issues.apache.org/jira/browse/HIVE-13961
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 1.3.0, 2.1.0, 2.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng


The issue can be reproduced by steps below:
1. Insert a row to Non-ACID table
2. Convert Non-ACID to ACID table
3. Perform Major compaction



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13934) Tez needs to allocate extra buffer space for joins

2016-06-02 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13934:


 Summary: Tez needs to allocate extra buffer space for joins
 Key: HIVE-13934
 URL: https://issues.apache.org/jira/browse/HIVE-13934
 Project: Hive
  Issue Type: Bug
Reporter: Wei Zheng
Assignee: Siddharth Seth


Otherwise it's very easy to run into OOM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13834) Use LinkedHashMap instead of HashMap for LockRequestBuilder to maintain predictable iteration order

2016-05-24 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13834:


 Summary: Use LinkedHashMap instead of HashMap for 
LockRequestBuilder to maintain predictable iteration order
 Key: HIVE-13834
 URL: https://issues.apache.org/jira/browse/HIVE-13834
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 1.3.0, 2.1.0
Reporter: Wei Zheng
Assignee: Wei Zheng


In Java 7 the code effectively assumes the HashMap iteration order is the same as 
the insertion order, but that's not guaranteed. In Java 8 some unit tests break 
because of this ordering change. The solution is to use LinkedHashMap.
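
A quick generic illustration of the difference (plain Java, not the 
LockRequestBuilder code itself):
{code}
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class MapOrderDemo {
  public static void main(String[] args) {
    Map<String, String> hash = new HashMap<>();
    Map<String, String> linked = new LinkedHashMap<>();
    for (String key : new String[]{"db.tbl_z", "db.tbl_a", "db.tbl_m"}) {
      hash.put(key, "SHARED_READ");
      linked.put(key, "SHARED_READ");
    }
    // HashMap iteration order depends on hashing internals and changed between JDK 7 and 8;
    // LinkedHashMap always iterates in insertion order, so tests see a stable, predictable order.
    System.out.println("HashMap:       " + hash.keySet());
    System.out.println("LinkedHashMap: " + linked.keySet());
  }
}
{code}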



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13833) Add an initial delay when starting the heartbeat

2016-05-24 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13833:


 Summary: Add an initial delay when starting the heartbeat
 Key: HIVE-13833
 URL: https://issues.apache.org/jira/browse/HIVE-13833
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 2.0.0, 2.1.0
Reporter: Wei Zheng
Assignee: Wei Zheng
Priority: Minor


Since the scheduling of the heartbeat happens immediately after lock acquisition, 
it's unnecessary to send a heartbeat at the time the locks are acquired. Add an 
initial delay to skip this.
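
A generic sketch of the scheduling change (illustrative only; the real heartbeater 
lives in the transaction manager): give the scheduler a non-zero initial delay 
instead of firing the first heartbeat immediately.
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HeartbeatScheduleDemo {
  public static void main(String[] args) throws InterruptedException {
    ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
    long periodMs = 1000;
    long initialDelayMs = periodMs;   // skip the heartbeat at lock-acquisition time

    pool.scheduleAtFixedRate(
        () -> System.out.println("heartbeat at " + System.currentTimeMillis()),
        initialDelayMs, periodMs, TimeUnit.MILLISECONDS);

    Thread.sleep(3500);               // observe ~3 heartbeats, none at t=0
    pool.shutdownNow();
  }
}
{code}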



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13809) Hybrid Grace Hash Join memory usage estimation didn't take into account the bloom filter size

2016-05-20 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13809:


 Summary: Hybrid Grace Hash Join memory usage estimation didn't 
take into account the bloom filter size
 Key: HIVE-13809
 URL: https://issues.apache.org/jira/browse/HIVE-13809
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.0.0, 2.1.0
Reporter: Wei Zheng
Assignee: Wei Zheng


Memory estimation is important during hash table loading, because we need to 
decide whether to load the next hash partition in memory or spill it. If we assume 
there is enough memory but it turns out there isn't, we will run into an OOM 
problem.

Currently the hybrid grace hash join memory usage estimation doesn't take the 
bloom filter size into account. In large test cases (TB scale) the bloom filter 
grows as big as hundreds of MB, big enough to cause estimation errors.

The solution is to count the bloom filter size in the memory estimation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13755) Hybrid mapjoin allocates memory the same for multi broadcast

2016-05-12 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13755:


 Summary: Hybrid mapjoin allocates memory the same for multi 
broadcast
 Key: HIVE-13755
 URL: https://issues.apache.org/jira/browse/HIVE-13755
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.1.0
Reporter: Wei Zheng
Assignee: Wei Zheng


PROBLEM:

When hybrid mapjoin gets the memory needed, it estimates the same amount of memory 
for each hashtable. This may cause a problem when there are multiple broadcast 
inputs, as the total may exceed the memory intended to be allocated.

An example reducer task log is attached. This task has 5 broadcast inputs:

Reducer 3 <- Map 10 (BROADCAST_EDGE), Map 11 (BROADCAST_EDGE), Map 12 
(BROADCAST_EDGE), Map 8 (SIMPLE_EDGE), Map 9 (BROADCAST_EDGE), Reducer 2 
(SIMPLE_EDGE)

An excerpt of it:

{code}
2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Memory 
manager allocates 0 bytes for the loading hashtable.
2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] |persistence.HashMapWrapper|: 
Key count from statistics is 210; setting map size to 280
2016-03-15 19:23:50,811 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Total available memory: 1968177152
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Estimated small table size: 155190
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Number of hash partitions to be 
created: 16
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Write buffer size: 524288
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Number of partitions created: 16
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Number of partitions spilled directly 
to disk on creation: 0
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Using 
tableContainer HybridHashTableContainer
2016-03-15 19:23:50,812 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Initializing container with 
org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe and 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |readers.UnorderedKVReader|: 
Num Records read: 20
2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |log.PerfLogger|: 
2016-03-15 19:23:50,814 [INFO] [pool-47-thread-1] |tez.ObjectCache|: Caching 
key: 
svc-phx-efmhadoop_20160315191303_8c53ce88-e64f-4d36-bad0-846bbf096f57__HASH_MAP_MAPJOIN_126_container
2016-03-15 19:23:50,814 [INFO] [TezChild] |exec.HashTableDummyOperator|: 
Initializing operator HASHTABLEDUMMY[32]
2016-03-15 19:23:50,814 [INFO] [TezChild] |exec.MapJoinOperator|: Initializing 
operator MAPJOIN[26]
2016-03-15 19:23:50,816 [INFO] [TezChild] |exec.CommonJoinOperator|: JOIN 
struct<_col3:string,_col4:decimal(5,0),_col5:char(1),_col6:char(1),_col7:date,_col8:string,_col9:string,_col12:string,_col13:string,_col14:string,_col15:string,_col16:string,_col19:decimal(13,3),_col20:string,_col22:decimal(5,0),_col23:decimal(5,0),_col24:decimal(5,0),_col25:decimal(5,0),_col26:decimal(13,2),_col27:decimal(5,0),_col28:decimal(15,2),_col29:decimal(15,2),_col31:decimal(3,0),_col33:char(1),_col41:decimal(3,1),_col42:char(1),_col43:decimal(3,1),_col44:string,_col45:char(1),_col48:char(1),_col55:char(1),_col57:char(1),_col59:char(1),_col60:string,_col64:string,_col65:string,_col67:decimal(15,2),_col76:decimal(3,0),_col81:char(1),_col98:string,_col99:string,_col105:string,_col108:string,_col122:string,_col123:decimal(5,0),_col127:string,_col128:decimal(5,0),_col129:string,_col137:char(1),_col139:string,_col145:string,_col151:string,_col152:string,_col154:string,_col158:char(1),_col164:char(1),_col204:string,_col213:string,_col214:char(1),_col215:string,_col218:char(1),_col219:date,_col220:string,_col221:decimal(5,0),_col222:decimal(5,0),_col223:string,_col224:char(1),_col225:string,_col226:decimal(3,0),_col231:string,_col232:string,_col233:string,_col234:decimal(9,5),_col236:date,_col240:date,_col256:string,_col257:string,_col268:string,_col269:string,_col270:char(1),_col271:string,_col272:char(1),_col324:string,_col344:string,_col464:string,_col478:decimal(5,0),_col479:decimal(5,0),_col519:string,_col532:string,_col534:char(1),_col540:decimal(13,3),_col541:decimal(13,3),_col561:string,_col568:char(1),_col570:string>
 totalsz = 95
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |log.PerfLogger|: 
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |tez.HashTableLoader|: Memory 
manager allocates 0 bytes for the loading hashtable.
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] |persistence.HashMapWrapper|: 
Key count from statistics is 5942112; setting map size to 7922816
2016-03-15 19:23:50,817 [INFO] [pool-47-thread-1] 
|persistence.HybridHashTableContainer|: Total 
{code}

[jira] [Created] (HIVE-13753) Make metastore client thread safe in DbTxnManager

2016-05-12 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13753:


 Summary: Make metastore client thread safe in DbTxnManager
 Key: HIVE-13753
 URL: https://issues.apache.org/jira/browse/HIVE-13753
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 1.3.0, 2.1.0
Reporter: Wei Zheng
Assignee: Wei Zheng


Multiple threads share the same metastore client, which is used for RPC to the 
Thrift server, but that client is not thread safe.

A race condition shows up as an "out of sequence response" error message from the 
Thrift server. It means the response from the Thrift server is for a different 
request (made by a different thread).

The solution is to synchronize the methods on the client side.
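
One way to picture the client-side synchronization, as a hedged sketch with 
hypothetical interfaces (not the actual IMetaStoreClient wrapper): funnel every RPC 
through synchronized methods so two threads can never interleave request/response 
pairs on the shared connection.
{code}
// Hypothetical sketch: wrap a non-thread-safe RPC client so that all calls are serialized.
interface TxnRpcClient {
  long openTxn(String user);
  void heartbeat(long txnId);
}

class SynchronizedTxnRpcClient implements TxnRpcClient {
  private final TxnRpcClient delegate;   // the shared, non-thread-safe Thrift client

  SynchronizedTxnRpcClient(TxnRpcClient delegate) {
    this.delegate = delegate;
  }

  @Override
  public synchronized long openTxn(String user) {
    return delegate.openTxn(user);       // only one thread talks to the connection at a time
  }

  @Override
  public synchronized void heartbeat(long txnId) {
    delegate.heartbeat(txnId);           // prevents interleaved request/response pairs
  }
}
{code}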



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13724) Backport HIVE-11591 to branch-1 to use undated annotations

2016-05-09 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13724:


 Summary: Backport HIVE-11591 to branch-1 to use undated annotations
 Key: HIVE-13724
 URL: https://issues.apache.org/jira/browse/HIVE-13724
 Project: Hive
  Issue Type: Bug
  Components: Thrift API
Affects Versions: 1.2.1
Reporter: Wei Zheng
Assignee: Wei Zheng


HIVE-12832 changed the branch-1 Hive pom file and updated the Thrift version from 
0.9.2 to 0.9.3, but it didn't update the thrift args part to use the undated 
annotations from HIVE-11591.

So every time someone runs the Maven Thrift re-gen command, it still updates a lot 
of unrelated files, just because of the date change.

We need to backport HIVE-11591 to branch-1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13694) Prevent ACID table being unusable due to DDL changes

2016-05-05 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13694:


 Summary: Prevent ACID table being unusable due to DDL changes
 Key: HIVE-13694
 URL: https://issues.apache.org/jira/browse/HIVE-13694
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Reporter: Wei Zheng
Assignee: Wei Zheng


Currently, in order to define an ACID table, the following three conditions need 
to be satisfied:
* tblproperties ('transactional'='true')
* the table has to be bucketed
* the table has to be stored as ORC
If any of the above conditions doesn't hold, the table won't be ACID compliant, 
and query results against the table will be unexpected.

HIVE-12064 made sure that reverting the tblproperty 'transactional' from 'true' 
to 'false' is not allowed. But changes to the other two conditions are still 
not restrained.

We need to make sure an ACID table cannot be un-bucketed, and cannot use a data 
format other than ORC.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13684) Remove the deprecated IMetaStoreClient.showLocks()

2016-05-03 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13684:


 Summary: Remove the deprecated IMetaStoreClient.showLocks()
 Key: HIVE-13684
 URL: https://issues.apache.org/jira/browse/HIVE-13684
 Project: Hive
  Issue Type: Bug
Reporter: Wei Zheng
Assignee: Wei Zheng


IMetaStoreClient.showLocks() is deprecated in Hive 2.1.

This method can be removed in Hive 2.2 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13563) Hive Streaming does not honor orc.compress.size and orc.stripe.size table properties

2016-04-20 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13563:


 Summary: Hive Streaming does not honor orc.compress.size and 
orc.stripe.size table properties
 Key: HIVE-13563
 URL: https://issues.apache.org/jira/browse/HIVE-13563
 Project: Hive
  Issue Type: Bug
  Components: ORC
Affects Versions: 2.1.0
Reporter: Wei Zheng
Assignee: Wei Zheng


According to the doc:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-HiveQLSyntax
One should be able to specify tblproperties for many ORC options.

But the settings for orc.compress.size and orc.stripe.size don't take effect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13458) Heartbeater doesn't fail query when heartbeat fails

2016-04-07 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13458:


 Summary: Heartbeater doesn't fail query when heartbeat fails
 Key: HIVE-13458
 URL: https://issues.apache.org/jira/browse/HIVE-13458
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 2.1.0
Reporter: Wei Zheng
Assignee: Wei Zheng


When a heartbeat fails to locate a lock, it should fail the current query. That 
doesn't happen, which is a bug.

Another thing: we need to make sure stopHeartbeat really stops the heartbeat, 
i.e. no additional heartbeat is sent afterwards, since that would break the 
assumption and cause the query to fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13388) Fix inconsistent content due to Thrift changes

2016-03-30 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13388:


 Summary: Fix inconsistent content due to Thrift changes
 Key: HIVE-13388
 URL: https://issues.apache.org/jira/browse/HIVE-13388
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.1.0
Reporter: Wei Zheng
Assignee: Wei Zheng


HIVE-12442 and HIVE-12862 are related here.

If one wants to make a Thrift change by following the instructions here:
https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-GeneratingThriftCode
When they first execute (i.e. in a clean environment)
{code}
mvn clean install -Pthriftif -DskipTests -Dthrift.home=/usr/local -Phadoop-2
{code}
The following content will show up
{code}
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Untracked files:
  (use "git add ..." to include in what will be committed)

service-rpc/src/gen/thrift/gen-py/__init__.py
service/src/gen/

nothing added to commit but untracked files present (use "git add" to track)
{code}

These files should have been included in the codebase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13249) Hard upper bound on number of open transactions

2016-03-09 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13249:


 Summary: Hard upper bound on number of open transactions
 Key: HIVE-13249
 URL: https://issues.apache.org/jira/browse/HIVE-13249
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 2.0.0
Reporter: Wei Zheng
Assignee: Wei Zheng


We need a safeguard: add an upper bound on the number of open transactions to 
avoid a huge number of open-transaction requests, usually caused by improper 
configuration of clients such as Storm.

Once that limit is reached, clients will start failing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13201) Compaction shouldn't be allowed on non-ACID table

2016-03-03 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13201:


 Summary: Compaction shouldn't be allowed on non-ACID table
 Key: HIVE-13201
 URL: https://issues.apache.org/jira/browse/HIVE-13201
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 2.0.0
Reporter: Wei Zheng
Assignee: Wei Zheng


Looks like compaction is allowed on non-ACID tables, although that makes no sense 
and does nothing. Moreover, the compaction request will be enqueued into the 
COMPACTION_QUEUE metastore table, which brings unnecessary overhead.
We should prevent compaction commands from being accepted on non-ACID tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13186) ALTER TABLE RENAME should lowercase table name and hdfs location

2016-02-29 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13186:


 Summary: ALTER TABLE RENAME should lowercase table name and hdfs 
location
 Key: HIVE-13186
 URL: https://issues.apache.org/jira/browse/HIVE-13186
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.0.0
Reporter: Wei Zheng
Assignee: Wei Zheng






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13175) Disallow making external tables transactional

2016-02-26 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13175:


 Summary: Disallow making external tables transactional
 Key: HIVE-13175
 URL: https://issues.apache.org/jira/browse/HIVE-13175
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.0.0
Reporter: Wei Zheng
Assignee: Wei Zheng


The fact that the compactor rewrites the contents of ACID tables is in conflict 
with what is expected of external tables.

Conversely, an end user can write directly to an external table, which is 
certainly not what is expected of an ACID table.

So we should explicitly disallow making an external table ACID.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13174) Remove Vectorizer noise in logs

2016-02-26 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13174:


 Summary: Remove Vectorizer noise in logs
 Key: HIVE-13174
 URL: https://issues.apache.org/jira/browse/HIVE-13174
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.0.0
Reporter: Wei Zheng
Assignee: Wei Zheng


If you have a table with a binary column, your HS2/client logs are full of the 
stack traces below. These should either be logged at DEBUG level, or we should 
just log the message, not the trace.
{code}
2015-10-12 12:34:23,922 INFO  [main]: physical.Vectorizer 
(Vectorizer.java:validateExprNodeDesc(1249)) - Failed to vectorize
org.apache.hadoop.hive.ql.metadata.HiveException: No vector argument type for 
type name binary
at 
org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getConstantVectorExpression(VectorizationContext.java:872)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:443)
at 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateExprNodeDesc(Vectorizer.java:1243)
at 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateExprNodeDesc(Vectorizer.java:1234)
at 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateSelectOperator(Vectorizer.java:1100)
at 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateMapWorkOperator(Vectorizer.java:911)
at 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$MapWorkValidationNodeProcessor.process(Vectorizer.java:581)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:133)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
at 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateMapWork(Vectorizer.java:412)
at 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(Vectorizer.java:355)
at 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Vectorizer.java:330)
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180)
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125)
at 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(Vectorizer.java:890)
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(TezCompiler.java:469)
at 
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:227)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10188)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:211)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
at 
org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{code}



--
This message was sent 

[jira] [Created] (HIVE-13151) Clean up UGI objects in FileSystem cache for transactions

2016-02-24 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13151:


 Summary: Clean up UGI objects in FileSystem cache for transactions
 Key: HIVE-13151
 URL: https://issues.apache.org/jira/browse/HIVE-13151
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.0.0
Reporter: Wei Zheng
Assignee: Wei Zheng


One issue with FileSystem.CACHE is that it does not clean itself. The key in 
that cache includes the UGI object. When new UGI objects are created and used 
with the FileSystem API, new entries keep getting added to the cache.

We need to manually clean up those UGI objects (and their cache entries) once 
they are no longer in use.
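A minimal sketch of the kind of cleanup hook this implies, assuming it runs once a query or transaction is done with its UGI; FileSystem.closeAllForUGI is the Hadoop API that evicts the cache entries keyed by that UGI, while the class name and the exact place Hive would call it are illustrative assumptions:
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class UgiCacheCleanup {
  /**
   * Evict the FileSystem.CACHE entries created for the given UGI. Hadoop keys
   * its FileSystem cache by (scheme, authority, ugi), so entries created on
   * behalf of a per-query UGI are only released when this is called explicitly.
   */
  public static void release(UserGroupInformation ugi) {
    try {
      FileSystem.closeAllForUGI(ugi);
    } catch (IOException e) {
      // Best effort: a failed close should not fail the query itself.
      System.err.println("Failed to close FileSystems for " + ugi + ": " + e);
    }
  }
}
{code}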



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13126) Clean up MapJoinOperator properly to avoid object cache reuse with unintentional states

2016-02-23 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-13126:


 Summary: Clean up MapJoinOperator properly to avoid object cache 
reuse with unintentional states
 Key: HIVE-13126
 URL: https://issues.apache.org/jira/browse/HIVE-13126
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.0.0
Reporter: Wei Zheng
Assignee: Wei Zheng


For a given job, one task may reuse another task's object cache (plan cache), 
including operators such as MapJoinOperator. That is fine by itself, but if any 
dirty state is left over in the cached operator, it can cause issues such as 
wrong results.
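A minimal, hypothetical sketch of the reset that reuse requires; MapJoinLikeOperator and its fields are stand-ins for illustration, not the actual MapJoinOperator API:
{code}
public class CachedOperatorReuse {
  /** Hypothetical stand-in for the per-task state a cached operator carries. */
  static class MapJoinLikeOperator {
    boolean hashTableLoaded;
    int spilledPartitions;

    /** Reset everything that is task-specific before another task reuses the
     *  cached instance, so leftover state cannot leak into the next run. */
    void resetForReuse() {
      hashTableLoaded = false;
      spilledPartitions = 0;
    }
  }

  public static void main(String[] args) {
    MapJoinLikeOperator cached = new MapJoinLikeOperator();
    cached.hashTableLoaded = true;     // dirty state left by the previous task
    cached.spilledPartitions = 3;
    cached.resetForReuse();            // the next task starts from a clean slate
    System.out.println(cached.hashTableLoaded + " " + cached.spilledPartitions); // false 0
  }
}
{code}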



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12996) Temp tables shouldn't be stored in metastore tables for ACID

2016-02-03 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-12996:


 Summary: Temp tables shouldn't be stored in metastore tables for 
ACID
 Key: HIVE-12996
 URL: https://issues.apache.org/jira/browse/HIVE-12996
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.0.0
Reporter: Wei Zheng
Assignee: Wei Zheng


Internally, INSERT INTO ... VALUES statements use a temp table to accomplish 
their functionality. But temp tables shouldn't be stored in the metastore 
tables for ACID, because they are by definition only visible inside the session 
that created them, and we don't allow multiple threads inside a session. If a 
temp table is used in a query, it should be ignored by the lock manager.
{code}
mysql> select * from COMPLETED_TXN_COMPONENTS;
+-----------+--------------+-----------------------+------------------+
| CTC_TXNID | CTC_DATABASE | CTC_TABLE             | CTC_PARTITION    |
+-----------+--------------+-----------------------+------------------+
|         1 | acid         | t1                    | NULL             |
|         1 | acid         | values__tmp__table__1 | NULL             |
|         2 | acid         | t1                    | NULL             |
|         2 | acid         | values__tmp__table__2 | NULL             |
|         3 | acid         | values__tmp__table__3 | NULL             |
|         3 | acid         | t1                    | NULL             |
|         4 | acid         | values__tmp__table__1 | NULL             |
|         4 | acid         | t2p                   | ds=today         |
|         5 | acid         | values__tmp__table__1 | NULL             |
|         5 | acid         | t3p                   | ds=today/hour=12 |
+-----------+--------------+-----------------------+------------------+
{code}
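A minimal sketch of the proposed filtering, assuming the lock manager can see a per-table "temporary" flag; TableRef and lockableTables are hypothetical stand-ins for the real lock request building code:
{code}
import java.util.ArrayList;
import java.util.List;

public class LockTargetFilter {
  /** Minimal stand-in for the table metadata the lock manager sees (hypothetical). */
  static class TableRef {
    final String name;
    final boolean temporary;
    TableRef(String name, boolean temporary) { this.name = name; this.temporary = temporary; }
  }

  /** Temp tables are session-private, so they need no shared/exclusive locks
   *  and should never be written to the transaction component tables. */
  static List<TableRef> lockableTables(List<TableRef> inputs) {
    List<TableRef> result = new ArrayList<>();
    for (TableRef t : inputs) {
      if (!t.temporary) {
        result.add(t);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<TableRef> tables = new ArrayList<>();
    tables.add(new TableRef("t1", false));
    tables.add(new TableRef("values__tmp__table__1", true));
    System.out.println(lockableTables(tables).size()); // prints 1: only t1 gets locked
  }
}
{code}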



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12837) Better memory estimation/allocation for hybrid grace hash join during hash table loading

2016-01-11 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-12837:


 Summary: Better memory estimation/allocation for hybrid grace hash 
join during hash table loading
 Key: HIVE-12837
 URL: https://issues.apache.org/jira/browse/HIVE-12837
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.1.0
Reporter: Wei Zheng
Assignee: Wei Zheng


This is to avoid an edge case where the available memory is very small (less 
than a single write buffer size) when we start loading the hash table. Since 
the write buffer is lazily allocated, we can easily run out of memory before 
we even check whether any hash partition should be spilled.

e.g.
Total memory available: 210 MB
Size of ref array of BytesBytesMultiHashMap for each hash partition: ~16 MB
Size of write buffer: 8 MB (lazy allocation)
# hash partitions: 16
# hash partitions created in memory: 13
# hash partitions created on disk: 3
Available memory left after HybridHashTableContainer initialization: 
210 - 16*13 = 2 MB

Now suppose a row is to be loaded into one of the in-memory hash partitions: 
we will try to allocate an 8 MB write buffer for it, but only 2 MB are left, 
hence the OOM.

The solution is to perform the spill check earlier, so that partitions can be 
spilled when memory is about to be full, to avoid the OOM.
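A minimal sketch of the earlier spill check using the illustrative numbers from the example above; the constants and method names are assumptions, not the actual HybridHashTableContainer code:
{code}
public class EarlySpillCheck {
  // All sizes in bytes; the figures mirror the example above (hypothetical constants).
  static final long TOTAL_MEMORY = 210L << 20;   // 210 MB
  static final long WRITE_BUFFER = 8L << 20;     // 8 MB, lazily allocated per partition

  /**
   * Returns true if a lazily allocated write buffer would no longer fit,
   * i.e. a partition must be spilled before the next row is loaded.
   */
  static boolean mustSpillBeforeLoad(long memoryUsed, boolean bufferAlreadyAllocated) {
    if (bufferAlreadyAllocated) {
      return false;                      // the buffer exists, no new allocation needed
    }
    return TOTAL_MEMORY - memoryUsed < WRITE_BUFFER;
  }

  public static void main(String[] args) {
    long used = 13 * (16L << 20);        // 13 in-memory partitions * ~16 MB ref arrays
    System.out.println(mustSpillBeforeLoad(used, false)); // true: only 2 MB left < 8 MB
  }
}
{code}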



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job

2015-12-21 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-12724:


 Summary: ACID: Major compaction fails to include the original 
bucket files into MR job
 Key: HIVE-12724
 URL: https://issues.apache.org/jira/browse/HIVE-12724
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.0.0, 2.1.0
Reporter: Wei Zheng
Assignee: Wei Zheng


How the problem happens:
* Create a non-ACID table
* Before non-ACID to ACID table conversion, we inserted row one
* After non-ACID to ACID table conversion, we inserted row two
* Both rows can be retrieved before MAJOR compaction
* After MAJOR compaction, row one is lost
{code}
hive> USE acidtest;
OK
Time taken: 0.77 seconds
hive> CREATE TABLE t1 (nationkey INT, name STRING, regionkey INT, comment 
STRING)
> CLUSTERED BY (regionkey) INTO 2 BUCKETS
> STORED AS ORC;
OK
Time taken: 0.179 seconds
hive> DESC FORMATTED t1;
OK
# col_name  data_type   comment

nationkey   int
name        string
regionkey   int
comment string

# Detailed Table Information
Database:   acidtest
Owner:  wzheng
CreateTime: Mon Dec 14 15:50:40 PST 2015
LastAccessTime: UNKNOWN
Retention:  0
Location:   file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
Table Type: MANAGED_TABLE
Table Parameters:
transient_lastDdlTime   1450137040

# Storage Information
SerDe Library:  org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:    org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:   org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed: No
Num Buckets:2
Bucket Columns: [regionkey]
Sort Columns:   []
Storage Desc Params:
serialization.format    1
Time taken: 0.198 seconds, Fetched: 28 row(s)
hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db;
Found 1 items
drwxr-xr-x   - wzheng staff 68 2015-12-14 15:50 
/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
hive> INSERT INTO TABLE t1 VALUES (1, 'USA', 1, 'united states');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. tez, spark) 
or using Hive 1.X releases.
Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2015-12-14 15:51:58,070 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local73977356_0001
Loading data to table acidtest.t1
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 2.825 seconds
hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
Found 2 items
-rwxr-xr-x   1 wzheng staff112 2015-12-14 15:51 
/Users/wzheng/hivetmp/warehouse/acidtest.db/t1/00_0
-rwxr-xr-x   1 wzheng staff472 2015-12-14 15:51 
/Users/wzheng/hivetmp/warehouse/acidtest.db/t1/01_0
hive> SELECT * FROM t1;
OK
1   USA 1   united states
Time taken: 0.434 seconds, Fetched: 1 row(s)
hive> ALTER TABLE t1 SET TBLPROPERTIES ('transactional' = 'true');
OK
Time taken: 0.071 seconds
hive> DESC FORMATTED t1;
OK
# col_name  data_type   comment

nationkey   int
name        string
regionkey   int
comment string

# Detailed Table Information
Database:   acidtest
Owner:  wzheng
CreateTime: Mon Dec 14 15:50:40 PST 2015
LastAccessTime: UNKNOWN
Retention:  0
Location:   file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE   false
last_modified_by    wzheng
last_modified_time  1450137141
numFiles    2
numRows -1
rawDataSize -1
totalSize   584
transactional   true
transient_lastDdlTime   1450137141

# Storage Information
SerDe Library:  org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:    org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:   org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed: No
Num Buckets:2
Bucket Columns: [regionkey]
Sort Columns:   []
Storage Desc Params:

[jira] [Created] (HIVE-12685) Remove invalid property in common/src/test/resources/hive-site.xml

2015-12-15 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-12685:


 Summary: Remove invalid property in 
common/src/test/resources/hive-site.xml
 Key: HIVE-12685
 URL: https://issues.apache.org/jira/browse/HIVE-12685
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.0.0, 2.1.0
Reporter: Wei Zheng
Assignee: Wei Zheng


Currently there is a property like the one below, which is obviously wrong:
{code}
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>hive-site.xml</value>
  <description>Override ConfVar defined in HiveConf</description>
</property>
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12610) Hybrid Grace Hash Join should fail task faster if processing first batch fails, instead of continuing processing the rest

2015-12-07 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-12610:


 Summary: Hybrid Grace Hash Join should fail task faster if 
processing first batch fails, instead of continuing processing the rest
 Key: HIVE-12610
 URL: https://issues.apache.org/jira/browse/HIVE-12610
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.1
Reporter: Wei Zheng
Assignee: Wei Zheng


While processing the in-memory partition(s), if there is any fatal error, such 
as a Kryo exception, we should exit early instead of moving on to process the 
spilled partition(s).
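A minimal sketch of the fail-fast behavior, assuming partition processing is a simple loop; the Partition interface and processAll method are hypothetical, not the actual MapJoinOperator code path:
{code}
import java.util.Arrays;
import java.util.List;

public class FailFastJoin {
  /** Hypothetical stand-in for processing one hash partition. */
  interface Partition {
    void process() throws Exception;
  }

  /**
   * Process the in-memory partitions first; any fatal error (e.g. a Kryo
   * deserialization failure) aborts the task immediately instead of moving
   * on to the spilled partitions and wasting work that will be thrown away.
   */
  static void processAll(List<Partition> inMemory, List<Partition> spilled) throws Exception {
    for (Partition p : inMemory) {
      p.process();                 // exceptions propagate: fail the task now
    }
    for (Partition p : spilled) {  // only reached if the first batch fully succeeded
      p.process();
    }
  }

  public static void main(String[] args) {
    Partition ok = () -> System.out.println("processed");
    Partition bad = () -> { throw new RuntimeException("Kryo exception"); };
    try {
      processAll(Arrays.asList(ok, bad), Arrays.asList(ok));
    } catch (Exception e) {
      System.out.println("task failed fast: " + e.getMessage());
    }
  }
}
{code}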



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12453) Check SessionState status before performing cleanup

2015-11-17 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-12453:


 Summary: Check SessionState status before performing cleanup
 Key: HIVE-12453
 URL: https://issues.apache.org/jira/browse/HIVE-12453
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.3.0, 2.0.0
Reporter: Wei Zheng
Assignee: Wei Zheng






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12444) Queries against ACID table without base directory may throw exception

2015-11-17 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-12444:


 Summary: Queries against ACID table without base directory may 
throw exception
 Key: HIVE-12444
 URL: https://issues.apache.org/jira/browse/HIVE-12444
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.1
Reporter: Wei Zheng
Assignee: Wei Zheng


Steps to reproduce:

set hive.fetch.task.conversion=minimal;
set hive.limit.optimize.enable=true;

create table acidtest1(
 c_custkey int,
 c_name string,
 c_nationkey int,
 c_acctbal double)
clustered by (c_nationkey) into 3 buckets
stored as orc
tblproperties("transactional"="true");

insert into table acidtest1
select c_custkey, c_name, c_nationkey, c_acctbal from tpch_text_10.customer;

select cast (c_nationkey as string) from acidtest.acidtest1 limit 10;
{code}
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, 
vertexId=vertex_1447362491939_0020_1_00, diagnostics=[Vertex 
vertex_1447362491939_0020_1_00 [Map 1] killed/failed due 
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: acidtest1 initializer failed, 
vertex=vertex_1447362491939_0020_1_00 [Map 1], java.lang.RuntimeException: 
serious problem
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1035)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1062)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:308)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:410)
at 
org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:246)
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:240)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:240)
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:227)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: 
java.lang.IllegalArgumentException: delta_017_017 does not start with 
base_
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1012)
... 15 more
Caused by: java.lang.IllegalArgumentException: delta_017_017 does not 
start with base_
at org.apache.hadoop.hive.ql.io.AcidUtils.parseBase(AcidUtils.java:144)
at 
org.apache.hadoop.hive.ql.io.AcidUtils.parseBaseBucketFilename(AcidUtils.java:172)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:667)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:625)
... 4 more
]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12366) Refactor Heartbeater logic for transaction

2015-11-08 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-12366:


 Summary: Refactor Heartbeater logic for transaction
 Key: HIVE-12366
 URL: https://issues.apache.org/jira/browse/HIVE-12366
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Wei Zheng
Assignee: Wei Zheng


Currently there is a gap between the time of lock acquisition and the first 
heartbeat being sent out. Normally the gap is negligible, but when it is large 
the query can fail because the locks have already timed out by the time the 
first heartbeat is sent.

We need to remove this gap.
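A minimal sketch of one way to close the gap, assuming a scheduled executor drives the heartbeats; the class and method names are illustrative, not the actual Heartbeater implementation:
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class LockHeartbeater {
  private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

  /**
   * Start heartbeating as soon as the locks are acquired: an initial delay of
   * zero removes the gap between acquisition and the first heartbeat, so the
   * locks cannot time out before the first beat is ever sent.
   */
  public void start(Runnable sendHeartbeat, long intervalMs) {
    scheduler.scheduleAtFixedRate(sendHeartbeat, 0 /* no initial gap */, intervalMs,
        TimeUnit.MILLISECONDS);
  }

  public void stop() {
    scheduler.shutdownNow();
  }

  public static void main(String[] args) throws InterruptedException {
    LockHeartbeater hb = new LockHeartbeater();
    hb.start(() -> System.out.println("heartbeat sent"), 1000);
    Thread.sleep(2500);   // observe ~3 heartbeats: at t=0, t=1s, t=2s
    hb.stop();
  }
}
{code}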



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12180) Use MapJoinDesc::isHybridHashJoin() instead of the HiveConf lookup in Vectorizer

2015-10-14 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-12180:


 Summary: Use MapJoinDesc::isHybridHashJoin() instead of the 
HiveConf lookup in Vectorizer
 Key: HIVE-12180
 URL: https://issues.apache.org/jira/browse/HIVE-12180
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Wei Zheng
Assignee: Wei Zheng






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12074) Conditionally turn off hybrid grace hash join based on est. data size, etc

2015-10-08 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-12074:


 Summary: Conditionally turn off hybrid grace hash join based on 
est. data size, etc
 Key: HIVE-12074
 URL: https://issues.apache.org/jira/browse/HIVE-12074
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.1, 1.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng


Currently, as long as the flag below is set to true, we always do grace hash 
join for map joins. This may not be necessary, especially for cases where the 
data size is quite small and the number of distinct values is also small.

hive.mapjoin.hybridgrace.hashtable
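A minimal sketch of the kind of conditional check being proposed; the threshold and the inputs are illustrative assumptions, not the actual planner logic:
{code}
public class HybridGraceDecision {
  /**
   * Decide whether hybrid grace hash join is worth enabling for a map join.
   * If the small-table estimate comfortably fits in the hash table memory,
   * a plain map join avoids the extra partitioning overhead.
   */
  static boolean useHybridGrace(boolean confEnabled, long estimatedSmallTableBytes,
                                long hashTableMemoryBytes) {
    if (!confEnabled) {
      return false;                                   // hive.mapjoin.hybridgrace.hashtable=false
    }
    return estimatedSmallTableBytes > hashTableMemoryBytes / 2;  // only when spilling is plausible
  }

  public static void main(String[] args) {
    System.out.println(useHybridGrace(true, 5L << 20, 512L << 20));   // false: 5 MB easily fits
    System.out.println(useHybridGrace(true, 400L << 20, 512L << 20)); // true: spilling is plausible
  }
}
{code}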



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12032) Add unit test for HIVE-9855

2015-10-05 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-12032:


 Summary: Add unit test for HIVE-9855
 Key: HIVE-12032
 URL: https://issues.apache.org/jira/browse/HIVE-12032
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.1, 1.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12041) Add unit test for HIVE-9386

2015-10-05 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-12041:


 Summary: Add unit test for HIVE-9386
 Key: HIVE-12041
 URL: https://issues.apache.org/jira/browse/HIVE-12041
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.1, 1.1.1, 1.1.0, 1.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11942) MetaException(message:The threadlocal Deadline is null, please register it first.)

2015-09-23 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-11942:


 Summary: MetaException(message:The threadlocal Deadline is null, 
please register it first.)
 Key: HIVE-11942
 URL: https://issues.apache.org/jira/browse/HIVE-11942
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.2.1
Reporter: Wei Zheng


I got this exception when running qtest unionDistinct_1.q with my WIP patch for 
another JIRA (attached). I tried the same qfile on master without my patch and 
couldn't reproduce it.

But I don't have any change related to the metastore, so I guess my code may 
have exposed an existing bug.
{code}
2015-09-23T17:02:05,385 ERROR [main]: ql.Driver 
(SessionState.java:printError(967)) - FAILED: RuntimeException 
org.apache.hadoop.hive.ql.parse.SemanticException: MetaException(message:The 
threadlocal Deadline is null, please register it first.)
java.lang.RuntimeException: org.apache.hadoop.hive.ql.parse.SemanticException: 
MetaException(message:The threadlocal Deadline is null, please register it 
first.)
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:151)
at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:617)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:252)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10143)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:212)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:240)
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:240)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:310)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1156)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1209)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1085)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1075)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1084)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1058)
at 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:147)
at 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_unionDistinct_1(TestMiniTezCliDriver.java:131)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at junit.framework.TestCase.runTest(TestCase.java:176)
at junit.framework.TestCase.runBare(TestCase.java:141)
at junit.framework.TestResult$1.protect(TestResult.java:122)
at junit.framework.TestResult.runProtected(TestResult.java:142)
at junit.framework.TestResult.run(TestResult.java:125)
at junit.framework.TestCase.run(TestCase.java:129)
at junit.framework.TestSuite.runTest(TestSuite.java:255)
at junit.framework.TestSuite.run(TestSuite.java:250)
at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: 
MetaException(message:The threadlocal Deadline is null, please register it 
first.)
at 
org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.getPartitionsFromServer(PartitionPruner.java:431)
at 
org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:219)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.computePartitionList(RelOptHiveTable.java:253)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.HivePartitionPruneRule.perform(HivePartitionPruneRule.java:55)
at 

[jira] [Created] (HIVE-11889) Add unit test for HIVE-11449

2015-09-18 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-11889:


 Summary: Add unit test for HIVE-11449
 Key: HIVE-11889
 URL: https://issues.apache.org/jira/browse/HIVE-11889
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.3.0, 2.0.0
Reporter: Wei Zheng
Assignee: Wei Zheng






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11714) Handle cross product join properly for Hybrid grace hashjoin

2015-09-01 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-11714:


 Summary: Handle cross product join properly for Hybrid grace 
hashjoin
 Key: HIVE-11714
 URL: https://issues.apache.org/jira/browse/HIVE-11714
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng


The current partitioning calculation is based solely on the hash value of the 
key. For a cross product join, where the keys are empty, all rows will be put 
into partition 0. This falls back to the regular map join behavior, where we 
only have one hash table.
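A minimal sketch of why empty keys collapse into a single partition; the hash function and method names are illustrative, not the actual Hybrid Grace Hash Join code:
{code}
public class JoinPartitioning {
  /**
   * The partition id is derived from the hash of the join key. With a cross
   * product join the key is empty, so every row hashes to the same constant
   * and lands in partition 0 -- effectively one hash table, i.e. the plain
   * map join behavior.
   */
  static int partitionOf(byte[] joinKey, int numPartitions) {
    int hash = 0;
    for (byte b : joinKey) {
      hash = 31 * hash + b;
    }
    return hash & (numPartitions - 1);   // numPartitions is a power of 2
  }

  public static void main(String[] args) {
    System.out.println(partitionOf(new byte[0], 16));          // 0: empty key, always partition 0
    System.out.println(partitionOf("key1".getBytes(), 16));    // non-empty keys spread out
  }
}
{code}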



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11566) Hybrid grace hash join should only allocate write buffer for a hash partition when first write happens

2015-08-14 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-11566:


 Summary: Hybrid grace hash join should only allocate write buffer 
for a hash partition when first write happens
 Key: HIVE-11566
 URL: https://issues.apache.org/jira/browse/HIVE-11566
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Wei Zheng
Assignee: Wei Zheng


Currently we allocate a write buffer for a fixed number of hash partitions up 
front, which causes GC pauses.

It's better to do the write buffer allocation on demand.
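A minimal sketch of on-demand allocation, assuming each hash partition owns one write buffer; the class is a hypothetical stand-in, not the actual WriteBuffers/HybridHashTableContainer code:
{code}
public class LazyWriteBuffer {
  private final int bufferSize;
  private byte[] buffer;        // not allocated until the first write
  private int position;

  LazyWriteBuffer(int bufferSize) {
    this.bufferSize = bufferSize;
  }

  /** Allocate the buffer only when the partition actually receives data,
   *  instead of allocating one buffer per hash partition up front. */
  void write(byte b) {
    if (buffer == null) {
      buffer = new byte[bufferSize];   // the first write triggers the allocation
    }
    buffer[position++] = b;
  }

  boolean isAllocated() {
    return buffer != null;
  }

  public static void main(String[] args) {
    LazyWriteBuffer wb = new LazyWriteBuffer(8 * 1024 * 1024);
    System.out.println(wb.isAllocated());  // false: nothing written yet
    wb.write((byte) 1);
    System.out.println(wb.isAllocated());  // true: allocated on demand
  }
}
{code}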



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11467) WriteBuffers rounding wbSize to next power of 2 may cause OOM

2015-08-05 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-11467:


 Summary: WriteBuffers rounding wbSize to next power of 2 may cause 
OOM
 Key: HIVE-11467
 URL: https://issues.apache.org/jira/browse/HIVE-11467
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0, 2.0.0
Reporter: Wei Zheng
Assignee: Wei Zheng


If the wbSize passed to the WriteBuffers constructor is not a power of 2, it is 
first rounded up to the next power of 2:
{code}
  public WriteBuffers(int wbSize, long maxSize) {
this.wbSize = Integer.bitCount(wbSize) == 1 ? wbSize : 
(Integer.highestOneBit(wbSize) << 1);
this.wbSizeLog2 = 31 - Integer.numberOfLeadingZeros(this.wbSize);
this.offsetMask = this.wbSize - 1;
this.maxSize = maxSize;
writePos.bufferIndex = -1;
nextBufferToWrite();
  }
{code}
That may break the existing memory consumption assumptions for map join, and 
potentially cause an OOM.

The solution will be to pass a power-of-2 number as wbSize from upstream during 
hashtable creation, to avoid this late expansion.
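A small worked example of the rounding and of one safer upstream choice; roundedDownWbSize is an illustrative alternative, not necessarily the patch that was committed:
{code}
public class WriteBufferRounding {
  /** Mirrors the rounding in the constructor above: a non-power-of-2 size is
   *  bumped up to the next power of two, which can exceed the planner's estimate. */
  static int roundedWbSize(int wbSize) {
    return Integer.bitCount(wbSize) == 1 ? wbSize : (Integer.highestOneBit(wbSize) << 1);
  }

  /** One way the upstream caller could pick a power of two without exceeding
   *  the estimate: take the largest power of two not greater than the request. */
  static int roundedDownWbSize(int wbSize) {
    return Integer.highestOneBit(wbSize);
  }

  public static void main(String[] args) {
    int requested = 10 * 1024 * 1024;                        // 10 MB from the planner
    System.out.println(roundedWbSize(requested) >> 20);      // 16: rounds up past the estimate
    System.out.println(roundedDownWbSize(requested) >> 20);  // 8: stays within the estimate
  }
}
{code}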



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11256) Update release note to clarify hadoop compatibility

2015-07-14 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-11256:


 Summary: Update release note to clarify hadoop compatibility
 Key: HIVE-11256
 URL: https://issues.apache.org/jira/browse/HIVE-11256
 Project: Hive
  Issue Type: Bug
  Components: Documentation, Website
Affects Versions: 1.2.0, 1.0.0, 0.14.0, 1.1.0, 1.0.1, 1.1.1, 1.2.1
Reporter: Wei Zheng


On the Downloads page: http://hive.apache.org/downloads.html
We should say "This release works with Hadoop 1.2.0+, 2.x.y" for Hive 0.14+.

This is because HIVE-8189 started using the org.apache.hadoop.mapred.JobConf.unset 
method, which is only available since Hadoop 1.2.0.

Users on Hadoop versions earlier than that encountered a NoSuchMethodError 
exception:
e.g. HIVE-11246
http://stackoverflow.com/questions/28070003/error-while-executing-select-query-in-hive



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11193) ConstantPropagateProcCtx should use a Set instead of a List to hold operators to be deleted

2015-07-07 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-11193:


 Summary: ConstantPropagateProcCtx should use a Set instead of a 
List to hold operators to be deleted
 Key: HIVE-11193
 URL: https://issues.apache.org/jira/browse/HIVE-11193
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Reporter: Wei Zheng
Assignee: Wei Zheng


During Constant Propagation optimization, a node sometimes ends up being added 
to the opToDelete list more than once.

Later, in the ConstantPropagate transform, we try to delete that operator 
multiple times, which causes a SemanticException since the node has already 
been removed in an earlier pass.

The data structure for storing opToDelete is a List. We should use a Set to 
avoid the problem.
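A minimal sketch of why the List allows the double delete while a Set does not; the operator is represented by a plain String purely for illustration:
{code}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OpToDeleteDemo {
  public static void main(String[] args) {
    String op = "SEL_3";   // stand-in for an operator scheduled for removal

    // A List happily records the same operator twice ...
    List<String> opToDeleteList = new ArrayList<>();
    opToDeleteList.add(op);
    opToDeleteList.add(op);
    System.out.println(opToDeleteList.size());  // 2: the second delete attempt will fail

    // ... while a Set deduplicates, so the operator is removed exactly once.
    Set<String> opToDeleteSet = new HashSet<>();
    opToDeleteSet.add(op);
    opToDeleteSet.add(op);
    System.out.println(opToDeleteSet.size());   // 1
  }
}
{code}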



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11155) TestHiveMetaTool needs to cover updates for DBS and SDS tables in metastore as well

2015-06-30 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-11155:


 Summary: TestHiveMetaTool needs to cover updates for DBS and SDS 
tables in metastore as well
 Key: HIVE-11155
 URL: https://issues.apache.org/jira/browse/HIVE-11155
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Wei Zheng
Assignee: Wei Zheng
Priority: Minor


In HIVE-11147 we fixed a Hive MetaTool bug. During testing it was found that 
TestHiveMetaTool doesn't cover many cases, such as updating the FS root 
location in the DBS and SDS tables. The test coverage needs to be enhanced.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11147) MetaTool doesn't update FS root location for partitions with space in name

2015-06-29 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-11147:


 Summary: MetaTool doesn't update FS root location for partitions 
with space in name
 Key: HIVE-11147
 URL: https://issues.apache.org/jira/browse/HIVE-11147
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Wei Zheng
Assignee: Wei Zheng


The problem happens when trying to update the FS root location:
{code}
# HIVE_CONF_DIR=/etc/hive/conf.server/ hive --service metatool -dryRun 
-updateLocation hdfs://mycluster hdfs://c6401.ambari.apache.org:8020
...
Looking for LOCATION_URI field in DBS table to update..
Dry Run of updateLocation on table DBS..
old location: hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse new 
location: hdfs://mycluster/apps/hive/warehouse
Found 1 records in DBS table to update
Looking for LOCATION field in SDS table to update..
Dry Run of updateLocation on table SDS..
old location: 
hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/web_sales/ws_web_site_sk=12
 new location: hdfs://mycluster/apps/hive/warehouse/web_sales/ws_web_site_sk=12
old location: 
hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/web_sales/ws_web_site_sk=13
 new location: hdfs://mycluster/apps/hive/warehouse/web_sales/ws_web_site_sk=13
...
Found 143 records in SDS table to update
Warning: Found records with bad LOCATION in SDS table..
bad location URI: 
hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=Advanced
 Degree
bad location URI: 
hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=Advanced
 Degree
bad location URI: 
hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=4
 yr Degree
bad location URI: 
hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=4
 yr Degree
bad location URI: 
hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=2
 yr Degree
bad location URI: 
hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/customer_demographics/cd_education_status=2
 yr Degree
{code}

The reason some entries are marked as bad locations is that they contain a 
space character in the partition name.
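A minimal sketch of why a space breaks the location check, assuming the check boils down to parsing the raw string with java.net.URI; going through org.apache.hadoop.fs.Path, which encodes the path, is one illustrative way such locations could be handled:
{code}
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.fs.Path;

public class LocationWithSpace {
  public static void main(String[] args) {
    String loc = "hdfs://c6401.ambari.apache.org:8020/apps/hive/warehouse/"
        + "customer_demographics/cd_education_status=Advanced Degree";

    // Parsing the raw string fails because of the unencoded space,
    // which is why the record gets flagged as a bad location.
    try {
      new URI(loc);
    } catch (URISyntaxException e) {
      System.out.println("bad location URI: " + e.getMessage());
    }

    // Going through Path escapes the illegal characters, so the same string parses.
    Path p = new Path(loc);
    System.out.println(p.toUri());
  }
}
{code}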



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11038) MiniTezCli tests are hanging

2015-06-17 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-11038:


 Summary: MiniTezCli tests are hanging
 Key: HIVE-11038
 URL: https://issues.apache.org/jira/browse/HIVE-11038
 Project: Hive
  Issue Type: Bug
  Components: Hive, Tez
Affects Versions: 2.0.0
Reporter: Wei Zheng
Priority: Blocker


Whenever running a MiniTezCli test, it just hangs.

Here's the maven command to run a test:
{code}
$ mvn test -Phadoop-2 -Dtest=TestMiniTezCliDriver 
-Dqfile=dynamic_partition_pruning.q
{code}
Here's the tail of org.apache.hadoop.hive.cli.TestMiniTezCliDriver-output.txt:
{code}
Status: Running (Executing on YARN cluster with App id 
application_1434574617753_0001)

Map 1: -/-  Reducer 2: 0/1
Map 1: 1/1  Reducer 2: 1/1
POSTHOOK: query: analyze table lineitem compute statistics for columns
POSTHOOK: type: QUERY
POSTHOOK: Input: default@lineitem
POSTHOOK: Output: 
file:/Users/wzheng/bf/hive/itests/qtest/target/tmp/localscratchdir/c684ea6a-11b1-4253-a529-c3778695b72a/hive_2015-06-17_13-57-19_047_1275844087077606719-1/-mr-1
OK
Time taken: 0.387 seconds
Begin query: dynamic_partition_pruning.q
ivysettings.xml file not found in HIVE_HOME or 
HIVE_CONF_DIR,/Users/wzheng/bf/hive/conf/ivysettings.xml will be used
{code}
And here's the jstack output (partial):
{code}
main #1 prio=5 os_prio=31 tid=0x7fc75e805800 nid=0x1303 waiting on 
condition [0x000101d84000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor.monitorExecution(TezJobMonitor.java:378)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:168)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1657)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1416)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1197)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1061)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1051)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1033)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1007)
at 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:146)
at 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning(TestMiniTezCliDriver.java:130)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at junit.framework.TestCase.runTest(TestCase.java:176)
at junit.framework.TestCase.runBare(TestCase.java:141)
at junit.framework.TestResult$1.protect(TestResult.java:122)
at junit.framework.TestResult.runProtected(TestResult.java:142)
at junit.framework.TestResult.run(TestResult.java:125)
at junit.framework.TestCase.run(TestCase.java:129)
at junit.framework.TestSuite.runTest(TestSuite.java:255)
at junit.framework.TestSuite.run(TestSuite.java:250)
at 
org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)

VM Thread os_prio=31 tid=0x7fc75e830800 nid=0x3103 runnable

GC task thread#0 (ParallelGC) os_prio=31 tid=0x7fc75e811800 nid=0x2103 
runnable

GC task thread#1 (ParallelGC) os_prio=31 tid=0x7fc75f00 nid=0x2303 
runnable

GC task thread#2 (ParallelGC) os_prio=31 tid=0x7fc75f001000 nid=0x2503 
runnable

GC task thread#3 (ParallelGC) os_prio=31 tid=0x7fc75f80 nid=0x2703 
runnable

GC task thread#4 (ParallelGC) os_prio=31 tid=0x7fc75f801000 nid=0x2903 

[jira] [Created] (HIVE-10559) IndexOutOfBoundsException with RemoveDynamicPruningBySize

2015-04-30 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-10559:


 Summary: IndexOutOfBoundsException with RemoveDynamicPruningBySize
 Key: HIVE-10559
 URL: https://issues.apache.org/jira/browse/HIVE-10559
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 1.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng


The problem can be reproduced by running the script attached.

Backtrace
{code}
2015-04-29 10:34:36,390 ERROR [main]: ql.Driver 
(SessionState.java:printError(956)) - FAILED: IndexOutOfBoundsException Index: 
0, Size: 0
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at 
org.apache.hadoop.hive.ql.optimizer.RemoveDynamicPruningBySize.process(RemoveDynamicPruningBySize.java:61)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
at 
org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:77)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsDependentOptimizations(TezCompiler.java:281)
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:123)
at 
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10092)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9932)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1026)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1000)
at 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:139)
at 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_q85(TestMiniTezCliDriver.java:123)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at junit.framework.TestCase.runTest(TestCase.java:176)
at junit.framework.TestCase.runBare(TestCase.java:141)
at junit.framework.TestResult$1.protect(TestResult.java:122)
at junit.framework.TestResult.runProtected(TestResult.java:142)
at junit.framework.TestResult.run(TestResult.java:125)
at junit.framework.TestCase.run(TestCase.java:129)
at junit.framework.TestSuite.runTest(TestSuite.java:255)
at junit.framework.TestSuite.run(TestSuite.java:250)
at 
org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10403) Add n-way join support for Hybrid Grace Hash Join

2015-04-20 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-10403:


 Summary: Add n-way join support for Hybrid Grace Hash Join
 Key: HIVE-10403
 URL: https://issues.apache.org/jira/browse/HIVE-10403
 Project: Hive
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng


Currently Hybrid Grace Hash Join only supports 2-way join (one big table and 
one small table). This task will enable n-way join (one big table and multiple 
small tables).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10368) VectorExpressionWriter doesn't match vectorColumn during row spilling in HybridGraceHashJoin

2015-04-16 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-10368:


 Summary: VectorExpressionWriter doesn't match vectorColumn during 
row spilling in HybridGraceHashJoin
 Key: HIVE-10368
 URL: https://issues.apache.org/jira/browse/HIVE-10368
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng


This problem was exposed by HIVE-10284, when testing vectorized_context

Below is the query and backtrace:
{code}
select store.s_city, ss_net_profit
from store_sales
JOIN store ON store_sales.ss_store_sk = store.s_store_sk
JOIN household_demographics ON store_sales.ss_hdemo_sk = 
household_demographics.hd_demo_sk
limit 100
{code}
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector cannot be cast to 
org.apache.hadoop.hive.ql.exec.vector.LongColumnVector
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:175)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.getRowObject(VectorMapJoinOperator.java:347)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.spillBigTableRow(VectorMapJoinOperator.java:306)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:390)
... 24 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10072) Add vectorization support for Hybrid Grace Hash Join

2015-03-24 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-10072:


 Summary: Add vectorization support for Hybrid Grace Hash Join
 Key: HIVE-10072
 URL: https://issues.apache.org/jira/browse/HIVE-10072
 Project: Hive
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng
 Fix For: 1.2.0


This task is to enable vectorization support for the Hybrid Grace Hash Join feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9833) udaf_percentile_approx_23.q fails intermittently

2015-03-02 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-9833:
---

 Summary: udaf_percentile_approx_23.q fails intermittently
 Key: HIVE-9833
 URL: https://issues.apache.org/jira/browse/HIVE-9833
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Wei Zheng


For the query below:

select percentile_approx(case when key < 100 then cast('NaN' as double) else 
key end, 0.5) from bucket

the base test result is 341.5.

But sometimes it returns 342 during QA testing.

It happens randomly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9277) Hybrid Hybrid Grace Hash Join

2015-02-20 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-9277:

Attachment: HIVE-9277.03.patch

Uploaded 3rd patch for testing

 Hybrid Hybrid Grace Hash Join
 -

 Key: HIVE-9277
 URL: https://issues.apache.org/jira/browse/HIVE-9277
 Project: Hive
  Issue Type: New Feature
  Components: Physical Optimizer
Reporter: Wei Zheng
Assignee: Wei Zheng
  Labels: join
 Attachments: HIVE-9277.01.patch, HIVE-9277.02.patch, 
 HIVE-9277.03.patch, High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf


 We are proposing an enhanced hash join algorithm called _“hybrid hybrid grace 
 hash join”_.
 We can benefit from this feature as illustrated below:
 * The query will not fail even if the estimated memory requirement is 
 slightly wrong
 * Expensive garbage collection overhead can be avoided when hash table grows
 * Join execution using a Map join operator even though the small table 
 doesn't fit in memory as spilling some data from the build and probe sides 
 will still be cheaper than having to shuffle the large fact table
 The design was based on Hadoop’s parallel processing capability and 
 significant amount of memory available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9277) Hybrid Hybrid Grace Hash Join

2015-02-19 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-9277:

Attachment: HIVE-9277.01.patch

Uploading 1st patch for testing

 Hybrid Hybrid Grace Hash Join
 -

 Key: HIVE-9277
 URL: https://issues.apache.org/jira/browse/HIVE-9277
 Project: Hive
  Issue Type: New Feature
  Components: Physical Optimizer
Reporter: Wei Zheng
Assignee: Wei Zheng
  Labels: join
 Attachments: HIVE-9277.01.patch, 
 High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf


 We are proposing an enhanced hash join algorithm called “hybrid hybrid grace 
 hash join”. We can benefit from this feature as illustrated below:
 o The query will not fail even if the estimated memory requirement is 
 slightly wrong
 o Expensive garbage collection overhead can be avoided when hash table grows
 o Join execution using a Map join operator even though the small table 
 doesn't fit in memory as spilling some data from the build and probe sides 
 will still be cheaper than having to shuffle the large fact table
 The design was based on Hadoop’s parallel processing capability and 
 significant amount of memory available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9277) Hybrid Hybrid Grace Hash Join

2015-02-19 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-9277:

Status: Patch Available  (was: Open)

 Hybrid Hybrid Grace Hash Join
 -

 Key: HIVE-9277
 URL: https://issues.apache.org/jira/browse/HIVE-9277
 Project: Hive
  Issue Type: New Feature
  Components: Physical Optimizer
Reporter: Wei Zheng
Assignee: Wei Zheng
  Labels: join
 Attachments: HIVE-9277.01.patch, 
 High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf


 We are proposing an enhanced hash join algorithm called “hybrid hybrid grace 
 hash join”. We can benefit from this feature as illustrated below:
 o The query will not fail even if the estimated memory requirement is 
 slightly wrong
 o Expensive garbage collection overhead can be avoided when hash table grows
 o Join execution using a Map join operator even though the small table 
 doesn't fit in memory as spilling some data from the build and probe sides 
 will still be cheaper than having to shuffle the large fact table
 The design was based on Hadoop’s parallel processing capability and 
 significant amount of memory available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9277) Hybrid Hybrid Grace Hash Join

2015-02-19 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-9277:

Attachment: HIVE-9277.02.patch

Uploading 2nd patch for testing

 Hybrid Hybrid Grace Hash Join
 -

 Key: HIVE-9277
 URL: https://issues.apache.org/jira/browse/HIVE-9277
 Project: Hive
  Issue Type: New Feature
  Components: Physical Optimizer
Reporter: Wei Zheng
Assignee: Wei Zheng
  Labels: join
 Attachments: HIVE-9277.01.patch, HIVE-9277.02.patch, 
 High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf


 We are proposing an enhanced hash join algorithm called “hybrid hybrid grace 
 hash join”. We can benefit from this feature as illustrated below:
 o The query will not fail even if the estimated memory requirement is 
 slightly wrong
 o Expensive garbage collection overhead can be avoided when hash table grows
 o Join execution using a Map join operator even though the small table 
 doesn't fit in memory as spilling some data from the build and probe sides 
 will still be cheaper than having to shuffle the large fact table
 The design was based on Hadoop’s parallel processing capability and 
 significant amount of memory available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9676) add serialization to BytesBytes hashtable

2015-02-12 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319205#comment-14319205
 ] 

Wei Zheng commented on HIVE-9676:
-

Thanks [~sershe]! I will test out the code and give you an update.

 add serialization to BytesBytes hashtable
 -

 Key: HIVE-9676
 URL: https://issues.apache.org/jira/browse/HIVE-9676
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-9676.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9277) Hybrid Hybrid Grace Hash Join

2015-01-23 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289866#comment-14289866
 ] 

Wei Zheng commented on HIVE-9277:
-

Thanks [~leftylev] for your review! I've updated the wording. Hope this time 
it's better explained.

 Hybrid Hybrid Grace Hash Join
 -

 Key: HIVE-9277
 URL: https://issues.apache.org/jira/browse/HIVE-9277
 Project: Hive
  Issue Type: New Feature
  Components: Physical Optimizer
Reporter: Wei Zheng
Assignee: Wei Zheng
  Labels: join
 Attachments: High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf


 We are proposing an enhanced hash join algorithm called “hybrid hybrid grace 
 hash join”. We can benefit from this feature as illustrated below:
 o The query will not fail even if the estimated memory requirement is 
 slightly wrong
 o Expensive garbage collection overhead can be avoided when hash table grows
 o Join execution using a Map join operator even though the small table 
 doesn't fit in memory as spilling some data from the build and probe sides 
 will still be cheaper than having to shuffle the large fact table
 The design was based on Hadoop’s parallel processing capability and 
 significant amount of memory available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9277) Hybrid Hybrid Grace Hash Join

2015-01-21 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-9277:

Attachment: High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf

Uploaded design doc version 1.

 Hybrid Hybrid Grace Hash Join
 -

 Key: HIVE-9277
 URL: https://issues.apache.org/jira/browse/HIVE-9277
 Project: Hive
  Issue Type: New Feature
  Components: Physical Optimizer
Reporter: Wei Zheng
Assignee: Wei Zheng
  Labels: join
 Attachments: High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf


 We are proposing an enhanced hash join algorithm called “hybrid hybrid grace 
 hash join”. We can benefit from this feature as illustrated below:
 o The query will not fail even if the estimated memory requirement is 
 slightly wrong
 o Expensive garbage collection overhead can be avoided when hash table grows
 o Join execution using a Map join operator even though the small table 
 doesn't fit in memory as spilling some data from the build and probe sides 
 will still be cheaper than having to shuffle the large fact table
 The design was based on Hadoop’s parallel processing capability and 
 significant amount of memory available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9382) Query got rerun with Global Limit optimization on and Fetch optimization off

2015-01-16 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14280676#comment-14280676
 ] 

Wei Zheng commented on HIVE-9382:
-

The test failure is irrelevant.

 Query got rerun with Global Limit optimization on and Fetch optimization off
 

 Key: HIVE-9382
 URL: https://issues.apache.org/jira/browse/HIVE-9382
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Wei Zheng
Assignee: Wei Zheng
 Attachments: HIVE-9382.1.patch


 When Global Limit optimization is enabled, and Fetch Optimization for Simple 
 Queries is off or not applicable, some queries with LIMIT clause will run 
 twice.
 set hive.limit.optimize.enable=true;
 set hive.fetch.task.conversion=none;
 For example,
 {code:sql}
 hive> select * from t1 limit 10;
 Query ID = wzheng_20150107185252_4a6d0e65-9e58-464b-9ed3-9177740c30a9
 Total jobs = 1
 Launching Job 1 out of 1
 Status: Running (Executing on YARN cluster with App id 
 application_1420567249453_0039)
 
 VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
 
 Map 1 ..   SUCCEEDED      1          1        0        0       0       0
 
 VERTICES: 01/01  [==] 100%  ELAPSED TIME: 0.41 s
 
 OK
 20120899848   119820  32627   982976  509206  0.000100898
 20120899745   119820  32627   982976  509206  0.000100898
 20120899739   119820  32627   982976  509206  0.000100898
 20120899847   119820  32627   982976  509206  0.000100898
 201208613588  119820  32627   982976  509206  0.000100898
 20120899809   119820  32627   982976  509206  0.000100898
 20120899725   119820  32627   982976  509206  0.000100898
 20120899666   119820  32627   982976  509206  0.000100898
 20120899743   119820  32627   982976  509206  0.000100898
 20120899801   119820  32627   982976  509206  0.000100898
 Retry query with a different approach...
 Query ID = wzheng_20150107185252_8a77f793-cad7-4c6b-b64a-07d8310970b9
 Total jobs = 1
 Launching Job 1 out of 1
 Status: Running (Executing on YARN cluster with App id 
 application_1420567249453_0039)
 
 VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
 
 Map 1 ..   SUCCEEDED    309        309        0        0       0       0
 
 VERTICES: 01/01  [==] 100%  ELAPSED TIME: 2.04 s
 
 OK
 20120899848   119820  32627   982976  509206  0.000100898
 20120899745   119820  32627   982976  509206  0.000100898
 20120899739   119820  32627   982976  509206  0.000100898
 20120899847   119820  32627   982976  509206  0.000100898
 201208613588  119820  32627   982976  509206  0.000100898
 20120899809   119820  32627   982976  509206  0.000100898
 20120899725   119820  32627   982976  509206  0.000100898
 20120899666   119820  32627   982976  509206  0.000100898
 20120899743   119820  32627   982976  509206  0.000100898
 20120899801   119820  32627   982976  509206  0.000100898
 Time taken: 2.748 seconds, Fetched: 10 row(s)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

