[jira] [Created] (HIVE-11667) Support Trash and Snapshot in Truncate Table
Chaoyu Tang created HIVE-11667: Summary: Support Trash and Snapshot in Truncate Table Key: HIVE-11667 URL: https://issues.apache.org/jira/browse/HIVE-11667 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Currently, Truncate Table (or Partition) is implemented by calling FileSystem.delete and then recreating the directory. It does not use HDFS Trash even when Trash is turned on, and a table or partition cannot be truncated if it has a snapshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11666) Discrepency in INSERT OVERWRITE LOCAL DIRECTORY between Beeline and CLI
Chaoyu Tang created HIVE-11666: Summary: Discrepency in INSERT OVERWRITE LOCAL DIRECTORY between Beeline and CLI Key: HIVE-11666 URL: https://issues.apache.org/jira/browse/HIVE-11666 Project: Hive Issue Type: Sub-task Components: CLI, HiveServer2 Reporter: Chaoyu Tang Hive CLI writes to the local host when running INSERT OVERWRITE LOCAL DIRECTORY, but Beeline writes to a directory local to HS2. For a user migrating from CLI to Beeline, this might be a big change.
what's the plan of the next release?
Does anyone know when the next version is supposed to be released? As far as we know, there has been no release since June.
Review Request 37852: HIVE-11668 make sure directsql calls pre-query init when needed
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37852/ --- Review request for hive and Ashutosh Chauhan. Repository: hive-git Description --- blah Diffs - metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 522fcc2 Diff: https://reviews.apache.org/r/37852/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Created] (HIVE-11669) OrcFileDump service should support directories
Prasanth Jayachandran created HIVE-11669: Summary: OrcFileDump service should support directories Key: HIVE-11669 URL: https://issues.apache.org/jira/browse/HIVE-11669 Project: Hive Issue Type: Bug Affects Versions: 1.3.0, 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran The orcfiledump service does not support directories. If a directory is specified, the program should iterate through all the files in the directory and dump each file.
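The behaviour requested in HIVE-11669 can be sketched with plain java.nio, as a rough illustration only. The class DumpTargets and its method expand are hypothetical names, not part of Hive's file dump tool, and descending into subdirectories is an assumption: the issue only says "all the files in the directory".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper; not taken from Hive's file dump code. */
class DumpTargets {
    /**
     * Expands a command-line path into the list of files to dump.
     * A plain file yields itself; a directory yields every regular file
     * beneath it (recursion is an assumption), sorted so the dump order
     * is deterministic.
     */
    static List<Path> expand(Path arg) throws IOException {
        if (!Files.isDirectory(arg)) {
            return Collections.singletonList(arg);
        }
        try (Stream<Path> walk = Files.walk(arg)) {
            return walk.filter(Files::isRegularFile)
                       .sorted()
                       .collect(Collectors.toList());
        }
    }
}
```

A caller would then run the existing single-file dump once per returned path, so the per-file logic stays unchanged.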
[jira] [Created] (HIVE-11668) make sure directsql calls pre-query init when needed
Sergey Shelukhin created HIVE-11668: Summary: make sure directsql calls pre-query init when needed Key: HIVE-11668 URL: https://issues.apache.org/jira/browse/HIVE-11668 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin See comments in HIVE-11123
Re: Review Request 37810: HIVE-10021 Alter index rebuild statements submitted through HiveServer2 fail when Sentry is enabled
On Aug. 27, 2015, 8:57 p.m., Chao Sun wrote:
> Patch looks good. Just one question: instead of populating the user name in several places, is it possible to use the one stored in SessionState (by calling SessionState.get().getUserName() before creating the Driver)?
I didn't know there was one in SessionState. Let me take a look at how those user names are related. - Aihua --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37810/#review96749 --- On Aug. 26, 2015, 8:14 p.m., Aihua Xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37810/ --- (Updated Aug. 26, 2015, 8:14 p.m.) Review request for hive. Repository: hive-git Description --- HIVE-10021 Alter index rebuild statements submitted through HiveServer2 fail when Sentry is enabled Diffs - ql/src/java/org/apache/hadoop/hive/ql/Context.java ca0d487b8195da7c848a8212a5b869620ee857af ql/src/java/org/apache/hadoop/hive/ql/Driver.java 4030075dc5393b60bff25c50a700ccffdb1720bc ql/src/java/org/apache/hadoop/hive/ql/index/AbstractIndexHandler.java 1d27306ef1a644e4ad37a73f6f9eeed92cf79a5a ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java e67996d3fba94e9ff33078f4bf7fd97141103138 ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java b076933b7bd6611cd4b678441cd1cc2b0e16786b ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 1dbe230917564d5c17c198a898d4de7b52adab3b ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java 92cae67e9111d42b36594cf44a174fb9f0812a7a ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 9f8c756bed3811f04ec1dc8625f89724faab99ff Diff: https://reviews.apache.org/r/37810/diff/ Testing --- Thanks, Aihua Xu
Re: Review Request 37810: HIVE-10021 Alter index rebuild statements submitted through HiveServer2 fail when Sentry is enabled
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37810/#review96749 --- Patch looks good. Just one question: instead of populating the user name in several places, is it possible to use the one stored in SessionState (by calling SessionState.get().getUserName() before creating the Driver)? - Chao Sun On Aug. 26, 2015, 8:14 p.m., Aihua Xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37810/ --- (Updated Aug. 26, 2015, 8:14 p.m.) Review request for hive. Repository: hive-git Description --- HIVE-10021 Alter index rebuild statements submitted through HiveServer2 fail when Sentry is enabled Diffs - ql/src/java/org/apache/hadoop/hive/ql/Context.java ca0d487b8195da7c848a8212a5b869620ee857af ql/src/java/org/apache/hadoop/hive/ql/Driver.java 4030075dc5393b60bff25c50a700ccffdb1720bc ql/src/java/org/apache/hadoop/hive/ql/index/AbstractIndexHandler.java 1d27306ef1a644e4ad37a73f6f9eeed92cf79a5a ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java e67996d3fba94e9ff33078f4bf7fd97141103138 ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java b076933b7bd6611cd4b678441cd1cc2b0e16786b ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 1dbe230917564d5c17c198a898d4de7b52adab3b ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java 92cae67e9111d42b36594cf44a174fb9f0812a7a ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 9f8c756bed3811f04ec1dc8625f89724faab99ff Diff: https://reviews.apache.org/r/37810/diff/ Testing --- Thanks, Aihua Xu
[jira] [Created] (HIVE-11670) Strip out password information from TezSessionState configuration
Hari Sankar Sivarama Subramaniyan created HIVE-11670: Summary: Strip out password information from TezSessionState configuration Key: HIVE-11670 URL: https://issues.apache.org/jira/browse/HIVE-11670 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Remove password information from the configuration copy that is sent to YARN/Tez. We don't need it there, and the config entries can potentially be visible to other users. HIVE-10508 removed it in certain places; however, when I initiated a session via the Hive CLI, I could still see the password information.
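As a rough illustration of the idea in HIVE-11670, the sketch below builds a sanitized copy of a configuration before it is handed to another system. It models the configuration as a plain Map rather than Hadoop's Configuration class, and both the class name ConfSanitizer and the rule "drop any key containing 'password'" are assumptions made for this example; they are not how HIVE-10508 or the eventual patch decides which entries to remove.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper; the key-matching rule is an assumption. */
class ConfSanitizer {
    /**
     * Returns a copy of conf without entries whose key contains
     * "password" (case-insensitive). The input map is not modified,
     * so the local session can keep using the full configuration.
     */
    static Map<String, String> withoutPasswords(Map<String, String> conf) {
        Map<String, String> copy = new HashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            String key = e.getKey().toLowerCase(Locale.ROOT);
            if (!key.contains("password")) {
                copy.put(e.getKey(), e.getValue());
            }
        }
        return copy;
    }
}
```

Filtering the copy rather than the original matters here: the session still needs the credentials locally, and only the version shipped to the cluster should be stripped.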
Re: Review Request 37778: HIVE-11634
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37778/ --- (Updated Aug. 27, 2015, 10:30 p.m.) Review request for hive, Ashutosh Chauhan, Jesús Camacho Rodríguez, and John Pullokkaran. Repository: hive-git Description --- Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...) Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8a00079 ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 439f616 ql/src/java/org/apache/hadoop/hive/ql/optimizer/PartitionColumnsSeparator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/PointLookupOptimizer.java d83636d ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/OpProcFactory.java 7262164 ql/src/java/org/apache/hadoop/hive/ql/plan/FilterDesc.java 6a31689 ql/src/test/queries/clientpositive/pcs.q PRE-CREATION ql/src/test/results/clientpositive/pcs.q.out PRE-CREATION Diff: https://reviews.apache.org/r/37778/diff/ Testing --- Local testing done. More unit tests coming in the next patch. Thanks, Hari Sankar Sivarama Subramaniyan
[jira] [Created] (HIVE-11671) Optimize RuleRegExp in DPP codepath
Rajesh Balamohan created HIVE-11671: Summary: Optimize RuleRegExp in DPP codepath Key: HIVE-11671 URL: https://issues.apache.org/jira/browse/HIVE-11671 Project: Hive Issue Type: Sub-task Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan When running a large query with DPP in its codepath, RuleRegExp came up as a hotspot. Creating this JIRA to optimize RuleRegExp.java.
[jira] [Created] (HIVE-11677) Access to opHandleSet in HiveSession should be synchronized
Mohit Sabharwal created HIVE-11677: Summary: Access to opHandleSet in HiveSession should be synchronized Key: HIVE-11677 URL: https://issues.apache.org/jira/browse/HIVE-11677 Project: Hive Issue Type: Bug Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal In the scenario where multiple threads share the same session, reading/writing to HiveSessionImpl.opHandleSet can lead to a race condition.
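The race described in HIVE-11677 is the usual one for an unsynchronized HashSet shared between threads. The sketch below shows one common way to guard such a set; SessionHandles is a hypothetical stand-in for the session class, handles are modelled as strings, and locking on the set itself is only one possible design, not necessarily what the patch does.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a session that tracks operation handles. */
class SessionHandles {
    private final Set<String> opHandleSet = new HashSet<>();

    /** Registers a handle; safe to call from several threads at once. */
    void add(String handle) {
        synchronized (opHandleSet) {
            opHandleSet.add(handle);
        }
    }

    /** Removes a handle; returns true if it was present. */
    boolean remove(String handle) {
        synchronized (opHandleSet) {
            return opHandleSet.remove(handle);
        }
    }

    /**
     * Returns a copy taken under the lock, so callers can iterate
     * (for example while closing all operations) without holding it.
     */
    Set<String> snapshot() {
        synchronized (opHandleSet) {
            return new HashSet<>(opHandleSet);
        }
    }
}
```

Returning a snapshot instead of the live set avoids a ConcurrentModificationException when one thread iterates while another adds or removes handles.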
[jira] [Created] (HIVE-11678) Add AggregateProjectMergeRule
Ashutosh Chauhan created HIVE-11678: Summary: Add AggregateProjectMergeRule Key: HIVE-11678 URL: https://issues.apache.org/jira/browse/HIVE-11678 Project: Hive Issue Type: New Feature Components: CBO, Logical Optimizer Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan This will help to get rid of extra projects on top of Aggregation, thus compacting the query plan.
[jira] [Created] (HIVE-11665) ORC StringDictionaryReader should not used Chunked buffers
Gopal V created HIVE-11665: Summary: ORC StringDictionaryReader should not used Chunked buffers Key: HIVE-11665 URL: https://issues.apache.org/jira/browse/HIVE-11665 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 1.3.0, 2.0.0 Reporter: Gopal V Assignee: Prasanth Jayachandran ORC String Dictionary Reader is slow due to the chunking of the input stream.
{code}
private void readDictionaryStream(InStream in) throws IOException {
  if (in != null) { // Guard against empty dictionary stream.
    if (in.available() > 0) {
      dictionaryBuffer = new DynamicByteArray(64, in.available());
      dictionaryBuffer.readAll(in);
      // Since its start of strip invalidate the cache.
      dictionaryBufferInBytesCache = null;
    }
    in.close();
  } else {
    dictionaryBuffer = null;
  }
}
{code}
The fact that the data is chunked offers no advantage for the read-path where there is no grow() operation for memory savings.
[jira] [Created] (HIVE-11672) Hive Streaming API handles bucketing incorrectly
Raj Bains created HIVE-11672: Summary: Hive Streaming API handles bucketing incorrectly Key: HIVE-11672 URL: https://issues.apache.org/jira/browse/HIVE-11672 Project: Hive Issue Type: Bug Affects Versions: 1.2.1 Reporter: Raj Bains Assignee: Roshan Naik Priority: Critical Fix For: 1.2.2
Hive Streaming API allows the clients to get a random bucket and then insert data into it. However, this leads to incorrect bucketing, as Hive expects data to be distributed into buckets based on a hash function applied to the bucket key. The data is inserted randomly by the clients right now. They have no way of
# Knowing what bucket a row (tuple) belongs to
# Asking for a specific bucket
There are optimizations such as Sort Merge Join and Bucket Map Join that rely on the data being correctly distributed across buckets, and these will cause incorrect read results if the data is not distributed correctly.
There are two obvious design choices
# Hive Streaming API should fix this internally by distributing the data correctly
# Hive Streaming API should expose the data distribution scheme to the clients and allow them to distribute the data correctly
The first option will mean every client thread will write to many buckets, causing many small files in each bucket and too many open connections. This does not seem feasible.
The second option pushes more functionality into the client of the Hive Streaming API, but can maintain high throughput and write good sized ORC files. This option seems preferable.
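To make the second option concrete, here is a minimal sketch of how a client could compute the bucket for a row if the distribution scheme were exposed. BucketAssigner and bucketFor are hypothetical names, and Java's hashCode() is only a placeholder: the issue does not spell out the hash function Hive applies to the bucket key, and a real client would have to use exactly the same one for joins to read correct results.

```java
/** Hypothetical helper; the hash function is a placeholder assumption. */
class BucketAssigner {
    /**
     * Maps a bucket key to a bucket index in [0, numBuckets).
     * The sign bit is masked off so that negative hash codes still
     * produce a non-negative index.
     */
    static int bucketFor(Object bucketKey, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        // Placeholder hash: Hive's own per-type hash would go here.
        int hash = (bucketKey == null) ? 0 : bucketKey.hashCode();
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }
}
```

With such a function on the client side, each writer thread can group rows by bucket index and request that specific bucket, instead of taking a random one.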
[jira] [Created] (HIVE-11673) LLAP: TestLlapTaskSchedulerService is flaky
Sergey Shelukhin created HIVE-11673: Summary: LLAP: TestLlapTaskSchedulerService is flaky Key: HIVE-11673 URL: https://issues.apache.org/jira/browse/HIVE-11673 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Siddharth Seth
{noformat}
java.lang.Exception: test timed out after 1 milliseconds
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
	at org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService$TaskExecutorServiceForTest$InternalCompletionListenerForTest.awaitCompletion(TestTaskExecutorService.java:244)
	at org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService$TaskExecutorServiceForTest$InternalCompletionListenerForTest.access$000(TestTaskExecutorService.java:208)
	at org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService.testWaitQueuePreemption(TestTaskExecutorService.java:168)
{noformat}
Cannot repro locally. See HIVE-11642
[jira] [Created] (HIVE-11674) LLAP: Update tez version to fix build
Prasanth Jayachandran created HIVE-11674: Summary: LLAP: Update tez version to fix build Key: HIVE-11674 URL: https://issues.apache.org/jira/browse/HIVE-11674 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran With Tez version 0.8.0-SNAPSHOT, the llap branch build is broken. It fails with the following compilation errors:
{code}
work/hive/hive-git/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java:[60,41] package org.apache.tez.serviceplugins.api does not exist
[ERROR] /work/hive/hive-git/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java:[61,41] package org.apache.tez.serviceplugins.api does not exist
[ERROR] /work/hive/hive-git/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java:[62,41] package org.apache.tez.serviceplugins.api does not exist
[ERROR] /work/hive/hive-git/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java:[63,41] package org.apache.tez.serviceplugins.api does not exist
[ERROR] /work/hive/hive-git/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java:[111,37] cannot find symbol
[ERROR] symbol: class VertexExecutionContext
[ERROR] location: class org.apache.tez.dag.api.Vertex
[ERROR] /work/hive/hive-git/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java:[673,11] cannot find symbol 7:10
{code}
[jira] [Created] (HIVE-11675) make use of file footer PPD API in ETL strategy or separate strategy
Sergey Shelukhin created HIVE-11675: Summary: make use of file footer PPD API in ETL strategy or separate strategy Key: HIVE-11675 URL: https://issues.apache.org/jira/browse/HIVE-11675 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Need to take a look at the best flow. It won't be much different if we make a filtering metastore call for each partition, so perhaps we'd need the custom sync point/batching after all. Or we can make it opportunistic and not fetch any footers unless the filtering can be pushed down to the metastore or the footers can be fetched from the local cache; that way the only slow threaded op is the directory listing.
[jira] [Created] (HIVE-11676) implement metastore API to do file footer PPD
Sergey Shelukhin created HIVE-11676: Summary: implement metastore API to do file footer PPD Key: HIVE-11676 URL: https://issues.apache.org/jira/browse/HIVE-11676 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Need to pass on the expression/sarg, extract column stats from the footer (at write time?) and then apply one to the other. I may file a separate JIRA for the ORC changes because that is usually a PITA.