[jira] [Created] (HIVE-11667) Support Trash and Snapshot in Truncate Table

2015-08-27 Thread Chaoyu Tang (JIRA)
Chaoyu Tang created HIVE-11667:
--

 Summary: Support Trash and Snapshot in Truncate Table
 Key: HIVE-11667
 URL: https://issues.apache.org/jira/browse/HIVE-11667
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
Priority: Minor


Currently, Truncate Table (or Partition) is implemented by calling 
FileSystem.delete and then recreating the directory. It does not use HDFS Trash 
even when Trash is turned on, and the table/partition cannot be truncated if it 
has a snapshot.
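
One possible shape for the change, as a minimal sketch only: leave the table directory in place and move its children into a trash location instead of deleting and recreating the directory. Everything here is an assumption for illustration. java.nio on a local directory stands in for HDFS, and TruncateSketch, truncate and demo are hypothetical names, not Hive or Hadoop APIs. On a real cluster the move would presumably go through the Hadoop trash facility rather than a plain rename.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: a local directory stands in for an HDFS table location.
final class TruncateSketch {

  // Empties tableDir by moving each child into trashDir.
  // tableDir itself is never deleted or recreated.
  static int truncate(Path tableDir, Path trashDir) {
    try {
      Files.createDirectories(trashDir);
      List<Path> children;
      try (Stream<Path> listing = Files.list(tableDir)) {
        children = listing.collect(Collectors.toList());
      }
      for (Path child : children) {
        Files.move(child, trashDir.resolve(child.getFileName()),
            StandardCopyOption.REPLACE_EXISTING);
      }
      return children.size();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Builds a throwaway table directory with two data files, truncates it, and
  // reports whether the directory survived empty with both files in the trash.
  static boolean demo() {
    try {
      Path root = Files.createTempDirectory("truncate-sketch");
      Path table = Files.createDirectory(root.resolve("table"));
      Path trash = root.resolve("trash");
      Files.write(table.resolve("000000_0"), new byte[] {1});
      Files.write(table.resolve("000001_0"), new byte[] {2});
      int moved = truncate(table, trash);
      try (Stream<Path> left = Files.list(table);
           Stream<Path> kept = Files.list(trash)) {
        return moved == 2 && Files.isDirectory(table)
            && left.count() == 0 && kept.count() == 2;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("directory kept, files trashed: " + demo());
  }
}
```

Because the directory is never removed, this shape avoids deleting the table location itself, which is the step the description says fails when a snapshot exists.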



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11666) Discrepency in INSERT OVERWRITE LOCAL DIRECTORY between Beeline and CLI

2015-08-27 Thread Chaoyu Tang (JIRA)
Chaoyu Tang created HIVE-11666:
--

 Summary: Discrepency in INSERT OVERWRITE LOCAL DIRECTORY between 
Beeline and CLI
 Key: HIVE-11666
 URL: https://issues.apache.org/jira/browse/HIVE-11666
 Project: Hive
  Issue Type: Sub-task
  Components: CLI, HiveServer2
Reporter: Chaoyu Tang


Hive CLI writes to the local host on INSERT OVERWRITE LOCAL DIRECTORY, but 
Beeline writes to a local directory on the HS2 host. For a user migrating from 
CLI to Beeline, this might be a big change.





what's the plan of the next release?

2015-08-27 Thread darren_duan
Does anyone know when the next version is supposed to be released? As far as we 
know, there has not been a release since June.


Sent from my Sina Mail Android client


Review Request 37852: HIVE-11668 make sure directsql calls pre-query init when needed

2015-08-27 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37852/
---

Review request for hive and Ashutosh Chauhan.


Repository: hive-git


Description
---

blah


Diffs
-

  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
522fcc2 

Diff: https://reviews.apache.org/r/37852/diff/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Created] (HIVE-11669) OrcFileDump service should support directories

2015-08-27 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-11669:


 Summary: OrcFileDump service should support directories
 Key: HIVE-11669
 URL: https://issues.apache.org/jira/browse/HIVE-11669
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.3.0, 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


The orcfiledump service does not support directories. If a directory is 
specified, the program should iterate through all the files in the directory 
and perform a file dump on each.
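
The requested behaviour could be sketched roughly as follows. This is a hedged illustration rather than the OrcFileDump code: java.nio on the local filesystem stands in for the Hadoop FileSystem API, the class and method names are hypothetical, and descending into subdirectories is an assumption the description does not settle.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of expanding a dump argument into the files to dump.
final class DumpTargetsSketch {

  // A non-directory path is returned as-is; a directory expands to every
  // regular file beneath it (recursion is an assumption), in sorted order.
  static List<Path> filesToDump(Path input) {
    if (!Files.isDirectory(input)) {
      return Collections.singletonList(input);
    }
    try (Stream<Path> walk = Files.walk(input)) {
      return walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Creates a temp directory holding one file at the top level and one in a
  // partition-style subdirectory, then counts the files that would be dumped.
  static int demo() {
    try {
      Path dir = Files.createTempDirectory("orc-dump-sketch");
      Files.write(dir.resolve("a.orc"), new byte[0]);
      Path sub = Files.createDirectory(dir.resolve("part=1"));
      Files.write(sub.resolve("b.orc"), new byte[0]);
      return filesToDump(dir).size();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("files to dump: " + demo());
  }
}
```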





[jira] [Created] (HIVE-11668) make sure directsql calls pre-query init when needed

2015-08-27 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-11668:
---

 Summary: make sure directsql calls pre-query init when needed
 Key: HIVE-11668
 URL: https://issues.apache.org/jira/browse/HIVE-11668
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin


See comments in HIVE-11123





Re: Review Request 37810: HIVE-10021 Alter index rebuild statements submitted through HiveServer2 fail when Sentry is enabled

2015-08-27 Thread Aihua Xu


 On Aug. 27, 2015, 8:57 p.m., Chao Sun wrote:
  Patch looks good. Just one question: instead of populating the user name in 
  several places, is it possible to use the one stored in SessionState (by 
  calling SessionState.get().getUserName() before creating the Driver)?

Didn't know there was one in SessionState. Let me take a look at how those 
usernames are correlated.


- Aihua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37810/#review96749
---


On Aug. 26, 2015, 8:14 p.m., Aihua Xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/37810/
 ---
 
 (Updated Aug. 26, 2015, 8:14 p.m.)
 
 
 Review request for hive.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-10021 Alter index rebuild statements submitted through HiveServer2 
 fail when Sentry is enabled
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/Context.java 
 ca0d487b8195da7c848a8212a5b869620ee857af 
   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
 4030075dc5393b60bff25c50a700ccffdb1720bc 
   ql/src/java/org/apache/hadoop/hive/ql/index/AbstractIndexHandler.java 
 1d27306ef1a644e4ad37a73f6f9eeed92cf79a5a 
   ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 
 e67996d3fba94e9ff33078f4bf7fd97141103138 
   ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 
 b076933b7bd6611cd4b678441cd1cc2b0e16786b 
   
 ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 
 1dbe230917564d5c17c198a898d4de7b52adab3b 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java 
 92cae67e9111d42b36594cf44a174fb9f0812a7a 
   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
 9f8c756bed3811f04ec1dc8625f89724faab99ff 
 
 Diff: https://reviews.apache.org/r/37810/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Aihua Xu
 




Re: Review Request 37810: HIVE-10021 Alter index rebuild statements submitted through HiveServer2 fail when Sentry is enabled

2015-08-27 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37810/#review96749
---


Patch looks good. Just one question: instead of populating the user name in 
several places, is it possible to use the one stored in SessionState (by 
calling SessionState.get().getUserName() before creating the Driver)?

- Chao Sun






[jira] [Created] (HIVE-11670) Strip out password information from TezSessionState configuration

2015-08-27 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-11670:


 Summary: Strip out password information from TezSessionState 
configuration
 Key: HIVE-11670
 URL: https://issues.apache.org/jira/browse/HIVE-11670
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan


Remove password information from the configuration copy that is sent to 
Yarn/Tez. We don't need it there, and the config entries can potentially be 
visible to other users.
HIVE-10508 had a fix which removed this in certain places; however, when I 
initiated a session via Hive CLI, I could still see the password information.
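
A minimal sketch of the idea, under stated assumptions: java.util.Properties stands in for the Hive/Tez configuration object, the class and method names are hypothetical, and matching keys on the substring "password" is an illustrative rule rather than the list Hive actually uses.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical sketch: scrub password entries from a copy of the configuration
// before the copy is handed to another system.
final class ConfScrubSketch {

  // Returns a copy of source without any key whose name contains "password"
  // (case-insensitive). The source object is left untouched.
  static Properties withoutPasswords(Properties source) {
    Properties copy = new Properties();
    for (String key : source.stringPropertyNames()) {
      if (!key.toLowerCase(Locale.ROOT).contains("password")) {
        copy.setProperty(key, source.getProperty(key));
      }
    }
    return copy;
  }

  // Example configuration with one password entry and one ordinary entry.
  static Properties sample() {
    Properties conf = new Properties();
    conf.setProperty("javax.jdo.option.ConnectionPassword", "secret");
    conf.setProperty("hive.execution.engine", "tez");
    return conf;
  }

  public static void main(String[] args) {
    System.out.println(withoutPasswords(sample()).stringPropertyNames());
  }
}
```

Scrubbing a copy rather than the original keeps the session's own configuration usable while nothing sensitive leaves the process.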





Re: Review Request 37778: HIVE-11634

2015-08-27 Thread Hari Sankar Sivarama Subramaniyan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37778/
---

(Updated Aug. 27, 2015, 10:30 p.m.)


Review request for hive, Ashutosh Chauhan, Jesús Camacho Rodríguez, and John 
Pullokkaran.


Repository: hive-git


Description
---

Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8a00079 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 439f616 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/PartitionColumnsSeparator.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/PointLookupOptimizer.java 
d83636d 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/OpProcFactory.java 
7262164 
  ql/src/java/org/apache/hadoop/hive/ql/plan/FilterDesc.java 6a31689 
  ql/src/test/queries/clientpositive/pcs.q PRE-CREATION 
  ql/src/test/results/clientpositive/pcs.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/37778/diff/


Testing
---

Local testing done. More unit tests coming in the next patch.


Thanks,

Hari Sankar Sivarama Subramaniyan



[jira] [Created] (HIVE-11671) Optimize RuleRegExp in DPP codepath

2015-08-27 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HIVE-11671:
---

 Summary: Optimize RuleRegExp in DPP codepath
 Key: HIVE-11671
 URL: https://issues.apache.org/jira/browse/HIVE-11671
 Project: Hive
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan


When running a large query with DPP in its codepath, RuleRegExp came up as a 
hotspot. Creating this JIRA to optimize RuleRegExp.java.





[jira] [Created] (HIVE-11677) Access to opHandleSet in HiveSession should be synchronized

2015-08-27 Thread Mohit Sabharwal (JIRA)
Mohit Sabharwal created HIVE-11677:
--

 Summary: Access to opHandleSet in HiveSession should be 
synchronized
 Key: HIVE-11677
 URL: https://issues.apache.org/jira/browse/HIVE-11677
 Project: Hive
  Issue Type: Bug
Reporter: Mohit Sabharwal
Assignee: Mohit Sabharwal


In the scenario where multiple threads share the same session, reading/writing 
to HiveSessionImpl.opHandleSet can lead to a race condition. 
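
To illustrate the kind of fix the summary asks for, here is a small sketch in which every read and write of the set takes the same lock. It is not the HiveSessionImpl code: the class, its methods, and the use of String in place of the real operation handle type are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a session that tracks its open operation handles.
final class SessionHandlesSketch {
  private final Set<String> opHandleSet = new HashSet<>();

  void addHandle(String handle) {
    synchronized (opHandleSet) {
      opHandleSet.add(handle);
    }
  }

  void removeHandle(String handle) {
    synchronized (opHandleSet) {
      opHandleSet.remove(handle);
    }
  }

  // Copies under the lock so callers can iterate without holding it.
  List<String> snapshot() {
    synchronized (opHandleSet) {
      return new ArrayList<>(opHandleSet);
    }
  }

  // Several threads share one session and each registers its own handles;
  // returns how many handles the session ends up holding.
  static int demo(int threads, int perThread) {
    SessionHandlesSketch session = new SessionHandlesSketch();
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      final int id = t;
      workers[t] = new Thread(() -> {
        for (int i = 0; i < perThread; i++) {
          session.addHandle("op-" + id + "-" + i);
        }
      });
      workers[t].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return session.snapshot().size();
  }

  public static void main(String[] args) {
    System.out.println("handles registered: " + demo(4, 1000));
  }
}
```

Without the synchronized blocks, concurrent add calls on a plain HashSet can lose entries or corrupt the table, which is the race the description refers to.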





[jira] [Created] (HIVE-11678) Add AggregateProjectMergeRule

2015-08-27 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-11678:
---

 Summary: Add AggregateProjectMergeRule
 Key: HIVE-11678
 URL: https://issues.apache.org/jira/browse/HIVE-11678
 Project: Hive
  Issue Type: New Feature
  Components: CBO, Logical Optimizer
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan


This will help get rid of extra projects on top of Aggregation, thus 
compacting the query plan.





[jira] [Created] (HIVE-11665) ORC StringDictionaryReader should not used Chunked buffers

2015-08-27 Thread Gopal V (JIRA)
Gopal V created HIVE-11665:
--

 Summary: ORC StringDictionaryReader should not used Chunked buffers
 Key: HIVE-11665
 URL: https://issues.apache.org/jira/browse/HIVE-11665
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 1.3.0, 2.0.0
Reporter: Gopal V
Assignee: Prasanth Jayachandran


ORC String Dictionary Reader is slow due to the chunking of the input stream.

{code}
private void readDictionaryStream(InStream in) throws IOException {
  if (in != null) { // Guard against empty dictionary stream.
    if (in.available() > 0) {
      dictionaryBuffer = new DynamicByteArray(64, in.available());
      dictionaryBuffer.readAll(in);
      // Since its start of strip invalidate the cache.
      dictionaryBufferInBytesCache = null;
    }
    in.close();
  } else {
    dictionaryBuffer = null;
  }
}
{code}

The fact that the data is chunked offers no advantage for the read-path where 
there is no grow() operation for memory savings.
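
One reading of the suggestion is to read the dictionary into a single contiguous array on the read path. The sketch below shows only that general idea with plain java.io streams. It does not use InStream or DynamicByteArray, and ContiguousReadSketch and readFully are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: drain a stream into one contiguous byte array.
final class ContiguousReadSketch {

  // Uses available() only as an initial size hint; the loop reads until end of
  // stream, so a short hint is still handled correctly.
  static byte[] readFully(InputStream in) {
    try {
      ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(32, in.available()));
      byte[] scratch = new byte[4096];
      int n;
      while ((n = in.read(scratch)) != -1) {
        out.write(scratch, 0, n);
      }
      return out.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Round-trips a small payload and checks the bytes come back unchanged.
  static boolean demo() {
    byte[] data = "alpha,beta,gamma".getBytes(StandardCharsets.UTF_8);
    return Arrays.equals(data, readFully(new ByteArrayInputStream(data)));
  }

  public static void main(String[] args) {
    System.out.println("round trip ok: " + demo());
  }
}
```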





[jira] [Created] (HIVE-11672) Hive Streaming API handles bucketing incorrectly

2015-08-27 Thread Raj Bains (JIRA)
Raj Bains created HIVE-11672:


 Summary: Hive Streaming API handles bucketing incorrectly
 Key: HIVE-11672
 URL: https://issues.apache.org/jira/browse/HIVE-11672
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Raj Bains
Assignee: Roshan Naik
Priority: Critical
 Fix For: 1.2.2


Hive Streaming API allows clients to get a random bucket and then insert 
data into it. However, this leads to incorrect bucketing, as Hive expects data 
to be distributed into buckets based on a hash function applied to the bucket 
key. Right now the data is inserted randomly by the clients. They have no way of
# Knowing what bucket a row (tuple) belongs to
# Asking for a specific bucket

There are optimizations such as Sort Merge Join and Bucket Map Join that rely on 
the data being correctly distributed across buckets, and these will return 
incorrect results if the data is not distributed correctly.

There are two obvious design choices
# Hive Streaming API should fix this internally by distributing the data 
correctly
# Hive Streaming API should expose the data distribution scheme to the clients 
and allow them to distribute the data correctly

The first option means every client thread will write to many buckets, 
causing many small files in each bucket and too many open connections. This 
does not seem feasible. The second option pushes more functionality into the 
client of the Hive Streaming API, but can maintain high throughput and write 
good-sized ORC files. This option seems preferable.
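
For the second option, a client would need something like the following to decide which bucket a row belongs to. This is a sketch under assumptions: Java's hashCode stands in for whatever hash Hive applies to the bucketing columns, the mask-and-modulo step is an illustrative choice, and BucketSketch and bucketFor are hypothetical names rather than part of the Streaming API.

```java
// Hypothetical sketch of mapping a bucket key to a bucket number.
final class BucketSketch {

  // Masks off the sign bit so the result is never negative, then takes the
  // remainder by the number of buckets. A null key is treated as hash 0.
  static int bucketFor(Object bucketKey, int numBuckets) {
    if (numBuckets <= 0) {
      throw new IllegalArgumentException("numBuckets must be positive");
    }
    int hash = bucketKey == null ? 0 : bucketKey.hashCode();
    return (hash & Integer.MAX_VALUE) % numBuckets;
  }

  public static void main(String[] args) {
    int numBuckets = 4;
    for (Object key : new Object[] {7, -1, "user-42"}) {
      System.out.println(key + " -> bucket " + bucketFor(key, numBuckets));
    }
  }
}
```

With such a function exposed, a client thread could group rows by bucket and write each group to the bucket it asked for, which is what keeps joins that rely on bucketing correct.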






[jira] [Created] (HIVE-11673) LLAP: TestLlapTaskSchedulerService is flaky

2015-08-27 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-11673:
---

 Summary: LLAP: TestLlapTaskSchedulerService is flaky
 Key: HIVE-11673
 URL: https://issues.apache.org/jira/browse/HIVE-11673
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Siddharth Seth


{noformat}
java.lang.Exception: test timed out after 1 milliseconds
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService$TaskExecutorServiceForTest$InternalCompletionListenerForTest.awaitCompletion(TestTaskExecutorService.java:244)
at org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService$TaskExecutorServiceForTest$InternalCompletionListenerForTest.access$000(TestTaskExecutorService.java:208)
at org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService.testWaitQueuePreemption(TestTaskExecutorService.java:168)
{noformat}

Cannot repro locally. See HIVE-11642





[jira] [Created] (HIVE-11674) LLAP: Update tez version to fix build

2015-08-27 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-11674:


 Summary: LLAP: Update tez version to fix build
 Key: HIVE-11674
 URL: https://issues.apache.org/jira/browse/HIVE-11674
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


With tez version 0.8.0-SNAPSHOT, the llap branch build is broken. It fails with 
the following compilation errors:
{code}
work/hive/hive-git/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java:[60,41] package org.apache.tez.serviceplugins.api does not exist
[ERROR] /work/hive/hive-git/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java:[61,41] package org.apache.tez.serviceplugins.api does not exist
[ERROR] /work/hive/hive-git/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java:[62,41] package org.apache.tez.serviceplugins.api does not exist
[ERROR] /work/hive/hive-git/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java:[63,41] package org.apache.tez.serviceplugins.api does not exist
[ERROR] /work/hive/hive-git/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java:[111,37] cannot find symbol
[ERROR] symbol:   class VertexExecutionContext
[ERROR] location: class org.apache.tez.dag.api.Vertex
[ERROR] /work/hive/hive-git/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java:[673,11] cannot find symbol
7:10
{code}






[jira] [Created] (HIVE-11675) make use of file footer PPD API in ETL strategy or separate strategy

2015-08-27 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-11675:
---

 Summary: make use of file footer PPD API in ETL strategy or 
separate strategy
 Key: HIVE-11675
 URL: https://issues.apache.org/jira/browse/HIVE-11675
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin


Need to take a look at the best flow. It won't be much different if we do a 
filtering metastore call for each partition, so perhaps we'd need the custom 
sync point/batching after all.
Or we can make it opportunistic and not fetch any footers unless it can be 
pushed down to the metastore or fetched from the local cache; that way the only 
slow threaded op is directory listing.





[jira] [Created] (HIVE-11676) implement metastore API to do file footer PPD

2015-08-27 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-11676:
---

 Summary: implement metastore API to do file footer PPD
 Key: HIVE-11676
 URL: https://issues.apache.org/jira/browse/HIVE-11676
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin


Need to pass on the expression/sarg, extract column stats from the footer (at 
write time?), and then apply one to the other. I may file a separate JIRA for 
the ORC changes, because that is usually a PITA.


