[jira] [Created] (HIVE-14784) Operation logs are disabled automatically if the parent directory does not exist.

2016-09-16 Thread Naveen Gangam (JIRA)
Naveen Gangam created HIVE-14784:


 Summary: Operation logs are disabled automatically if the parent 
directory does not exist.
 Key: HIVE-14784
 URL: https://issues.apache.org/jira/browse/HIVE-14784
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 1.1.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam


Operation logging is automatically disabled for a query if the parent directory 
(named after the Hive session id), which is created when the session is 
established, gets deleted for any reason. For example, if the operation log dir 
is under /tmp, the OS can automatically purge it at a configured interval.

Running a query from that session leads to
{code}
2016-09-15 15:09:16,723 WARN org.apache.hive.service.cli.operation.Operation: Unable to create operation log file: /tmp/hive/operation_logs/b8809985-6b38-47ec-a49b-6158a67cd9fc/d35414f7-2418-426c-8489-c6f643ca4599
java.io.IOException: No such file or directory
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:1012)
at org.apache.hive.service.cli.operation.Operation.createOperationLog(Operation.java:195)
at org.apache.hive.service.cli.operation.Operation.beforeRun(Operation.java:237)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:255)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:398)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:385)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:271)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:490)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

{code}

This later leads to errors like the following (more prominent when using HUE, 
as HUE does not close Hive sessions and attempts to retrieve the operation 
logs days after they were created).
{code}
WARN org.apache.hive.service.cli.thrift.ThriftCLIService: Error fetching results:
org.apache.hive.service.cli.HiveSQLException: Couldn't find log associated with operation handle: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=d35414f7-2418-426c-8489-c6f643ca4599]
at org.apache.hive.service.cli.operation.OperationManager.getOperationLogRowSet(OperationManager.java:259)
at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:701)
at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:451)
at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:676)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
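A defensive fix could recreate the missing parent directory before creating the 
per-operation log file. The sketch below is only an illustration of that idea, 
not the actual Hive patch; the class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;

public class OperationLogDirCheck {
    // Sketch: ensure the session-level parent directory exists before creating
    // the per-operation log file, recreating it if it was purged (e.g. from /tmp).
    static File createLogFile(File parentDir, String operationId) throws IOException {
        if (!parentDir.exists() && !parentDir.mkdirs()) {
            throw new IOException("Unable to recreate operation log directory " + parentDir);
        }
        File logFile = new File(parentDir, operationId);
        // With the parent guaranteed to exist, this no longer fails with
        // "No such file or directory".
        logFile.createNewFile();
        return logFile;
    }
}
```

With a check like this, a purged session directory would only cost one extra mkdirs call per operation instead of silently disabling operation logs.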




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 51694: HIVE-14713 LDAP Authentication Provider should be covered with unit tests

2016-09-16 Thread Chaoyu Tang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51694/#review148634
---




service/pom.xml (line 165)


1. Need to remove the empty spaces (in red squares).
2. Should we add the version property ${mockito-all.version}? Currently 
mockito-all.version in Hive is 1.9.5.



service/src/java/org/apache/hive/service/auth/ldap/ChainFilterFactory.java 
(line 46)


Need to remove all the extra empty spaces (in red squares) here, and also in 
other places.



service/src/java/org/apache/hive/service/auth/ldap/GroupFilterFactory.java 
(line 37)


Do we really need an extra factory layer, with a factory for each filter?
In Hive, each session actually instantiates its own 
LdapAuthenticationProviderImpl, which now contains the different factories, 
each one generating only one instance of its filter.



service/src/java/org/apache/hive/service/auth/ldap/LdapSearch.java (line 62)


Shouldn't it be LdapUtils.patternsToBaseDns(userPatterns)?



service/src/java/org/apache/hive/service/auth/ldap/LdapSearch.java (line 89)


I don't think it is necessary to cache the user/userDn; it might also be a 
potential security issue.



service/src/java/org/apache/hive/service/auth/ldap/LdapSearch.java (line 108)


Can getSingleLdapName be used to enforce only one returned entry? That API in 
SearchResultHandler is never used.



service/src/java/org/apache/hive/service/auth/ldap/LdapUtils.java (line 105)


This method might throw a runtime exception such as an NPE or 
IndexOutOfBoundsException; should we check the passed-in parameter rdn? 
We might not run into this situation in the old code, but since this line of 
code has been refactored into a separate API, I think we should do the check. 
Same for the other methods, like patternToBaseDn etc.
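A parameter check along the lines this comment suggests might look like the 
following. This is a hypothetical sketch; the actual LdapUtils method and its 
signature are not shown in this review excerpt.

```java
public class RdnCheckSketch {
    // Hypothetical version of the suggested check: validate the rdn before
    // parsing it, instead of letting an NPE or IndexOutOfBoundsException escape.
    static String extractValue(String rdn) {
        if (rdn == null || rdn.indexOf('=') < 0) {
            throw new IllegalArgumentException("Invalid RDN: " + rdn);
        }
        // Split only on the first '=' so values containing '=' survive intact.
        return rdn.split("=", 2)[1];
    }
}
```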



service/src/java/org/apache/hive/service/auth/ldap/LdapUtils.java (line 159)


I am not sure if there is any precedence order for these configurations, but 
here it seems that GUIDKEY/BASEDN takes precedence over DNPATTERN, which 
differs from the existing implementation and causes a behavior change.



service/src/java/org/apache/hive/service/auth/ldap/Query.java (line 122)


Will setting the search limit improve performance? I did not see it being used.


- Chaoyu Tang


On Sept. 7, 2016, 2:24 p.m., Illya Yalovyy wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51694/
> ---
> 
> (Updated Sept. 7, 2016, 2:24 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Chaoyu Tang, Naveen Gangam, and 
> Szehon Ho.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently LdapAuthenticationProviderImpl class is not covered with unit 
> tests. To make this class testable some minor refactoring will be required.
> 
> 
> Diffs
> -
> 
>   service/pom.xml ecea719 
>   
> service/src/java/org/apache/hive/service/auth/LdapAuthenticationProviderImpl.java
>  efd5393 
>   service/src/java/org/apache/hive/service/auth/ldap/ChainFilterFactory.java 
> PRE-CREATION 
>   
> service/src/java/org/apache/hive/service/auth/ldap/CustomQueryFilterFactory.java
>  PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/DirSearch.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/DirSearchFactory.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/Filter.java PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/FilterFactory.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/GroupFilterFactory.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/LdapSearch.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/LdapSearchFactory.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/LdapUtils.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/Query.java PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/QueryFactory.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/SearchResultHandler.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/UserFilterFactory.java 
> PRE-CREATION 
>   
> 

[jira] [Created] (HIVE-14783) bucketing column should be part of sorting for delete/update operation when spdo is on

2016-09-16 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-14783:
---

 Summary: bucketing column should be part of sorting for 
delete/update operation when spdo is on
 Key: HIVE-14783
 URL: https://issues.apache.org/jira/browse/HIVE-14783
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Transactions
Affects Versions: 2.2.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan






--


[jira] [Created] (HIVE-14782) Remove mapreduce_stack_trace_hadoop20.q as we no longer have hadoop20

2016-09-16 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-14782:


 Summary: Remove mapreduce_stack_trace_hadoop20.q as we no longer 
have hadoop20
 Key: HIVE-14782
 URL: https://issues.apache.org/jira/browse/HIVE-14782
 Project: Hive
  Issue Type: Sub-task
  Components: Test
Affects Versions: 2.2.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


mapreduce_stack_trace_hadoop20.q runs as an isolated test, which is no longer 
required since we no longer support hadoop 0.20.x.





[jira] [Created] (HIVE-14781) ptest killall command does not work

2016-09-16 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-14781:
-

 Summary: ptest killall command does not work
 Key: HIVE-14781
 URL: https://issues.apache.org/jira/browse/HIVE-14781
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth


-f is not a valid flag for killall.





ptest result structure change (batched unit tests)

2016-09-16 Thread Siddharth Seth
HIVE-14540 changes ptest to batch unit tests together (instead of invoking
mvn test on individual test classes).
As a result, the output for multiple tests lands in a common directory -
similar to what happens for batched q tests.

To identify the relevant directory, look up the consoleOutput for the run.

For example, to find TestDummy, look for UTBatch.*TestDummy in consoleOutput.

Example output:
[name=829_UTBatch_itests__qtest_8_tests, id=829, moduleName=itests/qtest,
isParallel=true, testList=[TestContribNegativeCliDriver,
TestHBaseNegativeCliDriver, TestCompareCliDriver,
TestEncryptedHDFSCliDriver, TestPerfCliDriver, TestContribCliDriver,
TestParseNegativeDriver, TestDummy]]

829_UTBatch would be the directory where the output for this test exists.
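The lookup described above can be sketched as a small regex helper over the 
consoleOutput text (illustrative only; ptest itself does not ship this code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BatchDirLookup {
    // Sketch: given ptest consoleOutput text, find the "NNN_UTBatch" prefix of
    // the batch whose testList contains the given test; that prefix identifies
    // the directory where the batch's output lives.
    static String findBatchDir(String consoleOutput, String testName) {
        Pattern p = Pattern.compile("name=(\\d+_UTBatch)\\S*[^\\]]*" + Pattern.quote(testName));
        Matcher m = p.matcher(consoleOutput);
        return m.find() ? m.group(1) : null;
    }
}
```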


[jira] [Created] (HIVE-14780) Determine unit tests to batch together based on previous run info

2016-09-16 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-14780:
-

 Summary: Determine unit tests to batch together based on previous 
run info
 Key: HIVE-14780
 URL: https://issues.apache.org/jira/browse/HIVE-14780
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth


Post HIVE-14540 - batch unit tests together with a time target, to avoid skew.



--


[jira] [Created] (HIVE-14779) make DbTxnManager.HeartbeaterThread a daemon

2016-09-16 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-14779:
-

 Summary: make DbTxnManager.HeartbeaterThread a daemon
 Key: HIVE-14779
 URL: https://issues.apache.org/jira/browse/HIVE-14779
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.1.0, 1.3.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
Priority: Minor


setDaemon(true);

make heartbeaterThreadPoolSize static 
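The proposed change amounts to marking the heartbeater thread as a daemon so it 
cannot keep the JVM alive on its own. A minimal illustration of the idea (not 
the actual DbTxnManager code):

```java
public class DaemonHeartbeaterSketch {
    // A daemon thread does not prevent JVM shutdown; a non-daemon thread does.
    static Thread startHeartbeater(Runnable heartbeat) {
        Thread t = new Thread(heartbeat, "Heartbeater");
        t.setDaemon(true); // must be set before start(); lets the JVM exit
        t.start();
        return t;
    }
}
```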





[jira] [Created] (HIVE-14778) document threading model of Streaming API

2016-09-16 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-14778:
-

 Summary: document threading model of Streaming API
 Key: HIVE-14778
 URL: https://issues.apache.org/jira/browse/HIVE-14778
 Project: Hive
  Issue Type: Bug
  Components: HCatalog, Transactions
Affects Versions: 0.14.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


The model is not obvious and needs to be documented properly.

A StreamingConnection internally maintains 2 MetaStoreClient objects (each with 
1 Thrift client for the actual RPC). Let's call them "primary" and "heartbeat". 
Each TransactionBatch created from a given StreamingConnection gets a 
reference to both of these MetaStoreClients.
So the model is: there is at most 1 outstanding (not closed) TransactionBatch 
for any given StreamingConnection, and for any given TransactionBatch there can 
be at most 2 threads accessing it concurrently: 1 thread calling 
TransactionBatch.heartbeat() (and nothing else), and the other calling all 
other methods.
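The model above can be illustrated with a generic two-thread sketch. The Batch 
interface and method names below are illustrative stand-ins, not the actual 
Streaming API:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TwoThreadBatchSketch {
    // Stand-in for a TransactionBatch-like object.
    interface Batch { void heartbeat(); void write(byte[] record); void close(); }

    // Sketch of the allowed concurrency: one worker thread drives the batch,
    // one scheduled thread calls only heartbeat(), and nothing else touches it.
    static void run(Batch batch, byte[][] records) {
        ScheduledExecutorService heartbeater = Executors.newSingleThreadScheduledExecutor();
        heartbeater.scheduleAtFixedRate(batch::heartbeat, 1, 1, TimeUnit.SECONDS);
        try {
            for (byte[] r : records) {
                batch.write(r); // all non-heartbeat calls stay on this one thread
            }
        } finally {
            heartbeater.shutdownNow();
            batch.close();
        }
    }
}
```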





[jira] [Created] (HIVE-14777) Add support of Spark-2.0.0 in Hive-2.X.X

2016-09-16 Thread Oleksiy Sayankin (JIRA)
Oleksiy Sayankin created HIVE-14777:
---

 Summary: Add support of Spark-2.0.0 in Hive-2.X.X
 Key: HIVE-14777
 URL: https://issues.apache.org/jira/browse/HIVE-14777
 Project: Hive
  Issue Type: Wish
Reporter: Oleksiy Sayankin








[jira] [Created] (HIVE-14776) Skip 'distcp' call when copying data from HDFS to S3

2016-09-16 Thread JIRA
Sergio Peña created HIVE-14776:
--

 Summary: Skip 'distcp' call when copying data from HDFS to S3
 Key: HIVE-14776
 URL: https://issues.apache.org/jira/browse/HIVE-14776
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Reporter: Sergio Peña
Assignee: Sergio Peña


Hive uses 'distcp' to copy files in parallel between HDFS encryption zones when 
the file to copy is larger than the {{hive.exec.copyfile.maxsize}} threshold. 
This 'distcp' is also executed when copying to S3, but there it results in 
slower copies.

We should not invoke distcp when copying to blobstore systems.
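A sketch of the proposed check follows. The scheme list and the helper name are 
assumptions for illustration, not the actual patch:

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

public class CopyStrategy {
    // Hypothetical blobstore scheme list; the real set would come from config.
    private static final List<String> BLOBSTORE_SCHEMES = Arrays.asList("s3a", "s3n", "s3");

    // Sketch: use distcp only for large copies between HDFS filesystems,
    // and fall back to a plain copy when the destination is a blobstore.
    static boolean shouldUseDistCp(URI dest, long fileSize, long maxSizeThreshold) {
        if (BLOBSTORE_SCHEMES.contains(dest.getScheme())) {
            return false; // distcp to blobstores was observed to be slower
        }
        return fileSize > maxSizeThreshold;
    }
}
```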





[jira] [Created] (HIVE-14775) Investigate IOException usage in Metrics APIs

2016-09-16 Thread Barna Zsombor Klara (JIRA)
Barna Zsombor Klara created HIVE-14775:
--

 Summary: Investigate IOException usage in Metrics APIs
 Key: HIVE-14775
 URL: https://issues.apache.org/jira/browse/HIVE-14775
 Project: Hive
  Issue Type: Sub-task
Reporter: Barna Zsombor Klara
Assignee: Barna Zsombor Klara


A large number of metrics APIs seem to be declared to throw IOException 
needlessly (incrementCounter, decrementCounter, etc.).
This is not only misleading, but it also fills the code with unnecessary catch 
blocks that can never be reached.

We should investigate whether these exceptions are thrown at all, and remove 
them if they are truly unused.





[jira] [Created] (HIVE-14774) Canceling query using Ctrl-C in beeline might lead to stale locks

2016-09-16 Thread Chaoyu Tang (JIRA)
Chaoyu Tang created HIVE-14774:
--

 Summary: Canceling query using Ctrl-C in beeline might lead to 
stale locks
 Key: HIVE-14774
 URL: https://issues.apache.org/jira/browse/HIVE-14774
 Project: Hive
  Issue Type: Bug
  Components: Locking
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang


Terminating a running query with Ctrl-C in Beeline might lead to stale locks, 
since the process running the query might still acquire the locks but then 
fail to release them after the query terminates abnormally.





[jira] [Created] (HIVE-14773) NPE aggregating column statistics for date column in partitioned table

2016-09-16 Thread Nita Dembla (JIRA)
Nita Dembla created HIVE-14773:
--

 Summary: NPE aggregating column statistics for date column in 
partitioned table
 Key: HIVE-14773
 URL: https://issues.apache.org/jira/browse/HIVE-14773
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.2.0
Reporter: Nita Dembla








[jira] [Created] (HIVE-14772) NPE when MSCK REPAIR

2016-09-16 Thread Per Ullberg (JIRA)
Per Ullberg created HIVE-14772:
--

 Summary: NPE when MSCK REPAIR
 Key: HIVE-14772
 URL: https://issues.apache.org/jira/browse/HIVE-14772
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.1.0
 Environment: HiveRunner on OSX Yosemite
Reporter: Per Ullberg


HiveMetaStoreChecker throws NullPointerException when doing a MSCK REPAIR TABLE.

The bug is here:

{code}
...
18   package org.apache.hadoop.hive.ql.metadata;
...
58   public class HiveMetaStoreChecker {
...
408      if (!directoryFound) {
409        allDirs.put(path, null);
410      }
...
{code}

allDirs is a ConcurrentHashMap, which does not allow either keys or values to 
be null.
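The null-value restriction is easy to demonstrate with a minimal example (not 
Hive code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentMapNullDemo {
    // ConcurrentHashMap rejects null keys and null values with a
    // NullPointerException, unlike HashMap, which permits both.
    static boolean acceptsNullValue(Map<String, String> map) {
        try {
            map.put("path", null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }
}
```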

I found the bug while trying to port https://github.com/klarna/HiveRunner to 
Hive 2.1.0.

An explicit test case that exposes the bug is implemented here: 
https://github.com/klarna/HiveRunner/blob/hive-2.1.0-NPE-at-msck-repair/src/test/java/com/klarna/hiverunner/MSCKRepairNPE.java

Reproduce by cloning the branch 
https://github.com/klarna/HiveRunner/tree/hive-2.1.0-NPE-at-msck-repair
and running
{code}mvn -Dtest=MSCKRepairNPE clean test{code}

(Does not work on Windows :( )

Looks like this email thread talks about the same issue: 
http://user.hive.apache.narkive.com/ETOpbKk5/msck-repair-table-and-hive-v2-1-0



