[jira] [Created] (HIVE-27202) Disable flaky test TestJdbcWithMiniLlapRow#testComplexQuery

2023-03-31 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-27202:
--

 Summary: Disable flaky test 
TestJdbcWithMiniLlapRow#testComplexQuery
 Key: HIVE-27202
 URL: https://issues.apache.org/jira/browse/HIVE-27202
 Project: Hive
  Issue Type: Test
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


TestJdbcWithMiniLlapRow#testComplexQuery is flaky and should be disabled.

 

http://ci.hive.apache.org/job/hive-flaky-check/634/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27175) Fix TestJdbcDriver2#testSelectExecAsync2

2023-03-25 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-27175:
--

 Summary: Fix TestJdbcDriver2#testSelectExecAsync2
 Key: HIVE-27175
 URL: https://issues.apache.org/jira/browse/HIVE-27175
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


TestJdbcDriver2#testSelectExecAsync2 is failing on branch-3. We need to 
backport HIVE-20897 to fix it.





[jira] [Created] (HIVE-27171) Backport HIVE-20680 to branch-3

2023-03-24 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-27171:
--

 Summary: Backport HIVE-20680 to branch-3
 Key: HIVE-27171
 URL: https://issues.apache.org/jira/browse/HIVE-27171
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


We need to backport HIVE-26836 to fix TestReplicationScenariosAcrossInstances on branch-3.





[jira] [Created] (HIVE-27154) Fix testBootstrapReplLoadRetryAfterFailureForPartitions

2023-03-19 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-27154:
--

 Summary: Fix testBootstrapReplLoadRetryAfterFailureForPartitions
 Key: HIVE-27154
 URL: https://issues.apache.org/jira/browse/HIVE-27154
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


`testBootstrapReplLoadRetryAfterFailureForPartitions` has been failing on branch-3.

 

http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4067/12/tests





[jira] [Created] (HIVE-27148) Disable TestJdbcGenericUDTFGetSplits

2023-03-16 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-27148:
--

 Summary: Disable TestJdbcGenericUDTFGetSplits
 Key: HIVE-27148
 URL: https://issues.apache.org/jira/browse/HIVE-27148
 Project: Hive
  Issue Type: Sub-task
  Components: Tests
Reporter: Vihang Karajgaonkar


TestJdbcGenericUDTFGetSplits is flaky and intermittently fails.

http://ci.hive.apache.org/job/hive-flaky-check/614/





[jira] [Created] (HIVE-27146) Re-enable orc_merge*.q tests for TestMiniSparkOnYarnCliDriver

2023-03-16 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-27146:
--

 Summary: Re-enable orc_merge*.q tests for 
TestMiniSparkOnYarnCliDriver
 Key: HIVE-27146
 URL: https://issues.apache.org/jira/browse/HIVE-27146
 Project: Hive
  Issue Type: Test
Reporter: Vihang Karajgaonkar


It was found that the q.out files for these tests fail with a diff in the replication factor of the files. The tests only fail on the CI job, so it is possible that this is due to some test environment issue. The tests also fail on the 3.1.3 release.

E.g., orc_merge4.q fails with the error below; the other tests fail with the same difference in replication factor.
{code:java}
40c40
< -rw-r--r--   1 ### USER ### ### GROUP ###   2530 ### HDFS DATE ### hdfs://### HDFS PATH ###
---
> -rw-r--r--   3 ### USER ### ### GROUP ###   2530 ### HDFS DATE ### hdfs://### HDFS PATH ###
66c66
< -rw-r--r--   1 ### USER ### ### GROUP ###   2530 ### HDFS DATE ### hdfs://### HDFS PATH ###
---
> -rw-r--r--   3 ### USER ### ### GROUP ###   2530 ### HDFS DATE ### hdfs://### HDFS PATH ###
68c68
< -rw-r--r--   1 ### USER ### ### GROUP ###   2530 ### HDFS DATE ### hdfs://### HDFS PATH ###
---
> -rw-r--r--   3 ### USER ### ### GROUP ###   2530 ### HDFS DATE ### hdfs://### HDFS PATH ###
{code}





[jira] [Created] (HIVE-27092) Reenable flaky test TestRpc

2023-02-17 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-27092:
--

 Summary: Reenable flaky test TestRpc
 Key: HIVE-27092
 URL: https://issues.apache.org/jira/browse/HIVE-27092
 Project: Hive
  Issue Type: Test
Reporter: Vihang Karajgaonkar








[jira] [Created] (HIVE-27087) Fix TestMiniSparkOnYarnCliDriver test failures on branch-3

2023-02-16 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-27087:
--

 Summary: Fix TestMiniSparkOnYarnCliDriver test failures on branch-3
 Key: HIVE-27087
 URL: https://issues.apache.org/jira/browse/HIVE-27087
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


TestMiniSparkOnYarnCliDriver tests are failing with the error below:

{noformat}
[ERROR] 2023-02-16 14:13:08.991 [Driver] SparkContext - Error initializing SparkContext.
java.lang.RuntimeException: java.lang.NoSuchFieldException: DEFAULT_TINY_CACHE_SIZE
	at org.apache.spark.network.util.NettyUtils.getPrivateStaticField(NettyUtils.java:131) ~[spark-network-common_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.network.util.NettyUtils.createPooledByteBufAllocator(NettyUtils.java:118) ~[spark-network-common_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.network.server.TransportServer.init(TransportServer.java:94) ~[spark-network-common_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.network.server.TransportServer.<init>(TransportServer.java:73) ~[spark-network-common_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.network.TransportContext.createServer(TransportContext.java:114) ~[spark-network-common_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.rpc.netty.NettyRpcEnv.startServer(NettyRpcEnv.scala:119) ~[spark-core_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.rpc.netty.NettyRpcEnvFactory$$anonfun$4.apply(NettyRpcEnv.scala:465) ~[spark-core_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.rpc.netty.NettyRpcEnvFactory$$anonfun$4.apply(NettyRpcEnv.scala:464) ~[spark-core_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2271) ~[spark-core_2.11-2.3.0.jar:2.3.0]
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) ~[scala-library-2.11.8.jar:?]
	at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2263) ~[spark-core_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.rpc.netty.NettyRpcEnvFactory.create(NettyRpcEnv.scala:469) ~[spark-core_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:57) ~[spark-core_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:249) ~[spark-core_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175) ~[spark-core_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:256) [spark-core_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:423) [spark-core_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) [spark-core_2.11-2.3.0.jar:2.3.0]
	at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:161) [hive-exec-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
	at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:536) [hive-exec-3.2.0-SNAPSHOT.jar:3.2.0-SNAPSHOT]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_322]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_322]
{noformat}


The root cause of the problem is that we upgraded the netty library from 4.1.17.Final to 4.1.69.Final. The upgraded library does not have the `DEFAULT_TINY_CACHE_SIZE` field ([here|https://github.com/netty/netty/blob/netty-4.1.51.Final/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java#L46]), which was removed in 4.1.52.Final.
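
The failure mode is easy to reproduce in isolation. Below is a minimal sketch (the class and helper names are illustrative, not Spark's actual code) of a reflective private-static-field lookup with a fallback for a field that a newer library version no longer has:

```java
import java.lang.reflect.Field;

public class PrivateStaticFieldDemo {
    static final int TINY_CACHE_SIZE_DEFAULT = 512; // hypothetical fallback value

    // Read a static int field reflectively; fall back when the field was removed
    // in a newer version of the library (the NoSuchFieldException seen above).
    static int getPrivateStaticInt(Class<?> cls, String name, int fallback) {
        try {
            Field f = cls.getDeclaredField(name);
            f.setAccessible(true);
            return f.getInt(null);
        } catch (NoSuchFieldException | IllegalAccessException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Integer has no field named DEFAULT_TINY_CACHE_SIZE, so the fallback is
        // returned, mirroring what happens against netty >= 4.1.52.Final.
        int v = getPrivateStaticInt(Integer.class, "DEFAULT_TINY_CACHE_SIZE", TINY_CACHE_SIZE_DEFAULT);
        System.out.println(v); // prints 512
    }
}
```

Spark's `NettyUtils.getPrivateStaticField` throws instead of falling back, which is why the whole `SparkContext` initialization fails.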





[jira] [Created] (HIVE-27062) Disable flaky test TestRpc#testClientTimeout

2023-02-09 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-27062:
--

 Summary: Disable flaky test TestRpc#testClientTimeout
 Key: HIVE-27062
 URL: https://issues.apache.org/jira/browse/HIVE-27062
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


TestRpc#testClientTimeout is flaky in branch-3. I don't see this test in the 
main branch, so I think we should disable this test to make sure branch-3 is 
green.

Failing run: 
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-3989/6/tests/

Fails with the stack trace:

{noformat}
java.lang.AssertionError: Client should have failed to connect to server.
	at org.junit.Assert.fail(Assert.java:88)
	at org.apache.hive.spark.client.rpc.TestRpc.testClientTimeout(TestRpc.java:308)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
{noformat}

Passing run (on same commit):
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-3989/5/tests/

In my opinion the test is not deterministic because, as far as I understand, it makes timing assumptions.





[jira] [Created] (HIVE-27009) Support pluggable user token provider HiveMetastoreClient

2023-01-31 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-27009:
--

 Summary: Support pluggable user token provider HiveMetastoreClient
 Key: HIVE-27009
 URL: https://issues.apache.org/jira/browse/HIVE-27009
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar


In HTTP mode, the HiveMetastoreClient can add a token based on the environment variable HMS_JWT. However, this approach is not very flexible because, once set, environment variables cannot be changed without restarting the JVM. It would be good to have a pluggable interface, called HiveMetastoreUserTokenProvider, which can provide a token specific to the actual user session being used to connect to the HiveMetastore. If a user token provider is not available, we can fall back to the environment variable HMS_JWT to preserve backwards compatibility.
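
A hypothetical sketch of the proposed interface and fallback behavior (the names are illustrative and not part of any existing Hive API):

```java
public class TokenProviderDemo {
    // Proposed pluggable interface: resolves a token for the current user session.
    interface HiveMetastoreUserTokenProvider {
        String getToken(String user);
    }

    // Prefer the per-session provider; fall back to the JVM-wide HMS_JWT
    // environment variable to preserve backwards compatibility.
    static String resolveToken(HiveMetastoreUserTokenProvider provider, String user) {
        if (provider != null) {
            String t = provider.getToken(user);
            if (t != null) {
                return t;
            }
        }
        return System.getenv("HMS_JWT");
    }

    public static void main(String[] args) {
        HiveMetastoreUserTokenProvider perSession = user -> "jwt-for-" + user;
        System.out.println(resolveToken(perSession, "alice")); // prints jwt-for-alice
        System.out.println(resolveToken(null, "alice"));       // HMS_JWT value, or null if unset
    }
}
```

The key property is that a provider can issue a fresh token per connection, whereas the environment variable is fixed for the lifetime of the JVM.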





[jira] [Created] (HIVE-27001) Backport HIVE-26633 to branch-3

2023-01-30 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-27001:
--

 Summary: Backport HIVE-26633 to branch-3
 Key: HIVE-27001
 URL: https://issues.apache.org/jira/browse/HIVE-27001
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore
Reporter: Vihang Karajgaonkar


HIVE-26633 fixes the maximum response size in metastore http mode. We should 
backport this to branch-3.





[jira] [Created] (HIVE-26949) Backport HIVE-26071 to branch-3

2023-01-16 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-26949:
--

 Summary: Backport HIVE-26071 to branch-3
 Key: HIVE-26949
 URL: https://issues.apache.org/jira/browse/HIVE-26949
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


Creating this ticket to backport HIVE-26071 to branch-3.





[jira] [Created] (HIVE-26948) Backport HIVE-21456 to branch-3

2023-01-16 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-26948:
--

 Summary: Backport HIVE-21456 to branch-3
 Key: HIVE-26948
 URL: https://issues.apache.org/jira/browse/HIVE-26948
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Standalone Metastore
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


HIVE-21456 adds support for connecting to the Hive metastore over http transport. This is a very useful feature, especially in cloud-based environments. Creating this ticket to backport it to branch-3.





[jira] [Created] (HIVE-25796) Allow metastore clients to fetch remaining events if some of the events are cleaned up

2021-12-10 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-25796:
--

 Summary: Allow metastore clients to fetch remaining events if some 
of the events are cleaned up
 Key: HIVE-25796
 URL: https://issues.apache.org/jira/browse/HIVE-25796
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


This is the code snippet from HiveMetastoreClient.java's getNextNotification method:

{noformat}
  for (NotificationEvent e : rsp.getEvents()) {
    LOG.debug("Got event with id : {}", e.getEventId());
    if (e.getEventId() != nextEventId) {
      if (e.getEventId() == prevEventId) {
        LOG.error("NOTIFICATION_LOG table has multiple events with the same event Id {}. " +
            "Something went wrong when inserting notification events. Bootstrap the system " +
            "again to get back the consistent replicated state.", prevEventId);
        throw new IllegalStateException(REPL_EVENTS_WITH_DUPLICATE_ID_IN_METASTORE);
      } else {
        LOG.error("Requested events are found missing in NOTIFICATION_LOG table. Expected: {}, Actual: {}. "
            + "Probably, cleaner would've cleaned it up. "
            + "Try setting higher value for hive.metastore.event.db.listener.timetolive. "
            + "Also, bootstrap the system again to get back the consistent replicated state.",
            nextEventId, e.getEventId());
        throw new IllegalStateException(REPL_EVENTS_MISSING_IN_METASTORE);
      }
    }
{noformat}

Consider the case of a client which caches an event id and tries to fetch the subsequent events after a long time. In this case, it is possible that the Metastore has cleaned up the events because they were more than 24 hours old, and this API throws an exception. A client may not care that the events are not in sequence, so this exception should be thrown optionally, depending on what the client wants.
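
A sketch of the proposed lenient behavior (the method and flag names are illustrative, not the actual Hive patch): in strict mode a gap still throws, while a lenient client simply receives the remaining events that survived cleanup:

```java
import java.util.ArrayList;
import java.util.List;

public class EventGapDemo {
    // Walk the event ids returned by the server; in strict mode any gap relative
    // to the requested next id is an error, in lenient mode the remaining events
    // are returned as-is.
    static List<Long> filterEvents(List<Long> eventIds, long requestedNext, boolean strict) {
        List<Long> result = new ArrayList<>();
        long expected = requestedNext;
        for (long id : eventIds) {
            if (id != expected && strict) {
                throw new IllegalStateException("Requested events missing in NOTIFICATION_LOG");
            }
            result.add(id);
            expected = id + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        // Events 100-101 were cleaned up; the log now starts at 102.
        List<Long> log = List.of(102L, 103L);
        System.out.println(filterEvents(log, 100L, false)); // prints [102, 103]
    }
}
```

With `strict = true` the same call throws, which is the behavior the snippet above hard-codes today.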





[jira] [Created] (HIVE-25479) Browser SSO auth may fail intermittently on chrome browser in virtual environments

2021-08-24 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-25479:
--

 Summary: Browser SSO auth may fail intermittently on chrome 
browser in virtual environments
 Key: HIVE-25479
 URL: https://issues.apache.org/jira/browse/HIVE-25479
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


When browser-based SSO is enabled, the Hive JDBC driver might miss the POST request coming from the browser which provides the one-time token issued by HS2 after the SAML flow completes. The issue was observed mostly in virtual environments on Windows.

The issue seems to be that when the driver binds to a port, even though the port is in the LISTEN state, the result is non-deterministic if the browser issues the POST request on the port before the driver starts accepting connections. On native OSes we observed that the connection is buffered and is received by the driver when it begins accepting connections. On VMs it is observed that even though the connection is buffered and presented when the driver starts accepting, the payload of the request or the connection itself is lost. This race condition causes the driver to wait for the browser until it times out, while the browser keeps waiting for a response from the driver.
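
The timing assumption can be illustrated in isolation with plain java.net sockets (a minimal sketch, not the driver's code): a client connects after bind() but before accept(), and on a native OS the kernel's listen backlog buffers the connection until accept() is called.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class BindAcceptRaceDemo {
    public static void main(String[] args) throws IOException {
        // Port is bound and in LISTEN state, but accept() has not been called yet.
        try (ServerSocket server = new ServerSocket(0)) {
            // The "browser" connects and writes before the server ever accepts.
            try (Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
                client.getOutputStream().write("POST /saml HTTP/1.1\r\n".getBytes());
                // The backlog preserved the connection; accept() now returns it.
                try (Socket accepted = server.accept()) {
                    System.out.println(accepted.isConnected()); // true on a native OS
                }
            }
        }
    }
}
```

The bug description suggests that on some virtualized network stacks this buffering guarantee does not hold reliably, which is what makes the handshake race observable only in VMs.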



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25281) Add optional fields to enable returning filemetadata for tables and partitions

2021-06-23 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-25281:
--

 Summary: Add optional fields to enable returning filemetadata for 
tables and partitions
 Key: HIVE-25281
 URL: https://issues.apache.org/jira/browse/HIVE-25281
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


The hive_metastore.thrift interface defines the fields for the Table and Partition objects. Certain SQL engines, like Impala, use Table and Partition from the HMS and then augment them to include additional metadata useful for the engine itself, e.g. file metadata. It would be good to add support for such fields in the thrift definition itself. These fields will initially be optional, so that HMS itself doesn't need to support them for now, but they can be supported in the future depending on which SQL engine is talking to HMS.





[jira] [Created] (HIVE-24987) hive.metastore.disallow.incompatible.col.type.changes is too restrictive for some storage formats

2021-04-07 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-24987:
--

 Summary: hive.metastore.disallow.incompatible.col.type.changes is 
too restrictive for some storage formats
 Key: HIVE-24987
 URL: https://issues.apache.org/jira/browse/HIVE-24987
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar


Currently when {{hive.metastore.disallow.incompatible.col.type.changes}} is set to true, it disallows any schema changes which are deemed backwards incompatible, e.g. dropping a column of a table. While this may be the correct thing to do for Parquet or ORC tables, it is too restrictive for storage formats like Kudu.

Currently, for Kudu tables, Impala supports dropping a column. But if we set this config to true, the metastore disallows changing the schema of the metastore table. I am assuming this would be problematic for Iceberg tables too, which support such schema changes.

The proposal is to have a new configuration which provides an exclusion list of table file formats for which this check will be skipped. Initially, we will only include Kudu tables in the exclusion list.
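
A sketch of what the proposed check could look like (the method name and the idea of matching on the serde library are illustrative assumptions; the configuration key does not exist yet):

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class ColTypeCheckDemo {
    // exclusionListConf would come from a new (hypothetical) metastore config,
    // e.g. a comma-separated list of format keywords such as "kudu".
    static boolean shouldSkipCheck(String serdeLib, String exclusionListConf) {
        Set<String> excluded = Arrays.stream(exclusionListConf.split(","))
            .map(s -> s.trim().toLowerCase(Locale.ROOT))
            .filter(s -> !s.isEmpty())
            .collect(Collectors.toSet());
        String lib = serdeLib.toLowerCase(Locale.ROOT);
        // Skip the incompatible-column-change validation for excluded formats.
        return excluded.stream().anyMatch(lib::contains);
    }

    public static void main(String[] args) {
        System.out.println(shouldSkipCheck("org.apache.hadoop.hive.kudu.KuduSerDe", "kudu"));    // prints true
        System.out.println(shouldSkipCheck("org.apache.hadoop.hive.ql.io.orc.OrcSerde", "kudu")); // prints false
    }
}
```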





[jira] [Created] (HIVE-24899) create database event does not include managedLocation URI

2021-03-17 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-24899:
--

 Summary: create database event does not include managedLocation URI
 Key: HIVE-24899
 URL: https://issues.apache.org/jira/browse/HIVE-24899
 Project: Hive
  Issue Type: Bug
Reporter: Vihang Karajgaonkar


I noticed that when a database is created, the Metastore-generated notification event for the database doesn't have the managed location set. If I do a getDatabase call later, the metastore returns the managedLocationUri. This seems like an inconsistency, and it would be good if the generated event included the managedLocationUri as well.





[jira] [Created] (HIVE-24741) get_partitions_ps_with_auth performance can be improved when it is requesting all the partitions

2021-02-04 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-24741:
--

 Summary: get_partitions_ps_with_auth performance can be improved 
when it is requesting all the partitions
 Key: HIVE-24741
 URL: https://issues.apache.org/jira/browse/HIVE-24741
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


The {{get_partitions_ps_with_auth}} API does not support DirectSQL. I have seen some large production use-cases where this API is used heavily (specifically from Spark applications) to request all the partitions of a table. The performance of this API when requesting all the partitions of a table can be significantly improved (~4 times in a real-world large workload use-case) if we forward this API call to a DirectSQL-enabled API.
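
A sketch of the routing condition (illustrative, not the actual Hive change): a partial partition spec in which every value is empty is effectively a request for all partitions, so it can be forwarded to a DirectSQL-backed listing instead of the ORM path.

```java
import java.util.List;

public class PartitionRoutingDemo {
    // True when the partial spec matches every partition, i.e. all values are
    // null/empty, so the call can take the faster DirectSQL path.
    static boolean isRequestForAllPartitions(List<String> partVals) {
        return partVals == null
            || partVals.stream().allMatch(v -> v == null || v.isEmpty());
    }

    public static void main(String[] args) {
        System.out.println(isRequestForAllPartitions(List.of("", "")));     // prints true  -> DirectSQL path
        System.out.println(isRequestForAllPartitions(List.of("2021", ""))); // prints false -> ORM path
    }
}
```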





[jira] [Created] (HIVE-24732) CachedStore does not return the fields which are auto-generated by the database

2021-02-03 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-24732:
--

 Summary: CachedStore does not return the fields which are 
auto-generated by the database
 Key: HIVE-24732
 URL: https://issues.apache.org/jira/browse/HIVE-24732
 Project: Hive
  Issue Type: Bug
Reporter: Vihang Karajgaonkar


It looks like CachedStore directly caches the thrift objects as they are sent by the client. The general pattern seems to be similar to the following:

{noformat}
  @Override public void createTable(Table tbl) throws InvalidObjectException, MetaException {
    rawStore.createTable(tbl);
    // in case of event based cache update, cache will be updated during commit.
    if (canUseEvents) {
      return;
    }
    String catName = normalizeIdentifier(tbl.getCatName());
    String dbName = normalizeIdentifier(tbl.getDbName());
    String tblName = normalizeIdentifier(tbl.getTableName());
    if (!shouldCacheTable(catName, dbName, tblName)) {
      return;
    }
    validateTableType(tbl);
    // TODO in case of CachedStore we cache directly the object sent by the client.
    // this is problematic since certain fields of the object are populated
    // after it is persisted. The cache will not be able to serve those fields correctly.
    sharedCache.addTableToCache(catName, dbName, tblName, tbl);
  }
{noformat}

The problem here is that the table id is generated when the table is persisted in the database, so the CachedStore caches a Table object whose id is 0.
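
The bug pattern, and one possible fix, can be sketched as below (illustrative, not CachedStore's actual API): cache the copy re-read from the backing store after persistence, so database-generated fields like the id are present in the cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class CacheAfterPersistDemo {
    static class Table {
        long id;            // assigned by the "database" on persist
        final String name;
        Table(String n) { name = n; }
    }

    static final AtomicLong ID_GEN = new AtomicLong(1);
    static final Map<String, Table> STORE = new HashMap<>(); // stands in for the RDBMS
    static final Map<String, Table> CACHE = new HashMap<>(); // stands in for SharedCache

    static void createTable(Table clientCopy) {
        Table persisted = new Table(clientCopy.name);
        persisted.id = ID_GEN.getAndIncrement();
        STORE.put(persisted.name, persisted);
        // Bug pattern: CACHE.put(clientCopy.name, clientCopy) would cache id == 0.
        // Instead, cache the persisted copy so generated fields are correct.
        CACHE.put(persisted.name, STORE.get(persisted.name));
    }

    public static void main(String[] args) {
        createTable(new Table("t1"));
        System.out.println(CACHE.get("t1").id); // prints 1, not 0
    }
}
```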





[jira] [Created] (HIVE-24619) Exclude unnecessary dependencies from pac4j

2021-01-11 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-24619:
--

 Summary: Exclude unnecessary dependencies from pac4j
 Key: HIVE-24619
 URL: https://issues.apache.org/jira/browse/HIVE-24619
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


HIVE-24543 introduces pac4j dependency which pulls in multiple other 
dependencies. It would be great to exclude as many dependencies as possible. 
This JIRA is used to track this effort.





[jira] [Created] (HIVE-24594) results_cache_invalidation2.q is flaky

2021-01-06 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-24594:
--

 Summary: results_cache_invalidation2.q is flaky
 Key: HIVE-24594
 URL: https://issues.apache.org/jira/browse/HIVE-24594
 Project: Hive
  Issue Type: Test
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


results_cache_invalidation2.q failed for me a couple of times on an unrelated PR. Here is the error log:

{noformat}
---
Test set: org.apache.hadoop.hive.cli.split19.TestMiniLlapLocalCliDriver
---
Tests run: 90, Failures: 1, Errors: 0, Skipped: 6, Time elapsed: 450.54 s <<< FAILURE! - in org.apache.hadoop.hive.cli.split19.TestMiniLlapLocalCliDriver
org.apache.hadoop.hive.cli.split19.TestMiniLlapLocalCliDriver.testCliDriver[results_cache_invalidation2]  Time elapsed: 15.087 s  <<< FAILURE!
java.lang.AssertionError:
Client Execution succeeded but contained differences (error code = 1) after executing results_cache_invalidation2.q
266a267
>  A masked pattern was here 
271a273
>  A masked pattern was here 
273c275,276
<   Stage-0 is a root stage
---
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
275a279,365
>   Stage: Stage-1
> Tez
>  A masked pattern was here 
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (CUSTOM_SIMPLE_EDGE)
>  A masked pattern was here 
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: tab1
>   filterExpr: key is not null (type: boolean)
>   Statistics: Num rows: 1500 Data size: 130500 Basic stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: key is not null (type: boolean)
> Statistics: Num rows: 1500 Data size: 130500 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
>   expressions: key (type: string)
>   outputColumnNames: _col0
>   Statistics: Num rows: 1500 Data size: 130500 Basic stats: COMPLETE Column stats: COMPLETE
>   Reduce Output Operator
> key expressions: _col0 (type: string)
> null sort order: z
> sort order: +
> Map-reduce partition columns: _col0 (type: string)
> Statistics: Num rows: 1500 Data size: 130500 Basic stats: COMPLETE Column stats: COMPLETE
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Map 4
> Map Operator Tree:
> TableScan
>   alias: tab2
>   filterExpr: key is not null (type: boolean)
>   Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
>   Fil
{noformat}

The test passes for me locally. In fact, the same PR had a successful run of this test on a previous commit. I think we should disable it and re-enable it after fixing the flakiness.





[jira] [Created] (HIVE-24562) Deflake TestHivePrivilegeObjectOwnerNameAndType

2020-12-22 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-24562:
--

 Summary: Deflake TestHivePrivilegeObjectOwnerNameAndType
 Key: HIVE-24562
 URL: https://issues.apache.org/jira/browse/HIVE-24562
 Project: Hive
  Issue Type: Test
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


One of my unrelated PRs fails the test {{TestHivePrivilegeObjectOwnerNameAndType}}. The exception I see in the logs is below:

{noformat}
Caused by: ERROR 42X05: Table/View 'TXN_LOCK_TBL' does not exist.
	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
	at org.apache.derby.impl.sql.compile.LockTableNode.bindStatement(Unknown Source)
	at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source)
	at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
	at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown Source)
	... 73 more
)
	at org.apache.hadoop.hive.metastore.txn.TxnHandler.openTxns(TxnHandler.java:651)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.open_txns(HiveMetaStore.java:8301)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
	at com.sun.proxy.$Proxy46.open_txns(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openTxnsIntr(HiveMetaStoreClient.java:3634)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openTxn(HiveMetaStoreClient.java:3595)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:218)
	at com.sun.proxy.$Proxy47.openTxn(Unknown Source)
	at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.openTxn(DbTxnManager.java:243)
	at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.openTxn(DbTxnManager.java:227)
	at org.apache.hadoop.hive.ql.Compiler.openTransaction(Compiler.java:268)
	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:215)
	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:178)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:150)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:137)
	at org.apache.hadoop.hive.ql.security.authorization.plugin.TestHivePrivilegeObjectOwnerNameAndType.runCmd(TestHivePrivilegeObjectOwnerNameAndType.java:86)
	at org.apache.hadoop.hive.ql.security.authorization.plugin.TestHivePrivilegeObjectOwnerNameAndType.beforeTest(TestHivePrivilegeObjectOwnerNameAndType.java:82)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
	at org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
	at

[jira] [Created] (HIVE-24561) Deflake TestCachedStoreUpdateUsingEvents

2020-12-22 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-24561:
--

 Summary: Deflake TestCachedStoreUpdateUsingEvents
 Key: HIVE-24561
 URL: https://issues.apache.org/jira/browse/HIVE-24561
 Project: Hive
  Issue Type: Test
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


TestCachedStoreUpdateUsingEvents seems to use "file:/tmp" as the table and 
database directory. The cleanUp method will clean all the sub-directories in 
/tmp, which can be error prone.

Also noticed that I see a lot of NPEs from {{SharedCache#getMemorySizeEstimator}} 
because the {{sizeEstimators}} field is null. We should add a null check for 
that field.
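A standalone sketch of the kind of null guard suggested above. The class and method names mirror those mentioned in the report, but the map here is a stand-in, not the actual SharedCache field:

```java
import java.util.Map;

// Sketch of the proposed fix: treat a null sizeEstimators map as
// "no estimator available" instead of dereferencing it and throwing an NPE.
public class SizeEstimatorGuard {
    // Stand-in for SharedCache#sizeEstimators, which may legitimately be null.
    private Map<Class<?>, Integer> sizeEstimators;

    public Integer getMemorySizeEstimator(Class<?> clazz) {
        if (sizeEstimators == null) {  // the null check proposed in the report
            return null;
        }
        return sizeEstimators.get(clazz);
    }

    public static void main(String[] args) {
        SizeEstimatorGuard cache = new SizeEstimatorGuard();
        // No NPE even though the map was never initialized.
        System.out.println(cache.getMemorySizeEstimator(String.class)); // prints null
    }
}
```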



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2020-12-15 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-24543:
--

 Summary: Support SAML 2.0 as an authentication mechanism
 Key: HIVE-24543
 URL: https://issues.apache.org/jira/browse/HIVE-24543
 Project: Hive
  Issue Type: New Feature
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


With cloud-based deployments, SAML 2.0 based authentication support in HS2 
would be greatly useful in the case of federated or external identity providers 
like Okta, PingIdentity or Azure AD.

This authentication mechanism can initially be supported only in http transport 
mode in HiveServer2, since the SAML 2.0 protocol is primarily designed for web 
clients.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23971) Cleanup unreleased method signatures in IMetastoreClient

2020-07-31 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-23971:
--

 Summary: Cleanup unreleased method signatures in IMetastoreClient
 Key: HIVE-23971
 URL: https://issues.apache.org/jira/browse/HIVE-23971
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


There are many methods in IMetastoreClient which are simply wrappers around 
another method. The code has become very intertwined and needs some cleanup. 
For instance, I see the following variations of {{getPartitionsByNames}} in 
{{IMetastoreClient}} 

{noformat}
List<Partition> getPartitionsByNames(String db_name, String tbl_name, 
List<String> part_names, boolean getColStats, String engine)

List<Partition> getPartitionsByNames(String catName, String db_name, String 
tbl_name, List<String> part_names)

List<Partition> getPartitionsByNames(String catName, String db_name, String 
tbl_name, List<String> part_names, boolean getColStats, String engine)
{noformat}

The problem seems to be that every time a new field is added to the request 
object {{GetPartitionsByNamesRequest}}, a new variant is introduced in 
IMetastoreClient. Many of these methods are not released yet and it would be 
good to clean them up by using the request object as the method argument 
instead of individual fields. Once we release, we will not be able to change 
the method signatures since we annotate IMetastoreClient as public API.
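The request-object pattern described above can be sketched as follows. This is illustrative only: the class and interface here are stand-ins, not the actual IMetastoreClient signatures:

```java
import java.util.List;

// Sketch of the request-object pattern: new optional fields go on the request
// class, so the client interface keeps a single stable method instead of
// growing a new overload per field.
public class RequestObjectPattern {
    static class GetPartitionsByNamesRequest {
        String catName, dbName, tblName, engine;
        List<String> partNames;
        boolean getColStats;  // added later without adding a new overload
    }

    interface MetastoreClientSketch {
        List<String> getPartitionsByNames(GetPartitionsByNamesRequest req);
    }

    static int demo() {
        GetPartitionsByNamesRequest req = new GetPartitionsByNamesRequest();
        req.dbName = "default";
        req.tblName = "t";
        req.partNames = List.of("p=1", "p=2");
        // Dummy implementation that just echoes the requested partition names.
        MetastoreClientSketch client = r -> r.partNames;
        return client.getPartitionsByNames(req).size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 2
    }
}
```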



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23785) Database should have a unique id

2020-06-30 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-23785:
--

 Summary: Database should have a unique id
 Key: HIVE-23785
 URL: https://issues.apache.org/jira/browse/HIVE-23785
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


HIVE-20556 introduced an id field to the Table object. This is useful 
information since a table which is dropped and recreated with the same name 
will have a different id. If an HMS client is caching such a table object, the 
id can be used to determine if the table which is present on the client side 
matches the one in the HMS.

We can expand this idea to other HMS objects like Database, Catalog and 
Partition and add a new id field.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23348) Add API timing metric for fire_listener_event API

2020-04-30 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-23348:
--

 Summary: Add API timing metric for fire_listener_event API
 Key: HIVE-23348
 URL: https://issues.apache.org/jira/browse/HIVE-23348
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


Currently metastore does not have any metric to report the time taken to 
execute {{fire_listener_event}} API. It would be useful to add such a metric.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23116) get_partition_with_specs does not use filter spec when projection is empty

2020-03-31 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-23116:
--

 Summary: get_partition_with_specs does not use filter spec when 
projection is empty
 Key: HIVE-23116
 URL: https://issues.apache.org/jira/browse/HIVE-23116
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


The API implementation ignores the filter spec if the projection spec is empty, 
as seen here 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L3903]

{noformat}
if (fieldList == null || fieldList.isEmpty()) {
  // no fields are requested. Fallback to regular getPartitions
  // implementation to return all the fields
  return getPartitionsInternal(table.getCatName(), table.getDbName(),
      table.getTableName(), -1, true, true);
}
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23018) Provide a bulk API to fire multiple listener events

2020-03-12 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-23018:
--

 Summary: Provide a bulk API to fire multiple listener events
 Key: HIVE-23018
 URL: https://issues.apache.org/jira/browse/HIVE-23018
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


Metastore provides an API to fire a listener event (currently only the INSERT 
event is supported). The problem with that API is that it only takes in one 
partition at a time. A typical query may insert data into multiple partitions 
at a time. In such a case, query engines like HS2 or Impala have to issue 
multiple RPCs to the metastore sequentially to fire these events. This can show 
up as a slowdown to the user if the query engine does not return the prompt 
until all the events are fired (as in HS2 and Impala). It would be great to 
have a bulk API which takes in multiple partitions for a table so that the 
metastore can generate all of these events in one RPC.
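The RPC-count argument above can be sketched as follows. The types here are illustrative stand-ins, not the actual metastore thrift API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the bulk-event idea: instead of one RPC per inserted partition,
// the client sends all inserted partitions in a single request.
public class BulkFireEventSketch {
    static class InsertEventData { String partitionName; }

    // Stand-in for a metastore endpoint that counts how many RPCs it served.
    static class FakeMetastore {
        int rpcCount = 0;
        void fireListenerEvents(List<InsertEventData> batch) { rpcCount++; }
    }

    static int rpcsForBulkInsert(int numPartitions) {
        FakeMetastore hms = new FakeMetastore();
        List<InsertEventData> batch = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            InsertEventData e = new InsertEventData();
            e.partitionName = "p=" + i;
            batch.add(e);
        }
        hms.fireListenerEvents(batch); // one RPC regardless of partition count
        return hms.rpcCount;
    }

    public static void main(String[] args) {
        System.out.println(rpcsForBulkInsert(100)); // prints 1
    }
}
```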



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22126) hive-exec packaging should shade guava

2019-08-19 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-22126:
--

 Summary: hive-exec packaging should shade guava
 Key: HIVE-22126
 URL: https://issues.apache.org/jira/browse/HIVE-22126
 Project: Hive
  Issue Type: Bug
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


The ql/pom.xml includes the complete guava library in hive-exec.jar 
(https://github.com/apache/hive/blob/master/ql/pom.xml#L990). This causes 
problems for downstream clients of Hive which have hive-exec.jar in their 
classpath, since they are pinned to the same guava version as that of Hive.

We should shade the guava classes so that other components which depend on 
hive-exec can independently use a different version of guava as needed.
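A relocation rule of the kind described above can be expressed with the maven-shade-plugin. This is a hedged sketch, not the actual ql/pom.xml configuration; the shaded package prefix is illustrative:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <!-- Move guava classes to a hive-private package inside hive-exec.jar,
           so downstream users can bring their own guava version. -->
      <relocation>
        <pattern>com.google.common</pattern>
        <shadedPattern>org.apache.hive.com.google.common</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```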



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (HIVE-22100) Hive generates a add partition event with empty partition list

2019-08-12 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-22100:
--

 Summary: Hive generates a add partition event with empty partition 
list
 Key: HIVE-22100
 URL: https://issues.apache.org/jira/browse/HIVE-22100
 Project: Hive
  Issue Type: Bug
Reporter: Vihang Karajgaonkar


If the user issues an {{alter table  add if not exists partition 
}} and the partition already exists, no partition is added. 
However, the metastore still generates an {{ADD_PARTITION}} event with an empty 
partition list. An {{alter table  drop if exists partition 
}} does not generate the {{DROP_PARTITION}} event when the 
partition does not exist.

This behavior is inconsistent and misleading. The metastore should not generate 
such add_partition events.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (HIVE-21932) IndexOutOfRangeExeption in FileChksumIterator

2019-06-27 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-21932:
--

 Summary: IndexOutOfRangeExeption in FileChksumIterator
 Key: HIVE-21932
 URL: https://issues.apache.org/jira/browse/HIVE-21932
 Project: Hive
  Issue Type: Bug
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


According to the definition of {{InsertEventRequestData}} in 
{{hive_metastore.thrift}}, the {{filesAddedChecksum}} is an optional field. But 
the FileChksumIterator does not handle it correctly when a client fires an 
insert event which does not have file checksums. The issue is that the 
{{InsertEvent}} class initializes the fileChecksums list to an empty ArrayList, 
so the following check will never come into play:

{noformat}
result = ReplChangeManager.encodeFileUri(files.get(i), chksums != null ? 
chksums.get(i) : null,
    subDirs != null ? subDirs.get(i) : null);
{noformat}

The chksums check in the above line should include a {{!chksums.isEmpty()}} 
check as well.
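A standalone sketch of the guard described above. ReplChangeManager and InsertEvent are not used here; the list is a stand-in so the logic can run on its own:

```java
import java.util.Collections;
import java.util.List;

// Sketch of the proposed fix: treat an *empty* checksum list the same as a
// null one, so an index lookup is never attempted on it.
public class ChecksumGuard {
    static String checksumFor(List<String> chksums, int i) {
        // The report suggests adding the !chksums.isEmpty() condition.
        return (chksums != null && !chksums.isEmpty()) ? chksums.get(i) : null;
    }

    public static void main(String[] args) {
        // InsertEvent initializes the list to an empty ArrayList, so a plain
        // null check would still call get(i) and throw IndexOutOfBoundsException.
        System.out.println(checksumFor(Collections.emptyList(), 0)); // prints null
        System.out.println(checksumFor(List.of("abc123"), 0));       // prints abc123
    }
}
```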



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21851) FireEventListenerResponse should include event id when available

2019-06-07 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-21851:
--

 Summary: FireEventListenerResponse should include event id when 
available
 Key: HIVE-21851
 URL: https://issues.apache.org/jira/browse/HIVE-21851
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


The metastore API {{fire_listener_event}} gives clients the ability to fire an 
INSERT event on DML operations. However, the returned response is an empty 
struct. It would be useful to send back the event id in the response so that 
clients can take actions based on the event id.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21617) Flaky test : TestMiniSparkOnYarnCliDriver.testCliDriver[truncate_column_buckets]

2019-04-15 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-21617:
--

 Summary: Flaky test : 
TestMiniSparkOnYarnCliDriver.testCliDriver[truncate_column_buckets]
 Key: HIVE-21617
 URL: https://issues.apache.org/jira/browse/HIVE-21617
 Project: Hive
  Issue Type: Test
Reporter: Vihang Karajgaonkar


We should disable this test. Was seen in 
https://builds.apache.org/job/PreCommit-HIVE-Build/16961/testReport/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21596) HiveMetastoreClient should be able to connect to older metastore servers

2019-04-09 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-21596:
--

 Summary: HiveMetastoreClient should be able to connect to older 
metastore servers
 Key: HIVE-21596
 URL: https://issues.apache.org/jira/browse/HIVE-21596
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


{{HiveMetastoreClient}} currently depends on the fact that both the client and 
server versions are the same. Additionally, since the server APIs are backwards 
compatible, it is possible for a older client (eg. 2.1.0 client version) to 
connect to a newer server (eg. 3.1.0 server version) without any issues. This 
is useful in setups where HMS is deployed in a remote mode and clients connect 
to it remotely.

It would be a good improvement if a newer version of {{HiveMetastoreClient}} 
could connect to an older server version. When a newer client is talking to an 
older server, the following things can happen:

1. The client invokes an RPC which doesn't exist on the older server.
In such a case, thrift will throw an {{Invalid method name}} exception, which 
should be handled automatically by clients since each API throws TException.

2. The client invokes an RPC using thrift objects which have new fields added.
When a new field is added to a thrift object, the server does not deserialize 
that field in the first place since it does not know about that field id. So 
wire-compatibility exists already. However, the client-side application should 
understand the implications of such behavior. In such cases, it would be better 
for the client to throw an exception by checking the server version, which was 
added in HIVE-21484.

3. If the newer client has re-implemented a certain API, for example using a 
newer thrift API, the client will start seeing the {{Invalid method name}} 
exception.
This can be handled on the client side by making the newer implementation 
conditional on the server version: the client should check the server version 
and invoke the new implementation only if the server version supports the newer 
API. (On a side note, it would be great if the metastore also gave information 
about which APIs are supported for a given version.)




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21595) HIVE-20556 breaks backwards compatibility

2019-04-09 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-21595:
--

 Summary: HIVE-20556 breaks backwards compatibility
 Key: HIVE-21595
 URL: https://issues.apache.org/jira/browse/HIVE-21595
 Project: Hive
  Issue Type: Bug
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


HIVE-20556 exposes a new field in the Table definition. However, it changes the 
order of the field ids, which breaks backwards wire-compatibility. Any older 
client which connects to HMS will not be able to deserialize table objects 
correctly since the field ids are different on the client and server side.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21586) Thrift generated cpp files for metastore do not compile

2019-04-05 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-21586:
--

 Summary: Thrift generated cpp files for metastore do not compile
 Key: HIVE-21586
 URL: https://issues.apache.org/jira/browse/HIVE-21586
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.1.0
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


Some structs like CreationMetadata, CompactionInfo and ColumnStatistics are 
used in hive_metastore.thrift before they are defined. While this works for the 
generated Java code, it does not work for the generated cpp code since Thrift 
does not use pointers/references to the forward-declared classes.

The easy fix would be to reorder the struct definitions in 
hive_metastore.thrift so that they are always defined before they are used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21535) Re-enable TestCliDriver#vector_groupby_reduce

2019-03-28 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-21535:
--

 Summary: Re-enable TestCliDriver#vector_groupby_reduce
 Key: HIVE-21535
 URL: https://issues.apache.org/jira/browse/HIVE-21535
 Project: Hive
  Issue Type: Test
  Components: Tests
Reporter: Vihang Karajgaonkar


The test was disabled since it was flaky in HIVE-21396. Creating this JIRA to 
re-enable the test by fixing the rounding logic.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21534) Flaky test : TestActivePassiveHA

2019-03-28 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-21534:
--

 Summary: Flaky test : TestActivePassiveHA
 Key: HIVE-21534
 URL: https://issues.apache.org/jira/browse/HIVE-21534
 Project: Hive
  Issue Type: Test
Reporter: Vihang Karajgaonkar


Failed in 
https://issues.apache.org/jira/browse/HIVE-21484?focusedCommentId=16798031&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16798031

The test works locally, and it also passed in a subsequent precommit run of the 
patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21484) Metastore API getVersion() should return real version

2019-03-20 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-21484:
--

 Summary: Metastore API getVersion() should return real version
 Key: HIVE-21484
 URL: https://issues.apache.org/jira/browse/HIVE-21484
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


Currently I see that the {{getVersion}} implementation in the metastore returns 
a hard-coded "3.0". It would be good to return the real version of the 
metastore server using {{HiveVersionInfo}} so that clients can take certain 
actions based on the metastore server version.

Possible use-cases are:
1. A client can make use of new features introduced in a given metastore 
version, else stick to the base functionality.
2. This version number can be used to do a version handshake between client and 
server in the future to improve our cross-version compatibility story.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21203) Add builder classes for the API and add a metastore client side implementation

2019-02-01 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-21203:
--

 Summary: Add builder classes for the API and add a metastore 
client side implementation
 Key: HIVE-21203
 URL: https://issues.apache.org/jira/browse/HIVE-21203
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


Adding builder classes for clients to use this API would make it more 
user-friendly. Also, we should add a client-side API which uses this newly 
added API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21180) Fix branch-3 metastore test timeouts

2019-01-29 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-21180:
--

 Summary: Fix branch-3 metastore test timeouts
 Key: HIVE-21180
 URL: https://issues.apache.org/jira/browse/HIVE-21180
 Project: Hive
  Issue Type: Test
Affects Versions: 3.2.0
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


The module name below is wrong since metastore-server doesn't exist on 
branch-3. This is most likely the reason why test batches are timing out on 
branch-3

{noformat}
2019-01-29 00:32:17,765  INFO [HostExecutor 3] 
HostExecutor.executeTestBatch:262 Drone [user=hiveptest, host=104.198.216.224, 
instance=0] executing UnitTestBatch 
[name=228_UTBatch_standalone-metastore__metastore-server_20_tests, id=228, 
moduleName=standalone-metastore/metastore-server, batchSize=20, 
isParallel=true, testList=[TestPartitionManagement, 
TestCatalogNonDefaultClient, TestCatalogOldClient, TestHiveAlterHandler, 
TestTxnHandlerNegative, TestTxnUtils, TestFilterHooks, TestRawStoreProxy, 
TestLockRequestBuilder, TestHiveMetastoreCli, TestCheckConstraint, 
TestAddPartitions, TestListPartitions, TestFunctions, TestGetTableMeta, 
TestTablesCreateDropAlterTruncate, TestRuntimeStats, TestDropPartitions, 
TestTablesList, TestUniqueConstraint]] with bash 
/home/hiveptest/104.198.216.224-hiveptest-0/scratch/hiveptest-228_UTBatch_standalone-metastore__metastore-server_20_tests.sh
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21168) Fix TestSchemaToolCatalogOps

2019-01-25 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-21168:
--

 Summary: Fix TestSchemaToolCatalogOps
 Key: HIVE-21168
 URL: https://issues.apache.org/jira/browse/HIVE-21168
 Project: Hive
  Issue Type: Test
Affects Versions: 3.2.0
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


HIVE-21077 causes TestSchemaToolCatalogOps to fail on branch-3



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21155) create_time datatype for database and catalog should be int in sql server schema scripts

2019-01-23 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-21155:
--

 Summary: create_time datatype for database and catalog should be 
int in sql server schema scripts
 Key: HIVE-21155
 URL: https://issues.apache.org/jira/browse/HIVE-21155
 Project: Hive
  Issue Type: Bug
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


HIVE-21077 added a create_time field to databases and catalogs. However, the 
data type of this field was set to bigint instead of int, unlike the other 
create_time fields for tbls and partitions. We should change it from bigint to 
int to be consistent with the other create_time fields.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21131) Document some of the static util methods in MetastoreUtils

2019-01-17 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-21131:
--

 Summary: Document some of the static util methods in MetastoreUtils
 Key: HIVE-21131
 URL: https://issues.apache.org/jira/browse/HIVE-21131
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


{{MetastoreUtils}} has some methods like {{makePartNameMatcher}} which could 
use some javadoc 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21128) hive.version.shortname should be 3.2 on branch-3

2019-01-16 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-21128:
--

 Summary: hive.version.shortname should be 3.2 on branch-3
 Key: HIVE-21128
 URL: https://issues.apache.org/jira/browse/HIVE-21128
 Project: Hive
  Issue Type: Bug
Reporter: Vihang Karajgaonkar


Since 3.1.0 is already released, the {{hive.version.shortname}} property in the 
pom.xml of standalone-metastore should be 3.2.0. This version shortname is used 
to generate the metastore schema version and is used by Schematool to 
initialize the schema using the correct script. Currently it uses the 3.1.0 
schema init script instead of the 3.2.0 init script.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21115) Add support for object versions in metastore

2019-01-10 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-21115:
--

 Summary: Add support for object versions in metastore
 Key: HIVE-21115
 URL: https://issues.apache.org/jira/browse/HIVE-21115
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar


Currently, metastore objects are identified uniquely by their names (eg. 
catName, dbName and tblName for a table is unique). Once a table or partition 
is created it could be altered in many ways. There is no good way currently to 
identify the version of the object once it is altered. For example, suppose 
there are two clients (Hive and Impala) using the same metastore. Once some 
alter operations are performed by a client, another client which wants to do a 
alter operation has no good way to know if the object which it has is the same 
as the one stored in metastore. Metastore updates the {{transient_lastDdlTime}} 
every time there is a DDL operation on the object. However, this value cannot 
be relied for all the clients since after HIVE-1768 metastore updates the value 
only when it is not set in the parameters. It is possible that a client which 
alters the object state, does not remove the {{transient_lastDdlTime}} and 
metastore will not update it. Secondly, if there is a clock skew between 
multiple HMS instances when HMS-HA is configured, time values cannot be relied 
on to find out the sequence of alter operations on a given object.

This JIRA proposes to use the JDO versioning support in Datanucleus 
(http://www.datanucleus.org/products/accessplatform_4_2/jdo/versioning.html) to 
generate an incrementing sequence number every time an object is altered. This 
value can be set as one of the values in the parameters. The advantage of using 
Datanucleus is that the versioning can be done across HMS instances as part of 
the database transaction, and it should work for all the supported databases.

In theory such a version can be used to detect if the client is presenting an 
object which is "stale" when issuing an alter request. Metastore can choose to 
reject such an alter request since the client may be caching an old version of 
the object, and any alter operation on such a stale object can potentially 
overwrite previous operations. However, this can be done in a separate JIRA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21089) Automate dbinstall tests

2019-01-04 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-21089:
--

 Summary: Automate dbinstall tests
 Key: HIVE-21089
 URL: https://issues.apache.org/jira/browse/HIVE-21089
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar


When a patch makes a schema change, precommit should run the dbinstall tests to 
make sure that the db scripts are working on all the supported databases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21088) Improve usability of dbinstall tests

2019-01-04 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-21088:
--

 Summary: Improve usability of dbinstall tests
 Key: HIVE-21088
 URL: https://issues.apache.org/jira/browse/HIVE-21088
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


There are nice integration tests which can be run manually for testing database 
schema changes. These tests spin up docker containers and install and upgrade 
the schema. Currently, these tests expect that the host provides native support 
for the docker daemon, which is true in most cases. However, if you are using a 
lower version of macOS (I tried it on 10.11), the Docker application cannot be 
installed and you need to install docker-toolbox instead. The issue with using 
docker-toolbox is that the docker daemon runs in a VM on the host, which has a 
different IP address, and hence the hardcoded {{localhost}} in the jdbc urls 
doesn't work. We can add a simple flag to provide the docker-machine ip as a 
command-line argument to override using localhost in the url.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21077) Database should have creation time

2018-12-28 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-21077:
--

 Summary: Database should have creation time
 Key: HIVE-21077
 URL: https://issues.apache.org/jira/browse/HIVE-21077
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


Currently, databases do not have a creation time like we have for tables and 
partitions.

{noformat}
// namespace for tables
struct Database {
  1: string name,
  2: string description,
  3: string locationUri,
  4: map<string,string> parameters, // properties associated with the database
  5: optional PrincipalPrivilegeSet privileges,
  6: optional string ownerName,
  7: optional PrincipalType ownerType,
  8: optional string catalogName
}
{noformat}

Currently, without creationTime there is no way to identify if the copy of a 
Database which a client has is the same as the one on the server if the dbName 
is the same. Without object ids, creationTime is currently the only way to 
uniquely identify an instance of a metastore object. It would be good to have a 
Database creation time as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21040) msck does unnecessary file listing at last level of partitions

2018-12-13 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-21040:
--

 Summary: msck does unnecessary file listing at last level of 
partitions
 Key: HIVE-21040
 URL: https://issues.apache.org/jira/browse/HIVE-21040
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


Here is the code snippet which is run by {{msck}} to list directories

{noformat}
final Path currentPath = pd.p;
final int currentDepth = pd.depth;
FileStatus[] fileStatuses = fs.listStatus(currentPath, FileUtils.HIDDEN_FILES_PATH_FILTER);
// found no files under a sub-directory under table base path; it is possible that the table
// is empty and hence there are no partition sub-directories created under base path
if (fileStatuses.length == 0 && currentDepth > 0 && currentDepth < partColNames.size()) {
  // since maxDepth is not yet reached, we are missing partition
  // columns in currentPath
  logOrThrowExceptionWithMsg(
      "MSCK is missing partition columns under " + currentPath.toString());
} else {
  // found files under currentPath add them to the queue if it is a directory
  for (FileStatus fileStatus : fileStatuses) {
    if (!fileStatus.isDirectory() && currentDepth < partColNames.size()) {
      // found a file at depth which is less than number of partition keys
      logOrThrowExceptionWithMsg(
          "MSCK finds a file rather than a directory when it searches for "
              + fileStatus.getPath().toString());
    } else if (fileStatus.isDirectory() && currentDepth < partColNames.size()) {
      // found a sub-directory at a depth less than number of partition keys
      // validate if the partition directory name matches with the corresponding
      // partition colName at currentDepth
      Path nextPath = fileStatus.getPath();
      String[] parts = nextPath.getName().split("=");
      if (parts.length != 2) {
        logOrThrowExceptionWithMsg("Invalid partition name " + nextPath);
      } else if (!parts[0].equalsIgnoreCase(partColNames.get(currentDepth))) {
        logOrThrowExceptionWithMsg(
            "Unexpected partition key " + parts[0] + " found at " + nextPath);
      } else {
        // add sub-directory to the work queue if maxDepth is not yet reached
        pendingPaths.add(new PathDepthInfo(nextPath, currentDepth + 1));
      }
    }
  }
  if (currentDepth == partColNames.size()) {
    return currentPath;
  }
}
{noformat}

You can see that when {{currentDepth}} is at {{maxDepth}} it still does an 
unnecessary listing of the files. We can improve this by checking the 
currentDepth and bailing out early.

This can improve the performance of the msck command significantly, especially 
when there are a lot of files in each partition on remote filesystems like S3 
or ADLS.
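The early bail-out can be sketched as follows. This is a standalone illustration, not the actual HiveMetaStoreChecker code: the listing is simulated by a counter, and partCols mirrors partColNames from the snippet above:

```java
import java.util.List;

// Sketch of the proposed improvement: check the depth *before* issuing the
// file listing, so leaf-level partition directories are never listed.
public class MsckEarlyBailSketch {
    static int listings = 0;  // counts simulated fs.listStatus calls

    // Stand-in for one step of the directory walk; returns true when
    // currentPath is a complete partition path (maxDepth reached).
    static boolean visit(int currentDepth, List<String> partCols) {
        if (currentDepth == partCols.size()) {
            return true;   // bail out early: no listing at the leaf level
        }
        listings++;        // only intermediate levels pay for a listing
        return false;
    }

    public static void main(String[] args) {
        List<String> partCols = List.of("year", "month");
        visit(0, partCols);
        visit(1, partCols);
        boolean leaf = visit(2, partCols);
        System.out.println(leaf + " " + listings); // prints "true 2"
    }
}
```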



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20972) Enable TestMiniLlapLocalCliDriver.testCliDriver[cbo_limit]

2018-11-26 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-20972:
--

 Summary: Enable TestMiniLlapLocalCliDriver.testCliDriver[cbo_limit]
 Key: HIVE-20972
 URL: https://issues.apache.org/jira/browse/HIVE-20972
 Project: Hive
  Issue Type: Test
Reporter: Vihang Karajgaonkar






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20916) Fix typo in JSONCreateDatabaseMessage

2018-11-14 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-20916:
--

 Summary: Fix typo in JSONCreateDatabaseMessage
 Key: HIVE-20916
 URL: https://issues.apache.org/jira/browse/HIVE-20916
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Affects Versions: 4.0.0
Reporter: Vihang Karajgaonkar


{code}
public JSONCreateDatabaseMessage(String server, String servicePrincipal, Database db,
    Long timestamp) {
  this.server = server;
  this.servicePrincipal = servicePrincipal;
  this.db = db.getName();
  this.timestamp = timestamp;
  try {
    this.dbJson = MessageBuilder.createDatabaseObjJson(db);
  } catch (TException ex) {
    throw new IllegalArgumentException("Could not serialize Function object", ex);
  }
  checkValid();
}
{code}

The exception message should say Database instead of Function. Also, 
{{TestDbNotificationListener#createDatabase}} should be modified to make sure 
that the deserialized database object from the dbJson field matches the 
original database object.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20860) Fix or disable TestMiniLlapLocalCliDriver.testCliDriver[cbo_limit]

2018-11-02 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-20860:
--

 Summary: Fix or disable 
TestMiniLlapLocalCliDriver.testCliDriver[cbo_limit]
 Key: HIVE-20860
 URL: https://issues.apache.org/jira/browse/HIVE-20860
 Project: Hive
  Issue Type: Test
Reporter: Vihang Karajgaonkar


Test failed in one of the precommit jobs. Looks like there is some case where 
there is an additional space in the diff

{noformat}
Error Message
Client Execution succeeded but contained differences (error code = 1) after 
executing cbo_limit.q 
11c11
<  1  4 2
---
>  1 4 2
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20799) Fix or disable TestRpc.testClientTimeout

2018-10-24 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-20799:
--

 Summary: Fix or disable TestRpc.testClientTimeout
 Key: HIVE-20799
 URL: https://issues.apache.org/jira/browse/HIVE-20799
 Project: Hive
  Issue Type: Test
Reporter: Vihang Karajgaonkar


Test failed without any code changes on master. See HIVE-20798



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20798) Test precommit job

2018-10-24 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-20798:
--

 Summary: Test precommit job
 Key: HIVE-20798
 URL: https://issues.apache.org/jira/browse/HIVE-20798
 Project: Hive
  Issue Type: Test
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


Dummy patch request to test if precommit works after restart



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20740) Remove global lock in ObjectStore.setConf method

2018-10-12 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-20740:
--

 Summary: Remove global lock in ObjectStore.setConf method
 Key: HIVE-20740
 URL: https://issues.apache.org/jira/browse/HIVE-20740
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


The ObjectStore#setConf method has a global lock which can block other clients 
in concurrent workloads.

{code}
@Override
@SuppressWarnings("nls")
public void setConf(Configuration conf) {
  // Although an instance of ObjectStore is accessed by one thread, there may
  // be many threads with ObjectStore instances. So the static variables
  // pmf and prop need to be protected with locks.
  pmfPropLock.lock();
  try {
    isInitialized = false;
    this.conf = conf;
    this.areTxnStatsSupported = MetastoreConf.getBoolVar(conf, ConfVars.HIVE_TXN_STATS_ENABLED);
    configureSSL(conf);
    Properties propsFromConf = getDataSourceProps(conf);
    boolean propsChanged = !propsFromConf.equals(prop);

    if (propsChanged) {
      if (pmf != null) {
        clearOutPmfClassLoaderCache(pmf);
        if (!forTwoMetastoreTesting) {
          // close the underlying connection pool to avoid leaks
          pmf.close();
        }
      }
      pmf = null;
      prop = null;
    }

    assert(!isActiveTransaction());
    shutdown();
    // Always want to re-create pm as we don't know if it were created by the
    // most recent instance of the pmf
    pm = null;
    directSql = null;
    expressionProxy = null;
    openTrasactionCalls = 0;
    currentTransaction = null;
    transactionStatus = TXN_STATUS.NO_STATE;

    initialize(propsFromConf);

    String partitionValidationRegex =
        MetastoreConf.getVar(this.conf, ConfVars.PARTITION_NAME_WHITELIST_PATTERN);
    if (partitionValidationRegex != null && !partitionValidationRegex.isEmpty()) {
      partitionValidationPattern = Pattern.compile(partitionValidationRegex);
    } else {
      partitionValidationPattern = null;
    }

    // Note, if metrics have not been initialized this will return null, which means we aren't
    // using metrics. Thus we should always check whether this is non-null before using.
    MetricRegistry registry = Metrics.getRegistry();
    if (registry != null) {
      directSqlErrors = Metrics.getOrCreateCounter(MetricsConstants.DIRECTSQL_ERRORS);
    }

    this.batchSize = MetastoreConf.getIntVar(conf, ConfVars.RAWSTORE_PARTITION_BATCH_SIZE);

    if (!isInitialized) {
      throw new RuntimeException(
          "Unable to create persistence manager. Check dss.log for details");
    } else {
      LOG.debug("Initialized ObjectStore");
    }
  } finally {
    pmfPropLock.unlock();
  }
}
{code}

The {{pmfPropLock}} is a static object, so it blocks any other new connection 
to HMS that is trying to instantiate an ObjectStore. We should either remove 
the lock or reduce its scope so that it is held for a very 
small amount of time.
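One way to shrink the critical section can be sketched as below, assuming only the shared {{pmf}}/{{prop}} check-and-swap actually needs the static lock. All names are illustrative stand-ins, not the actual fix that went into Hive:

```java
import java.util.Properties;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only: serialize just the shared factory state,
// and do per-instance re-initialization outside the static lock.
public class NarrowLockSketch {
    private static final ReentrantLock pmfPropLock = new ReentrantLock();
    private static Properties prop;
    private static Object pmf; // stand-in for a PersistenceManagerFactory

    // Held briefly: only the check-and-swap of the shared factory.
    public static Object ensureFactory(Properties propsFromConf) {
        pmfPropLock.lock();
        try {
            if (pmf == null || !propsFromConf.equals(prop)) {
                pmf = new Object(); // stand-in for building a new factory
                prop = new Properties();
                prop.putAll(propsFromConf);
            }
            return pmf;
        } finally {
            pmfPropLock.unlock();
        }
    }

    private Object pm; // per-instance state, analogous to the PersistenceManager

    public void setConf(Properties propsFromConf) {
        Object factory = ensureFactory(propsFromConf); // brief static lock
        // Per-instance setup runs with no static lock held, so other clients
        // opening connections are not blocked behind this instance.
        this.pm = factory;
    }
}
```

The idea is that concurrent setConf calls with unchanged properties only contend for the short check, not for the whole re-initialization.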



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20697) Some replication tests are super slow and cause batch timeouts

2018-10-05 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-20697:
--

 Summary: Some replication tests are super slow and cause batch 
timeouts
 Key: HIVE-20697
 URL: https://issues.apache.org/jira/browse/HIVE-20697
 Project: Hive
  Issue Type: Test
Reporter: Vihang Karajgaonkar


Some of these tests are taking a long time and can cause test batch timeouts 
given that we only give 40 min for a batch to complete. We should speed these 
tests up.

TestReplicationScenarios: 20 min
TestReplicationScenariosAcidTables: 11 min
TestReplicationScenariosAcrossInstances: 5 min 14 sec
TestReplicationScenariosIncrementalLoadAcidTables: 20 min



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20472) mvn test failing for metastore-tool module

2018-08-27 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-20472:
--

 Summary: mvn test failing for metastore-tool module
 Key: HIVE-20472
 URL: https://issues.apache.org/jira/browse/HIVE-20472
 Project: Hive
  Issue Type: Bug
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


Fails because there are no applicable tests.

 

{code}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.20.1:test (default-test) on project hive-metastore-benchmarks: No tests were executed! (Set -DfailIfNoTests=false to ignore this error.) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn  -rf :hive-metastore-benchmarks
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20461) Metastore unit tests do not run with directSQL enabled

2018-08-24 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-20461:
--

 Summary: Metastore unit tests do not run with directSQL enabled
 Key: HIVE-20461
 URL: https://issues.apache.org/jira/browse/HIVE-20461
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore, Test
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


Some of the Metastore tests do not use directSQL by default. We should have a 
mechanism to force usage of directSQL so that we at least have them covered.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20365) Fix warnings when regenerating thrift code

2018-08-10 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-20365:
--

 Summary: Fix warnings when regenerating thrift code
 Key: HIVE-20365
 URL: https://issues.apache.org/jira/browse/HIVE-20365
 Project: Hive
  Issue Type: Bug
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


When you regenerate the thrift code you can see warnings like the ones below 
(each warning is emitted twice during the build):

{noformat}
[WARNING:hive/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift:2167] No field key specified for rqst, resulting protocol may have conflicts or not be backwards compatible!
[WARNING:hive/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift:2235] No field key specified for o2, resulting protocol may have conflicts or not be backwards compatible!
{noformat}
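For reference, this warning points at members declared without an explicit field id. A hedged illustration in Thrift IDL (the service and type names are made up; only the parameter names {{rqst}} and {{o2}} come from the warnings above):

```
service ExampleService {
  // No field ids: thrift auto-assigns negative ids and warns, because the
  // resulting wire protocol may not stay backwards compatible.
  Result fetch(Request rqst) throws (ExampleException o2)

  // Explicit field ids: no warning, wire format is stable.
  Result fetchFixed(1: Request rqst) throws (1: ExampleException o2)
}
```

Adding the `1:` style ids to the flagged declarations in hive_metastore.thrift is the usual way to silence these warnings.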



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20307) Add support for filterspec

2018-08-03 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-20307:
--

 Summary: Add support for filterspec
 Key: HIVE-20307
 URL: https://issues.apache.org/jira/browse/HIVE-20307
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20308) Add support for pagination

2018-08-03 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-20308:
--

 Summary: Add support for pagination
 Key: HIVE-20308
 URL: https://issues.apache.org/jira/browse/HIVE-20308
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20306) Add the thrift API to get partially filled Partition

2018-08-03 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-20306:
--

 Summary: Add the thrift API to get partially filled Partition
 Key: HIVE-20306
 URL: https://issues.apache.org/jira/browse/HIVE-20306
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20122) Deploy and test standalone metastore for Hive 3.1

2018-07-09 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-20122:
--

 Summary: Deploy and test standalone metastore for Hive 3.1
 Key: HIVE-20122
 URL: https://issues.apache.org/jira/browse/HIVE-20122
 Project: Hive
  Issue Type: Task
Affects Versions: 3.1.0
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


Creating a blocker JIRA for 3.1 so that this does not slip under the radar. 
This jira tracks the testing effort for the standalone metastore for the 3.1 
release. I will create sub-tasks if I find any issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19999) Move precommit jobs to jdk 8

2018-06-26 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19999:
--

 Summary: Move precommit jobs to jdk 8
 Key: HIVE-19999
 URL: https://issues.apache.org/jira/browse/HIVE-19999
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19998) Changing batch sizes and test driver specific configuration should be open to everyone

2018-06-26 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19998:
--

 Summary: Changing batch sizes and test driver specific 
configuration should be open to everyone
 Key: HIVE-19998
 URL: https://issues.apache.org/jira/browse/HIVE-19998
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


Currently, to change batch sizes we have to manually update profiles on the 
Ptest server. We should expose this configuration file to all the users so that 
a simple commit to the branch plus an update of the ptest server is enough. We 
should move the sensitive information from the profiles to a separate file 
and get the batch size information from the source code directly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19988) Precommit jobs erroring out

2018-06-25 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19988:
--

 Summary: Precommit jobs erroring out
 Key: HIVE-19988
 URL: https://issues.apache.org/jira/browse/HIVE-19988
 Project: Hive
  Issue Type: Bug
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


{code}
+ mvn clean package -B -DskipTests -Drat.numUnapprovedLicenses=1000 -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-HIVE-Build/.m2/repository
[INFO] Scanning for projects...
[INFO] 
[INFO] 
[INFO] Building hive-ptest 3.0
[INFO] 
[INFO] Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.pom
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 0.925 s
[INFO] Finished at: 2018-06-25T20:46:27Z
[INFO] Final Memory: 24M/1447M
[INFO] 
[ERROR] Plugin org.apache.maven.plugins:maven-clean-plugin:2.5 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.apache.maven.plugins:maven-clean-plugin:jar:2.5: Could not transfer artifact org.apache.maven.plugins:maven-clean-plugin:pom:2.5 from/to central (https://repo.maven.apache.org/maven2): Received fatal alert: protocol_version -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginResolutionException
+ return 1
+ ret=1
+ unpack_test_results
+ '[' -z /home/jenkins/jenkins-slave/workspace/PreCommit-HIVE-Build/hive/build ']'
+ cd /home/jenkins/jenkins-slave/workspace/PreCommit-HIVE-Build/hive/build/hive/testutils/ptest2/target
jenkins-execute-build.sh: line 61: cd: /home/jenkins/jenkins-slave/workspace/PreCommit-HIVE-Build/hive/build/hive/testutils/ptest2/target: No such file or directory
+ [[ -f test-results.tar.gz ]]
+ exit 1
+ rm -f /tmp/tmp.LFKzzyYwIt
Build step 'Execute shell' marked build as failure
Recording test results
ERROR: Step 'Publish JUnit test result report' failed: No test report files were found. Configuration error?
[description-setter] Description set: HIVE-19980  /   master-mr2
Finished: FAILURE
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19966) Compile error when using open-jdk

2018-06-21 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19966:
--

 Summary: Compile error when using open-jdk
 Key: HIVE-19966
 URL: https://issues.apache.org/jira/browse/HIVE-19966
 Project: Hive
  Issue Type: Bug
Reporter: Vihang Karajgaonkar


When you compile Hive using open-jdk-8 you see this error

{noformat}
hive/ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java:[72,16] sun.misc.Cleaner is internal proprietary API and may be removed in a future release
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hive-exec
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19804) msck repair should hold locks

2018-06-05 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19804:
--

 Summary: msck repair should hold locks
 Key: HIVE-19804
 URL: https://issues.apache.org/jira/browse/HIVE-19804
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar


{{msck repair table}} does not hold locks. This can lead to weird 
race conditions when concurrent sessions are running on the same table. For 
example, if two sessions run msck on the same table at the same time, they both 
try to add partitions and both can end up failing with 
AlreadyExistsException. Another example: if a query is running on a 
partitioned table while another session issues msck repair, which adds/drops 
partitions, it could trigger errors during query execution.
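The first race can be sketched as a non-atomic check-then-add; holding a table lock around the whole repair step, as proposed above, removes the window. All names below are illustrative, not Hive's actual lock manager API:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch of the msck race; not Hive's actual classes.
public class MsckLockSketch {
    private final Set<String> partitions = ConcurrentHashMap.newKeySet();
    private final ReentrantLock tableLock = new ReentrantLock();

    // Unsafe variant: two sessions can both pass the contains() check and
    // then race on add(), which in the metastore surfaces as
    // AlreadyExistsException for the loser.
    public boolean addIfMissingUnsafe(String part) {
        if (partitions.contains(part)) {
            return false;
        }
        return partitions.add(part);
    }

    // Locked variant: the whole check-and-add is serialized per table, so
    // concurrent msck runs see a consistent partition list.
    public boolean addIfMissingLocked(String part) {
        tableLock.lock();
        try {
            if (partitions.contains(part)) {
                return false;
            }
            return partitions.add(part);
        } finally {
            tableLock.unlock();
        }
    }
}
```

In Hive the lock would come from the lock manager rather than a JVM-local ReentrantLock, since the competing sessions run in separate processes.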



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19757) hive.version.shortname should be 4.0

2018-05-31 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19757:
--

 Summary: hive.version.shortname should be 4.0
 Key: HIVE-19757
 URL: https://issues.apache.org/jira/browse/HIVE-19757
 Project: Hive
  Issue Type: Bug
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


pom.xml still sets {{hive.version.shortname}} to {{3.1.0}}, which causes 
issues with the schemaTool init scripts



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19696) Create schema scripts for Hive 3.1.0 and Hive 4.0.0

2018-05-24 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19696:
--

 Summary: Create schema scripts for Hive 3.1.0 and Hive 4.0.0
 Key: HIVE-19696
 URL: https://issues.apache.org/jira/browse/HIVE-19696
 Project: Hive
  Issue Type: Task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


metastore schema init scripts are missing for the new release after branch was 
cut-out.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19681) Fix TestVectorIfStatement

2018-05-23 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19681:
--

 Summary: Fix TestVectorIfStatement
 Key: HIVE-19681
 URL: https://issues.apache.org/jira/browse/HIVE-19681
 Project: Hive
  Issue Type: Test
  Components: Vectorization
Affects Versions: 3.1.0, 4.0.0
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


{{TestVectorIfStatement}} generates interesting batches (injection of random 
repeating null column values and repeating non-null values) when evaluating the 
vectorized expressions. But the modification of random rows is done after the 
row mode is evaluated, so it is likely that comparison results will fail. I 
am not sure how it's working in the first place.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19514) Successful test logs should be copied

2018-05-13 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19514:
--

 Summary: Successful test logs should be copied
 Key: HIVE-19514
 URL: https://issues.apache.org/jira/browse/HIVE-19514
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


Ptest tries to copy ~20G worth of logs from the worker nodes to the server. The 
aggregate time spent by all the hosts copying these files is close to 15 min. 
Most of the time we don't check the logs, and we seldom check logs of 
successful tests. We should skip copying successful test logs by default to 
improve the runtime.

{noformat}
2018-05-13 16:10:15,243  INFO [TestExecutor] PTest.run:235 Executed 13568 tests
2018-05-13 16:10:15,243  INFO [TestExecutor] PTest.run:237 PERF: ExecutionPhase took 71 minutes
2018-05-13 16:10:15,243  INFO [TestExecutor] PTest.run:237 PERF: ExecutionPhase.TotalRsyncElapsedTime took 15 minutes
2018-05-13 16:10:15,243  INFO [TestExecutor] PTest.run:237 PERF: PrepPhase took 5 minutes
2018-05-13 16:10:15,243  INFO [TestExecutor] PTest.run:237 PERF: ReportingPhase took 0 minutes
2018-05-13 16:10:15,243  INFO [TestExecutor] PTest.run:237 PERF: TestCheckPhase took 0 minutes
2018-05-13 16:10:15,243  INFO [TestExecutor] PTest.run:237 PERF: YetusPhase took 0 minutes
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19513) ptest version in pom.xml should be 1.0

2018-05-13 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19513:
--

 Summary: ptest version in pom.xml should be 1.0
 Key: HIVE-19513
 URL: https://issues.apache.org/jira/browse/HIVE-19513
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


The Jenkins job has the hardcoded API endpoint {{http://:8080/hive-ptest-1.0}}. 
Currently the pom.xml has version 3.0, which creates a hive-ptest-3.0.war 
file; that is not correct. Changing the version back to 1.0 lets us update the 
ptest code without changing the jenkins job API endpoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19510) Add performance metric to find the total time spend in rsync

2018-05-12 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19510:
--

 Summary: Add performance metric to find the total time spend in 
rsync
 Key: HIVE-19510
 URL: https://issues.apache.org/jira/browse/HIVE-19510
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


I think we are spending a lot of time copying logs from worker nodes to the 
server. We should add some logging to print aggregate time spent in rsync to 
confirm.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19507) Ptest queue size is 5 which doesn't make any sense

2018-05-11 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19507:
--

 Summary: Ptest queue size is 5 which doesn't make any sense
 Key: HIVE-19507
 URL: https://issues.apache.org/jira/browse/HIVE-19507
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


The jenkins jobs run one by one anyway. It doesn't make sense to have a queue 
size of 5 since we are not running them in parallel. We should change 
the queue size to 1 for now so that PtestServer restarts do not cause 
disruption of jobs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19505) Increase number of parallel runs from 2 per node to 4

2018-05-11 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19505:
--

 Summary: Increase number of parallel runs from 2 per node to 4
 Key: HIVE-19505
 URL: https://issues.apache.org/jira/browse/HIVE-19505
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


The test cluster is underutilized. In the short term we can increase the 
parallel threads from 2 to 3 (or maybe 4), which will speed up parallel test 
batches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19493) VectorUDFDateDiffColCol copySelected does not handle nulls correctly

2018-05-10 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19493:
--

 Summary: VectorUDFDateDiffColCol copySelected does not handle 
nulls correctly
 Key: HIVE-19493
 URL: https://issues.apache.org/jira/browse/HIVE-19493
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


The {{copySelected}} method in the {{VectorUDFDateDiffColCol}} class was missed 
during HIVE-18622
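The kind of fix HIVE-18622 applied elsewhere can be sketched as follows: when copying only the selected rows of a column vector, the per-row null flags must travel alongside the values. Everything below is an illustrative stand-in, not Hive's actual vector classes:

```java
// Illustrative sketch of a null-safe copySelected; not Hive's actual code.
public class CopySelectedSketch {

    // Copy the rows listed in `selected` from src to dst, propagating nulls.
    public static void copySelected(long[] srcVals, boolean[] srcIsNull,
                                    int[] selected, int size,
                                    long[] dstVals, boolean[] dstIsNull) {
        boolean anyNull = false;
        for (int j = 0; j < size; j++) {
            int i = selected[j];
            dstIsNull[i] = srcIsNull[i]; // the easily-missed step: carry the flag
            if (srcIsNull[i]) {
                anyNull = true;          // value slot is garbage for null rows
            } else {
                dstVals[i] = srcVals[i]; // copy the value only for non-null rows
            }
        }
        // A real VectorizedRowBatch column would also set noNulls = !anyNull
        // here so downstream operators skip the per-row null checks.
    }
}
```

If the flag copy is dropped, a null input row silently reuses whatever stale value and flag the output vector held, which is the class of bug described above.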



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19487) Precommits generating 20G of logs

2018-05-10 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19487:
--

 Summary: Precommits generating 20G of logs
 Key: HIVE-19487
 URL: https://issues.apache.org/jira/browse/HIVE-19487
 Project: Hive
  Issue Type: Test
Reporter: Vihang Karajgaonkar


Precommit jobs are generating huge logs and the disk gets full pretty quickly. 
I am not sure what the log size used to be, but it was definitely not a problem 
previously. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19461) TestDbNotificationListener is timing out on precommits

2018-05-08 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19461:
--

 Summary: TestDbNotificationListener is timing out on precommits
 Key: HIVE-19461
 URL: https://issues.apache.org/jira/browse/HIVE-19461
 Project: Hive
  Issue Type: Test
  Components: Hive
Reporter: Vihang Karajgaonkar


{{TestDbNotificationListener}} test is timing out on pre-commit consistently



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19429) Investigate alternative technologies like docker containers to increase parallelism

2018-05-04 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19429:
--

 Summary: Investigate alternative technologies like docker 
containers to increase parallelism
 Key: HIVE-19429
 URL: https://issues.apache.org/jira/browse/HIVE-19429
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19427) Offload some work to Jenkins server

2018-05-04 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19427:
--

 Summary: Offload some work to Jenkins server
 Key: HIVE-19427
 URL: https://issues.apache.org/jira/browse/HIVE-19427
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19428) Persist test result data on the Ptest server

2018-05-04 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19428:
--

 Summary: Persist test result data on the Ptest server
 Key: HIVE-19428
 URL: https://issues.apache.org/jira/browse/HIVE-19428
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19426) Move the Ptest queue to Ptest server

2018-05-04 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19426:
--

 Summary: Move the Ptest queue to Ptest server
 Key: HIVE-19426
 URL: https://issues.apache.org/jira/browse/HIVE-19426
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19425) General usability improvements for Ptest

2018-05-04 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19425:
--

 Summary: General usability improvements for Ptest
 Key: HIVE-19425
 URL: https://issues.apache.org/jira/browse/HIVE-19425
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


The Ptest framework has a lot of usability issues some of which are listed 
below:

1. Ptest can run only one job at a time.
2. The pending queue resides on the pre-commit jenkins server. If the jenkins 
server is restarted we lose the queue and the devs don't understand why their 
patches were not run.
3. An average Ptest run takes about ~100 min, which is not bad considering it 
is running > 10k tests, many of which are very complex queries. We should look 
to see if we can further reduce the turn-around time.

Some of the ideas to improve the current state of Ptest could be:

1. Move the queue to the Ptest server so that it is not lost due to jenkins 
restarts.
2. The jenkins server could do some useful work instead of just waiting for the 
Ptest server to return. I propose it should run some pre-checkin (for lack of a 
better word) tests which are reliable and run fast. The advantage of 
having pre-checkin tests is that if there are issues with the patch, the 
pre-commit fails fast without the long turn-around time, and devs get quick 
feedback on issues which need to be fixed before the full suite of tests 
is run. The second advantage is that the Ptest server will be running fairly 
well-tested patches, so it will hopefully be doing more useful work. This would 
also reduce load on the Ptest server.
3. If Ptest server has a database we can do interesting analysis on the test 
results like identifying flaky tests automatically, generating weekly reports 
about the test status.
4. Have a web-interface of the Ptest server so that devs can check the status 
of the queue and which patch is being run currently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19388) ClassCastException on VectorMapJoinCommonOperator when

2018-05-02 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19388:
--

 Summary: ClassCastException on VectorMapJoinCommonOperator when 
 Key: HIVE-19388
 URL: https://issues.apache.org/jira/browse/HIVE-19388
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.3.2, 2.2.0, 2.1.1, 3.0.0, 3.1.0
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


I see the following exception when a mapjoin operator is being initialized 
on Hive-on-Spark with vectorization turned on.

This happens when the hashTable is empty. The code in 
{{MapJoinTableContainerSerDe#getDefaultEmptyContainer}} method returns a 
HashMapWrapper while the VectorMapJoinOperator expects a 
{{MapJoinBytesTableContainer}} when {{hive.mapjoin.optimized.hashtable}} is set 
to true.

{noformat}

Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper cannot be cast to org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerDirectAccess
 at org.apache.hadoop.hive.ql.exec.vector.mapjoin.optimized.VectorMapJoinOptimizedHashTable.<init>(VectorMapJoinOptimizedHashTable.java:92) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.vector.mapjoin.optimized.VectorMapJoinOptimizedHashMap.<init>(VectorMapJoinOptimizedHashMap.java:127) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.vector.mapjoin.optimized.VectorMapJoinOptimizedStringHashMap.<init>(VectorMapJoinOptimizedStringHashMap.java:60) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.vector.mapjoin.optimized.VectorMapJoinOptimizedCreateHashTable.createHashTable(VectorMapJoinOptimizedCreateHashTable.java:80) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.setUpHashTable(VectorMapJoinCommonOperator.java:485) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.completeInitializationOp(VectorMapJoinCommonOperator.java:461) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:471) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:401) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:574) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:526) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:387) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:109) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
 ... 16 more

{noformat}
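The shape of the mismatch, and one possible direction for a fix (not necessarily the change that was actually committed): the default empty container should be chosen based on the same config the vectorized operator consults, so that the downcast succeeds. The class names below are illustrative stand-ins for Hive's container classes:

```java
// Illustrative sketch of the container mismatch; not Hive's actual classes.
public class EmptyContainerSketch {
    interface TableContainer {}
    static class HashMapWrapper implements TableContainer {}
    static class MapJoinBytesTableContainer implements TableContainer {}

    // Today the default empty container is always the HashMapWrapper flavor;
    // picking it from the optimized-hashtable setting keeps the type the
    // vectorized mapjoin operator expects to downcast to.
    static TableContainer defaultEmptyContainer(boolean optimizedHashTable) {
        return optimizedHashTable
                ? new MapJoinBytesTableContainer()
                : new HashMapWrapper();
    }
}
```

With {{hive.mapjoin.optimized.hashtable=true}}, an empty build side then yields a container the ClassCastException path above can consume.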



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19345) createSources fails on branch-2.3

2018-04-27 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19345:
--

 Summary: createSources fails on branch-2.3
 Key: HIVE-19345
 URL: https://issues.apache.org/jira/browse/HIVE-19345
 Project: Hive
  Issue Type: Bug
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


I see the following NPE while the source tables are being created when I try 
to run a qtest.

{noformat}
java.lang.NullPointerException
 at org.apache.hadoop.hive.ql.stats.StatsUtils.estimateRowSizeFromSchema(StatsUtils.java:546)
 at org.apache.hadoop.hive.ql.stats.StatsUtils.getNumRows(StatsUtils.java:183)
 at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:207)
 at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:157)
 at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:145)
 at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:130)
 at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:143)
 at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:122)
 at org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
 at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runStatsAnnotation(SparkCompiler.java:240)
 at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:119)
 at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:140)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11273)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
at 
org.apache.hadoop.hive.ql.QTestUtil.createSources(QTestUtil.java:1096)
at 
org.apache.hadoop.hive.ql.QTestUtil.createSources(QTestUtil.java:1073)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver$3.invokeInternal(CoreCliDriver.java:81)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver$3.invokeInternal(CoreCliDriver.java:78)
at 
org.apache.hadoop.hive.util.ElapsedTimeLoggingWrapper.invoke(ElapsedTimeLoggingWrapper.java:33)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.beforeClass(CoreCliDriver.java:84)
{noformat}





[jira] [Created] (HIVE-19344) Change default value of msck.repair.batch.size

2018-04-27 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19344:
--

 Summary: Change default value of msck.repair.batch.size 
 Key: HIVE-19344
 URL: https://issues.apache.org/jira/browse/HIVE-19344
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


{{msck.repair.batch.size}} defaults to 0, which means msck will try to add all 
the partitions in one API call to HMS. This can put huge memory 
pressure on HMS. The default value should be changed to a reasonable number so 
that, in the case of a large number of partitions, we can batch the addition of 
partitions. The same goes for {{msck.repair.batch.max.retries}}.
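
The batching behavior this change proposes can be sketched in isolation (the 
helper and class names below are hypothetical; the real msck code path lives 
elsewhere in Hive): splitting the partition list into fixed-size batches bounds 
the size of each HMS call, with batch size 0 reproducing the old all-at-once 
behavior.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {
    // Split a list into batches of at most batchSize elements.
    // batchSize <= 0 mirrors the old default: everything in one call.
    static <T> List<List<T>> toBatches(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        if (batchSize <= 0) {
            batches.add(new ArrayList<>(items));
            return batches;
        }
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> parts = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            parts.add(i);
        }
        // With batch size 3, ten partitions become four HMS calls (3+3+3+1).
        System.out.println(toBatches(parts, 3).size());
        // With batch size 0, they become a single large call.
        System.out.println(toBatches(parts, 0).size());
    }
}
```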





[jira] [Created] (HIVE-19254) NumberFormatException in MetaStoreUtils.isFastStatsSame

2018-04-19 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19254:
--

 Summary: NumberFormatException in MetaStoreUtils.isFastStatsSame
 Key: HIVE-19254
 URL: https://issues.apache.org/jira/browse/HIVE-19254
 Project: Hive
  Issue Type: Bug
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


I see the following exception in the logs in some cases. This possibly 
happens when you try to add empty partitions.

{noformat}
2018-04-19T19:32:19,260 ERROR [pool-7-thread-7] metastore.RetryingHMSHandler: 
MetaException(message:java.lang.NumberFormatException: null)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:6824)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_with_environment_context(HiveMetaStore.java:4864)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions(HiveMetaStore.java:4801)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
at com.sun.proxy.$Proxy24.alter_partitions(Unknown Source)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:16046)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:16030)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NumberFormatException: null
at java.lang.Long.parseLong(Long.java:552)
at java.lang.Long.parseLong(Long.java:631)
at 
org.apache.hadoop.hive.metastore.utils.MetaStoreUtils.isFastStatsSame(MetaStoreUtils.java:632)
at 
org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:743)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_with_environment_context(HiveMetaStore.java:4827)
... 21 more
{noformat}
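
The root cause is visible in isolation: when a fast stat is absent from the 
partition parameters, {{Map.get}} returns null and {{Long.parseLong(null)}} 
throws exactly this NumberFormatException. A minimal reproduction with a 
hedged guard follows (sketch only; the parameter key "numRows" and the 
fallback value are assumptions for illustration, not the actual fix):

```java
import java.util.HashMap;
import java.util.Map;

public class FastStatsSketch {
    // Returns the stat as a long, or -1 when it is missing or unparseable,
    // instead of letting Long.parseLong(null) blow up.
    static long statOrDefault(Map<String, String> params, String key) {
        String v = params.get(key);
        if (v == null) {
            return -1L;
        }
        try {
            return Long.parseLong(v);
        } catch (NumberFormatException e) {
            return -1L;
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>(); // empty partition params
        try {
            Long.parseLong(params.get("numRows"));    // reproduces the bug
        } catch (NumberFormatException e) {
            System.out.println("caught NumberFormatException");
        }
        // The guarded version returns a sentinel instead of throwing.
        System.out.println(statOrDefault(params, "numRows"));
    }
}
```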





[jira] [Created] (HIVE-19242) CliAdapter silently ignores excluded qfiles

2018-04-18 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19242:
--

 Summary: CliAdapter silently ignores excluded qfiles
 Key: HIVE-19242
 URL: https://issues.apache.org/jira/browse/HIVE-19242
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


If a user tries to run a qfile using {{-Dqfile}} and it is excluded 
according to the {{CliConfig}}, AbstractCliConfig silently ignores the qtest run 
and it is very hard for the user to find out why the test did not run. We should 
log a helpful warning so that it is easier to find this out when it happens.





[jira] [Created] (HIVE-19136) DbNotifications clean up throws NPE on mysql databases

2018-04-09 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19136:
--

 Summary: DbNotifications clean up throws NPE on mysql databases
 Key: HIVE-19136
 URL: https://issues.apache.org/jira/browse/HIVE-19136
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 3.0.0
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


I see the following stack trace in the HMS logs when the db notification cleaner 
thread tries to clean the old notifications.

{noformat}
Exception in thread "CleanerThread" javax.jdo.JDODataStoreException: 
Transaction failed to commit
at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543)
at 
org.datanucleus.api.jdo.JDOTransaction.rollback(JDOTransaction.java:189)
at 
org.apache.hadoop.hive.metastore.ObjectStore.rollbackTransaction(ObjectStore.java:790)
at 
org.apache.hadoop.hive.metastore.ObjectStore.rollbackAndCleanup(ObjectStore.java:10425)
at 
org.apache.hadoop.hive.metastore.ObjectStore.cleanNotificationEvents(ObjectStore.java:9258)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
at com.sun.proxy.$Proxy22.cleanNotificationEvents(Unknown Source)
at 
org.apache.hive.hcatalog.listener.DbNotificationListener$CleanerThread.run(DbNotificationListener.java:737)
NestedThrowablesStackTrace:
Unexpected exception encountered during query.
org.datanucleus.exceptions.NucleusDataStoreException: Unexpected exception 
encountered during query.
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.close(ConnectionFactoryImpl.java:569)
at 
org.datanucleus.store.connection.ConnectionManagerImpl$2.transactionRolledBack(ConnectionManagerImpl.java:400)
at 
org.datanucleus.TransactionImpl.internalRollback(TransactionImpl.java:534)
at org.datanucleus.TransactionImpl.rollback(TransactionImpl.java:451)
at 
org.datanucleus.api.jdo.JDOTransaction.rollback(JDOTransaction.java:185)
at 
org.apache.hadoop.hive.metastore.ObjectStore.rollbackTransaction(ObjectStore.java:790)
at 
org.apache.hadoop.hive.metastore.ObjectStore.rollbackAndCleanup(ObjectStore.java:10425)
at 
org.apache.hadoop.hive.metastore.ObjectStore.cleanNotificationEvents(ObjectStore.java:9258)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
at com.sun.proxy.$Proxy22.cleanNotificationEvents(Unknown Source)
at 
org.apache.hive.hcatalog.listener.DbNotificationListener$CleanerThread.run(DbNotificationListener.java:737)
Caused by: java.sql.SQLException: Unexpected exception encountered during query.
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:963)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:896)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:885)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:860)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2582)
at 
com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:4698)
at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4590)
at 
com.zaxxer.hikari.pool.ProxyConnection.close(ProxyConnection.java:233)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.close(ConnectionFactoryImpl.java:557)
... 14 more
Caused by: java.lang.NullPointerException
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2545)
... 18 more
Nested Throwables StackTrace:
java.sql.SQLException: Unexpected exception encountered during query.
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:963)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:896)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:885)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:860)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2582)
at 
com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:4698)
at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4590)
at 

[jira] [Created] (HIVE-19050) DBNotificationListener does not catch exceptions in the cleaner thread

2018-03-26 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19050:
--

 Summary: DBNotificationListener does not catch exceptions in the 
cleaner thread
 Key: HIVE-19050
 URL: https://issues.apache.org/jira/browse/HIVE-19050
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Standalone Metastore
Affects Versions: 3.0.0, 2.4.0
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


The DbNotificationListener class has a separate thread which cleans the old 
notifications from the database. Here is the snippet from the {{run}} method.

{noformat}
public void run() {
  while (true) {
rs.cleanNotificationEvents(ttl);
LOG.debug("Cleaner thread done");
try {
  Thread.sleep(sleepTime);
} catch (InterruptedException e) {
  LOG.info("Cleaner thread sleep interrupted", e);
}
  }
}
{noformat}

If {{rs.cleanNotificationEvents}} throws a RuntimeException (which datanucleus 
can throw), the exception remains uncaught and the thread dies. This can lead 
to older notifications never getting cleaned until we restart HMS. Given that 
many operations generate loads of events, the notification log table can 
quickly accumulate thousands of rows which never get cleaned up.
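
A minimal hardening sketch (not the actual DbNotificationListener code; the 
interface and method names here are made up for illustration): wrapping the 
per-iteration work in a catch-all keeps one failed cleanup from killing the 
long-running thread.

```java
public class CleanerLoopSketch {
    interface Cleaner {
        void clean() throws Exception;
    }

    // Run the cleaner a fixed number of times; a failure in one iteration
    // is reported (here: printed) instead of propagating and killing the
    // surrounding thread. Returns the number of failed iterations.
    static int runIterations(Cleaner cleaner, int iterations) {
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                cleaner.clean();
            } catch (Exception e) {          // also catches RuntimeException
                failures++;
                System.out.println("cleanup failed: " + e.getMessage());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // A cleaner that fails on every other call, as a flaky datastore might.
        final int[] calls = {0};
        int failures = runIterations(() -> {
            if (calls[0]++ % 2 == 0) {
                throw new RuntimeException("transient datastore error");
            }
        }, 4);
        System.out.println(failures + " failures, loop still running");
    }
}
```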





[jira] [Created] (HIVE-19041) Thrift deserialization of Partition objects should intern fields

2018-03-23 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19041:
--

 Summary: Thrift deserialization of Partition objects should intern 
fields
 Key: HIVE-19041
 URL: https://issues.apache.org/jira/browse/HIVE-19041
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 2.3.2, 3.0.0
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


When a client creates a large number of partitions, the thrift objects are 
deserialized into Partition objects. The read method of these objects does not 
intern the input format, location, and output format, which causes a large 
number of duplicate Strings in the HMS memory. We should intern these fields 
during deserialization to reduce memory pressure.
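
The effect of interning can be seen with plain {{String.intern()}} (a sketch of 
the idea only; the format-class string below is just an example value): two 
distinct String instances with equal contents collapse to one canonical object, 
so repeated metadata strings stop occupying memory once per partition.

```java
public class InternSketch {
    public static void main(String[] args) {
        // Two deserialized partitions typically repeat the same input format
        // class name in two distinct String instances on the heap.
        String fmt1 = new String("org.apache.hadoop.mapred.TextInputFormat");
        String fmt2 = new String("org.apache.hadoop.mapred.TextInputFormat");

        // Distinct objects, even though the contents are equal.
        System.out.println(fmt1 == fmt2);
        // After interning, both refer to the single canonical copy.
        System.out.println(fmt1.intern() == fmt2.intern());
    }
}
```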





[jira] [Created] (HIVE-19000) Fix TestNegativeCliDriver

2018-03-20 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-19000:
--

 Summary: Fix TestNegativeCliDriver
 Key: HIVE-19000
 URL: https://issues.apache.org/jira/browse/HIVE-19000
 Project: Hive
  Issue Type: Test
Reporter: Vihang Karajgaonkar


TestNegativeCliDriver has been failing for a while. We should investigate and fix it.





[jira] [Created] (HIVE-18964) HiveServer2 should not log errors when clients issue connection reset

2018-03-15 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-18964:
--

 Summary: HiveServer2 should not log errors when clients issue 
connection reset
 Key: HIVE-18964
 URL: https://issues.apache.org/jira/browse/HIVE-18964
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 2.3.2, 3.0.0
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


HiveServer2 logs an ugly exception trace when clients issue a connection reset. 
There is nothing we can do when the connection is reset, so this exception trace 
should either be ignored or logged only in debug mode.

Things become worse if you are using a load balancer like HAProxy, which has its 
own health checks. HAProxy issues a connection reset to "quickly" close the 
connection once it finds that HS2 is up and available. This spams the logs at a 
very high frequency and makes them unusable for debugging purposes.





[jira] [Created] (HIVE-18719) Metastore tests failing

2018-02-14 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-18719:
--

 Summary: Metastore tests failing
 Key: HIVE-18719
 URL: https://issues.apache.org/jira/browse/HIVE-18719
 Project: Hive
  Issue Type: Test
  Components: Metastore
Reporter: Vihang Karajgaonkar


Some of the recently added metastore tests are failing regularly, possibly 
because the socket is in use.

Here is one such run: https://builds.apache.org/job/PreCommit-HIVE-Build/9218/

org.apache.hadoop.hive.metastore.client.TestFunctions.testGetFunctionNullDatabase[Embedded]
 (batchId=205)
org.apache.hadoop.hive.metastore.client.TestTablesGetExists.testGetAllTablesCaseInsensitive[Embedded]
 (batchId=205)
org.apache.hadoop.hive.metastore.client.TestTablesList.testListTableNamesByFilterNullDatabase[Embedded]
 (batchId=205)





[jira] [Created] (HIVE-18654) Add Hiveserver2 specific HADOOP_OPTS environment variable

2018-02-07 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-18654:
--

 Summary: Add Hiveserver2 specific HADOOP_OPTS environment variable 
 Key: HIVE-18654
 URL: https://issues.apache.org/jira/browse/HIVE-18654
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


HIVE-2665 added support for a metastore-specific HADOOP_OPTS environment 
variable. This is helpful for debugging, especially if you want to add some JVM 
parameters to the metastore's process. A similar setting for HiveServer2 is 
missing and could be very helpful for debugging.





[jira] [Created] (HIVE-18553) VectorizedParquetReader fails after adding a new column to table

2018-01-25 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-18553:
--

 Summary: VectorizedParquetReader fails after adding a new column 
to table
 Key: HIVE-18553
 URL: https://issues.apache.org/jira/browse/HIVE-18553
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 2.3.2, 3.0.0, 2.4.0
Reporter: Vihang Karajgaonkar


VectorizedParquetReader throws an exception when trying to read from a 
Parquet table to which new columns have been added. Steps to reproduce below:

{code}
0: jdbc:hive2://localhost:1/default> desc test_p;
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| t1        | tinyint    |          |
| t2        | tinyint    |          |
| i1        | int        |          |
| i2        | int        |          |
+-----------+------------+----------+
0: jdbc:hive2://localhost:1/default> set hive.fetch.task.conversion=none;
0: jdbc:hive2://localhost:1/default> set 
hive.vectorized.execution.enabled=true;
0: jdbc:hive2://localhost:1/default> alter table test_p add columns (ts 
timestamp);
0: jdbc:hive2://localhost:1/default> select * from test_p;
Error: Error while processing statement: FAILED: Execution Error, return code 2 
from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
{code}

Following exception is seen in the logs

{code}
Caused by: java.lang.IllegalArgumentException: [ts] BINARY is not in the store: 
[[i1] INT32, [i2] INT32, [t1] INT32, [t2] INT32] 3
at 
org.apache.parquet.hadoop.ColumnChunkPageReadStore.getPageReader(ColumnChunkPageReadStore.java:160)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:479)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:432)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:393)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:345)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:88)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:167)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:52)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:229)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:142)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
 ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) 
~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) 
~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459) 
~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
 ~[hadoop-mapreduce-client-common-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[?:1.8.0_121]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_121]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[?:1.8.0_121]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
~[?:1.8.0_121]
at 
