[jira] [Created] (HIVE-22175) TestBudyAllocator#testMTT test is flaky
Adam Szita created HIVE-22175: - Summary: TestBudyAllocator#testMTT test is flaky Key: HIVE-22175 URL: https://issues.apache.org/jira/browse/HIVE-22175 Project: Hive Issue Type: Bug Reporter: Adam Szita This test has a fail rate of about 20%-25% -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (HIVE-22099) GenericUDFDateFormat can't handle Julian dates properly
Adam Szita created HIVE-22099: - Summary: GenericUDFDateFormat can't handle Julian dates properly Key: HIVE-22099 URL: https://issues.apache.org/jira/browse/HIVE-22099 Project: Hive Issue Type: Bug Reporter: Adam Szita Assignee: Adam Szita Currently, dates that belong to the Julian calendar (before Oct 15, 1582) are handled improperly by the DateFormat UDF: although the dates are in the Julian calendar, the formatter insists on printing them according to the Gregorian calendar, causing multiple days of difference in some cases:
{code:java}
beeline> select date_format('1001-01-05','dd---MM--');
+----------------+
|      _c0       |
+----------------+
| 30---12--1000  |
+----------------+
{code}
-- This message was sent by Atlassian JIRA (v7.6.14#76016)
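The shift can be reproduced outside Hive with plain JDK classes. Below is a minimal sketch (hypothetical class and method names, not the UDF's actual code) assuming UTC: a date parsed with java.time (proleptic Gregorian) comes out six days earlier when formatted through the legacy SimpleDateFormat, whose GregorianCalendar switches to Julian rules before Oct 15, 1582.

```java
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Date;
import java.util.TimeZone;

// Demo of the calendar mismatch: proleptic Gregorian parse, hybrid
// Julian/Gregorian format. Class and method names are illustrative only.
public class JulianShiftDemo {

  static String formatWithLegacyCalendar(String isoDate) {
    LocalDate d = LocalDate.parse(isoDate);                    // proleptic Gregorian
    Date legacy = Date.from(d.atStartOfDay(ZoneOffset.UTC).toInstant());
    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd"); // hybrid calendar, cutover 1582-10-15
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
    return fmt.format(legacy);
  }

  public static void main(String[] args) {
    // Prints 1000-12-30: the six-day Julian/Gregorian gap of the year 1001,
    // matching the 30---12--1000 output in the bug report above.
    System.out.println(formatWithLegacyCalendar("1001-01-05"));
  }
}
```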
[jira] [Created] (HIVE-22043) Make LLAP's Yarn package dir on HDFS configurable
Adam Szita created HIVE-22043: - Summary: Make LLAP's Yarn package dir on HDFS configurable Key: HIVE-22043 URL: https://issues.apache.org/jira/browse/HIVE-22043 Project: Hive Issue Type: New Feature Reporter: Adam Szita Assignee: Adam Szita Currently at LLAP launch we're using a hardwired HDFS directory to upload libs and configs that are required for LLAP daemons: this is {{.yarn}} under the hive user's home directory. I propose to make this configurable instead. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-21922) Allow keytabs to be reused in LLAP yarn applications through Yarn localization
Adam Szita created HIVE-21922: - Summary: Allow keytabs to be reused in LLAP yarn applications through Yarn localization Key: HIVE-21922 URL: https://issues.apache.org/jira/browse/HIVE-21922 Project: Hive Issue Type: New Feature Reporter: Adam Szita Assignee: Adam Szita In secure clusters LLAP has to be able to reach keytab files for kerberos login. Currently _hive.llap.task.scheduler.am.registry.keytab.file_ and _hive.llap.daemon.keytab.file_ configs are used to define the path of such keytabs on the Tez AM and LLAP daemon side respectively. Both presume local file system paths only - hence all nodes in the LLAP cluster (even those that eventually don't end up executing a daemon...) have to have Hive's keytab preinstalled on them. The above is described by this strategy: [Pre-installed_Keytabs_for_AM_and_containers|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html#Pre-installed_Keytabs_for_AM_and_containers] Another approach can be [Keytabs_for_AM_and_containers_distributed_via_YARN|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html#Keytabs_for_AM_and_containers_distributed_via_YARN] where we rely on HDFS and Yarn resource localization, and no prior keytab distribution is required. I intend to make this strategy an option for Hive-LLAP in this jira. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21866) LLAP status service driver may get stuck with wrong Yarn app ID
Adam Szita created HIVE-21866: - Summary: LLAP status service driver may get stuck with wrong Yarn app ID Key: HIVE-21866 URL: https://issues.apache.org/jira/browse/HIVE-21866 Project: Hive Issue Type: Bug Reporter: Adam Szita Assignee: Adam Szita LLAPStatusDriver might get stuck polling status from Yarn if the following happen in this order:
* there was a running LLAP Yarn app previously which is now finished / killed
* Yarn was restarted
* LLAPStatusDriver is invoked before any new LLAP app gets kicked off
* LLAPStatusDriver receives the old app ID, which is then cached in the Yarn serviceClient object (no eviction)
* if in the meantime any new LLAP app gets kicked off, LLAPStatusDriver will not see it, as it constantly retries fetching info about the wrong, old app ID (this is because we don't create new serviceClient objects)
{code:java}
ERROR status.LlapStatusServiceDriver: FAILED: 20: Failed to get Yarn AppReport
org.apache.hadoop.hive.llap.cli.status.LlapStatusCliException: 20: Failed to get Yarn AppReport
    at org.apache.hadoop.hive.llap.cli.status.LlapStatusServiceDriver.getAppReport(LlapStatusServiceDriver.java:292) [hive-llap-server-3.1.0.7.0.0.0-112.jar:3.1.0.7.0.0.0-134]
    at org.apache.hadoop.hive.llap.cli.status.LlapStatusServiceDriver.run(LlapStatusServiceDriver.java:209) [hive-llap-server-3.1.0.7.0.0.0-112.jar:3.1.0.7.0.0.0-134]
    at org.apache.hadoop.hive.llap.cli.status.LlapStatusServiceDriver.main(LlapStatusServiceDriver.java:537) [hive-llap-server-3.1.0.7.0.0.0-112.jar:3.1.0.7.0.0.0-134]
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21681) Describe formatted shows incorrect information for multiple primary keys
Adam Szita created HIVE-21681: - Summary: Describe formatted shows incorrect information for multiple primary keys Key: HIVE-21681 URL: https://issues.apache.org/jira/browse/HIVE-21681 Project: Hive Issue Type: Bug Reporter: Adam Szita Assignee: Adam Szita In tables with a primary key spanning multiple columns, 'describe formatted' shows a maximum of two column names only. In the ASCII-art table of 3 columns it will show: {{Column name|p1|p2}} Example queries:
{code:java}
CREATE TABLE test (
  p1 string,
  p2 string,
  p3 string,
  c0 int,
  PRIMARY KEY(p1,p2,p3) DISABLE NOVALIDATE
);
describe formatted test;
{code}
I propose we fix this so that primary key columns get listed one by one in separate rows, similar to how foreign keys are listed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result
Adam Szita created HIVE-21509: - Summary: LLAP may cache corrupted column vectors and return wrong query result Key: HIVE-21509 URL: https://issues.apache.org/jira/browse/HIVE-21509 Project: Hive Issue Type: Bug Components: llap Reporter: Adam Szita Assignee: Adam Szita -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21217) Optimize range calculation for PTF
Adam Szita created HIVE-21217: - Summary: Optimize range calculation for PTF Key: HIVE-21217 URL: https://issues.apache.org/jira/browse/HIVE-21217 Project: Hive Issue Type: Improvement Reporter: Adam Szita Assignee: Adam Szita During window function execution Hive has to iterate over the neighbouring rows of the current row to find the beginning and end of the proper range (on which the aggregation will be executed). When we're using range-based windows and have many rows with a certain key value, this can take a lot of time. (E.g. with a partition size of 80M rows in which we have 2 ranges of 40M rows according to the order-by column: within these 40M-row sets we're doing 40M x 40M/2 steps, which is O(n^2) time complexity.) I propose to introduce a cache that keeps track of already calculated range ends so they can be reused in future scans. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
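The caching idea can be sketched as follows (a simplified, hypothetical sketch with an int array standing in for the partition's order-by column, not the actual PTF code): all rows sharing the same order-by key have the same range end, so the first linear scan per distinct key is memoized and every later lookup for that key is O(1).

```java
import java.util.HashMap;
import java.util.Map;

// Memoized range-end lookup over a partition sorted by the order-by key.
// Names and types are illustrative only.
public class RangeEndCache {
  private final Map<Integer, Integer> cache = new HashMap<>();
  private final int[] orderByColumn; // partition rows, sorted by the order-by key

  RangeEndCache(int[] orderByColumn) {
    this.orderByColumn = orderByColumn;
  }

  // Index one past the last row whose key equals the key at rowIdx.
  int rangeEnd(int rowIdx) {
    int key = orderByColumn[rowIdx];
    Integer cached = cache.get(key);
    if (cached != null) {
      return cached;            // O(1) for every later row with the same key
    }
    int end = rowIdx;
    while (end < orderByColumn.length && orderByColumn[end] == key) {
      end++;                    // single linear scan per distinct key
    }
    cache.put(key, end);
    return end;
  }

  public static void main(String[] args) {
    RangeEndCache c = new RangeEndCache(new int[] {1, 1, 1, 2, 2, 3});
    System.out.println(c.rangeEnd(0)); // 3
    System.out.println(c.rangeEnd(1)); // 3, served from the cache
    System.out.println(c.rangeEnd(3)); // 5
  }
}
```

With the cache, a 40M-row set of equal keys costs one 40M scan instead of 40M scans, which is what removes the quadratic term.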
[jira] [Created] (HIVE-21176) SetSparkReducerParallelism should check spark.executor.instances before opening SparkSession during compilation
Adam Szita created HIVE-21176: - Summary: SetSparkReducerParallelism should check spark.executor.instances before opening SparkSession during compilation Key: HIVE-21176 URL: https://issues.apache.org/jira/browse/HIVE-21176 Project: Hive Issue Type: Bug Reporter: Adam Szita Assignee: Adam Szita {{SetSparkReducerParallelism}} creates a Spark session in the compilation stage while holding the compile lock. This is a very expensive operation and can cause a complete slowdown of all Hive queries. The problem only occurs when dynamic allocation is disabled, but we should find a way to improve this: e.g. if spark.executor.instances is set, we already know how many executors will be launched. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
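The proposed shortcut can be sketched roughly as below (hypothetical names, with a plain Map standing in for the Hive/Spark configuration objects): when dynamic allocation is off and spark.executor.instances is set, the executor count is known from configuration alone and no SparkSession needs to be opened under the compile lock.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; not the actual SetSparkReducerParallelism code.
public class ExecutorCountShortcut {

  // Returns the executor count if it can be derived from config, else null.
  static Integer executorCountFromConf(Map<String, String> conf) {
    boolean dynamicAllocation =
        Boolean.parseBoolean(conf.getOrDefault("spark.dynamicAllocation.enabled", "false"));
    String instances = conf.get("spark.executor.instances");
    if (!dynamicAllocation && instances != null) {
      return Integer.valueOf(instances); // known up front, no session needed
    }
    return null; // fall back to asking a live SparkSession
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("spark.executor.instances", "8");
    System.out.println(executorCountFromConf(conf)); // 8
  }
}
```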
[jira] [Created] (HIVE-21097) Replace SparkConf dependency from HS2 with Hadoop Conf
Adam Szita created HIVE-21097: - Summary: Replace SparkConf dependency from HS2 with Hadoop Conf Key: HIVE-21097 URL: https://issues.apache.org/jira/browse/HIVE-21097 Project: Hive Issue Type: Sub-task Reporter: Adam Szita Assignee: Adam Szita -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21096) Remove unnecessary Spark dependency from HS2 process
Adam Szita created HIVE-21096: - Summary: Remove unnecessary Spark dependency from HS2 process Key: HIVE-21096 URL: https://issues.apache.org/jira/browse/HIVE-21096 Project: Hive Issue Type: Improvement Components: HiveServer2, Spark Reporter: Adam Szita Assignee: Adam Szita When a HiveOnSpark job is kicked off, most of the work is done by the RemoteDriver, which is a separate process. There are a couple of smaller parts of the code where the HS2 process depends on Spark jars; these include, for example, receiving stats from the driver or putting together a Spark conf object - used mostly during communication with the RemoteDriver. We can limit the data types used for such communication so that we don't use (and serialize) types that are in the Spark codebase, and hence we can refactor our code to only use Spark jars in the RemoteDriver process. I think this would be cleaner from a dependency point of view, and also less error-prone when users have to assemble the classpath for their HS2 processes. (E.g. due to a change between Spark 2.2 and 2.4 we had to also include spark-unsafe*.jar - though it's an internal change to Spark.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21015) HCatLoader can't provide statistics for tables not in default DB
Adam Szita created HIVE-21015: - Summary: HCatLoader can't provide statistics for tables not in default DB Key: HIVE-21015 URL: https://issues.apache.org/jira/browse/HIVE-21015 Project: Hive Issue Type: Bug Reporter: Adam Szita Assignee: Adam Szita This is due to a former change (HIVE-20330) that does not take the database into consideration when retrieving the proper InputJobInfo for the loader. Found during testing:
{code:java}
2018-12-05 07:52:16,599 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - Couldn't get statistics from LoadFunc: org.apache.hive.hcatalog.pig.HCatLoader@492fa72a
java.io.IOException: java.io.IOException: Could not calculate input size for location (table) tpcds_3000_decimal_parquet.date_dim
    at org.apache.hive.hcatalog.pig.HCatLoader.getStatistics(HCatLoader.java:281)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator.getInputSizeFromLoader(InputSizeReducerEstimator.java:171)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator.getTotalInputFileSize(InputSizeReducerEstimator.java:118)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator.getTotalInputFileSize(InputSizeReducerEstimator.java:97)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator.estimateNumberOfReducers(InputSizeReducerEstimator.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:1148)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.calculateRuntimeReducers(JobControlCompiler.java:1115)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.adjustNumReducers(JobControlCompiler.java:1063)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:564)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:333)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:221)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:293)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1475)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1460)
    at org.apache.pig.PigServer.storeEx(PigServer.java:1119)
    at org.apache.pig.PigServer.store(PigServer.java:1082)
    at org.apache.pig.PigServer.openIterator(PigServer.java:995)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:782)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:383)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
    at org.apache.pig.Main.run(Main.java:630)
    at org.apache.pig.Main.main(Main.java:175)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:313)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:227)
Caused by: java.io.IOException: Could not calculate input size for location (table) tpcds_3000_decimal_parquet.date_dim
    at org.apache.hive.hcatalog.pig.HCatLoader.getStatistics(HCatLoader.java:276)
    ... 29 more
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20487) Update committer's list at hive.apache.org/people.html
Adam Szita created HIVE-20487: - Summary: Update committer's list at hive.apache.org/people.html Key: HIVE-20487 URL: https://issues.apache.org/jira/browse/HIVE-20487 Project: Hive Issue Type: Bug Reporter: Adam Szita Assignee: Adam Szita Adding pvary, kuczoram and szita -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20384) Fix flakiness of erasure_commands.q
Adam Szita created HIVE-20384: - Summary: Fix flakiness of erasure_commands.q Key: HIVE-20384 URL: https://issues.apache.org/jira/browse/HIVE-20384 Project: Hive Issue Type: Bug Reporter: Adam Szita Assignee: Adam Szita Qtest erasure_commands.q might fail if erasure_simple.q precedes it in the same batch -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20330) HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs
Adam Szita created HIVE-20330: - Summary: HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs Key: HIVE-20330 URL: https://issues.apache.org/jira/browse/HIVE-20330 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Adam Szita Assignee: Adam Szita While running performance tests on Pig (0.12 and 0.17) we've observed a huge performance drop in a workload that has multiple inputs from HCatLoader. The reason is that for a particular MR job with multiple Hive tables as input, Pig calls {{setLocation}} on each {{LoaderFunc (HCatLoader)}} instance but only one table's information (InputJobInfo instance) gets tracked in the JobConf. (This is under config key {{HCatConstants.HCAT_KEY_JOB_INFO}}). Any such call overwrites preexisting values, and thus only the last table's information will be considered when Pig calls {{getStatistics}} to calculate and estimate required reducer count. In cases when there are 2 input tables, 256GB and 1MB in size respectively, Pig will query the size information from HCat for both of them, but it will either see 1MB+1MB=2MB or 256GB+256GB=0.5TB depending on input order in the execution plan's DAG. It should of course see 256.00097GB in total and use 257 reducers by default accordingly. In unlucky cases this will be 2MB and 1 reducer will have to struggle with 256GB... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
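The fix direction can be sketched as follows (a heavily simplified, hypothetical sketch: a Map of Strings stands in for the JobConf and the serialized InputJobInfo objects, and the key name is a stand-in for HCatConstants.HCAT_KEY_JOB_INFO): keep a list of per-table input infos and append on each setLocation call instead of overwriting a single value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not the actual HCatalog code.
public class InputJobInfoList {
  static final String KEY = "hcat.job.info.list"; // stand-in for HCAT_KEY_JOB_INFO
  private final Map<String, List<String>> conf = new HashMap<>(); // stand-in for JobConf

  // Called once per input table (i.e. per setLocation): append, don't overwrite.
  void addInputJobInfo(String tableInfo) {
    conf.computeIfAbsent(KEY, k -> new ArrayList<>()).add(tableInfo);
  }

  // getStatistics can then find every input table's info, not just the last one.
  List<String> allInputJobInfos() {
    return conf.getOrDefault(KEY, new ArrayList<>());
  }

  public static void main(String[] args) {
    InputJobInfoList c = new InputJobInfoList();
    c.addInputJobInfo("db.big_table");
    c.addInputJobInfo("db.small_table");
    System.out.println(c.allInputJobInfos().size()); // 2: both tables are visible
  }
}
```

With both entries retained, the reducer estimator can sum the two real table sizes instead of double-counting whichever table happened to be registered last.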
[jira] [Created] (HIVE-19969) Dependency order (dirlist) assessment fails in yetus run
Adam Szita created HIVE-19969: - Summary: Dependency order (dirlist) assessment fails in yetus run Key: HIVE-19969 URL: https://issues.apache.org/jira/browse/HIVE-19969 Project: Hive Issue Type: Bug Reporter: Adam Szita Assignee: Adam Szita As seen here, the dirlist step of yetus fails to determine the order of modules to be built. It silently falls back to alphabetical order, which may or may not work depending on the patch.
{code:java}
Thu Jun 21 02:43:04 UTC 2018
cd /data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958
mvn -q exec:exec -Dexec.executable=pwd -Dexec.args=''
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958/storage-api
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958/upgrade-acid
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958/classification
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958/shims/common
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958/shims/0.23
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958/shims/scheduler
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958/shims/aggregator
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958/common
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958/service-rpc
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-11958/serde
Usage: java [-options] class [args...]
           (to execute a class)
   or  java [-options] -jar jarfile [args...]
           (to execute a jar file)
where options include:
{code}
The problem is in the standalone-metastore module: the 'exec' maven plugin has a global config setting {{executable=java}}, disregarding the dirlist task's {{-Dexec.executable=pwd}} and causing the above error. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19944) Investigate and fix version mismatch of GCP
Adam Szita created HIVE-19944: - Summary: Investigate and fix version mismatch of GCP Key: HIVE-19944 URL: https://issues.apache.org/jira/browse/HIVE-19944 Project: Hive Issue Type: Sub-task Reporter: Adam Szita Assignee: Adam Szita We've observed that adding a new image to the ptest GCP project breaks our currently working infrastructure when we try to restart the hive ptest server. This is because upon initialization the project's images are queried, and we immediately get an exception for newly added images - they lack a field that our client treats as mandatory. I believe an upgrade is needed on our side for the GCP libs we depend on. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19583) Some yetus working dirs are left on hiveptest-server-upstream disk after test
Adam Szita created HIVE-19583: - Summary: Some yetus working dirs are left on hiveptest-server-upstream disk after test Key: HIVE-19583 URL: https://issues.apache.org/jira/browse/HIVE-19583 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Adam Szita Assignee: Adam Szita PTest's PrepPhase creates a yetus working folder for each build after checking out the source code. The source code is then copied into that folder for yetus. This folder is cleaned up after the test has executed, so if that doesn't happen (e.g. due to the patch not being applicable) the folder is left on the disk. We need to remove it in this case too. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19077) Handle duplicate ptests requests standing in queue at the same time
Adam Szita created HIVE-19077: - Summary: Handle duplicate ptests requests standing in queue at the same time Key: HIVE-19077 URL: https://issues.apache.org/jira/browse/HIVE-19077 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Adam Szita Assignee: Adam Szita I've been keeping an eye on our {{PreCommit-HIVE-Build}} job, and what I noticed is that sometimes huge queues can build up that contain jiras more than once. (Yesterday I saw a queue of 40 containing only 31 distinct jiras.) A simple scenario is that I upload a patch, it gets queued for ptest (an already long queue), and 3 hours later I update it, re-upload and re-queue. The current ptest infra seems to be smart enough to always deal with the latest patch, so what will happen is that the same patch will be tested twice (~3 hours apart), most probably with the same result. I propose we do some deduplication: if ptest starts running the request for Jira X, it can take a look at the current queue and see if X is there again. If so, it can skip for now; it will be picked up later anyway. In practice this means that if you reconsider your patch and update it, your original place in the queue will be gone (like a penalty for changing it), but overall it saves resources for the whole community. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
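The proposed deduplication can be sketched as below (hypothetical names, with jira keys as plain Strings in a Deque standing in for the real build queue): a request is skipped when the same jira appears again later in the queue, since the later run will pick up the newest patch anyway.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only; not the actual ptest scheduler code.
public class PtestQueueDedup {

  // Pops requests until one is found that has no newer duplicate still queued.
  static String nextToRun(Deque<String> queue) {
    while (!queue.isEmpty()) {
      String jira = queue.poll();
      if (!queue.contains(jira)) {
        return jira; // no newer duplicate pending, safe to run
      }
      // duplicate still in the queue: drop this stale request
    }
    return null;
  }

  public static void main(String[] args) {
    Deque<String> queue = new ArrayDeque<>();
    queue.add("HIVE-1");
    queue.add("HIVE-2");
    queue.add("HIVE-1"); // re-uploaded patch for HIVE-1
    System.out.println(nextToRun(queue)); // HIVE-2: the stale HIVE-1 request was skipped
  }
}
```

Note how the skipped jira loses its original place in the queue, exactly the trade-off described above.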
[jira] [Created] (HIVE-19036) Fix whitespace error in testconfiguration.properties after HIVE-14032
Adam Szita created HIVE-19036: - Summary: Fix whitespace error in testconfiguration.properties after HIVE-14032 Key: HIVE-19036 URL: https://issues.apache.org/jira/browse/HIVE-19036 Project: Hive Issue Type: Bug Reporter: Adam Szita Assignee: Adam Szita HIVE-14032 has introduced a whitespace error in {{itests/src/test/resources/testconfiguration.properties}} (left a space after \) which caused 300+ test failures in ptest. The reason ptest didn't pick this error up originally is that the commit of that Jira had some extra changes compared to the patch uploaded to HIVE-14032. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18706) Ensure each Yetus execution has its own separate working dir
Adam Szita created HIVE-18706: - Summary: Ensure each Yetus execution has its own separate working dir Key: HIVE-18706 URL: https://issues.apache.org/jira/browse/HIVE-18706 Project: Hive Issue Type: Improvement Reporter: Adam Szita Assignee: Adam Szita Currently all Yetus executions started asynchronously by ptest use the same working directory. This is not a problem in most cases because Yetus finishes in less than 30 minutes for small patches. For some oversized patches, however, it may take more time than the ptest test execution, thus overlapping with the next build. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18705) Improve HiveMetaStoreClient.dropDatabase
Adam Szita created HIVE-18705: - Summary: Improve HiveMetaStoreClient.dropDatabase Key: HIVE-18705 URL: https://issues.apache.org/jira/browse/HIVE-18705 Project: Hive Issue Type: Improvement Reporter: Adam Szita Assignee: Adam Szita {{HiveMetaStoreClient.dropDatabase}} has a strange implementation to ensure client-side hooks (for non-native tables, e.g. HBase) are dealt with. Currently it starts by retrieving all the tables from HMS, and then sends {{dropTable}} calls to HMS table by table. At the end comes a {{dropDatabase}} just to be sure :) I believe this could be refactored so that it speeds up dropping a DB in situations where the average table count per DB is very high. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18704) Add test.groups parameter setting to ptest
Adam Szita created HIVE-18704: - Summary: Add test.groups parameter setting to ptest Key: HIVE-18704 URL: https://issues.apache.org/jira/browse/HIVE-18704 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Adam Szita Assignee: Adam Szita -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18612) Build subprocesses under Yetus in Ptest use 1.7 jre instead of 1.8
Adam Szita created HIVE-18612: - Summary: Build subprocesses under Yetus in Ptest use 1.7 jre instead of 1.8 Key: HIVE-18612 URL: https://issues.apache.org/jira/browse/HIVE-18612 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Adam Szita Assignee: Adam Szita As per this jira comment, maven plugins run under Yetus that want to use the {{java}} executable are seeing a 1.7 java binary. In this particular case Yetus sets JAVA_HOME to a 1.8 JDK installation, and thus maven uses that, but any subsequently executed {{java}} will use the JRE found on PATH. This should be fixed by adding the proper {{java/bin}} (that of the JAVA_HOME setting) to PATH. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18604) DropDatabase cascade fails when there is an index in the DB
Adam Szita created HIVE-18604: - Summary: DropDatabase cascade fails when there is an index in the DB Key: HIVE-18604 URL: https://issues.apache.org/jira/browse/HIVE-18604 Project: Hive Issue Type: Bug Components: Metastore Reporter: Adam Szita Assignee: Adam Szita As seen in the [HMS API test|https://github.com/apache/hive/blob/master/standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestDatabases.java#L452], dropping a database (even with cascade) is failing when an index exists in the corresponding database, throwing MetaException:
{code:java}
MetaException(message:Exception thrown flushing changes to datastore)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:208)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
    at com.sun.proxy.$Proxy35.drop_table_with_environment_context(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.drop_table_with_environment_context(HiveMetaStoreClient.java:2495)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreClient.java:1092)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreClient.java:1007)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropDatabase(HiveMetaStoreClient.java:859)
    at org.apache.hadoop.hive.metastore.client.TestDatabases.testDropDatabaseWithIndexCascade(TestDatabases.java:470)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.junit.runners.Suite.runChild(Suite.java:127)
    at org.junit.runners.Suite.runChild(Suite.java:26)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
    at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: javax.jdo.JDODataStoreException: Exception thrown flushing changes to datastore
NestedThrowables:
java.sql.BatchUpdateException: DELETE on table 'TBLS' caused a violation of foreign key constraint 'IDXS_FK1' for key (2). The statement has been rolled back.
    at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543)
    at org.datanucleus.api.jdo.JDOTransaction.commit(JDOTransaction.java:171)
    at org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:745)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
    at com.sun.proxy.$Proxy33.commitTransaction(Unknown Source)
{code}
[jira] [Created] (HIVE-18567) ObjectStore.getPartitionNamesNoTxn doesn't handle max param properly
Adam Szita created HIVE-18567: - Summary: ObjectStore.getPartitionNamesNoTxn doesn't handle max param properly Key: HIVE-18567 URL: https://issues.apache.org/jira/browse/HIVE-18567 Project: Hive Issue Type: Bug Components: Metastore Reporter: Adam Szita Assignee: Adam Szita As per [this HMS API test case|https://github.com/apache/hive/commit/fa0a8d27d4149cc5cc2dbb49d8eb6b03f46bc279#diff-25c67d898000b53e623a6df9221aad5dR1044], listing partition names doesn't check the max param against MetaStoreConf.LIMIT_PARTITION_REQUEST (as other methods do via checkLimitNumberOfPartitionsByFilter), and it also behaves differently on the max=0 setting compared to other methods. We should make this consistent. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
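One possible shape of the consistency check is sketched below (hypothetical helper with simplified int types, not the actual checkLimitNumberOfPartitionsByFilter code): validate the requested max against the configured limit, treating a negative max as "no explicit limit" and rejecting, rather than silently truncating, requests that exceed it.

```java
// Illustrative sketch only; the real Hive check operates on metastore
// queries and HiveConf, and signals violations with a MetaException.
public class PartitionRequestLimit {

  // Returns true when the request is within the configured limit.
  static boolean withinLimit(int requestedMax, int configuredLimit) {
    if (configuredLimit < 0) {
      return true; // limit checking disabled
    }
    int effective = requestedMax < 0 ? Integer.MAX_VALUE : requestedMax;
    return effective <= configuredLimit; // otherwise the request should fail
  }

  public static void main(String[] args) {
    System.out.println(withinLimit(100, 1000));  // true
    System.out.println(withinLimit(5000, 1000)); // false: reject, don't truncate
    System.out.println(withinLimit(0, 1000));    // true: max=0 means "no names", consistently
  }
}
```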
[jira] [Created] (HIVE-18554) Fix false positive test ql.io.parquet.TestHiveSchemaConverter.testMap
Adam Szita created HIVE-18554: - Summary: Fix false positive test ql.io.parquet.TestHiveSchemaConverter.testMap Key: HIVE-18554 URL: https://issues.apache.org/jira/browse/HIVE-18554 Project: Hive Issue Type: Bug Reporter: Adam Szita Assignee: Adam Szita In test case {{testMap}} the AssertEquals check was returning a false positive result due to a Parquet bug: original types were not asserted in the equals method. This has been fixed here: [https://github.com/apache/parquet-mr/commit/878ebcd0bc2592fa9d5dda01117c07bc3c40bb33] What this test would produce after the Parquet fix is this: {code:java} expected: but was:{code} Once we upgrade to a Parquet lib with this fix in place, our test case will produce a failure too, hence I propose fixing it now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18542) Create tests to cover getTableMeta method
Adam Szita created HIVE-18542: - Summary: Create tests to cover getTableMeta method Key: HIVE-18542 URL: https://issues.apache.org/jira/browse/HIVE-18542 Project: Hive Issue Type: Sub-task Reporter: Adam Szita Assignee: Adam Szita -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18510) Enable running checkstyle on test sources as well
Adam Szita created HIVE-18510: - Summary: Enable running checkstyle on test sources as well Key: HIVE-18510 URL: https://issues.apache.org/jira/browse/HIVE-18510 Project: Hive Issue Type: Improvement Reporter: Adam Szita Assignee: Adam Szita Currently only source files are in the scope of checkstyle testing. We should expand the scope to include our testing code as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18468) Create tests to cover alterPartition and renamePartition methods
Adam Szita created HIVE-18468: - Summary: Create tests to cover alterPartition and renamePartition methods Key: HIVE-18468 URL: https://issues.apache.org/jira/browse/HIVE-18468 Project: Hive Issue Type: Sub-task Reporter: Adam Szita Assignee: Adam Szita -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-18443) Ensure git gc finished in ptest prep phase before copying repo
Adam Szita created HIVE-18443: - Summary: Ensure git gc finished in ptest prep phase before copying repo Key: HIVE-18443 URL: https://issues.apache.org/jira/browse/HIVE-18443 Project: Hive Issue Type: Sub-task Components: Testing Infrastructure Reporter: Adam Szita Assignee: Adam Szita In ptest's prep phase script we first check out the latest Hive code from git, and then make a copy of its contents (along with the .git folder) that will serve as Yetus' working directory. In some cases we can see errors such as {{+ cp -R . ../yetus cp: cannot stat './.git/gc.pid': No such file or directory}}, e.g. [here|https://issues.apache.org/jira/browse/HIVE-18372?focusedCommentId=16321507=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16321507] This is caused by git running its gc feature in the background when our prep script has already started copying. In cases where gc finishes while cp is running, we'll get this error. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
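The race described above could be avoided in the prep script by waiting for the background gc to finish before copying. This is a minimal sketch (function name and paths are illustrative, not the actual ptest script), relying on the fact that `.git/gc.pid` exists only while a `git gc` process is running:

```shell
# Hypothetical sketch: block until any background `git gc` finishes, then
# copy the repo, so `cp -R` never races with gc deleting .git/gc.pid mid-copy.
safe_copy_repo() {
    repo="$1"
    dest="$2"
    # .git/gc.pid is git's lock file for a running gc; poll until it is gone
    while [ -f "$repo/.git/gc.pid" ]; do
        sleep 1
    done
    cp -R "$repo/." "$dest"
}
```

An alternative is to run `git gc` explicitly in the foreground before the copy, ensuring git does not leave an auto-gc running in the background.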
[jira] [Created] (HIVE-18263) Ptest executions are sometimes multiple times slower due to dying executor slaves
Adam Szita created HIVE-18263: - Summary: Ptest executions are sometimes multiple times slower due to dying executor slaves Key: HIVE-18263 URL: https://issues.apache.org/jira/browse/HIVE-18263 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Adam Szita Assignee: Adam Szita -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-18212) Make sure Yetus check always has a full log
Adam Szita created HIVE-18212: - Summary: Make sure Yetus check always has a full log Key: HIVE-18212 URL: https://issues.apache.org/jira/browse/HIVE-18212 Project: Hive Issue Type: Sub-task Reporter: Adam Szita Assignee: Adam Szita -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-18085) Run checkstyle on storage-api module with proper configuration
Adam Szita created HIVE-18085: - Summary: Run checkstyle on storage-api module with proper configuration Key: HIVE-18085 URL: https://issues.apache.org/jira/browse/HIVE-18085 Project: Hive Issue Type: Bug Reporter: Adam Szita Assignee: Adam Szita Module storage-api is disconnected from the root Hive pom, so we need to add the proper plugin configuration. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-18084) Upgrade checkstyle version to support lambdas
Adam Szita created HIVE-18084: - Summary: Upgrade checkstyle version to support lambdas Key: HIVE-18084 URL: https://issues.apache.org/jira/browse/HIVE-18084 Project: Hive Issue Type: Sub-task Reporter: Adam Szita Assignee: Adam Szita -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-18030) HCatalog can't be used with Pig on Spark
Adam Szita created HIVE-18030: - Summary: HCatalog can't be used with Pig on Spark Key: HIVE-18030 URL: https://issues.apache.org/jira/browse/HIVE-18030 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Adam Szita Assignee: Adam Szita When using Pig on Spark in cluster mode, all queries containing HCatalog access are failing:
{code}
2017-11-03 12:39:19,268 [dispatcher-event-loop-19] INFO org.apache.spark.storage.BlockManagerInfo - Added broadcast_6_piece0 in memory on <>:<> (size: 83.0 KB, free: 408.5 MB)
2017-11-03 12:39:19,277 [task-result-getter-0] WARN org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 0.0 (TID 0, vc0918.halxg.cloudera.com, executor 2): java.lang.NullPointerException
	at org.apache.hadoop.security.Credentials.addAll(Credentials.java:401)
	at org.apache.hadoop.security.Credentials.addAll(Credentials.java:388)
	at org.apache.hive.hcatalog.pig.HCatLoader.setLocation(HCatLoader.java:128)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:147)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat$RecordReaderFactory.<init>(PigInputFormat.java:115)
	at org.apache.pig.backend.hadoop.executionengine.spark.running.PigInputFormatSpark$SparkRecordReaderFactory.<init>(PigInputFormatSpark.java:126)
	at org.apache.pig.backend.hadoop.executionengine.spark.running.PigInputFormatSpark.createRecordReader(PigInputFormatSpark.java:70)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:180)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:179)
	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:134)
	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{code}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
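The stack trace points at Credentials.addAll dereferencing internal state of the Credentials instance being merged in. The failure mode can be illustrated with a simplified, hypothetical stand-in (this is not Hadoop's real class): addAll walks the other object's internal map, so a map left null (for example after a custom serialization path) produces exactly this NullPointerException unless the merge is guarded.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical stand-in for org.apache.hadoop.security.Credentials
// to illustrate the NPE: addAll() iterates the other instance's internal map,
// so a defensive null check is needed when that map may not be initialized.
class MiniCredentials {
    Map<String, byte[]> secretKeys = new HashMap<>();

    void addAll(MiniCredentials other) {
        // Guard against an uninitialized source; without this check,
        // other.secretKeys == null would throw NullPointerException here.
        if (other == null || other.secretKeys == null) {
            return;
        }
        secretKeys.putAll(other.secretKeys);
    }
}
```

Whether the fix belongs in HCatLoader.setLocation (passing fully initialized credentials) or in the merge itself is exactly what this jira has to determine; the sketch only shows why the call blows up.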
[jira] [Created] (HIVE-17997) Add rat plugin and configuration to standalone metastore pom
Adam Szita created HIVE-17997: - Summary: Add rat plugin and configuration to standalone metastore pom Key: HIVE-17997 URL: https://issues.apache.org/jira/browse/HIVE-17997 Project: Hive Issue Type: Sub-task Reporter: Adam Szita Assignee: Adam Szita -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17996) Fix ASF headers
Adam Szita created HIVE-17996: - Summary: Fix ASF headers Key: HIVE-17996 URL: https://issues.apache.org/jira/browse/HIVE-17996 Project: Hive Issue Type: Sub-task Reporter: Adam Szita Assignee: Adam Szita Yetus check reports some ASF header related issues in Hive code. Let's fix them up. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17995) Run checkstyle on standalone-metastore module with proper configuration
Adam Szita created HIVE-17995: - Summary: Run checkstyle on standalone-metastore module with proper configuration Key: HIVE-17995 URL: https://issues.apache.org/jira/browse/HIVE-17995 Project: Hive Issue Type: Sub-task Reporter: Adam Szita Assignee: Adam Szita Maven module standalone-metastore is not connected to the Hive root pom, therefore if someone (or an automated Yetus check) runs {{mvn checkstyle}}, it will not consider the Hive-specific checkstyle settings (e.g. it validates row lengths against 80, not 100). We need to make sure the standalone-metastore pom has the proper checkstyle configuration. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
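In a module that does not inherit from the root pom, the checkstyle configuration has to be declared explicitly. A hedged sketch of such a pom fragment is below; the `configLocation` path and plugin placement are assumptions for illustration, not Hive's actual layout:

```xml
<!-- Sketch: point a disconnected module at the project's shared checkstyle
     rules (path illustrative) so row lengths are checked against 100, not
     the plugin's default of 80. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-checkstyle-plugin</artifactId>
  <configuration>
    <configLocation>${basedir}/../checkstyle/checkstyle.xml</configLocation>
  </configuration>
</plugin>
```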
[jira] [Created] (HIVE-17987) Certain metastore operations should use iterators of partitions over lists
Adam Szita created HIVE-17987: - Summary: Certain metastore operations should use iterators of partitions over lists Key: HIVE-17987 URL: https://issues.apache.org/jira/browse/HIVE-17987 Project: Hive Issue Type: Improvement Environment: On the HS2 side we have a PartitionIterable class to reduce memory load by using iterators of partitions instead of whole lists when querying them from HMS. Inside HMS there is no such feature and we should create a similar one (e.g. for alter table calls that have to apply a modification to each and every partition of the table). Reporter: Adam Szita -- This message was sent by Atlassian JIRA (v6.4.14#64029)
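The HS2-style idea proposed here for HMS can be sketched as follows; class and parameter names are hypothetical (this is not Hive's PartitionIterable), but the shape is the same: expose partitions as an Iterable that pulls fixed-size batches from the backing store lazily instead of materializing one huge list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical sketch of a lazily-batched partition iterable: fetchPage is
// a (offset, batchSize) -> batch callback standing in for an HMS query.
class BatchedPartitionIterable<T> implements Iterable<T> {
    private final BiFunction<Integer, Integer, List<T>> fetchPage;
    private final int batchSize;

    BatchedPartitionIterable(BiFunction<Integer, Integer, List<T>> fetchPage, int batchSize) {
        this.fetchPage = fetchPage;
        this.batchSize = batchSize;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private List<T> batch = new ArrayList<>();
            private int offset = 0;
            private int pos = 0;

            @Override
            public boolean hasNext() {
                if (pos < batch.size()) {
                    return true;
                }
                // Current batch exhausted: pull the next page lazily.
                batch = fetchPage.apply(offset, batchSize);
                offset += batch.size();
                pos = 0;
                return !batch.isEmpty();
            }

            @Override
            public T next() {
                return batch.get(pos++);
            }
        };
    }
}
```

Only one batch of partition objects is alive at a time, which is the memory-load reduction the description refers to.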
[jira] [Created] (HIVE-17969) Metastore to alter table in batches of partitions when renaming table
Adam Szita created HIVE-17969: - Summary: Metastore to alter table in batches of partitions when renaming table Key: HIVE-17969 URL: https://issues.apache.org/jira/browse/HIVE-17969 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Adam Szita Assignee: Adam Szita Priority: Major -- This message was sent by Atlassian JIRA (v6.4.14#64029)
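The batching idea named in the summary can be sketched generically; the helper below is hypothetical (not Hive's metastore code) and only shows the chunking pattern: apply the alteration to fixed-size slices of the partition list rather than in one huge call.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of batched partition alteration; names illustrative.
class PartitionBatcher {
    /** Applies alterBatch to fixed-size chunks of parts; returns the batch count. */
    static <T> int alterInBatches(List<T> parts, int batchSize, Consumer<List<T>> alterBatch) {
        int batches = 0;
        for (int i = 0; i < parts.size(); i += batchSize) {
            alterBatch.accept(parts.subList(i, Math.min(i + batchSize, parts.size())));
            batches++;
        }
        return batches;
    }
}
```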
[jira] [Created] (HIVE-17864) PTestClient cannot start during Precommit tests
Adam Szita created HIVE-17864: - Summary: PTestClient cannot start during Precommit tests Key: HIVE-17864 URL: https://issues.apache.org/jira/browse/HIVE-17864 Project: Hive Issue Type: Bug Reporter: Adam Szita Assignee: Adam Szita HIVE-17807 has bumped the version number in testutils/ptest2/pom.xml from 1.0 to 3.0, resulting in a failure to start PTestClient during Precommit runs: {code} java -cp '/home/jenkins/jenkins-slave/workspace/PreCommit-HIVE-Build/hive/build/hive/testutils/ptest2/target/hive-ptest-1.0-classes.jar:/home/jenkins/jenkins-slave/workspace/PreCommit-HIVE-Build/hive/build/hive/testutils/ptest2/target/lib/*' org.apache.hive.ptest.api.client.PTestClient --command testStart --outputDir /home/jenkins/jenkins-slave/workspace/PreCommit-HIVE-Build/hive/build/hive/testutils/ptest2/target --password '[***]' --testHandle PreCommit-HIVE-Build-7389 --endpoint http://104.198.109.242:8080/hive-ptest-1.0 --logsEndpoint http://104.198.109.242/logs/ --profile master-mr2 --patch https://issues.apache.org/jira/secure/attachment/12893016/HIVE-17842.0.patch --jira HIVE-17842 Error: Could not find or load main class org.apache.hive.ptest.api.client.PTestClient {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
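One hedged way to make the invocation robust against future pom version bumps is to resolve the jar by glob instead of hardcoding {{hive-ptest-1.0-classes.jar}}. The helper below is a sketch only (function name and paths are illustrative, not the actual Precommit script):

```shell
# Hypothetical sketch: build the PTestClient classpath from whatever
# hive-ptest-*-classes.jar the build produced, so a pom version bump
# (1.0 -> 3.0) cannot leave the classpath pointing at a missing jar.
ptest_classpath() {
    target="$1"
    jar=$(ls "$target"/hive-ptest-*-classes.jar 2>/dev/null | head -n 1)
    echo "$jar:$target/lib/*"
}
# Could then be used as:
#   java -cp "$(ptest_classpath .../testutils/ptest2/target)" \
#       org.apache.hive.ptest.api.client.PTestClient --command testStart ...
```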
[jira] [Created] (HIVE-17842) Run checkstyle on ptest2 module with proper configuration
Adam Szita created HIVE-17842: - Summary: Run checkstyle on ptest2 module with proper configuration Key: HIVE-17842 URL: https://issues.apache.org/jira/browse/HIVE-17842 Project: Hive Issue Type: Sub-task Reporter: Adam Szita Assignee: Adam Szita Maven module ptest2 is not connected to the Hive root pom, therefore if someone (or an automated Yetus check) runs {{mvn checkstyle}}, it will not consider the Hive-specific checkstyle settings (e.g. it validates row lengths against 80, not 100). We need to make sure the ptest2 pom has the proper checkstyle configuration. -- This message was sent by Atlassian JIRA (v6.4.14#64029)