[jira] [Created] (HIVE-21185) insert overwrite directory ... stored as nontextfile raises an exception with merge files enabled
chengkun jia created HIVE-21185:
--------------------------------

             Summary: insert overwrite directory ... stored as nontextfile raises an exception with merge files enabled
                 Key: HIVE-21185
                 URL: https://issues.apache.org/jira/browse/HIVE-21185
             Project: Hive
          Issue Type: Bug
          Components: Query Planning
    Affects Versions: 2.3.0, 2.1.1
            Reporter: chengkun jia

To reproduce:

{code:java}
-- init a table with several small files
create table multiple_small_files (id int);
insert into multiple_small_files values (1);
insert into multiple_small_files values (1);
insert into multiple_small_files values (1);
insert into multiple_small_files values (1);
insert into multiple_small_files values (1);
insert into multiple_small_files values (1);
insert into multiple_small_files values (1);
insert into multiple_small_files values (1);

-- enable small file merging
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;

insert overwrite directory '/path/to/hdfs' stored as avro select * from multiple_small_files;
{code}

This produces an exception like:

{code:java}
Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable Objavro.schema�{"type":"record","name":"baseRecord","fields":[{"name":"_col0","type":["null","int"],"default":null}]}�$$N���e(��� �$$N���e(���
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable Objavro.schema�{"type":"record","name":"baseRecord","fields":[{"name":"_col0","type":["null","int"],"default":null}]}�$$N���e(��� �$$N���e(���
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
	... 8 more
Caused by: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Expecting a AvroGenericRecordWritable
	at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:139)
	at org.apache.hadoop.hive.serde2.avro.AvroSerDe.deserialize(AvroSerDe.java:216)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:128)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:92)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:488)
	... 9 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
{code}

This issue affects not only the Avro format but every non-text storage format. The root cause is that Hive picks up the wrong input format in the file merge stage.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
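A minimal sketch of a possible workaround for the HIVE-21185 repro above, assuming the failure really is confined to the merge task as described: disabling the small-file merge settings for the affected query sidesteps the broken merge stage, at the cost of leaving multiple small output files. These are standard Hive session parameters, not a fix for the underlying bug.

{code:sql}
-- hypothetical workaround: skip the merge stage for this query only
set hive.merge.mapfiles=false;
set hive.merge.mapredfiles=false;

insert overwrite directory '/path/to/hdfs' stored as avro
select * from multiple_small_files;
{code}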
[jira] [Created] (HIVE-21184) Add Calcite plan to QueryPlan object
Jesus Camacho Rodriguez created HIVE-21184:
-------------------------------------------

             Summary: Add Calcite plan to QueryPlan object
                 Key: HIVE-21184
                 URL: https://issues.apache.org/jira/browse/HIVE-21184
             Project: Hive
          Issue Type: Improvement
          Components: CBO
            Reporter: Jesus Camacho Rodriguez
            Assignee: Jesus Camacho Rodriguez

The Calcite plan is more readable than the full DAG. Explain formatted/extended will print the plan.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
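For context, these are the explain statements HIVE-21184 refers to; with the change, their output would also carry the Calcite plan. The query and table name below are placeholders.

{code:sql}
-- hypothetical query; EXPLAIN FORMATTED / EXPLAIN EXTENDED would include
-- the Calcite plan in addition to the operator DAG
explain formatted
select key, count(*) from src group by key;
{code}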
[jira] [Created] (HIVE-21183) Interrupt wait time for FileCacheCleanupThread
Oliver Draese created HIVE-21183:
---------------------------------

             Summary: Interrupt wait time for FileCacheCleanupThread
                 Key: HIVE-21183
                 URL: https://issues.apache.org/jira/browse/HIVE-21183
             Project: Hive
          Issue Type: Improvement
          Components: llap
            Reporter: Oliver Draese
            Assignee: Oliver Draese

The FileCacheCleanupThread is waiting unnecessarily long for eviction counts to increment.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (HIVE-21182) Skip setting up hive scratch dir during planning
Vineet Garg created HIVE-21182:
-------------------------------

             Summary: Skip setting up hive scratch dir during planning
                 Key: HIVE-21182
                 URL: https://issues.apache.org/jira/browse/HIVE-21182
             Project: Hive
          Issue Type: Improvement
            Reporter: Vineet Garg
            Assignee: Vineet Garg

During the metadata-gathering phase Hive creates a staging/scratch dir which is later used by the FS operator (the FS operator sets up a staging dir within it for tasks to write to). Since the FS operator already does mkdirs to set up its staging dir, we can skip creating the scratch dir during the metadata-gathering phase; the FS operator will take care of setting up all the dirs.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
LZO Compression
Hello,

Quick question about LZO compression. After reading the docs, it seems to me that I must use DeprecatedLzoTextInputFormat in order to work with LZO files.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO

However, I am not sure that is correct. As I understand it, as long as there are no LZO index files and the LZO codec is installed, I should be able to use the standard Hive table definitions. The normal mapper facilities will see the file's '.lzo' extension and map it to the compression codec for unpacking. The specialized LZO TextInputFormat is only required when there is a mix of '.lzo' and '.lzo.index' files in the table: the normal facilities would try to process both kinds of files, because they do not know that the '.index' file is not part of the normal data set; it is simply metadata. Is this correct?
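For reference, the two table layouts being compared look roughly like this (column names are placeholders; the input/output format class names are the ones given on the LanguageManual+LZO wiki page linked above):

    -- plain definition, relying on the installed codec to handle '.lzo' files
    create table raw_lzo (line string);

    -- definition per the LZO wiki page, needed once '.lzo.index' files are present
    create table indexed_lzo (line string)
    stored as
      inputformat "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
      outputformat "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";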
[jira] [Created] (HIVE-21181) Hive pre-upgrade tool not working with HDFS HA, tries connecting to the nameservice as if it were a NameNode
Attila Csaba Marosi created HIVE-21181:
---------------------------------------

             Summary: Hive pre-upgrade tool not working with HDFS HA, tries connecting to the nameservice as if it were a NameNode
                 Key: HIVE-21181
                 URL: https://issues.apache.org/jira/browse/HIVE-21181
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 1.2.1
         Environment: Centos 7.4.1708, kernel 3.10.0-693.11.6.el7.x86_64, Ambari 2.6.2.2, HDP-2.6.5.0-292, Hive 1.2.1000, HDFS 2.7.3
            Reporter: Attila Csaba Marosi

While preparing production clusters for HDP-2.6.5 -> HDP-3.1 upgrades, we noticed issues with the hive-pre-upgrade tool. When we tried running it, we got this exception:

{noformat}
Found Acid table: default.hello_acid
2019-01-28 15:54:20,331 ERROR [main] acid.PreUpgradeTool (PreUpgradeTool.java:main(152)) - PreUpgradeTool failed
java.lang.IllegalArgumentException: java.net.UnknownHostException: mytestcluster
	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:439)
	at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:321)
	at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:696)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:636)
	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:160)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2796)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2830)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2812)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool.needsCompaction(PreUpgradeTool.java:417)
	at org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool.getCompactionCommands(PreUpgradeTool.java:384)
	at org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool.getCompactionCommands(PreUpgradeTool.java:374)
	at org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool.prepareAcidUpgradeInternal(PreUpgradeTool.java:235)
	at org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool.main(PreUpgradeTool.java:149)
Caused by: java.net.UnknownHostException: mytestcluster
	... 17 more
{noformat}

We tried running it on a kerberized test cluster built from the same blueprint as the production clusters, with HDP-2.6.5.0-292, Hive 1.2.1000, HDFS 2.7.3, with HDFS HA and without Hive HA. We enabled Hive ACID and created the same example ACID table as shown in https://hortonworks.com/tutorial/using-hive-acid-transactions-to-insert-update-and-delete-data/

We followed the steps described at https://docs.hortonworks.com/HDPDocuments/Ambari-2.7.3.0/bk_ambari-upgrade-major/content/prepare_hive_for_upgrade.html, kinit-ed, and used the "-Djavax.security.auth.useSubjectCredsOnly=false" parameter. Without the ACID table there is no issue.

I'm attaching the hdfs-site.xml and core-site.xml. Feel free to ping me directly on Slack if any additional detail is needed; we can reproduce the issue on a lab cluster any time.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
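For reference, a minimal ACID table definition that puts the pre-upgrade tool on the failing code path in HIVE-21181. The columns and names below are placeholders, not the exact tutorial DDL; on Hive 1.2 a transactional table has to be bucketed and stored as ORC.

{code:sql}
-- hypothetical ACID table; any transactional table makes PreUpgradeTool
-- scan its directories for compaction candidates, which triggers the
-- Path.getFileSystem() call that fails on the HA nameservice URI
create table hello_acid (key int, value string)
clustered by (key) into 3 buckets
stored as orc
tblproperties ('transactional'='true');
{code}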
[jira] [Created] (HIVE-21180) Fix branch-3 metastore test timeouts
Vihang Karajgaonkar created HIVE-21180:
---------------------------------------

             Summary: Fix branch-3 metastore test timeouts
                 Key: HIVE-21180
                 URL: https://issues.apache.org/jira/browse/HIVE-21180
             Project: Hive
          Issue Type: Test
    Affects Versions: 3.2.0
            Reporter: Vihang Karajgaonkar
            Assignee: Vihang Karajgaonkar

The module name below is wrong, since metastore-server doesn't exist on branch-3. This is most likely the reason why test batches are timing out on branch-3.

{noformat}
2019-01-29 00:32:17,765 INFO [HostExecutor 3] HostExecutor.executeTestBatch:262 Drone [user=hiveptest, host=104.198.216.224, instance=0] executing UnitTestBatch [name=228_UTBatch_standalone-metastore__metastore-server_20_tests, id=228, moduleName=standalone-metastore/metastore-server, batchSize=20, isParallel=true, testList=[TestPartitionManagement, TestCatalogNonDefaultClient, TestCatalogOldClient, TestHiveAlterHandler, TestTxnHandlerNegative, TestTxnUtils, TestFilterHooks, TestRawStoreProxy, TestLockRequestBuilder, TestHiveMetastoreCli, TestCheckConstraint, TestAddPartitions, TestListPartitions, TestFunctions, TestGetTableMeta, TestTablesCreateDropAlterTruncate, TestRuntimeStats, TestDropPartitions, TestTablesList, TestUniqueConstraint]] with bash /home/hiveptest/104.198.216.224-hiveptest-0/scratch/hiveptest-228_UTBatch_standalone-metastore__metastore-server_20_tests.sh
{noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (HIVE-21179) Move SampleHBaseKeyFactory* Into Main Code Line
BELUGA BEHR created HIVE-21179:
-------------------------------

             Summary: Move SampleHBaseKeyFactory* Into Main Code Line
                 Key: HIVE-21179
                 URL: https://issues.apache.org/jira/browse/HIVE-21179
             Project: Hive
          Issue Type: Improvement
          Components: HBase Handler
    Affects Versions: 3.1.0, 4.0.0
            Reporter: BELUGA BEHR

https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration

{quote}
"hbase.composite.key.factory" should be the fully qualified class name of a class implementing HBaseKeyFactory. See SampleHBaseKeyFactory2 for a fixed length example in the same package. This class must be on your classpath in order for the above example to work. TODO: place these in an accessible place; they're currently only in test code.
{quote}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
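For context, the quoted wiki section is about composite-key HBase table definitions roughly like the sketch below. The column mapping, the property placement (SERDEPROPERTIES vs. TBLPROPERTIES), and the factory class package are assumptions based on that page; today the Sample*HBaseKeyFactory classes only ship in the hbase-handler test tree, which is exactly what this issue proposes to change.

{code:sql}
-- hypothetical composite-key HBase table; the key factory class is
-- currently only available in test code
create table hbase_composite (key struct<col1:string, col2:string>, value string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties (
  'hbase.columns.mapping' = ':key,cf:val',
  'hbase.composite.key.factory' = 'org.apache.hadoop.hive.hbase.SampleHBaseKeyFactory2'
);
{code}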
[jira] [Created] (HIVE-21178) COLUMNS_V2[COMMENT] size differs between derby db & other dbs.
Venu Yanamandra created HIVE-21178:
-----------------------------------

             Summary: COLUMNS_V2[COMMENT] size differs between derby db & other dbs.
                 Key: HIVE-21178
                 URL: https://issues.apache.org/jira/browse/HIVE-21178
             Project: Hive
          Issue Type: Bug
            Reporter: Venu Yanamandra

Based on the SQL scripts for the Derby db, the size of COLUMNS_V2[COMMENT] is 4000:

https://github.com/apache/hive/tree/master/metastore/scripts/upgrade/derby

However, in the scripts for other databases, say MySQL, the column is limited to 256:

https://github.com/apache/hive/tree/master/metastore/scripts/upgrade/mysql

So for users who need to store longer column comments, the non-Derby databases cap the maximum comment size. Kindly review the discrepancy.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
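For illustration, the relevant column widths and the kind of statement that exposes the discrepancy look roughly like this (the column definitions are paraphrased from the linked schema directories, not copied verbatim):

{code:sql}
-- derby metastore schema:  COLUMNS_V2 "COMMENT" VARCHAR(4000)
-- mysql metastore schema:  COLUMNS_V2 `COMMENT` varchar(256)

-- a column comment longer than 256 characters fits on a Derby-backed
-- metastore but may fail or be truncated on a MySQL-backed one, e.g.:
alter table t change column id id int
comment '... a comment longer than 256 characters ...';
{code}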
[jira] [Created] (HIVE-21177) Optimize AcidUtils.getLogicalLength()
Eugene Koifman created HIVE-21177:
----------------------------------

             Summary: Optimize AcidUtils.getLogicalLength()
                 Key: HIVE-21177
                 URL: https://issues.apache.org/jira/browse/HIVE-21177
             Project: Hive
          Issue Type: Bug
          Components: Transactions
    Affects Versions: 3.0.0
            Reporter: Eugene Koifman
            Assignee: Eugene Koifman

{{AcidUtils.getLogicalLength()}} tries to look for the side file ({{OrcAcidUtils.getSideFile()}}) on the file system even when the file couldn't possibly be there, e.g. when the path is delta_x_x or base_x. It can only be there in delta_x_y with x != y.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[GitHub] hive pull request #523: HIVE-21029: External table replication for existing ...
GitHub user sankarh opened a pull request:

    https://github.com/apache/hive/pull/523

    HIVE-21029: External table replication for existing deployments running incremental replication.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sankarh/hive HIVE-21029

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/523.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #523

----
commit ccf630904d75a0ff099bc24160efd5d6c03ae02f
Author: Sankar Hariappan
Date:   2019-01-29T11:18:47Z

    HIVE-21029: External table replication for existing deployments running incremental replication.

----
[jira] [Created] (HIVE-21176) SetSparkReducerParallelism should check spark.executor.instances before opening SparkSession during compilation
Adam Szita created HIVE-21176:
------------------------------

             Summary: SetSparkReducerParallelism should check spark.executor.instances before opening SparkSession during compilation
                 Key: HIVE-21176
                 URL: https://issues.apache.org/jira/browse/HIVE-21176
             Project: Hive
          Issue Type: Bug
            Reporter: Adam Szita
            Assignee: Adam Szita

{{SetSparkReducerParallelism}} creates a Spark session in the compilation stage while holding the compile lock. This is a very expensive operation and can cause a complete slowdown of all Hive queries. The problem only occurs when dynamic allocation is disabled, but we should find a way to improve this: e.g. if spark.executor.instances is set, we already know how many executors will be launched.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
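For context, this is the kind of static-allocation setup HIVE-21176 is about (the values are placeholders): with dynamic allocation off and a fixed executor count already configured, the compiler arguably has enough information to size reducer parallelism without first spinning up a SparkSession.

{code:sql}
-- hypothetical Hive-on-Spark session settings with static allocation
set hive.execution.engine=spark;
set spark.dynamicAllocation.enabled=false;
-- executor count is known up front and could be used at compile time
set spark.executor.instances=10;
set spark.executor.cores=4;
{code}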