[jira] [Created] (HIVE-21185) insert overwrite directory ... stored as nontextfile raises exception when file merge is enabled

2019-01-29 Thread chengkun jia (JIRA)
chengkun jia created HIVE-21185:
---

 Summary: insert overwrite directory ... stored as nontextfile 
raises exception when file merge is enabled
 Key: HIVE-21185
 URL: https://issues.apache.org/jira/browse/HIVE-21185
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 2.3.0, 2.1.1
Reporter: chengkun jia


Steps to reproduce:

{code:java}
-- init table with small files
create table multiple_small_files (id int);
insert into multiple_small_files values (1);
insert into multiple_small_files values (1);
insert into multiple_small_files values (1);
insert into multiple_small_files values (1);
insert into multiple_small_files values (1);
insert into multiple_small_files values (1);
insert into multiple_small_files values (1);
insert into multiple_small_files values (1);

-- enable small-file merging
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;

insert overwrite directory '/path/to/hdfs' stored as avro
select * from multiple_small_files;
{code}
This produces an exception like:
{code:java}
Messages for this Task:Error: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing writable 
Objavro.schema�{"type":"record","name":"baseRecord","fields":[{"name":"_col0","type":["null","int"],"default":null}]}�$$N���e(���
                                                             �$$N���e(��� 
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169) at 
org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at 
org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) at 
org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at 
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)Caused by: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing writable 
Objavro.schema�{"type":"record","name":"baseRecord","fields":[{"name":"_col0","type":["null","int"],"default":null}]}�$$N���e(���
                                     �$$N���e(��� at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497) at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160) ... 8 
moreCaused by: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Expecting 
a AvroGenericRecordWritable at 
org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:139)
 at 
org.apache.hadoop.hive.serde2.avro.AvroSerDe.deserialize(AvroSerDe.java:216) at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:128)
 at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:92)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:488) 
... 9 moreFAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
{code}
 

This issue affects not only the Avro file format but all non-text storage 
formats. The root cause is that Hive gets the wrong input format in the file 
merge stage.
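
Until this is fixed, a possible workaround (my assumption, not verified on every affected version) is to disable the merge stage for such queries, since the failure happens when the merge task reads back the non-text output:

{code:sql}
-- Hypothetical workaround: skip the file-merge stage entirely so the merge
-- task never re-reads the Avro output with the wrong input format.
-- Trade-off: the small output files are left unmerged.
set hive.merge.mapfiles=false;
set hive.merge.mapredfiles=false;

insert overwrite directory '/path/to/hdfs' stored as avro
select * from multiple_small_files;
{code}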



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21184) Add Calcite plan to QueryPlan object

2019-01-29 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-21184:
--

 Summary: Add Calcite plan to QueryPlan object
 Key: HIVE-21184
 URL: https://issues.apache.org/jira/browse/HIVE-21184
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


The Calcite plan is more readable than the full DAG. EXPLAIN FORMATTED/EXTENDED 
will print the plan.





[jira] [Created] (HIVE-21183) Interrupt wait time for FileCacheCleanupThread

2019-01-29 Thread Oliver Draese (JIRA)
Oliver Draese created HIVE-21183:


 Summary: Interrupt wait time for FileCacheCleanupThread
 Key: HIVE-21183
 URL: https://issues.apache.org/jira/browse/HIVE-21183
 Project: Hive
  Issue Type: Improvement
  Components: llap
Reporter: Oliver Draese
Assignee: Oliver Draese


The FileCacheCleanupThread is waiting unnecessarily long for eviction counts to 
increment.





[jira] [Created] (HIVE-21182) Skip setting up hive scratch dir during planning

2019-01-29 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-21182:
--

 Summary: Skip setting up hive scratch dir during planning
 Key: HIVE-21182
 URL: https://issues.apache.org/jira/browse/HIVE-21182
 Project: Hive
  Issue Type: Improvement
Reporter: Vineet Garg
Assignee: Vineet Garg


During the metadata gathering phase, Hive creates a staging/scratch dir which is 
further used by the FS operator (the FS operator sets up a staging dir within 
this dir for tasks to write to).
Since the FS operator does mkdirs to set up its staging dir, we can skip 
creating the scratch dir during the metadata gathering phase; the FS operator 
will take care of setting up all the dirs.





LZO Compression

2019-01-29 Thread dam6923
Hello,

Quick question about LZO compression.

After reading the docs, it seems to me that I must
use DeprecatedLzoTextInputFormat in order to work with LZO files.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO

However, I am not sure that is correct.  As I understand it, as long as
there are no LZO index files and the LZO codec is installed, I should be
able to use the standard Hive table definitions.  The normal mapper
facilities will see the file's '.lzo' extension and map it to the
compression codec for unpacking.  The specialized LZO TextInputFormat is
only required when there is a mix of '.lzo' and '.lzo.index' files in the
table.  The normal facilities would try to process both files, because they
do not know that the '.index' file is not part of the normal data set; they
do not know that the index file is simply metadata.

Is this correct?
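
For concreteness, the two definitions I am comparing would look roughly like this (table names and location paths are made up; the class names are the ones from the wiki page):

{code:sql}
-- Plain definition: relies on the '.lzo' extension being mapped to the
-- installed codec; assumes no '.lzo.index' files exist alongside the data.
CREATE EXTERNAL TABLE logs_plain (line STRING)
STORED AS TEXTFILE
LOCATION '/data/logs_lzo';

-- Wiki-style definition: needed (as I understand it) once index files are
-- present, so they are skipped as data and used for input splitting instead.
CREATE EXTERNAL TABLE logs_indexed (line STRING)
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/data/logs_lzo';
{code}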


[jira] [Created] (HIVE-21181) Hive pre-upgrade tool not working with HDFS HA, tries connecting to nameservice as if it were a NameNode

2019-01-29 Thread Attila Csaba Marosi (JIRA)
Attila Csaba Marosi created HIVE-21181:
--

 Summary: Hive pre-upgrade tool not working with HDFS HA, tries 
connecting to nameservice as if it were a NameNode
 Key: HIVE-21181
 URL: https://issues.apache.org/jira/browse/HIVE-21181
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.1
 Environment: Centos 7.4.1708
kernel 3.10.0-693.11.6.el7.x86_64
Ambari 2.6.2.2
HDP-2.6.5.0-292
Hive 1.2.1000
HDFS 2.7.3
Reporter: Attila Csaba Marosi


While preparing production clusters for the HDP-2.6.5 -> HDP-3.1 upgrade, we've 
noticed issues with the hive-pre-upgrade tool. When we tried running it, we got 
the exception:

{code}
Found Acid table: default.hello_acid
2019-01-28 15:54:20,331 ERROR [main] acid.PreUpgradeTool (PreUpgradeTool.java:main(152)) - PreUpgradeTool failed
java.lang.IllegalArgumentException: java.net.UnknownHostException: mytestcluster
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:439)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:321)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:696)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:636)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:160)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2796)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2830)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2812)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool.needsCompaction(PreUpgradeTool.java:417)
    at org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool.getCompactionCommands(PreUpgradeTool.java:384)
    at org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool.getCompactionCommands(PreUpgradeTool.java:374)
    at org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool.prepareAcidUpgradeInternal(PreUpgradeTool.java:235)
    at org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool.main(PreUpgradeTool.java:149)
Caused by: java.net.UnknownHostException: mytestcluster
    ... 17 more
{code}


We tried running it on a kerberized test cluster built based on the same 
blueprint like the production clusters, with HDP-2.6.5.0-292, Hive 1.2.1000, 
HDFS 2.7.3, with HDFS HA and without Hive HA.
We enabled Hive ACID, created the same example ACID table as shown in 
https://hortonworks.com/tutorial/using-hive-acid-transactions-to-insert-update-and-delete-data/

We followed the steps described at 
https://docs.hortonworks.com/HDPDocuments/Ambari-2.7.3.0/bk_ambari-upgrade-major/content/prepare_hive_for_upgrade.html
 , kinit-ed, used the "-Djavax.security.auth.useSubjectCredsOnly=false" 
parameter.

Without the ACID table there is no issue.
I'm attaching the hdfs-site.xml and core-site.xml.
Feel free to ping me directly on Slack if any additional detail is needed; we 
can reproduce the issue on a lab cluster any time.






[jira] [Created] (HIVE-21180) Fix branch-3 metastore test timeouts

2019-01-29 Thread Vihang Karajgaonkar (JIRA)
Vihang Karajgaonkar created HIVE-21180:
--

 Summary: Fix branch-3 metastore test timeouts
 Key: HIVE-21180
 URL: https://issues.apache.org/jira/browse/HIVE-21180
 Project: Hive
  Issue Type: Test
Affects Versions: 3.2.0
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


The module name below is wrong, since metastore-server doesn't exist on 
branch-3. This is most likely the reason why test batches are timing out on 
branch-3.

{noformat}
2019-01-29 00:32:17,765  INFO [HostExecutor 3] 
HostExecutor.executeTestBatch:262 Drone [user=hiveptest, host=104.198.216.224, 
instance=0] executing UnitTestBatch 
[name=228_UTBatch_standalone-metastore__metastore-server_20_tests, id=228, 
moduleName=standalone-metastore/metastore-server, batchSize=20, 
isParallel=true, testList=[TestPartitionManagement, 
TestCatalogNonDefaultClient, TestCatalogOldClient, TestHiveAlterHandler, 
TestTxnHandlerNegative, TestTxnUtils, TestFilterHooks, TestRawStoreProxy, 
TestLockRequestBuilder, TestHiveMetastoreCli, TestCheckConstraint, 
TestAddPartitions, TestListPartitions, TestFunctions, TestGetTableMeta, 
TestTablesCreateDropAlterTruncate, TestRuntimeStats, TestDropPartitions, 
TestTablesList, TestUniqueConstraint]] with bash 
/home/hiveptest/104.198.216.224-hiveptest-0/scratch/hiveptest-228_UTBatch_standalone-metastore__metastore-server_20_tests.sh
{noformat}





[jira] [Created] (HIVE-21179) Move SampleHBaseKeyFactory* Into Main Code Line

2019-01-29 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-21179:
--

 Summary: Move SampleHBaseKeyFactory* Into Main Code Line
 Key: HIVE-21179
 URL: https://issues.apache.org/jira/browse/HIVE-21179
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 3.1.0, 4.0.0
Reporter: BELUGA BEHR


https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration

{quote}
"hbase.composite.key.factory" should be the fully qualified class name of a 
class implementing HBaseKeyFactory. See SampleHBaseKeyFactory2 for a fixed 
length example in the same package. This class must be on your classpath in 
order for the above example to work. TODO: place these in an accessible place; 
they're currently only in test code.
{quote}





[jira] [Created] (HIVE-21178) COLUMNS_V2[COMMENT] size differs between derby db & other dbs.

2019-01-29 Thread Venu Yanamandra (JIRA)
Venu Yanamandra created HIVE-21178:
--

 Summary: COLUMNS_V2[COMMENT] size differs between derby db & 
other dbs.
 Key: HIVE-21178
 URL: https://issues.apache.org/jira/browse/HIVE-21178
 Project: Hive
  Issue Type: Bug
Reporter: Venu Yanamandra


Based on the SQL scripts present for Derby, the size of COLUMNS_V2[COMMENT] 
is 4000:

[https://github.com/apache/hive/tree/master/metastore/scripts/upgrade/derby]

However, in other databases, e.g. MySQL, the column is limited to 256:

[https://github.com/apache/hive/tree/master/metastore/scripts/upgrade/mysql]

So for requirements that store larger comments, the non-Derby databases limit 
the maximum size of column comments while Derby does not.

Kindly review the discrepancy.





[jira] [Created] (HIVE-21177) Optimize AcidUtils.getLogicalLength()

2019-01-29 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-21177:
-

 Summary: Optimize AcidUtils.getLogicalLength()
 Key: HIVE-21177
 URL: https://issues.apache.org/jira/browse/HIVE-21177
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


{{AcidUtils.getLogicalLength()}} tries to look for the side file 
{{OrcAcidUtils.getSideFile()}} on the file system even when the file couldn't 
possibly be there, e.g. when the path is delta_x_x or base_x.  It can only be 
there in delta_x_y, where x != y.





[GitHub] hive pull request #523: HIVE-21029: External table replication for existing ...

2019-01-29 Thread sankarh
GitHub user sankarh opened a pull request:

https://github.com/apache/hive/pull/523

HIVE-21029: External table replication for existing deployments running 
incremental replication.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sankarh/hive HIVE-21029

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/523.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #523


commit ccf630904d75a0ff099bc24160efd5d6c03ae02f
Author: Sankar Hariappan 
Date:   2019-01-29T11:18:47Z

HIVE-21029: External table replication for existing deployments running 
incremental replication.




---


[jira] [Created] (HIVE-21176) SetSparkReducerParallelism should check spark.executor.instances before opening SparkSession during compilation

2019-01-29 Thread Adam Szita (JIRA)
Adam Szita created HIVE-21176:
-

 Summary: SetSparkReducerParallelism should check 
spark.executor.instances before opening SparkSession during compilation
 Key: HIVE-21176
 URL: https://issues.apache.org/jira/browse/HIVE-21176
 Project: Hive
  Issue Type: Bug
Reporter: Adam Szita
Assignee: Adam Szita


{{SetSparkReducerParallelism}} creates a Spark session in the compilation stage 
while holding the compile lock. This is a very expensive operation and can 
cause a complete slowdown of all Hive queries. The problem only occurs when 
dynamic allocation is disabled, but we should find a way to improve this:

e.g. if spark.executor.instances is set, we already know how many executors 
will be launched.
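
As an illustration of the idea (the exact arithmetic is my assumption, not the actual patch): with dynamic allocation off, the cluster capacity is already fully determined by configuration, so no SparkSession is needed to discover it:

{code:sql}
-- With dynamic allocation disabled, these settings alone determine capacity:
set spark.dynamicAllocation.enabled=false;
set spark.executor.instances=10;
set spark.executor.cores=4;
-- Available slots = instances * cores = 40; SetSparkReducerParallelism could
-- use a value derived from this as the reducer-parallelism bound instead of
-- opening a SparkSession under the compile lock.
{code}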


