[jira] [Created] (HIVE-25105) Support Parquet as MV storage format

2021-05-11 Thread Jesus Camacho Rodriguez (Jira)
Jesus Camacho Rodriguez created HIVE-25105:
--

 Summary: Support Parquet as MV storage format
 Key: HIVE-25105
 URL: https://issues.apache.org/jira/browse/HIVE-25105
 Project: Hive
  Issue Type: Improvement
  Components: Materialized views
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Currently the supported storage formats do not include Parquet:

{code}
...
HIVE_MATERIALIZED_VIEW_FILE_FORMAT("hive.materializedview.fileformat", 
"ORC",
new StringSet("none", "TextFile", "SequenceFile", "RCfile", "ORC"),
...
{code}
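
As a rough illustration of what the change amounts to, the sketch below validates a requested format against an allowed set that includes the proposed "Parquet" entry. The class and method names are hypothetical and are not part of HiveConf; case-insensitive matching is an assumption made for this example, not a statement about how HiveConf's StringSet behaves.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the validator behind hive.materializedview.fileformat.
public class MvFileFormatCheck {
    // Values copied from the snippet above, plus the proposed "Parquet" entry.
    static final Set<String> ALLOWED =
        caseInsensitive("none", "TextFile", "SequenceFile", "RCfile", "ORC", "Parquet");

    // Builds a set that ignores case when comparing (an assumption for this sketch).
    static Set<String> caseInsensitive(String... values) {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        for (String v : values) {
            set.add(v);
        }
        return set;
    }

    // Returns true when the requested format is one of the allowed values.
    static boolean isAllowed(String format) {
        return format != null && ALLOWED.contains(format);
    }

    public static void main(String[] args) {
        System.out.println("parquet allowed: " + isAllowed("parquet"));
        System.out.println("Avro allowed: " + isAllowed("Avro"));
    }
}
```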



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25104) Backward incompatible timestamp serialization in Parquet for certain timezones

2021-05-11 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-25104:
--

 Summary: Backward incompatible timestamp serialization in Parquet 
for certain timezones
 Key: HIVE-25104
 URL: https://issues.apache.org/jira/browse/HIVE-25104
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 3.1.2
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


HIVE-12192 and HIVE-20007 changed the way that timestamp computations are 
performed and, to some extent, how timestamps are serialized and deserialized 
in files (Parquet, Avro, ORC).

In versions that include HIVE-12192 or HIVE-20007 the serialization in Parquet 
files is not backwards compatible. In other words, writing timestamps with a 
version of Hive that includes HIVE-12192/HIVE-20007 and reading them with 
another version (one that does not include those issues) may lead to different 
results depending on the default timezone of the system.

Consider the following scenario where the default system timezone is set to 
US/Pacific.

At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3
{code:sql}
CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
 LOCATION '/tmp/hiveexttbl/employee';
INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
SELECT * FROM employee;
{code}
|1|1880-01-01 00:00:00|
|2|1884-01-01 00:00:00|
|3|1990-01-01 00:00:00|

At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356
{code:sql}
CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
 LOCATION '/tmp/hiveexttbl/employee';
SELECT * FROM employee;
{code}
|1|1879-12-31 23:52:58|
|2|1884-01-01 00:00:00|
|3|1990-01-01 00:00:00|

The timestamp for {{eid=1}} in branch-2.3 is different from the one in master.
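
The report does not state the cause, but the 7 minute 2 second shift is consistent with how the time zone database treats US/Pacific before standard time was adopted in November 1883: it records a local-mean-time offset of -07:52:58 rather than -08:00. The sketch below is only a way to inspect those offsets with java.time; the class name is hypothetical, and it uses America/Los_Angeles, the canonical zone ID that US/Pacific aliases. It does not reproduce Hive's Parquet code path.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper: prints the UTC offset that the zone rules assign
// to midnight on January 1 of each year used in the report.
public class PacificOffsetCheck {
    static ZoneOffset offsetAt(String zone, int year) {
        LocalDateTime midnight = LocalDateTime.of(year, 1, 1, 0, 0);
        return ZoneId.of(zone).getRules().getOffset(midnight);
    }

    public static void main(String[] args) {
        String zone = "America/Los_Angeles";
        for (int year : new int[] {1880, 1884, 1990}) {
            ZoneOffset offset = offsetAt(zone, year);
            // Difference in seconds from the modern standard offset of -08:00.
            int deltaSeconds = offset.getTotalSeconds() - ZoneOffset.ofHours(-8).getTotalSeconds();
            System.out.println(year + ": offset " + offset + ", delta vs -08:00 = " + deltaSeconds + "s");
        }
    }
}
```

Only the 1880 row falls before the switch to standard time, which matches the fact that {{eid=2}} and {{eid=3}} read back unchanged.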





[jira] [Created] (HIVE-25103) Update row.serde excludes defaults

2021-05-11 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-25103:
-

 Summary: Update row.serde excludes defaults
 Key: HIVE-25103
 URL: https://issues.apache.org/jira/browse/HIVE-25103
 Project: Hive
  Issue Type: Improvement
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


HIVE-16222 introduced the row.serde.inputformat.excludes setting to disable 
row.serde for specific non-vectorized formats.
Since MapredParquetInputFormat is now natively vectorized, it should be 
removed from that list.

Even when hive.vectorized.use.vectorized.input.format is disabled, the 
Vectorizer will not vectorize in row-deserialize mode if the input format is 
natively vectorized, so it is safe to remove.





[jira] [Created] (HIVE-25102) Cache Iceberg table objects within same query

2021-05-11 Thread László Pintér (Jira)
László Pintér created HIVE-25102:


 Summary: Cache Iceberg table objects within same query
 Key: HIVE-25102
 URL: https://issues.apache.org/jira/browse/HIVE-25102
 Project: Hive
  Issue Type: Improvement
Reporter: László Pintér
Assignee: László Pintér


We call Catalogs.loadTable(configuration, props) many times, which is costly.
We should:
 - Cache the result, maybe even globally, keyed on the queryId
 - Make sure that a query uses one snapshot during its whole execution
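
A minimal sketch of the caching idea, assuming a cache keyed on the query ID plus a table identifier. The class, its methods, and the key layout are hypothetical; the loader stands in for the costly Catalogs.loadTable(configuration, props) call and does not use the Iceberg API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical query-scoped cache: each (queryId, tableName) pair is loaded once,
// so every lookup within the same query sees the same table object.
public class QueryScopedTableCache<T> {
    private final Map<String, T> cache = new ConcurrentHashMap<>();

    // Returns the cached object, invoking the loader only on the first request.
    public T getOrLoad(String queryId, String tableName, Supplier<T> loader) {
        return cache.computeIfAbsent(queryId + "/" + tableName, key -> loader.get());
    }

    // Drops every entry that belongs to a finished query.
    public void evictQuery(String queryId) {
        String prefix = queryId + "/";
        cache.keySet().removeIf(key -> key.startsWith(prefix));
    }

    public static void main(String[] args) {
        QueryScopedTableCache<String> tables = new QueryScopedTableCache<>();
        // The supplier is a placeholder for the expensive table load.
        String first = tables.getOrLoad("query-1", "db.tbl", () -> "table-object");
        String second = tables.getOrLoad("query-1", "db.tbl", () -> "never-loaded");
        System.out.println(first.equals(second));
        tables.evictQuery("query-1");
    }
}
```

Because the same object is returned for the lifetime of the query, pinning one snapshot per query follows naturally, provided entries are evicted when the query ends.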





[jira] [Created] (HIVE-25101) Remove HBase libraries from HBase distribution

2021-05-11 Thread Istvan Toth (Jira)
Istvan Toth created HIVE-25101:
--

 Summary: Remove HBase libraries from HBase distribution
 Key: HIVE-25101
 URL: https://issues.apache.org/jira/browse/HIVE-25101
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler, Hive
Affects Versions: 4.0.0
Reporter: Istvan Toth
Assignee: Istvan Toth


Hive currently packages HBase libraries into its lib directory.
It also adds the HBase libraries separately to its classpath in the hive 
startup script.

Having both mechanisms is redundant, and it also causes errors: the standard 
HBase libraries packaged into Hive are unshaded, while the libraries added by 
_hbase mapredcp_ are shaded, and the two are NOT compatible when custom 
coprocessors are used. In some cases the classpaths during local execution and 
for MR/TEZ jobs are also mutually incompatible.

I propose removing all HBase libraries from the distribution, and pulling them 
via the hbase mapredcp mechanism.

This also solves the old problem of including ancient HBase alpha versions in Hive.






[jira] [Created] (HIVE-25100) Use default values of Iceberg client pool configuration

2021-05-11 Thread László Pintér (Jira)
László Pintér created HIVE-25100:


 Summary: Use default values of Iceberg client pool configuration
 Key: HIVE-25100
 URL: https://issues.apache.org/jira/browse/HIVE-25100
 Project: Hive
  Issue Type: Bug
Reporter: László Pintér
Assignee: László Pintér








[jira] [Created] (HIVE-25099) Support for spark 3.x execution engine

2021-05-11 Thread Ajith Kumar (Jira)
Ajith Kumar created HIVE-25099:
--

 Summary: Support for spark 3.x execution engine
 Key: HIVE-25099
 URL: https://issues.apache.org/jira/browse/HIVE-25099
 Project: Hive
  Issue Type: New Feature
Reporter: Ajith Kumar


Currently Hive does not support newer versions of Spark (3.x+).


