[jira] [Created] (HIVE-25105) Support Parquet as MV storage format
Jesus Camacho Rodriguez created HIVE-25105:

Summary: Support Parquet as MV storage format
Key: HIVE-25105
URL: https://issues.apache.org/jira/browse/HIVE-25105
Project: Hive
Issue Type: Improvement
Components: Materialized views
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez

Currently the supported storage formats do not include Parquet:
{code}
...
HIVE_MATERIALIZED_VIEW_FILE_FORMAT("hive.materializedview.fileformat", "ORC",
    new StringSet("none", "TextFile", "SequenceFile", "RCfile", "ORC"),
...
{code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
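As a rough illustration of what the request amounts to, the sketch below models the allowed-value check in Python. The five current values are copied from the StringSet quoted in the issue; the `PROPOSED_FORMATS` tuple and the `is_allowed` helper are hypothetical names for this example, and case-insensitive matching is an assumption rather than something the issue states.

```python
# Values copied from the StringSet quoted in the issue.
CURRENT_FORMATS = ("none", "TextFile", "SequenceFile", "RCfile", "ORC")

# Hypothetical: the same set with "Parquet" appended, as the issue requests.
PROPOSED_FORMATS = CURRENT_FORMATS + ("Parquet",)


def is_allowed(value, allowed):
    """Return True if value matches one of the allowed names.

    Matching ignores case; this is an assumption made for the sketch.
    """
    return value.lower() in {name.lower() for name in allowed}


if __name__ == "__main__":
    print(is_allowed("Parquet", CURRENT_FORMATS))
    print(is_allowed("Parquet", PROPOSED_FORMATS))
```

With the current set, a value of Parquet is rejected; with the extended set it is accepted.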
[jira] [Created] (HIVE-25104) Backward incompatible timestamp serialization in Parquet for certain timezones
Stamatis Zampetakis created HIVE-25104:

Summary: Backward incompatible timestamp serialization in Parquet for certain timezones
Key: HIVE-25104
URL: https://issues.apache.org/jira/browse/HIVE-25104
Project: Hive
Issue Type: Bug
Components: Serializers/Deserializers
Affects Versions: 3.1.2
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis

HIVE-12192 and HIVE-20007 changed the way that timestamp computations are performed and, to some extent, how timestamps are serialized and deserialized in files (Parquet, Avro, ORC).

In versions that include HIVE-12192 or HIVE-20007 the serialization in Parquet files is not backwards compatible. In other words, writing timestamps with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them with another version (one that does not include those issues) may lead to different results depending on the default timezone of the system.

Consider the following scenario where the default system timezone is set to US/Pacific.

At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3
{code:sql}
CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
 LOCATION '/tmp/hiveexttbl/employee';
INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
SELECT * FROM employee;
{code}
|1|1880-01-01 00:00:00|
|2|1884-01-01 00:00:00|
|3|1990-01-01 00:00:00|

At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356
{code:sql}
CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
 LOCATION '/tmp/hiveexttbl/employee';
SELECT * FROM employee;
{code}
|1|1879-12-31 23:52:58|
|2|1884-01-01 00:00:00|
|3|1990-01-01 00:00:00|

The timestamp for {{eid=1}} in branch-2.3 is different from the one in master.
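The 7 minute 2 second shift for {{eid=1}} can be reproduced with plain arithmetic. The sketch below is one possible explanation, not something the issue states: it assumes the writer converted the wall-clock value to UTC using the historical local mean time offset for US/Pacific (-07:52:58, in effect before November 1883 according to the IANA time zone database), while the reader converted back using the standard -08:00 offset. The function names are hypothetical and do not correspond to Hive code.

```python
from datetime import datetime, timedelta

# Assumption: offset applied on the write side for dates before late 1883
# (local mean time for Los Angeles in the IANA time zone database).
LMT_OFFSET = timedelta(hours=-7, minutes=-52, seconds=-58)

# Assumption: offset applied on the read side (Pacific Standard Time).
PST_OFFSET = timedelta(hours=-8)


def to_utc(local_wall_clock, utc_offset):
    """Convert a local wall-clock value to UTC with a fixed offset."""
    return local_wall_clock - utc_offset


def to_local(utc_value, utc_offset):
    """Convert a UTC value back to local wall-clock time with a fixed offset."""
    return utc_value + utc_offset


def round_trip(local_wall_clock, write_offset, read_offset):
    """Write with one offset, read with another, and return what the reader sees."""
    return to_local(to_utc(local_wall_clock, write_offset), read_offset)


if __name__ == "__main__":
    written = datetime(1880, 1, 1, 0, 0, 0)
    # Mismatched offsets reproduce the value reported for eid=1.
    print(round_trip(written, LMT_OFFSET, PST_OFFSET))  # 1879-12-31 23:52:58
    # Matching offsets round-trip unchanged, as for eid=2 and eid=3.
    print(round_trip(datetime(1884, 1, 1), PST_OFFSET, PST_OFFSET))
```

Under this assumption the rows for 1884 and 1990 are unaffected because both sides would apply the same -08:00 offset to dates after the switch to standard time.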
[jira] [Created] (HIVE-25103) Update row.serde excludes defaults
Panagiotis Garefalakis created HIVE-25103:

Summary: Update row.serde excludes defaults
Key: HIVE-25103
URL: https://issues.apache.org/jira/browse/HIVE-25103
Project: Hive
Issue Type: Improvement
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis

HIVE-16222 introduced the row.serde.inputformat.excludes setting to disable row.serde for specific non-vectorized formats. Since MapredParquetInputFormat is now natively vectorized, it should be removed from that list.

Even when hive.vectorized.use.vectorized.input.format is disabled, the Vectorizer will not vectorize in row deserialize mode if the input format is natively vectorized, so it is safe to remove.
[jira] [Created] (HIVE-25102) Cache Iceberg table objects within same query
László Pintér created HIVE-25102:

Summary: Cache Iceberg table objects within same query
Key: HIVE-25102
URL: https://issues.apache.org/jira/browse/HIVE-25102
Project: Hive
Issue Type: Improvement
Reporter: László Pintér
Assignee: László Pintér

We run Catalogs.loadTable(configuration, props) many times, which is costly. We should:
- Cache the table object, possibly even globally, keyed on the queryId
- Make sure that a single query uses one snapshot during its whole execution
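The two goals above could be sketched as a cache keyed on the query ID and table name, so that repeated lookups within one query return the same loaded object (and therefore the same snapshot). This is only an illustration in Python: `QueryScopedTableCache`, its methods, and the `loader` callable are hypothetical stand-ins, the loader merely plays the role of the costly Catalogs.loadTable call, and the sketch ignores thread safety.

```python
from typing import Any, Callable, Dict, Tuple


class QueryScopedTableCache:
    """Caches loaded table objects per (query_id, table_name) pair."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        # loader is a hypothetical stand-in for the expensive table load.
        self._loader = loader
        self._tables: Dict[Tuple[str, str], Any] = {}

    def get(self, query_id: str, table_name: str) -> Any:
        """Load the table on first use in a query, then reuse the same object."""
        key = (query_id, table_name)
        if key not in self._tables:
            self._tables[key] = self._loader(table_name)
        return self._tables[key]

    def evict_query(self, query_id: str) -> None:
        """Drop every entry belonging to a finished query."""
        for key in [k for k in self._tables if k[0] == query_id]:
            del self._tables[key]


if __name__ == "__main__":
    load_calls = []

    def fake_loader(name):
        load_calls.append(name)
        return {"table": name, "load_number": len(load_calls)}

    cache = QueryScopedTableCache(fake_loader)
    first = cache.get("query-1", "db.tbl")
    second = cache.get("query-1", "db.tbl")
    print(first is second, len(load_calls))
```

Evicting by query ID when the query finishes keeps a global cache from growing without bound and lets the next query see a fresh snapshot.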
[jira] [Created] (HIVE-25101) Remove HBase libraries from HBase distribution
Istvan Toth created HIVE-25101:

Summary: Remove HBase libraries from HBase distribution
Key: HIVE-25101
URL: https://issues.apache.org/jira/browse/HIVE-25101
Project: Hive
Issue Type: Improvement
Components: HBase Handler, Hive
Affects Versions: 4.0.0
Reporter: Istvan Toth
Assignee: Istvan Toth

Hive currently packages HBase libraries into its lib directory. It also adds the HBase libraries separately to its classpath in the hive startup script.

Having both mechanisms is redundant, and it also causes errors: the standard HBase libraries packaged into Hive are unshaded, while the libraries added by _hbase mapredcp_ are shaded, and the two are NOT compatible when custom coprocessors are used. In some cases the classpaths during local execution and for MR/TEZ jobs are also mutually incompatible.

I propose removing all HBase libraries from the distribution, and pulling them in via the hbase mapredcp mechanism.

This also solves the old problem of including ancient HBase alpha versions in Hive.
[jira] [Created] (HIVE-25100) Use default values of Iceberg client pool configuration
László Pintér created HIVE-25100:

Summary: Use default values of Iceberg client pool configuration
Key: HIVE-25100
URL: https://issues.apache.org/jira/browse/HIVE-25100
Project: Hive
Issue Type: Bug
Reporter: László Pintér
Assignee: László Pintér
[jira] [Created] (HIVE-25099) Support for spark 3.x execution engine
Ajith Kumar created HIVE-25099:

Summary: Support for spark 3.x execution engine
Key: HIVE-25099
URL: https://issues.apache.org/jira/browse/HIVE-25099
Project: Hive
Issue Type: New Feature
Reporter: Ajith Kumar

Currently Hive does not support newer versions of Spark (3.x+).