date:20240417

[jira] [Updated] (HIVE-28196) Preserve column stats when applying UDF upper/lower.

2024-04-17 Thread Sungwoo Park (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sungwoo Park updated HIVE-28196:

Labels: hive-4.0.1-must performance pull-request-available  (was: 
pull-request-available)

> Preserve column stats when applying UDF upper/lower.
> 
>
> Key: HIVE-28196
> URL: https://issues.apache.org/jira/browse/HIVE-28196
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.1-must, performance, pull-request-available
> Fix For: 4.1.0
>
>
> Currently, Hive re-estimates column stats (including avgColLen) when it 
> encounters a UDF.
> In the case of upper and lower, Hive sets avgColLen to 
> hive.stats.max.variable.length.
> However, these UDFs do not change column stats, and the default value (100) is 
> too high for string-type key columns, to which upper/lower are usually applied.
> This patch keeps the input data's avgColLen after applying UDF upper/lower to 
> produce a better query plan.
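A minimal sketch of the estimation rule described above, under assumed names: `ColStats`, `estimateAvgColLen`, and the `LENGTH_PRESERVING` set are hypothetical and are not Hive's `StatsUtils`/`ColStatistics` API. It only shows the decision itself: for upper/lower the output reuses the input column's avgColLen, while any other UDF falls back to the configured `hive.stats.max.variable.length` value (passed in here as a plain `int`).

```java
import java.util.Locale;
import java.util.Set;

public class UdfAvgColLenSketch {

    // Hypothetical stand-in for per-column statistics; not Hive's ColStatistics.
    record ColStats(String name, double avgColLen) {}

    // UDFs assumed to leave string length unchanged (upper/lower, per HIVE-28196).
    private static final Set<String> LENGTH_PRESERVING = Set.of("upper", "lower");

    // Returns the avgColLen to use for the UDF's output column.
    static double estimateAvgColLen(String udfName, ColStats input, int maxVariableLength) {
        if (input != null && LENGTH_PRESERVING.contains(udfName.toLowerCase(Locale.ROOT))) {
            return input.avgColLen();      // keep the input column's average length
        }
        return maxVariableLength;          // fallback: hive.stats.max.variable.length
    }

    public static void main(String[] args) {
        ColStats key = new ColStats("customer_id", 12.0);
        System.out.println(estimateAvgColLen("upper", key, 100));  // 12.0
        System.out.println(estimateAvgColLen("concat", key, 100)); // 100.0
    }
}
```

Expressing the rule as an explicit list of length-preserving functions keeps the change narrow; the issue does not say whether the actual patch is structured this way.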



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (HIVE-28204) Remove some HMS obsolete scripts

2024-04-17 Thread Zhihua Deng (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28204:
---
Description: As Hive 1.x has reached end of life, its HMS metadata scripts need 
to be removed from the repository and the packaged tarball; however, it's better 
to keep the scripts for upgrading Hive from 1.x and for the tests.  (was: As Hive 
1.x has reached end of life, its HMS metadata scripts need to be removed from the 
repository and the packaged tarball; however, it's better to keep the scripts for 
upgrading Hive from 1.x.)

> Remove some HMS obsolete scripts
> 
>
> Key: HIVE-28204
> URL: https://issues.apache.org/jira/browse/HIVE-28204
> Project: Hive
>  Issue Type: Task
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
>
> As Hive 1.x has reached end of life, its HMS metadata scripts need to be 
> removed from the repository and the packaged tarball; however, it's better to 
> keep the scripts for upgrading Hive from 1.x and for the tests.




[jira] [Updated] (HIVE-28204) Remove some HMS obsolete scripts

2024-04-17 Thread Zhihua Deng (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28204:
---
Summary: Remove some HMS obsolete scripts  (was: Remove the HMS 1.x init 
script)

> Remove some HMS obsolete scripts
> 
>
> Key: HIVE-28204
> URL: https://issues.apache.org/jira/browse/HIVE-28204
> Project: Hive
>  Issue Type: Task
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
>
> As Hive 1.x has reached end of life, its HMS metadata scripts need to be 
> removed from the repository and the packaged tarball; however, it's better to 
> keep the scripts for upgrading Hive from 1.x.




[jira] [Updated] (HIVE-28204) Remove the HMS 1.x init script

2024-04-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28204:
--
Labels: hive-4.0.1-must pull-request-available  (was: hive-4.0.1-must)

> Remove the HMS 1.x init script
> --
>
> Key: HIVE-28204
> URL: https://issues.apache.org/jira/browse/HIVE-28204
> Project: Hive
>  Issue Type: Task
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
>
> As Hive 1.x has reached end of life, its HMS metadata scripts need to be 
> removed from the repository and the packaged tarball; however, it's better to 
> keep the scripts for upgrading Hive from 1.x.




[jira] [Commented] (HIVE-26778) Pushdown Date data type to metastore via direct sql / JDO

2024-04-17 Thread katty he (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-26778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838445#comment-17838445
 ] 

katty he commented on HIVE-26778:
-

I use Hive 3.1.2.

> Pushdown Date data type to metastore via direct sql / JDO
> -
>
> Key: HIVE-26778
> URL: https://issues.apache.org/jira/browse/HIVE-26778
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The original feature to push down the date data type during partition 
> pruning via direct SQL/JDO was added as part of 
> https://issues.apache.org/jira/browse/HIVE-5679.
> Since Hive's behavior has changed with CBO, date values are no longer pushed 
> down to the metastore when CBO is turned on: CBO adds an extra 'DATE' keyword 
> to the original filter, the filter parser cannot handle this keyword, and 
> parsing fails, so the date predicate is not pushed down to the metastore.
> {code:java}
> select * from test_table where date_col = '2022-01-01';
> {code}
> When CBO is turned on, the filter predicate generated is 
> date_col=DATE'2022-01-01', which the filter parser fails to recognize.
>  
> *Steps to reproduce*
> The following query will generate "Error parsing partition 
> filter; lexer error" 
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]
>  
> When CBO is turned off (set hive.cbo.enable=false), we don't 
> see this error message.
> {code:java}
> create table part_time(a int) partitioned by(date_c date);
> insert into part_time partition(date_c='2000-01-01') values (1);
> insert into part_time partition(date_c='2000-02-01') values (1);
> insert into part_time partition(date_c='2000-03-01') values (1); 
> select * from part_time where date_c = '2000-03-01';
> {code}
>  
> *Performance Improvement*
> In my test setup with a table having *10k partitions*, a select query on one 
> of the partitions took *300 ms* without the change and *14 ms* after the 
> change.
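To illustrate why the extra keyword matters, here is a hypothetical sketch of an equality-filter recognizer that accepts both the plain form `date_c = '2000-03-01'` and the CBO-generated form `date_c=DATE'2000-03-01'`. It is not the metastore's ANTLR-based partition filter grammar (see `PartFilterExprUtil` above); the class, the regex, and the `parseDateFilter` method are assumptions for illustration only.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateFilterSketch {

    // Hypothetical result type: a single "<column> = <date>" predicate.
    record DateEquality(String column, LocalDate value) {}

    // Accepts <column> = 'yyyy-MM-dd' and <column> = DATE'yyyy-MM-dd'.
    // The optional DATE prefix models the keyword CBO adds to the predicate.
    private static final Pattern EQ_DATE = Pattern.compile(
        "\\s*(\\w+)\\s*=\\s*(?:DATE\\s*)?'(\\d{4}-\\d{2}-\\d{2})'\\s*",
        Pattern.CASE_INSENSITIVE);

    static Optional<DateEquality> parseDateFilter(String filter) {
        Matcher m = EQ_DATE.matcher(filter);
        if (!m.matches()) {
            return Optional.empty();   // shape not recognized: treat as not pushable
        }
        try {
            return Optional.of(new DateEquality(m.group(1), LocalDate.parse(m.group(2))));
        } catch (DateTimeParseException e) {
            return Optional.empty();   // e.g. '2000-13-01' is not a valid date
        }
    }

    public static void main(String[] args) {
        System.out.println(parseDateFilter("date_c = '2000-03-01'"));
        System.out.println(parseDateFilter("date_c=DATE'2000-03-01'"));
        System.out.println(parseDateFilter("date_c > '2000-03-01'"));
    }
}
```

Returning an empty result instead of throwing mirrors the idea that an unrecognized filter should simply not be pushed down, rather than fail the query.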




[jira] [Commented] (HIVE-26778) Pushdown Date data type to metastore via direct sql / JDO

2024-04-17 Thread katty he (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-26778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838444#comment-17838444
 ] 

katty he commented on HIVE-26778:
-

It seems this patch causes Spark queries to fail. For example, we create a 
table:

{panel:title=My title}
  CREATE EXTERNAL TABLE `sample.tmts_grey_device_log_bak`(
  `name` string COMMENT '', 
  
  `authcookie` string COMMENT '')
PARTITIONED BY ( 
  `date` string, 
  `hour` string)
{panel}

and then query it with Spark:

{panel:title=My title}
select
  *
from
  sample.tmts_grey_device_log_bak
where
  `date` = '2022-01-01' 
  limit 5;
{panel}

The error is:
Caused by: MetaException(message:Error parsing partition filter; lexer error: 
null; exception NoViableAltException(17@[]))
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_filter_result$get_partitions_by_filter_resultStandardScheme.read(ThriftHiveMetastore.java)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_filter_result$get_partitions_by_filter_resultStandardScheme.read(ThriftHiveMetastore.java)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_filter_result.read(ThriftHiveMetastore.java)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partitions_by_filter(ThriftHiveMetastore.java:3212)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partitions_by_filter(ThriftHiveMetastore.java:3196)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsByFilter(HiveMetaStoreClient.java:1470)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsByFilter(HiveMetaStoreClient.java:1464)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
at com.sun.proxy.$Proxy49.listPartitionsByFilter(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2799)
at com.sun.proxy.$Proxy49.listPartitionsByFilter(Unknown Source)
at 
org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByFilter(Hive.java:3143)
... 98 more

When I do not use this patch, it runs successfully.

> Pushdown Date data type to metastore via direct sql / JDO
> -
>
> Key: HIVE-26778
> URL: https://issues.apache.org/jira/browse/HIVE-26778
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The original feature to push down the date data type during partition 
> pruning via direct SQL/JDO was added as part of 
> https://issues.apache.org/jira/browse/HIVE-5679.
> Since Hive's behavior has changed with CBO, date values are no longer pushed 
> down to the metastore when CBO is turned on: CBO adds an extra 'DATE' keyword 
> to the original filter, the filter parser cannot handle this keyword, and 
> parsing fails, so the date predicate is not pushed down to the metastore.
> {code:java}
> select * from test_table where date_col = '2022-01-01';
> {code}
> When CBO is turned on, the filter predicate generated is 
> date_col=DATE'2022-01-01', which the filter parser fails to recognize.
>  
> *Steps to reproduce*
> The following query will generate "Error parsing partition 
> filter; lexer error" 
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartFilterExprUtil.java#L128]
>  
> When CBO is turned off (set hive.cbo.enable=false), we don't 
> see this error message.
> {code:java}
> create table part_time(a int) partitioned by(date_c date);
> insert into part_time partition(date_c='2000-01-01') values (1);
> insert into part_time

[jira] [Updated] (HIVE-28203) Flaky qtest mv_iceberg_orc5.q

2024-04-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28203:
--
Labels: pull-request-available  (was: )

> Flaky qtest mv_iceberg_orc5.q
> -
>
> Key: HIVE-28203
> URL: https://issues.apache.org/jira/browse/HIVE-28203
> Project: Hive
>  Issue Type: Task
>  Components: Iceberg integration
>Reporter: Butao Zhang
>Priority: Major
>  Labels: pull-request-available
>
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5190/3/tests]
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5195/4/tests]
>  
> Flaky CI report:
> [http://ci.hive.apache.org/job/hive-flaky-check/837/testReport/] 
>  
> {code:java}
> Execution succeeded but contained differences (error code = 1) after 
> executing mv_iceberg_orc5.q 
> 101c101
> < HiveJoin(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT DISTINCT 
> FROM($1, $6))], joinType=[right], algorithm=[BucketJoin], cost=[not 
> available])
> ---
> > HiveJoin(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT DISTINCT 
> > FROM($1, $6))], joinType=[right], algorithm=[none], cost=[not available])
> 106c106
> <   HiveJoin(condition=[=($0, $3)], joinType=[inner], 
> algorithm=[CommonJoin], cost=[not available])
> ---
> >   HiveJoin(condition=[=($0, $3)], joinType=[inner], 
> > algorithm=[none], cost=[not available]) {code}




[jira] [Created] (HIVE-28204) Remove the HMS 1.x init script

2024-04-17 Thread Zhihua Deng (Jira)

Zhihua Deng created HIVE-28204:
--

 Summary: Remove the HMS 1.x init script
 Key: HIVE-28204
 URL: https://issues.apache.org/jira/browse/HIVE-28204
 Project: Hive
  Issue Type: Task
Reporter: Zhihua Deng
Assignee: Zhihua Deng


As Hive 1.x has reached end of life, its HMS metadata scripts need to be removed 
from the repository and the packaged tarball; however, it's better to keep the 
scripts for upgrading Hive from 1.x.




[jira] [Created] (HIVE-28203) Flaky qtest mv_iceberg_orc5.q

2024-04-17 Thread Butao Zhang (Jira)

Butao Zhang created HIVE-28203:
--

 Summary: Flaky qtest mv_iceberg_orc5.q
 Key: HIVE-28203
 URL: https://issues.apache.org/jira/browse/HIVE-28203
 Project: Hive
  Issue Type: Task
  Components: Iceberg integration
Reporter: Butao Zhang


[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5190/3/tests]

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5195/4/tests]

 

Flaky CI report:

[http://ci.hive.apache.org/job/hive-flaky-check/837/testReport/] 

 
{code:java}
Execution succeeded but contained differences (error code = 1) after executing 
mv_iceberg_orc5.q 
101c101
< HiveJoin(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT DISTINCT 
FROM($1, $6))], joinType=[right], algorithm=[BucketJoin], cost=[not available])
---
> HiveJoin(condition=[AND(IS NOT DISTINCT FROM($0, $5), IS NOT DISTINCT 
> FROM($1, $6))], joinType=[right], algorithm=[none], cost=[not available])
106c106
<   HiveJoin(condition=[=($0, $3)], joinType=[inner], 
algorithm=[CommonJoin], cost=[not available])
---
>   HiveJoin(condition=[=($0, $3)], joinType=[inner], algorithm=[none], 
> cost=[not available]) {code}




[jira] [Updated] (HIVE-28202) Incorrect projected column size after ORC upgrade to v1.6.7

2024-04-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28202:
--
Labels: hive-4.0.1-must performance pull-request-available  (was: 
hive-4.0.1-must performance)

> Incorrect projected column size after ORC upgrade to v1.6.7 
> 
>
> Key: HIVE-28202
> URL: https://issues.apache.org/jira/browse/HIVE-28202
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0, 4.0.0-beta-1
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Critical
>  Labels: hive-4.0.1-must, performance, pull-request-available
> Fix For: 4.1.0
>
>
> `ReaderImpl.getRawDataSizeFromColIndices` changed its behavior for handling 
> struct types and now includes their subtypes. That caused an issue in Hive, as 
> the root struct index is always "included", causing size estimation for the 
> complete schema rather than just the selected columns, which leads to 
> incorrect split estimations.
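A simplified model of the problem, assuming ORC-style pre-order type ids where id 0 is the root struct. `TypeNode`, `sizeExpandingStructs`, and `projectedSize` are hypothetical: they are neither ORC's `ReaderImpl` nor Hive's actual fix. Once an included struct contributes all of its subtypes, an include mask that always contains the root makes the estimate cover the whole schema; dropping the root from the mask restores a projection-only estimate.

```java
import java.util.List;

public class ProjectedSizeSketch {

    // Hypothetical flattened schema node: pre-order id, child ids, raw size (leaves only).
    record TypeNode(int id, List<Integer> children, long rawSize) {}

    // Models the described behavior: an included struct contributes all of its subtypes.
    static long sizeExpandingStructs(List<TypeNode> types, boolean[] include) {
        long total = 0;
        boolean[] counted = new boolean[types.size()];   // avoids double counting
        for (TypeNode t : types) {
            if (include[t.id()]) {
                total += subtreeSize(types, t.id(), counted);
            }
        }
        return total;
    }

    private static long subtreeSize(List<TypeNode> types, int id, boolean[] counted) {
        if (counted[id]) {
            return 0;
        }
        counted[id] = true;
        TypeNode t = types.get(id);
        long sum = t.children().isEmpty() ? t.rawSize() : 0;
        for (int child : t.children()) {
            sum += subtreeSize(types, child, counted);
        }
        return sum;
    }

    // Hypothetical workaround: drop the root struct (id 0) from a copy of the
    // include mask so only the projected columns are estimated.
    static long projectedSize(List<TypeNode> types, boolean[] include) {
        boolean[] withoutRoot = include.clone();
        withoutRoot[0] = false;
        return sizeExpandingStructs(types, withoutRoot);
    }

    public static void main(String[] args) {
        // struct<a:string,b:string,c:string>, pre-order ids 0..3
        List<TypeNode> schema = List.of(
            new TypeNode(0, List.of(1, 2, 3), 0),
            new TypeNode(1, List.of(), 100),
            new TypeNode(2, List.of(), 200),
            new TypeNode(3, List.of(), 300));
        boolean[] include = {true, true, false, false};   // root + column a
        System.out.println(sizeExpandingStructs(schema, include)); // 600
        System.out.println(projectedSize(schema, include));        // 100
    }
}
```

With raw sizes 100/200/300, projecting only `a` yields 600 under the expanding behavior (the whole schema) and 100 once the root is excluded, which is the kind of overestimate that skews split sizing.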




[jira] [Commented] (HIVE-28082) HiveAggregateReduceFunctionsRule could generate an inconsistent result

2024-04-17 Thread Shohei Okumiya (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-28082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838155#comment-17838155
 ] 

Shohei Okumiya commented on HIVE-28082:
---

It seems to be correct, and those behaviors look intentional as we explicitly 
handle the exceptions.
 * 
[https://github.com/apache/hive/blob/rel/release-4.0.0/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java#L447-L454]
 * 
[https://github.com/apache/hive/blob/rel/release-4.0.0/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java#L551-L556]

> HiveAggregateReduceFunctionsRule could generate an inconsistent result
> --
>
> Key: HIVE-28082
> URL: https://issues.apache.org/jira/browse/HIVE-28082
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-beta-1
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>
> HiveAggregateReduceFunctionsRule translates AVG, STDDEV_POP, STDDEV_SAMP, 
> VAR_POP, and VAR_SAMP. These UDFs accept string types and try to decode them 
> as floating-point values, and some values may be undecodable.
> We found that this can cause inconsistent results with and without CBO.
> {code:java}
> 0: jdbc:hive2://hive-hiveserver2:1/defaul> SELECT AVG('text');
> ...
> +--+
> | _c0  |
> +--+
> | 0.0  |
> +--+
> 1 row selected (18.229 seconds)
> 0: jdbc:hive2://hive-hiveserver2:1/defaul> set hive.cbo.enable=false;
> No rows affected (0.013 seconds)
> 0: jdbc:hive2://hive-hiveserver2:1/defaul> SELECT AVG('text');
> ...
> +---+
> |  _c0  |
> +---+
> | NULL  |
> +---+ {code}
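A simplified model of the two code paths, assuming (1) that the rule rewrites `AVG(x)` as `SUM(x) / COUNT(x)`, (2) that the SUM aggregate marks itself non-empty before decoding a value and then swallows the `NumberFormatException`, and (3) that the native AVG skips undecodable values entirely. These assumptions are an interpretation of the exception handling in the linked `GenericUDAFSum`/`GenericUDAFAverage` code, not a copy of it, and all method names are hypothetical. Under them, `AVG('text')` evaluates to 0.0 on the reduced path and NULL on the native path, matching the two outputs in the report.

```java
import java.util.List;

public class AvgReductionSketch {

    // Assumed SUM behavior: non-empty is set before decoding; undecodable values add nothing.
    static Double sumOfStrings(List<String> values) {
        boolean empty = true;
        double sum = 0.0;
        for (String v : values) {
            if (v == null) continue;
            empty = false;
            try {
                sum += Double.parseDouble(v);
            } catch (NumberFormatException e) {
                // swallowed; the aggregate stays non-empty
            }
        }
        return empty ? null : sum;
    }

    // COUNT(x) counts non-null inputs whether or not they decode as numbers.
    static long countNonNull(List<String> values) {
        return values.stream().filter(v -> v != null).count();
    }

    // Assumed native AVG: undecodable values contribute to neither sum nor count.
    static Double nativeAvg(List<String> values) {
        long count = 0;
        double sum = 0.0;
        for (String v : values) {
            if (v == null) continue;
            try {
                sum += Double.parseDouble(v);
                count++;
            } catch (NumberFormatException e) {
                // skipped entirely
            }
        }
        return count == 0 ? null : sum / count;
    }

    // Assumed reduced form of AVG(x): SUM(x) / COUNT(x).
    static Double reducedAvg(List<String> values) {
        Double sum = sumOfStrings(values);
        long count = countNonNull(values);
        return (sum == null || count == 0) ? null : sum / count;
    }

    public static void main(String[] args) {
        List<String> input = List.of("text");
        System.out.println(reducedAvg(input)); // 0.0  (CBO path in the report)
        System.out.println(nativeAvg(input));  // null (non-CBO path in the report)
    }
}
```

The divergence comes entirely from the two aggregates disagreeing on whether an undecodable value "counts"; for fully numeric input both paths agree.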




[jira] [Updated] (HIVE-28202) Incorrect projected column size after ORC upgrade to v1.6.7

2024-04-17 Thread Denys Kuzmenko (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28202:
--
Affects Version/s: 4.0.0-beta-1
   4.0.0

> Incorrect projected column size after ORC upgrade to v1.6.7 
> 
>
> Key: HIVE-28202
> URL: https://issues.apache.org/jira/browse/HIVE-28202
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0, 4.0.0-beta-1
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> `ReaderImpl.getRawDataSizeFromColIndices` changed its behavior for handling 
> struct types and now includes their subtypes. That caused an issue in Hive, as 
> the root struct index is always "included", causing size estimation for the 
> complete schema rather than just the selected columns, which leads to 
> incorrect split estimations.




[jira] [Updated] (HIVE-28202) Incorrect projected column size after ORC upgrade to v1.6.7

2024-04-17 Thread Denys Kuzmenko (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28202:
--
Fix Version/s: 4.1.0

> Incorrect projected column size after ORC upgrade to v1.6.7 
> 
>
> Key: HIVE-28202
> URL: https://issues.apache.org/jira/browse/HIVE-28202
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0, 4.0.0-beta-1
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Critical
>  Labels: hive-4.0.1-must, performance
> Fix For: 4.1.0
>
>
> `ReaderImpl.getRawDataSizeFromColIndices` changed its behavior for handling 
> struct types and now includes their subtypes. That caused an issue in Hive, as 
> the root struct index is always "included", causing size estimation for the 
> complete schema rather than just the selected columns, which leads to 
> incorrect split estimations.




[jira] [Updated] (HIVE-28202) Incorrect projected column size after ORC upgrade to v1.6.7

2024-04-17 Thread Denys Kuzmenko (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28202:
--
Priority: Critical  (was: Major)

> Incorrect projected column size after ORC upgrade to v1.6.7 
> 
>
> Key: HIVE-28202
> URL: https://issues.apache.org/jira/browse/HIVE-28202
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0, 4.0.0-beta-1
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Critical
>  Labels: hive-4.0.1-must
>
> `ReaderImpl.getRawDataSizeFromColIndices` changed its behavior for handling 
> struct types and now includes their subtypes. That caused an issue in Hive, as 
> the root struct index is always "included", causing size estimation for the 
> complete schema rather than just the selected columns, which leads to 
> incorrect split estimations.




[jira] [Updated] (HIVE-28202) Incorrect projected column size after ORC upgrade to v1.6.7

2024-04-17 Thread Denys Kuzmenko (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28202:
--
Status: Patch Available  (was: Open)

> Incorrect projected column size after ORC upgrade to v1.6.7 
> 
>
> Key: HIVE-28202
> URL: https://issues.apache.org/jira/browse/HIVE-28202
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-beta-1, 4.0.0
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Critical
>  Labels: hive-4.0.1-must, performance
> Fix For: 4.1.0
>
>
> `ReaderImpl.getRawDataSizeFromColIndices` changed its behavior for handling 
> struct types and now includes their subtypes. That caused an issue in Hive, as 
> the root struct index is always "included", causing size estimation for the 
> complete schema rather than just the selected columns, which leads to 
> incorrect split estimations.




[jira] [Updated] (HIVE-28202) Incorrect projected column size after ORC upgrade to v1.6.7

2024-04-17 Thread Denys Kuzmenko (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28202:
--
Labels: hive-4.0.1-must performance  (was: hive-4.0.1-must)

> Incorrect projected column size after ORC upgrade to v1.6.7 
> 
>
> Key: HIVE-28202
> URL: https://issues.apache.org/jira/browse/HIVE-28202
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0, 4.0.0-beta-1
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Critical
>  Labels: hive-4.0.1-must, performance
>
> `ReaderImpl.getRawDataSizeFromColIndices` changed its behavior for handling 
> struct types and now includes their subtypes. That caused an issue in Hive, as 
> the root struct index is always "included", causing size estimation for the 
> complete schema rather than just the selected columns, which leads to 
> incorrect split estimations.




[jira] [Updated] (HIVE-28202) Incorrect projected column size after ORC upgrade to v1.6.7

2024-04-17 Thread Denys Kuzmenko (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28202:
--
Labels: hive-4.0.1-must  (was: )

> Incorrect projected column size after ORC upgrade to v1.6.7 
> 
>
> Key: HIVE-28202
> URL: https://issues.apache.org/jira/browse/HIVE-28202
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0, 4.0.0-beta-1
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: hive-4.0.1-must
>
> `ReaderImpl.getRawDataSizeFromColIndices` changed its behavior for handling 
> struct types and now includes their subtypes. That caused an issue in Hive, as 
> the root struct index is always "included", causing size estimation for the 
> complete schema rather than just the selected columns, which leads to 
> incorrect split estimations.




[jira] [Updated] (HIVE-28202) Incorrect projected column size after ORC upgrade to v1.6.7

2024-04-17 Thread Denys Kuzmenko (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28202:
--
Component/s: Hive

> Incorrect projected column size after ORC upgrade to v1.6.7 
> 
>
> Key: HIVE-28202
> URL: https://issues.apache.org/jira/browse/HIVE-28202
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> `ReaderImpl.getRawDataSizeFromColIndices` changed its behavior for handling 
> struct types and now includes their subtypes. That caused an issue in Hive, as 
> the root struct index is always "included", causing size estimation for the 
> complete schema rather than just the selected columns, which leads to 
> incorrect split estimations.




[jira] [Updated] (HIVE-28202) Incorrect projected column size after ORC upgrade to v1.6.7

2024-04-17 Thread Denys Kuzmenko (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28202:
--
Issue Type: Bug  (was: Task)

> Incorrect projected column size after ORC upgrade to v1.6.7 
> 
>
> Key: HIVE-28202
> URL: https://issues.apache.org/jira/browse/HIVE-28202
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Priority: Major
>
> `ReaderImpl.getRawDataSizeFromColIndices` changed its behavior for handling 
> struct types and now includes their subtypes. That caused an issue in Hive, as 
> the root struct index is always "included", causing size estimation for the 
> complete schema rather than just the selected columns, which leads to 
> incorrect split estimations.




[jira] [Assigned] (HIVE-28202) Incorrect projected column size after ORC upgrade to v1.6.7

2024-04-17 Thread Denys Kuzmenko (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko reassigned HIVE-28202:
-

Assignee: Denys Kuzmenko

> Incorrect projected column size after ORC upgrade to v1.6.7 
> 
>
> Key: HIVE-28202
> URL: https://issues.apache.org/jira/browse/HIVE-28202
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> `ReaderImpl.getRawDataSizeFromColIndices` changed its behavior for handling 
> struct types and now includes their subtypes. That caused an issue in Hive, as 
> the root struct index is always "included", causing size estimation for the 
> complete schema rather than just the selected columns, which leads to 
> incorrect split estimations.




[jira] [Updated] (HIVE-28202) Incorrect projected column size after ORC upgrade to v1.6.7

2024-04-17 Thread Denys Kuzmenko (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28202:
--
Description: `ReaderImpl.getRawDataSizeFromColIndices` changed its behavior for 
handling struct types and now includes their subtypes. That caused an issue in 
Hive, as the root struct index is always "included", causing size estimation for 
the complete schema rather than just the selected columns, which leads to 
incorrect split estimations.  (was: `ReaderImpl.getRawDataSizeFromColIndices` 
changed its behavior for handling structs and now includes their subtypes. That 
caused an issue in Hive, as the root struct index is always provided in the 
"include" list, causing size estimation for the complete schema, not just the 
selected columns)

> Incorrect projected column size after ORC upgrade to v1.6.7 
> 
>
> Key: HIVE-28202
> URL: https://issues.apache.org/jira/browse/HIVE-28202
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Priority: Major
>
> `ReaderImpl.getRawDataSizeFromColIndices` changed its behavior for handling 
> struct types and now includes their subtypes. That caused an issue in Hive, as 
> the root struct index is always "included", causing size estimation for the 
> complete schema rather than just the selected columns, which leads to 
> incorrect split estimations.




[jira] [Updated] (HIVE-28202) Incorrect ORC projected column size

2024-04-17 Thread Denys Kuzmenko (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28202:
--
Summary: Incorrect ORC projected column size  (was: Invalid ORC projected 
column size)

> Incorrect ORC projected column size
> ---
>
> Key: HIVE-28202
> URL: https://issues.apache.org/jira/browse/HIVE-28202
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Priority: Major
>





[jira] [Updated] (HIVE-28202) Incorrect projected column size after ORC upgrade to v1.6.7

2024-04-17 Thread Denys Kuzmenko (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28202:
--
Summary: Incorrect projected column size after ORC upgrade to v1.6.7   
(was: Incorrect ORC projected column size after HIVE-23553)

> Incorrect projected column size after ORC upgrade to v1.6.7 
> 
>
> Key: HIVE-28202
> URL: https://issues.apache.org/jira/browse/HIVE-28202
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Priority: Major
>





[jira] [Updated] (HIVE-28202) Incorrect ORC projected column size after HIVE-23553

2024-04-17 Thread Denys Kuzmenko (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28202:
--
Summary: Incorrect ORC projected column size after HIVE-23553  (was: 
Incorrect ORC projected column size)

> Incorrect ORC projected column size after HIVE-23553
> 
>
> Key: HIVE-28202
> URL: https://issues.apache.org/jira/browse/HIVE-28202
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Priority: Major
>





[jira] [Created] (HIVE-28202) Invalid ORC projected column size

2024-04-17 Thread Denys Kuzmenko (Jira)

Denys Kuzmenko created HIVE-28202:
-

 Summary: Invalid ORC projected column size
 Key: HIVE-28202
 URL: https://issues.apache.org/jira/browse/HIVE-28202
 Project: Hive
  Issue Type: Task
Reporter: Denys Kuzmenko







[jira] [Updated] (HIVE-28202) Incorrect projected column size after ORC upgrade to v1.6.7

2024-04-17 Thread Denys Kuzmenko (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28202:
--
Description: `ReaderImpl.getRawDataSizeFromColIndices` changed its behavior for 
handling structs and now includes their subtypes. That caused an issue in Hive, 
as the root struct index is always provided in the "include" list, causing size 
estimation for the complete schema, not just the selected columns.

> Incorrect projected column size after ORC upgrade to v1.6.7 
> 
>
> Key: HIVE-28202
> URL: https://issues.apache.org/jira/browse/HIVE-28202
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Priority: Major
>
> `ReaderImpl.getRawDataSizeFromColIndices` changed its handling of structs 
> and now includes their subtypes. This caused an issue in Hive: the root 
> struct index is always provided in the "include" list, so the size is 
> estimated for the complete schema rather than just the selected columns.
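To illustrate the estimation problem, here is a minimal Java sketch under stated assumptions. It does not use ORC's ReaderImpl. The class ProjectedSizeSketch, the toy schema (type id 0 as the root struct, as in ORC, with three leaf columns) and the per-column sizes are all hypothetical. collectLeaves models the newer behavior where an included struct pulls in all of its subtypes, so an include list that contains the root index yields the size of the whole schema. Skipping the root index before estimating is shown only as one possible remedy, not as the actual fix.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: not ORC's ReaderImpl code.
public class ProjectedSizeSketch {
    // Toy schema: type id 0 is the root struct, ids 1..3 are its leaf columns.
    static final Map<Integer, List<Integer>> CHILDREN = Map.of(0, List.of(1, 2, 3));
    // Hypothetical raw data size per leaf column, in bytes.
    static final Map<Integer, Long> LEAF_SIZE = Map.of(1, 100L, 2, 200L, 3, 700L);

    // Models the newer behavior: including a struct also includes all of its subtypes.
    static void collectLeaves(int id, Set<Integer> out) {
        List<Integer> kids = CHILDREN.getOrDefault(id, List.of());
        if (kids.isEmpty()) {
            out.add(id);
        }
        for (int kid : kids) {
            collectLeaves(kid, out);
        }
    }

    // Sums the sizes of all leaves reachable from the included ids.
    // When skipRoot is true, the root struct index is ignored.
    static long estimate(boolean[] include, boolean skipRoot) {
        Set<Integer> leaves = new TreeSet<>();
        for (int id = 0; id < include.length; id++) {
            if (include[id] && !(skipRoot && id == 0)) {
                collectLeaves(id, leaves);
            }
        }
        long total = 0;
        for (int leaf : leaves) {
            total += LEAF_SIZE.get(leaf);
        }
        return total;
    }

    public static void main(String[] args) {
        // Only column 1 is projected, but the root index 0 is also present.
        boolean[] include = {true, true, false, false};
        System.out.println(estimate(include, false)); // whole schema
        System.out.println(estimate(include, true));  // selected column only
    }
}
```

With only column 1 projected, the sketch estimates 1000 bytes when the root index is honored and 100 bytes when it is skipped.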




[jira] [Updated] (HIVE-28201) LLAP perf counter: all deamons, used daemons

2024-04-17 Thread Jira



 [ 
https://issues.apache.org/jira/browse/HIVE-28201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-28201:

Description: 
The proposal is to introduce the following counters:
LLAP_NODES_COUNT_ALL: number of LLAP nodes visible to the task communicator
LLAP_NODES_COUNT_TASK_ASSIGNED: number of LLAP nodes to which at least one task 
attempt was assigned

I'm assuming here that 1 node == 1 daemon, so these counters can also be read as 
numbers of daemons.
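A minimal sketch of the proposed counter semantics, assuming the counters are derived from sets of node identifiers. LlapNodeCountersSketch and its methods are hypothetical and not part of Hive's LLAP task communicator API. Using sets means that a node receiving several task attempts is still counted only once in LLAP_NODES_COUNT_TASK_ASSIGNED.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: not Hive's LLAP task communicator code.
public class LlapNodeCountersSketch {
    // Nodes the task communicator currently knows about.
    private final Set<String> visibleNodes = new HashSet<>();
    // Nodes that received at least one task attempt (each node counted once).
    private final Set<String> assignedNodes = new HashSet<>();

    void onNodeVisible(String nodeId) {
        visibleNodes.add(nodeId);
    }

    void onTaskAttemptAssigned(String nodeId) {
        assignedNodes.add(nodeId);
    }

    // Value for LLAP_NODES_COUNT_ALL.
    long llapNodesCountAll() {
        return visibleNodes.size();
    }

    // Value for LLAP_NODES_COUNT_TASK_ASSIGNED.
    long llapNodesCountTaskAssigned() {
        return assignedNodes.size();
    }

    public static void main(String[] args) {
        LlapNodeCountersSketch counters = new LlapNodeCountersSketch();
        for (String node : List.of("node1", "node2", "node3")) {
            counters.onNodeVisible(node);
        }
        counters.onTaskAttemptAssigned("node1");
        counters.onTaskAttemptAssigned("node1"); // second attempt on the same node
        counters.onTaskAttemptAssigned("node3");
        System.out.println(counters.llapNodesCountAll() + " " + counters.llapNodesCountTaskAssigned());
    }
}
```

Here three nodes are visible and two of them receive task attempts, so the sketch prints "3 2".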

> LLAP perf counter: all deamons, used daemons
> 
>
> Key: HIVE-28201
> URL: https://issues.apache.org/jira/browse/HIVE-28201
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Priority: Major
>
> The proposal is to introduce the following counters:
> LLAP_NODES_COUNT_ALL: number of LLAP nodes visible to the task communicator
> LLAP_NODES_COUNT_TASK_ASSIGNED: number of LLAP nodes to which at least one 
> task attempt was assigned
> I'm assuming here that 1 node == 1 daemon, so these counters can also be read 
> as numbers of daemons.




[jira] [Updated] (HIVE-28201) LLAP perf counter: all nodes, used nodes

2024-04-17 Thread Jira



 [ 
https://issues.apache.org/jira/browse/HIVE-28201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-28201:

Summary: LLAP perf counter: all nodes, used nodes  (was: LLAP perf counter: 
all deamons, used daemons)

> LLAP perf counter: all nodes, used nodes
> 
>
> Key: HIVE-28201
> URL: https://issues.apache.org/jira/browse/HIVE-28201
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Priority: Major
>
> The proposal is to introduce the following counters:
> LLAP_NODES_COUNT_ALL: number of LLAP nodes visible to the task communicator
> LLAP_NODES_COUNT_TASK_ASSIGNED: number of LLAP nodes to which at least one 
> task attempt was assigned
> I'm assuming here that 1 node == 1 daemon, so these counters can also be read 
> as numbers of daemons.




[jira] [Resolved] (HIVE-28177) Announce Hive 1.x EOL and remove from downloads space

2024-04-17 Thread Stamatis Zampetakis (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-28177.

Fix Version/s: Not Applicable
   Resolution: Fixed

> Announce Hive 1.x EOL and remove from downloads space
> -
>
> Key: HIVE-28177
> URL: https://issues.apache.org/jira/browse/HIVE-28177
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: Not Applicable
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Hive 1.x release line is officially unsupported. The respective 
> discussion and vote can be found below:
>  * https://lists.apache.org/thread/sxcrcf4v9j630tl9domp0bn4m33bdq0s
>  * [https://lists.apache.org/thread/cyfg2ftrsh9bn0wgycm7ltqsx9yb6fts]
> The following tasks are pending:
>  * Update the Hive website to reflect that Hive 1.x is EOL
>  * Send an official announcement email to the following lists: user@hive, 
> dev@hive, announce@apache
>  * Remove hive-1.2.2 from [https://downloads.apache.org/hive/] (it will be 
> automatically archived)




[jira] [Created] (HIVE-28201) LLAP perf counter: all deamons, used daemons

2024-04-17 Thread Jira

László Bodor created HIVE-28201:
---

 Summary: LLAP perf counter: all deamons, used daemons
 Key: HIVE-28201
 URL: https://issues.apache.org/jira/browse/HIVE-28201
 Project: Hive
  Issue Type: Sub-task
Reporter: László Bodor






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (HIVE-28177) Announce Hive 1.x EOL and remove from downloads space

2024-04-17 Thread Stamatis Zampetakis (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-28177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838069#comment-17838069
 ] 

Stamatis Zampetakis commented on HIVE-28177:


I posted the announcement to the site with 
https://github.com/apache/hive-site/commit/8b1a898c4ded604bdc06897429c59a3f69f03a6d
 and sent an email to announce@apache.

Thanks for the initiative and help [~ayushsaxena]!

> Announce Hive 1.x EOL and remove from downloads space
> -
>
> Key: HIVE-28177
> URL: https://issues.apache.org/jira/browse/HIVE-28177
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Hive 1.x release line is officially unsupported. The respective 
> discussion and vote can be found below:
>  * https://lists.apache.org/thread/sxcrcf4v9j630tl9domp0bn4m33bdq0s
>  * [https://lists.apache.org/thread/cyfg2ftrsh9bn0wgycm7ltqsx9yb6fts]
> The following tasks are pending:
>  * Update the Hive website to reflect that Hive 1.x is EOL
>  * Send an official announcement email to the following lists: user@hive, 
> dev@hive, announce@apache
>  * Remove hive-1.2.2 from [https://downloads.apache.org/hive/] (it will be 
> automatically archived)




[jira] [Work logged] (HIVE-28177) Announce Hive 1.x EOL and remove from downloads space

2024-04-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28177?focusedWorklogId=915076=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-915076
 ]

ASF GitHub Bot logged work on HIVE-28177:
-

Author: ASF GitHub Bot
Created on: 17/Apr/24 09:29
Start Date: 17/Apr/24 09:29
Worklog Time Spent: 10m 
  Work Description: zabetak closed pull request #14: HIVE-28177: Announce 
Hive 1.x EOL
URL: https://github.com/apache/hive-site/pull/14




Issue Time Tracking
---

Worklog Id: (was: 915076)
Time Spent: 20m  (was: 10m)

> Announce Hive 1.x EOL and remove from downloads space
> -
>
> Key: HIVE-28177
> URL: https://issues.apache.org/jira/browse/HIVE-28177
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Hive 1.x release line is officially unsupported. The respective 
> discussion and vote can be found below:
>  * https://lists.apache.org/thread/sxcrcf4v9j630tl9domp0bn4m33bdq0s
>  * [https://lists.apache.org/thread/cyfg2ftrsh9bn0wgycm7ltqsx9yb6fts]
> The following tasks are pending:
>  * Update the Hive website to reflect that Hive 1.x is EOL
>  * Send an official announcement email to the following lists: user@hive, 
> dev@hive, announce@apache
>  * Remove hive-1.2.2 from [https://downloads.apache.org/hive/] (it will be 
> automatically archived)




[jira] [Commented] (HIVE-25765) skip.header.line.count property skips rows of each block in FetchOperator when file size is larger

2024-04-17 Thread Miklos Szurap (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838024#comment-17838024
 ] 

Miklos Szurap commented on HIVE-25765:
--

We've also faced this recently, and it's even more apparent when using S3 as 
storage, since the [block size in 
S3|https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/performance.html]
 is 32 MB:
{code}
fs.s3a.block.size=32M
{code}
Can somebody reopen the pull request and help with the commit?

> skip.header.line.count property skips rows of each block in FetchOperator 
> when file size is larger
> --
>
> Key: HIVE-25765
> URL: https://issues.apache.org/jira/browse/HIVE-25765
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Ganesha Shreedhara
>Assignee: Ganesha Shreedhara
>Priority: Major
>  Labels: pull-request-available
> Attachments: data.txt.gz
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When the _skip.header.line.count_ property is set in the table properties, 
> simple select queries that get converted into a FetchTask skip the leading 
> rows of every block instead of only the header lines of each file. This 
> happens when the file is large enough to be read in multiple blocks. The 
> issue does not occur when the select query is converted into a map-only job 
> by setting _hive.fetch.task.conversion_ to _none_, because the header lines 
> are then skipped only for the first block, thanks to [this 
> check|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java#L330].
>  We should add a similar check in FetchOperator to avoid this issue. 
>  
> *Steps to reproduce:* 
> {code:java}
> -- Create a table on top of the data file attached to this ticket
> -- (uncompressed size: ~239M)
> CREATE EXTERNAL TABLE test_table(
>   col1 string,
>   col2 string,
>   col3 string,
>   col4 string,
>   col5 string,
>   col6 string,
>   col7 string,
>   col8 string,
>   col9 string,
>   col10 string,
>   col11 string,
>   col12 string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'location_of_data_file'
> TBLPROPERTIES ('skip.header.line.count'='1');
> -- Counting the number of rows gives the correct result, with only one header
> -- line skipped
> select count(*) from test_table;
> 3145727
> -- The select query skips more rows, and the result depends on the number of
> -- blocks configured in the underlying filesystem. 3 rows are skipped when the
> -- file is read in 3 blocks.
> select * from test_table;
> .
> .
> Fetched 3145724 rows
>  {code}
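A minimal Java sketch of the kind of check proposed above, assuming each split has already been cut at line boundaries and that the split's start offset is known. HeaderSkipSketch and readSplit are hypothetical and are not FetchOperator's actual code: header lines are skipped only for the split that starts at offset 0. Without the splitStart check, the second split would also drop its first line, which is the row loss seen in the reproduction; a smaller block size, such as the 32 MB S3A block size mentioned in the comment above, produces more splits and therefore more lost rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not Hive's FetchOperator code.
public class HeaderSkipSketch {
    // Returns the data rows of one split (block) of a text file.
    // Header lines are skipped only when the split starts at byte offset 0,
    // i.e. only for the first split of the file.
    static List<String> readSplit(List<String> splitLines, long splitStart, int headerCount) {
        int toSkip = (splitStart == 0L) ? headerCount : 0;
        List<String> rows = new ArrayList<>();
        for (String line : splitLines) {
            if (toSkip > 0) {
                toSkip--;
                continue;
            }
            rows.add(line);
        }
        return rows;
    }

    public static void main(String[] args) {
        // One file read as two splits; "h" is the single header line.
        List<String> first = readSplit(List.of("h", "a", "b"), 0L, 1);
        List<String> second = readSplit(List.of("c", "d"), 6L, 1);
        System.out.println(first.size() + second.size());
    }
}
```

All four data rows survive (the sketch prints 4); skipping the header in every split would have printed 3.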




[jira] [Updated] (HIVE-28177) Announce Hive 1.x EOL and remove from downloads space

2024-04-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28177:
--
Labels: pull-request-available  (was: )

> Announce Hive 1.x EOL and remove from downloads space
> -
>
> Key: HIVE-28177
> URL: https://issues.apache.org/jira/browse/HIVE-28177
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Hive 1.x release line is officially unsupported. The respective 
> discussion and vote can be found below:
>  * https://lists.apache.org/thread/sxcrcf4v9j630tl9domp0bn4m33bdq0s
>  * [https://lists.apache.org/thread/cyfg2ftrsh9bn0wgycm7ltqsx9yb6fts]
> The following tasks are pending:
>  * Update the Hive website to reflect that Hive 1.x is EOL
>  * Send an official announcement email to the following lists: user@hive, 
> dev@hive, announce@apache
>  * Remove hive-1.2.2 from [https://downloads.apache.org/hive/] (it will be 
> automatically archived)




[jira] [Updated] (HIVE-14867) "serialization.last.column.takes.rest" does not work for MultiDelimitSerDe

2024-04-17 Thread Liu Weizheng (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-14867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Weizheng updated HIVE-14867:

Description: 
Create a table with MultiDelimitSerDe:
{code:java}
CREATE TABLE foo (a string, b string) ROW FORMAT SERDE 
'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' WITH SERDEPROPERTIES 
("field.delim"="|@|","collection.delim"=":","mapkey.delim"="@") stored as 
textfile;
{code}
Load data into the table:
{code:java}
1|@|Lily|@|HW|@|abc
2|@|Lucy|@|LX|@|123
3|@|Lilei|@|XX|@|3434
{code}
select data from this table:
{code:java}
select * from foo;
+--------+------------------+
| foo.a  | foo.b            |
+--------+------------------+
| 1      | Lily^AHW^Aabc    |
| 2      | Lucy^ALX^A123    |
| 3      | Lilei^AXX^A3434  |
+--------+------------------+
3 rows selected (0.905 seconds)
{code}
As you can see, the last column takes all of the remaining data, and the 
delimiter is replaced with the default ^A.

lastColumnTakesRest should be false by default:
{code:java}
String lastColumnTakesRestString = tbl
    .getProperty(serdeConstants.SERIALIZATION_LAST_COLUMN_TAKES_REST);
lastColumnTakesRest = (lastColumnTakesRestString != null &&
    lastColumnTakesRestString.equalsIgnoreCase("true"));
{code}
 

  was:
Create a table with MultiDelimitSerDe:
{code}
CREATE TABLE foo (a string, b string) ROW FORMAT SERDE 
'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' WITH SERDEPROPERTIES 
("field.delim"="|@|","collection.delim"=":","mapkey.delim"="@") stored as 
textfile;
{code}

Load data into the table:
{code}
1|@|Lily|@|HW|@|abc
2|@|Lucy|@|LX|@|123
3|@|Lilei|@|XX|@|3434
{code}

select data from this table:
{code}
select * from foo;
+--------+------------------+
| foo.a  | foo.b            |
+--------+------------------+
| 1      | Lily^AHW^Aabc    |
| 2      | Lucy^ALX^A123    |
| 3      | Lilei^AXX^A3434  |
+--------+------------------+
3 rows selected (0.905 seconds)
{code}

As you can see, the last column takes all of the remaining data, and the 
delimiter is replaced with the default ^A.

lastColumnTakesRest should be false by default:
{code}
String lastColumnTakesRestString = tbl
    .getProperty(serdeConstants.SERIALIZATION_LAST_COLUMN_TAKES_REST);
lastColumnTakesRest = (lastColumnTakesRestString != null &&
    lastColumnTakesRestString.equalsIgnoreCase("true"));
{code}



> "serialization.last.column.takes.rest" does not work for MultiDelimitSerDe
> --
>
> Key: HIVE-14867
> URL: https://issues.apache.org/jira/browse/HIVE-14867
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.3.0
>Reporter: Niklaus Xiao
>Assignee: Niklaus Xiao
>Priority: Major
>
> Create a table with MultiDelimitSerDe:
> {code:java}
> CREATE TABLE foo (a string, b string) ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' WITH 
> SERDEPROPERTIES 
> ("field.delim"="|@|","collection.delim"=":","mapkey.delim"="@") stored as 
> textfile;
> {code}
> Load data into the table:
> {code:java}
> 1|@|Lily|@|HW|@|abc
> 2|@|Lucy|@|LX|@|123
> 3|@|Lilei|@|XX|@|3434
> {code}
> select data from this table:
> {code:java}
> select * from foo;
> +--------+------------------+
> | foo.a  | foo.b            |
> +--------+------------------+
> | 1      | Lily^AHW^Aabc    |
> | 2      | Lucy^ALX^A123    |
> | 3      | Lilei^AXX^A3434  |
> +--------+------------------+
> 3 rows selected (0.905 seconds)
> {code}
> As you can see, the last column takes all of the remaining data, and the 
> delimiter is replaced with the default ^A.
> lastColumnTakesRest should be false by default:
> {code:java}
> String lastColumnTakesRestString = tbl
>     .getProperty(serdeConstants.SERIALIZATION_LAST_COLUMN_TAKES_REST);
> lastColumnTakesRest = (lastColumnTakesRestString != null &&
>     lastColumnTakesRestString.equalsIgnoreCase("true"));
> {code}
>  
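For reference, a minimal Java sketch of the behavior the reporter expects, under stated assumptions. It mirrors the flag parsing quoted above using java.util.Properties and the "serialization.last.column.takes.rest" key, but MultiDelimitSketch and parseRow are hypothetical, not MultiDelimitSerDe code. It assumes that, with the flag unset, fields beyond the declared columns are dropped, and that, with the flag set, the remaining fields are joined back with the original field delimiter rather than ^A.

```java
import java.util.Arrays;
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical sketch: not MultiDelimitSerDe code.
public class MultiDelimitSketch {
    static final String LAST_COLUMN_TAKES_REST = "serialization.last.column.takes.rest";

    // Same rule as the quoted snippet: false unless the property is "true" (any case).
    static boolean lastColumnTakesRest(Properties tbl) {
        String value = tbl.getProperty(LAST_COLUMN_TAKES_REST);
        return value != null && value.equalsIgnoreCase("true");
    }

    // Splits a row on a literal multi-character delimiter into numColumns values.
    static String[] parseRow(String line, String fieldDelim, int numColumns, boolean takesRest) {
        String[] fields = line.split(Pattern.quote(fieldDelim), -1);
        String[] row = new String[numColumns];
        for (int i = 0; i < numColumns; i++) {
            row[i] = i < fields.length ? fields[i] : null; // missing fields become null
        }
        if (takesRest && fields.length > numColumns) {
            // Assumption: the rest is re-joined with the original delimiter.
            row[numColumns - 1] = String.join(fieldDelim,
                Arrays.copyOfRange(fields, numColumns - 1, fields.length));
        }
        return row;
    }

    public static void main(String[] args) {
        Properties tbl = new Properties(); // property not set
        String[] row = parseRow("1|@|Lily|@|HW|@|abc", "|@|", 2, lastColumnTakesRest(tbl));
        System.out.println(Arrays.toString(row));
    }
}
```

For the first row of the sample data and the two-column table, the sketch prints [1, Lily].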




[jira] [Work logged] (HIVE-28177) Announce Hive 1.x EOL and remove from downloads space

2024-04-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28177?focusedWorklogId=915066=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-915066
 ]

ASF GitHub Bot logged work on HIVE-28177:
-

Author: ASF GitHub Bot
Created on: 17/Apr/24 08:00
Start Date: 17/Apr/24 08:00
Worklog Time Spent: 10m 
  Work Description: zabetak opened a new pull request, #14:
URL: https://github.com/apache/hive-site/pull/14

   (no comment)




Issue Time Tracking
---

Worklog Id: (was: 915066)
Remaining Estimate: 0h
Time Spent: 10m

> Announce Hive 1.x EOL and remove from downloads space
> -
>
> Key: HIVE-28177
> URL: https://issues.apache.org/jira/browse/HIVE-28177
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Hive 1.x release line is officially unsupported. The respective 
> discussion and vote can be found below:
>  * https://lists.apache.org/thread/sxcrcf4v9j630tl9domp0bn4m33bdq0s
>  * [https://lists.apache.org/thread/cyfg2ftrsh9bn0wgycm7ltqsx9yb6fts]
> The following tasks are pending:
>  * Update the Hive website to reflect that Hive 1.x is EOL
>  * Send an official announcement email to the following lists: user@hive, 
> dev@hive, announce@apache
>  * Remove hive-1.2.2 from [https://downloads.apache.org/hive/] (it will be 
> automatically archived)




[jira] [Commented] (HIVE-28177) Announce Hive 1.x EOL and remove from downloads space

2024-04-17 Thread Stamatis Zampetakis (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-28177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838016#comment-17838016
 ] 

Stamatis Zampetakis commented on HIVE-28177:


I just removed hive-1.2.2 from https://downloads.apache.org/hive/ along with a 
few other obsolete entries.

> Announce Hive 1.x EOL and remove from downloads space
> -
>
> Key: HIVE-28177
> URL: https://issues.apache.org/jira/browse/HIVE-28177
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> The Hive 1.x release line is officially unsupported. The respective 
> discussion and vote can be found below:
>  * https://lists.apache.org/thread/sxcrcf4v9j630tl9domp0bn4m33bdq0s
>  * [https://lists.apache.org/thread/cyfg2ftrsh9bn0wgycm7ltqsx9yb6fts]
> The following tasks are pending:
>  * Update the Hive website to reflect that Hive 1.x is EOL
>  * Send an official announcement email to the following lists: user@hive, 
> dev@hive, announce@apache
>  * Remove hive-1.2.2 from [https://downloads.apache.org/hive/] (it will be 
> automatically archived)




[jira] [Assigned] (HIVE-28177) Announce Hive 1.x EOL and remove from downloads space

2024-04-17 Thread Stamatis Zampetakis (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-28177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-28177:
--

Assignee: Stamatis Zampetakis

> Announce Hive 1.x EOL and remove from downloads space
> -
>
> Key: HIVE-28177
> URL: https://issues.apache.org/jira/browse/HIVE-28177
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> The Hive 1.x release line is officially unsupported. The respective 
> discussion and vote can be found below:
>  * https://lists.apache.org/thread/sxcrcf4v9j630tl9domp0bn4m33bdq0s
>  * [https://lists.apache.org/thread/cyfg2ftrsh9bn0wgycm7ltqsx9yb6fts]
> The following tasks are pending:
>  * Update the Hive website to reflect that Hive 1.x is EOL
>  * Send an official announcement email to the following lists: user@hive, 
> dev@hive, announce@apache
>  * Remove hive-1.2.2 from [https://downloads.apache.org/hive/] (it will be 
> automatically archived)



