[jira] [Updated] (HIVE-27695) Intermittent OOM when running TestMiniTezCliDriver

2023-09-15 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27695:
---
Attachment: leak_suspect_1.png

> Intermittent OOM when running TestMiniTezCliDriver
> --
>
> Key: HIVE-27695
> URL: https://issues.apache.org/jira/browse/HIVE-27695
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
> Attachments: am_heap_dumps.tar.xz, leak_suspect_1.png
>
>
> Running all the tests under TestMiniTezCliDriver very frequently (though still 
> intermittently) leads to OutOfMemoryError failures.
> {noformat}
> cd itests/qtest && mvn test -Dtest=TestMiniTezCliDriver
> {noformat}
> I set {{-XX:+HeapDumpOnOutOfMemoryError}} and the resulting heap dumps are 
> attached to this ticket.
> The OOM is thrown from the application master, and a quick inspection of the 
> dumps shows that it comes mainly from the accumulation of Configuration 
> objects (~1MB each) by various classes.
> The max heap size for the application master is pretty low (~100MB), so it is 
> quite easy to reach. The heap size is deliberately kept very low for testing 
> purposes, but maybe we should re-evaluate the current test configurations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27695) Intermittent OOM when running TestMiniTezCliDriver

2023-09-15 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27695:
---
Attachment: am_heap_dumps.tar.xz

> Intermittent OOM when running TestMiniTezCliDriver
> --
>
> Key: HIVE-27695
> URL: https://issues.apache.org/jira/browse/HIVE-27695
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
> Attachments: am_heap_dumps.tar.xz
>
>
> Running all the tests under TestMiniTezCliDriver very frequently (though still 
> intermittently) leads to OutOfMemoryError failures.
> {noformat}
> cd itests/qtest && mvn test -Dtest=TestMiniTezCliDriver
> {noformat}
> I set {{-XX:+HeapDumpOnOutOfMemoryError}} and the resulting heap dumps are 
> attached to this ticket.
> The OOM is thrown from the application master, and a quick inspection of the 
> dumps shows that it comes mainly from the accumulation of Configuration 
> objects (~1MB each) by various classes.
> The max heap size for the application master is pretty low (~100MB), so it is 
> quite easy to reach. The heap size is deliberately kept very low for testing 
> purposes, but maybe we should re-evaluate the current test configurations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27695) Intermittent OOM when running TestMiniTezCliDriver

2023-09-15 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-27695:
--

 Summary: Intermittent OOM when running TestMiniTezCliDriver
 Key: HIVE-27695
 URL: https://issues.apache.org/jira/browse/HIVE-27695
 Project: Hive
  Issue Type: Bug
  Components: Test
Affects Versions: 4.0.0-beta-1
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


Running all the tests under TestMiniTezCliDriver very frequently (though still 
intermittently) leads to OutOfMemoryError failures.
{noformat}
cd itests/qtest && mvn test -Dtest=TestMiniTezCliDriver
{noformat}

I set {{-XX:+HeapDumpOnOutOfMemoryError}} and the resulting heap dumps are 
attached to this ticket.

The OOM is thrown from the application master, and a quick inspection of the 
dumps shows that it comes mainly from the accumulation of Configuration objects 
(~1MB each) by various classes.

The max heap size for the application master is pretty low (~100MB), so it is 
quite easy to reach. The heap size is deliberately kept very low for testing 
purposes, but maybe we should re-evaluate the current test configurations.
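
To see how quickly such a heap fills up, here is a minimal, self-contained 
sketch (not code taken from Hive or Tez; the class and property names are made 
up for the illustration). Each retained Configuration copy of roughly 1MB means 
that about one hundred retained copies are enough to exhaust a 100MB 
application-master heap.
{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch only: a long-lived holder keeping full Configuration
// copies, standing in for the "various classes" seen in the heap dump.
public class ConfigurationRetentionSketch {
  private static final List<Configuration> RETAINED = new ArrayList<>();

  public static void main(String[] args) {
    Configuration base = new Configuration();
    String padding = new String(new char[1024]).replace('\0', 'x'); // ~1KB value
    for (int i = 0; i < 100; i++) {
      Configuration copy = new Configuration(base);    // full copy of all properties
      for (int j = 0; j < 1000; j++) {
        copy.set("dummy.key." + i + "." + j, padding); // pad each copy to roughly 1MB
      }
      RETAINED.add(copy);                              // never released -> steady heap growth
    }
    System.out.println("Retained copies: " + RETAINED.size());
  }
}
{code}
Run with something like {{-Xmx100m -XX:+HeapDumpOnOutOfMemoryError}} and the 
sketch fails the same way the AM does, which is why either releasing the copies 
or revisiting the test heap settings would address the symptom.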



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27694) Include HiveIcebergSerDe in default list of serdes using HMS

2023-09-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27694:
--
Labels: pull-request-available  (was: )

> Include HiveIcebergSerDe in default list of serdes using HMS
> 
>
> Key: HIVE-27694
> URL: https://issues.apache.org/jira/browse/HIVE-27694
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
>  Labels: pull-request-available
>
> MetastoreConf has a default list of SerDes that use HMS to persist the 
> metadata. Iceberg tables also have their metadata in HMS, so it can be 
> fetched from the metastore. This SerDe needs to be added to that list as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27649) Subqueries with a set operator do not support order by clauses

2023-09-15 Thread Nicolas Richard (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Richard updated HIVE-27649:
---
Status: Patch Available  (was: In Progress)

> Subqueries with a set operator do not support order by clauses
> --
>
> Key: HIVE-27649
> URL: https://issues.apache.org/jira/browse/HIVE-27649
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Nicolas Richard
>Assignee: Nicolas Richard
>Priority: Major
>  Labels: pull-request-available
>
> Consider the following query:
> {code:java}
> select key from ((select key from src order by key) union (select key from 
> src))subq {code}
> Up until 3.1.2, Hive would parse this query without any problems. However, if 
> you try it on the latest versions, you'll get the following exception:
> {code:java}
> org.apache.hadoop.hive.ql.parse.ParseException: line 1:60 cannot recognize 
> input near 'union' '(' 'select' in subquery source
>         at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:125)
>         at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:97) {code}
> With the inner exception stack trace being:
> {code:java}
> NoViableAltException(367@[])
>     at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:14006)
>     at 
> org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:45086)
>     at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.subQuerySource(HiveParser_FromClauseParser.java:5411)
>     at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.atomjoinSource(HiveParser_FromClauseParser.java:1921)
>     at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.joinSource(HiveParser_FromClauseParser.java:2175)
>     at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.atomjoinSource(HiveParser_FromClauseParser.java:2110)
>     at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.joinSource(HiveParser_FromClauseParser.java:2175)
>     at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromSource(HiveParser_FromClauseParser.java:1750)
>     at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromClause(HiveParser_FromClauseParser.java:1593)
>     at 
> org.apache.hadoop.hive.ql.parse.HiveParser.fromClause(HiveParser.java:45094)
>     at 
> org.apache.hadoop.hive.ql.parse.HiveParser.atomSelectStatement(HiveParser.java:38538)
>     at 
> org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:38831)
>     at 
> org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:38424)
>     at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:37686)
>     at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:37574)
>     at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:2757)
>     at 
> org.apache.hadoop.hive.ql.parse.HiveParser.explainStatement(HiveParser.java:1751)
>     at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1614)
>     at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:123)
>     at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:97) 
> {code}
> Note that this behavior also happens if the subquery contains a SORT BY, 
> CLUSTER BY, DISTRIBUTE BY or LIMIT clause.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27694) Include HiveIcebergSerDe in default list of serdes using HMS

2023-09-15 Thread Naveen Gangam (Jira)
Naveen Gangam created HIVE-27694:


 Summary: Include HiveIcebergSerDe in default list of serdes using 
HMS
 Key: HIVE-27694
 URL: https://issues.apache.org/jira/browse/HIVE-27694
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore
Reporter: Naveen Gangam
Assignee: Naveen Gangam


MetastoreConf has a default list of SerDes that use HMS to persist the 
metadata. Iceberg tables also have their metadata in HMS, so it can be 
fetched from the metastore. This SerDe needs to be added to that list as well.
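
For illustration, here is a minimal sketch of what appending the Iceberg SerDe 
to such a list could look like from the configuration side. This is not the 
actual patch: the property key follows the existing 
{{hive.serdes.using.metastore.for.schema}} HiveConf entry, and whether the 
MetastoreConf default list uses exactly that key is an assumption here.
{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch only: append HiveIcebergSerDe to the comma-separated list of SerDes
// whose schemas are served from HMS, if it is not already present.
public class IcebergSerdeListSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    String key = "hive.serdes.using.metastore.for.schema";       // assumed property key
    String iceberg = "org.apache.iceberg.mr.hive.HiveIcebergSerDe";
    String current = conf.get(key, "");
    if (!current.contains(iceberg)) {
      conf.set(key, current.isEmpty() ? iceberg : current + "," + iceberg);
    }
    System.out.println(conf.get(key));
  }
}
{code}
The real change would simply add the class name to the default value in 
MetastoreConf so that no site-level override is needed.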



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27675) Support keystore/truststore types for hive to zookeeper integration points

2023-09-15 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam resolved HIVE-27675.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Fix has been merged to master. Thank you for the review.

> Support keystore/truststore types for hive to zookeeper integration points
> --
>
> Key: HIVE-27675
> URL: https://issues.apache.org/jira/browse/HIVE-27675
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC, Standalone Metastore
>Affects Versions: 3.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> In HIVE-24253, we added support for HS2/HMS/JDBC driver to use other store 
> types like BCFKS (other than JKS). This allows JDBC clients to connect to HS2 
> directly. However, with service discovery enabled, the clients have to 
> connect to ZooKeeper to determine the HS2 endpoints. This connectivity 
> currently does not support other store types. Similarly, the HS2/HMS services 
> also do not provide the ability to use different store types for the 
> ZooKeeper registration process.
> {noformat}
> $ beeline 
> Connecting to 
> jdbc:hive2://:2181/default;httpPath=cliservice;principal=hive/_HOST@;retries=5;serviceDiscoveryMode=zooKeeper;ssl=true;sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks;transportMode=http;trustStorePassword=RoeCFK11Pq54;trustStoreType=bcfks;zooKeeperNamespace=hiveserver2
> Error: org.apache.hive.jdbc.ZooKeeperHiveClientException: Unable to read 
> HiveServer2 configs from ZooKeeper (state=,code=0) 
> {noformat}
> {noformat}
> Opening socket connection to server :2182. Will attempt to 
> SASL-authenticate using Login Context section 'HiveZooKeeperClient'
> 2023-08-09 13:28:07,591 WARN  io.netty.channel.ChannelInitializer: 
> [nioEventLoopGroup-3-1]: Failed to initialize a channel. Closing: [id: 
> 0x0937583f]
> org.apache.zookeeper.common.X509Exception$SSLContextException: Failed to 
> create KeyManager
> at 
> org.apache.zookeeper.common.X509Util.createSSLContextAndOptions(X509Util.java:346)
>  ~[zookeeper-3.5.5.7.2.16.300-7.jar:3.5.5.7.2.16.300-7]
> at 
> org.apache.zookeeper.common.X509Util.createSSLContext(X509Util.java:278) 
> ~[zookeeper-3.5.5.7.2.16.300-7.jar:3.5.5.7.2.16.300-7]
> at 
> org.apache.zookeeper.ClientCnxnSocketNetty$ZKClientPipelineFactory.initSSL(ClientCnxnSocketNetty.java:454)
>  ~[zookeeper-3.5.5.7.2.16.300-7.jar:3.5.5.7.2.16.300-7]
> at 
> org.apache.zookeeper.ClientCnxnSocketNetty$ZKClientPipelineFactory.initChannel(ClientCnxnSocketNetty.java:444)
>  ~[zookeeper-3.5.5.7.2.16.300-7.jar:3.5.5.7.2.16.300-7]
> at 
> org.apache.zookeeper.ClientCnxnSocketNetty$ZKClientPipelineFactory.initChannel(ClientCnxnSocketNetty.java:429)
>  ~[zookeeper-3.5.5.7.2.16.300-7.jar:3.5.5.7.2.16.300-7]
> at 
> io.netty.channel.ChannelInitializer.initChannel(ChannelInitializer.java:129) 
> [netty-transport-4.1.86.Final.jar:4.1.86.Final]
> at 
> io.netty.channel.ChannelInitializer.handlerAdded(ChannelInitializer.java:112) 
> [netty-transport-4.1.86.Final.jar:4.1.86.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.callHandlerAdded(AbstractChannelHandlerContext.java:1114)
>  [netty-transport-4.1.86.Final.jar:4.1.86.Final]
> at 
> io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:609)
>  [netty-transport-4.1.86.Final.jar:4.1.86.Final]
> at 
> io.netty.channel.DefaultChannelPipeline.access$100(DefaultChannelPipeline.java:46)
>  [netty-transport-4.1.86.Final.jar:4.1.86.Final]
> at 
> io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute(DefaultChannelPipeline.java:1463)
>  [netty-transport-4.1.86.Final.jar:4.1.86.Final]
> at 
> io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers(DefaultChannelPipeline.java:1115)
>  [netty-transport-4.1.86.Final.jar:4.1.86.Final]
> at 
> io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded(DefaultChannelPipeline.java:650)
>  [netty-transport-4.1.86.Final.jar:4.1.86.Final]
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:514)
>  [netty-transport-4.1.86.Final.jar:4.1.86.Final]
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429)
>  [netty-transport-4.1.86.Final.jar:4.1.86.Final]
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486)
>  [netty-transport-4.1.86.Final.jar:4.1.86.Final]
> at 
> io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
>  

[jira] [Updated] (HIVE-27138) MapJoinOperator throws NPE when computing OuterJoin with filter expressions on small table

2023-09-15 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27138:
---
Fix Version/s: 4.0.0

> MapJoinOperator throws NPE when computing OuterJoin with filter expressions 
> on small table
> --
>
> Key: HIVE-27138
> URL: https://issues.apache.org/jira/browse/HIVE-27138
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Blocker
>  Labels: hive-4.0.0-must, pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Hive throws an NPE when running mapjoin_filter_on_outerjoin.q using the Tez 
> engine (I used TestMiniLlapCliDriver).
> The NPE is thrown by CommonJoinOperator.getFilterTag(), which just retrieves 
> the last object from the given list.
> To the best of my knowledge, if Hive selects MapJoin to perform a join 
> operation, the filterTag should be computed and appended to a row before the 
> row is passed to MapJoinOperator.
> In the case of the MapReduce engine, this is done by HashTableSinkOperator.
> However, I cannot find any logic preparing the filterTag for small tables 
> when Hive uses the Tez engine.
> I think there are 2 available options:
> 1. Don't use MapJoinOperator if a small table has a filter expression.
> 2. Add new logic that computes and passes the filterTag to MapJoinOperator.
> I am working on the second option and am ready to discuss it.
> I would be grateful for any opinions on this issue.
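> Below is a paraphrased, self-contained sketch of the failing contract (not 
> the exact Hive source; the helper only mimics what getFilterTag() expects):
> {code:java}
> import java.util.Arrays;
> import java.util.List;
> import org.apache.hadoop.hive.serde2.io.ShortWritable;
> 
> // Sketch: the join operator assumes the small-table row ends with a ShortWritable
> // filter tag. On the Tez path nothing appended that tag, so unwrapping the last
> // element fails.
> public class FilterTagSketch {
>   static short getFilterTagLike(List<Object> row) {
>     return ((ShortWritable) row.get(row.size() - 1)).get(); // NPE when the tag slot holds null
>   }
> 
>   public static void main(String[] args) {
>     List<Object> taggedRow = Arrays.asList("key1", "val1", new ShortWritable((short) 0));
>     System.out.println(getFilterTagLike(taggedRow));   // prints 0: tag present (MR path)
>     List<Object> untaggedRow = Arrays.asList("key1", "val1", null);
>     System.out.println(getFilterTagLike(untaggedRow)); // throws NullPointerException (Tez symptom)
>   }
> }
> {code}
> Either option removes the mismatch: option 1 avoids the code path entirely, 
> while option 2 makes the Tez side append the tag just like 
> HashTableSinkOperator does for MapReduce.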



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27558) HBase table query does not push BETWEEN predicate to storage layer

2023-09-15 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27558.
---
Resolution: Fixed

Merged to master. Thanks [~Dayakar] for the patch.

> HBase table query does not push BETWEEN predicate to storage layer
> --
>
> Key: HIVE-27558
> URL: https://issues.apache.org/jira/browse/HIVE-27558
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code}
> INSERT INTO TABLE target_tbl
> SELECT
>   ...
> FROM
>   (
> SELECT
>   ...
> FROM
>   hbase_tbl
> WHERE
>   CDS_PK >= '2-00OZG-0'
>   and CDS_PK <= '2-00OZG-g'
>   ) CDS_VIEW;
> {code}
> The statement predicate is not pushed to the storage layer, causing the job 
> to execute longer than it needs to.
> Possible solutions:
> 1. Support pushing down the BETWEEN clause to HBaseStorageHandler.
> 2. Don't convert specific filters on the key column in the case of 
> HBaseStorageHandler, or don't apply this optimization for HBaseStorageHandler 
> tables.
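> To illustrate what pushing the key-range predicate down buys (a sketch only, 
> not the HBaseStorageHandler code; the key values are taken from the query 
> above): the HBase scan itself becomes bounded instead of scanning the whole 
> table and filtering on the Hive side.
> {code:java}
> import org.apache.hadoop.hbase.client.Scan;
> import org.apache.hadoop.hbase.util.Bytes;
> 
> // Sketch: the pushed-down predicate becomes start/stop row bounds on the scan.
> public class KeyRangePushdownSketch {
>   public static void main(String[] args) {
>     Scan scan = new Scan()
>         .withStartRow(Bytes.toBytes("2-00OZG-0"), true)  // CDS_PK >= '2-00OZG-0'
>         .withStopRow(Bytes.toBytes("2-00OZG-g"), true);  // CDS_PK <= '2-00OZG-g'
>     System.out.println(scan);
>   }
> }
> {code}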



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27558) HBase table query does not push BETWEEN predicate to storage layer

2023-09-15 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-27558:
--
Fix Version/s: 4.0.0

> HBase table query does not push BETWEEN predicate to storage layer
> --
>
> Key: HIVE-27558
> URL: https://issues.apache.org/jira/browse/HIVE-27558
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code}
> INSERT INTO TABLE target_tbl
> SELECT
>   ...
> FROM
>   (
> SELECT
>   ...
> FROM
>   hbase_tbl
> WHERE
>   CDS_PK >= '2-00OZG-0'
>   and CDS_PK <= '2-00OZG-g'
>   ) CDS_VIEW;
> {code}
> The statement predicate is not pushed to the storage layer, causing the job 
> to execute longer than it needs to.
> Possible solutions:
> 1. Support pushing down the BETWEEN clause to HBaseStorageHandler.
> 2. Don't convert specific filters on the key column in the case of 
> HBaseStorageHandler, or don't apply this optimization for HBaseStorageHandler 
> tables.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27693) FileSystem counters summary to include durations

2023-09-15 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27693:

Description: 
given this:
{code}
INFO  : FileSystem Counters Summary
INFO  : 
INFO  : Scheme: HDFS
INFO  : --------------------------------------------------------------------------------------
INFO  :   VERTICES   BYTES_READ   READ_OPS   LARGE_READ_OPS   BYTES_WRITTEN   WRITE_OPS
INFO  : --------------------------------------------------------------------------------------
INFO  :      Map 1     350.69MB        622                0              0B           0
INFO  :     Map 10           0B          0                0              0B           0
INFO  :     Map 11           0B          1                0              0B           0
INFO  :     Map 12           0B          1                0              0B           0
INFO  :     Map 13           0B          1                0              0B           0
INFO  :     Map 14         206B          1                0              0B           0
INFO  :     Map 15         206B          1                0              0B           0
INFO  :     Map 16           0B          1                0              0B           0
INFO  :     Map 17           0B          1                0              0B           0
{code}

I want to be able to see how much time has actually been spent on filesystem 
operations for a specific scheme; by looking at that, I could easily tell 
whether file reads were slow.

  was:
given this:
{code}
INFO  : FileSystem Counters Summary
INFO  : 
INFO  : Scheme: HDFS
INFO  : --------------------------------------------------------------------------------------
INFO  :   VERTICES   BYTES_READ   READ_OPS   LARGE_READ_OPS   BYTES_WRITTEN   WRITE_OPS
INFO  : --------------------------------------------------------------------------------------
INFO  :      Map 1     350.69MB        622                0              0B           0
INFO  :     Map 10           0B          0                0              0B           0
INFO  :     Map 11           0B          1                0              0B           0
INFO  :     Map 12           0B          1                0              0B           0
INFO  :     Map 13           0B          1                0              0B           0
INFO  :     Map 14         206B          1                0              0B           0
INFO  :     Map 15         206B          1                0              0B           0
INFO  :     Map 16           0B          1                0              0B           0
INFO  :     Map 17           0B          1                0              0B           0
{code}

I want to be able to see how much time has actually been spent on filesystem 
operation on a specific scheme, as by that, I could be easily tell


> FileSystem counters summary to include durations 
> -
>
> Key: HIVE-27693
> URL: https://issues.apache.org/jira/browse/HIVE-27693
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Priority: Major
>
> given this:
> {code}
> INFO  : FileSystem Counters Summary
> INFO  : 
> INFO  : Scheme: HDFS
> INFO  : --------------------------------------------------------------------------------------
> INFO  :   VERTICES   BYTES_READ   READ_OPS   LARGE_READ_OPS   BYTES_WRITTEN   WRITE_OPS
> INFO  : --------------------------------------------------------------------------------------
> INFO  :      Map 1     350.69MB        622                0              0B           0
> INFO  :     Map 10           0B          0                0              0B           0
> INFO  :     Map 11           0B          1                0              0B           0
> INFO  :     Map 12           0B          1                0              0B           0
> INFO  :     Map 13           0B          1                0              0B           0
> INFO  :     Map 14         206B          1                0              0B           0
> INFO  :     Map 15         206B          1                0              0B           0
> INFO  :     Map 16           0B          1                0              0B           0
> INFO  :     Map 17           0B          1                0              0B           0
> {code}
> I want to be able to see how much time has actually been spent on filesystem 
> operation 

[jira] [Updated] (HIVE-27693) FileSystem counters summary to include durations

2023-09-15 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27693:

Description: 
given this:
{code}
INFO  : FileSystem Counters Summary
INFO  : 
INFO  : Scheme: HDFS
INFO  : --------------------------------------------------------------------------------------
INFO  :   VERTICES   BYTES_READ   READ_OPS   LARGE_READ_OPS   BYTES_WRITTEN   WRITE_OPS
INFO  : --------------------------------------------------------------------------------------
INFO  :      Map 1     350.69MB        622                0              0B           0
INFO  :     Map 10           0B          0                0              0B           0
INFO  :     Map 11           0B          1                0              0B           0
INFO  :     Map 12           0B          1                0              0B           0
INFO  :     Map 13           0B          1                0              0B           0
INFO  :     Map 14         206B          1                0              0B           0
INFO  :     Map 15         206B          1                0              0B           0
INFO  :     Map 16           0B          1                0              0B           0
INFO  :     Map 17           0B          1                0              0B           0
{code}

I want to be able to see how much time has actually been spent on filesystem 
operation on a specific scheme, as by that, I could be easily tell

> FileSystem counters summary to include durations 
> -
>
> Key: HIVE-27693
> URL: https://issues.apache.org/jira/browse/HIVE-27693
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Priority: Major
>
> given this:
> {code}
> INFO  : FileSystem Counters Summary
> INFO  : 
> INFO  : Scheme: HDFS
> INFO  : --------------------------------------------------------------------------------------
> INFO  :   VERTICES   BYTES_READ   READ_OPS   LARGE_READ_OPS   BYTES_WRITTEN   WRITE_OPS
> INFO  : --------------------------------------------------------------------------------------
> INFO  :      Map 1     350.69MB        622                0              0B           0
> INFO  :     Map 10           0B          0                0              0B           0
> INFO  :     Map 11           0B          1                0              0B           0
> INFO  :     Map 12           0B          1                0              0B           0
> INFO  :     Map 13           0B          1                0              0B           0
> INFO  :     Map 14         206B          1                0              0B           0
> INFO  :     Map 15         206B          1                0              0B           0
> INFO  :     Map 16           0B          1                0              0B           0
> INFO  :     Map 17           0B          1                0              0B           0
> {code}
> I want to be able to see how much time has actually been spent on filesystem 
> operation on a specific scheme, as by that, I could be easily tell



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27693) FileSystem counters summary to include durations

2023-09-15 Thread Jira
László Bodor created HIVE-27693:
---

 Summary: FileSystem counters summary to include durations 
 Key: HIVE-27693
 URL: https://issues.apache.org/jira/browse/HIVE-27693
 Project: Hive
  Issue Type: Sub-task
Reporter: László Bodor






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27673) Configurable datetime formatter for date_format

2023-09-15 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765527#comment-17765527
 ] 

Stamatis Zampetakis commented on HIVE-27673:


I updated the wiki to reflect the changes to date_format and the 
hive.datetime.formatter property:
* https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
* 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Datetime

> Configurable datetime formatter for date_format
> ---
>
> Key: HIVE-27673
> URL: https://issues.apache.org/jira/browse/HIVE-27673
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> HIVE-25268 switched the internal implementation of date_format from 
> java.text.SimpleDateFormat to java.time.format.DateTimeFormatter in order to 
> avoid some inconsistencies (arguably wrong results) for dates prior to 1900.
> However, the API of the underlying formatter is exposed to the user since 
> they need to pass patterns that are valid for the respective formatter.
> Changing the formatter implementation resolves the bugs in HIVE-25268 but 
> also leads to backward incompatible behavior.
> Consider for example the following query where the letter 'u' is used to 
> format the date:
> {code:sql}
> select date_format('2023-09-08','u');
> {code}
> The query above will return a different result depending on the formatter 
> used underneath.
> In 
> [SimpleDateFormat|https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html],
>  the letter 'u' means day of the week so the query returns 5.
> In 
> [DateTimeFormatter|https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html],
>  the letter 'u' means year so the query returns 2023.
> The goal of this ticket is to make the underlying formatter of the 
> date_format function configurable by the end user via a property, similarly 
> to what was done in HIVE-25576. For this purpose we could reuse the same 
> property: hive.datetime.formatter
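> A self-contained, JDK-only illustration of the incompatibility (not Hive 
> code): the same pattern letter 'u' means day-of-week to SimpleDateFormat but 
> year to DateTimeFormatter, which is exactly why the example query flips from 
> 5 to 2023.
> {code:java}
> import java.text.SimpleDateFormat;
> import java.time.LocalDate;
> import java.time.format.DateTimeFormatter;
> 
> public class PatternLetterU {
>   public static void main(String[] args) throws Exception {
>     // Legacy formatter: 'u' is the day number of the week (1 = Monday ... 7 = Sunday).
>     java.util.Date legacy = new SimpleDateFormat("yyyy-MM-dd").parse("2023-09-08");
>     System.out.println(new SimpleDateFormat("u").format(legacy));  // 5 (Friday)
> 
>     // java.time formatter: 'u' is the year.
>     System.out.println(LocalDate.parse("2023-09-08")
>         .format(DateTimeFormatter.ofPattern("u")));                // 2023
>   }
> }
> {code}
> A user-facing switch such as hive.datetime.formatter lets existing workloads 
> keep the SimpleDateFormat semantics until their patterns are migrated.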



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27138) MapJoinOperator throws NPE when computing OuterJoin with filter expressions on small table

2023-09-15 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27138.
---
Resolution: Fixed

Merged to master. Thanks [~seonggon] for the patch.

> MapJoinOperator throws NPE when computing OuterJoin with filter expressions 
> on small table
> --
>
> Key: HIVE-27138
> URL: https://issues.apache.org/jira/browse/HIVE-27138
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Blocker
>  Labels: hive-4.0.0-must, pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Hive throws an NPE when running mapjoin_filter_on_outerjoin.q using the Tez 
> engine (I used TestMiniLlapCliDriver).
> The NPE is thrown by CommonJoinOperator.getFilterTag(), which just retrieves 
> the last object from the given list.
> To the best of my knowledge, if Hive selects MapJoin to perform a join 
> operation, the filterTag should be computed and appended to a row before the 
> row is passed to MapJoinOperator.
> In the case of the MapReduce engine, this is done by HashTableSinkOperator.
> However, I cannot find any logic preparing the filterTag for small tables 
> when Hive uses the Tez engine.
> I think there are 2 available options:
> 1. Don't use MapJoinOperator if a small table has a filter expression.
> 2. Add new logic that computes and passes the filterTag to MapJoinOperator.
> I am working on the second option and am ready to discuss it.
> I would be grateful for any opinions on this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27138) MapJoinOperator throws NPE when computing OuterJoin with filter expressions on small table

2023-09-15 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-27138:
-

Assignee: Seonggon Namgung  (was: Sönke Liebau)

> MapJoinOperator throws NPE when computing OuterJoin with filter expressions 
> on small table
> --
>
> Key: HIVE-27138
> URL: https://issues.apache.org/jira/browse/HIVE-27138
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Blocker
>  Labels: hive-4.0.0-must, pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Hive throws an NPE when running mapjoin_filter_on_outerjoin.q using the Tez 
> engine (I used TestMiniLlapCliDriver).
> The NPE is thrown by CommonJoinOperator.getFilterTag(), which just retrieves 
> the last object from the given list.
> To the best of my knowledge, if Hive selects MapJoin to perform a join 
> operation, the filterTag should be computed and appended to a row before the 
> row is passed to MapJoinOperator.
> In the case of the MapReduce engine, this is done by HashTableSinkOperator.
> However, I cannot find any logic preparing the filterTag for small tables 
> when Hive uses the Tez engine.
> I think there are 2 available options:
> 1. Don't use MapJoinOperator if a small table has a filter expression.
> 2. Add new logic that computes and passes the filterTag to MapJoinOperator.
> I am working on the second option and am ready to discuss it.
> I would be grateful for any opinions on this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-24606) Multi-stage materialized CTEs can lose intermediate data

2023-09-15 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-24606:
--
Fix Version/s: 4.0.0

> Multi-stage materialized CTEs can lose intermediate data
> 
>
> Key: HIVE-24606
> URL: https://issues.apache.org/jira/browse/HIVE-24606
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> With complex multi-stage CTEs, Hive can start a later stage before its 
> previous stage finishes.
>  That's because `SemanticAnalyzer#toRealRootTasks` can fail to resolve 
> dependencies between multi-stage materialized CTEs when a non-materialized 
> CTE cuts in.
>  
> [https://github.com/apache/hive/blob/425e1ff7c054f87c4db87e77d004282d529599ae/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L1414]
>  
> For example, when submitting this query,
> {code:sql}
> SET hive.optimize.cte.materialize.threshold=2;
> SET hive.optimize.cte.materialize.full.aggregate.only=false;
> WITH x AS ( SELECT 'x' AS id ), -- not materialized
> a1 AS ( SELECT 'a1' AS id ), -- materialized by a2 and the root
> a2 AS ( SELECT 'a2 <- ' || id AS id FROM a1) -- materialized by the root
> SELECT * FROM a1
> UNION ALL
> SELECT * FROM x
> UNION ALL
> SELECT * FROM a2
> UNION ALL
> SELECT * FROM a2;
> {code}
> `toRealRootTasks` will traverse the CTEs in the order `a1`, `x`, `a2`. This 
> means the dependency between `a1` and `a2` will be ignored and `a2` can start 
> without waiting for `a1`. As a result, the above query returns the following 
> result.
> {code:java}
> +-+
> | id  |
> +-+
> | a1  |
> | x   |
> +-+
> {code}
> For your information, I ran this test with revision = 
> 425e1ff7c054f87c4db87e77d004282d529599ae.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-24606) Multi-stage materialized CTEs can lose intermediate data

2023-09-15 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-24606.
---
Resolution: Fixed

Merged [4363|https://github.com/apache/hive/pull/4363] to master. Thanks 
[~okumin] for the patch.

> Multi-stage materialized CTEs can lose intermediate data
> 
>
> Key: HIVE-24606
> URL: https://issues.apache.org/jira/browse/HIVE-24606
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> With complex multi-stage CTEs, Hive can start a later stage before its 
> previous stage finishes.
>  That's because `SemanticAnalyzer#toRealRootTasks` can fail to resolve 
> dependencies between multi-stage materialized CTEs when a non-materialized 
> CTE cuts in.
>  
> [https://github.com/apache/hive/blob/425e1ff7c054f87c4db87e77d004282d529599ae/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L1414]
>  
> For example, when submitting this query,
> {code:sql}
> SET hive.optimize.cte.materialize.threshold=2;
> SET hive.optimize.cte.materialize.full.aggregate.only=false;
> WITH x AS ( SELECT 'x' AS id ), -- not materialized
> a1 AS ( SELECT 'a1' AS id ), -- materialized by a2 and the root
> a2 AS ( SELECT 'a2 <- ' || id AS id FROM a1) -- materialized by the root
> SELECT * FROM a1
> UNION ALL
> SELECT * FROM x
> UNION ALL
> SELECT * FROM a2
> UNION ALL
> SELECT * FROM a2;
> {code}
> `toRealRootTasks` will traverse the CTEs in the order `a1`, `x`, `a2`. This 
> means the dependency between `a1` and `a2` will be ignored and `a2` can start 
> without waiting for `a1`. As a result, the above query returns the following 
> result.
> {code:java}
> +-+
> | id  |
> +-+
> | a1  |
> | x   |
> +-+
> {code}
> For your information, I ran this test with revision = 
> 425e1ff7c054f87c4db87e77d004282d529599ae.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27673) Configurable datetime formatter for date_format

2023-09-15 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-27673.

Fix Version/s: 4.0.0
   Resolution: Fixed

Fixed in 
https://github.com/apache/hive/commit/16a39b4fe77be3f204ddcffac607385f683ed129. 
Thanks for the reviews [~jfs], [~amansinha100], [~ayushsaxena]!

> Configurable datetime formatter for date_format
> ---
>
> Key: HIVE-27673
> URL: https://issues.apache.org/jira/browse/HIVE-27673
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> HIVE-25268 switched the internal implementation of date_format from 
> java.text.SimpleDateFormat to java.time.format.DateTimeFormatter in order to 
> avoid some inconsistencies (arguably wrong results) for dates prior to 1900.
> However, the API of the underlying formatter is exposed to the user since 
> they need to pass patterns that are valid for the respective formatter.
> Changing the formatter implementation resolves the bugs in HIVE-25268 but 
> also leads to backward incompatible behavior.
> Consider for example the following query where the letter 'u' is used to 
> format the date:
> {code:sql}
> select date_format('2023-09-08','u');
> {code}
> The query above will return a different result depending on the formatter 
> used underneath.
> In 
> [SimpleDateFormat|https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html],
>  the letter 'u' means day of the week so the query returns 5.
> In 
> [DateTimeFormatter|https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html],
>  the letter 'u' means year so the query returns 2023.
> The goal of this ticket is to make the underlying formatter of the 
> date_format function configurable by the end user via a property, similarly 
> to what was done in HIVE-25576. For this purpose we could reuse the same 
> property: hive.datetime.formatter



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27081) Revert HIVE-26717 and HIVE-26718

2023-09-15 Thread Jacques (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765510#comment-17765510
 ] 

Jacques commented on HIVE-27081:


Is this off the table indefinitely? In our cluster we have a large number of 
insert-only ACID tables (legacy Impala compatibility), and rebalance compaction 
would be extremely useful in these cases as well.

> Revert HIVE-26717 and HIVE-26718
> 
>
> Key: HIVE-27081
> URL: https://issues.apache.org/jira/browse/HIVE-27081
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-beta-1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Due to some unexpected challenges, the scope for rebalance compaction is 
> reduced. Only manual rebalance on full-ACID tables needs to be supported.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-22215) Compaction of sorted table

2023-09-15 Thread Jacques (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765506#comment-17765506
 ] 

Jacques commented on HIVE-22215:


Any information / view on when compaction of SORTED tables will be supported? 

> Compaction of sorted table
> --
>
> Key: HIVE-22215
> URL: https://issues.apache.org/jira/browse/HIVE-22215
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
>Reporter: Pawel Jurkiewicz
>Priority: Major
>
> I recently came across an issue regarding compacting tables with sorting.
> I create two tables and populate them with test data: both are ACID, but 
> only one is sorted.
> {code:sql}
> USE priv;
> DROP TABLE IF EXISTS test_data;
> DROP TABLE IF EXISTS test_compact_insert_with_sorting;
> DROP TABLE IF EXISTS test_compact_insert_without_sorting;
> CREATE TABLE test_data AS SELECT 'foobar' col;
> CREATE TABLE test_compact_insert_with_sorting (col string) 
> CLUSTERED BY (col) SORTED BY (col) INTO 1 BUCKETS
> TBLPROPERTIES ('transactional' = 'true', 
> 'transactional_properties'='insert_only');
> CREATE TABLE test_compact_insert_without_sorting (col string) 
> CLUSTERED BY (col) INTO 1 BUCKETS
> TBLPROPERTIES ('transactional' = 'true', 
> 'transactional_properties'='insert_only');
> INSERT OVERWRITE TABLE test_compact_insert_with_sorting SELECT col FROM test_data;
> INSERT OVERWRITE TABLE test_compact_insert_without_sorting SELECT col FROM test_data;
> INSERT OVERWRITE TABLE test_compact_insert_with_sorting SELECT col FROM test_data;
> INSERT OVERWRITE TABLE test_compact_insert_without_sorting SELECT col FROM test_data;
> {code}
> As expected, after these operations two base files were created for each 
> table:
> {code:bash}
> $ hdfs dfs -ls /warehouse/tablespace/managed/hive/priv.db/test_compact_insert*
> Found 2 items
> drwxrwx---+  - hive hadoop  0 2019-09-18 15:08 
> /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_with_sorting/base_001
> drwxrwx---+  - hive hadoop  0 2019-09-18 15:08 
> /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_with_sorting/base_002
> Found 2 items
> drwxrwx---+  - hive hadoop  0 2019-09-18 15:08 
> /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_without_sorting/base_001
> drwxrwx---+  - hive hadoop  0 2019-09-18 15:08 
> /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_without_sorting/base_002
> {code}
> But after running manual compaction on those tables:
> {code:sql}
> USE priv;
> ALTER TABLE test_compact_insert_with_sorting COMPACT 'MAJOR';
> ALTER TABLE test_compact_insert_without_sorting COMPACT 'MAJOR';
> {code}
> Turns out only the one without sorting got compacted:
> {code:bash}
> hdfs dfs -ls /warehouse/tablespace/managed/hive/priv.db/test_compact*
> Found 2 items
> drwxrwx---+  - hive hadoop  0 2019-09-18 15:08 
> /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_with_sorting/base_001
> drwxrwx---+  - hive hadoop  0 2019-09-18 15:08 
> /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_with_sorting/base_002
> Found 1 items
> drwxrwx---+  - hive hadoop  0 2019-09-18 15:08 
> /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_without_sorting/base_002
> {code}
> Inspecting the compactions returns:
> {code:bash}
> $ beeline -e 'show compactions' | grep priv | grep test_compact
> | 7598474   | priv  | test_compact_insert_with_sorting   |  ---   
> | MAJOR  | succeeded  | 
> master-01.pd.my-domain.com.pl-51  | 1568812155386  | 11 | None
> |
> | 7598475   | priv  | test_compact_insert_without_sorting|  ---   
> | MAJOR  | succeeded  |  ---  
>| 1568812155403  | 298| None
> {code}
> Is this by design? Both compactions' states are 'succeeded', but only the 
> one that actually reduced the number of base files took some time. Another 
> notable detail is that the compaction of the sorted table has a worker 
> assigned; does that mean it is still in progress?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)