[jira] [Resolved] (HIVE-25150) Tab characters are not removed before decimal conversion similar to space character which is fixed as part of HIVE-24378
[ https://issues.apache.org/jira/browse/HIVE-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Taraka Rama Rao Lethavadla resolved HIVE-25150.
-----------------------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

> Tab characters are not removed before decimal conversion similar to space
> character which is fixed as part of HIVE-24378
>
> Key: HIVE-25150
> URL: https://issues.apache.org/jira/browse/HIVE-25150
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 4.0.0
> Reporter: Taraka Rama Rao Lethavadla
> Assignee: Taraka Rama Rao Lethavadla
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Test case: column values with space and tab characters
> {noformat}
> bash-4.2$ cat data/files/test_dec_space.csv
> 1,0
> 2, 1
> 3,	2
> {noformat}
> {noformat}
> create external table test_dec_space (id int, value decimal) ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ',' location '/tmp/test_dec_space';
> {noformat}
> The output of select * from test_dec_space would be:
> {noformat}
> 1	0
> 2	1
> 3	NULL
> {noformat}
> The behaviour in MySQL when there are tab and space characters in decimal values:
> {noformat}
> bash-4.2$ cat /tmp/insert.csv
> "1","aa",11.88
> "2","bb", 99.88
> "4","dd", 209.88
> {noformat}
> {noformat}
> MariaDB [test]> load data local infile '/tmp/insert.csv' into table t2 fields
> terminated by ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
> Query OK, 3 rows affected, 3 warnings (0.00 sec)
> Records: 3  Deleted: 0  Skipped: 0  Warnings: 3
> MariaDB [test]> select * from t2;
> +----+------+-------+
> | id | name | score |
> +----+------+-------+
> |  1 | aa   |    12 |
> |  2 | bb   |   100 |
> |  4 | dd   |   210 |
> +----+------+-------+
> 3 rows in set (0.00 sec)
> {noformat}
> So in Hive too we can make this work by skipping the tab character.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (HIVE-25150) Tab characters are not removed before decimal conversion similar to space character which is fixed as part of HIVE-24378
[ https://issues.apache.org/jira/browse/HIVE-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359767#comment-17359767 ]

Taraka Rama Rao Lethavadla commented on HIVE-25150:
---------------------------------------------------
Added support to skip the following characters as part of this request:
* HORIZONTAL_TABULATION ('\u0009')
* VERTICAL_TABULATION ('\u000B')
* FORM_FEED ('\u000C')
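The trimming behavior listed in the comment above can be sketched as follows. This is an illustration only, not the actual Hive patch; the class and method names here are hypothetical stand-ins:

```java
// Hypothetical sketch of the behavior described above: strip the whitespace
// characters Hive skips (space, plus horizontal tab, vertical tab and form
// feed) before attempting decimal conversion, so a field like "\t2" parses
// as 2 instead of becoming NULL.
import java.math.BigDecimal;

public class DecimalFieldParser {
    // Characters skipped before conversion, per the comment above.
    private static boolean isSkippable(char c) {
        return c == ' ' || c == '\u0009' || c == '\u000B' || c == '\u000C';
    }

    // Returns null (Hive's NULL) when the trimmed field is not a valid decimal.
    public static BigDecimal parseDecimal(String field) {
        int start = 0;
        int end = field.length();
        while (start < end && isSkippable(field.charAt(start))) start++;
        while (end > start && isSkippable(field.charAt(end - 1))) end--;
        try {
            return new BigDecimal(field.substring(start, end));
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

With this trimming in place, the third row of the test case above ("3,\t2") would convert to 2 rather than NULL, matching the MariaDB behavior shown in the description.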
[jira] [Work logged] (HIVE-25150) Tab characters are not removed before decimal conversion similar to space character which is fixed as part of HIVE-24378
[ https://issues.apache.org/jira/browse/HIVE-25150?focusedWorklogId=608924&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608924 ]

ASF GitHub Bot logged work on HIVE-25150:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 09/Jun/21 05:19
Start Date: 09/Jun/21 05:19
Worklog Time Spent: 10m

Work Description: maheshk114 merged pull request #2308:
URL: https://github.com/apache/hive/pull/2308

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 608924)
Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (HIVE-25101) Remove HBase libraries from Hive distribution
[ https://issues.apache.org/jira/browse/HIVE-25101?focusedWorklogId=608921&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608921 ]

ASF GitHub Bot logged work on HIVE-25101:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 09/Jun/21 05:06
Start Date: 09/Jun/21 05:06
Worklog Time Spent: 10m

Work Description: stoty commented on pull request #2259:
URL: https://github.com/apache/hive/pull/2259#issuecomment-857376643

I have noticed one more thing while testing this change. The hive script changes will always overwrite the hbase.aux.jar.path configuration parameter. A lot of other settings, like having an auxjars directory or setting HIVE_AUX_JARS_PATH, will do the same, but this change will overwrite the hbase.aux.jar.path set in hbase-site.xml pretty much every single time. I'm not sure how much of a problem this is, but I wanted to give a heads-up. I could explore reverting to using HADOOP_CLASSPATH instead, though I have doubts whether that actually works for the distributed operations. @kgyrtkirk

Issue Time Tracking
-------------------
Worklog Id: (was: 608921)
Time Spent: 1h (was: 50m)

> Remove HBase libraries from Hive distribution
> ---------------------------------------------
>
> Key: HIVE-25101
> URL: https://issues.apache.org/jira/browse/HIVE-25101
> Project: Hive
> Issue Type: Improvement
> Components: HBase Handler, Hive
> Affects Versions: 4.0.0
> Reporter: Istvan Toth
> Assignee: Istvan Toth
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Hive currently packages HBase libraries into its lib directory.
> It also adds the HBase libraries separately to its classpath in the hive startup script.
> Having both mechanisms is redundant, and it also causes errors: the standard HBase libraries packaged into Hive are unshaded, while the libraries added by _hbase mapredcp_ are shaded, and the two are NOT compatible when custom coprocessors are used; in some cases the classpaths during local execution and for MR/TEZ jobs are mutually incompatible.
> I propose removing all HBase libraries from the distribution and pulling them in via the hbase mapredcp mechanism.
> This also solves the old problem of including ancient HBase alpha versions in Hive.
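The _hbase mapredcp_ mechanism proposed in the description can be sketched as a startup-script fragment along these lines. This is an illustrative sketch only, not the actual hive script change from the pull request:

```shell
# Illustrative sketch (not the actual hive script change): instead of shipping
# unshaded HBase jars in Hive's lib directory, ask the installed HBase for its
# shaded MapReduce client classpath and append it at startup.
if command -v hbase >/dev/null 2>&1; then
  # 'hbase mapredcp' prints the shaded client jars needed by MR/Tez tasks.
  HBASE_CP=$(hbase mapredcp 2>/dev/null)
  if [ -n "$HBASE_CP" ]; then
    export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$HBASE_CP"
  fi
fi
```

Because the jars come from the installed HBase rather than from Hive's distribution, local execution and MR/TEZ jobs would see the same (shaded) classes, avoiding the shaded/unshaded mismatch described above.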
[jira] [Assigned] (HIVE-21489) EXPLAIN command throws ClassCastException in Hive
[ https://issues.apache.org/jira/browse/HIVE-21489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramesh Kumar Thangarajan reassigned HIVE-21489:
-----------------------------------------------
Assignee: Ramesh Kumar Thangarajan (was: Daniel Dai)

> EXPLAIN command throws ClassCastException in Hive
> -------------------------------------------------
>
> Key: HIVE-21489
> URL: https://issues.apache.org/jira/browse/HIVE-21489
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.3.4
> Reporter: Ping Lu
> Assignee: Ramesh Kumar Thangarajan
> Priority: Major
> Attachments: HIVE-21489.1.patch, HIVE-21489.2.patch
>
> I'm trying to run commands like explain select * from src in hive-2.3.4, but it fails with a ClassCastException:
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer cannot be cast to
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer
> Steps to reproduce:
> 1) hive.execution.engine is the default value mr
> 2) hive.security.authorization.enabled is set to true, and hive.security.authorization.manager is set to org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider
> 3) start hive CLI and run the command: explain select * from src
> I debugged the code and found that HIVE-18778 causes the above ClassCastException. If I set hive.in.test to true, the explain command executes successfully.
> Now I have one question: since hive.in.test can't be modified at runtime, how can the explain command be run with the default authorization in hive-2.3.4?
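The failure mode reported above (an EXPLAIN analyzer being downcast to the plain analyzer type) can be illustrated with a minimal sketch. The nested classes here are hypothetical stand-ins for the Hive classes named in the exception, and the guarded cast shows one generic defensive pattern, not the actual Hive fix:

```java
// Illustrative sketch of the failure mode: both analyzers share a base type,
// so an unguarded downcast of whichever analyzer the parser returned throws
// ClassCastException when the statement is an EXPLAIN. Stand-in classes only.
public class CastGuardDemo {
    static class BaseSemanticAnalyzer {}
    static class SemanticAnalyzer extends BaseSemanticAnalyzer {}
    static class ExplainSemanticAnalyzer extends BaseSemanticAnalyzer {}

    // Unsafe: assumes every analyzer is a SemanticAnalyzer.
    static SemanticAnalyzer unsafeCast(BaseSemanticAnalyzer sem) {
        return (SemanticAnalyzer) sem; // throws for ExplainSemanticAnalyzer
    }

    // Defensive: check the concrete type before downcasting.
    static SemanticAnalyzer guardedCast(BaseSemanticAnalyzer sem) {
        return (sem instanceof SemanticAnalyzer) ? (SemanticAnalyzer) sem : null;
    }
}
```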
[jira] [Updated] (HIVE-25220) Query with union fails CBO with OOM
[ https://issues.apache.org/jira/browse/HIVE-25220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa updated HIVE-25220:
----------------------------------
Description:
{code}
2021-06-08T08:15:14,450 ERROR [6241f234-77e0-4e63-9873-6eb9d655421c HiveServer2-Handler-Pool: Thread-79] parse.CalcitePlanner: CBO failed, skipping CBO.
java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.rethrowCalciteException(CalcitePlanner.java:1728) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1564) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:538) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12680) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:428) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:170) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:221) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:188) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:600) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:546) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:540) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199) ~[hive-service-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260) ~[hive-service-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hive.service.cli.operation.Operation.run(Operation.java:274) ~[hive-service-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:565) ~[hive-service-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:551) ~[hive-service-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_262]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_262]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_262]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_262]
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) ~[hive-service-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) ~[hive-service-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) ~[hive-service-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_262]
	at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_262]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) ~[hadoop-common-3.1.1.jar:?]
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) ~[hive-service-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at com.sun.proxy.$Proxy39.executeStatementAsync(Unknown Source) ~[?:?]
	at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
[jira] [Work logged] (HIVE-25220) Query with union fails CBO with OOM
[ https://issues.apache.org/jira/browse/HIVE-25220?focusedWorklogId=608919&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608919 ]

ASF GitHub Bot logged work on HIVE-25220:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 09/Jun/21 04:26
Start Date: 09/Jun/21 04:26
Worklog Time Spent: 10m

Work Description: kasakrisz opened a new pull request #2372:
URL: https://github.com/apache/hive/pull/2372

### What changes were proposed in this pull request?
1. Create and set up `HiveDefaultRelMetadataProvider` before the first call of `HiveRelFieldTrimmer`.
2. Invalidate the metadata query on the current `RelOptCluster` instance to trigger instantiation with the newly set metadata provider.

### Why are the changes needed?
`HiveRelFieldTrimmer` uses a `RelMetadataProvider` to get expression lineage. If the query contains several union operators, determining expression lineage can produce an exponential number of expressions due to the UNIONs, which can lead to OOM. We already have a fix for this issue, but prior to this patch the fix was not applied, because it is part of `HiveDefaultRelMetadataProvider`, which was not in place when `HiveRelFieldTrimmer` was called the first time.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
It is not straightforward to reproduce this issue from q tests, since the RelMetadataProvider is stored in a ThreadLocal instance and the DDL statements prior to the failing query initialize the MD provider with the Hive version. To avoid this I set up a small cluster with Hive, Hadoop and Tez.

Issue Time Tracking
-------------------
Worklog Id: (was: 608919)
Remaining Estimate: 0h
Time Spent: 10m
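The exponential growth described in the pull request (expression lineage expanding through stacked UNION operators) can be illustrated with a small arithmetic sketch. This is not Hive or Calcite code; it only models why the candidate-expression count explodes:

```java
// Hypothetical illustration (not Hive/Calcite code) of the blow-up described
// above: each 2-way UNION maps an output column to one origin expression per
// input branch, so a column flowing through n nested unions can accumulate
// 2^n candidate origin expressions during lineage computation.
public class UnionLineageBlowup {
    // Count candidate origin expressions for one column after n stacked
    // 2-way unions, starting from a single base-table column.
    static long originCount(int unions) {
        long origins = 1;            // base table column
        for (int i = 0; i < unions; i++) {
            origins *= 2;            // each union doubles the candidate set
        }
        return origins;
    }

    public static void main(String[] args) {
        System.out.println(originCount(10)); // 1024
        System.out.println(originCount(30)); // over a billion -- heap exhaustion territory
    }
}
```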
[jira] [Updated] (HIVE-25220) Query with union fails CBO with OOM
[ https://issues.apache.org/jira/browse/HIVE-25220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-25220:
----------------------------------
Labels: pull-request-available (was: )

> Query with union fails CBO with OOM
> -----------------------------------
>
> Key: HIVE-25220
> URL: https://issues.apache.org/jira/browse/HIVE-25220
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
> Time Spent: 10m
> Remaining Estimate: 0h
[jira] [Updated] (HIVE-25220) Query with union fails CBO with OOM
[ https://issues.apache.org/jira/browse/HIVE-25220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa updated HIVE-25220:
----------------------------------
Description:
{code}
explain cbo
with meod AS (
  select max(data_as_of_date) data_as_of_date from governed.cc_forecast_pnl),
daod as (
  select min(f.cob_date) data_as_of_date
  from governed.cc_forecast_pnl f, meod
  where f.data_as_of_date = meod.data_as_of_date),
me_rates as (
  SELECT
    refRateFX.to_currency_code,
    refRateFX.from_currency_code,
    cast(refRateFX.exchange_rate as decimal(38,18)) exchange_rate,
    cast('GC2' AS string) currency_label
  FROM
    (SELECT MAX(fx.data_as_of_date) data_as_of_date
     FROM governed.standard_fx_rates fx, daod
     WHERE fx.data_as_of_date LIKE '%_MCR_MTD'
     and fx.data_as_of_date <= concat(daod.data_as_of_date, '_MCR_MTD')) fx, -- get most recent rates not later than the delivery period
    governed.standard_fx_rates refRateFX
  WHERE refRateFX.data_as_of_date = fx.data_as_of_date
  AND refRateFX.to_currency_code = 'USD'
  UNION ALL
  SELECT
    refRateFX2.from_currency_code to_currency_code,
    refRateFX1.from_currency_code,
    cast(cast(refRateFX1.exchange_rate as double)/cast(refRateFX2.exchange_rate as double) as decimal(38,18)) exchange_rate,
    CAST('GC1' AS string) currency_label
  FROM
    (SELECT MAX(fx.data_as_of_date) data_as_of_date
     FROM governed.standard_fx_rates fx, daod
     WHERE fx.data_as_of_date LIKE '%_MCR_MTD'
     and fx.data_as_of_date <= concat(daod.data_as_of_date, '_MCR_MTD')) fx, -- get most recent rates not later than the delivery period
    governed.standard_fx_rates refRateFX1,
    governed.standard_fx_rates refRateFX2
  WHERE refRateFX1.data_as_of_date = fx.data_as_of_date
  AND refRateFX2.data_as_of_date = fx.data_as_of_date
  AND refRateFX1.to_currency_code = 'USD'
  AND refRateFX2.from_currency_code = 'CHF'
  AND refRateFX2.to_currency_code = 'USD'
),
cc_func_hier_filter as(
  SELECT DISTINCT LEVEL10 FUNCTION_CD
  FROM GOVERNED.CC_CYBOS_HIER_FUNCTION
  WHERE DATA_AS_OF_DATE in
    (SELECT MAX(DATA_AS_OF_DATE) FROM GOVERNED.CC_CYBOS_HIER_FUNCTION)
  AND LEVEL2='N14954'
),
cc_unified_acc_hier_filter as(
  SELECT DISTINCT LEVEL14 GROUP_ACCOUNT_CD
  FROM governed.cc_cybos_hier_acct
  WHERE DATA_AS_OF_DATE in (SELECT MAX(DATA_AS_OF_DATE) FROM governed.cc_cybos_hier_acct)
  AND LEVEL1='U0' AND LEVEL6 = 'U52000'
),
cc_sign_reversal as(
  SELECT DISTINCT LEVEL14 GROUP_ACCOUNT_CD, CAST(-1 AS DECIMAL(38,18)) reverse_sign
  FROM governed.cc_cybos_hier_acct
  WHERE DATA_AS_OF_DATE in (SELECT MAX(DATA_AS_OF_DATE) FROM governed.cc_cybos_hier_acct)
  AND ((LEVEL1='U0' AND LEVEL5 = 'U30175') OR (LEVEL2 = 'EAR90006'))
),
cc_unified_acc_hier as(
  SELECT DISTINCT TRIM(level14) level14
  FROM provision.cc_hier_unified_acct_vw
  WHERE level5_desc = 'Total operating expense'
  AND TRIM(level14) NOT IN
    (SELECT group_account_cd FROM governed.cc_temp_reg_exclude_rules
     WHERE data_as_of_date IN (SELECT MAX(data_as_of_date) from governed.cc_temp_reg_exclude_rules))
),
tempreg as(
  SELECT function_cd, tt_cd
  FROM governed.cc_temp_reg_rules
  WHERE data_as_of_date IN (SELECT MAX(data_as_of_date) FROM governed.cc_temp_reg_rules)
),
gov as(
  select cob_date, count(*) as gov_count,
         sum(case when measure_amt <> 0 then 1 else 0 end) gov_non_zero_count,
         sum(MEASURE_AMT) as gov_amt
  from (
    select pnl.cob_date,
      CASE WHEN tr.function_cd IS NOT NULL AND h.level14 IS NOT NULL THEN tr.TT_CD ELSE NULL END AS PERFORMANCE_VIEW_TYPE,
      pnl.company_code,
      pnl.function_code,
      pnl.group_account_code,
      pnl.gaap_code,
      'Actual Rate' AS CURRENCY_TYPE,
      me.to_currency_code AS CURRENCY_CODE,
      pnl.group_account_code MEASURE_ID,
      sum(CAST(cast((cast(pnl.posting_lc_amt as double) * cast(NVL(sr.reverse_sign, 1) as double)) as double) * cast(me.exchange_rate as double) as decimal(38,18))) as MEASURE_AMT,
      'FORECAST' AS PROJECTION_TYPE,
      CASE WHEN GROUP_ACCOUNT_CODE LIKE 'EAR%' THEN 'RETAINED_EARNINGS' ELSE 'PNL' END AS MACRO_MEASURE,
      me.currency_label AS MACRO_MEASURE_SUB_TYPE,
      pnl.cob_date AS partition_date_key
    from governed.cc_forecast_pnl pnl,
      me_rates me
      left outer join cc_func_hier_filter fHier on pnl.function_code = fHier.FUNCTION_CD
      left outer join cc_unified_acc_hier_filter aHier on pnl.group_account_code = aHier.group_account_cd
      left outer join cc_sign_reversal sr on pnl.group_account_code = sr.group_account_cd
      left outer join tempreg tr on pnl.function_code = tr.function_cd
      left outer join cc_unified_acc_hier h on pnl.group_account_code = h.level14
    WHERE me.from_currency_code = (CASE WHEN pnl.local_currency_code LIKE 'AR' THEN SUBSTR(pnl.local_currency_code, 1, 3) ELSE pnl.local_currency_code END)
    and data_as_of_date in (select max(data_as_of_date) from governed.cc_forecast_pnl)
    AND (fHier.FUNCTION_CD IS NOT NULL OR aHier.group_account_cd IS NOT NULL)
    group by pnl.cob_date, CASE WHEN tr.function_cd IS NOT NULL AND h.level14 IS NOT NULL THEN tr.TT_CD ELSE NULL END,
      pnl.company_code, pnl.function_code, pnl.group_account_code, pnl.gaap_code, me.to_currency_code, me.currency_label) a
  group by cob_date
[jira] [Assigned] (HIVE-25220) Query with union fails CBO with OOM
[ https://issues.apache.org/jira/browse/HIVE-25220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa reassigned HIVE-25220:
-------------------------------------

> Query with union fails CBO with OOM
> -----------------------------------
>
> Key: HIVE-25220
> URL: https://issues.apache.org/jira/browse/HIVE-25220
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Fix For: 4.0.0
[jira] [Work logged] (HIVE-25213) Implement List getTables() for existing connectors.
[ https://issues.apache.org/jira/browse/HIVE-25213?focusedWorklogId=608885&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608885 ]

ASF GitHub Bot logged work on HIVE-25213:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 09/Jun/21 01:55
Start Date: 09/Jun/21 01:55
Worklog Time Spent: 10m

Work Description: dantongdong opened a new pull request #2371:
URL: https://github.com/apache/hive/pull/2371
[HIVE-25213](https://issues.apache.org/jira/browse/HIVE-25213): Implement List getTables() for existing connectors.

Issue Time Tracking
-------------------
Worklog Id: (was: 608885)
Remaining Estimate: 0h
Time Spent: 10m

> Implement List getTables() for existing connectors.
> ---------------------------------------------------
>
> Key: HIVE-25213
> URL: https://issues.apache.org/jira/browse/HIVE-25213
> Project: Hive
> Issue Type: Sub-task
> Reporter: Naveen Gangam
> Assignee: Dantong Dong
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In the initial implementation, connector providers do not implement the getTables(string pattern) SPI; we had deferred it for later. Only getTableNames() and getTable() were implemented.
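One plausible shape for the deferred getTables(string pattern) SPI described above is to reuse the already-implemented getTableNames() and filter the result with the pattern. The sketch below is hypothetical (the class, method names, and pattern semantics are assumptions, not the actual connector API); it treats '*' as "match any sequence" and '|' as alternation, in the style of metastore name patterns:

```java
// Hypothetical sketch: filter a connector's table names with a
// metastore-style pattern ('*' wildcard, '|' alternation). Not the actual
// Hive connector-provider API.
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TablePatternFilter {
    // Translate the pattern to a regex: escape everything that is not a
    // letter, digit, underscore, '*' or '|', then expand '*' to '.*'.
    static Pattern compileHivePattern(String pattern) {
        String regex = pattern
            .replaceAll("([^a-zA-Z0-9*|_])", "\\\\$1")
            .replace("*", ".*");
        return Pattern.compile(regex);
    }

    // Keep only the names matching the pattern, preserving input order.
    public static List<String> filterTables(List<String> names, String pattern) {
        Pattern p = compileHivePattern(pattern);
        return names.stream()
                    .filter(n -> p.matcher(n).matches())
                    .collect(Collectors.toList());
    }
}
```

For example, filtering ["emp", "emp_part", "dept"] with the pattern "emp*" would keep only the first two names.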
[jira] [Updated] (HIVE-25213) Implement List getTables() for existing connectors.
[ https://issues.apache.org/jira/browse/HIVE-25213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25213: -- Labels: pull-request-available (was: ) > Implement List getTables() for existing connectors. > -- > > Key: HIVE-25213 > URL: https://issues.apache.org/jira/browse/HIVE-25213 > Project: Hive > Issue Type: Sub-task >Reporter: Naveen Gangam >Assignee: Dantong Dong >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In the initial implementation, connector providers do not implement the > getTables(string pattern) spi. We had deferred it for later. Only > getTableNames() and getTable() were implemented. -- This message was sent by Atlassian Jira (v8.3.4#803005)
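The SPI gap described above — connectors exposing only getTableNames() and getTable(), with getTables(String pattern) deferred — can be bridged by filtering the name list against the pattern and materializing each match. A minimal sketch of that filtering step follows; the class and method names here are illustrative stand-ins, not the actual connector SPI:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a connector provider: only the pattern-matching
// logic is shown, i.e. what a default getTables(String) could do on top of
// the two existing calls (getTableNames(), getTable()) mentioned in the issue.
public class PatternTableLookup {

    // Hive-style patterns use '*' as a wildcard and '|' to separate
    // alternatives; translate them to a regex before matching.
    static Pattern toRegex(String hivePattern) {
        StringBuilder sb = new StringBuilder();
        for (char c : hivePattern.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '|') sb.append('|');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    static List<String> filterTableNames(List<String> allNames, String hivePattern) {
        Pattern p = toRegex(hivePattern);
        List<String> out = new ArrayList<>();
        for (String name : allNames) {
            if (p.matcher(name).matches()) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = List.of("orders", "order_items", "customers");
        System.out.println(filterTableNames(names, "order*"));  // [orders, order_items]
    }
}
```

Each surviving name would then be resolved through the existing getTable() call.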
[jira] [Work logged] (HIVE-24994) get_aggr_stats_for call fail with "Tried to send an out-of-range integer"
[ https://issues.apache.org/jira/browse/HIVE-24994?focusedWorklogId=608839=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608839 ] ASF GitHub Bot logged work on HIVE-24994: - Author: ASF GitHub Bot Created on: 09/Jun/21 00:09 Start Date: 09/Jun/21 00:09 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #2162: URL: https://github.com/apache/hive/pull/2162#issuecomment-857274702 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608839) Time Spent: 0.5h (was: 20m) > get_aggr_stats_for call fail with "Tried to send an out-of-range integer" > - > > Key: HIVE-24994 > URL: https://issues.apache.org/jira/browse/HIVE-24994 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Rajkumar Singh >Assignee: Rajkumar Singh >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > aggrColStatsForPartitions call fail with the Postgres LIMIT if the no of > partitions passed in the direct sql goes beyond the 32767 > {code:java} > postgresql.util.PSQLException: An I/O error occurred while sending to the > backend. 
> at > org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:337) > ~[postgresql-42.2.8.jar:42.2.8] > at > org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:446) > ~[postgresql-42.2.8.jar:42.2.8] > at > org.postgresql.jdbc.PgStatement.execute(PgStatement.java:370) > ~[postgresql-42.2.8.jar:42.2.8] > at > org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:149) > ~[postgresql-42.2.8.jar:42.2.8] > at > org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:108) > ~[postgresql-42.2.8.jar:42.2.8] > at > com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52) > ~[HikariCP-2.6.1.jar:?] > at > com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java) > [HikariCP-2.6.1.jar:?] > at > org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeQuery(ParamLoggingPreparedStatement.java:375) > [datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.rdbms.SQLController.executeStatementQuery(SQLController.java:552) > [datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.rdbms.query.SQLQuery.performExecute(SQLQuery.java:645) > [datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.query.Query.executeQuery(Query.java:1855) > [datanucleus-core-4.1.17.jar:?] > at > org.datanucleus.store.rdbms.query.SQLQuery.executeWithArray(SQLQuery.java:807) > [datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:368) > [datanucleus-api-jdo-4.2.4.jar:?] > at > org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:267) > [datanucleus-api-jdo-4.2.4.jar:?] 
> at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.executeWithArray(MetaStoreDirectSql.java:2058) > [hive-exec-3.1.0.3.1.5.6019-4.jar:3.1.0.3.1.5.6019-4] > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.executeWithArray(MetaStoreDirectSql.java:2050) > [hive-exec-3.1.0.3.1.5.6019-4.jar:3.1.0.3.1.5.6019-4] > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.access$1500(MetaStoreDirectSql.java:110) > [hive-exec-3.1.0.3.1.5.6019-4.jar:3.1.0.3.1.5.6019-4] > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql$15$1.run(MetaStoreDirectSql.java:1530) > [hive-exec-3.1.0.3.1.5.6019-4.jar:3.1.0.3.1.5.6019-4] > at > org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:73) > [hive-exec-3.1.0.3.1.5.6019-4.jar:3.1.0.3.1.5.6019-4] > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql$15.run(MetaStoreDirectSql.java:1521) > [hive-exec-3.1.0.3.1.5.6019-4.jar:3.1.0.3.1.5.6019-4] > at
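The failure above stems from PostgreSQL's wire protocol, which encodes the number of bind parameters in a prepared statement as a signed 16-bit integer, so no single statement can carry more than 32767 parameters. The stack trace shows the partition list flowing through Batchable.runBatched, and the fix direction is to cap each batch below that limit. A simplified sketch of the batching idea (the helper below is illustrative, not Hive's actual Batchable API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative batching helper: split a long IN-list into chunks that stay
// under PostgreSQL's signed-16-bit bind parameter limit (32767), so each
// chunk can be bound to its own prepared statement.
public class ParamBatcher {
    static final int MAX_PG_PARAMS = 32767;

    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            out.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> partIds = new ArrayList<>();
        for (int i = 0; i < 70000; i++) partIds.add(i);
        List<List<Integer>> chunks = batches(partIds, MAX_PG_PARAMS);
        // 70000 partitions -> 3 statements instead of one oversized one
        System.out.println(chunks.size());  // 3
    }
}
```

The per-batch results are then merged, which is exactly the shape of work Batchable.runBatched exists to orchestrate.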
[jira] [Updated] (HIVE-25219) Backward incompatible timestamp serialization in Avro for certain timezones
[ https://issues.apache.org/jira/browse/HIVE-25219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25219: -- Labels: pull-request-available (was: ) > Backward incompatible timestamp serialization in Avro for certain timezones > --- > > Key: HIVE-25219 > URL: https://issues.apache.org/jira/browse/HIVE-25219 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > HIVE-12192, HIVE-20007 changed the way that timestamp computations are > performed and to some extend how timestamps are serialized and deserialized > in files (Parquet, Avro). > In versions that include HIVE-12192 or HIVE-20007 the serialization in Avro > files is not backwards compatible. In other words writing timestamps with a > version of Hive that includes HIVE-12192/HIVE-20007 and reading them with > another (not including the previous issues) may lead to different results > depending on the default timezone of the system. > Consider the following scenario where the default system timezone is set to > US/Pacific. 
> At apache/master commit eedcd82bc2d61861a27205f925ba0ffab9b6bca8 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS AVRO > LOCATION '/tmp/hiveexttbl/employee'; > INSERT INTO employee VALUES (1, '1880-01-01 00:00:00'); > INSERT INTO employee VALUES (2, '1884-01-01 00:00:00'); > INSERT INTO employee VALUES (3, '1990-01-01 00:00:00'); > SELECT * FROM employee; > {code} > |1|1880-01-01 00:00:00| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS AVRO > LOCATION '/tmp/hiveexttbl/employee'; > SELECT * FROM employee; > {code} > |1|1879-12-31 23:52:58| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > The timestamp for {{eid=1}} in branch-2.3 is different from the one in master. -- This message was sent by Atlassian Jira (v8.3.4#803005)
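The 7-minute-2-second skew for eid=1 above is explained by the tz database: before US time zones were standardized in 1883, America/Los_Angeles observed local mean time (UTC-07:52:58) rather than PST (UTC-08:00). Versions with HIVE-12192/HIVE-20007 apply the historically accurate LMT rules, while older versions effectively apply the fixed -08:00 offset — the 00:07:02 gap visible in the two result sets. The rule change can be checked directly with java.time:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneRules;

// Show why 1880 timestamps shift in US/Pacific while 1990 ones do not:
// pre-1883 dates fall under local mean time, not PST.
public class PacificOffsets {
    public static void main(String[] args) {
        ZoneRules rules = ZoneId.of("America/Los_Angeles").getRules();

        ZoneOffset in1880 = rules.getOffset(LocalDateTime.of(1880, 1, 1, 0, 0));
        ZoneOffset in1990 = rules.getOffset(LocalDateTime.of(1990, 1, 1, 0, 0));

        System.out.println(in1880);  // -07:52:58 (local mean time)
        System.out.println(in1990);  // -08:00   (PST)
    }
}
```

This is also why eid=2 (1884) and eid=3 (1990) round-trip unchanged: both fall after the 1883 standardization, where old and new conversion rules agree.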
[jira] [Work logged] (HIVE-25219) Backward incompatible timestamp serialization in Avro for certain timezones
[ https://issues.apache.org/jira/browse/HIVE-25219?focusedWorklogId=608808&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608808 ] ASF GitHub Bot logged work on HIVE-25219: - Author: ASF GitHub Bot Created on: 08/Jun/21 22:35 Start Date: 08/Jun/21 22:35 Worklog Time Spent: 10m Work Description: zabetak opened a new pull request #2370: URL: https://github.com/apache/hive/pull/2370 ### What changes were proposed in this pull request? 1. Add new read/write config properties to control legacy zone conversions in Avro. 2. Exploit file metadata and property to choose between new/old conversion rules. ### Why are the changes needed? Provide the end-users the possibility to write backward compatible timestamps in Avro files so that files can be read correctly by older versions. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? 1. New qtests for writing Avro timestamps (`avro_write_legacy_timestamp.q`, `avro_write_new_timestamp.q`) 2. Manual tests * Export Avro table with current Hive version setting `hive.avro.timestamp.write.legacy.conversion.enabled=true` * Read from external Avro table with Hive 2 (commit 324f9fa) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608808) Remaining Estimate: 0h Time Spent: 10m > Backward incompatible timestamp serialization in Avro for certain timezones > --- > > Key: HIVE-25219 > URL: https://issues.apache.org/jira/browse/HIVE-25219 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > HIVE-12192, HIVE-20007 changed the way that timestamp computations are > performed and to some extend how timestamps are serialized and deserialized > in files (Parquet, Avro). > In versions that include HIVE-12192 or HIVE-20007 the serialization in Avro > files is not backwards compatible. In other words writing timestamps with a > version of Hive that includes HIVE-12192/HIVE-20007 and reading them with > another (not including the previous issues) may lead to different results > depending on the default timezone of the system. > Consider the following scenario where the default system timezone is set to > US/Pacific. 
> At apache/master commit eedcd82bc2d61861a27205f925ba0ffab9b6bca8 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS AVRO > LOCATION '/tmp/hiveexttbl/employee'; > INSERT INTO employee VALUES (1, '1880-01-01 00:00:00'); > INSERT INTO employee VALUES (2, '1884-01-01 00:00:00'); > INSERT INTO employee VALUES (3, '1990-01-01 00:00:00'); > SELECT * FROM employee; > {code} > |1|1880-01-01 00:00:00| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS AVRO > LOCATION '/tmp/hiveexttbl/employee'; > SELECT * FROM employee; > {code} > |1|1879-12-31 23:52:58| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > The timestamp for {{eid=1}} in branch-2.3 is different from the one in master. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25219) Backward incompatible timestamp serialization in Avro for certain timezones
[ https://issues.apache.org/jira/browse/HIVE-25219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359588#comment-17359588 ] Stamatis Zampetakis commented on HIVE-25219: The issue describes the same problem with HIVE-25104 but for Avro instead of Parquet. > Backward incompatible timestamp serialization in Avro for certain timezones > --- > > Key: HIVE-25219 > URL: https://issues.apache.org/jira/browse/HIVE-25219 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Fix For: 4.0.0 > > > HIVE-12192, HIVE-20007 changed the way that timestamp computations are > performed and to some extend how timestamps are serialized and deserialized > in files (Parquet, Avro). > In versions that include HIVE-12192 or HIVE-20007 the serialization in Avro > files is not backwards compatible. In other words writing timestamps with a > version of Hive that includes HIVE-12192/HIVE-20007 and reading them with > another (not including the previous issues) may lead to different results > depending on the default timezone of the system. > Consider the following scenario where the default system timezone is set to > US/Pacific. 
> At apache/master commit eedcd82bc2d61861a27205f925ba0ffab9b6bca8 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS AVRO > LOCATION '/tmp/hiveexttbl/employee'; > INSERT INTO employee VALUES (1, '1880-01-01 00:00:00'); > INSERT INTO employee VALUES (2, '1884-01-01 00:00:00'); > INSERT INTO employee VALUES (3, '1990-01-01 00:00:00'); > SELECT * FROM employee; > {code} > |1|1880-01-01 00:00:00| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS AVRO > LOCATION '/tmp/hiveexttbl/employee'; > SELECT * FROM employee; > {code} > |1|1879-12-31 23:52:58| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > The timestamp for {{eid=1}} in branch-2.3 is different from the one in master. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25219) Backward incompatible timestamp serialization in Avro for certain timezones
[ https://issues.apache.org/jira/browse/HIVE-25219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis reassigned HIVE-25219: -- > Backward incompatible timestamp serialization in Avro for certain timezones > --- > > Key: HIVE-25219 > URL: https://issues.apache.org/jira/browse/HIVE-25219 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Fix For: 4.0.0 > > > HIVE-12192, HIVE-20007 changed the way that timestamp computations are > performed and to some extend how timestamps are serialized and deserialized > in files (Parquet, Avro). > In versions that include HIVE-12192 or HIVE-20007 the serialization in Avro > files is not backwards compatible. In other words writing timestamps with a > version of Hive that includes HIVE-12192/HIVE-20007 and reading them with > another (not including the previous issues) may lead to different results > depending on the default timezone of the system. > Consider the following scenario where the default system timezone is set to > US/Pacific. 
> At apache/master commit eedcd82bc2d61861a27205f925ba0ffab9b6bca8 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS AVRO > LOCATION '/tmp/hiveexttbl/employee'; > INSERT INTO employee VALUES (1, '1880-01-01 00:00:00'); > INSERT INTO employee VALUES (2, '1884-01-01 00:00:00'); > INSERT INTO employee VALUES (3, '1990-01-01 00:00:00'); > SELECT * FROM employee; > {code} > |1|1880-01-01 00:00:00| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS AVRO > LOCATION '/tmp/hiveexttbl/employee'; > SELECT * FROM employee; > {code} > |1|1879-12-31 23:52:58| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > The timestamp for {{eid=1}} in branch-2.3 is different from the one in master. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization
[ https://issues.apache.org/jira/browse/HIVE-18284?focusedWorklogId=608671=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608671 ] ASF GitHub Bot logged work on HIVE-18284: - Author: ASF GitHub Bot Created on: 08/Jun/21 18:45 Start Date: 08/Jun/21 18:45 Worklog Time Spent: 10m Work Description: Vinodh-thimmisetty commented on pull request #1400: URL: https://github.com/apache/hive/pull/1400#issuecomment-857008049 Hi @kgyrtkirk, Does it have any impact If we include LIMIT after Distribute by clause ? We had the same issue, but luckily the table size was small. So, by including LIMIT **, we are able to insert overwrite with distribute by key. **Note:** I have ran with both mr and tez executions engine types -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608671) Time Spent: 3h 50m (was: 3h 40m) > NPE when inserting data with 'distribute by' clause with dynpart sort > optimization > -- > > Key: HIVE-18284 > URL: https://issues.apache.org/jira/browse/HIVE-18284 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.3.1, 2.3.2, 3.0.0, 3.1.1, 3.1.2, 4.0.0 >Reporter: Aki Tanaka >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > A Null Pointer Exception occurs when inserting data with 'distribute by' > clause. 
The following snippet query reproduces this issue: > *(non-vectorized , non-llap mode)* > {code:java} > create table table1 (col1 string, datekey int); > insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1); > create table table2 (col1 string) partitioned by (datekey int); > set hive.vectorized.execution.enabled=false; > set hive.optimize.sort.dynamic.partition=true; > set hive.exec.dynamic.partition.mode=nonstrict; > insert into table table2 > PARTITION(datekey) > select col1, > datekey > from table1 > distribute by datekey ; > {code} > I could run the insert query without the error if I remove Distribute By or > use Cluster By clause. > It seems that the issue happens because Distribute By does not guarantee > clustering or sorting properties on the distributed keys. > FileSinkOperator removes the previous fsp. FileSinkOperator will remove the > previous fsp which might be re-used when we use Distribute By. > https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972 > The following stack trace is logged. 
> {code:java} > Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, > diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}} > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while
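The failure mode described above — FileSinkOperator closing the previous partition's writer on every key change — is only safe when rows arrive clustered by the partition key. A distilled model of that pattern (not Hive's actual code) shows why DISTRIBUTE BY alone breaks it, using the same row order as the repro (datekeys 1, 2, 1):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Distilled model of a sink that assumes rows arrive clustered by partition
// key: when the key changes it closes (and forgets) the previous writer.
// With merely distributed (unsorted) input, a key can reappear after its
// writer was discarded — the reuse-after-close that surfaces as an NPE in
// FileSinkOperator.
public class ClusteredSink {
    final Set<Integer> closed = new HashSet<>();
    Integer current = null;
    int reuseAfterClose = 0;

    void process(int key) {
        if (current != null && !current.equals(key)) {
            closed.add(current);   // "close" the previous partition's writer
        }
        if (closed.contains(key)) {
            reuseAfterClose++;     // in the real operator the writer is gone -> NPE
        }
        current = key;
    }

    public static void main(String[] args) {
        ClusteredSink sorted = new ClusteredSink();
        for (int k : List.of(1, 1, 2)) sorted.process(k);       // CLUSTER BY order

        ClusteredSink distributed = new ClusteredSink();
        for (int k : List.of(1, 2, 1)) distributed.process(k);  // key 1 reappears

        System.out.println(sorted.reuseAfterClose);       // 0
        System.out.println(distributed.reuseAfterClose);  // 1
    }
}
```

The clustered run never revisits a closed key, which is why CLUSTER BY (or removing DISTRIBUTE BY) avoids the error; the distributed run hits ROW3 exactly as in the stack trace.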
[jira] [Resolved] (HIVE-23987) Upgrade arrow version to 0.11.0
[ https://issues.apache.org/jira/browse/HIVE-23987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez resolved HIVE-23987. Fix Version/s: 4.0.0 Assignee: Jesus Camacho Rodriguez (was: Barnabas Maidics) Resolution: Fixed > Upgrade arrow version to 0.11.0 > --- > > Key: HIVE-23987 > URL: https://issues.apache.org/jira/browse/HIVE-23987 > Project: Hive > Issue Type: Improvement >Reporter: Barnabas Maidics >Assignee: Jesus Camacho Rodriguez >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > As part of [HIVE-23890|https://issues.apache.org/jira/browse/HIVE-23890], > we're introducing flatbuffers as a dependency. > Arrow 0.10.0 has an unofficial flatbuffer dependency, which is incompatible > with the official ones: https://issues.apache.org/jira/browse/ARROW-3175 > It was fixed in 0.11.0. We should upgrade to that version -- This message was sent by Atlassian Jira (v8.3.4#803005)
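Mechanically, the upgrade is a version bump in the Maven build so that the official flatbuffers artifact (fixed by ARROW-3175) is resolved instead of Arrow 0.10.0's unofficial one. A sketch of the kind of pom change involved — the property name and module layout below are illustrative, not necessarily Hive's actual pom structure:

```xml
<!-- root pom.xml: bump the Arrow version property (name is illustrative) -->
<properties>
  <arrow.version>0.11.0</arrow.version>
</properties>

<dependency>
  <groupId>org.apache.arrow</groupId>
  <artifactId>arrow-vector</artifactId>
  <version>${arrow.version}</version>
</dependency>
```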
[jira] [Work logged] (HIVE-23987) Upgrade arrow version to 0.11.0
[ https://issues.apache.org/jira/browse/HIVE-23987?focusedWorklogId=608656=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608656 ] ASF GitHub Bot logged work on HIVE-23987: - Author: ASF GitHub Bot Created on: 08/Jun/21 17:57 Start Date: 08/Jun/21 17:57 Worklog Time Spent: 10m Work Description: jcamachor merged pull request #2366: URL: https://github.com/apache/hive/pull/2366 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608656) Time Spent: 20m (was: 10m) > Upgrade arrow version to 0.11.0 > --- > > Key: HIVE-23987 > URL: https://issues.apache.org/jira/browse/HIVE-23987 > Project: Hive > Issue Type: Improvement >Reporter: Barnabas Maidics >Assignee: Barnabas Maidics >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > As part of [HIVE-23890|https://issues.apache.org/jira/browse/HIVE-23890], > we're introducing flatbuffers as a dependency. > Arrow 0.10.0 has an unofficial flatbuffer dependency, which is incompatible > with the official ones: https://issues.apache.org/jira/browse/ARROW-3175 > It was fixed in 0.11.0. We should upgrade to that version -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25218) Add a replication migration tool for external tables
[ https://issues.apache.org/jira/browse/HIVE-25218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25218: -- Labels: pull-request-available (was: ) > Add a replication migration tool for external tables > > > Key: HIVE-25218 > URL: https://issues.apache.org/jira/browse/HIVE-25218 > Project: Hive > Issue Type: Improvement >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Add a tool which can confirm migration of external tables post replication > from one cluster to another. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25218) Add a replication migration tool for external tables
[ https://issues.apache.org/jira/browse/HIVE-25218?focusedWorklogId=608567=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608567 ] ASF GitHub Bot logged work on HIVE-25218: - Author: ASF GitHub Bot Created on: 08/Jun/21 16:17 Start Date: 08/Jun/21 16:17 Worklog Time Spent: 10m Work Description: ayushtkn opened a new pull request #2369: URL: https://github.com/apache/hive/pull/2369 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608567) Remaining Estimate: 0h Time Spent: 10m > Add a replication migration tool for external tables > > > Key: HIVE-25218 > URL: https://issues.apache.org/jira/browse/HIVE-25218 > Project: Hive > Issue Type: Improvement >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Add a tool which can confirm migration of external tables post replication > from one cluster to another. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25218) Add a replication migration tool for external tables
[ https://issues.apache.org/jira/browse/HIVE-25218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena reassigned HIVE-25218: --- > Add a replication migration tool for external tables > > > Key: HIVE-25218 > URL: https://issues.apache.org/jira/browse/HIVE-25218 > Project: Hive > Issue Type: Improvement >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > > Add a tool which can confirm migration of external tables post replication > from one cluster to another. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608551=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608551 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:59 Start Date: 08/Jun/21 15:59 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647582327 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -281,16 +285,23 @@ private VectorizedOrcAcidRowBatchReader(JobConf conf, OrcSplit orcSplit, Reporte deleteEventReaderOptions.range(0, Long.MAX_VALUE); deleteEventReaderOptions.searchArgument(null, null); keyInterval = findMinMaxKeys(orcSplit, conf, deleteEventReaderOptions); +fetchDeletedRows = conf.getBoolean(Constants.ACID_FETCH_DELETED_ROWS, false); DeleteEventRegistry der; try { // See if we can load all the relevant delete events from all the // delete deltas in memory... 
+ ColumnizedDeleteEventRegistry.OriginalWriteIdLoader writeIdLoader; + if (fetchDeletedRows) { +writeIdLoader = new ColumnizedDeleteEventRegistry.BothWriteIdLoader(); Review comment: done ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -303,6 +314,12 @@ private VectorizedOrcAcidRowBatchReader(JobConf conf, OrcSplit orcSplit, Reporte VectorizedRowBatch.DEFAULT_SIZE, null, null, null); } rowIdProjected = areRowIdsProjected(rbCtx); +rowIsDeletedProjected = isVirtualColumnProjected(rbCtx, VirtualColumn.ROWISDELETED); Review comment: done ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -303,6 +314,12 @@ private VectorizedOrcAcidRowBatchReader(JobConf conf, OrcSplit orcSplit, Reporte VectorizedRowBatch.DEFAULT_SIZE, null, null, null); } rowIdProjected = areRowIdsProjected(rbCtx); +rowIsDeletedProjected = isVirtualColumnProjected(rbCtx, VirtualColumn.ROWISDELETED); +if (rowIsDeletedProjected) { + rowIsDeletedVector = new RowIsDeletedColumnVector(); Review comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608551) Time Spent: 4h 20m (was: 4h 10m) > Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608553=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608553 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:59 Start Date: 08/Jun/21 15:59 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647583124 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -1748,7 +1946,7 @@ public int compareTo(CompressedOwid other) { assert shouldReadDeleteDeltasWithLlap(conf, true); } deleteReaderValue = new DeleteReaderValue(readerData.reader, deleteDeltaFile, readerOptions, bucket, -validWriteIdList, isBucketedTable, conf, keyInterval, orcSplit, numRows, cacheTag, fileId); +validWriteIdList, isBucketedTable, conf, keyInterval, orcSplit, numRows, cacheTag, fileId); Review comment: reverted -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608553) Time Spent: 4h 40m (was: 4.5h) > Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608552=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608552 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:59 Start Date: 08/Jun/21 15:59 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647582736 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -948,7 +978,7 @@ public boolean next(NullWritable key, VectorizedRowBatch value) throws IOExcepti // This loop fills up the selected[] vector with all the index positions that are selected. for (int setBitIndex = selectedBitSet.nextSetBit(0), selectedItr = 0; setBitIndex >= 0; - setBitIndex = selectedBitSet.nextSetBit(setBitIndex+1), ++selectedItr) { + setBitIndex = selectedBitSet.nextSetBit(setBitIndex + 1), ++selectedItr) { Review comment: reverted -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608552) Time Spent: 4.5h (was: 4h 20m) > Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 4.5h > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
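The loop touched in the last hunk above is a standard pattern in Hive vectorization: translating a BitSet of surviving row positions into the batch's selected[] array plus its size. Isolated from the reader (with a plain int array standing in for the VectorizedRowBatch fields), the pattern looks like this:

```java
import java.util.Arrays;
import java.util.BitSet;

// The BitSet -> selected[] translation used in VectorizedOrcAcidRowBatchReader:
// every set bit becomes an entry in selected[], and the count of set bits
// becomes the effective batch size.
public class SelectedFill {
    static int fillSelected(BitSet selectedBitSet, int[] selected) {
        int selectedItr = 0;
        for (int setBitIndex = selectedBitSet.nextSetBit(0);
             setBitIndex >= 0;
             setBitIndex = selectedBitSet.nextSetBit(setBitIndex + 1), ++selectedItr) {
            selected[selectedItr] = setBitIndex;
        }
        return selectedItr;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(8);
        bits.set(0);
        bits.set(3);
        bits.set(7);   // rows 1-2 and 4-6 were filtered out (e.g. deleted)
        int[] selected = new int[8];
        int size = fillSelected(bits, selected);
        System.out.println(size);                                            // 3
        System.out.println(Arrays.toString(Arrays.copyOf(selected, size)));  // [0, 3, 7]
    }
}
```

With the ROWISDELETED work in this PR, delete events flip the corresponding bits instead of (or as well as) clearing them, and the same translation then projects the survivor set into the output batch.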
[jira] [Work logged] (HIVE-25081) Put metrics collection behind a feature flag
[ https://issues.apache.org/jira/browse/HIVE-25081?focusedWorklogId=608549=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608549 ] ASF GitHub Bot logged work on HIVE-25081: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:58 Start Date: 08/Jun/21 15:58 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2332: URL: https://github.com/apache/hive/pull/2332#discussion_r647567066 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java ## @@ -120,7 +120,7 @@ public void run() { // don't doom the entire thread. try { handle = txnHandler.getMutexAPI().acquireLock(TxnStore.MUTEX_KEY.Initiator.name()); - if (metricsEnabled) { + if (metricsEnabled && MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)) { Review comment: Same as cleaner ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java ## @@ -454,6 +454,8 @@ public static ConfVars getMetaConf(String name) { "hive.metastore.acidmetrics.check.interval", 300, TimeUnit.SECONDS, "Time in seconds between acid related metric collection runs."), +METASTORE_ACIDMETRICS_EXT_ON("metastore.acidmetrics.ext.on", "hive.metastore.acidmetrics.ext.on", true, +"Whether to collect additional acid related metrics outside of the acid metrics service."), Review comment: I think these are only enabled if `MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METRICS_ENABLED)==true` , so it would be good to mention that in the description ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java ## @@ -115,41 +117,45 @@ public static DeltaFilesMetricReporter getInstance() { return InstanceHolder.instance; } - public static synchronized void init(HiveConf conf){ + public static synchronized void init(HiveConf conf) { getInstance().configure(conf); } public void submit(TezCounters counters) { 
-updateMetrics(NUM_OBSOLETE_DELTAS, -obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, counters); -updateMetrics(NUM_DELTAS, -deltaCache, deltaTopN, deltasThreshold, counters); -updateMetrics(NUM_SMALL_DELTAS, -smallDeltaCache, smallDeltaTopN, deltasThreshold, counters); +if(acidMetricsExtEnabled) { + updateMetrics(NUM_OBSOLETE_DELTAS, + obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, counters); + updateMetrics(NUM_DELTAS, + deltaCache, deltaTopN, deltasThreshold, counters); + updateMetrics(NUM_SMALL_DELTAS, + smallDeltaCache, smallDeltaTopN, deltasThreshold, counters); +} } - public static void mergeDeltaFilesStats(AcidDirectory dir, long checkThresholdInSec, -float deltaPctThreshold, EnumMap> deltaFilesStats) throws IOException { -long baseSize = getBaseSize(dir); -int numObsoleteDeltas = getNumObsoleteDeltas(dir, checkThresholdInSec); + public static void mergeDeltaFilesStats(AcidDirectory dir, long checkThresholdInSec, float deltaPctThreshold, + EnumMap> deltaFilesStats, Configuration conf) throws IOException { +if (MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)) { Review comment: Instead of adding the check here, it makes a bit more sense to add it to these checks in org.apache.hadoop.hive.ql.io.orc.OrcInputFormat#generateSplitsInfo: ``` if (metricsEnabled && directory instanceof AcidDirectory) { DeltaFilesMetricReporter.mergeDeltaFilesStats((AcidDirectory) directory, checkThresholdInSec, deltaPctThreshold, deltaFilesStats); } ``` ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -111,7 +111,7 @@ public void run() { // so wrap it in a big catch Throwable statement. 
try { handle = txnHandler.getMutexAPI().acquireLock(TxnStore.MUTEX_KEY.Cleaner.name()); - if (metricsEnabled) { + if (metricsEnabled && MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)) { Review comment: I think this is the same logic as `metricsEnabled = MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METRICS_ENABLED) && MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)` right? ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java ## @@ -115,41 +117,45 @@ public static DeltaFilesMetricReporter getInstance() { return InstanceHolder.instance; } - public static synchronized void init(HiveConf conf){ +
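The review thread above converges on gating the extended ACID metrics behind both `metastore.metrics.enabled` and the new `metastore.acidmetrics.ext.on` flag. A minimal sketch of that combined-flag pattern, using `java.util.Properties` as a stand-in for `MetastoreConf` (the class, keys' spelling, and helper are illustrative, not Hive's API):

```java
import java.util.Properties;

class MetricsGate {
    // Illustrative stand-ins for the two config keys discussed in the review.
    static final String METRICS_ENABLED = "metastore.metrics.enabled";
    static final String ACIDMETRICS_EXT_ON = "metastore.acidmetrics.ext.on";

    // Extended ACID metrics fire only when metrics are enabled at all
    // AND the extra feature flag (default true, per the diff) is on.
    static boolean extendedAcidMetricsEnabled(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(METRICS_ENABLED, "false"))
            && Boolean.parseBoolean(conf.getProperty(ACIDMETRICS_EXT_ON, "true"));
    }
}
```

This matches the reviewer's point: evaluating the conjunction once (as `metricsEnabled` is computed) is equivalent to re-checking the second flag at every call site.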
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608548=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608548 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:58 Start Date: 08/Jun/21 15:58 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647582032 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -1940,39 +2091,38 @@ public boolean isEmpty() { } @Override public void findDeletedRecords(ColumnVector[] cols, int size, BitSet selectedBitSet) { - if (rowIds == null || compressedOwids == null) { + if (rowIds == null || writeIds == null || writeIds.isEmpty()) { return; } // Iterate through the batch and for each (owid, rowid) in the batch // check if it is deleted or not. long[] originalWriteIdVector = - cols[OrcRecordUpdater.ORIGINAL_WRITEID].isRepeating ? null - : ((LongColumnVector) cols[OrcRecordUpdater.ORIGINAL_WRITEID]).vector; + cols[OrcRecordUpdater.ORIGINAL_WRITEID].isRepeating ? null Review comment: reverted ## File path: ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedOrcAcidRowBatchReader.java ## @@ -961,26 +966,41 @@ private void testDeleteEventOriginalFiltering2() throws Exception { @Test public void testVectorizedOrcAcidRowBatchReader() throws Exception { +setupTestData(); + + testVectorizedOrcAcidRowBatchReader(ColumnizedDeleteEventRegistry.class.getName()); + +// To test the SortMergedDeleteEventRegistry, we need to explicitly set the +// HIVE_TRANSACTIONAL_NUM_EVENTS_IN_MEMORY constant to a smaller value. 
+int oldValue = conf.getInt(HiveConf.ConfVars.HIVE_TRANSACTIONAL_NUM_EVENTS_IN_MEMORY.varname, 100); + conf.setInt(HiveConf.ConfVars.HIVE_TRANSACTIONAL_NUM_EVENTS_IN_MEMORY.varname, 1000); + testVectorizedOrcAcidRowBatchReader(SortMergedDeleteEventRegistry.class.getName()); + +// Restore the old value. + conf.setInt(HiveConf.ConfVars.HIVE_TRANSACTIONAL_NUM_EVENTS_IN_MEMORY.varname, oldValue); + } + + private void setupTestData() throws IOException { conf.set("bucket_count", "1"); - conf.set(ValidTxnList.VALID_TXNS_KEY, - new ValidReadTxnList(new long[0], new BitSet(), 1000, Long.MAX_VALUE).writeToString()); +conf.set(ValidTxnList.VALID_TXNS_KEY, +new ValidReadTxnList(new long[0], new BitSet(), 1000, Long.MAX_VALUE).writeToString()); int bucket = 0; AcidOutputFormat.Options options = new AcidOutputFormat.Options(conf) -.filesystem(fs) -.bucket(bucket) -.writingBase(false) -.minimumWriteId(1) -.maximumWriteId(NUM_OWID) -.inspector(inspector) -.reporter(Reporter.NULL) -.recordIdColumn(1) -.finalDestination(root); +.filesystem(fs) Review comment: reverted -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608548) Time Spent: 4h 10m (was: 4h) > Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24135) Drop database doesn't delete directory in managed location
[ https://issues.apache.org/jira/browse/HIVE-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam resolved HIVE-24135. -- Fix Version/s: 4.0.0 Resolution: Fixed Fix from PR 2454 has been reviewed and merged. Thanks @ychen for the review. > Drop database doesn't delete directory in managed location > -- > > Key: HIVE-24135 > URL: https://issues.apache.org/jira/browse/HIVE-24135 > Project: Hive > Issue Type: Sub-task >Reporter: Karen Coppage >Assignee: Naveen Gangam >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Repro: > say the default managed location is managed/hive and the default external > location is external/hive. > {code:java} > create database db1; -- creates: external/hive/db1.db > create table db1.table1 (i int); -- creates: managed/hive/db1.db and > managed/hive/db1.db/table1 > drop database db1 cascade; -- removes : external/hive/db1.db and > managed/hive/db1.db/table1 > {code} > Problem: Directory managed/hive/db1.db remains. > Since HIVE-22995, dbs have a managed (managedLocationUri) and an external > location (locationUri). I think the issue is that > HiveMetaStore.HMSHandler#drop_database_core deletes only the db directory in > the external location. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24135) Drop database doesn't delete directory in managed location
[ https://issues.apache.org/jira/browse/HIVE-24135?focusedWorklogId=608538=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608538 ] ASF GitHub Bot logged work on HIVE-24135: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:41 Start Date: 08/Jun/21 15:41 Worklog Time Spent: 10m Work Description: nrg4878 closed pull request #2354: URL: https://github.com/apache/hive/pull/2354 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608538) Time Spent: 1.5h (was: 1h 20m) > Drop database doesn't delete directory in managed location > -- > > Key: HIVE-24135 > URL: https://issues.apache.org/jira/browse/HIVE-24135 > Project: Hive > Issue Type: Sub-task >Reporter: Karen Coppage >Assignee: Naveen Gangam >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Repro: > say the default managed location is managed/hive and the default external > location is external/hive. > {code:java} > create database db1; -- creates: external/hive/db1.db > create table db1.table1 (i int); -- creates: managed/hive/db1.db and > managed/hive/db1.db/table1 > drop database db1 cascade; -- removes : external/hive/db1.db and > managed/hive/db1.db/table1 > {code} > Problem: Directory managed/hive/db1.db remains. > Since HIVE-22995, dbs have a managed (managedLocationUri) and an external > location (locationUri). I think the issue is that > HiveMetaStore.HMSHandler#drop_database_core deletes only the db directory in > the external location. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24135) Drop database doesn't delete directory in managed location
[ https://issues.apache.org/jira/browse/HIVE-24135?focusedWorklogId=608537=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608537 ] ASF GitHub Bot logged work on HIVE-24135: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:41 Start Date: 08/Jun/21 15:41 Worklog Time Spent: 10m Work Description: nrg4878 commented on pull request #2354: URL: https://github.com/apache/hive/pull/2354#issuecomment-856881562 Thanks for the review. Fix has been committed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608537) Time Spent: 1h 20m (was: 1h 10m) > Drop database doesn't delete directory in managed location > -- > > Key: HIVE-24135 > URL: https://issues.apache.org/jira/browse/HIVE-24135 > Project: Hive > Issue Type: Sub-task >Reporter: Karen Coppage >Assignee: Naveen Gangam >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Repro: > say the default managed location is managed/hive and the default external > location is external/hive. > {code:java} > create database db1; -- creates: external/hive/db1.db > create table db1.table1 (i int); -- creates: managed/hive/db1.db and > managed/hive/db1.db/table1 > drop database db1 cascade; -- removes : external/hive/db1.db and > managed/hive/db1.db/table1 > {code} > Problem: Directory managed/hive/db1.db remains. > Since HIVE-22995, dbs have a managed (managedLocationUri) and an external > location (locationUri). I think the issue is that > HiveMetaStore.HMSHandler#drop_database_core deletes only the db directory in > the external location. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608534=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608534 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:37 Start Date: 08/Jun/21 15:37 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647563794 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -2039,4 +2189,29 @@ private static IntegerColumnStatistics deserializeIntColumnStatistics(List Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 4h > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608531=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608531 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:36 Start Date: 08/Jun/21 15:36 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647562562 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -2039,4 +2189,29 @@ private static IntegerColumnStatistics deserializeIntColumnStatistics(List Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608529=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608529 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:35 Start Date: 08/Jun/21 15:35 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647561929 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -959,6 +989,20 @@ public boolean next(NullWritable key, VectorizedRowBatch value) throws IOExcepti int ix = rbCtx.findVirtualColumnNum(VirtualColumn.ROWID); value.cols[ix] = recordIdColumnVector; } +if (rowIsDeletedProjected) { + if (fetchDeletedRows) { Review comment: I prefer your first suggestion because the second one requires passing `vectorizedRowBatchBase.size()` to the `set` method which I would like to avoid. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608529) Time Spent: 3h 40m (was: 3.5h) > Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608526=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608526 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:32 Start Date: 08/Jun/21 15:32 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647559407 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -959,6 +989,20 @@ public boolean next(NullWritable key, VectorizedRowBatch value) throws IOExcepti int ix = rbCtx.findVirtualColumnNum(VirtualColumn.ROWID); Review comment: see my previous comment for `VirtualColumn.ROWISDELETED` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608526) Time Spent: 3h 20m (was: 3h 10m) > Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608527=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608527 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:32 Start Date: 08/Jun/21 15:32 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647559557 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -983,7 +1027,7 @@ private void copyFromBase(VectorizedRowBatch value) { System.arraycopy(payloadStruct.fields, 0, value.cols, 0, value.getDataColumnCount()); } if (rowIdProjected) { - recordIdColumnVector.fields[0] = vectorizedRowBatchBase.cols[OrcRecordUpdater.ORIGINAL_WRITEID]; + recordIdColumnVector.fields[0] = vectorizedRowBatchBase.cols[fetchDeletedRows ? OrcRecordUpdater.CURRENT_WRITEID : OrcRecordUpdater.ORIGINAL_WRITEID]; Review comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608527) Time Spent: 3.5h (was: 3h 20m) > Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608522&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608522 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:31 Start Date: 08/Jun/21 15:31 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647558152 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -892,13 +913,20 @@ public boolean next(NullWritable key, VectorizedRowBatch value) throws IOExcepti } catch (Exception e) { throw new IOException("error iterating", e); } -if(!includeAcidColumns) { +if (!includeAcidColumns) { //if here, we don't need to filter anything wrt acid metadata columns //in fact, they are not even read from file/llap value.size = vectorizedRowBatchBase.size; value.selected = vectorizedRowBatchBase.selected; value.selectedInUse = vectorizedRowBatchBase.selectedInUse; copyFromBase(value); + + if (rowIsDeletedProjected) { +rowIsDeletedVector.clear(); +int ix = rbCtx.findVirtualColumnNum(VirtualColumn.ROWISDELETED); Review comment: I started to work on a solution to manage the Virtual Column related information, but it led to a much bigger change. `VectorizedOrcAcidRowBatchReader` can behave in several ways, and each of those behaviors is worth a separate class after extracting the common parts. So I decided to follow the existing logic implemented for RowId. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608522) Time Spent: 3h (was: 2h 50m) > Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608524 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:31 Start Date: 08/Jun/21 15:31 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647558355 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -932,8 +960,10 @@ public boolean next(NullWritable key, VectorizedRowBatch value) throws IOExcepti } // Case 2- find rows which have been deleted. +BitSet notDeletedBitSet = fetchDeletedRows ? (BitSet) selectedBitSet.clone() : selectedBitSet; Review comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608524) Time Spent: 3h 10m (was: 3h) > Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
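The one-liner under review clones the selection bitset only when deleted rows must still be surfaced, so that clearing "deleted" positions does not drop them from the returned batch. A hedged sketch of that choice (`DeletedRowsMask` is an illustrative name, not a Hive class):

```java
import java.util.BitSet;

class DeletedRowsMask {
    // When deleted rows should still be fetched, filter against a clone so
    // the original selection stays intact; otherwise filter in place.
    static BitSet notDeletedMask(BitSet selectedBitSet, boolean fetchDeletedRows) {
        return fetchDeletedRows ? (BitSet) selectedBitSet.clone() : selectedBitSet;
    }
}
```

With `fetchDeletedRows` true, clearing a bit in the returned mask leaves the original selection untouched; with it false, the clear filters the original directly.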
[jira] [Work logged] (HIVE-25185) Improve Logging On Polling Tez Session from Pool
[ https://issues.apache.org/jira/browse/HIVE-25185?focusedWorklogId=608500=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608500 ] ASF GitHub Bot logged work on HIVE-25185: - Author: ASF GitHub Bot Created on: 08/Jun/21 14:49 Start Date: 08/Jun/21 14:49 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #2339: URL: https://github.com/apache/hive/pull/2339#discussion_r647518347 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java ## @@ -131,13 +131,14 @@ SessionType getSession() throws Exception { poolLock.lock(); try { while ((result = pool.poll()) == null) { - notEmpty.await(100, TimeUnit.MILLISECONDS); + LOG.info("Awaiting Tez session to become available in session pool"); + notEmpty.await(10, TimeUnit.SECONDS); Review comment: Right, this is done as part of putSessionBack() method -- so I dont see an issue here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608500) Time Spent: 1h 20m (was: 1h 10m) > Improve Logging On Polling Tez Session from Pool > > > Key: HIVE-25185 > URL: https://issues.apache.org/jira/browse/HIVE-25185 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25185) Improve Logging On Polling Tez Session from Pool
[ https://issues.apache.org/jira/browse/HIVE-25185?focusedWorklogId=608499=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608499 ] ASF GitHub Bot logged work on HIVE-25185: - Author: ASF GitHub Bot Created on: 08/Jun/21 14:45 Start Date: 08/Jun/21 14:45 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #2339: URL: https://github.com/apache/hive/pull/2339#discussion_r647514577 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java ## @@ -131,13 +131,14 @@ SessionType getSession() throws Exception { poolLock.lock(); try { while ((result = pool.poll()) == null) { - notEmpty.await(100, TimeUnit.MILLISECONDS); + LOG.info("Awaiting Tez session to become available in session pool"); + notEmpty.await(10, TimeUnit.SECONDS); Review comment: @pgaref This is not just a loop-and-sleep. The `notEmpty` condition will be alerted when a session becomes available and this thread will run again to pickup the session. I don't really understand why there is a timeout currently implemented. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608499) Time Spent: 1h 10m (was: 1h) > Improve Logging On Polling Tez Session from Pool > > > Key: HIVE-25185 > URL: https://issues.apache.org/jira/browse/HIVE-25185 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
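The exchange above hinges on `Condition.await`/`signal` semantics: the waiting thread is woken as soon as a session is returned to the pool, so the timeout only bounds how long a missed or spurious wakeup can go unnoticed, not the normal wait. A self-contained sketch of that pairing (a toy pool for illustration, not Hive's `TezSessionPool`):

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class TinyPool<T> {
    private final Queue<T> pool = new ArrayDeque<>();
    private final ReentrantLock poolLock = new ReentrantLock();
    private final Condition notEmpty = poolLock.newCondition();

    // Blocks until a session is available. The usual wakeup path is the
    // signal() in putSessionBack(); the timeout is only a safety net.
    T getSession(long timeoutMs) throws InterruptedException {
        poolLock.lock();
        try {
            T result;
            while ((result = pool.poll()) == null) {
                // A logging statement on timeout, as in the PR, would go here.
                notEmpty.await(timeoutMs, TimeUnit.MILLISECONDS);
            }
            return result;
        } finally {
            poolLock.unlock();
        }
    }

    void putSessionBack(T session) {
        poolLock.lock();
        try {
            pool.add(session);
            notEmpty.signal(); // wakes one waiter immediately
        } finally {
            poolLock.unlock();
        }
    }
}
```

Because `signal()` fires on every return, raising the await interval from 100 ms to 10 s mostly reduces log spam rather than adding latency, which is the reviewers' point.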
[jira] [Work logged] (HIVE-25093) date_format() UDF is returning values in UTC time zone only
[ https://issues.apache.org/jira/browse/HIVE-25093?focusedWorklogId=608496=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608496 ] ASF GitHub Bot logged work on HIVE-25093: - Author: ASF GitHub Bot Created on: 08/Jun/21 14:36 Start Date: 08/Jun/21 14:36 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #2252: URL: https://github.com/apache/hive/pull/2252#discussion_r647505427 ## File path: ql/src/java/org/apache/hadoop/hive/ql/util/DateTimeMath.java ## @@ -613,4 +615,18 @@ public static Calendar getProlepticGregorianCalendarUTC() { calendar.setGregorianChange(new java.util.Date(Long.MIN_VALUE)); return calendar; } + + /** + * TODO - this is a temporary fix for handling Julian calendar dates. + * Returns a Gregorian calendar that can be used from year 0+ instead of default 1582.10.15. + * This is desirable for some UDFs that work on dates which normally would use Julian calendar. + * @return the calendar + */ + public static Calendar getTimeZonedProlepticGregorianCalendar() { +GregorianCalendar calendar = new GregorianCalendar(TimeZone.getTimeZone( +SessionState.get() == null ? new HiveConf().getLocalTimeZone() : SessionState.get().getConf() Review comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608496) Time Spent: 0.5h (was: 20m) > date_format() UDF is returning values in UTC time zone only > > > Key: HIVE-25093 > URL: https://issues.apache.org/jira/browse/HIVE-25093 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 3.1.2 >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > *HIVE - 1.2* > sshuser@hn0-dateti:~$ *timedatectl* > Local time: Thu 2021-05-06 11:56:08 IST > Universal time: Thu 2021-05-06 06:26:08 UTC > RTC time: Thu 2021-05-06 06:26:08 >Time zone: Asia/Kolkata (IST, +0530) > Network time on: yes > NTP synchronized: yes > RTC in local TZ: no > sshuser@hn0-dateti:~$ beeline > 0: jdbc:hive2://localhost:10001/default> *select > date_format(current_timestamp,"-MM-dd HH:mm:ss.SSS z");* > +--+--+ > | _c0 | > +--+--+ > | 2021-05-06 11:58:53.760 IST | > +--+--+ > 1 row selected (1.271 seconds) > *HIVE - 3.1.0* > sshuser@hn0-testja:~$ *timedatectl* > Local time: Thu 2021-05-06 12:03:32 IST > Universal time: Thu 2021-05-06 06:33:32 UTC > RTC time: Thu 2021-05-06 06:33:32 >Time zone: Asia/Kolkata (IST, +0530) > Network time on: yes > NTP synchronized: yes > RTC in local TZ: no > sshuser@hn0-testja:~$ beeline > 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select > date_format(current_timestamp,"-MM-dd HH:mm:ss.SSS z");* > +--+ > | _c0 | > +--+ > | *2021-05-06 06:33:59.078 UTC* | > +--+ > 1 row selected (13.396 seconds) > 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *set > hive.local.time.zone=Asia/Kolkata;* > No rows affected (0.025 seconds) > 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select > date_format(current_timestamp,"-MM-dd HH:mm:ss.SSS z");* > +--+ > | _c0 | > +--+ > | *{color:red}2021-05-06 12:08:15.118 UTC{color}* | > +--+ > 1 row selected (1.074 seconds) > expected result was *2021-05-06 
12:08:15.118 IST* > As part of HIVE-12192 it was decided to have a common time zone for all > computation, i.e. "UTC", so the date_format() function was hard-coded to > "UTC". > But later in HIVE-21039 it was decided that the user session time zone value > should be the default, not UTC. > date_format() was not fixed as part of HIVE-21039. > What should be the ideal time zone value of date_format()? -- This message was sent by Atlassian Jira (v8.3.4#803005)
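The getTimeZonedProlepticGregorianCalendar() change under review in this thread works by pushing the Gregorian changeover back to the earliest representable instant, which removes the default Julian/Gregorian cutover at 1582-10-15 and makes the calendar purely proleptic Gregorian. A minimal self-contained sketch of that trick (class and method names here are illustrative, not Hive's):

```java
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class ProlepticDemo {

    // A GregorianCalendar normally switches to Julian rules before the
    // 1582-10-15 changeover. Moving the changeover to the earliest
    // representable instant makes it apply Gregorian rules for all years.
    public static GregorianCalendar prolepticCalendar(String tzId) {
        GregorianCalendar calendar = new GregorianCalendar(TimeZone.getTimeZone(tzId));
        calendar.setGregorianChange(new Date(Long.MIN_VALUE));
        return calendar;
    }

    public static void main(String[] args) {
        GregorianCalendar hybrid = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        GregorianCalendar proleptic = prolepticCalendar("UTC");
        // 1500 is a leap year under Julian rules (divisible by 4) but not
        // under Gregorian rules (divisible by 100 but not by 400).
        System.out.println(hybrid.isLeapYear(1500));     // true
        System.out.println(proleptic.isLeapYear(1500));  // false
    }
}
```

The leap-year check on 1500 is a quick way to tell which rule set a calendar instance is actually using.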
[jira] [Work logged] (HIVE-25185) Improve Logging On Polling Tez Session from Pool
[ https://issues.apache.org/jira/browse/HIVE-25185?focusedWorklogId=608494&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608494 ] ASF GitHub Bot logged work on HIVE-25185: - Author: ASF GitHub Bot Created on: 08/Jun/21 14:35 Start Date: 08/Jun/21 14:35 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #2339: URL: https://github.com/apache/hive/pull/2339#discussion_r647504478 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java ## @@ -131,13 +131,14 @@ SessionType getSession() throws Exception { poolLock.lock(); try { while ((result = pool.poll()) == null) { - notEmpty.await(100, TimeUnit.MILLISECONDS); + LOG.info("Awaiting Tez session to become available in session pool"); + notEmpty.await(10, TimeUnit.SECONDS); Review comment: Any chance this change can increase the time we actually wait to get a session from the pool assuming there is none currently available? It looks to me that if the next session becomes available in the next 10ms with the new change we might wait 10s instead -- am I missing something ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608494) Time Spent: 1h (was: 50m) > Improve Logging On Polling Tez Session from Pool > > > Key: HIVE-25185 > URL: https://issues.apache.org/jira/browse/HIVE-25185 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
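On the latency question raised in this review: `notEmpty` is a java.util.concurrent.locks.Condition, so the timeout passed to await() only bounds the worst case. Assuming the pool signals the condition when a session is returned (the standard bounded-buffer pattern), a waiter wakes as soon as the signal arrives, not when the 10 s expire. A self-contained sketch demonstrating this (names mirror the pool but are simplified, not Hive's code):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class AwaitDemo {

    // Measures how long a 10-second Condition.await() actually blocks when
    // another thread signals the condition after ~50 ms.
    public static long waitForSignalMillis() {
        ReentrantLock poolLock = new ReentrantLock();
        Condition notEmpty = poolLock.newCondition();

        Thread returner = new Thread(() -> {
            try {
                Thread.sleep(50);          // a session is "returned" after 50 ms
            } catch (InterruptedException ignored) {
            }
            poolLock.lock();
            try {
                notEmpty.signal();         // wakes the waiter immediately
            } finally {
                poolLock.unlock();
            }
        });

        long start = System.nanoTime();
        poolLock.lock();
        try {
            returner.start();
            // Same call shape as in the patch: the long timeout only bounds
            // the wait; a signal (or spurious wakeup) ends it early.
            notEmpty.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        } finally {
            poolLock.unlock();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        System.out.println("waited ~" + waitForSignalMillis() + " ms");
    }
}
```

The wait here ends after roughly 50 ms, far below the 10 s ceiling; the 10 s case would only be hit if nothing signals the condition at all.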
[jira] [Commented] (HIVE-25194) Add support for STORED AS ORC/PARQUET/AVRO for Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359375#comment-17359375 ] László Pintér commented on HIVE-25194: -- Merged into master. Thanks, [~mbod] and [~pvary] for the review! > Add support for STORED AS ORC/PARQUET/AVRO for Iceberg > -- > > Key: HIVE-25194 > URL: https://issues.apache.org/jira/browse/HIVE-25194 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > > Currently we have to specify the fileformat in TBLPROPERTIES during Iceberg > create table statements. > The ideal syntax would be: > CREATE TABLE tbl STORED BY ICEBERG STORED AS ORC ... > One complication is that currently stored by and stored as are not permitted > within the same query, so that needs to be amended. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25194) Add support for STORED AS ORC/PARQUET/AVRO for Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Pintér resolved HIVE-25194. -- Resolution: Fixed > Add support for STORED AS ORC/PARQUET/AVRO for Iceberg > -- > > Key: HIVE-25194 > URL: https://issues.apache.org/jira/browse/HIVE-25194 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > > Currently we have to specify the fileformat in TBLPROPERTIES during Iceberg > create table statements. > The ideal syntax would be: > CREATE TABLE tbl STORED BY ICEBERG STORED AS ORC ... > One complication is that currently stored by and stored as are not permitted > within the same query, so that needs to be amended. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25194) Add support for STORED AS ORC/PARQUET/AVRO for Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25194?focusedWorklogId=608474&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608474 ] ASF GitHub Bot logged work on HIVE-25194: - Author: ASF GitHub Bot Created on: 08/Jun/21 13:55 Start Date: 08/Jun/21 13:55 Worklog Time Spent: 10m Work Description: lcspinter merged pull request #2348: URL: https://github.com/apache/hive/pull/2348 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608474) Time Spent: 3h 10m (was: 3h) > Add support for STORED AS ORC/PARQUET/AVRO for Iceberg > -- > > Key: HIVE-25194 > URL: https://issues.apache.org/jira/browse/HIVE-25194 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > > Currently we have to specify the fileformat in TBLPROPERTIES during Iceberg > create table statements. > The ideal syntax would be: > CREATE TABLE tbl STORED BY ICEBERG STORED AS ORC ... > One complication is that currently stored by and stored as are not permitted > within the same query, so that needs to be amended. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24463) Add special case for Derby and MySQL in Get Next ID DbNotificationListener
[ https://issues.apache.org/jira/browse/HIVE-24463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor resolved HIVE-24463. --- Resolution: Won't Fix > Add special case for Derby and MySQL in Get Next ID DbNotificationListener > -- > > Key: HIVE-24463 > URL: https://issues.apache.org/jira/browse/HIVE-24463 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > * Derby does not support {{SELECT FOR UPDATE}} statements > * MySQL can be optimized to use {{LAST_INSERT_ID()}} > > Derby tables are locked in other parts of the code already, but not in this > path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24463) Add special case for Derby and MySQL in Get Next ID DbNotificationListener
[ https://issues.apache.org/jira/browse/HIVE-24463?focusedWorklogId=608471&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608471 ] ASF GitHub Bot logged work on HIVE-24463: - Author: ASF GitHub Bot Created on: 08/Jun/21 13:54 Start Date: 08/Jun/21 13:54 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1727: URL: https://github.com/apache/hive/pull/1727 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608471) Time Spent: 1h (was: 50m) > Add special case for Derby and MySQL in Get Next ID DbNotificationListener > -- > > Key: HIVE-24463 > URL: https://issues.apache.org/jira/browse/HIVE-24463 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > * Derby does not support {{SELECT FOR UPDATE}} statements > * MySQL can be optimized to use {{LAST_INSERT_ID()}} > > Derby tables are locked in other parts of the code already, but not in this > path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24875) Unify InetAddress.getLocalHost()
[ https://issues.apache.org/jira/browse/HIVE-24875?focusedWorklogId=608462&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608462 ] ASF GitHub Bot logged work on HIVE-24875: - Author: ASF GitHub Bot Created on: 08/Jun/21 13:46 Start Date: 08/Jun/21 13:46 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #2314: URL: https://github.com/apache/hive/pull/2314#issuecomment-856782572 @kgyrtkirk Gentle reminder that I'm looking for a follow-up on your initial review. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608462) Time Spent: 1h (was: 50m) > Unify InetAddress.getLocalHost() > > > Key: HIVE-24875 > URL: https://issues.apache.org/jira/browse/HIVE-24875 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Priority: Minor > Labels: newbie, noob, pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Lots of calls in the Hive code to {{InetAddress.getLocalHost()}}. This > should be standardized onto hive-common {{ServerUtils.hostname()}}, which > includes removing (deprecating) a similar method in {{HiveStringUtils}}. > Open to anyone to improve. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25144) Add NoReconnect Annotation to CreateXXX Methods With AlreadyExistsException
[ https://issues.apache.org/jira/browse/HIVE-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359364#comment-17359364 ] David Mollitor commented on HIVE-25144: --- And here is the logging... {code:none} 2021-06-04 12:01:25,927 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-9-thread-3]: ugi=kudu/host@DOMAIN ip=xx.xx.xx.xx cmd=create_table: Table(tableName:test_table, dbName:test_database, owner:user, createTime:0, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:... tableType:MANAGED_TABLE, temporary:false, ownerType:USER) 2021-06-04 12:01:26,001 INFO org.apache.hadoop.hive.common.FileUtils: [pool-9-thread-3]: Creating directory if it doesn't exist: hdfs://ns1/user/hive/warehouse/test_database.db/test_table 2021-06-04 12:01:26,185 ERROR com.jolbox.bonecp.ConnectionHandle: [pool-9-thread-3]: Database access problem. Killing off this connection and all remaining connections in the connection pool. SQL State = 08S01 2021-06-04 12:01:26,294 INFO org.apache.hadoop.fs.TrashPolicyDefault: [pool-9-thread-3]: Moved: 'hdfs://ns1/user/hive/warehouse/test_database.db/test_table' to trash at: hdfs://ns1/user/.Trash/kudu/Current/user/hive/warehouse/test_database.db/test_table 2021-06-04 12:01:26,304 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-9-thread-3]: Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: javax.jdo.JDODataStoreException: Communications link failure The last packet successfully received from the server was 1,521,446 milliseconds ago. The last packet sent successfully to the server was 1,521,447 milliseconds ago. 
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543) at org.datanucleus.api.jdo.JDOTransaction.commit(JDOTransaction.java:171) at org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:727) at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101) at com.sun.proxy.$Proxy26.commitTransaction(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1582) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1615) at sun.reflect.GeneratedMethodAccessor79.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99) at com.sun.proxy.$Proxy28.create_table_with_environment_context(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:10993) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:10977) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:594) at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:589) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:589) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) NestedThrowablesStackTrace: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure The last packet successfully received from the server was 1,521,446 milliseconds ago. The last packet sent successfully to the server was 1,521,447 milliseconds ago. at sun.reflect.GeneratedConstructorAccessor84.newInstance(Unknown Source) at
[jira] [Work logged] (HIVE-25211) Create database throws NPE
[ https://issues.apache.org/jira/browse/HIVE-25211?focusedWorklogId=608446=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608446 ] ASF GitHub Bot logged work on HIVE-25211: - Author: ASF GitHub Bot Created on: 08/Jun/21 12:59 Start Date: 08/Jun/21 12:59 Worklog Time Spent: 10m Work Description: yongzhi merged pull request #2362: URL: https://github.com/apache/hive/pull/2362 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608446) Time Spent: 20m (was: 10m) > Create database throws NPE > -- > > Key: HIVE-25211 > URL: https://issues.apache.org/jira/browse/HIVE-25211 > Project: Hive > Issue Type: Bug > Components: Standalone Metastore >Affects Versions: 4.0.0 >Reporter: Yongzhi Chen >Assignee: Yongzhi Chen >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > <11>1 2021-06-06T17:32:48.964Z > metastore-0.metastore-service.warehouse-1622998329-9klr.svc.cluster.local > metastore 1 5ad83e8e-bf89-4ad3-b1fb-51c73c7133b7 [mdc@18060 > class="metastore.RetryingHMSHandler" level="ERROR" thread="pool-9-thread-16"] > MetaException(message:java.lang.NullPointerException) > > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:8115) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:1629) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:160) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:121) > at com.sun.proxy.$Proxy31.create_database(Unknown Source) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_database.getResult(ThriftHiveMetastore.java:16795) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_database.getResult(ThriftHiveMetastore.java:16779) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:643) > at > org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:638) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) > at > org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:638) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.NullPointerException > at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:120) > at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:128) > at > org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:491) > at > org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:480) > at > 
org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:476) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$9.run(HiveMetaStore.java:1556) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$9.run(HiveMetaStore.java:1554) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database_core(HiveMetaStore.java:1554) >
[jira] [Updated] (HIVE-25104) Backward incompatible timestamp serialization in Parquet for certain timezones
[ https://issues.apache.org/jira/browse/HIVE-25104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-25104: --- Description: HIVE-12192, HIVE-20007 changed the way that timestamp computations are performed and to some extent how timestamps are serialized and deserialized in files (Parquet, Avro). In versions that include HIVE-12192 or HIVE-20007 the serialization in Parquet files is not backwards compatible. In other words writing timestamps with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them with another (not including the previous issues) may lead to different results depending on the default timezone of the system. Consider the following scenario where the default system timezone is set to US/Pacific. At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3 {code:sql} CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET LOCATION '/tmp/hiveexttbl/employee'; INSERT INTO employee VALUES (1, '1880-01-01 00:00:00'); INSERT INTO employee VALUES (2, '1884-01-01 00:00:00'); INSERT INTO employee VALUES (3, '1990-01-01 00:00:00'); SELECT * FROM employee; {code} |1|1880-01-01 00:00:00| |2|1884-01-01 00:00:00| |3|1990-01-01 00:00:00| At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356 {code:sql} CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET LOCATION '/tmp/hiveexttbl/employee'; SELECT * FROM employee; {code} |1|1879-12-31 23:52:58| |2|1884-01-01 00:00:00| |3|1990-01-01 00:00:00| The timestamp for {{eid=1}} in branch-2.3 is different from the one in master. was: HIVE-12192, HIVE-20007 changed the way that timestamp computations are performed and to some extent how timestamps are serialized and deserialized in files (Parquet, Avro, Orc). In versions that include HIVE-12192 or HIVE-20007 the serialization in Parquet files is not backwards compatible. 
In other words writing timestamps with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them with another (not including the previous issues) may lead to different results depending on the default timezone of the system. Consider the following scenario where the default system timezone is set to US/Pacific. At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3 {code:sql} CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET LOCATION '/tmp/hiveexttbl/employee'; INSERT INTO employee VALUES (1, '1880-01-01 00:00:00'); INSERT INTO employee VALUES (2, '1884-01-01 00:00:00'); INSERT INTO employee VALUES (3, '1990-01-01 00:00:00'); SELECT * FROM employee; {code} |1|1880-01-01 00:00:00| |2|1884-01-01 00:00:00| |3|1990-01-01 00:00:00| At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356 {code:sql} CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET LOCATION '/tmp/hiveexttbl/employee'; SELECT * FROM employee; {code} |1|1879-12-31 23:52:58| |2|1884-01-01 00:00:00| |3|1990-01-01 00:00:00| The timestamp for {{eid=1}} in branch-2.3 is different from the one in master. > Backward incompatible timestamp serialization in Parquet for certain timezones > -- > > Key: HIVE-25104 > URL: https://issues.apache.org/jira/browse/HIVE-25104 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > HIVE-12192, HIVE-20007 changed the way that timestamp computations are > performed and to some extent how timestamps are serialized and deserialized > in files (Parquet, Avro). > In versions that include HIVE-12192 or HIVE-20007 the serialization in > Parquet files is not backwards compatible. 
In other words writing timestamps > with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them > with another (not including the previous issues) may lead to different > results depending on the default timezone of the system. > Consider the following scenario where the default system timezone is set to > US/Pacific. > At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET > LOCATION '/tmp/hiveexttbl/employee'; > INSERT INTO employee VALUES (1, '1880-01-01 00:00:00'); > INSERT INTO employee VALUES (2, '1884-01-01 00:00:00'); > INSERT INTO employee VALUES (3, '1990-01-01 00:00:00'); > SELECT * FROM employee; > {code} > |1|1880-01-01 00:00:00| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > At apache/branch-2.3 commit
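A plausible mechanism for the 7 min 2 s shift on {{eid=1}} above (an interpretation, not stated in the ticket): before standard time was adopted in November 1883, US/Pacific used local mean time, UTC-07:52:58, which differs from the modern UTC-08:00 by exactly 7:02. Writers and readers that disagree on whether the zone's historical offset applies to an 1880 timestamp therefore shift the wall clock from 00:00:00 to 23:52:58. The historical offset can be checked with java.time (a sketch, not Hive's conversion code):

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class LmtDemo {

    // Offset that US/Pacific was actually using at a given UTC wall-clock time,
    // according to the tz database's zone rules.
    public static ZoneOffset pacificOffsetAt(LocalDateTime utcWallClock) {
        Instant instant = utcWallClock.toInstant(ZoneOffset.UTC);
        return ZoneId.of("US/Pacific").getRules().getOffset(instant);
    }

    public static void main(String[] args) {
        // Local mean time era: -07:52:58, i.e. 7 min 2 s away from -08:00.
        System.out.println(pacificOffsetAt(LocalDateTime.of(1880, 1, 1, 0, 0)));
        // Modern standard time: -08:00.
        System.out.println(pacificOffsetAt(LocalDateTime.of(1990, 1, 1, 0, 0)));
    }
}
```

This also matches the observation in the repro that only the pre-1883 row ({{eid=1}}) changes while 1884 and 1990 round-trip cleanly.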
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608414=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608414 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:43 Start Date: 08/Jun/21 11:43 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2351: URL: https://github.com/apache/hive/pull/2351#discussion_r647358883 ## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java ## @@ -310,6 +335,24 @@ public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTab } } + private void setupAlterOperationType(EnvironmentContext context) throws MetaException { +if (context != null) { + Map contextProperties = context.getProperties(); + if (contextProperties != null) { +String stringOpType = contextProperties.get(ALTER_TABLE_OPERATION_TYPE); +if (stringOpType != null) { + currentAlterTableOp = AlterTableType.valueOf(stringOpType); + if (SUPPORTED_ALTER_OPS.stream().noneMatch(op -> op.equals(currentAlterTableOp))) { +throw new MetaException( +"Unsupported ALTER TABLE operation type for Iceberg tables, must be: " + allowedAlterTypes.toString()); + } +} +return; Review comment: Maybe a short comment explaining what you just said would be useful for future maintainers -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608414) Time Spent: 5h 20m (was: 5h 10m) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 5h 20m > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608413=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608413 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:42 Start Date: 08/Jun/21 11:42 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2351: URL: https://github.com/apache/hive/pull/2351#discussion_r647358340 ## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java ## @@ -310,6 +337,24 @@ public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTab } } + private void setupAlterOperationType(EnvironmentContext context) throws MetaException { +if (context != null) { + Map contextProperties = context.getProperties(); + if (contextProperties != null) { +String stringOpType = contextProperties.get(ALTER_TABLE_OPERATION_TYPE); +if (stringOpType != null) { + currentAlterTableOp = AlterTableType.valueOf(stringOpType); + if (SUPPORTED_ALTER_OPS.stream().noneMatch(op -> op.equals(currentAlterTableOp))) { +throw new MetaException( +"Unsupported ALTER TABLE operation type for Iceberg tables, must be: " + allowedAlterTypes.toString()); + } +} +return; + } +} +throw new MetaException("ALTER TABLE operation type could not be determined."); Review comment: Can we maybe get rid of the return by putting this exception to the beginning of the method? e.g. ``` if (context == null || context.getProperties() == null) { throw new ... } ``` The other thing I'm thinking of is that it'd be informative to include the hmsTable name in the exception message as well (for this and the above too). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608413) Time Spent: 5h 10m (was: 5h) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 5h 10m > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
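marton-bod's suggestion above amounts to inverting the null checks so the error case exits first and the trailing return disappears. A rough sketch of that shape (deliberately simplified: a plain map instead of EnvironmentContext, IllegalArgumentException instead of MetaException, and placeholder enum values rather than Hive's AlterTableType):

```java
import java.util.EnumSet;
import java.util.Map;

public class AlterOpDemo {

    public enum AlterTableType { ADDCOLS, ADDPROPS, SETPARTITIONSPEC }

    static final String ALTER_TABLE_OPERATION_TYPE = "alterTableOpType";
    static final EnumSet<AlterTableType> SUPPORTED_ALTER_OPS =
            EnumSet.of(AlterTableType.ADDCOLS, AlterTableType.ADDPROPS);

    // Early-exit restructuring: reject a missing context/properties map up
    // front, so the happy path reads top-to-bottom. The table name is
    // included in every error message, as suggested in the review.
    public static AlterTableType resolveAlterOp(Map<String, String> contextProperties,
                                                String tableName) {
        if (contextProperties == null) {
            throw new IllegalArgumentException(
                    "ALTER TABLE operation type could not be determined for table " + tableName);
        }
        String opType = contextProperties.get(ALTER_TABLE_OPERATION_TYPE);
        if (opType == null) {
            // No explicit op type (e.g. a stats-only update): nothing to validate.
            return null;
        }
        AlterTableType op = AlterTableType.valueOf(opType);
        if (!SUPPORTED_ALTER_OPS.contains(op)) {
            throw new IllegalArgumentException("Unsupported ALTER TABLE operation " + op
                    + " for Iceberg table " + tableName + ", must be one of: " + SUPPORTED_ALTER_OPS);
        }
        return op;
    }
}
```

Whether a missing op type should be tolerated (as here and in the patch) or rejected is exactly the behavioral question the reviewers discuss above.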
[jira] [Work logged] (HIVE-25194) Add support for STORED AS ORC/PARQUET/AVRO for Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25194?focusedWorklogId=608411=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608411 ] ASF GitHub Bot logged work on HIVE-25194: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:39 Start Date: 08/Jun/21 11:39 Worklog Time Spent: 10m Work Description: lcspinter commented on a change in pull request #2348: URL: https://github.com/apache/hive/pull/2348#discussion_r647356190 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ## @@ -13477,15 +13477,28 @@ ASTNode analyzeCreateTable( } } -if (partitionTransformSpecExists) { - try { -HiveStorageHandler storageHandler = HiveUtils.getStorageHandler(conf, storageFormat.getStorageHandler()); -if (!storageHandler.supportsPartitionTransform()) { - throw new SemanticException("Partition transform is not supported for " + - storageHandler.getClass().getName()); +HiveStorageHandler handler; +try { + handler = HiveUtils.getStorageHandler(conf, storageFormat.getStorageHandler()); Review comment: Yes, the storage handler can be null in the case of native tables, but this is handled inside of `HiveUtils.getStorageHandler()` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608411) Time Spent: 3h (was: 2h 50m) > Add support for STORED AS ORC/PARQUET/AVRO for Iceberg > -- > > Key: HIVE-25194 > URL: https://issues.apache.org/jira/browse/HIVE-25194 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 3h > Remaining Estimate: 0h > > Currently we have to specify the fileformat in TBLPROPERTIES during Iceberg > create table statements. 
> The ideal syntax would be: > CREATE TABLE tbl STORED BY ICEBERG STORED AS ORC ... > One complication is that currently stored by and stored as are not permitted > within the same query, so that needs to be amended. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608410=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608410 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:39 Start Date: 08/Jun/21 11:39 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2351: URL: https://github.com/apache/hive/pull/2351#discussion_r647356109 ## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java ## @@ -310,6 +335,24 @@ public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTab } } + private void setupAlterOperationType(EnvironmentContext context) throws MetaException { +if (context != null) { + Map contextProperties = context.getProperties(); + if (contextProperties != null) { +String stringOpType = contextProperties.get(ALTER_TABLE_OPERATION_TYPE); +if (stringOpType != null) { + currentAlterTableOp = AlterTableType.valueOf(stringOpType); + if (SUPPORTED_ALTER_OPS.stream().noneMatch(op -> op.equals(currentAlterTableOp))) { +throw new MetaException( +"Unsupported ALTER TABLE operation type for Iceberg tables, must be: " + allowedAlterTypes.toString()); + } +} +return; Review comment: I see. Maybe worth a comment then. Thanks for the explanation! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608410) Time Spent: 5h (was: 4h 50m) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608404=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608404 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:28 Start Date: 08/Jun/21 11:28 Worklog Time Spent: 10m Work Description: szlta commented on a change in pull request #2351: URL: https://github.com/apache/hive/pull/2351#discussion_r647348841 ## File path: iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaUtil.java ## @@ -134,6 +137,23 @@ public static Type convert(TypeInfo typeInfo) { return HiveSchemaConverter.convert(typeInfo, false); } + /** + * Produces the difference of two FieldSchema lists by only taking into account the field name and type. + * @param subtrahendCollection List of fields to subtract from Review comment: Woops, yeah.. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608404) Time Spent: 4h 50m (was: 4h 40m) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 4h 50m > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608402=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608402 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:24 Start Date: 08/Jun/21 11:24 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2351: URL: https://github.com/apache/hive/pull/2351#discussion_r647346665 ## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java ## @@ -84,6 +88,8 @@ // Initially we'd like to cache the partition spec in HMS, but not push it down later to Iceberg during alter // table commands since by then the HMS info can be stale + Iceberg does not store its partition spec in the props InputFormatConfig.PARTITION_SPEC); + private static final Set> SUPPORTED_ALTER_OPS = ImmutableSet.of( Review comment: Maybe we should use EnumSet here instead? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608402) Time Spent: 4h 40m (was: 4.5h) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 4h 40m > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
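The suggestion above — preferring `EnumSet` over a general-purpose immutable set when all members are constants of one enum — can be shown standalone (the enum here is a reduced stand-in for Hive's actual `AlterTableType`):

```java
import java.util.EnumSet;
import java.util.Set;

public class EnumSetDemo {
    // Reduced stand-in for Hive's AlterTableType enum.
    enum AlterTableType { ADDCOLS, ADDPROPS, DROPPROPS, RENAME }

    // EnumSet stores membership as a bit vector over the enum's ordinals,
    // so contains() is a bit test rather than a hash lookup.
    static final Set<AlterTableType> SUPPORTED =
        EnumSet.of(AlterTableType.ADDCOLS, AlterTableType.ADDPROPS, AlterTableType.DROPPROPS);

    public static void main(String[] args) {
        if (!SUPPORTED.contains(AlterTableType.ADDCOLS)) throw new AssertionError();
        if (SUPPORTED.contains(AlterTableType.RENAME)) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Beyond speed, `EnumSet.of(...)` also documents at the declaration site that the set can only ever contain values of that enum.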
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608401=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608401 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:22 Start Date: 08/Jun/21 11:22 Worklog Time Spent: 10m Work Description: szlta commented on a change in pull request #2351: URL: https://github.com/apache/hive/pull/2351#discussion_r647344754 ## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java ## @@ -310,6 +335,24 @@ public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTab } } + private void setupAlterOperationType(EnvironmentContext context) throws MetaException { +if (context != null) { + Map contextProperties = context.getProperties(); + if (contextProperties != null) { +String stringOpType = contextProperties.get(ALTER_TABLE_OPERATION_TYPE); +if (stringOpType != null) { + currentAlterTableOp = AlterTableType.valueOf(stringOpType); + if (SUPPORTED_ALTER_OPS.stream().noneMatch(op -> op.equals(currentAlterTableOp))) { +throw new MetaException( +"Unsupported ALTER TABLE operation type for Iceberg tables, must be: " + allowedAlterTypes.toString()); + } +} +return; Review comment: Yeah I found that it is valid as tests started to fail after the recent refactor :D E.g. for analyze+compute_stats query there's an alter table invocation, where there's no operation type among the context properties. Our hook should not fail for such cases, but rather act as no-op. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608401) Time Spent: 4.5h (was: 4h 20m) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
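The control flow debated in this thread — read an operation type from the alter-table context properties, treat a missing type as a no-op (e.g. stats-only alters), and reject unsupported types — can be sketched independently of Hive. Names and the property key below are illustrative, not Hive's real identifiers:

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class AlterOpValidator {
    enum AlterTableType { ADDCOLS, ADDPROPS, RENAME }

    static final Set<AlterTableType> SUPPORTED =
        EnumSet.of(AlterTableType.ADDCOLS, AlterTableType.ADDPROPS);

    // Returns the validated op, or null for a no-op: some alter invocations
    // (like analyze/compute_stats) carry no operation type at all, and the
    // hook must not fail for them.
    static AlterTableType validate(Map<String, String> contextProps) {
        if (contextProps == null) return null;
        String opType = contextProps.get("alterTableOpType");
        if (opType == null) return null;  // no operation type: act as no-op
        AlterTableType op = AlterTableType.valueOf(opType);
        if (!SUPPORTED.contains(op)) {
            throw new IllegalArgumentException(
                "Unsupported ALTER TABLE operation type, must be one of: " + SUPPORTED);
        }
        return op;
    }

    public static void main(String[] args) {
        if (validate(null) != null) throw new AssertionError();
        if (validate(Map.of()) != null) throw new AssertionError();
        if (validate(Map.of("alterTableOpType", "ADDCOLS")) != AlterTableType.ADDCOLS) throw new AssertionError();
        boolean threw = false;
        try { validate(Map.of("alterTableOpType", "RENAME")); }
        catch (IllegalArgumentException e) { threw = true; }
        if (!threw) throw new AssertionError();
        System.out.println("ok");
    }
}
```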
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608400=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608400 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:21 Start Date: 08/Jun/21 11:21 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2351: URL: https://github.com/apache/hive/pull/2351#discussion_r647344554 ## File path: iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaUtil.java ## @@ -134,6 +137,23 @@ public static Type convert(TypeInfo typeInfo) { return HiveSchemaConverter.convert(typeInfo, false); } + /** + * Produces the difference of two FieldSchema lists by only taking into account the field name and type. + * @param subtrahendCollection List of fields to subtract from Review comment: I think minuend and subtrahend are the other way around in this case, no? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608400) Time Spent: 4h 20m (was: 4h 10m) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 4h 20m > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608399=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608399 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:21 Start Date: 08/Jun/21 11:21 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2351: URL: https://github.com/apache/hive/pull/2351#discussion_r647344554 ## File path: iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaUtil.java ## @@ -134,6 +137,23 @@ public static Type convert(TypeInfo typeInfo) { return HiveSchemaConverter.convert(typeInfo, false); } + /** + * Produces the difference of two FieldSchema lists by only taking into account the field name and type. + * @param subtrahendCollection List of fields to subtract from Review comment: I think minuend and subtrahend are the other way around in this case -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608399) Time Spent: 4h 10m (was: 4h) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 4h 10m > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
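For reference on the naming nit above: in `a − b`, `a` is the minuend (the list subtracted *from*) and `b` is the subtrahend (the list that gets subtracted). A standalone sketch of such a name-and-type field difference, with a simplified stand-in for `FieldSchema`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class FieldDiff {
    // Simplified stand-in for Hive's FieldSchema: only name and type matter here.
    static class Field {
        final String name;
        final String type;
        Field(String name, String type) { this.name = name; this.type = type; }
    }

    // minuend - subtrahend: fields of the minuend not present (by name + type)
    // in the subtrahend. For schema evolution, newSchema - oldSchema yields
    // the added columns.
    static List<Field> difference(List<Field> minuend, List<Field> subtrahend) {
        List<Field> result = new ArrayList<>();
        for (Field f : minuend) {
            boolean present = subtrahend.stream().anyMatch(s ->
                Objects.equals(s.name, f.name) && Objects.equals(s.type, f.type));
            if (!present) {
                result.add(f);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Field> newSchema = List.of(new Field("id", "int"), new Field("name", "string"));
        List<Field> oldSchema = List.of(new Field("id", "int"));
        List<Field> added = difference(newSchema, oldSchema);
        if (added.size() != 1 || !added.get(0).name.equals("name")) throw new AssertionError();
        System.out.println("ok");
    }
}
```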
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608395=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608395 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:05 Start Date: 08/Jun/21 11:05 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2351: URL: https://github.com/apache/hive/pull/2351#discussion_r647334605 ## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java ## @@ -310,6 +335,24 @@ public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTab } } + private void setupAlterOperationType(EnvironmentContext context) throws MetaException { +if (context != null) { + Map contextProperties = context.getProperties(); + if (contextProperties != null) { +String stringOpType = contextProperties.get(ALTER_TABLE_OPERATION_TYPE); +if (stringOpType != null) { + currentAlterTableOp = AlterTableType.valueOf(stringOpType); + if (SUPPORTED_ALTER_OPS.stream().noneMatch(op -> op.equals(currentAlterTableOp))) { +throw new MetaException( +"Unsupported ALTER TABLE operation type for Iceberg tables, must be: " + allowedAlterTypes.toString()); + } +} +return; Review comment: Why is this return here? Is it valid operation where `stringOpType` == null? What is the operation at that time? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608395) Time Spent: 4h (was: 3h 50m) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 4h > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608394=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608394 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:02 Start Date: 08/Jun/21 11:02 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2351: URL: https://github.com/apache/hive/pull/2351#discussion_r647332812 ## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java ## @@ -248,6 +256,20 @@ public void preAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, E Collections.emptyMap())); updateHmsTableProperties(hmsTable); } +if (AlterTableType.ADDCOLS.equals(currentAlterTableOp)) { Review comment: nit: newline after block close -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608394) Time Spent: 3h 50m (was: 3h 40m) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 3h 50m > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25193) Vectorized Query Execution: ClassCastException when use nvl() function which default_value is decimal type
[ https://issues.apache.org/jira/browse/HIVE-25193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] qiang.bi updated HIVE-25193: Description: Problem statement: {code:java} set hive.vectorized.execution.enabled = true; select nvl(get_json_object(attr_json,'$.correctedPrice'),0.88) corrected_price from dw_mdm_sync_asset; {code} The error log: {code:java} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.BytesColumnVectorCaused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:504) at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorCoalesce.evaluate(VectorCoalesce.java:124) at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:271) at org.apache.hadoop.hive.ql.exec.vector.expressions.CastStringToDouble.evaluate(CastStringToDouble.java:83) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146) ... 28 more{code} The problem HiveQL: {code:java} nvl(get_json_object(attr_json,'$.correctedPrice'),0.88) corrected_price {code} The problem expression: {code:java} CastStringToDouble(col 39:string)(children: VectorCoalesce(columns [37, 38])(children: VectorUDFAdaptor(get_json_object(_col14, '$.correctedPrice')) -> 37:string, ConstantVectorExpression(val 0.88) -> 38:decimal(2,2)) -> 39:string) -> 40:double {code} The problem code: {code:java} public class VectorCoalesce extends VectorExpression { ... 
  @Override
  public void evaluate(VectorizedRowBatch batch) throws HiveException {

    if (childExpressions != null) {
      super.evaluateChildren(batch);
    }

    int[] sel = batch.selected;
    int n = batch.size;
    ColumnVector outputColVector = batch.cols[outputColumnNum];
    boolean[] outputIsNull = outputColVector.isNull;
    if (n <= 0) {
      // Nothing to do
      return;
    }

    if (unassignedBatchIndices == null || n > unassignedBatchIndices.length) {
      // (Re)allocate larger to be a multiple of 1024 (DEFAULT_SIZE).
      final int roundUpSize =
          ((n + VectorizedRowBatch.DEFAULT_SIZE - 1) / VectorizedRowBatch.DEFAULT_SIZE) * VectorizedRowBatch.DEFAULT_SIZE;
      unassignedBatchIndices = new int[roundUpSize];
    }

    // We do not need to do a column reset since we are carefully changing the output.
    outputColVector.isRepeating = false;

    // CONSIDER: Should be do this for all vector expressions that can
    // work on BytesColumnVector output columns???
    outputColVector.init();

    final int columnCount = inputColumns.length;

    /*
     * Process the input columns to find a non-NULL value for each row.
     *
     * We track the unassigned batchIndex of the rows that have not received
     * a non-NULL value yet. Similar to a selected array.
     */
    boolean isAllUnassigned = true;
    int unassignedColumnCount = 0;
    for (int k = 0; k < inputColumns.length; k++) {
      ColumnVector cv = batch.cols[inputColumns[k]];
      if (cv.isRepeating) {
        if (cv.noNulls || !cv.isNull[0]) {
          /*
           * With a repeating value we can finish all remaining rows.
           */
          if (isAllUnassigned) {
            // No other columns provided non-NULL values. We can return repeated output.
            outputIsNull[0] = false;
            outputColVector.setElement(0, 0, cv);
            outputColVector.isRepeating = true;
            return;
          } else {
            // Some rows have already been assigned values. Assign the remaining.
            // We cannot use copySelected method here.
            for (int i = 0; i < unassignedColumnCount; i++) {
              final int batchIndex = unassignedBatchIndices[i];
              outputIsNull[batchIndex] = false;
              // Our input is repeating (i.e. inputColNumber = 0).
              outputColVector.setElement(batchIndex, 0, cv);
            }
            return;
          }
        } else {
          // Repeated NULLs -- skip this input column.
        }
      } else {
        /*
         * Non-repeating input column. Use any non-NULL values for unassigned rows.
         */
        if (isAllUnassigned) {
          /*
           * No other columns provided non-NULL values. We *may* be able to finish all rows
           * with this input column...
           */
          if (cv.noNulls) {
            // Since no NULLs, we can provide values for all rows.
            if (batch.selectedInUse) {
              for (int i = 0; i < n; i++) {
                final int batchIndex = sel[i];
                outputIsNull[batchIndex] = false;
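The ClassCastException in the report boils down to `VectorCoalesce` assuming every input column shares the output column's vector type, while the planner typed the constant `0.88` as `decimal(2,2)` and the coalesce output as `string`. A toy model of that failure mode (minimal stand-ins, not Hive's real classes):

```java
public class VectorTypeMismatch {
    // Minimal stand-ins for Hive's ColumnVector hierarchy.
    static abstract class ColumnVector {
        abstract void setElement(int outIdx, int inIdx, ColumnVector input);
    }

    static class BytesColumnVector extends ColumnVector {
        String[] values = new String[4];
        void setElement(int outIdx, int inIdx, ColumnVector input) {
            // Mirrors the shape of BytesColumnVector.setElement: it casts the
            // input to its own type unconditionally.
            BytesColumnVector in = (BytesColumnVector) input;
            values[outIdx] = in.values[inIdx];
        }
    }

    static class DecimalColumnVector extends ColumnVector {
        java.math.BigDecimal[] values = { new java.math.BigDecimal("0.88") };
        void setElement(int outIdx, int inIdx, ColumnVector input) { /* unused in this sketch */ }
    }

    public static void main(String[] args) {
        ColumnVector output = new BytesColumnVector();
        ColumnVector decimalConstant = new DecimalColumnVector();
        boolean threw = false;
        try {
            // Coalesce copying a decimal constant into a string output column:
            output.setElement(0, 0, decimalConstant);
        } catch (ClassCastException e) {
            threw = true;  // same failure mode as the stack trace above
        }
        if (!threw) throw new AssertionError();
        System.out.println("ok");
    }
}
```

The fix direction implied by the plan is to make all coalesce branches agree on one vector type before `setElement` is reached, rather than to make `setElement` tolerant of mismatches.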
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608365=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608365 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 09:46 Start Date: 08/Jun/21 09:46 Worklog Time Spent: 10m Work Description: szlta commented on pull request #2351: URL: https://github.com/apache/hive/pull/2351#issuecomment-856626991 > @szlta: quick question: Would it be possible to create a test where we concurrently try to modify the schema through Hive and change the schema through the Iceberg Java API? yep, added -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608365) Time Spent: 3h 40m (was: 3.5h) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24458) Allow access to SArgs without converting to disjunctive normal form
[ https://issues.apache.org/jira/browse/HIVE-24458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated HIVE-24458: -- Fix Version/s: storage-2.7.3 > Allow access to SArgs without converting to disjunctive normal form > --- > > Key: HIVE-24458 > URL: https://issues.apache.org/jira/browse/HIVE-24458 > Project: Hive > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Owen O'Malley >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, storage-2.7.3 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > For some use cases, it is useful to have access to the SArg expression in a > non-normalized form. Currently, the SArg only provides the fully normalized > expression. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25217) Move isEligibleForCompaction evaluation under the Initiator thread pool
[ https://issues.apache.org/jira/browse/HIVE-25217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25217: -- Labels: pull-request-available (was: ) > Move isEligibleForCompaction evaluation under the Initiator thread pool > --- > > Key: HIVE-25217 > URL: https://issues.apache.org/jira/browse/HIVE-25217 > Project: Hive > Issue Type: Bug >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Checking eligibility for >1 mil distinct table / partition combinations > can take a while in the Initiator, since all steps are performed in the main > thread. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25217) Move isEligibleForCompaction evaluation under the Initiator thread pool
[ https://issues.apache.org/jira/browse/HIVE-25217?focusedWorklogId=608353=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608353 ] ASF GitHub Bot logged work on HIVE-25217: - Author: ASF GitHub Bot Created on: 08/Jun/21 08:49 Start Date: 08/Jun/21 08:49 Worklog Time Spent: 10m Work Description: deniskuzZ opened a new pull request #2367: URL: https://github.com/apache/hive/pull/2367 …or thread pool ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608353) Remaining Estimate: 0h Time Spent: 10m > Move isEligibleForCompaction evaluation under the Initiator thread pool > --- > > Key: HIVE-25217 > URL: https://issues.apache.org/jira/browse/HIVE-25217 > Project: Hive > Issue Type: Bug >Reporter: Denys Kuzmenko >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Checking for eligibility >1 mil of distinct table / partition combinations > can take a while by the Initiator since all steps are performed in the main > thread. -- This message was sent by Atlassian Jira (v8.3.4#803005)
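The intent of this change — moving the per-table/partition eligibility check off the Initiator's main thread into its worker pool — follows a standard fan-out pattern. A generic sketch under that assumption (not the actual Initiator code; the predicate is a placeholder):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelEligibility {
    // Stand-in for the per-table/partition eligibility check the Initiator runs.
    static boolean isEligibleForCompaction(String tablePartition) {
        return tablePartition.hashCode() % 2 == 0;  // placeholder predicate
    }

    // Fan the checks out over a fixed pool instead of running them serially
    // on the caller's thread.
    static List<String> findEligible(List<String> candidates, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String c : candidates) {
                futures.add(pool.submit(() -> isEligibleForCompaction(c) ? c : null));
            }
            List<String> eligible = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    String r = f.get();
                    if (r != null) eligible.add(r);
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException(e);
                }
            }
            return eligible;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> result = findEligible(List.of("db.t1", "db.t2", "db.t3", "db.t4"), 2);
        for (String r : result) {
            if (!isEligibleForCompaction(r)) throw new AssertionError();
        }
        System.out.println("ok");
    }
}
```

With >1M candidates the win comes from overlapping the metadata lookups inside each check, so the pool size would be tuned to that I/O, not to CPU count.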
[jira] [Commented] (HIVE-16220) Memory leak when creating a table using location and NameNode in HA
[ https://issues.apache.org/jira/browse/HIVE-16220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359157#comment-17359157 ] Ivan Podhornyi commented on HIVE-16220: --- [~james601232] I got the same issue, and after a week of research found a few solutions: remove the .cache() on the DataFrame in your code, because it creates a SessionState which is a full copy of the Session. If not caching the DataFrame is a show stopper for you - [here is a Scala method|https://github.com/apache/spark/blob/1d550c4e90275ab418b9161925049239227f3dc9/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L189], where I guess it is possible to create a clone of the SparkSession for each batch processing and then close it. Just need to check the overhead. > Memory leak when creating a table using location and NameNode in HA > --- > > Key: HIVE-16220 > URL: https://issues.apache.org/jira/browse/HIVE-16220 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 1.2.1, 2.3.4, 3.0.0 > Environment: HDP-2.4.0.0 > HDP-3.1.0.0 >Reporter: Angel Alvarez Pascua >Priority: Major > > The following simple DDL > CREATE TABLE `test`(`field` varchar(1)) LOCATION > 'hdfs://benderHA/apps/hive/warehouse/test' > ends up generating a huge memory leak in the HiveServer2 service. > After two weeks without a restart, the service stops suddenly because of > OutOfMemory errors. > This only happens when we're in an environment in which the NameNode is in > HA; otherwise, nothing (so weird) happens. If the location clause is not > present, everything is also fine. > It seems multiple instances of Hadoop configuration are created when we're > in an HA environment: > > 2.618 instances of "org.apache.hadoop.conf.Configuration", loaded by > "sun.misc.Launcher$AppClassLoader @ 0x4d260de88" > occupy 350.263.816 (81,66%) bytes.
These instances are referenced from one > instance of "java.util.HashMap$Node[]", > loaded by "" > > 5.216 instances of "org.apache.hadoop.conf.Configuration", loaded by > "sun.misc.Launcher$AppClassLoader @ 0x4d260de88" > occupy 699.901.416 (87,32%) bytes. These instances are referenced from one > instance of "java.util.HashMap$Node[]", > loaded by "" -- This message was sent by Atlassian Jira (v8.3.4#803005)
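The leak signature above — thousands of independent `org.apache.hadoop.conf.Configuration` copies pinned by a map — is the classic cost of constructing a fresh configuration object per request instead of sharing one. A generic sketch of the mitigation (plain Java, not Hive/Hadoop code; the `Config` class is a hypothetical stand-in):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedConfigCache {
    // Stand-in for a heavyweight configuration object; Hadoop's Configuration
    // carries all loaded resources, so each retained instance is expensive.
    static class Config {
        final String clusterId;
        Config(String clusterId) { this.clusterId = clusterId; }
    }

    private static final Map<String, Config> CACHE = new ConcurrentHashMap<>();

    // Hand out one shared instance per cluster instead of a fresh copy per request.
    static Config forCluster(String clusterId) {
        return CACHE.computeIfAbsent(clusterId, Config::new);
    }

    public static void main(String[] args) {
        Config a = forCluster("benderHA");
        Config b = forCluster("benderHA");
        if (a != b) throw new AssertionError("expected a single shared instance");
        System.out.println("ok");
    }
}
```

This only works when the shared object is treated as read-only after construction; per-request mutation would force copies again, which is essentially the HA resolution path's problem here.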
[jira] [Work logged] (HIVE-25201) Remove Caffein shading from Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25201?focusedWorklogId=608338=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608338 ] ASF GitHub Bot logged work on HIVE-25201: - Author: ASF GitHub Bot Created on: 08/Jun/21 08:10 Start Date: 08/Jun/21 08:10 Worklog Time Spent: 10m Work Description: pvary merged pull request #2352: URL: https://github.com/apache/hive/pull/2352 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608338) Time Spent: 40m (was: 0.5h) > Remove Caffein shading from Iceberg > --- > > Key: HIVE-25201 > URL: https://issues.apache.org/jira/browse/HIVE-25201 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Since Iceberg moved to the same version as we use (Upgrade Caffeine version > [#2671|https://github.com/apache/iceberg/pull/2671]), we can get rid of the > Caffein shading. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25201) Remove Caffein shading from Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary resolved HIVE-25201. --- Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master. Thanks for the review [~mbod] and [~lpinter] > Remove Caffein shading from Iceberg > --- > > Key: HIVE-25201 > URL: https://issues.apache.org/jira/browse/HIVE-25201 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Since Iceberg moved to the same version as we use (Upgrade Caffeine version > [#2671|https://github.com/apache/iceberg/pull/2671]), we can get rid of the > Caffein shading. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25194) Add support for STORED AS ORC/PARQUET/AVRO for Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25194?focusedWorklogId=608336=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608336 ] ASF GitHub Bot logged work on HIVE-25194: - Author: ASF GitHub Bot Created on: 08/Jun/21 07:56 Start Date: 08/Jun/21 07:56 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2348: URL: https://github.com/apache/hive/pull/2348#discussion_r647203906 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ## @@ -13477,15 +13477,28 @@ ASTNode analyzeCreateTable( } } -if (partitionTransformSpecExists) { - try { -HiveStorageHandler storageHandler = HiveUtils.getStorageHandler(conf, storageFormat.getStorageHandler()); -if (!storageHandler.supportsPartitionTransform()) { - throw new SemanticException("Partition transform is not supported for " + - storageHandler.getClass().getName()); +HiveStorageHandler handler; +try { + handler = HiveUtils.getStorageHandler(conf, storageFormat.getStorageHandler()); Review comment: Do we have an exception for native tables? In my experience sometimes the StorageHandler is `null`, but there might be some other issues here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608336) Time Spent: 2h 50m (was: 2h 40m) > Add support for STORED AS ORC/PARQUET/AVRO for Iceberg > -- > > Key: HIVE-25194 > URL: https://issues.apache.org/jira/browse/HIVE-25194 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently we have to specify the fileformat in TBLPROPERTIES during Iceberg > create table statements. 
> The ideal syntax would be: > CREATE TABLE tbl STORED BY ICEBERG STORED AS ORC ... > One complication is that currently stored by and stored as are not permitted > within the same query, so that needs to be amended. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25194) Add support for STORED AS ORC/PARQUET/AVRO for Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25194?focusedWorklogId=608335=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608335 ] ASF GitHub Bot logged work on HIVE-25194: - Author: ASF GitHub Bot Created on: 08/Jun/21 07:55 Start Date: 08/Jun/21 07:55 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2348: URL: https://github.com/apache/hive/pull/2348#discussion_r647203906 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ## @@ -13477,15 +13477,28 @@ ASTNode analyzeCreateTable( } } -if (partitionTransformSpecExists) { - try { -HiveStorageHandler storageHandler = HiveUtils.getStorageHandler(conf, storageFormat.getStorageHandler()); -if (!storageHandler.supportsPartitionTransform()) { - throw new SemanticException("Partition transform is not supported for " + - storageHandler.getClass().getName()); +HiveStorageHandler handler; +try { + handler = HiveUtils.getStorageHandler(conf, storageFormat.getStorageHandler()); Review comment: Could it be that the `storageFormat.getStorageHandler()` is null? Like for native tables? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608335) Time Spent: 2h 40m (was: 2.5h) > Add support for STORED AS ORC/PARQUET/AVRO for Iceberg > -- > > Key: HIVE-25194 > URL: https://issues.apache.org/jira/browse/HIVE-25194 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > Currently we have to specify the fileformat in TBLPROPERTIES during Iceberg > create table statements. > The ideal syntax would be: > CREATE TABLE tbl STORED BY ICEBERG STORED AS ORC ... 
> One complication is that currently stored by and stored as are not permitted > within the same query, so that needs to be amended. -- This message was sent by Atlassian Jira (v8.3.4#803005)
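The review comment above asks what happens when `storageFormat.getStorageHandler()` is null, as it is for native (non-storage-handler) tables. A minimal, self-contained sketch of the guard being discussed — the class and method bodies here are hypothetical stand-ins, not the actual Hive code:

```java
// Hypothetical stand-in for the Hive StorageFormat discussed in the review;
// native tables carry no storage handler class, so getStorageHandler() is null.
class StorageFormat {
    private final String storageHandlerClass;
    StorageFormat(String storageHandlerClass) { this.storageHandlerClass = storageHandlerClass; }
    String getStorageHandler() { return storageHandlerClass; }
}

public class PartitionTransformGuard {
    /**
     * Returns an error message when a partition-transform spec is used with a
     * storage format that cannot support it, or null when the spec is valid.
     * The null-handler branch is exactly the native-table case the review
     * comment raises: the handler lookup must be skipped, not attempted.
     */
    static String validate(StorageFormat fmt, boolean partitionTransformSpecExists) {
        String handlerClass = fmt.getStorageHandler();
        if (handlerClass == null) {
            // Native table: no handler to consult; transforms are unsupported.
            return partitionTransformSpecExists
                    ? "Partition transform is not supported for native tables"
                    : null;
        }
        // Stand-in for HiveUtils.getStorageHandler(...).supportsPartitionTransform().
        boolean supportsTransform = handlerClass.contains("Iceberg");
        if (partitionTransformSpecExists && !supportsTransform) {
            return "Partition transform is not supported for " + handlerClass;
        }
        return null;
    }
}
```

The point of the sketch is only the ordering: check for a null handler class before dereferencing it, rather than passing null into the handler lookup.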
[jira] [Work started] (HIVE-25216) Vectorized reading of ORC tables via Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25216 started by Ádám Szita. - > Vectorized reading of ORC tables via Iceberg > > > Key: HIVE-25216 > URL: https://issues.apache.org/jira/browse/HIVE-25216 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > > As [https://github.com/apache/iceberg/pull/2613] is resolved, we should port > it to Hive codebase, to enable vectorized ORC reads on Iceberg-backed tables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25216) Vectorized reading of ORC tables via Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ádám Szita reassigned HIVE-25216: - > Vectorized reading of ORC tables via Iceberg > > > Key: HIVE-25216 > URL: https://issues.apache.org/jira/browse/HIVE-25216 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > > As [https://github.com/apache/iceberg/pull/2613] is resolved, we should port > it to Hive codebase, to enable vectorized ORC reads on Iceberg-backed tables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.
[ https://issues.apache.org/jira/browse/HIVE-25154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Sinha resolved HIVE-25154. - Resolution: Fixed Committed to master. Thanks for the patch, [~haymant] !!! > Disable StatsUpdaterThread and PartitionManagementTask for db that is being > failoved over. > -- > > Key: HIVE-25154 > URL: https://issues.apache.org/jira/browse/HIVE-25154 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Attachments: HIVE-25154.patch > > Time Spent: 5h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.
[ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=608310&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608310 ] ASF GitHub Bot logged work on HIVE-25154: - Author: ASF GitHub Bot Created on: 08/Jun/21 06:55 Start Date: 08/Jun/21 06:55 Worklog Time Spent: 10m Work Description: pkumarsinha commented on pull request #2311: URL: https://github.com/apache/hive/pull/2311#issuecomment-856505462 Committed to master. Thanks for the patch, @hmangla98 !! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608310) Time Spent: 5h (was: 4h 50m) > Disable StatsUpdaterThread and PartitionManagementTask for db that is being > failoved over. > -- > > Key: HIVE-25154 > URL: https://issues.apache.org/jira/browse/HIVE-25154 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Attachments: HIVE-25154.patch > > Time Spent: 5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.
[ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=608311&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608311 ] ASF GitHub Bot logged work on HIVE-25154: - Author: ASF GitHub Bot Created on: 08/Jun/21 06:55 Start Date: 08/Jun/21 06:55 Worklog Time Spent: 10m Work Description: pkumarsinha removed a comment on pull request #2311: URL: https://github.com/apache/hive/pull/2311#issuecomment-856505462 Committed to master. Thanks for the patch, @hmangla98 !! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608311) Time Spent: 5h 10m (was: 5h) > Disable StatsUpdaterThread and PartitionManagementTask for db that is being > failoved over. > -- > > Key: HIVE-25154 > URL: https://issues.apache.org/jira/browse/HIVE-25154 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Attachments: HIVE-25154.patch > > Time Spent: 5h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.
[ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=608309&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608309 ] ASF GitHub Bot logged work on HIVE-25154: - Author: ASF GitHub Bot Created on: 08/Jun/21 06:53 Start Date: 08/Jun/21 06:53 Worklog Time Spent: 10m Work Description: pkumarsinha merged pull request #2311: URL: https://github.com/apache/hive/pull/2311 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608309) Time Spent: 4h 50m (was: 4h 40m) > Disable StatsUpdaterThread and PartitionManagementTask for db that is being > failoved over. > -- > > Key: HIVE-25154 > URL: https://issues.apache.org/jira/browse/HIVE-25154 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Attachments: HIVE-25154.patch > > Time Spent: 4h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.
[ https://issues.apache.org/jira/browse/HIVE-25154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359077#comment-17359077 ] Pravin Sinha commented on HIVE-25154: - +1 > Disable StatsUpdaterThread and PartitionManagementTask for db that is being > failoved over. > -- > > Key: HIVE-25154 > URL: https://issues.apache.org/jira/browse/HIVE-25154 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Attachments: HIVE-25154.patch > > Time Spent: 4h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25189) Cache the validWriteIdList in query cache before fetching tables from HMS
[ https://issues.apache.org/jira/browse/HIVE-25189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa resolved HIVE-25189. --- Resolution: Fixed Pushed to master. Thanks [~scarlin]. > Cache the validWriteIdList in query cache before fetching tables from HMS > - > > Key: HIVE-25189 > URL: https://issues.apache.org/jira/browse/HIVE-25189 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Steve Carlin >Assignee: Steve Carlin >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > For a small performance boost at compile time, we should fetch the > validWriteIdList before fetching the tables. HMS allows these to be batched > together in one call. This will avoid the getTable API from being called > twice, because the first time we call it, we pass in a null for > validWriteIdList. -- This message was sent by Atlassian Jira (v8.3.4#803005)
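The flow described in HIVE-25189 above — resolve and cache the validWriteIdList first, then pass it into the table fetch so getTable is never called with a null write-id list and then called again — can be sketched as follows. All class and method names here are illustrative stand-ins for the query-cache idea, not the actual HMS client API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the described optimization: cache the write-id list
// before fetching the table, so the (simulated) getTable call happens exactly
// once with the resolved list, instead of once with null and once with it.
public class WriteIdCacheSketch {
    private final Map<String, String> validWriteIdCache = new HashMap<>();
    private int tableFetches = 0; // counts simulated getTable calls

    // Stand-in for the metastore call resolving a table's valid write IDs.
    String fetchValidWriteIdList(String qualifiedTable) {
        return "writeIds-for-" + qualifiedTable; // dummy value
    }

    // Stand-in for the metastore getTable call that accepts the write-id list.
    String getTable(String qualifiedTable, String validWriteIdList) {
        tableFetches++;
        return "table(" + qualifiedTable + ", writeIds=" + validWriteIdList + ")";
    }

    String getTableCached(String qualifiedTable) {
        // Resolve and cache the write-id list *before* the table fetch, so the
        // fetch is issued once and never with a null validWriteIdList.
        String writeIds = validWriteIdCache.computeIfAbsent(
                qualifiedTable, this::fetchValidWriteIdList);
        return getTable(qualifiedTable, writeIds);
    }

    int fetchCount() { return tableFetches; }
}
```

The design point is only the ordering and the cache: by the time the table is requested, the write-id list is already known, which is what lets HMS batch the two lookups into one round trip per the issue description.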
[jira] [Work logged] (HIVE-25189) Cache the validWriteIdList in query cache before fetching tables from HMS
[ https://issues.apache.org/jira/browse/HIVE-25189?focusedWorklogId=608304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608304 ] ASF GitHub Bot logged work on HIVE-25189: - Author: ASF GitHub Bot Created on: 08/Jun/21 06:43 Start Date: 08/Jun/21 06:43 Worklog Time Spent: 10m Work Description: kasakrisz merged pull request #2342: URL: https://github.com/apache/hive/pull/2342 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608304) Time Spent: 1h 10m (was: 1h) > Cache the validWriteIdList in query cache before fetching tables from HMS > - > > Key: HIVE-25189 > URL: https://issues.apache.org/jira/browse/HIVE-25189 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Steve Carlin >Assignee: Steve Carlin >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > For a small performance boost at compile time, we should fetch the > validWriteIdList before fetching the tables. HMS allows these to be batched > together in one call. This will avoid the getTable API from being called > twice, because the first time we call it, we pass in a null for > validWriteIdList. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23987) Upgrade arrow version to 0.11.0
[ https://issues.apache.org/jira/browse/HIVE-23987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23987: -- Labels: pull-request-available (was: ) > Upgrade arrow version to 0.11.0 > --- > > Key: HIVE-23987 > URL: https://issues.apache.org/jira/browse/HIVE-23987 > Project: Hive > Issue Type: Improvement >Reporter: Barnabas Maidics >Assignee: Barnabas Maidics >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > As part of [HIVE-23890|https://issues.apache.org/jira/browse/HIVE-23890], > we're introducing flatbuffers as a dependency. > Arrow 0.10.0 has an unofficial flatbuffer dependency, which is incompatible > with the official ones: https://issues.apache.org/jira/browse/ARROW-3175 > It was fixed in 0.11.0. We should upgrade to that version -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23987) Upgrade arrow version to 0.11.0
[ https://issues.apache.org/jira/browse/HIVE-23987?focusedWorklogId=608297&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608297 ] ASF GitHub Bot logged work on HIVE-23987: - Author: ASF GitHub Bot Created on: 08/Jun/21 06:23 Start Date: 08/Jun/21 06:23 Worklog Time Spent: 10m Work Description: jcamachor opened a new pull request #2366: URL: https://github.com/apache/hive/pull/2366 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608297) Remaining Estimate: 0h Time Spent: 10m > Upgrade arrow version to 0.11.0 > --- > > Key: HIVE-23987 > URL: https://issues.apache.org/jira/browse/HIVE-23987 > Project: Hive > Issue Type: Improvement >Reporter: Barnabas Maidics >Assignee: Barnabas Maidics >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > As part of [HIVE-23890|https://issues.apache.org/jira/browse/HIVE-23890], > we're introducing flatbuffers as a dependency. > Arrow 0.10.0 has an unofficial flatbuffer dependency, which is incompatible > with the official ones: https://issues.apache.org/jira/browse/ARROW-3175 > It was fixed in 0.11.0. We should upgrade to that version -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-20091) Tez: Add security credentials for FileSinkOperator output
[ https://issues.apache.org/jira/browse/HIVE-20091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xi Chen updated HIVE-20091: --- Target Version/s: 3.2.0, 4.0.0 (was: 3.1.0, 4.0.0) > Tez: Add security credentials for FileSinkOperator output > - > > Key: HIVE-20091 > URL: https://issues.apache.org/jira/browse/HIVE-20091 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Fix For: 3.2.0, 4.0.0 > > Attachments: HIVE-20091.01.patch, HIVE-20091.02.patch, > HIVE-20091.03.patch, HIVE-20091.04.patch, HIVE-20091.05.patch, > HIVE-20091.06.patch, HIVE-20091.07.patch, HIVE-20091.08.patch > > > DagUtils needs to add security credentials for the output for the > FileSinkOperator. -- This message was sent by Atlassian Jira (v8.3.4#803005)