[jira] [Resolved] (HIVE-25150) Tab characters are not removed before decimal conversion similar to space character which is fixed as part of HIVE-24378
[ https://issues.apache.org/jira/browse/HIVE-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Taraka Rama Rao Lethavadla resolved HIVE-25150.
-----------------------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

> Tab characters are not removed before decimal conversion similar to space
> character which is fixed as part of HIVE-24378
>
> Key: HIVE-25150
> URL: https://issues.apache.org/jira/browse/HIVE-25150
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 4.0.0
> Reporter: Taraka Rama Rao Lethavadla
> Assignee: Taraka Rama Rao Lethavadla
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Test case: column values with space and tab characters
> {noformat}
> bash-4.2$ cat data/files/test_dec_space.csv
> 1,0
> 2, 1
> 3,	2
> {noformat}
> {noformat}
> create external table test_dec_space (id int, value decimal) ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ',' location '/tmp/test_dec_space';
> {noformat}
> The output of select * from test_dec_space would be:
> {noformat}
> 1	0
> 2	1
> 3	NULL
> {noformat}
> The behaviour in MySQL when there are tab and space characters in decimal values:
> {noformat}
> bash-4.2$ cat /tmp/insert.csv
> "1","aa",11.88
> "2","bb", 99.88
> "4","dd", 209.88
> {noformat}
> {noformat}
> MariaDB [test]> load data local infile '/tmp/insert.csv' into table t2 fields
> terminated by ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
> Query OK, 3 rows affected, 3 warnings (0.00 sec)
> Records: 3  Deleted: 0  Skipped: 0  Warnings: 3
> MariaDB [test]> select * from t2;
> +----+------+-------+
> | id | name | score |
> +----+------+-------+
> |  1 | aa   |    12 |
> |  2 | bb   |   100 |
> |  4 | dd   |   210 |
> +----+------+-------+
> 3 rows in set (0.00 sec)
> {noformat}
> So in Hive too we can make this work by skipping the tab character.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (HIVE-25150) Tab characters are not removed before decimal conversion similar to space character which is fixed as part of HIVE-24378
[ https://issues.apache.org/jira/browse/HIVE-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359767#comment-17359767 ]

Taraka Rama Rao Lethavadla commented on HIVE-25150:
---------------------------------------------------
Added support to skip the following characters as part of this request:
* HORIZONTAL_TABULATION ('\u0009')
* VERTICAL_TABULATION ('\u000B')
* FORM_FEED ('\u000C')
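The trimming behavior listed in the comment above can be sketched as follows. This is an illustration only, not the actual Hive patch; the class and method names here are hypothetical stand-ins:

```java
// Hypothetical sketch of the behavior described above: strip the whitespace
// characters Hive skips (space, plus horizontal tab, vertical tab and form
// feed) before attempting decimal conversion, so a field like "\t2" parses
// as 2 instead of becoming NULL.
import java.math.BigDecimal;

public class DecimalFieldParser {
    // Characters skipped before conversion, per the comment above.
    private static boolean isSkippable(char c) {
        return c == ' ' || c == '\u0009' || c == '\u000B' || c == '\u000C';
    }

    // Returns null (Hive's NULL) when the trimmed field is not a valid decimal.
    public static BigDecimal parseDecimal(String field) {
        int start = 0;
        int end = field.length();
        while (start < end && isSkippable(field.charAt(start))) start++;
        while (end > start && isSkippable(field.charAt(end - 1))) end--;
        try {
            return new BigDecimal(field.substring(start, end));
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

With this trimming in place, the third row of the test case above ("3,\t2") would convert to 2 rather than NULL, matching the MariaDB behavior shown in the description.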
[jira] [Work logged] (HIVE-25150) Tab characters are not removed before decimal conversion similar to space character which is fixed as part of HIVE-24378
[ https://issues.apache.org/jira/browse/HIVE-25150?focusedWorklogId=608924&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608924 ]

ASF GitHub Bot logged work on HIVE-25150:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 09/Jun/21 05:19
Start Date: 09/Jun/21 05:19
Worklog Time Spent: 10m

Work Description: maheshk114 merged pull request #2308:
URL: https://github.com/apache/hive/pull/2308

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 608924)
Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (HIVE-25101) Remove HBase libraries from Hive distribution
[ https://issues.apache.org/jira/browse/HIVE-25101?focusedWorklogId=608921&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608921 ]

ASF GitHub Bot logged work on HIVE-25101:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 09/Jun/21 05:06
Start Date: 09/Jun/21 05:06
Worklog Time Spent: 10m

Work Description: stoty commented on pull request #2259:
URL: https://github.com/apache/hive/pull/2259#issuecomment-857376643

I have noticed one more thing while testing this change. The hive script changes will always overwrite the hbase.aux.jar.path configuration parameter. A lot of other settings, like having an auxjars directory or setting HIVE_AUX_JARS_PATH, will do the same, but this change will overwrite the hbase.aux.jar.path set in hbase-site.xml pretty much every single time. I'm not sure how much of a problem this is, but I wanted to give a heads-up. I could explore reverting to using HADOOP_CLASSPATH instead, though I have doubts whether that actually works for the distributed operations. @kgyrtkirk

Issue Time Tracking
-------------------
Worklog Id: (was: 608921)
Time Spent: 1h (was: 50m)

> Remove HBase libraries from Hive distribution
> ---------------------------------------------
>
> Key: HIVE-25101
> URL: https://issues.apache.org/jira/browse/HIVE-25101
> Project: Hive
> Issue Type: Improvement
> Components: HBase Handler, Hive
> Affects Versions: 4.0.0
> Reporter: Istvan Toth
> Assignee: Istvan Toth
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Hive currently packages HBase libraries into its lib directory.
> It also adds the HBase libraries separately to its classpath in the hive startup script.
> Having both mechanisms is redundant, and it also causes errors: the standard HBase libraries packaged into Hive are unshaded, while the libraries added by _hbase mapredcp_ are shaded, and the two are NOT compatible when custom coprocessors are used; in some cases the classpaths during local execution and for MR/TEZ jobs are mutually incompatible.
> I propose removing all HBase libraries from the distribution and pulling them in via the hbase mapredcp mechanism.
> This also solves the old problem of including ancient HBase alpha versions in Hive.
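The _hbase mapredcp_ mechanism proposed in the description can be sketched as a startup-script fragment along these lines. This is an illustrative sketch only, not the actual hive script change from the pull request:

```shell
# Illustrative sketch (not the actual hive script change): instead of shipping
# unshaded HBase jars in Hive's lib directory, ask the installed HBase for its
# shaded MapReduce client classpath and append it at startup.
if command -v hbase >/dev/null 2>&1; then
  # 'hbase mapredcp' prints the shaded client jars needed by MR/Tez tasks.
  HBASE_CP=$(hbase mapredcp 2>/dev/null)
  if [ -n "$HBASE_CP" ]; then
    export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$HBASE_CP"
  fi
fi
```

Because the jars come from the installed HBase rather than from Hive's distribution, local execution and MR/TEZ jobs would see the same (shaded) classes, avoiding the shaded/unshaded mismatch described above.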
[jira] [Assigned] (HIVE-21489) EXPLAIN command throws ClassCastException in Hive
[ https://issues.apache.org/jira/browse/HIVE-21489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramesh Kumar Thangarajan reassigned HIVE-21489:
-----------------------------------------------
Assignee: Ramesh Kumar Thangarajan (was: Daniel Dai)

> EXPLAIN command throws ClassCastException in Hive
> -------------------------------------------------
>
> Key: HIVE-21489
> URL: https://issues.apache.org/jira/browse/HIVE-21489
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.3.4
> Reporter: Ping Lu
> Assignee: Ramesh Kumar Thangarajan
> Priority: Major
> Attachments: HIVE-21489.1.patch, HIVE-21489.2.patch
>
> I'm trying to run commands like explain select * from src in hive-2.3.4, but it fails with a ClassCastException:
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer cannot be cast to
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer
> Steps to reproduce:
> 1) hive.execution.engine is the default value mr
> 2) hive.security.authorization.enabled is set to true, and hive.security.authorization.manager is set to org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider
> 3) start hive CLI and run the command: explain select * from src
> I debugged the code and found that HIVE-18778 causes the above ClassCastException. If I set hive.in.test to true, the explain command executes successfully.
> Now I have one question: since hive.in.test can't be modified at runtime, how can the explain command be run with the default authorization in hive-2.3.4?
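The failure mode reported above (an EXPLAIN analyzer being downcast to the plain analyzer type) can be illustrated with a minimal sketch. The nested classes here are hypothetical stand-ins for the Hive classes named in the exception, and the guarded cast shows one generic defensive pattern, not the actual Hive fix:

```java
// Illustrative sketch of the failure mode: both analyzers share a base type,
// so an unguarded downcast of whichever analyzer the parser returned throws
// ClassCastException when the statement is an EXPLAIN. Stand-in classes only.
public class CastGuardDemo {
    static class BaseSemanticAnalyzer {}
    static class SemanticAnalyzer extends BaseSemanticAnalyzer {}
    static class ExplainSemanticAnalyzer extends BaseSemanticAnalyzer {}

    // Unsafe: assumes every analyzer is a SemanticAnalyzer.
    static SemanticAnalyzer unsafeCast(BaseSemanticAnalyzer sem) {
        return (SemanticAnalyzer) sem; // throws for ExplainSemanticAnalyzer
    }

    // Defensive: check the concrete type before downcasting.
    static SemanticAnalyzer guardedCast(BaseSemanticAnalyzer sem) {
        return (sem instanceof SemanticAnalyzer) ? (SemanticAnalyzer) sem : null;
    }
}
```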
[jira] [Updated] (HIVE-25220) Query with union fails CBO with OOM
[ https://issues.apache.org/jira/browse/HIVE-25220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa updated HIVE-25220:
----------------------------------
Description:
{code}
2021-06-08T08:15:14,450 ERROR [6241f234-77e0-4e63-9873-6eb9d655421c HiveServer2-Handler-Pool: Thread-79] parse.CalcitePlanner: CBO failed, skipping CBO.
java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.rethrowCalciteException(CalcitePlanner.java:1728) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1564) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:538) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12680) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:428) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:170) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:221) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:188) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:600) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:546) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:540) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127) ~[hive-exec-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199) ~[hive-service-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260) ~[hive-service-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hive.service.cli.operation.Operation.run(Operation.java:274) ~[hive-service-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:565) ~[hive-service-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:551) ~[hive-service-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_262]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_262]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_262]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_262]
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) ~[hive-service-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) ~[hive-service-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) ~[hive-service-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_262]
	at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_262]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) ~[hadoop-common-3.1.1.jar:?]
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) ~[hive-service-3.1.3000.7.2.6.3-1.jar:3.1.3000.7.2.6.3-1]
	at com.sun.proxy.$Proxy39.executeStatementAsync(Unknown Source) ~[?:?]
	at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
[jira] [Work logged] (HIVE-25220) Query with union fails CBO with OOM
[ https://issues.apache.org/jira/browse/HIVE-25220?focusedWorklogId=608919&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608919 ]

ASF GitHub Bot logged work on HIVE-25220:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 09/Jun/21 04:26
Start Date: 09/Jun/21 04:26
Worklog Time Spent: 10m

Work Description: kasakrisz opened a new pull request #2372:
URL: https://github.com/apache/hive/pull/2372

### What changes were proposed in this pull request?
1. Create and set up `HiveDefaultRelMetadataProvider` before the first call of `HiveRelFieldTrimmer`.
2. Invalidate the metadata query on the current `RelOptCluster` instance to trigger instantiation with the newly set metadata provider.

### Why are the changes needed?
`HiveRelFieldTrimmer` uses a `RelMetadataProvider` to get expression lineage. If the query contains several union operators, determining expression lineage can produce an exponential number of expressions due to the UNIONs, which can lead to OOM. We already have a fix for this issue, but prior to this patch the fix was not applied, because it is part of `HiveDefaultRelMetadataProvider`, which was not in place when `HiveRelFieldTrimmer` was called the first time.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
It is not straightforward to reproduce this issue from q tests, since the RelMetadataProvider is stored in a ThreadLocal instance and the DDL statements prior to the failing query initialize the MD provider with the Hive version. To avoid this I set up a small cluster with Hive, Hadoop and Tez.

Issue Time Tracking
-------------------
Worklog Id: (was: 608919)
Remaining Estimate: 0h
Time Spent: 10m
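The exponential growth described in the pull request (expression lineage expanding through stacked UNION operators) can be illustrated with a small arithmetic sketch. This is not Hive or Calcite code; it only models why the candidate-expression count explodes:

```java
// Hypothetical illustration (not Hive/Calcite code) of the blow-up described
// above: each 2-way UNION maps an output column to one origin expression per
// input branch, so a column flowing through n nested unions can accumulate
// 2^n candidate origin expressions during lineage computation.
public class UnionLineageBlowup {
    // Count candidate origin expressions for one column after n stacked
    // 2-way unions, starting from a single base-table column.
    static long originCount(int unions) {
        long origins = 1;            // base table column
        for (int i = 0; i < unions; i++) {
            origins *= 2;            // each union doubles the candidate set
        }
        return origins;
    }

    public static void main(String[] args) {
        System.out.println(originCount(10)); // 1024
        System.out.println(originCount(30)); // over a billion -- heap exhaustion territory
    }
}
```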
[jira] [Updated] (HIVE-25220) Query with union fails CBO with OOM
[ https://issues.apache.org/jira/browse/HIVE-25220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-25220:
----------------------------------
Labels: pull-request-available (was: )

> Query with union fails CBO with OOM
> -----------------------------------
>
> Key: HIVE-25220
> URL: https://issues.apache.org/jira/browse/HIVE-25220
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
> Time Spent: 10m
> Remaining Estimate: 0h
[jira] [Updated] (HIVE-25220) Query with union fails CBO with OOM
[ https://issues.apache.org/jira/browse/HIVE-25220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa updated HIVE-25220:
----------------------------------
Description:
{code}
explain cbo
with meod AS (
  select max(data_as_of_date) data_as_of_date from governed.cc_forecast_pnl),
daod as (
  select min(f.cob_date) data_as_of_date
  from governed.cc_forecast_pnl f, meod
  where f.data_as_of_date = meod.data_as_of_date),
me_rates as (
  SELECT
    refRateFX.to_currency_code,
    refRateFX.from_currency_code,
    cast(refRateFX.exchange_rate as decimal(38,18)) exchange_rate,
    cast('GC2' AS string) currency_label
  FROM
    (SELECT MAX(fx.data_as_of_date) data_as_of_date
     FROM governed.standard_fx_rates fx, daod
     WHERE fx.data_as_of_date LIKE '%_MCR_MTD'
     and fx.data_as_of_date <= concat(daod.data_as_of_date, '_MCR_MTD')) fx, -- get most recent rates not later than the delivery period
    governed.standard_fx_rates refRateFX
  WHERE refRateFX.data_as_of_date = fx.data_as_of_date
  AND refRateFX.to_currency_code = 'USD'
  UNION ALL
  SELECT
    refRateFX2.from_currency_code to_currency_code,
    refRateFX1.from_currency_code,
    cast(cast(refRateFX1.exchange_rate as double)/cast(refRateFX2.exchange_rate as double) as decimal(38,18)) exchange_rate,
    CAST('GC1' AS string) currency_label
  FROM
    (SELECT MAX(fx.data_as_of_date) data_as_of_date
     FROM governed.standard_fx_rates fx, daod
     WHERE fx.data_as_of_date LIKE '%_MCR_MTD'
     and fx.data_as_of_date <= concat(daod.data_as_of_date, '_MCR_MTD')) fx, -- get most recent rates not later than the delivery period
    governed.standard_fx_rates refRateFX1,
    governed.standard_fx_rates refRateFX2
  WHERE refRateFX1.data_as_of_date = fx.data_as_of_date
  AND refRateFX2.data_as_of_date = fx.data_as_of_date
  AND refRateFX1.to_currency_code = 'USD'
  AND refRateFX2.from_currency_code = 'CHF'
  AND refRateFX2.to_currency_code = 'USD'
),
cc_func_hier_filter as(
  SELECT DISTINCT LEVEL10 FUNCTION_CD
  FROM GOVERNED.CC_CYBOS_HIER_FUNCTION
  WHERE DATA_AS_OF_DATE in
    (SELECT MAX(DATA_AS_OF_DATE) FROM GOVERNED.CC_CYBOS_HIER_FUNCTION)
  AND LEVEL2='N14954'
),
cc_unified_acc_hier_filter as(
  SELECT DISTINCT LEVEL14 GROUP_ACCOUNT_CD
  FROM governed.cc_cybos_hier_acct
  WHERE DATA_AS_OF_DATE in (SELECT MAX(DATA_AS_OF_DATE) FROM governed.cc_cybos_hier_acct)
  AND LEVEL1='U0' AND LEVEL6 = 'U52000'
),
cc_sign_reversal as(
  SELECT DISTINCT LEVEL14 GROUP_ACCOUNT_CD, CAST(-1 AS DECIMAL(38,18)) reverse_sign
  FROM governed.cc_cybos_hier_acct
  WHERE DATA_AS_OF_DATE in (SELECT MAX(DATA_AS_OF_DATE) FROM governed.cc_cybos_hier_acct)
  AND ((LEVEL1='U0' AND LEVEL5 = 'U30175') OR (LEVEL2 = 'EAR90006'))
),
cc_unified_acc_hier as(
  SELECT DISTINCT TRIM(level14) level14
  FROM provision.cc_hier_unified_acct_vw
  WHERE level5_desc = 'Total operating expense'
  AND TRIM(level14) NOT IN
    (SELECT group_account_cd FROM governed.cc_temp_reg_exclude_rules
     WHERE data_as_of_date IN (SELECT MAX(data_as_of_date) from governed.cc_temp_reg_exclude_rules))
),
tempreg as(
  SELECT function_cd, tt_cd
  FROM governed.cc_temp_reg_rules
  WHERE data_as_of_date IN (SELECT MAX(data_as_of_date) FROM governed.cc_temp_reg_rules)
),
gov as(
  select cob_date, count(*) as gov_count,
         sum(case when measure_amt <> 0 then 1 else 0 end) gov_non_zero_count,
         sum(MEASURE_AMT) as gov_amt
  from (
    select pnl.cob_date,
      CASE WHEN tr.function_cd IS NOT NULL AND h.level14 IS NOT NULL THEN tr.TT_CD ELSE NULL END AS PERFORMANCE_VIEW_TYPE,
      pnl.company_code,
      pnl.function_code,
      pnl.group_account_code,
      pnl.gaap_code,
      'Actual Rate' AS CURRENCY_TYPE,
      me.to_currency_code AS CURRENCY_CODE,
      pnl.group_account_code MEASURE_ID,
      sum(CAST(cast((cast(pnl.posting_lc_amt as double) * cast(NVL(sr.reverse_sign, 1) as double)) as double) * cast(me.exchange_rate as double) as decimal(38,18))) as MEASURE_AMT,
      'FORECAST' AS PROJECTION_TYPE,
      CASE WHEN GROUP_ACCOUNT_CODE LIKE 'EAR%' THEN 'RETAINED_EARNINGS' ELSE 'PNL' END AS MACRO_MEASURE,
      me.currency_label AS MACRO_MEASURE_SUB_TYPE,
      pnl.cob_date AS partition_date_key
    from governed.cc_forecast_pnl pnl,
      me_rates me
      left outer join cc_func_hier_filter fHier on pnl.function_code = fHier.FUNCTION_CD
      left outer join cc_unified_acc_hier_filter aHier on pnl.group_account_code = aHier.group_account_cd
      left outer join cc_sign_reversal sr on pnl.group_account_code = sr.group_account_cd
      left outer join tempreg tr on pnl.function_code = tr.function_cd
      left outer join cc_unified_acc_hier h on pnl.group_account_code = h.level14
    WHERE me.from_currency_code = (CASE WHEN pnl.local_currency_code LIKE 'AR' THEN SUBSTR(pnl.local_currency_code, 1, 3) ELSE pnl.local_currency_code END)
    and data_as_of_date in (select max(data_as_of_date) from governed.cc_forecast_pnl)
    AND (fHier.FUNCTION_CD IS NOT NULL OR aHier.group_account_cd IS NOT NULL)
    group by pnl.cob_date, CASE WHEN tr.function_cd IS NOT NULL AND h.level14 IS NOT NULL THEN tr.TT_CD ELSE NULL END,
      pnl.company_code, pnl.function_code, pnl.group_account_code, pnl.gaap_code, me.to_currency_code, me.currency_label) a
  group by cob_date
[jira] [Assigned] (HIVE-25220) Query with union fails CBO with OOM
[ https://issues.apache.org/jira/browse/HIVE-25220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa reassigned HIVE-25220:
-------------------------------------

> Query with union fails CBO with OOM
> -----------------------------------
>
> Key: HIVE-25220
> URL: https://issues.apache.org/jira/browse/HIVE-25220
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Fix For: 4.0.0
[jira] [Work logged] (HIVE-25213) Implement List getTables() for existing connectors.
[ https://issues.apache.org/jira/browse/HIVE-25213?focusedWorklogId=608885&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608885 ]

ASF GitHub Bot logged work on HIVE-25213:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 09/Jun/21 01:55
Start Date: 09/Jun/21 01:55
Worklog Time Spent: 10m

Work Description: dantongdong opened a new pull request #2371:
URL: https://github.com/apache/hive/pull/2371
[HIVE-25213](https://issues.apache.org/jira/browse/HIVE-25213): Implement List getTables() for existing connectors.

Issue Time Tracking
-------------------
Worklog Id: (was: 608885)
Remaining Estimate: 0h
Time Spent: 10m

> Implement List getTables() for existing connectors.
> ---------------------------------------------------
>
> Key: HIVE-25213
> URL: https://issues.apache.org/jira/browse/HIVE-25213
> Project: Hive
> Issue Type: Sub-task
> Reporter: Naveen Gangam
> Assignee: Dantong Dong
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In the initial implementation, connector providers do not implement the getTables(string pattern) SPI; we had deferred it for later. Only getTableNames() and getTable() were implemented.
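One plausible shape for the deferred getTables(string pattern) SPI described above is to reuse the already-implemented getTableNames() and filter the result with the pattern. The sketch below is hypothetical (the class, method names, and pattern semantics are assumptions, not the actual connector API); it treats '*' as "match any sequence" and '|' as alternation, in the style of metastore name patterns:

```java
// Hypothetical sketch: filter a connector's table names with a
// metastore-style pattern ('*' wildcard, '|' alternation). Not the actual
// Hive connector-provider API.
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TablePatternFilter {
    // Translate the pattern to a regex: escape everything that is not a
    // letter, digit, underscore, '*' or '|', then expand '*' to '.*'.
    static Pattern compileHivePattern(String pattern) {
        String regex = pattern
            .replaceAll("([^a-zA-Z0-9*|_])", "\\\\$1")
            .replace("*", ".*");
        return Pattern.compile(regex);
    }

    // Keep only the names matching the pattern, preserving input order.
    public static List<String> filterTables(List<String> names, String pattern) {
        Pattern p = compileHivePattern(pattern);
        return names.stream()
                    .filter(n -> p.matcher(n).matches())
                    .collect(Collectors.toList());
    }
}
```

For example, filtering ["emp", "emp_part", "dept"] with the pattern "emp*" would keep only the first two names.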
[jira] [Updated] (HIVE-25213) Implement List getTables() for existing connectors.
[ https://issues.apache.org/jira/browse/HIVE-25213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25213: -- Labels: pull-request-available (was: ) > Implement List getTables() for existing connectors. > -- > > Key: HIVE-25213 > URL: https://issues.apache.org/jira/browse/HIVE-25213 > Project: Hive > Issue Type: Sub-task >Reporter: Naveen Gangam >Assignee: Dantong Dong >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In the initial implementation, connector providers do not implement the > getTables(string pattern) spi. We had deferred it for later. Only > getTableNames() and getTable() were implemented. -- This message was sent by Atlassian Jira (v8.3.4#803005)
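The SPI gap described above — connectors exposing only getTableNames() and getTable(), with getTables(String pattern) deferred — can be bridged by filtering the name list against the pattern and materializing each match. A minimal sketch of that filtering step follows; the class and method names here are illustrative stand-ins, not the actual connector SPI:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a connector provider: only the pattern-matching
// logic is shown, i.e. what a default getTables(String) could do on top of
// the two existing calls (getTableNames(), getTable()) mentioned in the issue.
public class PatternTableLookup {

    // Hive-style patterns use '*' as a wildcard and '|' to separate
    // alternatives; translate them to a regex before matching.
    static Pattern toRegex(String hivePattern) {
        StringBuilder sb = new StringBuilder();
        for (char c : hivePattern.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '|') sb.append('|');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    static List<String> filterTableNames(List<String> allNames, String hivePattern) {
        Pattern p = toRegex(hivePattern);
        List<String> out = new ArrayList<>();
        for (String name : allNames) {
            if (p.matcher(name).matches()) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = List.of("orders", "order_items", "customers");
        System.out.println(filterTableNames(names, "order*"));  // [orders, order_items]
    }
}
```

Each surviving name would then be resolved through the existing getTable() call.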
[jira] [Work logged] (HIVE-24994) get_aggr_stats_for call fail with "Tried to send an out-of-range integer"
[ https://issues.apache.org/jira/browse/HIVE-24994?focusedWorklogId=608839=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608839 ] ASF GitHub Bot logged work on HIVE-24994: - Author: ASF GitHub Bot Created on: 09/Jun/21 00:09 Start Date: 09/Jun/21 00:09 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #2162: URL: https://github.com/apache/hive/pull/2162#issuecomment-857274702 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608839) Time Spent: 0.5h (was: 20m) > get_aggr_stats_for call fail with "Tried to send an out-of-range integer" > - > > Key: HIVE-24994 > URL: https://issues.apache.org/jira/browse/HIVE-24994 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Rajkumar Singh >Assignee: Rajkumar Singh >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > aggrColStatsForPartitions call fail with the Postgres LIMIT if the no of > partitions passed in the direct sql goes beyond the 32767 > {code:java} > postgresql.util.PSQLException: An I/O error occurred while sending to the > backend. 
> at > org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:337) > ~[postgresql-42.2.8.jar:42.2.8] > at > org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:446) > ~[postgresql-42.2.8.jar:42.2.8] > at > org.postgresql.jdbc.PgStatement.execute(PgStatement.java:370) > ~[postgresql-42.2.8.jar:42.2.8] > at > org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:149) > ~[postgresql-42.2.8.jar:42.2.8] > at > org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:108) > ~[postgresql-42.2.8.jar:42.2.8] > at > com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52) > ~[HikariCP-2.6.1.jar:?] > at > com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java) > [HikariCP-2.6.1.jar:?] > at > org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeQuery(ParamLoggingPreparedStatement.java:375) > [datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.rdbms.SQLController.executeStatementQuery(SQLController.java:552) > [datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.rdbms.query.SQLQuery.performExecute(SQLQuery.java:645) > [datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.query.Query.executeQuery(Query.java:1855) > [datanucleus-core-4.1.17.jar:?] > at > org.datanucleus.store.rdbms.query.SQLQuery.executeWithArray(SQLQuery.java:807) > [datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:368) > [datanucleus-api-jdo-4.2.4.jar:?] > at > org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:267) > [datanucleus-api-jdo-4.2.4.jar:?] 
> at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.executeWithArray(MetaStoreDirectSql.java:2058) > [hive-exec-3.1.0.3.1.5.6019-4.jar:3.1.0.3.1.5.6019-4] > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.executeWithArray(MetaStoreDirectSql.java:2050) > [hive-exec-3.1.0.3.1.5.6019-4.jar:3.1.0.3.1.5.6019-4] > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.access$1500(MetaStoreDirectSql.java:110) > [hive-exec-3.1.0.3.1.5.6019-4.jar:3.1.0.3.1.5.6019-4] > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql$15$1.run(MetaStoreDirectSql.java:1530) > [hive-exec-3.1.0.3.1.5.6019-4.jar:3.1.0.3.1.5.6019-4] > at > org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:73) > [hive-exec-3.1.0.3.1.5.6019-4.jar:3.1.0.3.1.5.6019-4] > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql$15.run(MetaStoreDirectSql.java:1521) > [hive-exec-3.1.0.3.1.5.6019-4.jar:3.1.0.3.1.5.6019-4] > at
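The failure above stems from PostgreSQL's wire protocol, which encodes the number of bind parameters in a prepared statement as a signed 16-bit integer, so no single statement can carry more than 32767 parameters. The stack trace shows the partition list flowing through Batchable.runBatched, and the fix direction is to cap each batch below that limit. A simplified sketch of the batching idea (the helper below is illustrative, not Hive's actual Batchable API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative batching helper: split a long IN-list into chunks that stay
// under PostgreSQL's signed-16-bit bind parameter limit (32767), so each
// chunk can be bound to its own prepared statement.
public class ParamBatcher {
    static final int MAX_PG_PARAMS = 32767;

    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            out.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> partIds = new ArrayList<>();
        for (int i = 0; i < 70000; i++) partIds.add(i);
        List<List<Integer>> chunks = batches(partIds, MAX_PG_PARAMS);
        // 70000 partitions -> 3 statements instead of one oversized one
        System.out.println(chunks.size());  // 3
    }
}
```

The per-batch results are then merged, which is exactly the shape of work Batchable.runBatched exists to orchestrate.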
[jira] [Updated] (HIVE-25219) Backward incompatible timestamp serialization in Avro for certain timezones
[ https://issues.apache.org/jira/browse/HIVE-25219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25219: -- Labels: pull-request-available (was: ) > Backward incompatible timestamp serialization in Avro for certain timezones > --- > > Key: HIVE-25219 > URL: https://issues.apache.org/jira/browse/HIVE-25219 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > HIVE-12192, HIVE-20007 changed the way that timestamp computations are > performed and to some extend how timestamps are serialized and deserialized > in files (Parquet, Avro). > In versions that include HIVE-12192 or HIVE-20007 the serialization in Avro > files is not backwards compatible. In other words writing timestamps with a > version of Hive that includes HIVE-12192/HIVE-20007 and reading them with > another (not including the previous issues) may lead to different results > depending on the default timezone of the system. > Consider the following scenario where the default system timezone is set to > US/Pacific. 
> At apache/master commit eedcd82bc2d61861a27205f925ba0ffab9b6bca8 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS AVRO > LOCATION '/tmp/hiveexttbl/employee'; > INSERT INTO employee VALUES (1, '1880-01-01 00:00:00'); > INSERT INTO employee VALUES (2, '1884-01-01 00:00:00'); > INSERT INTO employee VALUES (3, '1990-01-01 00:00:00'); > SELECT * FROM employee; > {code} > |1|1880-01-01 00:00:00| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS AVRO > LOCATION '/tmp/hiveexttbl/employee'; > SELECT * FROM employee; > {code} > |1|1879-12-31 23:52:58| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > The timestamp for {{eid=1}} in branch-2.3 is different from the one in master. -- This message was sent by Atlassian Jira (v8.3.4#803005)
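The 7-minute-2-second skew for eid=1 above is explained by the tz database: before US time zones were standardized in 1883, America/Los_Angeles observed local mean time (UTC-07:52:58) rather than PST (UTC-08:00). Versions with HIVE-12192/HIVE-20007 apply the historically accurate LMT rules, while older versions effectively apply the fixed -08:00 offset — the 00:07:02 gap visible in the two result sets. The rule change can be checked directly with java.time:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneRules;

// Show why 1880 timestamps shift in US/Pacific while 1990 ones do not:
// pre-1883 dates fall under local mean time, not PST.
public class PacificOffsets {
    public static void main(String[] args) {
        ZoneRules rules = ZoneId.of("America/Los_Angeles").getRules();

        ZoneOffset in1880 = rules.getOffset(LocalDateTime.of(1880, 1, 1, 0, 0));
        ZoneOffset in1990 = rules.getOffset(LocalDateTime.of(1990, 1, 1, 0, 0));

        System.out.println(in1880);  // -07:52:58 (local mean time)
        System.out.println(in1990);  // -08:00   (PST)
    }
}
```

This is also why eid=2 (1884) and eid=3 (1990) round-trip unchanged: both fall after the 1883 standardization, where old and new conversion rules agree.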
[jira] [Work logged] (HIVE-25219) Backward incompatible timestamp serialization in Avro for certain timezones
[ https://issues.apache.org/jira/browse/HIVE-25219?focusedWorklogId=608808&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608808 ] ASF GitHub Bot logged work on HIVE-25219: - Author: ASF GitHub Bot Created on: 08/Jun/21 22:35 Start Date: 08/Jun/21 22:35 Worklog Time Spent: 10m Work Description: zabetak opened a new pull request #2370: URL: https://github.com/apache/hive/pull/2370 ### What changes were proposed in this pull request? 1. Add new read/write config properties to control legacy zone conversions in Avro. 2. Exploit file metadata and property to choose between new/old conversion rules. ### Why are the changes needed? Provide the end-users the possibility to write backward compatible timestamps in Avro files so that files can be read correctly by older versions. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? 1. New qtests for writing Avro timestamps (`avro_write_legacy_timestamp.q`, `avro_write_new_timestamp.q`) 2. Manual tests * Export Avro table with current Hive version setting `hive.avro.timestamp.write.legacy.conversion.enabled=true` * Read from external Avro table with Hive 2 (commit 324f9fa) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608808) Remaining Estimate: 0h Time Spent: 10m > Backward incompatible timestamp serialization in Avro for certain timezones > --- > > Key: HIVE-25219 > URL: https://issues.apache.org/jira/browse/HIVE-25219 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > HIVE-12192, HIVE-20007 changed the way that timestamp computations are > performed and to some extend how timestamps are serialized and deserialized > in files (Parquet, Avro). > In versions that include HIVE-12192 or HIVE-20007 the serialization in Avro > files is not backwards compatible. In other words writing timestamps with a > version of Hive that includes HIVE-12192/HIVE-20007 and reading them with > another (not including the previous issues) may lead to different results > depending on the default timezone of the system. > Consider the following scenario where the default system timezone is set to > US/Pacific. 
> At apache/master commit eedcd82bc2d61861a27205f925ba0ffab9b6bca8 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS AVRO > LOCATION '/tmp/hiveexttbl/employee'; > INSERT INTO employee VALUES (1, '1880-01-01 00:00:00'); > INSERT INTO employee VALUES (2, '1884-01-01 00:00:00'); > INSERT INTO employee VALUES (3, '1990-01-01 00:00:00'); > SELECT * FROM employee; > {code} > |1|1880-01-01 00:00:00| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS AVRO > LOCATION '/tmp/hiveexttbl/employee'; > SELECT * FROM employee; > {code} > |1|1879-12-31 23:52:58| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > The timestamp for {{eid=1}} in branch-2.3 is different from the one in master. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25219) Backward incompatible timestamp serialization in Avro for certain timezones
[ https://issues.apache.org/jira/browse/HIVE-25219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359588#comment-17359588 ] Stamatis Zampetakis commented on HIVE-25219: The issue describes the same problem with HIVE-25104 but for Avro instead of Parquet. > Backward incompatible timestamp serialization in Avro for certain timezones > --- > > Key: HIVE-25219 > URL: https://issues.apache.org/jira/browse/HIVE-25219 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Fix For: 4.0.0 > > > HIVE-12192, HIVE-20007 changed the way that timestamp computations are > performed and to some extend how timestamps are serialized and deserialized > in files (Parquet, Avro). > In versions that include HIVE-12192 or HIVE-20007 the serialization in Avro > files is not backwards compatible. In other words writing timestamps with a > version of Hive that includes HIVE-12192/HIVE-20007 and reading them with > another (not including the previous issues) may lead to different results > depending on the default timezone of the system. > Consider the following scenario where the default system timezone is set to > US/Pacific. 
> At apache/master commit eedcd82bc2d61861a27205f925ba0ffab9b6bca8 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS AVRO > LOCATION '/tmp/hiveexttbl/employee'; > INSERT INTO employee VALUES (1, '1880-01-01 00:00:00'); > INSERT INTO employee VALUES (2, '1884-01-01 00:00:00'); > INSERT INTO employee VALUES (3, '1990-01-01 00:00:00'); > SELECT * FROM employee; > {code} > |1|1880-01-01 00:00:00| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS AVRO > LOCATION '/tmp/hiveexttbl/employee'; > SELECT * FROM employee; > {code} > |1|1879-12-31 23:52:58| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > The timestamp for {{eid=1}} in branch-2.3 is different from the one in master. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25219) Backward incompatible timestamp serialization in Avro for certain timezones
[ https://issues.apache.org/jira/browse/HIVE-25219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis reassigned HIVE-25219: -- > Backward incompatible timestamp serialization in Avro for certain timezones > --- > > Key: HIVE-25219 > URL: https://issues.apache.org/jira/browse/HIVE-25219 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Fix For: 4.0.0 > > > HIVE-12192, HIVE-20007 changed the way that timestamp computations are > performed and to some extend how timestamps are serialized and deserialized > in files (Parquet, Avro). > In versions that include HIVE-12192 or HIVE-20007 the serialization in Avro > files is not backwards compatible. In other words writing timestamps with a > version of Hive that includes HIVE-12192/HIVE-20007 and reading them with > another (not including the previous issues) may lead to different results > depending on the default timezone of the system. > Consider the following scenario where the default system timezone is set to > US/Pacific. 
> At apache/master commit eedcd82bc2d61861a27205f925ba0ffab9b6bca8 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS AVRO > LOCATION '/tmp/hiveexttbl/employee'; > INSERT INTO employee VALUES (1, '1880-01-01 00:00:00'); > INSERT INTO employee VALUES (2, '1884-01-01 00:00:00'); > INSERT INTO employee VALUES (3, '1990-01-01 00:00:00'); > SELECT * FROM employee; > {code} > |1|1880-01-01 00:00:00| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS AVRO > LOCATION '/tmp/hiveexttbl/employee'; > SELECT * FROM employee; > {code} > |1|1879-12-31 23:52:58| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > The timestamp for {{eid=1}} in branch-2.3 is different from the one in master. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization
[ https://issues.apache.org/jira/browse/HIVE-18284?focusedWorklogId=608671=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608671 ] ASF GitHub Bot logged work on HIVE-18284: - Author: ASF GitHub Bot Created on: 08/Jun/21 18:45 Start Date: 08/Jun/21 18:45 Worklog Time Spent: 10m Work Description: Vinodh-thimmisetty commented on pull request #1400: URL: https://github.com/apache/hive/pull/1400#issuecomment-857008049 Hi @kgyrtkirk, Does it have any impact If we include LIMIT after Distribute by clause ? We had the same issue, but luckily the table size was small. So, by including LIMIT **, we are able to insert overwrite with distribute by key. **Note:** I have ran with both mr and tez executions engine types -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608671) Time Spent: 3h 50m (was: 3h 40m) > NPE when inserting data with 'distribute by' clause with dynpart sort > optimization > -- > > Key: HIVE-18284 > URL: https://issues.apache.org/jira/browse/HIVE-18284 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.3.1, 2.3.2, 3.0.0, 3.1.1, 3.1.2, 4.0.0 >Reporter: Aki Tanaka >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > A Null Pointer Exception occurs when inserting data with 'distribute by' > clause. 
The following snippet query reproduces this issue: > *(non-vectorized , non-llap mode)* > {code:java} > create table table1 (col1 string, datekey int); > insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1); > create table table2 (col1 string) partitioned by (datekey int); > set hive.vectorized.execution.enabled=false; > set hive.optimize.sort.dynamic.partition=true; > set hive.exec.dynamic.partition.mode=nonstrict; > insert into table table2 > PARTITION(datekey) > select col1, > datekey > from table1 > distribute by datekey ; > {code} > I could run the insert query without the error if I remove Distribute By or > use Cluster By clause. > It seems that the issue happens because Distribute By does not guarantee > clustering or sorting properties on the distributed keys. > FileSinkOperator removes the previous fsp. FileSinkOperator will remove the > previous fsp which might be re-used when we use Distribute By. > https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972 > The following stack trace is logged. 
> {code:java} > Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, > diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}} > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while
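The failure mode described above — FileSinkOperator closing the previous partition's writer on every key change — is only safe when rows arrive clustered by the partition key. A distilled model of that pattern (not Hive's actual code) shows why DISTRIBUTE BY alone breaks it, using the same row order as the repro (datekeys 1, 2, 1):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Distilled model of a sink that assumes rows arrive clustered by partition
// key: when the key changes it closes (and forgets) the previous writer.
// With merely distributed (unsorted) input, a key can reappear after its
// writer was discarded — the reuse-after-close that surfaces as an NPE in
// FileSinkOperator.
public class ClusteredSink {
    final Set<Integer> closed = new HashSet<>();
    Integer current = null;
    int reuseAfterClose = 0;

    void process(int key) {
        if (current != null && !current.equals(key)) {
            closed.add(current);   // "close" the previous partition's writer
        }
        if (closed.contains(key)) {
            reuseAfterClose++;     // in the real operator the writer is gone -> NPE
        }
        current = key;
    }

    public static void main(String[] args) {
        ClusteredSink sorted = new ClusteredSink();
        for (int k : List.of(1, 1, 2)) sorted.process(k);       // CLUSTER BY order

        ClusteredSink distributed = new ClusteredSink();
        for (int k : List.of(1, 2, 1)) distributed.process(k);  // key 1 reappears

        System.out.println(sorted.reuseAfterClose);       // 0
        System.out.println(distributed.reuseAfterClose);  // 1
    }
}
```

The clustered run never revisits a closed key, which is why CLUSTER BY (or removing DISTRIBUTE BY) avoids the error; the distributed run hits ROW3 exactly as in the stack trace.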
[jira] [Resolved] (HIVE-23987) Upgrade arrow version to 0.11.0
[ https://issues.apache.org/jira/browse/HIVE-23987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez resolved HIVE-23987. Fix Version/s: 4.0.0 Assignee: Jesus Camacho Rodriguez (was: Barnabas Maidics) Resolution: Fixed > Upgrade arrow version to 0.11.0 > --- > > Key: HIVE-23987 > URL: https://issues.apache.org/jira/browse/HIVE-23987 > Project: Hive > Issue Type: Improvement >Reporter: Barnabas Maidics >Assignee: Jesus Camacho Rodriguez >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > As part of [HIVE-23890|https://issues.apache.org/jira/browse/HIVE-23890], > we're introducing flatbuffers as a dependency. > Arrow 0.10.0 has an unofficial flatbuffer dependency, which is incompatible > with the official ones: https://issues.apache.org/jira/browse/ARROW-3175 > It was fixed in 0.11.0. We should upgrade to that version -- This message was sent by Atlassian Jira (v8.3.4#803005)
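Mechanically, the upgrade is a version bump in the Maven build so that the official flatbuffers artifact (fixed by ARROW-3175) is resolved instead of Arrow 0.10.0's unofficial one. A sketch of the kind of pom change involved — the property name and module layout below are illustrative, not necessarily Hive's actual pom structure:

```xml
<!-- root pom.xml: bump the Arrow version property (name is illustrative) -->
<properties>
  <arrow.version>0.11.0</arrow.version>
</properties>

<dependency>
  <groupId>org.apache.arrow</groupId>
  <artifactId>arrow-vector</artifactId>
  <version>${arrow.version}</version>
</dependency>
```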
[jira] [Work logged] (HIVE-23987) Upgrade arrow version to 0.11.0
[ https://issues.apache.org/jira/browse/HIVE-23987?focusedWorklogId=608656=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608656 ] ASF GitHub Bot logged work on HIVE-23987: - Author: ASF GitHub Bot Created on: 08/Jun/21 17:57 Start Date: 08/Jun/21 17:57 Worklog Time Spent: 10m Work Description: jcamachor merged pull request #2366: URL: https://github.com/apache/hive/pull/2366 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608656) Time Spent: 20m (was: 10m) > Upgrade arrow version to 0.11.0 > --- > > Key: HIVE-23987 > URL: https://issues.apache.org/jira/browse/HIVE-23987 > Project: Hive > Issue Type: Improvement >Reporter: Barnabas Maidics >Assignee: Barnabas Maidics >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > As part of [HIVE-23890|https://issues.apache.org/jira/browse/HIVE-23890], > we're introducing flatbuffers as a dependency. > Arrow 0.10.0 has an unofficial flatbuffer dependency, which is incompatible > with the official ones: https://issues.apache.org/jira/browse/ARROW-3175 > It was fixed in 0.11.0. We should upgrade to that version -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25218) Add a replication migration tool for external tables
[ https://issues.apache.org/jira/browse/HIVE-25218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25218: -- Labels: pull-request-available (was: ) > Add a replication migration tool for external tables > > > Key: HIVE-25218 > URL: https://issues.apache.org/jira/browse/HIVE-25218 > Project: Hive > Issue Type: Improvement >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Add a tool which can confirm migration of external tables post replication > from one cluster to another. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25218) Add a replication migration tool for external tables
[ https://issues.apache.org/jira/browse/HIVE-25218?focusedWorklogId=608567=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608567 ] ASF GitHub Bot logged work on HIVE-25218: - Author: ASF GitHub Bot Created on: 08/Jun/21 16:17 Start Date: 08/Jun/21 16:17 Worklog Time Spent: 10m Work Description: ayushtkn opened a new pull request #2369: URL: https://github.com/apache/hive/pull/2369 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608567) Remaining Estimate: 0h Time Spent: 10m > Add a replication migration tool for external tables > > > Key: HIVE-25218 > URL: https://issues.apache.org/jira/browse/HIVE-25218 > Project: Hive > Issue Type: Improvement >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Add a tool which can confirm migration of external tables post replication > from one cluster to another. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25218) Add a replication migration tool for external tables
[ https://issues.apache.org/jira/browse/HIVE-25218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena reassigned HIVE-25218: --- > Add a replication migration tool for external tables > > > Key: HIVE-25218 > URL: https://issues.apache.org/jira/browse/HIVE-25218 > Project: Hive > Issue Type: Improvement >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > > Add a tool which can confirm migration of external tables post replication > from one cluster to another. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608551=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608551 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:59 Start Date: 08/Jun/21 15:59 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647582327 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -281,16 +285,23 @@ private VectorizedOrcAcidRowBatchReader(JobConf conf, OrcSplit orcSplit, Reporte deleteEventReaderOptions.range(0, Long.MAX_VALUE); deleteEventReaderOptions.searchArgument(null, null); keyInterval = findMinMaxKeys(orcSplit, conf, deleteEventReaderOptions); +fetchDeletedRows = conf.getBoolean(Constants.ACID_FETCH_DELETED_ROWS, false); DeleteEventRegistry der; try { // See if we can load all the relevant delete events from all the // delete deltas in memory... 
+ ColumnizedDeleteEventRegistry.OriginalWriteIdLoader writeIdLoader; + if (fetchDeletedRows) { +writeIdLoader = new ColumnizedDeleteEventRegistry.BothWriteIdLoader(); Review comment: done ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -303,6 +314,12 @@ private VectorizedOrcAcidRowBatchReader(JobConf conf, OrcSplit orcSplit, Reporte VectorizedRowBatch.DEFAULT_SIZE, null, null, null); } rowIdProjected = areRowIdsProjected(rbCtx); +rowIsDeletedProjected = isVirtualColumnProjected(rbCtx, VirtualColumn.ROWISDELETED); Review comment: done ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -303,6 +314,12 @@ private VectorizedOrcAcidRowBatchReader(JobConf conf, OrcSplit orcSplit, Reporte VectorizedRowBatch.DEFAULT_SIZE, null, null, null); } rowIdProjected = areRowIdsProjected(rbCtx); +rowIsDeletedProjected = isVirtualColumnProjected(rbCtx, VirtualColumn.ROWISDELETED); +if (rowIsDeletedProjected) { + rowIsDeletedVector = new RowIsDeletedColumnVector(); Review comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608551) Time Spent: 4h 20m (was: 4h 10m) > Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608553=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608553 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:59 Start Date: 08/Jun/21 15:59 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647583124 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -1748,7 +1946,7 @@ public int compareTo(CompressedOwid other) { assert shouldReadDeleteDeltasWithLlap(conf, true); } deleteReaderValue = new DeleteReaderValue(readerData.reader, deleteDeltaFile, readerOptions, bucket, -validWriteIdList, isBucketedTable, conf, keyInterval, orcSplit, numRows, cacheTag, fileId); +validWriteIdList, isBucketedTable, conf, keyInterval, orcSplit, numRows, cacheTag, fileId); Review comment: reverted -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608553) Time Spent: 4h 40m (was: 4.5h) > Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608552=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608552 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:59 Start Date: 08/Jun/21 15:59 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647582736 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -948,7 +978,7 @@ public boolean next(NullWritable key, VectorizedRowBatch value) throws IOExcepti // This loop fills up the selected[] vector with all the index positions that are selected. for (int setBitIndex = selectedBitSet.nextSetBit(0), selectedItr = 0; setBitIndex >= 0; - setBitIndex = selectedBitSet.nextSetBit(setBitIndex+1), ++selectedItr) { + setBitIndex = selectedBitSet.nextSetBit(setBitIndex + 1), ++selectedItr) { Review comment: reverted -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608552) Time Spent: 4.5h (was: 4h 20m) > Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 4.5h > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
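The loop touched in the last hunk above is a standard pattern in Hive vectorization: translating a BitSet of surviving row positions into the batch's selected[] array plus its size. Isolated from the reader (with a plain int array standing in for the VectorizedRowBatch fields), the pattern looks like this:

```java
import java.util.Arrays;
import java.util.BitSet;

// The BitSet -> selected[] translation used in VectorizedOrcAcidRowBatchReader:
// every set bit becomes an entry in selected[], and the count of set bits
// becomes the effective batch size.
public class SelectedFill {
    static int fillSelected(BitSet selectedBitSet, int[] selected) {
        int selectedItr = 0;
        for (int setBitIndex = selectedBitSet.nextSetBit(0);
             setBitIndex >= 0;
             setBitIndex = selectedBitSet.nextSetBit(setBitIndex + 1), ++selectedItr) {
            selected[selectedItr] = setBitIndex;
        }
        return selectedItr;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(8);
        bits.set(0);
        bits.set(3);
        bits.set(7);   // rows 1-2 and 4-6 were filtered out (e.g. deleted)
        int[] selected = new int[8];
        int size = fillSelected(bits, selected);
        System.out.println(size);                                            // 3
        System.out.println(Arrays.toString(Arrays.copyOf(selected, size)));  // [0, 3, 7]
    }
}
```

With the ROWISDELETED work in this PR, delete events flip the corresponding bits instead of (or as well as) clearing them, and the same translation then projects the survivor set into the output batch.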
[jira] [Work logged] (HIVE-25081) Put metrics collection behind a feature flag
[ https://issues.apache.org/jira/browse/HIVE-25081?focusedWorklogId=608549=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608549 ] ASF GitHub Bot logged work on HIVE-25081: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:58 Start Date: 08/Jun/21 15:58 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2332: URL: https://github.com/apache/hive/pull/2332#discussion_r647567066 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java ## @@ -120,7 +120,7 @@ public void run() { // don't doom the entire thread. try { handle = txnHandler.getMutexAPI().acquireLock(TxnStore.MUTEX_KEY.Initiator.name()); - if (metricsEnabled) { + if (metricsEnabled && MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)) { Review comment: Same as cleaner ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java ## @@ -454,6 +454,8 @@ public static ConfVars getMetaConf(String name) { "hive.metastore.acidmetrics.check.interval", 300, TimeUnit.SECONDS, "Time in seconds between acid related metric collection runs."), +METASTORE_ACIDMETRICS_EXT_ON("metastore.acidmetrics.ext.on", "hive.metastore.acidmetrics.ext.on", true, +"Whether to collect additional acid related metrics outside of the acid metrics service."), Review comment: I think these are only enabled if `MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METRICS_ENABLED)==true` , so it would be good to mention that in the description ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java ## @@ -115,41 +117,45 @@ public static DeltaFilesMetricReporter getInstance() { return InstanceHolder.instance; } - public static synchronized void init(HiveConf conf){ + public static synchronized void init(HiveConf conf) { getInstance().configure(conf); } public void submit(TezCounters counters) { 
-updateMetrics(NUM_OBSOLETE_DELTAS, -obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, counters); -updateMetrics(NUM_DELTAS, -deltaCache, deltaTopN, deltasThreshold, counters); -updateMetrics(NUM_SMALL_DELTAS, -smallDeltaCache, smallDeltaTopN, deltasThreshold, counters); +if(acidMetricsExtEnabled) { + updateMetrics(NUM_OBSOLETE_DELTAS, + obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, counters); + updateMetrics(NUM_DELTAS, + deltaCache, deltaTopN, deltasThreshold, counters); + updateMetrics(NUM_SMALL_DELTAS, + smallDeltaCache, smallDeltaTopN, deltasThreshold, counters); +} } - public static void mergeDeltaFilesStats(AcidDirectory dir, long checkThresholdInSec, -float deltaPctThreshold, EnumMap> deltaFilesStats) throws IOException { -long baseSize = getBaseSize(dir); -int numObsoleteDeltas = getNumObsoleteDeltas(dir, checkThresholdInSec); + public static void mergeDeltaFilesStats(AcidDirectory dir, long checkThresholdInSec, float deltaPctThreshold, + EnumMap> deltaFilesStats, Configuration conf) throws IOException { +if (MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)) { Review comment: Instead of adding the check here, it makes a bit more sense to add it to these checks in org.apache.hadoop.hive.ql.io.orc.OrcInputFormat#generateSplitsInfo: ``` if (metricsEnabled && directory instanceof AcidDirectory) { DeltaFilesMetricReporter.mergeDeltaFilesStats((AcidDirectory) directory, checkThresholdInSec, deltaPctThreshold, deltaFilesStats); } ``` ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -111,7 +111,7 @@ public void run() { // so wrap it in a big catch Throwable statement. 
try { handle = txnHandler.getMutexAPI().acquireLock(TxnStore.MUTEX_KEY.Cleaner.name()); - if (metricsEnabled) { + if (metricsEnabled && MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)) { Review comment: I think this is the same logic as `metricsEnabled = MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METRICS_ENABLED) && MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)` right? ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java ## @@ -115,41 +117,45 @@ public static DeltaFilesMetricReporter getInstance() { return InstanceHolder.instance; } - public static synchronized void init(HiveConf conf){ +
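The review thread above converges on gating the extended ACID metrics behind both `metastore.metrics.enabled` and the new `metastore.acidmetrics.ext.on` flag. A minimal sketch of that combined-flag pattern, using `java.util.Properties` as a stand-in for `MetastoreConf` (the class, keys' spelling, and helper are illustrative, not Hive's API):

```java
import java.util.Properties;

class MetricsGate {
    // Illustrative stand-ins for the two config keys discussed in the review.
    static final String METRICS_ENABLED = "metastore.metrics.enabled";
    static final String ACIDMETRICS_EXT_ON = "metastore.acidmetrics.ext.on";

    // Extended ACID metrics fire only when metrics are enabled at all
    // AND the extra feature flag (default true, per the diff) is on.
    static boolean extendedAcidMetricsEnabled(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(METRICS_ENABLED, "false"))
            && Boolean.parseBoolean(conf.getProperty(ACIDMETRICS_EXT_ON, "true"));
    }
}
```

This matches the reviewer's point: evaluating the conjunction once (as `metricsEnabled` is computed) is equivalent to re-checking the second flag at every call site.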
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608548=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608548 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:58 Start Date: 08/Jun/21 15:58 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647582032 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -1940,39 +2091,38 @@ public boolean isEmpty() { } @Override public void findDeletedRecords(ColumnVector[] cols, int size, BitSet selectedBitSet) { - if (rowIds == null || compressedOwids == null) { + if (rowIds == null || writeIds == null || writeIds.isEmpty()) { return; } // Iterate through the batch and for each (owid, rowid) in the batch // check if it is deleted or not. long[] originalWriteIdVector = - cols[OrcRecordUpdater.ORIGINAL_WRITEID].isRepeating ? null - : ((LongColumnVector) cols[OrcRecordUpdater.ORIGINAL_WRITEID]).vector; + cols[OrcRecordUpdater.ORIGINAL_WRITEID].isRepeating ? null Review comment: reverted ## File path: ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedOrcAcidRowBatchReader.java ## @@ -961,26 +966,41 @@ private void testDeleteEventOriginalFiltering2() throws Exception { @Test public void testVectorizedOrcAcidRowBatchReader() throws Exception { +setupTestData(); + + testVectorizedOrcAcidRowBatchReader(ColumnizedDeleteEventRegistry.class.getName()); + +// To test the SortMergedDeleteEventRegistry, we need to explicitly set the +// HIVE_TRANSACTIONAL_NUM_EVENTS_IN_MEMORY constant to a smaller value. 
+int oldValue = conf.getInt(HiveConf.ConfVars.HIVE_TRANSACTIONAL_NUM_EVENTS_IN_MEMORY.varname, 100); + conf.setInt(HiveConf.ConfVars.HIVE_TRANSACTIONAL_NUM_EVENTS_IN_MEMORY.varname, 1000); + testVectorizedOrcAcidRowBatchReader(SortMergedDeleteEventRegistry.class.getName()); + +// Restore the old value. + conf.setInt(HiveConf.ConfVars.HIVE_TRANSACTIONAL_NUM_EVENTS_IN_MEMORY.varname, oldValue); + } + + private void setupTestData() throws IOException { conf.set("bucket_count", "1"); - conf.set(ValidTxnList.VALID_TXNS_KEY, - new ValidReadTxnList(new long[0], new BitSet(), 1000, Long.MAX_VALUE).writeToString()); +conf.set(ValidTxnList.VALID_TXNS_KEY, +new ValidReadTxnList(new long[0], new BitSet(), 1000, Long.MAX_VALUE).writeToString()); int bucket = 0; AcidOutputFormat.Options options = new AcidOutputFormat.Options(conf) -.filesystem(fs) -.bucket(bucket) -.writingBase(false) -.minimumWriteId(1) -.maximumWriteId(NUM_OWID) -.inspector(inspector) -.reporter(Reporter.NULL) -.recordIdColumn(1) -.finalDestination(root); +.filesystem(fs) Review comment: reverted -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608548) Time Spent: 4h 10m (was: 4h) > Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24135) Drop database doesn't delete directory in managed location
[ https://issues.apache.org/jira/browse/HIVE-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam resolved HIVE-24135. -- Fix Version/s: 4.0.0 Resolution: Fixed Fix from PR 2454 has been reviewed and merged. Thanks @ychen for the review. > Drop database doesn't delete directory in managed location > -- > > Key: HIVE-24135 > URL: https://issues.apache.org/jira/browse/HIVE-24135 > Project: Hive > Issue Type: Sub-task >Reporter: Karen Coppage >Assignee: Naveen Gangam >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Repro: > say the default managed location is managed/hive and the default external > location is external/hive. > {code:java} > create database db1; -- creates: external/hive/db1.db > create table db1.table1 (i int); -- creates: managed/hive/db1.db and > managed/hive/db1.db/table1 > drop database db1 cascade; -- removes : external/hive/db1.db and > managed/hive/db1.db/table1 > {code} > Problem: Directory managed/hive/db1.db remains. > Since HIVE-22995, dbs have a managed (managedLocationUri) and an external > location (locationUri). I think the issue is that > HiveMetaStore.HMSHandler#drop_database_core deletes only the db directory in > the external location. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24135) Drop database doesn't delete directory in managed location
[ https://issues.apache.org/jira/browse/HIVE-24135?focusedWorklogId=608538=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608538 ] ASF GitHub Bot logged work on HIVE-24135: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:41 Start Date: 08/Jun/21 15:41 Worklog Time Spent: 10m Work Description: nrg4878 closed pull request #2354: URL: https://github.com/apache/hive/pull/2354 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608538) Time Spent: 1.5h (was: 1h 20m) > Drop database doesn't delete directory in managed location > -- > > Key: HIVE-24135 > URL: https://issues.apache.org/jira/browse/HIVE-24135 > Project: Hive > Issue Type: Sub-task >Reporter: Karen Coppage >Assignee: Naveen Gangam >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Repro: > say the default managed location is managed/hive and the default external > location is external/hive. > {code:java} > create database db1; -- creates: external/hive/db1.db > create table db1.table1 (i int); -- creates: managed/hive/db1.db and > managed/hive/db1.db/table1 > drop database db1 cascade; -- removes : external/hive/db1.db and > managed/hive/db1.db/table1 > {code} > Problem: Directory managed/hive/db1.db remains. > Since HIVE-22995, dbs have a managed (managedLocationUri) and an external > location (locationUri). I think the issue is that > HiveMetaStore.HMSHandler#drop_database_core deletes only the db directory in > the external location. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24135) Drop database doesn't delete directory in managed location
[ https://issues.apache.org/jira/browse/HIVE-24135?focusedWorklogId=608537=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608537 ] ASF GitHub Bot logged work on HIVE-24135: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:41 Start Date: 08/Jun/21 15:41 Worklog Time Spent: 10m Work Description: nrg4878 commented on pull request #2354: URL: https://github.com/apache/hive/pull/2354#issuecomment-856881562 Thanks for the review. Fix has been committed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608537) Time Spent: 1h 20m (was: 1h 10m) > Drop database doesn't delete directory in managed location > -- > > Key: HIVE-24135 > URL: https://issues.apache.org/jira/browse/HIVE-24135 > Project: Hive > Issue Type: Sub-task >Reporter: Karen Coppage >Assignee: Naveen Gangam >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Repro: > say the default managed location is managed/hive and the default external > location is external/hive. > {code:java} > create database db1; -- creates: external/hive/db1.db > create table db1.table1 (i int); -- creates: managed/hive/db1.db and > managed/hive/db1.db/table1 > drop database db1 cascade; -- removes : external/hive/db1.db and > managed/hive/db1.db/table1 > {code} > Problem: Directory managed/hive/db1.db remains. > Since HIVE-22995, dbs have a managed (managedLocationUri) and an external > location (locationUri). I think the issue is that > HiveMetaStore.HMSHandler#drop_database_core deletes only the db directory in > the external location. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608534=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608534 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:37 Start Date: 08/Jun/21 15:37 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647563794 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -2039,4 +2189,29 @@ private static IntegerColumnStatistics deserializeIntColumnStatistics(List Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 4h > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608531=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608531 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:36 Start Date: 08/Jun/21 15:36 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647562562 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -2039,4 +2189,29 @@ private static IntegerColumnStatistics deserializeIntColumnStatistics(List Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608529=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608529 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:35 Start Date: 08/Jun/21 15:35 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647561929 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -959,6 +989,20 @@ public boolean next(NullWritable key, VectorizedRowBatch value) throws IOExcepti int ix = rbCtx.findVirtualColumnNum(VirtualColumn.ROWID); value.cols[ix] = recordIdColumnVector; } +if (rowIsDeletedProjected) { + if (fetchDeletedRows) { Review comment: I prefer your first suggestion because the second one requires passing `vectorizedRowBatchBase.size()` to the `set` method which I would like to avoid. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608529) Time Spent: 3h 40m (was: 3.5h) > Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608526=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608526 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:32 Start Date: 08/Jun/21 15:32 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647559407 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -959,6 +989,20 @@ public boolean next(NullWritable key, VectorizedRowBatch value) throws IOExcepti int ix = rbCtx.findVirtualColumnNum(VirtualColumn.ROWID); Review comment: see my previous comment for `VirtualColumn.ROWISDELETED` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608526) Time Spent: 3h 20m (was: 3h 10m) > Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608527=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608527 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:32 Start Date: 08/Jun/21 15:32 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647559557 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -983,7 +1027,7 @@ private void copyFromBase(VectorizedRowBatch value) { System.arraycopy(payloadStruct.fields, 0, value.cols, 0, value.getDataColumnCount()); } if (rowIdProjected) { - recordIdColumnVector.fields[0] = vectorizedRowBatchBase.cols[OrcRecordUpdater.ORIGINAL_WRITEID]; + recordIdColumnVector.fields[0] = vectorizedRowBatchBase.cols[fetchDeletedRows ? OrcRecordUpdater.CURRENT_WRITEID : OrcRecordUpdater.ORIGINAL_WRITEID]; Review comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608527) Time Spent: 3.5h (was: 3h 20m) > Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608522&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608522 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:31 Start Date: 08/Jun/21 15:31 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647558152 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -892,13 +913,20 @@ public boolean next(NullWritable key, VectorizedRowBatch value) throws IOExcepti } catch (Exception e) { throw new IOException("error iterating", e); } -if(!includeAcidColumns) { +if (!includeAcidColumns) { //if here, we don't need to filter anything wrt acid metadata columns //in fact, they are not even read from file/llap value.size = vectorizedRowBatchBase.size; value.selected = vectorizedRowBatchBase.selected; value.selectedInUse = vectorizedRowBatchBase.selectedInUse; copyFromBase(value); + + if (rowIsDeletedProjected) { +rowIsDeletedVector.clear(); +int ix = rbCtx.findVirtualColumnNum(VirtualColumn.ROWISDELETED); Review comment: I started to work on a solution to manage the Virtual Column related information, but it led to a much bigger change. `VectorizedOrcAcidRowBatchReader` can behave in several ways, and each of those behaviors is worth a separate class after extracting the common parts. So I decided to follow the existing logic implemented for RowId. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608522) Time Spent: 3h (was: 2h 50m) > Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24991) Enable fetching deleted rows in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-24991?focusedWorklogId=608524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608524 ] ASF GitHub Bot logged work on HIVE-24991: - Author: ASF GitHub Bot Created on: 08/Jun/21 15:31 Start Date: 08/Jun/21 15:31 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2264: URL: https://github.com/apache/hive/pull/2264#discussion_r647558355 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java ## @@ -932,8 +960,10 @@ public boolean next(NullWritable key, VectorizedRowBatch value) throws IOExcepti } // Case 2- find rows which have been deleted. +BitSet notDeletedBitSet = fetchDeletedRows ? (BitSet) selectedBitSet.clone() : selectedBitSet; Review comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608524) Time Spent: 3h 10m (was: 3h) > Enable fetching deleted rows in vectorized mode > --- > > Key: HIVE-24991 > URL: https://issues.apache.org/jira/browse/HIVE-24991 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > HIVE-24855 enables loading deleted rows from ORC tables when table property > *acid.fetch.deleted.rows* is true. > The goal of this jira is to enable this feature in vectorized orc batch > reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
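The one-liner under review clones the selection bitset only when deleted rows must still be surfaced, so that clearing "deleted" positions does not drop them from the returned batch. A hedged sketch of that choice (`DeletedRowsMask` is an illustrative name, not a Hive class):

```java
import java.util.BitSet;

class DeletedRowsMask {
    // When deleted rows should still be fetched, filter against a clone so
    // the original selection stays intact; otherwise filter in place.
    static BitSet notDeletedMask(BitSet selectedBitSet, boolean fetchDeletedRows) {
        return fetchDeletedRows ? (BitSet) selectedBitSet.clone() : selectedBitSet;
    }
}
```

With `fetchDeletedRows` true, clearing a bit in the returned mask leaves the original selection untouched; with it false, the clear filters the original directly.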
[jira] [Work logged] (HIVE-25185) Improve Logging On Polling Tez Session from Pool
[ https://issues.apache.org/jira/browse/HIVE-25185?focusedWorklogId=608500=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608500 ] ASF GitHub Bot logged work on HIVE-25185: - Author: ASF GitHub Bot Created on: 08/Jun/21 14:49 Start Date: 08/Jun/21 14:49 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #2339: URL: https://github.com/apache/hive/pull/2339#discussion_r647518347 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java ## @@ -131,13 +131,14 @@ SessionType getSession() throws Exception { poolLock.lock(); try { while ((result = pool.poll()) == null) { - notEmpty.await(100, TimeUnit.MILLISECONDS); + LOG.info("Awaiting Tez session to become available in session pool"); + notEmpty.await(10, TimeUnit.SECONDS); Review comment: Right, this is done as part of putSessionBack() method -- so I dont see an issue here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608500) Time Spent: 1h 20m (was: 1h 10m) > Improve Logging On Polling Tez Session from Pool > > > Key: HIVE-25185 > URL: https://issues.apache.org/jira/browse/HIVE-25185 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25185) Improve Logging On Polling Tez Session from Pool
[ https://issues.apache.org/jira/browse/HIVE-25185?focusedWorklogId=608499=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608499 ] ASF GitHub Bot logged work on HIVE-25185: - Author: ASF GitHub Bot Created on: 08/Jun/21 14:45 Start Date: 08/Jun/21 14:45 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #2339: URL: https://github.com/apache/hive/pull/2339#discussion_r647514577 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java ## @@ -131,13 +131,14 @@ SessionType getSession() throws Exception { poolLock.lock(); try { while ((result = pool.poll()) == null) { - notEmpty.await(100, TimeUnit.MILLISECONDS); + LOG.info("Awaiting Tez session to become available in session pool"); + notEmpty.await(10, TimeUnit.SECONDS); Review comment: @pgaref This is not just a loop-and-sleep. The `notEmpty` condition will be alerted when a session becomes available and this thread will run again to pickup the session. I don't really understand why there is a timeout currently implemented. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608499) Time Spent: 1h 10m (was: 1h) > Improve Logging On Polling Tez Session from Pool > > > Key: HIVE-25185 > URL: https://issues.apache.org/jira/browse/HIVE-25185 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
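The exchange above hinges on `Condition.await`/`signal` semantics: the waiting thread is woken as soon as a session is returned to the pool, so the timeout only bounds how long a missed or spurious wakeup can go unnoticed, not the normal wait. A self-contained sketch of that pairing (a toy pool for illustration, not Hive's `TezSessionPool`):

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class TinyPool<T> {
    private final Queue<T> pool = new ArrayDeque<>();
    private final ReentrantLock poolLock = new ReentrantLock();
    private final Condition notEmpty = poolLock.newCondition();

    // Blocks until a session is available. The usual wakeup path is the
    // signal() in putSessionBack(); the timeout is only a safety net.
    T getSession(long timeoutMs) throws InterruptedException {
        poolLock.lock();
        try {
            T result;
            while ((result = pool.poll()) == null) {
                // A logging statement on timeout, as in the PR, would go here.
                notEmpty.await(timeoutMs, TimeUnit.MILLISECONDS);
            }
            return result;
        } finally {
            poolLock.unlock();
        }
    }

    void putSessionBack(T session) {
        poolLock.lock();
        try {
            pool.add(session);
            notEmpty.signal(); // wakes one waiter immediately
        } finally {
            poolLock.unlock();
        }
    }
}
```

Because `signal()` fires on every return, raising the await interval from 100 ms to 10 s mostly reduces log spam rather than adding latency, which is the reviewers' point.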
[jira] [Work logged] (HIVE-25093) date_format() UDF is returning values in UTC time zone only
[ https://issues.apache.org/jira/browse/HIVE-25093?focusedWorklogId=608496=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608496 ] ASF GitHub Bot logged work on HIVE-25093: - Author: ASF GitHub Bot Created on: 08/Jun/21 14:36 Start Date: 08/Jun/21 14:36 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #2252: URL: https://github.com/apache/hive/pull/2252#discussion_r647505427 ## File path: ql/src/java/org/apache/hadoop/hive/ql/util/DateTimeMath.java ## @@ -613,4 +615,18 @@ public static Calendar getProlepticGregorianCalendarUTC() { calendar.setGregorianChange(new java.util.Date(Long.MIN_VALUE)); return calendar; } + + /** + * TODO - this is a temporary fix for handling Julian calendar dates. + * Returns a Gregorian calendar that can be used from year 0+ instead of default 1582.10.15. + * This is desirable for some UDFs that work on dates which normally would use Julian calendar. + * @return the calendar + */ + public static Calendar getTimeZonedProlepticGregorianCalendar() { +GregorianCalendar calendar = new GregorianCalendar(TimeZone.getTimeZone( +SessionState.get() == null ? new HiveConf().getLocalTimeZone() : SessionState.get().getConf() Review comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608496) Time Spent: 0.5h (was: 20m) > date_format() UDF is returning values in UTC time zone only > > > Key: HIVE-25093 > URL: https://issues.apache.org/jira/browse/HIVE-25093 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 3.1.2 >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > *HIVE - 1.2* > sshuser@hn0-dateti:~$ *timedatectl* > Local time: Thu 2021-05-06 11:56:08 IST > Universal time: Thu 2021-05-06 06:26:08 UTC > RTC time: Thu 2021-05-06 06:26:08 >Time zone: Asia/Kolkata (IST, +0530) > Network time on: yes > NTP synchronized: yes > RTC in local TZ: no > sshuser@hn0-dateti:~$ beeline > 0: jdbc:hive2://localhost:10001/default> *select > date_format(current_timestamp,"-MM-dd HH:mm:ss.SSS z");* > +--+--+ > | _c0 | > +--+--+ > | 2021-05-06 11:58:53.760 IST | > +--+--+ > 1 row selected (1.271 seconds) > *HIVE - 3.1.0* > sshuser@hn0-testja:~$ *timedatectl* > Local time: Thu 2021-05-06 12:03:32 IST > Universal time: Thu 2021-05-06 06:33:32 UTC > RTC time: Thu 2021-05-06 06:33:32 >Time zone: Asia/Kolkata (IST, +0530) > Network time on: yes > NTP synchronized: yes > RTC in local TZ: no > sshuser@hn0-testja:~$ beeline > 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select > date_format(current_timestamp,"-MM-dd HH:mm:ss.SSS z");* > +--+ > | _c0 | > +--+ > | *2021-05-06 06:33:59.078 UTC* | > +--+ > 1 row selected (13.396 seconds) > 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *set > hive.local.time.zone=Asia/Kolkata;* > No rows affected (0.025 seconds) > 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select > date_format(current_timestamp,"-MM-dd HH:mm:ss.SSS z");* > +--+ > | _c0 | > +--+ > | *{color:red}2021-05-06 12:08:15.118 UTC{color}* | > +--+ > 1 row selected (1.074 seconds) > expected result was *2021-05-06 
12:08:15.118 IST* > As part of HIVE-12192 it was decided to have a common time zone for all > computation, i.e. "UTC", so the date_format() function was hard-coded to > "UTC". > But later in HIVE-21039 it was decided that the user session time zone value > should be the default, not UTC. > date_format() was not fixed as part of HIVE-21039. > What should be the ideal time zone value of date_format()? -- This message was sent by Atlassian Jira (v8.3.4#803005)
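The getTimeZonedProlepticGregorianCalendar() change under review in this thread works by pushing the Gregorian changeover back to the earliest representable instant, which removes the default Julian/Gregorian cutover at 1582-10-15 and makes the calendar purely proleptic Gregorian. A minimal self-contained sketch of that trick (class and method names here are illustrative, not Hive's):

```java
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class ProlepticDemo {

    // A GregorianCalendar normally switches to Julian rules before the
    // 1582-10-15 changeover. Moving the changeover to the earliest
    // representable instant makes it apply Gregorian rules for all years.
    public static GregorianCalendar prolepticCalendar(String tzId) {
        GregorianCalendar calendar = new GregorianCalendar(TimeZone.getTimeZone(tzId));
        calendar.setGregorianChange(new Date(Long.MIN_VALUE));
        return calendar;
    }

    public static void main(String[] args) {
        GregorianCalendar hybrid = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        GregorianCalendar proleptic = prolepticCalendar("UTC");
        // 1500 is a leap year under Julian rules (divisible by 4) but not
        // under Gregorian rules (divisible by 100 but not by 400).
        System.out.println(hybrid.isLeapYear(1500));     // true
        System.out.println(proleptic.isLeapYear(1500));  // false
    }
}
```

The leap-year check on 1500 is a quick way to tell which rule set a calendar instance is actually using.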
[jira] [Work logged] (HIVE-25185) Improve Logging On Polling Tez Session from Pool
[ https://issues.apache.org/jira/browse/HIVE-25185?focusedWorklogId=608494&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608494 ] ASF GitHub Bot logged work on HIVE-25185: - Author: ASF GitHub Bot Created on: 08/Jun/21 14:35 Start Date: 08/Jun/21 14:35 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #2339: URL: https://github.com/apache/hive/pull/2339#discussion_r647504478 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java ## @@ -131,13 +131,14 @@ SessionType getSession() throws Exception { poolLock.lock(); try { while ((result = pool.poll()) == null) { - notEmpty.await(100, TimeUnit.MILLISECONDS); + LOG.info("Awaiting Tez session to become available in session pool"); + notEmpty.await(10, TimeUnit.SECONDS); Review comment: Any chance this change can increase the time we actually wait to get a session from the pool assuming there is none currently available? It looks to me that if the next session becomes available in the next 10ms with the new change we might wait 10s instead -- am I missing something ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608494) Time Spent: 1h (was: 50m) > Improve Logging On Polling Tez Session from Pool > > > Key: HIVE-25185 > URL: https://issues.apache.org/jira/browse/HIVE-25185 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
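On the latency question raised in this review: `notEmpty` is a java.util.concurrent.locks.Condition, so the timeout passed to await() only bounds the worst case. Assuming the pool signals the condition when a session is returned (the standard bounded-buffer pattern), a waiter wakes as soon as the signal arrives, not when the 10 s expire. A self-contained sketch demonstrating this (names mirror the pool but are simplified, not Hive's code):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class AwaitDemo {

    // Measures how long a 10-second Condition.await() actually blocks when
    // another thread signals the condition after ~50 ms.
    public static long waitForSignalMillis() {
        ReentrantLock poolLock = new ReentrantLock();
        Condition notEmpty = poolLock.newCondition();

        Thread returner = new Thread(() -> {
            try {
                Thread.sleep(50);          // a session is "returned" after 50 ms
            } catch (InterruptedException ignored) {
            }
            poolLock.lock();
            try {
                notEmpty.signal();         // wakes the waiter immediately
            } finally {
                poolLock.unlock();
            }
        });

        long start = System.nanoTime();
        poolLock.lock();
        try {
            returner.start();
            // Same call shape as in the patch: the long timeout only bounds
            // the wait; a signal (or spurious wakeup) ends it early.
            notEmpty.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        } finally {
            poolLock.unlock();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        System.out.println("waited ~" + waitForSignalMillis() + " ms");
    }
}
```

The wait here ends after roughly 50 ms, far below the 10 s ceiling; the 10 s case would only be hit if nothing signals the condition at all.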
[jira] [Commented] (HIVE-25194) Add support for STORED AS ORC/PARQUET/AVRO for Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359375#comment-17359375 ] László Pintér commented on HIVE-25194: -- Merged into master. Thanks, [~mbod] and [~pvary] for the review! > Add support for STORED AS ORC/PARQUET/AVRO for Iceberg > -- > > Key: HIVE-25194 > URL: https://issues.apache.org/jira/browse/HIVE-25194 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > > Currently we have to specify the fileformat in TBLPROPERTIES during Iceberg > create table statements. > The ideal syntax would be: > CREATE TABLE tbl STORED BY ICEBERG STORED AS ORC ... > One complication is that currently stored by and stored as are not permitted > within the same query, so that needs to be amended. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25194) Add support for STORED AS ORC/PARQUET/AVRO for Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Pintér resolved HIVE-25194. -- Resolution: Fixed > Add support for STORED AS ORC/PARQUET/AVRO for Iceberg > -- > > Key: HIVE-25194 > URL: https://issues.apache.org/jira/browse/HIVE-25194 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > > Currently we have to specify the fileformat in TBLPROPERTIES during Iceberg > create table statements. > The ideal syntax would be: > CREATE TABLE tbl STORED BY ICEBERG STORED AS ORC ... > One complication is that currently stored by and stored as are not permitted > within the same query, so that needs to be amended. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25194) Add support for STORED AS ORC/PARQUET/AVRO for Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25194?focusedWorklogId=608474&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608474 ] ASF GitHub Bot logged work on HIVE-25194: - Author: ASF GitHub Bot Created on: 08/Jun/21 13:55 Start Date: 08/Jun/21 13:55 Worklog Time Spent: 10m Work Description: lcspinter merged pull request #2348: URL: https://github.com/apache/hive/pull/2348 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608474) Time Spent: 3h 10m (was: 3h) > Add support for STORED AS ORC/PARQUET/AVRO for Iceberg > -- > > Key: HIVE-25194 > URL: https://issues.apache.org/jira/browse/HIVE-25194 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > > Currently we have to specify the fileformat in TBLPROPERTIES during Iceberg > create table statements. > The ideal syntax would be: > CREATE TABLE tbl STORED BY ICEBERG STORED AS ORC ... > One complication is that currently stored by and stored as are not permitted > within the same query, so that needs to be amended. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24463) Add special case for Derby and MySQL in Get Next ID DbNotificationListener
[ https://issues.apache.org/jira/browse/HIVE-24463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor resolved HIVE-24463. --- Resolution: Won't Fix > Add special case for Derby and MySQL in Get Next ID DbNotificationListener > -- > > Key: HIVE-24463 > URL: https://issues.apache.org/jira/browse/HIVE-24463 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > * Derby does not support {{SELECT FOR UPDATE}} statements > * MySQL can be optimized to use {{LAST_INSERT_ID()}} > > Derby tables are locked in other parts of the code already, but not in this > path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24463) Add special case for Derby and MySQL in Get Next ID DbNotificationListener
[ https://issues.apache.org/jira/browse/HIVE-24463?focusedWorklogId=608471&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608471 ] ASF GitHub Bot logged work on HIVE-24463: - Author: ASF GitHub Bot Created on: 08/Jun/21 13:54 Start Date: 08/Jun/21 13:54 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1727: URL: https://github.com/apache/hive/pull/1727 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608471) Time Spent: 1h (was: 50m) > Add special case for Derby and MySQL in Get Next ID DbNotificationListener > -- > > Key: HIVE-24463 > URL: https://issues.apache.org/jira/browse/HIVE-24463 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > * Derby does not support {{SELECT FOR UPDATE}} statements > * MySQL can be optimized to use {{LAST_INSERT_ID()}} > > Derby tables are locked in other parts of the code already, but not in this > path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24875) Unify InetAddress.getLocalHost()
[ https://issues.apache.org/jira/browse/HIVE-24875?focusedWorklogId=608462&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608462 ] ASF GitHub Bot logged work on HIVE-24875: - Author: ASF GitHub Bot Created on: 08/Jun/21 13:46 Start Date: 08/Jun/21 13:46 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #2314: URL: https://github.com/apache/hive/pull/2314#issuecomment-856782572 @kgyrtkirk Gentle reminder that I'm looking for a follow-up on your initial review. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608462) Time Spent: 1h (was: 50m) > Unify InetAddress.getLocalHost() > > > Key: HIVE-24875 > URL: https://issues.apache.org/jira/browse/HIVE-24875 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Priority: Minor > Labels: newbie, noob, pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Lots of calls in the Hive code to {{InetAddress.getLocalHost()}}. This > should be standardized onto hive-common {{ServerUtils.hostname()}}, which > includes removing (deprecating) a similar method in {{HiveStringUtils}}. > Open to anyone to improve. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25144) Add NoReconnect Annotation to CreateXXX Methods With AlreadyExistsException
[ https://issues.apache.org/jira/browse/HIVE-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359364#comment-17359364 ] David Mollitor commented on HIVE-25144: --- And here is the logging... {code:none} 2021-06-04 12:01:25,927 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-9-thread-3]: ugi=kudu/host@DOMAIN ip=xx.xx.xx.xx cmd=create_table: Table(tableName:test_table, dbName:test_database, owner:user, createTime:0, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:... tableType:MANAGED_TABLE, temporary:false, ownerType:USER) 2021-06-04 12:01:26,001 INFO org.apache.hadoop.hive.common.FileUtils: [pool-9-thread-3]: Creating directory if it doesn't exist: hdfs://ns1/user/hive/warehouse/test_database.db/test_table 2021-06-04 12:01:26,185 ERROR com.jolbox.bonecp.ConnectionHandle: [pool-9-thread-3]: Database access problem. Killing off this connection and all remaining connections in the connection pool. SQL State = 08S01 2021-06-04 12:01:26,294 INFO org.apache.hadoop.fs.TrashPolicyDefault: [pool-9-thread-3]: Moved: 'hdfs://ns1/user/hive/warehouse/test_database.db/test_table' to trash at: hdfs://ns1/user/.Trash/kudu/Current/user/hive/warehouse/test_database.db/test_table 2021-06-04 12:01:26,304 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-9-thread-3]: Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: javax.jdo.JDODataStoreException: Communications link failure The last packet successfully received from the server was 1,521,446 milliseconds ago. The last packet sent successfully to the server was 1,521,447 milliseconds ago. 
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543) at org.datanucleus.api.jdo.JDOTransaction.commit(JDOTransaction.java:171) at org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:727) at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101) at com.sun.proxy.$Proxy26.commitTransaction(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1582) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1615) at sun.reflect.GeneratedMethodAccessor79.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99) at com.sun.proxy.$Proxy28.create_table_with_environment_context(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:10993) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:10977) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:594) at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:589) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:589) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) NestedThrowablesStackTrace: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure The last packet successfully received from the server was 1,521,446 milliseconds ago. The last packet sent successfully to the server was 1,521,447 milliseconds ago. at sun.reflect.GeneratedConstructorAccessor84.newInstance(Unknown Source) at
[jira] [Work logged] (HIVE-25211) Create database throws NPE
[ https://issues.apache.org/jira/browse/HIVE-25211?focusedWorklogId=608446=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608446 ] ASF GitHub Bot logged work on HIVE-25211: - Author: ASF GitHub Bot Created on: 08/Jun/21 12:59 Start Date: 08/Jun/21 12:59 Worklog Time Spent: 10m Work Description: yongzhi merged pull request #2362: URL: https://github.com/apache/hive/pull/2362 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608446) Time Spent: 20m (was: 10m) > Create database throws NPE > -- > > Key: HIVE-25211 > URL: https://issues.apache.org/jira/browse/HIVE-25211 > Project: Hive > Issue Type: Bug > Components: Standalone Metastore >Affects Versions: 4.0.0 >Reporter: Yongzhi Chen >Assignee: Yongzhi Chen >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > <11>1 2021-06-06T17:32:48.964Z > metastore-0.metastore-service.warehouse-1622998329-9klr.svc.cluster.local > metastore 1 5ad83e8e-bf89-4ad3-b1fb-51c73c7133b7 [mdc@18060 > class="metastore.RetryingHMSHandler" level="ERROR" thread="pool-9-thread-16"] > MetaException(message:java.lang.NullPointerException) > > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:8115) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:1629) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:160) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:121) > at com.sun.proxy.$Proxy31.create_database(Unknown Source) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_database.getResult(ThriftHiveMetastore.java:16795) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_database.getResult(ThriftHiveMetastore.java:16779) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:643) > at > org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:638) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) > at > org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:638) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.NullPointerException > at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:120) > at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:128) > at > org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:491) > at > org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:480) > at > 
org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:476) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$9.run(HiveMetaStore.java:1556) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$9.run(HiveMetaStore.java:1554) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database_core(HiveMetaStore.java:1554) >
[jira] [Updated] (HIVE-25104) Backward incompatible timestamp serialization in Parquet for certain timezones
[ https://issues.apache.org/jira/browse/HIVE-25104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-25104: --- Description: HIVE-12192, HIVE-20007 changed the way that timestamp computations are performed and to some extent how timestamps are serialized and deserialized in files (Parquet, Avro). In versions that include HIVE-12192 or HIVE-20007 the serialization in Parquet files is not backwards compatible. In other words writing timestamps with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them with another (not including the previous issues) may lead to different results depending on the default timezone of the system. Consider the following scenario where the default system timezone is set to US/Pacific. At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3 {code:sql} CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET LOCATION '/tmp/hiveexttbl/employee'; INSERT INTO employee VALUES (1, '1880-01-01 00:00:00'); INSERT INTO employee VALUES (2, '1884-01-01 00:00:00'); INSERT INTO employee VALUES (3, '1990-01-01 00:00:00'); SELECT * FROM employee; {code} |1|1880-01-01 00:00:00| |2|1884-01-01 00:00:00| |3|1990-01-01 00:00:00| At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356 {code:sql} CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET LOCATION '/tmp/hiveexttbl/employee'; SELECT * FROM employee; {code} |1|1879-12-31 23:52:58| |2|1884-01-01 00:00:00| |3|1990-01-01 00:00:00| The timestamp for {{eid=1}} in branch-2.3 is different from the one in master. was: HIVE-12192, HIVE-20007 changed the way that timestamp computations are performed and to some extent how timestamps are serialized and deserialized in files (Parquet, Avro, Orc). In versions that include HIVE-12192 or HIVE-20007 the serialization in Parquet files is not backwards compatible. 
In other words writing timestamps with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them with another (not including the previous issues) may lead to different results depending on the default timezone of the system. Consider the following scenario where the default system timezone is set to US/Pacific. At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3 {code:sql} CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET LOCATION '/tmp/hiveexttbl/employee'; INSERT INTO employee VALUES (1, '1880-01-01 00:00:00'); INSERT INTO employee VALUES (2, '1884-01-01 00:00:00'); INSERT INTO employee VALUES (3, '1990-01-01 00:00:00'); SELECT * FROM employee; {code} |1|1880-01-01 00:00:00| |2|1884-01-01 00:00:00| |3|1990-01-01 00:00:00| At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356 {code:sql} CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET LOCATION '/tmp/hiveexttbl/employee'; SELECT * FROM employee; {code} |1|1879-12-31 23:52:58| |2|1884-01-01 00:00:00| |3|1990-01-01 00:00:00| The timestamp for {{eid=1}} in branch-2.3 is different from the one in master. > Backward incompatible timestamp serialization in Parquet for certain timezones > -- > > Key: HIVE-25104 > URL: https://issues.apache.org/jira/browse/HIVE-25104 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > HIVE-12192, HIVE-20007 changed the way that timestamp computations are > performed and to some extent how timestamps are serialized and deserialized > in files (Parquet, Avro). > In versions that include HIVE-12192 or HIVE-20007 the serialization in > Parquet files is not backwards compatible. 
In other words writing timestamps > with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them > with another (not including the previous issues) may lead to different > results depending on the default timezone of the system. > Consider the following scenario where the default system timezone is set to > US/Pacific. > At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3 > {code:sql} > CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET > LOCATION '/tmp/hiveexttbl/employee'; > INSERT INTO employee VALUES (1, '1880-01-01 00:00:00'); > INSERT INTO employee VALUES (2, '1884-01-01 00:00:00'); > INSERT INTO employee VALUES (3, '1990-01-01 00:00:00'); > SELECT * FROM employee; > {code} > |1|1880-01-01 00:00:00| > |2|1884-01-01 00:00:00| > |3|1990-01-01 00:00:00| > At apache/branch-2.3 commit
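A plausible mechanism for the 7 min 2 s shift on {{eid=1}} above (an interpretation, not stated in the ticket): before standard time was adopted in November 1883, US/Pacific used local mean time, UTC-07:52:58, which differs from the modern UTC-08:00 by exactly 7:02. Writers and readers that disagree on whether the zone's historical offset applies to an 1880 timestamp therefore shift the wall clock from 00:00:00 to 23:52:58. The historical offset can be checked with java.time (a sketch, not Hive's conversion code):

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class LmtDemo {

    // Offset that US/Pacific was actually using at a given UTC wall-clock time,
    // according to the tz database's zone rules.
    public static ZoneOffset pacificOffsetAt(LocalDateTime utcWallClock) {
        Instant instant = utcWallClock.toInstant(ZoneOffset.UTC);
        return ZoneId.of("US/Pacific").getRules().getOffset(instant);
    }

    public static void main(String[] args) {
        // Local mean time era: -07:52:58, i.e. 7 min 2 s away from -08:00.
        System.out.println(pacificOffsetAt(LocalDateTime.of(1880, 1, 1, 0, 0)));
        // Modern standard time: -08:00.
        System.out.println(pacificOffsetAt(LocalDateTime.of(1990, 1, 1, 0, 0)));
    }
}
```

This also matches the observation in the repro that only the pre-1883 row ({{eid=1}}) changes while 1884 and 1990 round-trip cleanly.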
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608414=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608414 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:43 Start Date: 08/Jun/21 11:43 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2351: URL: https://github.com/apache/hive/pull/2351#discussion_r647358883 ## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java ## @@ -310,6 +335,24 @@ public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTab } } + private void setupAlterOperationType(EnvironmentContext context) throws MetaException { +if (context != null) { + Map contextProperties = context.getProperties(); + if (contextProperties != null) { +String stringOpType = contextProperties.get(ALTER_TABLE_OPERATION_TYPE); +if (stringOpType != null) { + currentAlterTableOp = AlterTableType.valueOf(stringOpType); + if (SUPPORTED_ALTER_OPS.stream().noneMatch(op -> op.equals(currentAlterTableOp))) { +throw new MetaException( +"Unsupported ALTER TABLE operation type for Iceberg tables, must be: " + allowedAlterTypes.toString()); + } +} +return; Review comment: Maybe a short comment explaining what you just said would be useful for future maintainers -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608414) Time Spent: 5h 20m (was: 5h 10m) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 5h 20m > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608413=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608413 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:42 Start Date: 08/Jun/21 11:42 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2351: URL: https://github.com/apache/hive/pull/2351#discussion_r647358340 ## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java ## @@ -310,6 +337,24 @@ public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTab } } + private void setupAlterOperationType(EnvironmentContext context) throws MetaException { +if (context != null) { + Map contextProperties = context.getProperties(); + if (contextProperties != null) { +String stringOpType = contextProperties.get(ALTER_TABLE_OPERATION_TYPE); +if (stringOpType != null) { + currentAlterTableOp = AlterTableType.valueOf(stringOpType); + if (SUPPORTED_ALTER_OPS.stream().noneMatch(op -> op.equals(currentAlterTableOp))) { +throw new MetaException( +"Unsupported ALTER TABLE operation type for Iceberg tables, must be: " + allowedAlterTypes.toString()); + } +} +return; + } +} +throw new MetaException("ALTER TABLE operation type could not be determined."); Review comment: Can we maybe get rid of the return by putting this exception to the beginning of the method? e.g. ``` if (context == null || context.getProperties() == null) { throw new ... } ``` The other thing I'm thinking of is that it'd be informative to include the hmsTable name in the exception message as well (for this and the above too). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608413) Time Spent: 5h 10m (was: 5h) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 5h 10m > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
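marton-bod's suggestion above amounts to inverting the null checks so the error case exits first and the trailing return disappears. A rough sketch of that shape (deliberately simplified: a plain map instead of EnvironmentContext, IllegalArgumentException instead of MetaException, and placeholder enum values rather than Hive's AlterTableType):

```java
import java.util.EnumSet;
import java.util.Map;

public class AlterOpDemo {

    public enum AlterTableType { ADDCOLS, ADDPROPS, SETPARTITIONSPEC }

    static final String ALTER_TABLE_OPERATION_TYPE = "alterTableOpType";
    static final EnumSet<AlterTableType> SUPPORTED_ALTER_OPS =
            EnumSet.of(AlterTableType.ADDCOLS, AlterTableType.ADDPROPS);

    // Early-exit restructuring: reject a missing context/properties map up
    // front, so the happy path reads top-to-bottom. The table name is
    // included in every error message, as suggested in the review.
    public static AlterTableType resolveAlterOp(Map<String, String> contextProperties,
                                                String tableName) {
        if (contextProperties == null) {
            throw new IllegalArgumentException(
                    "ALTER TABLE operation type could not be determined for table " + tableName);
        }
        String opType = contextProperties.get(ALTER_TABLE_OPERATION_TYPE);
        if (opType == null) {
            // No explicit op type (e.g. a stats-only update): nothing to validate.
            return null;
        }
        AlterTableType op = AlterTableType.valueOf(opType);
        if (!SUPPORTED_ALTER_OPS.contains(op)) {
            throw new IllegalArgumentException("Unsupported ALTER TABLE operation " + op
                    + " for Iceberg table " + tableName + ", must be one of: " + SUPPORTED_ALTER_OPS);
        }
        return op;
    }
}
```

Whether a missing op type should be tolerated (as here and in the patch) or rejected is exactly the behavioral question the reviewers discuss above.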
[jira] [Work logged] (HIVE-25194) Add support for STORED AS ORC/PARQUET/AVRO for Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25194?focusedWorklogId=608411=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608411 ] ASF GitHub Bot logged work on HIVE-25194: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:39 Start Date: 08/Jun/21 11:39 Worklog Time Spent: 10m Work Description: lcspinter commented on a change in pull request #2348: URL: https://github.com/apache/hive/pull/2348#discussion_r647356190 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ## @@ -13477,15 +13477,28 @@ ASTNode analyzeCreateTable( } } -if (partitionTransformSpecExists) { - try { -HiveStorageHandler storageHandler = HiveUtils.getStorageHandler(conf, storageFormat.getStorageHandler()); -if (!storageHandler.supportsPartitionTransform()) { - throw new SemanticException("Partition transform is not supported for " + - storageHandler.getClass().getName()); +HiveStorageHandler handler; +try { + handler = HiveUtils.getStorageHandler(conf, storageFormat.getStorageHandler()); Review comment: Yes, the storage handler can be null in the case of native tables, but this is handled inside of `HiveUtils.getStorageHandler()` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608411) Time Spent: 3h (was: 2h 50m) > Add support for STORED AS ORC/PARQUET/AVRO for Iceberg > -- > > Key: HIVE-25194 > URL: https://issues.apache.org/jira/browse/HIVE-25194 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 3h > Remaining Estimate: 0h > > Currently we have to specify the fileformat in TBLPROPERTIES during Iceberg > create table statements. 
> The ideal syntax would be: > CREATE TABLE tbl STORED BY ICEBERG STORED AS ORC ... > One complication is that currently stored by and stored as are not permitted > within the same query, so that needs to be amended. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608410=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608410 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:39 Start Date: 08/Jun/21 11:39 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2351: URL: https://github.com/apache/hive/pull/2351#discussion_r647356109 ## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java ## @@ -310,6 +335,24 @@ public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTab } } + private void setupAlterOperationType(EnvironmentContext context) throws MetaException { +if (context != null) { + Map contextProperties = context.getProperties(); + if (contextProperties != null) { +String stringOpType = contextProperties.get(ALTER_TABLE_OPERATION_TYPE); +if (stringOpType != null) { + currentAlterTableOp = AlterTableType.valueOf(stringOpType); + if (SUPPORTED_ALTER_OPS.stream().noneMatch(op -> op.equals(currentAlterTableOp))) { +throw new MetaException( +"Unsupported ALTER TABLE operation type for Iceberg tables, must be: " + allowedAlterTypes.toString()); + } +} +return; Review comment: I see. Maybe worth a comment then. Thanks for the explanation! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608410) Time Spent: 5h (was: 4h 50m) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608404=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608404 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:28 Start Date: 08/Jun/21 11:28 Worklog Time Spent: 10m Work Description: szlta commented on a change in pull request #2351: URL: https://github.com/apache/hive/pull/2351#discussion_r647348841 ## File path: iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaUtil.java ## @@ -134,6 +137,23 @@ public static Type convert(TypeInfo typeInfo) { return HiveSchemaConverter.convert(typeInfo, false); } + /** + * Produces the difference of two FieldSchema lists by only taking into account the field name and type. + * @param subtrahendCollection List of fields to subtract from Review comment: Woops, yeah.. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608404) Time Spent: 4h 50m (was: 4h 40m) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 4h 50m > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608402=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608402 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:24 Start Date: 08/Jun/21 11:24 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2351: URL: https://github.com/apache/hive/pull/2351#discussion_r647346665 ## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java ## @@ -84,6 +88,8 @@ // Initially we'd like to cache the partition spec in HMS, but not push it down later to Iceberg during alter // table commands since by then the HMS info can be stale + Iceberg does not store its partition spec in the props InputFormatConfig.PARTITION_SPEC); + private static final Set> SUPPORTED_ALTER_OPS = ImmutableSet.of( Review comment: Maybe we should use EnumSet here instead? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608402) Time Spent: 4h 40m (was: 4.5h) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 4h 40m > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
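The suggestion above — preferring `EnumSet` over a general-purpose immutable set when all members are constants of one enum — can be shown standalone (the enum here is a reduced stand-in for Hive's actual `AlterTableType`):

```java
import java.util.EnumSet;
import java.util.Set;

public class EnumSetDemo {
    // Reduced stand-in for Hive's AlterTableType enum.
    enum AlterTableType { ADDCOLS, ADDPROPS, DROPPROPS, RENAME }

    // EnumSet stores membership as a bit vector over the enum's ordinals,
    // so contains() is a bit test rather than a hash lookup.
    static final Set<AlterTableType> SUPPORTED =
        EnumSet.of(AlterTableType.ADDCOLS, AlterTableType.ADDPROPS, AlterTableType.DROPPROPS);

    public static void main(String[] args) {
        if (!SUPPORTED.contains(AlterTableType.ADDCOLS)) throw new AssertionError();
        if (SUPPORTED.contains(AlterTableType.RENAME)) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Beyond speed, `EnumSet.of(...)` also documents at the declaration site that the set can only ever contain values of that enum.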
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608401=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608401 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:22 Start Date: 08/Jun/21 11:22 Worklog Time Spent: 10m Work Description: szlta commented on a change in pull request #2351: URL: https://github.com/apache/hive/pull/2351#discussion_r647344754 ## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java ## @@ -310,6 +335,24 @@ public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTab } } + private void setupAlterOperationType(EnvironmentContext context) throws MetaException { +if (context != null) { + Map contextProperties = context.getProperties(); + if (contextProperties != null) { +String stringOpType = contextProperties.get(ALTER_TABLE_OPERATION_TYPE); +if (stringOpType != null) { + currentAlterTableOp = AlterTableType.valueOf(stringOpType); + if (SUPPORTED_ALTER_OPS.stream().noneMatch(op -> op.equals(currentAlterTableOp))) { +throw new MetaException( +"Unsupported ALTER TABLE operation type for Iceberg tables, must be: " + allowedAlterTypes.toString()); + } +} +return; Review comment: Yeah I found that it is valid as tests started to fail after the recent refactor :D E.g. for analyze+compute_stats query there's an alter table invocation, where there's no operation type among the context properties. Our hook should not fail for such cases, but rather act as no-op. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608401) Time Spent: 4.5h (was: 4h 20m) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
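The control flow debated in this thread — read an operation type from the alter-table context properties, treat a missing type as a no-op (e.g. stats-only alters), and reject unsupported types — can be sketched independently of Hive. Names and the property key below are illustrative, not Hive's real identifiers:

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class AlterOpValidator {
    enum AlterTableType { ADDCOLS, ADDPROPS, RENAME }

    static final Set<AlterTableType> SUPPORTED =
        EnumSet.of(AlterTableType.ADDCOLS, AlterTableType.ADDPROPS);

    // Returns the validated op, or null for a no-op: some alter invocations
    // (like analyze/compute_stats) carry no operation type at all, and the
    // hook must not fail for them.
    static AlterTableType validate(Map<String, String> contextProps) {
        if (contextProps == null) return null;
        String opType = contextProps.get("alterTableOpType");
        if (opType == null) return null;  // no operation type: act as no-op
        AlterTableType op = AlterTableType.valueOf(opType);
        if (!SUPPORTED.contains(op)) {
            throw new IllegalArgumentException(
                "Unsupported ALTER TABLE operation type, must be one of: " + SUPPORTED);
        }
        return op;
    }

    public static void main(String[] args) {
        if (validate(null) != null) throw new AssertionError();
        if (validate(Map.of()) != null) throw new AssertionError();
        if (validate(Map.of("alterTableOpType", "ADDCOLS")) != AlterTableType.ADDCOLS) throw new AssertionError();
        boolean threw = false;
        try { validate(Map.of("alterTableOpType", "RENAME")); }
        catch (IllegalArgumentException e) { threw = true; }
        if (!threw) throw new AssertionError();
        System.out.println("ok");
    }
}
```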
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608400=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608400 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:21 Start Date: 08/Jun/21 11:21 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2351: URL: https://github.com/apache/hive/pull/2351#discussion_r647344554 ## File path: iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaUtil.java ## @@ -134,6 +137,23 @@ public static Type convert(TypeInfo typeInfo) { return HiveSchemaConverter.convert(typeInfo, false); } + /** + * Produces the difference of two FieldSchema lists by only taking into account the field name and type. + * @param subtrahendCollection List of fields to subtract from Review comment: I think minuend and subtrahend are the other way around in this case, no? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608400) Time Spent: 4h 20m (was: 4h 10m) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 4h 20m > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608399=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608399 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:21 Start Date: 08/Jun/21 11:21 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2351: URL: https://github.com/apache/hive/pull/2351#discussion_r647344554 ## File path: iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaUtil.java ## @@ -134,6 +137,23 @@ public static Type convert(TypeInfo typeInfo) { return HiveSchemaConverter.convert(typeInfo, false); } + /** + * Produces the difference of two FieldSchema lists by only taking into account the field name and type. + * @param subtrahendCollection List of fields to subtract from Review comment: I think minuend and subtrahend are the other way around in this case -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608399) Time Spent: 4h 10m (was: 4h) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 4h 10m > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
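For reference on the naming nit above: in `a − b`, `a` is the minuend (the list subtracted *from*) and `b` is the subtrahend (the list that gets subtracted). A standalone sketch of such a name-and-type field difference, with a simplified stand-in for `FieldSchema`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class FieldDiff {
    // Simplified stand-in for Hive's FieldSchema: only name and type matter here.
    static class Field {
        final String name;
        final String type;
        Field(String name, String type) { this.name = name; this.type = type; }
    }

    // minuend - subtrahend: fields of the minuend not present (by name + type)
    // in the subtrahend. For schema evolution, newSchema - oldSchema yields
    // the added columns.
    static List<Field> difference(List<Field> minuend, List<Field> subtrahend) {
        List<Field> result = new ArrayList<>();
        for (Field f : minuend) {
            boolean present = subtrahend.stream().anyMatch(s ->
                Objects.equals(s.name, f.name) && Objects.equals(s.type, f.type));
            if (!present) {
                result.add(f);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Field> newSchema = List.of(new Field("id", "int"), new Field("name", "string"));
        List<Field> oldSchema = List.of(new Field("id", "int"));
        List<Field> added = difference(newSchema, oldSchema);
        if (added.size() != 1 || !added.get(0).name.equals("name")) throw new AssertionError();
        System.out.println("ok");
    }
}
```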
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608395=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608395 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:05 Start Date: 08/Jun/21 11:05 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2351: URL: https://github.com/apache/hive/pull/2351#discussion_r647334605 ## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java ## @@ -310,6 +335,24 @@ public void rollbackAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTab } } + private void setupAlterOperationType(EnvironmentContext context) throws MetaException { +if (context != null) { + Map contextProperties = context.getProperties(); + if (contextProperties != null) { +String stringOpType = contextProperties.get(ALTER_TABLE_OPERATION_TYPE); +if (stringOpType != null) { + currentAlterTableOp = AlterTableType.valueOf(stringOpType); + if (SUPPORTED_ALTER_OPS.stream().noneMatch(op -> op.equals(currentAlterTableOp))) { +throw new MetaException( +"Unsupported ALTER TABLE operation type for Iceberg tables, must be: " + allowedAlterTypes.toString()); + } +} +return; Review comment: Why is this return here? Is it valid operation where `stringOpType` == null? What is the operation at that time? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608395) Time Spent: 4h (was: 3h 50m) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 4h > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608394=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608394 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 11:02 Start Date: 08/Jun/21 11:02 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2351: URL: https://github.com/apache/hive/pull/2351#discussion_r647332812 ## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java ## @@ -248,6 +256,20 @@ public void preAlterTable(org.apache.hadoop.hive.metastore.api.Table hmsTable, E Collections.emptyMap())); updateHmsTableProperties(hmsTable); } +if (AlterTableType.ADDCOLS.equals(currentAlterTableOp)) { Review comment: nit: newline after block close -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608394) Time Spent: 3h 50m (was: 3h 40m) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 3h 50m > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25193) Vectorized Query Execution: ClassCastException when use nvl() function which default_value is decimal type
[ https://issues.apache.org/jira/browse/HIVE-25193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] qiang.bi updated HIVE-25193: Description: Problem statement: {code:java} set hive.vectorized.execution.enabled = true; select nvl(get_json_object(attr_json,'$.correctedPrice'),0.88) corrected_price from dw_mdm_sync_asset; {code} The error log: {code:java} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.BytesColumnVectorCaused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:504) at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorCoalesce.evaluate(VectorCoalesce.java:124) at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:271) at org.apache.hadoop.hive.ql.exec.vector.expressions.CastStringToDouble.evaluate(CastStringToDouble.java:83) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146) ... 28 more{code} The problem HiveQL: {code:java} nvl(get_json_object(attr_json,'$.correctedPrice'),0.88) corrected_price {code} The problem expression: {code:java} CastStringToDouble(col 39:string)(children: VectorCoalesce(columns [37, 38])(children: VectorUDFAdaptor(get_json_object(_col14, '$.correctedPrice')) -> 37:string, ConstantVectorExpression(val 0.88) -> 38:decimal(2,2)) -> 39:string) -> 40:double {code} The problem code: {code:java} public class VectorCoalesce extends VectorExpression { ... 
  @Override
  public void evaluate(VectorizedRowBatch batch) throws HiveException {

    if (childExpressions != null) {
      super.evaluateChildren(batch);
    }

    int[] sel = batch.selected;
    int n = batch.size;
    ColumnVector outputColVector = batch.cols[outputColumnNum];
    boolean[] outputIsNull = outputColVector.isNull;
    if (n <= 0) {
      // Nothing to do
      return;
    }

    if (unassignedBatchIndices == null || n > unassignedBatchIndices.length) {
      // (Re)allocate larger to be a multiple of 1024 (DEFAULT_SIZE).
      final int roundUpSize =
          ((n + VectorizedRowBatch.DEFAULT_SIZE - 1) / VectorizedRowBatch.DEFAULT_SIZE) * VectorizedRowBatch.DEFAULT_SIZE;
      unassignedBatchIndices = new int[roundUpSize];
    }

    // We do not need to do a column reset since we are carefully changing the output.
    outputColVector.isRepeating = false;

    // CONSIDER: Should be do this for all vector expressions that can
    // work on BytesColumnVector output columns???
    outputColVector.init();

    final int columnCount = inputColumns.length;

    /*
     * Process the input columns to find a non-NULL value for each row.
     *
     * We track the unassigned batchIndex of the rows that have not received
     * a non-NULL value yet. Similar to a selected array.
     */
    boolean isAllUnassigned = true;
    int unassignedColumnCount = 0;
    for (int k = 0; k < inputColumns.length; k++) {
      ColumnVector cv = batch.cols[inputColumns[k]];
      if (cv.isRepeating) {
        if (cv.noNulls || !cv.isNull[0]) {
          /*
           * With a repeating value we can finish all remaining rows.
           */
          if (isAllUnassigned) {
            // No other columns provided non-NULL values. We can return repeated output.
            outputIsNull[0] = false;
            outputColVector.setElement(0, 0, cv);
            outputColVector.isRepeating = true;
            return;
          } else {
            // Some rows have already been assigned values. Assign the remaining.
            // We cannot use copySelected method here.
            for (int i = 0; i < unassignedColumnCount; i++) {
              final int batchIndex = unassignedBatchIndices[i];
              outputIsNull[batchIndex] = false;
              // Our input is repeating (i.e. inputColNumber = 0).
              outputColVector.setElement(batchIndex, 0, cv);
            }
            return;
          }
        } else {
          // Repeated NULLs -- skip this input column.
        }
      } else {
        /*
         * Non-repeating input column. Use any non-NULL values for unassigned rows.
         */
        if (isAllUnassigned) {
          /*
           * No other columns provided non-NULL values. We *may* be able to finish all rows
           * with this input column...
           */
          if (cv.noNulls) {
            // Since no NULLs, we can provide values for all rows.
            if (batch.selectedInUse) {
              for (int i = 0; i < n; i++) {
                final int batchIndex = sel[i];
                outputIsNull[batchIndex] = false;
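The ClassCastException in the report boils down to `VectorCoalesce` assuming every input column shares the output column's vector type, while the planner typed the constant `0.88` as `decimal(2,2)` and the coalesce output as `string`. A toy model of that failure mode (minimal stand-ins, not Hive's real classes):

```java
public class VectorTypeMismatch {
    // Minimal stand-ins for Hive's ColumnVector hierarchy.
    static abstract class ColumnVector {
        abstract void setElement(int outIdx, int inIdx, ColumnVector input);
    }

    static class BytesColumnVector extends ColumnVector {
        String[] values = new String[4];
        void setElement(int outIdx, int inIdx, ColumnVector input) {
            // Mirrors the shape of BytesColumnVector.setElement: it casts the
            // input to its own type unconditionally.
            BytesColumnVector in = (BytesColumnVector) input;
            values[outIdx] = in.values[inIdx];
        }
    }

    static class DecimalColumnVector extends ColumnVector {
        java.math.BigDecimal[] values = { new java.math.BigDecimal("0.88") };
        void setElement(int outIdx, int inIdx, ColumnVector input) { /* unused in this sketch */ }
    }

    public static void main(String[] args) {
        ColumnVector output = new BytesColumnVector();
        ColumnVector decimalConstant = new DecimalColumnVector();
        boolean threw = false;
        try {
            // Coalesce copying a decimal constant into a string output column:
            output.setElement(0, 0, decimalConstant);
        } catch (ClassCastException e) {
            threw = true;  // same failure mode as the stack trace above
        }
        if (!threw) throw new AssertionError();
        System.out.println("ok");
    }
}
```

The fix direction implied by the plan is to make all coalesce branches agree on one vector type before `setElement` is reached, rather than to make `setElement` tolerant of mismatches.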
[jira] [Work logged] (HIVE-25200) Alter table add columns support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25200?focusedWorklogId=608365=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608365 ] ASF GitHub Bot logged work on HIVE-25200: - Author: ASF GitHub Bot Created on: 08/Jun/21 09:46 Start Date: 08/Jun/21 09:46 Worklog Time Spent: 10m Work Description: szlta commented on pull request #2351: URL: https://github.com/apache/hive/pull/2351#issuecomment-856626991 > @szlta: quick question: Would it be possible to create a test where we concurrently try to modify the schema through Hive and change the schema through the Iceberg Java API? yep, added -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608365) Time Spent: 3h 40m (was: 3.5h) > Alter table add columns support for Iceberg tables > -- > > Key: HIVE-25200 > URL: https://issues.apache.org/jira/browse/HIVE-25200 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > > Since Iceberg counts as being a non-native Hive table, addColumn operation > needs to be implemented by the help of Hive meta hooks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24458) Allow access to SArgs without converting to disjunctive normal form
[ https://issues.apache.org/jira/browse/HIVE-24458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated HIVE-24458: -- Fix Version/s: storage-2.7.3 > Allow access to SArgs without converting to disjunctive normal form > --- > > Key: HIVE-24458 > URL: https://issues.apache.org/jira/browse/HIVE-24458 > Project: Hive > Issue Type: Improvement >Reporter: Owen O'Malley >Assignee: Owen O'Malley >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, storage-2.7.3 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > For some use cases, it is useful to have access to the SArg expression in a > non-normalized form. Currently, the SArg only provides the fully normalized > expression. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25217) Move isEligibleForCompaction evaluation under the Initiator thread pool
[ https://issues.apache.org/jira/browse/HIVE-25217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25217: -- Labels: pull-request-available (was: ) > Move isEligibleForCompaction evaluation under the Initiator thread pool > --- > > Key: HIVE-25217 > URL: https://issues.apache.org/jira/browse/HIVE-25217 > Project: Hive > Issue Type: Bug >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Checking eligibility for >1 mil distinct table / partition combinations > can take a while in the Initiator, since all steps are performed in the main > thread. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25217) Move isEligibleForCompaction evaluation under the Initiator thread pool
[ https://issues.apache.org/jira/browse/HIVE-25217?focusedWorklogId=608353=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608353 ] ASF GitHub Bot logged work on HIVE-25217: - Author: ASF GitHub Bot Created on: 08/Jun/21 08:49 Start Date: 08/Jun/21 08:49 Worklog Time Spent: 10m Work Description: deniskuzZ opened a new pull request #2367: URL: https://github.com/apache/hive/pull/2367 …or thread pool ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608353) Remaining Estimate: 0h Time Spent: 10m > Move isEligibleForCompaction evaluation under the Initiator thread pool > --- > > Key: HIVE-25217 > URL: https://issues.apache.org/jira/browse/HIVE-25217 > Project: Hive > Issue Type: Bug >Reporter: Denys Kuzmenko >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Checking for eligibility >1 mil of distinct table / partition combinations > can take a while by the Initiator since all steps are performed in the main > thread. -- This message was sent by Atlassian Jira (v8.3.4#803005)
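The intent of this change — moving the per-table/partition eligibility check off the Initiator's main thread into its worker pool — follows a standard fan-out pattern. A generic sketch under that assumption (not the actual Initiator code; the predicate is a placeholder):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelEligibility {
    // Stand-in for the per-table/partition eligibility check the Initiator runs.
    static boolean isEligibleForCompaction(String tablePartition) {
        return tablePartition.hashCode() % 2 == 0;  // placeholder predicate
    }

    // Fan the checks out over a fixed pool instead of running them serially
    // on the caller's thread.
    static List<String> findEligible(List<String> candidates, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String c : candidates) {
                futures.add(pool.submit(() -> isEligibleForCompaction(c) ? c : null));
            }
            List<String> eligible = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    String r = f.get();
                    if (r != null) eligible.add(r);
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException(e);
                }
            }
            return eligible;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> result = findEligible(List.of("db.t1", "db.t2", "db.t3", "db.t4"), 2);
        for (String r : result) {
            if (!isEligibleForCompaction(r)) throw new AssertionError();
        }
        System.out.println("ok");
    }
}
```

With >1M candidates the win comes from overlapping the metadata lookups inside each check, so the pool size would be tuned to that I/O, not to CPU count.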
[jira] [Commented] (HIVE-16220) Memory leak when creating a table using location and NameNode in HA
[ https://issues.apache.org/jira/browse/HIVE-16220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359157#comment-17359157 ] Ivan Podhornyi commented on HIVE-16220: --- [~james601232] I got the same issue, and after a week of research found a few solutions: remove the .cache() on the DataFrame in your code, because it creates a SessionState which is a full copy of the Session. If not caching the DataFrame is a show stopper for you - [here is a Scala method|https://github.com/apache/spark/blob/1d550c4e90275ab418b9161925049239227f3dc9/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L189], where I guess it is possible to create a clone of the SparkSession for each batch processing and then close it. Just need to check the overhead. > Memory leak when creating a table using location and NameNode in HA > --- > > Key: HIVE-16220 > URL: https://issues.apache.org/jira/browse/HIVE-16220 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 1.2.1, 2.3.4, 3.0.0 > Environment: HDP-2.4.0.0 > HDP-3.1.0.0 >Reporter: Angel Alvarez Pascua >Priority: Major > > The following simple DDL > CREATE TABLE `test`(`field` varchar(1)) LOCATION > 'hdfs://benderHA/apps/hive/warehouse/test' > ends up generating a huge memory leak in the HiveServer2 service. > After two weeks without a restart, the service stops suddenly because of > OutOfMemory errors. > This only happens when we're in an environment in which the NameNode is in > HA; otherwise, nothing (so weird) happens. If the location clause is not > present, everything is also fine. > It seems multiple instances of Hadoop configuration are created when we're > in an HA environment: > > 2.618 instances of "org.apache.hadoop.conf.Configuration", loaded by > "sun.misc.Launcher$AppClassLoader @ 0x4d260de88" > occupy 350.263.816 (81,66%) bytes.
These instances are referenced from one > instance of "java.util.HashMap$Node[]", > loaded by "" > > 5.216 instances of "org.apache.hadoop.conf.Configuration", loaded by > "sun.misc.Launcher$AppClassLoader @ 0x4d260de88" > occupy 699.901.416 (87,32%) bytes. These instances are referenced from one > instance of "java.util.HashMap$Node[]", > loaded by "" -- This message was sent by Atlassian Jira (v8.3.4#803005)
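The leak signature above — thousands of independent `org.apache.hadoop.conf.Configuration` copies pinned by a map — is the classic cost of constructing a fresh configuration object per request instead of sharing one. A generic sketch of the mitigation (plain Java, not Hive/Hadoop code; the `Config` class is a hypothetical stand-in):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedConfigCache {
    // Stand-in for a heavyweight configuration object; Hadoop's Configuration
    // carries all loaded resources, so each retained instance is expensive.
    static class Config {
        final String clusterId;
        Config(String clusterId) { this.clusterId = clusterId; }
    }

    private static final Map<String, Config> CACHE = new ConcurrentHashMap<>();

    // Hand out one shared instance per cluster instead of a fresh copy per request.
    static Config forCluster(String clusterId) {
        return CACHE.computeIfAbsent(clusterId, Config::new);
    }

    public static void main(String[] args) {
        Config a = forCluster("benderHA");
        Config b = forCluster("benderHA");
        if (a != b) throw new AssertionError("expected a single shared instance");
        System.out.println("ok");
    }
}
```

This only works when the shared object is treated as read-only after construction; per-request mutation would force copies again, which is essentially the HA resolution path's problem here.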
[jira] [Work logged] (HIVE-25201) Remove Caffein shading from Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25201?focusedWorklogId=608338=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608338 ] ASF GitHub Bot logged work on HIVE-25201: - Author: ASF GitHub Bot Created on: 08/Jun/21 08:10 Start Date: 08/Jun/21 08:10 Worklog Time Spent: 10m Work Description: pvary merged pull request #2352: URL: https://github.com/apache/hive/pull/2352 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608338) Time Spent: 40m (was: 0.5h) > Remove Caffein shading from Iceberg > --- > > Key: HIVE-25201 > URL: https://issues.apache.org/jira/browse/HIVE-25201 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Since Iceberg moved to the same version as we use (Upgrade Caffeine version > [#2671|https://github.com/apache/iceberg/pull/2671]), we can get rid of the > Caffein shading. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25201) Remove Caffein shading from Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary resolved HIVE-25201. --- Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master. Thanks for the review [~mbod] and [~lpinter] > Remove Caffein shading from Iceberg > --- > > Key: HIVE-25201 > URL: https://issues.apache.org/jira/browse/HIVE-25201 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Since Iceberg moved to the same version as we use (Upgrade Caffeine version > [#2671|https://github.com/apache/iceberg/pull/2671]), we can get rid of the > Caffein shading. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25194) Add support for STORED AS ORC/PARQUET/AVRO for Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25194?focusedWorklogId=608336=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608336 ] ASF GitHub Bot logged work on HIVE-25194: - Author: ASF GitHub Bot Created on: 08/Jun/21 07:56 Start Date: 08/Jun/21 07:56 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2348: URL: https://github.com/apache/hive/pull/2348#discussion_r647203906 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ## @@ -13477,15 +13477,28 @@ ASTNode analyzeCreateTable( } } -if (partitionTransformSpecExists) { - try { -HiveStorageHandler storageHandler = HiveUtils.getStorageHandler(conf, storageFormat.getStorageHandler()); -if (!storageHandler.supportsPartitionTransform()) { - throw new SemanticException("Partition transform is not supported for " + - storageHandler.getClass().getName()); +HiveStorageHandler handler; +try { + handler = HiveUtils.getStorageHandler(conf, storageFormat.getStorageHandler()); Review comment: Do we have an exception for native tables? In my experience sometimes the StorageHandler is `null`, but there might be some other issues here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608336) Time Spent: 2h 50m (was: 2h 40m) > Add support for STORED AS ORC/PARQUET/AVRO for Iceberg > -- > > Key: HIVE-25194 > URL: https://issues.apache.org/jira/browse/HIVE-25194 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently we have to specify the fileformat in TBLPROPERTIES during Iceberg > create table statements. 
> The ideal syntax would be: > CREATE TABLE tbl STORED BY ICEBERG STORED AS ORC ... > One complication is that currently stored by and stored as are not permitted > within the same query, so that needs to be amended. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25194) Add support for STORED AS ORC/PARQUET/AVRO for Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25194?focusedWorklogId=608335=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608335 ] ASF GitHub Bot logged work on HIVE-25194: - Author: ASF GitHub Bot Created on: 08/Jun/21 07:55 Start Date: 08/Jun/21 07:55 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2348: URL: https://github.com/apache/hive/pull/2348#discussion_r647203906 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ## @@ -13477,15 +13477,28 @@ ASTNode analyzeCreateTable( } } -if (partitionTransformSpecExists) { - try { -HiveStorageHandler storageHandler = HiveUtils.getStorageHandler(conf, storageFormat.getStorageHandler()); -if (!storageHandler.supportsPartitionTransform()) { - throw new SemanticException("Partition transform is not supported for " + - storageHandler.getClass().getName()); +HiveStorageHandler handler; +try { + handler = HiveUtils.getStorageHandler(conf, storageFormat.getStorageHandler()); Review comment: Could it be that the `storageFormat.getStorageHandler()` is null? Like for native tables? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608335) Time Spent: 2h 40m (was: 2.5h) > Add support for STORED AS ORC/PARQUET/AVRO for Iceberg > -- > > Key: HIVE-25194 > URL: https://issues.apache.org/jira/browse/HIVE-25194 > Project: Hive > Issue Type: New Feature >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > Currently we have to specify the fileformat in TBLPROPERTIES during Iceberg > create table statements. > The ideal syntax would be: > CREATE TABLE tbl STORED BY ICEBERG STORED AS ORC ... 
> One complication is that currently stored by and stored as are not permitted > within the same query, so that needs to be amended. -- This message was sent by Atlassian Jira (v8.3.4#803005)
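The review comment above asks what happens when `storageFormat.getStorageHandler()` is null, as it is for native (non-storage-handler) tables. A minimal, self-contained sketch of the guard being discussed — the class and method bodies here are hypothetical stand-ins, not the actual Hive code:

```java
// Hypothetical stand-in for the Hive StorageFormat discussed in the review;
// native tables carry no storage handler class, so getStorageHandler() is null.
class StorageFormat {
    private final String storageHandlerClass;
    StorageFormat(String storageHandlerClass) { this.storageHandlerClass = storageHandlerClass; }
    String getStorageHandler() { return storageHandlerClass; }
}

public class PartitionTransformGuard {
    /**
     * Returns an error message when a partition-transform spec is used with a
     * storage format that cannot support it, or null when the spec is valid.
     * The null-handler branch is exactly the native-table case the review
     * comment raises: the handler lookup must be skipped, not attempted.
     */
    static String validate(StorageFormat fmt, boolean partitionTransformSpecExists) {
        String handlerClass = fmt.getStorageHandler();
        if (handlerClass == null) {
            // Native table: no handler to consult; transforms are unsupported.
            return partitionTransformSpecExists
                    ? "Partition transform is not supported for native tables"
                    : null;
        }
        // Stand-in for HiveUtils.getStorageHandler(...).supportsPartitionTransform().
        boolean supportsTransform = handlerClass.contains("Iceberg");
        if (partitionTransformSpecExists && !supportsTransform) {
            return "Partition transform is not supported for " + handlerClass;
        }
        return null;
    }
}
```

The point of the sketch is only the ordering: check for a null handler class before dereferencing it, rather than passing null into the handler lookup.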
[jira] [Work started] (HIVE-25216) Vectorized reading of ORC tables via Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25216 started by Ádám Szita. - > Vectorized reading of ORC tables via Iceberg > > > Key: HIVE-25216 > URL: https://issues.apache.org/jira/browse/HIVE-25216 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > > As [https://github.com/apache/iceberg/pull/2613] is resolved, we should port > it to Hive codebase, to enable vectorized ORC reads on Iceberg-backed tables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25216) Vectorized reading of ORC tables via Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ádám Szita reassigned HIVE-25216: - > Vectorized reading of ORC tables via Iceberg > > > Key: HIVE-25216 > URL: https://issues.apache.org/jira/browse/HIVE-25216 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > > As [https://github.com/apache/iceberg/pull/2613] is resolved, we should port > it to Hive codebase, to enable vectorized ORC reads on Iceberg-backed tables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.
[ https://issues.apache.org/jira/browse/HIVE-25154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Sinha resolved HIVE-25154. - Resolution: Fixed Committed to master. Thanks for the patch, [~haymant] !!! > Disable StatsUpdaterThread and PartitionManagementTask for db that is being > failoved over. > -- > > Key: HIVE-25154 > URL: https://issues.apache.org/jira/browse/HIVE-25154 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Attachments: HIVE-25154.patch > > Time Spent: 5h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.
[ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=608310&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608310 ] ASF GitHub Bot logged work on HIVE-25154: - Author: ASF GitHub Bot Created on: 08/Jun/21 06:55 Start Date: 08/Jun/21 06:55 Worklog Time Spent: 10m Work Description: pkumarsinha commented on pull request #2311: URL: https://github.com/apache/hive/pull/2311#issuecomment-856505462 Committed to master. Thanks for the patch, @hmangla98 !! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608310) Time Spent: 5h (was: 4h 50m) > Disable StatsUpdaterThread and PartitionManagementTask for db that is being > failoved over. > -- > > Key: HIVE-25154 > URL: https://issues.apache.org/jira/browse/HIVE-25154 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Attachments: HIVE-25154.patch > > Time Spent: 5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.
[ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=608311&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608311 ] ASF GitHub Bot logged work on HIVE-25154: - Author: ASF GitHub Bot Created on: 08/Jun/21 06:55 Start Date: 08/Jun/21 06:55 Worklog Time Spent: 10m Work Description: pkumarsinha removed a comment on pull request #2311: URL: https://github.com/apache/hive/pull/2311#issuecomment-856505462 Committed to master. Thanks for the patch, @hmangla98 !! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608311) Time Spent: 5h 10m (was: 5h) > Disable StatsUpdaterThread and PartitionManagementTask for db that is being > failoved over. > -- > > Key: HIVE-25154 > URL: https://issues.apache.org/jira/browse/HIVE-25154 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Attachments: HIVE-25154.patch > > Time Spent: 5h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.
[ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=608309&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608309 ] ASF GitHub Bot logged work on HIVE-25154: - Author: ASF GitHub Bot Created on: 08/Jun/21 06:53 Start Date: 08/Jun/21 06:53 Worklog Time Spent: 10m Work Description: pkumarsinha merged pull request #2311: URL: https://github.com/apache/hive/pull/2311 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608309) Time Spent: 4h 50m (was: 4h 40m) > Disable StatsUpdaterThread and PartitionManagementTask for db that is being > failoved over. > -- > > Key: HIVE-25154 > URL: https://issues.apache.org/jira/browse/HIVE-25154 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Attachments: HIVE-25154.patch > > Time Spent: 4h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.
[ https://issues.apache.org/jira/browse/HIVE-25154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359077#comment-17359077 ] Pravin Sinha commented on HIVE-25154: - +1 > Disable StatsUpdaterThread and PartitionManagementTask for db that is being > failoved over. > -- > > Key: HIVE-25154 > URL: https://issues.apache.org/jira/browse/HIVE-25154 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Attachments: HIVE-25154.patch > > Time Spent: 4h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25189) Cache the validWriteIdList in query cache before fetching tables from HMS
[ https://issues.apache.org/jira/browse/HIVE-25189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa resolved HIVE-25189. --- Resolution: Fixed Pushed to master. Thanks [~scarlin]. > Cache the validWriteIdList in query cache before fetching tables from HMS > - > > Key: HIVE-25189 > URL: https://issues.apache.org/jira/browse/HIVE-25189 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Steve Carlin >Assignee: Steve Carlin >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > For a small performance boost at compile time, we should fetch the > validWriteIdList before fetching the tables. HMS allows these to be batched > together in one call. This will avoid the getTable API from being called > twice, because the first time we call it, we pass in a null for > validWriteIdList. -- This message was sent by Atlassian Jira (v8.3.4#803005)
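The flow described in HIVE-25189 above — resolve and cache the validWriteIdList first, then pass it into the table fetch so getTable is never called with a null write-id list and then called again — can be sketched as follows. All class and method names here are illustrative stand-ins for the query-cache idea, not the actual HMS client API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the described optimization: cache the write-id list
// before fetching the table, so the (simulated) getTable call happens exactly
// once with the resolved list, instead of once with null and once with it.
public class WriteIdCacheSketch {
    private final Map<String, String> validWriteIdCache = new HashMap<>();
    private int tableFetches = 0; // counts simulated getTable calls

    // Stand-in for the metastore call resolving a table's valid write IDs.
    String fetchValidWriteIdList(String qualifiedTable) {
        return "writeIds-for-" + qualifiedTable; // dummy value
    }

    // Stand-in for the metastore getTable call that accepts the write-id list.
    String getTable(String qualifiedTable, String validWriteIdList) {
        tableFetches++;
        return "table(" + qualifiedTable + ", writeIds=" + validWriteIdList + ")";
    }

    String getTableCached(String qualifiedTable) {
        // Resolve and cache the write-id list *before* the table fetch, so the
        // fetch is issued once and never with a null validWriteIdList.
        String writeIds = validWriteIdCache.computeIfAbsent(
                qualifiedTable, this::fetchValidWriteIdList);
        return getTable(qualifiedTable, writeIds);
    }

    int fetchCount() { return tableFetches; }
}
```

The design point is only the ordering and the cache: by the time the table is requested, the write-id list is already known, which is what lets HMS batch the two lookups into one round trip per the issue description.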
[jira] [Work logged] (HIVE-25189) Cache the validWriteIdList in query cache before fetching tables from HMS
[ https://issues.apache.org/jira/browse/HIVE-25189?focusedWorklogId=608304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608304 ] ASF GitHub Bot logged work on HIVE-25189: - Author: ASF GitHub Bot Created on: 08/Jun/21 06:43 Start Date: 08/Jun/21 06:43 Worklog Time Spent: 10m Work Description: kasakrisz merged pull request #2342: URL: https://github.com/apache/hive/pull/2342 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608304) Time Spent: 1h 10m (was: 1h) > Cache the validWriteIdList in query cache before fetching tables from HMS > - > > Key: HIVE-25189 > URL: https://issues.apache.org/jira/browse/HIVE-25189 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Steve Carlin >Assignee: Steve Carlin >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > For a small performance boost at compile time, we should fetch the > validWriteIdList before fetching the tables. HMS allows these to be batched > together in one call. This will avoid the getTable API from being called > twice, because the first time we call it, we pass in a null for > validWriteIdList. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23987) Upgrade arrow version to 0.11.0
[ https://issues.apache.org/jira/browse/HIVE-23987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23987: -- Labels: pull-request-available (was: ) > Upgrade arrow version to 0.11.0 > --- > > Key: HIVE-23987 > URL: https://issues.apache.org/jira/browse/HIVE-23987 > Project: Hive > Issue Type: Improvement >Reporter: Barnabas Maidics >Assignee: Barnabas Maidics >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > As part of [HIVE-23890|https://issues.apache.org/jira/browse/HIVE-23890], > we're introducing flatbuffers as a dependency. > Arrow 0.10.0 has an unofficial flatbuffer dependency, which is incompatible > with the official ones: https://issues.apache.org/jira/browse/ARROW-3175 > It was fixed in 0.11.0. We should upgrade to that version -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23987) Upgrade arrow version to 0.11.0
[ https://issues.apache.org/jira/browse/HIVE-23987?focusedWorklogId=608297&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608297 ] ASF GitHub Bot logged work on HIVE-23987: - Author: ASF GitHub Bot Created on: 08/Jun/21 06:23 Start Date: 08/Jun/21 06:23 Worklog Time Spent: 10m Work Description: jcamachor opened a new pull request #2366: URL: https://github.com/apache/hive/pull/2366 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 608297) Remaining Estimate: 0h Time Spent: 10m > Upgrade arrow version to 0.11.0 > --- > > Key: HIVE-23987 > URL: https://issues.apache.org/jira/browse/HIVE-23987 > Project: Hive > Issue Type: Improvement >Reporter: Barnabas Maidics >Assignee: Barnabas Maidics >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > As part of [HIVE-23890|https://issues.apache.org/jira/browse/HIVE-23890], > we're introducing flatbuffers as a dependency. > Arrow 0.10.0 has an unofficial flatbuffer dependency, which is incompatible > with the official ones: https://issues.apache.org/jira/browse/ARROW-3175 > It was fixed in 0.11.0. We should upgrade to that version -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-20091) Tez: Add security credentials for FileSinkOperator output
[ https://issues.apache.org/jira/browse/HIVE-20091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xi Chen updated HIVE-20091: --- Target Version/s: 3.2.0, 4.0.0 (was: 3.1.0, 4.0.0) > Tez: Add security credentials for FileSinkOperator output > - > > Key: HIVE-20091 > URL: https://issues.apache.org/jira/browse/HIVE-20091 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Fix For: 3.2.0, 4.0.0 > > Attachments: HIVE-20091.01.patch, HIVE-20091.02.patch, > HIVE-20091.03.patch, HIVE-20091.04.patch, HIVE-20091.05.patch, > HIVE-20091.06.patch, HIVE-20091.07.patch, HIVE-20091.08.patch > > > DagUtils needs to add security credentials for the output for the > FileSinkOperator. -- This message was sent by Atlassian Jira (v8.3.4#803005)