[jira] [Work started] (HIVE-26583) Ensure iceberg-catalog tests are executed in ptest

2022-10-03 Thread Zsolt Miskolczi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26583 started by Zsolt Miskolczi.
--
> Ensure iceberg-catalog tests are executed in ptest
> --
>
> Key: HIVE-26583
> URL: https://issues.apache.org/jira/browse/HIVE-26583
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Zsolt Miskolczi
>Priority: Major
>
> When running Iceberg tests locally I discovered that there's a failing case 
> in iceberg-catalog
> {{HiveCreateReplaceTableTest.testReplaceTableTxnTableNotExists}}, failing 
> with 
> {code}
> java.lang.AssertionError: Expected exception message (No such table: 
> hivedb.tbl) missing: Table does not exist: hivedb.tbl
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at 
> org.apache.iceberg.AssertHelpers.handleException(AssertHelpers.java:129)
>   at org.apache.iceberg.AssertHelpers.assertThrows(AssertHelpers.java:47)
>   at 
> org.apache.iceberg.hive.HiveCreateReplaceTableTest.testReplaceTableTxnTableNotExists(HiveCreateReplaceTableTest.java:168)
> {code}
> and it has probably been like that since one of the recent Iceberg dependency 
> upgrades. 
> We should fix this test so that it expects the right type of exception, and 
> ensure that this module is verified as well.
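> 
> A minimal sketch of the kind of fix implied here (hypothetical: it assumes 
> the test keeps using {{AssertHelpers.assertThrows}} and that only the 
> expected message/exception type need updating; the actual patch may differ):
> {code:java}
> // Hypothetical: expect the message the catalog now actually throws.
> AssertHelpers.assertThrows(
>     "Should fail: table does not exist",
>     NoSuchTableException.class,  // assumed exception type
>     "Table does not exist: hivedb.tbl",
>     () -> catalog.newReplaceTableTransaction(TABLE_IDENTIFIER, SCHEMA, false));
> {code}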



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26583) Ensure iceberg-catalog tests are executed in ptest

2022-10-03 Thread Zsolt Miskolczi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Miskolczi reassigned HIVE-26583:
--

Assignee: Zsolt Miskolczi  (was: Ádám Szita)

> Ensure iceberg-catalog tests are executed in ptest
> --
>
> Key: HIVE-26583
> URL: https://issues.apache.org/jira/browse/HIVE-26583
> Project: Hive
>  Issue Type: Bug
>Reporter: Ádám Szita
>Assignee: Zsolt Miskolczi
>Priority: Major
>
> When running Iceberg tests locally I discovered that there's a failing case 
> in iceberg-catalog
> {{HiveCreateReplaceTableTest.testReplaceTableTxnTableNotExists}}, failing 
> with 
> {code}
> java.lang.AssertionError: Expected exception message (No such table: 
> hivedb.tbl) missing: Table does not exist: hivedb.tbl
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at 
> org.apache.iceberg.AssertHelpers.handleException(AssertHelpers.java:129)
>   at org.apache.iceberg.AssertHelpers.assertThrows(AssertHelpers.java:47)
>   at 
> org.apache.iceberg.hive.HiveCreateReplaceTableTest.testReplaceTableTxnTableNotExists(HiveCreateReplaceTableTest.java:168)
> {code}
> and it has probably been like that since one of the recent Iceberg dependency 
> upgrades. 
> We should fix this test so that it expects the right type of exception, and 
> ensure that this module is verified as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26584) compressed_skip_header_footer_aggr.q is flaky

2022-10-03 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612481#comment-17612481
 ] 

Ayush Saxena commented on HIVE-26584:
-

Changing the tables from external to managed seems to change the core of the 
test a bit. We shouldn't change the functional part of the test to fix the 
flakiness, even if the table type might not look relevant to the current test 
case.
 * We can set {{hive.external.table.purge.default}} to true if we want the 
data deleted for the external tables.
 * Second, can the tests not delete the directory first and then use it, in 
case there are leftovers? This should be a one-liner and should get us sorted. 
A couple of tests do something like DROP TABLE IF EXISTS before creates; maybe 
do the same for the filesystem directories, cleaning them up before use in 
case a failed test didn't (a minimal sketch follows this comment).
 * Is there scope to do a recursive delete inside {{system:test.tmp.dir}} 
after every .q test? That should also solve the problem for all the tests 
affected by this issue.

In general we should keep changes to the core of an existing test as minimal 
as possible when getting rid of flakiness.
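
A minimal sketch of the first two bullets (hypothetical .q preamble; the 
directory name is taken from the tests discussed in this issue):
{noformat}
-- have external table data purged when the table is dropped
set hive.external.table.purge.default=true;
-- clear possible leftovers from earlier runs before using the directory
dfs -rm -r -f ${system:test.tmp.dir}/testcase1;
dfs -mkdir -p ${system:test.tmp.dir}/testcase1;
{noformat}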

> compressed_skip_header_footer_aggr.q is flaky
> -
>
> Key: HIVE-26584
> URL: https://issues.apache.org/jira/browse/HIVE-26584
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0-alpha-2
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In one of my PRs, compressed_skip_header_footer_aggr.q was failing with an 
> unexpected diff, such as:
> {code:java}
>  TestMiniLlapLocalCliDriver.testCliDriver:62 Client Execution succeeded but 
> contained differences (error code = 1) after executing 
> compressed_skip_header_footer_aggr.q
> 69,71c69,70
> < 1 2019-12-31
> < 2 2018-12-31
> < 3 2017-12-31
> ---
> > 2 2019-12-31
> > 3 2019-12-31
> 89d87
> < NULL  NULL
> 91c89
> < 2 2018-12-31
> ---
> > 2 2019-12-31
> 100c98
> < 1
> ---
> > 2
> 109c107
> < 1 2019-12-31
> ---
> > 2 2019-12-31
> 127,128c125,126
> < 1 2019-12-31
> < 3 2017-12-31
> ---
> > 2 2019-12-31
> > 3 2019-12-31
> 146a145
> > 2 2019-12-31
> 155c154
> < 1
> ---
> > 2 {code}
> Investigating it, it did not seem to fail when executed locally. Since I 
> suspected test interference, I searched for the table names/directories used 
> and discovered empty_skip_header_footer_aggr.q, which uses the same table 
> names AND external directories.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26533) Column data type is lost when an Avro table with a BYTE column is written through spark-sql

2022-10-03 Thread xsys (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xsys updated HIVE-26533:

Description: 
h3. Describe the bug

We are trying to store a table through the {{spark-sql}} interface with the 
{{Avro}} file format. The table's schema contains a column with the {{BYTE}} 
data type. Additionally, the column's name contains uppercase letters.

When we {{INSERT}} some valid values (e.g. {{-128}}), we see the below 
message:
{code:java}
WARN HiveExternalCatalog: The table schema given by Hive 
metastore(struct<c0:int,c1:int>) is different from the schema when this table 
was created by Spark SQL(struct<c0:int,C1:tinyint>). We have to fall back to 
the table schema from Hive metastore which is not case preserving.{code}
 
Finally, when we perform a {{DESC}} on the table, we observe that the {{BYTE}} 
data type has been converted to {{{}int{}}}, and the case sensitivity of the 
column name has been lost (it is converted to lowercase).
h3. Steps to reproduce

On Spark 3.2.1 (commit {{4f25b3f712}}), using {{spark-sql}} with the Avro 
package:
{code:java}
./bin/spark-sql --packages org.apache.spark:spark-avro_2.12:3.2.1{code}
 
Execute the following:
{code:java}
spark-sql> create table hive_tinyint_avro(c0 INT, C1 BYTE) ROW FORMAT SERDE 
"org.apache.hadoop.hive.serde2.avro.AvroSerDe" STORED AS INPUTFORMAT 
"org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat" OUTPUTFORMAT 
"org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat";
22/08/28 15:44:21 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, 
since hive.security.authorization.manager is set to instance of 
HiveAuthorizerFactory.
Time taken: 0.359 seconds
spark-sql> insert into hive_tinyint_avro select 0, cast(-128 as byte);
22/08/28 15:44:28 WARN HiveExternalCatalog: The table schema given by Hive 
metastore(struct<c0:int,c1:int>) is different from the schema when this table 
was created by Spark SQL(struct<c0:int,C1:tinyint>). We have to fall back to 
the table schema from Hive metastore which is not case preserving.
22/08/28 15:44:29 WARN HiveExternalCatalog: The table schema given by Hive 
metastore(struct<c0:int,c1:int>) is different from the schema when this table 
was created by Spark SQL(struct<c0:int,C1:tinyint>). We have to fall back to 
the table schema from Hive metastore which is not case preserving.
Time taken: 1.605 seconds
spark-sql> desc hive_tinyint_avro;
22/08/28 15:44:32 WARN HiveExternalCatalog: The table schema given by Hive 
metastore(struct<c0:int,c1:int>) is different from the schema when this table 
was created by Spark SQL(struct<c0:int,C1:tinyint>). We have to fall back to 
the table schema from Hive metastore which is not case preserving.
22/08/28 15:44:32 WARN HiveExternalCatalog: The table schema given by Hive 
metastore(struct<c0:int,c1:int>) is different from the schema when this table 
was created by Spark SQL(struct<c0:int,C1:tinyint>). We have to fall back to 
the table schema from Hive metastore which is not case preserving.
c0                      int
c1                      int // Data type and case-sensitivity lost
Time taken: 0.068 seconds, Fetched 2 row(s){code}
h3. Expected behavior

We expect the case sensitivity and data type to be preserved. We tried other 
formats like Parquet & ORC and the outcome is consistent with this expectation.

Here are the logs from our attempt at doing the same with Parquet:
{noformat}
spark-sql> create table hive_tinyint_parquet(c0 INT, C1 BYTE) stored as PARQUET;
Time taken: 0.134 seconds
spark-sql> insert into hive_tinyint_parquet select 0, cast(-128 as byte);
Time taken: 0.995 seconds
spark-sql> desc hive_tinyint_parquet;
c0                      int
C1                      tinyint  // Data type and case-sensitivity preserved
Time taken: 0.092 seconds, Fetched 2 row(s){noformat}
h3. Root Cause

[TypeInfoToSchema|https://github.com/apache/hive/blob/8190d2be7b7165effa62bd21b7d60ef81fb0e4af/serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java#L41]'s
 
[createAvroPrimitive|https://github.com/apache/hive/blob/rel/release-3.1.2/serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java#L124-L132]
 is where Hive's BYTE, SHORT & INT are all converted into Avro's INT:
{code:java}
      case BYTE:
        schema = Schema.create(Schema.Type.INT);
        break;
      case SHORT:
        schema = Schema.create(Schema.Type.INT);
        break;
      case INT:
        schema = Schema.create(Schema.Type.INT);
        break;
{code}
 
Once converted into an Avro schema, we lose track of the actual Hive schema 
specified by the user. Therefore, once TINYINT/BYTE is converted into INT, the 
former is lost in the AvroSerde instance.
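
One conceivable direction (a sketch only, not AvroSerDe's actual behavior): 
record the original Hive type as a custom property on the generated Avro 
schema, so the distinction survives the round trip. The property name here is 
made up:
{code:java}
      case BYTE:
        schema = Schema.create(Schema.Type.INT);
        // Hypothetical: stash the original Hive type for later restoration.
        schema.addProp("hive.original.type", "tinyint");
        break;
{code}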
 

  was:
h3. Describe the bug

We are trying to store a table through the {{spark-sql}} interface with the 
{{Avro}} file format. The table's schema contains a column with the {{BYTE}} 
data type. Additionally, the column's name contains uppercase letters.

When we {{INSERT}} some valid values (e.g. {{{}-128{}}}), we see the below 
message:
{code:java}
WARN HiveExternalCatalog: The tab

[jira] [Assigned] (HIVE-24313) Optimise stats collection for file sizes on cloud storage

2022-10-03 Thread Dmitriy Fingerman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Fingerman reassigned HIVE-24313:


Assignee: Dmitriy Fingerman

> Optimise stats collection for file sizes on cloud storage
> -
>
> Key: HIVE-24313
> URL: https://issues.apache.org/jira/browse/HIVE-24313
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When stats information is not present (e.g. external tables), RelOptHiveTable 
> computes basic stats at runtime.
> The following is the codepath:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java#L598]
> {code:java}
> Statistics stats = StatsUtils.collectStatistics(hiveConf, partitionList,
> hiveTblMetadata, hiveNonPartitionCols, 
> nonPartColNamesThatRqrStats, colStatsCached,
> nonPartColNamesThatRqrStats, true);
>  {code}
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L322]
> {code:java}
> for (Partition p : partList.getNotDeniedPartns()) {
> BasicStats basicStats = 
> basicStatsFactory.build(Partish.buildFor(table, p));
> partStats.add(basicStats);
>   }
>  {code}
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStats.java#L205]
>  
> {code:java}
> try {
> ds = getFileSizeForPath(path);
>   } catch (IOException e) {
> ds = 0L;
>   }
>  {code}
>  
> For a table & query with a large number of partitions, this takes a long 
> time to compute statistics and increases compilation time. It would be good 
> to fix it with a ForkJoinPool, e.g. 
> partList.getNotDeniedPartns().parallelStream() (a sketch follows).
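> 
> A sketch of that suggestion (hypothetical; a real change must ensure the 
> per-partition stats collection is thread-safe):
> {code:java}
> // Hypothetical parallel version of the loop quoted above.
> // Assumes java.util.List and java.util.stream.Collectors are imported.
> List<BasicStats> partStats =
>     partList.getNotDeniedPartns().parallelStream()
>         .map(p -> basicStatsFactory.build(Partish.buildFor(table, p)))
>         .collect(Collectors.toList());
> {code}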
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26584) compressed_skip_header_footer_aggr.q is flaky

2022-10-03 Thread John Sherman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612422#comment-17612422
 ] 

John Sherman commented on HIVE-26584:
-

After reading the tests in more detail, I've modified the posted patch, 
changing both compressed_skip_header_footer_aggr.q and 
empty_skip_header_footer_aggr.q as follows:
1) Changed all the EXTERNAL TABLEs to normal managed tables
2) Added DROPs to the test for the created tables (so the underlying data gets 
removed normally)
3) Removed the dfs commands that created directories and copied the test data
4) Added LOAD DATA commands to populate the tables with the test data
5) Gave the tables unique names between the tests (I find it easier to have 
unique names for debugging)

I find this approach less error-prone and less confusing (LOAD DATA is more 
idiomatic); the rough shape is sketched below. I saw nothing inherent to the 
test cases that required external tables.
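
The rough shape (illustrative only; the table and file names here are made 
up, not the ones in the patch):
{noformat}
CREATE TABLE compressed_skip_hf_t1 (id int, dt date)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  TBLPROPERTIES ("skip.header.line.count"="1");
LOAD DATA LOCAL INPATH '../../data/files/header_footer.csv'
  INTO TABLE compressed_skip_hf_t1;
-- ... queries under test ...
DROP TABLE compressed_skip_hf_t1;
{noformat}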

> compressed_skip_header_footer_aggr.q is flaky
> -
>
> Key: HIVE-26584
> URL: https://issues.apache.org/jira/browse/HIVE-26584
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0-alpha-2
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In one of my PRs, compressed_skip_header_footer_aggr.q was failing with an 
> unexpected diff, such as:
> {code:java}
>  TestMiniLlapLocalCliDriver.testCliDriver:62 Client Execution succeeded but 
> contained differences (error code = 1) after executing 
> compressed_skip_header_footer_aggr.q
> 69,71c69,70
> < 1 2019-12-31
> < 2 2018-12-31
> < 3 2017-12-31
> ---
> > 2 2019-12-31
> > 3 2019-12-31
> 89d87
> < NULL  NULL
> 91c89
> < 2 2018-12-31
> ---
> > 2 2019-12-31
> 100c98
> < 1
> ---
> > 2
> 109c107
> < 1 2019-12-31
> ---
> > 2 2019-12-31
> 127,128c125,126
> < 1 2019-12-31
> < 3 2017-12-31
> ---
> > 2 2019-12-31
> > 3 2019-12-31
> 146a145
> > 2 2019-12-31
> 155c154
> < 1
> ---
> > 2 {code}
> Investigating it, it did not seem to fail when executed locally. Since I 
> suspected test interference, I searched for the table names/directories used 
> and discovered empty_skip_header_footer_aggr.q, which uses the same table 
> names AND external directories.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26502) Improve LDAP auth to support include generic user filters

2022-10-03 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam resolved HIVE-26502.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Fix has been merged to master. Thank you for the review [~dengzh]

> Improve LDAP auth to support include generic user filters
> -
>
> Key: HIVE-26502
> URL: https://issues.apache.org/jira/browse/HIVE-26502
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0-alpha-1
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
> Fix For: 4.0.0
>
>
> Currently, Hive's LDAP user filtering is based on configuring a set of 
> patterns in which wildcards are replaced by usernames and searched for. 
> While this model supports advanced filtering options, where a corporate LDAP 
> can have users in different orgs and trees, it does not quite support 
> generic LDAP searches like this:
> (&(uid={0})(objectClass=person))
> To be able to support this without making changes to the semantics of 
> existing configuration params, and to be backward compatible, we can enhance 
> the existing custom query functionality to support this.
> For example, with a configuration like this, we should be able to search for 
> a user whose uid matches the username being authenticated:
> {noformat}
>   <property>
>     <name>hive.server2.authentication.ldap.baseDN</name>
>     <value>dc=apache,dc=org</value>
>   </property>
>   <property>
>     <name>hive.server2.authentication.ldap.customLDAPQuery</name>
>     <value>(&(uid={0})(objectClass=person))</value>
>   </property>
> {noformat}
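> 
> The {0} substitution this implies (a sketch; Hive's LDAP code may resolve 
> the placeholder differently):
> {code:java}
> // Hypothetical: expand the custom query for the user being authenticated.
> String query = "(&(uid={0})(objectClass=person))";
> String filter = java.text.MessageFormat.format(query, "someuser");
> // filter -> (&(uid=someuser)(objectClass=person))
> {code}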



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26592) TestMiniLlapLocalCliDriver test runs throw NoSuchMethodError since log4j 2.18 upgrade

2022-10-03 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hankó Gergely resolved HIVE-26592.
--
  Assignee: Hankó Gergely
Resolution: Information Provided

> TestMiniLlapLocalCliDriver test runs throw NoSuchMethodError since log4j 2.18 
> upgrade
> -
>
> Key: HIVE-26592
> URL: https://issues.apache.org/jira/browse/HIVE-26592
> Project: Hive
>  Issue Type: Bug
>Reporter: Hankó Gergely
>Assignee: Hankó Gergely
>Priority: Major
>
> The issue exists since 
> [https://github.com/apache/hive/commit/c9e7f5dd6191636232921279acc1a5dd5a6fcaff]
> {code:java}
> [INFO] Running org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.03 
> s <<< FAILURE! - in org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
> [ERROR] org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver  Time elapsed: 
> 8.028 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.apache.logging.log4j.util.StackLocatorUtil.getCallerClassLoader(I)Ljava/lang/ClassLoader;
>         at org.apache.log4j.Logger.getLogger(Logger.java:35)
>         at 
> org.apache.hadoop.hive.ql.udf.esri.ST_GeometryRelational.(ST_GeometryRelational.java:36)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
>         at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at 
> org.apache.hive.common.util.ReflectionUtil.newInstance(ReflectionUtil.java:83)
>         at 
> org.apache.hadoop.hive.ql.exec.Registry.registerGenericUDF(Registry.java:193)
>         at 
> org.apache.hadoop.hive.ql.exec.Registry.registerFunction(Registry.java:128)
>         at 
> org.apache.hadoop.hive.ql.exec.Registry.registerFunction(Registry.java:115)
>         at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.(FunctionRegistry.java:689)
>         at 
> org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:345)
>         at 
> org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:325)
>         at org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:551)
>         at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:443)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:430)
>         at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:386)
>         at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.createHiveDB(BaseSemanticAnalyzer.java:291)
>         at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.(BaseSemanticAnalyzer.java:269)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.(SemanticAnalyzer.java:477)
>         at org.apache.hadoop.hive.ql.QTestUtil.postInit(QTestUtil.java:565)
>         at 
> org.apache.hadoop.hive.cli.control.CliAdapter$1$1.evaluate(CliAdapter.java:88)
>         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>         at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>         at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>         at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369)
>         at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275)
>         at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239)
>         at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160)
>         at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373)
>         at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334)
>         at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119)
>         at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407)[INFO]
>  
> [INFO] Results:
> [INFO] 
> [ERROR] Errors: 
> [ERROR]   
> TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
>  » NoSuchMethod
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-26592) TestMiniLlapLocalCliDriver test runs throw NoSuchMethodError since log4j 2.18 upgrade

2022-10-03 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-26592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612413#comment-17612413
 ] 

Hankó Gergely edited comment on HIVE-26592 at 10/3/22 6:59 PM:
---

My problem was the following:
 * I always skip building :hive-it-druid because it causes problems, and there 
was a stale version of it in my Maven repo.
 * :hive-it-druid is used by :hive-it-qfile.
 * The hive-it-druid jar bundles StackLocatorUtil from log4j-api, and the 
stale jar contained an old version of that class.

So rebuilding the project including :hive-it-druid solves the problem, though 
I'm not sure compiling log4j-api into the hive-it-druid jar is a good idea.


was (Author: ghanko):
My problem was the following: * I always skip building :hive-it-druid because 
it causes problems and there was an old version of it in my maven repo
 * :hive-it-druid is used by :hive-it-qfile
 * hive-it-druid jar contains StackLocatorUtil from log4j-api and the old 
version contained an old version of it

So rebuilding the project including :hive-it-druid solves the problem but I'm 
not sure if compiling log4j-api into hive-it-druid jar is a good idea though.
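
A generic way to check which log4j-api version a module actually pulls in 
(standard Maven, nothing Hive-specific):
{noformat}
mvn dependency:tree -Dincludes=org.apache.logging.log4j:log4j-api
{noformat}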

> TestMiniLlapLocalCliDriver test runs throw NoSuchMethodError since log4j 2.18 
> upgrade
> -
>
> Key: HIVE-26592
> URL: https://issues.apache.org/jira/browse/HIVE-26592
> Project: Hive
>  Issue Type: Bug
>Reporter: Hankó Gergely
>Assignee: Hankó Gergely
>Priority: Major
>
> The issue exists since 
> [https://github.com/apache/hive/commit/c9e7f5dd6191636232921279acc1a5dd5a6fcaff]
> {code:java}
> [INFO] Running org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.03 
> s <<< FAILURE! - in org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
> [ERROR] org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver  Time elapsed: 
> 8.028 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.apache.logging.log4j.util.StackLocatorUtil.getCallerClassLoader(I)Ljava/lang/ClassLoader;
>         at org.apache.log4j.Logger.getLogger(Logger.java:35)
>         at 
> org.apache.hadoop.hive.ql.udf.esri.ST_GeometryRelational.(ST_GeometryRelational.java:36)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
>         at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at 
> org.apache.hive.common.util.ReflectionUtil.newInstance(ReflectionUtil.java:83)
>         at 
> org.apache.hadoop.hive.ql.exec.Registry.registerGenericUDF(Registry.java:193)
>         at 
> org.apache.hadoop.hive.ql.exec.Registry.registerFunction(Registry.java:128)
>         at 
> org.apache.hadoop.hive.ql.exec.Registry.registerFunction(Registry.java:115)
>         at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.(FunctionRegistry.java:689)
>         at 
> org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:345)
>         at 
> org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:325)
>         at org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:551)
>         at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:443)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:430)
>         at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:386)
>         at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.createHiveDB(BaseSemanticAnalyzer.java:291)
>         at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.(BaseSemanticAnalyzer.java:269)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.(SemanticAnalyzer.java:477)
>         at org.apache.hadoop.hive.ql.QTestUtil.postInit(QTestUtil.java:565)
>         at 
> org.apache.hadoop.hive.cli.control.CliAdapter$1$1.evaluate(CliAdapter.java:88)
>         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>         at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>         at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>         at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369)
>         at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275)
>         at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239)
>         at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160)
>         at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java

[jira] [Commented] (HIVE-26592) TestMiniLlapLocalCliDriver test runs throw NoSuchMethodError since log4j 2.18 upgrade

2022-10-03 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-26592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612413#comment-17612413
 ] 

Hankó Gergely commented on HIVE-26592:
--

My problem was the following:
 * I always skip building :hive-it-druid because it causes problems, and there 
was a stale version of it in my Maven repo.
 * :hive-it-druid is used by :hive-it-qfile.
 * The hive-it-druid jar bundles StackLocatorUtil from log4j-api, and the 
stale jar contained an old version of that class.

So rebuilding the project including :hive-it-druid solves the problem, though 
I'm not sure compiling log4j-api into the hive-it-druid jar is a good idea.

> TestMiniLlapLocalCliDriver test runs throw NoSuchMethodError since log4j 2.18 
> upgrade
> -
>
> Key: HIVE-26592
> URL: https://issues.apache.org/jira/browse/HIVE-26592
> Project: Hive
>  Issue Type: Bug
>Reporter: Hankó Gergely
>Priority: Major
>
> The issue exists since 
> [https://github.com/apache/hive/commit/c9e7f5dd6191636232921279acc1a5dd5a6fcaff]
> {code:java}
> [INFO] Running org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.03 
> s <<< FAILURE! - in org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
> [ERROR] org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver  Time elapsed: 
> 8.028 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.apache.logging.log4j.util.StackLocatorUtil.getCallerClassLoader(I)Ljava/lang/ClassLoader;
>         at org.apache.log4j.Logger.getLogger(Logger.java:35)
>         at 
> org.apache.hadoop.hive.ql.udf.esri.ST_GeometryRelational.(ST_GeometryRelational.java:36)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
>         at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at 
> org.apache.hive.common.util.ReflectionUtil.newInstance(ReflectionUtil.java:83)
>         at 
> org.apache.hadoop.hive.ql.exec.Registry.registerGenericUDF(Registry.java:193)
>         at 
> org.apache.hadoop.hive.ql.exec.Registry.registerFunction(Registry.java:128)
>         at 
> org.apache.hadoop.hive.ql.exec.Registry.registerFunction(Registry.java:115)
>         at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.(FunctionRegistry.java:689)
>         at 
> org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:345)
>         at 
> org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:325)
>         at org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:551)
>         at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:443)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:430)
>         at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:386)
>         at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.createHiveDB(BaseSemanticAnalyzer.java:291)
>         at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.(BaseSemanticAnalyzer.java:269)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.(SemanticAnalyzer.java:477)
>         at org.apache.hadoop.hive.ql.QTestUtil.postInit(QTestUtil.java:565)
>         at 
> org.apache.hadoop.hive.cli.control.CliAdapter$1$1.evaluate(CliAdapter.java:88)
>         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>         at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>         at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>         at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369)
>         at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275)
>         at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239)
>         at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160)
>         at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373)
>         at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334)
>         at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119)
>         at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407)[INFO]
>  
> [INFO] Results:
> [INFO] 
> [ERROR] Errors: 
> [ERROR]   
> TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
>  » NoSuchMethod
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-26584) compressed_skip_header_footer_aggr.q is flaky

2022-10-03 Thread John Sherman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612407#comment-17612407
 ] 

John Sherman edited comment on HIVE-26584 at 10/3/22 6:19 PM:
--

After digging in deeper - you are correct, it is not a concurrency issue. It 
just happened to be the easiest way to repro, and I mistakenly thought it was 
the root of the issue (before we had the containerized ptest framework, test 
conflicts were somewhat common, iirc).

Here is what I think is happening:
1. During PR testing TestMiniLlapLocalCliDriver tests get split into 32 
different splits
[https://github.com/apache/hive/blob/master/itests/bin/generate-cli-splits.sh]
[https://github.com/apache/hive/blob/4170e566143e6daa291654e97116199aa738377c/itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestMiniLlapLocalCliDriver.java#L39]
(It codegens 32 new TestMiniLlapLocalCliDriver objects each with split0 - 
split32 in the package name)

2. Test assignment for each split is handled via runtime introspection of the 
class name (sketched after this comment):
[https://github.com/apache/hive/blob/4170e566143e6daa291654e97116199aa738377c/itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestMiniLlapLocalCliDriver.java#L43]
[https://github.com/apache/hive/blob/4170e566143e6daa291654e97116199aa738377c/itests/util/src/main/java/org/apache/hadoop/hive/cli/control/SplitSupport.java#L46]

In my PR's case:
empty_skip_header_footer_aggr.q gets assigned to split-7:
{code:java}

{code}
compressed_skip_header_footer_aggr.q gets assigned to split-4:
{code:java}

{code}
3. All test splits are divided across 20 executors (not sure where this 
lives, maybe the Jenkins scripts); split-7 and split-4 get assigned to the 
same "execution split", number 14:
{code:java}
split-14/itests/qtest/target/surefire-reports/TEST-org.apache.hadoop.hive.cli.split7.TestMiniLlapLocalCliDriver.xml
144:  

split-14/itests/qtest/target/surefire-reports/TEST-org.apache.hadoop.hive.cli.split4.TestMiniLlapLocalCliDriver.xml
165:  
{code}
4. empty_skip_header_footer_aggr gets executed before 
compressed_skip_header_footer_aggr (this can be seen above in that 144 is 
before 165 in the test xml)

5. Both empty_skip_header_footer_aggr and compressed_skip_header_footer_aggr 
create external tables with the data copied to the same location(s). 
For example, these locations get used in both tests:
${system:test.tmp.dir}/testcase1
${system:test.tmp.dir}/testcase2
Since each test invocation ends up using the same paths and the tmp directory 
is not cleaned between tests, this is where the conflict occurs.

6. empty_skip_header_footer_aggr includes rmr commands to clean up the 
testcase1 and testcase2 directories:
[https://github.com/apache/hive/blob/4170e566143e6daa291654e97116199aa738377c/ql/src/test/queries/clientpositive/empty_skip_header_footer_aggr.q#L6]

compressed_skip_header does not:
[https://github.com/apache/hive/blob/4170e566143e6daa291654e97116199aa738377c/ql/src/test/queries/clientpositive/compressed_skip_header_footer_aggr.q#L1]

This also likely explains why it is not reproducible via:
{code:java}
mvn test -Dtest=TestMiniLlapLocalCliDriver 
-Dqfile=compressed_skip_header_footer_aggr.q,empty_skip_header_footer_aggr.q
{code}
I think the order of the tests when executed this way is always 
compressed_skip_header_footer_aggr.q and then empty_skip_header_footer_aggr.q

My fix ends up working because I give a unique location to each test's 
external data files.

I'll likely modify empty_skip_header_footer_aggr.q to remove the rmr's 
(because the only thing they really do is hide this problem) and give all the 
files/directories unique names. We could also add a "unique external 
directory" variable that is generated per testcase and cleaned up after each 
one (or some other solution), but I think that is out of scope for this 
ticket.
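
A sketch of the class-name split assignment from step 2 (assumed shape; see 
the linked SplitSupport.java for the real logic):
{code:java}
// Hypothetical: parse the split index out of the generated package name
// (org.apache.hadoop.hive.cli.splitN.*) and keep every N_SPLITS-th qfile.
// Assumes java.util.* and java.util.regex.* imports; clazz, qfiles and
// N_SPLITS (32 here) stand in for the real fields.
Matcher m = Pattern.compile("split(\\d+)").matcher(clazz.getName());
int split = m.find() ? Integer.parseInt(m.group(1)) : 0;
List<File> mine = new ArrayList<>();
for (int i = 0; i < qfiles.size(); i++) {
  if (i % N_SPLITS == split) {
    mine.add(qfiles.get(i));
  }
}
{code}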


was (Author: jfs):
After digging in deeper - You are correct, it is not a concurrent issue. It 
just happened to be the easiest way to repro and I mistakenly thought it was 
the root of the issue (before we had the containerized ptest framework, test 
conflicts were somewhat common iirc).

Here is what is what I think is happening:
1. During PR testing TestMiniLlapLocalCliDriver tests get split into 32 
different splits
[https://github.com/apache/hive/blob/master/itests/bin/generate-cli-splits.sh]
[https://github.com/apache/hive/blob/4170e566143e6daa291654e97116199aa738377c/itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestMiniLlapLocalCliDriver.java#L39]
(It codegens 32 new TestMiniLlapLocalCliDriver objects each with split0 - 
split32 in the package name)

2. Test assignment for each split is handled via runtime introspection of the 
class name:
[https://github.com/apache/hive/blob/4170e566143e6daa291654e97116199aa738377c/itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestMiniLlapLocalCliDriver.java#L43]
[https://github.com/apache/

[jira] [Commented] (HIVE-26584) compressed_skip_header_footer_aggr.q is flaky

2022-10-03 Thread John Sherman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612407#comment-17612407
 ] 

John Sherman commented on HIVE-26584:
-

After digging in deeper - you are correct, it is not a concurrency issue. It 
just happened to be the easiest way to repro, and I mistakenly thought it was 
the root of the issue (before we had the containerized ptest framework, test 
conflicts were somewhat common, iirc).

Here is what I think is happening:
1. During PR testing TestMiniLlapLocalCliDriver tests get split into 32 
different splits
[https://github.com/apache/hive/blob/master/itests/bin/generate-cli-splits.sh]
[https://github.com/apache/hive/blob/4170e566143e6daa291654e97116199aa738377c/itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestMiniLlapLocalCliDriver.java#L39]
(It codegens 32 new TestMiniLlapLocalCliDriver objects each with split0 - 
split32 in the package name)

2. Test assignment for each split is handled via runtime introspection of the 
class name:
[https://github.com/apache/hive/blob/4170e566143e6daa291654e97116199aa738377c/itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestMiniLlapLocalCliDriver.java#L43]
[https://github.com/apache/hive/blob/4170e566143e6daa291654e97116199aa738377c/itests/util/src/main/java/org/apache/hadoop/hive/cli/control/SplitSupport.java#L46]

In my PR's case:
empty_skip_header_footer_aggr.q gets assigned to split-7:
{code:java}

{code}
compressed_skip_header_footer_aggr.q gets assigned to split-4:
{code:java}

{code}
3. All test splits are divided across 20 executors (not sure where this 
lives, maybe the Jenkins scripts); split-7 and split-4 get assigned to the 
same "execution split", number 14:
{code:java}
split-14/itests/qtest/target/surefire-reports/TEST-org.apache.hadoop.hive.cli.split7.TestMiniLlapLocalCliDriver.xml
144:  

split-14/itests/qtest/target/surefire-reports/TEST-org.apache.hadoop.hive.cli.split4.TestMiniLlapLocalCliDriver.xml
165:  
{code}
4. empty_skip_header_footer_aggr gets executed before 
compressed_skip_header_footer_aggr (this can be seen above in that 144 is 
before 165 in the test xml)

5. Both empty_skip_header_footer_aggr and compressed_skip_header_footer_aggr 
create external tables with the data copied to the same location(s). 
For example, these locations get used in both tests:
${system:test.tmp.dir}/testcase1
${system:test.tmp.dir}/testcase2
Since each test invocation ends up using the same paths and the tmp directory 
is not cleaned between tests, this is where the conflict occurs.

6. empty_skip_header_footer_aggr includes rmr commands to clean up the 
testcase1 and testcase2 directories:
[https://github.com/apache/hive/blob/4170e566143e6daa291654e97116199aa738377c/ql/src/test/queries/clientpositive/empty_skip_header_footer_aggr.q#L6]

compressed_skip_header does not:
[https://github.com/apache/hive/blob/4170e566143e6daa291654e97116199aa738377c/ql/src/test/queries/clientpositive/compressed_skip_header_footer_aggr.q#L1]

This also likely explains why it is not reproducible via:
{code:java}
mvn test -Dtest=TestMiniLlapLocalCliDriver 
-Dqfile=compressed_skip_header_footer_aggr.q,empty_skip_header_footer_aggr.q
{code}
I think the order of the tests when executed this way is always 
compressed_skip_header_footer_aggr.q and then empty_skip_header_footer_aggr.q

My fix ends up working because I give a unique location to each test's 
external data files.

I'll likely modify empty_skip_header_footer_aggr.q to remove the rmr's 
(because the only thing they do is hide this problem) and give all the 
files/directories unique names. We could also add a "unique external 
directory" variable that is generated per testcase and cleaned up after each 
one (or some other solution), but I think that is out of scope for this 
ticket.

> compressed_skip_header_footer_aggr.q is flaky
> -
>
> Key: HIVE-26584
> URL: https://issues.apache.org/jira/browse/HIVE-26584
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0-alpha-2
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> One of my PRs compressed_skip_header_footer_aggr.q  was failing with 
> unexpected diff. Such as:
> {code:java}
>  TestMiniLlapLocalCliDriver.testCliDriver:62 Client Execution succeeded but 
> contained differences (error code = 1) after executing 
> compressed_skip_header_footer_aggr.q
> 69,71c69,70
> < 1 2019-12-31
> < 2 2018-12-31
> < 3 2017-12-31
> ---
> > 2 2019-12-31
> > 3 2019-12-31
> 89d87
> < NULL  NULL
> 91c89
> < 2 2018-12-31
> ---
> > 2 2019-12-31
> 100c98
> < 1
> ---
> > 2
> 109c107
> < 1 2019-12-31
> ---
> > 2 2019-12-31
> 127,128c125,126
> < 1 2019-12-31
> < 3 2017-

[jira] [Updated] (HIVE-26592) TestMiniLlapLocalCliDriver test runs throw NoSuchMethodError since log4j 2.18 upgrade

2022-10-03 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hankó Gergely updated HIVE-26592:
-
Description: 
The issue exists since 
[https://github.com/apache/hive/commit/c9e7f5dd6191636232921279acc1a5dd5a6fcaff]
{code:java}
[INFO] Running org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.03 s 
<<< FAILURE! - in org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
[ERROR] org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver  Time elapsed: 
8.028 s  <<< ERROR!
java.lang.NoSuchMethodError: 
org.apache.logging.log4j.util.StackLocatorUtil.getCallerClassLoader(I)Ljava/lang/ClassLoader;
        at org.apache.log4j.Logger.getLogger(Logger.java:35)
        at 
org.apache.hadoop.hive.ql.udf.esri.ST_GeometryRelational.(ST_GeometryRelational.java:36)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.hive.common.util.ReflectionUtil.newInstance(ReflectionUtil.java:83)
        at 
org.apache.hadoop.hive.ql.exec.Registry.registerGenericUDF(Registry.java:193)
        at 
org.apache.hadoop.hive.ql.exec.Registry.registerFunction(Registry.java:128)
        at 
org.apache.hadoop.hive.ql.exec.Registry.registerFunction(Registry.java:115)
        at 
org.apache.hadoop.hive.ql.exec.FunctionRegistry.(FunctionRegistry.java:689)
        at 
org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:345)
        at 
org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:325)
        at org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:551)
        at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:443)
        at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:430)
        at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:386)
        at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.createHiveDB(BaseSemanticAnalyzer.java:291)
        at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.(BaseSemanticAnalyzer.java:269)
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.(SemanticAnalyzer.java:477)
        at org.apache.hadoop.hive.ql.QTestUtil.postInit(QTestUtil.java:565)
        at 
org.apache.hadoop.hive.cli.control.CliAdapter$1$1.evaluate(CliAdapter.java:88)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
        at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369)
        at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275)
        at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239)
        at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160)
        at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373)
        at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334)
        at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119)
        at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407)[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Errors: 
[ERROR]   
TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
 » NoSuchMethod
 {code}

  was:
The issue exists since since 
[https://github.com/apache/hive/commit/c9e7f5dd6191636232921279acc1a5dd5a6fcaff]
{code:java}
[INFO] Running org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.03 s 
<<< FAILURE! - in org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
[ERROR] org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver  Time elapsed: 
8.028 s  <<< ERROR!
java.lang.NoSuchMethodError: 
org.apache.logging.log4j.util.StackLocatorUtil.getCallerClassLoader(I)Ljava/lang/ClassLoader;
        at org.apache.log4j.Logger.getLogger(Logger.java:35)
        at 
org.apache.hadoop.hive.ql.udf.esri.ST_GeometryRelational.(ST_GeometryRelational.java:36)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.hive.common.util.ReflectionUtil.newIns

[jira] [Updated] (HIVE-26591) libthrift 0.14.0 onwards doesn't works with Hive (All versions)

2022-10-03 Thread Pratik Malani (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratik Malani updated HIVE-26591:
-
Description: 
libthrift:0.13.0 is affected by 
[CVE-2020-13949|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-13949]

Currently I am using Spark 3.3.0 and Hive 2.3.9 with Hadoop 3.3.4.

When we upgrade to the libthrift 0.14.0 (or later) jar, the below exception 
is thrown while starting the Spark Thriftserver.
{noformat}
org.apache.hive.service.ServiceException: Failed to Start HiveServer2
        at 
org.apache.hive.service.CompositeService.start(CompositeService.java:79)
        at 
org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoSuchMethodError: 
org.apache.thrift.server.TThreadPoolServer$Args.requestTimeout(I)Lorg/apache/thrift/server/TThreadPoolServer$Args;
        at 
org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.initializeServer(ThriftBinaryCLIService.java:101)
        at 
org.apache.hive.service.cli.thrift.ThriftCLIService.start(ThriftCLIService.java:176)
        at 
org.apache.hive.service.CompositeService.start(CompositeService.java:69)
        at 
org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{noformat}
After detailed investigation, I found that since version 0.14.0 the 
requestTimeout property has been removed from the class 
org.apache.thrift.server.TThreadPoolServer.
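
One backward-compatible direction (a sketch, not a committed fix; 
serverTransport and requestTimeoutSeconds are assumed names): set the request 
timeout reflectively so the same code links against both libthrift lines.
{code:java}
TThreadPoolServer.Args args = new TThreadPoolServer.Args(serverTransport);
try {
  // Args.requestTimeout(int) exists in libthrift <= 0.13 only.
  TThreadPoolServer.Args.class
      .getMethod("requestTimeout", int.class)
      .invoke(args, requestTimeoutSeconds);
} catch (ReflectiveOperationException e) {
  // libthrift 0.14+: the setter was removed; nothing to configure here.
}
{code}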

 

 

[https://jar-download.com/artifacts/org.apache.thrift/libthrift/0.13.0/source-code/org/apache/thrift/server/TThreadPoolServer.java]

The code snippet below is from libthrift:0.13.0; the attributes shown have 
been removed from libthrift:0.14.0 onwards.

!image-2022-10-03-19-55-16-030.png|width=428,height=88!

 

 

Even the latest Hive release (3.1.3) still references the requestTimeout 
attribute that has been removed.

[https://jar-download.com/artifacts/org.apache.hive/hive-service/3.1.3/source-code/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java]

!image-2022-10-03-19-51-20-052.png|width=769,height=98!

Can we have any

[jira] [Updated] (HIVE-26591) libthrift 0.14.0 onwards doesn't works with Hive (All versions)

2022-10-03 Thread Pratik Malani (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratik Malani updated HIVE-26591:
-
Description: 
libthrift:0.13.0 is affected by 
[CVE-2020-13949|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-13949]

Currently I am using Spark 3.3.0 and Hive 2.3.9 with Hadoop 3.3.4.

When we upgrade to the libthrift 0.14.0 (or later) jar, the below exception 
is thrown while starting the Spark Thriftserver.
{noformat}
org.apache.hive.service.ServiceException: Failed to Start HiveServer2
        at 
org.apache.hive.service.CompositeService.start(CompositeService.java:79)
        at 
org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoSuchMethodError: 
org.apache.thrift.server.TThreadPoolServer$Args.requestTimeout(I)Lorg/apache/thrift/server/TThreadPoolServer$Args;
        at 
org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.initializeServer(ThriftBinaryCLIService.java:101)
        at 
org.apache.hive.service.cli.thrift.ThriftCLIService.start(ThriftCLIService.java:176)
        at 
org.apache.hive.service.CompositeService.start(CompositeService.java:69)
        at 
org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{noformat}
After detailed investigation, I found that since version 0.14.0 the 
requestTimeout property has been removed from the class 
org.apache.thrift.server.TThreadPoolServer.

 

[https://jar-download.com/artifacts/org.apache.thrift/libthrift/0.13.0/source-code/org/apache/thrift/server/TThreadPoolServer.java]

The code snippet below is from libthrift:0.13.0; the attributes shown have 
been removed from libthrift:0.14.0 onwards.

!image-2022-10-03-19-55-16-030.png|width=428,height=88!

 

Even the latest Hive release (3.1.3) still references the requestTimeout 
attribute that has been removed.

[https://jar-download.com/artifacts/org.apache.hive/hive-service/3.1.3/source-code/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java]

!image-2022-10-03-19-51-20-052.png|width=769,height=98!

Can we have any alter

[jira] [Updated] (HIVE-26591) libthrift 0.14.0 onwards doesn't works with Hive (All versions)

2022-10-03 Thread Pratik Malani (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratik Malani updated HIVE-26591:
-
Attachment: image-2022-10-03-19-55-16-030.png

> libthrift 0.14.0 onwards doesn't work with Hive (All versions)
> ---
>
> Key: HIVE-26591
> URL: https://issues.apache.org/jira/browse/HIVE-26591
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Affects Versions: 1.2.2, 2.3.7, 2.3.9
>Reporter: Pratik Malani
>Assignee: Navis Ryu
>Priority: Critical
> Fix For: 3.1.3, 4.0.0
>
> Attachments: image-2022-10-03-19-51-20-052.png, 
> image-2022-10-03-19-55-16-030.png
>
>
> libthrift:0.13.0 is affected by 
> [CVE-2020-13949|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-13949]
> Currently I am using Spark 3.3.0 and Hive 2.3.9 with Hadoop 3.3.4.
> When we upgrade to libthrift 0.14.0 or a later jar, the exception below is 
> thrown while starting the Spark Thriftserver.
> {noformat}
> org.apache.hive.service.ServiceException: Failed to Start HiveServer2
>         at 
> org.apache.hive.service.CompositeService.start(CompositeService.java:79)
>         at 
> org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.thrift.server.TThreadPoolServer$Args.requestTimeout(I)Lorg/apache/thrift/server/TThreadPoolServer$Args;
>         at 
> org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.initializeServer(ThriftBinaryCLIService.java:101)
>         at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.start(ThriftCLIService.java:176)
>         at 
> org.apache.hive.service.CompositeService.start(CompositeService.java:69)
>         at 
> org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {noformat}
> After detailed investigation, we found that, as of version 0.14.0, the 
> requestTimeout property has been removed from the 
> org.apache.thrift.server.TThreadPoolServer class.

[jira] [Assigned] (HIVE-26591) libthrift 0.14.0 onwards doesn't work with Hive (All versions)

2022-10-03 Thread Pratik Malani (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratik Malani reassigned HIVE-26591:


Assignee: Navis Ryu

> libthrift 0.14.0 onwards doesn't work with Hive (All versions)
> ---
>
> Key: HIVE-26591
> URL: https://issues.apache.org/jira/browse/HIVE-26591
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Affects Versions: 1.2.2, 2.3.7, 2.3.9
>Reporter: Pratik Malani
>Assignee: Navis Ryu
>Priority: Critical
> Fix For: 3.1.3, 4.0.0
>
> Attachments: image-2022-10-03-19-51-20-052.png
>
>
> libthrift:0.13.0 is affected by 
> [CVE-2020-13949|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-13949]
> Currently I am using Spark 3.3.0 and Hive 2.3.9 with Hadoop 3.3.4.
> When we upgrade to libthrift 0.14.0 or a later jar, the exception below is 
> thrown while starting the Spark Thriftserver.
> {noformat}
> org.apache.hive.service.ServiceException: Failed to Start HiveServer2
>         at 
> org.apache.hive.service.CompositeService.start(CompositeService.java:79)
>         at 
> org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.thrift.server.TThreadPoolServer$Args.requestTimeout(I)Lorg/apache/thrift/server/TThreadPoolServer$Args;
>         at 
> org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.initializeServer(ThriftBinaryCLIService.java:101)
>         at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.start(ThriftCLIService.java:176)
>         at 
> org.apache.hive.service.CompositeService.start(CompositeService.java:69)
>         at 
> org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {noformat}
> After detailed investigation, we found that, as of version 0.14.0, the 
> requestTimeout property has been removed from the 
> org.apache.thrift.server.TThreadPoolServer class.

[jira] [Updated] (HIVE-26591) libthrift 0.14.0 onwards doesn't work with Hive (All versions)

2022-10-03 Thread Pratik Malani (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratik Malani updated HIVE-26591:
-
Attachment: image-2022-10-03-19-51-20-052.png

> libthrift 0.14.0 onwards doesn't work with Hive (All versions)
> ---
>
> Key: HIVE-26591
> URL: https://issues.apache.org/jira/browse/HIVE-26591
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Affects Versions: 1.2.2, 2.3.7, 2.3.9
>Reporter: Pratik Malani
>Priority: Critical
> Fix For: 3.1.3, 4.0.0
>
> Attachments: image-2022-10-03-19-51-20-052.png
>
>
> libthrift:0.13.0 is affected by 
> [CVE-2020-13949|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-13949]
> Currently I am using Spark 3.3.0 and Hive 2.3.9 with Hadoop 3.3.4.
> When we upgrade to libthrift 0.14.0 or a later jar, the exception below is 
> thrown while starting the Spark Thriftserver.
> {noformat}
> org.apache.hive.service.ServiceException: Failed to Start HiveServer2
>         at 
> org.apache.hive.service.CompositeService.start(CompositeService.java:79)
>         at 
> org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.thrift.server.TThreadPoolServer$Args.requestTimeout(I)Lorg/apache/thrift/server/TThreadPoolServer$Args;
>         at 
> org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.initializeServer(ThriftBinaryCLIService.java:101)
>         at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.start(ThriftCLIService.java:176)
>         at 
> org.apache.hive.service.CompositeService.start(CompositeService.java:69)
>         at 
> org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {noformat}
> After detailed investigation, we found that, as of version 0.14.0, the 
> requestTimeout property has been removed from the 
> org.apache.thrift.server.TThreadPoolServer class.

[jira] [Updated] (HIVE-26591) libthrift 0.14.0 onwards doesn't work with Hive (All versions)

2022-10-03 Thread Pratik Malani (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratik Malani updated HIVE-26591:
-
Description: 
libthrift:0.13.0 is affected by 
[CVE-2020-13949|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-13949]

Currently I am using Spark 3.3.0 and Hive 2.3.9 with Hadoop 3.3.4.

When we upgrade to libthrift 0.14.0 or a later jar, the exception below is 
thrown while starting the Spark Thriftserver.
{noformat}
org.apache.hive.service.ServiceException: Failed to Start HiveServer2
        at 
org.apache.hive.service.CompositeService.start(CompositeService.java:79)
        at 
org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoSuchMethodError: 
org.apache.thrift.server.TThreadPoolServer$Args.requestTimeout(I)Lorg/apache/thrift/server/TThreadPoolServer$Args;
        at 
org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.initializeServer(ThriftBinaryCLIService.java:101)
        at 
org.apache.hive.service.cli.thrift.ThriftCLIService.start(ThriftCLIService.java:176)
        at 
org.apache.hive.service.CompositeService.start(CompositeService.java:69)
        at 
org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104)
        at 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{noformat}
After detailed investigation, we found that, as of version 0.14.0, the 
requestTimeout property has been removed from the 
org.apache.thrift.server.TThreadPoolServer class.

 

Even the latest Hive release (3.1.3) still references the removed 
requestTimeout attribute.

[https://jar-download.com/artifacts/org.apache.hive/hive-service/3.1.3/source-code/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java]

!image-2022-10-03-19-51-20-052.png|width=769,height=98!

Can we have any alternative approach, or any fix version, for the 
above-mentioned issue?

  was:
libthrift:0.13.0 is affected by 
[CVE-2020-13949|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-13949]

Currently I am using Spark 3.3.0 and Hive 2.3.9 with Hadoop 3.3.4.

When we upgrade to libthrift 0.14.0 or a later jar, the exception below is 
thrown while starting the Spark Thriftserver.

[jira] [Updated] (HIVE-26591) libthrift 0.14.0 onwards doesn't work with Hive (All versions)

2022-10-03 Thread Pratik Malani (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratik Malani updated HIVE-26591:
-
Fix Version/s: 4.0.0
   3.1.3

> libthrift 0.14.0 onwards doesn't work with Hive (All versions)
> ---
>
> Key: HIVE-26591
> URL: https://issues.apache.org/jira/browse/HIVE-26591
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Affects Versions: 1.2.2, 2.3.7, 2.3.9
>Reporter: Pratik Malani
>Priority: Critical
> Fix For: 3.1.3, 4.0.0
>
>
> libthrift:0.13.0 is affected by 
> [CVE-2020-13949|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-13949]
> Currently I am using Spark 3.3.0 and Hive 2.3.9 with Hadoop 3.3.4.
> When we upgrade to libthrift 0.14.0 or a later jar, the exception below is 
> thrown while starting the Spark Thriftserver.
> {noformat}
> org.apache.hive.service.ServiceException: Failed to Start HiveServer2
>         at 
> org.apache.hive.service.CompositeService.start(CompositeService.java:79)
>         at 
> org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.thrift.server.TThreadPoolServer$Args.requestTimeout(I)Lorg/apache/thrift/server/TThreadPoolServer$Args;
>         at 
> org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.initializeServer(ThriftBinaryCLIService.java:101)
>         at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.start(ThriftCLIService.java:176)
>         at 
> org.apache.hive.service.CompositeService.start(CompositeService.java:69)
>         at 
> org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.start(HiveThriftServer2.scala:154)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.startWithContext(HiveThriftServer2.scala:64)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:104)
>         at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {noformat}
> After detailed investigation, we found that, as of version 0.14.0, the 
> requestTimeout property has been removed from the 
> org.apache.thrift.server.TThreadPoolServer class.

[jira] [Commented] (HIVE-26582) Cartesian join fails if the query has an empty table when cartesian product edge is used

2022-10-03 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612309#comment-17612309
 ] 

Krisztian Kasa commented on HIVE-26582:
---

[~zabetak] 
I don't think HIVE-26524 helps here because we don't know that one of the 
tables is empty at compile time.

Please see the CBO plan of the query in the description:
{code}
HiveProject(a=[$0])
  HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
    HiveProject(a=[$0])
      HiveUnion(all=[true])
        HiveProject(a=[$0])
          HiveTableScan(table=[[default, tmp1]], table:alias=[tmp1])
        HiveProject(a=[$0])
          HiveTableScan(table=[[default, tmp2]], table:alias=[tmp2])
    HiveProject(a1=[CAST(3):INTEGER])
      HiveFilter(condition=[=($0, 3)])
        HiveTableScan(table=[[default, c]], table:alias=[c])
{code}

> Cartesian join fails if the query has an empty table when cartesian product 
> edge is used
> 
>
> Key: HIVE-26582
> URL: https://issues.apache.org/jira/browse/HIVE-26582
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Tez
>Reporter: Sourabh Badhya
>Priority: Major
>
> The following example fails when "hive.tez.cartesian-product.enabled" is true 
> - 
> Test command - 
> {code:java}
> mvn test -Dtest=TestMiniLlapCliDriver -Dqfile=file.q 
> -Dtest.output.overwrite=true {code}
> Query - file.q
> {code:java}
> set hive.tez.cartesian-product.enabled=true;
> create table c (a1 int) stored as orc;
> create table tmp1 (a int) stored as orc;
> create table tmp2 (a int) stored as orc;
> insert into table c values (3);
> insert into table tmp1 values (3);
> with
> first as (
> select a1 from c where a1 = 3
> ),
> second as (
> select a from tmp1
> union all
> select a from tmp2
> )
> select a from second cross join first; {code}
> The following stack trace is seen - 
> {code:java}
> Caused by: java.lang.IllegalArgumentException: Number of items is 0. Should 
> be positive
>         at 
> org.apache.tez.common.Preconditions.checkArgument(Preconditions.java:38)
>         at org.apache.tez.runtime.library.utils.Grouper.init(Grouper.java:41)
>         at 
> org.apache.tez.runtime.library.cartesianproduct.FairCartesianProductEdgeManager.initialize(FairCartesianProductEdgeManager.java:66)
>         at 
> org.apache.tez.runtime.library.cartesianproduct.CartesianProductEdgeManager.initialize(CartesianProductEdgeManager.java:51)
>         at org.apache.tez.dag.app.dag.impl.Edge.initialize(Edge.java:213)
>         ... 22 more{code}
> The following error is seen because one of the tables (tmp2 in this case) has 
> 0 rows in it. 
> The query works fine when the config hive.tez.cartesian-product.enabled is 
> set to false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26552) PartitionConditionRemover doesn't remove constant filter with structs inside

2022-10-03 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hankó Gergely reassigned HIVE-26552:


Assignee: Hankó Gergely

> PartitionConditionRemover doesn't remove constant filter with structs inside
> 
>
> Key: HIVE-26552
> URL: https://issues.apache.org/jira/browse/HIVE-26552
> Project: Hive
>  Issue Type: Improvement
>Reporter: Hankó Gergely
>Assignee: Hankó Gergely
>Priority: Major
>
> Repro:
> {code:java}
> set hive.fetch.task.conversion=none;
> create table test (a string) partitioned by (y string, m string);
> insert into test values ('aa', 2022, 9);
> explain vectorization expression select * from test where 
> (y=year(date_sub('2022-09-11',4)) and m=month(date_sub('2022-09-11',4))) or 
> (y=year(date_sub('2022-09-11',10)) and m=month(date_sub('2022-09-11',10)) ); 
> {code}
> Actual:
> {code:java}
> (...)
> Filter Operator
>   Filter Vectorization:
>   className: VectorFilterOperator
>   native: true
>   predicateExpression: SelectColumnIsTrue(col 5:boolean)(children: 
> VectorUDFAdaptor((const struct(2022.0D,9.0D)) IN (const struct(2022.0D,9.0D), 
> const struct(2022.0D,9.0D))) -> 5:boolean)
>   predicate: (const struct(2022.0D,9.0D)) IN (const struct(2022.0D,9.0D), 
> const struct(2022.0D,9.0D)) (type: boolean)
>   Statistics: Num rows: 1 Data size: 454 Basic stats: COMPLETE Column stats: 
> COMPLETE 
> (...){code}
> Expected:
> The filter operator should be optimized out, similarly to how it is removed 
> in the following query:
> {code:java}
> explain vectorization expression select * from test where 
> (y=year(date_sub('2022-09-11',4))) or (y=year(date_sub('2022-09-11',10))); 
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HIVE-26552) PartitionConditionRemover doesn't remove constant filter with structs inside

2022-10-03 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26552 started by Hankó Gergely.

> PartitionConditionRemover doesn't remove constant filter with structs inside
> 
>
> Key: HIVE-26552
> URL: https://issues.apache.org/jira/browse/HIVE-26552
> Project: Hive
>  Issue Type: Improvement
>Reporter: Hankó Gergely
>Assignee: Hankó Gergely
>Priority: Major
>
> Repro:
> {code:java}
> set hive.fetch.task.conversion=none;
> create table test (a string) partitioned by (y string, m string);
> insert into test values ('aa', 2022, 9);
> explain vectorization expression select * from test where 
> (y=year(date_sub('2022-09-11',4)) and m=month(date_sub('2022-09-11',4))) or 
> (y=year(date_sub('2022-09-11',10)) and m=month(date_sub('2022-09-11',10)) ); 
> {code}
> Actual:
> {code:java}
> (...)
> Filter Operator
>   Filter Vectorization:
>   className: VectorFilterOperator
>   native: true
>   predicateExpression: SelectColumnIsTrue(col 5:boolean)(children: 
> VectorUDFAdaptor((const struct(2022.0D,9.0D)) IN (const struct(2022.0D,9.0D), 
> const struct(2022.0D,9.0D))) -> 5:boolean)
>   predicate: (const struct(2022.0D,9.0D)) IN (const struct(2022.0D,9.0D), 
> const struct(2022.0D,9.0D)) (type: boolean)
>   Statistics: Num rows: 1 Data size: 454 Basic stats: COMPLETE Column stats: 
> COMPLETE 
> (...){code}
> Expected:
> The filter operator should be optimized out, similarly to how it is removed 
> in the following query:
> {code:java}
> explain vectorization expression select * from test where 
> (y=year(date_sub('2022-09-11',4))) or (y=year(date_sub('2022-09-11',10))); 
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26524) Use Calcite to remove sections of a query plan known to never produce rows

2022-10-03 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612291#comment-17612291
 ] 

Krisztian Kasa commented on HIVE-26524:
---

The [PR #3588|https://github.com/apache/hive/pull/3588] changes some existing q 
tests:
* antijoin.q - The predicate {{b.value > 'val_1'}} was changed to {{b.value is 
null}} in the query
{code}
explain select a.key from t1_n55 a left join t2_n33 b on a.key = b.key where 
b.key is null and b.value > 'val_1';
{code}
because the goal is to test anti join; however, the original predicate always 
evaluated to false: when {{b.key is null}} is true, {{a.key = b.key}} is 
false, so the values coming from the right side are nulls and {{b.value > 
'val_1'}} can never be true.
* some auto_join tests - joins which have always-false conditions, like
{code}
ON src1.key = src2.key AND src1.key < 10 AND src2.key > 10
{code}
are removed by the optimization; but since the goal of these tests is to 
exercise auto join conversion, the condition was changed to filter out the 
majority of the rows but not all of them.


> Use Calcite to remove sections of a query plan known to never produce rows
> 
>
> Key: HIVE-26524
> URL: https://issues.apache.org/jira/browse/HIVE-26524
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Calcite has a set of rules to remove sections of a query plan that are known 
> to never produce any rows. In some cases the whole plan can be removed. Such 
> plans are represented by a single {{Values}} operator with no tuples, e.g.:
> {code:java}
> select y + 1 from (select a1 y, b1 z from t1 where b1 > 10) q WHERE 1=0
> {code}
> {code:java}
> HiveValues(tuples=[[]])
> {code}
> In other cases, when the plan has outer join or set operators, some branches 
> can be replaced with empty values; moving forward, in some cases the 
> join/set operator can be removed as well:
> {code:java}
> select a2, b2 from t2 where 1=0
> union
> select a1, b1 from t1
> {code}
> {code:java}
> HiveAggregate(group=[{0, 1}])
>   HiveTableScan(table=[[default, t1]], table:alias=[t1])
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26581) Test failing on aarch64

2022-10-03 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612256#comment-17612256
 ] 

Stamatis Zampetakis commented on HIVE-26581:


We are not really testing with aarch64, so there may be many broken tests. If 
someone wants to invest in fixing the failures, though, it would be much 
appreciated.
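
For anyone picking this up: the NPE in the quoted trace appears to come from a 
protobuf-generated builder, and protobuf builders throw NullPointerException 
when handed a null value. A minimal hedged sketch of a guard at the call site 
(the helper is hypothetical, not the actual Converters code or the fix):

{code:java}
import org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos.SignableVertexSpec;

// Hypothetical guard, not the actual Hive fix: protobuf-generated builders
// such as SignableVertexSpec.Builder throw NullPointerException when handed
// a null value, which matches the setUser frame in the quoted trace.
final class VertexSpecUserGuard {
  private VertexSpecUserGuard() {}

  static SignableVertexSpec.Builder setUserSafely(
      SignableVertexSpec.Builder builder, String user) {
    // Substitute a placeholder instead of letting the builder NPE on null.
    return builder.setUser(user == null ? "" : user);
  }
}
{code}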

> Test failing on aarch64
> ---
>
> Key: HIVE-26581
> URL: https://issues.apache.org/jira/browse/HIVE-26581
> Project: Hive
>  Issue Type: Bug
>Reporter: odidev
>Priority: Major
>
> Hi Team, 
> I tried to build and test the Apache Hive repository on an aarch64 machine, 
> but when I run *mvn clean install* it gives me the following error:
> {code:java}
> [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.265 
> s <<< FAILURE! - in 
> org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskCommunicator
> [ERROR] 
> org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskCommunicator.testFinishableStateUpdateFailure
>   Time elapsed: 2.206 s  <<< ERROR!
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SignableVertexSpec$Builder.setUser(LlapDaemonProtocolProtos.java:5513)
> at 
> org.apache.hadoop.hive.llap.tez.Converters.constructSignableVertexSpec(Converters.java:135)
> at 
> org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator.constructSubmitWorkRequest(LlapTaskCommunicator.java:912)
> at 
> org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator.registerRunningTaskAttempt(LlapTaskCommunicator.java:512)
> at 
> org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskCommunicator$LlapTaskCommunicatorWrapperForTest.registerRunningTaskAttemptWithSourceVertex(TestLlapTaskCommunicator.java:335)
> at 
> org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskCommunicator.testFinishableStateUpdateFailure(TestLlapTaskCommunicator.java:141)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:750)
> [INFO]
> [INFO] Results:
> [INFO]
> [ERROR] Errors:
> [ERROR]   TestLlapTaskCommunicator.testFinishableStateUpdateFailure:141 ? 
> NullPointer
> [INFO]
> [ERROR] Tests run: 53, Failures: 0, Errors: 1, Skipped: 2
> {code}
> When I ran *mvn clean install -DskipTests* the installation was successful, 
> but when I then ran *mvn test* it gave me the above-mentioned error. The 
> error is the same on the amd64 platform as well. 
> Can anyone suggest any pointers on the above error?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26582) Cartesian join fails if the query has an empty table when cartesian product edge is used

2022-10-03 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612255#comment-17612255
 ] 

Stamatis Zampetakis commented on HIVE-26582:


Possibly HIVE-26524 can help with this problem.

[~kkasa] Can we cut branches from cartesian products using HIVE-26524? If not, 
then maybe we can log a ticket and follow up later on.

> Cartesian join fails if the query has an empty table when cartesian product 
> edge is used
> 
>
> Key: HIVE-26582
> URL: https://issues.apache.org/jira/browse/HIVE-26582
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Tez
>Reporter: Sourabh Badhya
>Priority: Major
>
> The following example fails when "hive.tez.cartesian-product.enabled" is true 
> - 
> Test command - 
> {code:java}
> mvn test -Dtest=TestMiniLlapCliDriver -Dqfile=file.q 
> -Dtest.output.overwrite=true {code}
> Query - file.q
> {code:java}
> set hive.tez.cartesian-product.enabled=true;
> create table c (a1 int) stored as orc;
> create table tmp1 (a int) stored as orc;
> create table tmp2 (a int) stored as orc;
> insert into table c values (3);
> insert into table tmp1 values (3);
> with
> first as (
> select a1 from c where a1 = 3
> ),
> second as (
> select a from tmp1
> union all
> select a from tmp2
> )
> select a from second cross join first; {code}
> The following stack trace is seen - 
> {code:java}
> Caused by: java.lang.IllegalArgumentException: Number of items is 0. Should 
> be positive
>         at 
> org.apache.tez.common.Preconditions.checkArgument(Preconditions.java:38)
>         at org.apache.tez.runtime.library.utils.Grouper.init(Grouper.java:41)
>         at 
> org.apache.tez.runtime.library.cartesianproduct.FairCartesianProductEdgeManager.initialize(FairCartesianProductEdgeManager.java:66)
>         at 
> org.apache.tez.runtime.library.cartesianproduct.CartesianProductEdgeManager.initialize(CartesianProductEdgeManager.java:51)
>         at org.apache.tez.dag.app.dag.impl.Edge.initialize(Edge.java:213)
>         ... 22 more{code}
> The following error is seen because one of the tables (tmp2 in this case) has 
> 0 rows in it. 
> The query works fine when the config hive.tez.cartesian-product.enabled is 
> set to false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26584) compressed_skip_header_footer_aggr.q is flaky

2022-10-03 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612252#comment-17612252
 ] 

Stamatis Zampetakis commented on HIVE-26584:


Thanks for tracking this down [~jfs]. Can you elaborate a bit more on what you 
mean by concurrently? 

I don't think concurrent test execution is supported, so I am trying to 
understand why we bump into this problem. In Hive CI the tests do run somewhat 
concurrently, but in different containers, so normally there shouldn't be any 
interference. 

> compressed_skip_header_footer_aggr.q is flaky
> -
>
> Key: HIVE-26584
> URL: https://issues.apache.org/jira/browse/HIVE-26584
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0-alpha-2
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In one of my PRs, compressed_skip_header_footer_aggr.q was failing with an 
> unexpected diff, such as:
> {code:java}
>  TestMiniLlapLocalCliDriver.testCliDriver:62 Client Execution succeeded but 
> contained differences (error code = 1) after executing 
> compressed_skip_header_footer_aggr.q
> 69,71c69,70
> < 1 2019-12-31
> < 2 2018-12-31
> < 3 2017-12-31
> ---
> > 2 2019-12-31
> > 3 2019-12-31
> 89d87
> < NULL  NULL
> 91c89
> < 2 2018-12-31
> ---
> > 2 2019-12-31
> 100c98
> < 1
> ---
> > 2
> 109c107
> < 1 2019-12-31
> ---
> > 2 2019-12-31
> 127,128c125,126
> < 1 2019-12-31
> < 3 2017-12-31
> ---
> > 2 2019-12-31
> > 3 2019-12-31
> 146a145
> > 2 2019-12-31
> 155c154
> < 1
> ---
> > 2 {code}
> Investigating it, the test did not seem to fail when executed locally. Since 
> I suspected test interference, I searched for the table names/directories 
> used and discovered empty_skip_header_footer_aggr.q, which uses the same 
> table names AND external directories.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26320) Incorrect results for IN UDF on Parquet column of CHAR/VARCHAR type

2022-10-03 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-26320.

Fix Version/s: 4.0.0-alpha-2
   Resolution: Fixed

Fixed in 
https://github.com/apache/hive/commit/70562437d369c2f4ab3e879bae519f81d386da3b. 
Thanks for the PR [~jfs], and thanks [~amansinha], [~kkasa], [~asolimando] and 
[~mdayakar] for the reviews! Also many thanks to [~chiran54321] for reporting 
the issue and helping out with the initial investigation.

> Incorrect results for IN UDF on Parquet column of CHAR/VARCHAR type
> ---
>
> Key: HIVE-26320
> URL: https://issues.apache.org/jira/browse/HIVE-26320
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Query Planning
>Affects Versions: 4.0.0-alpha-1
>Reporter: Chiran Ravani
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> A query involving a case statement with two or more conditions leads to 
> incorrect results for tables in Parquet format. The problem is not observed 
> with ORC or TextFile.
> *Steps to reproduce*:
> {code:java}
> create external table case_test_parquet(kob varchar(2),enhanced_type_code 
> int) stored as parquet;
> insert into case_test_parquet values('BB',18),('BC',18),('AB',18);
> select case when (
>(kob='BB' and enhanced_type_code='18')
>or (kob='BC' and enhanced_type_code='18')
>  )
> then 1
> else 0
> end as logic_check
> from case_test_parquet;
> {code}
> Result:
> {code}
> 0
> 0
> 0
> {code}
> Expected result:
> {code}
> 1
> 1
> 0
> {code}
> The problem does not appear when setting hive.optimize.point.lookup=false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26320) Incorrect results for IN UDF on Parquet column of CHAR/VARCHAR type

2022-10-03 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-26320:
---
Summary: Incorrect results for IN UDF on Parquet column of CHAR/VARCHAR 
type  (was: Incorrect case evaluation for Parquet based table)

> Incorrect results for IN UDF on Parquet column of CHAR/VARCHAR type
> ---
>
> Key: HIVE-26320
> URL: https://issues.apache.org/jira/browse/HIVE-26320
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Query Planning
>Affects Versions: 4.0.0-alpha-1
>Reporter: Chiran Ravani
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> A query involving a case statement with two or more conditions leads to 
> incorrect results for tables in Parquet format. The problem is not observed 
> with ORC or TextFile.
> *Steps to reproduce*:
> {code:java}
> create external table case_test_parquet(kob varchar(2),enhanced_type_code 
> int) stored as parquet;
> insert into case_test_parquet values('BB',18),('BC',18),('AB',18);
> select case when (
>(kob='BB' and enhanced_type_code='18')
>or (kob='BC' and enhanced_type_code='18')
>  )
> then 1
> else 0
> end as logic_check
> from case_test_parquet;
> {code}
> Result:
> {code}
> 0
> 0
> 0
> {code}
> Expected result:
> {code}
> 1
> 1
> 0
> {code}
> The problem does not appear when setting hive.optimize.point.lookup=false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26579) Prepare for Hadoop and Zookeeper switching to Reload4j

2022-10-03 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-26579:

Summary: Prepare for Hadoop and Zookeeper switching to Reload4j  (was: 
Prepare for Hadoop switching to Reload4j)

> Prepare for Hadoop and Zookeeper switching to Reload4j
> --
>
> Key: HIVE-26579
> URL: https://issues.apache.org/jira/browse/HIVE-26579
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Hadoop moved from Log4j1 to Reload4j (HADOOP-18088). The goal of this task is 
> to prepare Hive for that change:
>  * Hive build fails with the current {{useStrictFiltering=true}} setting in 
> some assemblies, due to the excluded dependency (log4j) not actually being 
> present.
>  * Exclude {{ch.qos.reload4j:\*}} in addition to current {{log4j:\*}} to 
> avoid polluting the assemblies and shaded jars.
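
A hedged sketch of the kind of assembly-descriptor fragment being described 
(illustrative only, not the actual Hive patch): with 
{{useStrictFiltering=true}} an exclude that matches no dependency fails the 
build, so the reload4j pattern is added alongside the existing log4j one.

{code:xml}
<!-- Illustrative dependencySet fragment (assumed descriptor layout, not the
     actual Hive change): exclude reload4j artifacts alongside log4j ones. -->
<dependencySet>
  <useStrictFiltering>true</useStrictFiltering>
  <excludes>
    <exclude>log4j:*</exclude>
    <exclude>ch.qos.reload4j:*</exclude>
  </excludes>
</dependencySet>
{code}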



--
This message was sent by Atlassian Jira
(v8.20.10#820010)