[Impala-ASF-CR] IMPALA-11845: Fix incorrect check of struct STAR path in resolvePathWithMasking

2023-01-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19429 )

Change subject: IMPALA-11845: Fix incorrect check of struct STAR path in 
resolvePathWithMasking
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12209/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19429
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8f1e78e325baafbe23101909d47e82bf140a2d77
Gerrit-Change-Number: 19429
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 20 Jan 2023 02:53:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11626: Handle COMMIT_COMPACTION_EVENT from HMS

2023-01-19 Thread Sai Hemanth Gantasala (Code Review)
Sai Hemanth Gantasala has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19155 )

Change subject: IMPALA-11626: Handle COMMIT_COMPACTION_EVENT from HMS
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19155/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19155/6//COMMIT_MSG@7
PS6, Line 7: Handle
> Nit: Handle
Ack



--
To view, visit http://gerrit.cloudera.org:8080/19155

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I464faedb4e3bbcd417bab2e3cb0d57e339d42605
Gerrit-Change-Number: 19155
Gerrit-PatchSet: 7
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Fri, 20 Jan 2023 02:43:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11626: Handle COMMIT_COMPACTION_EVENT from HMS

2023-01-19 Thread Sai Hemanth Gantasala (Code Review)
Hello Quanlong Huang, Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19155

to look at the new patch set (#7).

Change subject: IMPALA-11626: Handle COMMIT_COMPACTION_EVENT from HMS
..

IMPALA-11626: Handle COMMIT_COMPACTION_EVENT from HMS

Since HIVE-24329, HMS emits an event when a compaction is committed,
but Impala ignores it. Handling it would allow automatic refreshing
of file metadata after commit compactions.
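
As a rough illustration of the idea only (not the patch itself), the sketch
below models an event processor that refreshes file metadata when it sees a
commit compaction event. The Catalog class, the process_event() function and
the dictionary-shaped event are all hypothetical; only the event type name
COMMIT_COMPACTION_EVENT comes from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical stand-in for a catalog; counts file metadata reloads."""
    refresh_counts: dict = field(default_factory=dict)

    def refresh_file_metadata(self, table):
        # A real catalog would reload the file listing of the table here.
        self.refresh_counts[table] = self.refresh_counts.get(table, 0) + 1


def process_event(catalog, event):
    """Return True if the event was handled, False if it was ignored."""
    if event["type"] == "COMMIT_COMPACTION_EVENT":
        catalog.refresh_file_metadata(event["table"])
        return True
    # Event types this sketch does not know about are ignored.
    return False
```

Before a handler like this exists, the event falls through to the "ignored"
branch, which matches the behaviour the commit message describes.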

Change-Id: I464faedb4e3bbcd417bab2e3cb0d57e339d42605
---
M fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
4 files changed, 107 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/19155/7
--
To view, visit http://gerrit.cloudera.org:8080/19155

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I464faedb4e3bbcd417bab2e3cb0d57e339d42605
Gerrit-Change-Number: 19155
Gerrit-PatchSet: 7
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 


[Impala-ASF-CR] IMPALA-11845: Fix incorrect check of struct STAR path in resolvePathWithMasking

2023-01-19 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19429 )

Change subject: IMPALA-11845: Fix incorrect check of struct STAR path in 
resolvePathWithMasking
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19429/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19429/3//COMMIT_MSG@40
PS3, Line 40: tri
> Nit: tries
Done


http://gerrit.cloudera.org:8080/#/c/19429/3//COMMIT_MSG@50
PS3, Line 50: rooted tuple d
> Could you explain why the first condition is true for tables/views? Is it b
Yeah, needs the second point to better explain this. When it's a table/view 
STAR expansion, destType() of the STAR path is the type of the rooted tuple 
descriptor:
https://github.com/apache/impala/blob/9baf790606073d88c3a2fd431110812140df0cb7/fe/src/main/java/org/apache/impala/analysis/Path.java#L354

The type of a tuple descriptor is always a StructType:
https://github.com/apache/impala/blob/9baf790606073d88c3a2fd431110812140df0cb7/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java#L88



--
To view, visit http://gerrit.cloudera.org:8080/19429

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8f1e78e325baafbe23101909d47e82bf140a2d77
Gerrit-Change-Number: 19429
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 20 Jan 2023 02:32:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11845: Fix incorrect check of struct STAR path in resolvePathWithMasking

2023-01-19 Thread Quanlong Huang (Code Review)
Hello Fang-Yu Rao, Daniel Becker, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19429

to look at the new patch set (#4).

Change subject: IMPALA-11845: Fix incorrect check of struct STAR path in 
resolvePathWithMasking
..

IMPALA-11845: Fix incorrect check of struct STAR path in resolvePathWithMasking

resolvePathWithMasking() is a wrapper on resolvePath() to further
resolve nested columns inside the table masking view. When it was
added, complex types in the select list hadn't been supported yet. So
the table masking view can't expose complex type columns directly in the
select list. Any paths in nested types will be further resolved inside
the table masking view in resolvePathWithMasking().

Take the following query as an example:
  select id, nested_struct.* from complextypestbl;
If Ranger column-masking/row-filter policies are applied on the table, the
query is rewritten as
  select id, nested_struct.* from (
    select mask(id) from complextypestbl
    where row-filtering-condition
  ) t;
Table masking view "t" can't expose the nested column "nested_struct".
So we further resolve "nested_struct" inside the inlineView to use the
masked table "complextypestbl". The underlying TableRef is expected to
be a BaseTableRef.
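
The shape of this rewrite can be sketched as plain string construction. This
is only an illustration of the example above; rewrite_with_masking() and its
parameters are hypothetical, and the real rewrite works on analyzed statements
rather than on SQL text.

```python
def rewrite_with_masking(select_list, table, masked_columns, row_filter, alias):
    """Wrap `table` in a table masking view and select from that view."""
    masked = ", ".join("mask({})".format(c) for c in masked_columns)
    inner = "select {} from {} where {}".format(masked, table, row_filter)
    return "select {} from ({}) {}".format(select_list, inner, alias)


rewritten = rewrite_with_masking(
    "id, nested_struct.*", "complextypestbl", ["id"],
    "row-filtering-condition", "t")
```

Note that the inner select list only carries the masked scalar column, which
is why "nested_struct" cannot be found in the output of "t".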

Paths that don't reference nested columns should be resolved and
returned directly (just like the original resolvePath() does). E.g.
  select v.* from masked_view v
is rewritten to
  select v.* from (
    select mask(c1), mask(c2), ..., mask(cn)
    from masked_view
    where row-filtering-condition
  ) v;

The STAR path "v.*" should be resolved directly. However, it's unexpectedly
treated as a nested column. The code then tries to resolve it inside the
table "masked_view", finds that "masked_view" is not a table, and throws an
IllegalStateException.

These are the current conditions for identifying nested STAR paths:
 - The destType is STRUCT
 - And the resolved path is rooted at a valid tuple descriptor

They don't really recognize the nested struct columns because STAR paths
on table/view also match these conditions. When the STAR path is an
expansion on a catalog table/view, the rooted tuple descriptor is
exactly the output tuple of the table/view. The destType is the type of
the tuple descriptor which is always a StructType.

Note that STAR paths on other nested types, i.e. array/map, are invalid.
So the first condition matches for all valid cases. The second condition
also matches all valid cases since both table/view and struct STAR
expansions have the path rooted at a valid tuple descriptor.

This patch fixes the check for nested struct STAR path by checking
the matched types instead. Note that if "v.*" is a table/view expansion,
the matched type list is empty. If "v.*" is a struct column expansion,
the matched type list contains the STRUCT column type.
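
The difference between the old and the new check can be modelled in a few
lines. This is a simplified sketch, not the code in Analyzer.java or
Path.java: the StarPath class, its field names and both predicate functions
are hypothetical, and types are reduced to plain strings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StarPath:
    """Hypothetical model of a resolved STAR path."""
    dest_type: str                  # "STRUCT" for every valid STAR path
    rooted_at_tuple: bool           # True for every valid STAR path
    matched_types: List[str] = field(default_factory=list)


def is_nested_star_old(path):
    # Old conditions: true for table/view expansions as well.
    return path.dest_type == "STRUCT" and path.rooted_at_tuple


def is_nested_star_new(path):
    # New condition: only a struct column expansion has a matched STRUCT type.
    return bool(path.matched_types) and path.matched_types[-1] == "STRUCT"


view_star = StarPath("STRUCT", True, [])            # select v.* from view v
struct_star = StarPath("STRUCT", True, ["STRUCT"])  # select nested_struct.*
```

With this model the old predicate returns True for both paths, while the new
one separates them.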

Tests:
 - Add missing coverage on STAR paths (v.*) on masked views.

Change-Id: I8f1e78e325baafbe23101909d47e82bf140a2d77
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
M 
testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test
4 files changed, 67 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/19429/4
--
To view, visit http://gerrit.cloudera.org:8080/19429

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8f1e78e325baafbe23101909d47e82bf140a2d77
Gerrit-Change-Number: 19429
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external Hdfs tables

2023-01-19 Thread Andrew Sherman (Code Review)
Andrew Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19397 )

Change subject: IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external 
Hdfs tables
..


Patch Set 6:

(3 comments)

This seems a very useful change. I have a few general comments/questions.

http://gerrit.cloudera.org:8080/#/c/19397/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19397/6//COMMIT_MSG@7
PS6, Line 7: IMPALA-11013 (part 1): Support 'MIGRATE TABLE' for external Hdfs 
tables
I know what you mean by an "Hdfs table" but maybe "Hive table" is clearer. Our 
new table will still be in hdfs, it will just have a different table format.


http://gerrit.cloudera.org:8080/#/c/19397/6//COMMIT_MSG@14
PS6, Line 14: tables.
Is it true to say "the data files themselves are not changed during this 
migration"? If so it would be nice to state this explicitly.


http://gerrit.cloudera.org:8080/#/c/19397/6//COMMIT_MSG@35
PS6, Line 35:  - Child query 4: Drop the temporary Hdfs table.
> What happens if there is an error at any step? It would be nice if we could
I read that in Spark "When you migrate a Hive table to Iceberg, a backup of the 
table, named backup, is created." That could be a nice feature 
which might be easy to implement. Of course this could be deferred to future 
work.



--
To view, visit http://gerrit.cloudera.org:8080/19397

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91e6a9cfe099c263f17b5506d6db459b79ad31a5
Gerrit-Change-Number: 19397
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 20 Jan 2023 02:26:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11850 Adds HTTP tracing headers when using the hs2-http protocol.

2023-01-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19428 )

Change subject: IMPALA-11850 Adds HTTP tracing headers when using the hs2-http 
protocol.
..


Patch Set 5:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/12208/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/19428

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7857eb5ec03eba32e06ec8d4133480f2e958ad2f
Gerrit-Change-Number: 19428
Gerrit-PatchSet: 5
Gerrit-Owner: Jason Fehr 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 20 Jan 2023 00:24:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11850 Adds HTTP tracing headers when using the hs2-http protocol.

2023-01-19 Thread Jason Fehr (Code Review)
Jason Fehr has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/19428 )

Change subject: IMPALA-11850 Adds HTTP tracing headers when using the hs2-http 
protocol.
..

IMPALA-11850 Adds HTTP tracing headers when using the hs2-http protocol.

When using the hs2 protocol with the http transport, include several
tracing http headers by default.  These headers are:

  * X-Request-Id        -- client defined string that identifies the
                           http request, this string is meaningful only
                           to the client
  * X-Impala-Session-Id -- session id generated by the Impala backend,
                           will be omitted on http calls that occur
                           before this id has been generated
  * X-Impala-Query-Id   -- query id generated by the Impala backend,
                           will be omitted on http calls that occur
                           before this id has been generated

The Impala shell includes these headers by default.  Command line
arguments have been added to remove them.

The Impala backend logs out these headers if they are on the http
request.
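
On the client side, the rule "omit the header until the id exists" could look
roughly like the sketch below. The header names are the ones listed above;
the tracing_headers() function, its parameters and the use of a UUID as the
request id are assumptions made for illustration and are not taken from
ImpalaHttpClient.py.

```python
import uuid


def tracing_headers(session_id=None, query_id=None, request_id=None):
    """Build the tracing headers for one http request."""
    # The request id only has to be meaningful to the client; a random
    # UUID is one possible choice (an assumption of this sketch).
    headers = {"X-Request-Id": request_id or str(uuid.uuid4())}
    # Session and query ids come from the backend, so they are left out
    # on calls made before the backend has generated them.
    if session_id is not None:
        headers["X-Impala-Session-Id"] = session_id
    if query_id is not None:
        headers["X-Impala-Query-Id"] = query_id
    return headers
```

A call made before a session is open would then carry only X-Request-Id.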

Testing:
  - manual testing (verified using debugging proxy and impala logs)
  - new python test

Change-Id: I7857eb5ec03eba32e06ec8d4133480f2e958ad2f
---
M be/src/transport/THttpServer.cpp
M be/src/transport/THttpServer.h
M shell/ImpalaHttpClient.py
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/option_parser.py
M tests/common/test_dimensions.py
M tests/shell/test_shell_commandline.py
9 files changed, 265 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/19428/5
--
To view, visit http://gerrit.cloudera.org:8080/19428

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7857eb5ec03eba32e06ec8d4133480f2e958ad2f
Gerrit-Change-Number: 19428
Gerrit-PatchSet: 5
Gerrit-Owner: Jason Fehr 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-11745: Add Hive's ESRI geospatial functions as builtins

2023-01-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19425 )

Change subject: IMPALA-11745: Add Hive's ESRI geospatial functions as builtins
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12207/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19425

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0ca02a70b4ba244778c9db6d14df4423072b225
Gerrit-Change-Number: 19425
Gerrit-PatchSet: 13
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 19 Jan 2023 21:23:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11745: Add Hive's ESRI geospatial functions as builtins

2023-01-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19425 )

Change subject: IMPALA-11745: Add Hive's ESRI geospatial functions as builtins
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12206/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19425

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0ca02a70b4ba244778c9db6d14df4423072b225
Gerrit-Change-Number: 19425
Gerrit-PatchSet: 12
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 19 Jan 2023 21:20:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11745: Add Hive's ESRI geospatial functions as builtins

2023-01-19 Thread Peter Rozsa (Code Review)
Peter Rozsa has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/19425 )

Change subject: IMPALA-11745: Add Hive's ESRI geospatial functions as builtins
..

IMPALA-11745: Add Hive's ESRI geospatial functions as builtins

This change adds geospatial functions from Hive's ESRI library
as builtin UDFs. Plain Hive UDFs are imported without changes,
but the generic and varargs functions are handled differently;
generic functions are added with all of the combinations of
their parameters (cartesian product of the parameters), and
varargs functions are unfolded into simple functions with a
fixed number of parameters. The varargs function wrappers are
generated at build time and they can be configured in
gen_geospatial_udf_wrappers.py. These additional steps are
required because of limitations in Impala's UDF Executor;
if the executor were improved, the additional
wrapping/mapping steps could be removed.
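
The "cartesian product of the parameters" for generic functions can be
sketched with itertools.product. This is only an illustration: the
expand_generic() helper, the function name "st_example" and the type names
are hypothetical and are not taken from HiveEsriGeospatialBuiltins.java.

```python
from itertools import product


def expand_generic(name, param_type_choices):
    """Return one (name, arg_types) signature per combination of types.

    param_type_choices holds, for each parameter position, the list of
    types that the generic function accepts at that position.
    """
    return [(name, combo) for combo in product(*param_type_choices)]


# Two parameters that each accept two types give 2 * 2 = 4 signatures.
signatures = expand_generic(
    "st_example", [["STRING", "BINARY"], ["STRING", "BINARY"]])
```

The number of registered overloads grows multiplicatively with the number of
accepted types per parameter, which is the cost of this approach.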

Changes regarding function handling/creating are sourced from
https://gerrit.cloudera.org/c/19177

A new backend flag, "geospatial_library", was added to turn
this feature on/off. The default value is "NONE", which
means no geospatial function gets registered as builtin;
the value "HIVE_ESRI" enables this implementation.

Known limitations:
 - ST_MultiLineString, ST_MultiPolygon only work
   with the WKT overload
 - ST_Polygon supports a maximum of 6 pairs of coordinates
 - ST_MultiPoint, ST_LineString support a maximum of 7
   pairs of coordinates
 - ST_ConvexHull, ST_Union support a maximum of 6 geoms

These limits can be increased in gen_geospatial_udf_wrappers.py
and HiveEsriGeospatialBuiltins.java

Tests:
 - test_geospatial_udfs.py added based on
   https://github.com/Esri/spatial-framework-for-hadoop

Co-Authored-by: Csaba Ringhofer 

Change-Id: If0ca02a70b4ba244778c9db6d14df4423072b225
---
M be/src/common/global-flags.cc
M be/src/exprs/hive-udf-call.cc
M be/src/util/backend-gflag-util.cc
M bin/start-impala-cluster.py
M common/function-registry/CMakeLists.txt
A common/function-registry/gen_geospatial_udf_wrappers.py
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
A fe/src/main/java/org/apache/impala/catalog/HiveEsriGeospatialBuiltins.java
M fe/src/main/java/org/apache/impala/catalog/ScalarFunction.java
A 
fe/src/main/java/org/apache/impala/hive/executor/BinaryToBinaryHiveLegacyFunctionExtractor.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunction.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunctionFactory.java
M 
fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunctionFactoryImpl.java
A 
fe/src/main/java/org/apache/impala/hive/executor/HiveLegacyFunctionExtractor.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveLegacyJavaFunction.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaDoubleWritable.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M 
fe/src/test/java/org/apache/impala/hive/executor/TestHiveJavaFunctionFactory.java
M java/CMakeLists.txt
M java/shaded-deps/hive-exec/pom.xml
M testdata/datasets/README
A testdata/workloads/functional-query/queries/QueryTest/udf-esri-geospatial.test
A tests/custom_cluster/test_geospatial_udfs.py
28 files changed, 3,527 insertions(+), 139 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/19425/13
--
To view, visit http://gerrit.cloudera.org:8080/19425

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If0ca02a70b4ba244778c9db6d14df4423072b225
Gerrit-Change-Number: 19425
Gerrit-PatchSet: 13
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-11745: Add Hive's ESRI geospatial functions as builtins

2023-01-19 Thread Peter Rozsa (Code Review)
Peter Rozsa has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/19425 )

Change subject: IMPALA-11745: Add Hive's ESRI geospatial functions as builtins
..

IMPALA-11745: Add Hive's ESRI geospatial functions as builtins

This change adds geospatial functions from Hive's ESRI library
as builtin UDFs. Plain Hive UDFs are imported without changes,
but the generic and varargs functions are handled differently;
generic functions are added with all of the combinations of
their parameters (cartesian product of the parameters), and
varargs functions are unfolded into simple functions with a
fixed number of parameters. The varargs function wrappers are
generated at build time and they can be configured in
gen_geospatial_udf_wrappers.py. These additional steps are
required because of limitations in Impala's UDF Executor;
if the executor were improved, the additional
wrapping/mapping steps could be removed.
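
Unfolding a varargs function into fixed-arity overloads can be sketched as
follows. The unfold_varargs() helper, the function name "st_example" and the
"DOUBLE" type are hypothetical; how gen_geospatial_udf_wrappers.py actually
generates its wrappers is not shown in this message.

```python
def unfold_varargs(name, arg_type, max_args, min_args=1):
    """Return one fixed-arity signature for each allowed argument count."""
    return [(name, (arg_type,) * n) for n in range(min_args, max_args + 1)]


# A varargs function capped at three arguments becomes three overloads.
overloads = unfold_varargs("st_example", "DOUBLE", 3)
```

A cap such as max_args is presumably where limits like "a maximum of N pairs
of coordinates" come from: calls with more arguments than the largest
generated overload have no matching signature.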

Changes regarding function handling/creating are sourced from
https://gerrit.cloudera.org/c/19177

A new backend flag, "geospatial_library", was added to turn
this feature on/off. The default value is "NONE", which
means no geospatial function gets registered as builtin;
the value "HIVE_ESRI" enables this implementation.

Known limitations:
 - ST_MultiLineString, ST_MultiPolygon only work
   with the WKT overload
 - ST_Polygon supports a maximum of 6 pairs of coordinates
 - ST_MultiPoint, ST_LineString support a maximum of 7
   pairs of coordinates
 - ST_ConvexHull, ST_Union support a maximum of 6 geoms

These limits can be increased in gen_geospatial_udf_wrappers.py
and HiveEsriGeospatialBuiltins.java

Tests:
 - test_geospatial_udfs.py added based on
   https://github.com/Esri/spatial-framework-for-hadoop

Co-Authored-by: Csaba Ringhofer 

Change-Id: If0ca02a70b4ba244778c9db6d14df4423072b225
---
M be/src/common/global-flags.cc
M be/src/exprs/hive-udf-call.cc
M be/src/util/backend-gflag-util.cc
M bin/start-impala-cluster.py
M common/function-registry/CMakeLists.txt
A common/function-registry/gen_geospatial_udf_wrappers.py
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
A fe/src/main/java/org/apache/impala/catalog/HiveEsriGeospatialBuiltins.java
M fe/src/main/java/org/apache/impala/catalog/ScalarFunction.java
A 
fe/src/main/java/org/apache/impala/hive/executor/BinaryToBinaryHiveLegacyFunctionExtractor.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunction.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunctionFactory.java
M 
fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunctionFactoryImpl.java
A 
fe/src/main/java/org/apache/impala/hive/executor/HiveLegacyFunctionExtractor.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveLegacyJavaFunction.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaDoubleWritable.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M 
fe/src/test/java/org/apache/impala/hive/executor/TestHiveJavaFunctionFactory.java
M java/CMakeLists.txt
M java/shaded-deps/hive-exec/pom.xml
M testdata/datasets/README
A testdata/workloads/functional-query/queries/QueryTest/udf-esri-geospatial.test
A tests/custom_cluster/test_geospatial_udfs.py
28 files changed, 3,527 insertions(+), 139 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/19425/12
--
To view, visit http://gerrit.cloudera.org:8080/19425

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If0ca02a70b4ba244778c9db6d14df4423072b225
Gerrit-Change-Number: 19425
Gerrit-PatchSet: 12
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-11745: Add Hive's ESRI geospatial functions as builtins

2023-01-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19425 )

Change subject: IMPALA-11745: Add Hive's ESRI geospatial functions as builtins
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19425/12/tests/custom_cluster/test_geospatial_udfs.py
File tests/custom_cluster/test_geospatial_udfs.py:

http://gerrit.cloudera.org:8080/#/c/19425/12/tests/custom_cluster/test_geospatial_udfs.py@26
PS12, Line 26: class TestGeospatialUdfs(CustomClusterTestSuite):
flake8: E302 expected 2 blank lines, found 1
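
For reference, E302 is the pycodestyle check that a top-level class or
function definition is preceded by two blank lines. The snippet below shows
the expected layout; the CustomClusterTestSuite defined here is only a
stand-in so that the example is self-contained, and the real test file
imports it instead.

```python
class CustomClusterTestSuite(object):
    """Stand-in base class for this example only."""


class TestGeospatialUdfs(CustomClusterTestSuite):
    """Two blank lines above this definition satisfy E302."""
```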



--
To view, visit http://gerrit.cloudera.org:8080/19425

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0ca02a70b4ba244778c9db6d14df4423072b225
Gerrit-Change-Number: 19425
Gerrit-PatchSet: 12
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 19 Jan 2023 21:01:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11850 Adds HTTP tracing headers when using the hs2-http protocol.

2023-01-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19428 )

Change subject: IMPALA-11850 Adds HTTP tracing headers when using the hs2-http 
protocol.
..


Patch Set 4:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/12205/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/19428

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7857eb5ec03eba32e06ec8d4133480f2e958ad2f
Gerrit-Change-Number: 19428
Gerrit-PatchSet: 4
Gerrit-Owner: Jason Fehr 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 19 Jan 2023 20:42:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11850 Adds HTTP tracing headers when using the hs2-http protocol.

2023-01-19 Thread Jason Fehr (Code Review)
Jason Fehr has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/19428 )

Change subject: IMPALA-11850 Adds HTTP tracing headers when using the hs2-http 
protocol.
..

IMPALA-11850 Adds HTTP tracing headers when using the hs2-http protocol.

When using the hs2 protocol with the http transport, include several
tracing http headers by default.  These headers are:

  * X-Request-Id        -- client defined string that identifies the
                           http request, this string is meaningful only
                           to the client
  * X-Impala-Session-Id -- session id generated by the Impala backend,
                           will be omitted on http calls that occur
                           before this id has been generated
  * X-Impala-Query-Id   -- query id generated by the Impala backend,
                           will be omitted on http calls that occur
                           before this id has been generated

The Impala shell includes these headers by default.  Command line
arguments have been added to remove them.

The Impala backend logs out these headers if they are on the http
request.
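
The server-side behaviour ("log the headers if they are on the request") can
be modelled as below. This is a Python sketch of the idea only; the actual
change is in THttpServer.cpp, and format_tracing_log() and the log line
format are assumptions. The lookup is case-insensitive because HTTP header
names are case-insensitive.

```python
TRACING_HEADERS = ("X-Request-Id", "X-Impala-Session-Id", "X-Impala-Query-Id")


def format_tracing_log(request_headers):
    """Return a log fragment with the tracing headers that are present."""
    lookup = {k.lower(): v for k, v in request_headers.items()}
    parts = ["{}={}".format(name, lookup[name.lower()])
             for name in TRACING_HEADERS if name.lower() in lookup]
    # An empty string means there is nothing to log for this request.
    return " ".join(parts)
```

Headers that are absent, for example the query id on a call made before a
query exists, simply do not appear in the fragment.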

Testing:
  - manual testing (verified using debugging proxy and impala logs)
  - new python test

Change-Id: I7857eb5ec03eba32e06ec8d4133480f2e958ad2f
---
M be/src/transport/THttpServer.cpp
M be/src/transport/THttpServer.h
M shell/ImpalaHttpClient.py
M shell/impala_client.py
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
M shell/option_parser.py
M tests/common/test_dimensions.py
M tests/shell/test_shell_commandline.py
9 files changed, 265 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/19428/4
--
To view, visit http://gerrit.cloudera.org:8080/19428

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7857eb5ec03eba32e06ec8d4133480f2e958ad2f
Gerrit-Change-Number: 19428
Gerrit-PatchSet: 4
Gerrit-Owner: Jason Fehr 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-11807: Rewrite iceberg metadata if not on hdfs

2023-01-19 Thread Noemi Pap-Takacs (Code Review)
Noemi Pap-Takacs has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19432 )

Change subject: IMPALA-11807: Rewrite iceberg metadata if not on hdfs
..


Patch Set 1: Code-Review+1

Thanks for the fix!


--
To view, visit http://gerrit.cloudera.org:8080/19432

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic04c5abdd42cb0c1cf5abd310b06c39cf8cd64ba
Gerrit-Change-Number: 19432
Gerrit-PatchSet: 1
Gerrit-Owner: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Comment-Date: Thu, 19 Jan 2023 17:25:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11807: Rewrite iceberg metadata if not on hdfs

2023-01-19 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19432 )

Change subject: IMPALA-11807: Rewrite iceberg metadata if not on hdfs
..


Patch Set 1: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19432/1/testdata/bin/load-test-warehouse-snapshot.sh
File testdata/bin/load-test-warehouse-snapshot.sh:

http://gerrit.cloudera.org:8080/#/c/19432/1/testdata/bin/load-test-warehouse-snapshot.sh@120
PS1, Line 120:   ${IMPALA_HOME}/testdata/bin/rewrite-iceberg-metadata.py 
"${WAREHOUSE_LOCATION_PREFIX}" \
So this removes the authority portion so we just have a path, even if 
WAREHOUSE_LOCATION_PREFIX is empty? That lets it work in S3. Makes sense to me.
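
The reading above (drop the authority, keep the path, optionally prepend a
prefix) can be sketched with urllib.parse. This illustrates the reviewer's
interpretation and is not the content of rewrite-iceberg-metadata.py; the
rewrite_location() helper and the example host, port and bucket are
hypothetical.

```python
from urllib.parse import urlparse


def rewrite_location(uri, prefix=""):
    """Replace the scheme and authority of `uri` with `prefix`."""
    # urlparse("hdfs://host:port/a/b").path is "/a/b", so an empty prefix
    # leaves just the path.
    return prefix + urlparse(uri).path


path_only = rewrite_location("hdfs://localhost:20500/test-warehouse/t")
on_s3 = rewrite_location("hdfs://localhost:20500/test-warehouse/t",
                         "s3a://example-bucket")
```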


http://gerrit.cloudera.org:8080/#/c/19432/1/testdata/bin/load-test-warehouse-snapshot.sh@121
PS1, Line 121:   $(find 
${SNAPSHOT_STAGING_DIR}${TEST_WAREHOUSE_DIR}/iceberg_test -name "metadata")
For future updates, I think it would be safe to search all files instead of 
just in iceberg_test.



--
To view, visit http://gerrit.cloudera.org:8080/19432

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic04c5abdd42cb0c1cf5abd310b06c39cf8cd64ba
Gerrit-Change-Number: 19432
Gerrit-PatchSet: 1
Gerrit-Owner: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Comment-Date: Thu, 19 Jan 2023 16:59:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11850 Adds HTTP tracing headers when using the hs2-http protocol.

2023-01-19 Thread Andrew Sherman (Code Review)
Andrew Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19428 )

Change subject: IMPALA-11850 Adds HTTP tracing headers when using the hs2-http 
protocol.
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19428/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19428/3//COMMIT_MSG@9
PS3, Line 9: When using the hs2 protocol with the http transport, include 
several tracing http
> The commit message should be wrapped at 732 characters.
I mean 72



--
To view, visit http://gerrit.cloudera.org:8080/19428
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7857eb5ec03eba32e06ec8d4133480f2e958ad2f
Gerrit-Change-Number: 19428
Gerrit-PatchSet: 3
Gerrit-Owner: Jason Fehr 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 19 Jan 2023 16:48:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11807: Rewrite iceberg metadata if not on hdfs

2023-01-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19432 )

Change subject: IMPALA-11807: Rewrite iceberg metadata if not on hdfs
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12204/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19432
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic04c5abdd42cb0c1cf5abd310b06c39cf8cd64ba
Gerrit-Change-Number: 19432
Gerrit-PatchSet: 1
Gerrit-Owner: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Noemi Pap-Takacs 
Gerrit-Comment-Date: Thu, 19 Jan 2023 16:39:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11807: Rewrite iceberg metadata if not on hdfs

2023-01-19 Thread Gergely Fürnstáhl (Code Review)
Gergely Fürnstáhl has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/19432 )


Change subject: IMPALA-11807: Rewrite iceberg metadata if not on hdfs
..

IMPALA-11807: Rewrite iceberg metadata if not on hdfs

Iceberg test tables are usually written on hdfs and the file paths start
with "hdfs://localhost:20500/test-warehouse".

Earlier we manually transformed the metadata so paths would start with
"/test-warehouse".

Since IMPALA-11821, testdata/bin/rewrite-iceberg-metadata.py supports
not only a custom WAREHOUSE_LOCATION_PREFIX, but the ability to trim the
beginning of the file paths.

This commit modifies the data load, so metadata rewrite always executes
if not on hdfs, even with empty WAREHOUSE_LOCATION_PREFIX.
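
The trimming described above can be pictured with a minimal sketch. This is
not the logic of rewrite-iceberg-metadata.py itself; the function name
rewrite_location, its parameters, and the bucket name are hypothetical and
only illustrate replacing the HDFS scheme and authority with a (possibly
empty) prefix.

```python
def rewrite_location(path, old_prefix="hdfs://localhost:20500", new_prefix=""):
    """Replace old_prefix at the start of path with new_prefix.

    Paths that do not start with old_prefix are returned unchanged.
    This is an illustrative helper, not part of Impala.
    """
    if path.startswith(old_prefix):
        return new_prefix + path[len(old_prefix):]
    return path


# With an empty new prefix only the scheme and authority are trimmed.
print(rewrite_location("hdfs://localhost:20500/test-warehouse/iceberg_test/t"))
# With a custom prefix (hypothetical bucket name) the path is re-rooted.
print(rewrite_location("hdfs://localhost:20500/test-warehouse/t",
                       new_prefix="s3a://example-bucket"))
```

Under this reading, an empty WAREHOUSE_LOCATION_PREFIX still yields a usable
path, which is presumably why the rewrite can run unconditionally on non-HDFS
filesystems.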

Testing:
  - Ran iceberg tests on ozone and S3

Change-Id: Ic04c5abdd42cb0c1cf5abd310b06c39cf8cd64ba
---
M testdata/bin/load-test-warehouse-snapshot.sh
1 file changed, 4 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19432/1
--
To view, visit http://gerrit.cloudera.org:8080/19432
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic04c5abdd42cb0c1cf5abd310b06c39cf8cd64ba
Gerrit-Change-Number: 19432
Gerrit-PatchSet: 1
Gerrit-Owner: Gergely Fürnstáhl 


[Impala-ASF-CR] IMPALA-11658: Implement Iceberg manifest caching config for Impala

2023-01-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/19423 )

Change subject: IMPALA-11658: Implement Iceberg manifest caching config for 
Impala
..

IMPALA-11658: Implement Iceberg manifest caching config for Impala

Impala needs to supply Iceberg's catalog properties to enable the manifest
caching feature. This commit implements the necessary config reading.
Iceberg related config is read from hadoop-conf.xml and supplied as a
Map in catalog instantiation.
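
As a rough illustration of this kind of config reading (not the code in
IcebergUtil.java), the sketch below collects prefixed properties from a
Hadoop-style XML document into a map. The "iceberg." prefix, the property
names, and the helper collect_props are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration content; the property names are assumptions.
SAMPLE_CONF = """<configuration>
  <property><name>iceberg.io.manifest.cache-enabled</name><value>true</value></property>
  <property><name>fs.defaultFS</name><value>hdfs://localhost:20500</value></property>
</configuration>"""


def collect_props(xml_text, prefix="iceberg."):
    """Return {name-without-prefix: value} for properties starting with prefix."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name", default="")
        if name.startswith(prefix):
            props[name[len(prefix):]] = prop.findtext("value", default="")
    return props


# Only the prefixed property is kept; unrelated Hadoop settings are ignored.
print(collect_props(SAMPLE_CONF))
```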

Additionally, this patch replaces the deprecated RuntimeIOException with
its superclass, UncheckedIOException.

Testing:
- Pass core tests.
- Checked that manifest caching works through debug logging.

Change-Id: I5a60a700d2ae6302dfe395d1ef602e6b1d821888
Reviewed-on: http://gerrit.cloudera.org:8080/19423
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalog.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopCatalog.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopTables.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
7 files changed, 65 insertions(+), 23 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/19423
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5a60a700d2ae6302dfe395d1ef602e6b1d821888
Gerrit-Change-Number: 19423
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yida Wu 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-11658: Implement Iceberg manifest caching config for Impala

2023-01-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19423 )

Change subject: IMPALA-11658: Implement Iceberg manifest caching config for 
Impala
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/19423
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5a60a700d2ae6302dfe395d1ef602e6b1d821888
Gerrit-Change-Number: 19423
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yida Wu 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 19 Jan 2023 16:03:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11845: Fix incorrect check of struct STAR path in resolvePathWithMasking

2023-01-19 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19429 )

Change subject: IMPALA-11845: Fix incorrect check of struct STAR path in 
resolvePathWithMasking
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19429/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19429/3//COMMIT_MSG@40
PS3, Line 40: try
Nit: tries


http://gerrit.cloudera.org:8080/#/c/19429/3//COMMIT_MSG@50
PS3, Line 50: always matches
Could you explain why the first condition is true for tables/views? Is it 
because their types are also represented as structs?



--
To view, visit http://gerrit.cloudera.org:8080/19429
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8f1e78e325baafbe23101909d47e82bf140a2d77
Gerrit-Change-Number: 19429
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 19 Jan 2023 14:12:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11745: Add Hive's ESRI geospatial functions as builtins

2023-01-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19425 )

Change subject: IMPALA-11745: Add Hive's ESRI geospatial functions as builtins
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12203/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0ca02a70b4ba244778c9db6d14df4423072b225
Gerrit-Change-Number: 19425
Gerrit-PatchSet: 10
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 19 Jan 2023 13:44:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11745: Add Hive's ESRI geospatial functions as builtins

2023-01-19 Thread Peter Rozsa (Code Review)
Peter Rozsa has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/19425 )

Change subject: IMPALA-11745: Add Hive's ESRI geospatial functions as builtins
..

IMPALA-11745: Add Hive's ESRI geospatial functions as builtins

This change adds geospatial functions from Hive's ESRI library as builtin UDFs.
Plain Hive UDFs are imported without changes, but the generic and varargs
functions are handled differently; generic functions are added with all of the
combinations of their parameters (cartesian product of the parameters), and
varargs functions are unfolded into simple functions with n parameters. The varargs
function wrappers are generated at build time and they can be configured in
gen_geospatial_udf_wrappers.py. These additional steps are required because of
the limitations in Impala's UDFExecuter which could be further improved; in
this case, the additional wrapping/mapping steps could be removed.
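
The two expansion strategies can be sketched as follows. This is not the
content of gen_geospatial_udf_wrappers.py or HiveEsriGeospatialBuiltins.java;
the helper names, function names, and type names are hypothetical and only
show the idea of a cartesian product over parameter types and of unfolding a
varargs function into fixed-arity signatures.

```python
from itertools import product


def expand_generic(name, param_type_options):
    """Return one concrete signature per combination of parameter types."""
    return [(name, combo) for combo in product(*param_type_options)]


def unfold_varargs(name, fixed_types, vararg_type, max_varargs):
    """Return fixed-arity signatures with 1..max_varargs trailing arguments."""
    return [
        (name, tuple(fixed_types) + (vararg_type,) * n)
        for n in range(1, max_varargs + 1)
    ]


# Two parameters with two candidate types each give four signatures.
for sig in expand_generic("some_generic_fn", [["STRING", "BINARY"]] * 2):
    print(sig)

# A varargs function capped at three arguments gives three signatures.
for sig in unfold_varargs("some_varargs_fn", [], "DOUBLE", 3):
    print(sig)
```

The cap passed as max_varargs plays the role of the limits listed under
"Known limitations" below: every extra arity needs its own generated wrapper.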

A new backend flag, "geospatial_library", was added to turn this feature
on/off. The default value is "NONE", which means no geospatial function gets
registered as builtin; the "HIVE_ESRI" value enables this implementation.

Known limitations:
 - ST_MultiLineString, ST_MultiPolygon only work with the WKT overload
 - ST_Polygon supports a maximum of 6 pairs of coordinates
 - ST_MultiPoint, ST_LineString support a maximum of 7 pairs of coordinates
 - ST_ConvexHull, ST_Union support a maximum of 6 geoms

These limits can be increased in gen_geospatial_udf_wrappers.py and
HiveEsriGeospatialBuiltins.java

Tests:
 - test_geospatial_udfs.py added based on
   https://github.com/Esri/spatial-framework-for-hadoop/tree/master/hive/test

Change-Id: If0ca02a70b4ba244778c9db6d14df4423072b225
---
M be/src/common/global-flags.cc
M be/src/exprs/hive-udf-call.cc
M be/src/util/backend-gflag-util.cc
M common/function-registry/CMakeLists.txt
A common/function-registry/gen_geospatial_udf_wrappers.py
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
A fe/src/main/java/org/apache/impala/catalog/HiveEsriGeospatialBuiltins.java
M fe/src/main/java/org/apache/impala/catalog/ScalarFunction.java
A fe/src/main/java/org/apache/impala/hive/executor/BinaryToBinaryHiveLegacyFunctionExtractor.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveGenericJavaFunction.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunction.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunctionFactory.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunctionFactoryImpl.java
A fe/src/main/java/org/apache/impala/hive/executor/HiveLegacyFunctionExtractor.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveLegacyJavaFunction.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaDoubleWritable.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/test/java/org/apache/impala/hive/executor/TestHiveJavaFunctionFactory.java
M java/CMakeLists.txt
M java/shaded-deps/hive-exec/pom.xml
M testdata/datasets/README
A testdata/workloads/functional-query/queries/QueryTest/udf-esri-geospatial.test
A tests/custom_cluster/test_geospatial_udfs.py
27 files changed, 3,505 insertions(+), 139 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/19425/10
--
To view, visit http://gerrit.cloudera.org:8080/19425
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If0ca02a70b4ba244778c9db6d14df4423072b225
Gerrit-Change-Number: 19425
Gerrit-PatchSet: 10
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-11845: Fix incorrect check of struct STAR path in resolvePathWithMasking

2023-01-19 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19429 )

Change subject: IMPALA-11845: Fix incorrect check of struct STAR path in 
resolvePathWithMasking
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19429/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19429/2//COMMIT_MSG@41
PS2, Line 41: it's not a table
> Shouldn't this be "it's not a struct"?
I think I should reword it to: "masked_view" is not a table
Note that the underlying TableRef is expected to be a BaseTableRef.


http://gerrit.cloudera.org:8080/#/c/19429/2//COMMIT_MSG@44
PS2, Line 44: These are the conditions for returning the STAR paths directly:
> After looking at the code again, aren't the conditions for returning the pa
Ah, my bad! I said it in the reverse way.. It's the condition for nested struct 
path.



--
To view, visit http://gerrit.cloudera.org:8080/19429
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8f1e78e325baafbe23101909d47e82bf140a2d77
Gerrit-Change-Number: 19429
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 19 Jan 2023 13:08:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11845: Fix incorrect check of struct STAR path in resolvePathWithMasking

2023-01-19 Thread Quanlong Huang (Code Review)
Hello Fang-Yu Rao, Daniel Becker, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19429

to look at the new patch set (#3).

Change subject: IMPALA-11845: Fix incorrect check of struct STAR path in 
resolvePathWithMasking
..

IMPALA-11845: Fix incorrect check of struct STAR path in resolvePathWithMasking

resolvePathWithMasking() is a wrapper on resolvePath() to further
resolve nested columns inside the table masking view. When it was
added, complex types in the select list hadn't been supported yet. So
the table masking view can't expose complex type columns directly in the
select list. Any paths in nested types will be further resolved inside
the table masking view in resolvePathWithMasking().

Take the following query as an example:
  select id, nested_struct.* from complextypestbl;
If Ranger column-masking/row-filter policies are applied on the table, the
query is rewritten as
  select id, nested_struct.* from (
    select mask(id) from complextypestbl
    where row-filtering-condition
  ) t;
Table masking view "t" can't expose the nested column "nested_struct".
So we further resolve "nested_struct" inside the inlineView to use the
masked table "complextypestbl". The underlying TableRef is expected to
be a BaseTableRef.

Paths that don't reference nested columns should be resolved and
returned directly (just like the original resolvePath() does). E.g.
  select v.* from masked_view v
is rewritten to
  select v.* from (
    select mask(c1), mask(c2), ..., mask(cn)
    from masked_view
    where row-filtering-condition
  ) v;

The STAR path "v.*" should be resolved directly. However, it's treated as a
nested column unexpectedly. The code then try to resolve it inside the
table "masked_view" and found "masked_view" is not a table so throws the
IllegalStateException.

These are the conditions for nested STAR paths:
 - The type is STRUCT
 - And the resolved path is rooted at a valid tuple descriptor
They don't really recognize the nested columns. In fact, all STAR paths
match these conditions. Reason:
 - STAR expansion is only valid for paths to a struct type (or a
   table/view). So the first condition always matches.
 - The second condition also matches for STAR paths on table/view, i.e.
   paths of "v.*" when "v" is a catalog table/view. The rooted tuple
   descriptor is exactly the output tuple of the table/view.

This patch fixes the check for nested struct STAR path by checking
the matched types instead. Note that if "v.*" is a table/view expansion,
the matched type list is empty. If "v.*" is a struct column expansion,
the matched type list contains the STRUCT column type.
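
The difference between the old and new checks can be modelled with a toy
sketch. It is not Impala's Path or Analyzer code: the class
ResolvedStarPath, its fields, and the reading of "checking the matched
types" as "the matched type list is non-empty" are assumptions made for
illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResolvedStarPath:
    """Toy stand-in for a resolved STAR path; not Impala's Path class."""
    dest_is_struct: bool
    rooted_at_tuple: bool
    matched_types: List[str] = field(default_factory=list)


def old_is_nested_star(p):
    # Old check: per the commit message, both conditions hold for every
    # STAR path, so this cannot tell "v.*" from "nested_struct.*".
    return p.dest_is_struct and p.rooted_at_tuple


def new_is_nested_star(p):
    # Assumed new check: only a struct column expansion has matched types.
    return len(p.matched_types) > 0


view_star = ResolvedStarPath(True, True, [])            # "v.*" on a table/view
struct_star = ResolvedStarPath(True, True, ["STRUCT"])  # "nested_struct.*"

print(old_is_nested_star(view_star), old_is_nested_star(struct_star))
print(new_is_nested_star(view_star), new_is_nested_star(struct_star))
```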

Tests:
 - Add missing coverage on STAR paths (v.*) on masked views.

Change-Id: I8f1e78e325baafbe23101909d47e82bf140a2d77
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test
4 files changed, 67 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/19429/3
--
To view, visit http://gerrit.cloudera.org:8080/19429
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8f1e78e325baafbe23101909d47e82bf140a2d77
Gerrit-Change-Number: 19429
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

2023-01-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
..


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8977/


--
To view, visit http://gerrit.cloudera.org:8080/19430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 6
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 19 Jan 2023 11:57:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3120: Support Bucket Shuffle Join for bucketed table

2023-01-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19430 )

Change subject: IMPALA-3120: Support Bucket Shuffle Join for bucketed table
..


Patch Set 5: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8976/


--
To view, visit http://gerrit.cloudera.org:8080/19430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If321e7987bc88374d79500cffb77ea25b2ed0316
Gerrit-Change-Number: 19430
Gerrit-PatchSet: 5
Gerrit-Owner: Baike Xia 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 19 Jan 2023 11:28:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11658: Implement Iceberg manifest caching config for Impala

2023-01-19 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19423 )

Change subject: IMPALA-11658: Implement Iceberg manifest caching config for 
Impala
..


Patch Set 2: Code-Review+2

Thanks for adding this feature!


--
To view, visit http://gerrit.cloudera.org:8080/19423
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5a60a700d2ae6302dfe395d1ef602e6b1d821888
Gerrit-Change-Number: 19423
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yida Wu 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 19 Jan 2023 10:47:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11658: Implement Iceberg manifest caching config for Impala

2023-01-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19423 )

Change subject: IMPALA-11658: Implement Iceberg manifest caching config for 
Impala
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8978/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/19423
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5a60a700d2ae6302dfe395d1ef602e6b1d821888
Gerrit-Change-Number: 19423
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yida Wu 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 19 Jan 2023 10:47:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11658: Implement Iceberg manifest caching config for Impala

2023-01-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19423 )

Change subject: IMPALA-11658: Implement Iceberg manifest caching config for 
Impala
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/19423
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5a60a700d2ae6302dfe395d1ef602e6b1d821888
Gerrit-Change-Number: 19423
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Yida Wu 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 19 Jan 2023 10:47:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11845: Fix incorrect check of struct STAR path in resolvePathWithMasking

2023-01-19 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19429 )

Change subject: IMPALA-11845: Fix incorrect check of struct STAR path in 
resolvePathWithMasking
..


Patch Set 2:

(3 comments)

Thanks

http://gerrit.cloudera.org:8080/#/c/19429/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19429/1//COMMIT_MSG@12
PS1, Line 12: the table masking view can't expose complex type columns directly
> Yeah, I filed IMPALA-11847 for the refactor since it's a broader change. It
Great, thanks.


http://gerrit.cloudera.org:8080/#/c/19429/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19429/2//COMMIT_MSG@41
PS2, Line 41: it's not a table
Shouldn't this be "it's not a struct"?


http://gerrit.cloudera.org:8080/#/c/19429/2//COMMIT_MSG@44
PS2, Line 44: These are the conditions for returning the STAR paths directly:
After looking at the code again, aren't the conditions for returning the path 
directly that at least one of the mentioned statements is false?

if (!resolvedPath.destType().isStructType() || !resolvedPath.isRootedAtTuple()) {
  return resolvedPath;
}

Shouldn't we say
 - the type is not struct OR
 - the resolved path is not rooted at a valid tuple descriptor
?



--
To view, visit http://gerrit.cloudera.org:8080/19429
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8f1e78e325baafbe23101909d47e82bf140a2d77
Gerrit-Change-Number: 19429
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 19 Jan 2023 10:16:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11626: Handled COMMIT COMPACTION EVENT from HMS

2023-01-19 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19155 )

Change subject: IMPALA-11626: Handled COMMIT_COMPACTION_EVENT from HMS
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19155/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19155/6//COMMIT_MSG@7
PS6, Line 7: Handled
Nit: Handle



--
To view, visit http://gerrit.cloudera.org:8080/19155
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I464faedb4e3bbcd417bab2e3cb0d57e339d42605
Gerrit-Change-Number: 19155
Gerrit-PatchSet: 6
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 19 Jan 2023 09:55:21 +
Gerrit-HasComments: Yes