[Impala-ASF-CR] IMPALA-12414: Add scripts to run Trino in the dev environment

2023-08-31 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20444 )

Change subject: IMPALA-12414: Add scripts to run Trino in the dev environment
..


Patch Set 1: Code-Review+1

(1 comment)

my nit is optional imo

http://gerrit.cloudera.org:8080/#/c/20444/1/testdata/bin/minicluster_trino/hive.properties
File testdata/bin/minicluster_trino/hive.properties:

http://gerrit.cloudera.org:8080/#/c/20444/1/testdata/bin/minicluster_trino/hive.properties@4
PS1, Line 4:
very small nit: empty new line at the end of the file



--
To view, visit http://gerrit.cloudera.org:8080/20444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I49818c7a95e23988b3fbc3d31b4c7fa738e0d952
Gerrit-Change-Number: 20444
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Thu, 31 Aug 2023 15:31:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11946: Add Thrift HTTP support for external frontend

2023-03-06 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19537 )

Change subject: IMPALA-11946: Add Thrift HTTP support for external frontend
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19537/2/fe/src/test/java/org/apache/impala/customcluster/ExternalFrontendTest.java
File fe/src/test/java/org/apache/impala/customcluster/ExternalFrontendTest.java:

http://gerrit.cloudera.org:8080/#/c/19537/2/fe/src/test/java/org/apache/impala/customcluster/ExternalFrontendTest.java@22
PS2, Line 22: import org.apache.hive.service.rpc.thrift.TCloseSessionReq;
> nit: I think the wildcard is not allowed for imports.
Done


http://gerrit.cloudera.org:8080/#/c/19537/2/fe/src/test/java/org/apache/impala/customcluster/ExternalFrontendTest.java@46
PS2, Line 46: rift.tr
> nit: maybe call this hs2BinaryPort to easily distinguish with the other one
Done



--
To view, visit http://gerrit.cloudera.org:8080/19537
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ad400b1df471e3d61b62d8c0360b27396c26050
Gerrit-Change-Number: 19537
Gerrit-PatchSet: 3
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: John Sherman 
Gerrit-Comment-Date: Mon, 06 Mar 2023 22:10:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11946: Add Thrift HTTP support for external frontend

2023-03-06 Thread John Sherman (Code Review)
Hello Aman Sinha, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19537

to look at the new patch set (#3).

Change subject: IMPALA-11946: Add Thrift HTTP support for external frontend
..

IMPALA-11946: Add Thrift HTTP support for external frontend

- Add enable_external_fe_http flag that defaults to false
  - When true the external frontend service (external_fe_port) will
  expect clients to use http transport.
  - When false the external frontend service will expect binary
  transport.

- Add tests for basic external frontend functionality
- Add test to ensure the non-external frontend services do not expose
  the ExecutePlannedStatement interface.

Change-Id: I2ad400b1df471e3d61b62d8c0360b27396c26050
---
M be/src/service/impala-server.cc
M fe/src/test/java/org/apache/impala/customcluster/CustomClusterRunner.java
A fe/src/test/java/org/apache/impala/customcluster/ExternalFrontendTest.java
3 files changed, 220 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/19537/3
--
To view, visit http://gerrit.cloudera.org:8080/19537
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ad400b1df471e3d61b62d8c0360b27396c26050
Gerrit-Change-Number: 19537
Gerrit-PatchSet: 3
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: John Sherman 


[Impala-ASF-CR] IMPALA-11946: Add Thrift HTTP support for external frontend

2023-03-06 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19537 )

Change subject: IMPALA-11946: Add Thrift HTTP support for external frontend
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19537/2/fe/src/test/java/org/apache/impala/customcluster/ExternalFrontendTest.java
File fe/src/test/java/org/apache/impala/customcluster/ExternalFrontendTest.java:

http://gerrit.cloudera.org:8080/#/c/19537/2/fe/src/test/java/org/apache/impala/customcluster/ExternalFrontendTest.java@22
PS2, Line 22: import org.apache.hive.service.rpc.thrift.*;
> nit: I think the wildcard is not allowed for imports.
I can change, but it does match the test I used for a guideline.
fe/src/test/java/org/apache/impala/customcluster/JwtHttpTest.java:import 
org.apache.hive.service.rpc.thrift.*;
fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java:import 
org.apache.hive.service.rpc.thrift.*;
fe/src/test/java/org/apache/impala/customcluster/LdapWebserverTest.java:import 
org.apache.hive.service.rpc.thrift.*;



--
To view, visit http://gerrit.cloudera.org:8080/19537
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ad400b1df471e3d61b62d8c0360b27396c26050
Gerrit-Change-Number: 19537
Gerrit-PatchSet: 2
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: John Sherman 
Gerrit-Comment-Date: Mon, 06 Mar 2023 21:22:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11946: Add Thrift HTTP support for external frontend

2023-02-27 Thread John Sherman (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19537

to look at the new patch set (#2).

Change subject: IMPALA-11946: Add Thrift HTTP support for external frontend
..

IMPALA-11946: Add Thrift HTTP support for external frontend

- Add enable_external_fe_http flag that defaults to false
  - When true the external frontend service (external_fe_port) will
  expect clients to use http transport.
  - When false the external frontend service will expect binary
  transport.

- Add tests for basic external frontend functionality
- Add test to ensure the non-external frontend services do not expose
  the ExecutePlannedStatement interface.

Change-Id: I2ad400b1df471e3d61b62d8c0360b27396c26050
---
M be/src/service/impala-server.cc
M fe/src/test/java/org/apache/impala/customcluster/CustomClusterRunner.java
A fe/src/test/java/org/apache/impala/customcluster/ExternalFrontendTest.java
3 files changed, 209 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/19537/2
--
To view, visit http://gerrit.cloudera.org:8080/19537
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ad400b1df471e3d61b62d8c0360b27396c26050
Gerrit-Change-Number: 19537
Gerrit-PatchSet: 2
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-11946: Add Thrift HTTP support for external frontend

2023-02-24 Thread John Sherman (Code Review)
John Sherman has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/19537


Change subject: IMPALA-11946: Add Thrift HTTP support for external frontend
..

IMPALA-11946: Add Thrift HTTP support for external frontend

- Add enable_external_fe_http flag that defaults to false
  - When true the external frontend service (external_fe_port) will
  expect clients to use http transport.
  - When false the external frontend service will expect binary
  transport.

- Add tests for basic external frontend functionality
- Add test to ensure the non-external frontend services do not expose
  the ExecutePlannedStatement interface.

Change-Id: I2ad400b1df471e3d61b62d8c0360b27396c26050
---
M be/src/service/impala-server.cc
M fe/src/test/java/org/apache/impala/customcluster/CustomClusterRunner.java
A fe/src/test/java/org/apache/impala/customcluster/ExternalFrontendTest.java
3 files changed, 208 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/19537/1
--
To view, visit http://gerrit.cloudera.org:8080/19537
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2ad400b1df471e3d61b62d8c0360b27396c26050
Gerrit-Change-Number: 19537
Gerrit-PatchSet: 1
Gerrit-Owner: John Sherman 


[Impala-ASF-CR] IMPALA-10657: Remove accidental usage of shaded imports

2021-04-13 Thread John Sherman (Code Review)
John Sherman has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17314


Change subject: IMPALA-10657: Remove accidental usage of shaded imports
..

IMPALA-10657: Remove accidental usage of shaded imports

- This changes seemingly accidental usages of shaded imports
  with the direct dependency
  - This is helpful to reduce confusion and possibly reduce the
  required jars for certain usages of the frontend jars

Change-Id: Ifa67aec96539db7861416f3e71cca83607e3d8c9
---
M fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java
M fe/src/main/java/org/apache/impala/catalog/Transaction.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
4 files changed, 6 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/17314/1
--
To view, visit http://gerrit.cloudera.org:8080/17314
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ifa67aec96539db7861416f3e71cca83607e3d8c9
Gerrit-Change-Number: 17314
Gerrit-PatchSet: 1
Gerrit-Owner: John Sherman 


[Impala-ASF-CR] IMPALA-10552: Support external frontends supplying timeline for profile

2021-03-19 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17183 )

Change subject: IMPALA-10552: Support external frontends supplying timeline for 
profile
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17183/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17183/1//COMMIT_MSG@59
PS1, Line 59:
> Add Kurt as co-author
Done


http://gerrit.cloudera.org:8080/#/c/17183/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/17183/1/be/src/service/impala-server.cc@1212
PS1, Line 1212: 
(*query_handle)->set_user_profile_access(result.user_has_profile_access);
> I think this line got duplicated, maybe a merge conflict resolution error?
Done - it does seem like a merge dupe



--
To view, visit http://gerrit.cloudera.org:8080/17183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2b3692b4118ea23c0f9f8ec4bcc27b0b68bb32ec
Gerrit-Change-Number: 17183
Gerrit-PatchSet: 2
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 19 Mar 2021 15:21:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10552: Support external frontends supplying timeline for profile

2021-03-19 Thread John Sherman (Code Review)
Hello Thomas Tauber-Marshall, Kurt Deschler, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17183

to look at the new patch set (#2).

Change subject: IMPALA-10552: Support external frontends supplying timeline for 
profile
..

IMPALA-10552: Support external frontends supplying timeline for profile

- Add EXTERNAL_FRONTEND as a client session type
- Use EXTERNAL_FRONTEND session type for clients connected to
  external frontend interface.
- Rename Query Timeline to Impala Backend Timeline for external
  frontends
  - the query timeline is no longer an end to end timeline when
executing a plan from an external frontend
- External frontends can provide timeline information through a
  TExecRequest by filling in the timeline field with a valid
  TEventSequence
- The frontend timeline and backend timeline are completely separate
  entities, meaning there is no overall attempt to capture the timing
  end to end
  - This is due to the fact that the frontend and Impala may not share
the same time source (or even machine).
  - It is safe to add together the backend + frontend timeline times
  to get a rough idea how long the query took end to end to execute,
  but keep in mind that this number does not capture the time it
  took the frontend to send the plan to the backend (Impala) nor does
  it capture how long it took the end user to read the results.

Example timeline with external frontend:
  Frontend Timeline: 3s016ms
 - Analysis finished: 1s130ms (1s130ms)
 - Calcite plan generated: 2s170ms (1s040ms)
 - Metadata load started: 2s245ms (74.486ms)
 - Metadata load finished. loaded-tables=1: 2s654ms (409.847ms)
 - Single node plan created: 2s726ms (71.659ms)
 - Runtime filters computed: 2s756ms (30.000ms)
 - Distributed plan created: 2s761ms (5.265ms)
 - Execution request created: 2s890ms (128.387ms)
 - Impala plan generated: 2s891ms (1.508ms)
 - Planning finished: 2s893ms (1.894ms)
 - Submitted query: 3s016ms (122.377ms)
  Impala Backend Timeline: 79.998ms
 - Query submitted: 0.000ns (0.000ns)
 - Submit for admission: 0.000ns (0.000ns)
 - Completed admission: 0.000ns (0.000ns)
 - Ready to start on 1 backends: 3.999ms (3.999ms)
 - All 1 execution backends (2 fragment instances) started: 7.999ms 
(3.999ms)
 - Rows available: 55.999ms (47.999ms)
 - Execution cancelled: 79.998ms (23.999ms)
 - Released admission control resources: 79.998ms (0.000ns)
 - Unregister query: 79.998ms (0.000ns)

Testing done:
- Manual inspection of profiles on the Impala web UI
- test_hs2.py
- test_tpch_queries.py
- test_tpcds_queries.py::TestTpcdsDecimalV2Query

Co-authored-by: Kurt Deschler 

Change-Id: I2b3692b4118ea23c0f9f8ec4bcc27b0b68bb32ec
---
M be/src/service/client-request-state.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M common/thrift/Query.thrift
4 files changed, 30 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/17183/2
--
To view, visit http://gerrit.cloudera.org:8080/17183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2b3692b4118ea23c0f9f8ec4bcc27b0b68bb32ec
Gerrit-Change-Number: 17183
Gerrit-PatchSet: 2
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10553: External Frontend CTAS support

2021-03-15 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10553: External Frontend CTAS support
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17145/7/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17145/7/be/src/exec/hdfs-table-sink.cc@257
PS7, Line 257: // When an external FE has provided a staging directory we 
use that directly.
 : // We are trusting that the external frontend implementation 
has done appropriate
 : // authorization checks on the external output directory.
> Nit: Update comment to call it an external output directory rather than a s
Done


http://gerrit.cloudera.org:8080/#/c/17145/7/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/17145/7/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@250
PS7, Line 250: if (writeId_ != -1) hdfsTableSink.setWrite_id(writeId_);
 : TTableSink tTableSink = new 
TTableSink(DescriptorTable.TABLE_SINK_ID,
 :
> Nit: If the line hasn't changed, avoid style-only changes.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 8
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 15 Mar 2021 21:50:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10553: External Frontend CTAS support

2021-03-15 Thread John Sherman (Code Review)
Hello Thomas Tauber-Marshall, Kurt Deschler, Joe McDonnell, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17145

to look at the new patch set (#8).

Change subject: IMPALA-10553: External Frontend CTAS support
..

IMPALA-10553: External Frontend CTAS support

- Adds the concept of an external staging dir to HdfsTableSink
  - This allows an external to specify the destination of the
  sink
  - When this is set, the external frontend is responsible for
  for moving and managing the results
  - Any DDL related operations are assumed to be handled by
  the external frontend
  - External frontends may optionally supply a partition
  depth which acts as a hint to skip a certain number of
  partitions while creating directories for partitions. This
  is for when the external frontend has pre-created a
  certain number of the directories in staging (usually the
  static portion of a partition specification)/
- Modifies delta/base naming to include 0 prefix padding to
  match Hive for dynamic partitioning detection
- External frontends are responsible for doing authorization
  checks against the staging directory and it is assumed the
  external frontend service is not exposed directly to users.

Co-authored-by: Kurt Deschler 

Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Reviewed-by: Kurt Deschler 
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
5 files changed, 120 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17145/8
--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 8
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-15 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
..


Patch Set 7:

> (1 comment)
 >
 > Apart from the interaction with retry_failed_queries, this looks
 > good to me.
 >
 > If you want to split off the retry_failed_queries piece into a
 > separate change, I am ok with that. I think this change is ok if
 > the external frontend guarantees not to pass in retry_failed_queries=true
 > when using a result sink.

I think it would be better to follow up. I've created 
https://issues.apache.org/jira/browse/IMPALA-10586 to track that work.


--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 7
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 15 Mar 2021 21:37:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10552: Support external frontends supplying timeline for profile

2021-03-15 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17183 )

Change subject: IMPALA-10552: Support external frontends supplying timeline for 
profile
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17183/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17183/1//COMMIT_MSG@59
PS1, Line 59:
Add Kurt as co-author



--
To view, visit http://gerrit.cloudera.org:8080/17183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2b3692b4118ea23c0f9f8ec4bcc27b0b68bb32ec
Gerrit-Change-Number: 17183
Gerrit-PatchSet: 1
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: John Sherman 
Gerrit-Comment-Date: Mon, 15 Mar 2021 15:20:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10552: Support external frontends supplying timeline for profile

2021-03-14 Thread John Sherman (Code Review)
John Sherman has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17183


Change subject: IMPALA-10552: Support external frontends supplying timeline for 
profile
..

IMPALA-10552: Support external frontends supplying timeline for profile

- Add EXTERNAL_FRONTEND as a client session type
- Use EXTERNAL_FRONTEND session type for clients connected to
  external frontend interface.
- Rename Query Timeline to Impala Backend Timeline for external
  frontends
  - the query timeline is no longer an end to end timeline when
executing a plan from an external frontend
- External frontends can provide timeline information through a
  TExecRequest by filling in the timeline field with a valid
  TEventSequence
- The frontend timeline and backend timeline are completely separate
  entities, meaning there is no overall attempt to capture the timing
  end to end
  - This is due to the fact that the frontend and Impala may not share
the same time source (or even machine).
  - It is safe to add together the backend + frontend timeline times
  to get a rough idea how long the query took end to end to execute,
  but keep in mind that this number does not capture the time it
  took the frontend to send the plan to the backend (Impala) nor does
  it capture how long it took the end user to read the results.

Example timeline with external frontend:
  Frontend Timeline: 3s016ms
 - Analysis finished: 1s130ms (1s130ms)
 - Calcite plan generated: 2s170ms (1s040ms)
 - Metadata load started: 2s245ms (74.486ms)
 - Metadata load finished. loaded-tables=1: 2s654ms (409.847ms)
 - Single node plan created: 2s726ms (71.659ms)
 - Runtime filters computed: 2s756ms (30.000ms)
 - Distributed plan created: 2s761ms (5.265ms)
 - Execution request created: 2s890ms (128.387ms)
 - Impala plan generated: 2s891ms (1.508ms)
 - Planning finished: 2s893ms (1.894ms)
 - Submitted query: 3s016ms (122.377ms)
  Impala Backend Timeline: 79.998ms
 - Query submitted: 0.000ns (0.000ns)
 - Submit for admission: 0.000ns (0.000ns)
 - Completed admission: 0.000ns (0.000ns)
 - Ready to start on 1 backends: 3.999ms (3.999ms)
 - All 1 execution backends (2 fragment instances) started: 7.999ms 
(3.999ms)
 - Rows available: 55.999ms (47.999ms)
 - Execution cancelled: 79.998ms (23.999ms)
 - Released admission control resources: 79.998ms (0.000ns)
 - Unregister query: 79.998ms (0.000ns)

Testing done:
- Manual inspection of profiles on the Impala web UI
- test_hs2.py
- test_tpch_queries.py
- test_tpcds_queries.py::TestTpcdsDecimalV2Query

Change-Id: I2b3692b4118ea23c0f9f8ec4bcc27b0b68bb32ec
---
M be/src/service/client-request-state.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M common/thrift/Query.thrift
4 files changed, 31 insertions(+), 12 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/17183/1
--
To view, visit http://gerrit.cloudera.org:8080/17183
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2b3692b4118ea23c0f9f8ec4bcc27b0b68bb32ec
Gerrit-Change-Number: 17183
Gerrit-PatchSet: 1
Gerrit-Owner: John Sherman 


[Impala-ASF-CR] IMPALA-10553: External Frontend CTAS support

2021-03-14 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10553: External Frontend CTAS support
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17145/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17145/4//COMMIT_MSG@12
PS4, Line 12:   - When this is set, the external frontend is responsible for
:   for moving and managing the results
> This is correct. I'll update the commit message to make that clearer.
Done


http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc@127
PS4, Line 127:   staging_dir_ = Substitute("$0/_impala_insert_staging/$1", 
table_desc_->hdfs_base_dir(),
 :   PrintId(state->query_id(), "_"));
 :
 :   // Prepare partition key exprs and gather dynamic partition 
key exprs.
 :   for (size_t i = 0; i < partition_key_expr_evals_.size(); ++i) {
 : // Remember non-constant partition key exprs for building 
hash table of Hdfs files.
 :
> I can certainly change everything to be external output rather than externa
Done



--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 7
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Sun, 14 Mar 2021 18:40:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10553: External Frontend CTAS support

2021-03-14 Thread John Sherman (Code Review)
Hello Thomas Tauber-Marshall, Kurt Deschler, Joe McDonnell, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17145

to look at the new patch set (#7).

Change subject: IMPALA-10553: External Frontend CTAS support
..

IMPALA-10553: External Frontend CTAS support

- Adds the concept of an external staging dir to HdfsTableSink
  - This allows an external to specify the destination of the
  sink
  - When this is set, the external frontend is responsible for
  for moving and managing the results
  - Any DDL related operations are assumed to be handled by
  the external frontend
  - External frontends may optionally supply a partition
  depth which acts as a hint to skip a certain number of
  partitions while creating directories for partitions. This
  is for when the external frontend has pre-created a
  certain number of the directories in staging (usually the
  static portion of a partition specification)/
- Modifies delta/base naming to include 0 prefix padding to
  match Hive for dynamic partitioning detection
- External frontends are responsible for doing authorization
  checks against the staging directory and it is assumed the
  external frontend service is not exposed directly to users.

Co-authored-by: Kurt Deschler 

Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Reviewed-by: Kurt Deschler 
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
5 files changed, 123 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17145/7
--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 7
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-14 Thread John Sherman (Code Review)
Hello Thomas Tauber-Marshall, Kurt Deschler, Joe McDonnell, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17145

to look at the new patch set (#6).

Change subject: IMPALA-10552: External Frontend CTAS support
..

IMPALA-10552: External Frontend CTAS support

- Adds the concept of an external staging dir to HdfsTableSink
  - This allows an external to specify the destination of the
  sink
  - When this is set, the external frontend is responsible for
  for moving and managing the results
  - Any DDL related operations are assumed to be handled by
  the external frontend
  - External frontends may optionally supply a partition
  depth which acts as a hint to skip a certain number of
  partitions while creating directories for partitions. This
  is for when the external frontend has pre-created a
  certain number of the directories in staging (usually the
  static portion of a partition specification)/
- Modifies delta/base naming to include 0 prefix padding to
  match Hive for dynamic partitioning detection
- External frontends are responsible for doing authorization
  checks against the staging directory and it is assumed the
  external frontend service is not exposed directly to users.

Co-authored-by: Kurt Deschler 

Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Reviewed-by: Kurt Deschler 
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
5 files changed, 123 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17145/6
--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 6
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-12 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
..


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@796
PS6, Line 796:   // All instances must have reported their final statuses 
before finalization, which is a
 :   // post-condition of Wait. Result sink file clean up is the 
responsibility of the
 :   // external frontend
> This is a copy/paste from FinalizeHdfsDml(), so the second sentence doesn't
Done


http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@802
PS6, Line 802:   RETURN_IF_ERROR(UpdateExecState(Status::OK(), nullptr, 
FLAGS_hostname));
> If there is an error from execution, it would show up here and this would r
I agree that retry_failed_queries might be problematic and we might want to 
recommend that an external frontend not enable the feature.

If I am reading the coordinator.cc code correct though, we do not retry queries 
with a result sink. Are there other areas I should be concerned about? I see 
some usage in query-driver.cc


http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@807
PS6, Line 807:
> Nit: This "0" is the table id. I'm guessing 0 is a special constant for the
I moved it to a named constant within this method since it is the only usage of 
it. I can also move it to DescriptorTbl if your prefer it there.


http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@810
PS6, Line 810: result_sink_table_id, obj_p
> I think this should be "0" (i.e. the special table id used in the CreateHdf
I removed the table_id portion from this message/



--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 7
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 12 Mar 2021 23:43:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-12 Thread John Sherman (Code Review)
Hello Aman Sinha, Thomas Tauber-Marshall, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17144

to look at the new patch set (#7).

Change subject: IMPALA-10551: Add result sink support for external frontends
..

IMPALA-10551: Add result sink support for external frontends

- The intended purpose of these changes is to allow external frontends
  to receive query results via files rather than streaming the results
  through the thrift interface.
- External frontends are expected to provide an FeFsTable implementation
  that describes the desired location to store results.
- External frontends are responsible for managing the files after the
  query is completed.
- Testing has been manual and through an implementation of an external
  frontend.

Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Reviewed-by: Aman Sinha 
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/query-exec-params.cc
M be/src/runtime/query-exec-params.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
9 files changed, 99 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17144/7
--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 7
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-10 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
..


Patch Set 6:

(7 comments)

Thanks Joe, I think the suggested review comments has made this a better patch 
overall.

http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.h@308
PS3, Line 308:   /// Non-null if and only if the query produces results for the 
client; i.e. is of
 :   /// TStmtType::QUERY. Coordinator uses these to pull results 
from plan tree and return
 :   /// them to the client in GetNext(), and also to access the 
fragment instance's runtime
 :   /// state.
 :   ///
 :   /// Result rows are materialized by this fragment instance in 
its own thread. They are
 :   /// materialized into a QueryResultSet provided to the 
coordinator during GetNext().
 :   ///
 :   /// Owned by the QueryState. Set in Exec().
 :   FragmentInstanceState* coord_instance_ = nullptr;
> The way we are using coord_instance_ for a query with a result sink doesn't
I've added a method to query-exec-params and call it in a few places to 
determine if there is a result sink in coordinator.cc - not sure if that is the 
best way.


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@193
PS3, Line 193:   // set coord_instance_ and coord_sink_
 :   if (exec_params_.GetCoordFragment() != nullptr) {
 : // this blocks until all fragment instances have finished 
their Prepare phase
 : Status query_status = 
query_state_->GetFInstanceState(query_id(), _instance_);
 : if (!query_status.ok()) return UpdateExecState(query_status, 
nullptr, FLAGS_hostname);
 : // We expected this query to have a coordinator instance.
 : DCHECK(coord_instance_ != nullptr);
 : // When GetFInstanceState() returns the coordinator 
instance, the Prepare phase is
 : // done and the FragmentInstanceState's root sink will be 
set up.
 : coord_sink_ = coord_instance_->GetRootSink();
 : DCHECK(coord_sink_ != nullptr);
 :   }
 :   return Status::OK();
 : }
 :
> As mentioned in coordinator.h, I think coord_instance_ should be nullptr fo
Done


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@798
PS3, Line 798:   // staging directory.
> I would prefer to review this when I can see how it is being called.
The newest review has the call.


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@805
PS3, Line 805:   DCHECK(query_ctx().__isset.desc_tbl_serialized);
> If this returns an error (which it should if the query is not successful),
This code is based on FinalizeHdfsDml which follows a similar pattern. So I 
wonder if something else ends up cleaning it up in the error case.


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@894
PS3, Line 894: && query_state_->query_options().spool_query_results
 : && 
query_state_->query_options().spool_all_results_for_retries) {
 :   // Wait until the BufferedPlanRootSink spooled all results 
or any errors stopping
 :   // it, e.g. batch queue full, cancellation or failures.
 :   auto sink = 
static_cast(coord_sink_);
 :   if (sink->WaitForAllResultsSpooled()) {
 : VLOG_QUERY << "Cannot spool all results in the allocated 
result spooling space."
 : " Query retry will be skipped if any results have 
been returned.";
 :   }
 :
> If the below if statement does not apply to queries with result sinks, then
Done


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@905
PS3, Line 905:   }
> I think it would make sense to treat a query with a result sink like a DML
Done


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@940
PS3, Line 940:
> If a query with a result sink goes through the current DML path, then it wo
Done



--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 6
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-10 Thread John Sherman (Code Review)
Hello Aman Sinha, Thomas Tauber-Marshall, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17144

to look at the new patch set (#6).

Change subject: IMPALA-10551: Add result sink support for external frontends
..

IMPALA-10551: Add result sink support for external frontends

- The intended purpose of these changes is to allow external frontends
  to receive query results via files rather than streaming the results
  through the thrift interface.
- External frontends are expected to provide an FeFsTable implementation
  that describes the desired location to store results.
- External frontends are responsible for managing the files after the
  query is completed.
- Testing has been manual and through an implementation of an external
  frontend.

Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Reviewed-by: Aman Sinha 
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/query-exec-params.cc
M be/src/runtime/query-exec-params.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
9 files changed, 97 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17144/6
--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 6
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-10 Thread John Sherman (Code Review)
Hello Aman Sinha, Thomas Tauber-Marshall, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17144

to look at the new patch set (#5).

Change subject: IMPALA-10551: Add result sink support for external frontends
..

IMPALA-10551: Add result sink support for external frontends

- The intended purpose of these changes is to allow external frontends
  to receive query results via files rather than streaming the results
  through the thrift interface.
- External frontends are expected to provide an FeFsTable implementation
  that describes the desired location to store results.
- External frontends are responsible for managing the files after the
  query is completed.
- Testing has been manual and through an implementation of an external
  frontend.

Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Reviewed-by: Aman Sinha 
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/query-exec-params.cc
M be/src/runtime/query-exec-params.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
9 files changed, 96 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17144/5
--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 5
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-10 Thread John Sherman (Code Review)
Hello Aman Sinha, Thomas Tauber-Marshall, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17144

to look at the new patch set (#4).

Change subject: IMPALA-10551: Add result sink support for external frontends
..

IMPALA-10551: Add result sink support for external frontends

- The intended purpose of these changes is to allow external frontends
  to receive query results via files rather than streaming the results
  through the thrift interface.
- External frontends are expected to provide an FeFsTable implementation
  that describes the desired location to store results.
- External frontends are responsible for managing the files after the
  query is completed.
- Testing has been manual and through an implementation of an external
  frontend.

Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Reviewed-by: Aman Sinha 
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
9 files changed, 113 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17144/4
--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 4
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-08 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10552: External Frontend CTAS support
..


Patch Set 4:

(3 comments)

Thanks Joe for the feedback

http://gerrit.cloudera.org:8080/#/c/17145/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17145/4//COMMIT_MSG@12
PS4, Line 12:   - When this is set, the external frontend is responsible for
:   for moving and managing the results
> I assume the external frontend is also handling any metadata operations as
This is correct. I'll update the commit message to make that clearer.


http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc@127
PS4, Line 127:   if (HasExternalStagingDir()) {
 : // When an external FE has provided a staging path, we use 
it directly.
 : staging_dir_ = external_staging_dir_;
 :   } else {
 : staging_dir_ = Substitute("$0/_impala_insert_staging/$1",
 : table_desc_->hdfs_base_dir(), PrintId(state->query_id(), 
"_"));
 :   }
> The external staging directory is a staging directory from the point of vie
I can certainly change everything to be external output rather than external 
staging.

I'll verify why I'm modifying staging_dir_ here - reading the code I assumed I 
initially did this so BuildHdfsFileNames would use it correctly to build 
tmp_hdfs_* variables but indeed I don't think these paths would utilize them.


http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc@279
PS4, Line 279:   // The 0 padding on base and delta is to match the 
behavior of Hive since various
 :   // systems will expect a certain format for dynamic 
partition creation. Additionally
 :   // include an 0 statement id for delta directory so 
various Hive AcidUtils detect
 :   // the directory (such as AcidUtils.baseOrDeltaSubdir()). 
Multiple statements in a
 :   // single transaction is not supported.
 :   if (overwrite_) {
 : output_partition->final_hdfs_file_name_prefix += 
StringPrintf("/base_%07ld/",
 : write_id_);
 :   } else {
 : output_partition->final_hdfs_file_name_prefix += 
StringPrintf(
 : "/delta_%07ld_%07ld_/", write_id_, write_id_);
 :   }
> Should this be triggered by its own variable independently from the externa
It is a bit orthogonal. I think ideally Impala/Hive and external frontend 
implementations would standardize on something consistent. I took the paranoid 
approach of only applying this to external frontends as I do not want to cause 
any inadvertent regressions. I'm not sure it is currently worth it to plumb 
through a flag indicating which format is expected. If another implementation 
of a frontend shows up,I think it might be worth it. Feel free to push back if 
you feel strongly.



--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 4
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 22:15:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-08 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10552: External Frontend CTAS support
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.h@249
PS2, Line 249:
> Done
Done


http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc@262
PS2, Line 262: output_partition->final_hdfs_file_name_prefix = 
Substitute("$0/$1/",
> Sure, I can add a comment here. I didn't initially because if I added a com
I added a comment here and to the commit message for this change set.



--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 2
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 17:07:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-08 Thread John Sherman (Code Review)
Hello Thomas Tauber-Marshall, Kurt Deschler, Joe McDonnell, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17145

to look at the new patch set (#4).

Change subject: IMPALA-10552: External Frontend CTAS support
..

IMPALA-10552: External Frontend CTAS support

- Adds the concept of an external staging dir to HdfsTableSink
  - This allows an external to specify the destination of the
  sink
  - When this is set, the external frontend is responsible for
  for moving and managing the results
  - External frontends may optionally supply a partition
  depth which acts as a hint to skip a certain number of
  partitions while creating directories for partitions. This
  is for when the external frontend has pre-created a
  certain number of the directories in staging (usually the
  static portion of a partition specification)/
- Modifies delta/base naming to include 0 prefix padding to
  match Hive for dynamic partitioning detection
- External frontends are responsible for doing authorization
  checks against the staging directory and it is assumed the
  external frontend service is not exposed directly to users.

Co-authored-by: Kurt Deschler 

Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Reviewed-by: Kurt Deschler 
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
5 files changed, 131 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17145/4
--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 4
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-08 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10552: External Frontend CTAS support
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc@262
PS2, Line 262: output_partition->final_hdfs_file_name_prefix = 
Substitute("$0/$1/",
> Please add a comment to this effect. We may want to at least validate the p
Sure, I can add a comment here. I didn't initially because if I added a comment 
everywhere where this is possible it'd be a lot of comments. The commit message 
for the external planner port makes it a bit clearer too
"This allows different security policy to be applied to
  each type of connection. The external_fe_port should be considered
  a privileged service and should only be exposed to an external
  frontend that does user authentication and does authorization
  checks on generated plans"

In any case, I do not mind being paranoid via comments in such a case.


http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@243
PS2, Line 243: if (externalStagingDir_ != null) {
> Actually was thinking C++. This is Java an you are correct.
Done


http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/service/Frontend.java@1803
PS2, Line 1803:   public static void addFinalizationParamsForInsert(
> No comment added?
I think you are looking at patch set 2 and not 3



--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 2
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 16:51:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-08 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10552: External Frontend CTAS support
..


Patch Set 3:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/17145/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17145/1//COMMIT_MSG@22
PS1, Line 22:
> Add co-author for Kurt
Done


http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.h@249
PS2, Line 249:   // Stores the indices into the list of non-clustering columns 
of the target table that
> Extra line
Done


http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.h@249
PS1, Line 249:   // Stores the indices into the list of non-clustering columns 
of the target table that
> delete
Done


http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.cc@131
PS1, Line 131: staging_dir_ = Substitute("$0/_impala_insert_staging/$1",
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.cc@232
PS1, Line 232: void HdfsTableSink::BuildHdfsFileNames(
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.cc@506
PS1, Line 506:   for (int j = 0; j < partition_key_expr_evals_.size(); ++j) {
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.cc@549
PS1, Line 549:   // partition_name_ss now holds the unique descriptor for this 
partition,
> line too long (95 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc@262
PS2, Line 262: // When an external FE has provided a staging directory we 
use that directly.
> Do we need any validation on the path here to avoid security issues writing
For now we are treating the external planner port as a trusted/protected 
service meant for deployment in scenarios you can lock down access to said 
port. We should certainly in the (near) future look into doing similar things 
that communication between impalads do that prevent users from impersonating a 
coordinator.


http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@90
PS2, Line 90:
> I would say "specifies depth" rather than hint as this needs to be enforced
Done


http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@243
PS2, Line 243: hdfsTableSink.setSort_columns(sortColumns_);
> externalStagingDir_ is not a pointer..
externalStagingDir_ is a String object and if it is not set I do believe it can 
be null.

(Maybe you read it as the int externalStagingPartitionDepth_?)


http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/service/Frontend.java@1803
PS2, Line 1803:   // This is public to allow external frontends to utilize this 
method to fill in the
> Add comment why this is public static.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 3
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 15:11:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-08 Thread John Sherman (Code Review)
Hello Thomas Tauber-Marshall, Kurt Deschler, Joe McDonnell, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17145

to look at the new patch set (#3).

Change subject: IMPALA-10552: External Frontend CTAS support
..

IMPALA-10552: External Frontend CTAS support

- Adds the concept of an external staging dir to HdfsTableSink
  - This allows an external to specify the destination of the
  sink
  - When this is set, the external frontend is responsible for
  for moving and managing the results
  - External frontends may optionally supply a partition
  depth which acts as a hint to skip a certain number of
  partitions while creating directories for partitions. This
  is for when the external frontend has pre-created a
  certain number of the directories in staging (usually the
  static portion of a partition specification)/
- Modifies delta/base naming to include 0 prefix padding to
  match Hive for dynamic partitioning detection

Co-authored-by: Kurt Deschler 

Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Reviewed-by: Kurt Deschler 
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
5 files changed, 129 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17145/3
--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 3
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-07 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10552: External Frontend CTAS support
..


Patch Set 2:

gerrit-verify-dryrun-external


--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 2
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Sun, 07 Mar 2021 16:38:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-07 Thread John Sherman (Code Review)
Hello Aman Sinha, Thomas Tauber-Marshall, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17144

to look at the new patch set (#3).

Change subject: IMPALA-10551: Add result sink support for external frontends
..

IMPALA-10551: Add result sink support for external frontends

- The intended purpose of these changes is to allow external frontends
  to receive query results via files rather than streaming the results
  through the thrift interface.
- External frontends are expected to provide an FeFsTable implementation
  that describes the desired location to store results.
- External frontends are responsible for managing the files after the
  query is completed.
- Testing has been manual and through an implementation of an external
  frontend.

Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Reviewed-by: Aman Sinha 
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
9 files changed, 93 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17144/3
--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 3
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-9218: Add support for locally compiled Hive

2021-03-07 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17094 )

Change subject: IMPALA-9218: Add support for locally compiled Hive
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17094/3/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/17094/3/bin/impala-config.sh@254
PS3, Line 254: export 
IMPALA_HIVE_STORAGE_API_VERSION=${HIVE_STORAGE_API_VERSION_OVERRIDE:-"2.3.0.$IMPALA_HIVE_VERSION"}
> line too long (105 > 90)
Going to ignore for this file.


http://gerrit.cloudera.org:8080/#/c/17094/3/bin/impala-config.sh@366
PS3, Line 366: export 
HIVE_HOME=${HIVE_HOME_OVERRIDE:-"$CDP_COMPONENTS_HOME/apache-hive-${IMPALA_HIVE_VERSION}-bin"}
> line too long (101 > 90)
Going to ignore for this file.


http://gerrit.cloudera.org:8080/#/c/17094/3/bin/impala-config.sh@367
PS3, Line 367: export 
HIVE_SRC_DIR=${HIVE_SRC_DIR_OVERRIDE:-"${CDP_COMPONENTS_HOME}/hive-${IMPALA_HIVE_VERSION}"}
> line too long (98 > 90)
Going to ignore for this file.



--
To view, visit http://gerrit.cloudera.org:8080/17094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I21892c153c445e3a5d93f2bc8f5e0b799929dd34
Gerrit-Change-Number: 17094
Gerrit-PatchSet: 3
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Comment-Date: Sun, 07 Mar 2021 16:31:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9218: Add support for locally compiled Hive

2021-03-07 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17094 )

Change subject: IMPALA-9218: Add support for locally compiled Hive
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17094/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17094/2//COMMIT_MSG@23
PS2, Line 23: export HIVE_VERSION_OVERRIDE=3.1.0-SNAPSHOT
> If the long lines were for the description of what the patch is doing, that
Done


http://gerrit.cloudera.org:8080/#/c/17094/2/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/17094/2/bin/bootstrap_toolchain.py@469
PS2, Line 469:   use_override_hive = \
 :   "HIVE_VERSION_OVERRIDE" in os.environ and os.envi
> Nit: for this, I think I would break the line right after the first equals,
Done


http://gerrit.cloudera.org:8080/#/c/17094/2/bin/bootstrap_toolchain.py@471
PS2, Line 471:
> flake8: E501 line too long (91 > 90 characters)
Done



--
To view, visit http://gerrit.cloudera.org:8080/17094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I21892c153c445e3a5d93f2bc8f5e0b799929dd34
Gerrit-Change-Number: 17094
Gerrit-PatchSet: 3
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Comment-Date: Sun, 07 Mar 2021 16:31:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9218: Add support for locally compiled Hive

2021-03-07 Thread John Sherman (Code Review)
John Sherman has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/17094 )

Change subject: IMPALA-9218: Add support for locally compiled Hive
..

IMPALA-9218: Add support for locally compiled Hive

- Add HIVE_VERSION_OVERRIDE, HIVE_STORAGE_API_VERSION_OVERRIDE,
  HIVE_METASTORE_THRIFT_DIR_OVERRIDE, HIVE_HOME_OVERRIDE environment
  variable support to impala-config.sh
- When used together with HIVE_SRC_DIR_OVERRIDE allows a user to
  specify a locally compiled version of Hive for development and the
  minicluster
- Hive jars are expected to have been installed into the local maven
  repository
- Currently only version 3 of Hive is supported due to the absence of
  API shims for Hive 4.0
Example:
  ~/hive $ mvn package install -Pdist -DskipTests

Example configuration:
export HIVE_VERSION_OVERRIDE=3.1.0-SNAPSHOT
export HIVE_STORAGE_API_VERSION_OVERRIDE=2.6.0
export HIVE_HOME_OVERRIDE=\
~/hive/packaging/target/apache-hive-3.1.0-SNAPSHOT-bin/apache-hive-3.1.0-SNAPSHOT-bin
export HIVE_SRC_DIR_OVERRIDE=~/hive
export 
HIVE_METASTORE_THRIFT_DIR_OVERRIDE=~/hive/standalone-metastore/src/main/thrift/

Change-Id: I21892c153c445e3a5d93f2bc8f5e0b799929dd34
---
M README-build.md
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M java/pom.xml
4 files changed, 26 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/17094/3
--
To view, visit http://gerrit.cloudera.org:8080/17094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I21892c153c445e3a5d93f2bc8f5e0b799929dd34
Gerrit-Change-Number: 17094
Gerrit-PatchSet: 3
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 


[Impala-ASF-CR] IMPALA-10550: Add External Frontend service port

2021-03-02 Thread John Sherman (Code Review)
Hello Aman Sinha, Thomas Tauber-Marshall, Kurt Deschler, Joe McDonnell, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17125

to look at the new patch set (#9).

Change subject: IMPALA-10550: Add External Frontend service port
..

IMPALA-10550: Add External Frontend service port

- If external_fe_port flag is >0, spins up a new HS2 compatible
  service port
- Added enable_external_fe_support option to start-impala-cluster.py
  - which when detected will start impala clusters with
  external_fe_port on 21150-21152
- Modify impalad_coordinator Dockerfile to expose external frontend
  port at 21150
- The intent of this commit is to separate external frontend
  connections from normal hs2 connections
  - This allows different security policy to be applied to
  each type of connection. The external_fe_port should be considered
  a privileged service and should only be exposed to an external
  frontend that does user authentication and does authorization
  checks on generated plans

Change-Id: I991b5b05e12e37d8739e18ed1086bbb0228acc40
Reviewed-by: Aman Sinha 
---
M be/src/rpc/authentication.cc
M be/src/rpc/authentication.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/impalad-main.cc
M be/src/testutil/in-process-servers.cc
M bin/start-impala-cluster.py
M common/thrift/metrics.json
M docker/impalad_coordinator/Dockerfile
M tests/common/impala_cluster.py
10 files changed, 137 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/17125/9
--
To view, visit http://gerrit.cloudera.org:8080/17125
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I991b5b05e12e37d8739e18ed1086bbb0228acc40
Gerrit-Change-Number: 17125
Gerrit-PatchSet: 9
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10550: Add External Frontend service port

2021-03-02 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17125 )

Change subject: IMPALA-10550: Add External Frontend service port
..


Patch Set 8:

I'll investigate the failure(s). Initial look was it is unit test - which I 
should have ran locally in the first place (but evidently did not).


--
To view, visit http://gerrit.cloudera.org:8080/17125
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I991b5b05e12e37d8739e18ed1086bbb0228acc40
Gerrit-Change-Number: 17125
Gerrit-PatchSet: 8
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 02 Mar 2021 15:20:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-01 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
..


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/17144/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17144/1//COMMIT_MSG@14
PS1, Line 14: - External frontends are responsible for managing the files after 
the
:   query is completed.
> I wonder how this would interact with the "retry_failed_queries" query opti
I think the newest version of the patch doesn't try to retry failed queries 
when there is a result sink.


http://gerrit.cloudera.org:8080/#/c/17144/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17144/1/be/src/runtime/coordinator.cc@798
PS1, Line 798: Coordinator::FinalizeResultSink()
> I might be missing something, but where is this called?
Great catch - I think some code got lost in a rebase. Will show up in next 
patch (in Wait()).


http://gerrit.cloudera.org:8080/#/c/17144/1/be/src/runtime/coordinator.cc@907
PS1, Line 907:   } else if (stmt_type_ == TStmtType::QUERY) {
 : DCHECK(coord_instance_ != nullptr);
> I'm thinking through the implication of calling this after WaitForBackends(
I think a delay in reporting an error is okay for now (As long as that is the 
only negative in thee way I am doing this)

I will also take some time to think it through a bit since I haven't thought 
about this code for awhile.


http://gerrit.cloudera.org:8080/#/c/17144/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17144/2/be/src/runtime/coordinator.cc@897
PS2, Line 897: // than streaming them to a client directly. For these types 
of queries we need to wait
> line too long (91 > 90)
I'll catch in next round of fixes.


http://gerrit.cloudera.org:8080/#/c/17144/1/be/src/runtime/fragment-instance-state.h
File be/src/runtime/fragment-instance-state.h:

http://gerrit.cloudera.org:8080/#/c/17144/1/be/src/runtime/fragment-instance-state.h@121
PS1, Line 121:   JoinBuilder* GetJoinBuildSink() const;
> Add CR
Done


http://gerrit.cloudera.org:8080/#/c/17144/1/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/17144/1/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@245
PS1, Line 245:   exprs.addAll(outputExprs_.subList(0,
> line too long (91 > 90)
Done



--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 2
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 02 Mar 2021 00:23:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-01 Thread John Sherman (Code Review)
Hello Aman Sinha, Thomas Tauber-Marshall, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17144

to look at the new patch set (#2).

Change subject: IMPALA-10551: Add result sink support for external frontends
..

IMPALA-10551: Add result sink support for external frontends

- The intended purpose of these changes is to allow external frontends
  to receive query results via files rather than streaming the results
  through the thrift interface.
- External frontends are expected to provide an FeFsTable implementation
  that describes the desired location to store results.
- External frontends are responsible for managing the files after the
  query is completed.
- Testing has been manual and through an implementation of an external
  frontend.

Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Reviewed-by: Aman Sinha 
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
9 files changed, 111 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17144/2
--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 2
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10550: Add External Frontend service port

2021-03-01 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17125 )

Change subject: IMPALA-10550: Add External Frontend service port
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17125/5/be/src/rpc/authentication.h
File be/src/rpc/authentication.h:

http://gerrit.cloudera.org:8080/#/c/17125/5/be/src/rpc/authentication.h@67
PS5, Line 67: Currently this is either null if external_fe_port <= 0 or
:   /// NoAuthProvider.
> I think you put this above the wrong function?
I totally did - usually I self review a little better. Thanks for catching.


http://gerrit.cloudera.org:8080/#/c/17125/5/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

http://gerrit.cloudera.org:8080/#/c/17125/5/bin/start-impala-cluster.py@235
PS5, Line 235:   'external_fe_port': DEFAULT_EXTERNAL_FE_PORT + 
instance_num,
> What would you think about putting this behind a flag for now, eg. "start-i
Done
I went with the approach of excluding it in build_impalad_port_args since that 
seemed like a nice clean way of excluding it even if the map contains the port.



--
To view, visit http://gerrit.cloudera.org:8080/17125
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I991b5b05e12e37d8739e18ed1086bbb0228acc40
Gerrit-Change-Number: 17125
Gerrit-PatchSet: 5
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 01 Mar 2021 22:26:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10550: Add External Frontend service port

2021-03-01 Thread John Sherman (Code Review)
Hello Aman Sinha, Thomas Tauber-Marshall, Kurt Deschler, Joe McDonnell, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17125

to look at the new patch set (#6).

Change subject: IMPALA-10550: Add External Frontend service port
..

IMPALA-10550: Add External Frontend service port

- If external_fe_port flag is >0, spins up a new HS2 compatible
  service port
- Added enable_external_fe_support option to start-impala-cluster.py
  - which when detected will start impala clusters with
  external_fe_port on 21150-21152
- Modify impalad_coordinator Dockerfile to expose external frontend
  port at 21150
- The intent of this commit is to separate external frontend
  connections from normal hs2 connections
  - This allows different security policy to be applied to
  each type of connection. The external_fe_port should be considered
  a privileged service and should only be exposed to an external
  frontend that does user authentication and does authorization
  checks on generated plans

Change-Id: I991b5b05e12e37d8739e18ed1086bbb0228acc40
Reviewed-by: Aman Sinha 
---
M be/src/rpc/authentication.cc
M be/src/rpc/authentication.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/impalad-main.cc
M be/src/testutil/in-process-servers.cc
M bin/start-impala-cluster.py
M common/thrift/metrics.json
M docker/impalad_coordinator/Dockerfile
M tests/common/impala_cluster.py
10 files changed, 136 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/17125/6
--
To view, visit http://gerrit.cloudera.org:8080/17125
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I991b5b05e12e37d8739e18ed1086bbb0228acc40
Gerrit-Change-Number: 17125
Gerrit-PatchSet: 6
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-01 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10552: External Frontend CTAS support
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17145/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17145/1//COMMIT_MSG@22
PS1, Line 22:
Add co-author for Kurt



--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 1
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 01 Mar 2021 17:12:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-01 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10552: External Frontend CTAS support
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.h@249
PS1, Line 249:
delete



--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 1
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Mon, 01 Mar 2021 17:09:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-01 Thread John Sherman (Code Review)
Hello Kurt Deschler,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/17145

to review the following change.


Change subject: IMPALA-10552: External Frontend CTAS support
..

IMPALA-10552: External Frontend CTAS support

- Adds the concept of an external staging dir to HdfsTableSink
  - This allows an external to specify the destination of the
  sink
  - When this is set, the external frontend is responsible for
  for moving and managing the results
  - External frontends may optionally supply a partition
  depth which acts as a hint to skip a certain number of
  partitions while creating directories for partitions. This
  is for when the external frontend has pre-created a
  certain number of the directories in staging (usually the
  static portion of a partition specification)/
- Modifies delta/base naming to include 0 prefix padding to
  match Hive for dynamic partitioning detection

Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Reviewed-by: Kurt Deschler 
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
5 files changed, 124 insertions(+), 22 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17145/1
--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 1
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Kurt Deschler 


[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-01 Thread John Sherman (Code Review)
Hello Aman Sinha,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/17144

to review the following change.


Change subject: IMPALA-10551: Add result sink support for external frontends
..

IMPALA-10551: Add result sink support for external frontends

- The intended purpose of these changes is to allow external frontends
  to receive query results via files rather than streaming the results
  through the thrift interface.
- External frontends are expected to provide an FeFsTable implementation
  that describes the desired location to store results.
- External frontends are responsible for managing the files after the
  query is completed.
- Testing has been manual and through an implementation of an external
  frontend.

Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Reviewed-by: Aman Sinha 
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
9 files changed, 93 insertions(+), 15 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17144/1
--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 1
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 


[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-01 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17144/1/be/src/runtime/fragment-instance-state.h
File be/src/runtime/fragment-instance-state.h:

http://gerrit.cloudera.org:8080/#/c/17144/1/be/src/runtime/fragment-instance-state.h@121
PS1, Line 121:   JoinBuilder* GetJoinBuildSink() const;
Add CR



--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 1
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: John Sherman 
Gerrit-Comment-Date: Mon, 01 Mar 2021 17:09:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10550: Add External Frontend service port

2021-02-26 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17125 )

Change subject: IMPALA-10550: Add External Frontend service port
..


Patch Set 5:

For clarity - for now it'll be the responsibility of external frontends to 
ensure they work with impala and not vice-versa until we get more stable. We 
should take due care not to regress any of the current impala functionality and 
near term unit test what we can - such as not exposing ExecutePlannedStatement 
via hs2_port and so forth.


--
To view, visit http://gerrit.cloudera.org:8080/17125
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I991b5b05e12e37d8739e18ed1086bbb0228acc40
Gerrit-Change-Number: 17125
Gerrit-PatchSet: 5
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Sat, 27 Feb 2021 01:29:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7982: Add host network usage to profile

2019-03-27 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12747 )

Change subject: IMPALA-7982: Add host network usage to profile
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12747/2/be/src/util/system-state-info.cc
File be/src/util/system-state-info.cc:

http://gerrit.cloudera.org:8080/#/c/12747/2/be/src/util/system-state-info.cc@73
PS2, Line 73: LOG(WARNING) << "Could not open " << path << ": " << 
GetStrErrMsg() << endl;
Should you just print the error from the status? or is the errno still 
correctly preserved for GetStrErrMsg to determine the error msg? (Same applies 
to ReadCurrentProcNetDev)



--
To view, visit http://gerrit.cloudera.org:8080/12747
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2cc74f87374080a74a13b7fb6e4da44a11d828ef
Gerrit-Change-Number: 12747
Gerrit-PatchSet: 2
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 27 Mar 2019 19:20:26 +
Gerrit-HasComments: Yes