[Impala-ASF-CR] Add missing authorization in KRPC

2018-08-29 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11331 )

Change subject: Add missing authorization in KRPC
..


Patch Set 5: Code-Review+1

(1 comment)

Testing side LGTM.

http://gerrit.cloudera.org:8080/#/c/11331/5/be/src/rpc/rpc-mgr-kerberized-test.cc
File be/src/rpc/rpc-mgr-kerberized-test.cc:

http://gerrit.cloudera.org:8080/#/c/11331/5/be/src/rpc/rpc-mgr-kerberized-test.cc@114
PS5, Line 114:   rpc_status =
 :   FromKuduStatus(scan_proxy->ScanMem(scan_request, 
_response, ));
 :   EXPECT_TRUE(rpc_status.ok());
So, this is passing only because the Authorize() function for 
ScanMemServiceImpl always returns true right? I had to read across files to 
figure that out.

I think adding a comment like the following should make the test easier to 
understand:
"ScanMemServiceImpl authorizes everyone connecting to the service, so we expect 
it to pass even though we have a different principal name".

And the opposite for PingServiceImpl.



--
To view, visit http://gerrit.cloudera.org:8080/11331
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f82dee5e721f2ed23e75fd91abbc6ab7addd4c5
Gerrit-Change-Number: 11331
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 29 Aug 2018 16:55:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add missing authorization in KRPC

2018-08-27 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11331 )

Change subject: Add missing authorization in KRPC
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11331/2/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/11331/2/be/src/common/global-flags.cc@50
PS2, Line 50: krb5_ccname
nit: Maybe krb5_ccpath ? Or krb5ccname_path


http://gerrit.cloudera.org:8080/#/c/11331/1/be/src/rpc/authentication.cc
File be/src/rpc/authentication.cc:

http://gerrit.cloudera.org:8080/#/c/11331/1/be/src/rpc/authentication.cc@608
PS1, Line 608: sasl::TSaslServer::SaslInit(GENERAL_CALLBACKS, 
APP_NAME.c_str());
> Wasn't very clear in my previous update. I was trying to say:
I think the SASL callbacks are invoked only on SASL handshakes, so they're 
technically once per connection too.

However, I'm all for building it on top of the KRPC auth as well.

Edit: I guess another difference is that with the approach that this patch 
takes, we'll obtain a kerberos ticket, but fail backend RPC communication, 
whereas a SASL auth callback failure will not provide a ticket as well IIRC. I 
can't think of security issues with that though since we're the only service 
they can talk to with the ticket and we'll refuse to talk if they're not 
authorized, so I think it should be fine.


http://gerrit.cloudera.org:8080/#/c/11331/2/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/11331/2/be/src/rpc/rpc-mgr.h@216
PS2, Line 216: authorized_service_name_
> I think this is somewhat ambiguous. When I read this first, I thought you m
Just to add, if we're going to keep the service name fixed, this should 
probably be a const.

That's your call however.



--
To view, visit http://gerrit.cloudera.org:8080/11331
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f82dee5e721f2ed23e75fd91abbc6ab7addd4c5
Gerrit-Change-Number: 11331
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 28 Aug 2018 03:56:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add missing authorization in KRPC

2018-08-26 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11331 )

Change subject: Add missing authorization in KRPC
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11331/1/be/src/rpc/authentication.cc
File be/src/rpc/authentication.cc:

http://gerrit.cloudera.org:8080/#/c/11331/1/be/src/rpc/authentication.cc@608
PS1, Line 608: sasl::TSaslServer::SaslInit(GENERAL_CALLBACKS, 
APP_NAME.c_str());
> I thought I had already taken care of this but looking at the code again, I
Alternatively, you could add these callbacks to the Kudu callback list for both 
the client and the server:
https://github.com/apache/impala/blob/ac4acf1b77ccad95528741c255834d8ccdb84518/be/src/kudu/rpc/server_negotiation.cc#L148-L152

https://github.com/apache/impala/blob/540611e863fe99b3d3ae35f8b94a745a68b9eba2/be/src/kudu/rpc/client_negotiation.cc#L125-L131



--
To view, visit http://gerrit.cloudera.org:8080/11331
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f82dee5e721f2ed23e75fd91abbc6ab7addd4c5
Gerrit-Change-Number: 11331
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Sun, 26 Aug 2018 16:56:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add missing authorization in KRPC

2018-08-26 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11331 )

Change subject: Add missing authorization in KRPC
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11331/1/be/src/rpc/authentication.cc
File be/src/rpc/authentication.cc:

http://gerrit.cloudera.org:8080/#/c/11331/1/be/src/rpc/authentication.cc@608
PS1, Line 608: sasl::TSaslServer::SaslInit(GENERAL_CALLBACKS, 
APP_NAME.c_str());
I thought I had already taken care of this but looking at the code again, I 
realize I made a mistake.

I think the easiest and quickest fix for this is to just check if kerberos is 
enabled and if it is, call TSaslServer::SaslInit() and TSaslClient::SaslInit() 
with 'KERB_INT_CALLBACKS' instead of 'GENERAL_CALLBACKS', which is what I 
thought I had done.

That way on every new SASL GSSAPI authentication, the 'SaslAuthorizeInternal' 
callback will be called. For the client side, L831 should override the internal 
callbacks with its own external callbacks.

If kerberos is enabled, we require all the connections to be kerberos enabled, 
so I expect that this should just work without any issues.

This is a less intuitive fix than the one you've done though, since this will 
happen in the context of some call in the SASL library, whereas in your patch 
you're doing it as part of the RpcMgr.



--
To view, visit http://gerrit.cloudera.org:8080/11331
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f82dee5e721f2ed23e75fd91abbc6ab7addd4c5
Gerrit-Change-Number: 11331
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Sun, 26 Aug 2018 16:42:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP: IMPALA-4063

2018-08-10 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11185


Change subject: WIP: IMPALA-4063
..

WIP: IMPALA-4063

Change-Id: I82dd6d6014f5219550082c3cc59e88bfb7d84ef8
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/control-service.cc
M common/protobuf/control_service.proto
M common/thrift/RuntimeProfile.thrift
14 files changed, 185 insertions(+), 227 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/11185/1
--
To view, visit http://gerrit.cloudera.org:8080/11185
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I82dd6d6014f5219550082c3cc59e88bfb7d84ef8
Gerrit-Change-Number: 11185
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-7335: Add logs HdfsScanNode to debug the issue

2018-08-08 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11174 )

Change subject: IMPALA-7335: Add logs HdfsScanNode to debug the issue
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11174/1/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/11174/1/be/src/exec/hdfs-scan-node.cc@331
PS1, Line 331: status
Might help to print this out too just to see if it's CANCELLED or something 
else.



--
To view, visit http://gerrit.cloudera.org:8080/11174
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I68698c90031edc6ee8c31e9ce3d52dade9d8f6f1
Gerrit-Change-Number: 11174
Gerrit-PatchSet: 1
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 09 Aug 2018 01:15:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

2018-08-08 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
..


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@310
PS8, Line 310: coord_
I feel that allowing for a non-const pointer to the Coordinator object to the 
Coordinator's child objects is quite dangerous in terms of accidentally 
introducing bugs in future changes. I'd rather prefer if a pointer to the 
DmlExecState is passed to this function, and the 'coord_' is left as a const 
ref.


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@317
PS8, Line 317: MergeErrorMaps(instance_exec_status.error_log(), _log_);
This is less severe, but I guess that this is already wrong too. We potentially 
can merge the same error multiple times if there's a ReportExecStatus() RPC 
retry due to the coordinator's ACK being lost on the first try, and thus the 
executor retries and sends the same payload twice.

I'm wondering if it makes sense to also send this only on 'done = true'.


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/runtime/coordinator-backend-state.cc@516
PS8, Line 516: lock_guard l1(exec_summary->lock);
Are we violating any lock ordering here?

Previously, we would obtain 'exec_summary->lock' before 'lock_', but now we're 
doing the opposite.

I looked through the code and it doesn't seem like it, but just want to make 
sure.


http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/10855/8/be/src/service/control-service.cc@76
PS8, Line 76:
I feel that we can still keep the deserialization here and just check for the 
(instance_exec_status_size() == 0) case.

That will keep the code simpler and remove the lock.unlock() thing inside the 
CoordinatorBackendState.



--
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 08 Aug 2018 22:02:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-08-07 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..


Patch Set 15: Code-Review+1

(1 comment)

Carry +1.

http://gerrit.cloudera.org:8080/#/c/10813/15/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/15/be/src/runtime/fragment-instance-state.cc@105
PS15, Line 105: current_state_.Load() <= 
TFInstanceExecState::WAITING_FOR_PREPARE) {
  :   DCHECK(!is_prepared);
> Seems easier to understand with the following (with the same effect):
Done



--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 15
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Tue, 07 Aug 2018 18:44:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-08-07 Thread Sailesh Mukil (Code Review)
Hello Michael Ho, Joe McDonnell, Bikramjeet Vig, Dan Hecht, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10813

to look at the new patch set (#16).

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..

IMPALA-7163: Implement a state machine for the QueryState class

This patch adds a state machine for the QueryState class. The motivation
behind this patch is to make the query lifecycle from the point of
view of an executor much easier to reason about and this patch is key
for a follow on patch for IMPALA-2990 where the status reporting will
be per-query rather than per-fragment-instance. Currently, the state
machine provides no other purpose, and it will mostly be used for
IMPALA-2990.

We introduce 5 possible states for the QueryState which include 3
terminal states (FINISHED, CANCELLED and ERROR) and 2 non-terminal
states (PREPARING, EXECUTING). The transition from one state to the
next is always handled by a single thread which is also the QueryState
thread. This thread will additionally bear the purpose of sending
periodic updates after IMPALA-4063, which is the primary reason behind
having only this thread modify the state of the query.

Counting barriers are introduced to keep a count of how many fragment
instances have finished Preparing and Executing. These barriers also
block until all the fragment instances have finished a respective state.
The fragment instances update the query wide query status if an error is
hit and unblocks the barrier if it is in the EXECUTING state. The
PREPARING state blocks regardless of whether a fragment instance hit an
error or not, until all the fragment instances have completed
successfully or unsuccessfully, to maintain the invariant that fragment
instances cannot be cancelled until the entire QueryState has finished
PREPARING.

The status reporting protocol has not been changed and remains exactly
as it was.

Testing:
- Added 3 failure points in the query lifecycle using debug actions
  and added tests to validate the same (extension of IMPALA-7376).
- Ran 'core' and 'exhaustive' tests.

Future related work:
1) IMPALA-2990: Make status reporting per-query.
2) Try to logically align the FIS states with the QueryState states.
3) Consider mirroring the QueryState state machine to
CoordinatorBackendState

Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
---
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M common/thrift/ImpalaInternalService.thrift
M tests/failure/test_failpoints.py
7 files changed, 299 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/10813/16
--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 16
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-08-06 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..


Patch Set 15:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10813/13/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/13/be/src/runtime/fragment-instance-state.cc@107
PS13, Line 107: er
> My understanding is that explicitly check the current state during state tr
I think the only way to have a reliable DCHECK is to have a boolean that 
denotes whether the instance was prepared or not. So, I went ahead and took 
your suggestion to add the 'is_prepared' local flag.

So, in case someone changes the order of enums in the future, this DCHECK will 
catch it.


http://gerrit.cloudera.org:8080/#/c/10813/13/be/src/runtime/fragment-instance-state.cc@522
PS13, Line 522:   // Labels to send to the debug webpages to display the 
current state to the user.
  :   static const string finstance_state_labels[] = {
> This needs to be changed. I think this is dependent on the enum ordering.
Good catch. Done.


http://gerrit.cloudera.org:8080/#/c/10813/14/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/10813/14/common/thrift/ImpalaInternalService.thrift@626
PS14, Line 626: // the debug webpages.
> Given the assumption in this patch about the ordering of the fields in this
Done. Not adding the DCHECKs for every state though as that may be excessive. 
However, I've added a few DCHECKs at appropriate places in the code to ensure 
that some of the states are reached in order.



--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 15
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Tue, 07 Aug 2018 00:57:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-08-06 Thread Sailesh Mukil (Code Review)
Hello Michael Ho, Joe McDonnell, Bikramjeet Vig, Dan Hecht, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10813

to look at the new patch set (#15).

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..

IMPALA-7163: Implement a state machine for the QueryState class

This patch adds a state machine for the QueryState class. The motivation
behind this patch is to make the query lifecycle from the point of
view of an executor much easier to reason about and this patch is key
for a follow on patch for IMPALA-2990 where the status reporting will
be per-query rather than per-fragment-instance. Currently, the state
machine provides no other purpose, and it will mostly be used for
IMPALA-2990.

We introduce 5 possible states for the QueryState which include 3
terminal states (FINISHED, CANCELLED and ERROR) and 2 non-terminal
states (PREPARING, EXECUTING). The transition from one state to the
next is always handled by a single thread which is also the QueryState
thread. This thread will additionally bear the purpose of sending
periodic updates after IMPALA-4063, which is the primary reason behind
having only this thread modify the state of the query.

Counting barriers are introduced to keep a count of how many fragment
instances have finished Preparing and Executing. These barriers also
block until all the fragment instances have finished a respective state.
The fragment instances update the query wide query status if an error is
hit and unblocks the barrier if it is in the EXECUTING state. The
PREPARING state blocks regardless of whether a fragment instance hit an
error or not, until all the fragment instances have completed
successfully or unsuccessfully, to maintain the invariant that fragment
instances cannot be cancelled until the entire QueryState has finished
PREPARING.

The status reporting protocol has not been changed and remains exactly
as it was.

Testing:
- Added 3 failure points in the query lifecycle using debug actions
  and added tests to validate the same (extension of IMPALA-7376).
- Ran 'core' and 'exhaustive' tests.

Future related work:
1) IMPALA-2990: Make status reporting per-query.
2) Try to logically align the FIS states with the QueryState states.
3) Consider mirroring the QueryState state machine to
CoordinatorBackendState

Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
---
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M common/thrift/ImpalaInternalService.thrift
M tests/failure/test_failpoints.py
7 files changed, 295 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/10813/15
--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 15
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-08-05 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10813/13/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/13/be/src/runtime/fragment-instance-state.cc@107
PS13, Line 107: er
> A simpler alternative to relying on this enum is to save a boolean above wh
I think it's better to rely on the enum ordering. Otherwise, the new boolean is 
just redundant with the one of the states in the state machine.

We also rely on enum ordering in other parts in the code, so I think this 
should be fine.



--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 14
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 06 Aug 2018 01:14:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

2018-08-05 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
..


Patch Set 5:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.h@276
PS5, Line 276: to construct a fragment instance's status report
"... to construct a status report ..."

Since if 'fis' is nullptr, then we send a generic report with only the status 
of the query.


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.h@277
PS5, Line 277: The runtime profile
"If 'fis' is not 'nullptr', the runtime profile ..."


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.cc@256
PS5, Line 256: query ID $1
Shouldn't we print the fragment instance ID instead?


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.cc@295
PS5, Line 295: The profile is serialized in Thrift
The profile is a thrift structure serialized to a string ...


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.cc@306
PS5, Line 306: sidecar_buf->assign_copy(profile_str);
We could probably consider moving 'profile_str' to the 'sidecar_buf' to avoid 
the copy, but that requires changes in Kudu code, so we can defer for later.


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/runtime/query-state.cc@304
PS5, Line 304: if (!profile_str.empty()) {
 :   unique_ptr sidecar_buf = 
make_unique();
 :   sidecar_buf->assign_copy(profile_str);
 :   unique_ptr sidecar = 
RpcSidecar::FromFaststring(move(sidecar_buf));
 :
 :   int sidecar_idx;
 :   kudu::Status sidecar_status =
 :   rpc_controller.AddOutboundSidecar(move(sidecar), 
_idx);
 :   CHECK(sidecar_status.ok())
 :   << FromKuduStatus(sidecar_status, "Failed to add 
sidecar").GetDetail();
 :
 :   DCHECK_EQ(report.instance_exec_status_size(), 1);
 :   
report.mutable_instance_exec_status(0)->set_thrift_profile_sidecar_idx(sidecar_idx);
 : }
Can't we just do this once before the loop starts?


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/util/error-util.cc
File be/src/util/error-util.cc:

http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/util/error-util.cc@144
PS5, Line 144: auto
auto&


http://gerrit.cloudera.org:8080/#/c/10855/5/be/src/util/error-util.cc@172
PS5, Line 172: auto
auto&



--
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Sun, 05 Aug 2018 21:34:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-08-04 Thread Sailesh Mukil (Code Review)
Hello Michael Ho, Joe McDonnell, Bikramjeet Vig, Dan Hecht, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10813

to look at the new patch set (#14).

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..

IMPALA-7163: Implement a state machine for the QueryState class

This patch adds a state machine for the QueryState class. The motivation
behind this patch is to make the query lifecycle from the point of
view of an executor much easier to reason about and this patch is key
for a follow on patch for IMPALA-2990 where the status reporting will
be per-query rather than per-fragment-instance. Currently, the state
machine provides no other purpose, and it will mostly be used for
IMPALA-2990.

We introduce 5 possible states for the QueryState which include 3
terminal states (FINISHED, CANCELLED and ERROR) and 2 non-terminal
states (PREPARING, EXECUTING). The transition from one state to the
next is always handled by a single thread which is also the QueryState
thread. This thread will additionally bear the purpose of sending
periodic updates after IMPALA-4063, which is the primary reason behind
having only this thread modify the state of the query.

Counting barriers are introduced to keep a count of how many fragment
instances have finished Preparing and Executing. These barriers also
block until all the fragment instances have finished a respective state.
The fragment instances update the query wide query status if an error is
hit and unblocks the barrier if it is in the EXECUTING state. The
PREPARING state blocks regardless of whether a fragment instance hit an
error or not, until all the fragment instances have completed
successfully or unsuccessfully, to maintain the invariant that fragment
instances cannot be cancelled until the entire QueryState has finished
PREPARING.

The status reporting protocol has not been changed and remains exactly
as it was.

Testing:
- Added 3 failure points in the query lifecycle using debug actions
  and added tests to validate the same (extension of IMPALA-7376).
- Ran 'core' and 'exhaustive' tests.

Future related work:
1) IMPALA-2990: Make status reporting per-query.
2) Try to logically align the FIS states with the QueryState states.
3) Consider mirroring the QueryState state machine to
CoordinatorBackendState

Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
---
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M common/thrift/ImpalaInternalService.thrift
M tests/failure/test_failpoints.py
7 files changed, 286 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/10813/14
--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 14
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-08-03 Thread Sailesh Mukil (Code Review)
Hello Michael Ho, Joe McDonnell, Bikramjeet Vig, Dan Hecht, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10813

to look at the new patch set (#13).

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..

IMPALA-7163: Implement a state machine for the QueryState class

This patch adds a state machine for the QueryState class. The motivation
behind this patch is to make the query lifecycle from the point of
view of an executor much easier to reason about and this patch is key
for a follow on patch for IMPALA-2990 where the status reporting will
be per-query rather than per-fragment-instance. Currently, the state
machine provides no other purpose, and it will mostly be used for
IMPALA-2990.

We introduce 5 possible states for the QueryState which include 3
terminal states (FINISHED, CANCELLED and ERROR) and 2 non-terminal
states (PREPARING, EXECUTING). The transition from one state to the
next is always handled by a single thread which is also the QueryState
thread. This thread will additionally bear the purpose of sending
periodic updates after IMPALA-4063, which is the primary reason behind
having only this thread modify the state of the query.

Counting barriers are introduced to keep a count of how many fragment
instances have finished Preparing and Executing. These barriers also
block until all the fragment instances have finished a respective state.
The fragment instances update the query wide query status if an error is hit
and unblocks the barrier if it is in the EXECUTING state. The PREPARING
state blocks regardless of whether a fragment instance hit an error or not,
until all the fragment instances have completed successfully or unsuccessfully,
to maintain the invariant that fragment instances cannot be cancelled until
the entire QueryState has finished PREPARING.

The status reporting protocol has not been changed and remains exactly
as it was.

Testing:
- Added 3 failure points in the query lifecycle using debug actions
  and added tests to validate the same (extension of IMPALA-7376).
- Ran 'core' and 'exhaustive' tests.

Future related work:
1) IMPALA-2990: Make status reporting per-query.
2) Try to logically align the FIS states with the QueryState states.
3) Consider mirroring the QueryState state machine to CoordinatorBackendState

Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
---
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M tests/failure/test_failpoints.py
6 files changed, 283 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/10813/13
--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 13
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-08-03 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..


Patch Set 12:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10813/12/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/12/be/src/runtime/fragment-instance-state.cc@106
PS12, Line 106:   if (!status.ok() && current_state_.Load() <= 
TFInstanceExecState::WAITING_FOR_PREPARE) {
  : // Tell the managing 'QueryState' that we hit an error 
during Prepare().
  : query_state_->ErrorDuringPrepare(status, instance_id());
  :   } else if (!status.ok()) {
  : // Tell the managing 'QueryState' that we hit an error 
during execution.
  : query_state_->ErrorDuringExecute(status, instance_id());
  :   }
> Seems easier to read if written like the following:
Done


http://gerrit.cloudera.org:8080/#/c/10813/12/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/10813/12/be/src/runtime/query-state.h@244
PS12, Line 244: Status WaitForFinish();
> Is this actually still needed in this patch ? It doesn't seem to be called
We call it towards the end of StartFInstances(). We update the query state only 
after this unblocks.


http://gerrit.cloudera.org:8080/#/c/10813/12/tests/failure/test_failpoints.py
File tests/failure/test_failpoints.py:

http://gerrit.cloudera.org:8080/#/c/10813/12/tests/failure/test_failpoints.py@162
PS12, Line 162: i = 0
> This line isn't needed if you use for i in range(50) below.
When I remove this, it gives me this error:
"NameError: global name 'i' is not defined"



--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 12
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Fri, 03 Aug 2018 12:19:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7390: Pin rpc-mgr-kerberized-test to localhost.

2018-08-02 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/3 )

Change subject: IMPALA-7390: Pin rpc-mgr-kerberized-test to localhost.
..


Patch Set 1: Code-Review+2

Thanks for doing this!


--
To view, visit http://gerrit.cloudera.org:8080/3
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91003cbc86177feb7f99563f61297f7da7fabab4
Gerrit-Change-Number: 3
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 02 Aug 2018 22:54:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-08-01 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..


Patch Set 12:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10813/11/tests/failure/test_failpoints.py
File tests/failure/test_failpoints.py:

http://gerrit.cloudera.org:8080/#/c/10813/11/tests/failure/test_failpoints.py@152
PS11, Line 152: :
> flake8: E231 missing whitespace after ':'
Done


http://gerrit.cloudera.org:8080/#/c/10813/11/tests/failure/test_failpoints.py@157
PS11, Line 157: :
> flake8: E231 missing whitespace after ':'
Done


http://gerrit.cloudera.org:8080/#/c/10813/11/tests/failure/test_failpoints.py@168
PS11, Line 168: :
> flake8: E231 missing whitespace after ':'
Done



--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 12
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 02 Aug 2018 02:58:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-08-01 Thread Sailesh Mukil (Code Review)
Hello Michael Ho, Joe McDonnell, Bikramjeet Vig, Dan Hecht, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10813

to look at the new patch set (#12).

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..

IMPALA-7163: Implement a state machine for the QueryState class

This patch adds a state machine for the QueryState class. The motivation
behind this patch is to make the query lifecycle from the point of
view of an executor much easier to reason about and this patch is key
for a follow on patch for IMPALA-2990 where the status reporting will
be per-query rather than per-fragment-instance. Currently, the state
machine provides no other purpose, and it will mostly be used for
IMPALA-2990.

We introduce 5 possible states for the QueryState which include 3
terminal states (FINISHED, CANCELLED and ERROR) and 2 non-terminal
states (PREPARING, EXECUTING). The transition from one state to the
next is always handled by a single thread which is also the QueryState
thread. This thread will additionally bear the purpose of sending
periodic updates after IMPALA-4063, which is the primary reason behind
having only this thread modify the state of the query.

Counting barriers are introduced to keep a count of how many fragment
instances have finished Preparing and Executing. These barriers also
block until all the fragment instances have finished a respective state.
The fragment instances update the query wide query status if an error is hit
and unblocks the barrier if it is in the EXECUTING state. The PREPARING
state blocks regardless of whether a fragment instance hit an error or not,
until all the fragment instances have completed successfully or unsuccessfully,
to maintain the invariant that fragment instances cannot be cancelled until
the entire QueryState has finished PREPARING.

The status reporting protocol has not been changed and remains exactly
as it was.

Testing:
- Added 3 failure points in the query lifecycle using debug actions
  and added tests to validate the same (extension of IMPALA-7376).
- Ran 'core' and 'exhaustive' tests.

Future related work:
1) IMPALA-2990: Make status reporting per-query.
2) Try to logically align the FIS states with the QueryState states.
3) Consider mirroring the QueryState state machine to CoordinatorBackendState

Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
---
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M tests/failure/test_failpoints.py
6 files changed, 279 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/10813/12
--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 12
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-08-01 Thread Sailesh Mukil (Code Review)
Hello Michael Ho, Joe McDonnell, Bikramjeet Vig, Dan Hecht, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10813

to look at the new patch set (#11).

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..

IMPALA-7163: Implement a state machine for the QueryState class

This patch adds a state machine for the QueryState class. The motivation
behind this patch is to make the query lifecycle from the point of
view of an executor much easier to reason about and this patch is key
for a follow on patch for IMPALA-2990 where the status reporting will
be per-query rather than per-fragment-instance. Currently, the state
machine provides no other purpose, and it will mostly be used for
IMPALA-2990.

We introduce 5 possible states for the QueryState which include 3
terminal states (FINISHED, CANCELLED and ERROR) and 2 non-terminal
states (PREPARING, EXECUTING). The transition from one state to the
next is always handled by a single thread which is also the QueryState
thread. This thread will additionally bear the purpose of sending
periodic updates after IMPALA-4063, which is the primary reason behind
having only this thread modify the state of the query.

Counting barriers are introduced to keep a count of how many fragment
instances have finished Preparing and Executing. These barriers also
block until all the fragment instances have finished a respective state.
The fragment instances update the query wide query status if an error is hit
and unblocks the barrier if it is in the EXECUTING state. The PREPARING
state blocks regardless of whether a fragment instance hit an error or not,
until all the fragment instances have completed successfully or unsuccessfully,
to maintain the invariant that fragment instances cannot be cancelled until
the entire QueryState has finished PREPARING.

The status reporting protocol has not been changed and remains exactly
as it was.

Testing:
- Added 3 failure points in the query lifecycle using debug actions
  and added tests to validate the same (extension of IMPALA-7376).
- Ran 'core' and 'exhaustive' tests.

Future related work:
1) IMPALA-2990: Make status reporting per-query.
2) Try to logically align the FIS states with the QueryState states.
3) Consider mirroring the QueryState state machine to CoordinatorBackendState

Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
---
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M tests/failure/test_failpoints.py
6 files changed, 279 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/10813/11
--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 11
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-08-01 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..


Patch Set 11:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/10813/10/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/10/be/src/runtime/fragment-instance-state.cc@100
PS10, Line 100:   }
  :   // Tell the managing 'QueryState' that we're done with 
executing and that we've stopped
  :   // the reporting thread.
  :   query_state_->DoneExecuting();
  : 
  : done:
  :   if (!status.ok() && current_state_.Load() <= 
TFInstanceExecState::WAITING_FOR_PREPARE) {
  : // Tell the managing 'QueryS
> Should this be included moved to point after label done ? Otherwise, we may
Great catch. I took your suggestion and moved the ErrorDuring* calls to the 
'done:' section.

I also added a debug action to fail in Open()


http://gerrit.cloudera.org:8080/#/c/10813/10/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/10/be/src/runtime/query-state.cc@216
PS10, Line 216: ExecState o
> not used ?
Done


http://gerrit.cloudera.org:8080/#/c/10813/10/be/src/runtime/query-state.cc@217
PS10, Line 217:
> BackendExecState old_state = backend_exec_state_; and no need for line 220
Done


http://gerrit.cloudera.org:8080/#/c/10813/10/be/src/runtime/query-state.cc@230
PS10, Line 230:   return query_status_;
  : }
> nit: one line
Done


http://gerrit.cloudera.org:8080/#/c/10813/10/be/src/runtime/query-state.cc@408
PS10, Line 408:   ReleaseExecResourceRefcount();
  :   
ExecEnv::GetInstance()->query_exec_mgr()->ReleaseQueryState(this);
  :   break;
  : }
  : // update fragment_map_
  : vector& fis_list = 
fragment_map_[instance_ctx.fragment_idx];
  : fis_list.push_back(fis);
  : t->Detach();
  : --n
> A minor suggestion is to group this loop with ErrorDuringPrepare() below so
Good point. Done.


http://gerrit.cloudera.org:8080/#/c/10813/10/be/src/runtime/query-state.cc@435
PS10, Line 435: ReportExecStatusAux(true, thread_create_status, nullptr, 
true);
> We used to wait for all fragment instances to finish preparing before repor
No, if a Cancel() arrives due to this before all the fragment instances have 
finished preparing, Cancel() will block on WaitForPrepare().

We'll end up cancelling the query earlier which is good.


http://gerrit.cloudera.org:8080/#/c/10813/10/be/src/runtime/query-state.cc@442
PS10, Line 442: BackendExecStat
> not actually used?
Done


http://gerrit.cloudera.org:8080/#/c/10813/10/tests/failure/test_failpoints.py
File tests/failure/test_failpoints.py:

http://gerrit.cloudera.org:8080/#/c/10813/10/tests/failure/test_failpoints.py@151
PS10, Line 151: self.execute_query_expect_failure(self.client, query,
  : query_options={'debug_action':debug_action})
> Do we care about verifying the failure actually occurred ?
execute_query_expect_failure() does that internally.


http://gerrit.cloudera.org:8080/#/c/10813/10/tests/failure/test_failpoints.py@157
PS10, Line 157: query_options={'debug_action':debug_action})
  :
  : # Fail the fragment instance thread creation with a 0.5 
probability.
  : debug_action = 'FIS_FAIL
> for i in range(50):
Done


http://gerrit.cloudera.org:8080/#/c/10813/10/tests/failure/test_failpoints.py@165
PS10, Line 165: while i in range(50):
  :   try:
  : self.execute_query(query,
  : query_options={'de
> assert 'Query aborted:Debug Action: FIS_FAIL_THREAD_CREATION:FAIL@0.5' in s
Done



--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 11
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 02 Aug 2018 02:55:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7376: DCHECK hit if a fragment instance fails to initialize the filter bank

2018-08-01 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11096 )

Change subject: IMPALA-7376: DCHECK hit if a fragment instance fails to 
initialize the filter bank
..


Patch Set 3: Code-Review+2

(1 comment)

Carry +2.

http://gerrit.cloudera.org:8080/#/c/11096/2/tests/failure/test_failpoints.py
File tests/failure/test_failpoints.py:

http://gerrit.cloudera.org:8080/#/c/11096/2/tests/failure/test_failpoints.py@147
PS2, Line 147: query_options={'debug_action':debug_action})
> Should we also verify that the query actually failed ?
It is already doing that since the function called is:
execute_query_expect_failure()



--
To view, visit http://gerrit.cloudera.org:8080/11096
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1d6e7afc18fe2f0e1e29d2bd8a5f804a78f7043a
Gerrit-Change-Number: 11096
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 01 Aug 2018 23:01:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-07-31 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..


Patch Set 9:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/10813/8/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/10813/8/be/src/runtime/query-state.h@149
PS8, Line 149:  single thread
> Seems clearer to also state which thread it is.
Done


http://gerrit.cloudera.org:8080/#/c/10813/8/be/src/runtime/query-state.h@215
PS8, Line 215:   /// during Execute().
> Update 'query_status_' and 'failed_instance_id_' if it's not set already. A
We usually avoid calling out private members in the header comments of public 
functions, so I've added and reworded your suggestion to only using public 
names.


http://gerrit.cloudera.org:8080/#/c/10813/8/be/src/runtime/query-state.h@260
PS8, Line 260:   };
> nit: blank line missing.
Done


http://gerrit.cloudera.org:8080/#/c/10813/8/be/src/runtime/query-state.h@261
PS8, Line 261: //
> nit: ///
Done


http://gerrit.cloudera.org:8080/#/c/10813/8/be/src/runtime/query-state.h@268
PS8, Line 268:  single
> QueryState thread ?
Done


http://gerrit.cloudera.org:8080/#/c/10813/8/be/src/runtime/query-state.h@328
PS8, Line 328:  we recieve the finst ID and the status of the
 :   /// FIS that hit the error.
> the error status and failing fragment instance ID are set in 'query_status_
Done


http://gerrit.cloudera.org:8080/#/c/10813/8/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/8/be/src/runtime/query-state.cc@421
PS8, Line 421: TUniqueId())
> Can we also record the fragment instance id above which failed to start thr
Isn't this more of a QueryState failure though? Since it has nothing to do the 
fragment instance itself, but is rather the failure to start a thread.

If you think it'll help knowing *which* fragment instance was the first to fail 
to start, then we could add it. But I can't think of a scenario where that 
might come in handy.


http://gerrit.cloudera.org:8080/#/c/10813/8/be/src/runtime/query-state.cc@422
PS8, Line 422: Status updated_query_status = UpdateBackendExecState();
 : instances_prepared_barrier_->NotifyRemaining();
> Actually, there may be some problem calling NotifyRemaining() here. The pro
You're right. I missed that. I've added logic to Notify() the 
'instances_prepared_barrier_' as many times as required, from this thread, if 
we fail to start fragment instance threads.

I've also added a debug action here to ensure that this doesn't cause any 
crashes or hangs.


http://gerrit.cloudera.org:8080/#/c/10813/8/be/src/runtime/query-state.cc@425
PS8, Line 425: all_fis_status
> Seems clearer to use thread_create_status here like the old code.
Done



--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 9
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Wed, 01 Aug 2018 02:11:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-07-31 Thread Sailesh Mukil (Code Review)
Hello Michael Ho, Joe McDonnell, Bikramjeet Vig, Dan Hecht, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10813

to look at the new patch set (#10).

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..

IMPALA-7163: Implement a state machine for the QueryState class

This patch adds a state machine for the QueryState class. The motivation
behind this patch is to make the query lifecycle from the point of
view of an executor much easier to reason about and this patch is key
for a follow on patch for IMPALA-2990 where the status reporting will
be per-query rather than per-fragment-instance. Currently, the state
machine provides no other purpose, and it will mostly be used for
IMPALA-2990.

We introduce 5 possible states for the QueryState which include 3
terminal states (FINISHED, CANCELLED and ERROR) and 2 non-terminal
states (PREPARING, EXECUTING). The transition from one state to the
next is always handled by a single thread which is also the QueryState
thread. This thread will additionally bear the purpose of sending
periodic updates after IMPALA-4063, which is the primary reason behind
having only this thread modify the state of the query.

Counting barriers are introduced to keep a count of how many fragment
instances have finished Preparing and Executing. These barriers also
block until all the fragment instances have finished a respective state.
The fragment instances update the query wide query status if an error is hit
and unblocks the barrier if it is in the EXECUTING state. The PREPARING
state blocks regardless of whether a fragment instance hit an error or not,
until all the fragment instances have completed successfully or unsuccessfully,
to maintain the invariant that fragment instances cannot be cancelled until
the entire QueryState has finished PREPARING.

The status reporting protocol has not been changed and remains exactly
as it was.

Testing:
- Added 2 failure points in the query lifecycle using debug actions
  and added tests to validate the same (extension of IMPALA-7376).
- Ran 'core' and 'exhaustive' tests.

Future related work:
1) IMPALA-2990: Make status reporting per-query.
2) Try to logically align the FIS states with the QueryState states.
3) Consider mirroring the QueryState state machine to CoordinatorBackendState

Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
---
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M tests/failure/test_failpoints.py
6 files changed, 272 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/10813/10
--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 10
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-7376: DCHECK hit if a fragment instance fails to initialize the filter bank

2018-07-31 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11096


Change subject: IMPALA-7376: DCHECK hit if a fragment instance fails to 
initialize the filter bank
..

IMPALA-7376: DCHECK hit if a fragment instance fails to initialize the filter 
bank

While Prepare()-ing a fragment instance, if we fail to initialize the
runtime filter bank, we will exit FIS::Prepare() without acquiring a
thread token (AcquireThreadToken()):

FIS::Finalize() is called always regardless of whether the fragment
instance succeeded or failed. And FIS::Finalize() tries to ReleaseThreadToken()
even though it might not have gotten acquired, causing a DCHECK to be hit.

This patch fixes it by making sure that no failable code is run before
acquiring the thread token, thereby ensuring that the thread token is
always acquired and thus avoiding the above crash.

A test is added to confirm this as well. This test crashes without the
code changes in this patch.

Change-Id: I1d6e7afc18fe2f0e1e29d2bd8a5f804a78f7043a
---
M be/src/runtime/fragment-instance-state.cc
M tests/failure/test_failpoints.py
2 files changed, 16 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/11096/1
--
To view, visit http://gerrit.cloudera.org:8080/11096
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1d6e7afc18fe2f0e1e29d2bd8a5f804a78f7043a
Gerrit-Change-Number: 11096
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-6644: Add recent heartbeat timestamp into Statestore metric

2018-07-30 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11052 )

Change subject: IMPALA-6644: Add recent heartbeat timestamp into Statestore 
metric
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/11052/2/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/11052/2/be/src/statestore/statestore.h@442
PS2, Line 442: of a recent heartbeat
... of the most recently logged heartbeat ...


http://gerrit.cloudera.org:8080/#/c/11052/2/be/src/statestore/statestore.h@446
PS2, Line 446: recent_heartbeat_timestamp_
This name sounds kind of misleading. To be more accurate, isn't it the " most 
recently logged heartbeat timestamp" ? Since the actual default heartbeat 
frequency is much smaller than the default 
'statestore_heartbeat_log_frequency_seconds'.

I'd suggest changing the name to 'recent_logged_heartbeat_ts_', and update the 
comments and related functions to reflect that too.


http://gerrit.cloudera.org:8080/#/c/11052/2/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/11052/2/be/src/statestore/statestore.cc@77
PS2, Line 77: statestore_heartbeat_log_frequency_seconds
Just wondering if it's necessary to make this configurable. Are there scenarios 
where users may want to change this, and if so, what's the benefit they get 
from doing so? If we don't have a strong reason to keep this flag, my vote is 
to get rid of it and keep 60 seconds (or some reasonable value) as a constant.

The more flags we add, the larger our test matrix needs to become, or we result 
in more untested code paths. So, I'd say we should add it only if it's 
absolutely necessary.


http://gerrit.cloudera.org:8080/#/c/11052/2/be/src/statestore/statestore.cc@389
PS2, Line 389: DCHECK
Use DCHECK_GT



--
To view, visit http://gerrit.cloudera.org:8080/11052
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I754adccc4569e8219d5d01500cccdfc8782953f7
Gerrit-Change-Number: 11052
Gerrit-PatchSet: 2
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 30 Jul 2018 18:33:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-07-30 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/query-state.cc@479
PS7, Line 479:   if (!is_cancelled_.CompareAndSwap(0, 1)) return;
 :   for (auto entry: fis_map_) entry.second->Cancel();
 : }
 :
> As discussed in person, we this may overwrite earlier error status during p
Yes, you're right. After changing the patch to have the fragment instances 
themselves update the query status, this isn't the right thing to do. I've 
removed this now.



--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 9
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 30 Jul 2018 18:14:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-07-30 Thread Sailesh Mukil (Code Review)
Hello Michael Ho, Joe McDonnell, Bikramjeet Vig, Dan Hecht, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10813

to look at the new patch set (#9).

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..

IMPALA-7163: Implement a state machine for the QueryState class

This patch adds a state machine for the QueryState class. The motivation
behind this patch is to make the query lifecycle from the point of
view of an executor much easier to reason about and this patch is key
for a follow on patch for IMPALA-2990 where the status reporting will
be per-query rather than per-fragment-instance. Currently, the state
machine provides no other purpose, and it will mostly be used for
IMPALA-2990.

We introduce 5 possible states for the QueryState which include 3
terminal states (FINISHED, CANCELLED and ERROR) and 2 non-terminal
states (PREPARING, EXECUTING). The transition from one state to the
next is always handled by a single thread which is also the QueryState
thread. This thread will additionally bear the purpose of sending
periodic updates after IMPALA-4063, which is the primary reason behind
having only this thread modify the state of the query.

Counting barriers are introduced to keep a count of how many fragment
instances have finished Preparing and Executing. These barriers also
block until all the fragment instances have finished a respective state.
The fragment instances update the query wide query status if an error is hit
and unblocks the barrier if it is in the EXECUTING state. The PREPARING
state blocks regardless of whether a fragment instance hit an error or not,
until all the fragment instances have completed successfully or unsuccessfully,
to maintain the invariant that fragment instances cannot be cancelled until
the entire QueryState has finished PREPARING.

The status reporting protocol has not been changed and remains exactly
as it was.

Testing: Ran 'core' and 'exhaustive' tests. TODO: Run stress tests.

Future related work:
1) IMPALA-2990: Make status reporting per-query.
2) Try to logically align the FIS states with the QueryState states.
3) Consider mirroring the QueryState state machine to CoordinatorBackendState

Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
---
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
5 files changed, 229 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/10813/9
--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 9
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-3825: Distribute Runtime Filtering Aggregation

2018-07-29 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11055 )

Change subject: IMPALA-3825: Distribute Runtime Filtering Aggregation
..


Patch Set 3:

(40 comments)

http://gerrit.cloudera.org:8080/#/c/11055/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11055/3//COMMIT_MSG@14
PS3, Line 14:
Add comments about the following, so that it's easier for reviewers:

- Explain in a para or two how your patch achieves the distribution; i.e. 
explain your approach in plain english.
- What kind of new failure modes can happen because of this change, as opposed 
to before? (Talk about the Exec() RPC race with the UpdateFilter() RPC)
- How long do we expect the aggregators to be accessible? i.e. what is the 
lifetime of an aggregator tied to?
- How are you updating the runtime profile?


http://gerrit.cloudera.org:8080/#/c/11055/3/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/11055/3/be/src/runtime/coordinator-backend-state.cc@95
PS3, Line 95: x
naming


http://gerrit.cloudera.org:8080/#/c/11055/3/be/src/runtime/coordinator-backend-state.cc@98
PS3, Line 98: LOG(INFO) << "DebuggingPublishFilter AggregatorAddress " << 
tfs.aggregator_address;
remove


http://gerrit.cloudera.org:8080/#/c/11055/3/be/src/runtime/coordinator-backend-state.cc@95
PS3, Line 95:   for (auto const& x : filter_routing_table) {
: TFilterState tfs;
: x.second.ToThrift();
: LOG(INFO) << "DebuggingPublishFilter AggregatorAddress " << 
tfs.aggregator_address;
: rpc_params->filter_routing_table.insert(
: std::pair(x.first, tfs));
:   }
Add a comment about what you're doing here.


http://gerrit.cloudera.org:8080/#/c/11055/3/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/11055/3/be/src/runtime/coordinator.cc@260
PS3, Line 260: x
naming. Call it 'filter' or something similar


http://gerrit.cloudera.org:8080/#/c/11055/3/be/src/runtime/coordinator.cc@262
PS3, Line 262: num_filters_
Is this needed as a member variable? Doesn't seem to be used anywhere.


http://gerrit.cloudera.org:8080/#/c/11055/3/be/src/runtime/coordinator.cc@690
PS3, Line 690:   if (params.__isset.filter_updates_received) {
Add a comment above this line:

"Update aggregator's filter metrics in the runtime profile"


http://gerrit.cloudera.org:8080/#/c/11055/3/be/src/runtime/coordinator.cc@691
PS3, Line 691: if (backend_state->GetNumReceivedFilters() < 
params.filter_updates_received) {
Add another comment above this line:
"Make sure not to double count filter updates from the same aggregator."


http://gerrit.cloudera.org:8080/#/c/11055/3/be/src/runtime/coordinator.cc@690
PS3, Line 690:   if (params.__isset.filter_updates_received) {
 : if (backend_state->GetNumReceivedFilters() < 
params.filter_updates_received) {
 :   filter_updates_received_->Add(
 :   params.filter_updates_received - 
backend_state->GetNumReceivedFilters());
 :   
backend_state->SetNumReceivedFilters(params.filter_updates_received);
 : }
 :   }
There's a race here. If 2 updates for the same Backend execute in parallel, 
you'll end up having an incorrect number of updated filters.


http://gerrit.cloudera.org:8080/#/c/11055/3/be/src/runtime/coordinator.cc@884
PS3, Line 884:   LOG(INFO)
 :   << "DebuggingPublishFilter Coordinator sending filter 
to fragment with id "
 :   << fragment_idx;
remove


http://gerrit.cloudera.org:8080/#/c/11055/3/be/src/runtime/filter-state.h
File be/src/runtime/filter-state.h:

http://gerrit.cloudera.org:8080/#/c/11055/3/be/src/runtime/filter-state.h@141
PS3, Line 141: FilterTarget
const ref


http://gerrit.cloudera.org:8080/#/c/11055/3/be/src/runtime/filter-state.h@150
PS3, Line 150: /// Need to cast the int type of this class to int32_t of thrift
Still needed?


http://gerrit.cloudera.org:8080/#/c/11055/3/be/src/runtime/filter-state.h@151
PS3, Line 151: set
Why not unordered?


http://gerrit.cloudera.org:8080/#/c/11055/3/be/src/runtime/filter-state.h@152
PS3, Line 152: int i
int32_t here and remove the cast in the next line. Also, rename to 'idx'.


http://gerrit.cloudera.org:8080/#/c/11055/3/be/src/runtime/filter-state.h@167
PS3, Line 167: TFilterTarget
const ref


http://gerrit.cloudera.org:8080/#/c/11055/3/be/src/runtime/filter-state.h@171
PS3, Line 171: boost
std


http://gerrit.cloudera.org:8080/#/c/11055/3/be/src/runtime/filter-state.h@130
PS3, Line 130:   bool disabled() const {
 : if (is_bloom_filter()) {
 :   return bloom_filter_.always_true;
 : } else {
 :   DCHECK(is_min_max_filter());
 :   return 

[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-07-27 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@308
PS6, Line 308:
 :   /// Number of active fragment instances and coordinators for 
this query that may consume
 :   /// resources for query execution (i.e. threads, memory) on 
the Impala daemon.
 :   /// Query-wide execution resources for this query are released 
once this goes to zero.
 :   AtomicInt32 exec_resource_refcnt_;
 :
 :   /// Temporary files for this query (owned by obj_pool_). 
Non-null if spilling is
 :   /// enabled. Set in Pr
> Now that the patch has sunk in more, I think the state machine the query st
Yes, this is correct. With IMPALA-4063, we will report errors earlier in the 
lifecycle first (i.e. PREPARING errors before EXECUTING errors).

I think reasoning about errors earlier in the query lifecycle is easier than  
reasoning about errors hit first in terms of a timestamp.



--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 8
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Fri, 27 Jul 2018 21:48:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-07-27 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..


Patch Set 8:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/fragment-instance-state.cc@125
PS7, Line 125: DCHECK_
> Does DCHECK_EQ() not work ?
Done


http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/query-state.h@67
PS7, Line 67:
: /// Eg: We transition from the PREPARING state to the EXECUTING 
state only if *all* the
: /// underlying fragment instances have finished Prepare().
> This is not true for PREPARING phase, right ? Seems easier to separately de
Done


http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/query-state.h@70
PS7, Line 70: ansitioni
> PREPARING ?
Done


http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/query-state.h@71
PS7, Line 71: y fragment instance hits an error or cancella
> This is not necessarily true, right ? In PREPARING state, it only means at
You're right. I changed the wording now.


http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/query-state.h@264
PS7, Line 264:
> May want to document the thread safety of this variable. I suppose it's onl
Yup, done.


http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/query-state.h@287
PS7, Line 287: /// Protected by 'status_lock_'.
> Please consider documenting the thread safety of this variable too.
Done


http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/query-state.cc@230
PS7, Line 230: BackendExecState::EXECUTING : BackendExecState::FINISHED;
 :   }
> nit: one line
Done


http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/query-state.cc@479
PS7, Line 479:   {
 : unique_lock l(status_lock_);
 : query_status_ = Status::CANCELLED;
 :   }
> Should this happen after line 483 below ?
No, it makes sense to preemptively set this since a Cancel() is a query wide 
operation and not specific to a fragment instance.

The ErrorDuring*() functions don't update the 'query_status_' if it's not OK, 
so that's not an issue.


http://gerrit.cloudera.org:8080/#/c/10813/8/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/8/be/src/runtime/query-state.cc@390
PS8, Line 390: // Fragment instance successfully started
 : fis_map_.emplace(fis->instance_id(), fis);
I had to move this up here since there's a race with the coordinator between 
L141 and L156 in the coordinator.cc file



--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 8
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Fri, 27 Jul 2018 21:42:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-07-27 Thread Sailesh Mukil (Code Review)
Hello Michael Ho, Joe McDonnell, Bikramjeet Vig, Dan Hecht, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10813

to look at the new patch set (#8).

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..

IMPALA-7163: Implement a state machine for the QueryState class

This patch adds a state machine for the QueryState class. The motivation
behind this patch is to make the query lifecycle from the point of
view of an executor much easier to reason about and this patch is key
for a follow on patch for IMPALA-2990 where the status reporting will
be per-query rather than per-fragment-instance. Currently, the state
machine provides no other purpose, and it will mostly be used for
IMPALA-2990.

We introduce 5 possible states for the QueryState which include 3
terminal states (FINISHED, CANCELLED and ERROR) and 2 non-terminal
states (PREPARING, EXECUTING). The transition from one state to the
next is always handled by a single thread which is also the QueryState
thread. This thread will additionally bear the purpose of sending
periodic updates after IMPALA-4063, which is the primary reason behind
having only this thread modify the state of the query.

Counting barriers are introduced to keep a count of how many fragment
instances have finished Preparing and Executing. These barriers also
block until all the fragment instances have finished a respective state.
The fragment instances update the query wide query status if an error is hit
and unblocks the barrier if it is in the EXECUTING state. The PREPARING
state blocks regardless of whether a fragment instance hit an error or not,
until all the fragment instances have completed successfully or unsuccessfully,
to maintain the invariant that fragment instances cannot be cancelled until
the entire QueryState has finished PREPARING.

The status reporting protocol has not been changed and remains exactly
as it was.

Testing: Ran 'core' and 'exhaustive' tests. TODO: Run stress tests.

Future related work:
1) IMPALA-2990: Make status reporting per-query.
2) Try to logically align the FIS states with the QueryState states.
3) Consider mirroring the QueryState state machine to CoordinatorBackendState

Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
---
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
5 files changed, 233 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/10813/8
--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 8
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] [WIP] IMPALA-6153: Execute UpdateFilter() only for executing queries

2018-07-27 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11005 )

Change subject: [WIP] IMPALA-6153: Execute UpdateFilter() only for executing 
queries
..


Patch Set 4:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/11005/4/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/11005/4/be/src/runtime/coordinator.h@302
PS4, Line 302: it
nit: this lock


http://gerrit.cloudera.org:8080/#/c/11005/4/be/src/runtime/coordinator.h@309
PS4, Line 309: releases the memory associated with
 :   /// filter_routing_table_
nit: alters the filter_routing_table_


http://gerrit.cloudera.org:8080/#/c/11005/4/be/src/runtime/coordinator.h@311
PS4, Line 311: boost::shared_mutex
We have to ensure that this is a 'writer preferred' shared mutex. It turns out 
that boost doesn't have an option to specify a preference, and uses a "fair" 
algorithm instead to decide which waiter to schedule next, so I guess this is 
fine.


http://gerrit.cloudera.org:8080/#/c/11005/4/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/11005/4/be/src/runtime/coordinator.cc@79
PS4, Line 79: exec_state_.Load(), ExecState::EXECUTING
not your change, but replace with:
!IsExecuting()


http://gerrit.cloudera.org:8080/#/c/11005/4/be/src/runtime/coordinator.cc@397
PS4, Line 397: lock_guard l(filter_update_lock_);
Why not just get exclusive access to 'filter_lock_' here?


http://gerrit.cloudera.org:8080/#/c/11005/4/be/src/runtime/coordinator.cc@454
PS4, Line 454: exec_state_.Load() != ExecState::EXECUTING
not your change, but replace with:
!IsExecuting()


http://gerrit.cloudera.org:8080/#/c/11005/4/be/src/runtime/coordinator.cc@714
PS4, Line 714: exec_state_.Load() == ExecState::EXECUTING
not your change, but replace with:
IsExecuting()


http://gerrit.cloudera.org:8080/#/c/11005/4/be/src/runtime/coordinator.cc@871
PS4, Line 871: if (!IsExecuting()) return;
Instead of doing this here, I would say you can replace the IsDone() check 
inside CoordinatorBackendState::PublishFilter() with this IsExecuting() check.

CoordinatorBackendState has a reverse reference to the Coordinator object.



--
To view, visit http://gerrit.cloudera.org:8080/11005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I669db217f86db5ff2802335e7b1ae8027ea7161c
Gerrit-Change-Number: 11005
Gerrit-PatchSet: 4
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 27 Jul 2018 21:17:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

2018-07-25 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
..


Patch Set 4:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/coordinator.h@31
PS4, Line 31: #include "kudu/util/slice.h"
Not needed?


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/exec-env.h
File be/src/runtime/exec-env.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/exec-env.h@136
PS4, Line 136: DataStreamService* data_svc() const { return data_svc_.get(); }
Not used, we can remove this.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/query-state.cc@290
PS2, Line 290: query_ctx().coord_address.hostname, );
> We do retry when the server is too busy. There is currently no timeout for
I think I'm missing something here. Where is the code that does the retry?

Also, if there's no timeout, wouldn't this hang if the coordinator node goes 
offline, since we wouldn't get a "Connection timed out" error?


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/client-request-state.h@32
PS4, Line 32: #include "gen-cpp/control_service.pb.h"
Not needed. Forward declaring ReportExecStatusRequestPB should suffice.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/client-request-state.h@35
PS4, Line 35: #include "gen-cpp/RuntimeProfile_types.h"
Not needed. Forward declaring TRuntimeProfileTree should suffice.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.h
File be/src/service/control-service.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.h@23
PS4, Line 23: #include "common/status.h"
: #include "runtime/mem-tracker.h"
Not required. You can forward declare 'Status', 'MemTracker' and 'MetricGroup' 
instead, and include "runtime/mem-tracker.h" in the .cc file.

I just checked and the same applies to data-stream-service.h


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.h@34
PS4, Line 34: class ImpalaServer;
Not required.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.h@53
PS4, Line 53:/// Reference to the singleton ImpalaServer object. Not owned.
:ImpalaServer* impala_server_ = nullptr;
This is probably not needed. You can get the ImpalaServer using:
ExecEnv::GetInstances()->impala_server()


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@60
PS4, Line 60: num_svc_threads
Was there any noticeable slowdown on stress workloads since we now have only a 
limited number of threads that can process reports in parallel, vs. before 
where we could have an unbounded number of virtual threads with Thrift?

I'm guessing not, since parallelism would be bounded by the number of physical 
cores, but just thought I'd check.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@71
PS4, Line 71:
Shouldn't we add:
FAULT_INJECTION_RPC_DELAY(RPC_REPORTEXECSTATUS); ?


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@111
PS4, Line 111: dummy
nit: empty


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/util/error-util-internal.h
File be/src/util/error-util-internal.h:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/util/error-util-internal.h@26
PS2, Line 26: /// Factor out the following structures from 'error-util.h' to 
prevent circular dependency
> There is circular dependency between control_service.pb.h and some files in
Just curious but what was the circular dependency? Is there a way to get rid of 
it using forward declares so that this new file isn't needed?


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/error-util.cc
File be/src/util/error-util.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/error-util.cc@144
PS4, Line 144: for (auto iter : entry.messages()) *stream << iter << "\n";
Formatting. Also, 'iter' isn't an iterator, it's the element itself, so we'd 
prefer to call it 'msg' or something similar.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/error-util.cc@170
PS4, Line 170: for (auto iter : entry.messages()) target->add_messages(iter);
Formatting, we should avoid single line for loops.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/error-util.cc@194
PS4, Line 194: if (target.messages_size() == 0) target.add_messages(e.msg());
Formatting



[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

2018-07-25 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10855/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10855/4//COMMIT_MSG@16
PS4, Line 16: This patch also introduces a new service pool for all query 
execution
: control related RPCs in the future so that control commands from
: coordinators aren't blocked by long-running DataStream services' 
RPCs.
> do we want to consider the ability to put these on separate TCP connections
+1 for this.

I guess another way to do this would be to slightly modify how 
ConnectionId::Equals() works, which today matches the remote_ Sockaddr and the 
hostname_, and user credentials only:
https://github.com/apache/kudu/blob/master/src/kudu/rpc/connection_id.cc#L70-L74

We can change it to include another field to match based on the type of RPC 
(call it 'proxy_hash_' or something). This can be done by making sure that 
different Proxy objects get different conn_id_ fields by initializing their 
ConnectionIds with these unique 'proxy_hash_' fields.
https://github.com/apache/kudu/blob/master/src/kudu/rpc/proxy.cc#L68

This way FindConnection() would only find TCP connections meant to be used by 
that Proxy():
https://github.com/apache/kudu/blob/master/src/kudu/rpc/reactor.cc#L489


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@72
PS4, Line 72:   // TODO: implement something more efficient here, we're 
currently
:   // acquiring/releasing the map lock and doing a map lookup for
:   // every report (assign each query a local int32_t id and use 
that to index into a
:   // vector of ClientRequestStates, w/o lookup or locking?)
> seems like an easy fix here is to use a RWLock since we expect that queries
I'd done an analysis on using a R/W lock for the ClientRequestState map earlier 
and found that it could starve writers pretty badly, which means that new 
queries could be starved on admittance, badly affecting user experience: 
IMPALA-4456

In any case, I've sharded the map as part of the above JIRA and this comment 
predates my patch. So I don't think this is as big an issue anymore, and even 
less so after IMPALA-4063. I'd say we could get rid of this TODO.



--
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 25 Jul 2018 22:08:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-07-24 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..


Patch Set 7:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/fragment-instance-state.cc@a113
PS6, Line 113:
> May make sense to replace this by checking the state. Doing a racy read of
I'm not sure if there's a way to do a racy read on an AtomicEnum, so I've used 
current_state_.Load() here.


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@152
PS6, Line 152:   /// Not idempotent, not thread-safe. Must only be called by a 
single thread.
> update comment to mention that it blocks till a terminal state has been rea
Done


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@156
PS6, Line 156: mentInstanceState* GetFInstanc
> nit: avoid reference to private methods. how about switching the descriptio
Done


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@198
PS6, Line 198:  void DonePreparing() { d
> If you take the suggestion to only keep a single 'query_status_', there doe
Done


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@212
PS6, Line 212:   }
> This is only meaningful when there is an error status, right ? Please docum
Done


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@234
PS6, Line 234: on
> fragment instance
Done


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@271
PS6, Line 271:
> Consider calling it "PREPARING"
Done


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@274
PS6, Line 274:
> Consider calling it "EXECUTING"
Done


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@281
PS6, Line 281: / The ove
> when is the cancelled state set?
I've added a change to Cancel() to make the query_status_ Status::CANCELLED. 
Anytime UpdateBackendExecState() is called after that, the 
BackendExecState::CANCELLED state will be set.


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@289
PS6, Line 289:
> if cancelled, will the status be Status::CANCELLED? if yes that is also cou
You're right. I've fixed that in the next patch set.


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@308
PS6, Line 308:
 :   /// Number of active fragment instances and coordinators for 
this query that may consume
 :   /// resources for query execution (i.e. threads, memory) on 
the Impala daemon.
 :   /// Query-wide execution resources for this query are released 
once this goes to zero.
 :   AtomicInt32 exec_resource_refcnt_;
 :
 :   /// Temporary files for this query (owned by obj_pool_). 
Non-null if spilling is
 :   /// enabled. Set in Pr
> It seems simpler to keep track of a single status for the first error hit i
I feel that it's not ideal having the fragment instance threads modify the 
query wide status. Now the patch looks like this:
The fragment instance threads modify the query wide Status to show that there's 
an error, but they don't modify the query BackendExecState. The 
BackendExecState is modified by the query state thread. This seems a little 
counter intuitive, since the fragment instance thread could itself modify the 
BackendExecState if it wants to as it has all the information to do so when it 
hits an error. But that would mean that sending final reports, etc. would have 
to be driven by the fragment instance thread too, which is not a good idea.

In any case, I'll go ahead with this, as functionally it'll have the same 
behavior, and it's not a huge problem, so we can avoid more back and forth 
about it.


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@320
PS6, Line 320:
> Mind documenting the thread safety about this variable ?
Done


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@233
PS6, Line 233: return query_status_;
> unused variable. maybe just pass a status object to this method if the fins
I was printing this to the logs in the previous patchset, but I removed it 
since that was one of the comments. So, in this patch, we aren't using the 
'finst_id_' part of the struct, but as part of a following change, we will 
require it (IMPALA-4063). This patch is primarily meant as a set up to do that 
change, so this state machine isn't really used for anything until that follow 
on patch comes in.

In any case, 

[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-07-24 Thread Sailesh Mukil (Code Review)
Hello Michael Ho, Joe McDonnell, Bikramjeet Vig, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10813

to look at the new patch set (#7).

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..

IMPALA-7163: Implement a state machine for the QueryState class

This patch adds a state machine for the QueryState class. The motivation
behind this patch is to make the query lifecycle from the point of
view of an executor much easier to reason about and this patch is key
for a follow on patch for IMPALA-2990 where the status reporting will
be per-query rather than per-fragment-instance. Currently, the state
machine provides no other purpose, and it will mostly be used for
IMPALA-2990.

We introduce 5 possible states for the QueryState which include 3
terminal states (FINISHED, CANCELLED and ERROR) and 2 non-terminal
states (PREPARING, EXECUTING). The transition from one state to the
next is always handled by a single thread which is also the QueryState
thread. This thread will additionally bear the purpose of sending
periodic updates after IMPALA-4063, which is the primary reason behind
having only this thread modify the state of the query.

Counting barriers are introduced to keep a count of how many fragment
instances have finished Preparing and Executing. These barriers also
block until all the fragment instances have finished a respective state.
The fragment instances update the query wide query status if an error is hit
and unblocks the barrier if it is in the EXECUTING state. The PREPARING
state blocks regardless of whether a fragment instance hit an error or not,
until all the fragment instances have completed successfully or unsuccessfully,
to maintain the invariant that fragment instances cannot be cancelled until
the entire QueryState has finished PREPARING.

The status reporting protocol has not been changed and remains exactly
as it was.

Testing: Ran 'core' and 'exhaustive' tests. TODO: Run stress tests.

Future related work:
1) IMPALA-2990: Make status reporting per-query.
2) Try to logically align the FIS states with the QueryState states.
3) Consider mirroring the QueryState state machine to CoordinatorBackendState

Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
---
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
5 files changed, 231 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/10813/7
--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 7
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-7212: Removes --use krpc flag and remove old DataStream services

2018-07-23 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10835 )

Change subject: IMPALA-7212: Removes --use_krpc flag and remove old DataStream 
services
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed200751508478a3d728a917448f2dabfc67c3
Gerrit-Change-Number: 10835
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 23 Jul 2018 18:43:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] KUDU-2492: Make the use of SO REUSEPORT conditional on it being defined

2018-07-19 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10994


Change subject: KUDU-2492: Make the use of SO_REUSEPORT conditional on it being 
defined
..

KUDU-2492: Make the use of SO_REUSEPORT conditional on it being defined

A recent commit to Kudu, "rpc: add experimental rpc_reuseport flag",
added the use of the rpc flag SO_REUSEREPORT. This flag is not
available with older versions of Linux, resulting in a compiler error.

This patch avoids the compiler error with a macro that checks if
SO_REUSEPORT is defined, and if it's not attempting to set it will
return an error.

--

IMPALA-7302: This is cherry-picked to fix builds breaking on CentOS 6.4.
Since some of our Jenkins machines are CentOS 6.4, and upgrading them
to our new minimum supported OS of CentOS 6.8 is non-trivial, we cherry-
pick this patch to temporarily unblock these builds until the Jenkins
AMIs are upgraded.

Change-Id: I042125cdaedafa5ebbc09e5a3c12112dfeec59a1
---
M be/src/kudu/util/net/socket.cc
1 file changed, 8 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/10994/1
--
To view, visit http://gerrit.cloudera.org:8080/10994
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I042125cdaedafa5ebbc09e5a3c12112dfeec59a1
Gerrit-Change-Number: 10994
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Thomas Marshall 


[Impala-ASF-CR] IMPALA-3825: [WIP]Distributed RuntimeFiltering Work in progress patch for Distributing runtime filtering aggregation.

2018-07-19 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10937 )

Change subject: IMPALA-3825: [WIP]Distributed RuntimeFiltering Work in progress 
patch for Distributing runtime filtering aggregation.
..


Patch Set 1:

(9 comments)

I added some high level comments. After addressing these high level comments, 
you can clean up the code so we can do a second round of reviews.

http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/common/global-flags.cc@132
PS1, Line 132: DEFINE_bool(enable_distributed_filter_aggregation, true,
 : "Enables aggregation of filters"
 : "in a distributed manner, set to false to revert to 
coordinator based aggregation");
After looking through the code, it seems better to avoid having this flag. 
Let's get rid of coordinator-filter-state.* completely, and go with the new 
approach you've done. When we need to compare the 2 approaches for benchmarking 
purposes, we can use 2 branches.


http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/coordinator-backend-state.cc@93
PS1, Line 93: x.second.ToThrift();
Move inside if(), since it's only needed in that case.


http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/coordinator-backend-state.cc@94
PS1, Line 94: AggregatorRoutingTable::const_iterator it = 
aggregator_routing_table.find(x.first);
: if (it != aggregator_routing_table.end()) {
:   tfs.__set_aggregator_address(it->second);
:   rpc_params->filter_routing_table.insert(
:   std::pair(x.first, tfs));
: }
Wouldn't all filters that are present in 'filter_routing_table' be present in 
'aggregator_routing_table' ?

If that's the case, we're doing the same work multiple times here (once per 
BackendState), and we would probably be better off not having the 
'aggregator_routing_table' at all and just set the aggregator address in 
'FilterState' as part of InitFilterRoutingTable().


http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/coordinator-backend-state.cc@101
PS1, Line 101:
We also don't need to send the filter routing table to the nodes that don't 
aggregate or don't produce right? If it's not simple to do, you can leave a 
TODO for now.


http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/coordinator.cc@284
PS1, Line 284:
We can set the aggregator address in the FilterState here, and not have the 
aggregator_routing_table.


http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/query-exec-mgr.h
File be/src/runtime/query-exec-mgr.h:

http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/query-exec-mgr.h@22
PS1, Line 22: #include 
None of these new #includes should be needed since you haven't added any new 
code to this file.


http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/query-state.cc@559
PS1, Line 559:
Add a TODO here to call PublishFilterFromAggregator() asynchronously using a 
thread pool.


http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/10937/1/be/src/runtime/runtime-filter-bank.cc@205
PS1, Line 205:   SendFilterToAggregator(
What if the aggregator and the producer are on the same node? We can consider 
short-circuiting that path. Add a TODO for that.


http://gerrit.cloudera.org:8080/#/c/10937/1/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/10937/1/bin/bootstrap_toolchain.py@430
PS1, Line 430: "flatbuffers", "gcc", "gflags", "glog",
I'm guessing this was a mistake?



--
To view, visit http://gerrit.cloudera.org:8080/10937
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0289ee39787d8e9c13a16c3d7d64f471bc554536
Gerrit-Change-Number: 10937
Gerrit-PatchSet: 1
Gerrit-Owner: Rahul Shivu Mahadev 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 19 Jul 2018 16:42:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7298: Stop passing IP address as hostname in Kerberos principal

2018-07-18 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10980 )

Change subject: IMPALA-7298: Stop passing IP address as hostname in Kerberos 
principal
..


Patch Set 1: Code-Review+2

Forgot to add +2


--
To view, visit http://gerrit.cloudera.org:8080/10980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e3e978746cf03766eee151835aad5877d9ed63e
Gerrit-Change-Number: 10980
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 19 Jul 2018 00:54:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7298: Stop passing IP address as hostname in Kerberos principal

2018-07-18 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10980 )

Change subject: IMPALA-7298: Stop passing IP address as hostname in Kerberos 
principal
..


Patch Set 1:

(1 comment)

Patch LGTM. Thanks for doing this.

http://gerrit.cloudera.org:8080/#/c/10980/1/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/10980/1/be/src/runtime/krpc-data-stream-sender.cc@598
PS1, Line 598: destinations[i].thrift_backend.hostname,
> destinations[i].krpc_backend.hostname is actually set to the resolved IP ad
Ok thanks for the explanation.



--
To view, visit http://gerrit.cloudera.org:8080/10980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e3e978746cf03766eee151835aad5877d9ed63e
Gerrit-Change-Number: 10980
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 19 Jul 2018 00:53:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7298: Stop passing IP address as hostname in Kerberos principal

2018-07-18 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10980 )

Change subject: IMPALA-7298: Stop passing IP address as hostname in Kerberos 
principal
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10980/1/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/10980/1/be/src/runtime/krpc-data-stream-sender.cc@598
PS1, Line 598: destinations[i].thrift_backend.hostname,
Instead of adding this new parameter, would it be possible to ensure that 
"destinations[i].krpc_backend.hostname" is set? That way, we can avoid a lot of 
code changes.



--
To view, visit http://gerrit.cloudera.org:8080/10980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e3e978746cf03766eee151835aad5877d9ed63e
Gerrit-Change-Number: 10980
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 19 Jul 2018 00:36:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7299: [DOCS] A known issue with IMPALA-7298

2018-07-16 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10952 )

Change subject: IMPALA-7299: [DOCS] A known issue with IMPALA-7298
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5b8104a2747b4e8051d4bdcab906486444680218
Gerrit-Change-Number: 10952
Gerrit-PatchSet: 3
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 16 Jul 2018 18:10:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7212: Deprecate --use krpc flag and remove old DataStream services

2018-07-12 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10835 )

Change subject: IMPALA-7212: Deprecate --use_krpc flag and remove old 
DataStream services
..


Patch Set 3: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10835/3/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/10835/3/be/src/common/global-flags.cc@40
PS3, Line 40: DEFINE_int32(krpc_port, 27000,
: "port on which KRPC based ImpalaInternalService is exported");
Now that we're exposing this, we should probably document this (if we're not 
already doing so).


http://gerrit.cloudera.org:8080/#/c/10835/1/tests/custom_cluster/test_rpc_exception.py
File tests/custom_cluster/test_rpc_exception.py:

http://gerrit.cloudera.org:8080/#/c/10835/1/tests/custom_cluster/test_rpc_exception.py@a73
PS1, Line 73:
> ReportExecStatus() can be very timing sensitive as the number of iterations
So that goes to say that we won't benefit much from running with this for 
ReportExecStatus() or CancelQueryFInstnaces() ?



--
To view, visit http://gerrit.cloudera.org:8080/10835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed200751508478a3d728a917448f2dabfc67c3
Gerrit-Change-Number: 10835
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 12 Jul 2018 21:39:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-07-11 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..


Patch Set 6:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-exec-mgr.cc@128
PS4, Line 128:
> This seems to fit into IMPALA-4063 better. As the code stands in this patch
I removed this and moved the code inside StartFInstances(). I'll bring this 
back with IMPALA-4063


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h@159
PS4, Line 159:  until all
> IMPALA-4063
Removed this.


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h@278
PS4, Line 278: /// CANCELLED: This query received a CancelQueryFInstances() 
RPC or was directed by
> No implication on whether all fragment instances have finished cancelling t
Yes, added a line that explains that.


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h@292
PS4, Line 292: ackendExecS
> Did you mean IMPALA-4063 ? Does it mean this comment needs to be removed. M
Yes I meant IMPALA-4063. Removed this.


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h@350
PS4, Line 350: /// created in St
> std::unique_ptr
Done


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.h@355
PS4, Line 355: /// whether they
> std::unique_ptr
Done


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@a350
PS4, Line 350:
 :
> Was this a bug ?! Will we hang in the for loop at line 363 below if we fail
This technically is a race, but we never hit it because none of the fragment 
instances try to access fis_map_. I changed it back to the original way.

The reason I had this change in the first place was because one of my 
intermediate patches hit that error, but I don't need this change now.


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@141
PS4, Line 141: instances_prepared_barrier_
> As discussed offline, the behavior during "preparation" phase of the query
I changed this based on our offline discussion.

The gist is that I removed the FIS::prepared_promise_ since it was unnecessary 
and introduced a 'status_lock_' and a 'prepared_status_'. Each FIS will try to 
update 'prepared_status_' on its own when it hits an error.


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@225
PS4, Line 225: DCHECK(false
> DCHECK(false) ?! Does this warrant a targeted BE test ?
Oops, fixed. I don't think we'll be able to write a meaningful BE test for the 
state machine, because we'll be forcing it into cases we never expect the code 
to hit, just to see an error message.


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@230
PS4, Line 230: s QueryState::UpdateBackendExecState(
> This function seems like a no-op in this patch. Can this be added in the ne
Yes, I removed it.


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@278
PS4, Line 278:   ReportExecStatusAux(done, status, fis, true);
 : }
 :
 : void QueryState::ReportExecStatusAux(bool done, const Status& 
status,
 : FragmentInstanceState* fis, bool instances_started) {
 :   // if we're reporting an error, we're done
 :   DCHECK(status.ok() || done
> Will this actually be duplicating the error message which we usually log al
The first operator to hit an error will log the error itself. This is a query 
wide acknowledgement of that error in an executor. Subsequent errors hit in the 
same query won't show up here.

We have a pattern of multiple places logging the same error already, and I 
think it's helpful if we have an acknowledgement of the query status of a query 
in an executor.

Anyway, since we discussed and agreed to remove this, I'll do so.


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@391
PS4, Line 391: Status status = DescriptorTbl::Create(
> As discussed offline, there is no need to have a different implementation f
Changed based on our offline discussion.


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@434
PS4, Line 434: rence count
> IMPALA-4063 ?
Done


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@502
PS4, Line 502: initiate cancellation if nobody
> Please see comments in query-exec-mgr.cc
Done


http://gerrit.cloudera.org:8080/#/c/10813/4/be/src/runtime/query-state.cc@542
PS4, Line 542:
 :
> Not sure what you think 

[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-07-11 Thread Sailesh Mukil (Code Review)
Hello Michael Ho, Joe McDonnell, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10813

to look at the new patch set (#6).

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..

IMPALA-7163: Implement a state machine for the QueryState class

This patch adds a state machine for the QueryState class. The motivation
behind this patch is to make the query lifecycle from the point of
view of an executor much easier to reason about and this patch is key
for a follow on patch for IMPALA-2990 where the status reporting will
be per-query rather than per-fragment-instance. Currently, the state
machine provides no other purpose, and it will mostly be used for
IMPALA-2990.

We introduce 5 possible states for the QueryState which include 3
terminal states (FINISHED, CANCELLED and ERROR) and 2 non-terminal
states (INITIALIZED, PREPARED). The transition from one state to the
next is always handled by a single thread which is also the QueryState
thread. This thread will additionally bear the purpose of sending
periodic updates after IMPALA-4063, which is the primary reason behind
having only this thread modify the state of the query.

2 counting barriers are introduced which are used to indicate how many
fragment instances have finished Preparing and Executing. We
use CountingBarriers to allow fragment instance threads to communicate
their completion of their respective states or errors with the QueryState
thread. Due to this design, it's possible for the fragment instances to
have collectively moved on to future states while the query state thread
lags behind in realizing that. However, that's not a concern for us
since we only care about showing a unified view of what's happening on
this executor to the coordinator (post IMPALA-2990).

The status reporting protocol has not been changed and remains exactly
as it was.

Testing: Ran 'core' and 'exhaustive' tests. TODO: Run stress tests.

Future related work:
1) IMPALA-2990: Make status reporting per-query.
2) Try to logically align the FIS states with the QueryState states.
3) Consider mirroring the QueryState state machine to CoordinatorBackendState

Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
---
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
5 files changed, 290 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/10813/6
--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 6
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork

2018-07-10 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10763 )

Change subject: IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
..


Patch Set 8: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10763
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3c6e933c454e7adca69ef03e7d5c0c84b656895
Gerrit-Change-Number: 10763
Gerrit-PatchSet: 8
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Tue, 10 Jul 2018 17:43:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7213: Port ReportExecStatus() RPC to use KRPC

2018-07-06 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
..


Patch Set 2:

(37 comments)

Did a first pass. Will have another look after these are addressed.

Also, a rebase might be required, it seems like this is based off a commit 
which has aged a bit.

http://gerrit.cloudera.org:8080/#/c/10855/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10855/2//COMMIT_MSG@17
PS2, Line 17: they
nit: that


http://gerrit.cloudera.org:8080/#/c/10855/2//COMMIT_MSG@19
PS2, Line 19: convertion
nit: conversion


http://gerrit.cloudera.org:8080/#/c/10855/2//COMMIT_MSG@25
PS2, Line 25:
Not to be pedantic, but a lot of Thrift structs are being removed as part of 
this patch, so I think it would help if you outline them and their new Protobuf 
counterparts in the commit message, so that it's easier to reference in the 
future in case of bugs around the new code that use these protobuf structs.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/exec/hdfs-parquet-table-writer.cc@1183
PS2, Line 1183: 
(*parquet_insert_stats_.mutable_per_column_size())[col_name] +=
  : col_writer->total_compressed_size();
nit: Should we have a level of indirection here? It's not very obvious what 
this line is doing on first glance.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/rpc/thrift-util.h
File be/src/rpc/thrift-util.h:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/rpc/thrift-util.h@93
PS2, Line 93: Serialize
Should we explicitly qualify the names of these functions to avoid confusion?
Eg: SerializeToVector(), SerializeToString(), SerializeToFaststring(), etc.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/rpc/thrift-util.h@103
PS2, Line 103: uint8_t* buffer;
 : uint32_t len;
 : mem_buffer_->getBuffer(, );
 : result->assign_copy(buffer, len);
I think you can avoid the copy here if you do:

uint32_t len = mem_buffer_->available_read();
result->resize(len);
uint8_t* buffer = result->data();

// 'len' wouldn't have changed from the first line to here.
mem_buffer_->getBuffer(, );


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.h
File be/src/runtime/dml-exec-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.h@35
PS2, Line 35: class TInsertResult;
: class TFinalizeParams;
: class TUpdateCatalogRequest;
: class RuntimeProfile;
: class HdfsTableDescriptor;
nit: Not your change, but the ordering is not alphabetical.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.h@126
PS2, Line 126: std::map
I'm wondering if it just makes sense to use google::protobuf::Map here instead 
of std::map, so that we don't have to pay the cost of serializing/deserializing 
from a payload to the map, and can just std::move() from one to the other.

Eg: Kudu does this in certain places:
https://github.com/apache/kudu/blob/d1d7572b364e06320e7afab8724242508709625d/src/kudu/rpc/service_if.h#L47-L48


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc
File be/src/runtime/dml-exec-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc@83
PS2, Line 83: for (auto i = parquet_stats.per_column_size().begin();
:i != parquet_stats.per_column_size().end(); ++i) {
Range based for-loop with const ref.

Range based for loops evaluation end() only once, whereas iterator loops like 
this evaluate end() on every iteration.

Range based for loops are good when you need to iterate over every element in a 
collection and when you're not modifying the size of that collection.


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc@95
PS2, Line 95: auto
Let's try to stay explicit about data types where we can. Also take it as a 
reference:

const PartitionStatusMap&


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc@95
PS2, Line 95: new_partition_status
nit: new_per_partition_status_map


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc@96
PS2, Line 96: (auto iter = new_partition_status.begin(); iter != 
new_partition_status.end();
:++iter)
Why not use a range based for loop? It's much more readable. Also, reference to 
avoid iterator copy.

for (auto const& partition_status : new_per_partition_status_map) {
  ...
}


http://gerrit.cloudera.org:8080/#/c/10855/2/be/src/runtime/dml-exec-state.cc@404
PS2, Line 404: for (auto iter = files_to_move_.begin(); iter != 
files_to_move_.end(); ++iter) {
Range based for loop, and take iterator as const ref.



[Impala-ASF-CR] IMPALA-7212: Deprecate --use krpc flag and remove old DataStream services

2018-07-03 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10835 )

Change subject: IMPALA-7212: Deprecate --use_krpc flag and remove old 
DataStream services
..


Patch Set 1:

(5 comments)

LGTM overall. Just some comments around the test code.

http://gerrit.cloudera.org:8080/#/c/10835/1/be/src/runtime/data-stream-test.cc
File be/src/runtime/data-stream-test.cc:

http://gerrit.cloudera.org:8080/#/c/10835/1/be/src/runtime/data-stream-test.cc@23
PS1, Line 23: #include "common/status.h"
I think I've found all the dead code in this patch, but if you want a 
reference, you can have a look at this patch to see if you need to remove 
anything else:

https://github.com/apache/impala/commit/ff86feaa67ff8bf703896e33d9a358e42739ae30#diff-0b7021d2a8bfaf517b243f11a53bcde8

I added that when I was making this test KRPC compatible.


http://gerrit.cloudera.org:8080/#/c/10835/1/be/src/runtime/data-stream-test.cc@128
PS1, Line 128: template  class DataStreamTestBase : public T {
 :  protected:
 :   virtual void SetUp() {}
 :   virtual void TearDown() {}
 : };
No longer needed.


http://gerrit.cloudera.org:8080/#/c/10835/1/be/src/runtime/data-stream-test.cc@134
PS1, Line 134: enum KrpcSwitch {
 :   USE_THRIFT,
 :   USE_KRPC
 : };
No longer needed.


http://gerrit.cloudera.org:8080/#/c/10835/1/be/src/runtime/data-stream-test.cc@139
PS1, Line 139: public DataStreamTestBase>
You can replace this with:
public testing::Test


http://gerrit.cloudera.org:8080/#/c/10835/1/tests/custom_cluster/test_rpc_exception.py
File tests/custom_cluster/test_rpc_exception.py:

http://gerrit.cloudera.org:8080/#/c/10835/1/tests/custom_cluster/test_rpc_exception.py@a73
PS1, Line 73:
> We should probably look into removing certain injected exception if it's no
Can't this happen without TransmitData() today?  Don't we also call it on 
ReportExecStatus() and CancelQueryFInstances() ?

https://github.com/apache/impala/blob/fb8ea6a9ccaedc41a71f6f6dcb367fc1facd73b6/be/src/runtime/backend-client.h#L58-L76



--
To view, visit http://gerrit.cloudera.org:8080/10835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed200751508478a3d728a917448f2dabfc67c3
Gerrit-Change-Number: 10835
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Tue, 03 Jul 2018 18:36:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-07-03 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10813/2/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/2/be/src/runtime/query-state.cc@437
PS2, Line 437:
 :   }
 :   VLOG_QUERY << "descriptor table for query=" << 
PrintId(query_id())
 :  << "\n" << desc_tbl_->DebugString();
 : 
 :   Status thread_create_status;
 :   DCHECK_GT(rpc_params_.fragment_ctxs.size(), 0);
 :   TPlanFragmentCtx* fragment_ctx = _params_.fragment_ctxs[0];
 :   int fragment_ctx_idx = 0;
 :   fragment_events_start_time_ = MonotonicStopWatch::Now();
 :   for (const TPlanFragmentInstanceCtx& instance_ctx: 
rpc_params_.fragment_instance_ctxs) {
 : // determine corresponding TPlanFragmentCtx
 : if (fragment_ctx->fragment.idx != instance_ctx.fragment_idx) 
{
 :   ++fragment_ctx_idx;
 :
> My point was shouldn't the counting barrier take care of that for us? I.e.
CountingBarrier::Notify() does not set the promise unless it's the one to 
reduce the count to 0 (or in other words only the last fragment instance to 
complete preparing would get to set the promise).

So if a first fragment instance hits an error in Prepare() while the others are 
still running Prepare(), then it won't be able to set the error unless it has 
some way of communicating its error to the other fragment instances, or it 
waits until the rest of them are done and then sets the error.

So, we'd need some other mechanism of allowing us to both:
1) Save the first error, and
2) Wait for all the FInstances to complete Prepare()

Unless I'm missing something?


http://gerrit.cloudera.org:8080/#/c/10813/2/be/src/runtime/query-state.cc@586
PS2, Line 586: 
 :
 :
 :
 :
 :
> Right, the references are held by FIS for resources shared across the query
As we spoke in person, I will roll this part out as a separate patch to make 
reviewing this a bit easier.



--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 4
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Tue, 03 Jul 2018 17:00:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-07-03 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..


Patch Set 4:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@65
PS3, Line 65: state denot
> I think we should avoid using the term "query state" given that this class
Done


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@65
PS3, Line 65: dExecState. We
> Backend
Done


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@68
PS3, Line 68: to the ERROR or CANCELLED
> are there multiple error states? corresponding implies that. Or are you inc
I reworded it to be clearer now. It can only either be ERROR or CANCELLED.


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@72
PS3, Line 72:  This is to simplify the
: /// query lifecycle so that Prepare() is always completed befor
> Let's not state in terms of past code. How about:
Yes, technically Cancel() or PublishFilter(). So I've re-worded it to that.


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@149
PS3, Line 149: s belonging to this query. Each instance receives its own 
execution
> put another way, I guess this blocks until threads have exited INITIALIZED?
Yes, I've added that wording to make it clearer.

The reason I kept MonitorFInstances() separately is because the status 
reporting will be driven from there after IMPALA-2990, making it more readable 
than keeping it in StartFInstances().


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@155
PS3, Line 155:   /// Monitors all the fragment instances and updates the 
BackendExecState according
> it it required that this is called after StartFInstances()?
Yes, I've added that to the comments now.


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@210
PS3, Line 210: return strings::Substitute("Status: $0 (fra
> Is that needed in C++?  You'd do it for C but I think C++ effectively inclu
You're right. Removed it.


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@213
PS3, Line 213:
> nit: here and below you could delete the word "Function" since that's obvio
Done


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@266
PS3, Line 266: dExecS
> Isn't Cancel() called in case of error as well? If so, does this mean we go
Yes, that's true. I've reworded it to specify only CancelQueryFInstances() RPC 
and a coordinator directed CANCEL from a response to ReportExecStatus() RPC.


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@271
PS3, Line 271: repare(). Implies
> how about backend_exec_state_ for consistency?
Done


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@284
PS3, Line 284: u
> here and elsewhere: given BackendExecState is a scalar (enum) value, should
Done


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@286
PS3, Line 286: or E
> from
Done


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@305
PS3, Line 305: ar* Ba
> overall status
Done


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.h@306
PS3, Line 306:
> otherwise it's the status of the first non-OK FIS?
Yes, added that to the comment.

Also, I changed this to a FInstanceStatus. I wanted to do that initially, but I 
forgot to.


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.cc@201
PS3, Line 201: const char* QueryState::BackendExecStateToString(const 
BackendExecState& state) {
 :   static const unordered_map 
exec_state_to_str{
 :   {BackendExecState::INITIALIZED, "INITIALIZED"},
 :   {BackendExecState::PREPARED, "PREPARED"}, 
{BackendExecState::FINISHED, "FINISHED"},
 :   {BackendExecState::CANCELLED, "CANCELLED"}, 
{BackendExecState::ERROR, "ERROR"}};
 :   retu
> given that the function is inlined this is probably unnecessary. dead code
Good point. Done.


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.cc@226
PS3, Line 226:  << BackendExecStateToString(old_state) << ")";
> why doesn't that include FINISHED? This check doesn't seem very intuitive t
After removing the 2 states I had before, this seems pretty unnecessary now. I 
removed it.


http://gerrit.cloudera.org:8080/#/c/10813/3/be/src/runtime/query-state.cc@266
PS3, Line 266: ested cancellati
> how about UpdateBackendExecState (since QueryState is the name of the class
Done



[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-07-03 Thread Sailesh Mukil (Code Review)
Hello Michael Ho, Joe McDonnell, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10813

to look at the new patch set (#4).

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..

IMPALA-7163: Implement a state machine for the QueryState class

This patch adds a state machine for the QueryState class. The motivation
behind this patch is to make the query lifecycle from the point of
view of an executor much easier to reason about and this patch is key
for a follow on patch for IMPALA-2990 where the status reporting will
be per-query rather than per-fragment-instance. Currently, the state
machine provides no other purpose, and it will mostly be used for
IMPALA-2990.

We introduce 7 possible states for the QueryState which include 3
terminal states (FINISHED, CANCELLED and ERROR) and 4 non-terminal
states (INITIALIZED, PREPARED). The transition from one state to the
next is always handled by a single thread which is also the QueryState
thread. This thread will additionally bear the purpose of sending
periodic updates after IMPALA-2990, which is the primary reason behind
having only this thread modify the state of the query.

2 counting barriers are introduced which are used to indicate how many
fragment instances have finished Preparing and Executing. We
use CountingBarriers to allow fragment instance threads to communicate
their completion of their respective states or errors with the QueryState
thread. Due to this design, it's possible for the fragment instances to
have collectively moved on to future states while the query state thread
lags behind in realizing that. However, that's not a concern for us
since we only care about showing a unified view of what's happening on
this executor to the coordinator (post IMPALA-2990).

The status reporting protocol has not been changed and remains exactly
as it was.

Testing: Ran 'core' and 'exhaustive' tests. TODO: Run stress tests.

Future related work:
1) IMPALA-2990: Make status reporting per-query.
2) Try to logically align the FIS states with the QueryState states.
3) Consider mirroring the QueryState state machine to CoordinatorBackendState

Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
---
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
5 files changed, 328 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/10813/4
--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 4
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-4784: Remove InProcessStatestore

2018-07-02 Thread Sailesh Mukil (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10843

to look at the new patch set (#6).

Change subject: IMPALA-4784: Remove InProcessStatestore
..

IMPALA-4784: Remove InProcessStatestore

InProcessStatestore was only used by statestore-test, expr-test and
session-expiry-test. With a slight refactor of the Statestore class,
InProcessStatestore becomes obsolete.

This patch moves the ownership of the ThriftServer into the Statestore
class and Statestore::Init() now takes a 'port' parameter instead of
using the FLAGS_state_store_port directly.

We also remove the InProcessStatestore completely. A follow on patch will
get rid of the InProcessImpalaServer too (IMPALA-6013)

Testing: Ran 'core' tests.

Change-Id: I2621873e593b36c9612a6402ac6c5d8e3b49cde9
---
M be/src/exprs/expr-test.cc
M be/src/service/session-expiry-test.cc
M be/src/statestore/statestore-test.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M be/src/statestore/statestored-main.cc
M be/src/testutil/in-process-servers.cc
M be/src/testutil/in-process-servers.h
8 files changed, 95 insertions(+), 125 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/10843/6
--
To view, visit http://gerrit.cloudera.org:8080/10843
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2621873e593b36c9612a6402ac6c5d8e3b49cde9
Gerrit-Change-Number: 10843
Gerrit-PatchSet: 6
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4784: Remove InProcessStatestore

2018-07-02 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10843 )

Change subject: IMPALA-4784: Remove InProcessStatestore
..


Patch Set 6: Code-Review+2

Hit a clang-tidy issue.

Rebase, carry +2.


--
To view, visit http://gerrit.cloudera.org:8080/10843
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2621873e593b36c9612a6402ac6c5d8e3b49cde9
Gerrit-Change-Number: 10843
Gerrit-PatchSet: 6
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 03 Jul 2018 04:59:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4784: Remove InProcessStatestore

2018-07-02 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10843 )

Change subject: IMPALA-4784: Remove InProcessStatestore
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10843/3/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/10843/3/be/src/exprs/expr-test.cc@8738
PS3, Line 8738:   Statestore* statestore = new Statestore(metrics.get());
> Would it make sense to make this a global scoped_ptr or similar? Seems triv
I made this global, but making it a scoped pointer has problems due to no clean 
shutdown semantics for the statestore. See IMPALA-7235.


http://gerrit.cloudera.org:8080/#/c/10843/3/be/src/service/session-expiry-test.cc
File be/src/service/session-expiry-test.cc:

http://gerrit.cloudera.org:8080/#/c/10843/3/be/src/service/session-expiry-test.cc@53
PS3, Line 53:   Statestore* statestore = new Statestore(metrics.get());
> Same question
Same as other comments: IMPALA-7235


http://gerrit.cloudera.org:8080/#/c/10843/3/be/src/statestore/statestore-test.cc
File be/src/statestore/statestore-test.cc:

http://gerrit.cloudera.org:8080/#/c/10843/3/be/src/statestore/statestore-test.cc@40
PS3, Line 40:   Statestore* statestore = new Statestore(metrics.get());
> Same question here and below. Could also use the IGNORE_LEAKING_OBJECT macr
Unfortunately, the Statestore does not have clean shut down semantics since 
it's expected that it will run for the life of the cluster.

Due to that having a scoped_ptr for the Statestore will cause crashes since 
some of the threads it spawns run functions that have no exit conditions. I 
think this is why we leak the InProcessStatestore object in pretty much every 
BE test we use it in.

It's possible to have the Statestore shut down cleanly, but I'd rather do that 
as a separate patch, since it could have edge cases of its own.

For now, I'll use IGNORE_LEAKING_OBJECT in the tests.



--
To view, visit http://gerrit.cloudera.org:8080/10843
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2621873e593b36c9612a6402ac6c5d8e3b49cde9
Gerrit-Change-Number: 10843
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 02 Jul 2018 21:21:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4784: Remove InProcessStatestore

2018-07-02 Thread Sailesh Mukil (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10843

to look at the new patch set (#4).

Change subject: IMPALA-4784: Remove InProcessStatestore
..

IMPALA-4784: Remove InProcessStatestore

InProcessStatestore was only used by statestore-test, expr-test and
session-expiry-test. With a slight refactor of the Statestore class,
InProcessStatestore becomes obsolete.

This patch moves the ownership of the ThriftServer into the Statestore
class and Statestore::Init() now takes a 'port' parameter instead of
using the FLAGS_state_store_port directly.

We also remove the InProcessStatestore completely. A follow on patch will
get rid of the InProcessImpalaServer too (IMPALA-6013)

Testing: Ran 'core' tests.

Change-Id: I2621873e593b36c9612a6402ac6c5d8e3b49cde9
---
M be/src/exprs/expr-test.cc
M be/src/service/session-expiry-test.cc
M be/src/statestore/statestore-test.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M be/src/statestore/statestored-main.cc
M be/src/testutil/in-process-servers.cc
M be/src/testutil/in-process-servers.h
8 files changed, 95 insertions(+), 125 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/10843/4
--
To view, visit http://gerrit.cloudera.org:8080/10843
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2621873e593b36c9612a6402ac6c5d8e3b49cde9
Gerrit-Change-Number: 10843
Gerrit-PatchSet: 4
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4784: Remove InProcessStatestore

2018-06-29 Thread Sailesh Mukil (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10843

to look at the new patch set (#3).

Change subject: IMPALA-4784: Remove InProcessStatestore
..

IMPALA-4784: Remove InProcessStatestore

InProcessStatestore was only used by statestore-test, expr-test and
session-expiry-test. With a slight refactor of the Statestore class,
InProcessStatestore becomes obsolete.

This patch moves the ownership of the ThriftServer into the Statestore
class and Statestore::Init() now takes a 'port' parameter instead of
using the FLAGS_state_store_port directly.

We also remove the InProcessStatestore completely. A follow on patch will
get rid of the InProcessImpalaServer too (IMPALA-6013)

Testing: Ran 'core' tests.

Change-Id: I2621873e593b36c9612a6402ac6c5d8e3b49cde9
---
M be/src/exprs/expr-test.cc
M be/src/service/session-expiry-test.cc
M be/src/statestore/statestore-test.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M be/src/statestore/statestored-main.cc
M be/src/testutil/in-process-servers.cc
M be/src/testutil/in-process-servers.h
8 files changed, 74 insertions(+), 119 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/10843/3
--
To view, visit http://gerrit.cloudera.org:8080/10843
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2621873e593b36c9612a6402ac6c5d8e3b49cde9
Gerrit-Change-Number: 10843
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4784: Remove InProcessStatestore

2018-06-29 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10843 )

Change subject: IMPALA-4784: Remove InProcessStatestore
..


Patch Set 2: Code-Review-1

Found a bug. Will upload a new patchset shortly.


--
To view, visit http://gerrit.cloudera.org:8080/10843
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2621873e593b36c9612a6402ac6c5d8e3b49cde9
Gerrit-Change-Number: 10843
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 29 Jun 2018 21:34:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4784: Remove InProcessStatestore

2018-06-29 Thread Sailesh Mukil (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10843

to look at the new patch set (#2).

Change subject: IMPALA-4784: Remove InProcessStatestore
..

IMPALA-4784: Remove InProcessStatestore

InProcessStatestore was only used by statestore-test and with a slight
refactor of the Statestore class, InProcessStatestore becomes obsolete.

This patch moves the ownership of the ThriftServer into the Statestore
class and Statestore::Init() now takes a 'port' parameter instead of
using the FLAGS_state_store_port directly.

We also remove the InProcessStatestore completely. A follow on patch will
get rid of the InProcessImpalaServer too (IMPALA-6013)

Testing: Ran 'core' tests.

Change-Id: I2621873e593b36c9612a6402ac6c5d8e3b49cde9
---
M be/src/statestore/statestore-test.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M be/src/statestore/statestored-main.cc
M be/src/testutil/in-process-servers.cc
M be/src/testutil/in-process-servers.h
6 files changed, 62 insertions(+), 113 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/10843/2
--
To view, visit http://gerrit.cloudera.org:8080/10843
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2621873e593b36c9612a6402ac6c5d8e3b49cde9
Gerrit-Change-Number: 10843
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4784: Remove InProcessStatestore

2018-06-29 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10843


Change subject: IMPALA-4784: Remove InProcessStatestore
..

IMPALA-4784: Remove InProcessStatestore

InProcessStatestore was only used by statestore-test and with a slight
refactor of the Statestore class, InProcessStatestore becomes obsolete.

This patch moves the ownership of the ThriftServer into the Statestore
class and Statestore::Init() now takes a 'port' parameter instead of
using the FLAGS_state_store_port directly.

We also remove the InProcessStatestore completely. A follow on patch will
get rid of the InProcessImpalaServer too (IMPALA-6013)

Testing: Ran 'core' tests.

Change-Id: I2621873e593b36c9612a6402ac6c5d8e3b49cde9
---
M be/src/statestore/statestore-test.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M be/src/statestore/statestored-main.cc
M be/src/testutil/in-process-servers.cc
M be/src/testutil/in-process-servers.h
6 files changed, 63 insertions(+), 110 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/10843/1
--
To view, visit http://gerrit.cloudera.org:8080/10843
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2621873e593b36c9612a6402ac6c5d8e3b49cde9
Gerrit-Change-Number: 10843
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-7212: Deprecate --use krpc flag and remove old DataStream services

2018-06-29 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10835 )

Change subject: IMPALA-7212: Deprecate --use_krpc flag and remove old 
DataStream services
..


Patch Set 1:

Some high level questions:

1. At what point do we want this checked in? 2.12.0 was the first release with 
KRPC and we've not gotten enough user feedback yet to know all the issues with 
the feature. Should we bite the bullet and deal with the issues as they come? 
Or should we leave Thrift RPC as a fallback for a while?

2. Should we rename all the KrpcDataStream* to DataStream* now since there's 
only one implementation now? Or should we keep a buffed period of releases 
before renaming to avoid confusion?

3. Was this patch mostly mechanical? Or did you face any issue(s) in getting 
rid of the flag and DataStream* classes?


--
To view, visit http://gerrit.cloudera.org:8080/10835
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed200751508478a3d728a917448f2dabfc67c3
Gerrit-Change-Number: 10835
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Fri, 29 Jun 2018 16:29:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-06-28 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10813/2/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/10813/2/be/src/runtime/query-state.h@300
PS2, Line 300: The QueryState is ready for tear-down at this state
> I've removed that line since we don't really do any tear down in this patch
Actually, to reiterate, we may need to add a new state if we're going to manage 
ExecResources with this state machine as well.

In that case, we'd need to add a new STOPPED state, since on ERROR and 
CANCELLED, we may have fragment instances still running.



--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 28 Jun 2018 22:26:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-06-28 Thread Sailesh Mukil (Code Review)
Hello Michael Ho, Joe McDonnell, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10813

to look at the new patch set (#3).

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..

IMPALA-7163: Implement a state machine for the QueryState class

This patch adds a state machine for the QueryState class. The motivation
behind this patch is to make the query lifecycle from the point of
view of an executor much easier to reason about and this patch is key
for a follow on patch for IMPALA-2990 where the status reporting will
be per-query rather than per-fragment-instance. Currently, the state
machine provides no other purpose, and it will mostly be used for
IMPALA-2990.

We introduce 7 possible states for the QueryState which include 3
terminal states (FINISHED, CANCELLED and ERROR) and 4 non-terminal
states (INITIALIZED, PREPARED). The transition from one state to the
next is always handled by a single thread which is also the QueryState
thread. This thread will additionally bear the purpose of sending
periodic updates after IMPALA-2990, which is the primary reason behind
having only this thread modify the state of the query.

2 counting barriers are introduced which are used to indicate how many
fragment instances have finished Preparing and Executing. We
use CountingBarriers to allow fragment instance threads to communicate
their completion of their respective states or errors with the QueryState
thread. Due to this design, it's possible for the fragment instances to
have collectively moved on to future states while the query state thread
lags behind in realizing that. However, that's not a concern for us
since we only care about showing a unified view of what's happening on
this executor to the coordinator (post IMPALA-2990).

The status reporting protocol has not been changed and remains exactly
as it was.

Testing: Ran 'core' and 'exhaustive' tests. TODO: Run stress tests.

Future related work:
1) IMPALA-2990: Make status reporting per-query.
2) Try to logically align the FIS states with the QueryState states.
3) Consider mirroring the QueryState state machine to CoordinatorBackendState

Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
---
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
5 files changed, 343 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/10813/3
--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-06-28 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..


Patch Set 2:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/10813/2/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/10813/2/be/src/runtime/query-state.h@66
PS2, Line 66: f any fragment instance hits an error or cancellation, then we 
immediately change the
: /// state of the query to the corresponding error state.
> but won't we need to wait until all FIS have completed (even if it's becaus
Yes, once IMPALA-2990 is implemented, the invariant we'll maintain is that a 
final profile can be sent at any time after the PREPARED state.

In the ERROR and CANCELLED states, that means we can send the final profile 
immediately without having to wait for the remaining fragment instances to 
complete, and since all the FISes will have been PREPARED, there will be 
mention of all fragment instances in the final report.

In the FINISHED state, all the fragment instances will have completed anyway, 
so we can send the final profile then.


http://gerrit.cloudera.org:8080/#/c/10813/2/be/src/runtime/query-state.h@194
PS2, Line 194: pair
> rather than using a pair<>, how about defining a struct so that the fields
Done


http://gerrit.cloudera.org:8080/#/c/10813/2/be/src/runtime/query-state.h@245
PS2, Line 245: 2&
> we generally only use const references. For things that will be modified, u
Got rid of this.


http://gerrit.cloudera.org:8080/#/c/10813/2/be/src/runtime/query-state.h@245
PS2, Line 245: pair
> Use FInstanceStatus
Got rid of this.


http://gerrit.cloudera.org:8080/#/c/10813/2/be/src/runtime/query-state.h@262
PS2, Line 262: to another thread that
 : // called this function.
> That doesn't seem right in the "less than 0" case, since this function only
The less than 0 case is specifically handled because this can race with 
DoneWithState(). I just found that the same is done in 
CountingBarrier::NotifyRemaining() for the same reason (since it can race with 
CountingBarrier::Notify()).


http://gerrit.cloudera.org:8080/#/c/10813/2/be/src/runtime/query-state.h@289
PS2, Line 289: INITIALIZED,
 : /// STARTED: All fragment instances have been started. 
Implies that the query is
 : /// being prepared.
 : STARTED,
 : /// PREPARED: All fragment instances managed by this 
QueryState have successfully
 : /// completed Prepare(). Implies that the query is Opening.
 : PREPARED,
 : /// OPENED: All fragment instances managed by this 
QueryState have successfully
 : /// completed Open(). Implies that the query is executing.
 : OPENED,
> why do we need all of these states? it's a bit hard to see how they will be
I didn't want to write too much in the header comments since that would end up 
explaining the implementation itself. I'll write down what the different states 
mean here in this comment and if you think it's reasonable to add them to the 
header comments, then I'll do that in the next patchset.

INITIALIZED is the first state. The reason I added a separate state for this 
and not start with STARTED or PREPARED directly is because there are distinct 
failures that could happen before Prepare(). The only failures possible in this 
state today are:
1) If we're unable to create a descriptor table (PS2: L473).
2) If we're unable to create a thread for a FIS (PS2: L534)

STARTED basically means that we were able to create a descriptor table and 
start all the threads for the FISes. It doesn't serve any special purpose other 
than marking a logical step which can help with debugging issues and providing 
observability once we're able to reflect these states in the RuntimeProfile. If 
a query fails in the STARTED state, then we know that some FIS failed to 
Prepare(). It's also lightweight to have this since we don't add any counting 
barriers or such for this. It basically means that StartFInstances() has 
completed successfully. I've removed this as well though, since there's been 
more push back on this.

PREPARED means that all the FISes have finished Prepare(). This state is 
important as we expose FISes only after this state, for Cancel(), 
PublishFilter(), etc. We will also start status reporting only after reaching 
this state.

OPENED doesn't serve any special purpose other than act as a logical step in 
the query lifecycle. Again, it could help with observability and debugging, but 
the benefit we get from this doesn't justify the cost of all the added 
synchronization, so I've removed it.

FINISHED, CANCELLED and ERROR are fairly obvious as to why they're needed and 
important.



[Impala-ASF-CR] IMPALA-7215: Implement a templatized CountingBarrier

2018-06-28 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10827 )

Change subject: IMPALA-7215: Implement a templatized CountingBarrier
..


Patch Set 5: Code-Review+2

(4 comments)

Thanks for the review!

Rebase, carry +2.

http://gerrit.cloudera.org:8080/#/c/10827/4/be/src/util/counting-barrier.h
File be/src/util/counting-barrier.h:

http://gerrit.cloudera.org:8080/#/c/10827/4/be/src/util/counting-barrier.h@36
PS4, Line 36: ks Wait() with
> Not clear which "returned value" this is talking about - sounds like Notify
Done


http://gerrit.cloudera.org:8080/#/c/10827/4/be/src/util/counting-barrier.h@43
PS4, Line 43:
:   /// Sets the number of pending notificatio
> that's clearer. you could say it this way for Notify() comment.
Done


http://gerrit.cloudera.org:8080/#/c/10827/4/be/src/util/counting-barrier.h@56
PS4, Line 56:
> public comments shouldn't talk about private fields (the client of this cla
Done


http://gerrit.cloudera.org:8080/#/c/10827/4/be/src/util/counting-barrier.h@61
PS4, Line 61: ms' passes, in which
:   /// case '*timed_out' will be true. If
> same
Done



--
To view, visit http://gerrit.cloudera.org:8080/10827
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05fc79228250408ae16481ae7ff3491a90d26b8e
Gerrit-Change-Number: 10827
Gerrit-PatchSet: 5
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 28 Jun 2018 17:49:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7215: Implement a templatized CountingBarrier

2018-06-28 Thread Sailesh Mukil (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10827

to look at the new patch set (#5).

Change subject: IMPALA-7215: Implement a templatized CountingBarrier
..

IMPALA-7215: Implement a templatized CountingBarrier

Currently, our CountingBarrier util only notifies a 'bool' value
and uses an underlying Promise.

We're seeing cases in code where we might want to be notified of a
different kind of Promise (other than bool). We add a templatized
class TypedCountingBarrier and convert CountingBarrier to use the
TypedCountingBarrier internally.

This was identified while working on IMPALA-7163.

Testing: Ran 'core' tests.

Change-Id: I05fc79228250408ae16481ae7ff3491a90d26b8e
---
M be/src/exec/hdfs-scan-node.cc
M be/src/rpc/rpc-mgr-test.cc
M be/src/runtime/coordinator.cc
M be/src/util/counting-barrier.h
M be/src/util/hdfs-bulk-ops.cc
5 files changed, 62 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/10827/5
--
To view, visit http://gerrit.cloudera.org:8080/10827
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05fc79228250408ae16481ae7ff3491a90d26b8e
Gerrit-Change-Number: 10827
Gerrit-PatchSet: 5
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-7215: Implement a templatized CountingBarrier

2018-06-28 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10827 )

Change subject: IMPALA-7215: Implement a templatized CountingBarrier
..


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/10827/3/be/src/util/counting-barrier.h
File be/src/util/counting-barrier.h:

http://gerrit.cloudera.org:8080/#/c/10827/3/be/src/util/counting-barrier.h@34
PS3, Line 34:   /// Sends one notification, decrementing the number of pending 
notifications by one.
> Add something like:
Done


http://gerrit.cloudera.org:8080/#/c/10827/3/be/src/util/counting-barrier.h@36
PS3, Line 36: /// If
> looks like no caller actually uses this return value and so you've annotate
We just don't use it anywhere now, but I think it is useful to have as a Util 
API, since in some cases we may want to know if we were the last one to 
Notify() or not. Otherwise, it's also useful to return the number of pending 
counts.


http://gerrit.cloudera.org:8080/#/c/10827/3/be/src/util/counting-barrier.h@42
PS3, Line 42:
> similarly, add comment about 'promise_value'
Done


http://gerrit.cloudera.org:8080/#/c/10827/3/be/src/util/counting-barrier.h@54
PS3, Line 54:   }
> comment the return value
Done


http://gerrit.cloudera.org:8080/#/c/10827/3/be/src/util/counting-barrier.h@55
PS3, Line 55:
> would be good to return "const T&" here and below (to match Promise.Get())
Done


http://gerrit.cloudera.org:8080/#/c/10827/3/be/src/util/counting-barrier.h@58
PS3, Line 58:   const T& Wait() { return promise_.Get(); }
> comment return value
Done



--
To view, visit http://gerrit.cloudera.org:8080/10827
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05fc79228250408ae16481ae7ff3491a90d26b8e
Gerrit-Change-Number: 10827
Gerrit-PatchSet: 4
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 28 Jun 2018 17:24:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7215: Implement a templatized CountingBarrier

2018-06-28 Thread Sailesh Mukil (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10827

to look at the new patch set (#4).

Change subject: IMPALA-7215: Implement a templatized CountingBarrier
..

IMPALA-7215: Implement a templatized CountingBarrier

Currently, our CountingBarrier util only notifies a 'bool' value
and uses an underlying Promise.

We're seeing cases in code where we might want to be notified of a
different kind of Promise (other than bool). We add a templatized
class TypedCountingBarrier and convert CountingBarrier to use the
TypedCountingBarrier internally.

This was identified while working on IMPALA-7163.

Testing: Ran 'core' tests.

Change-Id: I05fc79228250408ae16481ae7ff3491a90d26b8e
---
M be/src/exec/hdfs-scan-node.cc
M be/src/rpc/rpc-mgr-test.cc
M be/src/runtime/coordinator.cc
M be/src/util/counting-barrier.h
M be/src/util/hdfs-bulk-ops.cc
5 files changed, 61 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/10827/4
--
To view, visit http://gerrit.cloudera.org:8080/10827
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05fc79228250408ae16481ae7ff3491a90d26b8e
Gerrit-Change-Number: 10827
Gerrit-PatchSet: 4
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated

2018-06-27 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10815 )

Change subject: IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED 
if query execution has terminated
..


Patch Set 6: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10815/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10815/1/be/src/runtime/coordinator-backend-state.cc@336
PS1, Line 336: ) {
> I'm curious why you feel this is less readable.
As we spoke, my basic reasoning was that if we have a function signature like 
so:
def Func(param..)

then that would signify that 'param' is used to do 'Func'.

In this case:
bool Cancel(query_ctx)

that would signify, use 'query_ctx' to 'Cancel'.

However, 'query_ctx' is used for an orthogonal use case within the function 
(which is important, but is just not doing any real 'cancel' work).

There are cases where this pattern is unavoidable, but I feel like we could do 
without it here. The second patch set looks much better though. Thanks for 
doing that.



--
To view, visit http://gerrit.cloudera.org:8080/10815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bb2c26edace89853f14a329f891d1f9a065a991
Gerrit-Change-Number: 10815
Gerrit-PatchSet: 6
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 28 Jun 2018 02:58:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7215: Implement a templatized CountingBarrier

2018-06-27 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10827 )

Change subject: IMPALA-7215: Implement a templatized CountingBarrier
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10827/1/be/src/util/counting-barrier.h
File be/src/util/counting-barrier.h:

http://gerrit.cloudera.org:8080/#/c/10827/1/be/src/util/counting-barrier.h@39
PS1, Line 39: ;
> rather than have a default_promise_value_, why not just pass it as a parame
Done


http://gerrit.cloudera.org:8080/#/c/10827/1/be/src/util/counting-barrier.h@46
PS1, Line 46: eturn;  // count_ can legitimately drop below 0
:   if (count_.CompareAndSwap(value, 0)) {
: promise_.Set(promise_value);
> this would be simpler if we just require the promise value for the non-bool
Good point. I implemented a TypedCountingBarrier and made CountingBarrier a 
user of TypedCountingBarrier.



--
To view, visit http://gerrit.cloudera.org:8080/10827
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05fc79228250408ae16481ae7ff3491a90d26b8e
Gerrit-Change-Number: 10827
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 28 Jun 2018 02:49:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7215: Implement a templatized CountingBarrier

2018-06-27 Thread Sailesh Mukil (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10827

to look at the new patch set (#3).

Change subject: IMPALA-7215: Implement a templatized CountingBarrier
..

IMPALA-7215: Implement a templatized CountingBarrier

Currently, our CountingBarrier util only notifies a 'bool' value
and uses an underlying Promise.

We're seeing cases in code where we might want to be notified of a
different kind of Promise (other than bool). We add a templatized
class TypedCountingBarrier and convert CountingBarrier to use the
TypedCountingBarrier internally.

This was identified while working on IMPALA-7163.

Testing: Ran 'core' tests.

Change-Id: I05fc79228250408ae16481ae7ff3491a90d26b8e
---
M be/src/exec/hdfs-scan-node.cc
M be/src/rpc/rpc-mgr-test.cc
M be/src/runtime/coordinator.cc
M be/src/util/counting-barrier.h
M be/src/util/hdfs-bulk-ops.cc
5 files changed, 54 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/10827/3
--
To view, visit http://gerrit.cloudera.org:8080/10827
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05fc79228250408ae16481ae7ff3491a90d26b8e
Gerrit-Change-Number: 10827
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 


[Impala-ASF-CR] IMPALA-7215: Implement a templatized CountingBarrier

2018-06-27 Thread Sailesh Mukil (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10827

to look at the new patch set (#2).

Change subject: IMPALA-7215: Implement a templatized CountingBarrier
..

IMPALA-7215: Implement a templatized CountingBarrier

Currently, our CountingBarrier util only notifies a 'bool' value
and uses an underlying Promise.

We're seeing cases in code where we might want to be notified of a
different kind of Promise (other than bool). We add a templatized
class TypedCountingBarrier and convert CountingBarrier to use the
TypedCountingBarrier internally.

This was identified while working on IMPALA-7163.

Testing: Ran 'core' tests.

Change-Id: I05fc79228250408ae16481ae7ff3491a90d26b8e
---
M be/src/exec/hdfs-scan-node.cc
M be/src/rpc/rpc-mgr-test.cc
M be/src/runtime/coordinator.cc
M be/src/util/counting-barrier.h
M be/src/util/hdfs-bulk-ops.cc
5 files changed, 59 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/10827/2
--
To view, visit http://gerrit.cloudera.org:8080/10827
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05fc79228250408ae16481ae7ff3491a90d26b8e
Gerrit-Change-Number: 10827
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 


[Impala-ASF-CR] IMPALA-7215: Implement a templatized CountingBarrier

2018-06-26 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10827


Change subject: IMPALA-7215: Implement a templatized CountingBarrier
..

IMPALA-7215: Implement a templatized CountingBarrier

Currently, our CountingBarrier util only notifies a 'bool' value
and uses an underlying Promise.

We're seeing cases in code where we might want to be notified of a
different kind of Promise (other than bool). We convert this utility
code to be templatized to be able to reuse the CountingBarrier for
these new use cases as well.

This was identified while working on IMPALA-7163.

Testing: Ran 'core' tests.

Change-Id: I05fc79228250408ae16481ae7ff3491a90d26b8e
---
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/rpc/rpc-mgr-test.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/util/counting-barrier.h
M be/src/util/hdfs-bulk-ops.cc
M be/src/util/hdfs-bulk-ops.h
10 files changed, 43 insertions(+), 34 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/10827/1
--
To view, visit http://gerrit.cloudera.org:8080/10827
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I05fc79228250408ae16481ae7ff3491a90d26b8e
Gerrit-Change-Number: 10827
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED if query execution has terminated

2018-06-26 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10815 )

Change subject: IMPALA-7205: Respond to ReportExecStatus() RPC with CANCELLED 
if query execution has terminated
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10815/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/10815/1/be/src/runtime/coordinator-backend-state.cc@336
PS1, Line 336: const TQueryCtx& query_ctx
This unfortunately makes the code look a little less readable. Since we require 
the TQueryCtx in multiple places in BackendState, how about storing the 
TQueryCtx at BackendState object construction time?

Or alternatively, store a pointer to Coordinator, and get it from 
Coodrdinator::query_ctx() ?



--
To view, visit http://gerrit.cloudera.org:8080/10815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bb2c26edace89853f14a329f891d1f9a065a991
Gerrit-Change-Number: 10815
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Tue, 26 Jun 2018 20:06:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-06-25 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10813/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10813/2//COMMIT_MSG@26
PS2, Line 26: We
: use a combination of these atomic variables and promises to allow 
the
: fragment instance threads to communicate their completion of their
: respective states or errors with the QueryState thread.
An alternative design is to have the fragment instance state threads themselves 
update the BackendExecState by calling QueryState::UpdateQueryState(). This 
would require more synchronization, and would also split the responsibility of 
sending reports between threads, i.e. good reports will be sent by the 
QueryState thread and the code path on an error (which includes sending a final 
error report) will execute on a FIS thread.

With how it's implemented in this patch, all FIS threads have no other 
responsibility other than doing their bit of the query and notifying the QS of 
successful completion or an error via AtomicInts and Promises, and the QS 
thread will be able to handle all the logic of what to do on an error, or 
during successful completion, or cancellation.

That's one of the main reasons why I went with the design as implemented in 
this patch. I'm open to suggestions however.



--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 25 Jun 2018 22:31:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

2018-06-25 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/10813 )

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
..

IMPALA-7163: Implement a state machine for the QueryState class

This patch adds a state machine for the QueryState class. The motivation
behind this patch is to make the query lifecycle from the point of
view of an executor much easier to reason about and this patch is key
for a follow on patch for IMPALA-2990 where the status reporting will
be per-query rather than per-fragment-instance. Currently, the state
machine provides no other purpose, and it will mostly be used for
IMPALA-2990.

We introduce 7 possible states for the QueryState which include 3
terminal states (FINISHED, CANCELLED and ERROR) and 4 non-terminal
states (INITIALIZED, STARTED, PREPARED, OPENED). The transition from
one state to the next is always handled by a single thread which is also
the QueryState thread. This thread will additionally bear the purpose
of sending periodic updates after IMPALA-2990, which is the primary
reason behind having only this thread modify the state of the query.

3 atomic integers are introduced which are used to indicate how many
fragment instances have finished Preparing, Opening and Executing. We
use a combination of these atomic variables and promises to allow the
fragment instance threads to communicate their completion of their
respective states or errors with the QueryState thread. Due to this
design, it's possible for the fragment instances to have collectively
moved on to future states while the query state thread lags behind in
realizing that. However, that's not a concern for us since we only care
about showing a unified view of what's happening on this executor to
the coordinator (post IMPALA-2990).

The status reporting protocol has not been changed and remains exactly
as it was.

Testing: Ran 'core' tests. TODO: Run stress tests.

Future related work:
1) IMPALA-2990: Make status reporting per-query.
2) Try to logically align the FIS states with the QueryState states.
3) Consider mirroring the QueryState state machine to CoordinatorBackendState

Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
---
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
5 files changed, 411 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/10813/2
--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-7207: make Coordinator::exec state an atomic enum

2018-06-25 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10811 )

Change subject: IMPALA-7207: make Coordinator::exec_state_ an atomic enum
..


Patch Set 2: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10811/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10811/2/be/src/runtime/coordinator.cc@a511
PS2, Line 511:
 :
 :
 :
> For x86, Load() and Store() only needs to ensure that the memory access emi
Thanks! I wasn't aware of that.



--
To view, visit http://gerrit.cloudera.org:8080/10811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6c5d5c6ccfdfd533cd0607aab6f554e664b90ac
Gerrit-Change-Number: 10811
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 25 Jun 2018 18:49:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7207: make Coordinator::exec state an atomic enum

2018-06-25 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10811 )

Change subject: IMPALA-7207: make Coordinator::exec_state_ an atomic enum
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10811/1/be/src/common/atomic.h
File be/src/common/atomic.h:

http://gerrit.cloudera.org:8080/#/c/10811/1/be/src/common/atomic.h@156
PS1, Line 156: static_cast
Since they're enums, shouldn't implicit casts work?


http://gerrit.cloudera.org:8080/#/c/10811/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10811/2/be/src/runtime/coordinator.cc@a511
PS2, Line 511:
 :
 :
 :
I've not done an analysis, but is it fair to say that using an atomic enum in 
this case is "strictly" better than using spin locks?

Atomic variables cause cache invalidations across CPUs, and lock the 
corresponding cache line in the corresponding CPU.

Load() and Store() operations on this atomic enum happen much more often than 
calls to ReturnedAllResults(), so I was wondering if we're actually improving 
anything by doing this or not.

Not a major issue, I'm just thinking out loud.



--
To view, visit http://gerrit.cloudera.org:8080/10811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6c5d5c6ccfdfd533cd0607aab6f554e664b90ac
Gerrit-Change-Number: 10811
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 25 Jun 2018 18:30:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7200: Fix missing FILESYSTEM PREFIX hitting local dataload

2018-06-22 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10803 )

Change subject: IMPALA-7200: Fix missing FILESYSTEM_PREFIX hitting local 
dataload
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10803
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I972192f22943baef6043a4c9db54d5d48089ea9d
Gerrit-Change-Number: 10803
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Fri, 22 Jun 2018 22:10:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP: IMPALA-7163

2018-06-21 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has abandoned this change. ( 
http://gerrit.cloudera.org:8080/10781 )

Change subject: WIP: IMPALA-7163
..


Abandoned

Whoops, meant to push it as a draft to gerrit. This is not ready for review. 
Sorry for the noise.
--
To view, visit http://gerrit.cloudera.org:8080/10781
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I27158dae217b0e22445377e5935eb71a906688b5
Gerrit-Change-Number: 10781
Gerrit-PatchSet: 7
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] WIP: IMPALA-7163

2018-06-21 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/10781 )

Change subject: WIP: IMPALA-7163
..

WIP: IMPALA-7163

Change-Id: I27158dae217b0e22445377e5935eb71a906688b5
---
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
4 files changed, 417 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/10781/7
--
To view, visit http://gerrit.cloudera.org:8080/10781
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I27158dae217b0e22445377e5935eb71a906688b5
Gerrit-Change-Number: 10781
Gerrit-PatchSet: 7
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-5202: Disallow PREPARE:WAIT debug action

2018-06-20 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10776 )

Change subject: IMPALA-5202: Disallow PREPARE:WAIT debug action
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1caa4f8e6ce7f32a8a3722648e08e24f34dba35d
Gerrit-Change-Number: 10776
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 20 Jun 2018 22:35:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7151: Rework ephemeral port assignment for be tests

2018-06-20 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10740 )

Change subject: IMPALA-7151: Rework ephemeral port assignment for be tests
..


Patch Set 4: Code-Review+2

(1 comment)

LGTM. Please also check with Anuj if he has any further comments.

http://gerrit.cloudera.org:8080/#/c/10740/4/be/src/statestore/statestore-subscriber.cc
File be/src/statestore/statestore-subscriber.cc:

http://gerrit.cloudera.org:8080/#/c/10740/4/be/src/statestore/statestore-subscriber.cc@223
PS4, Line 223: heartbeat_address_.port = heartbeat_server_->port();
> Am I misreading something - isn't this within the same scoped block as excl
Oh my bad. I completely misinterpreted the scope of the lock by looking at the 
wrong curly braces.



--
To view, visit http://gerrit.cloudera.org:8080/10740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2fcc9243099f8249380ac5f01ac5eb67cba24cf5
Gerrit-Change-Number: 10740
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 20 Jun 2018 17:43:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7151: Rework ephemeral port assignment for be tests

2018-06-20 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10740 )

Change subject: IMPALA-7151: Rework ephemeral port assignment for be tests
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10740/4/be/src/statestore/statestore-subscriber.cc
File be/src/statestore/statestore-subscriber.cc:

http://gerrit.cloudera.org:8080/#/c/10740/4/be/src/statestore/statestore-subscriber.cc@223
PS4, Line 223: heartbeat_address_.port = heartbeat_server_->port();
Are we sure that setting this outside the scope of 'lock_' won't cause any kind 
of race?

I don't exactly understand the comment on L193-195 since Register() happens 
only after dropping the lock_, so I'm not able to make the call.



--
To view, visit http://gerrit.cloudera.org:8080/10740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2fcc9243099f8249380ac5f01ac5eb67cba24cf5
Gerrit-Change-Number: 10740
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 20 Jun 2018 17:05:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7151: Rework ephemeral port assignment for be tests

2018-06-19 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10740 )

Change subject: IMPALA-7151: Rework ephemeral port assignment for be tests
..


Patch Set 2:

(7 comments)

Thanks for doing this. This also sets up some of the code nicely for 
IMPALA-6013 which I'll be getting to soon.

http://gerrit.cloudera.org:8080/#/c/10740/2/be/src/rpc/thrift-server.cc
File be/src/rpc/thrift-server.cc:

http://gerrit.cloudera.org:8080/#/c/10740/2/be/src/rpc/thrift-server.cc@366
PS2, Line 366: TServerSocket
Are we sure that this change from TServerTransport to TServerSocket doesn't 
handicap us from using TServerTransport APIs that we may find useful in the 
future?

Or were we doing the wrong thing by using TServerTransport all along?


http://gerrit.cloudera.org:8080/#/c/10740/2/be/src/rpc/thrift-server.cc@452
PS2, Line 452: If port was 0
nit: "If port_ was initially configured as 0..."


http://gerrit.cloudera.org:8080/#/c/10740/2/be/src/runtime/exec-env.h
File be/src/runtime/exec-env.h:

http://gerrit.cloudera.org:8080/#/c/10740/2/be/src/runtime/exec-env.h@222
PS2, Line 222:   TNetworkAddress configured_backend_address_;
Why make this public?


http://gerrit.cloudera.org:8080/#/c/10740/2/be/src/scheduling/scheduler.h
File be/src/scheduling/scheduler.h:

http://gerrit.cloudera.org:8080/#/c/10740/2/be/src/scheduling/scheduler.h@96
PS2, Line 96: UpdateLocalBackendForBeTest
UpdateLocalBackendAddrForBeTest


http://gerrit.cloudera.org:8080/#/c/10740/2/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/10740/2/be/src/service/impala-server.h@142
PS2, Line 142:   ///respective service will not be started.
Update comment with new behavior about BE tests passing in port=0.


http://gerrit.cloudera.org:8080/#/c/10740/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/10740/2/be/src/service/impala-server.cc@2070
PS2, Line 2070: >=
> 0


http://gerrit.cloudera.org:8080/#/c/10740/2/be/src/statestore/statestore-subscriber.h
File be/src/statestore/statestore-subscriber.h:

http://gerrit.cloudera.org:8080/#/c/10740/2/be/src/statestore/statestore-subscriber.h@190
PS2, Line 190:   /// Address that the heartbeat service should be started on. 
Initialised in constructor,
 :   /// updated in Start() with the actual port if the wildcard 
port 0 was specified.
 :   TNetworkAddress heartbeat_address_;
Any reason this was moved down? Prefer leaving it where it was as surrounding 
members are similar there.



--
To view, visit http://gerrit.cloudera.org:8080/10740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2fcc9243099f8249380ac5f01ac5eb67cba24cf5
Gerrit-Change-Number: 10740
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 19 Jun 2018 22:57:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7151: Rework ephemeral port assignment for be tests

2018-06-19 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10740 )

Change subject: IMPALA-7151: Rework ephemeral port assignment for be tests
..


Patch Set 2:

> Ping

I'll have a first pass at this by EOD.


--
To view, visit http://gerrit.cloudera.org:8080/10740
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2fcc9243099f8249380ac5f01ac5eb67cba24cf5
Gerrit-Change-Number: 10740
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 19 Jun 2018 20:03:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7182: [DOCS] Insecure clusters with public IPs not allowed

2018-06-18 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10751 )

Change subject: IMPALA-7182: [DOCS] Insecure clusters with public IPs not 
allowed
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10751
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9db28d42fccd9711635c6aee66f2aafc758d58d0
Gerrit-Change-Number: 10751
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 18 Jun 2018 22:14:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6119: Fix issue with multiple partitions sharing same location

2018-06-14 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10543 )

Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same 
location
..


Patch Set 11: Code-Review+1

(1 comment)

Sorry for the delay in getting back.

http://gerrit.cloudera.org:8080/#/c/10543/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10543/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1440
PS9, Line 1440: loadPartitionsFromMetastore(dirtyPartitions, client);
> This wouldn't work out unfortunately. When you modify the location of a par
Your reasoning makes sense to me. I'm fine with removing this call to 
getPartitionsWithSameLocation.



--
To view, visit http://gerrit.cloudera.org:8080/10543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a
Gerrit-Change-Number: 10543
Gerrit-PatchSet: 11
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 14 Jun 2018 18:04:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7157: Avoid unnecessarily pretty printing profiles per fragment instance

2018-06-12 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10669 )

Change subject: IMPALA-7157: Avoid unnecessarily pretty printing profiles per 
fragment instance
..


Patch Set 3: Code-Review+2

Rebase, carry +2.


--
To view, visit http://gerrit.cloudera.org:8080/10669
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic0445950385fa6160764feaed9a993fa0e59b242
Gerrit-Change-Number: 10669
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Tue, 12 Jun 2018 20:33:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7145: fix leak of OpenSSL context when spilling

2018-06-12 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10666 )

Change subject: IMPALA-7145: fix leak of OpenSSL context when spilling
..


Patch Set 7: Code-Review+1

(1 comment)

One last comment. LGTM otherwise. Feel free to upgrade to a +2 if no one else 
is looking at this.

http://gerrit.cloudera.org:8080/#/c/10666/7/be/src/util/openssl-util.cc
File be/src/util/openssl-util.cc:

http://gerrit.cloudera.org:8080/#/c/10666/7/be/src/util/openssl-util.cc@103
PS7, Line 103:
DISALLOW_COPY_AND_ASSIGN(ScopedEVPCipherCtx);



--
To view, visit http://gerrit.cloudera.org:8080/10666
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I98760ed8f31b18b489a156f945c29c95c9bf3184
Gerrit-Change-Number: 10666
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 12 Jun 2018 15:53:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) IMPALA-7005: [DOCS] Noted which version the 3.0 changelog generated from

2018-06-11 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10687 )

Change subject: IMPALA-7005: [DOCS] Noted which version the 3.0 changelog 
generated from
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10687
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: comment
Gerrit-Change-Id: I706e2aa6791c88e41a2717c30f64619e7082c3d4
Gerrit-Change-Number: 10687
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 11 Jun 2018 18:59:49 +
Gerrit-HasComments: No


[Impala-ASF-CR](asf-site) IMPALA-7005: [DOCS] Noted which version the 3.0 changelog generated from

2018-06-11 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10687 )

Change subject: IMPALA-7005: [DOCS] Noted which version the 3.0 changelog 
generated from
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10687/1/docs/changelog-3.0.html
File docs/changelog-3.0.html:

http://gerrit.cloudera.org:8080/#/c/10687/1/docs/changelog-3.0.html@8
PS1, Line 8:  The changes in this log are in comparison to Impala 2.11.
nit: Since this is a non-standard message, let's format it to italics.



--
To view, visit http://gerrit.cloudera.org:8080/10687
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: comment
Gerrit-Change-Id: I706e2aa6791c88e41a2717c30f64619e7082c3d4
Gerrit-Change-Number: 10687
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 11 Jun 2018 18:51:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7145: fix leak of OpenSSL context when spilling

2018-06-11 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10666 )

Change subject: IMPALA-7145: fix leak of OpenSSL context when spilling
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/10666/6/be/src/util/asan.h
File be/src/util/asan.h:

http://gerrit.cloudera.org:8080/#/c/10666/6/be/src/util/asan.h@a32
PS6, Line 32:
Could you add a comment to the commit message as to why you removed this?


http://gerrit.cloudera.org:8080/#/c/10666/6/be/src/util/openssl-util.cc
File be/src/util/openssl-util.cc:

http://gerrit.cloudera.org:8080/#/c/10666/6/be/src/util/openssl-util.cc@96
PS6, Line 96: EVPCipherCtx
We usually prepend the names of these kinds of classes/structs with 'scoped'.

So renaming to ScopedEVPCipherCtx would be better.


http://gerrit.cloudera.org:8080/#/c/10666/6/be/src/util/openssl-util.cc@97
PS6, Line 97:   EVPCipherCtx(int padding) {
Make 'explicit' to avoid implicit conversions.


http://gerrit.cloudera.org:8080/#/c/10666/6/be/src/util/openssl-util.cc@171
PS6, Line 171: ctx.ctx
nit: Maybe it would be cleaner to add a getter() to the newly added struct, to 
get the context?

Your call though.



--
To view, visit http://gerrit.cloudera.org:8080/10666
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I98760ed8f31b18b489a156f945c29c95c9bf3184
Gerrit-Change-Number: 10666
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 11 Jun 2018 18:37:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7157: Avoid unnecessarily pretty printing profiles per fragment instance

2018-06-11 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10669 )

Change subject: IMPALA-7157: Avoid unnecessarily pretty printing profiles per 
fragment instance
..


Patch Set 1: Code-Review+1

(2 comments)

Thanks for the reviews.

Carry +1.

http://gerrit.cloudera.org:8080/#/c/10669/1/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/10669/1/be/src/runtime/fragment-instance-state.cc@a371
PS1, Line 371:
 :
 :
> From git blame, it appears to be added in 368115cda. Probably was useful wh
It's actually from the following commit which is from 6 years ago:
https://github.com/apache/impala/commit/7725f25ff5219fb7440ac92d32802dd4ce3cb8f0

Commit 368115cda just moved it around. So, I feel that it's reasonable to 
remove this now.


http://gerrit.cloudera.org:8080/#/c/10669/1/be/src/runtime/fragment-instance-state.cc@368
PS1, Line 368: if (VLOG_FILE_IS_ON) {
> you should just remove that if-stmt now. the VLOG_FILE macro will have the
Done



--
To view, visit http://gerrit.cloudera.org:8080/10669
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic0445950385fa6160764feaed9a993fa0e59b242
Gerrit-Change-Number: 10669
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 11 Jun 2018 18:14:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7157: Avoid unnecessarily pretty printing profiles per fragment instance

2018-06-11 Thread Sailesh Mukil (Code Review)
Hello Michael Ho, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10669

to look at the new patch set (#2).

Change subject: IMPALA-7157: Avoid unnecessarily pretty printing profiles per 
fragment instance
..

IMPALA-7157: Avoid unnecessarily pretty printing profiles per fragment instance

In SendReport(), if VLOG_FILE_IS_ON is 'true' (which is not the most
verbose logging level, but is higher than default), we pretty print
the profile for every fragment instance, which is a very expensive
operation, as serializing the profile is non-trivial (look at
RuntimeProfile::PrettyPrint()), and printing large amounts of information
to the logs isn't cheap as well. Lastly, it is very noisy.

This seems unnecessary since this will not benefit us, as all the
profiles are merged at the coordinator side. We could argue that this
might be necessary when an executor fails to send the profile to the
coordinator, but that signifies a network issue which will not be
reflected in the profile of any fragment instance.

This will help reduce noise in the logs when the log level is
bumped up to find other real issues that VLOG_FILE can help with.

Change-Id: Ic0445950385fa6160764feaed9a993fa0e59b242
---
M be/src/runtime/fragment-instance-state.cc
1 file changed, 2 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/10669/2
--
To view, visit http://gerrit.cloudera.org:8080/10669
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic0445950385fa6160764feaed9a993fa0e59b242
Gerrit-Change-Number: 10669
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-7158: Fix HdfsScanNodeBase::progress 's init

2018-06-08 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10672 )

Change subject: IMPALA-7158: Fix HdfsScanNodeBase::progress_'s init
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/10672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id2a738edea80ff3fb13ff368b4093c8b4ef34df7
Gerrit-Change-Number: 10672
Gerrit-PatchSet: 1
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 09 Jun 2018 00:53:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7157: Avoid unnecessarily pretty printing profiles per fragment instance

2018-06-08 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10669 )

Change subject: IMPALA-7157: Avoid unnecessarily pretty printing profiles per 
fragment instance
..


Patch Set 1:

> Uploaded patch set 1.

I found this while working on another patch. If anyone feels that printing the 
profiles for every fragment instance periodically is useful, or has found it 
useful in the past, please speak up.


--
To view, visit http://gerrit.cloudera.org:8080/10669
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic0445950385fa6160764feaed9a993fa0e59b242
Gerrit-Change-Number: 10669
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Fri, 08 Jun 2018 23:43:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7157: Avoid unnecessarily pretty printing profiles per fragment instance

2018-06-08 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10669


Change subject: IMPALA-7157: Avoid unnecessarily pretty printing profiles per 
fragment instance
..

IMPALA-7157: Avoid unnecessarily pretty printing profiles per fragment instance

In SendReport(), if VLOG_FILE_IS_ON is 'true' (which is not the most
verbose logging level, but is higher than default), we pretty print
the profile for every fragment instance, which is a very expensive
operation, as serializing the profile is non-trivial (look at
RuntimeProfile::PrettyPrint()), and printing large amounts of information
to the logs isn't cheap as well. Lastly, it is very noisy.

This seems unnecessary since this will not benefit us, as all the
profiles are merged at the coordinator side. We could argue that this
might be necessary when an executor fails to send the profile to the
coordinator, but that signifies a network issue which will not be
reflected in the profile of any fragment instance.

This will help reduce noise in the logs when the log level is
bumped up to find other real issues that VLOG_FILE can help with.

Change-Id: Ic0445950385fa6160764feaed9a993fa0e59b242
---
M be/src/runtime/fragment-instance-state.cc
1 file changed, 0 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/10669/1
--
To view, visit http://gerrit.cloudera.org:8080/10669
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic0445950385fa6160764feaed9a993fa0e59b242
Gerrit-Change-Number: 10669
Gerrit-PatchSet: 1
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-6119: Fix issue with multiple partitions sharing same location

2018-06-07 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10543 )

Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same 
location
..


Patch Set 9: Code-Review+1

> (1 comment)
 >
 > Thanks for your comments, Sailesh! This change doesn't cover the
 > case when 2 partitions from different tables point to the same
 > location. I'm not sure that it's a valid real world scenario. I
 > mean I tried and it's feasible to do this but form me it doesn't
 > make sense to point one partition to a different table's directory
 > in the file system. Do you think we should make this documented? I
 > think that it also has to be documented that all the other
 > partitions are dropped on a same location when you drop one of
 > them.

I was under the impression that 2 tables with the same structure could have 
partitions pointing to the same location, but maybe that's an extreme edge case 
and we don't need to worry about it.

Yes, let's at least document the purpose of this patch. We can open a seperate 
docs JIRA for that.

Thanks for fixing all this so far. This LGTM! I'll let Bharath sign off.


--
To view, visit http://gerrit.cloudera.org:8080/10543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a
Gerrit-Change-Number: 10543
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 07 Jun 2018 15:49:45 +
Gerrit-HasComments: No


  1   2   3   4   5   >