[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14246 )

Change subject: IMPALA-8634: Catalog client should retry RPCs
..


Patch Set 6: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 6
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 19 Sep 2019 23:07:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/14246 )

Change subject: IMPALA-8634: Catalog client should retry RPCs
..

IMPALA-8634: Catalog client should retry RPCs

Add retries to catalogd RPCs. Previously, connection failures triggered
a retry, but failures on the actual RPC did not trigger a retry. This
change replaces all usages of ClientCache::DoRpc() in the
CatalogOpExecutor with ClientCache::DoRpcWithRetry(). This change moves
the connection retry loop to DoRpcWithRetry(), instead of relying on the
ClientCache to retry the connection.

This patch is based to IMPALA-8904, which adds similar functionality to
statestore RPCs.

Testing:
* Renamed test_statestore_rpc_errors.py to test_services_rpc_errors.py
and added new tests for catalogd RPC errors
* Added new tests to test_restart_services.py
* Ran core tests

Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Reviewed-on: http://gerrit.cloudera.org:8080/14246
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exec/catalog-op-executor.cc
M be/src/runtime/exec-env.cc
M tests/custom_cluster/test_restart_services.py
A tests/custom_cluster/test_services_rpc_errors.py
D tests/custom_cluster/test_statestore_rpc_errors.py
5 files changed, 232 insertions(+), 85 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 7
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14246 )

Change subject: IMPALA-8634: Catalog client should retry RPCs
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4968/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 6
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 19 Sep 2019 18:54:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14246 )

Change subject: IMPALA-8634: Catalog client should retry RPCs
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 6
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 19 Sep 2019 18:54:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14246 )

Change subject: IMPALA-8634: Catalog client should retry RPCs
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4594/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 5
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Sep 2019 23:32:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-18 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14246 )

Change subject: IMPALA-8634: Catalog client should retry RPCs
..


Patch Set 5: Code-Review+2

Carrying +2


--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 5
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Sep 2019 22:50:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-18 Thread Sahil Takiar (Code Review)
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14246

to look at the new patch set (#5).

Change subject: IMPALA-8634: Catalog client should retry RPCs
..

IMPALA-8634: Catalog client should retry RPCs

Add retries to catalogd RPCs. Previously, connection failures triggered
a retry, but failures on the actual RPC did not trigger a retry. This
change replaces all usages of ClientCache::DoRpc() in the
CatalogOpExecutor with ClientCache::DoRpcWithRetry(). This change moves
the connection retry loop to DoRpcWithRetry(), instead of relying on the
ClientCache to retry the connection.

This patch is based to IMPALA-8904, which adds similar functionality to
statestore RPCs.

Testing:
* Renamed test_statestore_rpc_errors.py to test_services_rpc_errors.py
and added new tests for catalogd RPC errors
* Added new tests to test_restart_services.py
* Ran core tests

Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
---
M be/src/exec/catalog-op-executor.cc
M be/src/runtime/exec-env.cc
M tests/custom_cluster/test_restart_services.py
A tests/custom_cluster/test_services_rpc_errors.py
D tests/custom_cluster/test_statestore_rpc_errors.py
5 files changed, 232 insertions(+), 85 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/14246/5
--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 5
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-18 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14246 )

Change subject: IMPALA-8634: Catalog client should retry RPCs
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14246/4/tests/custom_cluster/test_services_rpc_errors.py
File tests/custom_cluster/test_services_rpc_errors.py:

http://gerrit.cloudera.org:8080/#/c/14246/4/tests/custom_cluster/test_services_rpc_errors.py@80
PS4, Line 80: Validte
> typo
Done



--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Sep 2019 22:50:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-18 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14246 )

Change subject: IMPALA-8634: Catalog client should retry RPCs
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14246/4/tests/custom_cluster/test_services_rpc_errors.py
File tests/custom_cluster/test_services_rpc_errors.py:

http://gerrit.cloudera.org:8080/#/c/14246/4/tests/custom_cluster/test_services_rpc_errors.py@80
PS4, Line 80: Validte
typo



--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Sep 2019 22:35:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-18 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14246 )

Change subject: IMPALA-8634: Catalog client should retry RPCs
..


Patch Set 4: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14246/1/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/14246/1/be/src/runtime/exec-env.cc@144
PS1, Line 144: 0
> Sure, this patch or a separate one? Not sure what an appropriate default fo
A separate one makes sense. Please also see 
https://issues.apache.org/jira/browse/KUDU-2192



--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Sep 2019 22:33:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14246 )

Change subject: IMPALA-8634: Catalog client should retry RPCs
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4588/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Sep 2019 20:32:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-18 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14246 )

Change subject: IMPALA-8634: Catalog client should retry RPCs
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14246/1/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/14246/1/be/src/runtime/exec-env.cc@142
PS1, Line 142: DEFINE_int32(catalog_client_connection_num_retries, 3, "The 
number of times connections "
> Seems reasonable to me.
Done, interestingly 10 retries and a 3 second interval is what the statestore 
client is already using.


http://gerrit.cloudera.org:8080/#/c/14246/1/be/src/runtime/exec-env.cc@144
PS1, Line 144: 0
> We may want to consider setting this timeout to non-zero at some point. Whe
Sure, this patch or a separate one? Not sure what an appropriate default for 
this would be.


http://gerrit.cloudera.org:8080/#/c/14246/1/be/src/runtime/exec-env.cc@180
PS1, Line 180: 1, 0
> May help to name these constants as variables.
I added some code comments which I think makes it clearer than introducing 
variables like `catalog_client_connection_num_retries` and 
`catalog_client_rpc_retry_interval_ms`.

It still tries to recreate new connections, but the retry is driven by the 
DoRpcWithRetry() loop rather than the loop in ThriftClientImpl::OpenWithRetry.

The loop in DoRpcWithRetry() first tries to create a new ClientConnection, that 
will trigger a call to ClientCacheHelper::GetClient, which will either try to 
create a new connection or used a cached one. If it tries to get a new 
connection and the catalogd is down, the connection will fail, but the next 
iteration of the loop in DoRpcWithRetry() will retry to establish the 
connection. If GetClient finds a cached connection, the RPC will fail, and 
again the loop will trigger a retry.



--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Sep 2019 19:53:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-18 Thread Sahil Takiar (Code Review)
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14246

to look at the new patch set (#4).

Change subject: IMPALA-8634: Catalog client should retry RPCs
..

IMPALA-8634: Catalog client should retry RPCs

Add retries to catalogd RPCs. Previously, connection failures triggered
a retry, but failures on the actual RPC did not trigger a retry. This
change replaces all usages of ClientCache::DoRpc() in the
CatalogOpExecutor with ClientCache::DoRpcWithRetry(). This change moves
the connection retry loop to DoRpcWithRetry(), instead of relying on the
ClientCache to retry the connection.

This patch is based to IMPALA-8904, which adds similar functionality to
statestore RPCs.

Testing:
* Renamed test_statestore_rpc_errors.py to test_services_rpc_errors.py
and added new tests for catalogd RPC errors
* Added new tests to test_restart_services.py
* Ran core tests

Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
---
M be/src/exec/catalog-op-executor.cc
M be/src/runtime/exec-env.cc
M tests/custom_cluster/test_restart_services.py
A tests/custom_cluster/test_services_rpc_errors.py
D tests/custom_cluster/test_statestore_rpc_errors.py
5 files changed, 232 insertions(+), 85 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/14246/4
--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-18 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14246 )

Change subject: IMPALA-8634: Catalog client should retry RPCs
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14246/1/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/14246/1/be/src/runtime/exec-env.cc@142
PS1, Line 142: DEFINE_int32(catalog_client_connection_num_retries, 3, "The 
number of times connections "
> Yeah, 10 seconds does seem a little long. How about 10 retries, with an int
Seems reasonable to me.


http://gerrit.cloudera.org:8080/#/c/14246/1/be/src/runtime/exec-env.cc@144
PS1, Line 144: 0
We may want to consider setting this timeout to non-zero at some point. When we 
convert the Catalog clients to use KRPC, we may be able to rely on using 
TCP_USER_TIMEOUT which seems more appropriate than socket timeout.


http://gerrit.cloudera.org:8080/#/c/14246/1/be/src/runtime/exec-env.cc@180
PS1, Line 180: 1, 0
May help to name these constants as variables.

IIUC, this change relies on the sleep and retry  in the retry loop of 
DoRpcWithRetry() so simply don't try recreating new connections on failure now.



--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Sep 2019 06:52:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14246 )

Change subject: IMPALA-8634: Catalog client should retry RPCs
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4582/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Sep 2019 02:20:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14246 )

Change subject: IMPALA-8634: Catalog client should retry RPCs
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4581/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Sep 2019 02:20:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-17 Thread Sahil Takiar (Code Review)
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14246

to look at the new patch set (#3).

Change subject: IMPALA-8634: Catalog client should retry RPCs
..

IMPALA-8634: Catalog client should retry RPCs

Add retries to catalogd RPCs. Previously, connection failures triggered
a retry, but failures on the actual RPC did not trigger a retry. This
change replaces all usages of ClientCache::DoRpc() in the
CatalogOpExecutor with ClientCache::DoRpcWithRetry(). This change moves
the connection retry loop to DoRpcWithRetry(), instead of relying on the
ClientCache to retry the connection.

This patch is based to IMPALA-8904, which adds similar functionality to
statestore RPCs.

Testing:
* Renamed test_statestore_rpc_errors.py to test_services_rpc_errors.py
and added new tests for catalogd RPC errors
* Added new tests to test_restart_services.py
* Ran core tests

Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
---
M be/src/exec/catalog-op-executor.cc
M be/src/runtime/exec-env.cc
M tests/custom_cluster/test_restart_services.py
A tests/custom_cluster/test_services_rpc_errors.py
D tests/custom_cluster/test_statestore_rpc_errors.py
5 files changed, 227 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/14246/3
--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-17 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14246 )

Change subject: IMPALA-8634: Catalog client should retry RPCs
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14246/1/be/src/exec/catalog-op-executor.cc
File be/src/exec/catalog-op-executor.cc:

http://gerrit.cloudera.org:8080/#/c/14246/1/be/src/exec/catalog-op-executor.cc@62
PS1, Line 62: static Status CatalogRpcDebugFn(int* attempt) {
> nit: pass as a pointer
Done


http://gerrit.cloudera.org:8080/#/c/14246/2/be/src/exec/catalog-op-executor.cc
File be/src/exec/catalog-op-executor.cc:

http://gerrit.cloudera.org:8080/#/c/14246/2/be/src/exec/catalog-op-executor.cc@63
PS2, Line 63:   return (*attempt)++ == 0 ? DebugAction(FLAGS_debug_actions, 
"CATALOG_RPC_FIRST_ATTEMPT") :
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/14246/1/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/14246/1/be/src/runtime/exec-env.cc@142
PS1, Line 142: DEFINE_int32(catalog_client_connection_num_retries, 3, "The 
number of times connections "
> I kinda wonder if we should tweak the defaults to make it poll more frequen
Yeah, 10 seconds does seem a little long. How about 10 retries, with an 
interval of 3 seconds? This way it still waits 30 seconds, but it just retries 
more often.



--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Sep 2019 01:41:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14246 )

Change subject: IMPALA-8634: Catalog client should retry RPCs
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14246/2/be/src/exec/catalog-op-executor.cc
File be/src/exec/catalog-op-executor.cc:

http://gerrit.cloudera.org:8080/#/c/14246/2/be/src/exec/catalog-op-executor.cc@63
PS2, Line 63:   return (*attempt)++ == 0 ? DebugAction(FLAGS_debug_actions, 
"CATALOG_RPC_FIRST_ATTEMPT") :
line too long (92 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Sep 2019 01:39:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-17 Thread Sahil Takiar (Code Review)
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14246

to look at the new patch set (#2).

Change subject: IMPALA-8634: Catalog client should retry RPCs
..

IMPALA-8634: Catalog client should retry RPCs

Add retries to catalogd RPCs. Previously, connection failures triggered
a retry, but failures on the actual RPC did not trigger a retry. This
change replaces all usages of ClientCache::DoRpc() in the
CatalogOpExecutor with ClientCache::DoRpcWithRetry(). This change moves
the connection retry loop to DoRpcWithRetry(), instead of relying on the
ClientCache to retry the connection.

This patch is based to IMPALA-8904, which adds similar functionality to
statestore RPCs.

Testing:
* Renamed test_statestore_rpc_errors.py to test_services_rpc_errors.py
and added new tests for catalogd RPC errors
* Added new tests to test_restart_services.py
* Ran core tests

Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
---
M be/src/exec/catalog-op-executor.cc
M be/src/runtime/exec-env.cc
M tests/custom_cluster/test_restart_services.py
A tests/custom_cluster/test_services_rpc_errors.py
D tests/custom_cluster/test_statestore_rpc_errors.py
5 files changed, 226 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/14246/2
--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14246 )

Change subject: IMPALA-8634: Catalog client should retry RPCs
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 17 Sep 2019 20:47:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14246 )

Change subject: IMPALA-8634: Catalog client should retry RPCs
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14246/1/be/src/exec/catalog-op-executor.cc
File be/src/exec/catalog-op-executor.cc:

http://gerrit.cloudera.org:8080/#/c/14246/1/be/src/exec/catalog-op-executor.cc@62
PS1, Line 62: static Status CatalogRpcDebugFn(int& attempt) {
nit: pass as a pointer


http://gerrit.cloudera.org:8080/#/c/14246/1/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/14246/1/be/src/runtime/exec-env.cc@142
PS1, Line 142: DEFINE_int32(catalog_client_connection_num_retries, 3, "The 
number of times connections "
I kinda wonder if we should tweak the defaults to make it poll more frequently 
and/or longer. Every 10 seconds seems kinda long if the outage might be 
intermittent flakiness.

30 seconds also might not be long enough for the catalogd to recover (although 
I can see we might just want to fail queries if they're delayed 30+ seconds).



--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 17 Sep 2019 20:43:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14246 )

Change subject: IMPALA-8634: Catalog client should retry RPCs
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4576/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 17 Sep 2019 18:04:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8634: Catalog client should retry RPCs

2019-09-17 Thread Sahil Takiar (Code Review)
Sahil Takiar has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/14246


Change subject: IMPALA-8634: Catalog client should retry RPCs
..

IMPALA-8634: Catalog client should retry RPCs

Add retries to catalogd RPCs. Previously, connection failures triggered
a retry, but failures on the actual RPC did not trigger a retry. This
change replaces all usages of ClientCache::DoRpc() in the
CatalogOpExecutor with ClientCache::DoRpcWithRetry(). This change moves
the connection retry loop to DoRpcWithRetry(), instead of relying on the
ClientCache to retry the connection.

This patch is based to IMPALA-8904, which adds similar functionality to
statestore RPCs.

Testing:
* Renamed test_statestore_rpc_errors.py to test_services_rpc_errors.py
and added new tests for catalogd RPC errors
* Added new tests to test_restart_services.py
* Ran core tests

Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
---
M be/src/exec/catalog-op-executor.cc
M be/src/runtime/exec-env.cc
M tests/custom_cluster/test_restart_services.py
A tests/custom_cluster/test_services_rpc_errors.py
D tests/custom_cluster/test_statestore_rpc_errors.py
5 files changed, 226 insertions(+), 84 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/14246/1
--
To view, visit http://gerrit.cloudera.org:8080/14246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7f33ad2b36d301fb64e70a939e71decab0ca993c
Gerrit-Change-Number: 14246
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar