[kudu-CR] [java client] ITClient's new snapshot scanners can easily timeout
Jean-Daniel Cryans has submitted this change and it was merged. Change subject: [java client] ITClient's new snapshot scanners can easily timeout .. [java client] ITClient's new snapshot scanners can easily timeout This patch makes it so that a scanner timeout won't kill the test. It also takes care of a typo in related javadoc. The failure in question: 19:56:29.459 [ERROR - Thread-10] (ITClient.java:134) Got error while getting row 260 org.apache.kudu.client.NonRecoverableException: could not wait for desired snapshot timestamp to be consistent: Timed out waiting for all transactions with ts < P: 1475006188645381 usec, L: 0 to commit at org.apache.kudu.client.TabletClient.dispatchTSErrorOrReturnException(TabletClient.java:548) at org.apache.kudu.client.TabletClient.decode(TabletClient.java:482) at org.apache.kudu.client.TabletClient.decode(TabletClient.java:83) ... On the server-side: W0927 19:56:29.455293 746 tablet_service.cc:1509] Error setting up scanner with request new_scan_request { tablet_id: "5584580c8ff3463e8092814f4b45ee99" limit: 9223372036854775807 projected_columns { name: "key" type: INT32 is_key: true is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION cfile_block_size: 0 } projected_columns { name: "column1_i" type: INT32 is_key: false is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION cfile_block_size: 0 } projected_columns { name: "column2_i" type: INT32 is_key: false is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION cfile_block_size: 0 } projected_columns { name: "column3_s" type: STRING is_key: false is_nullable: true encoding: DICT_ENCODING compression: LZ4 cfile_block_size: 4096 } projected_columns { name: "column4_b" type: BOOL is_key: false is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION cfile_block_size: 0 } read_mode: READ_AT! _SNAPSHOT snap_timestamp: 6041625348691480576 propagated_timestamp: 6041625348556406784 cache_blocks: true order_mode: UNORDERED column_predicates { column: "key" equality { value: "\004\001\000\000" } } } batch_size_bytes: 1048576 W0927 19:56:29.457337 746 rpcz_store.cc:234] Call kudu.tserver.TabletServerService.Scan from 127.0.0.1:44116 (request call id 3) took 492ms (client timeout 500). W0927 19:56:29.458102 746 rpcz_store.cc:238] Trace: 0927 19:56:28.964746 (+ 0us) service_pool.cc:143] Inserting onto call queue 0927 19:56:28.965109 (+ 363us) service_pool.cc:202] Handling call 0927 19:56:28.967353 (+ 2244us) tablet_service.cc:1466] Creating iterator 0927 19:56:28.967400 (+47us) tablet_service.cc:1782] Waiting for operations in snapshot to commit 0927 19:56:29.455140 (+487740us) tablet_service.cc:1485] Iterator created 0927 19:56:29.455202 (+62us) tablet_service.cc:1500] Iterator init: Timed out: could not wait for desired snapshot timestamp to be consistent: Timed out waiting for all transactions with ts < P: 1475006188645381 usec, L: 0 to commit Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e Reviewed-on: http://gerrit.cloudera.org:8080/4545 Reviewed-by: Adar DemboTested-by: Kudu Jenkins --- M java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java 1 file changed, 10 insertions(+), 1 deletion(-) Approvals: Adar Dembo: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/4545 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Jean-Daniel Cryans Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] [java client] ITClient's new snapshot scanners can easily timeout
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4545 to look at the new patch set (#2). Change subject: [java client] ITClient's new snapshot scanners can easily timeout .. [java client] ITClient's new snapshot scanners can easily timeout This patch makes it so that a scanner timeout won't kill the test. It also takes care of a typo in related javadoc. The failure in question: 19:56:29.459 [ERROR - Thread-10] (ITClient.java:134) Got error while getting row 260 org.apache.kudu.client.NonRecoverableException: could not wait for desired snapshot timestamp to be consistent: Timed out waiting for all transactions with ts < P: 1475006188645381 usec, L: 0 to commit at org.apache.kudu.client.TabletClient.dispatchTSErrorOrReturnException(TabletClient.java:548) at org.apache.kudu.client.TabletClient.decode(TabletClient.java:482) at org.apache.kudu.client.TabletClient.decode(TabletClient.java:83) ... On the server-side: W0927 19:56:29.455293 746 tablet_service.cc:1509] Error setting up scanner with request new_scan_request { tablet_id: "5584580c8ff3463e8092814f4b45ee99" limit: 9223372036854775807 projected_columns { name: "key" type: INT32 is_key: true is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION cfile_block_size: 0 } projected_columns { name: "column1_i" type: INT32 is_key: false is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION cfile_block_size: 0 } projected_columns { name: "column2_i" type: INT32 is_key: false is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION cfile_block_size: 0 } projected_columns { name: "column3_s" type: STRING is_key: false is_nullable: true encoding: DICT_ENCODING compression: LZ4 cfile_block_size: 4096 } projected_columns { name: "column4_b" type: BOOL is_key: false is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION cfile_block_size: 0 } read_mode: READ_AT! _SNAPSHOT snap_timestamp: 6041625348691480576 propagated_timestamp: 6041625348556406784 cache_blocks: true order_mode: UNORDERED column_predicates { column: "key" equality { value: "\004\001\000\000" } } } batch_size_bytes: 1048576 W0927 19:56:29.457337 746 rpcz_store.cc:234] Call kudu.tserver.TabletServerService.Scan from 127.0.0.1:44116 (request call id 3) took 492ms (client timeout 500). W0927 19:56:29.458102 746 rpcz_store.cc:238] Trace: 0927 19:56:28.964746 (+ 0us) service_pool.cc:143] Inserting onto call queue 0927 19:56:28.965109 (+ 363us) service_pool.cc:202] Handling call 0927 19:56:28.967353 (+ 2244us) tablet_service.cc:1466] Creating iterator 0927 19:56:28.967400 (+47us) tablet_service.cc:1782] Waiting for operations in snapshot to commit 0927 19:56:29.455140 (+487740us) tablet_service.cc:1485] Iterator created 0927 19:56:29.455202 (+62us) tablet_service.cc:1500] Iterator init: Timed out: could not wait for desired snapshot timestamp to be consistent: Timed out waiting for all transactions with ts < P: 1475006188645381 usec, L: 0 to commit Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e --- M java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java 1 file changed, 10 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/4545/2 -- To view, visit http://gerrit.cloudera.org:8080/4545 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Jean-Daniel CryansGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] [java client] ITClient's new snapshot scanners can easily timeout
Adar Dembo has posted comments on this change. Change subject: [java client] ITClient's new snapshot scanners can easily timeout .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4545 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Jean-Daniel CryansGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] [java client] ITClient's new snapshot scanners can easily timeout
Adar Dembo has posted comments on this change. Change subject: [java client] ITClient's new snapshot scanners can easily timeout .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4545/1/java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java File java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java: PS1, Line 416: // It's possible to get timeouts if we're unlucky. A particularly common one is : // "could not wait for desired snapshot timestamp to be consistent" since we're using : // READ_AT_SNAPSHOT scanners. : if (e.getStatus().isTimedOut()) { : LOG.warn("Received a scan timeout", e); : return true; : } > I think the problem here is that we cannot distinguish between a timeout ca But shouldn't we retry in either case? The C++ client differentiates between operation level timeouts and RPC timeouts, and at any point where an RPC times out, it'll look at the operation timeout to decide whether to retry. I'd expect the Java client to behave similarly. -- To view, visit http://gerrit.cloudera.org:8080/4545 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Jean-Daniel CryansGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] [java client] ITClient's new snapshot scanners can easily timeout
Jean-Daniel Cryans has posted comments on this change. Change subject: [java client] ITClient's new snapshot scanners can easily timeout .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4545/1/java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java File java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java: PS1, Line 416: // It's possible to get timeouts if we're unlucky. A particularly common one is : // "could not wait for desired snapshot timestamp to be consistent" since we're using : // READ_AT_SNAPSHOT scanners. : if (e.getStatus().isTimedOut()) { : LOG.warn("Received a scan timeout", e); : return true; : } > I see. That comment assumes that the client will retry, though. Why isn't i I think the problem here is that we cannot distinguish between a timeout caused by "could not wait for desired snapshot timestamp" and an RPC-level timeout. We could introspect the message but it seems flaky. IMO this is one of those "gray zones" in our APIs. -- To view, visit http://gerrit.cloudera.org:8080/4545 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Jean-Daniel CryansGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] [java client] ITClient's new snapshot scanners can easily timeout
Adar Dembo has posted comments on this change. Change subject: [java client] ITClient's new snapshot scanners can easily timeout .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4545/1/java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java File java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java: PS1, Line 416: // It's possible to get timeouts if we're unlucky. A particularly common one is : // "could not wait for desired snapshot timestamp to be consistent" since we're using : // READ_AT_SNAPSHOT scanners. : if (e.getStatus().isTimedOut()) { : LOG.warn("Received a scan timeout", e); : return true; : } > This made me think that timing out is the expected behavior: I see. That comment assumes that the client will retry, though. Why isn't it retrying? When the client sees the server-side timeout, shouldn't it compare the overall scan deadline with the current time and decide to retry? If that worked correctly, you could bump the scan timeout and everything would behave as expected: the server would time out a scan request early (thus freeing up an RPC handler thread) and the client would retry, until the scan succeeds or the real timeout is actually reached. -- To view, visit http://gerrit.cloudera.org:8080/4545 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Jean-Daniel CryansGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] [java client] ITClient's new snapshot scanners can easily timeout
Jean-Daniel Cryans has posted comments on this change. Change subject: [java client] ITClient's new snapshot scanners can easily timeout .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4545/1/java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java File java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java: PS1, Line 416: // It's possible to get timeouts if we're unlucky. A particularly common one is : // "could not wait for desired snapshot timestamp to be consistent" since we're using : // READ_AT_SNAPSHOT scanners. : if (e.getStatus().isTimedOut()) { : LOG.warn("Received a scan timeout", e); : return true; : } > Why not just extend the scan timeout instead? This made me think that timing out is the expected behavior: https://github.com/cloudera/kudu/blob/master/src/kudu/tserver/tablet_service.cc#L1764 -- To view, visit http://gerrit.cloudera.org:8080/4545 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Jean-Daniel CryansGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] [java client] ITClient's new snapshot scanners can easily timeout
Adar Dembo has posted comments on this change. Change subject: [java client] ITClient's new snapshot scanners can easily timeout .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4545/1/java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java File java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java: PS1, Line 416: // It's possible to get timeouts if we're unlucky. A particularly common one is : // "could not wait for desired snapshot timestamp to be consistent" since we're using : // READ_AT_SNAPSHOT scanners. : if (e.getStatus().isTimedOut()) { : LOG.warn("Received a scan timeout", e); : return true; : } Why not just extend the scan timeout instead? -- To view, visit http://gerrit.cloudera.org:8080/4545 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Jean-Daniel CryansGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] [java client] ITClient's new snapshot scanners can easily timeout
Jean-Daniel Cryans has uploaded a new change for review. http://gerrit.cloudera.org:8080/4545 Change subject: [java client] ITClient's new snapshot scanners can easily timeout .. [java client] ITClient's new snapshot scanners can easily timeout This patch makes it so that a scanner timeout won't kill the test. It also takes care of a typo in related javadoc. The failure in question: 19:56:29.459 [ERROR - Thread-10] (ITClient.java:134) Got error while getting row 260 org.apache.kudu.client.NonRecoverableException: could not wait for desired snapshot timestamp to be consistent: Timed out waiting for all transactions with ts < P: 1475006188645381 usec, L: 0 to commit at org.apache.kudu.client.TabletClient.dispatchTSErrorOrReturnException(TabletClient.java:548) at org.apache.kudu.client.TabletClient.decode(TabletClient.java:482) at org.apache.kudu.client.TabletClient.decode(TabletClient.java:83) ... On the server-side: W0927 19:56:29.455293 746 tablet_service.cc:1509] Error setting up scanner with request new_scan_request { tablet_id: "5584580c8ff3463e8092814f4b45ee99" limit: 9223372036854775807 projected_columns { name: "key" type: INT32 is_key: true is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION cfile_block_size: 0 } projected_columns { name: "column1_i" type: INT32 is_key: false is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION cfile_block_size: 0 } projected_columns { name: "column2_i" type: INT32 is_key: false is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION cfile_block_size: 0 } projected_columns { name: "column3_s" type: STRING is_key: false is_nullable: true encoding: DICT_ENCODING compression: LZ4 cfile_block_size: 4096 } projected_columns { name: "column4_b" type: BOOL is_key: false is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION cfile_block_size: 0 } read_mode: READ_AT! _SNAPSHOT snap_timestamp: 6041625348691480576 propagated_timestamp: 6041625348556406784 cache_blocks: true order_mode: UNORDERED column_predicates { column: "key" equality { value: "\004\001\000\000" } } } batch_size_bytes: 1048576 W0927 19:56:29.457337 746 rpcz_store.cc:234] Call kudu.tserver.TabletServerService.Scan from 127.0.0.1:44116 (request call id 3) took 492ms (client timeout 500). W0927 19:56:29.458102 746 rpcz_store.cc:238] Trace: 0927 19:56:28.964746 (+ 0us) service_pool.cc:143] Inserting onto call queue 0927 19:56:28.965109 (+ 363us) service_pool.cc:202] Handling call 0927 19:56:28.967353 (+ 2244us) tablet_service.cc:1466] Creating iterator 0927 19:56:28.967400 (+47us) tablet_service.cc:1782] Waiting for operations in snapshot to commit 0927 19:56:29.455140 (+487740us) tablet_service.cc:1485] Iterator created 0927 19:56:29.455202 (+62us) tablet_service.cc:1500] Iterator init: Timed out: could not wait for desired snapshot timestamp to be consistent: Timed out waiting for all transactions with ts < P: 1475006188645381 usec, L: 0 to commit Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e --- M java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java 1 file changed, 9 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/4545/1 -- To view, visit http://gerrit.cloudera.org:8080/4545 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Jean-Daniel Cryans