[kudu-CR] [java client] ITClient's new snapshot scanners can easily timeout

2016-09-27 Thread Jean-Daniel Cryans (Code Review)
Jean-Daniel Cryans has submitted this change and it was merged.

Change subject: [java client] ITClient's new snapshot scanners can easily 
timeout
..


[java client] ITClient's new snapshot scanners can easily timeout

This patch makes it so that a scanner timeout won't kill the test. It also
takes care of a typo in related javadoc.

The failure in question:
19:56:29.459 [ERROR - Thread-10] (ITClient.java:134) Got error while getting 
row 260
org.apache.kudu.client.NonRecoverableException: could not wait for desired 
snapshot timestamp to be consistent: Timed out waiting for all transactions 
with ts < P: 1475006188645381 usec, L: 0 to commit
at 
org.apache.kudu.client.TabletClient.dispatchTSErrorOrReturnException(TabletClient.java:548)
at org.apache.kudu.client.TabletClient.decode(TabletClient.java:482)
at org.apache.kudu.client.TabletClient.decode(TabletClient.java:83)
...

On the server-side:
W0927 19:56:29.455293   746 tablet_service.cc:1509] Error setting up scanner 
with request new_scan_request { tablet_id: "5584580c8ff3463e8092814f4b45ee99" 
limit: 9223372036854775807 projected_columns { name: "key" type: INT32 is_key: 
true is_nullable: false encoding: AUTO_ENCODING compression: 
DEFAULT_COMPRESSION cfile_block_size: 0 } projected_columns { name: "column1_i" 
type: INT32 is_key: false is_nullable: false encoding: AUTO_ENCODING 
compression: DEFAULT_COMPRESSION cfile_block_size: 0 } projected_columns { 
name: "column2_i" type: INT32 is_key: false is_nullable: false encoding: 
AUTO_ENCODING compression: DEFAULT_COMPRESSION cfile_block_size: 0 } 
projected_columns { name: "column3_s" type: STRING is_key: false is_nullable: 
true encoding: DICT_ENCODING compression: LZ4 cfile_block_size: 4096 } 
projected_columns { name: "column4_b" type: BOOL is_key: false is_nullable: 
false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION 
cfile_block_size: 0 } read_mode: READ_AT!
 _SNAPSHOT snap_timestamp: 6041625348691480576 propagated_timestamp: 
6041625348556406784 cache_blocks: true order_mode: UNORDERED column_predicates 
{ column: "key" equality { value: "\004\001\000\000" } } } batch_size_bytes: 
1048576
W0927 19:56:29.457337   746 rpcz_store.cc:234] Call 
kudu.tserver.TabletServerService.Scan from 127.0.0.1:44116 (request call id 3) 
took 492ms (client timeout 500).
W0927 19:56:29.458102   746 rpcz_store.cc:238] Trace:
 0927 19:56:28.964746 (+ 0us) service_pool.cc:143] Inserting onto call queue
 0927 19:56:28.965109 (+   363us) service_pool.cc:202] Handling call
 0927 19:56:28.967353 (+  2244us) tablet_service.cc:1466] Creating iterator
 0927 19:56:28.967400 (+47us) tablet_service.cc:1782] Waiting for 
operations in snapshot to commit
 0927 19:56:29.455140 (+487740us) tablet_service.cc:1485] Iterator created
 0927 19:56:29.455202 (+62us) tablet_service.cc:1500] Iterator init: Timed 
out: could not wait for desired snapshot timestamp to be consistent: Timed out 
waiting for all transactions with ts < P: 1475006188645381 usec, L: 0 to commit

Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e
Reviewed-on: http://gerrit.cloudera.org:8080/4545
Reviewed-by: Adar Dembo 
Tested-by: Kudu Jenkins
---
M java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java
1 file changed, 10 insertions(+), 1 deletion(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jean-Daniel Cryans 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] [java client] ITClient's new snapshot scanners can easily timeout

2016-09-27 Thread Jean-Daniel Cryans (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4545

to look at the new patch set (#2).

Change subject: [java client] ITClient's new snapshot scanners can easily 
timeout
..

[java client] ITClient's new snapshot scanners can easily timeout

This patch makes it so that a scanner timeout won't kill the test. It also
takes care of a typo in related javadoc.

The failure in question:
19:56:29.459 [ERROR - Thread-10] (ITClient.java:134) Got error while getting 
row 260
org.apache.kudu.client.NonRecoverableException: could not wait for desired 
snapshot timestamp to be consistent: Timed out waiting for all transactions 
with ts < P: 1475006188645381 usec, L: 0 to commit
at 
org.apache.kudu.client.TabletClient.dispatchTSErrorOrReturnException(TabletClient.java:548)
at org.apache.kudu.client.TabletClient.decode(TabletClient.java:482)
at org.apache.kudu.client.TabletClient.decode(TabletClient.java:83)
...

On the server-side:
W0927 19:56:29.455293   746 tablet_service.cc:1509] Error setting up scanner 
with request new_scan_request { tablet_id: "5584580c8ff3463e8092814f4b45ee99" 
limit: 9223372036854775807 projected_columns { name: "key" type: INT32 is_key: 
true is_nullable: false encoding: AUTO_ENCODING compression: 
DEFAULT_COMPRESSION cfile_block_size: 0 } projected_columns { name: "column1_i" 
type: INT32 is_key: false is_nullable: false encoding: AUTO_ENCODING 
compression: DEFAULT_COMPRESSION cfile_block_size: 0 } projected_columns { 
name: "column2_i" type: INT32 is_key: false is_nullable: false encoding: 
AUTO_ENCODING compression: DEFAULT_COMPRESSION cfile_block_size: 0 } 
projected_columns { name: "column3_s" type: STRING is_key: false is_nullable: 
true encoding: DICT_ENCODING compression: LZ4 cfile_block_size: 4096 } 
projected_columns { name: "column4_b" type: BOOL is_key: false is_nullable: 
false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION 
cfile_block_size: 0 } read_mode: READ_AT!
 _SNAPSHOT snap_timestamp: 6041625348691480576 propagated_timestamp: 
6041625348556406784 cache_blocks: true order_mode: UNORDERED column_predicates 
{ column: "key" equality { value: "\004\001\000\000" } } } batch_size_bytes: 
1048576
W0927 19:56:29.457337   746 rpcz_store.cc:234] Call 
kudu.tserver.TabletServerService.Scan from 127.0.0.1:44116 (request call id 3) 
took 492ms (client timeout 500).
W0927 19:56:29.458102   746 rpcz_store.cc:238] Trace:
 0927 19:56:28.964746 (+ 0us) service_pool.cc:143] Inserting onto call queue
 0927 19:56:28.965109 (+   363us) service_pool.cc:202] Handling call
 0927 19:56:28.967353 (+  2244us) tablet_service.cc:1466] Creating iterator
 0927 19:56:28.967400 (+47us) tablet_service.cc:1782] Waiting for 
operations in snapshot to commit
 0927 19:56:29.455140 (+487740us) tablet_service.cc:1485] Iterator created
 0927 19:56:29.455202 (+62us) tablet_service.cc:1500] Iterator init: Timed 
out: could not wait for desired snapshot timestamp to be consistent: Timed out 
waiting for all transactions with ts < P: 1475006188645381 usec, L: 0 to commit

Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e
---
M java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java
1 file changed, 10 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/4545/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jean-Daniel Cryans 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] [java client] ITClient's new snapshot scanners can easily timeout

2016-09-27 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: [java client] ITClient's new snapshot scanners can easily 
timeout
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jean-Daniel Cryans 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No


[kudu-CR] [java client] ITClient's new snapshot scanners can easily timeout

2016-09-27 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: [java client] ITClient's new snapshot scanners can easily 
timeout
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4545/1/java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java
File java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java:

PS1, Line 416:   // It's possible to get timeouts if we're unlucky. A 
particularly common one is
 :   // "could not wait for desired snapshot timestamp to be 
consistent" since we're using
 :   // READ_AT_SNAPSHOT scanners.
 :   if (e.getStatus().isTimedOut()) {
 : LOG.warn("Received a scan timeout", e);
 : return true;
 :   }
> I think the problem here is that we cannot distinguish between a timeout ca
But shouldn't we retry in either case? The C++ client differentiates between 
operation level timeouts and RPC timeouts, and at any point where an RPC times 
out, it'll look at the operation timeout to decide whether to retry. I'd expect 
the Java client to behave similarly.


-- 
To view, visit http://gerrit.cloudera.org:8080/4545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jean-Daniel Cryans 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes


[kudu-CR] [java client] ITClient's new snapshot scanners can easily timeout

2016-09-27 Thread Jean-Daniel Cryans (Code Review)
Jean-Daniel Cryans has posted comments on this change.

Change subject: [java client] ITClient's new snapshot scanners can easily 
timeout
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4545/1/java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java
File java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java:

PS1, Line 416:   // It's possible to get timeouts if we're unlucky. A 
particularly common one is
 :   // "could not wait for desired snapshot timestamp to be 
consistent" since we're using
 :   // READ_AT_SNAPSHOT scanners.
 :   if (e.getStatus().isTimedOut()) {
 : LOG.warn("Received a scan timeout", e);
 : return true;
 :   }
> I see. That comment assumes that the client will retry, though. Why isn't i
I think the problem here is that we cannot distinguish between a timeout caused 
by "could not wait for desired snapshot timestamp" and an RPC-level timeout. We 
could introspect the message but it seems flaky. IMO this is one of those "gray 
zones" in our APIs.


-- 
To view, visit http://gerrit.cloudera.org:8080/4545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jean-Daniel Cryans 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes


[kudu-CR] [java client] ITClient's new snapshot scanners can easily timeout

2016-09-27 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: [java client] ITClient's new snapshot scanners can easily 
timeout
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4545/1/java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java
File java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java:

PS1, Line 416:   // It's possible to get timeouts if we're unlucky. A 
particularly common one is
 :   // "could not wait for desired snapshot timestamp to be 
consistent" since we're using
 :   // READ_AT_SNAPSHOT scanners.
 :   if (e.getStatus().isTimedOut()) {
 : LOG.warn("Received a scan timeout", e);
 : return true;
 :   }
> This made me think that timing out is the expected behavior:
I see. That comment assumes that the client will retry, though. Why isn't it 
retrying? When the client sees the server-side timeout, shouldn't it compare 
the overall scan deadline with the current time and decide to retry?

If that worked correctly, you could bump the scan timeout and everything would 
behave as expected: the server would time out a scan request early (thus 
freeing up an RPC handler thread) and the client would retry, until the scan 
succeeds or the real timeout is actually reached.


-- 
To view, visit http://gerrit.cloudera.org:8080/4545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jean-Daniel Cryans 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes


[kudu-CR] [java client] ITClient's new snapshot scanners can easily timeout

2016-09-27 Thread Jean-Daniel Cryans (Code Review)
Jean-Daniel Cryans has posted comments on this change.

Change subject: [java client] ITClient's new snapshot scanners can easily 
timeout
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4545/1/java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java
File java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java:

PS1, Line 416:   // It's possible to get timeouts if we're unlucky. A 
particularly common one is
 :   // "could not wait for desired snapshot timestamp to be 
consistent" since we're using
 :   // READ_AT_SNAPSHOT scanners.
 :   if (e.getStatus().isTimedOut()) {
 : LOG.warn("Received a scan timeout", e);
 : return true;
 :   }
> Why not just extend the scan timeout instead?
This made me think that timing out is the expected behavior:
https://github.com/cloudera/kudu/blob/master/src/kudu/tserver/tablet_service.cc#L1764


-- 
To view, visit http://gerrit.cloudera.org:8080/4545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jean-Daniel Cryans 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes


[kudu-CR] [java client] ITClient's new snapshot scanners can easily timeout

2016-09-27 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: [java client] ITClient's new snapshot scanners can easily 
timeout
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4545/1/java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java
File java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java:

PS1, Line 416:   // It's possible to get timeouts if we're unlucky. A 
particularly common one is
 :   // "could not wait for desired snapshot timestamp to be 
consistent" since we're using
 :   // READ_AT_SNAPSHOT scanners.
 :   if (e.getStatus().isTimedOut()) {
 : LOG.warn("Received a scan timeout", e);
 : return true;
 :   }
Why not just extend the scan timeout instead?


-- 
To view, visit http://gerrit.cloudera.org:8080/4545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jean-Daniel Cryans 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes


[kudu-CR] [java client] ITClient's new snapshot scanners can easily timeout

2016-09-27 Thread Jean-Daniel Cryans (Code Review)
Jean-Daniel Cryans has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4545

Change subject: [java client] ITClient's new snapshot scanners can easily 
timeout
..

[java client] ITClient's new snapshot scanners can easily timeout

This patch makes it so that a scanner timeout won't kill the test. It also
takes care of a typo in related javadoc.

The failure in question:
19:56:29.459 [ERROR - Thread-10] (ITClient.java:134) Got error while getting 
row 260
org.apache.kudu.client.NonRecoverableException: could not wait for desired 
snapshot timestamp to be consistent: Timed out waiting for all transactions 
with ts < P: 1475006188645381 usec, L: 0 to commit
at 
org.apache.kudu.client.TabletClient.dispatchTSErrorOrReturnException(TabletClient.java:548)
at org.apache.kudu.client.TabletClient.decode(TabletClient.java:482)
at org.apache.kudu.client.TabletClient.decode(TabletClient.java:83)
...

On the server-side:
W0927 19:56:29.455293   746 tablet_service.cc:1509] Error setting up scanner 
with request new_scan_request { tablet_id: "5584580c8ff3463e8092814f4b45ee99" 
limit: 9223372036854775807 projected_columns { name: "key" type: INT32 is_key: 
true is_nullable: false encoding: AUTO_ENCODING compression: 
DEFAULT_COMPRESSION cfile_block_size: 0 } projected_columns { name: "column1_i" 
type: INT32 is_key: false is_nullable: false encoding: AUTO_ENCODING 
compression: DEFAULT_COMPRESSION cfile_block_size: 0 } projected_columns { 
name: "column2_i" type: INT32 is_key: false is_nullable: false encoding: 
AUTO_ENCODING compression: DEFAULT_COMPRESSION cfile_block_size: 0 } 
projected_columns { name: "column3_s" type: STRING is_key: false is_nullable: 
true encoding: DICT_ENCODING compression: LZ4 cfile_block_size: 4096 } 
projected_columns { name: "column4_b" type: BOOL is_key: false is_nullable: 
false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION 
cfile_block_size: 0 } read_mode: READ_AT!
 _SNAPSHOT snap_timestamp: 6041625348691480576 propagated_timestamp: 
6041625348556406784 cache_blocks: true order_mode: UNORDERED column_predicates 
{ column: "key" equality { value: "\004\001\000\000" } } } batch_size_bytes: 
1048576
W0927 19:56:29.457337   746 rpcz_store.cc:234] Call 
kudu.tserver.TabletServerService.Scan from 127.0.0.1:44116 (request call id 3) 
took 492ms (client timeout 500).
W0927 19:56:29.458102   746 rpcz_store.cc:238] Trace:
 0927 19:56:28.964746 (+ 0us) service_pool.cc:143] Inserting onto call queue
 0927 19:56:28.965109 (+   363us) service_pool.cc:202] Handling call
 0927 19:56:28.967353 (+  2244us) tablet_service.cc:1466] Creating iterator
 0927 19:56:28.967400 (+47us) tablet_service.cc:1782] Waiting for 
operations in snapshot to commit
 0927 19:56:29.455140 (+487740us) tablet_service.cc:1485] Iterator created
 0927 19:56:29.455202 (+62us) tablet_service.cc:1500] Iterator init: Timed 
out: could not wait for desired snapshot timestamp to be consistent: Timed out 
waiting for all transactions with ts < P: 1475006188645381 usec, L: 0 to commit

Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e
---
M java/kudu-client/src/test/java/org/apache/kudu/client/ITClient.java
1 file changed, 9 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/4545/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I08de2fcd6736c2bc75ef0c92c8c8f48f9d7da49e
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jean-Daniel Cryans