[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests

2018-11-06 Thread Grant Henke (Code Review)
Grant Henke has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/11815 )

Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests
..

KUDU-2584: Prevent flaky off-by-one errors in backup tests

This patch adds 1 ms to the target snapshot time when
a backup is taken. This ensures that we don’t have
flakes due to off-by-one errors where all the values are not read.

The underlying reason for adding 1 ms is that we pass
the timestamp in ms granularity but the snapshot time
consists of microseconds plus a logical clock. This
means if the data is inserted with a fraction of a ms
remaining it could be truncated and unread.

Additionaly this patch copies over the timestamp
propagation call from the KuduRDD and ensures
the Spark tests use the Kudu client from the
KuduContext. This should further prevent future
snapshot issues.

This patch also includes an auto-formating change in
KuduBackupOptions that must have been missed in
a previous commit.

Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
Reviewed-on: http://gerrit.cloudera.org:8080/11815
Tested-by: Grant Henke 
Reviewed-by: Adar Dembo 
---
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala
4 files changed, 29 insertions(+), 7 deletions(-)

Approvals:
  Grant Henke: Verified
  Adar Dembo: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/11815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
Gerrit-Change-Number: 11815
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests

2018-11-06 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11815 )

Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests
..


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
File 
java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@364
PS2, Line 364: nough that data is i
> No, I just wanted to be correct. I can remove it.
Yeah I would remove it. It's important for this explanation to be clear, so 
anything not germane should be elided.



--
To view, visit http://gerrit.cloudera.org:8080/11815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
Gerrit-Change-Number: 11815
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Tue, 06 Nov 2018 22:00:34 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests

2018-11-06 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11815 )

Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala:

http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@138
PS2, Line 138: 
kuduContext.timestampAccumulator.add(kuduContext.syncClient.getLastPropagatedTimestamp)
> This was added because it brings KuduBackupRDD closer to KuduRDD. Given we
I think updating timestamp on driver with max of timestamp on executors is 
important for READ_AT_SNAPSHPT mode as well if the same KuduContext is used.



--
To view, visit http://gerrit.cloudera.org:8080/11815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
Gerrit-Change-Number: 11815
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Tue, 06 Nov 2018 19:49:55 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests

2018-11-06 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11815 )

Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/11815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
Gerrit-Change-Number: 11815
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Tue, 06 Nov 2018 15:57:03 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests

2018-11-06 Thread Grant Henke (Code Review)
Grant Henke has removed a vote on this change.

Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/11815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
Gerrit-Change-Number: 11815
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests

2018-11-06 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11815 )

Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala:

http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@138
PS2, Line 138: 
kuduContext.timestampAccumulator.add(kuduContext.syncClient.getLastPropagatedTimestamp)
> This is so that if you're using the same KuduContext for a backup job and a
This was added because it brings KuduBackupRDD closer to KuduRDD. Given we are 
using READ_AT_SNAPSHOT mode and not RYW mode, I don't think this has any direct 
impact on backups even for testing. In the real world, each backup job will be 
a separate job, so this line has even less impact.


http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
File 
java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@356
PS2, Line 356: error
> nit: why not use 'info' instead of 'error'?
oh, good catch. This was just for my experimenting. I intended to lower back to 
info.


http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@364
PS2, Line 364: plus a logical clock
> This detail isn't relevant to the problem/solution discussion, is it?
No, I just wanted to be correct. I can remove it.


http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@364
PS2, Line 364: This means
 : // if the data is inserted with a fraction of a ms remaining 
it could be truncated and
 : // unread.
> In practice this is always true; what are the odds that a timestamp value o
Right. will update.


http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@368
PS2, Line 368: nowMs + 1
> Does this mean the default value for timestampMs in KuduBackupOptions shoul
I don't think so. In the real world this issue can't occur and defaulting to 
currentTimeMillis makes sense. In the real world writing data and submitting a 
spark job in < 1ms is impossible.



--
To view, visit http://gerrit.cloudera.org:8080/11815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
Gerrit-Change-Number: 11815
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Tue, 06 Nov 2018 15:27:10 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests

2018-11-06 Thread Grant Henke (Code Review)
Hello Will Berkeley, Mike Percy, Kudu Jenkins, Adar Dembo, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11815

to look at the new patch set (#3).

Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests
..

KUDU-2584: Prevent flaky off-by-one errors in backup tests

This patch adds 1 ms to the target snapshot time when
a backup is taken. This ensures that we don’t have
flakes due to off-by-one errors where all the values are not read.

The underlying reason for adding 1 ms is that we pass
the timestamp in ms granularity but the snapshot time
consists of microseconds plus a logical clock. This
means if the data is inserted with a fraction of a ms
remaining it could be truncated and unread.

Additionaly this patch copies over the timestamp
propagation call from the KuduRDD and ensures
the Spark tests use the Kudu client from the
KuduContext. This should further prevent future
snapshot issues.

This patch also includes an auto-formating change in
KuduBackupOptions that must have been missed in
a previous commit.

Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
---
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala
4 files changed, 29 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/11815/3
--
To view, visit http://gerrit.cloudera.org:8080/11815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
Gerrit-Change-Number: 11815
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests

2018-11-05 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11815 )

Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
File 
java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@356
PS2, Line 356: error
nit: why not use 'info' instead of 'error'?


http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@368
PS2, Line 368: nowMs + 1
Does this mean the default value for timestampMs in KuduBackupOptions should be 
'nowMs + 1'?



--
To view, visit http://gerrit.cloudera.org:8080/11815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
Gerrit-Change-Number: 11815
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Tue, 06 Nov 2018 07:27:41 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests

2018-11-05 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11815 )

Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/11815/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11815/2//COMMIT_MSG@13
PS2, Line 13: The underlying reason for adding 1 ms is that we pass
: the timestamp in ms granularity but the snapshot time
: consists of microseconds plus a logical clock. This
: means if the data is inserted with a fraction of a ms
: remaining it could be truncated and unread.
I left some comments on this in the code; please also update this paragraph 
accordingly.


http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala:

http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@138
PS2, Line 138: 
kuduContext.timestampAccumulator.add(kuduContext.syncClient.getLastPropagatedTimestamp)
This is so that if you're using the same KuduContext for a backup job and a 
write, the timestamps "seen" during the backup job's scan will propagate into 
the clients so that they'll be sent during the write too?


http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
File 
java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@364
PS2, Line 364: plus a logical clock
This detail isn't relevant to the problem/solution discussion, is it?


http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@364
PS2, Line 364: This means
 : // if the data is inserted with a fraction of a ms remaining 
it could be truncated and
 : // unread.
In practice this is always true; what are the odds that a timestamp value of 
micro granularity is generated with exactly 0 micros?

So I think the more interesting bit to document is that the test has to run 
fast enough such that the currentTimeMillis() call on L353 ends up with the 
same ms value as the write's timestamp (after truncating the micros).



--
To view, visit http://gerrit.cloudera.org:8080/11815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
Gerrit-Change-Number: 11815
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Mon, 05 Nov 2018 23:48:48 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests

2018-11-05 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11815 )

Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests
..


Patch Set 2: Verified+1

Failure was a know DefaultSourceTest flake.


--
To view, visit http://gerrit.cloudera.org:8080/11815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
Gerrit-Change-Number: 11815
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Mon, 05 Nov 2018 17:44:33 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests

2018-11-05 Thread Grant Henke (Code Review)
Grant Henke has removed a vote on this change.

Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/11815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
Gerrit-Change-Number: 11815
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests

2018-11-05 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11815 )

Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11815/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11815/1//COMMIT_MSG@13
PS1, Line 13:
: The off-by-one errors are possibly due to very fast
: test runs where the writes and backups start
: in under 1ms. They could also be due to some
: underlying issue that is not well understood.
> +1 I am suspicious of fixing a test failure like this with a sleep, and par
Done. I added an explanation and +1 ms to the snapshot scan. I also added 
logging just in case we see this flake in the future.


http://gerrit.cloudera.org:8080/#/c/11815/1//COMMIT_MSG@20
PS1, Line 20:
> dist-test works with Java now right? Can you run some experiments to demons
This failure is very difficult to produce in dist-test. I ran 500 tests before 
and after changes and couldn't produce it in either.



--
To view, visit http://gerrit.cloudera.org:8080/11815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
Gerrit-Change-Number: 11815
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Mon, 05 Nov 2018 17:09:54 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests

2018-11-05 Thread Grant Henke (Code Review)
Hello Will Berkeley, Mike Percy, Kudu Jenkins, Adar Dembo, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11815

to look at the new patch set (#2).

Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests
..

KUDU-2584: Prevent flaky off-by-one errors in backup tests

This patch adds 1 ms to the target snapshot time when
a backup is taken. This ensures that we don’t have
flakes due to off-by-one errors where all the values are not read.

The underlying reason for adding 1 ms is that we pass
the timestamp in ms granularity but the snapshot time
consists of microseconds plus a logical clock. This
means if the data is inserted with a fraction of a ms
remaining it could be truncated and unread.

Additionaly this patch copies over the timestamp
propagation call from the KuduRDD and ensures
the Spark tests use the Kudu client from the
KuduContext. This should further prevent future
snapshot issues.

This patch also includes an auto-formating change in
KuduBackupOptions that must have been missed in
a previous commit.

Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
---
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala
4 files changed, 29 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/11815/2
--
To view, visit http://gerrit.cloudera.org:8080/11815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
Gerrit-Change-Number: 11815
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests

2018-10-30 Thread Grant Henke (Code Review)
Grant Henke has removed a vote on this change.

Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests
..


Removed Code-Review+2 by Mike Percy 
--
To view, visit http://gerrit.cloudera.org:8080/11815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
Gerrit-Change-Number: 11815
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests

2018-10-29 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11815 )

Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11815/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11815/1//COMMIT_MSG@13
PS1, Line 13:
: The off-by-one errors are possibly due to very fast
: test runs where the writes and backups start
: in under 1ms. They could also be due to some
: underlying issue that is not well understood.
> So I thought the underlying issue was due to a different in granularity: Ku
+1 I am suspicious of fixing a test failure like this with a sleep, and 
particularly so in this case because the commit message doesn't make it clear 
to me why the sleep is truly fixing the issue and not just covering it up.

If Adar's analysis is right then I think his proposed fix makes sense.


http://gerrit.cloudera.org:8080/#/c/11815/1//COMMIT_MSG@20
PS1, Line 20:
dist-test works with Java now right? Can you run some experiments to 
demonstrate this patch fixes the problem and that it reduces the flakiness of 
the test in general?



--
To view, visit http://gerrit.cloudera.org:8080/11815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
Gerrit-Change-Number: 11815
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Mon, 29 Oct 2018 20:09:37 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests

2018-10-29 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11815 )

Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11815/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11815/1//COMMIT_MSG@13
PS1, Line 13:
: The off-by-one errors are possibly due to very fast
: test runs where the writes and backups start
: in under 1ms. They could also be due to some
: underlying issue that is not well understood.
So I thought the underlying issue was due to a different in granularity: Kudu's 
HT timestamps are specified in micros while the default value for the HT 
timestamp to backup (System.currentTimeMillis) is specified in millis. The 
issue being: if the data inserted for the test completes at HT timestamp of "x 
millis + y micros", the use of currentTimeMillis configures the backup to read 
at HT timestamp of "x millis", which means the backup misses some of the 
inserts. After restoring this backup, the number of rows in the table doesn't 
match the number of rows originally inserted.

If this is true, I'd like to see a few changes here:
1) Spell out the issue in detail, both in the commit message and in the 
corresponding workaround in the test code.
2) Work around the issue in a different way, by calling currentTimeMillis in 
the Spark job and passing that argument (plus 1 ms) in as the HT timestamp to 
use for the backup. It's not fundamentally different, but it doesn't slow down 
the test in the way that Thread.sleep does.



--
To view, visit http://gerrit.cloudera.org:8080/11815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
Gerrit-Change-Number: 11815
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Mon, 29 Oct 2018 18:53:55 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests

2018-10-29 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11815 )

Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
Gerrit-Change-Number: 11815
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Mon, 29 Oct 2018 18:35:19 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests

2018-10-29 Thread Grant Henke (Code Review)
Grant Henke has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11815


Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests
..

KUDU-2584: Prevent flaky off-by-one errors in backup tests

This patch adds a one second sleep to the back up
and restore tests before a backup is taken. This
ensures that we don’t have flakes due to
off-by-one errors where all the values are not read.

The off-by-one errors are possibly due to very fast
test runs where the writes and backups start
in under 1ms. They could also be due to some
underlying issue that is not well understood. That
said in a real world backup scenario, this should have
no impact.

Further testing/hardening of this job may remove the
need for this pause in the future.

Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
---
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
1 file changed, 4 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/11815/1
--
To view, visit http://gerrit.cloudera.org:8080/11815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d
Gerrit-Change-Number: 11815
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke