[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests
Grant Henke has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11815 ) Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests .. KUDU-2584: Prevent flaky off-by-one errors in backup tests This patch adds 1 ms to the target snapshot time when a backup is taken. This ensures that we don’t have flakes due to off-by-one errors where all the values are not read. The underlying reason for adding 1 ms is that we pass the timestamp in ms granularity but the snapshot time consists of microseconds plus a logical clock. This means if the data is inserted with a fraction of a ms remaining it could be truncated and unread. Additionaly this patch copies over the timestamp propagation call from the KuduRDD and ensures the Spark tests use the Kudu client from the KuduContext. This should further prevent future snapshot issues. This patch also includes an auto-formating change in KuduBackupOptions that must have been missed in a previous commit. Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d Reviewed-on: http://gerrit.cloudera.org:8080/11815 Tested-by: Grant Henke Reviewed-by: Adar Dembo --- M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala 4 files changed, 29 insertions(+), 7 deletions(-) Approvals: Grant Henke: Verified Adar Dembo: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/11815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d Gerrit-Change-Number: 11815 Gerrit-PatchSet: 4 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11815 ) Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala File java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala: http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@364 PS2, Line 364: nough that data is i > No, I just wanted to be correct. I can remove it. Yeah I would remove it. It's important for this explanation to be clear, so anything not germane should be elided. -- To view, visit http://gerrit.cloudera.org:8080/11815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d Gerrit-Change-Number: 11815 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 06 Nov 2018 22:00:34 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/11815 ) Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala: http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@138 PS2, Line 138: kuduContext.timestampAccumulator.add(kuduContext.syncClient.getLastPropagatedTimestamp) > This was added because it brings KuduBackupRDD closer to KuduRDD. Given we I think updating timestamp on driver with max of timestamp on executors is important for READ_AT_SNAPSHPT mode as well if the same KuduContext is used. -- To view, visit http://gerrit.cloudera.org:8080/11815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d Gerrit-Change-Number: 11815 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 06 Nov 2018 19:49:55 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11815 ) Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/11815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d Gerrit-Change-Number: 11815 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 06 Nov 2018 15:57:03 + Gerrit-HasComments: No
[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests
Grant Henke has removed a vote on this change. Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/11815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d Gerrit-Change-Number: 11815 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11815 ) Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala: http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@138 PS2, Line 138: kuduContext.timestampAccumulator.add(kuduContext.syncClient.getLastPropagatedTimestamp) > This is so that if you're using the same KuduContext for a backup job and a This was added because it brings KuduBackupRDD closer to KuduRDD. Given we are using READ_AT_SNAPSHOT mode and not RYW mode, I don't think this has any direct impact on backups even for testing. In the real world, each backup job will be a separate job, so this line has even less impact. http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala File java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala: http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@356 PS2, Line 356: error > nit: why not use 'info' instead of 'error'? oh, good catch. This was just for my experimenting. I intended to lower back to info. http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@364 PS2, Line 364: plus a logical clock > This detail isn't relevant to the problem/solution discussion, is it? No, I just wanted to be correct. I can remove it. http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@364 PS2, Line 364: This means : // if the data is inserted with a fraction of a ms remaining it could be truncated and : // unread. > In practice this is always true; what are the odds that a timestamp value o Right. will update. http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@368 PS2, Line 368: nowMs + 1 > Does this mean the default value for timestampMs in KuduBackupOptions shoul I don't think so. In the real world this issue can't occur and defaulting to currentTimeMillis makes sense. In the real world writing data and submitting a spark job in < 1ms is impossible. -- To view, visit http://gerrit.cloudera.org:8080/11815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d Gerrit-Change-Number: 11815 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 06 Nov 2018 15:27:10 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests
Hello Will Berkeley, Mike Percy, Kudu Jenkins, Adar Dembo, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11815 to look at the new patch set (#3). Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests .. KUDU-2584: Prevent flaky off-by-one errors in backup tests This patch adds 1 ms to the target snapshot time when a backup is taken. This ensures that we don’t have flakes due to off-by-one errors where all the values are not read. The underlying reason for adding 1 ms is that we pass the timestamp in ms granularity but the snapshot time consists of microseconds plus a logical clock. This means if the data is inserted with a fraction of a ms remaining it could be truncated and unread. Additionaly this patch copies over the timestamp propagation call from the KuduRDD and ensures the Spark tests use the Kudu client from the KuduContext. This should further prevent future snapshot issues. This patch also includes an auto-formating change in KuduBackupOptions that must have been missed in a previous commit. Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d --- M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala 4 files changed, 29 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/11815/3 -- To view, visit http://gerrit.cloudera.org:8080/11815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d Gerrit-Change-Number: 11815 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/11815 ) Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala File java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala: http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@356 PS2, Line 356: error nit: why not use 'info' instead of 'error'? http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@368 PS2, Line 368: nowMs + 1 Does this mean the default value for timestampMs in KuduBackupOptions should be 'nowMs + 1'? -- To view, visit http://gerrit.cloudera.org:8080/11815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d Gerrit-Change-Number: 11815 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 06 Nov 2018 07:27:41 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11815 ) Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/11815/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11815/2//COMMIT_MSG@13 PS2, Line 13: The underlying reason for adding 1 ms is that we pass : the timestamp in ms granularity but the snapshot time : consists of microseconds plus a logical clock. This : means if the data is inserted with a fraction of a ms : remaining it could be truncated and unread. I left some comments on this in the code; please also update this paragraph accordingly. http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala: http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@138 PS2, Line 138: kuduContext.timestampAccumulator.add(kuduContext.syncClient.getLastPropagatedTimestamp) This is so that if you're using the same KuduContext for a backup job and a write, the timestamps "seen" during the backup job's scan will propagate into the clients so that they'll be sent during the write too? http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala File java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala: http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@364 PS2, Line 364: plus a logical clock This detail isn't relevant to the problem/solution discussion, is it? http://gerrit.cloudera.org:8080/#/c/11815/2/java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala@364 PS2, Line 364: This means : // if the data is inserted with a fraction of a ms remaining it could be truncated and : // unread. In practice this is always true; what are the odds that a timestamp value of micro granularity is generated with exactly 0 micros? So I think the more interesting bit to document is that the test has to run fast enough such that the currentTimeMillis() call on L353 ends up with the same ms value as the write's timestamp (after truncating the micros). -- To view, visit http://gerrit.cloudera.org:8080/11815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d Gerrit-Change-Number: 11815 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 05 Nov 2018 23:48:48 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11815 ) Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests .. Patch Set 2: Verified+1 Failure was a know DefaultSourceTest flake. -- To view, visit http://gerrit.cloudera.org:8080/11815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d Gerrit-Change-Number: 11815 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 05 Nov 2018 17:44:33 + Gerrit-HasComments: No
[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests
Grant Henke has removed a vote on this change. Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/11815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d Gerrit-Change-Number: 11815 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11815 ) Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/11815/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11815/1//COMMIT_MSG@13 PS1, Line 13: : The off-by-one errors are possibly due to very fast : test runs where the writes and backups start : in under 1ms. They could also be due to some : underlying issue that is not well understood. > +1 I am suspicious of fixing a test failure like this with a sleep, and par Done. I added an explanation and +1 ms to the snapshot scan. I also added logging just in case we see this flake in the future. http://gerrit.cloudera.org:8080/#/c/11815/1//COMMIT_MSG@20 PS1, Line 20: > dist-test works with Java now right? Can you run some experiments to demons This failure is very difficult to produce in dist-test. I ran 500 tests before and after changes and couldn't produce it in either. -- To view, visit http://gerrit.cloudera.org:8080/11815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d Gerrit-Change-Number: 11815 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 05 Nov 2018 17:09:54 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests
Hello Will Berkeley, Mike Percy, Kudu Jenkins, Adar Dembo, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11815 to look at the new patch set (#2). Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests .. KUDU-2584: Prevent flaky off-by-one errors in backup tests This patch adds 1 ms to the target snapshot time when a backup is taken. This ensures that we don’t have flakes due to off-by-one errors where all the values are not read. The underlying reason for adding 1 ms is that we pass the timestamp in ms granularity but the snapshot time consists of microseconds plus a logical clock. This means if the data is inserted with a fraction of a ms remaining it could be truncated and unread. Additionaly this patch copies over the timestamp propagation call from the KuduRDD and ensures the Spark tests use the Kudu client from the KuduContext. This should further prevent future snapshot issues. This patch also includes an auto-formating change in KuduBackupOptions that must have been missed in a previous commit. Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d --- M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala 4 files changed, 29 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/11815/2 -- To view, visit http://gerrit.cloudera.org:8080/11815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d Gerrit-Change-Number: 11815 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests
Grant Henke has removed a vote on this change. Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests .. Removed Code-Review+2 by Mike Percy -- To view, visit http://gerrit.cloudera.org:8080/11815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d Gerrit-Change-Number: 11815 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11815 ) Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/11815/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11815/1//COMMIT_MSG@13 PS1, Line 13: : The off-by-one errors are possibly due to very fast : test runs where the writes and backups start : in under 1ms. They could also be due to some : underlying issue that is not well understood. > So I thought the underlying issue was due to a different in granularity: Ku +1 I am suspicious of fixing a test failure like this with a sleep, and particularly so in this case because the commit message doesn't make it clear to me why the sleep is truly fixing the issue and not just covering it up. If Adar's analysis is right then I think his proposed fix makes sense. http://gerrit.cloudera.org:8080/#/c/11815/1//COMMIT_MSG@20 PS1, Line 20: dist-test works with Java now right? Can you run some experiments to demonstrate this patch fixes the problem and that it reduces the flakiness of the test in general? -- To view, visit http://gerrit.cloudera.org:8080/11815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d Gerrit-Change-Number: 11815 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 29 Oct 2018 20:09:37 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11815 ) Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/11815/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11815/1//COMMIT_MSG@13 PS1, Line 13: : The off-by-one errors are possibly due to very fast : test runs where the writes and backups start : in under 1ms. They could also be due to some : underlying issue that is not well understood. So I thought the underlying issue was due to a different in granularity: Kudu's HT timestamps are specified in micros while the default value for the HT timestamp to backup (System.currentTimeMillis) is specified in millis. The issue being: if the data inserted for the test completes at HT timestamp of "x millis + y micros", the use of currentTimeMillis configures the backup to read at HT timestamp of "x millis", which means the backup misses some of the inserts. After restoring this backup, the number of rows in the table doesn't match the number of rows originally inserted. If this is true, I'd like to see a few changes here: 1) Spell out the issue in detail, both in the commit message and in the corresponding workaround in the test code. 2) Work around the issue in a different way, by calling currentTimeMillis in the Spark job and passing that argument (plus 1 ms) in as the HT timestamp to use for the backup. It's not fundamentally different, but it doesn't slow down the test in the way that Thread.sleep does. -- To view, visit http://gerrit.cloudera.org:8080/11815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d Gerrit-Change-Number: 11815 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 29 Oct 2018 18:53:55 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11815 ) Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d Gerrit-Change-Number: 11815 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 29 Oct 2018 18:35:19 + Gerrit-HasComments: No
[kudu-CR] KUDU-2584: Prevent flaky off-by-one errors in backup tests
Grant Henke has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11815 Change subject: KUDU-2584: Prevent flaky off-by-one errors in backup tests .. KUDU-2584: Prevent flaky off-by-one errors in backup tests This patch adds a one second sleep to the back up and restore tests before a backup is taken. This ensures that we don’t have flakes due to off-by-one errors where all the values are not read. The off-by-one errors are possibly due to very fast test runs where the writes and backups start in under 1ms. They could also be due to some underlying issue that is not well understood. That said in a real world backup scenario, this should have no impact. Further testing/hardening of this job may remove the need for this pause in the future. Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d --- M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala 1 file changed, 4 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/11815/1 -- To view, visit http://gerrit.cloudera.org:8080/11815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ia0f1b4a4138cc8c913543a68fad748927cdc439d Gerrit-Change-Number: 11815 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke