[kudu-CR] Add error handling to Backup and Restore
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/10941 ) Change subject: Add error handling to Backup and Restore .. Patch Set 6: I think you are auto-reformatting all the files you edit. Using an auto-formatter for Scala is something I support, but it should be enabled and standardized as a separate patch. Until then please, disable it and keep the formatting changes to lines you are changing as a part of this feature. -- To view, visit http://gerrit.cloudera.org:8080/10941 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I104fd604e12b70fcff9beace71ed4ef96b54d691 Gerrit-Change-Number: 10941 Gerrit-PatchSet: 6 Gerrit-Owner: Tony Foerster Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tony Foerster Gerrit-Comment-Date: Fri, 20 Jul 2018 20:44:40 + Gerrit-HasComments: No
[kudu-CR] Add error handling to Backup and Restore
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/10941 ) Change subject: Add error handling to Backup and Restore .. Patch Set 6: (8 comments) http://gerrit.cloudera.org:8080/#/c/10941/6/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala: http://gerrit.cloudera.org:8080/#/c/10941/6/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@35 PS6, Line 35: private val cause: Throwable = None.orNull) nit: Just None.orNull is a bit unusual. Just null should work. http://gerrit.cloudera.org:8080/#/c/10941/6/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@46 PS6, Line 46: // TODO: Check if a vailid-looking result is in the directory and don't retry nit: s/vailid/valid http://gerrit.cloudera.org:8080/#/c/10941/6/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@57 PS6, Line 57: if (metadata.getToMs >= options.timestampMs) { I think we should only consider the case where getToMs == options.timestampMs right now. In that case we can skip the full backup. Otherwise we should fail saying a backup for a different time already exists (that should automatically happen if you fall through here). Then the user can delete it or provide a new path for the new backup. Because we don't support any idea of incremental yet, that seams the most straightforward to me. http://gerrit.cloudera.org:8080/#/c/10941/6/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@111 PS6, Line 111: val fs = hPath.getFileSystem(conf) nit: extra spaces http://gerrit.cloudera.org:8080/#/c/10941/6/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala: http://gerrit.cloudera.org:8080/#/c/10941/6/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@90 PS6, Line 90: val conf = session.sparkContext.hadoopConfiguration nit: extra spaces. I think a formatter is aligning the =. This should be fixed in general. http://gerrit.cloudera.org:8080/#/c/10941/6/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@103 PS6, Line 103:context: KuduContext): Try[KuduTable] = Try { nit: explicit return type. http://gerrit.cloudera.org:8080/#/c/10941/6/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@115 PS6, Line 115: private def dropTable(tableName: String, kuduClient: KuduClient): Try[Unit] = Where is this used? We definitely don't want the restore process dropping tables. I suppose we may want to drop a created table if it's restore failed for some reason. http://gerrit.cloudera.org:8080/#/c/10941/6/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@136 PS6, Line 136: s"${failures.length} were note restored, see previous logs for details") nit: s/note/not -- To view, visit http://gerrit.cloudera.org:8080/10941 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I104fd604e12b70fcff9beace71ed4ef96b54d691 Gerrit-Change-Number: 10941 Gerrit-PatchSet: 6 Gerrit-Owner: Tony Foerster Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tony Foerster Gerrit-Comment-Date: Fri, 20 Jul 2018 20:42:30 + Gerrit-HasComments: Yes
[kudu-CR] schema: add is deleted virtual column
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10968 ) Change subject: schema: add is_deleted virtual column .. Patch Set 6: > Plus the current implementation has the advantage of being dead simple to > implement, which is important because we've tabled several design decisions > until a working backup and restore MVP is ready, so the faster we can get > there, the better. I'm all for a simple solution, but in this case it's a solution which introduces a breaking changes the public API of Kudu, and from what I've gathered the long-term solution won't require such a breaking change. -- To view, visit http://gerrit.cloudera.org:8080/10968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56eb1d44ba8bfbd76d8bb794b8076b695782939e Gerrit-Change-Number: 10968 Gerrit-PatchSet: 6 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 20 Jul 2018 20:28:02 + Gerrit-HasComments: No
[kudu-CR] schema: add is deleted virtual column
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/10968 ) Change subject: schema: add is_deleted virtual column .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/10968/6/src/kudu/common/schema.cc File src/kudu/common/schema.cc: http://gerrit.cloudera.org:8080/#/c/10968/6/src/kudu/common/schema.cc@140 PS6, Line 140: static const bool kFalse = false; Can you comment why the read default should be false? It's a little different than the usual case where a non-nullable column was added and historical values are being read. In this case, there are no values to read ever. http://gerrit.cloudera.org:8080/#/c/10968/6/src/kudu/common/schema.cc@145 PS6, Line 145: , Nit: Comment with /* read_default=*/ to match others. http://gerrit.cloudera.org:8080/#/c/10968/6/src/kudu/common/schema.cc@154 PS6, Line 154: // TODO(adar): C++11 provides a single-arg constructor Nit: Do we assign individual names to given TODO's? I was under the impression we used Jiras or nothing. http://gerrit.cloudera.org:8080/#/c/10968/6/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/10968/6/src/kudu/master/catalog_manager.cc@1302 PS6, Line 1302: if (ContainsKey(Schema::kVirtualColumnNames, col_name)) { Would it be more future proof to prevent the $$ pattern in general? -- To view, visit http://gerrit.cloudera.org:8080/10968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56eb1d44ba8bfbd76d8bb794b8076b695782939e Gerrit-Change-Number: 10968 Gerrit-PatchSet: 6 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 20 Jul 2018 19:50:37 + Gerrit-HasComments: Yes
[kudu-CR] memrowset: support iteration with include deleted rows
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/10929 ) Change subject: memrowset: support iteration with include_deleted_rows .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10929 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie018043518b437ecc719cf9f87b2c5eea560c9a1 Gerrit-Change-Number: 10929 Gerrit-PatchSet: 4 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 20 Jul 2018 19:38:22 + Gerrit-HasComments: No
[kudu-CR] memrowset: support iteration with snap to exclude
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/10926 ) Change subject: memrowset: support iteration with snap_to_exclude .. Patch Set 6: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/10926/6/src/kudu/tablet/memrowset-test.cc File src/kudu/tablet/memrowset-test.cc: http://gerrit.cloudera.org:8080/#/c/10926/6/src/kudu/tablet/memrowset-test.cc@586 PS6, Line 586: // @1 Nit: Zero base these @# comments so it lines up with the snaps indices below. http://gerrit.cloudera.org:8080/#/c/10926/6/src/kudu/tablet/memrowset-test.cc@620 PS6, Line 620: // Captures zero rows; a snapshot range [x, x) does not include anything. Nit: What happens if exclude is beyond include? -- To view, visit http://gerrit.cloudera.org:8080/10926 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9be2ab5af4d5223889e4545ae1db7cc0275480ce Gerrit-Change-Number: 10926 Gerrit-PatchSet: 6 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 20 Jul 2018 19:23:50 + Gerrit-HasComments: Yes
[kudu-CR] schema: add is deleted virtual column
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10968 ) Change subject: schema: add is_deleted virtual column .. Patch Set 6: > > - last-updated-timestamp per-row. If I understand tablet internals > > correctly, this can't ever be provided with per-row granularity, since we > > dispose of per-row timestamps during compaction after the ancient history > > watermark. I think the best we could do here is to provide it per scan > > batch. > > I'm not sure this is correct, but I'm following up with Mike (and looking at > the code) to find out for sure. Mike says that you're right; UNDO GC can and will discard the INSERT UNDO representing the insertion of a row, even if the row is still alive. So once the row's insertion is behind the ancient history mark, there's no guarantee of there being even a single UNDO or REDO record for the row, which means we can't provide a last-updated-timestamp (or created-timestamp, or whatever). -- To view, visit http://gerrit.cloudera.org:8080/10968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56eb1d44ba8bfbd76d8bb794b8076b695782939e Gerrit-Change-Number: 10968 Gerrit-PatchSet: 6 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 20 Jul 2018 18:56:25 + Gerrit-HasComments: No
[kudu-CR] schema: add is deleted virtual column
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10968 ) Change subject: schema: add is_deleted virtual column .. Patch Set 6: Todd was the main advocate for the virtual column approach; I'd be interested in his take too. I've responded to a few things below: > First, if looking at 'is-deleted' in isolation, I think it would be better to > have an optional bitset that gets plumbed through which holds the deleted > status. As I understand it a boolean flag indicating whether to retain > deleted rows is already going to have to get plumbed through, so this > wouldn't be a big complication. As a bonus, bitsets are more efficient in > terms of memory overhead (1 bit per row vs 1 byte per row). Ultimately I > think this will have a cleaner implementation than checking against known > column values. It's clear that in the long run incremental (CDC) scans > should use a sparse row representation, in which case re-using the existing > row-batch layout is out the window, anyway. The plumbing is quite different though: 1. The boolean flag you allude to is part of the RowIteratorOptions passed to each iterator at construction time, and it's just a single boolean for the entire iterator. This plumbing has already been done in commit 0fbebd67c. 2. The bitset you're proposing would need to be per-RowBlock in rowwise iterators and per-ColumnBlock in columnwise iterators, which means plumbing through NextBlock, MaterializeColumn, and several other iterator methods. Plus the current implementation has the advantage of being dead simple to implement, which is important because we've tabled several design decisions until a working backup and restore MVP is ready, so the faster we can get there, the better. > - last-updated-timestamp per-row. If I understand tablet internals correctly, > this can't ever be provided with per-row granularity, since we dispose of > per-row timestamps during compaction after the ancient history watermark. I > think the best we could do here is to provide it per scan batch. I'm not sure this is correct, but I'm following up with Mike (and looking at the code) to find out for sure. -- To view, visit http://gerrit.cloudera.org:8080/10968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56eb1d44ba8bfbd76d8bb794b8076b695782939e Gerrit-Change-Number: 10968 Gerrit-PatchSet: 6 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 20 Jul 2018 18:29:34 + Gerrit-HasComments: No
[kudu-CR] schema: add is deleted virtual column
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10968 ) Change subject: schema: add is_deleted virtual column .. Patch Set 6: I had a long discussion about this offline with Adar. As I understand it, 'virtual columns' are a generalized feature meant to allow querying the 'is-deleted' status of a row without having to plumb through special handling in many layers of the stack (client, RPC, tserver). Eventually this feature could be extended to more per-row metadata. I'm sympathetic to this motivation, but nonetheless I don't come down in favor of this feature. First, if looking at 'is-deleted' in isolation, I think it would be better to have an optional bitset that gets plumbed through which holds the deleted status. As I understand it a boolean flag indicating whether to retain deleted rows is already going to have to get plumbed through, so this wouldn't be a big complication. As a bonus, bitsets are more efficient in terms of memory overhead (1 bit per row vs 1 byte per row). Ultimately I think this will have a cleaner implementation than checking against known column values. It's clear that in the long run incremental (CDC) scans should use a sparse row representation, in which case re-using the existing row-batch layout is out the window, anyway. Second, it doesn't seem to me that a generalized virtual column feature would be useful for other metadata. Here's a partial list of attributes that have been thrown around, and an explanation of why I don't think virtual columns are appropriate: - per-row tablet server and per-row tablet information. It would be preferable to expose this on the scan batch, it's not necessary to hold a copy of this per row, and it's not necessary to send it across the wire. - last-updated-timestamp per-row. If I understand tablet internals correctly, this can't ever be provided with per-row granularity, since we dispose of per-row timestamps during compaction after the ancient history watermark. I think the best we could do here is to provide it per scan batch. - DRS ID / CFile ID / block (page) ID per row. I think this use-case is a real stretch, motivation wise. I think there will never be much demand for this info, and what little demand there is would be better served by just providing aggregate counts of # of DRS's, # of CFiles, and # of blocks scanned per row batch. To wrap up this already-too-long comment, I'd favor not introducing this abstraction unless we can identify a realistic second use-case for it. -- To view, visit http://gerrit.cloudera.org:8080/10968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I56eb1d44ba8bfbd76d8bb794b8076b695782939e Gerrit-Change-Number: 10968 Gerrit-PatchSet: 6 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 20 Jul 2018 16:41:14 + Gerrit-HasComments: No