[kudu-CR] Add error handling to Backup and Restore

2018-07-20 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10941 )

Change subject: Add error handling to Backup and Restore
..


Patch Set 6:

I think you are auto-reformatting all the files you edit.  Using an 
auto-formatter for Scala is something I support, but it should be enabled and 
standardized as a separate patch. Until then please, disable it and keep the 
formatting changes to lines you are changing as a part of this feature.


--
To view, visit http://gerrit.cloudera.org:8080/10941
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I104fd604e12b70fcff9beace71ed4ef96b54d691
Gerrit-Change-Number: 10941
Gerrit-PatchSet: 6
Gerrit-Owner: Tony Foerster 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tony Foerster 
Gerrit-Comment-Date: Fri, 20 Jul 2018 20:44:40 +
Gerrit-HasComments: No


[kudu-CR] Add error handling to Backup and Restore

2018-07-20 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10941 )

Change subject: Add error handling to Backup and Restore
..


Patch Set 6:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/10941/6/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/10941/6/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@35
PS6, Line 35:  private val cause: Throwable 
= None.orNull)
nit: Just None.orNull is a bit unusual. Just null should work.


http://gerrit.cloudera.org:8080/#/c/10941/6/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@46
PS6, Line 46: // TODO: Check if a vailid-looking result is in the directory 
and don't retry
nit: s/vailid/valid


http://gerrit.cloudera.org:8080/#/c/10941/6/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@57
PS6, Line 57:   if (metadata.getToMs >= options.timestampMs) {
I think we should only consider the case where getToMs == options.timestampMs 
right now. In that case we can skip the full backup. Otherwise we should fail 
saying a backup for a different time already exists (that should automatically 
happen if you fall through here). Then the user can delete it or provide a new 
path for the new backup.

Because we don't support any idea of incremental yet, that seams the most 
straightforward to me.


http://gerrit.cloudera.org:8080/#/c/10941/6/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@111
PS6, Line 111: val fs   = hPath.getFileSystem(conf)
nit: extra spaces


http://gerrit.cloudera.org:8080/#/c/10941/6/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala:

http://gerrit.cloudera.org:8080/#/c/10941/6/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@90
PS6, Line 90: val conf  = session.sparkContext.hadoopConfiguration
nit: extra spaces. I think a formatter is aligning the =. This should be fixed 
in general.


http://gerrit.cloudera.org:8080/#/c/10941/6/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@103
PS6, Line 103:context: KuduContext): 
Try[KuduTable] = Try {
nit: explicit return type.


http://gerrit.cloudera.org:8080/#/c/10941/6/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@115
PS6, Line 115:   private def dropTable(tableName: String, kuduClient: 
KuduClient): Try[Unit] =
Where is this used? We definitely don't want the restore process dropping 
tables. I suppose we may want to drop a created table if it's restore failed 
for some reason.


http://gerrit.cloudera.org:8080/#/c/10941/6/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@136
PS6, Line 136: s"${failures.length} were note restored, see previous 
logs for details")
nit: s/note/not



--
To view, visit http://gerrit.cloudera.org:8080/10941
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I104fd604e12b70fcff9beace71ed4ef96b54d691
Gerrit-Change-Number: 10941
Gerrit-PatchSet: 6
Gerrit-Owner: Tony Foerster 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tony Foerster 
Gerrit-Comment-Date: Fri, 20 Jul 2018 20:42:30 +
Gerrit-HasComments: Yes


[kudu-CR] schema: add is deleted virtual column

2018-07-20 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10968 )

Change subject: schema: add is_deleted virtual column
..


Patch Set 6:

> Plus the current implementation has the advantage of being dead simple to 
> implement, which is important because we've tabled several design decisions 
> until a working backup and restore MVP is ready, so the faster we can get 
> there, the better.

I'm all for a simple solution, but in this case it's a solution which 
introduces a breaking changes the public API of Kudu, and from what I've 
gathered the long-term solution won't require such a breaking change.


--
To view, visit http://gerrit.cloudera.org:8080/10968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56eb1d44ba8bfbd76d8bb794b8076b695782939e
Gerrit-Change-Number: 10968
Gerrit-PatchSet: 6
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 20 Jul 2018 20:28:02 +
Gerrit-HasComments: No


[kudu-CR] schema: add is deleted virtual column

2018-07-20 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10968 )

Change subject: schema: add is_deleted virtual column
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/10968/6/src/kudu/common/schema.cc
File src/kudu/common/schema.cc:

http://gerrit.cloudera.org:8080/#/c/10968/6/src/kudu/common/schema.cc@140
PS6, Line 140: static const bool kFalse = false;
Can you comment why the read default should be false? It's a little different 
than the usual case where a non-nullable column was added and historical values 
are being read. In this case, there are no values to read ever.


http://gerrit.cloudera.org:8080/#/c/10968/6/src/kudu/common/schema.cc@145
PS6, Line 145: ,
Nit: Comment with /* read_default=*/ to match others.


http://gerrit.cloudera.org:8080/#/c/10968/6/src/kudu/common/schema.cc@154
PS6, Line 154: // TODO(adar): C++11 provides a single-arg constructor
Nit: Do we assign individual names to given TODO's? I was under the impression 
we used Jiras or nothing.


http://gerrit.cloudera.org:8080/#/c/10968/6/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/10968/6/src/kudu/master/catalog_manager.cc@1302
PS6, Line 1302: if (ContainsKey(Schema::kVirtualColumnNames, col_name)) {
Would it be more future proof to prevent the $$ pattern in general?



--
To view, visit http://gerrit.cloudera.org:8080/10968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56eb1d44ba8bfbd76d8bb794b8076b695782939e
Gerrit-Change-Number: 10968
Gerrit-PatchSet: 6
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 20 Jul 2018 19:50:37 +
Gerrit-HasComments: Yes


[kudu-CR] memrowset: support iteration with include deleted rows

2018-07-20 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10929 )

Change subject: memrowset: support iteration with include_deleted_rows
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10929
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie018043518b437ecc719cf9f87b2c5eea560c9a1
Gerrit-Change-Number: 10929
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 20 Jul 2018 19:38:22 +
Gerrit-HasComments: No


[kudu-CR] memrowset: support iteration with snap to exclude

2018-07-20 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10926 )

Change subject: memrowset: support iteration with snap_to_exclude
..


Patch Set 6: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10926/6/src/kudu/tablet/memrowset-test.cc
File src/kudu/tablet/memrowset-test.cc:

http://gerrit.cloudera.org:8080/#/c/10926/6/src/kudu/tablet/memrowset-test.cc@586
PS6, Line 586:   // @1
Nit: Zero base these @# comments so it lines up with the snaps indices below.


http://gerrit.cloudera.org:8080/#/c/10926/6/src/kudu/tablet/memrowset-test.cc@620
PS6, Line 620:   // Captures zero rows; a snapshot range [x, x) does not 
include anything.
Nit: What happens if exclude is beyond include?



-- 
To view, visit http://gerrit.cloudera.org:8080/10926
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9be2ab5af4d5223889e4545ae1db7cc0275480ce
Gerrit-Change-Number: 10926
Gerrit-PatchSet: 6
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 20 Jul 2018 19:23:50 +
Gerrit-HasComments: Yes


[kudu-CR] schema: add is deleted virtual column

2018-07-20 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10968 )

Change subject: schema: add is_deleted virtual column
..


Patch Set 6:

> > - last-updated-timestamp per-row. If I understand tablet internals 
> > correctly, this can't ever be provided with per-row granularity, since we 
> > dispose of per-row timestamps during compaction after the ancient history 
> > watermark.  I think the best we could do here is to provide it per scan 
> > batch.
>
> I'm not sure this is correct, but I'm following up with Mike (and looking at 
> the code) to find out for sure.

Mike says that you're right; UNDO GC can and will discard the INSERT UNDO 
representing the insertion of a row, even if the row is still alive. So once 
the row's insertion is behind the ancient history mark, there's no guarantee of 
there being even a single UNDO or REDO record for the row, which means we can't 
provide a last-updated-timestamp (or created-timestamp, or whatever).


--
To view, visit http://gerrit.cloudera.org:8080/10968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56eb1d44ba8bfbd76d8bb794b8076b695782939e
Gerrit-Change-Number: 10968
Gerrit-PatchSet: 6
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 20 Jul 2018 18:56:25 +
Gerrit-HasComments: No


[kudu-CR] schema: add is deleted virtual column

2018-07-20 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10968 )

Change subject: schema: add is_deleted virtual column
..


Patch Set 6:

Todd was the main advocate for the virtual column approach; I'd be interested 
in his take too. I've responded to a few things below:

> First, if looking at 'is-deleted' in isolation, I think it would be better to 
> have an optional bitset that gets plumbed through which holds the deleted 
> status.  As I understand it a boolean flag indicating whether to retain 
> deleted rows is already going to have to get plumbed through, so this 
> wouldn't be a big complication.  As a bonus, bitsets are more efficient in 
> terms of memory overhead (1 bit per row vs 1 byte per row).  Ultimately I 
> think this will have a cleaner implementation than checking against known 
> column values.  It's clear that in the long run incremental (CDC) scans 
> should use a sparse row representation, in which case re-using the existing 
> row-batch layout is out the window, anyway.

The plumbing is quite different though:
1. The boolean flag you allude to is part of the RowIteratorOptions passed to 
each iterator at construction time, and it's just a single boolean for the 
entire iterator. This plumbing has already been done in commit 0fbebd67c.
2. The bitset you're proposing would need to be per-RowBlock in rowwise 
iterators and per-ColumnBlock in columnwise iterators, which means plumbing 
through NextBlock, MaterializeColumn, and several other iterator methods.

Plus the current implementation has the advantage of being dead simple to 
implement, which is important because we've tabled several design decisions 
until a working backup and restore MVP is ready, so the faster we can get 
there, the better.

> - last-updated-timestamp per-row. If I understand tablet internals correctly, 
> this can't ever be provided with per-row granularity, since we dispose of 
> per-row timestamps during compaction after the ancient history watermark.  I 
> think the best we could do here is to provide it per scan batch.

I'm not sure this is correct, but I'm following up with Mike (and looking at 
the code) to find out for sure.


--
To view, visit http://gerrit.cloudera.org:8080/10968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56eb1d44ba8bfbd76d8bb794b8076b695782939e
Gerrit-Change-Number: 10968
Gerrit-PatchSet: 6
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 20 Jul 2018 18:29:34 +
Gerrit-HasComments: No


[kudu-CR] schema: add is deleted virtual column

2018-07-20 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10968 )

Change subject: schema: add is_deleted virtual column
..


Patch Set 6:

I had a long discussion about this offline with Adar.  As I understand it, 
'virtual columns' are a generalized feature meant to allow querying the 
'is-deleted' status of a row without having to plumb through special handling 
in many layers of the stack (client, RPC, tserver).  Eventually this feature 
could be extended to more per-row metadata.  I'm sympathetic to this 
motivation, but nonetheless I don't come down in favor of this feature.

First, if looking at 'is-deleted' in isolation, I think it would be better to 
have an optional bitset that gets plumbed through which holds the deleted 
status.  As I understand it a boolean flag indicating whether to retain deleted 
rows is already going to have to get plumbed through, so this wouldn't be a big 
complication.  As a bonus, bitsets are more efficient in terms of memory 
overhead (1 bit per row vs 1 byte per row).  Ultimately I think this will have 
a cleaner implementation than checking against known column values.  It's clear 
that in the long run incremental (CDC) scans should use a sparse row 
representation, in which case re-using the existing row-batch layout is out the 
window, anyway.

Second, it doesn't seem to me that a generalized virtual column feature would 
be useful for other metadata.  Here's a partial list of attributes that have 
been thrown around, and an explanation of why I don't think virtual columns are 
appropriate:

- per-row tablet server and per-row tablet information.  It would be preferable 
to expose this on the scan batch, it's not necessary to hold a copy of this per 
row, and it's not necessary to send it across the wire.

- last-updated-timestamp per-row. If I understand tablet internals correctly, 
this can't ever be provided with per-row granularity, since we dispose of 
per-row timestamps during compaction after the ancient history watermark.  I 
think the best we could do here is to provide it per scan batch.

- DRS ID / CFile ID / block (page) ID per row. I think this use-case is a real 
stretch, motivation wise.  I think there will never be much demand for this 
info, and what little demand there is would be better served by just providing 
aggregate counts of # of DRS's, # of CFiles, and # of blocks scanned per row 
batch.

To wrap up this already-too-long comment, I'd favor not introducing this 
abstraction unless we can identify a realistic second use-case for it.


--
To view, visit http://gerrit.cloudera.org:8080/10968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56eb1d44ba8bfbd76d8bb794b8076b695782939e
Gerrit-Change-Number: 10968
Gerrit-PatchSet: 6
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 20 Jul 2018 16:41:14 +
Gerrit-HasComments: No