Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10968 )

Change subject: schema: add is_deleted virtual column
......................................................................


Patch Set 6:

Todd was the main advocate for the virtual column approach; I'd be interested 
in his take too. I've responded to a few things below:

> First, if looking at 'is-deleted' in isolation, I think it would be better to 
> have an optional bitset that gets plumbed through which holds the deleted 
> status.  As I understand it a boolean flag indicating whether to retain 
> deleted rows is already going to have to get plumbed through, so this 
> wouldn't be a big complication.  As a bonus, bitsets are more efficient in 
> terms of memory overhead (1 bit per row vs 1 byte per row).  Ultimately I 
> think this will have a cleaner implementation than checking against known 
> column values.  It's clear that in the long run incremental (CDC) scans 
> should use a sparse row representation, in which case re-using the existing 
> row-batch layout is out the window, anyway.

The plumbing is quite different though:
1. The boolean flag you allude to is part of the RowIteratorOptions passed to 
each iterator at construction time, and it's just a single boolean for the 
entire iterator. This plumbing has already been done in commit 0fbebd67c.
2. The bitset you're proposing would need to be per-RowBlock in rowwise 
iterators and per-ColumnBlock in columnwise iterators, which means plumbing 
through NextBlock, MaterializeColumn, and several other iterator methods.

Plus the current implementation has the advantage of being dead simple to 
implement, which is important because we've tabled several design decisions 
until a working backup and restore MVP is ready, so the faster we can get 
there, the better.

> - last-updated-timestamp per-row. If I understand tablet internals correctly, 
> this can't ever be provided with per-row granularity, since we dispose of 
> per-row timestamps during compaction after the ancient history watermark.  I 
> think the best we could do here is to provide it per scan batch.

I'm not sure this is correct, but I'm following up with Mike (and looking at 
the code) to find out for sure.


--
To view, visit http://gerrit.cloudera.org:8080/10968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56eb1d44ba8bfbd76d8bb794b8076b695782939e
Gerrit-Change-Number: 10968
Gerrit-PatchSet: 6
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Fri, 20 Jul 2018 18:29:34 +0000
Gerrit-HasComments: No

Reply via email to