Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10968 )

Change subject: schema: add is_deleted virtual column
......................................................................


Patch Set 9:

(1 comment)

This implementation is so straightfoward and the risks limited enough in scope 
that I'm frankly pretty happy with this approach for the time being. Here's why:

> breaking change

I think we can target this precisely enough (perhaps make it 
$$kudu.is_deleted$$ that the probability of breaking someone in real life is 
vanishingly small.

> looking at 'is-deleted' in isolation, I think it would be better to have an 
> optional bitset that gets plumbed through which holds the deleted status. 
> [...]  It's clear that in the long run incremental (CDC) scans should use a 
> sparse row representation

I agree that a sparse row layout would be more optimal for CDC, but 
implementing a virtual column now doesn't preclude us from doing that in the 
future, and implementing an optimal solution isn't a priority for us right now.

> last-updated-timestamp per-row. If I understand tablet internals correctly, 
> this can't ever be provided with per-row granularity, since we dispose of 
> per-row timestamps during compaction after the ancient history watermark.

While it isn't possible to get that information today, if we wanted this 
feature in the future we could implement it by keeping created / modified 
information on a per-row basis in a separate cfile or perhaps in one of the 
indexes. It wouldn't necessarily be accurate for existing tables, but going 
forward from the time of upgrade the modified information could theoretically 
be maintained.

http://gerrit.cloudera.org:8080/#/c/10968/7/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/10968/7/src/kudu/master/catalog_manager.cc@1302
PS7, Line 1302:     if (Schema::is_column_name_reserved(col_name)) {
> Yeah, Grant had the same feedback. I'm open to making that change; just wan
I don't think it's unreasonable to reserve $$kudu.*$$ or $$kudu_*$$ because 
this is so easy to implement and meets all of the other requirements for the 
is_deleted feature except for namespacing.



--
To view, visit http://gerrit.cloudera.org:8080/10968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56eb1d44ba8bfbd76d8bb794b8076b695782939e
Gerrit-Change-Number: 10968
Gerrit-PatchSet: 9
Gerrit-Owner: Adar Dembo <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Wed, 25 Jul 2018 20:33:01 +0000
Gerrit-HasComments: Yes

Reply via email to