Andrew Wong has posted comments on this change.

Change subject: WIP: Kudu Consistency Blog Post Pt1
......................................................................


Patch Set 3:

(32 comments)

Really appreciated explicit definitions of what and when things happen. I think 
the section about design decisions can be shortened a bit but will see after 
the rest of the post is in.

mostly comments about wording, some about content

http://gerrit.cloudera.org:8080/#/c/7019/3/_posts/2017-05-30-kudu-consistency-pt1.md
File _posts/2017-05-30-kudu-consistency-pt1.md:

Line 11: of how they all play together and what are our future plans.
would be nice to also include what is discussed in Part 1.


PS3, Line 17: users and requests
            : are executed by many different machines,
how about, "...users, and must serve requests across numerous machines,..."?


PS3, Line 20: how Kudu implements this
"how these systems implement"


PS3, Line 29: single thread
"single-threaded"


Line 45: ## Motivation behind the consistency design decisions in Kudu:
Can we assume the reader understands what a tradeoff is? If we can, I think we 
can take out the first paragraph.

Maybe relabels this "Motivation behind some key design decisions in Kudu" since 
these aren't specifically consistency design decisions

The key points here I think are:
1. the overarching goal for Kudu is fast analytic queries
2. we make similar design choices as those made in the Spanner paper w.r.t 
writes
3. we targeted continuous inserts
4. we wanted to grants users flexibility of the consistency constraint


PS3, Line 95: semantics
"guarantees", "constraints"?


PS3, Line 101: Kudu currently lacks support for atomic multi-row mutation 
operations (i.e. mutation
             : operations to more than one row in the same or different 
tablets, planned as a future feature) so,
             : for the write path, we’ll be talking about the consistency 
semantics of single row mutations.
             : In this context we’ll discuss Kudu’s properties more from a 
key/value store standpoint. On the
             : other hand Kudu is an analytical storage engine so, for the read 
path, we’ll also discuss the
             : semantics of large (multi-row) scans. This moves the discussion 
more into the field of traditional
             : DBMSs. These ingredients make for a non-traditional discussion 
that is not exactly apples-to-apples
             : with what the reader might be familiar with, but our hope is 
that it still provides valuable, or
             : at least interesting, insight.
maybe consider leaving the write and read discussion to their respective 
sections, or move this entire highlighted portion to  the previous section (and 
leave this section as just "Relevant architectural components")


PS3, Line 113: Operation execution
"Transaction driver", "TransactionDriver"? trying to have each bullet's title 
be the component name


Line 117: replicated to follower replicas by a leader replica following the 
Raft consensus algorithm[ref].
", and thus, each tablet owns its own Consensus module."


PS3, Line 121: Timestamp assignment and propagation
maybe "Timestamp and Clock"? Or even have a separate bullet for "Clock".


PS3, Line 130: landed
"been added to"?


PS3, Line 130: A request is not considered “committed” until is has landed on
             : the WAL of a majority of replicas
Consider inverting this sentence, since we're introducing a bunch of terms here 
anyway:
"A request is considered "committed" when it has been appended to a majority of 
replicas' WALs and registered with MVCC as committed" (and then maybe put the 
definition for WAL after MVCC)


PS3, Line 132: This component serves a dual purpose storing the original client
             : requests and how they mutated each tablet’s internal state after 
being applied
how about: "This component serves two purposes: it stores the original logical 
client-requested op (i.e. inserts, updates, deletes), and the physical change 
to the tablet after the op is applied (i.e. flushes and compactions).


PS3, Line 136: Scans take a snapshots
"A scan will take a snapshot"


PS3, Line 136: they start executing and use
"it starts executing, and will use"


PS3, Line 137: unfinished operations as a tablet is scanned
"unfinished tablet operations as it progresses"


PS3, Line 139: component
remove


PS3, Line 159: 2. Timestamp assignment
Note that this is done by the leader


PS3, Line 161:  follow up posts
"a later post"


PS3, Line 161: now the reader can just think of
             :     it
"a timestamp can be thought of"


PS3, Line 162: That timestamp
This


PS3, Line 163: related operations
Would "any two operations" be accurate as well? Or does "related operations" 
refer to something specific (e.g. operations on the same machine, in the same 
config, etc.)?


PS3, Line 164: Register the operation with MVCC
super nit: for consistency with previous bullets: "Registration of operation 
with MVCC"

same for the below bullets, or could make the above bullets "Acquire locks" and 
"Assign timestamps"


PS3, Line 166:  and they don’t return u
missing something?


PS3, Line 169: to consensus 
"to the Consensus module"? "to Consensus"? I don't think "submitting to 
consensus" has caught on yet with the layman, so maybe specify that consensus 
is the module you mentioned in the above section.

Alternatively just "submit the operation to be replicated" but that loses out 
on specificity.


Line 170:     the operation follower replicas will execute the prepare phase in 
the exact same sequence as the
Don't they also send an ACK/vote to the leader?


Line 172:     the write-ahead-log.
And this will only happen when it receives enough votes, right?

Could you also note when the followers write to the WAL?


PS3, Line 173: the 
> remove
Is successful completion referring to the completion across all replicas? A 
majority of replicas? Just the leader?


PS3, Line 178: For this blog post the relevant aspects
             :     of this are: a) contrary to the Prepare phase this phase is 
multi-threaded; b) in the large
             :     majority of cases apply only changes in-memory data 
structures. Regarding a) since operations
             :     that mutate the same rows only move to the apply phase one 
at a time due to lock acquisition
             :     (step 2.1) it is now safe to have multiple operations be 
executed at the same 
I think this a/b is a bit confusing. Maybe:
For this blog post, there are a couple of key points to keep in mind. First, in 
most cases, the Apply phase only changes in-memory data structures, and this is 
safe even in the event of a crash because logical operations have already been 
written to the WAL. Second, contrary to the Prepare phase, this phase is 
multi-threaded, and since operations that mutate the sa... at the same time.

Also consider bringing this out of the numbered steps and into its own bullet 
or paragraph, since b) is already noted below.


PS3, Line 184: no need
"no immediate need"


PS3, Line 184: consensus
"the on-disk WAL"


PS3, Line 187:  write doesn’t 
"These changes don't"


-- 
To view, visit http://gerrit.cloudera.org:8080/7019
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to