Daniel: By random read, I meant not reading the data sequentially as is the
norm in Kafka, not necessarily a random disk seek. That in-memory data
structure is what enables the random read. You're either going to need the
disk seek if the data isn't in the fs cache or you're trading memory to
avoid it. If it's a full index containing keys and values then you're
potentially committing to a much larger JVM memory footprint (and all the
GC issues that come with it) since you'd be storing that data in the JVM
heap. If you're only storing the keys + offset info, then you potentially
introduce random disk seeks on any CAS operation (and make page caching
harder for the OS, etc.).
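
To make the trade-off concrete, here's a rough sketch of the "keys + offset"
variant as a plain per-partition map (the class and method names are made up
for illustration; this isn't an existing Kafka interface):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Per-partition table: record key -> offset of the last message with
    // that key. Only keys and offsets live on the heap; the values stay in
    // the log, so any compare against the *value* still needs a log read
    // (and possibly a disk seek if the segment isn't in the page cache).
    public class KeyOffsetTable {
        private final Map<String, Long> lastOffsetByKey = new ConcurrentHashMap<>();

        // Update as each message is appended to the partition.
        public void record(String key, long offset) {
            lastOffsetByKey.put(key, offset);
        }

        // Conditional-append check: true only if the producer's expected
        // offset matches the latest offset for the key (-1L = key unseen).
        public boolean matches(String key, long expectedOffset) {
            return lastOffsetByKey.getOrDefault(key, -1L) == expectedOffset;
        }
    }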


On Sat, Jun 13, 2015 at 11:33 AM, Daniel Schierbeck <
daniel.schierb...@gmail.com> wrote:

> Ewen: would single-key CAS necessitate random reads? My idea was to have
> the broker maintain an in-memory table that could be rebuilt from the log
> or a snapshot.
> On Sat. 13 Jun. 2015 at 20.26 Ewen Cheslack-Postava <e...@confluent.io>
> wrote:
>
> > Jay - I think you need broker support if you want CAS to work with
> > compacted topics. With the approach you described you can't turn on
> > compaction since that would make it last-writer-wins, and using any
> > non-infinite retention policy would require some external process to
> > monitor keys that might expire and refresh them by rewriting the data.
> >
> > That said, I think any addition like this warrants a lot of discussion
> > about potential use cases, since there are a lot of ways you could go
> > about adding support for something like this. I think this is an obvious
> > next incremental step, but someone is bound to have a use case that would
> > require multi-key CAS and would be costly to build atop single-key CAS.
> > Or, since the compare requires a random read anyway, why not throw in
> > read-by-key rather than sequential log reads, which would allow for
> > minitransactions a la Sinfonia?
> >
> > I'm not convinced trying to make Kafka support traditional key-value
> > store functionality is a good idea. Compacted topics made it possible to
> > use it a bit more in that way, but didn't change the public interface,
> > only the way storage was implemented, and importantly all the potential
> > additional performance costs & data structures are isolated to background
> > threads.
> >
> > -Ewen
> >
> > On Sat, Jun 13, 2015 at 9:59 AM, Daniel Schierbeck <
> > daniel.schierb...@gmail.com> wrote:
> >
> > > @Jay:
> > >
> > > Regarding your first proposal: wouldn't that mean that a producer
> > > wouldn't know whether a write succeeded? In the case of event sourcing,
> > > a failed CAS may require re-validating the input with the new state.
> > > Simply discarding the write would be wrong.
> > >
> > > As for the second idea: how would a client of the writer service know
> > > which writer is the leader? For example, how would a load balancer know
> > > which web app process to route requests to? Ideally, all processes
> > > would be able to handle requests.
> > >
> > > Using conditional writes would allow any producer to write and provide
> > > synchronous feedback to the producers.
> > > On Fri. 12 Jun. 2015 at 18.41 Jay Kreps <j...@confluent.io> wrote:
> > >
> > > > I have been thinking a little about this. I don't think CAS actually
> > > > requires any particular broker support. Rather, the two writers just
> > > > write messages with some deterministic check-and-set criteria, and all
> > > > the replicas read from the log and check these criteria before
> > > > applying the write. This mechanism has the downside that it creates
> > > > additional writes when there is a conflict and requires waiting on the
> > > > full roundtrip (write and then read), but it has the advantage that it
> > > > is very flexible as to the criteria you use.
> > > >
> > > > An alternative strategy for accomplishing the same thing a bit more
> > > > efficiently is to elect leaders amongst the writers themselves. This
> > > > would require broker support for a single writer to avoid the
> > > > possibility of split brain. I like this approach better because the
> > > > leader for a partition can then do anything it wants on its local data
> > > > to make the decision of what is committed; however, the downside is
> > > > that the mechanism is more involved.
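> > > >
> > > > Roughly, the first scheme could look like this on the replica side (a
> > > > sketch only; the class names are made up and none of this is an
> > > > existing Kafka API):
> > > >
> > > >     import java.util.HashMap;
> > > >     import java.util.Map;
> > > >     import java.util.Objects;
> > > >
> > > >     // A proposed write carrying its check-and-set criteria: here, the
> > > >     // expected current value for the key.
> > > >     class Proposal {
> > > >         final String key, expectedValue, newValue;
> > > >         Proposal(String key, String expectedValue, String newValue) {
> > > >             this.key = key;
> > > >             this.expectedValue = expectedValue;
> > > >             this.newValue = newValue;
> > > >         }
> > > >     }
> > > >
> > > >     // Every replica replays the log and applies the same deterministic
> > > >     // check, so conflicting proposals are skipped identically everywhere
> > > >     // without any coordination at write time.
> > > >     class ReplayingReplica {
> > > >         private final Map<String, String> state = new HashMap<>();
> > > >
> > > >         void apply(Proposal p) {
> > > >             if (Objects.equals(state.get(p.key), p.expectedValue)) {
> > > >                 state.put(p.key, p.newValue);
> > > >             }
> > > >             // else: lost the race; the writer only learns this by
> > > >             // reading its proposal's outcome back from the log.
> > > >         }
> > > >     }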
> > > >
> > > > -Jay
> > > >
> > > > On Fri, Jun 12, 2015 at 6:43 AM, Ben Kirwin <b...@kirw.in> wrote:
> > > >
> > > > > Gwen: Right now I'm just looking for feedback -- but yes, if folks
> > > > > are interested, I do plan to do that implementation work.
> > > > >
> > > > > Daniel: Yes, that's exactly right. I haven't thought much about
> > > > > per-key... it does sound useful, but the implementation seems a bit
> > > > > more involved. Want to add it to the ticket?
> > > > >
> > > > > On Fri, Jun 12, 2015 at 7:49 AM, Daniel Schierbeck
> > > > > <daniel.schierb...@gmail.com> wrote:
> > > > > > Ben: your solution seems to focus on partition-wide CAS. Have you
> > > > > > considered per-key CAS? That would make the feature more useful in
> > > > > > my opinion, as you'd greatly reduce the contention.
> > > > > >
> > > > > > On Fri, Jun 12, 2015 at 6:54 AM Gwen Shapira <gshap...@cloudera.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Hi Ben,
> > > > > >>
> > > > > >> Thanks for creating the ticket. Having check-and-set capability
> > > > > >> will be sweet :)
> > > > > >> Are you planning to implement this yourself? Or is it just an idea
> > > > > >> for the community?
> > > > > >>
> > > > > >> Gwen
> > > > > >>
> > > > > >> On Thu, Jun 11, 2015 at 8:01 PM, Ben Kirwin <b...@kirw.in> wrote:
> > > > > >> > As it happens, I submitted a ticket for this feature a couple
> > > > > >> > days ago:
> > > > > >> >
> > > > > >> > https://issues.apache.org/jira/browse/KAFKA-2260
> > > > > >> >
> > > > > >> > Couldn't find any existing proposals for similar things, but
> > > > > >> > it's certainly possible they're out there...
> > > > > >> >
> > > > > >> > On the other hand, I think you can solve your particular issue
> > > > > >> > by reframing the problem: treating the messages as 'requests' or
> > > > > >> > 'commands' instead of statements of fact. In your flight-booking
> > > > > >> > example, the log would correctly reflect that two different
> > > > > >> > people tried to book the same flight; the stream consumer would
> > > > > >> > be responsible for finalizing one booking, and notifying the
> > > > > >> > other client that their request had failed. (In-browser or by
> > > > > >> > email.)
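> > > > > >> >
> > > > > >> > A sketch of what that arbitrating consumer could look like with
> > > > > >> > the plain Java clients (topic names and the first-wins rule are
> > > > > >> > made up for illustration):
> > > > > >> >
> > > > > >> >     import java.util.*;
> > > > > >> >     import org.apache.kafka.clients.consumer.*;
> > > > > >> >     import org.apache.kafka.clients.producer.*;
> > > > > >> >
> > > > > >> >     // Reads booking *requests* in log order and arbitrates: the
> > > > > >> >     // first request for a seat wins, later ones are rejected and
> > > > > >> >     // the outcome is published on a results topic.
> > > > > >> >     public class BookingArbiter {
> > > > > >> >         public static void main(String[] args) {
> > > > > >> >             Properties props = new Properties();
> > > > > >> >             props.put("bootstrap.servers", "localhost:9092");
> > > > > >> >             props.put("group.id", "booking-arbiter");
> > > > > >> >             props.put("key.deserializer",
> > > > > >> >                 "org.apache.kafka.common.serialization.StringDeserializer");
> > > > > >> >             props.put("value.deserializer",
> > > > > >> >                 "org.apache.kafka.common.serialization.StringDeserializer");
> > > > > >> >             props.put("key.serializer",
> > > > > >> >                 "org.apache.kafka.common.serialization.StringSerializer");
> > > > > >> >             props.put("value.serializer",
> > > > > >> >                 "org.apache.kafka.common.serialization.StringSerializer");
> > > > > >> >
> > > > > >> >             KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
> > > > > >> >             KafkaProducer<String, String> producer = new KafkaProducer<>(props);
> > > > > >> >             consumer.subscribe(Collections.singletonList("booking-requests"));
> > > > > >> >
> > > > > >> >             Set<String> takenSeats = new HashSet<>();
> > > > > >> >             while (true) {
> > > > > >> >                 for (ConsumerRecord<String, String> req : consumer.poll(100)) {
> > > > > >> >                     String seat = req.key();  // e.g. "flight-42/17A"
> > > > > >> >                     String outcome =
> > > > > >> >                         takenSeats.add(seat) ? "confirmed" : "rejected";
> > > > > >> >                     producer.send(
> > > > > >> >                         new ProducerRecord<>("booking-results", seat, outcome));
> > > > > >> >                 }
> > > > > >> >             }
> > > > > >> >         }
> > > > > >> >     }
> > > > > >> >
> > > > > >> > (This assumes all requests for a given seat land on one partition,
> > > > > >> > so a single arbiter instance sees them in order.)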
> > > > > >> >
> > > > > >> > On Wed, Jun 10, 2015 at 5:04 AM, Daniel Schierbeck
> > > > > >> > <daniel.schierb...@gmail.com> wrote:
> > > > > >> >> I've been working on an application which uses Event Sourcing,
> > > > > >> >> and I'd like to use Kafka as opposed to, say, a SQL database to
> > > > > >> >> store events. This would allow me to easily integrate other
> > > > > >> >> systems by having them read off the Kafka topics.
> > > > > >> >>
> > > > > >> >> I do have one concern, though: the consistency of the data can
> > > > > >> >> only be guaranteed if a command handler has a complete picture
> > > > > >> >> of all past events pertaining to some entity.
> > > > > >> >>
> > > > > >> >> As an example, consider an airline seat reservation system.
> > > > > >> >> Each reservation command issued by a user is rejected if the
> > > > > >> >> seat has already been taken. If the seat is available, a record
> > > > > >> >> describing the event is appended to the log. This works great
> > > > > >> >> when there's only one producer, but in order to scale I may
> > > > > >> >> need multiple producer processes. This introduces a race
> > > > > >> >> condition: two command handlers may simultaneously receive a
> > > > > >> >> command to reserve the same seat. The event log indicates that
> > > > > >> >> the seat is available, so each handler will append a
> > > > > >> >> reservation event – thus double-booking that seat!
> > > > > >> >>
> > > > > >> >> I see three ways around that issue:
> > > > > >> >> 1. Don't use Kafka for this.
> > > > > >> >> 2. Force a single producer for a given flight. This will impact
> > > > > >> >> availability and make routing more complex.
> > > > > >> >> 3. Have a way to do optimistic locking in Kafka.
> > > > > >> >>
> > > > > >> >> The latter idea would work either on a per-key basis or
> > > > > >> >> globally for a partition: when appending to a partition, the
> > > > > >> >> producer would indicate in its request that the request should
> > > > > >> >> be rejected unless the current offset of the partition is equal
> > > > > >> >> to x. For the per-key setup, Kafka brokers would track the
> > > > > >> >> offset of the latest message for each unique key, if so
> > > > > >> >> configured. This would allow the request to specify that it
> > > > > >> >> should be rejected if the offset for key k is not equal to x.
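> > > > > >> >>
> > > > > >> >> As a rough sketch of the partition-wide check on the broker side
> > > > > >> >> (the class and method names are hypothetical; nothing like this
> > > > > >> >> exists in Kafka today):
> > > > > >> >>
> > > > > >> >>     // Reject the append unless the producer's expected offset
> > > > > >> >>     // matches the current end of the partition's log.
> > > > > >> >>     public class ConditionalAppend {
> > > > > >> >>         private long logEndOffset = 0L;
> > > > > >> >>
> > > > > >> >>         // Returns the offset assigned to the new message, or -1
> > > > > >> >>         // to tell the producer that its condition failed.
> > > > > >> >>         public synchronized long tryAppend(byte[] message, long expectedOffset) {
> > > > > >> >>             if (expectedOffset != logEndOffset) {
> > > > > >> >>                 return -1L;  // someone else wrote first
> > > > > >> >>             }
> > > > > >> >>             // ...write the message to the real log here...
> > > > > >> >>             return logEndOffset++;
> > > > > >> >>         }
> > > > > >> >>     }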
> > > > > >> >>
> > > > > >> >> This way, only one of the command handlers would succeed in
> > > > > >> >> writing to Kafka, thus ensuring consistency.
> > > > > >> >>
> > > > > >> >> There are different levels of complexity associated with
> > > > > >> >> implementing this in Kafka depending on whether the feature
> > > > > >> >> would work per-partition or per-key:
> > > > > >> >> * For the per-partition optimistic locking, the broker would
> > > > > >> >> just need to keep track of the high water mark for each
> > > > > >> >> partition and reject conditional requests when the offset
> > > > > >> >> doesn't match.
> > > > > >> >> * For per-key locking, the broker would need to maintain an
> > > > > >> >> in-memory table mapping keys to the offset of the last message
> > > > > >> >> with that key. This should be fairly easy to maintain and
> > > > > >> >> recreate from the log if necessary. It could also be saved to
> > > > > >> >> disk as a snapshot from time to time in order to cut down the
> > > > > >> >> time needed to recreate the table on restart. There's a small
> > > > > >> >> performance penalty associated with this, but it could be
> > > > > >> >> opt-in for a topic. (A rough sketch of the rebuild step is
> > > > > >> >> below.)
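> > > > > >> >>
> > > > > >> >> A minimal sketch of that recovery path, assuming the table is
> > > > > >> >> just a key -> last-offset map and the snapshot records the
> > > > > >> >> offset it was taken at (all of the types below are
> > > > > >> >> hypothetical):
> > > > > >> >>
> > > > > >> >>     import java.util.HashMap;
> > > > > >> >>     import java.util.Map;
> > > > > >> >>
> > > > > >> >>     interface LogEntry { String key(); long offset(); }
> > > > > >> >>     interface Log { Iterable<LogEntry> readFrom(long offset); }
> > > > > >> >>     interface Snapshot { Map<String, Long> table(); long endOffset(); }
> > > > > >> >>
> > > > > >> >>     // On restart, start from the last snapshot and replay only
> > > > > >> >>     // the log entries appended after the snapshot was taken.
> > > > > >> >>     class KeyOffsetRecovery {
> > > > > >> >>         Map<String, Long> rebuild(Snapshot snapshot, Log log) {
> > > > > >> >>             Map<String, Long> table = new HashMap<>(snapshot.table());
> > > > > >> >>             for (LogEntry e : log.readFrom(snapshot.endOffset())) {
> > > > > >> >>                 table.put(e.key(), e.offset());
> > > > > >> >>             }
> > > > > >> >>             return table;
> > > > > >> >>         }
> > > > > >> >>     }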
> > > > > >> >>
> > > > > >> >> Am I the only one thinking about using Kafka like this? Would
> > > > > >> >> this be a nice feature to have?
> > > > > >>
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Thanks,
> > Ewen
> >
>



-- 
Thanks,
Ewen
