Re: CASSANDRA-10993 Approaches

2016-08-18 Thread Stefan Podkowinski
From my perspective, one of the most important reasons for RxJava would be
the strategic option to integrate reactive streams [1] into the overall
Cassandra architecture at some point in the future. Reactive streams would
allow us to design back pressure in a fundamentally different way compared
to what we do in the current work-queue based execution model. Think about
the optimizations currently deployed to walk a thin line between throughput,
latency and GC pressure. Or about the lack of coordination between
individual processes such as compactions, streaming and client requests
that affect each other; where we can just hope that clients back off due to
latency-aware policies, that streams will eventually time out, or that
compactions get enough work done at some point. We even have to tell people
to tune batch sizes to not overwhelm nodes in the cluster. Squeezing out n%
during performance tests is nice, but IMO 10993 should also address how to
get more control over system resource usage, and a reactive stream based
approach could help with that.

[1] https://github.com/ReactiveX/RxJava/wiki/Reactive-Streams
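To make the back-pressure idea concrete, here is a minimal sketch of the
reactive-streams `request(n)` protocol, written against the
`java.util.concurrent.Flow` interfaces that Java 9 later standardized from
that spec. It is illustrative only (not Cassandra code, and the names are
made up): the consumer, not the producer, dictates the pace by requesting
one item at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class BackpressureSketch {
    public static void main(String[] args) throws Exception {
        List<Integer> received = new ArrayList<>();
        CountDownLatch done = new CountDownLatch(1);

        // The subscriber decides how fast data flows: it pulls exactly
        // one item at a time instead of being flooded by the producer.
        Flow.Subscriber<Integer> slowConsumer = new Flow.Subscriber<>() {
            private Flow.Subscription subscription;
            public void onSubscribe(Flow.Subscription s) {
                subscription = s;
                s.request(1);              // pull the first item
            }
            public void onNext(Integer item) {
                received.add(item);        // "process" the item...
                subscription.request(1);   // ...then ask for exactly one more
            }
            public void onError(Throwable t) { done.countDown(); }
            public void onComplete() { done.countDown(); }
        };

        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(slowConsumer);
            // submit() itself blocks once the subscriber's buffer fills,
            // which is the back pressure propagating to the producer.
            for (int i = 0; i < 5; i++) publisher.submit(i);
        }
        done.await();
        System.out.println(received);
    }
}
```

The same `request(n)` contract is what RxJava exposes, so a stream-based
read/write path could, in principle, let slow stages throttle upstream
producers instead of relying on timeouts and client retry policies.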


On Wed, Aug 17, 2016 at 9:54 PM, Jake Luciani  wrote:

> I think I outlined the tradeoffs I see between rolling our own vs. using
> a reactive framework in
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-10528
>
> My view is we should try to utilize an existing framework before we start
> writing our own. And even if we do write our own, keep it reactive, since
> reactive APIs are going to be adopted in the Java 9 spec.  There is an
> entire community out there thinking about asynchronous programming that we
> can tap into.
>
> I don't buy the argument (yet) that Rx or other libraries lack the control
> we need. In fact, these APIs are quite extensible.
>
> On Aug 17, 2016 3:08 PM, "Tyler Hobbs"  wrote:
>
> > In the spirit of the recent thread about discussing large changes on the
> > Dev ML, I'd like to talk about CASSANDRA-10993, the first step in the
> > "thread per core" work.
> >
> > The goal of 10993 is to transform the read and write paths into an
> > event-driven model powered by event loops.  This means that each request
> > can be handled on a single thread (although typically broken up into
> > multiple steps, depending on I/O and locking) and the old mutation and
> > read thread pools can be removed.  So far, we've prototyped this with a
> > couple of approaches:
> >
> > The first approach models each request as a state machine (or composition
> > of state machines).  For example, a single write request is encapsulated
> > in a WriteTask object which moves through a series of states as portions
> > of the write complete (allocating a commitlog segment, syncing the
> > commitlog, receiving responses from remote replicas).  These state
> > transitions are triggered by Events that are emitted by, e.g., the
> > CommitlogSegmentManager.  The event loop that manages tasks, events,
> > timeouts, and scheduling is custom and is (currently) closely tied to a
> > Netty event loop.  Here are a couple of example classes to take a look at:
> >
> > WriteTask:
> > https://github.com/thobbs/cassandra/blob/CASSANDRA-10993-WIP/src/java/org/apache/cassandra/poc/WriteTask.java
> > EventLoop:
> > https://github.com/thobbs/cassandra/blob/CASSANDRA-10993-WIP/src/java/org/apache/cassandra/poc/EventLoop.java
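As a rough illustration of the first approach, here is a much-simplified,
hypothetical sketch (not the actual WriteTask or EventLoop code): a task
carries an explicit state and advances only when the event loop hands it an
event, so no thread ever blocks waiting on the commitlog or on replicas.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical, much-simplified sketch of the state-machine approach.
public class WriteTaskSketch {
    enum State { START, AWAITING_COMMITLOG_SYNC, AWAITING_REPLICAS, DONE }
    enum Event { COMMITLOG_SYNCED, REPLICA_RESPONSE }

    private State state = State.START;
    private int pendingReplicas;

    WriteTaskSketch(int replicas) { this.pendingReplicas = replicas; }

    void start() { state = State.AWAITING_COMMITLOG_SYNC; }

    // The event loop calls this when an event targeted at this task fires;
    // the task advances its state instead of blocking a thread.
    void handle(Event event) {
        switch (state) {
            case AWAITING_COMMITLOG_SYNC:
                if (event == Event.COMMITLOG_SYNCED)
                    state = State.AWAITING_REPLICAS;
                break;
            case AWAITING_REPLICAS:
                if (event == Event.REPLICA_RESPONSE && --pendingReplicas == 0)
                    state = State.DONE;
                break;
            default:
                break;
        }
    }

    State state() { return state; }

    public static void main(String[] args) {
        WriteTaskSketch task = new WriteTaskSketch(2);
        task.start();
        // A single-threaded "event loop" drains events and advances the task.
        Queue<Event> loop = new ArrayDeque<>();
        loop.add(Event.COMMITLOG_SYNCED);
        loop.add(Event.REPLICA_RESPONSE);
        loop.add(Event.REPLICA_RESPONSE);
        while (!loop.isEmpty()) task.handle(loop.poll());
        System.out.println(task.state());
    }
}
```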
> >
> > The second approach utilizes RxJava and the Observable pattern.  Where we
> > would wait for emitted events in the state machine approach, we instead
> > depend on an Observable to "push" the data/result we're awaiting.
> > Scheduling is handled by an Rx scheduler (which is customizable).  The
> > code changes required for this are, overall, less intrusive.  Here's a
> > quick example of what this looks like for high-level operations:
> > https://github.com/thobbs/cassandra/blob/rxjava-rebase/src/java/org/apache/cassandra/service/StorageProxy.java#L1724-L1732
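For contrast with the state-machine sketch, the second approach expresses
the same write as a chain of asynchronous stages that push results forward.
The sketch below uses the JDK's `CompletableFuture` rather than RxJava's
`Observable` so it stays dependency-free; the composition style is
analogous, and the stage names are invented for illustration.

```java
import java.util.concurrent.CompletableFuture;

public class PushStyleSketch {
    // Each stage would complete asynchronously in real life; here they
    // complete immediately so the sketch is self-contained.
    static CompletableFuture<String> appendToCommitlog(String mutation) {
        return CompletableFuture.completedFuture(mutation + " -> commitlog");
    }
    static CompletableFuture<String> awaitReplicas(String written) {
        return CompletableFuture.completedFuture(written + " -> acked");
    }

    public static void main(String[] args) {
        // Instead of a task polling for events, each completed stage pushes
        // its result into the next one -- the Observable style, JDK-only.
        String result = appendToCommitlog("mutation")
                .thenCompose(PushStyleSketch::awaitReplicas)
                .join();
        System.out.println(result);
    }
}
```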
> >
> > So far we've benchmarked both approaches on in-memory reads to get an
> > idea of the upper-bound performance of each.  Throughput appears to be
> > very similar with both branches.
> >
> > There are a few considerations, up for debate, as to which approach we
> > should go with, and I would appreciate input on them.
> >
> > First, performance.  There are concerns that going with Rx (or something
> > similar) may limit the peak performance we can eventually attain in a
> > couple of ways.  First, we don't have as much control over the event
> > loop, scheduling, and chunking of tasks.  With the state machine
> > approach, we're writing all of this, so it's totally under our control.
> > With Rx, a lot of things are customizable or already have decent tools,
> > but this may come up short in critical ways.  Second, the overhead of
> > the Observable machinery may become significant as other bottlenecks are
> > removed.  Of course, WriteTask 

MAX_COMPACTING_L0, is it still important to enforce?

2016-08-18 Thread Wei Deng
I was digging into LCS code lately, and found the following comments (note
the last paragraph "that would be ideal, but we can't"):

"// The problem is that L0 has a much higher score (almost 250)
than L1 (11), so what we'll
// do is compact a batch of MAX_COMPACTING_L0 sstables with all 117
L1 sstables, and put the
// result (say, 120 sstables) in L1. Then we'll compact the next
batch of MAX_COMPACTING_L0,
// and so forth.  So we spend most of our i/o rewriting the L1 data
with each batch.
//
// If we could just do *all* L0 a single time with L1, that would
be ideal.  But we can't
// -- see the javadoc for MAX_COMPACTING_L0."
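To make the cost described in that comment concrete, here is a
back-of-the-envelope calculation using the illustrative numbers from the
comment itself (250 backlogged L0 sstables, 117 L1 sstables, batches of 32):

```java
public class L0BatchCost {
    public static void main(String[] args) {
        int l0Sstables = 250;        // backlogged L0 (score ~250 in the comment)
        int l1Sstables = 117;        // resident L1 sstables
        int maxCompactingL0 = 32;    // the hard-coded batch limit

        // Each batch must merge with (roughly) all of L1, so the L1 data
        // is rewritten once per batch.
        int batches = (l0Sstables + maxCompactingL0 - 1) / maxCompactingL0; // ceil
        int l1Rewrites = batches * l1Sstables;

        System.out.println(batches);     // number of L0 batches
        System.out.println(l1Rewrites);  // L1 sstable rewrites across all batches
    }
}
```

With these numbers, L1 gets rewritten 8 times over (936 sstable rewrites)
to absorb a single L0 backlog, which is the "most of our i/o" the comment
complains about.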

And then when I read the MAX_COMPACTING_L0 javadoc referenced above:

"/**
 * limit the number of L0 sstables we do at once, because compaction
bloom filter creation
 * uses a pessimistic estimate of how many keys overlap (none), so we
risk wasting memory
 * or even OOMing when compacting highly overlapping sstables
 */"

I'm starting to wonder if this is still a concern post C* 2.1 given that
we've implemented CASSANDRA-5906. Here is an excerpt from Jonathan's blog
post (
http://www.datastax.com/dev/blog/improving-compaction-in-cassandra-with-cardinality-estimation)
on what motivated 5906 to be implemented:

"Because bloom filters are not re-sizeable, we need to pre-allocate them at
the start of the compaction, but at the start of the compaction, we don’t
know how much the sstables being compacted overlap. Since bloom filter
performance deteriorates dramatically when over-filled, we allocate our
bloom filters to be large enough even if the sstables do not overlap at
all. Which means that if they do overlap (which they should if compaction
is doing a good job picking candidates), then we waste space — up to 100%
per sstable compacted."

Since we have had 5906 addressing this very issue for a few years now, does
it make sense to revisit the MAX_COMPACTING_L0 choice (hard-coded to 32),
given that the "bloom filter wasting memory" concern is no longer there? I
would imagine this could potentially improve backlogged LCS behavior when
we have thousands of L0 SSTables.

Thanks.

-Wei


Re: Contribute to Cassandra Wiki Third Party Support

2016-08-18 Thread Michael Shuler
It looks like Dave replied to your first email that he added your wiki
user to be able to edit pages. Did your login not work properly, or did
you get some sort of error editing the wiki?

(cc'ed directly, too)

-- 
Kind regards,
Michael

On 08/18/2016 08:15 AM, Danielle Blake wrote:
> Hi,
> 
> I emailed you last week in regards to contributing to
> https://wiki.apache.org/cassandra/ThirdPartySupport and have not heard a
> response yet.
> I would like to get the company that I work for, OpenCredo, on this page,
> as we are DataStax Certified experts.
> 
> Username: Danielle Blake
> 
> Kind regards,
> 
> Danielle Blake



Contribute to Cassandra Wiki Third Party Support

2016-08-18 Thread Danielle Blake
Hi,

I emailed you last week in regards to contributing to
https://wiki.apache.org/cassandra/ThirdPartySupport and have not heard a
response yet.
I would like to get the company that I work for, OpenCredo, on this page,
as we are DataStax Certified experts.

Username: Danielle Blake

Kind regards,

Danielle Blake

Marketing Executive

M. +44 (0) 7403 565 785

T.  +44 (0) 207 928 9200

Opencredo.com . Twitter  . LinkedIn


OpenCredo Ltd -- Excellence in Enterprise Application Development

Registered Office:  5-11 Lavington St, London SE1 0NZ

Registered in UK. No 3943999

If you have received this e-mail in error please accept our apologies,
destroy it immediately, and it would be greatly appreciated if you
notified the sender.  It is your responsibility to
protect your system from viruses and any other harmful code or device.  We
try to eliminate them from e-mails and attachments; but we accept no
liability for any that remain. We may monitor or access any or all e-mails
sent to us.
