Re: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-18 Thread Ben Slater
I wanted to provide a bit of background on the interest we've seen in this
ticket/feature (at Instaclustr). Essentially it comes down to in-db
encryption at rest being a feature that compliance people are used to
seeing in databases, and their having a very hard time believing that
operating system level encryption is an equivalent control (whatever the
reality may be). I've seen this be a significant obstacle for people who
want to adopt Apache Cassandra many times, and an insurmountable obstacle on
multiple occasions. From what I've seen, I think this is one of the most
watched tickets with the most "is this coming soon" comments in the project
backlog, and we are pretty regularly asked whether we know if/when it's
coming.

That said, I completely agree that we don't want to be engaging in security
theatre or "introducing something that is either insecure or too slow to
be useful", and I think there are some really good suggestions in this
thread to come up with a strong solution for what will undoubtedly be a
pretty complex and major change.

Cheers
Ben




On Wed, 17 Nov 2021 at 03:34, Joseph Lynch  wrote:

> For FDE you'd probably have  the key file in a tmpfs pulled from a
> remote secret manager and when the machine boots it mounts the
> encrypted partition that contains your data files. I'm not aware of
> anyone doing FDE with a password in production. If you wanted
> selective encryption it would make sense to me to support placing
> keyspaces on different data directories (this may already be possible)
> but since crypto in the kernel is so cheap I don't know why you'd do
> selective encryption. Also I think it's worth noting many hosting
> providers (e.g. AWS) just encrypt the disks for you so you can check
> the "data is encrypted at rest" box.
>
> I think Cassandra will be pretty handicapped by being in the JVM which
> generally has very slow crypto. I'm slightly concerned that we're
> already slow at streaming and compaction, and adding slow JVM crypto
> will make C* even less competitive. For example, if we have to disable
> full sstable streaming (zero copy or otherwise) I think that would be
> very unfortunate (although Bowen's approach of sharing one secret
> across the cluster and then having files use a key derivation function
> may avoid that). Maybe if we did something like CASSANDRA-15294 [1] to
> try to offload to native crypto like how internode networking did with
> tcnative to fix the perf issues with netty TLS with JVM crypto I'd
> feel a little less concerned but ... crypto that is both secure and
> performant in the JVM is a hard problem ...
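
Bowen's idea of one cluster-wide secret combined with a per-file key derivation function could look roughly like the sketch below. It is only an illustration, not how Cassandra or CASSANDRA-9633 does it: it assumes HKDF (RFC 5869) with HMAC-SHA256, and the helper `sstable_key`, the file labels and the example secret are all hypothetical.

```python
import hashlib
import hmac


def hkdf_sha256(secret: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with HMAC-SHA256: extract, then expand."""
    if length > 255 * 32:
        raise ValueError("requested key is too long for HKDF-SHA256")
    # Extract: condense the input secret into a fixed-size pseudorandom key.
    prk = hmac.new(salt or b"\x00" * 32, secret, hashlib.sha256).digest()
    # Expand: stretch the pseudorandom key into `length` bytes bound to `info`.
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def sstable_key(cluster_secret: bytes, sstable_id: str) -> bytes:
    """Hypothetical helper: derive a per-file key from the shared secret."""
    return hkdf_sha256(cluster_secret, salt=b"", info=sstable_id.encode("utf-8"))


# Hypothetical values, for illustration only.
secret = b"cluster-wide secret from a key manager"
key_a = sstable_key(secret, "ks1/table1/file-1")
key_b = sstable_key(secret, "ks1/table1/file-2")
```

Under these assumptions, any node that holds the shared secret can re-derive the key for a file it receives, which is why this approach might leave whole-file streaming usable.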
>
> I guess I'm just concerned we're going to introduce something that is
> either insecure or too slow to be useful.
>
> -Joey
>
> On Tue, Nov 16, 2021 at 8:10 AM Bowen Song  wrote:
> >
> > I don't like the idea of FDE (Full Disk Encryption) as an alternative to
> > application managed encryption at rest. Each has its own advantages
> > and disadvantages.
> >
> > For example, if the encryption key is the same across nodes in the same
> > cluster, and Cassandra can share the key securely between authenticated
> > nodes, rolling restart of the servers will be a lot simpler than if the
> > servers were using FDE - someone will have to type in the passphrase on
> > each reboot, or have a script to mount the encrypted device over SSH and
> > then start Cassandra service after a reboot.
> >
> > Another valid use case of encryption implemented in Cassandra is to
> > selectively encrypt some tables, but leave others unencrypted. Doing
> > this outside Cassandra on the filesystem level is very tedious and
> > error-prone - lots of symlinks, and it is pretty hard to handle newly
> > created tables or keyspaces.
> >
> > However, I don't know if there's enough demand to justify the above use
> > cases.
> >
> >
> > On 16/11/2021 14:45, Joseph Lynch wrote:
> > > I think a CEP is wise (or a more thorough design document on the
> > > ticket) given how easy it is to do security incorrectly and key
> > > management, rotation and key derivation are not particularly
> > > straightforward.
> > >
> > > I am curious what advantage Cassandra implementing encryption has over
> > > asking the user to use an encrypted filesystem or disks instead where
> > > the kernel or device will undoubtedly be able to do the crypto more
> > > efficiently than we can in the JVM and we wouldn't have to further
> > > complicate the storage engine? I think the state of encrypted
> > > filesystems (e.g. LUKS on Linux) is significantly more user friendly
> > > these days than it was in 2015 when that ticket was created.
> > >
> > > If the application has existing exfiltration paths (e.g. backups) it's
> > > probably better to encrypt/decrypt in the backup/restore process via
> > > something extremely fast (and modern) like piping through age [1]
> > > isn't it?
> > >
> > > [1] https://github.com/FiloSottile/age
> > >
> > > -Joey
> > >
> > >
> > > On Sat, Nov 13, 2021 at 6:01 AM 

Re: [RELEASE] Apache Cassandra 4.0.0 released

2021-07-26 Thread Ben Slater
Congratulations and thank you to everyone involved in getting 4.0 released!
It has been a very impressive community effort.

---


*Ben Slater*
*Chief Product Officer*


<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


On Tue, 27 Jul 2021 at 07:34, Stefan Miklosovic <
stefan.mikloso...@instaclustr.com> wrote:

> Super exciting, congratulations everybody. Great times ahead!
>
> On Mon, 26 Jul 2021 at 22:19, Patrick McFadin  wrote:
> >
> > Wow. Just wow. Congratulations to everyone involved in this huge
> > milestone.
> >
> > On Mon, Jul 26, 2021 at 1:04 PM Brandon Williams wrote:
> >
> > > The Cassandra team is pleased to announce the release of Apache
> > > Cassandra version 4.0.0.
> > >
> > > Apache Cassandra is a fully distributed database. It is the right
> > > choice when you need scalability and high availability without
> > > compromising performance.
> > >
> > > http://cassandra.apache.org/
> > >
> > > Downloads of source and binary distributions are available in our
> > > download section:
> > >
> > > http://cassandra.apache.org/download/
> > >
> > > This version is the initial release in the 4.0 series. As always,
> > > please pay attention to the release notes[2] and Let us know[3] if you
> > > were to encounter any problem.
> > >
> > > Enjoy!
> > >
> > > [1]: CHANGES.txt
> > > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-4.0.0
> > > [2]: NEWS.txt
> > > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-4.0.0
> > > [3]: https://issues.apache.org/jira/browse/CASSANDRA
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >
> > >
>


Re: Project website analytics

2020-10-23 Thread Ben Slater
I'm sure I could hide a server like that in our AWS bill somewhere and also
happy to help out with getting it set up if there is general agreement to
go ahead.

Cheers
Ben


On Sat, 24 Oct 2020 at 01:01, Mick Semb Wever  wrote:

> > Yes, we would need a server donated. And (Brandon) I'm trying to chase
> > down what specs would be required for our traffic numbers. I see other
> > users mentioning 1-4GB ram server, but I have no idea what traffic
> > they are dealing with.
>
>
> One of the founders did a test that seems to confirm such a small
> server should work for our needs.
>
> https://plausible.discourse.group/t/hardware-recommendations-capacity-planning
>


Re: Project website analytics

2020-10-22 Thread Ben Slater
Hi Mick

I'm a bit unclear what you're looking for here - someone to donate a
server, someone to do admin or both? Or have I missed the point altogether?

Cheers
Ben


On Fri, 23 Oct 2020 at 07:43, Mick Semb Wever  wrote:

> > It may help gauge what's needed if we have some kind of ballpark idea
> > for what kind of resources may be needed for this. (The irony that
> > having analytics here could tell us is not lost on me.)
> > > For monthly traffic, if what we're seeing from ASF is a 30-day moving
> > average, then the site sees approx. 10-15K pageviews/month if that helps.
> > Source: https://uls.apache.org/exports/cassandra.apache.org.yaml
>
>
> I just got confirmation that those numbers are not averages. They are daily
> numbers.
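
To put the corrected figures in perspective, the arithmetic can be sketched as below. This is only a rough illustration in Python: it assumes traffic is spread evenly over the day (real traffic has peaks), and the function name is made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_pageviews_per_second(pageviews_per_day: int) -> float:
    """Average rate, assuming pageviews are spread evenly over the day."""
    return pageviews_per_day / SECONDS_PER_DAY


# The two ends of the 10-15K range, now read as daily numbers.
for daily in (10_000, 15_000):
    rate = average_pageviews_per_second(daily)
    print(f"{daily} pageviews/day is about {rate:.2f}/s on average, "
          f"or {daily * 30} over 30 days")
```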
>


Re: Supported upgrade path for 4.0

2020-10-11 Thread Ben Slater
Just to add to Mick's point, we (Instaclustr) have also been running and
recommending 3.11.x by default. It's currently by far the most common
version in our managed fleet and our last 3.0.x cluster will likely be
upgraded shortly. 3.11.x is also our recommendation for consulting and
support customers. I'd therefore support Mick's recommendation (really
based on our experience with and confidence in 3.11.x rather than being
able to point to specific issues off hand) that 2.*->3.11.x->4.0 is the
preferred upgrade path. We will do testing on 3.11.x to 4.0 upgrade but I
can't see us doing any work on 3.0 to 4.0.

Cheers
Ben


On Sun, 11 Oct 2020 at 06:42, Mick Semb Wever  wrote:

> > "3.11 performs close to parity with 2.1/2.2. 3.0 does not. If we recommend
> > people upgrade from 2.1 -> 3.0 -> 4.0, we are asking them to have a cluster
> > in a regressed performance state for potentially months as they execute
> > their upgrade."
> >
> > Did I get anything wrong here Mick? ^
> >
>
>
> That's correct Josh.
>
> From tickets like those listed, and from experience, we recommend folk
> avoid 3.0 altogether. This has only been made more evident by witnessing
> the benefits from 3.0 → 3.11 upgrades.
>
> My recommendation remains  2.*→3.11→4.0. And I don't believe I'm alone.
> Though if a user was already on 3.0, then I would (of course) recommend an
> upgrade directly to 4.0.
>
> I feel like I'm just splitting straws at this point, since we have accepted
> (folk willing to help with) both paths to 4.0, and I can't see how we stop
> recommending  2.*→3.11 upgrades.
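
The recommendation in this message can be written down as a small lookup. This is just a sketch of the path Mick describes (2.* via 3.11, clusters already on 3.0 or 3.11 going directly to 4.0); the function name is hypothetical and versions the thread does not discuss are rejected rather than guessed.

```python
from typing import List


def recommended_upgrade_path(current: str) -> List[str]:
    """Return the hops suggested in this thread for reaching 4.0."""
    major, minor = (int(part) for part in current.split(".")[:2])
    if major == 2:
        # 2.* goes via 3.11, skipping 3.0 entirely.
        return [current, "3.11", "4.0"]
    if (major, minor) in ((3, 0), (3, 11)):
        # Clusters already on 3.0 or 3.11 upgrade directly.
        return [current, "4.0"]
    raise ValueError(f"no recommendation in this thread for {current}")


print(recommended_upgrade_path("2.1"))
print(recommended_upgrade_path("3.0"))
```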
>


Re: [Discuss] num_tokens default in Cassandra 4.0

2020-02-18 Thread Ben Slater
In case it helps move the decision along, we moved to 16 vnodes as default
in Nov 2018 and haven't looked back (many clusters from 3-100s of nodes
later). The testing we did in making that decision is summarised here:
https://www.instaclustr.com/cassandra-vnodes-how-many-should-i-use/

Cheers
Ben


On Tue, 18 Feb 2020 at 18:44, Mick Semb Wever 
wrote:

> -1
>
> Discussions here and on slack have brought up a number of important
> concerns. I think those concerns need to be summarised here before any
> informal vote.
>
> It was my understanding that some of those concerns may even be blockers to
> a move to 16. That is, we have to presume the worst case scenario where all
> tokens get randomly generated.
>
> Can we ask for some analysis and data against the risks different
> num_tokens choices present. We shouldn't rush into a new default, and such
> background information and data is operator value added. Maybe I missed any
> info/experiments that have happened?
>
>
>
> On Mon., 17 Feb. 2020, 11:14 pm Jeremy Hanna, 
> wrote:
>
> > I just wanted to close the loop on this if possible.  After some discussion
> > in slack about various topics, I would like to see if people are okay with
> > num_tokens=8 by default (as it's not much different operationally than
> > 16).  Joey brought up a few small changes that I can put on the ticket.  It
> > also requires some documentation for things like decommission order and
> > skew.
> >
> > Are people okay with this change moving forward like this?  If so, I'll
> > comment on the ticket and we can move forward.
> >
> > Thanks,
> >
> > Jeremy
> >
> > On Tue, Feb 4, 2020 at 8:45 AM Jon Haddad  wrote:
> >
> > > I think it's a good idea to take a step back and get a high level view of
> > > the problem we're trying to solve.
> > >
> > > First, high token counts result in decreased availability as each node has
> > > data overlap with more nodes in the cluster.  Specifically, a node can
> > > share data with (RF-1) * 2 * num_tokens nodes.  So a 256 token cluster at
> > > RF=3 is going to almost always share data with every other node in the
> > > cluster that isn't in the same rack, unless you're doing something wild
> > > like using more than a thousand nodes in a cluster.  We advertise
> > >
> > > With 16 tokens, that is vastly improved, but you still have up to 64 nodes
> > > each node needs to query against, so you're again hitting every node
> > > unless you go above ~96 nodes in the cluster (assuming 3 racks / AZs).  I
> > > wouldn't use 16 here, and I doubt any of you would either.  I've advocated
> > > for 4 tokens because you'd have overlap with only 16 nodes, which works
> > > well for small clusters as well as large.  Assuming I was creating a new
> > > cluster for myself (in a hypothetical brand new application I'm building) I
> > > would put this in production.  I have worked with several teams where I
> > > helped them put 4 token clusters in prod and it has worked very well.  We
> > > didn't see any wild imbalance issues.
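
The numbers in the two paragraphs above follow from the (RF-1) * 2 * num_tokens bound. A short Python sketch of that arithmetic, assuming nodes are spread evenly across racks; the function names are made up for this illustration:

```python
def max_overlapping_nodes(rf: int, num_tokens: int) -> int:
    """Upper bound on the nodes one node shares data with: (RF-1) * 2 * num_tokens."""
    return (rf - 1) * 2 * num_tokens


def cluster_size_to_escape_full_overlap(rf: int, num_tokens: int, racks: int) -> float:
    """Cluster size above which a node stops overlapping with every node outside its rack.

    Assumes an even spread, so (racks - 1) / racks of the cluster sits in other racks.
    """
    return max_overlapping_nodes(rf, num_tokens) * racks / (racks - 1)


for tokens in (256, 16, 4):
    print(tokens,
          max_overlapping_nodes(3, tokens),
          cluster_size_to_escape_full_overlap(3, tokens, 3))
```

With RF=3 and 3 racks, 16 tokens gives the 64 overlapping nodes and the ~96 node threshold quoted above, and 4 tokens gives 16 overlapping nodes.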
> > >
> > > As Mick's pointed out, our current method of using random token assignment
> > > for the default number is problematic for 4 tokens.  I fully agree with
> > > this, and I think if we were to try to use 4 tokens, we'd want to address
> > > this in tandem.  We can discuss how to better allocate tokens by default
> > > (something more predictable than random), but I'd like to avoid the
> > > specifics of that for the sake of this email.
> > >
> > > To Alex's point, repairs are problematic with lower token counts due to
> > > over streaming.  I think this is a pretty serious issue and we'd have to

Re: Google Season of Docs 2019 for Apache Cassandra

2019-03-12 Thread Ben Slater
Hi Dinesh

Great idea. We should be able to find some Instaclustr people to help with
technical input (Stefan has already put his hand up).

I’m also happy to help with the application if that’s useful.

Cheers
Ben


On Wed, 13 Mar 2019 at 08:12, Dinesh Joshi 
wrote:

> Hi all,
>
> I came across GSoD 2019[1]. This is different from GSoC and focuses on
> improving documentation for Open Source projects. I think this would be
> beneficial for Cassandra especially with 4.0 coming up. However, working
> with a technical writer will require a substantial time commitment from us
> to bring them up to speed.
>
> Are there any volunteers to help guide the technical writer if Cassandra
> is picked as a project?
>
> On a side note, we can put together the application on the Confluence
> wiki. I will create a page and if anybody is interested in helping out with
> putting together the application, please feel free to collaborate on it.
>
> Thanks,
>
> Dinesh
>
> [1] https://developers.google.com/season-of-docs/docs/timeline


Re: CommitLog Recovery replay stop on first timestamp after restore-point-in-time

2018-12-20 Thread Ben Slater
I don’t have any personal knowledge of the fix but out of interest I took a
look in Jira and it sounds to me like the behaviour was fixed here
(in 2.0.10): https://issues.apache.org/jira/browse/CASSANDRA-6905


On Thu, 20 Dec 2018 at 21:07, Morten Vejen Nielsen  wrote:

> Hi,
>
> (Moved from user mailing list to here)
>
> I have found a statement in the Datastax documentation regarding CommitLog
> recovery that concerns me, namely:
>
> "*Restore stops when the first client-supplied timestamp is greater than
> the restore point timestamp. Because the order in which the database
> receives mutations does not strictly follow the timestamp order, this can
> leave some mutations unrecovered.*"
>
> From:
>
> https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configLogArchive.html
> Which to me means that point in time restore really doesn't guarantee point
> in time replay for the configured time, which is a problem since we expect
> to have mutations out of order in our setup.
>
> I conducted a few experiments on this myself by forcing my Cassandra
> instance to do CommitLog replay with changes ahead in time. But I was not
> able to reproduce this behavior.
> I used a fresh instance taken from the official Cassandra docker image to
> run the tests, so no changes to any configs were made other than setting the
> restore_point_in_time as specified below.
> I did the experiment as follows:
>
> --edit /etc/cassandra/commitlog_archiving.properties, set
> *restore_point_in_time* to something in the near future (let's say 2
> hours ahead of server-time)
>
> ssh into instance
>
> cqlsh
> create keyspace thezoo with replication =
> {'class':'SimpleStrategy','replication_factor':1};
> use thezoo;
> create table animal (id int primary key, name varchar);
> insert into animal (id, name) values (1, 'Bear1');
> insert into animal (id, name) values (2, 'Bear2');
> insert into animal (id, name) values (3, 'Bear3');
> insert into animal (id, name) values (4, 'Bear4');
> insert into animal (id, name) values (5, 'Bear5');
> insert into animal (id, name) values (6, 'Bear6');
> insert into animal (id, name) values (7, 'Bear7');
> insert into animal (id, name) values (8, 'Bear8');
> insert into animal (id, name) values (9, 'Bear9');
> insert into animal (id, name) values (10, 'Bear10');
> select id,name,writetime(name) from animal;
> --Add some to timestamp, and use this as future_timestamp, must be
> ahead of what was defined in commitlog config file
> insert into animal (id, name) values (11, 'DuckFromFuture') using
> timestamp 
> insert into animal (id, name) values (12, 'Bird1');
> insert into animal (id, name) values (13, 'Bird2');
> insert into animal (id, name) values (14, 'Bird3');
> insert into animal (id, name) values (15, 'Bird4');
> insert into animal (id, name) values (16, 'Bird5');
> insert into animal (id, name) values (17, 'Bird6');
> insert into animal (id, name) values (18, 'Bird7');
> insert into animal (id, name) values (19, 'Bird8');
> insert into animal (id, name) values (20, 'Bird9');
> insert into animal (id, name) values (21, 'Bird10');
>
> --Now I simply forced the machine off by holding the power button
> down, and restarted it
>
> --During startup verify that commitlog replay has been done in log
>
> ssh into instance and enter cqlsh
>
> cqlsh:thezoo> select * from animal;
>
> --Which shows all the bears and birds have been replayed but not the duck!
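
The difference between what the quoted documentation describes and what this experiment observed can be modelled with a toy replay loop. The Python sketch below is not Cassandra's code: the mutation tuples, timestamps and function names are invented for illustration, and whether the real comparison is strict is an assumption.

```python
from typing import Iterable, List, Tuple

Mutation = Tuple[str, int]  # (name, client-supplied write timestamp)


def replay_stop_at_first(mutations: Iterable[Mutation], restore_point: int) -> List[str]:
    """Documented behaviour: stop at the first mutation past the restore point."""
    replayed = []
    for name, timestamp in mutations:
        if timestamp > restore_point:
            break
        replayed.append(name)
    return replayed


def replay_skip(mutations: Iterable[Mutation], restore_point: int) -> List[str]:
    """Observed behaviour: skip only the mutations past the restore point."""
    return [name for name, timestamp in mutations if timestamp <= restore_point]


# Commit log order, with one mutation carrying a timestamp from the future.
log = [("Bear1", 100), ("Bear2", 101), ("DuckFromFuture", 10_000),
       ("Bird1", 102), ("Bird2", 103)]
restore_point = 5_000

print(replay_stop_at_first(log, restore_point))
print(replay_skip(log, restore_point))
```

Only the second function reproduces the result above, where the birds written after the duck are still replayed.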
>
> I also did some digging in the Cassandra source code, and made the
> following findings:
>
> I think the code that skips mutations ahead of time is in the
> CommitLogReplayer class.
> See lines 194-195 (at the time of writing):
>     if (commitLogReplayer.pointInTimeExceeded(mutation))
>         return;
> This code is triggered from CommitLogReader, where readSection seems to be
> responsible for reading the commit logs. It is wrapped in a while loop
> that just reads the file until EOF.
> See:
>     while (statusTracker.shouldContinue() && reader.getFilePointer() < end &&
>            !reader.isEOF())
> This method is called file by file from CommitLog.r

Re: Built in trigger: double-write for app migration

2018-10-18 Thread Ben Slater
I might be missing something but we’ve done this operation on a few
occasions by:
1) Commission the new cluster and join it to the existing cluster as a 2nd
DC
2) Replicate just the keyspace that you want to move to the 2nd DC
3) Make app changes to read moved tables from 2nd DC
4) Change keyspace definition to remove moved keyspace from first DC
5) Split the 2DCs into separate clusters (sever network connections, change
seeds)

If it’s just a table you’re moving and not a whole keyspace then you can skip
step 4 and drop the unneeded tables from either side after splitting. This
might mean the new cluster needs to be temporarily bigger than the
end-state during the migration process.
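
Steps 2 and 4 above are keyspace replication changes. As a rough sketch, the Python helper below only builds the CQL text for them; the keyspace name, DC names and replication factors are hypothetical, and existing data usually also has to be streamed to the new DC (for example with nodetool rebuild) before reads are moved.

```python
from typing import Dict


def alter_keyspace_replication(keyspace: str, dc_replicas: Dict[str, int]) -> str:
    """Build an ALTER KEYSPACE statement using NetworkTopologyStrategy."""
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {rf}" for dc, rf in dc_replicas.items()]
    body = ", ".join(parts)
    return "ALTER KEYSPACE %s WITH replication = {%s};" % (keyspace, body)


# Step 2: replicate the keyspace to the new DC as well as the old one.
step2 = alter_keyspace_replication("app_ks", {"dc_old": 3, "dc_new": 3})
# Step 4: remove the old DC from the keyspace definition.
step4 = alter_keyspace_replication("app_ks", {"dc_new": 3})

print(step2)
print(step4)
```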

Cheers
Ben

On Fri, 19 Oct 2018 at 07:04 Jeff Jirsa  wrote:

> Could be done with CDC
> Could be done with triggers
> (Could be done with vtables — double writes or double reads — if they were
> extended to be user facing)
>
> Would be very hard to generalize properly, especially handling failure
> cases (write succeeds in one cluster/table but not the other) which are
> often app specific
>
>
> --
> Jeff Jirsa
>
>
> > On Oct 18, 2018, at 6:47 PM, Jonathan Ellis  wrote:
> >
> > Isn't this what CDC was designed for?
> >
> > https://issues.apache.org/jira/browse/CASSANDRA-8844
> >
> > On Thu, Oct 18, 2018 at 10:54 AM Carl Mueller
> >  wrote:
> >
> >> tl;dr: a generic trigger on TABLES that will mirror all writes to
> >> facilitate data migrations between clusters or systems. What is necessary
> >> to ensure full write mirroring/coherency?
> >>
> >> When cassandra clusters have several "apps" aka keyspaces serving
> >> applications colocated on them, but the app/keyspace bandwidth and size
> >> demands begin impacting other keyspaces/apps, then one strategy is to
> >> migrate the keyspace to its own dedicated cluster.
> >>
> >> With backups/sstableloading, this will entail a delay and therefore a
> >> "coherency" shortfall between the clusters. So typically one would employ a
> >> "double write, read once":
> >>
> >> - all updates are mirrored to both clusters
> >> - reads come from the current most coherent.
> >>
> >> Often two sstable loads are done:
> >>
> >> 1) first load
> >> 2) turn on double writes/write mirroring
> >> 3) a second load is done to finalize coherency
> >> 4) switch the app to point to the new cluster now that it is coherent
> >>
> >> The double writes and read is the sticking point. We could do it at the app
> >> layer, but if the app wasn't written with that, it is a lot of testing and
> >> customization specific to the framework.
> >>
> >> We could theoretically do some sort of proxying of the java-driver somehow,
> >> but all the async structures and complex interfaces/apis would be difficult
> >> to proxy. Maybe there is a lower level in the java-driver that is possible.
> >> This also would only apply to the java-driver, and not
> >> python/go/javascript/other drivers.
> >>
> >> Finally, I suppose we could do a trigger on the tables. It would be really
> >> nice if we could add to the cassandra toolbox the basics of a write
> >> mirroring trigger that could be activated "fairly easily"... now I know
> >> there are the complexities of inter-cluster access, and if we are even
> >> using cassandra as the target mirror system (for example there is an
> >> article on triggers write-mirroring to kafka:
> >> https://dzone.com/articles/cassandra-to-kafka-data-pipeline-part-1).
> >>
> >> And this starts to get into the complexities of hinted handoff as well. But
> >> fundamentally this seems something that would be a very nice feature
> >> (especially when you NEED it) to have in the core of cassandra.
> >>
> >> Finally, is the mutation hook in triggers sufficient to track all incoming
> >> mutations (outside of "shudder" other triggers generating data)?
> >>
> >
> >
> > --
> > Jonathan Ellis
> > co-founder, http://www.datastax.com
> > @spyced
>

Re: Contributing to code

2017-06-19 Thread Ben Slater
Hi Salih

If you haven’t already seen it, there is a section in the doco about getting
started with contributing here:
http://cassandra.apache.org/doc/latest/development/gettingstarted.html

Cheers
Ben

On Tue, 20 Jun 2017 at 10:48 Salih Gedik <m...@salih.xyz> wrote:

> Hi everyone,
>
> I am a rising senior and have been considering contributing to the project
> for a year, but could not find an appropriate entry point. I have lots of
> time to work on this project this year and would not want to miss the
> chance. Could you committers please suggest some branch for me to work on?
> Thank you so much!
>
>
> Salih Gedik
>

Re: Pluggable throttling of read and write queries

2017-02-20 Thread Ben Slater
We’ve actually had several customers where we’ve done the opposite - split
large clusters apart to separate use cases. We found that this allowed us
to better align hardware with use case requirements (for example using AWS
c3.2xlarge for very hot data at low latency, m4.xlarge for more general
purpose data). We can also tune JVM settings, etc. to meet those use cases.

Cheers
Ben

On Mon, 20 Feb 2017 at 22:21 Oleksandr Shulgin <oleksandr.shul...@zalando.de>
wrote:

> On Sat, Feb 18, 2017 at 3:12 AM, Abhishek Verma <ve...@uber.com> wrote:
>
> Cassandra is being used on a large scale at Uber. We usually create
> dedicated clusters for each of our internal use cases, however that is
> difficult to scale and manage.
>
> We are investigating the approach of using a single shared cluster with
> 100s of nodes and handle 10s to 100s of different use cases for different
> products in the same cluster. We can define different keyspaces for each of
> them, but that does not help in case of noisy neighbors.
>
> Does anybody in the community have similar large shared clusters and/or
> face noisy neighbor issues?
>
>
> Hi,
>
> We've never tried this approach and given my limited experience I would
> find this a terrible idea from the perspective of maintenance (remember the
> old saying about basket and eggs?)
>
> What potential benefits do you see?
>
> Regards,
> --
> Alex
>
> --

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: [RELEASE] Apache Cassandra 3.10 released

2017-02-03 Thread Ben Slater
I’d like to add my thanks and congrats to everyone who has worked on this
release. It has clearly been tough to get out the door but it has been
awesome to see the commitment to quality.

Cheers
Ben

On Sat, 4 Feb 2017 at 14:09 Edward Capriolo <edlinuxg...@gmail.com> wrote:

>
> On Fri, Feb 3, 2017 at 6:52 PM, Michael Shuler <mich...@pbandjelly.org>
> wrote:
>
> > The Cassandra team is pleased to announce the release of Apache
> > Cassandra version 3.10.
> >
> > Apache Cassandra is a fully distributed database. It is the right choice
> > when you need scalability and high availability without compromising
> > performance.
> >
> >  http://cassandra.apache.org/
> >
> > Downloads of source and binary distributions are listed in our download
> > section:
> >
> >  http://cassandra.apache.org/download/
> >
> > This version is a new feature and bug fix release[1] on the 3.X series.
> > As always, please pay attention to the release notes[2] and Let us
> > know[3] if you were to encounter any problem.
> >
> > This is the last tick-tock feature release of Apache Cassandra. Version
> > 3.11.0 will continue bug fixes from this point on the cassandra-3.11
> > branch in git.
> >
> > Enjoy!
> >
> > [1]: (CHANGES.txt) https://goo.gl/J0VghF
> > [2]: (NEWS.txt) https://goo.gl/00KNVW
> > [3]: https://issues.apache.org/jira/browse/CASSANDRA
>
>
> Great job all on this release.
>


Re: Broader community involvement in 4.0 (WAS Re: Rough roadmap for 4.0)

2016-11-12 Thread Ben Slater
For anyone that’s interested, I’ve submitted my doc changes for point 2
below (emphasising contributions other than new features) here:
https://issues.apache.org/jira/browse/CASSANDRA-12906

I haven’t added anything about the sponsor/shepherd idea as it doesn’t seem to
be agreed at this point.

Cheers
Ben

On Mon, 7 Nov 2016 at 09:34 Nate McCall wrote:

> Ben,
> Thank you for providing two thoughtful, concrete recommendations.
> There is some good feedback in general on this thread, but I'm calling
> Ben's response out because point #1 is important to discuss and point
> #2 is immediately actionable.
>
> > 1) I think some process of assigning a committer as a “sponsor” of a
> > change (which would probably mean committers volunteering) before it
> > commences would be useful. You can kind of do this at the moment by
> > creating a JIRA and asking for comment but I think the process is a bit
> > unclear and a bit intimidating for people starting off and it would be
> > nice to know who was your primary reviewer for a piece of work. (Or maybe
> > this process does exist and I don’t know about it.)
>
> This is a good idea, but it assumes a single point triage and resource
> management that we don't really have right now.
>
> For the history of the project, we had triage in the form of sponsored
> resources flighting most of the new issues. This has made the rest of
> us complacent. It's probably the most immediate thing to fix and I
> don't know how to do that.
>
> Does anybody have any recommendations about ASF projects doing this
> effectively? Note that the folks from DS engineering are still heavily
> involved and I very much thank them for that, but diversifying is the
> only way to get us over our complacency.
>
> > 2) I think the “how to contribute” docs could emphasise activities other
> > than creating new features as a great place to start. It seems that
> > review, testing and doco could all do with more hands (as on just about
> > any project). So, encouraging this as a way to start on the project might
> > help to get some more bandwidth in this area rather than people creating
> > patches that the committers don’t have bandwidth to review. I would be
> > happy to draft an update to the docs including some of this if people
> > think it’s a good idea.
>
> Also a good idea and much more accessible/easily fixable.
>
> We will gladly look at any doc updates for this, looping in the
> broader community once published (this last part being key - I'm
> afraid if we ask for help too early, we'll get tons of interest to
> which we cannot reply and then be in even worse shape).
>
> -Nate
>


Re: Broader community involvement in 4.0 (WAS Re: Rough roadmap for 4.0)

2016-11-07 Thread Ben Slater
Thanks Dave. The shepherd concept sounds a lot like what I had in mind (and a
better name).

One other thing I noted from the Mesos process - they have an “Accepted”
jira status that comes after open and means “at least one Mesos developer
thought that the ideas proposed in the issue are worth pursuing further”.
Might also be something to consider as part of a process like this?

Cheers
Ben

On Mon, 7 Nov 2016 at 09:37 Dave Lester <dles...@apache.org> wrote:

> Hi Ben,
>
> A few ideas to add to your suggestions [inline]:
>
> On 2016-11-06 13:51 (-0800), Ben Slater <ben.sla...@instaclustr.com>
> wrote:
> > Hi All,
> >
> > I thought I would add a couple of observations and suggestions as someone
> > who has both personally made my first contributions to the project in the
> > last few months and someone in a leadership role in an organisation
> > (Instaclustr) that is feeling its way through increasing our
> > contributions as an organisation.
> >
> > Firstly - an observation on contribution experience and what I think is
> > likely to make people want to contribute again:
> > 1) The worst thing that can happen is for your contribution to be
> > completely ignored.
> > 2) The second worst thing is for it to be rejected without a good
> > explanation (that you can learn from) or with hostility.
> > 3) Having it rejected with a good reason is not a bad thing (you learn)
> > 4) Having it accepted is, of course, the best!
> >
> > With this as a background I would suggest a couple of things that would
> > help make sure (3) and (4) are always more common than (1) and (2) (good
> > outcomes are probably more common than bad at the moment but we’ve
> > experienced all four scenarios in the last few months):
> > 1) I think some process of assigning a committer as a “sponsor” of a
> > change (which would probably mean committers volunteering) before it
> > commences would be useful. You can kind of do this at the moment by
> > creating a JIRA and asking for comment but I think the process is a bit
> > unclear and a bit intimidating for people starting off and it would be
> > nice to know who was your primary reviewer for a piece of work. (Or maybe
> > this process does exist and I don’t know about it.)
>
> I've seen this approach before and it can reduce ambiguity on the
> state of contributions; the Apache Mesos project has a shepherding system
> similar to this. I would shy away from the term "sponsor" since it could
> imply a non-voluntary relationship between contributors and volunteer
> committers.
>
> From the Mesos docs: "Find a shepherd to collaborate on your patch. A
> shepherd is a Mesos committer that will work with you to give you feedback
> on your proposed design, and to eventually commit your change into the
> Mesos source tree." More info on how they approach this is in both their
> newbie guide: http://mesos.apache.org/documentation/newbie-guide/, and
> submitting a patch guide:
> http://mesos.apache.org/documentation/latest/submitting-a-patch/.
>
> In practice, there are some limitations and risks to this model. For one,
> a shepherding process is not a substitute for the Apache Way, and it's
> critical that design decisions and reviews are still done in the open.
> Additionally, in projects where a single organization has disproportionate
> representation at the committer level it can create bottlenecks if features
> are a lower priority for those orgs (while not malicious, it may mean that
> certain patches are shepherded while others are ignored). It's possible to
> work within these limitations, especially in cases where the community is
> having healthy conversations about the direction and roadmap for the
> project (similar to the original thread).
>
> If this is something the project would like to push forward, I'd suggest a
> committer vote to ensure there's sufficient buy-in.
>
> > 2) I think the “how to contribute” docs could emphasise activities other
> > than creating new features as a great place to start. It seems that
> > review, testing and doco could all do with more hands (as on just about
> > any project). So, encouraging this as a way to start on the project might
> > help to get some more bandwidth in this area rather than people creating
> > patches that the committers don’t have bandwidth to review. I would be
> > happy to draft an update to the docs including some of this if people
> > think it’s a good idea.
>
> This would be great. If you make changes here and create a JIRA ticket
> associated with it, please add me to the ticket and I'll happily provide
> feedback.
>
> Dave
>

Re: Broader community involvement in 4.0 (WAS Re: Rough roadmap for 4.0)

2016-11-06 Thread Ben Slater
Hi All,

I thought I would add a couple of observations and suggestions as someone
who has both personally made my first contributions to the project in the
last few months and someone in a leadership role in an organisation
(Instaclustr) that is feeling its way through increasing our contributions
as an organisation.

Firstly - an observation on contribution experience and what I think is
likely to make people want to contribute again:
1) The worst thing that can happen is for your contribution to be
completely ignored.
2) The second worst thing is for it to be rejected without a good
explanation (that you can learn from) or with hostility.
3) Having it rejected with a good reason is not a bad thing (you learn)
4) Having it accepted is, of course, the best!

With this as a background I would suggest a couple of things that would help
make sure (3) and (4) are always more common than (1) and (2) (good outcomes
are probably more common than bad at the moment but we’ve experienced all four
scenarios in the last few months):
1) I think some process of assigning a committer as a “sponsor” of a change
(which would probably mean committers volunteering) before it commences
would be useful. You can kind of do this at the moment by creating a JIRA
and asking for comment but I think the process is a bit unclear and a bit
intimidating for people starting off and it would be nice to know who was
your primary reviewer for a piece of work. (Or maybe this process does
exist and I don’t know about it.)
2) I think the “how to contribute” docs could emphasise activities other
than creating new features as a great place to start. It seems that review,
testing and doco could all do with more hands (as on just about any
project). So, encouraging this as a way to start on the project might help
to get some more bandwidth in this area rather than people creating patches
that the committers don’t have bandwidth to review. I would be happy to
draft an update to the docs including some of this if people think it’s a
good idea.

Cheers
Ben

On Sun, 6 Nov 2016 at 06:40 Michael Shuler wrote:

> On 11/04/2016 06:43 PM, Jeff Beck wrote:
> > I run the local Cassandra User Group and I would love to help get the
> > community more involved. I would propose holding a night to add patches
> > to Cassandra. Some will be simple things like making sure some asserts
> > have proper messages with them etc, but some may be slightly larger. The
> > goal is just to get people used to the process. To help make this a
> > success it would be great if we could have support on getting the patches
> > we submit at least looked at briefly in 1 month. That timeframe allows us
> > to talk about it at the next meetup and show people their contributions,
> > even small ones, are valued.
>
> This is a great idea and I have a suggestion that would benefit the
> project as a whole, as well as help new people get used to the
> development process:
>
>   Document the process.
>
> Recently, the project included documentation in the source tree under
> `doc/`, which is directly presented at
> https://cassandra.apache.org/doc/latest/
>
> The red bar at the top has a link to contributions, there are docs about
> getting started with development, reviewing patches, and testing. If
> those docs need updating for better readability, missing steps, hints
> for new contributors, etc. I think this could be one of the most
> valuable contributions a user group could make, as well as provide some
> initial experience in the development process itself.
>
> > Before we did this night I would probably dig through some tickets and
> > get an example list going and any feedback notes on making the process
> > easier would be great.
>
> Some more ideas:
> The user group members could get themselves set up in JIRA in order to
> review one another's patches, get a feel for testing patches, go through
> the motions of *how* to contribute improvements, and again, get
> documentation change patches up in JIRA, so everyone benefits from your
> experiences, as the group works through the process.
>
> > Generally, if there is anything you need from the meetups, ask. I know I
> > will do my best to get the local group to support things.
>
> Thanks for the interest!
>
> --
> Kind regards,
> Michael
>


Re: [jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-13 Thread Ben Slater
OK, I think it’s pretty unlikely to be this change as I didn’t change the
existing code (certainly nothing near what is used by -pop) and also I just
noticed you said you had the issue in 3.9 and CASSANDRA-12490 is destined for
3.10.

Also, last time I looked, I thought stress didn’t validate returned results
for YAML specs. Did I miss something or did that get added recently? Can
you add your actual command, etc to the ticket?

Anyway, I will try to do some more digging over the weekend as I still
suspect there is something wrong (or at least unexpected) going on aside
from this change.

(BTW - I noticed you moved the discussion from JIRA to the dev list. What’s
the etiquette there?)

Cheers
Ben



On Fri, 14 Oct 2016 at 09:02 Jake Luciani <jak...@gmail.com> wrote:

> No, I'm not using a seq anywhere other than the command line
>
> On Oct 13, 2016 4:40 PM, "Ben Slater (JIRA)" <j...@apache.org> wrote:
>
> >
> > [ https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573119#comment-15573119 ]
> >
> > Ben Slater commented on CASSANDRA-12490:
> > 
> >
> > Just to check [~tjake] when you say "this also breaks validation", I
> > assume you mean it breaks validation when you use the sequence
> > distribution type, not in the case where you don't use seq()?
> >
> > > Add sequence distribution type to cassandra stress
> > > --
> > >
> > > Key: CASSANDRA-12490
> > > URL: https://issues.apache.org/jira/browse/CASSANDRA-12490
> > > Project: Cassandra
> > > Issue Type: Improvement
> > > Components: Tools
> > > Reporter: Ben Slater
> > > Assignee: Ben Slater
> > > Priority: Minor
> > > Fix For: 3.10
> > >
> > > Attachments: 12490-trunk.patch, 12490.yaml, cqlstress-seq-example.yaml
> > >
> > >
> > > When using the write command, cassandra stress sequentially generates
> > > seeds. This ensures generated values don't overlap (unless the sequence
> > > wraps), providing a more predictable number of inserted records (and
> > > generating a base set of data without wasted writes).
> > > When using a yaml stress spec there is no sequenced distribution
> > > available. I think it would be useful to have this for doing the initial
> > > load of data for testing.
> >
> >
> >
> >
>
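
For anyone following along, the behaviour described in the quoted ticket (seeds handed out in order, with no overlap until the sequence wraps) can be sketched roughly as below. This is only an illustration in Python, not the code from 12490-trunk.patch; the name seq_distribution and the inclusive start/end bounds are assumptions made for the example.

```python
from itertools import islice


def seq_distribution(start: int, end: int):
    """Hypothetical helper: yield start, start+1, ..., end, then wrap to start."""
    if end < start:
        raise ValueError("end must be >= start")
    span = end - start + 1  # number of distinct values before wrapping
    i = 0
    while True:
        # Values are unique for the first `span` draws, then repeat in order.
        yield start + (i % span)
        i += 1


# First five seeds over the inclusive range 1..3.
first_five = list(islice(seq_distribution(1, 3), 5))
print(first_five)
```

With a range at least as large as the number of inserts, every draw is distinct, which is what makes the number of inserted records predictable for an initial load.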