Re: Requiring Java 8 for C* 3.0
Is running 2.1 with Java 8 a supported or recommended way to run at this point? If not, then we'll be requiring users to upgrade both Java and C* at the same time when making the jump to 3.0.

On Thu, May 7, 2015 at 11:25 AM, Aleksey Yeschenko alek...@apache.org wrote:
The switch will necessarily hurt 3.0 adoption, but I think we'll live. To me, the benefits (mostly access to lambdas and default methods, tbh) slightly outweigh the downsides. +0.1
-- AY

On May 7, 2015 at 19:22:53, Gary Dusbabek (gdusba...@gmail.com) wrote:
+1

On Thu, May 7, 2015 at 11:09 AM, Jonathan Ellis jbel...@gmail.com wrote:
We discussed requiring Java 8 previously and decided to remain Java 7-compatible, but at the time we were planning to release 3.0 before Java 7 EOL. Now that 8099 and the increased emphasis on QA have delayed us past Java 7 EOL, I think it's worth reopening this discussion.

If we require 8, then we can use lambdas, LongAdder, StampedLock, streaming collections, default methods, etc. -- not just in 3.0 but across 3.x for the next year. If we don't, then people can choose whether to deploy on 7 or 8 -- but the vast majority will deploy on 8 simply because 7 is no longer supported without a premium contract with Oracle. 8 also has a more advanced G1GC implementation (see CASSANDRA-7486).

I think that gaining access to the new features in 8 as we develop 3.x is worth losing the ability to run on a platform that will have been EOL for a couple of months by the time we release.
-- Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced
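As a small illustration of the kind of code the Java 8 requirement would enable -- the keyspace names and helper method here are invented for the example, not Cassandra code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

public class Java8Features {
    // A stream + lambda replaces an explicit loop or anonymous inner class,
    // and LongAdder gives cheaper contended increments than AtomicLong.
    static long countUserKeyspaces(List<String> names) {
        LongAdder counter = new LongAdder();
        names.stream()
             .filter(name -> !name.startsWith("system"))
             .forEach(name -> counter.increment());
        return counter.sum();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("system", "system_auth", "ks1", "ks2");
        System.out.println(countUserKeyspaces(names)); // 2
    }
}
```

Under Java 7 the same logic needs a hand-rolled loop and an AtomicLong (or a lock), which is exactly the boilerplate the thread is weighing against the cost of dropping 7.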
Re: Trying to write tests for CASSANDRA-3127 (Internode Compression)
Not sure if you saw this, but org.apache.cassandra.net.sink has a SinkManager class and an interface for implementing message sinks. Basically it lets you catch messages as they are being sent or received by MessagingService. Could be useful, and it is used in a couple of other tests.

On Sun, Oct 23, 2011 at 11:54 AM, Zach Richardson z...@raveldata.com wrote:
Thanks, will give these a shot.

On Sun, Oct 23, 2011 at 11:18 AM, Jonathan Ellis jbel...@gmail.com wrote:
I see a couple of options. StorageProxy has this constant:

private static final boolean OPTIMIZE_LOCAL_REQUESTS = true; // set to false to test messagingservice path on single node

So, you could make it an instance variable and create a SP object with it set to false for tests. Or, you could do a test using ccm for multinode control, as in the long_read.sh test on https://issues.apache.org/jira/browse/CASSANDRA-3303.

On Fri, Oct 21, 2011 at 4:28 PM, Zach Richardson j.zach.richard...@gmail.com wrote:
Hi All,

I have been working on an implementation for internode compression (CASSANDRA-3127): https://issues.apache.org/jira/browse/CASSANDRA-3127

I have written code that works, but I'm looking for some advice on how to write unit tests for it. At the moment it compresses as follows: if internode_message_compression_threshold is greater than 0, it compresses messages larger than the threshold; if it is == 0, it compresses all messages; and if it is less than 0, it compresses none.

The code itself has been tested in an environment outside of Cassandra (i.e. a few mock classes, and a heavily modified OutboundTcpConnection and IncomingTcpConnection), and inside of Cassandra all of the current unit tests are passing. Since I can't inject a mocked MessagingService into IncomingTcpConnection, I'm guessing I have to do the testing from outside of MessagingService, but MessagingService itself checks to see if you are sending messages to yourself, and doesn't put them over the connection.

Are there any tricks to letting me mock different endpoints from within a unit test? Can this only be tested in a distributed fashion at the OS level?

Also, internode_message_compression_threshold is set through the cassandra.yaml file--is it possible to set these properties at runtime? Can I just change the public entry for it in the static Config class directly, or will that break other things?

Thanks for your help and time,
Zach
-- Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
-- Zach Richardson
Ravel, Co-founder
Austin, TX
z...@raveldata.com
512.825.6031
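The threshold semantics described in the thread boil down to a small predicate. This is a hypothetical helper written for illustration only, not code from the actual patch:

```java
public class CompressionThreshold {
    // Semantics as described in the thread:
    //   threshold > 0  -> compress only messages larger than the threshold
    //   threshold == 0 -> compress every message
    //   threshold < 0  -> compress nothing
    static boolean shouldCompress(int threshold, int messageSize) {
        if (threshold > 0)
            return messageSize > threshold;
        return threshold == 0;
    }

    public static void main(String[] args) {
        System.out.println(shouldCompress(1024, 4096)); // large message: true
        System.out.println(shouldCompress(0, 10));      // compress-all mode: true
        System.out.println(shouldCompress(-1, 4096));   // disabled: false
    }
}
```

Isolating the decision in a pure function like this is also one answer to the testability question: the branch logic can be unit tested without standing up MessagingService at all.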
Re: Interested in contributing to Cassandra
http://wiki.apache.org/cassandra/HowToContribute

Specifically from there, see the link to tickets marked as 'Low Hanging Fruit': https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+12310865+AND+labels+%3D+lhf+AND+status+!%3D+resolved

On Tue, Aug 9, 2011 at 11:14 AM, Tharindu Mathew mcclou...@gmail.com wrote:
Hi everyone,

I'm interested in contributing to Cassandra in my spare time. Are there any areas of interest to the project that I could look into (ones that leave enough time for a learning curve :)? I already have the source checked out and built on my machine.

-- Regards,
Tharindu
Re: Order preserving partitioning strategy
Tokens are really no different than thresholds. Your token is your min and your neighbor's token is your max. To change your min, you move your token. To change your max, you move your neighbor's token. Your idea of calculating the optimal number of keys is similar to the load balancing idea described on https://issues.apache.org/jira/browse/CASSANDRA-1418

On Thursday, August 26, 2010 10:47am, Mohamed Ibrahim mibra...@clker.com said:
Hi All,

There might be a simpler way to make the OPP achieve even, or close to even, loads. The small change here is that the OPP has to use thresholds to distribute keys instead of centers. Every node should have a MIN and a MAX threshold. A key k gets inserted in a node x if MIN_x <= k < MAX_x. Nodes share the thresholds between them, so MAX_x = MIN_(x+1) for all x = 0 to n-2. If ever a key k is attempted to be inserted and k < MIN_0, then we set MIN_0 = k - 1. Similarly, if ever a key k is attempted to be inserted and k > MAX_(n-1), then set MAX_(n-1) = k.

Those thresholds with such a setup can be recalculated very easily to redistribute the data evenly. Actually, after doing some thinking I came up with two algorithms, one I call minor redistribution and the other I call major redistribution, and the goal of doing a redistribution is to achieve an equal number of keys per node. The minor redistribution algorithm does not require a full scan and can recalculate the thresholds very fast, but is approximate. The major redistribution may require a full key scan (or partial, depending on the implementation) and will be able to exactly calculate the node thresholds to achieve equal loads. Due to the full (or partial) key scan requirement, the major redistribution will require longer to process.
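Mohamed's insertion rule can be sketched roughly as follows; the method name, array representation, and numeric key type are illustrative assumptions, not part of his proposal:

```java
public class ThresholdRouting {
    // Find the node x whose [MIN_x, MAX_x) range holds key k, widening the
    // outermost thresholds when k falls outside all ranges, per the rule:
    //   k < MIN_0       -> set MIN_0 = k - 1, route to node 0
    //   k > MAX_(n-1)   -> set MAX_(n-1) = k, route to node n-1
    static int nodeFor(double k, double[] min, double[] max) {
        int n = min.length;
        if (k < min[0]) { min[0] = k - 1; return 0; }
        if (k > max[n - 1]) { max[n - 1] = k; return n - 1; }
        for (int x = 0; x < n; x++)
            if (min[x] <= k && k < max[x]) return x;
        return n - 1; // k == MAX_(n-1)
    }

    public static void main(String[] args) {
        double[] min = {0, 10};
        double[] max = {10, 20};
        System.out.println(nodeFor(5, min, max));   // 0
        System.out.println(nodeFor(15, min, max));  // 1
        System.out.println(nodeFor(25, min, max));  // 1 (MAX_1 widened to 25)
    }
}
```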
Minor redistribution

Step 1: Calculate the desired load per node: L = total number of keys in the cluster / n.

Step 2: Update the max thresholds of nodes 0 to n-2 to achieve the average load in every node, where n_x is a snapshot of the number of keys in node x and the node's average density is D_x = n_x / (MAX_x - MIN_x):

If n_x > L then
// We're moving the max threshold back into the node, since it is overloaded
New Max = MIN_x + L / D_x
if (x < n-1) n_(x+1) += n_x - L
else
// We're moving the max threshold into the next node, as the node is under-filled. Use the next node's density for a better approximation.
New Max = MAX_x + (L - n_x) / D_(x+1)
if (x < n-1) n_(x+1) -= L - n_x

After the new thresholds are calculated, the nodes should move the data. The approximation here is the assumption that keys are evenly distributed over the range of every node, and I chose that because it is the simplest in my point of view. Since the data we have is already an incomplete data set (as more keys are expected in the future), any assumption of any distribution will have errors, so we might as well use the simplest.

Major redistribution

This is actually much simpler to do. We know that we need every node to have L keys (as calculated in the minor redistribution). Starting from the smallest key, move up L keys and set the max threshold, and by repeating we can figure out the max threshold of every node. That is where we might actually need a full key scan, to implement this hopping of L keys to calculate the max thresholds.

Hopefully this helps, or maybe tickles someone else's brain to produce a nicer idea.

Best,
Mohamed Ibrahim

On Thu, Aug 26, 2010 at 12:25 AM, J. Andrew Rogers jar.mail...@gmail.com wrote:
Hi Jonathan,

I've never seen a paper that discusses it as a primary topic; it is always in some other context. IIRC, the most recent discussions of it I have seen have been in join algorithm literature from somewhere in Asia.
MPP analytical databases often implement some form of skew adaptivity but there is no standard way because the design tradeoffs are context dependent. DB2 also has a non-adaptive technique for dealing with skew that should be simple to implement on Cassandra and might provide an 80/20 option (more on that a little further on). Skew adaptivity is generally implemented with a mix of data structures along the lines of an adaptive quad-tree. The reason you only see this in analytical databases is that the data skew is unlikely to change much and/or have too much concurrent updating. If the distribution radically changes all of the time under high concurrency, it will create some mix of resource contention, lost selectivity, or runaway space consumption depending on implementation detail. The optimal mix of pain tends to be a compile-time option, so it isn't very flexible. Definitely not optimal for concurrent OLTP-ish workloads. Alternatively: IBM's DB2 has a couple different data organization options that essentially define partitionable skew invariants. The closer the real data distribution is to the skew
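Going back to Mohamed Ibrahim's minor redistribution pass earlier in the thread: under his even-density assumption it can be sketched as below. The class, method name, and sample key ranges are invented for illustration; this is not Cassandra code.

```java
import java.util.Arrays;

public class MinorRedistribution {
    // Recompute max thresholds in place so each node ends up holding
    // L = total / n keys, walking left to right and carrying the surplus
    // (or deficit) of each node into its neighbor, as in the thread.
    static void redistribute(double[] min, double[] max, double[] n) {
        int nodes = n.length;
        double total = 0;
        for (double c : n) total += c;
        double L = total / nodes; // desired load per node
        for (int x = 0; x < nodes - 1; x++) {
            if (n[x] > L) {
                // Overloaded: pull the max threshold back into this node.
                double d = n[x] / (max[x] - min[x]); // this node's density
                max[x] = min[x] + L / d;
                n[x + 1] += n[x] - L;
            } else {
                // Under-filled: push the max threshold into the next node,
                // using the next node's density for the approximation.
                double dNext = n[x + 1] / (max[x + 1] - min[x + 1]);
                max[x] += (L - n[x]) / dNext;
                n[x + 1] -= L - n[x];
            }
            n[x] = L;
            min[x + 1] = max[x]; // neighbors keep sharing thresholds
        }
    }

    public static void main(String[] args) {
        double[] min = {0, 100, 200}, max = {100, 200, 300};
        double[] n = {300, 60, 90}; // skewed: node 0 holds most keys
        redistribute(min, max, n);
        System.out.println(Arrays.toString(max));
    }
}
```

With the sample skew above, node 0's threshold is pulled back from 100 to 50 and node 1's is pushed forward, leaving each node with the target load of 150 keys, without any key scan.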