Re: Requiring Java 8 for C* 3.0

2015-05-07 Thread Nick Bailey
Is running 2.1 with java 8 a supported or recommended way to run at this
point? If not then we'll be requiring users to upgrade both java and C* at
the same time when making the jump to 3.0.

On Thu, May 7, 2015 at 11:25 AM, Aleksey Yeschenko alek...@apache.org
wrote:

 The switch will necessarily hurt 3.0 adoption, but I think we’ll live. To
 me, the benefits (mostly access to lambdas and default methods, tbh)
 slightly outweigh the downsides.

 +0.1

 --
 AY

 On May 7, 2015 at 19:22:53, Gary Dusbabek (gdusba...@gmail.com) wrote:

 +1

 On Thu, May 7, 2015 at 11:09 AM, Jonathan Ellis jbel...@gmail.com wrote:

  We discussed requiring Java 8 previously and decided to remain Java
  7-compatible, but at the time we were planning to release 3.0 before
  Java 7 EOL. Now that 8099 and increased emphasis on QA have delayed us
  past Java 7 EOL, I think it's worth reopening this discussion.
 
  If we require 8, then we can use lambdas, LongAdder, StampedLock,
  Streaming collections, default methods, etc. Not just in 3.0 but over
  3.x for the next year.
 
  If we don't, then people can choose whether to deploy on 7 or 8 -- but
  the vast majority will deploy on 8 simply because 7 is no longer
  supported without a premium contract with Oracle. 8 also has a more
  advanced G1GC implementation (see CASSANDRA-7486).
 
  I think that gaining access to the new features in 8 as we develop 3.x is
  worth losing the ability to run on a platform that will have been EOL
  for a couple months by the time we release.
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder, http://www.datastax.com
  @spyced
 



Re: Trying to write tests for CASSANDRA-3127 (Internode Compression)

2011-10-24 Thread Nick Bailey
Not sure if you saw this, but org.apache.cassandra.net.sink has a
SinkManager class and an interface for implementing message sinks. Basically
it lets you catch messages as they are being sent or received by
MessagingService.

Could be useful, and is used in a couple other tests.

On Sun, Oct 23, 2011 at 11:54 AM, Zach Richardson z...@raveldata.com wrote:

 Thanks,

 Will give these a shot.

 On Sun, Oct 23, 2011 at 11:18 AM, Jonathan Ellis jbel...@gmail.com
 wrote:

  I see a couple options.
 
  StorageProxy has this constant:
 
      // set to false to test messagingservice path on single node
      private static final boolean OPTIMIZE_LOCAL_REQUESTS = true;
 
  So, you could make it an instance variable and create a SP object with
  it set to false for tests.
 
  Or, you could do a test using ccm for multinode control, as in the
  long_read.sh test on
  https://issues.apache.org/jira/browse/CASSANDRA-3303.
 
  On Fri, Oct 21, 2011 at 4:28 PM, Zach Richardson
  j.zach.richard...@gmail.com wrote:
   Hi All,
  
   I have been working on an implementation for internode compression
   (CASSANDRA-3127.) https://issues.apache.org/jira/browse/CASSANDRA-3127
  
   I have written code that works, but I'm looking for some advice on
   how to write unit tests for it.  At the moment it compresses where:
  
   internode_message_compression_threshold > 0 means it compresses
   messages larger than the threshold,
   compresses all messages if it is == 0,
   and compresses none if it is less than 0
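
The three threshold cases can be written as a single decision function. This is only a minimal sketch of the rule as described in the message: the function name `should_compress`, the `message_size` parameter, and measuring size in bytes are assumptions for illustration, not code from the CASSANDRA-3127 patch.

```python
def should_compress(message_size: int, threshold: int) -> bool:
    """Decide whether a message should be compressed.

    `threshold` stands in for internode_message_compression_threshold.
    `message_size` is assumed to be the serialized size in bytes.
    """
    if threshold < 0:
        # Negative threshold: compression is switched off entirely.
        return False
    if threshold == 0:
        # Zero threshold: every message is compressed.
        return True
    # Positive threshold: only messages larger than it are compressed.
    return message_size > threshold
```

Keeping the rule in one pure function like this would also make it testable without a MessagingService or any network connection.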
  
   The code itself has been tested in an environment outside of cassandra
   (i.e. a few mock classes, and a heavily modified OutboundTcpConnection
   and IncomingTcpConnection.) and inside of Cassandra all of the current
   unit tests are passing.
  
   Since I can't inject a mocked MessagingService into
   IncomingTcpConnection, I'm guessing I have to do the testing from the
   outside of MessagingService, but the MessagingService itself checks to
   see if you are sending messages to yourself, and doesn't put them over
   the connection.  Are there any tricks to letting me mock different
   endpoints from within a unit test?
  
   Can this only be tested in a distributed fashion at the OS level?
  
   Also the internode_message_compression_threshold is set through the
   cassandra.yaml file--is it possible to set these properties at
   runtime?  Can I just change the public entry for it in the static
   Config class directly, or will that break other things?
  
   Thanks for your help and time,
  
   Zach
  
 
 
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder of DataStax, the source for professional Cassandra support
  http://www.datastax.com
 



 --
 Zach Richardson
 Ravel, Co-founder
 Austin, TX
 z...@raveldata.com
 512.825.6031



Re: Interested in contributing to Cassandra

2011-08-09 Thread Nick Bailey
http://wiki.apache.org/cassandra/HowToContribute

Specifically from there, see the link to tickets marked as 'Low Hanging Fruit'

https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+12310865+AND+labels+%3D+lhf+AND+status+!%3D+resolved

On Tue, Aug 9, 2011 at 11:14 AM, Tharindu Mathew mcclou...@gmail.com wrote:
 Hi everyone,

 I'm interested in contributing to Cassandra in my spare time.

 Are there any areas of interest to the project that I can look into (that
 leave enough time for a learning curve :) )? I already have the source
 checked out and built on my machine.

 --
 Regards,

 Tharindu



Re: Order preserving partitioning strategy

2010-08-26 Thread Nick Bailey
Tokens are really no different than thresholds. Your token is your min and your
neighbor's token is your max. To change your min, you move your token. To change
your max, you move your neighbor's token.

Your idea of calculating the optimal number of keys is similar to the load
balancing idea described on https://issues.apache.org/jira/browse/CASSANDRA-1418

On Thursday, August 26, 2010 10:47am, Mohamed Ibrahim mibra...@clker.com 
said:

 Hi All,
 
 There might be a simpler way to make the OPP achieve even, or close to even
 loads.
 
 The small change here is that the OPP has to use thresholds to distribute
 keys instead of centers. Every node should have a MIN and a MAX threshold. A
 key k gets inserted in a node x if MIN_x < k <= MAX_x. Nodes share the
 thresholds between them, so MAX_x = MIN_(x+1) for all x = 0 to n-2.
 
 If ever a key k is attempted to be inserted and k < MIN_0, then we set MIN_0
 = k - 1. Similarly, if ever a key k is attempted to be inserted and k >
 MAX_(n-1), then set MAX_(n-1) = k.
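
The placement rule above can be sketched as a lookup over a shared list of thresholds. This is an illustration only: the name `place_key`, the single `bounds` list, and integer keys are assumptions, and the case k == MIN_0 (which the rule above does not cover) is treated here the same as k < MIN_0.

```python
from bisect import bisect_left
from typing import List


def place_key(bounds: List[int], k: int) -> int:
    """Return the index of the node that should store key k.

    `bounds` holds n + 1 ascending thresholds for n nodes:
    bounds[x] is MIN_x and bounds[x + 1] is MAX_x, which is also MIN_(x+1).
    Node x stores k when bounds[x] < k <= bounds[x + 1].
    The outermost thresholds are widened in place when k falls outside them.
    """
    if k <= bounds[0]:
        # Below the lowest threshold: lower MIN_0 so node 0 owns the key.
        # Treating k == MIN_0 this way is an assumption of this sketch.
        bounds[0] = k - 1
        return 0
    if k > bounds[-1]:
        # Above the highest threshold: raise MAX_(n-1) so the last node owns it.
        bounds[-1] = k
        return len(bounds) - 2
    # bisect_left finds the first threshold >= k, which is MAX_x of the owner.
    return bisect_left(bounds, k) - 1
```

For example, with thresholds `[0, 10, 20, 30]` a key of 10 lands on node 0 and a key of 11 lands on node 1, because the upper threshold is inclusive.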
 
 With such a setup, those thresholds can be recalculated very easily to
 redistribute the data evenly. Actually, after doing some thinking I came up
 with two algorithms, one I call minor redistribution and the other I call
 major redistribution. The goal of doing a redistribution is to achieve
 an equal number of keys per node.
 
 The minor redistribution algorithm does not require a full scan and can
 recalculate the thresholds very fast, but is approximate. The major
 redistribution may require a full key scan (or a partial one, depending on
 the implementation) and will be able to calculate exactly the node
 thresholds that achieve equal loads. Due to the full (or partial) key scan
 requirement, the major redistribution will take longer to process.
 
 Minor redistribution
 
 Step1: Calculate the desired load per node
 L= Total number of keys in the cluster / n
 
 Step2: Update the max thresholds of nodes 0 to n-2 to achieve the average
 load in every node
 n_x is a snapshot of the number of keys in node x
 Node average density D_x=Number of keys in node x / (MAX_x - MIN_x)
 If n_x > L then  // We're moving the max threshold back into the node,
                  // since it is overloaded
     New Max = MIN_x + L / D_x
     if (x < n-1) n_(x+1) += n_x - L;
 else  // We're moving the max threshold into the next node, as the node
       // is underfull. Use the next node's density for a better approximation.
     New Max = MAX_x + (L - n_x) / D_(x+1)
     if (x < n-1) n_(x+1) -= L - n_x;
 
 After the new thresholds are calculated, nodes should move the data.
 The approximation here is the assumption that keys are evenly distributed over
 the range of every node, and I chose that because it is the simplest from my
 point of view. Since the data we have is already an incomplete data set (as
 more keys are expected in the future), any assumption of any distribution
 will have errors, so we might as well use the simplest.
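
A single left-to-right pass of the minor redistribution might look like the sketch below. Several details are assumptions made for illustration rather than part of the proposal: the function name, keeping all thresholds in one `bounds` list, recomputing each node's density from its carried-over count and its already-updated MIN, and rejecting the case where an underfull node needs more keys than its neighbor holds.

```python
from typing import List


def minor_redistribution(bounds: List[float], counts: List[float]) -> List[float]:
    """Return new thresholds that approximately equalize keys per node.

    bounds[x] is MIN_x and bounds[x + 1] is MAX_x; counts[x] is the snapshot n_x.
    Keys are assumed to be evenly spread inside each node's range.
    """
    n = len(counts)
    target = sum(counts) / n              # L, the desired load per node
    new_bounds = list(bounds)
    load = list(counts)                   # working copy of the n_x snapshot
    for x in range(n - 1):                # nodes 0 .. n-2
        lo, hi = new_bounds[x], new_bounds[x + 1]
        surplus = load[x] - target
        if surplus > 0:
            # Overloaded: pull the max threshold back into this node.
            density = load[x] / (hi - lo)                      # D_x
            new_max = lo + target / density
        elif surplus < 0:
            # Underfull: push the max threshold into the next node.
            if -surplus > load[x + 1]:
                raise ValueError("deficit exceeds the next node's keys")
            density = load[x + 1] / (new_bounds[x + 2] - hi)   # D_(x+1)
            new_max = hi - surplus / density
        else:
            new_max = hi
        load[x + 1] += surplus            # carry surplus (or deficit) forward
        new_bounds[x + 1] = new_max
    return new_bounds
```

With two nodes over `[0, 10, 20]` holding 80 and 40 keys, the target is 60 and the shared threshold moves from 10 to 7.5, since node 0 is assumed to hold 8 keys per unit of key range.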
 
 Major redistribution
 
 This is actually much simpler to do. We know that we need every node to have
 L keys (as calculated in the minor redistribution). Starting from the smallest
 key, move up L keys and set the max threshold, and by repeating we can
 figure out the max threshold of every node. That is where we might actually
 need a full key scan, to implement this hopping of L keys to calculate
 the max threshold.
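
The hopping step can be sketched over an in-memory sorted list, which stands in for the full key scan here. The function name, the integer keys, and the assumption that keys are distinct and that there are at least as many keys as nodes are illustrative choices, not part of the proposal.

```python
from typing import List


def major_redistribution(sorted_keys: List[int], n: int) -> List[int]:
    """Return MAX_0 .. MAX_(n-1) so every node holds about len(keys) / n keys.

    `sorted_keys` must be ascending and distinct; a key k belongs to node x
    when MIN_x < k <= MAX_x, so each max threshold is the last key of its node.
    """
    total = len(sorted_keys)
    if n <= 0 or total < n:
        raise ValueError("need at least one key per node")
    maxes = []
    for x in range(1, n + 1):
        # Hop to the last key of node x - 1; integer division spreads
        # any remainder across the nodes instead of truncating L.
        last = (x * total) // n - 1
        maxes.append(sorted_keys[last])
    return maxes
```

For twelve keys 1 through 12 and three nodes this yields max thresholds 4, 8, and 12, that is, four keys per node.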
 
 Hopefully this helps, or maybe tickles someone else's brain to produce a
 nicer idea.
 
 Best,
 Mohamed Ibrahim
 
 On Thu, Aug 26, 2010 at 12:25 AM, J. Andrew Rogers
 jar.mail...@gmail.com wrote:
 
 Hi Jonathan,

 I've never seen a paper that discusses it as a primary topic, it is
 always in some other context. IIRC, the most recent discussions of it
 I have seen have been in join algorithm literature from somewhere in
 Asia. MPP analytical databases often implement some form of skew
 adaptivity but there is no standard way because the design tradeoffs
 are context dependent. DB2 also has a non-adaptive technique for
 dealing with skew that should be simple to implement on Cassandra and
 might provide an 80/20 option (more on that a little further on).

 Skew adaptivity is generally implemented with a mix of data structures
 along the lines of an adaptive quad-tree. The reason you only see this
 in analytical databases is that the data skew is unlikely to change
 much and/or have too much concurrent updating. If the distribution
 radically changes all of the time under high concurrency, it will
 create some mix of resource contention, lost selectivity, or runaway
 space consumption depending on implementation detail. The optimal mix
 of pain tends to be a compile-time option, so it isn't very flexible.
 Definitely not optimal for concurrent OLTP-ish workloads.

 Alternatively:

 IBM's DB2 has a couple different data organization options that
 essentially define partitionable skew invariants. The closer the real
 data distribution is to the skew