Re: parallel processing - splitting data

2017-01-19 Thread Benjamin Roth
I meant the global whole token range, which is -(2^64/2) to ((2^64)/2 - 1), i.e. -2^63 to 2^63 - 1. I remember there are classes that already generate the right slices, but I don't know by heart which one it was. 2017-01-19 13:29 GMT+01:00 Frank Hughes : > I have tried to retrieve the token range and slice in 4, but the re

Re: parallel processing - splitting data

2017-01-19 Thread Frank Hughes
I have tried to retrieve the token ranges and slice them in 4, but the response I get for the following code is different on each node: TokenRange[] tokenRanges = unwrapTokenRanges(metadata.getTokenRanges(keyspaceName, localHost)).toArray(new TokenRange[0]); On each node, the 1024 token ranges are diff

Re: parallel processing - splitting data

2017-01-19 Thread Benjamin Roth
If you have 4 Nodes with RF 4 then all data is on every node. So you can just slice the whole token range into 4 pieces and let each node process 1 slice. Determining local ranges also only helps if you read with CL_ONE. 2017-01-19 13:05 GMT+01:00 Frank Hughes : > Hello there, > > I'm running a 4
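The slicing suggested above can be sketched in plain Java. This is only a minimal illustration, assuming the cluster uses Murmur3Partitioner (token range -2^63 to 2^63 - 1); the class name TokenSlices and the keyspace, table, and key names (ks.tbl, pk) in the printed query are hypothetical placeholders, not taken from the thread.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class TokenSlices {
    // Assumed Murmur3Partitioner bounds: -2^63 .. 2^63 - 1.
    static final long MIN_TOKEN = Long.MIN_VALUE;
    static final long MAX_TOKEN = Long.MAX_VALUE;

    /** Splits the full token range into n contiguous slices; each long[] is {start, end}, both inclusive. */
    static List<long[]> split(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        BigInteger min = BigInteger.valueOf(MIN_TOKEN);
        // Total number of tokens is 2^64, which overflows long, hence BigInteger.
        BigInteger total = BigInteger.valueOf(MAX_TOKEN).subtract(min).add(BigInteger.ONE);
        BigInteger count = BigInteger.valueOf(n);
        List<long[]> slices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            BigInteger start = min.add(total.multiply(BigInteger.valueOf(i)).divide(count));
            BigInteger end = min.add(total.multiply(BigInteger.valueOf(i + 1)).divide(count))
                                .subtract(BigInteger.ONE);
            slices.add(new long[] { start.longValueExact(), end.longValueExact() });
        }
        return slices;
    }

    public static void main(String[] args) {
        // Each node would take one slice; ks.tbl and pk are placeholder names.
        for (long[] s : split(4)) {
            System.out.printf(
                "SELECT * FROM ks.tbl WHERE token(pk) >= %d AND token(pk) <= %d%n", s[0], s[1]);
        }
    }
}
```

With 4 slices, each node could be given one slice index (for example via a command-line argument) and run only the corresponding token-restricted query, so that the four processes together cover every row exactly once.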

Re: parallel processing - splitting data

2017-01-19 Thread siddharth verma
Hi Frank, You could try this: https://github.com/siddv29/cfs I have processed 1.2 billion rows in 480 seconds with just 20 threads on the client side. C* 3.0.9, Nodes = 6, RF = 3. Have a go at it. You might be surprised. Regards, On Thu, Jan 19, 2017 at 5:35 PM, Frank Hughes wrote: > Hello there, >

parallel processing - splitting data

2017-01-19 Thread Frank Hughes
Hello there, I'm running a 4-node cluster of Cassandra 3.9 with a replication factor of 4. I want to be able to run a Java process on each node, selecting only 25% of the data on each node, so I can process all of the data in parallel across the nodes. What is the best way to do this with the java