Re: What benefits do we really get out of colocation?

2016-12-03 Thread kant kodali
Ephemeral storage on SSD will be very painful to maintain, especially with large datasets. We will pretty soon have somewhere in the PB range. I am thinking of leveraging something like the project below, but I am not sure how much performance gain we could get out of it. https://github.com/stec-inc/EnhanceIO On Sat, Dec

Re: What benefits do we really get out of colocation?

2016-12-03 Thread vincent gromakowski
What about ephemeral storage on SSD? If performance is required, it's generally for production, so the cluster would never be stopped. A Spark job to back up to and restore from S3 then allows the cluster to be shut down completely. On Dec 3, 2016 1:28 PM, "David Mitchell" wrote:

Re: What benefits do we really get out of colocation?

2016-12-03 Thread David Mitchell
To get a node-local read from Spark to Cassandra, one has to use a read consistency level of LOCAL_ONE. For some use cases, this is not an option. For example, if you need a read consistency level of LOCAL_QUORUM, as many use cases demand, then you are not going to get a node-local read.
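
To illustrate the point above, here is a minimal sketch of why LOCAL_QUORUM rules out a purely node-local read. It is not connector or driver code: the function names are hypothetical, and `replication_factor` is assumed to be the replication factor in the local datacenter. The only fact relied on is that a quorum is a majority of replicas, i.e. floor(RF / 2) + 1.

```python
def replicas_required(consistency_level: str, replication_factor: int) -> int:
    """Number of local-DC replicas that must answer a read (illustrative helper)."""
    if consistency_level == "LOCAL_ONE":
        return 1
    if consistency_level == "LOCAL_QUORUM":
        # A quorum is a majority of the replicas in the local datacenter.
        return replication_factor // 2 + 1
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def can_be_node_local(consistency_level: str, replication_factor: int) -> bool:
    """A read can stay on one node only if a single replica is enough."""
    return replicas_required(consistency_level, replication_factor) == 1


if __name__ == "__main__":
    for level in ("LOCAL_ONE", "LOCAL_QUORUM"):
        needed = replicas_required(level, 3)
        print(level, "needs", needed, "replica(s); node-local:", can_be_node_local(level, 3))
```

With a replication factor of 3, LOCAL_QUORUM needs 2 replicas, so at least one of them has to be reached over the network even when the Spark worker sits on a node that owns the data.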

Re: What benefits do we really get out of colocation?

2016-12-03 Thread Steve Loughran
On 3 Dec 2016, at 09:16, Manish Malhotra wrote: Thanks for sharing the numbers as well! Nowadays even the network can have very high throughput and might outperform the disk, but as Sean mentioned, data on the network will have

Re: What benefits do we really get out of colocation?

2016-12-03 Thread kant kodali
Hmm, GCE pretty much seems to follow the same model as AWS. On Sat, Dec 3, 2016 at 1:22 AM, kant kodali wrote: > GCE seems to have better options. Has anyone had any experience with GCE? > > On Sat, Dec 3, 2016 at 1:16 AM, Manish Malhotra <manish.malhotra.w...@gmail.com>

Re: What benefits do we really get out of colocation?

2016-12-03 Thread kant kodali
GCE seems to have better options. Has anyone had any experience with GCE? On Sat, Dec 3, 2016 at 1:16 AM, Manish Malhotra <manish.malhotra.w...@gmail.com> wrote: > Thanks for sharing the numbers as well! > > Nowadays even the network can have very high throughput and might outperform the disk,

Re: What benefits do we really get out of colocation?

2016-12-03 Thread Manish Malhotra
Thanks for sharing the numbers as well! Nowadays even the network can have very high throughput and might outperform the disk, but as Sean mentioned, data on the network has other dependencies such as network hops, for example when it crosses racks, which can put a switch in between. But yes people are

Re: What benefits do we really get out of colocation?

2016-12-03 Thread kant kodali
Forgot to mention that my entire cluster is in one DC, so if it were spread across multiple DCs then colocating does make sense in theory as well. On Sat, Dec 3, 2016 at 1:12 AM, kant kodali wrote: > Thanks Sean! Just for the record, I am currently seeing 95 MB/s RX (receive throughput

Re: What benefits do we really get out of colocation?

2016-12-03 Thread kant kodali
Thanks Sean! Just for the record, I am currently seeing 95 MB/s RX (receive throughput) on my Spark worker machine when I run `sudo iftop -B`. The problem with instance store on AWS is that it is all ephemeral, so placing Cassandra on top doesn't make a lot of sense. So in short, AWS doesn't seem
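
To put the 95 MB/s figure in context, here is a rough back-of-the-envelope sketch. It assumes the number is in decimal megabytes per second (iftop's `-B` flag reports bytes rather than bits) and compares it against a hypothetical 1 Gbit/s link. The thread does not state the actual link speed of the instance, so that value is an assumption.

```python
def mbytes_to_mbits(mbytes_per_s: float) -> float:
    """Convert megabytes per second to megabits per second (8 bits per byte)."""
    return mbytes_per_s * 8


def link_utilization(mbytes_per_s: float, link_gbits_per_s: float) -> float:
    """Fraction of a link's nominal capacity used by the observed throughput."""
    return mbytes_to_mbits(mbytes_per_s) / (link_gbits_per_s * 1000)


if __name__ == "__main__":
    observed = 95.0   # MB/s RX, as reported in the message above
    assumed_link = 1.0  # Gbit/s, an assumption that is not stated in the thread
    print(mbytes_to_mbits(observed), "Mbit/s")
    print(f"{link_utilization(observed, assumed_link):.0%} of an assumed 1 Gbit/s link")
```

Under that assumption, 95 MB/s is 760 Mbit/s, which would mean the network link rather than the disk is close to being the limiting factor.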

Re: What benefits do we really get out of colocation?

2016-12-03 Thread kant kodali
Wait, how is that a benefit? Isn't that a bad thing if you are saying colocating leads to more latency and a longer overall execution time? On Sat, Dec 3, 2016 at 12:34 AM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote: > You get more latency on reads so overall execution time is

Re: What benefits do we really get out of colocation?

2016-12-03 Thread vincent gromakowski
You get more latency on reads, so overall execution time is longer. On Dec 3, 2016 7:39 AM, "kant kodali" wrote: > I wonder what benefits I really get if I colocate my Spark worker process and Cassandra server process on each node? > > I understand the concept of

What benefits do we really get out of colocation?

2016-12-02 Thread kant kodali
I wonder what benefits I really get if I colocate my Spark worker process and Cassandra server process on each node? I understand the concept of moving compute towards the data instead of moving data towards computation, but it sounds more like one is trying to optimize for network latency.