Ephemeral storage on SSD will be very painful to maintain, especially with
large datasets. We will pretty soon have data somewhere in the PB range.
I am thinking of leveraging something like the below, but I am not sure how
much performance gain we could get out of it.
https://github.com/stec-inc/EnhanceIO
On Sat, Dec
What about ephemeral storage on SSD? If performance is required, it's
generally for production, so the cluster would never be stopped. Then a
Spark job to back up/restore on S3 allows shutting the cluster down completely.
On 3 Dec 2016 1:28 PM, "David Mitchell" wrote:
To get a node local read from Spark to Cassandra, one has to use a read
consistency level of LOCAL_ONE. For some use cases, this is not an
option. For example, if you need to use a read consistency level
of LOCAL_QUORUM, as many use cases demand, then one is not going to get a
node local read.
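
To make the arithmetic behind that concrete, here is a minimal sketch in Python. It assumes the usual quorum rule of floor(RF / 2) + 1 replicas within the local DC; the function names are hypothetical and only the two consistency levels mentioned above are modelled, so treat it as an illustration rather than how the driver or connector actually decides.

```python
def quorum_size(replication_factor: int) -> int:
    """Replicas that must respond for a quorum read: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def replicas_contacted(consistency_level: str, local_rf: int) -> int:
    """Local-DC replicas a read waits for (only two levels are modelled)."""
    if consistency_level == "LOCAL_ONE":
        return 1
    if consistency_level == "LOCAL_QUORUM":
        return quorum_size(local_rf)
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def needs_remote_node(consistency_level: str, local_rf: int) -> bool:
    """True when more than one replica must answer, so at least one
    response has to come from a node other than the colocated one."""
    return replicas_contacted(consistency_level, local_rf) > 1


if __name__ == "__main__":
    # Hypothetical local replication factor of 3.
    for level in ("LOCAL_ONE", "LOCAL_QUORUM"):
        print(level, replicas_contacted(level, 3), needs_remote_node(level, 3))
```

With a local replication factor of 3, LOCAL_QUORUM waits for two replicas, so even a colocated Spark worker still depends on one other node over the network.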
On 3 Dec 2016, at 09:16, Manish Malhotra wrote:
Thanks for sharing the numbers as well!
Nowadays even the network can have very high throughput and might outperform
the disk, but as Sean mentioned, data on the network will have
Hmm, GCE pretty much seems to follow the same model as AWS.
On Sat, Dec 3, 2016 at 1:22 AM, kant kodali wrote:
GCE seems to have better options. Has anyone had any experience with GCE?
On Sat, Dec 3, 2016 at 1:16 AM, Manish Malhotra <
manish.malhotra.w...@gmail.com> wrote:
Thanks for sharing the numbers as well!
Nowadays even the network can have very high throughput and might
outperform the disk, but as Sean mentioned, data on the network will have
other dependencies such as network hops; for example, if it's across racks,
there can be a switch in between.
But yes people are
Forgot to mention that my entire cluster is in one DC. So if it were across
multiple DCs, then colocating does make sense in theory as well.
On Sat, Dec 3, 2016 at 1:12 AM, kant kodali wrote:
Thanks Sean! Just for the record, I am currently seeing 95 MB/s RX (receive
throughput) on my Spark worker machine when I run `sudo iftop -B`.
The problem with instance store on AWS is that it is all ephemeral, so
placing Cassandra on top doesn't make a lot of sense. So in short, AWS
doesn't seem
Wait, how is that a benefit? Isn't that a bad thing if you are saying
colocating leads to more latency and overall execution time is longer?
On Sat, Dec 3, 2016 at 12:34 AM, vincent gromakowski <
vincent.gromakow...@gmail.com> wrote:
You get more latency on reads so overall execution time is longer
On 3 Dec 2016 7:39 AM, "kant kodali" wrote:
I wonder what benefits I really get if I colocate my Spark worker
process and Cassandra server process on each node?
I understand the concept of moving compute towards the data instead of
moving data towards computation, but it sounds more like one is trying to
optimize for network latency.