Re: is there any way to enforce Spark to cache data in all worker nodes (almost equally)?

2015-04-30 Thread shahab
Thanks Alex, but 482 MB was just an example size; I am looking for a
generic approach to doing this without broadcasting.

any idea?

best,
/Shahab

On Thu, Apr 30, 2015 at 4:21 PM, Alex  wrote:

> 482 MB should be small enough to be distributed as a set of broadcast
> variables. Then you can use Spark's local features to process it.
> --
> From: shahab 
> Sent: ‎4/‎30/‎2015 9:42 AM
> To: user@spark.apache.org
> Subject: is there any way to enforce Spark to cache data in all worker
> nodes (almost equally)?
>
> Hi,
>
> I load data from Cassandra into Spark. The entire dataset is roughly 482
> MB, and it is cached as temp tables (7 tables in total). How can I force
> Spark to cache data on both worker nodes, not only on ONE worker (as in
> my case)?
>
> I am using Spark 1.2.1 with spark-connector 1.2.0-rc3. I have a small
> standalone cluster with two nodes, A and B, where node A hosts Cassandra,
> the Spark master, and a worker, and node B runs the second Spark worker.
>
> best,
> /Shahab
>


RE: is there any way to enforce Spark to cache data in all worker nodes (almost equally)?

2015-04-30 Thread Alex
482 MB should be small enough to be distributed as a set of broadcast
variables. Then you can use Spark's local features to process it.

-Original Message-
From: "shahab" 
Sent: ‎4/‎30/‎2015 9:42 AM
To: "user@spark.apache.org" 
Subject: is there any way to enforce Spark to cache data in all worker
nodes (almost equally)?

Hi,


I load data from Cassandra into Spark. The entire dataset is roughly 482
MB, and it is cached as temp tables (7 tables in total). How can I force
Spark to cache data on both worker nodes, not only on ONE worker (as in my
case)?


I am using Spark 1.2.1 with spark-connector 1.2.0-rc3. I have a small
standalone cluster with two nodes, A and B, where node A hosts Cassandra,
the Spark master, and a worker, and node B runs the second Spark worker.


best,
/Shahab