Re: Is RDD.persist honoured if multiple actions are executed in parallel
If you want to ensure the persisted RDD has been computed first, just run foreach with a dummy function before the other actions to force evaluation.

--
Michael Mior
michael.m...@gmail.com

On Thu, 24 Sep 2020 at 00:38, Arya Ketan wrote:
> Thanks, we were able to validate the same behaviour.
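A minimal sketch of the warm-up suggested above, in Python. The helper name `materialize_then_run` and its parameters are hypothetical, not part of Spark. It assumes `rdd` has already been persisted and behaves like a PySpark RDD, meaning it exposes `foreach`, and that each entry in `actions` is a callable that takes the RDD and runs one action on it.

```python
from concurrent.futures import ThreadPoolExecutor


def materialize_then_run(rdd, actions, max_workers=4):
    """Force a persisted RDD to be computed once, then run actions in parallel.

    Hypothetical helper. Assumes rdd.persist() was already called and that
    `rdd` exposes foreach() the way a PySpark RDD does.
    """
    # Dummy action: visits every element, so every partition is computed
    # (and stored, if the RDD is persisted) before the real actions start.
    rdd.foreach(lambda _: None)

    # Only now fan out; each action runs in its own thread.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(action, rdd) for action in actions]
        # Results come back in the same order as `actions`.
        return [future.result() for future in futures]
```

With a real RDD this could be called as `materialize_then_run(rdd, [lambda r: r.count(), lambda r: r.sum()])`. The cost is one extra pass over the data before the parallel actions begin, and in a streaming job the warm-up would presumably have to be repeated for each batch's RDD.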
Re: Is RDD.persist honoured if multiple actions are executed in parallel
Thanks, we were able to validate the same behaviour.

On Wed, 23 Sep 2020 at 18:05, Sean Owen wrote:
> It is, but it happens asynchronously. If you access the same block twice
> in quick succession, the cached block may not yet be available the second time.

--
Arya
Re: Is RDD.persist honoured if multiple actions are executed in parallel
It is, but it happens asynchronously. If you access the same block twice in quick succession, the cached block may not yet be available the second time.

On Wed, Sep 23, 2020, 7:17 AM Arya Ketan wrote:
> Hi,
> I have a Spark Streaming use case (Spark 2.2.1). My Spark job has multiple
> actions, which I run in parallel by executing them in separate threads.
> I call rdd.persist, after which the DAG forks into multiple actions, but I
> see that RDD caching is not happening and the entire DAG is executed twice
> (once for each action).
>
> What am I missing?
> Arya
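The race described above can be illustrated with a toy model in plain Python. This is not Spark code and says nothing about Spark's internals: `LazyCachedValue` and `run_two_actions` are hypothetical names, and the barrier exists only to make the overlap deterministic. The model assumes a value is cached only after its first computation finishes, so two readers that start together both see a miss and both recompute.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LazyCachedValue:
    """Toy stand-in for a persisted dataset: computed on first use, cached after."""

    def __init__(self, compute):
        self._compute = compute
        self._cached = None
        self._has_value = False
        self._lock = threading.Lock()  # protects only the call counter
        self.compute_calls = 0

    def get(self):
        if self._has_value:  # cache hit: nothing is recomputed
            return self._cached
        with self._lock:
            self.compute_calls += 1
        value = self._compute()  # cache miss: do the full computation
        self._cached = value  # the result becomes visible only now
        self._has_value = True
        return value


def run_two_actions(warm_up):
    """Run two 'actions' in parallel; return (compute_calls, results)."""
    # Cold run: the barrier holds each computation until both threads are
    # inside it, so neither can finish before the other has missed the cache.
    barrier = threading.Barrier(1 if warm_up else 2)

    def expensive():
        barrier.wait(timeout=5)
        return [x * x for x in range(5)]

    data = LazyCachedValue(expensive)
    if warm_up:
        data.get()  # materialize once before fanning out
    with ThreadPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(lambda _: sum(data.get()), range(2)))
    return data.compute_calls, results
```

In this model `run_two_actions(warm_up=False)` performs the computation twice, once per thread, while `run_two_actions(warm_up=True)` performs it once and both threads read the cached value.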
Is RDD.persist honoured if multiple actions are executed in parallel
Hi,

I have a Spark Streaming use case (Spark 2.2.1). My Spark job has multiple actions, which I run in parallel by executing them in separate threads. I call rdd.persist, after which the DAG forks into multiple actions, but I see that RDD caching is not happening and the entire DAG is executed twice (once for each action).

What am I missing?

Arya