Re: Is RDD.persist honoured if multiple actions are executed in parallel

2020-09-24 Thread Michael Mior
If you want to ensure the persisted RDD has been computed before the
parallel actions start, just run foreach with a dummy function first to
force evaluation.
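
A minimal sketch of that suggestion, written PySpark-style and assuming
`rdd` is an RDD and each action is a callable that takes it. The helper
names `persist_and_materialize` and `run_actions_in_parallel` are
hypothetical, not part of Spark; only `persist` and `foreach` are RDD
methods.

```python
from concurrent.futures import ThreadPoolExecutor


def persist_and_materialize(rdd):
    """Mark the RDD for caching, then force one full evaluation.

    The no-op foreach is an action, so it runs the lineage once and lets
    the cache fill before any other thread touches the RDD.
    """
    rdd.persist()
    rdd.foreach(lambda _: None)  # dummy function, used only to trigger evaluation
    return rdd


def run_actions_in_parallel(rdd, actions):
    """Run each action (a callable taking the RDD) in its own thread.

    Results are returned in the same order as `actions`.
    """
    with ThreadPoolExecutor(max_workers=len(actions)) as pool:
        futures = [pool.submit(action, rdd) for action in actions]
        return [future.result() for future in futures]
```

Usage would look something like
`run_actions_in_parallel(persist_and_materialize(rdd), [lambda r: r.count(), lambda r: r.take(5)])`.
Note that with a memory-only storage level, partitions that do not fit in
memory may still be recomputed by later actions.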

--
Michael Mior
michael.m...@gmail.com

On Thu, 24 Sep 2020 at 00:38, Arya Ketan wrote:
>
> Thanks, we were able to validate the same behaviour.
>
> On Wed, 23 Sep 2020 at 18:05, Sean Owen  wrote:
>>
>> It is, but caching happens asynchronously. If you access the same block
>> twice in quick succession, the cached block may not be available yet the
>> second time.
>>
>> On Wed, Sep 23, 2020, 7:17 AM Arya Ketan  wrote:
>>>
>>> Hi,
>>> I have a Spark Streaming use case (Spark 2.2.1). My Spark job has multiple
>>> actions, which I run in parallel by executing them in separate threads. I
>>> call rdd.persist, after which the DAG forks into multiple actions.
>>> However, I see that RDD caching is not happening and the entire DAG is
>>> executed twice (once in each action).
>>>
>>> What am I missing?
>>> Arya
>>>
>>>
>>
>>
> --
> Arya




Re: Is RDD.persist honoured if multiple actions are executed in parallel

2020-09-23 Thread Arya Ketan
Thanks, we were able to validate the same behaviour.

On Wed, 23 Sep 2020 at 18:05, Sean Owen  wrote:

> It is, but caching happens asynchronously. If you access the same block
> twice in quick succession, the cached block may not be available yet the
> second time.
>
> On Wed, Sep 23, 2020, 7:17 AM Arya Ketan  wrote:
>
>> Hi,
>> I have a Spark Streaming use case (Spark 2.2.1). My Spark job has multiple
>> actions, which I run in parallel by executing them in separate threads. I
>> call rdd.persist, after which the DAG forks into multiple actions.
>> However, I see that RDD caching is not happening and the entire DAG is
>> executed twice (once in each action).
>>
>> What am I missing?
>> Arya
>>
>>
>>
>
> --
Arya


Re: Is RDD.persist honoured if multiple actions are executed in parallel

2020-09-23 Thread Sean Owen
It is, but caching happens asynchronously. If you access the same block
twice in quick succession, the cached block may not be available yet the
second time.

On Wed, Sep 23, 2020, 7:17 AM Arya Ketan  wrote:

> Hi,
> I have a Spark Streaming use case (Spark 2.2.1). My Spark job has multiple
> actions, which I run in parallel by executing them in separate threads. I
> call rdd.persist, after which the DAG forks into multiple actions.
> However, I see that RDD caching is not happening and the entire DAG is
> executed twice (once in each action).
>
> What am I missing?
> Arya
>


Is RDD.persist honoured if multiple actions are executed in parallel

2020-09-23 Thread Arya Ketan
Hi,
I have a Spark Streaming use case (Spark 2.2.1). My Spark job has multiple
actions, which I run in parallel by executing them in separate threads. I
call rdd.persist, after which the DAG forks into multiple actions.
However, I see that RDD caching is not happening and the entire DAG is
executed twice (once in each action).

What am I missing?
Arya