I need to cache the DataFrame to accelerate queries. In that case, two
queries may run the DAG simultaneously before the data has actually been cached.
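
To illustrate the pattern I have in mind, here is a minimal sketch in plain Python rather than real Spark code. The helper name `warm_then_run` and the dict standing in for the cached DataFrame are hypothetical. With Spark, I assume `materialize` would be an action such as `df.count` on a DataFrame already marked with `df.cache()`, so that the cache is filled once before any thread queries it.

```python
from concurrent.futures import ThreadPoolExecutor


def warm_then_run(materialize, queries, max_workers=4):
    """Run `materialize` once, then run every query concurrently.

    `materialize` is any zero-argument callable that forces the cache to
    be filled. `queries` is a list of zero-argument callables that read
    from the already-filled cache.
    """
    # Fill the cache on the calling thread before any worker thread starts,
    # so no query can trigger the upstream computation a second time.
    materialize()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(q) for q in queries]
        # Results come back in the same order as `queries`.
        return [f.result() for f in futures]


if __name__ == "__main__":
    # Stand-in for a cached DataFrame: a dict that is filled exactly once.
    cache = {}

    def fill_cache():
        cache["rows"] = list(range(10))

    def total():
        return sum(cache["rows"])

    def largest():
        return max(cache["rows"])

    print(warm_then_run(fill_cache, [total, largest]))
```

The point of the design is only the ordering: the one call that populates the cache finishes before the thread pool is created, so the concurrent queries never race to compute the same data.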

On Tue, Nov 19, 2019 at 9:46 PM Sonal Goyal <sonalgoy...@gmail.com> wrote:

> The RDD or the DataFrame is distributed and partitioned by Spark so as to
> leverage all your workers (CPUs) effectively. So all the DataFrame
> operations are actually happening simultaneously, each on a section of the
> data. Why do you want to use threading here?
>
> Thanks,
> Sonal
> Nube Technologies <http://www.nubetech.co>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
> On Tue, Nov 12, 2019 at 7:18 AM Chang Chen <baibaic...@gmail.com> wrote:
>
>>
>> Hi all
>>
>> I have a case where I need to cache a source RDD and then create different
>> DataFrames from it in different threads to accelerate queries.
>>
>> I know that SparkSession is thread-safe (
>> https://issues.apache.org/jira/browse/SPARK-15135), but I am not sure
>> whether an RDD is thread-safe or not.
>>
>> Thanks
>> Chang
>>
>
