Does updating a dataframe return a NEW dataframe, as with an RDD?

---Original---
From: "vincent gromakowski" <vincent.gromakow...@gmail.com>
Date: 2017/2/14 01:15:35
To: "Reynold Xin" <r...@databricks.com>
Cc: "user" <user@spark.apache.org>; "Mendelson, Assaf" <assaf.mendel...@rsa.com>
Subject: Re: is dataframe thread safe?

How about having one thread that updates and caches a dataframe in memory, alongside other threads that request this dataframe? Is that thread safe?

2017-02-13 9:02 GMT+01:00 Reynold Xin <r...@databricks.com>:

Yes, your use case should be fine. Multiple threads can transform the same data frame in parallel since they create different data frames.

On Sun, Feb 12, 2017 at 9:07 AM Mendelson, Assaf <assaf.mendel...@rsa.com> wrote:

Hi,

I was wondering if dataframe is considered thread safe. I know the spark session and spark context are thread safe (and actually have tools to manage jobs from different threads), but the question is: can I use the same dataframe in both threads?

The idea would be to create a dataframe in the main thread and then, in two sub threads, do different transformations and actions on it.

I understand that some things might not be thread safe (e.g. if I unpersist in one thread, it would affect the other; checkpointing would cause similar issues). However, I can't find any documentation as to what operations (if any) are thread safe.

Thanks,
Assaf.
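
To make Reynold's point concrete, here is a minimal plain-Python sketch of why parallel transformations on a shared frame are safe. It is not Spark code: `ToyFrame`, its `filter` and `map` methods, and the two worker functions are hypothetical stand-ins, used only to model the idea that a transformation leaves the original untouched and returns a new object.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class ToyFrame:
    """Hypothetical stand-in for a dataframe: the rows can never be modified."""
    rows: Tuple[int, ...]

    def filter(self, pred: Callable[[int], bool]) -> "ToyFrame":
        # A transformation builds a NEW frame; self is left untouched.
        return ToyFrame(tuple(r for r in self.rows if pred(r)))

    def map(self, fn: Callable[[int], int]) -> "ToyFrame":
        return ToyFrame(tuple(fn(r) for r in self.rows))


# Created once in the main thread, then shared read-only by both workers.
base = ToyFrame(tuple(range(10)))
results: Dict[str, ToyFrame] = {}


def keep_evens() -> None:
    results["evens"] = base.filter(lambda r: r % 2 == 0)


def double_all() -> None:
    results["doubled"] = base.map(lambda r: r * 2)


threads = [threading.Thread(target=keep_evens),
           threading.Thread(target=double_all)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each worker got its own frame; the shared one is unchanged.
print(base.rows)
print(results["evens"].rows)
print(results["doubled"].rows)
```

The sketch deliberately leaves out the cases Assaf flags. Operations such as unpersist or checkpointing act on state shared behind the dataframe, so they fall outside this model. A thread that "updates" a dataframe would, under the same model, really be producing a new frame and handing the new reference to the readers.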