Re: Does spark read the same file twice, if two stages are using the same DataFrame?

2023-05-07 Thread Nitin Siwach
Thank you for the help, Mich :) I did not start with a pandas DF. I used pandas to create a dummy .csv, which I dump to disk and intend to use to showcase my pain point. The pandas code was included to ensure that an end-to-end runnable example is provided and the effort on anyone trying to h

Re: Does spark read the same file twice, if two stages are using the same DataFrame?

2023-05-07 Thread Mich Talebzadeh
You have started with a pandas DF, which won't scale outside of the driver itself. Let us put that aside.
df1.to_csv("./df1.csv", index_label="index")  ## write the dataframe to the underlying file system
Starting with Spark:
df1 = spark.read.csv("./df1.csv", header=True, schema=schema)  ## read th

Re: Does spark read the same file twice, if two stages are using the same DataFrame?

2023-05-07 Thread Nitin Siwach
Thank you for your response, Sir. My understanding is that the final ```df3.count()``` is the only action in the code I have attached. In fact, I tried running the rest of the code (commenting out just the final ```df3.count()```) and, as I expected, no computations were triggered. On Sun, 7 May, 2023,
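The point about a single action can be illustrated with a toy model in plain Python. This is only a sketch and not Spark's API: `LazyFrame`, `read_source`, `run`, and the intermediate `df2` are hypothetical names invented for illustration. It shows that transformations alone trigger no work, and that one action can still evaluate a shared source once per branch of the lineage unless that source is cached. Whether real Spark scans a file twice depends on the physical plan it chooses.

```python
class LazyFrame:
    """Toy stand-in for a lazily evaluated DataFrame (NOT the Spark API)."""

    def __init__(self, compute):
        self._compute = compute   # zero-argument function returning a list of rows
        self._use_cache = False
        self._cached = None       # filled on first evaluation when caching is on

    def map(self, fn):
        # Transformation: only records the step; nothing runs yet.
        return LazyFrame(lambda: [fn(row) for row in self._rows()])

    def union(self, other):
        # Transformation that combines two lineages.
        return LazyFrame(lambda: self._rows() + other._rows())

    def cache(self):
        # Mark this frame so its rows are kept after the first evaluation.
        self._use_cache = True
        return self

    def _rows(self):
        if self._use_cache:
            if self._cached is None:
                self._cached = self._compute()
            return self._cached
        return self._compute()

    def count(self):
        # Action: evaluates the whole lineage.
        return len(self._rows())


reads = {"n": 0}  # how many times the (pretend) file was read


def read_source():
    reads["n"] += 1
    return [1, 2, 3]


def run(cache_source):
    """Return (reads before the action, count, reads after the action)."""
    reads["n"] = 0
    df1 = LazyFrame(read_source)
    if cache_source:
        df1.cache()
    df2 = df1.map(lambda x: x * 2)              # first branch off df1
    df3 = df1.map(lambda x: x + 1).union(df2)   # second branch, then combine
    reads_before_action = reads["n"]
    total = df3.count()                          # the only action
    return reads_before_action, total, reads["n"]


print(run(cache_source=False))
print(run(cache_source=True))
```

In this toy, no read happens before `count()` is called. Without caching, the source function runs once for each branch that depends on `df1`; with caching, it runs once.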

Re: Does spark read the same file twice, if two stages are using the same DataFrame?

2023-05-07 Thread Mich Talebzadeh
"...However, in my case here I am calling just one action..." OK, which line in your code do you call the one action? Mich Talebzadeh, Lead Solutions Architect/Engineering Lead, Palantir Technologies Limited, London, United Kingdom

Re: Does spark read the same file twice, if two stages are using the same DataFrame?

2023-05-07 Thread Nitin Siwach
@Vikas Kumar I am sorry, but I thought that you had answered the other question that I raised to the same email address yesterday. It was about the SQL tab in the web UI and the output of .explain showing different plans. I get how using .cache I can ensure that the data from a particular checkpo

unsubscribe

2023-05-07 Thread Utkarsh Jain