Hi,

Even cached data can occupy different amounts of memory for dataframes holding exactly the same rows, depending on a number of conditions (storage level, columnar compression settings, partitioning, and so on).
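For example, whether the in-memory columnar cache is compressed changes the footprint for identical rows. A minimal sketch of what I mean, assuming a local PySpark 3.x shell; spark.sql.inMemoryColumnarStorage.compressed is a documented setting, but _jsc and getRDDStorageInfo() are developer/internal APIs, so take this as illustrative rather than as a supported way to measure cache size:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").appName("cache-size-demo").getOrCreate()

def cached_memory_bytes():
    # Programmatic view of what the UI Storage tab shows (getRDDStorageInfo is a DeveloperApi).
    return sum(info.memSize() for info in spark.sparkContext._jsc.sc().getRDDStorageInfo())

df = spark.range(0, 1_000_000).selectExpr("id", "id % 100 AS bucket")

df.persist().count()            # materialize the cache, columnar compression on (the default)
print("compressed cache bytes:", cached_memory_bytes())
df.unpersist(blocking=True)

spark.conf.set("spark.sql.inMemoryColumnarStorage.compressed", "false")
df.persist().count()            # exactly the same data, compression switched off
print("uncompressed cache bytes:", cached_memory_bytes())

The two numbers usually differ, and storage level, partition counts and column types shift them further still.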
I generally tend to try to understand the problem before jumping to conclusions through assumptions, sadly a habit I cannot overcome. Is there a way to understand what the person is trying to achieve here by knowing the size of the dataframe?

Regards,
Gourav

On Fri, Dec 24, 2021 at 2:49 PM Sean Owen <sro...@gmail.com> wrote:

> I assume it means size in memory when cached, which does make sense.
> Fastest thing is to look at it in the UI Storage tab after it is cached.
>
> On Fri, Dec 24, 2021, 4:54 AM Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>
>> Hi,
>>
>> This question, once again like the last one, does not make much sense at all. Where are you trying to store the data frame, and how?
>>
>> Are you just trying to write a blog, as you were mentioning in an earlier email, and trying to fill in some gaps? I think that the questions are entirely wrong.
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Fri, Dec 24, 2021 at 2:04 AM <bit...@bitfox.top> wrote:
>>
>>> Hello
>>>
>>> Is it possible to know a dataframe's total storage size in bytes? Such as:
>>>
>>> df.size()
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>>   File "/opt/spark/python/pyspark/sql/dataframe.py", line 1660, in __getattr__
>>>     "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
>>> AttributeError: 'DataFrame' object has no attribute 'size'
>>>
>>> Sure, it won't work, but if there were such a method that would be great.
>>>
>>> Thanks.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
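PS: for anyone landing on this thread later, there is no public df.size() in PySpark, but Catalyst's statistics give a rough byte estimate. A sketch, with the caveat that _jdf and queryExecution() are internal interfaces that can change between Spark versions, and the figure is the optimizer's estimate rather than the exact cached footprint:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").appName("df-size-estimate").getOrCreate()

df = spark.range(0, 1_000_000).selectExpr("id", "cast(id AS string) AS id_str")

# Catalyst's size estimate for the optimized plan, in bytes.
# _jdf / queryExecution() are internal and not a stable API.
size_in_bytes = df._jdf.queryExecution().optimizedPlan().stats().sizeInBytes()
print("estimated size in bytes:", size_in_bytes)

The exact in-memory figure is what the Storage tab (or the getRDDStorageInfo sketch above) reports after df.cache() followed by an action; the same information is also exposed through the monitoring REST API's storage/rdd endpoint.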