Hi,

even cached data can occupy different amounts of memory for dataframes
holding exactly the same contents, depending on a number of conditions.
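
For reference, a minimal sketch of how the cached footprint can be inspected
programmatically rather than through the UI (assuming PySpark 3.x; the
dataframe name is hypothetical, and getRDDStorageInfo is a developer API
reached through the JVM gateway, so the exact calls may change between
versions):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(1_000_000)   # hypothetical example dataframe

# Nothing is stored until an action materialises the cache.
df.cache()
df.count()

# These are the same numbers the UI Storage tab reports.
for info in spark.sparkContext._jsc.sc().getRDDStorageInfo():
    print(info.name(), info.memSize(), info.diskSize())

The memory size reported there will vary with the storage level,
serialization and partitioning, which is exactly why identical data can show
different footprints.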

I generally try to understand the problem before jumping to conclusions
based on assumptions; sadly, it is a habit I cannot overcome.

Is there a way to understand what the person is trying to achieve here by
knowing the size of the dataframe?



Regards,
Gourav

On Fri, Dec 24, 2021 at 2:49 PM Sean Owen <sro...@gmail.com> wrote:

> I assume it means size in memory when cached, which does make sense.
> Fastest thing is to look at it in the UI Storage tab after it is cached.
>
> On Fri, Dec 24, 2021, 4:54 AM Gourav Sengupta <gourav.sengu...@gmail.com>
> wrote:
>
>> Hi,
>>
>> This question, once again like the last one, does not make much sense at
>> all. Where are you trying to store the data frame, and how?
>>
>> Are you just trying to write a blog, as you were mentioning in an earlier
>> email, and trying to fill in some gaps? I think that the questions are
>> entirely wrong.
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Fri, Dec 24, 2021 at 2:04 AM <bit...@bitfox.top> wrote:
>>
>>> Hello
>>>
>>> Is it possible to know a dataframe's total storage size in bytes?
>>> Something like:
>>>
>>> >>> df.size()
>>> Traceback (most recent call last):
>>>    File "<stdin>", line 1, in <module>
>>>    File "/opt/spark/python/pyspark/sql/dataframe.py", line 1660, in
>>> __getattr__
>>>      "'%s' object has no attribute '%s'" % (self.__class__.__name__,
>>> name))
>>> AttributeError: 'DataFrame' object has no attribute 'size'
>>>
>>> Sure, it won't work, but if there were such a method that would be great.
>>>
>>> Thanks.
>>>
>>>
>>>
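
Coming back to the quoted df.size() question: there is no public DataFrame
method for this, but the optimizer's own size estimate can be read through
the internal Java object behind the dataframe. This is an internal, unstable
API, so treat it as a rough sketch rather than a supported call:

# Catalyst's estimate of the plan's output size in bytes (internal API,
# may change between Spark versions; not the cached in-memory size).
stats = df._jdf.queryExecution().optimizedPlan().stats()
size_bytes = int(str(stats.sizeInBytes()))
print(size_bytes)

For the in-memory size of a cached dataframe, the Storage tab (or the
getRDDStorageInfo sketch above) is the more reliable place to look.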
