Hi there, Spark users,

I'm curious what is going on here.  I'm not sure whether this is a possible bug or 
whether I'm missing something.  Extra eyes are much appreciated.

The Spark UI (Python API, 2.4.3) reports DataFrames persisted with the default 
storage level as deserialized MEMORY_AND_DISK, although I always thought they were 
serialized by default in Python, according to the official documentation.
However, when I explicitly set the storage level to that same default … ex => 
df.persist(StorageLevel.MEMORY_AND_DISK) … the Storage tab in the Spark UI shows the 
expected serialized DataFrame, but not when I just call … df.cache().
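
Here is a minimal sketch of how I'm reproducing this (the data itself is arbitrary, 
the point is comparing the Storage tab entries and the locally reported storage 
level for the two DataFrames):

    from pyspark.sql import SparkSession
    from pyspark import StorageLevel

    spark = SparkSession.builder.appName("storage-level-check").getOrCreate()

    # Persist via cache() -- supposedly the default storage level.
    df1 = spark.range(10**6)
    df1.cache()
    df1.count()                  # materialize so it shows up in the Storage tab
    print(df1.storageLevel)      # what PySpark itself reports

    # Persist with the "default" storage level passed explicitly.
    df2 = spark.range(10**6)
    df2.persist(StorageLevel.MEMORY_AND_DISK)
    df2.count()
    print(df2.storageLevel)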

Do we have to explicitly set … StorageLevel.MEMORY_AND_DISK … to get the 
serialization benefit in Python (which I thought was automatic)?  Or is the Spark 
UI incorrect?

SO post with specific example/details => 
https://stackoverflow.com/questions/56926337/conflicting-pyspark-storage-level-defaults

Thank you for your time and research!