Have you seen this thread?
http://search-hadoop.com/m/q3RTtRbEiIXuOOS=Re+PySpark+issue+with+sortByKey+IndexError+list+index+out+of+range+
which led to SPARK-4384
On Mon, May 16, 2016 at 8:09 PM, kramer2...@126.com
wrote:
I know the cache operation can cache data in memory/disk...
But I would like to know whether other operations do the same.
For example, I created a DataFrame called df. It is big, so when I run
actions such as:
df.sort(column_name).show()
df.collect()
it throws an error like: