I would still look at your executor logs. A count() is rewritten by the
optimizer to be much more efficient, because it doesn't actually need any of
the columns. Also, writing Parquet allocates quite a few large buffers.
On Wed, Jul 1, 2015 at 5:42 AM, Pooja Jain wrote:
By any chance, are you using a time field in your df? Time fields are known
to be notorious in RDD conversion.
On Jul 1, 2015 6:13 PM, "Pooja Jain" wrote:
Join is happening successfully, as I am able to do count() after the join.
The error is coming only while trying to write in Parquet format on HDFS.
Thanks,
Pooja.
On Wed, Jul 1, 2015 at 1:06 PM, Akhil Das wrote:
It says:
Caused by: java.net.ConnectException: Connection refused: slave2/...:54845
Could you look in the executor logs (stderr on slave2) and see what made it
shut down? Since you are doing a join, there's a high possibility of an OOM, etc.
Thanks
Best Regards
On Wed, Jul 1, 2015 at 10:20 AM, Pooja Jain wrote:
Hi,
We are using Spark 1.4.0 on Hadoop in yarn-cluster mode via spark-submit.
We are facing a Parquet write issue after doing dataframe joins.
We have a full data set and then incremental data. We are reading them
as dataframes, joining them, and then writing the data to the HDFS system
in Parquet format.
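The pipeline described above can be sketched as follows, assuming Spark 1.4's SQLContext API; the HDFS paths and the join key "id" are illustrative assumptions:

```scala
// Sketch only: read full and incremental data sets as DataFrames,
// join them, and write the result back to HDFS as Parquet.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("full-plus-incremental"))
val sqlContext = new SQLContext(sc)

val full = sqlContext.read.parquet("hdfs:///data/full")        // full data set
val incr = sqlContext.read.parquet("hdfs:///data/incremental") // incremental data

// Join on an assumed key column "id", then write the result to HDFS.
val joined = full.join(incr, full("id") === incr("id"))
joined.write.parquet("hdfs:///data/joined")
```

Submitted with spark-submit in yarn-cluster mode, the write step is where the executors allocate Parquet buffers, so executor stderr on the failing node is the place to look.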