Hi Jinxin,
Thanks for your suggestions; I will try foreachPartition later.
Best regards,
maqy
From: Tang Jinxin
Sent: April 23, 2020, 7:31
To: maqy
Cc: Andrew Melo; user@spark.apache.org
Subject: Re: Can I collect Dataset[Row] to driver without converting it to Array[Row]?
Hi maqy,
Thanks for your question. After some consideration, I have a few ideas: first, try not to collect to the driver if it is not necessary; instead, send the data from the executors (using foreachPartition). Second, if you are not already using a high-performance serializer such as Kryo, it is worth a try. As a summary, I
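The suggestion above (pushing data out from the executors with foreachPartition instead of collecting to the driver) could look roughly like the sketch below. The host, port, and CSV line format are placeholders invented for illustration, not part of this thread; a real setup would depend on how the TensorFlow side receives data.

```scala
import java.io.PrintWriter
import java.net.Socket
import org.apache.spark.sql.Row

// Hypothetical endpoint of the machine that should receive the rows.
val host = "tf-host.example.com"
val port = 9000

// Runs on the executors: one connection per partition, so the rows
// never pass through the driver.
ds.foreachPartition { (rows: Iterator[Row]) =>
  val socket = new Socket(host, port)
  val out = new PrintWriter(socket.getOutputStream, true)
  try {
    // Placeholder wire format: one comma-separated line per row.
    rows.foreach(row => out.println(row.mkString(",")))
  } finally {
    out.close()
    socket.close()
  }
}

// Separately, Kryo serialization (which mainly affects RDD/shuffle
// serialization, not the encoders used by Datasets) can be enabled
// when building the session:
// .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
```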
Hi Andrew,
Thank you for your reply. I am using the Scala API of Spark, and the
TensorFlow machine is not in the Spark cluster. Is that JIRA / PR still
applicable in this situation?
In addition, the current bottleneck of the application is that the amount of
data transferred through the