I want to traverse this Dataset, convert it to Arrow, and send it to TensorFlow 
through a socket.
I tried using toLocalIterator() to traverse the Dataset instead of collecting it 
to the driver, but toLocalIterator() launches many jobs (one per partition) and 
takes a lot of time.

Best regards,
maqy

From: Michael Artz
Sent: April 22, 2020 16:09
To: maqy
Cc: user@spark.apache.org
Subject: Re: Can I collect Dataset[Row] to driver without converting it to Array[Row]?

What would you do with it once you get it into the driver as a Dataset[Row]?


On Apr 22, 2020, at 3:06 AM, maqy <454618...@qq.com> wrote:

When the data is stored as a Dataset[Row], its memory usage is very small.
When I use collect() to bring the data to the driver, each row of the Dataset 
is converted to a Row object and stored in an array, which adds a lot of memory 
overhead.
So, can I collect a Dataset[Row] to the driver and keep its original data format?
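The memory gap described above is not specific to Spark. As a rough analogy only (plain CPython, not Spark's actual Row or internal binary format), the sketch below compares holding two-column records as one tuple object per row with boxed ints, similar in spirit to an Array[Row] whose fields are boxed, against holding the same values in two packed 64-bit column buffers. The function names are hypothetical and chosen for illustration.

```python
import sys
from array import array


def boxed_rows_size(n):
    """Approximate bytes for n two-column rows held as separate objects.

    Counts the outer list, each per-row tuple, and each boxed int value.
    CPython caches small ints, so a few shared ints are slightly overcounted.
    """
    rows = [(i, i * 1000) for i in range(n)]
    total = sys.getsizeof(rows)
    for row in rows:
        total += sys.getsizeof(row)
        total += sum(sys.getsizeof(v) for v in row)
    return total


def packed_columns_size(n):
    """Approximate bytes for the same values stored in two packed int64 columns."""
    ids = array("q", range(n))
    vals = array("q", (i * 1000 for i in range(n)))
    # sys.getsizeof on array.array includes its allocated data buffer.
    return sys.getsizeof(ids) + sys.getsizeof(vals)


if __name__ == "__main__":
    n = 100_000
    print("boxed rows:    ", boxed_rows_size(n), "bytes")
    print("packed columns:", packed_columns_size(n), "bytes")
```

On a 64-bit CPython build the per-row version comes out several times larger than the packed columns; exact numbers vary by interpreter version, so none are quoted here.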
 
Best regards,
maqy
 
