I will traverse this Dataset, convert it to Arrow, and send it to TensorFlow through a socket. I tried using toLocalIterator() to traverse the Dataset instead of collecting it to the driver, but toLocalIterator() creates a lot of jobs, which takes a lot of time.
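To make the intent concrete, here is a rough sketch of the batch-at-a-time pattern in Python, using only the standard library. It is an illustration under assumptions, not the actual pipeline: `rows` is any iterator of rows (for example, one returned by toLocalIterator()), JSON stands in for the Arrow encoding, the length-prefixed framing is an arbitrary choice, and the helper names `batches`, `send_batches`, `recv_exact` and `recv_batches` are hypothetical.

```python
import itertools
import json
import socket
import struct


def batches(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterator."""
    it = iter(rows)
    while True:
        chunk = list(itertools.islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def send_batches(sock, rows, batch_size=1000):
    """Send rows as length-prefixed frames; return the number of batches sent."""
    count = 0
    for chunk in batches(rows, batch_size):
        # Assumption: JSON is a stand-in for the real Arrow serialization.
        payload = json.dumps(chunk).encode("utf-8")
        sock.sendall(struct.pack(">I", len(payload)) + payload)
        count += 1
    # A zero-length frame marks the end of the stream.
    sock.sendall(struct.pack(">I", 0))
    return count


def recv_exact(sock, n):
    """Read exactly n bytes from the socket or raise if it closes early."""
    buf = bytearray()
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            raise ConnectionError("socket closed mid-frame")
        buf.extend(part)
    return bytes(buf)


def recv_batches(sock):
    """Yield decoded batches until the zero-length end marker arrives."""
    while True:
        (size,) = struct.unpack(">I", recv_exact(sock, 4))
        if size == 0:
            return
        yield json.loads(recv_exact(sock, size))
```

With this shape, only one batch is held in memory on the sending side at a time. It does not address the number of jobs that toLocalIterator() launches; it only shows how the rows could be forwarded without first building one large array.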
Best regards,
maqy

From: Michael Artz
Sent: April 22, 2020 16:09
To: maqy
Cc: user@spark.apache.org
Subject: Re: Can I collect Dataset[Row] to driver without converting it to Array [Row]?

What would you do with it once you get it into driver in a Dataset[Row]?

Sent from my iPhone

On Apr 22, 2020, at 3:06 AM, maqy <454618...@qq.com> wrote:

When the data is stored in the Dataset[Row] format, the memory usage is very small. When I use collect() to collect the data to the driver, each row of the Dataset is converted to a Row object and stored in an array, which brings a large memory overhead. So, can I collect a Dataset[Row] to the driver and keep its data format?

Best regards,
maqy