zhengruifeng commented on pull request #30889: URL: https://github.com/apache/spark/pull/30889#issuecomment-749880352
Tree models will create a DataFrame with numPartitions=`defaultParallelism`, which will generate lots of small files, while other models like `LinearSVC` will call `repartition(1)` before saving. It may be nice to reduce the number of files in another ticket.

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
