zhengruifeng commented on pull request #31090:
URL: https://github.com/apache/spark/pull/31090#issuecomment-760122648


   I just created another RF model with 10 trees and 2,789,824 nodes in total:
   ```
   scala> rfcm.trees.length
   res3: Int = 10
   
   scala> rfcm.trees.map(_.numNodes).sum
   res4: Int = 2789824
   
   scala> rfcm.save("/tmp/rfcm")
   ```
   
   Saved it to disk; its size is 49M.
   ```
   du -sh /tmp/rfcm 
   49M  /tmp/rfcm
   ```
   
   Since the model size is proportional to the number of nodes, what about determining the number of partitions with a formula like `numNodes / 1,000,000`?
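   A minimal sketch of that heuristic (the helper name and the 1,000,000 nodes-per-partition threshold are illustrative assumptions, not part of this PR):
   ```scala
   // Hypothetical helper: derive a partition count from the total node count,
   // assuming roughly 1,000,000 nodes per partition and at least one partition.
   def numDataPartitions(numNodes: Long, nodesPerPartition: Long = 1000000L): Int = {
     math.max(1L, numNodes / nodesPerPartition).toInt
   }
   
   // For the model above (2,789,824 nodes), this would give 2 partitions,
   // i.e. roughly 25M of the 49M on disk per partition.
   numDataPartitions(2789824L)  // res: Int = 2
   ```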

