GitHub user rkarimi commented on the issue:

    https://github.com/apache/spark/pull/17972
  
    Perhaps related: big Random Forest models (e.g., 100 or more trees with a depth of around 20):
    
    
    
    Big models can be trained effectively even on machines with limited RAM (such as the C-series instances on AWS). However, they fail during the .transform stage. My guess is that Spark tries to load the full model into the RAM of individual worker nodes, which do not have that much capacity. The fix proposed above (relying on DISK as well) could potentially address this.
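    For concreteness, a minimal sketch of the scenario at the user level (not the code in this PR; `trainDF`/`testDF` and the default column names are assumed):

        import org.apache.spark.ml.classification.RandomForestClassifier
        import org.apache.spark.storage.StorageLevel

        val rf = new RandomForestClassifier()
          .setNumTrees(100)   // "100 or more trees"
          .setMaxDepth(20)    // "depth of around 20"

        // Training a forest of this size works even on memory-constrained workers.
        val model = rf.fit(trainDF)

        // Scoring is where the failure shows up. Persisting the output with a
        // disk-backed level only illustrates the "rely on DISK as well" idea at
        // the user level; it does not change how the model itself is shipped to
        // the worker nodes.
        val scored = model.transform(testDF).persist(StorageLevel.MEMORY_AND_DISK)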
    
    An alternative fix is to load the model tree by tree, evaluate each tree, and store the per-tree evaluation/transform results. That way we may not need the disk storage yet, and it would better resemble the training process (which is also built up tree by tree).
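    A rough sketch of that idea (assumed usage of the ml API, where `model.trees` exposes the member trees; `testDF` is assumed to carry a unique `id` column so per-tree results can be re-joined):

        import org.apache.spark.sql.functions.col

        // Score one member tree at a time, so only a single tree's structure is
        // needed per pass instead of the whole forest.
        val perTree = model.trees.zipWithIndex.map { case (tree, i) =>
          tree.transform(testDF).select(col("id"), col("prediction").alias(s"pred_$i"))
        }
        // The per-tree outputs could then be joined on "id" and combined
        // (majority vote or averaged probabilities) to approximate the forest's
        // prediction without scoring with the full model at once.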
    
    


