[GitHub] spark issue #18904: [SPARK-21624]optimzie RF communicaiton cost

mpjlu Mon, 15 Jan 2018 07:10:47 -0800

Github user mpjlu commented on the issue:

    https://github.com/apache/spark/pull/18904
  
    This is another case.
    Table 1 shows the improvement of random tree algorithm with sparse 
expression. We can see that when we use sparse expression, I/O can be reduced 
by 61% and total run time can be reduced by 39%. The dataset has 100k samples 
and 10k features in Gaussian distribution and its number of partitions is 300. 
The max depth of RF is 17 and number of bins is 40.
    
![image](https://user-images.githubusercontent.com/13826327/34948723-f1f0a262-fa48-11e7-860b-b744daf6196d.png)
    
    Only when the network is a bottleneck, this optimization will work better.




---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark issue #18904: [SPARK-21624]optimzie RF communicaiton cost

Reply via email to