[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...

2018-08-01 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/295
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/630/



---



2018-08-01 Thread fmcquillan99
Github user fmcquillan99 commented on the issue:

https://github.com/apache/madlib/pull/295
  
LGTM, here is an RF example:
```
SELECT * FROM mt_imp_output ORDER BY am, oob_var_importance DESC;
 am | feature | oob_var_importance | impurity_var_importance
----+---------+--------------------+-------------------------
  0 | cyl     |   31.6266798136018 |        8.99888201496216
  0 | disp    |   21.3534749649495 |        30.5938284017064
  0 | vs      |   20.2312669968611 |        25.4855561460076
  0 | wt      |   16.3410741245189 |        19.7783684870616
  0 | qsec    |   10.4475041000687 |        15.1433649502623
  1 | wt      |    34.239597267579 |        24.9348163610914
  1 | disp    |   29.4316514472623 |        31.1638455198447
  1 | cyl     |   21.9435741528927 |        20.1221371309527
  1 | vs      |   14.3851771322661 |        17.5142973837102
  1 | qsec    |                  0 |        6.26490360440106
(10 rows)
```
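The totals in this RF example can be checked directly: summing each column within each `am` group should give 100 (up to floating-point rounding), as expected when the report scales both oob and impurity values to 0-100. The numbers below are copied from the query result; the dictionaries and the `group_totals` helper are only illustrative and are not part of MADlib.

```python
# Values copied from the RF example output above, keyed by the `am` group.
oob = {
    0: [31.6266798136018, 21.3534749649495, 20.2312669968611,
        16.3410741245189, 10.4475041000687],
    1: [34.239597267579, 29.4316514472623, 21.9435741528927,
        14.3851771322661, 0.0],
}
impurity = {
    0: [8.99888201496216, 30.5938284017064, 25.4855561460076,
        19.7783684870616, 15.1433649502623],
    1: [24.9348163610914, 31.1638455198447, 20.1221371309527,
        17.5142973837102, 6.26490360440106],
}


def group_totals(values_by_group):
    """Sum the importance values within each group (illustrative helper)."""
    return {grp: sum(vals) for grp, vals in values_by_group.items()}


# Print the per-group total of each importance column.
for name, table in (("oob", oob), ("impurity", impurity)):
    for grp, total in group_totals(table).items():
        print(f"{name} am={grp}: {total:.6f}")
```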


---



2018-07-26 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/295
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/610/



---



2018-07-25 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/295
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/601/



---



2018-07-24 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/295
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/592/



---



2018-07-23 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/295
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/591/



---



2018-07-20 Thread fmcquillan99
Github user fmcquillan99 commented on the issue:

https://github.com/apache/madlib/pull/295
  
I like this last suggestion from @iyerr3: report the raw values for oob and 
impurity VI in the model output table. (It is OK to keep shifting oob so that 
it is > 0, as we do now.)

For the helper/reporting function, compute and report values scaled/normalized 
to 0-100 for both oob and impurity VI. These should always add up to 100 unless 
there are corner cases; if so, please let us know.
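A minimal sketch of the reporting-time scaling proposed here, assuming the model output keeps raw per-feature values for each group and the helper rescales each group's column so it sums to 100. The function names `scale_to_100` and `report`, and the choice to return zeros when a group's total is 0, are hypothetical illustrations of one possible corner-case policy, not MADlib's `get_var_importance` implementation.

```python
from typing import Dict, List


def scale_to_100(raw: List[float]) -> List[float]:
    """Rescale raw importance values so that they sum to 100.

    Assumes the values are already non-negative (e.g. oob importance has
    been shifted to be >= 0). If every value is 0 there is nothing to
    distribute, so zeros are returned instead of dividing by zero; this
    corner-case policy is a hypothetical choice.
    """
    if any(v < 0 for v in raw):
        raise ValueError("expected non-negative (shifted) importance values")
    total = sum(raw)
    if total == 0:
        return [0.0 for _ in raw]
    return [100.0 * v / total for v in raw]


def report(raw_by_group: Dict[int, List[float]]) -> Dict[int, List[float]]:
    """Scale each grouping value's column independently (hypothetical helper)."""
    return {grp: scale_to_100(vals) for grp, vals in raw_by_group.items()}


if __name__ == "__main__":
    # Made-up raw values for two groups; only their ratios matter.
    raw = {0: [3.0, 1.0, 0.0, 4.0], 1: [0.0, 0.0, 0.0, 0.0]}
    for grp, scaled in report(raw).items():
        print(grp, scaled, sum(scaled))
```

Keeping the raw numbers in the model table and scaling only in the report leaves the stored values faithful to what was computed, while the reported columns stay easy to compare across groups.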


---



2018-07-19 Thread njayaram2
Github user njayaram2 commented on the issue:

https://github.com/apache/madlib/pull/295
  
@fmcquillan Only impurity; I don't think we scale oob to 100.


---



2018-07-19 Thread fmcquillan
Github user fmcquillan commented on the issue:

https://github.com/apache/madlib/pull/295
  
Would this apply to oob too?
Or just impurity?


---



2018-07-19 Thread iyerr3
Github user iyerr3 commented on the issue:

https://github.com/apache/madlib/pull/295
  
Given the situation above, I suggest that the variable importance values 
not be scaled to sum to 100. We can instead do the normalization within 
`get_var_importance`, just for reporting (which is the behavior in rpart). 
In other words, the output table would keep the original values (for DT and RF), 
and the helper function would rescale them when reporting so they are easier 
to read.


---



2018-07-19 Thread fmcquillan99
Github user fmcquillan99 commented on the issue:

https://github.com/apache/madlib/pull/295
  
Another run gave me:
```
        grp 0         grp 1
        31.01364943   31.6576
        22.85881741   33.3245
        13.70257438   0
        6.344527751   3.33304
        26.0804244    11.6654
total   99.9336       79.9806
```
so this does seem to be about whether trees contribute or not.
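One way per-group totals can land below 100, consistent with the observation above, is sketched here under a hypothetical aggregation scheme: each tree's importances are already scaled to sum to 100, and the forest value is a plain average over all trees, where a tree that never splits contributes a vector of zeros. With 10 trees of which 2 do not contribute, the total is 100 × 8/10 = 80. This only illustrates the arithmetic; it is not a description of how MADlib aggregates importance across trees.

```python
from typing import List


def average_importance(per_tree: List[List[float]]) -> List[float]:
    """Average per-feature importances over all trees (hypothetical scheme)."""
    n_trees = len(per_tree)
    n_features = len(per_tree[0])
    return [sum(tree[j] for tree in per_tree) / n_trees for j in range(n_features)]


# Hypothetical forest: 8 trees whose importances are already scaled to 100,
# plus 2 trees that never split and therefore contribute all zeros.
contributing = [[50.0, 30.0, 20.0]] * 8
non_contributing = [[0.0, 0.0, 0.0]] * 2
forest = contributing + non_contributing

avg = average_importance(forest)
print(avg, sum(avg))

# Rescaling at report time divides out the 8/10 factor again.
rescaled = [100.0 * v / sum(avg) for v in avg]
print(rescaled, sum(rescaled))
```

Under this scheme, normalizing in the reporting helper is what brings the displayed totals back to 100.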


---



2018-07-19 Thread fmcquillan99
Github user fmcquillan99 commented on the issue:

https://github.com/apache/madlib/pull/295
  
Should impurity_var_importance always add up to 100?
From the regression example in the user docs:

```
DROP TABLE IF EXISTS mt_imp_output;
SELECT madlib.get_var_importance('mt_cars_output','mt_imp_output');
SELECT am, impurity_var_importance FROM mt_imp_output ORDER BY am, 
impurity_var_importance DESC;
```
results in
```
 am | impurity_var_importance
----+-------------------------
  0 |        35.7664683110879
  0 |        24.7481977075922
  0 |        12.4401197123678
  0 |        12.1559096708347
  0 |        4.88929809351791
  1 |        31.7259035495099
  1 |        29.6146492693988
  1 |        14.9602257795489
  1 |        7.01369118455985
  1 |        6.68552870777581
(10 rows)
```
which does not add up to 100
```
        grp 0         grp 1
        35.76646831   31.72590355
        24.74819771   29.61464927
        12.44011971   14.96022578
        12.15590967   7.013691185
        4.889298094   6.685528708
total   89.935        89.9849
```


---



2018-07-19 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/295
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/580/



---



2018-07-19 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/295
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/579/



---



2018-07-18 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/295
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/576/



---



2018-07-18 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/madlib/pull/295
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/573/



---