[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/295 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/630/ ---
[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/295

LGTM, here is an RF example:
```
SELECT * FROM mt_imp_output ORDER BY am, oob_var_importance DESC;
 am | feature | oob_var_importance | impurity_var_importance
----+---------+--------------------+-------------------------
  0 | cyl     |   31.6266798136018 |        8.99888201496216
  0 | disp    |   21.3534749649495 |        30.5938284017064
  0 | vs      |   20.2312669968611 |        25.4855561460076
  0 | wt      |   16.3410741245189 |        19.7783684870616
  0 | qsec    |   10.4475041000687 |        15.1433649502623
  1 | wt      |    34.239597267579 |        24.9348163610914
  1 | disp    |   29.4316514472623 |        31.1638455198447
  1 | cyl     |   21.9435741528927 |        20.1221371309527
  1 | vs      |   14.3851771322661 |        17.5142973837102
  1 | qsec    |                  0 |        6.26490360440106
(10 rows)
```
---
[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/295 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/610/ ---
[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/295 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/601/ ---
[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/295 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/592/ ---
[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/295 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/591/ ---
[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/295 I like this last suggestion from @iyerr3, that we report raw values for oob and impurity VI in the model output table. (OK to keep the shifted oob > 0 as we do now.) The helper/reporting function should compute and report the scaled/normalized values (0-100) for both oob and impurity VI. These should always add up to 100 unless there are corner cases; if there are, please let us know. ---
[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...
Github user njayaram2 commented on the issue: https://github.com/apache/madlib/pull/295 @fmcquillan Only impurity; I don't think we scale oob to 100. ---
[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...
Github user fmcquillan commented on the issue: https://github.com/apache/madlib/pull/295 Would this apply to oob too? Or just impurity? ---
[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...
Github user iyerr3 commented on the issue: https://github.com/apache/madlib/pull/295 Considering the above situation, I suggest the variable importance values not be scaled to sum to 100. We can make the normalization within `get_var_importance` just for the reporting (which is the behavior in rpart). In other words, the output table would keep the original values (for DT and RF) but the helper function would rescale during the report for ease in reading the values. ---
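The report-time rescaling suggested above could look roughly like the following. This is only a minimal Python sketch under stated assumptions, not the actual `get_var_importance` implementation: the function name `normalize_importance`, the `(group, feature, raw_value)` row layout, and the sample values are all hypothetical. It leaves the raw values untouched and rescales each group so its values sum to 100, treating an all-zero group as a corner case that stays at zero.

```python
from collections import defaultdict


def normalize_importance(rows, scale=100.0):
    """Rescale raw importance values so each group sums to `scale`.

    `rows` is a list of (group, feature, raw_value) tuples. This layout is
    an assumption for illustration, not the MADlib output table schema.
    """
    # First pass: total raw importance per group.
    totals = defaultdict(float)
    for group, _feature, value in rows:
        totals[group] += value

    # Second pass: rescale; a group whose total is zero is left at zero
    # to avoid dividing by zero (the corner case where sums are not 100).
    scaled = []
    for group, feature, value in rows:
        total = totals[group]
        new_value = value * scale / total if total > 0 else 0.0
        scaled.append((group, feature, new_value))
    return scaled


# Hypothetical raw values; group 1 illustrates the all-zero corner case.
raw = [
    (0, "cyl", 2.0),
    (0, "disp", 6.0),
    (0, "wt", 2.0),
    (1, "cyl", 0.0),
    (1, "disp", 0.0),
]
scaled = normalize_importance(raw)
for row in scaled:
    print(row)
```

Because the rescaling happens only in the reporting step, the model output table can keep the original values for both DT and RF.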
[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/295

Another run I got
```
        grp 0          grp 1
        31.01364943    31.6576
        22.85881741    33.3245
        13.70257438    0
        6.344527751    3.33304
        26.0804244     11.6654
total   99.9336        79.9806
```
so this does seem to be about trees contributing or not.
---
[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/295

Should impurity_var_importance always add up to 100? From the regression example in the user docs:
```
DROP TABLE IF EXISTS mt_imp_output;
SELECT madlib.get_var_importance('mt_cars_output','mt_imp_output');
SELECT am, impurity_var_importance FROM mt_imp_output ORDER BY am, impurity_var_importance DESC;
```
results in
```
 am | impurity_var_importance
----+-------------------------
  0 |        35.7664683110879
  0 |        24.7481977075922
  0 |        12.4401197123678
  0 |        12.1559096708347
  0 |        4.88929809351791
  1 |        31.7259035495099
  1 |        29.6146492693988
  1 |        14.9602257795489
  1 |        7.01369118455985
  1 |        6.68552870777581
(10 rows)
```
which does not add up to 100
```
        grp 0          grp 1
        35.76646831    31.72590355
        24.74819771    29.61464927
        12.44011971    14.96022578
        12.15590967    7.013691185
        4.889298094    6.685528708
total   89.935         89.9849
```
---
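The per-group totals in the comment above can be re-checked with a few lines of Python. This is a small sketch that only sums the `impurity_var_importance` values quoted in that comment; the helper name `group_totals` and the `(am, value)` pair layout are illustrative assumptions, not part of MADlib.

```python
from collections import defaultdict

# (am, impurity_var_importance) pairs copied from the query output quoted above.
reported = [
    (0, 35.7664683110879),
    (0, 24.7481977075922),
    (0, 12.4401197123678),
    (0, 12.1559096708347),
    (0, 4.88929809351791),
    (1, 31.7259035495099),
    (1, 29.6146492693988),
    (1, 14.9602257795489),
    (1, 7.01369118455985),
    (1, 6.68552870777581),
]


def group_totals(rows):
    """Sum the importance values per group (hypothetical helper)."""
    totals = defaultdict(float)
    for group, value in rows:
        totals[group] += value
    return dict(totals)


totals = group_totals(reported)
for group in sorted(totals):
    # A total noticeably below 100 means the values were not normalized
    # to sum to 100 within that group.
    print(group, totals[group])
```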
[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/295 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/580/ ---
[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/295 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/579/ ---
[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/295 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/576/ ---
[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...
Github user asfgit commented on the issue: https://github.com/apache/madlib/pull/295 Refer to this link for build results (access rights to CI server needed): https://builds.apache.org/job/madlib-pr-build/573/ ---