[scikit-learn] Decision stubs?

Stuart Reynolds Sun, 27 Aug 2017 15:26:56 -0700

Is it possible to efficiently get at the branch statistics that
decision tree algorithms iterate over in scikit-learn?


For example, if the root population has these class counts in the output vector:
   c0: 5000
   c1: 500

Then I'd like to iterate over:
# For a boolean (two-valued categorical) feature
   f1=True:     c0=3000,  c1=450
   f1=False:    c0=300,   c1=30
   f1=Null:     c0=1700,  c1=20   # Are nulls considered as a branch?

# For a continuous feature
   f2<10:       c0=...,  c1=...
   f2>=10:      c0=...,  c1=...

   f2<22:       c0=...,  c1=...
   f2>=22:      c0=...,  c1=...
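
A minimal pandas sketch of the per-branch class counts listed above. The toy
DataFrame, the column names f1, f2 and y, and both helper functions are
hypothetical and only illustrate the shape of the statistics; this is not
how scikit-learn computes them internally.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: f1 is a nullable boolean, f2 is continuous,
# y is the class label (0 or 1).
df = pd.DataFrame({
    "f1": [True, True, False, None, True, None, False, True],
    "f2": [3.0, 12.0, 25.0, 8.0, 15.0, 30.0, 9.0, 22.0],
    "y":  [0, 1, 0, 0, 1, 0, 0, 1],
})


def categorical_branch_counts(frame, feature, label="y"):
    """Class counts per value of a boolean feature, nulls as their own branch."""
    # Map to strings so that True, False and missing values all get a row.
    branch = frame[feature].map({True: "True", False: "False"}).fillna("Null")
    return pd.crosstab(branch, frame[label])


def threshold_branch_counts(frame, feature, threshold, label="y"):
    """Class counts for feature < threshold and feature >= threshold."""
    # Note: a NaN in the feature compares False and lands in the ">=" branch.
    below = frame[feature] < threshold
    names = np.where(below, f"{feature}<{threshold}", f"{feature}>={threshold}")
    branch = pd.Series(names, index=frame.index, name=feature)
    return pd.crosstab(branch, frame[label])


f1_counts = categorical_branch_counts(df, "f1")
f2_counts = threshold_branch_counts(df, "f2", 10)
print(f1_counts)
print(f2_counts)
```

Each returned table has one row per branch and one column per class, which
matches the c0/c1 layout in the example.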


I'd like to experiment with building models on demand for each input
row at predict time.
To do this efficiently, I'd like to reduce the training set to the 'most
significant' sub-space(s) using the population statistics.
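
One possible way to score a split from its branch counts is sketched below.
It assumes that 'most significant' can be approximated by information gain
(entropy reduction), which is an assumption of this sketch and not something
the question specifies; the function names are hypothetical. The counts are
the f1 figures from the example above.

```python
import numpy as np


def entropy(counts):
    """Shannon entropy (in bits) of a vector of class counts."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total == 0:
        return 0.0
    p = counts[counts > 0] / total
    return float(-(p * np.log2(p)).sum())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child branches."""
    n = float(np.sum(parent))
    weighted = sum(np.sum(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted


# Root population and the three f1 branches (True, False, Null).
parent = [5000, 500]
f1_branches = [[3000, 450], [300, 30], [1700, 20]]

gain_f1 = information_gain(parent, f1_branches)
print(f"information gain for f1: {gain_f1:.4f} bits")
```

Computing this score for every candidate split would give a ranking from
which the highest-scoring sub-spaces could be selected.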

I can do it in pandas, although it's fairly inefficient to iterate over
each feature column many times.
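
A sketch of one way to avoid rescanning a continuous column once per
threshold: sort the column once and take cumulative class counts, so the
counts for every candidate threshold come out of a single pass. The function
name is hypothetical, and the sketch assumes integer labels in
0..n_classes-1 and no missing values in the feature.

```python
import numpy as np


def all_threshold_counts(x, y, n_classes):
    """Class counts for x < t and x >= t, for every distinct value t of x.

    Returns (thresholds, left, right), where left[k] and right[k] are the
    class-count vectors for x < thresholds[k] and x >= thresholds[k].
    """
    x = np.asarray(x)
    y = np.asarray(y)
    order = np.argsort(x, kind="stable")
    x_sorted = x[order]
    y_sorted = y[order]

    # One-hot encode the labels, then accumulate down the sorted rows.
    one_hot = np.eye(n_classes, dtype=np.int64)[y_sorted]
    cum = np.cumsum(one_hot, axis=0)
    total = cum[-1]

    # first_idx[k] is the number of rows strictly below thresholds[k].
    thresholds, first_idx = np.unique(x_sorted, return_index=True)
    padded = np.vstack([np.zeros((1, n_classes), dtype=np.int64), cum])
    left = padded[first_idx]
    right = total - left
    return thresholds, left, right


# Hypothetical toy data.
x = np.array([3.0, 12.0, 25.0, 8.0, 15.0, 30.0, 9.0, 22.0])
y = np.array([0, 1, 0, 0, 1, 0, 0, 1])

thresholds, left, right = all_threshold_counts(x, y, n_classes=2)
for t, lo, hi in zip(thresholds, left, right):
    print(f"f2<{t}: {lo.tolist()}   f2>={t}: {hi.tolist()}")
```

The cost per feature is one sort plus one cumulative sum, instead of one
full scan per threshold.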

Thanks,
- Stu