Is it possible to efficiently get at the branch statistics that decision tree algorithms iterate over in scikit-learn?
For example, if the root population has these class counts in the output vector:

    c0: 5000, c1: 500

then I'd like to iterate over something like:

    # For a boolean (two-valued) category
    f1=True:  c0=3000, c1=450
    f1=False: c0=300,  c1=30
    f1=Null:  c0=1700, c1=20   # is Null considered?

    # For a continuous value
    f2<10:  c0=..., c1=...
    f2>=10: c0=..., c1=...
    f2<22:  c0=..., c1=...
    f2>=22: c0=..., c1=...

I'd like to experiment with building models on demand for each input row at predict time. To work efficiently, I'd like to reduce the training set to the "most significant" sub-space(s) using these population statistics. I can do it in pandas, although it's fairly inefficient to iterate over each feature column many times.

Thanks,
- Stu

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
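As a possible starting point, the per-node class counts for the splits a fitted tree actually chose are exposed through the `tree_` attribute (`children_left`, `children_right`, `feature`, `threshold`, `value`, `n_node_samples`). This is only a minimal sketch of walking those arrays; the dataset is just for illustration, and note that depending on the scikit-learn version `tree_.value` may hold raw (weighted) counts or per-node class fractions, in which case you'd multiply by `n_node_samples` to recover counts.

```python
# Sketch: reading per-branch class statistics from a fitted
# scikit-learn decision tree via its low-level tree_ arrays.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # example data only
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

tree = clf.tree_
# tree.value has shape (n_nodes, n_outputs, n_classes): one row of
# per-class statistics for the samples reaching each node.
for node in range(tree.node_count):
    left = tree.children_left[node]
    right = tree.children_right[node]
    if left == -1:
        continue  # leaf node: no split, no branch statistics
    print(f"node {node}: X[:, {tree.feature[node]}] "
          f"<= {tree.threshold[node]:.3f}")
    print(f"  True  branch (node {left}):  value = {tree.value[left][0]}")
    print(f"  False branch (node {right}): value = {tree.value[right][0]}")
```

This only gives the thresholds the tree already selected, not every candidate split it evaluated during training; for the full candidate sweep you would have to re-derive the statistics yourself (e.g. with vectorized pandas `groupby`/`crosstab` calls rather than a per-column Python loop).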