I had a similar situation, and the solution I came up with was to calculate, for each sample, the standard deviation of the predictions made by all the individual trees in the ensemble.
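A minimal sketch of that idea with scikit-learn, assuming a fitted `ExtraTreesRegressor`: each fitted tree lives in the `estimators_` attribute, so you can stack their predictions and take the per-sample standard deviation. The helper name `tree_prediction_std` and the synthetic data are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor


def tree_prediction_std(forest, X):
    """Hypothetical helper: per-sample mean and spread of the individual trees' predictions."""
    # Shape (n_trees, n_samples): one row of predictions per fitted tree.
    per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
    # The forest's own prediction is the mean over trees; the std measures how much they disagree.
    return per_tree.mean(axis=0), per_tree.std(axis=0)


# Synthetic data purely for illustration.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
forest = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, y)
mean_pred, spread = tree_prediction_std(forest, X[:10])
```

A large spread means the trees disagree about a sample, which can serve as a rough per-prediction uncertainty signal rather than a calibrated error estimate.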
I found that when I trained my regressor on the lower half of my data and then used the model to predict the upper half, my model generally
return
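A sketch of that lower-half/upper-half experiment, assuming "lower half" means the rows with the smaller target values (the post does not say how the data was split). Since `ExtraTreesRegressor` does not bootstrap by default, each leaf predicts a mean of training targets, so no prediction can exceed the largest training target.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

X, y = make_regression(n_samples=400, n_features=5, noise=5.0, random_state=0)

# Assumption: split on the target value, lower half for training, upper half for testing.
order = np.argsort(y)
low, high = order[:200], order[200:]
X_train, y_train = X[low], y[low]
X_test, y_test = X[high], y[high]

model = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)

# Leaf values are averages of training targets, so predictions are capped at y_train.max().
```

With a split on the target like this, every test target is at least `y_train.max()`, so every prediction on the upper half is at or below its true value: tree ensembles cannot extrapolate beyond the range of targets they were trained on.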
Update: I made a mistake in my training set (I included a variable I shouldn't have) and am now getting more reasonable results (score = 0.634).
My question about predicting the error still stands, however. I should be able to train a classifier on the error (now that I have enough predictions that are wrong), but
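One way such a classifier could be set up, as a sketch rather than the poster's actual approach: compute out-of-fold residuals with `cross_val_predict`, label a prediction "wrong" when its absolute error is in the top quartile (a hypothetical threshold; pick one that suits your problem), and fit an `ExtraTreesClassifier` on those labels.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesClassifier, ExtraTreesRegressor
from sklearn.model_selection import cross_val_predict

X, y = make_regression(n_samples=300, n_features=5, noise=20.0, random_state=0)

# Out-of-fold predictions: each residual comes from a model that did not see that row.
reg = ExtraTreesRegressor(n_estimators=50, random_state=0)
oof_pred = cross_val_predict(reg, X, y, cv=5)
abs_err = np.abs(y - oof_pred)

# Hypothetical definition of "wrong": absolute error above the 75th percentile.
threshold = np.percentile(abs_err, 75)
is_wrong = (abs_err > threshold).astype(int)

# Second model: estimate from the features how likely the regressor is to be wrong.
clf = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, is_wrong)
p_wrong = clf.predict_proba(X[:5])[:, 1]
```

Using out-of-fold residuals matters here: residuals on the training rows themselves would be close to zero and give the classifier almost nothing to learn from.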
I got ExtraTreesRegressor running on IPython.parallel (Pyrallel doesn't work for me, but the example at
http://nbviewer.ipython.org/github/ogrisel/notebooks/blob/master/Distributed%20Learning%20of%20Extra%20Trees%20with%20IPython.parallel.ipynb did).
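The distributed approach boils down to training several smaller forests independently and pooling their trees into one model. The sketch below simulates that locally without IPython.parallel; the `combine_forests` helper is illustrative and only similar in spirit to the linked notebook, not copied from it. Each fit stands in for the work one engine would do.

```python
import copy

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor


def combine_forests(forests):
    """Hypothetical helper: merge independently trained forests by pooling their trees."""
    combined = copy.deepcopy(forests[0])  # copy so the first forest is not mutated
    for f in forests[1:]:
        combined.estimators_ += f.estimators_
    # Keep n_estimators consistent with the pooled tree list.
    combined.n_estimators = len(combined.estimators_)
    return combined


X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
# Distinct seeds give each "engine" a different set of randomized trees.
parts = [ExtraTreesRegressor(n_estimators=10, random_state=seed).fit(X, y) for seed in range(4)]
forest = combine_forests(parts)
```

Because a regression forest's prediction is just the average over its trees, the pooled model behaves like one 40-tree forest, and the per-tree standard deviation trick works on it unchanged.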
Now I’d like to be able to predict my error (i