Hi,
I am a novice here and am not aware of the exact source for the
implementation; probably one of the core devs can answer this. But to my
knowledge, it implements an optimised version of the CART algorithm. The
information regarding the algorithms and their complexity can be found at
http://scikit-learn.org/stable/modul
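For reference, here is a minimal sketch of fitting that tree through the
public API; the dataset and hyperparameters below are illustrative, not
taken from the documentation or this thread:

# Minimal sketch: fitting scikit-learn's CART-style decision tree.
# The dataset and hyperparameters are illustrative only.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

# criterion="gini" is the default impurity measure; "entropy" is also supported.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))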
Hello,
I was curious if anyone has an original source or paper from which the
decision trees in scikit-learn were implemented. I see general references
to The Elements of Statistical Learning and other texts, but no specific
mention of which version of the algorithm is actually implemented.
I couldn't find a specific citation anywhere.
>>>> ValueError: Found array with 0 sample(s) (shape=(0, 491)) while a
>>>> minimum of 1 is required.
>>>>
>>>> Now I don't understand this, because when I print the shapes of the
>>>> samples:
>>>>
>>>> print (X_traina.shape, X_testa.shape, y_traina.shape, y_testa.shape)
>>>>
>>>> I'm getting:
>>>>
>>>> ((78, 491), (1489, 491), (78,), (1489,))
>>>>
>>>> Interestingly, if I change the test_si
Shouldn't you pass labels (binary) instead of continuous data? If you wish
to stick to the logK's and keep the distribution unchanged, then you'd
better reduce the number of classes (e.g. round the values to the nearest
integer?). It might be the case that the counts per class are floored and
you get 0 for some of them.
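To illustrate, a hedged sketch of the rounding idea; the variable names
and values are made up, and it uses the sklearn.cross_validation layout
from this thread (sklearn.model_selection in newer releases):

import numpy as np
from sklearn.cross_validation import StratifiedKFold

# Stand-in continuous logK targets; the real values come from your data.
y_logk = np.array([5.43, 5.21, 7.66, 7.8, 8.1, 8.3])

# Round to the nearest integer so several samples share each class.
y_binned = np.round(y_logk).astype(int)
print(np.bincount(y_binned))  # every class needs at least n_folds members

# Stratification now has real classes to balance across the folds.
for train_idx, test_idx in StratifiedKFold(y_binned, n_folds=2):
    print(train_idx, test_idx)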
Hi Maciek,
Thanks for the suggestion. I think the problem is indeed related to
StratifiedKFold, because if I use KFold instead the code works fine.
However, if I print the StratifiedKFold object, it looks fine to me:
sklearn.cross_validation.StratifiedKFold(labels=[ 5.43 8.74 8.1
6.55 7.66 6.52 8.
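For completeness, a sketch of why the KFold workaround behaves
differently, using the first six label values from the printout above:
KFold splits purely on sample indices and never inspects the label
values, so continuous targets are fine.

import numpy as np
from sklearn.cross_validation import KFold  # sklearn.model_selection.KFold in newer releases

y = np.array([5.43, 8.74, 8.1, 6.55, 7.66, 6.52])

# KFold takes the number of samples and splits on indices only, so the
# continuous labels never enter the splitting logic.
for train_idx, test_idx in KFold(n=len(y), n_folds=3):
    print("train:", train_idx, "test:", test_idx)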