Re: [scikit-learn] Original source for DecisionTreeClassifier Implementation

2016-07-11 Thread Maniteja Nandana
Hi, I am a novice here and am not aware of the exact source for the implementation. One of the core devs can probably answer that. To my knowledge, though, it implements an optimised version of CART. Information on the algorithms and their complexity can be found in http://scikit-learn.org/stable/modul

[scikit-learn] Original source for DecisionTreeClassifier Implementation

2016-07-11 Thread Praveen Gollakota
Hello, I was curious whether anyone has an original source or paper from which the decision trees in scikit-learn were implemented. I see general references to Elements of Statistical Learning and other works, but no specific mention of which version of the algorithm is actually implemented. I couldn

Re: [scikit-learn] Bm25 pull request

2016-07-11 Thread Joel Nothman
>>>> ValueError: Found array with 0 sample(s) (shape=(0, 491)) while a
> >>>> minimum of 1 is required.
> >>>>
> >>>> Now I don't understand this because when I print shapes of the
> samples:
> >>>>

[scikit-learn] Bm25 pull request

2016-07-11 Thread Basil Beirouti
>>>> print (X_traina.shape, X_testa.shape, y_traina.shape, y_testa.shape)
>>>>
>>>> I'm getting:
>>>>
>>>> ((78, 491), (1489, 491), (78,), (1489,))
>>>>
>>>> Interestingly, if I change the test_si

Re: [scikit-learn] Scikit learn GridSearchCV fit method ValueError Found array with 0 sample

2016-07-11 Thread Maciek Wójcikowski
Shouldn't you pass labels (binary) instead of continuous data? If you wish to stick to logK's and keep the distribution unchanged, then you'd better reduce the number of classes (e.g. round the values to the nearest integer?). It might be the case that the counts per class are floored and you get 0 for s
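
The rounding idea above can be sketched with the standard library alone. The six values are the first few labels from the StratifiedKFold printout quoted elsewhere in this thread (the full array is cut off there). The names `log_k`, `raw_counts` and `binned_counts` are hypothetical and chosen only for illustration. This is a rough sketch of the suggestion, not the poster's actual code:

```python
from collections import Counter

# First few label values from the thread's printout; the rest is truncated there.
log_k = [5.43, 8.74, 8.1, 6.55, 7.66, 6.52]

# Treated as class labels, every distinct float is its own "class",
# so each class holds a single sample and cannot be spread over several folds.
raw_counts = Counter(log_k)

# Rounding to the nearest integer merges neighbouring values into shared classes.
binned = [round(value) for value in log_k]
binned_counts = Counter(binned)

print("classes before rounding:", len(raw_counts))
print("classes after rounding: ", len(binned_counts))
print("samples per rounded class:", dict(binned_counts))
```

The rounded labels would be used only to drive the stratified split. The original continuous values can still serve as the target the model is fitted on.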

Re: [scikit-learn] Scikit learn GridSearchCV fit method ValueError Found array with 0 sample

2016-07-11 Thread Michał Nowotka
Hi Maciek, Thanks for the suggestion. I think the problem is indeed related to StratifiedKFold, because if I use KFold instead the code works fine. However, if I print the StratifiedKFold object it looks fine to me: sklearn.cross_validation.StratifiedKFold(labels=[ 5.43 8.74 8.1 6.55 7.66 6.52 8.