Hi Everyone,

This is my first post to the list. I have been using scikit-learn actively
for the last six months in my M.Sc. thesis, and now I want to use it for
some tasks at my new job. I hope I can eventually become a contributor to
the project.

But let's start with a question :) I wasn't sure whether I should use
StackOverflow for this instead; please let me know if so.

I am using scikit-learn for multilabel classification. I have tried both
0.14 and master, but with master I get an error, even when using
MultiLabelBinarizer.

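Here is the code that works in 0.14:

import numpy as np
from sklearn.cross_validation import StratifiedKFold
from sklearn.grid_search import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer
from sklearn.svm import LinearSVC

# Instantiate the label binarizer to learn the set of possible labels
lb = LabelBinarizer().fit()

# Then transform the existing values (lists of labels per sample)
y_train = lb.transform(y_val)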


svm = LinearSVC()

ovr_svm = OneVsRestClassifier(svm)

C_range = 2.0 ** np.arange(-2, 7)

param_grid = dict(estimator__C=C_range)

grid = GridSearchCV(estimator=ovr_svm,
                    param_grid=param_grid,
                    n_jobs=1,
                    scoring='f1',
                    cv=StratifiedKFold(y=y_train, n_folds=3),
                    verbose=2)

grid.fit(X_train, y_train)

This works fine. However, when I switch to 0.15 (master) and use
MultiLabelBinarizer, I get the following error:



/Users/miguel/anaconda/envs/hclassifier/lib/python2.7/site-packages/sklearn/cross_validation.pyc in __init__(self, y, n_folds, indices, shuffle, random_state)
    427         for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
    428             for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 429                 label_test_folds = test_folds[y == label]
    430                 # the test split can be too big because we used
    431                 # KFold(max(c, self.n_folds), self.n_folds) instead of

ValueError: boolean index array should have 1 dimension


I have not been following the development of 0.15, but based on the latest
e-mails there were some changes to the multilabel representation. Maybe
this is related? What should I change to make my code work with 0.15?
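
For reference, here is a minimal pure-Python sketch of what I understand the
binarized multilabel target to look like. The helper to_indicator_matrix and
the sample labels are made up for illustration and are not part of
scikit-learn. The point is only that each sample becomes a row of 0/1 flags,
so y is two-dimensional, and comparing it elementwise with a single label
gives a two-dimensional mask rather than one boolean per sample. I suspect
that is what the line `test_folds[y == label]` in the traceback is tripping
over, but that is a guess on my part.

```python
# Hypothetical helper (not scikit-learn API): turn label sets into 0/1 rows.
def to_indicator_matrix(label_sets):
    # Collect every distinct label and fix a column order.
    classes = sorted(set(label for labels in label_sets for label in labels))
    # One row per sample, one column per class.
    rows = [[1 if c in labels else 0 for c in classes] for labels in label_sets]
    return classes, rows


# Made-up sample data: three samples, each with one or more labels.
y_val = [("news", "sports"), ("sports",), ("news", "politics")]
classes, y_train = to_indicator_matrix(y_val)

# An elementwise comparison against one value, in the spirit of `y == label`,
# yields a row of booleans per sample instead of a single boolean per sample.
mask = [[value == 1 for value in row] for row in y_train]

print(classes)
print(y_train)
print(len(mask), len(mask[0]))
```

With a one-dimensional y (one class per sample) the same comparison would
give a flat list of booleans, which is what a per-sample index needs.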


Thanks in advance,


Cheers
-- 
Miguel Cabrera
http://mfcabrera.com
"What happens to strong men is what happens to kites; they rise when the
wind that opposes their ascent is strongest." - José Ingenieros
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
