Hi Tom,
that's a bug - I'll open a ticket for it.
A quick fix: call partial_fit instead of fit just before the ``for`` loop.
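A minimal sketch of that workaround, assuming a recent scikit-learn and Python 3. The synthetic data below only stands in for load_svmlight_file(train_file); the sizes and labels are made up for illustration:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.linear_model import SGDClassifier

# Synthetic stand-in for load_svmlight_file(train_file):
# a sparse feature matrix and integer labels (hypothetical sizes).
rng = np.random.RandomState(0)
X = csr_matrix(rng.rand(60, 20) * (rng.rand(60, 20) > 0.7))
y = np.arange(60) % 3
classes = np.unique(y)

classifier = SGDClassifier(random_state=0)
# Initial training with partial_fit instead of fit.
# The classes argument is required on the first partial_fit call.
classifier.partial_fit(X, y, classes=classes)

# Online phase: predict one example, then update with its true label.
predictions = []
for i in range(5):
    test_X = csr_matrix(rng.rand(1, 20))
    test_y = np.array([classes[i % len(classes)]])  # dense 1-D label array
    predictions.append(classifier.predict(test_X)[0])
    classifier.partial_fit(test_X, test_y)
```

Note that a single partial_fit call makes only one pass over the data, so you may want to call it several times on the training set to approximate what fit would do.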
- Peter
2013/10/4 Tom Kenter <[email protected]>
> Dear all,
>
> I am trying to run a linear_model.SGDClassifier() and have it update after
> every example it classifies.
> My code works for a small feature file (10 features), but when I give it a
> bigger feature file (some 80000 features, but very sparse) it keeps giving
> me errors straight away, the first time partial_fit() is called.
>
> This is what I do in pseudocode:
>
> X, y = load_svmlight_file(train_file)
> classifier = linear_model.SGDClassifier()
> classifier.fit(X, y)
>
> for test_line in open(test_file):
>     # getFeatures gives me a Python list for test_X
>     # and an integer label for test_y
>     test_X, test_y = getFeatures(test_line)
>
>     print "prediction: %f" % classifier.predict([test_X])[0]
>
>     classifier.partial_fit(csr_matrix([test_X]),
>                            csr_matrix([test_y]),
>                            classes=np.unique(y))
>
> The error I keep getting for the partial_fit() line is:
>
> File
> "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py",
> line 487, in partial_fit
> coef_init=None, intercept_init=None)
> File
> "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py",
> line 371, in _partial_fit
> sample_weight=sample_weight, n_iter=n_iter)
> File
> "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py",
> line 451, in _fit_multiclass
> for i in range(len(self.classes_)))
> File
> "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py",
> line 517, in __call__
> self.dispatch(function, args, kwargs)
> File
> "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py",
> line 312, in dispatch
> job = ImmediateApply(func, args, kwargs)
> File
> "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py",
> line 136, in __init__
> self.results = func(*args, **kwargs)
> File
> "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py",
> line 284, in fit_binary
> est.power_t, est.t_, intercept_decay)
> File "sgd_fast.pyx", line 327, in
> sklearn.linear_model.sgd_fast.plain_sgd
> (sklearn/linear_model/sgd_fast.c:7568)
> ValueError: ndarray is not C-contiguous
>
> I also tried feeding partial_fit() Python lists, or numpy arrays (which
> are C-contiguous (order='C') by default, I thought), but this gives the
> same result.
> The classes attribute is not the problem I think. The same error appears
> if I leave it out or if I give the right classes in hard code.
>
> I do notice that when I print the flags of the coef_ array of the
> classifier, it says:
>
> Flags of coef_ array:
> C_CONTIGUOUS : False
> F_CONTIGUOUS : True
> OWNDATA : True
> WRITEABLE : True
> ALIGNED : True
> UPDATEIFCOPY : False
>
> I am sure I am doing something wrong, but really, I don't see what...
>
> Any help appreciated!
>
> Cheers,
>
> Tom
>
>
> ------------------------------------------------------------------------------
> October Webinars: Code for Performance
> Free Intel webinars can help you accelerate application performance.
> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most
> from
> the latest Intel processors and coprocessors. See abstracts and register >
> http://pubads.g.doubleclick.net/gampad/clk?id=60134071&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
--
Peter Prettenhofer