Dear all,
I am trying to run a linear_model.SGDClassifier() and have it update after
every example it classifies.
My code works for a small feature file (10 features), but when I give it a
bigger feature file (some 80000 features, but very sparse) it keeps giving
me errors straight away, the first time partial_fit() is called.
This is what I do in pseudocode:
import numpy as np
from scipy.sparse import csr_matrix
from sklearn import linear_model
from sklearn.datasets import load_svmlight_file

X, y = load_svmlight_file(train_file)
classifier = linear_model.SGDClassifier()
classifier.fit(X, y)
for test_line in test_file:
    # getFeatures() gives me a Python list for test_X
    # and an integer (ground-truth) label for test_y
    test_X, test_y = getFeatures(test_line)
    print "prediction: %f" % classifier.predict([test_X])
    classifier.partial_fit(csr_matrix([test_X]),
                           csr_matrix([test_y]),
                           classes=np.unique(y))
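For anyone who wants to try the same predict-then-update loop without my feature files, here is a minimal self-contained sketch. It is not my actual code: the data is synthetic, the sizes (30 training rows, 50 features, 3 classes) are made up, it assumes a current Python 3 with recent numpy, scipy and scikit-learn, and it passes the label as a plain 1-D numpy array rather than a sparse matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
n_features = 50  # hypothetical; the real file has ~80000 sparse features

# Synthetic stand-in for the data loaded with load_svmlight_file().
X_train = csr_matrix(rng.rand(30, n_features))
y_train = np.arange(30) % 3          # labels 0, 1, 2, all present
classes = np.unique(y_train)

classifier = SGDClassifier(random_state=0)
classifier.fit(X_train, y_train)

predictions = []
for _ in range(5):
    # Stand-in for getFeatures(test_line): one feature vector, one label.
    test_x = rng.rand(n_features)
    test_y = int(rng.randint(0, 3))

    x_row = csr_matrix(test_x.reshape(1, -1))   # shape (1, n_features)
    predictions.append(int(classifier.predict(x_row)[0]))

    # Update the model with the single example that was just classified.
    classifier.partial_fit(x_row, np.array([test_y]), classes=classes)

print("predictions:", predictions)
```

The point of the sketch is only the shape of the loop: one row in, one prediction out, then one partial_fit() call with a 1-element label array.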
The error I keep getting for the partial_fit() line is:
  File "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py", line 487, in partial_fit
    coef_init=None, intercept_init=None)
  File "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py", line 371, in _partial_fit
    sample_weight=sample_weight, n_iter=n_iter)
  File "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py", line 451, in _fit_multiclass
    for i in range(len(self.classes_)))
  File "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 517, in __call__
    self.dispatch(function, args, kwargs)
  File "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 312, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 136, in __init__
    self.results = func(*args, **kwargs)
  File "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py", line 284, in fit_binary
    est.power_t, est.t_, intercept_decay)
  File "sgd_fast.pyx", line 327, in sklearn.linear_model.sgd_fast.plain_sgd (sklearn/linear_model/sgd_fast.c:7568)
ValueError: ndarray is not C-contiguous
I also tried feeding partial_fit() plain Python lists, or numpy arrays (which
are C-contiguous (order='C') by default, I thought), but this gives the same
result.
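A quick way to check the memory order of the inputs is to look at the array flags. The sketch below uses a made-up four-element feature list and a small hypothetical helper, layout(), just to show which arrays numpy reports as C- or F-contiguous; it says nothing about what scikit-learn does internally.

```python
import numpy as np


def layout(arr):
    """Describe the memory order of a NumPy array (hypothetical helper)."""
    c = arr.flags['C_CONTIGUOUS']
    f = arr.flags['F_CONTIGUOUS']
    if c and f:
        return "both"
    if c:
        return "C"
    if f:
        return "F"
    return "neither"


test_X = [0.0, 1.5, 0.0, 2.0]          # hypothetical feature list

batch = np.array([test_X, test_X])     # 2 x 4 array built from Python lists
single = np.array([test_X])            # 1 x 4 array, as passed to partial_fit()

print("batch:     ", layout(batch))    # arrays built from lists are row-major
print("transposed:", layout(batch.T))  # a transposed view is column-major
print("single row:", layout(single))
```

So an array built from a list with np.array() is reported as C-contiguous, which suggests the input arrays themselves are not what the error is complaining about.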
The classes argument is not the problem, I think. The same error appears if
I leave it out or if I hard-code the right classes.
I do notice that when I print the flags of the coef_ array of the
classifier, it says:
Flags of coef_ array:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
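These flags are what numpy prints for a column-major (Fortran-ordered) 2-D array. The sketch below reproduces that layout with a plain numpy array standing in for coef_ (the 3 x 6 shape is made up) and shows that np.ascontiguousarray() returns a C-ordered copy with the same values. Whether assigning such a copy back to classifier.coef_ avoids the error in this scikit-learn version is an assumption I have not verified.

```python
import numpy as np

# Hypothetical stand-in for classifier.coef_: 3 classes x 6 features,
# stored in Fortran (column-major) order.
coef = np.asfortranarray(np.zeros((3, 6)))
print(coef.flags)   # C_CONTIGUOUS is False and F_CONTIGUOUS is True here

# np.ascontiguousarray() copies the data into C (row-major) order.
coef_c = np.ascontiguousarray(coef)
print("C-contiguous after conversion:", coef_c.flags['C_CONTIGUOUS'])
print("values unchanged:", np.array_equal(coef, coef_c))
```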
I am sure I am doing something wrong, but really, I don't see what...
Any help appreciated!
Cheers,
Tom
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general