Dear all,

I am trying to run a linear_model.SGDClassifier() and have it update after
every example it classifies.
My code works for a small feature file (10 features), but when I give it a
bigger feature file (some 80,000 features, but very sparse) it raises an
error straight away, the first time partial_fit() is called.

This is a simplified version of what I do:

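import numpy as np
from scipy.sparse import csr_matrix
from sklearn import linear_model
from sklearn.datasets import load_svmlight_file

X, y = load_svmlight_file(train_file)  # X is a scipy.sparse CSR matrix
classes = np.unique(y)
classifier = linear_model.SGDClassifier()
classifier.fit(X, y)

for test_line in open(test_file):
    # getFeatures() is my own helper: it returns a Python list of
    # X.shape[1] feature values and an integer label
    test_X, test_y = getFeatures(test_line)
    row = csr_matrix([test_X])  # a 1 x n_features sparse row

    print("prediction: %s" % classifier.predict(row)[0])

    # update on this single example; y is a 1-D label array, not a sparse matrix
    classifier.partial_fit(row, np.array([test_y]), classes=classes)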
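Since X from load_svmlight_file() is a scipy.sparse CSR matrix, one way to
avoid building a dense 80,000-element Python list for every test line would
be to build each test example directly as a 1 x n_features sparse row with
the same number of columns as the training matrix. A minimal sketch,
assuming getFeatures() could be changed to return (column_index, value)
pairs with 0-based indices; the helper name to_csr_row is hypothetical:

```python
import numpy as np
from scipy.sparse import csr_matrix


def to_csr_row(pairs, n_features):
    # Hypothetical helper: `pairs` is a list of (column_index, value) tuples
    # with 0-based column indices that match the columns of the training X.
    cols = np.array([c for c, _ in pairs], dtype=np.int64)
    vals = np.array([v for _, v in pairs], dtype=np.float64)
    rows = np.zeros(len(pairs), dtype=np.int64)  # every entry goes in row 0
    # Fixing the shape keeps the column count equal to the training matrix,
    # even when the highest feature index in this line is smaller.
    return csr_matrix((vals, (rows, cols)), shape=(1, n_features))


# Example: a line with three non-zero features out of 80000.
row = to_csr_row([(3, 1.0), (17, 0.5), (79999, 2.0)], n_features=80000)
label = np.array([1])  # partial_fit() takes a 1-D array of labels
print(row.shape, row.nnz, label.shape)
```

In the real loop, n_features would be X.shape[1] from the training matrix.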

The error I keep getting for the partial_fit() line is:

  File "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py", line 487, in partial_fit
    coef_init=None, intercept_init=None)
  File "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py", line 371, in _partial_fit
    sample_weight=sample_weight, n_iter=n_iter)
  File "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py", line 451, in _fit_multiclass
    for i in range(len(self.classes_)))
  File "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 517, in __call__
    self.dispatch(function, args, kwargs)
  File "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 312, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 136, in __init__
    self.results = func(*args, **kwargs)
  File "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py", line 284, in fit_binary
    est.power_t, est.t_, intercept_decay)
  File "sgd_fast.pyx", line 327, in sklearn.linear_model.sgd_fast.plain_sgd (sklearn/linear_model/sgd_fast.c:7568)
ValueError: ndarray is not C-contiguous

I also tried feeding partial_fit() plain Python lists and NumPy arrays
(which I thought are C-contiguous, i.e. order='C', by default), but this
gives the same result.
The classes argument is not the problem, I think. The same error appears
if I leave it out or if I hard-code the right classes.
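A quick way to check the assumption about input layout: NumPy creates
arrays in C order unless asked otherwise, whereas the transpose of a
C-ordered 2-D array (or an array requested with order='F') is F-ordered.
A minimal sketch with made-up values:

```python
import numpy as np

# A 2-D array built from a nested list is C-contiguous (order='C' by default).
a = np.array([[0.0, 1.0, 0.0], [2.0, 0.0, 3.0]])
print(a.flags['C_CONTIGUOUS'], a.flags['F_CONTIGUOUS'])  # True False

# Its transpose is a view on the same memory, so it is F-ordered instead.
b = a.T
print(b.flags['C_CONTIGUOUS'], b.flags['F_CONTIGUOUS'])  # False True

# Requesting order='F' explicitly gives an F-ordered copy with equal values.
c = np.array(a, order='F')
print(c.flags['C_CONTIGUOUS'], c.flags['F_CONTIGUOUS'])  # False True
```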

I do notice that when I print the flags of the classifier's coef_ array,
it says:

Flags of coef_ array:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
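Since the inputs are C-ordered but coef_ is not, one thing worth trying
(an untested guess, assuming plain_sgd is complaining about coef_ rather
than about X) is replacing coef_ with a C-ordered copy before calling
partial_fit(). The sketch below only shows the NumPy side, using a
stand-in array whose shape and values are made up:

```python
import numpy as np

n_classes, n_features = 3, 5
# Stand-in for classifier.coef_: an F-ordered (n_classes, n_features) array,
# matching the flags printed above.
coef = np.asfortranarray(
    np.arange(n_classes * n_features, dtype=np.float64).reshape(n_classes, n_features)
)
print(coef.flags['C_CONTIGUOUS'])  # False

# np.ascontiguousarray returns a C-ordered copy with identical values.
coef_c = np.ascontiguousarray(coef)
print(coef_c.flags['C_CONTIGUOUS'], np.array_equal(coef, coef_c))  # True True

# Hypothetical workaround, not verified against this sklearn version:
#   classifier.coef_ = np.ascontiguousarray(classifier.coef_)
#   classifier.partial_fit(csr_matrix([test_X]), [test_y], classes=np.unique(y))
```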

I am sure I am doing something wrong, but really, I don't see what...

Any help appreciated!

Cheers,

Tom
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
