While training with scikit-learn's partial_fit I get the following warning,
but the program does not terminate. How is that possible, and what are the
repercussions, given that the trained model still behaves correctly and
gives correct output? Is this something to worry about?

    /usr/lib/python2.7/dist-packages/sklearn/naive_bayes.py:207: RuntimeWarning: divide by zero encountered in log
      self.class_log_prior_ = (np.log(self.class_count_)
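
For reference, here is a minimal sketch of why a message like this does not stop the program. It assumes NumPy's default error settings and uses made-up class counts (the array below is hypothetical, not taken from my data). The log of a zero count yields -inf together with a RuntimeWarning rather than an exception, which may be what happens when a class listed in classes has not appeared in any batch yet.

```python
import warnings

import numpy as np

# Hypothetical per-class counts: the second class has not been seen yet.
class_count = np.array([3.0, 0.0, 5.0])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")   # make sure the warning is recorded
    log_count = np.log(class_count)   # warns on the zero entry, does not raise

print(log_count)                           # the zero count maps to -inf
print([str(w.message) for w in caught])    # the recorded warning text
```

Execution continues after the np.log call, and the zero-count entry simply ends up as -inf.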

I am using the following modified training function because I have to
maintain a constant list of labels/classes: partial_fit does not allow
adding new classes/labels on subsequent runs. The class prior is the same
in each batch of training data:



    class MySklearnClassifier(SklearnClassifier):
        def train(self, labeled_featuresets, classes=None, partial=True):
            """
            Train (fit) the scikit-learn estimator.

            :param labeled_featuresets: A list of ``(featureset, label)``
                where each ``featureset`` is a dict mapping strings to
                either numbers, booleans or strings.
            """

            X, y = list(compat.izip(*labeled_featuresets))
            X = self._vectorizer.fit_transform(X)
            y = self._encoder.fit_transform(y)

            if partial:
                classes = self._encoder.fit_transform(list(set(classes)))
                self._clf.partial_fit(X, y, classes=list(set(classes)))
            else:
                self._clf.fit(X, y)

            return self



Also, the second call to partial_fit throws the following error. There are
2000 classes and 3592 training samples, and the call is
model = self.train(featureset, classes=labels, partial=partial):

    self._clf.partial_fit(X, y, classes=list(set(classes)))
      File "/usr/lib/python2.7/dist-packages/sklearn/naive_bayes.py", line 277, in partial_fit
        self._count(X, Y)
      File "/usr/lib/python2.7/dist-packages/sklearn/naive_bayes.py", line 443, in _count
        self.feature_count_ += safe_sparse_dot(Y.T, X)
    ValueError: operands could not be broadcast together with shapes (2000,11430) (2000,10728) (2000,11430)
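
The shapes in the message suggest that the running feature_count_ has 11430 columns while the second batch produced only 10728. One possible cause, and this is an assumption on my part, is that train() calls self._vectorizer.fit_transform on every batch, so each batch gets its own vocabulary and therefore its own feature width. The sketch below reproduces only the NumPy side of the failure with small, made-up shapes; the array names are hypothetical.

```python
import numpy as np

n_classes = 4

# Running per-class feature totals after the first batch (6 features seen).
feature_count = np.zeros((n_classes, 6))

# Counts from a second batch whose vectorizer produced only 5 features.
batch_counts = np.ones((n_classes, 5))

try:
    feature_count += batch_counts   # in-place add needs matching widths
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```

If that assumption holds, keeping the feature width constant across batches should avoid the mismatch, for example by fitting the vectorizer once and calling only transform afterwards, or by using a fixed-width vectorizer such as sklearn.feature_extraction.FeatureHasher.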