Hi folks. I'm new to scikit-learn.
I have a very large Python project with what seems to be a heisenbug that manifests in scikit-learn code. Short of constructing an SSCCE, are there any magic techniques I should try for pinning down the precise cause? Something like valgrind?

An SSCCE will most likely be pretty painful: the project has copious shared, mutable state, and a largish test program I already wrote, which calls into the same code path, reproduced the error 0 times in 100 runs. It's quite possible the root cause will turn out to be in some other part of the software stack.

The pytest traceback looks like this:

    sequential/test_training.py:101:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    ../rt/classifier/coach.py:146: in train
        **self.classifier_section
    ../domain/classifier/factories/classifier_academy.py:115: in create_classifier
        **kwargs)
    ../domain/classifier/factories/imp/xgb_factory.py:164: in create
        clf_random.fit(X_train, y_train)
    ../../../../.local/lib/python3.6/site-packages/sklearn/model_selection/_search.py:722: in fit
        self._run_search(evaluate_candidates)
    ../../../../.local/lib/python3.6/site-packages/sklearn/model_selection/_search.py:1515: in _run_search
        random_state=self.random_state))
    ../../../../.local/lib/python3.6/site-packages/sklearn/model_selection/_search.py:711: in evaluate_candidates
        cv.split(X, y, groups)))
    ../../../../.local/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py:996: in __call__
        self.retrieve()
    ../../../../.local/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py:899: in retrieve
        self._output.extend(job.get(timeout=self.timeout))
    ../../../../.local/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py:517: in wrap_future_result
        return future.result(timeout=timeout)
    /usr/lib/python3.6/concurrent/futures/_base.py:425: in result
        return self.__get_result()
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    self = <Future at 0x7f15571ec7f0 state=finished raised ValueError>

        def __get_result(self):
            if self._exception:
    >           raise self._exception
    E           ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

    /usr/lib/python3.6/concurrent/futures/_base.py:384: ValueError

In full-blown automated testing, this exception is raised in roughly 12 to 14 runs out of 100.

Thanks for the cool software.
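One way to start narrowing this down, sketched under assumptions: the error says the input contains NaN, infinity, or a value too large for float32, so a first check is whether `X_train` already holds such values before `clf_random.fit(X_train, y_train)` is called. The sketch assumes `X_train` is a 2-D array-like of numbers (a list of rows or a NumPy array). `find_bad_cells` and `FLOAT32_MAX` are hypothetical names, not part of the scikit-learn API, and the code uses only the standard library.

```python
import math

# Largest finite float32 value, i.e. numpy.finfo(numpy.float32).max, written out
# here so this sketch needs only the standard library.
FLOAT32_MAX = 3.4028234663852886e38


def find_bad_cells(X):
    """Return (row, column) positions in a 2-D array-like X whose values are NaN,
    infinite, or larger in magnitude than the float32 maximum.

    The magnitude test is slightly conservative: a value just above FLOAT32_MAX
    that would round down to it on a float32 cast is still reported.
    """
    bad = []
    for i, row in enumerate(X):
        for j, value in enumerate(row):
            v = float(value)
            if not math.isfinite(v) or abs(v) > FLOAT32_MAX:
                bad.append((i, j))
    return bad


if __name__ == "__main__":
    sample = [
        [1.0, float("nan")],
        [1e39, -2.0],
        [float("-inf"), 3.4e38],
    ]
    print(find_bad_cells(sample))  # [(0, 1), (1, 0), (2, 0)]
```

In the project it could be called as `find_bad_cells(X_train)` just before the `fit` call in `xgb_factory.py` (pass `X_train.values` if `X_train` is a pandas DataFrame, since iterating a DataFrame yields column labels rather than rows). If it comes back empty on failing runs, the NaN or infinity may be produced inside the cross-validation workers, for example in predictions handed to a scorer, rather than being present in the training data; that is a guess, not something the traceback confirms.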
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn