Here are the inputs to _assert_all_finite() on one specific failed run. They look finite to me:

X:
array([0.6150936 , 0.24652782, 0.8880004 , 0.2016928 , 0.80948585, 0.10764928,
       0.81631166, 0.25909033, 0.9299345 , 0.10186833, 0.81581795, 0.21659133,
       0.8279047 , 0.11432098, 0.7335735 , 0.20154186, 0.85112196, 0.17447269,
       0.5934462 , 0.3967309 , 0.83702815, 0.35380727, 0.75063705, 0.32200715,
       0.85112196, 0.11191818, 0.6814021 , 0.11622761, 0.851942  , 0.1892652 ,
       0.8554932 , 0.17869748], dtype=float32)

allow_nan: False
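For reference, a standalone check along the following lines can confirm that the values above are finite as float32. This is only a sketch and not scikit-learn's own code: describe_non_finite is a hypothetical helper name, and the array is copied by hand from the output above.

```python
import numpy as np

# Values copied from the X printed for the failed run.
X = np.array([
    0.6150936, 0.24652782, 0.8880004, 0.2016928, 0.80948585, 0.10764928,
    0.81631166, 0.25909033, 0.9299345, 0.10186833, 0.81581795, 0.21659133,
    0.8279047, 0.11432098, 0.7335735, 0.20154186, 0.85112196, 0.17447269,
    0.5934462, 0.3967309, 0.83702815, 0.35380727, 0.75063705, 0.32200715,
    0.85112196, 0.11191818, 0.6814021, 0.11622761, 0.851942, 0.1892652,
    0.8554932, 0.17869748,
], dtype=np.float32)


def describe_non_finite(arr):
    """Hypothetical helper: count NaN and inf entries and test the sum."""
    arr = np.asarray(arr)
    return {
        "nan": int(np.isnan(arr).sum()),
        "inf": int(np.isinf(arr).sum()),
        # A finite sum in the array's own dtype rules out overflow
        # when the values are accumulated.
        "sum_is_finite": bool(np.isfinite(arr.sum())),
    }


report = describe_non_finite(X)
print(report)
```

Run in isolation, this should report no NaN or inf entries for these values, which matches the small test program working fine. That suggests the array seen inside the failing worker may not be the same data as what was printed here.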
On Tue, Dec 17, 2019 at 7:50 AM Dan Stromberg <dstromb...@grokstream.com> wrote:

> Hi.
>
> Overflow does sound kind of possible. We're sending semi-random values to
> the test.
>
> I believe our systems are all x86_64, Linux. Some are Ubuntu 16.04, some
> are Mint 19.2.
>
> I realized on the way to work this morning, that I left out some important
> information; I suspect a heisenbug for 3 reasons:
>
> 1) If I try to look at it with print functions, I get a traceback after
> the print's, but no print output. This happens with both writing to a
> disk-based file, and with printing to stdout.
>
> 2) If I try to look at it with pudb (a debugger) via pudb.set_trace(), I
> get a failure to start pudb.
>
> 3) If I create a small test program that sends the same inputs to the
> function in question, the function works fine.
>
> Thanks.
>
> On Mon, Dec 16, 2019 at 11:20 PM Joel Nothman <joel.noth...@gmail.com>
> wrote:
>
>> Hi Dan, this kind of error can come from overflow. Are all of your test
>> systems the same architecture?
>>
>> On Tue., 17 Dec. 2019, 12:03 pm Dan Stromberg, <dstromb...@grokstream.com>
>> wrote:
>>
>>> Hi folks.
>>>
>>> I'm new to Scikit-learn.
>>>
>>> I have a very large Python project that seems to have a heisenbug which
>>> is manifesting in scikit-learn code.
>>>
>>> Short of constructing an SSCCE, are there any magical techniques I
>>> should try for pinning down the precise cause? Like valgrind or something?
>>>
>>> An SSCCE will most likely be pretty painful: the project has copious
>>> shared, mutable state, and I've already tried a largish test program that
>>> calls into the same code path with the error manifesting 0 times in 100.
>>>
>>> It's quite possible the root cause will turn out to be some other part
>>> of the software stack.
>>>
>>> The traceback from pytest looks like:
>>> sequential/test_training.py:101:
>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>> _ _ _ _ _ _ _ _ _ _ _ _ _
>>> ../rt/classifier/coach.py:146: in train
>>>     **self.classifier_section
>>> ../domain/classifier/factories/classifier_academy.py:115: in create_classifier
>>>     **kwargs)
>>> ../domain/classifier/factories/imp/xgb_factory.py:164: in create
>>>     clf_random.fit(X_train, y_train)
>>> ../../../../.local/lib/python3.6/site-packages/sklearn/model_selection/_search.py:722: in fit
>>>     self._run_search(evaluate_candidates)
>>> ../../../../.local/lib/python3.6/site-packages/sklearn/model_selection/_search.py:1515: in _run_search
>>>     random_state=self.random_state))
>>> ../../../../.local/lib/python3.6/site-packages/sklearn/model_selection/_search.py:711: in evaluate_candidates
>>>     cv.split(X, y, groups)))
>>> ../../../../.local/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py:996: in __call__
>>>     self.retrieve()
>>> ../../../../.local/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py:899: in retrieve
>>>     self._output.extend(job.get(timeout=self.timeout))
>>> ../../../../.local/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py:517: in wrap_future_result
>>>     return future.result(timeout=timeout)
>>> /usr/lib/python3.6/concurrent/futures/_base.py:425: in result
>>>     return self.__get_result()
>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>> _ _ _ _ _ _ _ _ _ _ _ _ _
>>>
>>> self = <Future at 0x7f15571ec7f0 state=finished raised ValueError>
>>>
>>>     def __get_result(self):
>>>         if self._exception:
>>> >           raise self._exception
>>> E           ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
>>>
>>> /usr/lib/python3.6/concurrent/futures/_base.py:384: ValueError
>>>
>>>
>>> The above exception is raised about 12 to 14 times in 100 in full-blown
>>> automated testing.
>>>
>>> Thanks for the cool software.
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn