It's looking, at this point, like:
1) The NaNs are real.
2) They're coming from some XGBoost native code, or perhaps from the Python<->native boundary, which is interfaced via ctypes.
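A check along these lines is one way to confirm that bad values are really present in an array at a given point, for example just before it is handed to native code. This is only a sketch: the function name count_bad_values and the dictionary keys are made up for illustration, and it assumes the data can be iterated as rows of numbers.

```python
import math

# Largest finite float32 value: (2 - 2**-23) * 2**127, about 3.4028235e38.
FLOAT32_MAX = (2.0 - 2.0 ** -23) * 2.0 ** 127


def count_bad_values(rows):
    """Count NaN, infinite, and float32-overflowing entries in a 2-D sequence.

    Hypothetical helper, not part of scikit-learn or XGBoost.
    """
    counts = {"nan": 0, "inf": 0, "too_large_for_float32": 0}
    for row in rows:
        for value in row:
            value = float(value)
            if math.isnan(value):
                counts["nan"] += 1
            elif math.isinf(value):
                counts["inf"] += 1
            elif abs(value) > FLOAT32_MAX:
                # Finite as a Python float, but beyond the float32 range.
                counts["too_large_for_float32"] += 1
    return counts
```

Calling it on the inputs and again on whatever comes back across the ctypes boundary should show on which side the bad values first appear.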
The prints that didn't print were probably because of a misplaced flush.
The debugger that didn't debug was probably because of pytest capturing stdout and async Python code.

Thanks.

On Wed, Dec 18, 2019 at 4:09 PM Dan Stromberg <dstromb...@grokstream.com> wrote:

>
> Any (further) suggestions folks?
>
> BTW, when I say pudb fails to start, I mean it's tracebacking trying to
> get None.fileno().  In other pieces of (C)Python code I've tried it in,
> pudb.set_trace() worked nicely.
>
> On Tue, Dec 17, 2019 at 7:50 AM Dan Stromberg <dstromb...@grokstream.com>
> wrote:
>
>>
>> Hi.
>>
>> Overflow does sound kind of possible.  We're sending semi-random values
>> to the test.
>>
>> I believe our systems are all x86_64, Linux.  Some are Ubuntu 16.04, some
>> are Mint 19.2.
>>
>> I realized on the way to work this morning, that I left out some
>> important information; I suspect a heisenbug for 3 reasons:
>>
>> 1) If I try to look at it with print functions, I get a traceback after
>> the prints, but no print output.  This happens with both writing to a
>> disk-based file, and with printing to stdout.
>>
>> 2) If I try to look at it with pudb (a debugger) via pudb.set_trace(), I
>> get a failure to start pudb.
>>
>> 3) If I create a small test program that sends the same inputs to the
>> function in question, the function works fine.
>>
>> Thanks.
>>
>> On Mon, Dec 16, 2019 at 11:20 PM Joel Nothman <joel.noth...@gmail.com>
>> wrote:
>>
>>> Hi Dan, this kind of error can come from overflow. Are all of your test
>>> systems the same architecture?
>>>
>>> On Tue., 17 Dec. 2019, 12:03 pm Dan Stromberg, <
>>> dstromb...@grokstream.com> wrote:
>>>
>>>> Hi folks.
>>>>
>>>> I'm new to Scikit-learn.
>>>>
>>>> I have a very large Python project that seems to have a heisenbug which
>>>> is manifesting in scikit-learn code.
>>>>
>>>> Short of constructing an SSCCE, are there any magical techniques I
>>>> should try for pinning down the precise cause?  Like valgrind or something?
>>>>
>>>> An SSCCE will most likely be pretty painful: the project has copious
>>>> shared, mutable state, and I've already tried a largish test program that
>>>> calls into the same code path with the error manifesting 0 times in 100.
>>>>
>>>> It's quite possible the root cause will turn out to be some other part
>>>> of the software stack.
>>>>
>>>> The traceback from pytest looks like:
>>>> sequential/test_training.py:101:
>>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>>> _ _ _ _ _ _ _ _ _ _ _ _ _
>>>> ../rt/classifier/coach.py:146: in train
>>>>     **self.classifier_section
>>>> ../domain/classifier/factories/classifier_academy.py:115: in
>>>> create_classifier
>>>>     **kwargs)
>>>> ../domain/classifier/factories/imp/xgb_factory.py:164: in create
>>>>     clf_random.fit(X_train, y_train)
>>>> ../../../../.local/lib/python3.6/site-packages/sklearn/model_selection/_search.py:722:
>>>> in fit
>>>>     self._run_search(evaluate_candidates)
>>>> ../../../../.local/lib/python3.6/site-packages/sklearn/model_selection/_search.py:1515:
>>>> in _run_search
>>>>     random_state=self.random_state))
>>>> ../../../../.local/lib/python3.6/site-packages/sklearn/model_selection/_search.py:711:
>>>> in evaluate_candidates
>>>>     cv.split(X, y, groups)))
>>>> ../../../../.local/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py:996:
>>>> in __call__
>>>>     self.retrieve()
>>>> ../../../../.local/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py:899:
>>>> in retrieve
>>>>     self._output.extend(job.get(timeout=self.timeout))
>>>> ../../../../.local/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py:517:
>>>> in wrap_future_result
>>>>     return future.result(timeout=timeout)
>>>> /usr/lib/python3.6/concurrent/futures/_base.py:425: in result
>>>>     return self.__get_result()
>>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>>> _ _ _ _ _ _ _ _ _ _ _ _ _
>>>>
>>>> self = <Future at 0x7f15571ec7f0 state=finished raised ValueError>
>>>>
>>>>     def __get_result(self):
>>>>         if self._exception:
>>>> >           raise self._exception
>>>> E           ValueError: Input contains NaN, infinity or a value too
>>>> large for dtype('float32').
>>>>
>>>> /usr/lib/python3.6/concurrent/futures/_base.py:384: ValueError
>>>>
>>>>
>>>> The above exception is raised about 12 to 14 times in 100 in full-blown
>>>> automated testing.
>>>>
>>>> Thanks for the cool software.
>>>> _______________________________________________
>>>> scikit-learn mailing list
>>>> scikit-learn@python.org
>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>
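For the prints that went missing, a logging helper along these lines avoids depending on stdout at all, which pytest captures by default. It is a sketch only: debug_log and the default file name debug.log are hypothetical names, and it assumes the process is allowed to write to the current directory.

```python
import os


def debug_log(message, path="debug.log"):
    """Append one line to a file and push it to disk before returning.

    Hypothetical helper: flushing and syncing on every call is slow, but the
    line survives even if the process raises or dies right afterwards.
    """
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(f"{message}\n")
        handle.flush()              # empty Python's own buffer
        os.fsync(handle.fileno())   # ask the OS to write the file to disk
```

For output that should still go to the terminal, print(..., flush=True) does the flushing part, and running pytest with -s (short for --capture=no) turns off capturing, which interactive debuggers generally need as well.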