This doesn't seem like a scikit-learn issue, but an OS / hardware issue. Again, a full stack trace would be useful information. Either way, you can try training on a subsample or via cross-validation. I believe some estimators also support incremental (out-of-core) training via partial_fit.
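To make the incremental-training suggestion concrete, here's a minimal sketch using scikit-learn's partial_fit API. The arrays below are synthetic stand-ins (the original post doesn't say which estimator or preprocessing was used), and SGDClassifier is just one of the estimators that supports out-of-core learning:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Stand-ins for the real features/labels -- assumption: numeric
# features and binary labels; substitute your own arrays or a
# chunked reader over the file on disk.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 50))
y = rng.integers(0, 2, size=10_000)

clf = SGDClassifier(random_state=0)
classes = np.unique(y)  # partial_fit needs all classes up front

# Feed the estimator in chunks so the full dataset is never
# held (or copied) in memory at once.
chunk = 1_000
for start in range(0, len(X), chunk):
    clf.partial_fit(X[start:start + chunk],
                    y[start:start + chunk],
                    classes=classes)

print(clf.score(X, y))
```

The same chunked loop works for any estimator exposing partial_fit; pairing it with numpy.memmap or a file iterator keeps peak memory bounded by the chunk size rather than the dataset size.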
Andrew

<~~~~~~~~~~~~~~~~~~~~~~~~~~~>
J. Andrew Howe, PhD
LinkedIn Profile <http://www.linkedin.com/in/ahowe42>
ResearchGate Profile <http://www.researchgate.net/profile/John_Howe12/>
Open Researcher and Contributor ID (ORCID) <http://orcid.org/0000-0002-3553-1990>
Github Profile <http://github.com/ahowe42>
Personal Website <http://www.andrewhowe.com>
I live to learn, so I can learn to live. - me
<~~~~~~~~~~~~~~~~~~~~~~~~~~~>

On Fri, Jan 8, 2021 at 5:35 AM Liu James <icefrog1...@gmail.com> wrote:

> Thanks for the reply. I tested different sizes of data on different
> distros, and found that when the data exceeds 500 thousand rows (with 50
> columns), the crash happens with the same error message -- kernel page
> error.
>
> Guillaume Lemaître <g.lemaitr...@gmail.com> wrote on Wed, Jan 6, 2021 at
> 10:33 PM:
>
>> And it seems that the piece of traceback refers to NumPy.
>>
>> On Wed, 6 Jan 2021 at 12:48, Andrew Howe <ahow...@gmail.com> wrote:
>>
>>> A core dump generally happens when a process tries to access memory
>>> outside its allocated address space. You've not specified which
>>> estimator you were using, but I'd guess it attempted to do something
>>> with the dataset that resulted in it being duplicated or otherwise
>>> expanded beyond the memory capacity. Perhaps the full stack trace would
>>> be helpful.
>>>
>>> Andrew
>>>
>>> On Wed, Jan 6, 2021 at 11:02 AM Liu James <icefrog1...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm using a medium-sized dataset, KDD99 IDS
>>>> (https://www.ll.mit.edu/r-d/datasets/1999-darpa-intrusion-detection-evaluation-dataset),
>>>> for model training; the dataset has 2 million samples. When using
>>>> fit_transform(), the OS crashed with the log "Process 13851(python) of
>>>> user xxx dumped core. Stack trace
>>>> .../numpy/core/_multiarray_umath_cpython_36m_x86_64... ".
>>>>
>>>> The hardware: CentOS 8, Intel i9, 128GB RAM, stack size set to
>>>> unlimited. The crash is reproducible.
>>>>
>>>> Thanks.
>>>>
>> --
>> Guillaume Lemaitre
>> Scikit-learn @ Inria Foundation
>> https://glemaitre.github.io/
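For context on the memory-duplication hypothesis in the quoted thread, a back-of-the-envelope estimate helps (a sketch only; the actual estimator and dtype in the original report are unknown):

```python
import numpy as np

# Base footprint of the array described in the thread:
# 2 million rows x 50 float64 columns.
n_rows, n_cols = 2_000_000, 50
itemsize = np.dtype(np.float64).itemsize  # 8 bytes

base_gib = n_rows * n_cols * itemsize / 1024**3
print(f"base array: {base_gib:.2f} GiB")  # ~0.75 GiB -- small next to 128 GB

# A few extra copies of that won't exhaust RAM, but a step that is
# quadratic in the number of samples (e.g. a pairwise distance or
# kernel matrix) is a different story entirely:
pairwise_tib = n_rows**2 * itemsize / 1024**4
print(f"n x n matrix: {pairwise_tib:.1f} TiB")  # far beyond any RAM
```

Since the base array is well under the reported 128 GB, a crash at 500k+ rows points at a transform whose memory cost grows much faster than linearly, which is consistent with the duplicated-or-expanded hypothesis above.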
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn