This doesn't seem like a scikit-learn issue, but an OS / hardware issue. Again, a full stack trace would be useful information. Either way, you can try training on a subsample or via cross-validation. I believe some estimators also support incremental (out-of-core) training via partial_fit.
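To make the incremental-training suggestion concrete, here's a minimal sketch using scikit-learn's partial_fit API. The arrays below are synthetic stand-ins (the original post doesn't say which estimator or preprocessing was used), and SGDClassifier is just one of the estimators that supports out-of-core learning:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Stand-ins for the real features/labels -- assumption: numeric
# features and binary labels; substitute your own arrays or a
# chunked reader over the file on disk.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 50))
y = rng.integers(0, 2, size=10_000)

clf = SGDClassifier(random_state=0)
classes = np.unique(y)  # partial_fit needs all classes up front

# Feed the estimator in chunks so the full dataset is never
# held (or copied) in memory at once.
chunk = 1_000
for start in range(0, len(X), chunk):
    clf.partial_fit(X[start:start + chunk],
                    y[start:start + chunk],
                    classes=classes)

print(clf.score(X, y))
```

The same chunked loop works for any estimator exposing partial_fit; pairing it with numpy.memmap or a file iterator keeps peak memory bounded by the chunk size rather than the dataset size.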
Andrew

<~~~~~~~~~~~~~~~~~~~~~~~~~~~>
J. Andrew Howe, PhD
LinkedIn Profile <http://www.linkedin.com/in/ahowe42>
ResearchGate Profile <http://www.researchgate.net/profile/John_Howe12/>
Open Researcher and Contributor ID (ORCID) <http://orcid.org/0000-0002-3553-1990>
Github Profile <http://github.com/ahowe42>
Personal Website <http://www.andrewhowe.com>
I live to learn, so I can learn to live. - me
<~~~~~~~~~~~~~~~~~~~~~~~~~~~>

On Fri, Jan 8, 2021 at 5:35 AM Liu James <icefrog1...@gmail.com> wrote:

> Thanks for the reply. I tested different sizes of data on different
> distros, and found that when the data exceeds 500 thousand rows (with 50
> columns), the crash happens with the same error message -- kernel page
> error.
>
> Guillaume Lemaître <g.lemaitr...@gmail.com> wrote on Wed, Jan 6, 2021 at
> 10:33 PM:
>
>> And it seems that the piece of traceback refers to NumPy.
>>
>> On Wed, 6 Jan 2021 at 12:48, Andrew Howe <ahow...@gmail.com> wrote:
>>
>>> A core dump generally happens when a process tries to access memory
>>> outside its allocated address space. You've not specified which
>>> estimator you were using, but I'd guess it attempted to do something
>>> with the dataset that resulted in it being duplicated or otherwise
>>> expanded beyond the memory capacity. Perhaps the full stack trace would
>>> be helpful.
>>>
>>> Andrew
>>>
>>> On Wed, Jan 6, 2021 at 11:02 AM Liu James <icefrog1...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm using a medium-sized dataset, KDD99 IDS
>>>> (https://www.ll.mit.edu/r-d/datasets/1999-darpa-intrusion-detection-evaluation-dataset),
>>>> for model training; the dataset has 2 million samples. When using
>>>> fit_transform(), the OS crashed with the log "Process 13851(python) of
>>>> user xxx dumped core. Stack trace
>>>> .../numpy/core/_multiarray_umath_cpython_36m_x86_64... ".
>>>>
>>>> The hardware: CentOS 8, Intel i9, 128GB RAM, stack size set to
>>>> unlimited. The crash is reproducible.
>>>>
>>>> Thanks.
>>>>
>> --
>> Guillaume Lemaitre
>> Scikit-learn @ Inria Foundation
>> https://glemaitre.github.io/
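For context on the memory-duplication hypothesis in the quoted thread, a back-of-the-envelope estimate helps (a sketch only; the actual estimator and dtype in the original report are unknown):

```python
import numpy as np

# Base footprint of the array described in the thread:
# 2 million rows x 50 float64 columns.
n_rows, n_cols = 2_000_000, 50
itemsize = np.dtype(np.float64).itemsize  # 8 bytes

base_gib = n_rows * n_cols * itemsize / 1024**3
print(f"base array: {base_gib:.2f} GiB")  # ~0.75 GiB -- small next to 128 GB

# A few extra copies of that won't exhaust RAM, but a step that is
# quadratic in the number of samples (e.g. a pairwise distance or
# kernel matrix) is a different story entirely:
pairwise_tib = n_rows**2 * itemsize / 1024**4
print(f"n x n matrix: {pairwise_tib:.1f} TiB")  # far beyond any RAM
```

Since the base array is well under the reported 128 GB, a crash at 500k+ rows points at a transform whose memory cost grows much faster than linearly, which is consistent with the duplicated-or-expanded hypothesis above.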
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn