[scikit-learn] 2 million samples dataset caused python and OS crash

2021-01-06 Thread Liu James
Hi all,

I'm using a medium-sized dataset, KDD99 IDS
(https://www.ll.mit.edu/r-d/datasets/1999-darpa-intrusion-detection-evaluation-dataset),
for model training; the dataset has 2 million samples. When I call
fit_transform(), the OS crashes with the log "Process 13851(python) of user xxx
dumped core. Stack trace
.../numpy/core/_multiarray_umath_cpython_36m_x86_64... ".

The system: CentOS 8, Intel i9, 128 GB RAM, with the stack size set to
unlimited. The crash is reproducible.

Thanks.
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] 2 million samples dataset caused python and OS crash

2021-01-06 Thread Andrew Howe
A core dump generally happens when a process tries to access memory outside
its allocated address space. You've not specified which estimator you were
using, but I'd guess it attempted to do something with the dataset that
resulted in it being duplicated or otherwise expanded beyond the memory
capacity. Perhaps the full stack trace would be helpful.
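
To make the memory point concrete, here is a rough back-of-the-envelope
sketch. It is only an illustration: the feature counts are made-up
placeholders, since the thread does not say which estimator was used or how
wide the data becomes after fit_transform(). The helper name dense_nbytes is
also just for this example. It estimates the size of one dense array with
8-byte (float64) entries for 2 million samples.

```python
def dense_nbytes(n_samples, n_features, itemsize=8):
    """Bytes needed for one dense (n_samples, n_features) array.

    itemsize=8 corresponds to float64 entries.
    """
    return n_samples * n_features * itemsize


n_samples = 2_000_000  # sample count from the original post

# Hypothetical feature counts; the real width after fit_transform()
# is not stated in the thread.
for n_features in (100, 1_000, 10_000):
    gib = dense_nbytes(n_samples, n_features) / 2**30
    print(f"{n_features:>6} features -> {gib:8.1f} GiB per copy")
```

If the transform expands the data (a dense one-hot encoding, for example) or
makes intermediate copies, the total can be a multiple of the per-copy
figure, which is how a dataset that loads fine can still exceed 128 GB.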

Andrew



Re: [scikit-learn] 2 million samples dataset caused python and OS crash

2021-01-06 Thread Guillaume Lemaître
And it seems that the piece of the traceback shown refers to NumPy.
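
One possible way to get a fuller trace, as a sketch: Python's standard-library
faulthandler module can write the Python-level traceback when the interpreter
receives a fatal signal such as SIGSEGV or SIGABRT. The log file name below is
a placeholder, and the place where fit_transform() is called is only marked
with a comment, because the original script was not posted.

```python
import faulthandler

# Hypothetical log file name; any writable path works. The file object
# must stay open for as long as faulthandler is enabled.
log_file = open("crash_traceback.log", "w")

# On a fatal signal (SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL), dump the
# Python traceback of all threads into log_file.
faulthandler.enable(file=log_file, all_threads=True)

# ... load the data and call fit_transform() here (not shown in the thread) ...
```

The same handler can be switched on without editing the script by running
`python -X faulthandler script.py`. It will not help if the process is killed
with SIGKILL (for instance by the kernel's out-of-memory killer), since that
signal cannot be caught.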

-- 
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/