[scikit-learn] 2 million samples dataset caused python and OS crash

2021-01-06 Thread Liu James
Hi all, I'm using a medium-sized dataset, KDD99 IDS (https://www.ll.mit.edu/r-d/datasets/1999-darpa-intrusion-detection-evaluation-dataset), for model training, and the dataset has 2 million samples. When using fit_transform(), the OS crashed with the log "Process 13851(python) of user xxx dumped core. Sta

Re: [scikit-learn] 2 million samples dataset caused python and OS crash

2021-01-06 Thread Andrew Howe
A core dump generally happens when a process tries to access memory outside its allocated address space. You've not specified what estimator you were using, but I'd guess it attempted to do something with the dataset that resulted in it being duplicated or otherwise expanded beyond the memory capa
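
To make the memory point concrete, here is a rough back-of-the-envelope sketch of how large a dense float64 array gets at 2 million rows. The thread does not say which estimator or transformer was used, so the column counts below are assumptions for illustration only: 41 is the feature count commonly quoted for KDD99, and 10,000 is a purely hypothetical width after some expanding transform (for example, encoding categorical columns). The helper name dense_nbytes is made up for this example.

```python
def dense_nbytes(n_samples, n_features, itemsize=8):
    """Bytes needed for a dense 2-D array (itemsize=8 corresponds to float64)."""
    return n_samples * n_features * itemsize


GIB = 2 ** 30  # bytes per GiB

# Assumed raw width: 41 feature columns.
raw = dense_nbytes(2_000_000, 41)

# Hypothetical width after a transform that expands the feature space.
expanded = dense_nbytes(2_000_000, 10_000)

print(f"raw:      {raw / GIB:.2f} GiB")
print(f"expanded: {expanded / GIB:.2f} GiB")
```

If the expanded figure exceeds the machine's RAM, a transform that returns a dense array (or a copy of one) can exhaust memory even though the raw data fits comfortably.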

Re: [scikit-learn] 2 million samples dataset caused python and OS crash

2021-01-06 Thread Guillaume Lemaître
And it seems that the piece of traceback refers to NumPy. On Wed, 6 Jan 2021 at 12:48, Andrew Howe wrote: > A core dump generally happens when a process tries to access memory > outside its allocated address space. You've not specified what estimator > you were using, but I'd guess it attempted