If you work with deep networks, check the data-loading utilities that ship with your deep learning library. For instance, in Keras you should write a batch generator if you need to handle a large dataset. In PyTorch you can use the DataLoader together with ImageFolder from torchvision, which manage the loading for you.
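To make the batch generator idea concrete, here is a minimal sketch using only NumPy. The reason the original call fails is that astype('float32') materialises the whole array in RAM at once: 85108 x 256 x 256 x 3 values at 4 bytes each is roughly 67 GB, well above 32 GB. Converting one batch at a time avoids that. The function name batch_generator, the scale factor of 255, the labels array y, and the small temporary file are all assumptions made for illustration; they are not taken from the original code.

```python
import os
import tempfile

import numpy as np


def batch_generator(X, y, batch_size=32, scale=255.0):
    """Yield (inputs, labels) batches forever, converting one batch at a time.

    X is expected to be a memory-mapped array, so only the slice that is
    currently needed is read from disk and cast to float32.
    """
    n_samples = len(X)
    while True:  # loop indefinitely, as Keras-style generators usually do
        for start in range(0, n_samples, batch_size):
            stop = min(start + batch_size, n_samples)
            # Copy just this slice into RAM as float32 and rescale it.
            X_batch = np.asarray(X[start:stop], dtype=np.float32) / scale
            yield X_batch, y[start:stop]


# Small stand-in for the real 20GB file, so the sketch runs anywhere.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "X_small.npy")
np.save(path, np.random.randint(0, 256, size=(10, 256, 256, 3), dtype=np.uint8))

# mmap_mode='r' keeps the data on disk until a slice is actually read.
X = np.load(path, mmap_mode="r")
y = np.arange(len(X)) % 2  # hypothetical labels, for illustration only

# Slicing a memory-mapped array gives views, so the split itself is cheap.
train_pct_index = int(0.8 * len(X))
train_gen = batch_generator(X[:train_pct_index], y[:train_pct_index], batch_size=4)

X_batch, y_batch = next(train_gen)
print(X_batch.shape, X_batch.dtype)
```

With Keras, a generator of this kind can typically be passed to the model's generator-based training method together with a steps_per_epoch value; check the documentation of your installed version for the exact call.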
On 5 March 2018 at 17:19, CHETHAN MURALI <chethanmural...@gmail.com> wrote:

> Dear All,
>
> I am working on building a CNN model for image classification problem.
> As par of it I have converted all my test images to numpy array.
>
> Now when I am trying to split the array into training and test set I am
> getting memory error.
> Details are as below:
>
> X = np.load("./data/X_train.npy", mmap_mode='r')
> train_pct_index = int(0.8 * len(X))
> X_train, X_test = X[:train_pct_index], X[train_pct_index:]
> X_train = X_train.reshape(X_train.shape[0], 256, 256, 3)
>
> X_train = X_train.astype('float32')
> -------------------------------------------------
> MemoryError                 Traceback (most recent call last)
> <ipython-input-46-9180807e01dc> in <module>()
>       2 print("Normalizing Data")
>       3
> ----> 4 X_train = X_train.astype('float32')
>
> *More information:*
>
> *1. my python version is*
>
> python --version
> Python 3.6.4 :: Anaconda custom (64-bit)
>
> *2. I am running the code in ubuntu ubuntu 16.04.*
>
> *3. I have 32GB RAM*
>
> *4. X_train.npy file that I have loaded to np.array is of size 20GB*
>
> print("X_train Shape: ", X_train.shape)
> X_train Shape: (85108, 256, 256, 3)
>
> I would be really glad if you can help me to overcome this problem.
>
> Regards,
> -
> Chethan
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>

--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/