Why not just run train_test_split over the directory names, and read in the images later (e.g. in a Pipeline)?
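A rough sketch of what I mean, assuming a layout like root/<class>/<pair_dir>/ where each pair directory holds the two images. The helper names (list_pair_dirs, split_pair_dirs, load_pair) and the layout are my own assumptions for illustration, not anything scikit-learn provides; only train_test_split is the real library call.

```python
import os

from sklearn.model_selection import train_test_split


def list_pair_dirs(root):
    """Collect pair directories and labels from root/<class>/<pair_dir>/.

    The layout and the 'cat' -> 1 labelling follow the question; adjust
    both to the real data (hypothetical helper).
    """
    pair_dirs, labels = [], []
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue
        for pair_name in sorted(os.listdir(class_dir)):
            pair_dir = os.path.join(class_dir, pair_name)
            if os.path.isdir(pair_dir):
                pair_dirs.append(pair_dir)
                labels.append(1 if class_name == "cat" else 0)
    return pair_dirs, labels


def split_pair_dirs(root, test_size=0.25, random_state=42):
    """Split directory names (not pixel data) into train and test sets."""
    pair_dirs, labels = list_pair_dirs(root)
    return train_test_split(
        pair_dirs, labels, test_size=test_size, random_state=random_state
    )


def load_pair(pair_dir, read_image):
    """Read every image file in pair_dir with the supplied reader.

    read_image is any callable taking a path, e.g. cv2.imread.
    """
    files = sorted(os.listdir(pair_dir))
    return tuple(read_image(os.path.join(pair_dir, f)) for f in files)
```

You would then call something like train_dirs, test_dirs, trainY, testY = split_pair_dirs('dataset') and only afterwards build the image pairs, e.g. [load_pair(d, cv2.imread) for d in train_dirs]. Each pair stays together because the unit being split is the directory, and the label travels with it.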

On 15 January 2018 at 10:18, Abdul Abdul <abdul.s...@gmail.com> wrote:
> Hello,
>
> I'm trying to train an image classifier, but a bit confused on how to
> label my data. The issue here is that for each class I have subdirectories,
> each of which contains two images. So, it is not I have classes, and in
> each class I simply have the images that come under that class (i.e. cats
> vs. dogs).
>
> I will show here some attempts for grouping the data together, but not yet
> able to figure how to assign the label, and pass the pairs of images along
> with the label to the image classifier.
>
> So, that's how I simply read the two images:
>
> im1 = cv2.imread('img1.jpg')
> im1 = img_to_array(im1)
>
> im2 = cv2.imread('img2.jpg')
> im2 = img_to_array(im2)
>
> I then *pair* the images as follows:
>
> pair = (im1,im2)
>
> For labeling, this is what I did:
>
> label = root.split(os.path.sep)[-2]
> label = 1 if label == 'cat' else 0
>
> How can I group the above pairs of images (im1,im2) and attach the label
> to them? Especially that I want to pass them to the following scikit-learn
> function:
>
> (trainX, testX, trainY, testY) = train_test_split(data,
>     labels, test_size=0.25, random_state=42)
>
> Thanks.
>
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn