[GitHub] ThomasDelteil commented on a change in pull request #10537: [MX-307] Add .md tutorials to .ipynb for CI integration
ThomasDelteil commented on a change in pull request #10537: [MX-307] Add .md tutorials to .ipynb for CI integration
URL: https://github.com/apache/incubator-mxnet/pull/10537#discussion_r181877584

## File path: docs/tutorials/python/data_augmentation.md ##

@@ -49,13 +47,21 @@ One of the most convenient ways to augment your image data is via arguments of [
 We show a simple example of this below, after creating an `images.lst` file used by the [`ImageIter`](https://mxnet.incubator.apache.org/api/python/image/image.html?highlight=imageiter#mxnet.image.ImageIter). Use [`tools/im2rec.py`](https://github.com/apache/incubator-mxnet/blob/master/tools/im2rec.py) to create the `images.lst` if you don't already have this for your data.

 ```python
-!echo -e "0\t0.00\timages/0.jpg" > ./data/images.lst
+path_to_image = os.path.join("images","0.jpg")
+index = 0
+label = "0.00"
+list_file_content = "{}\t{}\t{}".format(index, label, path_to_image)

Review comment:
I find it slightly less visual that way but will update

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
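For reference, the pure-Python replacement for the `!echo` line might be completed roughly as follows. This is only a sketch: the quoted hunk stops before the file is written, so the `write_list_file` helper is hypothetical and the `data/images.lst` target is assumed from the removed shell line.

```python
import os


def write_list_file(list_path, entries):
    """Write (index, label, image_path) tuples as tab-separated lines.

    Hypothetical helper, not part of the PR.
    """
    folder = os.path.dirname(list_path)
    if folder:
        # Create the target folder if it does not exist yet
        os.makedirs(folder, exist_ok=True)
    with open(list_path, "w") as list_file:
        for index, label, image_path in entries:
            list_file.write("{}\t{}\t{}\n".format(index, label, image_path))


# Same single entry as the removed shell line: index 0, label 0.00, images/0.jpg
path_to_image = os.path.join("images", "0.jpg")
list_path = os.path.join("data", "images.lst")  # assumed target path
write_list_file(list_path, [(0, "0.00", path_to_image)])
```

Building the image path with `os.path.join` keeps the snippet independent of the shell, which is the point of replacing the `!echo` line for CI.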
[GitHub] ThomasDelteil commented on a change in pull request #10537: [MX-307] Add .md tutorials to .ipynb for CI integration
ThomasDelteil commented on a change in pull request #10537: [MX-307] Add .md tutorials to .ipynb for CI integration
URL: https://github.com/apache/incubator-mxnet/pull/10537#discussion_r181875579

## File path: docs/tutorials/gluon/datasets.md ##

@@ -175,58 +180,51 @@ for epoch in range(epochs):
     print("Epoch {}, training loss: {:.2f}, validation loss: {:.2f}".format(epoch, train_loss, valid_loss))
 ```

-Epoch 0, training loss: 0.54, validation loss: 0.45
-Epoch 1, training loss: 0.40, validation loss: 0.39
-Epoch 2, training loss: 0.36, validation loss: 0.39
-Epoch 3, training loss: 0.33, validation loss: 0.34
-Epoch 4, training loss: 0.32, validation loss: 0.33
+`Epoch 0, training loss: 0.54, validation loss: 0.45`
+
+`...`
+
+`Epoch 4, training loss: 0.32, validation loss: 0.33`

 # Using own data with included `Dataset`s

 Gluon has a number of different [`Dataset`](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=dataset#mxnet.gluon.data.Dataset) classes for working with your own image data straight out-of-the-box. You can get started quickly using the [`mxnet.gluon.data.vision.datasets.ImageFolderDataset`](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=imagefolderdataset#mxnet.gluon.data.vision.datasets.ImageFolderDataset) which loads images directly from a user-defined folder, and infers the label (i.e. class) from the folders.

 We will run through an example for image classification, but a similar process applies for other vision tasks. If you already have your own collection of images to work with you should partition your data into training and test sets, and place all objects of the same class into seperate folders. Similar to:
-
+```
 ./images/train/car/abc.jpg
 ./images/train/car/efg.jpg
 ./images/train/bus/hij.jpg
 ./images/train/bus/klm.jpg
 ./images/test/car/xyz.jpg
 ./images/test/bus/uvw.jpg
+```

 You can download the Caltech 101 dataset if you don't already have images to work with for this example, but please note the download is 126MB.

 ```python
-!wget http://www.vision.caltech.edu/Image_Datasets/Caltech101/101_ObjectCategories.tar.gz
-!tar -xzf 101_ObjectCategories.tar.gz
+
+data_folder = "data"
+dataset_name = "101_ObjectCategories"
+archive_file = "{}.tar.gz".format(dataset_name)
+archive_path = os.path.join(data_folder, archive_file)
+data_url = "https://s3.us-east-2.amazonaws.com/mxnet-public/"
+
+if not os.path.isfile(archive_path):
+    mx.test_utils.download("{}{}".format(data_url, archive_file), dirname = data_folder)
+    print('Extracting {} in {}...'.format(archive_file, data_folder))
+    tar = tarfile.open(archive_path, "r:gz")
+    tar.extractall(data_folder)
+    tar.close()
+    print('Data extracted.')
 ```

-After downloading and extracting the data archive, we seperate the data into training and test sets (50:50 split), and place images of the same class into the same folders, as required for using [`ImageFolderDataset`](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=imagefolderdataset#mxnet.gluon.data.vision.datasets.ImageFolderDataset).
+After downloading and extracting the data archive, we have two folders: `data/101_ObjectCategories` and `data/101_ObjectCategories_test`. We load the data into a training and testing dataset [`ImageFolderDataset`](https://mxnet.incubator.apache.org/api/python/gluon/data.html?highlight=imagefolderdataset#mxnet.gluon.data.vision.datasets.ImageFolderDataset).

Review comment:
will update
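The extraction half of the snippet quoted above can also be written with only the standard library. The sketch below is a variation rather than the code in the PR: `extract_once` is a hypothetical helper, it guards on the extracted folder instead of the archive file, and the download step (done with `mx.test_utils.download` in the diff) is left out so the example runs without MXNet or network access.

```python
import os
import tarfile


def extract_once(archive_path, data_folder, dataset_name):
    """Extract archive_path into data_folder unless dataset_name already exists there.

    Hypothetical helper. Returns True if extraction ran, False if it was skipped.
    """
    target = os.path.join(data_folder, dataset_name)
    if os.path.isdir(target):
        # Already extracted on a previous run, nothing to do
        return False
    print("Extracting {} in {}...".format(archive_path, data_folder))
    # The context manager closes the archive even if extraction fails.
    # Only extract archives from a trusted source.
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(data_folder)
    print("Data extracted.")
    return True
```

Skipping work that has already been done matters for CI, where the same notebook may be executed repeatedly against a cached data folder.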
[GitHub] ThomasDelteil commented on a change in pull request #10537: [MX-307] Add .md tutorials to .ipynb for CI integration
ThomasDelteil commented on a change in pull request #10537: [MX-307] Add .md tutorials to .ipynb for CI integration
URL: https://github.com/apache/incubator-mxnet/pull/10537#discussion_r181875787

## File path: docs/tutorials/python/types_of_data_augmentation.md ##

@@ -335,7 +332,7 @@ And lastly, you can use [`mxnet.image.RandomOrderAug`](https://mxnet.incubator.a
 ```python
 example_image_copy = example_image.copy()
 aug_list = [
-    mx.image.RandomCropAug(size=(50, 50)),
+    mx.image.RandomCropAug(size=(250, 250)),

Review comment:
The generated image was really random; this one always shows at least a part of the giraffe, which makes more sense IMO. The image can remain since it is a sensible one.
[GitHub] ThomasDelteil commented on a change in pull request #10537: [MX-307] Add .md tutorials to .ipynb for CI integration
ThomasDelteil commented on a change in pull request #10537: [MX-307] Add .md tutorials to .ipynb for CI integration
URL: https://github.com/apache/incubator-mxnet/pull/10537#discussion_r181875817

## File path: docs/tutorials/vision/large_scale_classification.md ##

@@ -11,6 +11,11 @@ Training a neural network with a large number of images presents several challen
 $ pip install opencv-python
 ```

+```python
+import mxnet as mx
+print(mx.__version__)

Review comment:
will update
[GitHub] ThomasDelteil commented on a change in pull request #10537: [MX-307] Add .md tutorials to .ipynb for CI integration
ThomasDelteil commented on a change in pull request #10537: [MX-307] Add .md tutorials to .ipynb for CI integration
URL: https://github.com/apache/incubator-mxnet/pull/10537#discussion_r181875420

## File path: docs/tutorials/gluon/datasets.md ##

@@ -245,7 +243,7 @@ As with the Fashion MNIST dataset the labels will be integer encoded. You can us
 ```python
-sample_idx = 888
+sample_idx = 539

Review comment:
No, I found the same by doing a bisect.
[GitHub] ThomasDelteil commented on a change in pull request #10537: [MX-307] Add .md tutorials to .ipynb for CI integration
ThomasDelteil commented on a change in pull request #10537: [MX-307] Add .md tutorials to .ipynb for CI integration
URL: https://github.com/apache/incubator-mxnet/pull/10537#discussion_r181875085

## File path: docs/tutorials/speech_recognition/ctc.md ##

@@ -1,5 +1,11 @@
 # Connectionist Temporal Classification

+```python
+
+import mxnet as mx
+print(mx.__version__)

Review comment:
It's to enable the generation of the .ipynb, which needs to have at least one code statement. I think this one shows users which version they are using. I will add 1.1.0 below; you are right that it should at least inform the user that this worked with 1.1.0.
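The constraint mentioned above (the generated .ipynb needs at least one code statement) could be checked up front with something like the sketch below. The helper name and the rule that only fences tagged `python` count are assumptions; the converter used by the docs build may apply different rules.

```python
import re

# Matches a line that opens a fenced block tagged as python
_PYTHON_FENCE = re.compile(r"^```python[ \t]*$", re.MULTILINE)


def count_python_blocks(markdown_text):
    """Return how many fenced python blocks are opened in markdown_text.

    Hypothetical helper, not part of the PR.
    """
    return len(_PYTHON_FENCE.findall(markdown_text))


# The header added to ctc.md in the diff above, closed for the sake of the example
ctc_header = (
    "# Connectionist Temporal Classification\n\n"
    "```python\n\nimport mxnet as mx\nprint(mx.__version__)\n```\n"
)
prose_only = "# Connectionist Temporal Classification\n\nSome text.\n"
```

With this helper, `count_python_blocks(prose_only)` is zero, which is the situation the added version-printing block is meant to avoid.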
[GitHub] ThomasDelteil commented on a change in pull request #10537: [MX-307] Add .md tutorials to .ipynb for CI integration
ThomasDelteil commented on a change in pull request #10537: [MX-307] Add .md tutorials to .ipynb for CI integration
URL: https://github.com/apache/incubator-mxnet/pull/10537#discussion_r181295076

## File path: tests/nightly/test_tutorial_config.txt ##

@@ -1,20 +1,31 @@
 basic/ndarray
+basic/ndarray_indexing
 basic/symbol
 basic/module
 basic/data
-python/linear-regression
-python/mnist
-python/predict_image
-onnx/super_resolution
-onnx/fine_tuning_gluon
-onnx/inference_on_onnx_model
-basic/ndarray_indexing
-python/matrix_factorization
+gluon/customop
+gluon/data_augmentation
+gluon/datasets
 gluon/ndarray
 gluon/mnist
 gluon/autograd
 gluon/gluon
 gluon/hybrid
+nlp/cnn
+onnx/super_resolution
+onnx/fine_tuning_gluon
+onnx/inference_on_onnx_model
+python/matrix_factorization
+python/linear-regression
+python/mnist
+python/predict_image
+python/data_augmentation
+python/data_augmentation_with_masks
+python/kvstore
+python/types_of_data_augmentation
 sparse/row_sparse
 sparse/csr
-sparse/train

Review comment:
Indeed, going forward, there will be one individual test per tutorial, to allow the use of annotations like `@highCpu`, `@highMemory`, `@gpu`. And there will be an integration test that will check that each notebook has been added to the test suite. This will be part of my next PR, as part of this work of integrating tutorials into the CI.
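The integration check described above could take roughly the following shape. This is a sketch under assumptions: the function name is hypothetical, and both the one-`.md`-file-per-tutorial layout and the comparison against the entries of `test_tutorial_config.txt` are inferred from the diff, not taken from the follow-up PR.

```python
import os


def find_untested_tutorials(tutorial_root, config_path):
    """Return tutorials found under tutorial_root that are missing from the config.

    A tutorial is identified as "<folder>/<name>" for each "<folder>/<name>.md"
    file, matching the entry format of test_tutorial_config.txt.
    """
    with open(config_path) as config_file:
        listed = {line.strip() for line in config_file if line.strip()}

    found = set()
    for folder, _, files in os.walk(tutorial_root):
        for file_name in files:
            name, extension = os.path.splitext(file_name)
            if extension == ".md":
                relative = os.path.relpath(os.path.join(folder, name), tutorial_root)
                # Normalise to forward slashes so entries match on any platform
                found.add(relative.replace(os.sep, "/"))
    return sorted(found - listed)
```

A test could then assert that the returned list is empty. Pages that are not tutorials (index files, for example) would need to be excluded explicitly, which this sketch does not attempt.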