Hi,

For an image classification task, I need to extract random patches and their coordinates from images.

Until now, I have used custom code to extract both at the same time. I recently tested the extract_patches_2d function from scikit-learn, and it seems very fast. To extract the coordinates along with the patches, I wrote this test script <https://gist.github.com/NicolasTr/5429897>. Unsurprisingly but unfortunately, it uses 3 times more memory than the same script without the coordinate extraction. I would like to write a better solution, but I need your opinion first:

 * I could modify extract_patches_2d to return a tuple (patches,
   coordinates):
     o The memory consumption would probably stay the same, since the
       coordinates are already computed in the function (here:
       <https://github.com/scikit-learn/scikit-learn/blob/85ec0fd1ae904f275f608b11044a2476ed4723e6/sklearn/feature_extraction/image.py#L322-L323>).
       If max_patches is not specified, the coordinates could be
       returned as an itertools.product.
     o It could break existing code, because the return value would
       be different.
 * I could create a new kind of PatchExtractor:
     o Existing code wouldn't break.
     o The random_state would need to be copied before any extraction
       so that randint reproduces the correct coordinates.
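
To make the idea concrete, here is a rough NumPy-only sketch of returning the patches together with their top-left coordinates. The name extract_patches_with_coords and its signature are hypothetical (this is not an existing scikit-learn function), and it slices the image directly instead of reusing the strided view that extract_patches_2d builds internally, so its memory behaviour will differ from the real implementation:

```python
from itertools import product

import numpy as np


def extract_patches_with_coords(image, patch_size, max_patches=None,
                                random_state=None):
    """Hypothetical helper: return (patches, coords) for a 2D image.

    coords[k] is the (row, col) of the top-left corner of patches[k].
    """
    rng = np.random.RandomState(random_state)
    i_h, i_w = image.shape[:2]
    p_h, p_w = patch_size

    # Number of valid top-left positions along each axis.
    n_h = i_h - p_h + 1
    n_w = i_w - p_w + 1

    if max_patches is None:
        # No sampling: enumerate every position in row-major order.
        coords = np.array(list(product(range(n_h), range(n_w))))
    else:
        # Sample random positions, keeping the indices so they can be
        # returned alongside the patches.
        i_s = rng.randint(n_h, size=max_patches)
        j_s = rng.randint(n_w, size=max_patches)
        coords = np.column_stack((i_s, j_s))

    patches = np.stack([image[i:i + p_h, j:j + p_w] for i, j in coords])
    return patches, coords
```

Calling extract_patches_with_coords(image, (3, 3), max_patches=5, random_state=0) would give an array of five 3x3 patches and a (5, 2) array of their positions. Because the sampled indices are kept, there is no need to copy the random_state and replay randint afterwards.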

What do you think?

Regards,

Nicolas Trésegnie

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
