[mlpack] GSOC : Adding Optimizers to MultiObjective module

2021-03-18 Thread Nanubala Gnana Sai
Hey all,
First, I must thank the devs for considering my ideas and commenting on
their feasibility. After some careful consideration of the GSoC timeline, I
propose the following:

Adding:
Month 1:
a) Strength Pareto Evolutionary Algorithm II (SPEA-II): One of the core
multiobjective algorithms, along with NSGA-II. It would be really nice to
have. I've already begun some work on it. (1 month)

Month 2:
b) Fully implementing MOEA/D-DE: Over the past week, I've refactored over
60% of the MOEA/D-DE code, fixing bugs, cleaning up APIs, etc. It is almost
done; you can track it here.
I wish to continue this in GSoC (if it's not merged already). (Less than a
week, or even pre-GSoC)

(Approx 2 weeks)
c) Adding a test suite for MOO: Any MOO module is incomplete without a test
suite. I propose to add the following:
 i) ZDT: I've already coded it fully and it performs correctly on
trivial cases. Track it here.
 ii) DTLZ: I haven't started this yet, but it would be a nice
addition.

d) Miscellaneous: Needless to say, I'll be making PRs and fixing bugs that
are related to my main PR.
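
As a rough illustration of what a ZDT problem from item (c) looks like, here
is a minimal standalone C++ sketch of ZDT1 as it is usually defined:
f1(x) = x_1, g(x) = 1 + 9 / (n - 1) * sum_{i=2..n} x_i, and
f2(x) = g(x) * (1 - sqrt(f1(x) / g(x))), with every x_i in [0, 1]. The
function name `Zdt1` and its signature are made up for this sketch; it is not
the code referenced above and does not use the ensmallen API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper for illustration only: evaluates both ZDT1 objectives
// and returns them as {f1, f2}.
// Assumes x has at least two coordinates, each in [0, 1].
std::pair<double, double> Zdt1(const std::vector<double>& x)
{
  assert(x.size() >= 2);

  const double f1 = x[0];

  // g(x) = 1 + 9 / (n - 1) * sum_{i = 2..n} x_i
  double tail = 0.0;
  for (std::size_t i = 1; i < x.size(); ++i)
    tail += x[i];
  const double g = 1.0 + 9.0 * tail / static_cast<double>(x.size() - 1);

  // f2(x) = g(x) * (1 - sqrt(f1(x) / g(x)))
  const double f2 = g * (1.0 - std::sqrt(f1 / g));

  return {f1, f2};
}
```

On the Pareto-optimal front all of x_2..x_n are zero, so g = 1 and
f2 = 1 - sqrt(f1); for example, Zdt1({0.25, 0.0, 0.0}) gives f1 = 0.25 and
f2 = 0.5. A test suite could check an optimizer's final front against that
curve.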

Potential Mentors: Anyone, really :). I've intentionally kept the scope
short and arranged it so that I can work 100% independently. But it'd be nice
to be able to discuss my ideas (although I'm sure the IRC channel will
welcome that with open arms).

Best
NGS
___
mlpack mailing list
mlpack@lists.mlpack.org
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack


Re: [mlpack] GSoC-2021

2021-03-18 Thread Omar Shrit
Hello Gopi,

Sorry, I forgot to answer you earlier (I have been a little busy). Thanks
for the ping.

On 03/18, Gopi Manohar Tatiraju wrote:
> Hey Omar,
> 
> Thank you so much for such detailed replies and inputs.
> 
> I see that there are many things that we need to get in order before we
> start working on this idea.
> I think we should start from this point:
> 
> > 1) It would be nicer to have mlpack as header-only by moving these
> > implementations into header files.
> 
> 
> I went through the mlpack/core/data/ directory. The .cpp files in
> core/data are:
> 
>    1. detect_file_type.cpp
>    2. load.cpp
>    3. load_csv.cpp
>    4. load_image.cpp
>    5. save_image.cpp
> 
> The convention we follow in mlpack is having two .hpp files, one for
> declarations and one for implementations.

The convention applies to classes that are defined as class templates.
I do not think it is possible to convert an implementation file to
`_impl.hpp` if the class is not a class template.

Therefore, you can put the entire implementation in the header where
the class is defined. There are several disadvantages to putting everything
in one place, for instance increased compilation time and binary footprint.

It would be nice to have a small prototype and a benchmark to see how
much the compilation time increases, since we are trying to
reduce these numbers.

I have not looked into the code yet, but I do not see anything that might
prevent us from having everything in the headers.

Also, do not worry about details such as conventions; these can be settled
later, especially if a prototype comes back with good results.
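
To make the header-only point concrete: a non-template function can live
entirely in a header as long as its definition is marked `inline`, which lets
every translation unit that includes the header carry the same definition
without a multiple-definition error at link time. The sketch below is a
hypothetical header; the file name, the `sketch` namespace, the `FileType`
enum, and `GuessFileType` are invented for illustration and are not mlpack's
actual detect_file_type code.

```cpp
// file_type_sketch.hpp (hypothetical file name, for illustration only)
#ifndef FILE_TYPE_SKETCH_HPP
#define FILE_TYPE_SKETCH_HPP

#include <cstddef>
#include <string>

namespace sketch {

enum class FileType { Unknown, CSV, Text, Binary };

// `inline` is what makes it legal to define this non-template function in a
// header that is included from several .cpp files.
inline FileType GuessFileType(const std::string& filename)
{
  // Take everything after the last '.' as the extension.
  const std::size_t dot = filename.rfind('.');
  if (dot == std::string::npos)
    return FileType::Unknown;

  const std::string ext = filename.substr(dot + 1);
  if (ext == "csv") return FileType::CSV;
  if (ext == "txt") return FileType::Text;
  if (ext == "bin") return FileType::Binary;
  return FileType::Unknown;
}

} // namespace sketch

#endif
```

Timing a full rebuild before and after moving one .cpp file into a header in
this style would give the small compile-time benchmark mentioned above.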

> I just want to clarify some things so that I can start writing some code
> regarding this:
> 
>    - Let's consider detect_file_type, we have for now:
>       - detect_file_type.hpp
>       - detect_file_type.cpp
> 
> What we want in the future:
> 
>    - detect_file_type.hpp
>    - detect_file_type_impl.hpp
> 
> Can you just give me one example or reference or just a list of points that
> we need to consider while restructuring these files? What will be the major
> differences after converting them to .hpp files?
> 
> 
> It feels great getting feedback on my idea and I get to learn so many new
> things on the way.
> 
> Thank you,
> Gopi.
> 
> 
> On Sat, Mar 13, 2021 at 4:12 AM Omar Shrit  wrote:
> 
> > Hello Gopi,
> >
> > The data frame class project is indeed a good idea, we have thought
> > about that, but as Ryan said, it can be a big project for GSoC given
> > the limited period of time this year.
> >
> > I have several ideas to add on what Ryan said. The objective is to make
> > the project lighter and more fit for a GSoC.
> >
> > Knowing that, the data load/save part of mlpack core is the only part
> > that has implementation files (.cpp), while all methods of mlpack are
> > header-only. Therefore:
> >
> > 1) It would be nicer to have mlpack as header-only by moving these
> > implementations into header files.
> >
> > 2) I would avoid re-implementing things that have already been implemented,
> > especially since these parts of the code (loading, saving, matrix
> > manipulation, and conversion) need a lot of
> > optimization, which requires years of work to produce something feasible.
> > However, the Xtensor library seems to be similar to Pandas, providing
> > what is needed, in C++, with good performance.
> >
> > 3) Xtensor integration can be realized by adding an mlpack wrapper
> > (a small, light wrapper) for Xtensor functionality. This wrapper can be
> > integrated into the mlpack source code, or can be kept separate (like
> > ensmallen) so that it is added only when needed; that way we only link
> > with the libraries that we use (avoiding dependencies).
> >
> > The above steps will require more than one GSoC to
> > complete, but they can be done independently. You can choose what you
> > find the most suitable and build a proposal upon it, decoupling the
> > tasks as much as possible in order to maximize the
> > feasibility of the project.
> >
> > I hope you find this helpful !
> >
> > Thanks,
> >
> > Omar
> >
> >
> > On 03/13, Gopi Manohar Tatiraju wrote:
> > > Hey Ryan,
> > >
> > > Thanks for the feedback.
> > > I agree that this can be a very big project considering the time span of
> > > GSoC this year, if we decide to go ahead with this project it will be
> > > very important to decide on some base features as you already pointed out.
> > >
> > > how will users use this dataframe?
> > >
> > >
> > > We should do it in the same way as we do DatasetInfo, this will keep it
> > > separate from the dataset(arma::mat) so that we don't need to change how
> > > we pass data to the agent.
> > > We will create an object of class mlFrame and pass that 

Re: [mlpack] GSoC-2021

2021-03-18 Thread Gopi Manohar Tatiraju
Hey Omar,

Thank you so much for such detailed replies and inputs.

I see that there are many things that we need to get in order before we
start working on this idea.
I think we should start from this point:

> 1) It would be nicer to have mlpack as header-only by moving these
> implementations into header files.


I went through the mlpack/core/data/ directory. The .cpp files in
core/data are:

   1. detect_file_type.cpp
   2. load.cpp
   3. load_csv.cpp
   4. load_image.cpp
   5. save_image.cpp

The convention we follow in mlpack is having two .hpp files, one for
declarations and one for implementations.

I just want to clarify some things so that I can start writing some code
regarding this:

   - Let's consider detect_file_type, we have for now:
      - detect_file_type.hpp
      - detect_file_type.cpp

What we want in the future:

   - detect_file_type.hpp
   - detect_file_type_impl.hpp

Can you just give me one example or reference or just a list of points that
we need to consider while restructuring these files? What will be the major
differences after converting them to .hpp files?


It feels great getting feedback on my idea and I get to learn so many new
things on the way.

Thank you,
Gopi.


On Sat, Mar 13, 2021 at 4:12 AM Omar Shrit  wrote:

> Hello Gopi,
>
> The data frame class project is indeed a good idea, we have thought
> about that, but as Ryan said, it can be a big project for GSoC given
> the limited period of time this year.
>
> I have several ideas to add on what Ryan said. The objective is to make
> the project lighter and more fit for a GSoC.
>
> Knowing that, the data load/save part of mlpack core is the only part
> that has implementation files (.cpp), while all methods of mlpack are
> header-only. Therefore:
>
> 1) It would be nicer to have mlpack as header-only by moving these
> implementations into header files.
>
> 2) I would avoid re-implementing things that have already been implemented,
> especially since these parts of the code (loading, saving, matrix
> manipulation, and conversion) need a lot of
> optimization, which requires years of work to produce something feasible.
> However, the Xtensor library seems to be similar to Pandas, providing
> what is needed, in C++, with good performance.
>
> 3) Xtensor integration can be realized by adding an mlpack wrapper
> (a small, light wrapper) for Xtensor functionality. This wrapper can be
> integrated into the mlpack source code, or can be kept separate (like
> ensmallen) so that it is added only when needed; that way we only link
> with the libraries that we use (avoiding dependencies).
>
> The above steps will require more than one GSoC to
> complete, but they can be done independently. You can choose what you
> find the most suitable and build a proposal upon it, decoupling the
> tasks as much as possible in order to maximize the
> feasibility of the project.
>
> I hope you find this helpful !
>
> Thanks,
>
> Omar
>
>
> On 03/13, Gopi Manohar Tatiraju wrote:
> > Hey Ryan,
> >
> > Thanks for the feedback.
> > I agree that this can be a very big project considering the time span of
> > GSoC this year, if we decide to go ahead with this project it will be
> > very important to decide on some base features as you already pointed out.
> >
> > how will users use this dataframe?
> >
> >
> > We should do it in the same way as we do DatasetInfo, this will keep it
> > separate from the dataset(arma::mat) so that we don't need to change how
> > we pass data to the agent.
> > We will create an object of class mlFrame and pass that to the load
> > function. But we have to make sure that we don't end up making another
> > copy of the dataset here as well, might use a bit of help here to create
> > the skeleton of the class.
> >
> >  How will the dataframe integrate with mlpack's existing methods?
> >
> >
> > If we could follow the way I mentioned above we won't need to change any
> > existing implementations to access or use the data.
> >
> > Let's discuss point by point, let me know what you think about the
> > above-mentioned way to implement it or if I need to clear anything more
> > regarding this, I will address other questions soon as we get a basic
> > idea of the project.
> >
> > Regarding the image, there is an ImageInfo class we can extend its
> > functionality to work on a directory of images, but I have not yet
> > figured out if we need a way to display the methods, I mean the info
> > regarding the images should be fine right?
> >
> > Also, I was thinking of adding some stats to DatasetInfo class, methods
> > to show the numerical summary of the dataset which can include mean, std,
> > min, max, etc. These are the same methods that I suggested to implement in
> > this PR 

Re: [mlpack] GSOC 21

2021-03-18 Thread Marcus Edel
Hello Oleksandr,

I'm not sure how feasible it is to build the complete mlpack lib using the
web assembly stack, so I would take a look into that first. mlpack has a
few dependencies like armadillo and some parts of boost that I think have
to be integrated into the wasm build pipeline. I guess as a starting point
you can see if you can build armadillo and run a simple test.

About the PoC of bringing ES to RL, I like the idea, and I think that could
be an interesting starting point as well. It would be interesting to see if
CMA-ES can find a solution in a reasonable time.
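
To give a feel for the kind of loop such a proof of concept would run, here is
a minimal sketch of a simple elitist evolution strategy in C++. It is
deliberately not CMA-ES (there is no covariance or step-size adaptation) and
it does not use ensmallen or gym_tcp_api; `SimpleES`, its parameters, and the
toy quadratic `ToyLoss` standing in for an episode return are all assumptions
made for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Hypothetical sketch of a (1 + lambda) evolution strategy: each generation
// samples `lambda` Gaussian perturbations of the current weights and keeps
// the best candidate only if it improves on the current loss (elitism).
std::vector<double> SimpleES(
    const std::function<double(const std::vector<double>&)>& loss,
    std::vector<double> weights,
    const std::size_t generations,
    const std::size_t lambda,
    const double sigma,
    const unsigned seed)
{
  std::mt19937 rng(seed);
  std::normal_distribution<double> noise(0.0, sigma);

  double best = loss(weights);
  for (std::size_t g = 0; g < generations; ++g)
  {
    std::vector<double> bestChild = weights;
    double bestChildLoss = best;

    for (std::size_t k = 0; k < lambda; ++k)
    {
      // Perturb every weight of the parent with independent Gaussian noise.
      std::vector<double> child = weights;
      for (double& w : child)
        w += noise(rng);

      const double childLoss = loss(child);
      if (childLoss < bestChildLoss)
      {
        bestChildLoss = childLoss;
        bestChild = child;
      }
    }

    // The parent survives unless some child was strictly better.
    weights = bestChild;
    best = bestChildLoss;
  }
  return weights;
}

// Toy stand-in for "negative episode return": squared distance to (1, -2).
double ToyLoss(const std::vector<double>& w)
{
  const double a = w[0] - 1.0;
  const double b = w[1] + 2.0;
  return a * a + b * b;
}
```

In the RL setting, `loss` would instead run one or more episodes with the
candidate network weights and return the negative reward; a call such as
SimpleES(ToyLoss, {0.0, 0.0}, 50, 8, 0.3, 42) only exercises the loop itself.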

Thanks,
Marcus

> On 17. Mar 2021, at 14:45, Oleksandr Nikolskyy  wrote:
> 
> Hello Marcus and Omar
> 
> Thanks for the fast response.
> 
> I think WebAssembly would be interesting for me.
> I read that C++ can be compiled to WebAssembly. Do you have some proposed
> vectors of attack or nice-to-haves for a potential mlpack-web edition? Just
> so that I have a starting point for brainstorming.
> Regarding evolution strategies applied to RL:
> The proof of concept of bringing ES to RL that I thought of is training a net
> in an environment from OpenAI Gym, using @Zoq's gym_tcp_api and CMA-ES to
> optimize the weights. If you think this is too weak or something else would
> be a better warm-up, please let me know.
> 
> I will try to finish it by the end of the week and push it to a repo on GitHub.
> 
> Best
> 
> Oleksandr
> 
> Marcus Edel <marcus.e...@fu-berlin.de> wrote on Mon, 15 Mar 2021, 16:34:
> Hello Oleksandr,
> 
> right now, the implementation we have in ensmallen is not directly applicable
> to the RL code that is in the mlpack repository; that said, it would be an
> interesting project to combine the two. In combination with an extension of
> the existing CMA-ES implementation, it could make a neat project.
> 
> About the Rust bindings, I agree with Omar that they would be nice to have;
> since C++ can be used from within Rust, it might be a tangible project for
> this GSoC. About the JS bindings, I am wondering if WebAssembly might be a
> better way to bring mlpack to the web. What do you think?
> 
> I'm not sure about the Graph NN idea; supporting Graph NNs would come with a
> completely new representation of the network structure we currently support,
> so I'm not sure we would have enough time to implement a solid solution by
> the end of the summer.
> 
> I hope something I said was helpful; let us know if you have any further
> questions.
> 
> Thanks,
> Marcus
> 
>> On 15. Mar 2021, at 05:59, Omar Shrit <o...@shrit.me> wrote:
>> 
>> Hello Oleksandr,
>> 
>> Thank you for your interest in mlpack.
>> 
>> Evolutionary algorithms are welcome and can be a good project; we
>> already have several algorithms in ensmallen, such as PSO and CMA-ES.
>> I do not think that moving CMA-ES to mlpack would be a good idea. The
>> objective of ensmallen is to have all optimization methods in one place,
>> knowing that ensmallen was already part of mlpack.
>> 
>> Rust bindings are a good idea too, as is GNN. However, none of these ideas is
>> related to one another, which will make it very hard to create a solid
>> and consistent GSoC project, knowing that GSoC this year is shorter.
>> Therefore, I would concentrate on one idea and build a proposal on it,
>> and try to have some proof of concept in order to make the proposal more
>> convincing.
>> 
>> Hope you find this helpful.
>> 
>> Thanks,
>> 
>> Omar
>> 
>> On 03/15, Oleksandr Nikolskyy wrote:
>>> Hi, I am Oleksandr, a CS Master's student from Bonn, Germany.
>>> 
>>> I was reading about CMA-ES and its extensions (which is one of the GSoC
>>> ideas) and it is really interesting. I also found some additional sources
>>> about evolutionary algorithms in general, e.g.
>>> https://openai.com/blog/evolution-strategies/ 
>>> 
>>> https://blog.otoro.net/2017/10/29/visual-evolution-strategies/ 
>>> 
>>> It also sounds like the results of evolutionary algorithms can be used for
>>> some RL problems. I would be interested in working on a proposal for GSoC,
>>> if this topic is still free. Currently, CMA-ES lives in the ensmallen
>>> library. If working on this topic, is it a good idea to work towards the
>>> implementation of CMA-ES enhancements in the mlpack package? For example, to
>>> enable bindings?
>>> 
>>> 
>>> Also, I had some other ideas:
>>> 
>>> 
>>>   1. Create Rust bindings
>>>   2. Start mlpack.js, as a ready to use node package
>>>   3. Add explicit support for graph neural networks to the ann module
>>>   along with the core module.
>>> 
>>> Would be happy about your feedback! :) In the meantime, I will continue my
>>> research on the realizability of these ideas.
>> 

Re: [mlpack] Gsoc proposal discussion

2021-03-18 Thread Marcus Edel
Hello Abhinav,

the scope of the project looks reasonable to me; if you like you can add one
or two layers to the proposal as potential work that can be done if there is
time left at the end.
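
As a rough illustration of one of the layers listed in the proposal quoted
below: channel shuffle (as used in ShuffleNet) only permutes channels. The C
channels are viewed as a grid of `groups` rows by C / `groups` columns, the
grid is transposed, and the result is flattened again. The sketch below
computes that permutation on channel indices in plain C++;
`ChannelShuffleOrder` is a hypothetical helper written for this example, not
an mlpack layer API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns, for each output channel, the index of the input channel it takes.
// Assumes `channels` is divisible by `groups`.
std::vector<std::size_t> ChannelShuffleOrder(const std::size_t channels,
                                             const std::size_t groups)
{
  assert(groups > 0 && channels % groups == 0);
  const std::size_t perGroup = channels / groups;

  std::vector<std::size_t> order(channels);
  for (std::size_t g = 0; g < groups; ++g)
  {
    for (std::size_t i = 0; i < perGroup; ++i)
    {
      // Input channel (g, i) lands at position (i, g) after the transpose.
      order[i * groups + g] = g * perGroup + i;
    }
  }
  return order;
}
```

For 6 channels in 2 groups the resulting order is 0, 3, 1, 4, 2, 5, so
channels from the two groups end up interleaved. A real layer would apply this
order to the feature maps in the forward pass and the inverse permutation to
the gradients in the backward pass.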

Also in case you haven't seen it we have an application guide:

https://github.com/mlpack/mlpack/wiki/Google-Summer-of-Code-Application-Guide

that could be helpful. That said, once the GSoC application submission
platform is open, you can submit drafts, and we can, if time permits, provide
feedback.

Thanks,
Marcus

> On 17. Mar 2021, at 12:42, Abhinav Anand  wrote:
> 
> Hi, I am Abhinav from India. I have been contributing to mlpack for
> quite some time. I am interested in applying for GSoC this year with mlpack.
> I have discussed my idea on Slack and received positive feedback with some
> useful suggestions. Keeping in mind this year's shortened GSoC commitment, I
> have reduced my proposal to the three layers below:
> 1. Upsample Layer
> 2. Group Normalization
> 3. Channel Shuffle
> If you believe that this might be too little work for GSoC, let me know. I
> have a couple more good layers that can be included in the proposal.
> I have attached the first draft of my proposal; please let me know what you
> think. Feel free to make any suggestions.
> 
> Best Regards,
> Abhinav 
> 