[mlpack] GSoC 2018

2018-03-21 Thread Mohan Bhambhani
Hi,

I am Mohan, a final year Dual degree (B.Tech. + M.Tech.) student at
Indian Institute of Technology Madras. I came across some interesting GSoC
project ideas and would like to work on one of them.

I am most interested in the 'Variational Autoencoders' project. I am very
familiar with variational autoencoders, but I am not sure what the
intended goal of the project is. Are we trying to set up a framework for
easy implementation of VAEs and their extensions, and demonstrate it by
implementing VAEs and some of their extensions or applications? Or are we
looking at just a VAE implementation?

I hope I am not too late. I would appreciate your feedback so that I can
frame my proposal in the short time available.

Thank you,
Mohan

-- 
Mohan Bhambhani,
Final year DD,
Dept. of CSE,
IIT Madras.
___
mlpack mailing list
mlpack@lists.mlpack.org
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack

Re: [mlpack] Profiling for parallelization

2018-03-21 Thread Ryan Curtin
On Wed, Mar 21, 2018 at 12:20:05AM +0530, Nikhil Goel wrote:
> Hi Ryan
> 
> Thank you for your help. I've submitted the draft of my proposal and it
> would be really helpful if you could review it and tell me the changes I
> should make.
> My main concerns regarding my proposal are:
> 1) The number of algorithms/functions I've chosen. I'm trying to research
> more, but it would be really helpful if you could share your thoughts on
> the number of algorithms I've chosen.
> 2) I looked into logistic regression, and it uses SGD and L-BFGS.
> Parallel SGD has been implemented in mlpack, but I'm unsure if that will
> actually provide a significant speedup as the parallelization is already
> there at low levels. Do you think it is worth investing my time in?
> Should I mention it in my GSoC proposal?
> 3) A similar problem for naive Bayes. I've figured out the for loops
> that should be parallelized, but the papers I followed showed no
> significant performance improvement in parallel naive Bayes. Should I
> mention this in my proposal?
> 4) How much change is permitted before I should make another file for
> a parallel implementation of the algorithm?
> 5) I've dropped the idea of providing an API since you're right, it will
> be better for the user to learn OpenMP as it's pretty well known.
> 6) I've added bagging to my proposal, so I'll implement and parallelize it.
> I hope that's fine.

Hi Nikhil,

Thanks for the update.  I don't know how quickly or slowly you work, so
I can't provide much input on how many algorithms you should do---this
part is up to you.

Personally I think that since parallel SGD is already implemented, it's
not necessary to focus on it.  It's hard to say how much change we
should do before making another file.  I would say, if we can apply
OpenMP directly to an algorithm in a way that does not fundamentally
change the algorithm, there is no reason to have a second parallel
implementation.

Thanks,

Ryan

-- 
Ryan Curtin| "Lots of respectable people have been hit by
r...@ratml.org | trains."  - Penny

Re: [mlpack] Fix MVU+LRSDP in GSoC 2018

2018-03-21 Thread Ryan Curtin
On Wed, Mar 21, 2018 at 04:43:56PM +0800, kaiqiang Xu wrote:
> Dear Ryan,
> 
> 
> I am Xu Kaiqiang, a first-year master's student majoring in computer
> science at the Univ. of Chinese Academy of Sciences.
> I am excited to have found the MVU + LRSDP problem, and I would like to
> fix it using convex analysis, fluent C++ skills, and strong interest.
> The courses I have taken, such as convex analysis and machine learning,
> and my experience in development projects may help me figure it out.
> 
> I have read the tutorials and tried several mlpack functions by
> running some machine learning samples over the past few days.
> After roughly reading the papers and carefully reading your replies
> in the mlpack mail archives, I plan to check the correctness of MVU with
> SDP, and then try to understand it.
> Moreover, I plan to understand SDP/LRSDP. I think it may be helpful for my
> proposal.
> 
> Sorry to join the project so late. Can you give me some advice about
> planning and applying for GSoC this year?

Hi there Kaiqiang,

Don't worry, it is not too late to join---the application deadline has
not passed yet.  Note that LRSDP is actually nonconvex optimization, not
convex optimization, so some of the tools for debugging will not
translate well.

I agree that it would be very important to fully understand SDPs and
LRSDPs before attempting this project.

There is an application guide that may be helpful:

https://github.com/mlpack/mlpack/wiki/Google-Summer-of-Code-Application-Guide

Also, there is an unusual amount of interest in the MVU project this
year.  I just answered another email about it here:

http://knife.lugatgt.org/pipermail/mlpack/2018-March/003697.html

Maybe that response will be helpful to you.

Thanks,

Ryan

-- 
Ryan Curtin| "It is very cold... in space."
r...@ratml.org |   - Khan

Re: [mlpack] GSoC 2018 MVU+LRSDP Bugfix

2018-03-21 Thread Ryan Curtin
On Tue, Mar 20, 2018 at 03:22:04AM +, Abhijeet Krishnan wrote:
> Hi,
> 
> I am interested in working for mlpack for GSoC 2018, with an interest in
> trying to get the MVU + LRSDP implementation working. I have built mlpack
> and modified it (with help from zoq) to also build the MVU part of the
> code. I am currently attempting to re-write the MVU code to match the
> current APIs, since there were a number of compilation errors which I
> believe were due to the LRSDP APIs being updated without also updating the
> MVU code.
> 
> In terms of debugging the error, I have thought of the following approach -
> 1. Test the MVU code using the primal dual optimizer to verify if the error
> is in MVU or in LRSDP
> 2a. If MVU+Primal Dual converges, the error is most likely in LRSDP
> 2b. If MVU+Primal Dual also does not converge, the error is most likely in
> MVU (might also be in Primal Dual)
> 3. Wherever the error is, my plan is to be clear about how that algorithm
> works (from reading the research papers) and to test the values the
> function returns against some working implementation of it in another
> package
> 4. If the error is in LRSDP, I don't think there is a reference
> implementation. I will need to really dig into the code and algorithm to
> find the error. I guess this is exactly why the issue is marked 10/10 in
> difficulty.
> 
> I would like to know what approaches and efforts have been made previously
> to solve this problem, in particular, if anyone has any intuition as to
> where the problem might lie and what would be a good approach to solve it.
> 
> If anyone has any tips to share on debugging numerical algorithms, that
> would also be welcome.
> 
> I am currently pursuing a PhD in CS from NC State University with plans of
> conducting research in Generative Methods.

Hi Abhijeet,

Thanks for getting in touch.  This is a good start for a plan but I want
to emphasize the difficulty here.  A good knowledge of nonconvex
optimization and semidefinite programs will be involved, and in a good
proposal I think it would be important to include some number of
strategies for debugging optimizers that aren't converging.

There is an additional paper or tech note by Sam Burer (I can't remember
the title) that points out that saddle points do exist in LRSDP, so it
is possible (although I don't think this is the case) that LRSDP is
getting stuck in saddle points and not converging for MVU.
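
As a rough illustration of how one might probe for a saddle point where an
optimizer has stalled, the sketch below samples second-order directional
curvature with central finite differences. If the gradient is close to zero
but some direction shows clearly negative curvature, the point is not a local
minimum. This is plain Python, not mlpack code: the helper names
`directional_curvature` and `min_sampled_curvature` are hypothetical, and the
objectives are toy functions rather than the actual LRSDP objective.

```python
import math
import random


def directional_curvature(f, x, d, h=1e-4):
    """Central finite-difference estimate of d^T H d at x (d is a unit vector)."""
    x_plus = [xi + h * di for xi, di in zip(x, d)]
    x_minus = [xi - h * di for xi, di in zip(x, d)]
    return (f(x_plus) - 2.0 * f(x) + f(x_minus)) / (h * h)


def min_sampled_curvature(f, x, trials=50, seed=0):
    """Smallest curvature over the coordinate axes plus random unit directions."""
    rng = random.Random(seed)
    n = len(x)
    directions = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(trials):
        d = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in d))
        directions.append([v / norm for v in d])
    return min(directional_curvature(f, x, d) for d in directions)


# Toy objectives (assumptions for illustration only).
def saddle(v):
    return v[0] ** 2 - v[1] ** 2  # stationary at the origin, but a saddle


def bowl(v):
    return v[0] ** 2 + v[1] ** 2  # stationary at the origin, a true minimum


if __name__ == "__main__":
    print("saddle:", min_sampled_curvature(saddle, [0.0, 0.0]))
    print("bowl:  ", min_sampled_curvature(bowl, [0.0, 0.0]))
```

Sampling directions can only prove that negative curvature exists; finding
none does not rule out a saddle, so this is a cheap first check rather than
a certificate.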

In addition, it's likely to be worthwhile to confirm that the MVU
solution matrix can even be considered to be low-rank---if it can't,
then it's not reasonable to ever expect the algorithm to converge to a
good solution.
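
One possible way to run that low-rank check, given a solution matrix from a
solver that is trusted (for example a full SDP solver on a tiny dataset), is
to estimate its numerical rank. The sketch below uses a diagonally pivoted
Cholesky sweep, which applies to symmetric positive semidefinite matrices.
It is plain Python on nested lists, not mlpack code; `numerical_rank_psd`,
`gram`, and the tolerance are assumptions chosen for illustration.

```python
def gram(points):
    """Gram matrix K = X X^T for a list of points (one point per row)."""
    return [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]


def numerical_rank_psd(K, rel_tol=1e-8):
    """Numerical rank of a symmetric PSD matrix via pivoted Cholesky.

    Each step removes the rank-one piece associated with the largest
    remaining diagonal entry; the sweep stops once that entry is
    negligible relative to the trace of the original matrix.
    """
    n = len(K)
    residual = [row[:] for row in K]
    trace = sum(K[i][i] for i in range(n))
    if trace <= 0.0:
        return 0
    rank = 0
    for _ in range(n):
        p = max(range(n), key=lambda i: residual[i][i])
        pivot = residual[p][p]
        if pivot <= rel_tol * trace:
            break
        column = [residual[i][p] for i in range(n)]
        for i in range(n):
            for j in range(n):
                residual[i][j] -= column[i] * column[j] / pivot
        rank += 1
    return rank


if __name__ == "__main__":
    # Three points that all lie on one line through the origin: rank 1.
    line_points = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
    print(numerical_rank_psd(gram(line_points)))
```

If the rank reported for a trusted solution is close to the number of
points, a low-rank factorization with a small rank has little chance of
representing it well.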

I hope these pointers are helpful.

Thanks,

Ryan

-- 
Ryan Curtin| "You know, I think he's got a point."
r...@ratml.org |   - The Mayor

Re: [mlpack] MVU Bug Fix GSOC

2018-03-21 Thread LI Xuran
Hi Ryan,

Thanks a lot for the advice, and sorry for the late reply. I spent the past 
two days going through most of the papers, and it really helped. Apart from 
the previous ideas:

1. Generate a simple random dataset and run standard MVU on it using mlpack's 
primal-dual solver, then compare with the result of MVU + LRSDP.
2. Write unit tests and substitutes for the original code in LRSDP, and check 
their correctness on the above datasets.
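
A solver-independent check for such a comparison could look roughly like the
sketch below. It generates a small random dataset, collects k-nearest-neighbor
pairs, and measures how badly a candidate Gram matrix K violates the MVU
local-distance constraints K_ii + K_jj - 2 K_ij = ||x_i - x_j||^2. The same
function can then score the output of either solver. This is plain Python, not
mlpack code; all function names and the dataset sizes are assumptions made for
illustration.

```python
import random


def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def random_points(n, dim, seed=0):
    """Small reproducible dataset with coordinates in [-1, 1]."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]


def knn_pairs(points, k):
    """Unordered index pairs (i, j) where j is among the k nearest neighbors of i."""
    pairs = set()
    for i, p in enumerate(points):
        ranked = sorted((sqdist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in ranked[:k]:
            pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


def gram(points):
    """Gram matrix of the points; it preserves every pairwise distance."""
    return [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]


def max_distance_violation(K, points, pairs):
    """Largest absolute violation of the local-distance constraints."""
    worst = 0.0
    for i, j in pairs:
        target = sqdist(points[i], points[j])
        achieved = K[i][i] + K[j][j] - 2.0 * K[i][j]
        worst = max(worst, abs(achieved - target))
    return worst


if __name__ == "__main__":
    pts = random_points(12, 3)
    pairs = knn_pairs(pts, 3)
    # The Gram matrix of the input itself is feasible, so this is near zero.
    print(max_distance_violation(gram(pts), pts, pairs))
```

A feasible output from either solver should give a violation near zero; a
large value points at the solver or at the constraint setup rather than at
the dataset.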

I also have some new ideas coming up:

1. Multiple papers mention that the choice of parameters such as the 
penalty, the number of nearest neighbors (kNN), and the learning rate is 
critical for the convergence of the algorithm. Thus, I think it would be 
helpful either to implement a dynamic parameter adjustment algorithm or to 
inspect the variables manually in the cases where the algorithm does not 
converge.
2. The papers mention several propositions and characteristics that a 
functioning LRSDP should satisfy (e.g. nonzero duality gap and XS == 0 for 
an optimal solution). I am making a list of these properties, and the goal 
is to implement tests that check whether these properties hold in each 
iteration.
3. A less recommended alternative would be to change the implementation of 
LRSDP, for example to change the optimization criterion to search for the 
maximum distance of the furthest neighbor, or to integrate a dual approach 
into the original code. The attraction of this is that it would provide a 
theoretical guarantee that the algorithm converges to an optimal result 
(whereas with LRSDP alone we only get optimal results in most cases), but I 
don't quite like it because it might bring side effects and hurt runtime. 
Also, I think it is not guaranteed to work, as it was only proposed as an 
alternative in the papers. So I am currently thinking of taking this 
approach as a last resort if no bug is found in the previous steps.
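
As a rough idea of what one such per-iteration property test could look like,
the sketch below measures how far the product XS is from zero, assuming X is
the primal matrix (built here as R R^T from a low-rank factor R) and S is the
dual slack matrix in the usual SDP notation. It is plain Python on nested
lists, not mlpack code; the function names and the scaling of the residual are
assumptions made for illustration.

```python
import math


def matmul(A, B):
    """Dense matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_norm(M):
    return math.sqrt(sum(v * v for row in M for v in row))


def primal_from_factor(R):
    """X = R R^T, the primal matrix implied by a low-rank factor R."""
    return matmul(R, transpose(R))


def slackness_residual(X, S):
    """Scaled Frobenius norm of X S; zero when XS == 0 holds exactly."""
    scale = max(frobenius_norm(X) * frobenius_norm(S), 1e-300)
    return frobenius_norm(matmul(X, S)) / scale


if __name__ == "__main__":
    R = [[1.0], [0.0]]                # rank-one factor (toy values)
    S = [[0.0, 0.0], [0.0, 2.0]]      # slack supported on the complement
    print(slackness_residual(primal_from_factor(R), S))
```

Logging this residual at every iteration would show whether it shrinks toward
zero as the optimizer progresses or stalls at some positive value.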

That is all I have for now. Do you think any of it is worth a try? I will 
read the rest of the PDF tomorrow and change my proposal according to your 
suggestions.

Best Wishes,
Daniel Li



From: LI Xuran
Sent: 18 March 2018 20:15:57
To: mlpack@lists.mlpack.org
Subject: Re: MVU Bug Fix GSOC


Dear Ryan,


I am currently working on my proposal for the "Fixes to MVU and low-rank 
semidefinite programs" project and have come up with the following ideas:
1. Generate a simple random dataset and compare standard MVU with MVU + LRSDP 
on it. Visualize the procedure and the result in 2D/3D.
2. Write unit tests and substitutes for the original code in mlpack's MVU 
implementation, and check their correctness on the above datasets.
3. Based on the results of 1 and 2, create datasets that specifically 
expose the issue ... and check step by step on that sample.
4. (Or maybe datasets with a special property such that an implementation 
of MVU + LRSDP should always converge on them, and check whether the 
expected result is met.)

Do you think any of the above ideas are worth a try?

Thanks!

Daniel Li



From: LI Xuran
Sent: 17 March 2018 17:47:09
To: mlpack@lists.mlpack.org
Subject: MVU Bug Fix GSOC


Hello Ryan,

I am Daniel Li, a second-year student studying Artificial at the University of 
Edinburgh. I write fluent C++ code and am interested in taking up the quest to 
fix bugs regarding MVU and semidefinite programming in mlpack. I've read about 
scalable semidefinite manifold learning and other articles, and have set up 
mlpack on my own computer. Could you give me some advice on where to start my 
research on the project as I familiarize myself with the code base? Also, is 
it a good idea to implement MVU with a dual-tree algorithm to compare with the 
current version of MVU using LRSDP?

Thanks!

Daniel Li
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

[mlpack] Fix MVU+LRSDP in GSoC 2018

2018-03-21 Thread kaiqiang Xu
Dear Ryan,


I am Xu Kaiqiang, a first-year master's student majoring in computer
science at the Univ. of Chinese Academy of Sciences.
I am excited to have found the MVU + LRSDP problem, and I would like to
fix it using convex analysis, fluent C++ skills, and strong interest.
The courses I have taken, such as convex analysis and machine learning,
and my experience in development projects may help me figure it out.

I have read the tutorials and tried several mlpack functions by
running some machine learning samples over the past few days.
After roughly reading the papers and carefully reading your replies
in the mlpack mail archives, I plan to check the correctness of MVU with
SDP, and then try to understand it.
Moreover, I plan to understand SDP/LRSDP. I think it may be helpful for my
proposal.

Sorry to join the project so late. Can you give me some advice about
planning and applying for GSoC this year?

Best,
Kaiqiang