[mlpack] GSoC 2018
Hi,

I am Mohan, a final-year dual degree (B.Tech. + M.Tech.) student at the Indian Institute of Technology Madras. I came across some interesting GSoC project ideas and would like to work on one of them. I am most interested in the 'Variational Autoencoders' project, and I am already familiar with variational autoencoders (VAEs).

I am not sure what the intended goal of the project is. Are we trying to set up a framework that makes VAEs and their extensions easy to implement, and to demonstrate it by implementing VAEs and some of their extensions or applications? Or are we looking only at a VAE implementation?

I hope I am not too late. I would appreciate your feedback so that I can frame my proposal in the short time available.

Thank you,
Mohan

--
Mohan Bhambhani,
Final year DD,
Dept. of CSE,
IIT Madras.
___
mlpack mailing list
mlpack@lists.mlpack.org
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
Re: [mlpack] Profiling for parallelization
On Wed, Mar 21, 2018 at 12:20:05AM +0530, Nikhil Goel wrote:
> Hi Ryan
>
> Thank you for your help. I've submitted the draft of my proposal and it
> would be really helpful if you could review it and tell me the changes I
> should make.
> My main concerns regarding my proposal are -
> 1) The number of algorithms/functions I've chosen. I'm trying to research
> more, but if you can share your thoughts on the number of algorithms I've
> chosen, it would be really helpful.
> 2) I looked into logistic regression, and it is using SGD and L-BFGS.
> Parallel SGD has been implemented in mlpack, but I'm unsure if that will
> actually provide a significant speedup, as the parallelization is already
> there at low levels. Do you think it will be worth investing my time in?
> Should I mention it in my GSoC proposal?
> 3) A similar problem for naive Bayes. I've figured out the for loops
> that should be parallelized, but the papers I followed showed no significant
> performance improvement in parallel naive Bayes. Should I mention this in
> my proposal?
> 4) How much change is permitted before I should make another file for
> a parallel implementation of the algorithm?
> 5) I've dropped the idea of providing an API since you're right, it will be
> better for the user to learn OpenMP as it's pretty well known.
> 6) I've added bagging to my proposal, so I'll implement and parallelize it.
> I hope that's fine.

Hi Nikhil,

Thanks for the update. I don't know how quickly or slowly you work, so I can't provide much input on how many algorithms you should do---this part is up to you.

Personally I think that since parallel SGD is already implemented, it's not necessary to focus on it.

It's hard to say how much change we should make before creating another file. I would say, if we can apply OpenMP directly to an algorithm in a way that does not fundamentally change the algorithm, there is no reason to have a second parallel implementation.

Thanks,
Ryan

--
Ryan Curtin    | "Lots of respectable people have been hit by
r...@ratml.org |   trains." - Penny
Re: [mlpack] Fix MVU+LRSDP in GSoC 2018
On Wed, Mar 21, 2018 at 04:43:56PM +0800, kaiqiang Xu wrote:
> Dear Ryan,
>
> I am Xu Kaiqiang, a first-year master's student majoring in computer science
> at the Univ. of Chinese Academy of Sciences.
> I am excited to have found the MVU + LRSDP problem, and I would like to
> fix it using convex analysis, fluent C++ skills, and a strong interest in
> the topic. The courses I have taken, such as convex analysis and machine
> learning, and my experience in development projects may help me figure it
> out.
>
> I have read the tutorials and tried several functions of mlpack by
> running some machine learning examples over the past few days.
> After roughly reading the papers, and carefully working through your replies
> in the mlpack mail archives, I plan to check the correctness of MVU with
> SDP, and then try to understand it.
> Moreover, I plan to understand SDP/LRSDP. I think it may be helpful for my
> proposal.
>
> Sorry for joining the project so late. Can you give me some advice about
> planning and applying for GSoC this year?

Hi there Kaiqiang,

Don't worry, it is not too late to join---the application deadline has not passed yet.

Note that LRSDP is actually nonconvex optimization, not convex optimization, so some of the tools for debugging will not translate well. I agree that it would be very important to fully understand SDPs and LRSDPs before attempting this project.

There is an application guide that may be helpful:

https://github.com/mlpack/mlpack/wiki/Google-Summer-of-Code-Application-Guide

Also, there is an unusual amount of interest in the MVU project this year. I just answered another email about it here:

http://knife.lugatgt.org/pipermail/mlpack/2018-March/003697.html

Maybe that response will be helpful to you.

Thanks,
Ryan

--
Ryan Curtin    | "It is very cold... in space."
r...@ratml.org |   - Khan
Re: [mlpack] GSoC 2018 MVU+LRSDP Bugfix
On Tue, Mar 20, 2018 at 03:22:04AM +, Abhijeet Krishnan wrote:
> Hi,
>
> I am interested in working for mlpack for GSoC 2018, with an interest in
> trying to get the MVU + LRSDP implementation working. I have built mlpack
> and modified it (with help from zoq) to also build the MVU part of the
> code. I am currently attempting to rewrite the MVU code to match the
> current APIs, since there were a number of compilation errors which I
> believe were due to the LRSDP APIs being updated without also updating the
> MVU code.
>
> In terms of debugging the error, I have thought of the following approach -
> 1. Test the MVU code using the primal-dual optimizer to verify whether the
> error is in MVU or in LRSDP
> 2a. If MVU + primal-dual converges, the error is most likely in LRSDP
> 2b. If MVU + primal-dual also does not converge, the error is most likely in
> MVU (it might also be in the primal-dual solver)
> 3. Wherever the error is, my plan is to be clear about how that algorithm
> works (from reading the research papers) and to test the values the
> function returns against some working implementation of it in another
> package
> 4. If the error is in LRSDP, I don't think there is a reference
> implementation. I will need to really dig into the code and algorithm to
> find the error. I guess this is exactly why the issue is marked 10/10 in
> difficulty.
>
> I would like to know what approaches and efforts have been made previously
> to solve this problem, in particular, whether anyone has any intuition as to
> where the problem might lie and what would be a good approach to solve it.
>
> If anyone has any tips to share on debugging numerical algorithms, that
> would also be welcome.
>
> I am currently pursuing a PhD in CS at NC State University with plans of
> conducting research in Generative Methods.

Hi Abhijeet,

Thanks for getting in touch. This is a good start for a plan, but I want to emphasize the difficulty here.

A good knowledge of nonconvex optimization and semidefinite programs will be needed, and in a good proposal I think it would be important to include several strategies for debugging optimizers that aren't converging.

There is an additional paper or tech note by Sam Burer (I can't remember the title) that points out that saddle points do exist in LRSDP, so it is possible (although I don't think this is the case) that LRSDP is getting stuck in saddle points and not converging for MVU.

In addition, it's likely to be worthwhile to confirm that the MVU solution matrix can even be considered to be low-rank---if it can't, then it's not reasonable to ever expect the algorithm to converge to a good solution.

I hope these pointers are helpful.

Thanks,
Ryan

--
Ryan Curtin    | "You know, I think he's got a point."
r...@ratml.org |   - The Mayor
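
One way to approach the low-rank question raised above is to take a Gram matrix produced by any trusted solver and count how many of its pivots are numerically nonzero. The following is a minimal pure-Python sketch, not mlpack code: the helper names `gram_matrix` and `numerical_rank`, the tolerance, and the toy points are all assumptions made for illustration.

```python
def gram_matrix(points):
    """Centered Gram matrix K[i][j] = <x_i - mean, x_j - mean>."""
    n = len(points)
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    centered = [[p[d] - mean[d] for d in range(dim)] for p in points]
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in centered]
            for ci in centered]


def numerical_rank(matrix, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting.

    A pivot is treated as zero when its magnitude is at most
    tol times the largest entry of the input matrix.
    """
    rows = [list(r) for r in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    scale = max(abs(v) for r in rows for v in r) or 1.0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) <= tol * scale:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, n_rows):
            factor = rows[r][col] / rows[rank][col]
            for c in range(col, n_cols):
                rows[r][c] -= factor * rows[rank][c]
        rank += 1
    return rank


# Toy data (assumed): five points on a straight line embedded in 3-D.
line = [(i, 2 * i, -i) for i in range(5)]
print(numerical_rank(gram_matrix(line)))
```

A rank that stays close to the intended embedding dimension would support a low-rank factorization; a rank near the number of points would suggest that a small factor cannot represent the solution.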
Re: [mlpack] MVU Bug Fix GSOC
Hi Ryan,

Thanks a lot for the advice, and sorry for the late reply. I spent the past two days going through most of the papers, and it really helped.

Apart from the previous ideas:

1. Generate a simple random dataset, run standard MVU on it using mlpack's primal-dual solver, and compare the result with that of MVU + LRSDP.
2. Write unit tests and replacements for the original code in LRSDP, and check their correctness when processing the above datasets.

I also have some new ideas:

1. Several papers mention that the choice of parameters (the penalty, kNN, the learning rate, etc.) is critical for the convergence of the algorithm. So I think it would be helpful either to implement a dynamic parameter-adjustment algorithm or to inspect the variables manually in the cases where the algorithm does not converge.
2. The papers mention several propositions and properties that a functioning LRSDP should satisfy (e.g. zero duality gap and XS = 0 at an optimal solution). I am making a list of these properties, and the goal is to implement tests that check them at each iteration.
3. A less recommended alternative would be to change the implementation of LRSDP, for example by changing the optimization criterion to search for the maximum distance to the furthest neighbor, or by integrating a dual approach into the original code. The attraction is that it would provide a theoretical guarantee that the algorithm converges to an optimal result (whereas with LRSDP alone we only get optimal results in most cases), but I don't quite like it because it might bring side effects and hurt runtime. Also, I think it is not guaranteed to work, as it was only proposed as an alternative in the papers. So I am currently thinking of taking this approach as a last resort if no bug is found in the previous steps.

That is all I have for now. Do you think some of it is worth a try?
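
The property checks described in item 2 above could be sketched roughly as follows. For a feasible primal-dual pair of an SDP, the duality gap equals tr(XS), so at an optimum with strong duality both the gap and the product XS vanish. This is a pure-Python illustration, not mlpack code: the helper names, and the tiny problem (minimise tr(CX) subject to tr(X) = 1 and X positive semidefinite, with dual slack S = C - yI), are assumptions chosen so that the optimum is known by hand.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def frobenius_norm(a):
    return sum(v * v for row in a for v in row) ** 0.5


def optimality_report(c, x, y):
    """Residuals for: min tr(CX) s.t. tr(X) = 1, X PSD.

    The dual is: max y s.t. S = C - y*I PSD, so the dual objective is y.
    """
    n = len(c)
    s = [[c[i][j] - (y if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return {
        "feasibility": abs(trace(x) - 1.0),
        "duality_gap": trace(matmul(c, x)) - y,
        "complementarity": frobenius_norm(matmul(x, s)),
    }


# Hand-built example (assumed): C = diag(1, 3), so the optimal value is 1.
C = [[1.0, 0.0], [0.0, 3.0]]
X_opt = [[1.0, 0.0], [0.0, 0.0]]  # rank-1 optimum, X = R R^T with R = (1, 0)^T
X_bad = [[0.5, 0.0], [0.0, 0.5]]  # feasible, but not optimal

print(optimality_report(C, X_opt, 1.0))
print(optimality_report(C, X_bad, 1.0))
```

For the hand-built optimum all three residuals are zero, while the feasible but suboptimal point shows a duality gap of 1 and a complementarity residual of 1. In a real run these quantities are only expected to approach zero as the solver converges, so logging them per iteration is a diagnostic, not a pass/fail test.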
I will read the rest of the PDF tomorrow and change my proposal according to your suggestions.

Best Wishes,
Daniel Li

From: LI Xuran
Sent: 18 March 2018 20:15:57
To: mlpack@lists.mlpack.org
Subject: Re: MVU Bug Fix GSOC

Dear Ryan,

I am currently working on my proposal for the fixes to MVU and low-rank semidefinite programs and have come up with the following ideas:

1. Generate a simple random dataset and compare standard MVU with MVU + LRSDP on it. Visualize the procedure and the result in 2D/3D.
2. Write unit tests and replacements for the original code in mlpack's MVU implementation, and check their correctness when processing the above datasets.
3. Based on the observations from 1 and 2, create datasets that specifically expose the issue ... and check step by step on that sample.
4. (Or maybe use datasets with a special property such that an implementation of MVU + LRSDP should always converge, and check whether the expected result is met.)

Do you think any of the above ideas are worth a try?

Thanks!
Daniel Li

From: LI Xuran
Sent: 17 March 2018 17:47:09
To: mlpack@lists.mlpack.org
Subject: MVU Bug Fix GSOC

Hello Ryan,

I am Daniel Li, a second-year student studying Artificial Intelligence at the University of Edinburgh. I write fluent C++ and am interested in taking up the quest to fix the bugs regarding MVU and semidefinite programming in mlpack. I've read about scalable semidefinite manifold learning and other articles, and have set up mlpack on my own computer.

Could you give me some advice on where to start my research on the project as I familiarize myself with the code base? Also, is it a good idea to implement MVU with a dual-tree algorithm to compare with the current version of MVU using LRSDP?

Thanks!
Daniel Li
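
Comparing solvers on a simple generated dataset, as proposed in the thread above, needs a way to verify any returned Gram matrix K against the MVU constraints: K_ii + K_jj - 2 K_ij must equal the squared distance between neighbouring points i and j, and the entries of K must sum to zero. The sketch below is pure Python and not mlpack code; the function names, the chain-shaped neighbour graph, and the half-circle dataset are assumptions made for illustration.

```python
import math


def centered_gram(points):
    """Centered Gram matrix of a list of equal-length coordinate tuples."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    centered = [[p[d] - mean[d] for d in range(dim)] for p in points]
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in centered]
            for ci in centered]


def mvu_residuals(gram, points, edges):
    """Return (worst local-distance residual, |sum of all Gram entries|)."""
    worst = 0.0
    for i, j in edges:
        target = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        got = gram[i][i] + gram[j][j] - 2.0 * gram[i][j]
        worst = max(worst, abs(got - target))
    centering = abs(sum(sum(row) for row in gram))
    return worst, centering


# Assumed toy dataset: six equally spaced points on a unit half-circle,
# with each point connected only to its successor along the curve.
n = 6
arc = [(math.cos(math.pi * i / (n - 1)), math.sin(math.pi * i / (n - 1)))
       for i in range(n)]
edges = [(i, i + 1) for i in range(n - 1)]

# Candidate solution: the curve "unrolled" onto a line, which keeps the
# length of every edge (the chord between consecutive points).
chord = 2.0 * math.sin(math.pi / (2 * (n - 1)))
unrolled = centered_gram([(i * chord,) for i in range(n)])
original = centered_gram(arc)

print(mvu_residuals(unrolled, arc, edges))
print(sum(unrolled[i][i] for i in range(n)),
      sum(original[i][i] for i in range(n)))
```

Both Gram matrices satisfy the constraints up to rounding, and the unrolled one has the larger trace, which is the quantity MVU maximises. The same residual function can be applied to the output of two different solvers to see which one, if either, returns a feasible matrix.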
[mlpack] Fix MVU+LRSDP in GSoC 2018
Dear Ryan,

I am Xu Kaiqiang, a first-year master's student majoring in computer science at the Univ. of Chinese Academy of Sciences. I am excited to have found the MVU + LRSDP problem, and I would like to fix it using convex analysis, fluent C++ skills, and a strong interest in the topic. The courses I have taken, such as convex analysis and machine learning, and my experience in development projects may help me figure it out.

I have read the tutorials and tried several functions of mlpack by running some machine learning examples over the past few days. After roughly reading the papers, and carefully working through your replies in the mlpack mail archives, I plan to check the correctness of MVU with SDP, and then try to understand it. Moreover, I plan to understand SDP/LRSDP. I think it may be helpful for my proposal.

Sorry for joining the project so late. Can you give me some advice about planning and applying for GSoC this year?

Best,
Kaiqiang