I've gone ahead with the sample_weight implementation, along with some amount of code refactoring (mostly where I had to touch it). I have submitted a pull request.
Let me know if I can help fix any specific other concerns -- I've spent some time with the referenced paper, and the code; so less ramp up time to understand what's going on. The code is working fine for my real-world use case (weighted, 140 rows and 12 columns of data). Regards, Deepak Date: Fri, 10 Jul 2015 14:23:16 -0500 From: Andreas Mueller<t3k...@gmail.com> Subject: Re: [Scikit-learn-general] Planning on implementing sample_weight option for PLSRegression.fit() To:scikit-learn-general@lists.sourceforge.net Message-ID:<55a01ba4.1050...@gmail.com> Content-Type: text/plain; charset=windows-1252; format=flowed I think there are also some concerns about the actual implementation of the algorithm, and whether the NIPALS implementation is fast and stable enough, and the appropriate choice in all cases. On 07/07/2015 04:51 PM, Deepak Subburam wrote: > Cool. I spent some time inspecting the pls_.py source file. There are a > whole lot of if conditions prevalent in the code (on > deflation_mode="regression"|"canonical", mode="A"|"B", > algorithm="nipals"|"svd", Y.ndim==1|>1, etc.), and I wonder if that's > what you are thinking could be refactored. > > I can minimize adding another set of if statements ("if sample_weight is > not None") in my implementation. Or I can take a shot at refactoring the > code first, though that would involve significant changes you might > worry about a new contributor making. For example, I think the > _nipals_twoblocks...() function should be a class method attached to the > object, so you don't have to pass all the self.XYZ parameters to it and > it can update self.n_iter within itself, ca. line 286. > > Other comments I have on the code are that it can use more empty lines > (between separate blocks of logic) for readability, and that the > mode="A"|"B" parameter can be made more descriptive. > > Let me know your views on how I might proceed. > > Best, > Deepak > >> Date: Tue, 7 Jul 2015 02:03:41 +0200 >> From: Gael Varoquaux<gael.varoqu...@normalesup.org> >> Subject: Re: [Scikit-learn-general] Planning on implementing >> sample_weight option for PLSRegression.fit() >> To:scikit-learn-general@lists.sourceforge.net >> Message-ID:<20150707000341.ga2996...@phare.normalesup.org> >> Content-Type: text/plain; charset=us-ascii >> >> On Mon, Jul 06, 2015 at 04:55:07PM -0400, Deepak Subburam wrote: >> >>> I have some time to contribute, and would like to implement a >>> sample_weight option to the fit() method of >>> sklearn.cross_decomposition.PLSRegression, handling any other knock-on >>> effects. Let me know if you think this is not a good idea, or have any >>> other thoughts to guide me. >> In general, I think that this is a great idea. One question is: is the >> quality of the PLS code ready for these changes? >> >> G ------------------------------------------------------------------------------ Don't Limit Your Business. Reach for the Cloud. GigeNET's Cloud Solutions provide you with the tools and support that you need to offload your IT needs and focus on growing your business. Configured For All Businesses. Start Your Cloud Today. https://www.gigenetcloud.com/ _______________________________________________ Scikit-learn-general mailing list Scikit-learn-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/scikit-learn-general