[Scikit-learn-general] PEP8 alert!

2014-01-16 Thread Olivier Grisel
PEP8 violations are reaching a critical level causing a risk of code style meltdown: https://jenkins.shiningpanda-ci.com/scikit-learn/job/python-2.7-numpy-1.6.2-scipy-0.10.1/violations/ We should be more careful in checking pep8 compliance prior to merging PRs from now on :) And remember kids:

Re: [Scikit-learn-general] Google Summer of Code 2014

2014-01-16 Thread Manoj Kumar
Thanks for your responses. @Kyle: At the risk of sounding really naive, I'd like to make the following comments. I'm referring to this paper that Sukru had posted, http://www.stat.osu.edu/~dmsl/Sarwar_2001.pdf, which is on item-based collaborative filtering. I don't think there is really any need for
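
For readers unfamiliar with the term, item-based collaborative filtering scores an unrated item for a user from that user's own ratings of similar items. The following is a minimal pure-Python sketch of that idea, not the paper's exact algorithm and not a scikit-learn API. The toy ratings, the user labels, and the helper names `item_cosine` and `predict_rating` are all assumptions made for illustration; cosine similarity is one of several similarity measures that could be used here.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}. The values are made up.
ratings = {
    "u1": {"ham": 2, "spam": 3},
    "u2": {"ham": 4, "spam": 5, "ram": 1},
    "u3": {"spam": 1, "ram": 4},
}


def item_cosine(ratings, item_a, item_b):
    """Cosine similarity between two items over the users who rated both."""
    common = [u for u, r in ratings.items() if item_a in r and item_b in r]
    if not common:
        return 0.0
    dot = sum(ratings[u][item_a] * ratings[u][item_b] for u in common)
    norm_a = sqrt(sum(ratings[u][item_a] ** 2 for u in common))
    norm_b = sqrt(sum(ratings[u][item_b] ** 2 for u in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, target):
    """Similarity-weighted average of the user's ratings of other items."""
    numerator = 0.0
    denominator = 0.0
    for item, rating in ratings[user].items():
        if item == target:
            continue
        similarity = item_cosine(ratings, item, target)
        numerator += similarity * rating
        denominator += abs(similarity)
    # None signals that no rated item is similar to the target.
    return numerator / denominator if denominator else None


estimate = predict_rating(ratings, "u1", "ram")
```

Note that similarities are computed between items (columns), which is what distinguishes this approach from user-based neighbourhood methods.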

Re: [Scikit-learn-general] Google Summer of Code 2014

2014-01-16 Thread Kyle Kastner
So X is the array of existing ratings, would y be a 2D array then? If not, how do you map the ratings given back to a single user (since y is typically, to my knowledge, 1D in sklearn)? I am still a little confused, but your example helped. Could you go into a little more detail on X, x, and

Re: [Scikit-learn-general] Google Summer of Code 2014

2014-01-16 Thread Manoj Kumar
Well, y can be 2-D too; there are estimators like MultiTaskElasticNet especially meant for multi-task y. I was thinking something along these lines. Let's say [ham, spam, ram, bam, tam] are the five items, and if the first user gives ham: 2, spam: 3 and the second user gives ram: 1, bam: -3, tam: 4

Re: [Scikit-learn-general] Google Summer of Code 2014

2014-01-16 Thread Manoj Kumar
I'm extremely sorry, that message got sent halfway through. (I pressed Ctrl + Enter by mistake.) X = [[ham, spam], [ram, bam, tam]], and y = [[2, 3], [1, -3, 4]], and we do clf.fit(X, y). Suppose we would like to predict what we would recommend to the user x, who has already rated ram as 1 and bam as
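
The X and y in this message are ragged: each user rated a different number of items. One possible way to line them up, sketched here as an illustration only, is a fixed-width row per user with a placeholder for unrated items. The item names are written as strings and the helper `to_rating_rows` is hypothetical; neither is part of the proposal in the thread or of scikit-learn.

```python
# The five items and the two users' ratings from the message above.
items = ["ham", "spam", "ram", "bam", "tam"]
X = [["ham", "spam"], ["ram", "bam", "tam"]]
y = [[2, 3], [1, -3, 4]]


def to_rating_rows(X, y, items):
    """Align ragged (items, ratings) lists into one fixed-width row per user.

    Unrated items are marked with None.
    """
    column = {name: j for j, name in enumerate(items)}
    rows = []
    for rated_items, scores in zip(X, y):
        row = [None] * len(items)
        for name, score in zip(rated_items, scores):
            row[column[name]] = score
        rows.append(row)
    return rows


rows = to_rating_rows(X, y, items)
# rows[0] holds the first user's ratings, rows[1] the second user's.
```

In this layout every row has the same length, so the ratings form a regular users-by-items table with gaps where no rating was given.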

[Scikit-learn-general] Suggestion to add author names/emails at the bottom of module documentations

2014-01-16 Thread Issam
Hi scikit-learn editors, Any documentation can have mistakes, but it's important to address them quickly and efficiently. One plausible way is to contact the author of an erroneous text and have them make the proper changes. But, for all I know, scikit-learn's documentation lacks authors'

Re: [Scikit-learn-general] Google Summer of Code 2014

2014-01-16 Thread nmura...@masonlive.gmu.edu
I agree that sparse matrices need to be supported, as one of the main properties inherent to the user/item rating matrix in recommender systems is its sparsity. This sparsity is what has given rise to so much research in the field. Hence this property would have to be taken
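
To make the sparsity point concrete, here is a small pure-Python sketch that stores only the observed ratings, keyed by (user index, item index). The matrix dimensions, the ratings, and the helper names `density` and `user_row` are hypothetical. In practice a scipy.sparse matrix would likely serve this purpose; the dictionary here is just to show the idea without extra dependencies.

```python
# Hypothetical dimensions of the full user/item rating matrix.
n_users, n_items = 1000, 500

# Only observed ratings are stored: (user_index, item_index) -> rating.
observed = {
    (0, 0): 2,
    (0, 1): 3,
    (1, 2): 1,
    (1, 3): -3,
    (1, 4): 4,
}


def density(observed, n_users, n_items):
    """Fraction of the full matrix that actually holds a rating."""
    return len(observed) / (n_users * n_items)


def user_row(observed, user):
    """All ratings given by one user, as {item_index: rating}."""
    return {i: r for (u, i), r in observed.items() if u == user}


fraction_filled = density(observed, n_users, n_items)
second_user = user_row(observed, 1)
```

A dense array of this shape would hold 500,000 cells for 5 ratings, which is why storing only the observed entries matters.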

Re: [Scikit-learn-general] Suggestion to add author names/emails at the bottom of module documentations

2014-01-16 Thread Vlad Niculae
I would rather have this sorted out through the github issue tracker. I don't think it's a good idea to encourage users to e-mail individual developers. Someone else could have the expertise and do the change confidently. My 2c, Vlad On Thu Jan 16 18:12:05 2014, Issam wrote: Hi scikit-learn

Re: [Scikit-learn-general] Google Summer of Code 2014

2014-01-16 Thread Kyle Kastner
@Manoj The predict stage taking 2 parameters is what I was talking about - are there any other estimators that need anything more than a single matrix to do a prediction? I do not recall any - this would be something particular to CF. Maybe you could recast it as a matrix with alternating rows of

Re: [Scikit-learn-general] Google Summer of Code 2014

2014-01-16 Thread Manoj Kumar
Yes indeed, getting two parameters for predict would be specific to CF. That was the most obvious idea that came to my mind. I would also like to hear others' opinions regarding the API and the feasibility of such a project. On Thu, Jan 16, 2014 at 11:47 PM, Kyle Kastner

Re: [Scikit-learn-general] Suggestion to add author names/emails at the bottom of module documentations

2014-01-16 Thread Robert Layton
I agree with Vlad. Further, if there is documentation or a module that none of the active developers can touch (due to complexity or lack of expertise), the preference has generally been to remove it from scikit-learn. On 17 January 2014 05:12, Vlad Niculae zephy...@gmail.com wrote: I

Re: [Scikit-learn-general] Google Summer of Code 2014

2014-01-16 Thread Kyle Kastner
The other thing to keep in mind is that an ideal solution would be compatible with Pipeline() - it would be nice to be able to use it there, which is one of the reasons a different signature for the predict() method is an issue. Hopefully something can be figured out, as there is a lot of interest in CF

Re: [Scikit-learn-general] Google Summer of Code 2014

2014-01-16 Thread Joel Nothman
`y` is by definition hidden at prediction time for supervised learning, so I don't think your representation makes sense. But I see this as a completion problem, not a supervised learning problem: the same data is observed at training and predict time. Isn't the following: X = [[ham, spam], [ram,

Re: [Scikit-learn-general] Google Summer of Code 2014

2014-01-16 Thread Olivier Grisel
2014/1/16 Joel Nothman joel.noth...@gmail.com: There are still issues of whether this is in scikit-learn scope. For example, does it make sense with sklearn's cross validation? Or will you want to cross validate on both axes? Given that there is plenty of work to be done that is well within

[Scikit-learn-general] Automated scikit-learn dev documentation build

2014-01-16 Thread Olivier Grisel
Hi all, Jaques and I have recently been working on moving the dev documentation build job out of Fabian's workstation to a server on the public Rackspace Cloud. The deployment of the documentation build server is now fully automated thanks to this script and configuration:

Re: [Scikit-learn-general] Google Summer of Code 2014

2014-01-16 Thread Manoj Kumar
Thanks everyone for your quick responses. 1. Could someone point me to a list of GSoC ideas this year? 2. Is it okay if I take up projects related to ideas that have not yet been implemented? For example, a quick search tells me Improving GMM has not been implemented. Thanks.