Re: [Scikit-learn-general] Predicting on new data after OneVsRest Classifier (multi label)

2015-11-18 Thread Nick Pentreath
" function ie in > b=enc.fit(test), I saved b > > Then, I used b.transform(newdata), but the error > now comes because of # of rows in test and newdata being different > > Am I doing something wrong in saving the OneHotEncoder and reu

Re: [Scikit-learn-general] Predicting on new data after OneVsRest Classifier (multi label)

2015-11-16 Thread Nick Pentreath
One-hot encoding by nature requires the input feature dimension from fitting to be the same at transform time. Take a look at DictVectorizer (http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.DictVectorizer.html#sklearn.feature_extraction.DictVectorizer), which will assi
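A minimal sketch of the DictVectorizer suggestion, assuming the categorical features can be expressed as one dict per row. The feature names and values below are made up for illustration and are not from the thread. The vectorizer is fitted once and then reused on new data with a different number of rows; the column count stays fixed.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical training records: one dict of categorical features per row.
train_rows = [
    {"color": "red", "size": "S"},
    {"color": "blue", "size": "M"},
    {"color": "red", "size": "L"},
]

# New data can have any number of rows; only the learned columns matter.
new_rows = [
    {"color": "blue", "size": "S"},
    {"color": "green", "size": "M"},  # "green" was never seen during fit
]

vec = DictVectorizer(sparse=False)
X_train = vec.fit_transform(train_rows)  # learns one column per (feature, value) pair
X_new = vec.transform(new_rows)          # reuses exactly the same columns

# Same number of columns, regardless of how many rows each set has.
assert X_train.shape[1] == X_new.shape[1]
```

Feature values that were not present at fit time (here `color=green`) are silently ignored by `transform`, so that row simply has no active `color` column.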

Re: [Scikit-learn-general] GSoC - Blog post updates

2014-07-24 Thread Nick Pentreath
This contribution is looking really exciting! Looking forward to seeing it in scikit-learn! On Thu, Jul 24, 2014 at 8:52 AM, Maheshakya Wijewardena wrote: > Hi, > I have made my new post on testing LSH-ANN implementation: > http://maheshakya.github.io/gsoc/2014/07/24/testing-

Re: [Scikit-learn-general] GSoC- LSH next blog post

2014-06-13 Thread Nick Pentreath
Nice - results look good relative to annoy. Very promising. On Fri, Jun 13, 2014 at 9:17 PM, Maheshakya Wijewardena wrote: > Hi, > I've added a new blog post about performance comparisons of available ANN > implementations and newly implemented LSH forest. > http://maheshakya.g

Re: [Scikit-learn-general] GSoC - Improving GMM

2014-02-04 Thread Nick Pentreath
10:32:12AM +0200, Nick Pentreath wrote: >> Are some of the algorithms too cutting edge or not cited enough, > Yes >> or some other reason? > I think that it is good practice to explore new ideas outside of > scikit-learn. It usually takes a lot of effort and time to figure out >

Re: [Scikit-learn-general] GSoC - Improving GMM

2014-02-04 Thread Nick Pentreath
That does seem like it would be a very worthwhile project - but why was lightning outside scikit-learn initially? Are some of the algorithms too cutting edge or not cited enough, or some other reason? On Tue, Feb 4, 2014 at 10:28 AM, Gael Varoquaux < gael.varoqu...@normalesup.org> wrote: > On T

Re: [Scikit-learn-general] Contributing in a New Topic : Recommender Systems

2014-02-02 Thread Nick Pentreath
There have been many people asking about contributing recommender systems to scikit-learn, and generally the response has been that it doesn't quite fit in with the library. Though it can be shoehorned somewhat perhaps, I recommend you take a look at https://github.com/mendeley/mrec, which impleme

Re: [Scikit-learn-general] Google Summer of Code 2014

2014-01-28 Thread Nick Pentreath
Another important and related use case is to reduce the search space, for example, in recommendation systems one often has to do the dot product, or cosine similarity, between two vectors of moderate dimension. But you have to do this in real-time across potentially millions of candidate items. In
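To illustrate the "reduce the search space" idea, here is a rough sketch of random-hyperplane LSH for cosine similarity using NumPy. It is not the method from the thread or from any particular library. The item vectors, dimensions, and the `signature` and `query` helpers are all hypothetical. Only the items that hash to the same bucket as the query are scored.

```python
import numpy as np

rng = np.random.default_rng(0)

n_items, dim, n_bits = 10_000, 32, 12
items = rng.standard_normal((n_items, dim))   # hypothetical item vectors
planes = rng.standard_normal((n_bits, dim))   # random hyperplanes through the origin
weights = 1 << np.arange(n_bits)              # bit weights to pack signs into an int

def signature(vectors):
    """Return one integer bucket id per row: one bit per hyperplane side."""
    bits = (np.atleast_2d(vectors) @ planes.T) > 0
    return bits.astype(np.int64) @ weights

# Index: bucket id -> list of item indices in that bucket.
buckets = {}
for idx, code in enumerate(signature(items)):
    buckets.setdefault(int(code), []).append(idx)

def query(q, top_k=5):
    """Score only the items in the query's bucket by cosine similarity."""
    candidates = np.array(buckets.get(int(signature(q)[0]), []), dtype=int)
    if candidates.size == 0:
        return candidates
    cand = items[candidates]
    sims = cand @ q / (np.linalg.norm(cand, axis=1) * np.linalg.norm(q))
    return candidates[np.argsort(-sims)[:top_k]]

nearest = query(items[0])
```

A single hash table like this trades recall for speed; practical systems usually use several tables or probe neighbouring buckets.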

Re: [Scikit-learn-general] Google Summer of Code 2014

2014-01-28 Thread Nick Pentreath
This would be a great addition. Some ideas/code perhaps: http://nearpy.io/ On Tue, Jan 28, 2014 at 10:59 AM, Mathieu Blondel wrote: > If we have a suitable mentor for it, locality-sensitive hashing (LSH) > would be a great GSOC subject: > http://en.wikipedia.org/wiki/Locality-sensitive_hashing

Re: [Scikit-learn-general] Google Summer of Code 2014

2014-01-15 Thread Nick Pentreath
While I think collaborative filtering / recommendations may have a place in sklearn, it is true that the problem setting is a little different from most of the sklearn models. You may want to take a look into mrec (https://github.com/mendeley/mrec) where many well established CF approaches are imp

Re: [Scikit-learn-general] Spark-backed implementations of scikit-learn estimators

2013-12-09 Thread Nick Pentreath
Great, interesting. I added a few ideas to the Wiki (feel free anyone to add or edit). On Mon, Dec 9, 2013 at 11:17 PM, Olivier Grisel wrote: > 2013/12/9 Nick Pentreath : > > This is a cool idea. And it is fairly straightforward. I hacked up an > > illustration this

Re: [Scikit-learn-general] Spark-backed implementations of scikit-learn estimators

2013-12-09 Thread Nick Pentreath
This is a cool idea. And it is fairly straightforward. I hacked up an illustration this evening: https://gist.github.com/MLnick/7880766 The better approach would be to amend the sklearn svmlight code to accept iterables of strings in addition to file handles, and then pretty much no additional cod

Re: [Scikit-learn-general] Spark-backed implementations of scikit-learn estimators

2013-11-26 Thread Nick Pentreath
CC'ing Spark Dev list. I have been thinking about this for quite a while and would really love to see this happen. Most of my pipeline ends up in Scala/Spark these days - which I love, but it is partly because I am reliant on custom Hadoop input formats that are just way easier to use from Scala/J

Re: [Scikit-learn-general] recommendation systems

2013-10-13 Thread Nick Pentreath
Mendeley have also recently open-sourced their recommender framework, which relies on SGD to train models using scikit-learn, and seems to try to fit into the sklearn API. https://github.com/Mendeley/mrec/ Nick On Mon, Oct 14, 2013 at 1:37 AM, Andreas Mueller wrote: > On 10/09/2013 11:36 AM, O

Re: [Scikit-learn-general] Scikit-learn for large datasets?

2013-08-23 Thread Nick Pentreath
Hey Helge. Funny, I just saw this drop into my inbox! Hope you are well. What does your data look like? Is it sparse? For classification tasks (read: SGDClassifier), one can stream data one-by-one and thus be "out-of-core" - though in this case I'd recommend doing it in "mini-batches". This would u
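A small sketch of the mini-batch approach with `SGDClassifier.partial_fit`. The `mini_batches` generator is a hypothetical stand-in for reading chunks from disk or a stream, and the synthetic data is only there to make the example runnable.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def mini_batches(n_batches=20, batch_size=100, n_features=10):
    """Hypothetical stand-in for reading chunks from disk or a stream."""
    w = np.arange(1, n_features + 1)
    for _ in range(n_batches):
        X = rng.standard_normal((batch_size, n_features))
        y = (X @ w > 0).astype(int)
        yield X, y

clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # partial_fit needs the full set of classes up front

# Only one batch is held in memory at a time.
for X_batch, y_batch in mini_batches():
    clf.partial_fit(X_batch, y_batch, classes=classes)
```

The `classes` argument is required on the first call to `partial_fit`, because a single batch may not contain every label.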

Re: [Scikit-learn-general] Using sklearn in Hadoop

2013-02-04 Thread Nick Pentreath
or classification / regression, etc), with the only additional code needed being a training function and one for merging models. Nick On Sun, Jan 27, 2013 at 8:01 PM, Robert Kern wrote: > On Thu, Jan 24, 2013 at 10:06 AM, Nick Pentreath > wrote: > > May I suggest you look at
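One possible reading of "a training function and one for merging models" is sketched below, assuming linear models that are merged by simple parameter averaging. This is an assumption for illustration, not necessarily what was done in the thread. The `train` and `merge` helpers and the data are hypothetical, and the partitions are plain in-memory slices rather than a cluster dataset.

```python
import copy
import numpy as np
from sklearn.linear_model import SGDClassifier

def train(partition):
    """Fit one linear model on a single partition (X, y)."""
    X, y = partition
    return SGDClassifier(random_state=0).fit(X, y)

def merge(models):
    """Average coefficients and intercepts of linear models sharing the same classes."""
    merged = copy.deepcopy(models[0])
    merged.coef_ = np.mean([m.coef_ for m in models], axis=0)
    merged.intercept_ = np.mean([m.intercept_ for m in models], axis=0)
    return merged

# Hypothetical data split into 4 partitions; on a cluster each would live on a worker.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
y = (X @ np.array([1.0, -2.0, 0.5, 3.0, -1.0]) > 0).astype(int)
partitions = [(X[i::4], y[i::4]) for i in range(4)]

model = merge([train(p) for p in partitions])  # "map" the training, then "reduce" by merging
```

Averaging only makes sense when every partition sees all classes and the models share one feature space; other model families would need a different merge step.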

Re: [Scikit-learn-general] Using sklearn in Hadoop

2013-01-24 Thread Nick Pentreath
May I suggest you look at Spark (http://spark-project.org/ and https://github.com/mesos/spark). It is written in Scala, has a Java API and the current master branch has the new Python API (0.7.0 release when it happens). I've been doing some testing, including using sklearn together with Spark, an