> ..."fit" function, i.e. in
> b=enc.fit(test), I saved b
>
> Then, I used b.transform(newdata) , but the error
> now comes because of # of rows in test and newdata being different
>
> Am I doing something wrong in saving the OneHotEncoder and reusing it?
One-hot encoding by nature requires the transform-time input to have the same
feature dimension as the data the encoder was fitted on.
Take a look at DictVectorizer (
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.DictVectorizer.html#sklearn.feature_extraction.DictVectorizer),
which will assign a consistent column index to each feature seen during fit
and simply ignore features it has not seen at transform time.
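A minimal sketch of the pattern being suggested here, assuming scikit-learn's
DictVectorizer (the example records and variable names are hypothetical): fit
the vectorizer once, keep the fitted object, and reuse it at transform time;
categories never seen during fit are simply dropped, so the column layout
stays fixed.

    from sklearn.feature_extraction import DictVectorizer

    # Fit on the training records; each row is a dict of feature name -> value.
    train = [{"city": "Cape Town", "temp": 21.0},
             {"city": "London", "temp": 12.0}]
    vec = DictVectorizer(sparse=False)
    X_train = vec.fit_transform(train)       # one column per category seen here

    # Later, transform new rows with the *same* fitted vectorizer.
    new = [{"city": "London", "temp": 15.0},
           {"city": "Paris", "temp": 18.0}]  # "Paris" was never seen at fit time
    X_new = vec.transform(new)               # unseen categories are ignored, so
                                             # X_new has the same columns as X_train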
This contribution is looking really exciting! Looking forward to seeing it in
scikit-learn!—
Sent from Mailbox
On Thu, Jul 24, 2014 at 8:52 AM, Maheshakya Wijewardena
wrote:
> Hi,
> I have made my new post on testing LSH-ANN implementation:
> http://maheshakya.github.io/gsoc/2014/07/24/testing-
Nice - results look good relative to annoy. Very promising—
Sent from Mailbox
On Fri, Jun 13, 2014 at 9:17 PM, Maheshakya Wijewardena
wrote:
> Hi,
> I've added a new blog post about performance comparisons of available ANN
> implementations and newly implemented LSH forest.
> http://maheshakya.g
10:32:12AM +0200, Nick Pentreath wrote:
>> Are some of the algorithms too cutting edge or not cited enough,
> Yes
>> or some other reason?
> I think that it is good practice to explore new ideas outside of
> scikit-learn. It usually takes a lot of effort and time to figure out
>
That does seem like it would be a very worthwhile project - but why was
lightning outside scikit-learn initially? Are some of the algorithms too
cutting edge or not cited enough, or some other reason?
On Tue, Feb 4, 2014 at 10:28 AM, Gael Varoquaux <
gael.varoqu...@normalesup.org> wrote:
> On T
There have been many people asking about contributing recommender systems
to scikit-learn, and generally the response has been that it doesn't quite
fit in with the library. Though it could perhaps be shoehorned in, I
recommend you take a look at https://github.com/mendeley/mrec, which
impleme
Another important and related use case is reducing the search space: in
recommender systems, for example, one often has to compute the dot product or
cosine similarity between two vectors of moderate dimension, but in real time
across potentially millions of candidate items. In
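To make that use case concrete, here is a rough sketch (synthetic data, all
sizes and parameter values arbitrary) of using random-hyperplane hashing to
shrink the candidate set before computing exact cosine similarities:

    import numpy as np

    rng = np.random.default_rng(0)
    n_items, dim, n_planes = 100_000, 64, 12

    items = rng.standard_normal((n_items, dim)).astype(np.float32)
    query = rng.standard_normal(dim).astype(np.float32)

    # Random-hyperplane LSH: items whose sign pattern matches the query's are
    # likely to have high cosine similarity, so only that bucket gets scored.
    planes = rng.standard_normal((n_planes, dim)).astype(np.float32)
    item_codes = items @ planes.T > 0        # (n_items, n_planes) booleans
    query_code = planes @ query > 0          # (n_planes,)

    candidates = np.flatnonzero((item_codes == query_code).all(axis=1))

    # Exact cosine similarity only on the (much smaller) candidate set.
    cand = items[candidates]
    sims = cand @ query / (np.linalg.norm(cand, axis=1) * np.linalg.norm(query))
    top10 = candidates[np.argsort(sims)[::-1][:10]]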
This would be a great addition.
Some ideas / code, perhaps: http://nearpy.io/
On Tue, Jan 28, 2014 at 10:59 AM, Mathieu Blondel wrote:
> If we have a suitable mentor for it, locality-sensitive hashing (LSH)
> would be a great GSOC subject:
> http://en.wikipedia.org/wiki/Locality-sensitive_hashing
While I think collaborative filtering / recommendations may have a place in
sklearn, it is true that the problem setting is a little different from
most of the sklearn models.
You may want to take a look at mrec (https://github.com/mendeley/mrec),
where many well-established CF approaches are imp
Great, interesting.
I added a few ideas to the Wiki (feel free anyone to add or edit).
On Mon, Dec 9, 2013 at 11:17 PM, Olivier Grisel wrote:
> 2013/12/9 Nick Pentreath :
> > This is a cool idea. And it is fairly straightforward. I hacked up an
> > illustration this
This is a cool idea. And it is fairly straightforward. I hacked up an
illustration this evening: https://gist.github.com/MLnick/7880766
The better approach would be to amend the sklearn svmlight code to accept
iterables of strings in addition to file handles, and then pretty much no
additional cod
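In the meantime, one workaround is to buffer an iterable of svmlight-formatted
lines into an in-memory file object, since the existing loader already accepts
file-like objects (the lines below are hypothetical; in practice they might be
streamed from Spark or another source):

    import io
    from sklearn.datasets import load_svmlight_file

    lines = [
        "1 1:0.5 3:1.2",
        "0 2:0.3 4:0.9",
    ]

    # load_svmlight_file accepts a file-like object opened in binary mode,
    # so join the streamed lines and wrap them in a BytesIO buffer.
    buf = io.BytesIO("\n".join(lines).encode("ascii"))
    X, y = load_svmlight_file(buf)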
CC'ing Spark Dev list
I have been thinking about this for quite a while and would really love to
see this happen.
Most of my pipeline ends up in Scala/Spark these days - which I love, but
it is partly because I am reliant on custom Hadoop input formats that are
just way easier to use from Scala/J
Mendeley have also recently open-sourced their recommender framework, which
relies on SGD to train models using scikit-learn, and seems to try to fit
into the sklearn API.
https://github.com/Mendeley/mrec/
Nick
On Mon, Oct 14, 2013 at 1:37 AM, Andreas Mueller
wrote:
> On 10/09/2013 11:36 AM, O
Hey Helge
Funny I just saw this drop into my inbox! Hope you are well.
What does your data look like? Is it sparse? For classification tasks
(read: SGDClassifier), one can stream data one-by-one and thus be
"out-of-core" - though in this case I'd recommend doing it in
"mini-batches". This would u
or classification / regression, etc), with the only
additional code needed being a training function and one for merging models.
Nick
On Sun, Jan 27, 2013 at 8:01 PM, Robert Kern wrote:
> On Thu, Jan 24, 2013 at 10:06 AM, Nick Pentreath
> wrote:
> > May I suggest you look at
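Regarding the "training function plus a merge function" idea above: for linear
SGD-trained models, one common (though not the only) way to merge is parameter
averaging across partitions. A rough sketch, with hypothetical function names
and assuming every partition sees the same feature space and classes:

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    def train_partition(X, y):
        # Run independently on each data partition (e.g. one Spark partition).
        return SGDClassifier().fit(X, y)

    def merge_models(models):
        # Average coefficients and intercepts of the per-partition models.
        merged = models[0]
        merged.coef_ = np.mean([m.coef_ for m in models], axis=0)
        merged.intercept_ = np.mean([m.intercept_ for m in models], axis=0)
        return merged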
May I suggest you look at Spark (http://spark-project.org/ and
https://github.com/mesos/spark).
It is written in Scala, has a Java API and the current master branch has
the new Python API (0.7.0 release when it happens). I've been doing some
testing, including using sklearn together with Spark, an