The bit sampling method for Hamming distance is already included in the
brute-force algorithm as the "hamming" metric in nearest neighbor search.
Hence, I think it does not need to be implemented as an LSH algorithm.
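For reference, a quick sketch of what that looks like in practice (the toy
binary data here is my own):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    X = np.random.RandomState(0).randint(0, 2, size=(10, 16))  # binary codes
    nn = NearestNeighbors(n_neighbors=3, algorithm='brute',
                          metric='hamming').fit(X)
    distances, indices = nn.kneighbors(X[:1])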
On Wed, Feb 26, 2014 at 12:46 AM, Maheshakya Wijewardena
pmaheshak...@gmail.com wrote:
2014-02-25 7:52 GMT+01:00 Gael Varoquaux gael.varoqu...@normalesup.org:
"Extreme learning machine: theory and applications" has 1285 citations and
was published in 2006; that is a large number of citations for a fairly
recent article. I believe scikit-learn could add such an interesting
learning
On Wed, Feb 26, 2014 at 01:29:43PM +0100, Lars Buitinck wrote:
I recently implemented baseline RBF networks in pretty much the same
way: k-means + RBF kernel + linear classifier. I didn't submit a PR
because it's just a pipeline of existing components.
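A minimal sketch of that recipe, assuming nothing beyond stock scikit-learn
(the class name, parameter values and choice of linear classifier are my
own):

    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics.pairwise import rbf_kernel
    from sklearn.pipeline import make_pipeline

    class RBFCenters(BaseEstimator, TransformerMixin):
        """Map samples to RBF similarities against k-means centers."""
        def __init__(self, n_centers=10, gamma=1.0):
            self.n_centers = n_centers
            self.gamma = gamma

        def fit(self, X, y=None):
            km = KMeans(n_clusters=self.n_centers).fit(X)
            self.centers_ = km.cluster_centers_
            return self

        def transform(self, X):
            return rbf_kernel(X, self.centers_, gamma=self.gamma)

    # k-means + RBF kernel + linear classifier
    rbf_net = make_pipeline(RBFCenters(n_centers=20), LogisticRegression())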
All your points about transformers and
On Wed Feb 26 13:32:08 2014, Gael Varoquaux wrote:
documentation and example
This was exactly my thought. Many such (near-)equivalences are not obvious,
especially for beginners. If Lars's hinge ELM and RBF network worked well
(or provided interesting feature visualisations) on some
To scikit-learn-general,
I fit a bimodal 1D distribution with strongly overlapping Gaussian
components using scikits.mixture.GMM. scikits.mixture.GMM.fit gives a
result that is inconsistent with the parameters of the input distribution.
The code below demonstrates the issue.
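(The snippet is cut off in this digest; a stand-in sketch of the same
scenario, written against the current GaussianMixture API rather than the
old scikits.mixture.GMM, with illustrative values, would be:)

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.RandomState(0)
    # bimodal 1D data with strongly overlapping components (means -0.5/+0.5)
    X = np.concatenate([rng.normal(-0.5, 1.0, 500),
                        rng.normal(0.5, 1.0, 500)]).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
    # the recovered means and variances can differ noticeably from the inputs
    print(gmm.means_.ravel(), gmm.covariances_.ravel())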
In case the two
2014-02-26 13:40 GMT+01:00 Vlad Niculae zephy...@gmail.com:
This was exactly my thought. Many such (near-)equivalences are not obvious,
especially for beginners. If Lars's hinge ELM and RBF network worked well
(or provided interesting feature visualisations) on some sklearn.dataset, an
On Wed, Feb 26, 2014 at 03:42:50PM +0300, Issam wrote:
Or perhaps special pipelines to simplify such common tasks.
I'd rather avoid special pipelines. For me, that would mean that we have
an API problem with the pipeline that needs to be identified and solved.
G
2014-02-26 13:51 GMT+01:00 Gael Varoquaux gael.varoqu...@normalesup.org:
On Wed, Feb 26, 2014 at 03:42:50PM +0300, Issam wrote:
Or perhaps special pipelines to simplify such common tasks.
I'd rather avoid special pipelines. For me, that would mean that we have
an API problem with the
On Wed, Feb 26, 2014 at 01:55:11PM +0100, Lars Buitinck wrote:
I'd rather avoid special pipelines. For me, that would mean that we have
an API problem with the pipeline that needs to be identified and solved.
Well, for deep learning, you'd want a generalized backprop on the final N
+1 for an RBF network transformer (with an option to choose between k-means
and random sampling).
Mathieu
On Wed, Feb 26, 2014 at 9:40 PM, Vlad Niculae zephy...@gmail.com wrote:
On Wed Feb 26 13:32:08 2014, Gael Varoquaux wrote:
documentation and example
This was exactly my thought. Many
As an aside, Lars: I'd actually love to see the recipe, if you don't mind
putting up a gist or notebook.
On Wed, Feb 26, 2014 at 1:29 PM, Lars Buitinck larsm...@gmail.com wrote:
2014-02-25 7:52 GMT+01:00 Gael Varoquaux gael.varoqu...@normalesup.org:
Extreme learning machine: theory and
Dear All,
I am using RandomForest on a data set which has fewer than 20 features but
about 40 rows. The point is that even if I train my model on a subset of
about 3 rows, when I save it using pickle I get a large file on the order
of several hundred MB (see
You can control the size of your random forest by adjusting the
parameters n_estimators, min_samples_split and even max_depth (read
the documentation for more details).
It's up to you to find parameter values that match your constraints in
terms of accuracy vs model size in RAM and prediction
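A hedged sketch of the kind of settings meant here (values are arbitrary
starting points, not recommendations):

    from sklearn.ensemble import RandomForestClassifier

    clf = RandomForestClassifier(
        n_estimators=50,       # fewer trees -> smaller model
        min_samples_split=10,  # stop splitting small nodes -> shallower trees
        max_depth=8,           # hard cap on tree depth
    )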
On 02/26/2014 05:55 PM, Peter Prettenhofer wrote:
please make sure to pickle with the highest protocol; otherwise pickle
uses a textual serialization format, which is quite inefficient:
pickle.dump(clf, f, protocol=pickle.HIGHEST_PROTOCOL)
Or simply protocol=-1. This usually makes a huge
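For completeness, a minimal sketch (assuming clf is your fitted estimator;
note the file must be opened in binary mode):

    import pickle

    with open('model.pkl', 'wb') as f:
        pickle.dump(clf, f, protocol=-1)  # -1 selects the highest protocol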
On 02/26/2014 10:13 AM, Maheshakya Wijewardena wrote:
The bit sampling method for Hamming distance is already included in the
brute-force algorithm as the "hamming" metric in nearest neighbor search.
Hence, I think it does not need to be implemented as an LSH algorithm.
I would also rather focus on
I would also rather focus on non-binary representations.
Even when using the random projection method for hashing, only the sign of
the dot product is considered, so in that situation there will also be a
binary representation (+1s and -1s). What is your idea about this method?
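A minimal sketch of that sign-based scheme (the sizes and the Gaussian
projection matrix are illustrative):

    import numpy as np

    rng = np.random.RandomState(0)
    n_features, n_bits = 64, 16
    planes = rng.randn(n_features, n_bits)  # one random hyperplane per bit

    def hash_vector(x):
        # each bit is the sign of a dot product with a random hyperplane
        return (np.dot(x, planes) > 0).astype(np.uint8)

    code = hash_vector(rng.randn(n_features))  # array of 0s and 1s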
hi folks,
when using extra trees, one can compute an oob score. has anybody looked at
comparing the oob_score to performing a shufflesplit iteration on the data?
are these in some ways equivalent, or would they converge to the same mean?
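for what it's worth, a sketch of how one might run that comparison
(synthetic data; a test_size of roughly 1 - 1/e matches the expected
out-of-bag fraction per tree; written against the current model_selection
API):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import ExtraTreesClassifier
    from sklearn.model_selection import ShuffleSplit, cross_val_score

    X, y = make_classification(n_samples=500, random_state=0)
    # bootstrap=True is required to get OOB estimates from extra trees
    clf = ExtraTreesClassifier(n_estimators=200, bootstrap=True,
                               oob_score=True, random_state=0)
    clf.fit(X, y)
    cv = ShuffleSplit(n_splits=10, test_size=0.37, random_state=0)
    print(clf.oob_score_, cross_val_score(clf, X, y, cv=cv).mean())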
cheers,
satra
We seem to have a lot of PRs waiting for review in some form or another. I
think they could do with better management.
Can we use GitHub features to make it more apparent that a PR has received
+1 (i.e. needs another reviewer) or +2 (i.e. waiting for merge)?
At the moment, [WIP] and [MRG] are
Hi,
I like the [MRG+1] and [MRG+2] idea. Let's see if it can help...
Best,
A
hi,
do you know:
http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.FeatureUnion.html
?
it might already do what you want
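a hedged sketch of what it does (the transformers here are chosen just for
illustration):

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import SelectKBest
    from sklearn.pipeline import FeatureUnion

    X, y = load_iris(return_X_y=True)
    union = FeatureUnion([('pca', PCA(n_components=2)),
                          ('kbest', SelectKBest(k=1))])
    X_combined = union.fit_transform(X, y)  # 2 PCA dims + 1 selected feature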
A
On Thu, Feb 27, 2014 at 8:33 AM, michael kneier
michael.kne...@gmail.com wrote:
Hi all,
I would like to add a combiner class which would work with