Hi,
This is absolutely great news. Thanks a lot. Please do open a WIP PR. We
(at INRIA) were planning to allocate someone's time this summer to
work on this. So you'll have someone reviewing / advising.
With regard to releasing the GIL, you need to use the 'with nogil'
statement in Cython.
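A minimal sketch of what that looks like (invented helper, typed
memoryviews assumed; not scikit-learn's actual code):

    # distances.pyx -- hypothetical example of releasing the GIL in Cython
    cimport cython
    from libc.math cimport sqrt

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def euclidean_distance(double[::1] a, double[::1] b):
        cdef Py_ssize_t i, n = a.shape[0]
        cdef double acc = 0.0
        with nogil:  # no Python objects may be touched inside this block
            for i in range(n):
                acc += (a[i] - b[i]) * (a[i] - b[i])
        return sqrt(acc)

The body of the 'with nogil' block has to be pure C-level code (cdef
variables, memoryview indexing), which is also what makes it fast.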
What makes you think this is the main bottleneck? While it is not an
insignificant consumer of time, I really doubt this is what's making
scikit-learn's LSH implementation severely underperform with respect to
other implementations.
We need to profile. In order to do that, we need some sensible benchmarks.
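A rough first pass could be as simple as this (synthetic data, arbitrary
parameters, just to find the hot spots):

    import cProfile
    import numpy as np
    from sklearn.neighbors import LSHForest

    X = np.random.rand(10000, 64)
    lshf = LSHForest(n_estimators=10).fit(X)
    cProfile.run('lshf.kneighbors(X[:100], n_neighbors=5)', sort='cumtime')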
That's true, I wasn't aware that score_samples is used already in the
context of density estimation. score_samples would be okay then in my
opinion.
Jan
On 29.07.2015 18:46, Andreas Mueller wrote:
Hm, I'm not entirely sure how score_samples is currently used, but I
think it is the
One approach to fixing the ascending phase would be to ensure that
_find_matching_indices only searches over parts of the tree that have
not yet been explored; currently it searches over the entire index at
each depth.
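Roughly like this (names invented for illustration; assumes fixed-width
integer hashes kept in a sorted array, where the run matching a prefix at a
shallower depth always contains the run from the deeper one, so only the
segments outside the previous run need to be searched):

    import numpy as np

    HASH_BITS = 32  # assumed fixed hash width

    def widen_range(sorted_hashes, query_hash, depth, prev_left, prev_right):
        # Hashes sharing the first `depth` bits form a contiguous run in
        # the sorted array; binary-search only the unexplored segments.
        shift = HASH_BITS - depth
        lo_key = (query_hash >> shift) << shift   # prefix padded with zeros
        hi_key = lo_key + (1 << shift)            # first value past the run
        left = np.searchsorted(sorted_hashes[:prev_left], lo_key, side='left')
        right = prev_right + np.searchsorted(
            sorted_hashes[prev_right:], hi_key, side='left')
        return left, right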
My preferred, but more experimental, solution is to memoize where the
While the Gaussian distribution has a PDF, the Poisson distribution has a
PMF. From Wikipedia (https://en.wikipedia.org/wiki/Probability_mass_function):
A probability mass function differs from a probability density function
(pdf) in that the latter is associated with continuous rather than discrete
random variables.
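In scipy terms, for example:

    from scipy.stats import norm, poisson

    norm.pdf(0.0)           # a density (standard normal at 0), ~0.3989
    poisson.pmf(2, mu=3.0)  # an actual probability: P(X = 2) at rate 3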
(sorry, I should have said the first b layers, not 2**b layers, producing a
memoization of 2**b offsets)
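In other words, something like this (toy data, same sorted-hash
representation as the sketch above, names invented):

    import numpy as np

    HASH_BITS = 32
    sorted_hashes = np.sort(np.random.randint(0, 2**HASH_BITS, size=1000,
                                              dtype=np.uint64))
    b = 4  # memoize the first b layers: 2**b (left, right) offset pairs
    shift = HASH_BITS - b
    offsets = {}
    for prefix in range(2 ** b):
        lo_key, hi_key = prefix << shift, (prefix + 1) << shift
        offsets[prefix] = (np.searchsorted(sorted_hashes, lo_key, side='left'),
                           np.searchsorted(sorted_hashes, hi_key, side='left'))
    # A query then descends starting from offsets[query_hash >> shift]
    # instead of searching the whole index.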
On 30 July 2015 at 22:22, Joel Nothman joel.noth...@gmail.com wrote:
One approach to fixing the ascending phase would ensure that
_find_matching_indices is only searching over parts of the
On Thu, Jul 30, 2015 at 11:38 PM, Andreas Mueller t3k...@gmail.com wrote:
I am mostly concerned about API explosion.
I take your point of PDF vs PMF.
Maybe predict_proba(X, y) is better.
Would you also support predict_proba(X, y) for classifiers (which would be
Yes, it may be far more expensive than k-means. I just used it with Euclidean
distance -- it was for a comparison. I think k-medoids can still be useful for
smaller, maybe noisier datasets, or if you have a distance measure where
calculating averages may not make sense.
On Jul 30, 2015, at
I support the inclusion of Poisson loss, although a quick note on
predict_prob_at:
The output of Poisson regression is a posterior distribution over the rate
parameter in the form of a Gamma distribution. If we assume no uncertainty
at all in the prediction, the posterior predictive distribution is simply a
Poisson with that rate.
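As a sketch (lambda_hat is an assumed model output, not an actual API):

    from scipy.stats import poisson

    lambda_hat = 3.2                       # predicted rate for one sample
    print(poisson.pmf(2, mu=lambda_hat))   # P(y = 2 | lambda_hat)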
I was looking for K-Medoids too a couple of weeks ago and ended up
implementing it myself -- but more quick and dirty. I would really welcome a
nice and efficient implementation being available via scikit-learn, for
example using Voronoi iteration.
Best,
Sebastian
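For reference, a quick sketch of Voronoi iteration on a precomputed distance
matrix (illustrative only, not the implementation I mentioned):

    import numpy as np

    def k_medoids(D, k, max_iter=100, random_state=0):
        # D: (n, n) precomputed distance matrix
        rng = np.random.RandomState(random_state)
        n = D.shape[0]
        medoids = rng.choice(n, size=k, replace=False)
        for _ in range(max_iter):
            labels = np.argmin(D[:, medoids], axis=1)   # assignment step
            new_medoids = medoids.copy()
            for j in range(k):                          # update step
                members = np.flatnonzero(labels == j)
                if members.size:
                    costs = D[np.ix_(members, members)].sum(axis=1)
                    new_medoids[j] = members[np.argmin(costs)]
            if np.array_equal(new_medoids, medoids):
                break
            medoids = new_medoids
        return medoids, labels

Since it only ever touches D, any custom or precomputed metric works.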
On Jul 30, 2015, at 1:51 PM, Timo
Hi all,
I checked and could find no mention of KMedoids in Scikit-Learn. My friend
and I have implemented the algorithm in Python, and we were wondering if it
could be brought into Scikit-Learn. Thoughts?
Cheers,
Timo
PS: I am new to the mailing list, so please guide me in case I am doing
something wrong.
I think KMedoids has come up before.
One issue is that it doesn't really scale to large n_samples, right?
(Voronoi iteration needs the full n_samples x n_samples distance matrix.)
There is an implementation mentioned here:
https://github.com/scikit-learn/scikit-learn/issues/3799
Do you use it because you have a custom distance matrix?
On 07/30/2015 02:27 PM,
My feeling is that it will perform better in cases where there are clusters
of correlated attributes, which is exactly the case where it would make
sense to use a dimensionality reduction technique such as factor analysis
or PCA.
Hastie et al. in their book Elements of Statistical Learning
Hi Sebastian,
LDA is unsupervised. Supervised PCA finds components correlated with the
response variable.
Best regards,
Stelios
2015-07-29 22:55 GMT+01:00 Sebastian Raschka se.rasc...@gmail.com:
Out of curiosity, how does supervised PCA compare to LDA (Linear
Discriminant Analysis); in a
He was asking about Linear Discriminant Analysis, not Latent Dirichlet
Allocation.
Mathieu
On Thu, Jul 30, 2015 at 7:58 PM, Stylianos Kampakis
stylianos.kampa...@gmail.com wrote:
Hi Sebastian,
LDA is unsupervised. Supervised PCA finds components correlated with the
response variable.
Hi,
I've started to look into improving the performance of LSHForest.
As we discussed some time ago (quite a long time, in fact), the main
concern is to Cythonize the distance calculations. Currently, this is done
by iteratively moving over all the query vectors when the `kneighbors` method is
Sorry, my fault.
Supervised PCA is different from Linear Discriminant Analysis. It uses a
heuristic to keep only the variables that show some correlation with the
response when calculating the components. It does not incorporate
explicitly the class separation as an objective. Supervised PCA can be
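The heuristic boils down to something like this (threshold and names made
up; a sketch of the screening idea, not a finished implementation):

    import numpy as np
    from sklearn.decomposition import PCA

    def supervised_pca(X, y, threshold=0.1, n_components=2):
        # Screening step: keep features whose |correlation| with y is large.
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        corr = (Xc * yc[:, None]).mean(axis=0) / (
            Xc.std(axis=0) * yc.std() + 1e-12)
        keep = np.abs(corr) > threshold
        # Then run ordinary PCA on the screened subset only.
        return PCA(n_components=n_components).fit_transform(X[:, keep])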