To summarize: for MDS and SpectralEmbedding, it looks like there is no transform method that will satisfy both

1. fit(X).transform(X) == fit_transform(X)
2. transform(X)[i:i+1] == transform(X[i:i+1])

That's because the current fit_transform doesn't factor nicely into those two steps: its last step returns a subset of eigenvectors of a modified Gram matrix. For PCA, kernel PCA, and LLE, fit_transform is something like: center the data, compute U, S, V = SVD, project the data onto a submatrix of V. The last step is a matrix multiplication, and the last step of the corresponding transform methods is an np.dot(...), so the computation factors nicely.

There could be a transform_batch method for MDS that satisfies 1, and transform could then call transform_batch row-wise to satisfy 2, but no single method will satisfy both. I don't know if there is appetite for that separation and for the modification of the unit tests involved.

Charles
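A quick illustration of the "factors nicely" claim for the PCA case: there, transform is centering followed by a single matrix product, so both properties above hold (illustrative snippet only):

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.RandomState(0).randn(30, 4)
    pca = PCA(n_components=2)
    Z = pca.fit_transform(X)

    # Property 1: fit(X).transform(X) == fit_transform(X)
    assert np.allclose(Z, pca.transform(X))
    # Property 2: transform(X)[i:i+1] == transform(X[i:i+1])
    assert np.allclose(Z[3:4], pca.transform(X[3:4]))
    # Why it factors: transform is one np.dot after centering.
    assert np.allclose(Z, np.dot(X - pca.mean_, pca.components_.T))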
On Tue, Jan 21, 2020 at 9:19 PM Charles Pehlivanian <pehlivanianchar...@gmail.com> wrote:

>> This is what I thought we usually do. It looks like you said we are doing
>> a greedy transform. I'm not sure I follow that. In particular for spectral
>> embedding, for example, there is a pretty way to describe the transform,
>> and that's what we're doing. You could also look at doing transductive
>> learning, but that's not really the standard formulation, is it?
>
> Batch transform becomes greedy if one does:
>
>     for x_i in X:
>         X_new_i = self.transform(x_i)
>
> I said that LLE uses a greedy algorithm. The algorithm implemented is
> pointwise. It may be that that's the only approach (in which case it's not
> greedy), but I don't think so - it looks like all of the spectral
> embedding, LLE, and MDS transforms have batch versions. So I probably
> shouldn't call it greedy. Taking a *true* batch transform and enclosing it
> in a loop like that - I'm calling that greedy. I'm honestly not sure if the
> LLE version qualifies.
>
> Spectral embedding - agreed, the method you refer to is implemented in
> fit_transform(). How do we apply it to oos points?
>
>> Non-distributable, non-subset-invariant, optimal batch transform
>>
>> Can you give an example of that?
>
> Most of the manifold learners can be expressed as solutions to
> eigenvalue/eigenvector problems. For an MDS batch transform, form a new
> constrained double-centered distance matrix and solve a constrained
> least-squares problem that mimics the SVD solution to the eigenvalue
> problem. They're all like this - least-squares estimates for some
> constrained eigenvalue problem. The question is whether you want to solve
> the full problem, or solve for each point, adding one row and optimizing
> each time... that would be subset-invariant, though.
>
> For this offline/batch approach to an oos transform, the only way I see to
> make it pass the tests is to enclose it in a loop as above. That's what I
> see, at least.
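The "constrained double-centered distance matrix" idea quoted above can be made concrete with a Nystrom-style out-of-sample extension for classical (Torgerson) MDS. The sketch below is only an illustration under that assumption: scikit-learn's MDS actually uses the SMACOF solver, and mds_oos_transform is a hypothetical helper, not an existing API. Note that as written it embeds each new row independently of the others, so it is the subset-invariant flavor; the optimal batch version discussed above would instead augment the distance matrix with all the new rows and re-solve the eigenproblem.

    import numpy as np
    from scipy.spatial.distance import cdist

    def mds_oos_transform(X_train, X_new, eigvecs, eigvals):
        """Hypothetical batch out-of-sample embedding for classical MDS.

        eigvecs, eigvals: top eigenpairs of the double-centered training
        Gram matrix B = -0.5 * J @ D2 @ J computed during fit.
        """
        D2_train = cdist(X_train, X_train, "sqeuclidean")
        D2_new = cdist(X_new, X_train, "sqeuclidean")

        # Double-center the new squared distances against training statistics.
        col_mean = D2_train.mean(axis=0)   # mean_j d^2(x_j, x_i)
        grand_mean = D2_train.mean()       # mean_{j,k} d^2(x_j, x_k)
        K_new = -0.5 * (D2_new
                        - D2_new.mean(axis=1, keepdims=True)
                        - col_mean[np.newaxis, :]
                        + grand_mean)

        # Project onto the training eigenvectors, scaled by 1 / sqrt(eigenvalue).
        return K_new @ eigvecs / np.sqrt(eigvals)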
> On Tue, Jan 21, 2020 at 8:35 PM Andreas Mueller <t3k...@gmail.com> wrote:
>
>> On 1/21/20 8:23 PM, Charles Pehlivanian wrote:
>>
>>> I understand - I'm kind of conflating the idea of a data sample with the
>>> test set; my view assumes there is a sample space of samples, which might
>>> require rethinking the cross-validation setup...
>>>
>>> I also think that part of it relies on the notion of an online vs.
>>> offline algorithm. For offline fits, a batch transform
>>> (non-subset-invariant) is preferred. For a transformer that can only be
>>> used in an online sense, or is primarily used that way, keep the
>>> invariant.
>>>
>>> I see 3 options here - all I can say is that I don't vote for the first:
>>>
>>> + No transform method on the manifold learners, so no cross-validation
>>
>> This is what I thought we usually do. It looks like you said we are doing
>> a greedy transform. I'm not sure I follow that. In particular for spectral
>> embedding, for example, there is a pretty way to describe the transform,
>> and that's what we're doing. You could also look at doing transductive
>> learning, but that's not really the standard formulation, is it?
>>
>>> + Pointwise, distributable, subset-invariant, suboptimal greedy transform
>>>
>>> + Non-distributable, non-subset-invariant, optimal batch transform
>>
>> Can you give an example of that?
>>
>>> -Charles
>>>
>>> On Mon., Jan. 20, 2020 at 21:24:52, Joel Nothman
>>> <joel.nothman at gmail.com> wrote:
>>>
>>>> I think allowing subset invariance to not hold is making stronger
>>>> assumptions than we usually do about what it means to have a "test
>>>> set". Having a transformation like this that relies on test-set
>>>> statistics implies that the test set is more than just selected
>>>> samples, but rather that a large collection of samples is available at
>>>> one time, and that it is in some sense sufficient or complete (no more
>>>> samples are available that would give a better fit). So in a predictive
>>>> modelling context you might have to set up your cross-validation splits
>>>> with this in mind.
>>>>
>>>> In terms of API, the subset invariance constraint allows us to assume
>>>> that the transformation can be distributed or parallelized over
>>>> samples. I'm not sure whether we have exploited that assumption within
>>>> scikit-learn or whether related projects do so.
>>>>
>>>> I see the benefit of using such transformations in a prediction
>>>> Pipeline, and really appreciate this challenge to our assumptions of
>>>> what "transform" means.
>>>>
>>>> Joel
>>>>
>>>> On Tue., 21 Jan. 2020, 11:50 am Charles Pehlivanian
>>>> <pehlivaniancharles at gmail.com> wrote:
>>>>
>>>>> Not all data transformers have a transform method. For those that do,
>>>>> subset invariance is assumed, as expressed in
>>>>> check_methods_subset_invariance(). It must be the case that
>>>>> T.transform(X)[i] == T.transform(X[i:i+1]), e.g. This is true for
>>>>> classic projections - PCA, kernel PCA, etc. - but not for some
>>>>> manifold learning transformers - MDS, SpectralEmbedding, etc. For
>>>>> those, an optimal placement of the data in space is a constrained
>>>>> optimization and may take into account the centroid of the dataset,
>>>>> etc.
>>>>>
>>>>> The manifold learners have "batch" oos transform() methods that aren't
>>>>> implemented and wouldn't pass that test. Instead, those that do have
>>>>> one - LocallyLinearEmbedding - use a pointwise version, essentially
>>>>> replacing a batch fit with a suboptimal greedy one:
>>>>>
>>>>>     for i in range(X.shape[0]):
>>>>>         X_new[i] = np.dot(self.embedding_[ind[i]].T, weights[i])
>>>>>
>>>>> Where to implement the batch transform() methods for MDS,
>>>>> SpectralEmbedding, LocallyLinearEmbedding, etc.?
>>>>>
>>>>> Another verb? Both batch and pointwise versions? The latter is easy to
>>>>> implement once the batch version exists. Relax the test conditions?
>>>>> transform() is necessary for oos testing, so necessary for cross
>>>>> validation. The batch versions should be preferred, although as it
>>>>> stands, the pointwise versions are.
>>>>>
>>>>> Thanks
>>>>> Charles Pehlivanian
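For reference, the check_methods_subset_invariance() test referenced in the original post asserts, roughly, the following for transform (a paraphrase of the property, not the actual estimator-checks code):

    import numpy as np

    def assert_transform_subset_invariant(transformer, X):
        """Each row embedded alone must match the corresponding row of the batch result."""
        full = transformer.transform(X)
        for i in range(X.shape[0]):
            np.testing.assert_allclose(full[i:i + 1],
                                       transformer.transform(X[i:i + 1]))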
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn