Hi Jason,

Memory is a problem in our implementation of t-SNE. I sent a detailed list of the required memory to this mailing list a few months ago. You can find it here:
http://sourceforge.net/p/scikit-learn/mailman/message/33090573/

The number of features is irrelevant; only the number of samples matters. You have too many samples because the algorithm requires O(n^2) space (in your case probably about 30 GB). I would not use the original t-SNE algorithm for this dataset anyway, because its time complexity is O(n^2) as well, which means that you would have to wait days or weeks for the result.

There is a new pull request that implements Barnes-Hut t-SNE here:
https://github.com/scikit-learn/scikit-learn/pull/4025
The advantage of Barnes-Hut t-SNE over the original t-SNE is its time complexity of O(n log n). However, at the moment the full distance matrix is still computed, so it would not fix your original problem, but I think the memory problem should be solved soon.

In your case, you could take half of the dataset. The number of features is not critical at all: you can use all 93 features without any dimensionality reduction.

Best regards,
Alexander

On 2015-04-18 01:48, Jason Wolosonovich wrote:
> Hello All,
>
> My dataset has 93 features and just under 62,000 observations (61,878
> to be exact). I'm running out of memory right after the mean sigma
> value is computed/displayed. I've tried using dimensionality reduction
> via TruncatedSVD with n_components set at different levels (78, 50 and
> 2 respectively) prior to sending the data to TSNE, but I still run out
> of memory. For TSNE, n_components=2 and perplexity=40 (I've also tried
> 20). I've got 24GB of RAM on my 64-bit Windows 7 machine. Should I try
> a subsample of the dataset and if so, does anyone have a
> recommendation on the size? Thanks!
>
> -Jason
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
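To make the memory numbers in the reply concrete, here is a rough sketch in Python/NumPy. It assumes each pairwise entry is stored as a float64 (8 bytes) and counts only ONE dense n x n matrix, so it is a lower bound: the actual t-SNE code may keep several such matrices alive at once. Sizes are in decimal GB (10**9 bytes). The helpers dense_pairwise_gb and random_subsample are hypothetical illustrations, not part of scikit-learn; a uniform random subsample is just one simple way to "take half of the dataset".

```python
import numpy as np

# Assumption: each pairwise entry is stored as float64 (8 bytes).
BYTES_PER_ENTRY = 8


def dense_pairwise_gb(n_samples, bytes_per_entry=BYTES_PER_ENTRY):
    """Size in GB (10**9 bytes) of ONE dense n_samples x n_samples matrix.

    Hypothetical helper. The t-SNE code may keep several such matrices
    alive at once, so treat the result as a lower bound on peak memory.
    """
    return n_samples * n_samples * bytes_per_entry / 1e9


def random_subsample(X, n_keep, seed=0):
    """Hypothetical helper: return n_keep rows of X, drawn uniformly
    at random without replacement."""
    rng = np.random.RandomState(seed)
    idx = rng.choice(X.shape[0], size=n_keep, replace=False)
    return X[idx]


n_full = 61878  # number of samples in the dataset from the question
for n in (n_full, n_full // 2):
    print(n, round(dense_pairwise_gb(n), 1))
# prints:
# 61878 30.6
# 30939 7.7
```

Halving the number of samples cuts the size of each n x n matrix by a factor of four. The subsample can then be passed to sklearn.manifold.TSNE(n_components=2, perplexity=40).fit_transform(...) as before.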