Hi Jason,

Memory is a known problem in our implementation of t-SNE. I sent a 
detailed list of the memory requirements to this mailing list some 
months ago. You can find it here:

http://sourceforge.net/p/scikit-learn/mailman/message/33090573/

The number of features is irrelevant; only the number of samples 
matters. You have too many samples because the algorithm requires 
O(n^2) space (in your case probably about 30 GB). I would not use the 
original t-SNE algorithm for this dataset anyway because the time 
complexity is O(n^2) as well, which means that you would have to wait 
some days or weeks for the result.
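
As a rough check of the 30 GB figure, here is a minimal sketch that 
computes the size of one dense n x n matrix. It assumes 8-byte (float64) 
entries and counts only a single matrix; the helper name 
dense_matrix_bytes is made up for this illustration and is not part of 
scikit-learn.

```python
def dense_matrix_bytes(n_samples, bytes_per_value=8):
    """Bytes needed for one dense n_samples x n_samples matrix.

    Assumption: 8 bytes per entry (float64). The real implementation may
    hold several such matrices at the same time.
    """
    return n_samples * n_samples * bytes_per_value


n = 61878  # number of observations in the original question

full = dense_matrix_bytes(n)
half = dense_matrix_bytes(n // 2)

print(f"full dataset: {full / 1e9:.1f} GB per n x n matrix")
print(f"half dataset: {half / 1e9:.1f} GB per n x n matrix")
```

Because the cost grows with n^2, halving the number of samples cuts the 
size of each such matrix to roughly a quarter.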

There is a new pull request that implements Barnes-Hut t-SNE here:

https://github.com/scikit-learn/scikit-learn/pull/4025

The advantage of Barnes-Hut t-SNE over the original t-SNE is that its 
time complexity is O(n log n). However, at the moment the full distance 
matrix is still computed, so that would not fix your original problem, 
but I think the memory problem should be solved soon.

In your case you could take half of the dataset. The number of features 
is not critical at all. You can take all 93 features without any 
dimensionality reduction.
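
One way to pick the subsample could look like the sketch below. The 
helper names (max_samples, subsample_indices) and the assumption of 
three n x n float64 arrays alive at once are purely illustrative; the 
actual number of temporary matrices depends on the implementation.

```python
import math
import random


def max_samples(budget_bytes, n_matrices=1, bytes_per_value=8):
    """Largest n such that n_matrices dense n x n arrays fit in budget_bytes."""
    return math.isqrt(budget_bytes // (n_matrices * bytes_per_value))


def subsample_indices(n_total, n_keep, seed=0):
    """Draw n_keep distinct row indices uniformly at random, in sorted order."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_total), n_keep))


n_total = 61878  # observations in the original question

# Hypothetical budget: 24 GB of RAM and three n x n arrays held at once.
n_fit = max_samples(24 * 10**9, n_matrices=3)

# Take half of the dataset, as suggested above.
idx = subsample_indices(n_total, n_total // 2)

print(f"samples that fit under the assumed budget: {n_fit}")
print(f"subsample size: {len(idx)}")
# With a NumPy array X of shape (n_total, 93): X_half = X[idx]
```

The fixed seed only makes the subsample reproducible; all 93 features 
are kept, since only the number of rows affects the memory use.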

Best regards,

Alexander

On 2015-04-18 01:48, Jason Wolosonovich wrote:
> Hello All,
> 
> My dataset has 93 features and just under 62,000 observations (61,878
> to be exact). I'm running out of memory right after the mean sigma
> value is computed/displayed. I've tried using dimensionality reduction
> via TruncatedSVD with n_components set at different levels (78, 50 and
> 2 respectively) prior to sending the data to TSNE but I still run out
> of memory. For TSNE, n_components=2 and perplexity=40 (I've also tried
> 20). I've got 24GB of RAM on my 64-bit windows 7 machine. Should I try
> a subsample of the dataset and if so, does anyone have a
> recommendation on the size? Thanks!
> 
> -Jason
> ------------------------------------------------------------------------------
> BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT
> Develop your own process in accordance with the BPMN 2 standard
> Learn Process modeling best practices with Bonita BPM through live 
> exercises
> http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual- 
> event?utm_
> source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
> 
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
