Hi there,

This question relates to Pandas and visualisation as well as sklearn, so
apologies if I am asking on the wrong list.

I have a dataset (imported as a Pandas data frame) that has a reasonably
large number of columns (~15) and I want to use PCA on the data.

The first column of the data frame is a string that describes each row,
e.g.:

  Sample      m1     m2     ...
  -----------------------------------
  sample1     0.1    0.2    ...
  ...

The "fit" function in sklearn.decomposition.PCA does not expect columns to
contain strings, so to perform PCA I'm removing the first column of the
data frame, like this:

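  from sklearn.decomposition import PCA

  # Drop the string column; PCA needs purely numeric input.
  df_no_strings = df.drop("Sample", axis=1)
  pca = PCA(n_components=4)
  # fit_transform fits the model and projects the rows in one step.
  df_reduced = pca.fit_transform(df_no_strings)
  print("Explained variance:", pca.explained_variance_)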


I want to plot the results of performing PCA as a scatter plot, similar
to Figure 3 on Page 10 of this document:

http://www.dacapobench.org/dacapo-TR-CS-06-01.pdf

Is there an easy way to do this, given that I have lost the first column of
the data frame?
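
One possible approach, sketched here rather than offered as a definitive
answer: the array returned by fit_transform keeps its rows in the same order
as the input data frame, so the "Sample" column can be set aside before
fitting and used afterwards to label the points. The data below is made up
purely for illustration, and the column names m1..m5, the output filename and
the choice of the first two components as axes are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the figure is written to a file
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Made-up stand-in for the real data: one string column plus numeric columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((6, 5)), columns=["m1", "m2", "m3", "m4", "m5"])
df.insert(0, "Sample", ["sample%d" % (i + 1) for i in range(len(df))])

# Keep the labels aside before dropping the string column.
labels = df["Sample"]
features = df.drop("Sample", axis=1)

pca = PCA(n_components=4)
coords = pca.fit_transform(features)  # row i of coords corresponds to row i of df

# Scatter the first two principal components and label each point.
fig, ax = plt.subplots()
ax.scatter(coords[:, 0], coords[:, 1])
for name, x, y in zip(labels, coords[:, 0], coords[:, 1]):
    ax.annotate(name, (x, y), textcoords="offset points", xytext=(4, 4))
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
fig.savefig("pca_scatter.png")  # hypothetical output filename
```

If a data frame is more convenient than a bare array, the coordinates could
also be wrapped with pd.DataFrame(coords, index=labels), which keeps the
sample names attached to each row.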

Many thanks,

Sarah

-- 
Dr. Sarah Mount, Senior Lecturer, University of Wolverhampton
website:  http://www.snim2.org/
twitter: @snim2
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general