Looking at your data there are several issues. 1. Tank is an integer, but it sounds like you intend to use it as a categorical measure. If so that it should be a factor, but factors cannot be used in pca. Is Tank 10 10 times more of something than Tank 1?

## Advertising

2. Date is a factor. That means you are not measuring time, just the fact that 2 rows are the same time or different time. Factors cannot be used in pca. 3. Treatment is a factor, but factors cannot be used in pca. 4. Your log transformed data has many 0's and no negative values. Did you add 1 to each value before taking logarithms? First line of your code after reading the data: > meso.pca <- prcomp(mesocleaned, center=TRUE, scale.=TRUE) Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric > scale. = TRUE) ------------------------------------- David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -----Original Message----- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Sarah Stinson Sent: Wednesday, September 21, 2016 10:25 AM To: r-help@r-project.org Subject: Re: [R] Help with PCA data file prep and R code Hello DRUGs, I'm new to R and would appreciate some expert advice on prepping files for, and running, PCA... My data set consists of aquatic invertebrate and zooplankton count data and physicochemical measurements from an ecotoxicology study. Four chemical treatments were applied to mesocosm tanks, 4 replicates per treatment (16 tanks total), then data were collected weekly over a 3 month period. I cleaned the data in excel by removing columns with all zero values, and all rows with NA values. All zooplankton values were volume normalized, then log normalized. All other data was log normalized in excel prior to analysis in R. All vectorss are numeric. I've attached the .txt file to this email rather that using dput(dataframe). My questions are: 1. Did I do the cleaning step appropriately? I know that there are ways to run PCA's using data that contain NA values (pcaMethods), but wasn't able to get the code to work... (I understand that this isn't strictly an R question, but any help would be appreciated.) 2. Does my code look correct for the PCA and visualization (see below)? Thanks in advance, Sarah #read data mesocleaned <- read.csv("MesoCleanedforPCA.9.16.16.csv") #run PCA meso.pca <- prcomp(mesocleaned, center = TRUE, scale. = TRUE) # print method print(meso.pca) #compute standard deviation of each principal component std_dev <- meso.pca$sdev #compute variance pr_var <- std_dev^2 #check variance of first 10 components pr_var[1:10] #proportion of variance explained prop_varex <- pr_var/sum(pr_var) prop_varex[1:20] #The first principal component explains 12.7% of the variance #The second explains 8.1% #visualize biplot(meso.pca) #for visualization, make Treatment vector a factor instead of numeric meso.treatment <- as.factor(mesocleaned[, 3]) #ggbiplot to visualize by Treatment group #reference: https://www.r-bloggers.com/computing-and-visualizing-pca-in-r/ library(devtools) install_github("ggbiplot", "vqv") library(ggbiplot) print(ggbiplot(meso.pca, obs.scale = 1, var.scale = 1, groups = meso.treatment, ellipse = TRUE, circle = TRUE)) g <- ggbiplot(meso.pca, obs.scale = 1, var.scale = 1, groups = meso.treatment, ellipse = TRUE, circle = TRUE) g <- g + scale_color_brewer(name = deparse(substitute(Treatments)), palette = 'Dark2') #must change meso.treatment to a factor for this to work g <- g + theme(legend.direction = 'horizontal', legend.position = 'top') print(g) #Circle plot #plot each variables coefficients inside a unit circle to get insight on a possible interpretation for PCs. #reference: https://www.r-bloggers.com/computing-and-visualizing-pca-in-r/ theta <- seq(0,2*pi,length.out = 100) circle <- data.frame(x = cos(theta), y = sin(theta)) p <- ggplot(circle,aes(x,y)) + geom_path() loadings <- data.frame(meso.pca$rotation, .names = row.names(meso.pca$rotation)) p + geom_text(data=loadings, mapping=aes(x = PC1, y = PC2, label = .names, colour = .names)) + coord_fixed(ratio=1) + labs(x = "PC1", y = "PC2") On Tue, Sep 20, 2016 at 10:28 PM, Sarah Stinson <sastin...@ucdavis.edu> wrote: > Hello DRUGs, > I'm new to R and would appreciate some expert advice on prepping files > for, and running, PCA... > > My data set consists of aquatic invertebrate and zooplankton count data > and physicochemical measurements from an ecotoxicology study. Four chemical > treatments were applied to mesocosm tanks, 4 replicates per treatment (16 > tanks total), then data were collected weekly over a 3 month period. > > I cleaned the data in excel by removing columns with all zero values, and > all rows with NA values. > All zooplankton values were volume normalized, then log normalized. All > other data was log normalized in excel prior to analysis in R. All vectorss > are numeric. I've attached the .csv file to this email rather that using > dput(dataframe). I hope that's acceptable. > > My questions are: > > 1. Did I do the cleaning step appropriately? I know that there are ways to > run PCA's using data that contain NA values (pcaMethods), but wasn't able > to get the code to work... > (I understand that this isn't strictly an R question, but any help would > be appreciated.) > 2. Does my code look correct for the PCA and visualization (see below)? > > Thanks in advance, > Sarah > > #read data > mesocleaned <- read.csv("MesoCleanedforPCA.9.16.16.csv") > > #run PCA > meso.pca <- prcomp(mesocleaned, > center = TRUE, > scale. = TRUE) > > # print method > print(meso.pca) > > #compute standard deviation of each principal component > std_dev <- meso.pca$sdev > > #compute variance > pr_var <- std_dev^2 > > #check variance of first 10 components > pr_var[1:10] > > #proportion of variance explained > prop_varex <- pr_var/sum(pr_var) > prop_varex[1:20] > > #The first principal component explains 12.7% of the variance > #The second explains 8.1% > > #visualize > biplot(meso.pca) > > #for visualization, make Treatment vector a factor instead of numeric > meso.treatment <- as.factor(mesocleaned[, 3]) > > #ggbiplot to visualize by Treatment group > #reference: https://www.r-bloggers.com/computing-and-visualizing-pca-in-r/ > > library(devtools) > install_github("ggbiplot", "vqv") > library(ggbiplot) > > print(ggbiplot(meso.pca, obs.scale = 1, var.scale = 1, groups = > meso.treatment, ellipse = TRUE, circle = TRUE)) > g <- ggbiplot(meso.pca, obs.scale = 1, var.scale = 1, > groups = meso.treatment, ellipse = TRUE, > circle = TRUE) > g <- g + scale_color_brewer(name = deparse(substitute(Treatments)), > palette = 'Dark2') #must change meso.treatment to a factor for this to work > g <- g + theme(legend.direction = 'horizontal', > legend.position = 'top') > print(g) > > #Circle plot > #plot each variables coefficients inside a unit circle to get insight on a > possible interpretation for PCs. > #reference: https://www.r-bloggers.com/computing-and-visualizing-pca-in-r/ > > theta <- seq(0,2*pi,length.out = 100) > circle <- data.frame(x = cos(theta), y = sin(theta)) > p <- ggplot(circle,aes(x,y)) + geom_path() > > loadings <- data.frame(meso.pca$rotation, > .names = row.names(meso.pca$rotation)) > p + geom_text(data=loadings, > mapping=aes(x = PC1, y = PC2, label = .names, colour = > .names)) + > coord_fixed(ratio=1) + > labs(x = "PC1", y = "PC2") > > ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.