Re: [R] Subsetting multiple rows of a data frame at once
Hi Anika,

?merge() is a better solution. To get the row.names intact, you could do:

carbon.fit <- within(carbon.fit, {x <- round(x,10); y <- round(y,10)})  # Using Bill's solution
dat1 <- data.frame(x=round(xt,10), y=round(yt,10))
carbon.fit1 <- data.frame(carbon.fit, rNames=row.names(carbon.fit), stringsAsFactors=FALSE)  # changed here
res1 <- merge(dat1, carbon.fit1, by=c("x","y"))
row.names(res1) <- res1[,3]
res1 <- res1[,-3]

A.K.

- Original Message -
From: William Dunlap wdun...@tibco.com
To: arun smartpink...@yahoo.com; Shaun ♥ Anika pro_pa...@hotmail.com
Cc: R help r-help@r-project.org
Sent: Thursday, July 4, 2013 8:02 PM
Subject: RE: [R] Subsetting multiple rows of a data frame at once

You are running into the problem that two different computational methods that give the same result when applied to real numbers often give different results when applied to 64-bit floating point numbers. (In your case you expect seq(0,5,.01) to contain, e.g., the floating point number generated by parsing the string "3.05".) Hence x==y is not true when you expect it to be.

Here is where your 18 came from:

R> table(xt %in% carbon.fit$x, yt %in% carbon.fit$y)

        FALSE TRUE
  FALSE     1    6
  TRUE      3   18

Round your numbers to the nearest 10^-10 and you get

R> table(round(xt,10) %in% round(carbon.fit$x,10), round(yt,10) %in% round(carbon.fit$y,10))

       TRUE
  TRUE   28

By the way, you may prefer using the merge() function rather than the do.call(rbind,lapply(...)) business. I think the following call to merge will do about what you want (the row names differ - if they are important it is possible to get them with some minor trickery):

merge(data.frame(x=xt,y=yt), carbon.fit)

(You still want to round your numbers as before.)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
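The row-name bookkeeping in the merge() approach can be tried on a much smaller grid. The following is only a minimal sketch, assuming a hypothetical 5 x 5 grid with step 0.25 (multiples of 0.25 are exact in binary floating point, so no rounding is needed here); `grid` and `pts` are illustrative names, not objects from the thread.

```r
# Hypothetical small grid; row names "1".."25" are created automatically.
grid <- expand.grid(x = seq(0, 1, 0.25), y = seq(0, 1, 0.25))

# Carry the original row names along as an ordinary column,
# because merge() does not preserve row names.
grid$rNames <- row.names(grid)

# Two (x, y) pairs to look up (illustrative values).
pts <- data.frame(x = c(0.25, 0.75), y = c(0.50, 0.00))

res <- merge(pts, grid, by = c("x", "y"))

# Restore the original row names, then drop the helper column.
# Referring to the column by name avoids depending on its position.
row.names(res) <- res$rNames
res$rNames <- NULL
res
```

Referring to `res$rNames` by name rather than by position keeps the sketch valid if the grid carries extra columns such as a fitted height.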
Re: [R] Subsetting multiple rows of a data frame at once
Hi,

Possibly, FAQ 7.31.

Using the same example:

set.seed(24)
df <- data.frame(x=sample(seq(0.25,4.25,by=.05),1e5,replace=TRUE),
                 y=sample(seq(0.10,1.05,by=.05),1e5,replace=TRUE),
                 z=rnorm(1e5))
dfOld <- df
df[,1:2] <- lapply(df[,1:2], function(x) sprintf("%.2f",x))
x1 <- c(1.05,2.85,3.40,4.25,0.25)
y1 <- c(0.25,0.10,0.90,0.25,1.05)
x1New <- sprintf("%.2f",x1)
y1New <- sprintf("%.2f",y1)
res1 <- do.call(rbind, lapply(seq_along(x1New), function(i) subset(df, x==x1New[i] & y==y1New[i])))
res <- do.call(rbind, lapply(seq_along(x1), function(i) subset(dfOld, x==x1[i] & y==y1[i])))
dim(res1)
#[1] 318   3
dim(res)
#[1] 250   3
res1[,1:2] <- lapply(res1[,1:2], as.numeric)
str(res1)
#'data.frame':  318 obs. of  3 variables:
# $ x: num  1.05 1.05 1.05 1.05 1.05 1.05 1.05 1.05 1.05 1.05 ...
# $ y: num  0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 ...
# $ z: num  0.787 -1.568 -1.626 -0.221 -0.7 ...

A.K.

Never mind, the error was on my part; I got it going. I have another issue: it leaves some values out. I've separately searched the df and they're definitely in there, so is there some sort of exclusion rule? About 8 of the 28 are missing; the first missing row is (3.05, 1.70). I looked up the documentation for subset but I can't see why it would skip any. Thanks

- Original Message -
From: arun smartpink...@yahoo.com
To: R help r-help@r-project.org
Cc:
Sent: Wednesday, July 3, 2013 7:37 AM
Subject: Re: Subsetting multiple rows of a data frame at once

Hi,

Try this:

set.seed(24)
df <- data.frame(x=sample(seq(0.25,4.25,by=.05),1e5,replace=TRUE),
                 y=sample(seq(0.10,1.05,by=.05),1e5,replace=TRUE),
                 z=rnorm(1e5))
# Used a shorter vector
x1 <- c(1.05,2.85,3.40,4.25,0.25)
y1 <- c(0.25,0.10,0.90,0.25,1.05)
res <- do.call(rbind, lapply(seq_along(x1), function(i) subset(df, x==x1[i] & y==y1[i])))
head(res,2)
#        x    y          z
#466  1.05 0.25  0.7865224
#4119 1.05 0.25 -1.5679096
tail(res,2)
#         x    y          z
#98120 0.25 1.05 -2.1239596
#98178 0.25 1.05  0.3321464

A.K.

Hi Everyone,

First time poster, so if there are any posting rules I should know about, feel free to advise...

I've got a data frame of 250 000 rows in columns of x, y and z. I need to extract 20-30 rows from the data frame with specific x and y values, so that I can find the z value that corresponds. There is no repeated data. (It's actually 250 000 squares in a 5x5m grid.)

To find them individually I can use subset successfully:

result <- subset(df, x==1.05 & y==0.25)

gives me the row in the data frame with that x and y value.

So if I have

x = 1.05 2.85 3.40 4.25 0.25 3.05 3.70 0.20 0.30 0.70 1.05 1.20 1.40 1.90 2.70 3.25 3.55 4.60 2.05 2.15 3.70 4.85 4.90 1.60 2.45 3.20 3.90 4.45

and

y = 0.25 0.10 0.90 0.25 1.05 1.70 2.05 2.90 2.35 2.60 2.55 2.15 2.75 2.05 2.70 2.25 2.55 2.05 3.65 3.05 3.00 3.50 3.75 4.85 4.50 4.50 3.35 4.90

then how can I retrieve the rows for all those values at once? If I name x=xt and y=yt and then

result <- subset(df, x==xt & y==yt)

then I get

result
[1] x      y      Height
<0 rows> (or 0-length row.names)

I don't understand why zero rows are selected. Obviously I'm applying the vectors inappropriately, but I can't seem to find anything on this method of subsetting online.

Thanks for any replies!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
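One reason a call such as subset(df, x==xt & y==yt) selects the wrong rows is that `==` compares element by element and recycles the shorter vector along the longer one, so row i is tested only against one recycled (xt, yt) pair rather than against all of them. A minimal sketch, assuming a hypothetical integer grid so that floating-point issues play no part (all names and values below are illustrative):

```r
# Hypothetical 4 x 3 integer grid: 12 rows, x varies fastest.
df <- expand.grid(x = 1:4, y = 1:3)

# Two target pairs: (2, 3) and (4, 1).
xt <- c(2L, 4L)
yt <- c(3L, 1L)

# Element-wise comparison: xt and yt are recycled along the 12 rows,
# so odd rows are tested against (2, 3) and even rows against (4, 1).
recycled <- subset(df, x == xt & y == yt)

# Pairwise lookup: merge() matches each (x, y) pair against every row.
wanted <- merge(data.frame(x = xt, y = yt), df)

nrow(recycled)  # only rows that happen to line up with the recycling
nrow(wanted)    # one row per requested pair
```

Here the recycled comparison finds (4, 1) only because that row happens to sit at an even position, and it misses (2, 3) entirely; on a large grid with non-aligned targets the same mechanism can easily produce zero rows.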
Re: [R] Subsetting multiple rows of a data frame at once
Hi,

carbon.fit = expand.grid(list(x=seq(0, 5, 0.01), y=seq(0, 5, 0.01)))
dim(carbon.fit)
#[1] 251001      2
xtNew <- sprintf("%.2f",xt)
ytNew <- sprintf("%.2f",yt)
carbon.fit[] <- lapply(carbon.fit, function(x) sprintf("%.2f",x))
res <- do.call(rbind, lapply(seq_along(xtNew), function(i) subset(carbon.fit, x==xtNew[i] & y==ytNew[i])))
nrow(res)
#[1] 28
res
#          x    y
#12631  1.05 0.25
#5296   2.85 0.10
#45431  3.40 0.90
#12951  4.25 0.25
#52631  0.25 1.05
#85476  3.05 1.70
#103076 3.70 2.05
#145311 0.20 2.90
#117766 0.30 2.35
#130331 0.70 2.60
#127861 1.05 2.55
#107836 1.20 2.15
#137916 1.40 2.75
#102896 1.90 2.05
#135541 2.70 2.70
#113051 3.25 2.25
#128111 3.55 2.55
#103166 4.60 2.05
#183071 2.05 3.65
#153021 2.15 3.05
#150671 3.70 3.00
#175836 4.85 3.50
#188366 4.90 3.75
#243146 1.60 4.85
#225696 2.45 4.50
#225771 3.20 4.50
#168226 3.90 3.35
#245936 4.45 4.90

A.K.

From: Shaun ♥ Anika pro_pa...@hotmail.com
To: smartpink...@yahoo.com smartpink...@yahoo.com
Sent: Thursday, July 4, 2013 12:08 AM
Subject: RE: Subsetting multiple rows of a data frame at once

Hi There,

I can give you the data needed to perform this task...

library(akima)
library(fields)
xt <- c(1.05, 2.85, 3.40, 4.25, 0.25, 3.05, 3.70, 0.20, 0.30, 0.70, 1.05, 1.20, 1.40, 1.90, 2.70, 3.25, 3.55, 4.60, 2.05, 2.15, 3.70, 4.85, 4.90, 1.60, 2.45, 3.20, 3.90, 4.45)
yt <- c(0.25, 0.10, 0.90, 0.25, 1.05, 1.70, 2.05, 2.90, 2.35, 2.60, 2.55, 2.15, 2.75, 2.05, 2.70, 2.25, 2.55, 2.05, 3.65, 3.05, 3.00, 3.50, 3.75, 4.85, 4.50, 4.50, 3.35, 4.90)
xs <- c(0.45, 1.05, 2.75, 3.30, 4.95, 0.40, 1.05, 2.30, 3.45, 4.60, 0.05, 1.95, 2.95, 3.70, 4.55, 0.75, 1.60, 2.10, 3.60, 4.90, 0.05, 1.35, 2.60, 3.40, 4.25)
ys <- c(0.45, 0.95, 0.75, 0.95, 0.10, 1.90, 1.45, 1.25, 1.45, 1.05, 2.85, 2.60, 2.05, 2.60, 2.55, 3.75, 3.30, 3.95, 3.45, 3.70, 4.95, 4.35, 4.55, 4.40, 4.95)
carbon <- c(1.43, 1.82, 1.40, 1.43, 1.96, 1.61, 1.91, 1.53, 1.17, 1.83, 2.43, 2.02, 1.66, 2.45, 2.46, 1.39, 1.10, 1.38, 1.91, 2.13, 1.88, 1.26, 2.15, 1.89, 1.69)
carbon.df = data.frame(x=xs, y=ys, z=carbon)
carbon.loess = loess(z~x*y, data=carbon.df, degree=2)
carbon.fit = expand.grid(list(x=seq(0, 5, 0.01), y=seq(0, 5, 0.01)))
z = predict(carbon.loess, newdata=carbon.fit)
carbon.fit$Height = as.numeric(z)
image.plot(seq(0,5,0.01), seq(0,5,0.01), z, xlab="", ylab="", main="Carbon")
trees <- do.call(rbind, lapply(seq_along(xt), function(i) subset(carbon.fit, x==xt[i] & y==yt[i])))
## xt is 28 integers long and when I run the above code it only returns the values of 18 out of the 28 (xt,yt) pairs that I want.

Thanks for your help!!
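The string-formatting idea above can also be used without overwriting the numeric columns or looping over the pairs. The following is a minimal sketch, assuming all coordinates are meant to be exact to two decimal places; `key()` is a hypothetical helper, and only three of the 28 pairs are used to keep it short.

```r
# Same grid as in the thread: 501 x 501 points with step 0.01.
carbon.fit <- expand.grid(list(x = seq(0, 5, 0.01), y = seq(0, 5, 0.01)))

# Three of the requested pairs (taken from xt and yt in the thread).
xt <- c(1.05, 3.05, 3.70)
yt <- c(0.25, 1.70, 2.05)

# Hypothetical helper: build one text key per (x, y) pair.
# Formatting to two decimals hides the tiny floating-point differences.
key <- function(x, y) paste(sprintf("%.2f", x), sprintf("%.2f", y))

# match() returns, for each requested pair, the first matching grid row.
idx <- match(key(xt, yt), key(carbon.fit$x, carbon.fit$y))

# The grid columns stay numeric; only the temporary keys are text.
res <- carbon.fit[idx, ]
res
```

Because match() is vectorised, this avoids the do.call(rbind, lapply(...)) loop, and any pair that is not on the grid shows up as NA in `idx` instead of silently disappearing.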
Re: [R] Subsetting multiple rows of a data frame at once
> xt <- c(1.05, 2.85, 3.40, 4.25, 0.25, 3.05, 3.70, 0.20, 0.30, 0.70, 1.05, 1.20, 1.40, 1.90, 2.70, 3.25, 3.55, 4.60, 2.05, 2.15, 3.70, 4.85, 4.90, 1.60, 2.45, 3.20, 3.90, 4.45)
> yt <- c(0.25, 0.10, 0.90, 0.25, 1.05, 1.70, 2.05, 2.90, 2.35, 2.60, 2.55, 2.15, 2.75, 2.05, 2.70, 2.25, 2.55, 2.05, 3.65, 3.05, 3.00, 3.50, 3.75, 4.85, 4.50, 4.50, 3.35, 4.90)
> carbon.fit = expand.grid(list(x=seq(0, 5, 0.01), y=seq(0, 5, 0.01)))
> trees <- do.call(rbind, lapply(seq_along(xt), function(i) subset(carbon.fit, x==xt[i] & y==yt[i])))
> ## xt is 28 integers long and when i run the above code it only returns the values of 18 out of the 28 (xt,yt) pairs that i want.

You are running into the problem that two different computational methods that give the same result when applied to real numbers often give different results when applied to 64-bit floating point numbers. (In your case you expect seq(0,5,.01) to contain, e.g., the floating point number generated by parsing the string "3.05".) Hence x==y is not true when you expect it to be.

Here is where your 18 came from:

R> table(xt %in% carbon.fit$x, yt %in% carbon.fit$y)

        FALSE TRUE
  FALSE     1    6
  TRUE      3   18

Round your numbers to the nearest 10^-10 and you get

R> table(round(xt,10) %in% round(carbon.fit$x,10), round(yt,10) %in% round(carbon.fit$y,10))

       TRUE
  TRUE   28

By the way, you may prefer using the merge() function rather than the do.call(rbind,lapply(...)) business. I think the following call to merge will do about what you want (the row names differ - if they are important it is possible to get them with some minor trickery):

merge(data.frame(x=xt,y=yt), carbon.fit)

(You still want to round your numbers as before.)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of arun
Sent: Wednesday, July 03, 2013 10:15 PM
To: Shaun ♥ Anika
Cc: R help
Subject: Re: [R] Subsetting multiple rows of a data frame at once

Hi,

carbon.fit = expand.grid(list(x=seq(0, 5, 0.01), y=seq(0, 5, 0.01)))
dim(carbon.fit)
#[1] 251001      2
xtNew <- sprintf("%.2f",xt)
ytNew <- sprintf("%.2f",yt)
carbon.fit[] <- lapply(carbon.fit, function(x) sprintf("%.2f",x))
res <- do.call(rbind, lapply(seq_along(xtNew), function(i) subset(carbon.fit, x==xtNew[i] & y==ytNew[i])))
nrow(res)
#[1] 28

A.K.
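The floating-point effect described above can be reproduced with a well-known case, followed by one possible workaround. This is a minimal sketch, not the method used in the thread: it assumes the coordinates are meant to be exact multiples of 0.01, and `to_key()` is a hypothetical helper that converts them to integer hundredths before comparing.

```r
# Two computations that agree for real numbers but not for doubles.
a <- 0.1 + 0.2
b <- 0.3
exact_equal <- a == b                    # FALSE
close_enough <- isTRUE(all.equal(a, b))  # comparison within a tolerance

# Hypothetical helper: express a coordinate as integer hundredths.
# round() absorbs the tiny representation error before as.integer().
to_key <- function(v) as.integer(round(v * 100))

grid_x <- seq(0, 5, 0.01)
xt <- c(1.05, 2.85, 3.05, 4.45)  # a few of the values from the thread

# Integer keys compare exactly, so every value is found on the grid.
all_found <- all(to_key(xt) %in% to_key(grid_x))
all_found
```

Comparing integer keys has the same intent as rounding both sides to 10 decimal places, but the comparison itself is then exact rather than dependent on how the rounded doubles are represented.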
Re: [R] Subsetting multiple rows of a data frame at once
Hi,

Try this:

set.seed(24)
df <- data.frame(x=sample(seq(0.25,4.25,by=.05),1e5,replace=TRUE),
                 y=sample(seq(0.10,1.05,by=.05),1e5,replace=TRUE),
                 z=rnorm(1e5))
# Used a shorter vector
x1 <- c(1.05,2.85,3.40,4.25,0.25)
y1 <- c(0.25,0.10,0.90,0.25,1.05)
res <- do.call(rbind, lapply(seq_along(x1), function(i) subset(df, x==x1[i] & y==y1[i])))
head(res,2)
#        x    y          z
#466  1.05 0.25  0.7865224
#4119 1.05 0.25 -1.5679096
tail(res,2)
#         x    y          z
#98120 0.25 1.05 -2.1239596
#98178 0.25 1.05  0.3321464

A.K.
[R] subsetting by rows
Dear all,

I would like to know how to subset a data.frame by rows.

Example:

     Probesets 34884 34888 34892
1     19676_at     A     A     A
2     10001_at     P     P     P
3     10002_at     A     A     A
4     10003_at     A     A     A
5 100048912_at     P     A     A

For this data.frame I want to retrieve only the rows where at least one 'P' is found. So in this example it would be rows 2 and 5.

Best regards and thanks in advance
Re: [R] subsetting by rows
Suppose your data is called df.

df[rowSums(df == "P") > 0, ]

In short, this tests each element for equality with "P", sums the number of Ps found in each row, and keeps the rows where that number is greater than 0.

Michael

On Wed, Aug 31, 2011 at 1:28 PM, Joao Fadista joao.fadi...@med.lu.se wrote:

> Dear all,
>
> I would like to know how to subset a data.frame by rows.
>
> Example:
>
>      Probesets 34884 34888 34892
> 1     19676_at     A     A     A
> 2     10001_at     P     P     P
> 3     10002_at     A     A     A
> 4     10003_at     A     A     A
> 5 100048912_at     P     A     A
>
> For this data.frame I want to retrieve only the rows where at least one 'P' is found. So in this example it would be rows 2 and 5.
>
> Best regards and thanks in advance
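The rowSums() approach can be checked against the example from the question. A minimal sketch, assuming the column names are the three numeric labels shown in the question (they are reconstructed here, so treat them as an assumption):

```r
# Rebuild the example data; check.names = FALSE keeps the numeric labels.
df <- data.frame(
  Probesets = c("19676_at", "10001_at", "10002_at", "10003_at", "100048912_at"),
  `34884` = c("A", "P", "A", "A", "P"),
  `34888` = c("A", "P", "A", "A", "A"),
  `34892` = c("A", "P", "A", "A", "A"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# df == "P" is a logical matrix; rowSums() counts the TRUEs in each row.
keep <- rowSums(df == "P") > 0

# Rows with at least one "P".
df[keep, ]
```

The Probesets column is included in the comparison but never equals "P", so it does not affect the count.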
Re: [R] subsetting by rows
If your data frame is called 'df', then something like the following should work:

df[ apply(as.matrix(df[,-1]), 1, function(x) any(x == "P")), ]

This creates a logical vector as long as the number of rows.

As Bill Dunlap recently noted, 'apply' really wants a matrix and not a data frame.

--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner' and 'The R Inferno')
Re: [R] subsetting by rows
Hi Joao,

Here is one way (d is your data set):

d[apply(d == 'P', 1, any),]

HTH,
Jorge
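One case the apply(..., 1, any) form does not cover by default is missing values: any() returns NA for a row that contains NA and no TRUE, and indexing with NA yields a row of NAs. A minimal sketch of one way to handle this, assuming the data may contain NA calls (the data frame `d` below is hypothetical, not the poster's data):

```r
# Hypothetical data with missing calls in rows 2 and 3.
d <- data.frame(
  id = c("a", "b", "c"),
  s1 = c("A", NA, "P"),
  s2 = c("A", "A", NA),
  stringsAsFactors = FALSE
)

# Compare only the call columns; na.rm = TRUE is passed on to any(),
# so a row counts as a hit only if some non-missing value equals "P".
hit <- apply(d[, -1] == "P", 1, any, na.rm = TRUE)

d[hit, ]
```

With na.rm = TRUE a row consisting only of "A" and NA is treated as having no "P"; whether that is the right reading of a missing call depends on the data.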