[R] factor level issue after subsetting
Dear list,

I cannot figure out why, after subsetting my data, the particular item which I don't want to plot is still in the newly created subset (please see the example below). R somehow remembers what was in the original data set. A workaround is exporting and importing the new subset. Then it's all fine, but I don't like this idea and was wondering what I am missing here.

Thanks!
Stefan

P.S. I am using R 2.13.2 for Mac.

    dat <- read.csv("~/MyFiles/data.csv")
    class(dat$treat)
    [1] "factor"
    dat
       treat yield
    1   cont  98.7
    2   cont  97.2
    3   cont  96.1
    4   cont  98.1
    5     10 103.0
    6     10 101.3
    7     10 102.1
    8     10 101.9
    9     30 121.1
    10    30 123.1
    11    30 119.7
    12    30 118.9
    13    60 109.9
    14    60 110.1
    15    60 113.1
    16    60 112.3
    plot(dat$treat, dat$yield)
    dat.sub <- dat[which(dat$treat != 'cont'), ]
    class(dat.sub$treat)
    [1] "factor"
    dat.sub
       treat yield
    5     10 103.0
    6     10 101.3
    7     10 102.1
    8     10 101.9
    9     30 121.1
    10    30 123.1
    11    30 119.7
    12    30 118.9
    13    60 109.9
    14    60 110.1
    15    60 113.1
    16    60 112.3
    plot(dat.sub$treat, dat.sub$yield)

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] factor level issue after subsetting
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Schreiber, Stefan
Sent: Tuesday, November 01, 2011 2:29 PM
To: r-help@r-project.org
Subject: [R] factor level issue after subsetting

> Dear list, I cannot figure out why, after sub-setting my data, that
> particular item which I don't want to plot is still in the newly
> created subset (please see example below). R somehow remembers what
> was in the original data set.

That is the nature of factors. Once created, unused levels must be explicitly dropped:

    plot(droplevels(dat.sub$treat), dat.sub$yield)

Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
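Dan's point can be checked directly by looking at the levels before and after droplevels(). The following is only a minimal sketch: it rebuilds the data frame from the values in the original post instead of reading ~/MyFiles/data.csv, and it creates the factor explicitly, which is an assumption made so that the example behaves the same on current R versions where strings are no longer converted to factors by default.

```r
# Toy copy of the data from the original post; the factor is built
# explicitly (assumption) rather than relying on read.csv defaults.
dat <- data.frame(
  treat = factor(rep(c("cont", "10", "30", "60"), each = 4)),
  yield = c(98.7, 97.2, 96.1, 98.1, 103.0, 101.3, 102.1, 101.9,
            121.1, 123.1, 119.7, 118.9, 109.9, 110.1, 113.1, 112.3)
)

dat.sub <- dat[dat$treat != "cont", ]

# The rows are gone, but the level is still registered on the factor,
# which is why plot() keeps an empty slot for it.
levels(dat.sub$treat)
table(dat.sub$treat)

# droplevels() returns a factor that only carries the levels in use.
treat.used <- droplevels(dat.sub$treat)
levels(treat.used)
```

The table() call is often the quickest way to spot the problem, since an unused level shows up with a count of zero.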
Re: [R] factor level issue after subsetting
First of all, the subsetting line is overly complicated.

    dat.sub <- dat[dat$treat != 'cont', ]

will work just fine.

R does exactly what you're describing. It knows the levels of the factor. Removing 'cont' from the data doesn't mean that the level is removed from the factor:

    df <- data.frame(let=factor(sample(letters[1:5],100,replace=T)), num=rnorm(100))
    str(df)
    'data.frame':   100 obs. of  2 variables:
     $ let: Factor w/ 5 levels "a","b","c","d",..: 1 5 1 4 3 5 2 2 1 3 ...
     $ num: num  0.224 -0.523 0.974 -0.268 -0.61 ...
    df.sub <- df[df$let != 'a', ]
    str(df.sub)
    'data.frame':   82 obs. of  2 variables:
     $ let: Factor w/ 5 levels "a","b","c","d",..: 5 4 3 5 2 2 3 3 5 3 ...
     $ num: num  -0.523 -0.268 -0.61 -1.383 -0.193 ...
    unique(df.sub$let)
    [1] e d c b
    Levels: a b c d e
    df.sub$let <- factor(df.sub$let)
    unique(df.sub$let)
    [1] e d c b
    Levels: e d c b
    str(df.sub$let)
     Factor w/ 4 levels "e","d","c","b": 1 2 3 1 4 4 3 3 1 3 ...

By redefining your factor you can eliminate the problem. The other option, if you don't want factors to begin with, is

    options(stringsAsFactors=FALSE)   # to set the global option

or

    dat <- read.csv("~/MyFiles/data.csv", stringsAsFactors=FALSE)   # to set the option locally for this single read.csv call

On Tue, Nov 1, 2011 at 2:28 PM, Schreiber, Stefan stefan.schrei...@ales.ualberta.ca wrote:
> Dear list, I cannot figure out why, after sub-setting my data, that
> particular item which I don't want to plot is still in the newly
> created subset (please see example below). R somehow remembers what
> was in the original data set.
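The two routes described in the reply above (re-creating the factor, or never creating one) can be compared side by side. This is a small sketch, not the poster's code: the vector f and the inline CSV text are made up for illustration, and the text= argument of read.csv is assumed to be available (it is in current R, but it postdates the R 2.13.2 mentioned in the thread). The drop = TRUE form is an extra variant not mentioned in the reply.

```r
# A small factor with three levels (illustrative values).
f <- factor(c("a", "b", "c", "b", "c"))

# Plain subsetting keeps "a" as an empty level.
f.sub <- f[f != "a"]
n.kept <- nlevels(f.sub)

# Calling factor() on the result rebuilds the level set from the
# values that are actually present.
n.rebuilt <- nlevels(factor(f.sub))

# Extra variant: drop = TRUE discards unused levels while subsetting.
n.dropped <- nlevels(f[f != "a", drop = TRUE])

# Alternative: read the column as character so no factor is involved.
# The CSV content is a hypothetical stand-in for ~/MyFiles/data.csv.
csv.text <- "treat,yield\ncont,98.7\n10,103.0\n30,121.1"
dat.chr <- read.csv(text = csv.text, stringsAsFactors = FALSE)
class(dat.chr$treat)
```

With a character column there are no levels to carry along, but plot(dat$treat, dat$yield) no longer draws box plots, so the column has to be converted with factor() at plotting time anyway.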
Re: [R] factor level issue after subsetting
Stefan:
Use the droplevels function...

    dat <- read.table(textConnection("
       treat yield
    1   cont  98.7
    2   cont  97.2
    3   cont  96.1
    4   cont  98.1
    5     10 103.0
    6     10 101.3
    7     10 102.1
    8     10 101.9
    9     30 121.1
    10    30 123.1
    11    30 119.7
    12    30 118.9
    13    60 109.9
    14    60 110.1
    15    60 113.1
    16    60 112.3"), header=T)
    dat
    plot(dat$treat, dat$yield)
    dat.sub <- subset(dat, treat != "cont"); dat.sub
    dat.sub <- droplevels(dat.sub)   # drop unwanted levels
    plot(dat.sub$treat, dat.sub$yield)

Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA
http://www.fws.gov/redbluff/rbdd_jsmp.aspx

From: Schreiber, Stefan stefan.schrei...@ales.ualberta.ca
To: r-help@r-project.org
Sent: Tuesday, November 1, 2011 2:28 PM
Subject: [R] factor level issue after subsetting

> Dear list, I cannot figure out why, after sub-setting my data, that
> particular item which I don't want to plot is still in the newly
> created subset (please see example below). R somehow remembers what
> was in the original data set.
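Calling droplevels() on the whole data frame, as in Felipe's example, cleans every factor column at once. When that is too much, the except argument can protect selected columns. The sketch below uses a hypothetical second factor column, site, which is not part of the original data, purely to show the difference.

```r
# Illustrative data frame; the "site" column is hypothetical.
df <- data.frame(
  treat = factor(c("cont", "cont", "10", "10", "30", "30")),
  site  = factor(c("A", "B", "A", "B", "A", "A")),
  yield = c(98.7, 97.2, 103.0, 101.3, 121.1, 123.1)
)

# After this subset, treat has lost "cont" and site has lost "B",
# but both factors still list those levels.
df.sub <- subset(df, treat != "cont" & site == "A")

# Drop unused levels in every factor column.
df.all <- droplevels(df.sub)

# Drop unused levels everywhere except column 2 (site), e.g. to keep
# an empty "B" group visible in later tables or plots.
df.keep <- droplevels(df.sub, except = 2)

sapply(df.all[c("treat", "site")], nlevels)
sapply(df.keep[c("treat", "site")], nlevels)
```

Keeping an empty level on purpose can be useful when several subsets should share the same axis or table layout.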
Re: [R] factor level issue after subsetting
Thanks for the fast response and your comments! That works perfectly! Another little mystery solved ;)

Stefan

From: Felipe Carrillo [mailto:mazatlanmex...@yahoo.com]
Sent: Tuesday, November 01, 2011 3:54 PM
To: Schreiber, Stefan; r-help@r-project.org
Subject: Re: [R] factor level issue after subsetting

> Stefan:
> Use the droplevels function...