Re: [R-sig-eco] subsetting data in R
Try

    pa2$influencia <- factor(pa2$influencia)

Chris Howden
Founding Partner
Tricky Solutions
Tricky Solutions 4 Tricky Problems
Evidence Based Strategic Development, IP Commercialisation and Innovation, Data Analysis, Modelling and Training
(mobile) 0410 689 945
(fax / office) (+618) 8952 7878
ch...@trickysolutions.com.au

-----Original Message-----
From: r-sig-ecology-boun...@r-project.org [mailto:r-sig-ecology-boun...@r-project.org] On Behalf Of Manuel Spínola
Sent: Monday, 25 April 2011 12:37 AM
To: Christian Parker
Cc: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] subsetting data in R

Thank you Christian.

Following your suggestion I got the following result:

    pa2 = subset(pa, influencia == "AP")
    pa2$influencia <- as.factor(pa2$influencia)
    levels(pa$influencia)
    [1] "AID" "AII" "AP"

On 24/04/2011 07:42 a.m., Christian Parker wrote:
> You are creating a new object, but the columns that are stored as factors are not being 'refactored' so you are retaining the original list of levels. [...]

-- 
*Manuel Spínola, Ph.D.*
Instituto Internacional en Conservación y Manejo de Vida Silvestre
Universidad Nacional
Apartado 1350-3000
Heredia
COSTA RICA
mspin...@una.ac.cr
mspinol...@gmail.com
Telephone: (506) 2277-3598
Fax: (506) 2237-7036
Personal website: Lobito de río https://sites.google.com/site/lobitoderio/
Institutional website: ICOMVIS http://www.icomvis.una.ac.cr/

[[alternative HTML version deleted]]

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
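A minimal sketch of why the reply above swaps as.factor() for factor(). The data frame is made up to stand in for pa; only the column name influencia comes from the thread. Note also that the check quoted in the message inspects pa rather than pa2, which would show all three levels in any case.

```r
# Made-up stand-in for `pa`; only the column name comes from the thread.
pa <- data.frame(
  influencia = factor(c("AID", "AII", "AP", "AID")),
  x = 1:4
)
pa2 <- subset(pa, influencia == "AID")

# as.factor() returns an existing factor as-is, so unused levels survive.
kept <- as.factor(pa2$influencia)
print(levels(kept))

# factor() rebuilds the levels from the values actually present.
dropped <- factor(pa2$influencia)
print(levels(dropped))
```

The first call should still list all three levels and the second only "AID", which matches the behaviour reported later in the thread.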
Re: [R-sig-eco] subsetting data in R
If this isn't already answered:

I don't quite understand the question: what do you mean by "do a complete data set from an object in R"? What do you mean by "the subsetting is dangerous ... as you need to specify the levels for all your factors again"? (What do your 3000 columns of data represent? If these are predictor variables I hope you have a truly enormous number of responses ...)

It may have been mentioned already, but droplevels(subset(...)) will probably do what you want. (I have tried very hard over the years to get drop.levels= to be an optional argument to subset(), but so far I have failed.) droplevels() is an improvement over the drop.levels() function in gdata because (1) it is in base R and (2) it doesn't reorder the factor by default (which is what gdata::drop.levels [insanely in my opinion] does).

On 11-04-24 11:21 AM, Manuel Spínola wrote:
> Thank you for all the responses.
> Is there a way to do a complete data set from an object in R? I have a data set with more than 3000 columns. [...]
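A short sketch of the droplevels(subset(...)) idiom mentioned above, assuming R 2.12 or later. The data are invented, and the site column is a hypothetical second factor added only to show that every factor column is handled in one call.

```r
# Made-up data; `group` keeps a deliberately non-alphabetical level order.
testdata <- data.frame(
  group = factor(c("C", "B", "A", "C", "B", "A"), levels = c("C", "B", "A")),
  site  = factor(c("n", "s", "n", "n", "s", "n")),  # hypothetical column
  value = 1:6
)

# Subset and drop the unused levels of every factor column in one step.
sub <- droplevels(subset(testdata, group != "B"))

print(levels(sub$group))  # remaining levels keep their original order
print(levels(sub$site))
```

Because the remaining levels of group stay in the order C, A, this also illustrates the point about droplevels() not reordering the factor.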
Re: [R-sig-eco] subsetting data in R
Thank you very much Ben.

I was doing an analysis of indicator species with the subset data, but the other levels were still present in the subset and the analysis was taking them into account.

My 3000 columns are plant species presence/absence data.

Best,

Manuel

On 26/04/2011 12:06 p.m., Ben Bolker wrote:
> If this isn't already answered: I don't quite understand the question [...]
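The indicator species function used here is not named in the thread, so the following is only a rough illustration of how empty levels stay visible to anything that groups by the factor. The data and the sp1 column are made up.

```r
# Made-up presence/absence data; `sp1` is a hypothetical species column.
pa <- data.frame(
  influencia = factor(c("AID", "AII", "AP", "AP")),
  sp1 = c(1, 0, 1, 1)
)
pa2 <- subset(pa, influencia == "AP")

# The unused levels still appear as groups, with zero observations each.
counts_before <- table(pa2$influencia)
print(counts_before)

# After dropping unused levels only the observed group remains.
counts_after <- table(droplevels(pa2)$influencia)
print(counts_after)
```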
Re: [R-sig-eco] subsetting data in R
You are creating a new object, but the columns that are stored as factors are not being 'refactored', so you are retaining the original list of levels. To fix this you can use the factor function after you subset:

    pa2 = subset(pa, influencia == "AID")
    pa2$influencia <- as.factor(pa2$influencia)

On Apr 24, 2011, at 6:04 AM, Manuel Spínola <mspinol...@gmail.com> wrote:
> Dear list members,
>
> I have a question regarding subsetting a data set in R.
>
> I created an object for my data:
>
>     pa = read.csv("espec_indic.csv", header = T, sep = ",", check.names = F)
>     levels(pa$influencia)
>     [1] "AID" "AII" "AP"
>
> The object has 3 levels for influencia (AP, AID, AII).
>
> Now I subset only observations with influencia = AID:
>
>     pa2 = subset(pa, influencia == "AID")
>
> but if I ask for the levels of influencia, it still shows me the 3 levels, AP, AID, AII:
>
>     levels(pa2$influencia)
>     [1] "AID" "AII" "AP"
>
> Why is that? I was thinking that I was creating a new data frame with only AID as a level for influencia.
>
> How can I make a complete new object with only the observations for AID, in which the only level for influencia is indeed AID?
>
> Best,
>
> Manuel
Re: [R-sig-eco] subsetting data in R
You can also use droplevels() on your new object (as of R 2.12).

Cheers,
Roman

On Sun, Apr 24, 2011 at 3:42 PM, Christian Parker <cpar...@pdx.edu> wrote:
> You are creating a new object, but the columns that are stored as factors are not being 'refactored' so you are retaining the original list of levels. [...]

-- 
In God we trust, all others bring data.
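A small sketch of calling droplevels() on an object that has already been subset, assuming R 2.12 or later. The data are invented and zona is a hypothetical second factor; the except argument of the data frame method is used here with a column index to leave one factor untouched.

```r
# Made-up stand-in for `pa`; `zona` is a hypothetical extra factor.
pa <- data.frame(
  influencia = factor(c("AID", "AII", "AP", "AID")),
  zona = factor(c("norte", "sur", "centro", "norte")),
  x = 1:4
)
pa2 <- subset(pa, influencia == "AID")

# Drop unused levels from every factor column of the existing object.
all_dropped <- droplevels(pa2)
print(levels(all_dropped$influencia))
print(levels(all_dropped$zona))

# `except` takes the indices of columns whose levels should be left alone.
zona_kept <- droplevels(pa2, except = 2)
print(levels(zona_kept$zona))
```

Leaving a column out via except may be useful when a later analysis should still know about groups that are absent from the subset.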
Re: [R-sig-eco] subsetting data in R
By default, read.csv() turns character variables into factors, using all the unique values as the levels. subset() retains those levels by default, as they are a vital element of the data. If you are studying some attribute of men and women, say height, even if you are only looking at the heights for women it's important to remember that men still exist.

If you don't want influencia to be a factor, you can change that in the import with stringsAsFactors=FALSE.

If you do want influencia to be a factor, but want the unused levels to be removed, you can use factor() to do that.

    > testdata <- data.frame(group=c("A", "B", "C", "A", "B", "C"), value=1:6)
    > testdata
      group value
    1     A     1
    2     B     2
    3     C     3
    4     A     4
    5     B     5
    6     C     6
    > str(testdata)
    'data.frame': 6 obs. of 2 variables:
     $ group: Factor w/ 3 levels "A","B","C": 1 2 3 1 2 3
     $ value: int 1 2 3 4 5 6
    > subset(testdata, group=="A")
      group value
    1     A     1
    4     A     4
    > subset(testdata, group=="A")$group
    [1] A A
    Levels: A B C
    > ?subset
    > factor(subset(testdata, group=="A")$group)
    [1] A A
    Levels: A

Sarah

On Sun, Apr 24, 2011 at 9:04 AM, Manuel Spínola <mspinol...@gmail.com> wrote:
> Dear list members,
> I have a question regarding subsetting a data set in R. [...]

-- 
Sarah Goslee
http://www.functionaldiversity.org
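A minimal sketch of the stringsAsFactors=FALSE route described above. The CSV content is invented and read from an inline string rather than from espec_indic.csv. Since R 4.0.0 the default of stringsAsFactors is FALSE, so on current versions the argument is spelled out here only for clarity.

```r
# Invented CSV content standing in for the poster's file.
csv_text <- "influencia,x\nAID,1\nAII,2\nAP,3\nAID,4"
pa <- read.csv(text = csv_text, stringsAsFactors = FALSE)
print(class(pa$influencia))

# A character column carries no levels, so nothing is left over after subsetting.
pa2 <- subset(pa, influencia == "AID")
print(unique(pa2$influencia))

# Convert to a factor only after subsetting, if a factor is needed.
pa2$influencia <- factor(pa2$influencia)
print(levels(pa2$influencia))
```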
Re: [R-sig-eco] subsetting data in R
Thank you very much for your response, Christian, Roman, and Sarah.

Sarah, I am trying your suggestion but I cannot see the levels:

    pa2 = factor(subset(pa, influencia == "AP")$influencia)
    levels(pa2$influencia)
    Error in pa2$influencia : $ operator is invalid for atomic vectors

Best,

Manuel

On 24/04/2011 07:51 a.m., Sarah Goslee wrote:
> By default, read.csv() turns character variables into factors, using all the unique values as the levels. [...]
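A sketch of what probably happens in the message above, using made-up data: wrapping the extracted column in factor() yields a single factor, not a data frame, so there is no influencia column left to extract with $.

```r
# Made-up stand-in for `pa`.
pa <- data.frame(
  influencia = factor(c("AID", "AII", "AP", "AP")),
  x = 1:4
)

# This is a factor vector, not a data frame.
pa2 <- factor(subset(pa, influencia == "AP")$influencia)
print(is.data.frame(pa2))

# Asking the factor itself for its levels works.
print(levels(pa2))

# Using `$` on it raises an error, captured here as a message string.
err <- tryCatch(pa2$influencia, error = function(e) conditionMessage(e))
print(err)
```

Assigning the re-made factor back into the column of the subset data frame, rather than overwriting the whole object, avoids the problem.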
Re: [R-sig-eco] subsetting data in R
    pa2 <- subset(pa, influencia == "AP")
    pa2$influencia <- factor(pa2$influencia)
    levels(pa2$influencia)

On Sun, Apr 24, 2011 at 11:24 AM, Manuel Spínola <mspinol...@gmail.com> wrote:
> Thank you very much for your response, Christian, Roman, and Sarah.
> Sarah, I am trying your suggestion but I cannot see the levels: [...]
Re: [R-sig-eco] subsetting data in R
Thank you Christian.

Following your suggestion I got the following result:

    pa2 = subset(pa, influencia == "AP")
    pa2$influencia <- as.factor(pa2$influencia)
    levels(pa$influencia)
    [1] "AID" "AII" "AP"

On 24/04/2011 07:42 a.m., Christian Parker wrote:
> You are creating a new object, but the columns that are stored as factors are not being 'refactored' so you are retaining the original list of levels. [...]
Re: [R-sig-eco] subsetting data in R
Thank you very much Gustavo. That works.

Manuel

On 24/04/2011 08:30 a.m., Gustavo Carvalho wrote:
> pa2 <- subset(pa, influencia == "AP")
> pa2$influencia <- factor(pa2$influencia)
> levels(pa2$influencia)
Re: [R-sig-eco] subsetting data in R
Thank you for all the responses.

Is there a way to do a complete data set from an object in R? I have a data set with more than 3000 columns. The subsetting is ok, but it could be dangerous if you are using other factors to do some analysis, as you need to specify the levels for all your factors again.

Best,

Manuel

On 24/04/2011 08:30 a.m., Gustavo Carvalho wrote:
> pa2 <- subset(pa, influencia == "AP")
> pa2$influencia <- factor(pa2$influencia)
> levels(pa2$influencia)
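One way to avoid re-specifying each factor by hand is to re-make every factor column in a loop. The sketch below does that with a hypothetical helper, refactor_all(), on made-up data (habitat and sp1 are invented columns). On R 2.12 or later, droplevels() applied to the data frame gives the same result, so the helper is mainly of interest for older versions.

```r
# Hypothetical helper: re-make every factor column so unused levels are dropped.
refactor_all <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  df[is_fac] <- lapply(df[is_fac], factor)
  df
}

# Made-up data with two factor columns and one species column.
pa <- data.frame(
  influencia = factor(c("AID", "AII", "AP", "AID")),
  habitat = factor(c("bosque", "pasto", "bosque", "bosque")),
  sp1 = c(1L, 0L, 1L, 1L)
)

pa2 <- refactor_all(subset(pa, influencia == "AID"))
print(sapply(pa2[c("influencia", "habitat")], nlevels))
```

Non-factor columns, such as the presence/absence columns, pass through unchanged.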