Re: [R] levels
There is an interesting item on stringsAsFactors in this useR! 2020 session: https://www.youtube.com/watch?v=X_eDHNVceCU=youtu.be It's about 27 minutes in. Chris Gordon-Smith On 15/07/2020 17:16, Marc Schwartz via R-help wrote: [snipped] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] levels
Thanks, I will check it out. On Sat, 18 Jul 2020 at 00:47, Chris Gordon-Smith <c.gordonsm...@gmail.com> wrote: [snipped]
Re: [R] levels
Hi Andy: I just checked in "options", and the following appears: $stringsAsFactors [1] FALSE I think this might be it. You may want to look at options() in R-3.6.1. Thanks, Erin Erin Hodgess, PhD mailto: erinm.hodg...@gmail.com On Wed, Jul 15, 2020 at 9:45 AM andy elprama wrote: [snipped]
Re: [R] levels
On Jul 15, 2020, at 4:31 AM, andy elprama wrote: [snipped] Hi, The default value for 'stringsAsFactors' for data.frame() and read.table() changed from TRUE to FALSE in version 4.0.0, per the news() file: "R now uses a stringsAsFactors = FALSE default, and hence by default no longer converts strings to factors in calls to data.frame() and read.table()." Using 4.0.2: data <- data.frame(name, values, stringsAsFactors = TRUE) > levels(data$name) [1] "a" "b" "c" If you see behavioral changes from one version of R to another, especially major version increments, check the news() file. Regards, Marc Schwartz
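A minimal sketch of the two fixes discussed in this thread, assuming R 4.0.0 or later: pass stringsAsFactors = TRUE explicitly (which gives a factor under both R 3.x and 4.x), or convert only the column you need with factor(). The object names follow the original post; the data_default and data_fac names are just for illustration.

```r
name   <- c("a", "b", "c")
values <- c(1, 2, 3)

# Relying on the default: a character column in R >= 4.0.0, a factor before
data_default <- data.frame(name, values)

# Stating the argument explicitly gives a factor in every R version
data_fac <- data.frame(name, values, stringsAsFactors = TRUE)
print(levels(data_fac$name))

# Alternatively, convert only the column that should be a factor
data_default$name <- factor(data_default$name)
print(levels(data_default$name))
```

Being explicit about stringsAsFactors is the more portable choice when the same script may run on both sides of the 4.0.0 change.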
Re: [R] levels
Read the NEWS about R 4.0.0 [1] (search for stringsAsFactors), or read any of the many announcements in blogs and forums around the Internet. [1] https://cran.r-project.org/doc/manuals/r-release/NEWS.html On July 15, 2020 1:31:06 AM PDT, andy elprama wrote: [snipped]
Re: [R] levels
Hi Andy, I believe this is because R 4.0 has changed the default behavior of data.frame(). Prior to 4.0, the default was stringsAsFactors=TRUE. In 4.0, the default is stringsAsFactors=FALSE. If you run your code in R 3.6.1 and change the command to data <- data.frame(name,values,stringsAsFactors=FALSE) you will get the same behavior as in R 4.0. HTH, Eric On Wed, Jul 15, 2020 at 6:45 PM andy elprama wrote: [snipped]
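The NULL in the original post comes from levels() itself: it returns the "levels" attribute, which factors carry and character vectors do not. A small sketch follows. The helper distinct_values() is hypothetical, not part of base R. It returns the same sorted distinct values whichever default produced the column. Note that for a factor with unused levels it reports only the values actually present, unlike levels().

```r
x_chr <- c("b", "a", "c", "a")
x_fac <- factor(x_chr)

print(levels(x_chr))   # NULL: a character vector has no "levels" attribute
print(levels(x_fac))   # "a" "b" "c"

# Hypothetical helper: distinct values as a sorted character vector,
# accepting either a character vector or a factor
distinct_values <- function(x) sort(unique(as.character(x)))

print(distinct_values(x_chr))
print(distinct_values(x_fac))
```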
[R] levels
Dear R-users, Something strange happened within the command "levels" R version 3.6.1 name <- c("a","b","c") values <- c(1,2,3) data <- data.frame(name,values) levels(data$name) [1] "a" "b" "c" R version 4.0 name <- c("a","b","c") values <- c(1,2,3) data <- data.frame(name,values) levels(data$name) [1] NULL What is happening here?
Re: [R] Levels of a factor
That makes sense. Thanks all! 2013/7/24 David Carlson <dcarl...@tamu.edu> [snipped]
[R] Levels of a factor
Hi all, I am having a bit of trouble using the levels() function. I have a factor with many elements, and when I use the function levels() to extract the list of unique elements, some of the elements returned are not actually in the factor. For example I would have this: vector <- dataset$Benchmark class(vector) [1] "factor" length(vector) [1] 35615 vector2 <- levels(vector) length(which(!(vector2 %in% vector))) [1] 235 Does anyone know how this is possible? Many thanks! Borja
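One way to see what Borja's count of 235 measures is sketched below on a small made-up factor. The object f and its level set are assumptions for illustration. table() on a factor reports every level, including those with zero observations, so the unused levels are the ones with a count of zero.

```r
# A factor whose level set is larger than the values it holds,
# as happens after subsetting a bigger data set
f <- factor(c("A", "C", "C", "D"), levels = LETTERS[1:6])

counts <- table(f)
unused <- names(counts)[counts == 0]
print(unused)            # levels with no observations

# The same count as the original test, length(which(!(levels(f) %in% f)))
print(length(unused))
print(length(which(!(levels(f) %in% f))))
```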
Re: [R] Levels of a factor
Hi, vec1 <- factor(1:5,levels=1:10) vec1 #[1] 1 2 3 4 5 #Levels: 1 2 3 4 5 6 7 8 9 10 vec2 <- droplevels(vec1) levels(vec2) #[1] "1" "2" "3" "4" "5" vec2 #[1] 1 2 3 4 5 #Levels: 1 2 3 4 5 A.K.
Re: [R] Levels of a factor
On Jul 24, 2013, at 6:25 AM, Borja Rivier wrote: [snipped] When you take a subset of a factor vector, the levels are not reduced to the unique values in the new vector. There is a droplevels function that would need to be applied if you already have such a vector, and there is a drop argument that you need to set to TRUE in the `[.factors` call if you want to attack the problem at the source. ?`[.factor ?droplevels -- David Winsemius Alameda, CA, USA
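A minimal sketch of the two routes David Winsemius mentions, using a made-up factor. The first drops unused levels at the point of subsetting with the drop argument of the `[` method for factors. The second drops them afterwards with droplevels().

```r
f <- factor(c("a", "b", "c", "a"))

# Default subsetting keeps the full level set
kept <- f[f != "c"]
print(levels(kept))       # "a" "b" "c"

# drop = TRUE in `[.factor` removes levels that no longer occur
dropped <- f[f != "c", drop = TRUE]
print(levels(dropped))    # "a" "b"

# droplevels() reaches the same result after the fact
print(identical(droplevels(kept), dropped))
```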
Re: [R] Levels of a factor
On Jul 24, 2013, at 11:35 AM, David Winsemius wrote: On Jul 24, 2013, at 6:25 AM, Borja Rivier wrote: snipped When you take a subset of a factor vector, the levels are not reduced to the unique values in the new vector. There is a droplevels function that would need to be applied if you already have such a vector, and there is a drop argument that you need to set to TRUE in the `[.factors` Make that `[.factor` call if you want to attack the problem at the source. ?`[.factor # missing trailing back-tick ?`[.factor` ?droplevels -- David Winsemius Alameda, CA, USA
Re: [R] Levels of a factor
Benchmark is probably a subset from a larger dataframe. R does not automatically remove empty levels but you can do it: set.seed(42) dataset <- data.frame(Benchmark=factor(sample(LETTERS[1:26], 50, replace=TRUE), levels=LETTERS[1:26])) levels(dataset$Benchmark) # [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" # [20] "T" "U" "V" "W" "X" "Y" "Z" dataset$Benchmark <- factor(dataset$Benchmark) levels(dataset$Benchmark) # [1] "A" "C" "D" "F" "G" "H" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "V" "X" # [20] "Y" "Z" There are times when you want to know if certain factor levels do not appear in a subset of the original data. David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77840-4352 -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Borja Rivier Sent: Wednesday, July 24, 2013 8:25 AM To: r-help@r-project.org Subject: [R] Levels of a factor [snipped]
[R] Levels and labels in factor
Hi R users, I have an imputed dataset of undefinedundefined cycles which I generated using Stata version undefinedundefined. Then I imported my data from Stata into R and I used a loop to run the mclust package in R. My observation starts with ID=2 (ID=1 has been excluded from the sample) and ends with 27950. Here is my code: library(mclust) library(foreign) dat <- read.dta(file="tempeundefined.dta") impdat <- subset(dat,mim!=undefined) datn <- impdat apply(datn,undefined,range) fix(datn) mdlnc <- matrix(,undefinedundefined,undefined) undefinedgetting the final output n <- dim(datn)[undefined] datf <- matrix(undefined,n,undefined) for(i in undefined:undefinedundefined){ set.seed(undefinedundefinedundefinedundefinedundefinedundefined) datnss <- subset(datn, mim==i) datnssMclust <- Mclust(datnss[,undefined:undefinedundefined],model="VEV",G=undefined) zv <- datnssMclust$z clas <- datnssMclust$classification zval <- cbind(zv,clas) colnames(zval) <- c("Pundefined","Pundefined","Pundefined","class") impd <- datnss[,c("cid_731a","qlet","mim")] fd <- as.matrix(cbind(impd,zval)) datf[((undefinedundefinedundefinedundefinedundefined*(i-undefined)+undefined):(i*undefinedundefinedundefinedundefinedundefined)),] <- fd } cid_731a is my observation ID and mim is the number of the imputed dataset. When I write the output in dta format (Stata data format), the IDs are reorganised: ID now starts with 1,2,3,4,...13797, which is not right. Label values have been attached to the existing data, and the variables are now in long format. I guess that is because a factor in R always begins with 1,2,3,4,... Is there any way I can fix this? Please help SY
Re: [R] Levels and labels in factor
Perhaps write.dta(..., convert.factors="string") might help. -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of chong shiauyun Sent: Wednesday, 10 April 2013 10:01 To: r-help@r-project.org Subject: [R] Levels and labels in factor [snipped]
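The renumbering SY describes (IDs starting again at 1) is what you get when a factor's internal integer codes are used instead of its labels. A hedged sketch with made-up ID values: as.numeric() on a factor returns the codes, while going through the labels recovers the original numbers. Converting the ID column this way before calling write.dta() is one option. The convert.factors suggestion above is another.

```r
# Hypothetical IDs: they start at 2 and have gaps, like the original data
ids <- factor(c(2, 5, 9, 27950))

print(as.numeric(ids))                # internal codes: 1 2 3 4
print(as.numeric(as.character(ids)))  # original values: 2 5 9 27950
print(as.numeric(levels(ids))[ids])   # same values, the idiom from R FAQ 7.10

# Convert only the ID column; other factors (e.g. text labels) stay as they are
df <- data.frame(id = ids, score = c(0.1, 0.4, 0.2, 0.9))
df$id <- as.numeric(as.character(df$id))
print(class(df$id))
```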
Re: [R] Levels in new data fed to SVM
On 08.01.2013 21:14, Claus O'Rourke wrote: [snipped] You have to tell the factor about the possible levels, even if it does not contain examples of all of them. That means: levels(test.data$b) <- c("a", "b", "c") predict(my.model,test.data) will help. Best, Uwe Ligges
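A base-R sketch of why the levels matter, without e1071. The data frames mirror Claus's example. stringsAsFactors = TRUE is an added assumption, because the 2013 code relied on the pre-4.0.0 default. The dummy-coded model matrix has fewer columns for the test data unless b carries the training levels. Per Claus's debugging, model matrix creation is where svm() rejected the data. Re-creating the column with factor(..., levels = ...) matches levels by label. Assigning to levels() directly relabels by position, so it is only safe when the test levels are a leading subset of the training levels, in the same order.

```r
training.data <- data.frame(x = c("yellow", "red", "yellow", "red"),
                            a = c("alpha", "alpha", "beta", "beta"),
                            b = c("a", "b", "a", "c"),
                            stringsAsFactors = TRUE)
test.data <- data.frame(x = c("yellow", "red"),
                        a = c("alpha", "beta"),
                        b = c("a", "b"),
                        stringsAsFactors = TRUE)

# Dummy-coded predictors: the test data lacks the column for level "c"
print(ncol(model.matrix(~ a + b, training.data)))  # 4
print(ncol(model.matrix(~ a + b, test.data)))      # 3

# Re-create each test predictor with the training levels, matched by label
for (v in c("a", "b")) {
  test.data[[v]] <- factor(test.data[[v]], levels = levels(training.data[[v]]))
}
print(ncol(model.matrix(~ a + b, test.data)))      # 4
```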
Re: [R] Levels in new data fed to SVM
Thanks for clarifying! On Thu, Jan 10, 2013 at 12:47 PM, Uwe Ligges <lig...@statistik.tu-dortmund.de> wrote: [snipped]
[R] Levels in new data fed to SVM
Hi all, I've encountered an issue using svm (e1071) in the specific case of supplying new data which may not have the full range of levels that were present in the training data. I've constructed this really primitive example to illustrate the point: library(e1071) training.data <- data.frame(x = c("yellow","red","yellow","red"), a = c("alpha","alpha","beta","beta"), b = c("a", "b", "a", "c")) my.model <- svm(x ~ .,data=training.data) test.data <- data.frame(x = c("yellow","red"), a = c("alpha","beta"), b = c("a", "b")) predict(my.model,test.data) Error in predict.svm(my.model, test.data) : test data does not match model ! levels(test.data$b) <- levels(training.data$b) predict(my.model,test.data) 1 2 yellow red Levels: red yellow In the first case test.data$b does not have the level "c" and this results in the input data being rejected. I've debugged this down to the point of model matrix creation in the SVM R code. Once I fill up the levels in the test data with the levels from the original data, then there is no problem at all. Assuming my test data has to come from another source where the number of category levels seen might not always be as large as those for the original training data, is there a better way I should be handling this? Thanks
Re: [R] levels of comma separated data
analyst41 at hotmail.com <analyst41 at hotmail.com> writes: [snipped] # # # Some data: d <- data.frame(id = 1:5, text = c('one,two', 'two,three,three,four', 'one,three,three,five', 'five,five,five,five', 'one,two,three'), stringsAsFactors = FALSE ) # # # A function. I'm not a black belt at this, so there # is probably a more efficient way of writing this. fcn <- function(x){ a <- strsplit(x, ',') # Split the string by comma unique(a[[1]]) # Uniquify the vector } # # # Use the function with sapply. sapply(d[,2], fcn)
Re: [R] levels of comma separated data
On May 25, 4:46 am, Stefan <ste...@inizio.se> wrote: [snipped] Thanks - but this solves a slightly different problem - it outputs the unique values in each row. I want a list of the unique values in the whole data frame. In this case the output should be a single vector = c("one","two","three","four","five").
Re: [R] levels of comma separated data
On May 25, 7:23 am, analys...@hotmail.com <analys...@hotmail.com> wrote: [snipped] Actually I figured it out after I posted this: levels(as.factor(unlist(strsplit(d$text,',')))) [1] "five" "four" "one" "three" "two" Thanks for pointing me the right way.
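A slightly more direct version of the final answer, sketched on the example data from Stefan's reply. The factor step is not needed: unique() on the split strings keeps first-appearance order. sort(unique()) gives the same alphabetical order that levels(as.factor()) produces.

```r
d <- data.frame(id = 1:5,
                text = c("one,two", "two,three,three,four",
                         "one,three,three,five", "five,five,five,five",
                         "one,two,three"),
                stringsAsFactors = FALSE)

# strsplit() returns one character vector per row; unlist() flattens them
tokens <- unlist(strsplit(d$text, ",", fixed = TRUE))

print(unique(tokens))        # first-appearance order
print(sort(unique(tokens)))  # alphabetical, like levels(as.factor(tokens))
```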
[R] levels of comma separated data
I have a data set that has some comma separated strings in each row. I'd like to create a vector consisting of all distinct strings that occur. The number of strings in each row may vary. Thanks for any help.
[R] Levels of interaction terms between numeric and factor in glm
Hello everyone, I have been working on a model to describe the counts of a certain event. I use the glm function with Poisson family and log link. The model is: model <- glm(event~week+year+week:var1+year:var1+year:var2, family=poisson), where week and year are factor variables with 52 and 7 levels respectively and var1 and var2 are numerical variables. The model seems to describe the actual counts of events well and it is reasonable in its structure. My problem is how to interpret the coefficients. When I use anova(model, test="Chisq") or summary(model), I see that the degrees of freedom in variables week and year are 51 and 6 respectively, which makes sense since the first level is used as a reference. The problem is in the interaction terms: week:var1 has 52 degrees of freedom, year:var1 has 6 and year:var2 has 7 degrees of freedom. I am able to interpret the results in week and year coefficients, but not in the rest of the terms. Why are there differences in the degrees of freedom in the interaction terms? How could the results be explained? Any assistance would be valuable. Thank you in advance Achilles Tsoumanis
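One way to see where those degrees of freedom come from is to count the model-matrix columns R creates for each term. The sketch below uses made-up data with the same structure (week with 52 levels, year with 7, two numeric covariates). Only the formula comes from the post. Because var1 and var2 have no main effects, R codes the factor in week:var1 and year:var2 with one dummy column per level, which gives a separate slope for every level. In year:var1, var1 already appears in the earlier week:var1 term, so R codes year by contrasts there and one column is lost.

```r
set.seed(1)
# Made-up data: every week/year combination once, plus two numeric covariates
d <- expand.grid(week = factor(1:52), year = factor(2001:2007))
d$var1 <- rnorm(nrow(d))
d$var2 <- rnorm(nrow(d))

f  <- ~ week + year + week:var1 + year:var1 + year:var2
mm <- model.matrix(f, data = d)

# attr(mm, "assign") maps each column to its term (0 = intercept)
labs <- c("(Intercept)", attr(terms(f), "term.labels"))
cols_per_term <- setNames(as.vector(table(attr(mm, "assign"))), labs)
print(cols_per_term)   # 1, 51, 6, 52, 6, 7
```

Read this way, under the default treatment contrasts, the var1 slope for a given week and year is the week:var1 coefficient for that week plus the year:var1 coefficient for that year (zero for the first year). Each year:var2 coefficient is the var2 slope for that year.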
Re: [R] Levels in returned data.frame after subset
Thanks for the replies! Obviously I must have used the wrong search terms, sorry.

@Greg: I care about the levels after the subset because, if they are not dropped, they still appear in the subsequent heatmap I make with ggplot (with my real data set, of course). Admittedly I am quite green, and may do things in a rather silly way, but it works (at least I think it does).

On 4 September 2010 15:41, Ista Zahn <iz...@psych.rochester.edu> wrote:
> Only that this issue has come up many times before, and that this list is archived and searchable. Try
> RSiteSearch("subset drop levels", restrict = c("Rhelp10", "Rhelp08", "Rhelp02"))
[R] Levels in returned data.frame after subset
Dear List,

When I subset a data.frame, the levels are not re-adjusted (see example). Why is this? Am I missing out on some basic stuff here?

Thanks
Ulrik

m <- data.frame(gender = c("M", "M", "F"), ht = c(172, 186.5, 165), wt = c(91, 99, 74))
dim(m)
[1] 3 3
levels(m$gender)
[1] "F" "M"
s <- subset(m, m$gender == "M")
dim(s)
[1] 2 3
levels(s$gender)
[1] "F" "M"
cat <- sapply(s, is.factor); s[cat] <- lapply(s[cat], factor)
dim(s)
[1] 2 3
levels(s$gender)
[1] "M"
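A minimal sketch of the same example in current R, under two assumptions. First, since R 4.0.0 data.frame() keeps strings as character by default, so stringsAsFactors = TRUE is passed to reproduce the factor. Second, base R's droplevels() (available since R 2.12.0) is used in place of the sapply()/lapply() loop.

```r
# Ulrik's data; since R 4.0.0 strings stay character unless asked otherwise
m <- data.frame(gender = c("M", "M", "F"),
                ht = c(172, 186.5, 165),
                wt = c(91, 99, 74),
                stringsAsFactors = TRUE)

s <- subset(m, gender == "M")
levels(s$gender)     # "F" "M": subset() keeps the full set of levels

# droplevels() removes unused levels from every factor column at once,
# the same effect as the sapply()/lapply() loop in the post
s <- droplevels(s)
levels(s$gender)     # "M"
```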
Re: [R] Levels in returned data.frame after subset
Hi Ulrik

On Sat, Sep 4, 2010 at 12:52 PM, Ulrik Stervbo <ulrik.ster...@gmail.com> wrote:
> Dear List,
> When I subset a data.frame, the levels are not re-adjusted (see example). Why is this? Am I missing out on some basic stuff here?

Only that this issue has come up many times before, and that this list is archived and searchable. Try

RSiteSearch("subset drop levels", restrict = c("Rhelp10", "Rhelp08", "Rhelp02"))

-Ista

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
Re: [R] Levels in returned data.frame after subset
The advantage of computers is that they do exactly what they are told. The disadvantage of computers is that they do exactly what they are told.

R is a set of instructions to the computer; those instructions are a combination of instructions from the original programmers and from you. Who should make important decisions about the structure of your data? A group of (admittedly brilliant) programmers who have never seen your data and do not know what questions you are trying to answer? Or you (who hopefully know more about your data and questions)? I don't claim to be more intelligent or knowledgeable than the programmers of R. But I am grateful that they have (had) sufficient humility to allow for the possibility that I may actually know something about my data and questions that they don't (or maybe they are just too lazy to do my job for me, but that is also appropriate).

In your example, why do you care what the levels of gender are after the subset? Why waste time and effort dropping the levels for a column that by definition only has one value?

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ulrik Stervbo
Sent: Saturday, September 04, 2010 6:53 AM
To: r-help@r-project.org
Subject: [R] Levels in returned data.frame after subset

> Dear List,
> When I subset a data.frame, the levels are not re-adjusted (see example). Why is this? Am I missing out on some basic stuff here?
[R] levels update
Hello,

I hope this question is not too stupid. I would like to know how to update levels after subsetting data from a data.frame.

df <- data.frame(factor(c("a", "a", "c", "b", "b")), c(4, 5, 6, 7, 8), c(9, 1, 2, 3, 4))
names(df) <- c("X1", "X2", "X3")
my.sub <- subset(df, X1 == "a" | X1 == "b")
levels(my.sub$X1) # still gives me "a", "b", "c", though the subset does not contain entries with "c" anymore

I guess the solution is rather simple, but I cannot find it.

Antje
Re: [R] levels update
try this:

df <- data.frame(factor(c("a", "a", "c", "b", "b")), c(4, 5, 6, 7, 8), c(9, 1, 2, 3, 4))
names(df) <- c("X1", "X2", "X3")
my.sub <- subset(df, X1 == "a" | X1 == "b")
levels(my.sub$X1)
[1] "a" "b" "c"
my.sub$X1 <- factor(my.sub$X1)
levels(my.sub$X1)
[1] "a" "b"

On Fri, Dec 5, 2008 at 7:50 AM, Antje <[EMAIL PROTECTED]> wrote:
> Hello,
> I hope this question is not too stupid. I would like to know how to update levels after subsetting data from a data.frame.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
Re: [R] levels update
I do the following for a subsetted data frame:

cleanfactors <- function(mydf){
  outdf <- mydf
  # re-create every factor column so that unused levels are dropped
  for (i in seq_len(ncol(mydf))){
    if (is.factor(mydf[,i])) outdf[,i] <- factor(mydf[,i])
  }
  outdf
}

Antje wrote:
> Hello,
> I hope this question is not too stupid. I would like to know how to update levels after subsetting data from a data.frame.

--
Erich Neuwirth, University of Vienna
Faculty of Computer Science
Computer Supported Didactics Working Group
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-39464 Fax: +43-1-4277-39459
Re: [R] levels update
On Fri, Dec 5, 2008 at 6:50 AM, Antje <[EMAIL PROTECTED]> wrote:
> Hello,
> I hope this question is not too stupid. I would like to know how to update levels after subsetting data from a data.frame.

You might find it easier just to work with character vectors:

options(stringsAsFactors = FALSE)

Hadley

--
http://had.co.nz/
Re: [R] levels update
On Fri, 5 Dec 2008, jim holtman wrote:

> try this:
> df <- data.frame(factor(c("a", "a", "c", "b", "b")), c(4, 5, 6, 7, 8), c(9, 1, 2, 3, 4))
> names(df) <- c("X1", "X2", "X3")
> my.sub <- subset(df, X1 == "a" | X1 == "b")
> levels(my.sub$X1)
> [1] "a" "b" "c"
> my.sub$X1 <- factor(my.sub$X1)

I find

my.sub$X1 <- my.sub$X1[drop=TRUE]

a lot more self-explanatory. See ?"[.factor".

However, if you find yourself wanting to do this, ask why you have a factor (rather than a character vector) in the first place.

> levels(my.sub$X1)
> [1] "a" "b"

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
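For a single factor, the approaches suggested in this thread all end up with the same levels. The sketch below uses Antje's data, with the quotes and assignment arrows restored and the columns named directly. It also shows droplevels() and Brian Ripley's character-vector alternative for comparison.

```r
# Antje's data, with the quotes and assignment arrows restored
df <- data.frame(X1 = factor(c("a", "a", "c", "b", "b")),
                 X2 = c(4, 5, 6, 7, 8),
                 X3 = c(9, 1, 2, 3, 4))
my.sub <- subset(df, X1 == "a" | X1 == "b")
levels(my.sub$X1)                  # still "a" "b" "c"

f_factor <- factor(my.sub$X1)      # rebuild the factor (Jim Holtman)
f_drop   <- my.sub$X1[drop = TRUE] # index with drop = TRUE (Brian Ripley)
f_dl     <- droplevels(my.sub$X1)  # droplevels(), base R 2.12.0 and later
levels(f_factor); levels(f_drop); levels(f_dl)   # each "a" "b"

# Ripley's alternative: keep a character vector, which has no stale levels
ch <- as.character(my.sub$X1)
unique(ch)                         # "a" "b"
```

All three factor versions keep the original level order ("a" before "b"). A character vector avoids the question entirely, which is the point of Brian Ripley's closing remark.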
Re: [R] levels update
Thanks a lot!!! The drop thing was exactly what I was looking for (I had already used it some time ago but forgot about it).

Thanks to everybody else too.

Antje

Prof Brian Ripley wrote:
> I find
> my.sub$X1 <- my.sub$X1[drop=TRUE]
> a lot more self-explanatory. See ?"[.factor".
Re: [R] levels update
> I would like to know how to update levels after subsetting data from a data.frame.

Two questions in one afternoon; aren't I good to you!

levels(my.sub$X1[, drop = TRUE])
[1] "a" "b"
levels(factor(my.sub$X1))
[1] "a" "b"

Regards,
Richie.

Mathematical Sciences Unit
HSL
[R] levels values of cut()
Dear list,

I have the following example, from which I am hoping to retrieve the numeric values of the factor levels (that is, without the brackets):

x <- seq(1, 15, length = 100)
y <- sin(x)
my.cuts <- cut(which(abs(y) < 1e-1), 3)
levels(my.cuts)

hist() does not suit me for this, as it does not necessarily respect the number of breaks.

getAnywhere hasn't got me very far: I cannot seem to find readable source code for the built-in cut function in the base package. I think getMethod should do it, but I don't understand the arguments to pass.

Any pointers appreciated,

Thanks,

baptiste

--
Baptiste Auguié
School of Physics
University of Exeter
Stocker Road,
Exeter, Devon,
EX4 4QL, UK
Phone: +44 1392 264187
http://newton.ex.ac.uk/research/emag
Re: [R] levels values of cut()
Not sure what you're looking for, but does this help? Extending your code,

library(gsubfn)
t(strapply(levels(my.cuts), "([0-9.]+),([0-9.]+)",
           function(...) as.numeric(c(...)), backref = -2, simplify = TRUE))
     [,1] [,2]
[1,] 15.9 38.3
[2,] 38.3 60.7
[3,] 60.7 83.1

----- Original Message ----
From: baptiste auguie <[EMAIL PROTECTED]>
To: r-help@r-project.org
Sent: Saturday, August 9, 2008 1:51:01 AM
Subject: [R] levels values of cut()

> Dear list,
> I have the following example, from which I am hoping to retrieve numeric values of the factor levels (that is, without the brackets):
Re: [R] levels values of cut()
On Sat, 9 Aug 2008, baptiste auguie wrote:

> I think getMethod should do it but I don't understand the arguments to pass.

Not getMethod (that's for S4 methods). Just type cut.default at the R prompt.

However, try

example(cut)
foo <- levels(cut(aaa, 3))
lims <- matrix(nrow = length(foo), ncol = 2)
lims[,1] <- as.numeric( sub("\\((.+),.*", "\\1", foo) )
lims[,2] <- as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", foo) )

--
Brian D. Ripley, Professor of Applied Statistics, University of Oxford
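A slightly more general sketch of the same idea. It strips the first and last character of each label and splits at the comma. This handles "(" / "]" as well as "[" / ")" brackets, which appear for example with include.lowest = TRUE or right = FALSE.

cut_limits() is a hypothetical helper, not part of base R. It assumes cut()'s default labels (labels = NULL) and numbers printed without thousands separators. The limits it returns are the rounded values shown in the labels (dig.lab = 3 significant digits by default), not the exact break points.

```r
# Hypothetical helper: turn cut()'s interval labels into a numeric
# matrix of lower and upper limits (as rounded in the labels).
cut_limits <- function(f) {
  labs <- levels(f)
  # Drop the opening and closing bracket, e.g. "(0.996,2.33]" -> "0.996,2.33"
  inner <- substr(labs, 2, nchar(labs) - 1)
  # Split at the comma and convert both halves to numbers
  parts <- strsplit(inner, ",", fixed = TRUE)
  lims <- do.call(rbind, lapply(parts, as.numeric))
  dimnames(lims) <- list(labs, c("lower", "upper"))
  lims
}

f <- cut(1:5, 3)
m <- cut_limits(f)
m
```

If the exact break points are needed, it is simpler to compute them yourself and pass them to cut() as the breaks argument.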
Re: [R] levels values of cut()
Thank you all for the precious tips. For the record, I've made the following wrapper function for this. I wonder whether a short note on these regular expressions could be useful on the help page of cut().

cutIntervals <- function(x, ...){
  if ("labels" %in% names(list(...)))
    stop("labels cannot be specified, use cut instead")
  cut.fact <- levels(cut(x, labels = NULL, ...))
  # tip from Brian Ripley
  lims <- matrix(nrow = length(cut.fact), ncol = 2)
  lims[,1] <- as.numeric( sub("\\((.+),.*", "\\1", cut.fact) )
  lims[,2] <- as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", cut.fact) )
  # alternatively (Stephen Tucker)
  # library(gsubfn)
  # lims <- t(strapply(cut.fact, "([0-9.]+),([0-9.]+)",
  #                    function(...) as.numeric(c(...)), backref = -2, simplify = TRUE))
  lims
}
cutIntervals(1:5, 3)

Many thanks,

baptiste

On 9 Aug 2008, at 11:12, Prof Brian Ripley wrote:
> Not getMethod (that's for S4 methods). Just type cut.default at the R prompt.
Re: [R] levels values of cut()
On Sat, 9 Aug 2008, baptiste auguie wrote:

> I wonder whether a short note on these regular expressions could be useful on the help page of cut().

Already there in R-devel.

--
Brian D. Ripley, Professor of Applied Statistics, University of Oxford
[R] Levels error after printing
Hi all,

After running this code:

__BEGIN__
dat <- read.table("gene_prob.txt", sep = "\t")
n <- length(dat$V1)
print(n)
print(dat$V1)
__END__

with this input in gene_prob.txt:

__INPUT__
HFE     0.00107517988586552
NF1     0.000744355305599206
PML     0.000661649160532628
TCF3    0.000661649160532628
NF2     0.000578943015466049
GNAS    0.000578943015466049
GGA2    0.000578943015466049
...

I get this printout:

...
[8541] LOC552889 GPR15 SLC2A11 GRIP2 SGEF
[8546] PIK3IP1 RPS27 AQP7
8548 Levels: 3.8-1 A2M A4GALT A4GNT AAAS AAK1 AAMP AANAT AARSD1 AASS ... hCG_1730474

What's the meaning of the last line? Is it an error? How can I fix it?

--
Gundala Viswanath
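The last line does not look like an error. print() on a factor lists the values and then a "Levels:" line with the distinct values, here apparently 8548 of them. Before R 4.0.0, read.table() converted string columns to factors by default.

The sketch below uses three of the lines shown above through read.table()'s text= argument, in place of gene_prob.txt. It shows the difference stringsAsFactors makes. Passing stringsAsFactors = FALSE (or calling as.character(dat$V1)) gives a plain character vector.

```r
# Three lines of the input shown above, tab-separated, in place of the file
txt <- "HFE\t0.00107517988586552
NF1\t0.000744355305599206
PML\t0.000661649160532628"

# As a factor (the default before R 4.0.0): printing ends with a "Levels:" line
dat_f <- read.table(text = txt, sep = "\t", stringsAsFactors = TRUE)
print(dat_f$V1)

# As character (the default from R 4.0.0): no "Levels:" line
dat_c <- read.table(text = txt, sep = "\t", stringsAsFactors = FALSE)
print(dat_c$V1)
```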
[R] levels in dataframes
Dear R community,

I wish to ask a short question concerning factor data in data frames. When I subset the data and get rid of all data for one level, I still retain the level name (obtained by levels(dataframe$variablename)). Is there a convenient way to get rid of the levels for which all data has been deleted?

Thank you and wishing you an excellent day!

Georg.

Georg Ehret
Johns Hopkins
Baltimore, US
Re: [R] levels in dataframes
Georg

One way is to call factor() on the subsetted object.

georg <- factor(LETTERS[1:4])
summary(georg)
A B C D
1 1 1 1
georg <- georg[georg != "A"]
summary(georg) # the level is still there
A B C D
0 1 1 1
georg <- factor(georg)
summary(georg) # now it is gone
B C D
1 1 1

HTH

Peter Alspach

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Georg Ehret
Sent: Wednesday, 23 April 2008 8:58 a.m.
To: r-help
Subject: [R] levels in dataframes

> Is there a convenient way to get rid of the levels for which all data has been deleted?
Re: [R] levels() function for a vector
Karen

levels() returns the levels attribute of a variable, and a plain vector has no such attribute. It is usually used with a factor, e.g.

temp <- c(3, 5, 5, NA)
levels(factor(temp))
[1] "3" "5"

Best wishes

Richard

Chang Liu wrote:
> Hello:
> I'm trying to use levels function, but I don't know why it's returning NULL.
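A small sketch with Karen's vector that makes the distinction concrete. levels() only returns something for factors, and the levels of factor(temp) are character strings. If the distinct numeric values are what is wanted, sort(unique()) keeps them numeric; sort() drops the NA by default.

```r
temp <- c(3, 5, 5, NA)

levels(temp)           # NULL: a plain numeric vector has no levels attribute
levels(factor(temp))   # "3" "5": levels are character strings; NA is not a level
sort(unique(temp))     # 3 5: the distinct values, still numeric
```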
[R] levels() function for a vector
Hello:

I'm trying to use the levels function, but I don't know why it's returning NULL. For example:

temp
[1]  3  5  5 NA
levels(temp)
NULL

Also, I've tried:

list(temp)[[1]]
[1]  3  5  5 NA
levels(list(temp))
NULL

Is there a specific requirement on the parameter?

Karen