Re: [R] cbind, data.frame | numeric to string?
Don't use cbind() -- it forces everything into a single type, here string, which in turn becomes factor.

Simply,

  data.frame(a, b, c)

Like David mentioned a few days ago, I have no idea who is promoting this data.frame(cbind(...)) idiom, but it's a terrible idea (albeit one that seems to have been very frequent over the last few weeks).

Michael

On Tue, Apr 10, 2012 at 10:33 AM, Anser Chen anser.c...@gmail.com wrote:
> Complete newbie to R -- struggling with something which should be pretty basic. Trying to create a simple data set (which I gather R refers to as a data.frame). So
>
>   a <- c(1,2,3,4,5);
>   b <- c(0.3,0.4,0.5,0,6,0.7);
>
> Stick the two together into a data frame (call it test) using cbind:
>
>   test <- data.frame(cbind(a,b))
>
> Seems to do the trick:
>
>   test
>     a   b
>   1 1 0.3
>   2 2 0.4
>   3 3 0.5
>   4 4 0.6
>   5 5 0.7
>
> Confirm that each variable is numeric:
>
>   is.numeric(test$a)
>   [1] TRUE
>   is.numeric(test$b)
>   [1] TRUE
>
> OK, so far so good. But now I want to merge in a vector of characters:
>
>   c <- c('y1,y2,y3,y4,y5)
>
> Confirm that this is string:
>
>   is.numeric(c);
>   [1] FALSE
>
> cbind c into the data frame:
>
>   test <- data.frame(cbind(a,b,c))
>
> Looks like everything is in place:
>
>   test
>     a   b  c
>   1 1 0.3 y1
>   2 2 0.4 y2
>   3 3 0.5 y3
>   4 4 0.6 y4
>   5 5 0.7 y5
>
> Except that it seems as if the moment I cbind in a character vector, it changes numeric data to string:
>
>   is.numeric(test$a)
>   [1] FALSE
>   is.numeric(test$b)
>   [1] FALSE
>
> which would explain why the operations I'm trying to perform on elements of the a and b columns are failing. If I look at the structure of the data.frame, I see that in fact *all* the variables are being entered as factors.
>
>   str(test)
>   'data.frame': 5 obs. of 3 variables:
>    $ a: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5
>    $ b: Factor w/ 5 levels "0.3","0.4","0.5",..: 1 2 3 4 5
>    $ c: Factor w/ 5 levels "y1","y2","y3",..: 1 2 3 4 5
>
> But, if I try
>
>   test <- data.frame(cbind(a,b))
>   str(test)
>   'data.frame': 5 obs. of 2 variables:
>    $ a: num 1 2 3 4 5
>    $ b: num 0.3 0.4 0.5 0.6 0.7
>
> a and b are coming back as numeric.
>
> So, why does cbind'ing a column of character variables change everything else? And, more to the point, what do I need to do to 'correct' the problem (i.e., stop this from happening)?
>
> [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
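The point of Michael's reply can be checked with a minimal sketch. It assumes the OP's vectors with the typos corrected so that each has five elements, and it uses the hypothetical name `lab` for the character vector so that the base function `c()` is not masked:

```r
# Corrected versions of the OP's vectors (five elements each; an assumption).
a <- c(1, 2, 3, 4, 5)
b <- c(0.3, 0.4, 0.5, 0.6, 0.7)
lab <- c("y1", "y2", "y3", "y4", "y5")  # hypothetical name, avoids masking c()

# cbind() on plain vectors builds a matrix. A matrix holds a single type,
# so the numbers are converted to character strings.
m <- cbind(a, b, lab)
is.matrix(m)   # TRUE
typeof(m)      # "character"

# data.frame() called on the vectors directly keeps each column's own type.
test <- data.frame(a, b, lab)
is.numeric(test$a)   # TRUE
is.numeric(test$b)   # TRUE
```

The conversion to strings happens inside cbind(), before data.frame() ever sees the data, which is why dropping the cbind() call is enough.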
Re: [R] cbind, data.frame | numeric to string?
Still didn't work for me without cbind, although you really don't need it ;) It worked after I set options(stringsAsFactors=F).

  options(stringsAsFactors=F)
  df <- data.frame(intVec, chaVec)
  df
    intVec chaVec
  1      1      a
  2      2      b
  3      3      c
  df$chaVec
  [1] "a" "b" "c"

The documentation of data.frame says the option is TRUE by default.

On 10.04.2012 at 17:38, R. Michael Weylandt wrote:
> Don't use cbind() -- it forces everything into a single type, here string, which in turn becomes factor.
>
> Simply,
>
>   data.frame(a, b, c)
>
> Like David mentioned a few days ago, I have no idea who is promoting this data.frame(cbind(...)) idiom, but it's a terrible idea (albeit one that seems to have been very frequent over the last few weeks).
>
> Michael
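For reference, the string-to-factor conversion Jessica ran into can also be controlled per call rather than through the session-wide option. The sketch below assumes `intVec` and `chaVec` look like her printed output (their definitions are not in the thread). Note that the default was TRUE when this thread was written and became FALSE in R 4.0.0, so passing the argument explicitly gives the same result on either side of that change:

```r
# Reconstructed from the printed data frame; the actual definitions are an assumption.
intVec <- 1:3
chaVec <- c("a", "b", "c")

# The per-call argument does not depend on options() or on the R version.
df_chr <- data.frame(intVec, chaVec, stringsAsFactors = FALSE)
df_fct <- data.frame(intVec, chaVec, stringsAsFactors = TRUE)

class(df_chr$chaVec)   # "character"
class(df_fct$chaVec)   # "factor"
class(df_chr$intVec)   # "integer" (numeric columns are never affected)
```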
Re: [R] cbind, data.frame | numeric to string?
cbind() works as well, but only if c is attached to the existing test variable:

  tst <- cbind( test, c )
  tst
    a   b  c
  1 1 0.3 y1
  2 2 0.4 y2
  3 3 0.5 y3
  4 4 0.6 y4
  5 5 0.7 y5
  str( tst )
  'data.frame': 5 obs. of 3 variables:
   $ a: num 1 2 3 4 5
   $ b: num 0.3 0.4 0.5 0.6 0.7
   $ c: Factor w/ 5 levels "y1","y2","y3",..: 1 2 3 4 5

Not saying it is a good idea, though...

Rainer

On Tuesday 10 April 2012 11:38:51 R. Michael Weylandt wrote:
> Don't use cbind() -- it forces everything into a single type, here string, which in turn becomes factor.
>
> Simply,
>
>   data.frame(a, b, c)
>
> Like David mentioned a few days ago, I have no idea who is promoting this data.frame(cbind(...)) idiom, but it's a terrible idea (albeit one that seems to have been very frequent over the last few weeks).
>
> Michael
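Rainer's variant can be sketched as follows, again assuming corrected five-element vectors and the hypothetical name `lab` for the character vector. Because `test` is already a data frame, cbind() returns a data frame and the numeric columns are left alone:

```r
# An existing all-numeric data frame (corrected values; an assumption).
test <- data.frame(a = c(1, 2, 3, 4, 5), b = c(0.3, 0.4, 0.5, 0.6, 0.7))
lab <- c("y1", "y2", "y3", "y4", "y5")  # hypothetical name, avoids masking c()

# cbind() with a data frame among its arguments returns a data frame.
tst <- cbind(test, lab)
is.data.frame(tst)   # TRUE
is.numeric(tst$a)    # TRUE
is.numeric(tst$b)    # TRUE
names(tst)           # "a" "b" "lab"

# Plain column assignment gives the same shape without cbind() at all.
test$lab <- lab
names(test)          # "a" "b" "lab"
```

Whether the new column comes back as character or factor depends on the stringsAsFactors default of the R version in use.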
Re: [R] cbind, data.frame | numeric to string?
Sorry, I missed that the OP's real question was about character/factor, not the "why are these all factors" bit... good catch. Rant about cbind() still stands though. :-)

[Your way with cbind() would give him all characters, not some characters and some numerics, since cbind() gives a matrix by default -- note that in Rainer's construction it doesn't coerce, because it's giving a data.frame (as cbind.data.frame does but the default cbind doesn't), so no coercion is necessary.]

Best,
Michael

On Tue, Apr 10, 2012 at 11:55 AM, Jessica Streicher j.streic...@micromata.de wrote:
> Still didn't work for me without cbind, although you really don't need it ;) It worked after I set options(stringsAsFactors=F).
>
>   options(stringsAsFactors=F)
>   df <- data.frame(intVec, chaVec)
>   df
>     intVec chaVec
>   1      1      a
>   2      2      b
>   3      3      c
>   df$chaVec
>   [1] "a" "b" "c"
>
> The documentation of data.frame says the option is TRUE by default.
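If a data frame has already been built through data.frame(cbind(...)) and its numeric columns have turned into factors, one common way to recover the numbers is to go through as.character() first. This is a sketch under the same assumptions as before (corrected five-element vectors, hypothetical name `lab`), with stringsAsFactors = TRUE set explicitly to reproduce the 2012 default:

```r
a <- c(1, 2, 3, 4, 5)
b <- c(0.3, 0.4, 0.5, 0.6, 0.7)
lab <- c("y1", "y2", "y3", "y4", "y5")  # hypothetical name, avoids masking c()

# Reproduce the problem: the character matrix becomes a data frame of factors.
bad <- data.frame(cbind(a, b, lab), stringsAsFactors = TRUE)
is.factor(bad$b)   # TRUE

# Pitfall: as.numeric() on a factor returns the internal level codes.
as.numeric(bad$b)                          # 1 2 3 4 5

# Converting to character first recovers the original values.
fixed_b <- as.numeric(as.character(bad$b))
fixed_b                                    # 0.3 0.4 0.5 0.6 0.7
```

Building the data frame with data.frame(a, b, lab) in the first place avoids the round trip entirely.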
Re: [R] cbind, data.frame | numeric to string?
On Apr 10, 2012, at 11:58 AM, Rainer Schuermann wrote:
> cbind() works as well, but only if c is attached to the existing test variable:
>
>   tst <- cbind( test, c )
>   tst
>     a   b  c
>   1 1 0.3 y1
>   2 2 0.4 y2
>   3 3 0.5 y3
>   4 4 0.6 y4
>   5 5 0.7 y5
>   str( tst )
>   'data.frame': 5 obs. of 3 variables:
>    $ a: num 1 2 3 4 5
>    $ b: num 0.3 0.4 0.5 0.6 0.7
>    $ c: Factor w/ 5 levels "y1","y2","y3",..: 1 2 3 4 5
>
> Not saying it is a good idea, though...

To be somewhat more expansive ... 'cbind' is not just one function, but rather a set of functions, since it is generic. The one that is chosen by the interpreter will depend on whether the first argument has a class. If it does have a class, as in the example above where the class is data.frame, then the cbind.data.frame function will be dispatched to process the list of arguments. If the first argument doesn't have a class, as in the OP's data.frame(cbind(a,b,c)) example, then the internal cbind function will be used. It returns a matrix, which strips off all but a few attributes and forces a lowest-common-denominator mode. Only if all of the arguments were logical would cbind return a matrix of TRUEs and FALSEs; a single character argument turns every entry into a string.

(This all assumes that the typos in the OP's original example, which created 'c' as an incomplete expression and 'a' and 'b' with unequal lengths, were fixed.)

  a <- c(1,2,3,4,5); b <- c(0.3,0.4,0.5,0,6,0.7);
  test <- data.frame(cbind(a,b))
  Warning message:
  In cbind(a, b) :
    number of rows of result is not a multiple of vector length (arg 1)
  c <- c("y1","y2","y3","y4","y5")
  cbind(c, test)
  Error in data.frame(..., check.names = FALSE) :
    arguments imply differing number of rows: 5, 6

--
David.

David Winsemius, MD
West Hartford, CT
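The "lowest common denominator" behaviour David describes can be seen with a short sketch. The vector names are hypothetical; the ordering shown (logical, then numeric, then character) is R's usual coercion order for atomic vectors:

```r
# Three small vectors of different types (hypothetical names).
lgl <- c(TRUE, FALSE, TRUE)
num <- c(1.5, 2.5, 3.5)
chr <- c("x", "y", "z")

# The matrix takes the most general type among its arguments.
typeof(cbind(lgl, lgl))        # "logical"
typeof(cbind(lgl, num))        # "double"    (TRUE/FALSE become 1/0)
typeof(cbind(lgl, num, chr))   # "character" (everything becomes a string)

# The numeric values survive only as text in the character matrix.
m <- cbind(num, chr)
m[, "num"]                     # "1.5" "2.5" "3.5"
```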
Re: [R] cbind, data.frame | numeric to string?
On Apr 10, 2012, at 12:19 PM, David Winsemius wrote:
> To be somewhat more expansive ... 'cbind' is not just one function, but rather a set of functions, since it is generic. The one that is chosen by the interpreter will depend on whether the first argument has a class.

That was just my erroneous impression. If _any_ of the objects in the argument list is a data.frame, then cbind.data.frame appears to get used. There is a Dispatch section on the help page for cbind that appears to cover this adequately.

David Winsemius, MD
West Hartford, CT
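David's correction about dispatch can be checked with a small sketch. The object names are hypothetical and three-row data is used for brevity; the point is that the position of the data frame in the argument list does not matter:

```r
test <- data.frame(a = 1:3, b = c(0.3, 0.4, 0.5))
lab <- c("y1", "y2", "y3")  # hypothetical name, avoids masking c()

# Vector first, data frame second: the data frame method is still used.
res <- cbind(lab, test)
is.data.frame(res)   # TRUE
is.integer(res$a)    # TRUE (numeric columns are not coerced)

# No data frame among the arguments: the internal method returns a matrix.
mat <- cbind(lab, 1:3)
is.matrix(mat)       # TRUE
typeof(mat)          # "character"
```

This matches the error in David's earlier message, where cbind(c, test) failed inside data.frame() even though the vector came first.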