Re: [R] cut2 once, bin twice...
sdanzige wrote:
> Thank you, but the regular expression example doesn't seem to work
> correctly.
>
> I wrote a regular expression that does seem to work, so I'll post it
> here for anyone else that needs it.
>
>   labs <- levels(df$p_bin)
>   cbind(lower = as.numeric(sub("[[(]", "", sub(",.*", "", labs))),
>         upper = as.numeric(sub("[])]", "", sub("[[(].*, *", "", labs))))
>
> I fear my inelegance will peg me as a Windows programmer, but so be
> it...
>
> -S

You can also use the onlycuts=TRUE option to cut2 to get the vector of cut points, although they are not arranged as a vector of lower and a vector of upper values. It would be easy to customize cut2 to do that.

Frank

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
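A minimal base-R sketch of the rearrangement Frank mentions, assuming the cut points are already in hand. The vector `cp` and the second data set `y` are made up for illustration (the values are read off the interval labels in this thread); with Hmisc loaded, `cp` would presumably come from a cut2 call with onlycuts=TRUE, which is not run here.

```r
# Hypothetical cut points, read off the interval labels quoted in this
# thread. In practice they might come from cut2(..., onlycuts = TRUE).
cp <- c(21, 24, 28, 35, 49, 69, 96, 270)

# Pair each cut point with its successor: one row per bin.
bounds <- cbind(lower = head(cp, -1), upper = tail(cp, -1))

# Bin a second (made-up) data set with the same break points.
# right = FALSE and include.lowest = TRUE give bins closed on the left,
# with the last bin closed on both sides, as in "[96,270]".
y <- c(21, 23.9, 24, 50, 270)
y_bin <- cut(y, breaks = cp, right = FALSE, include.lowest = TRUE)

print(bounds)
print(table(y_bin))
```

Values of `y` outside the range of `cp` would come back as NA, so the outermost cut points may need widening for new data.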
Re: [R] cut2 once, bin twice...
sdanzige wrote:
> Thank you, but the regular expression example doesn't seem to work
> correctly.

I wrote a regular expression that does seem to work, so I'll post it here for anyone else that needs it.

  labs <- levels(df$p_bin)
  cbind(lower = as.numeric(sub("[[(]", "", sub(",.*", "", labs))),
        upper = as.numeric(sub("[])]", "", sub("[[(].*, *", "", labs))))

I fear my inelegance will peg me as a Windows programmer, but so be it...

-S

--
View this message in context: http://www.nabble.com/cut2-once%2C-bin-twice...-tp26020736p26028296.html
Sent from the R help mailing list archive at Nabble.com.
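For anyone who wants to try the expression above without the original data, here is a self-contained sketch. The `labs` vector is a hand-typed subset of the labels shown elsewhere in this thread, not the real `levels(df$p_bin)`.

```r
# A hand-typed sample of labels: single values and intervals.
labs <- c(" 0", "20", "[21, 24)", "[69, 96)", "[96,270]")

# Lower bound: drop everything from the first comma on, then the
# opening bracket. Upper bound: drop everything up to the comma (and
# any spaces after it), then the closing bracket.
lower <- as.numeric(sub("[[(]", "", sub(",.*", "", labs)))
upper <- as.numeric(sub("[])]", "", sub("[[(].*, *", "", labs)))

bounds <- cbind(lower, upper)
rownames(bounds) <- labs
print(bounds)
```

Single-value labels such as " 0" contain no bracket or comma, so both substitutions leave them alone and the same number ends up in both columns.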
Re: [R] cut2 once, bin twice...
Dieter Menne wrote:
> It used to be quite tricky, but on popular request Brian Ripley has added
> an example of how to extract the intervals using a regular expression at
> the bottom of the examples for cut (note: cut in base, not cut2 in Hmisc).

Thank you, but the regular expression example doesn't seem to work correctly.

  > labs <- levels(df$p_bin)
  > labs
   [1] " 0"       " 1"       " 2"       " 3"       " 4"       " 5"
   [7] " 6"       " 7"       " 8"       " 9"       "10"       "11"
  [13] "12"       "13"       "14"       "15"       "16"       "17"
  [19] "18"       "19"       "20"       "[21, 24)" "[24, 28)" "[28, 35)"
  [25] "[35, 49)" "[49, 69)" "[69, 96)" "[96,270]"
  > cbind(lower = as.numeric( sub("\\((.+),.*", "\\1", labs) ),
  +       upper = as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", labs) ))
  Warning in cbind(lower = as.numeric(sub("\\((.+),.*", "\\1", labs)), upper = as.numeric(sub("[^,]*,([^]]*)\\]",  :
    NAs introduced by coercion
  Warning in cbind(lower = as.numeric(sub("\\((.+),.*", "\\1", labs)), upper = as.numeric(sub("[^,]*,([^]]*)\\]",  :
    NAs introduced by coercion
        lower upper
   [1,]     0     0
   [2,]     1     1
   [3,]     2     2
   [4,]     3     3
   [5,]     4     4
   [6,]     5     5
   [7,]     6     6
   [8,]     7     7
   [9,]     8     8
  [10,]     9     9
  [11,]    10    10
  [12,]    11    11
  [13,]    12    12
  [14,]    13    13
  [15,]    14    14
  [16,]    15    15
  [17,]    16    16
  [18,]    17    17
  [19,]    18    18
  [20,]    19    19
  [21,]    20    20
  [22,]    NA    NA
  [23,]    NA    NA
  [24,]    NA    NA
  [25,]    NA    NA
  [26,]    NA    NA
  [27,]    NA    NA
  [28,]    NA   270

Any ideas?

Thank you,

-S

--
View this message in context: http://www.nabble.com/cut2-once%2C-bin-twice...-tp26020736p26027643.html
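A likely reason for the NAs, sketched below: the two patterns quoted above look for a literal "(" at the start and "]" at the end, which is how base cut labels its intervals by default, while the labels here start with "[" and mostly end with ")". The helper name `parse_bounds` and both label vectors are made up for this illustration.

```r
# Labels from base cut() with its defaults look like "(a,b]".
base_labs <- levels(cut(0:10, breaks = c(0, 5, 10)))

# Labels of the kind shown above look like "[a, b)" instead.
cut2_labs <- c("[21, 24)", "[96,270]")

# The two patterns quoted above, wrapped in a small helper
# (parse_bounds is a name made up for this sketch).
parse_bounds <- function(labs) {
  suppressWarnings(cbind(
    lower = as.numeric(sub("\\((.+),.*", "\\1", labs)),
    upper = as.numeric(sub("[^,]*,([^]]*)\\]", "\\1", labs))
  ))
}

print(parse_bounds(base_labs))  # both columns parse
print(parse_bounds(cut2_labs))  # no "(" to match, "]" only on the last label
```

When a pattern does not match, sub returns the label unchanged, and as.numeric then turns the untouched "[21, 24)" into NA. Only the final label ends in "]", which is why 270 is the one upper bound that survives.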
Re: [R] cut2 once, bin twice...
On Fri, Oct 23, 2009 at 3:58 AM, Dieter Menne wrote:
> sdanzige wrote:
>> I'm using the Hmisc cut2 function to bin a set of data. It produces bins
>> that I like with results like this:
>>
>>   [96,270]:171
>>   [69, 96): 54
>>   [49, 69): 40
>>   [35, 49): 28
>>   [28, 35): 14
>>   [24, 28):  8
>>   (Other) : 48
>>
>> I would like to take a second set of data, and assign it to bins based on
>> factors defined by my call to cut2.
>
> It used to be quite tricky, but on popular request Brian Ripley has added
> an example of how to extract the intervals using a regular expression at
> the bottom of the examples for cut (note: cut in base, not cut2 in Hmisc).
>
> If someone knows of an easier way, please correct me. How about adding this
> information as an attribute to the standard cut?

The strapply function in gsubfn can do it with a simpler regular expression since it extracts based on content rather than delimiters, which is what you want here:

  > # create sample data
  > library(gsubfn)
  > set.seed(1)
  > dat <- seq(4, 7, by = 0.05)
  > x <- sample(dat, 30)
  > # use cut
  > groups <- cut(x, breaks = 10)
  > # extract interval boundaries using strapply
  > strapply(levels(groups), "[[:digit:].]+", as.numeric, simplify = TRUE)
       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
  [1,]  4.0  4.3  4.6  4.9  5.2  5.5  5.8  6.1  6.4   6.7
  [2,]  4.3  4.6  4.9  5.2  5.5  5.8  6.1  6.4  6.7   7.0

The above is from demo("gsubfn-cut")

For more see the gsubfn home page at http://gsubfn.googlecode.com
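If installing gsubfn is not an option, the same content-based extraction can be approximated in base R with gregexpr and regmatches. This is a swapped-in alternative, not the strapply approach itself, and the `labs` vector is hand-typed from the labels in this thread.

```r
# Interval labels of the kind discussed in this thread.
labs <- c("[21, 24)", "[24, 28)", "[96,270]")

# Pull out every run of digits and dots, one character vector per label.
# The pattern would not pick up a minus sign or exponent notation.
nums <- regmatches(labs, gregexpr("[[:digit:].]+", labs))

# Convert to numeric and stack into one row per label.
bounds <- do.call(rbind, lapply(nums, as.numeric))
colnames(bounds) <- c("lower", "upper")
print(bounds)
```

Unlike the strapply output above, which has one column per interval, this sketch puts one interval on each row. It also assumes every label contains exactly two numbers, so single-value labels would need separate handling.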
Re: [R] cut2 once, bin twice...
sdanzige wrote:
> I'm using the Hmisc cut2 function to bin a set of data. It produces bins
> that I like with results like this:
>
>   [96,270]:171
>   [69, 96): 54
>   [49, 69): 40
>   [35, 49): 28
>   [28, 35): 14
>   [24, 28):  8
>   (Other) : 48
>
> I would like to take a second set of data, and assign it to bins based on
> factors defined by my call to cut2.

It used to be quite tricky, but on popular request Brian Ripley has added an example of how to extract the intervals using a regular expression at the bottom of the examples for cut (note: cut in base, not cut2 in Hmisc).

If someone knows of an easier way, please correct me. How about adding this information as an attribute to the standard cut?

Dieter

--
View this message in context: http://www.nabble.com/cut2-once%2C-bin-twice...-tp26020736p26022244.html
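To illustrate the attribute idea, here is a rough sketch of a wrapper around base cut that stores the break points on the result. `cut_keep_breaks` is a hypothetical helper, not something that exists in base R or Hmisc, and it only handles an explicit vector of breaks, not a bin count.

```r
# cut_keep_breaks is a hypothetical wrapper, not part of base R or Hmisc.
# It expects an explicit vector of break points (not a bin count).
cut_keep_breaks <- function(x, breaks, ...) {
  stopifnot(is.numeric(breaks), length(breaks) > 1)
  f <- cut(x, breaks = breaks, ...)
  attr(f, "breaks") <- sort(breaks)
  f
}

x <- c(22, 30, 100)
g <- cut_keep_breaks(x, breaks = c(21, 24, 28, 35, 270), right = FALSE)

# The boundaries can now be read back without parsing the labels.
b <- attr(g, "breaks")
print(cbind(lower = head(b, -1), upper = tail(b, -1)))
```

Whether such an attribute survives later operations on the factor is a separate question that this sketch does not address.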
[R] cut2 once, bin twice...
Hello,

I'm using the Hmisc cut2 function to bin a set of data. It produces bins that I like with results like this:

  [96,270]:171
  [69, 96): 54
  [49, 69): 40
  [35, 49): 28
  [28, 35): 14
  [24, 28):  8
  (Other) : 48

I would like to take a second set of data, and assign it to bins based on factors defined by my call to cut2.

Does anyone know how I can do this?

Thank you,

-S

--
View this message in context: http://www.nabble.com/cut2-once%2C-bin-twice...-tp26020736p26020736.html