Re: [R] cut2 once, bin twice...

2009-10-23 Thread Frank E Harrell Jr

sdanzige wrote:
>
> sdanzige wrote:
>>
>> Thank you, but the regular expression example doesn't seem to work
>> correctly.
>>
>
> I wrote a regular expression that does seem to work, so I'll post it here
> for anyone else that needs it.
>
> labs<-levels(df$p_bin)
> cbind(lower=as.numeric(sub("[[(]","",sub(",.*","",labs))),
> upper=as.numeric(sub("[])]","",sub("[[(].*, *","",labs))) )
>
> I fear my inelegance will peg me as a Windows programmer, but so be it...
> -S


You can also use the onlycuts=TRUE option to cut2 to get the vector of 
cut points, although they are not arranged as a vector of lower and a 
vector of upper values.  It would be easy to customize cut2 to do that.
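
A rough sketch of that rearrangement step, not taken from the thread: it assumes the cut points have already been obtained, e.g. from cut2(x, g = ..., onlycuts = TRUE), and hard-codes them from the interval labels quoted in this thread so that it runs in base R without Hmisc. The second data set x2 is made up for illustration.

```r
# Cut points as they might come back from cut2(..., onlycuts = TRUE);
# hard-coded here (assumption) from the interval labels in this thread.
cuts <- c(21, 24, 28, 35, 49, 69, 96, 270)

# Rearrange the single vector into a lower and an upper column.
bounds <- cbind(lower = head(cuts, -1), upper = tail(cuts, -1))

# Bin a second, made-up data set with the same break points.
# right = FALSE with include.lowest = TRUE gives [a,b) intervals and a
# closed last interval, matching the style of the labels shown above.
x2 <- c(21, 23.9, 24, 50, 96, 270)
bin2 <- cut(x2, breaks = cuts, right = FALSE, include.lowest = TRUE)
table(bin2)
```

Values of x2 outside the range of cuts would come back as NA, so the second data set may need separate handling for those.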


Frank

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cut2 once, bin twice...

2009-10-23 Thread sdanzige


sdanzige wrote:
> 
> 
> Thank you, but the regular expression example doesn't seem to work
> correctly.
> 
> 

I wrote a regular expression that does seem to work, so I'll post it here
for anyone else that needs it.

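labs <- levels(df$p_bin)
# lower: drop everything from the comma on, then the opening bracket
# upper: drop everything up to the comma, then the closing bracket
cbind(lower = as.numeric(sub("[[(]", "", sub(",.*", "", labs))),
      upper = as.numeric(sub("[])]", "", sub("[[(].*, *", "", labs))))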
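
To get from these bounds back to the original goal (binning a second data set), something like the following might work. It is only a sketch: the labels are a hard-coded subset of the levels shown in this thread, x2 is made up, and it assumes integer-valued data so that single-value levels such as " 20" can be treated as lower bounds.

```r
# A few of the levels from the thread, hard-coded so the sketch is self-contained.
labs <- c(" 20", "[21, 24)", "[24, 28)", "[28, 35)",
          "[35, 49)", "[49, 69)", "[69, 96)", "[96,270]")

# Same expressions as above: strip the brackets and split at the comma.
lower <- as.numeric(sub("[[(]", "", sub(",.*", "", labs)))
upper <- as.numeric(sub("[])]", "", sub("[[(].*, *", "", labs)))

x2 <- c(20, 22, 27, 96, 150, 300)      # hypothetical second data set
idx <- findInterval(x2, lower)         # index of the last lower bound <= x2
idx[idx == 0 | x2 > max(upper)] <- NA  # values outside the original range
bin2 <- factor(labs[idx], levels = labs)
```

Because bin2 reuses labs as its levels, tables of the first and second data set line up level by level.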


I fear my inelegance will peg me as a Windows programmer, but so be it... 
-S
-- 
View this message in context: 
http://www.nabble.com/cut2-once%2C-bin-twice...-tp26020736p26028296.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] cut2 once, bin twice...

2009-10-23 Thread sdanzige


Dieter Menne wrote:
> 
> 
> It used to be quite tricky, but on popular request Brian Ripley has added
> an example of how to extract the intervals using a regular expression at
> the bottom of the examples for cut (note: cut in base, not cut2 in Hmisc).
> 
> 


Thank you, but the regular expression example doesn't seem to work
correctly.

> labs<-levels(df$p_bin)

> labs
 [1] " 0"   " 1"   " 2"   " 3"   " 4"   " 5"  
 [7] " 6"   " 7"   " 8"   " 9"   "10"   "11"  
[13] "12"   "13"   "14"   "15"   "16"   "17"  
[19] "18"   "19"   "20"   "[21, 24)" "[24, 28)" "[28, 35)"
[25] "[35, 49)" "[49, 69)" "[69, 96)" "[96,270]"

> cbind(lower = as.numeric( sub("\\((.+),.*", "\\1", labs) ),
+       upper = as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", labs) ))
Warning in cbind(lower = as.numeric(sub("\\((.+),.*", "\\1", labs)), upper =
as.numeric(sub("[^,]*,([^]]*)\\]",  :
  NAs introduced by coercion
Warning in cbind(lower = as.numeric(sub("\\((.+),.*", "\\1", labs)), upper =
as.numeric(sub("[^,]*,([^]]*)\\]",  :
  NAs introduced by coercion
      lower upper
 [1,]     0     0
 [2,]     1     1
 [3,]     2     2
 [4,]     3     3
 [5,]     4     4
 [6,]     5     5
 [7,]     6     6
 [8,]     7     7
 [9,]     8     8
[10,]     9     9
[11,]    10    10
[12,]    11    11
[13,]    12    12
[14,]    13    13
[15,]    14    14
[16,]    15    15
[17,]    16    16
[18,]    17    17
[19,]    18    18
[20,]    19    19
[21,]    20    20
[22,]    NA    NA
[23,]    NA    NA
[24,]    NA    NA
[25,]    NA    NA
[26,]    NA    NA
[27,]    NA    NA
[28,]    NA   270

--

Any ideas?

Thank you,
-S



Re: [R] cut2 once, bin twice...

2009-10-23 Thread Gabor Grothendieck
On Fri, Oct 23, 2009 at 3:58 AM, Dieter Menne wrote:
>
>
>
> sdanzige wrote:
>>
>>
>> I'm using the Hmisc cut2 function to bin a set of data.  It produces bins
>> that I like with results like this:
>>
>> [96,270]:171
>> [69, 96): 54
>> [49, 69): 40
>> [35, 49): 28
>> [28, 35): 14
>> [24, 28):  8
>> (Other) : 48
>>
>> I would like to take a second set of data, and assign it to bins based on
>> factors defined by my call to cut2.
>>
>
> It used to be quite tricky, but on popular request Brian Ripley has added an
> example of how to extract the intervals using a regular expression at the
> bottom of the examples for cut (note: cut in base, not cut2 in Hmisc).
>
> If someone knows of an easier way, please correct me. How about adding this
> information as an attribute to the standard cut?
>

The strapply function in gsubfn can do it with a simpler regular
expression since it extracts based on content rather than delimiters,
which is what you want here:

> # create sample data
> library(gsubfn)
> set.seed(1)
> dat <- seq(4, 7, by = 0.05)
> x <- sample(dat, 30)
> # use cut
> groups <- cut(x, breaks = 10)

> # extract interval boundaries using strapply
> strapply(levels(groups), "[[:digit:].]+", as.numeric, simplify = TRUE)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]  4.0  4.3  4.6  4.9  5.2  5.5  5.8  6.1  6.4   6.7
[2,]  4.3  4.6  4.9  5.2  5.5  5.8  6.1  6.4  6.7   7.0

The above is from

   demo("gsubfn-cut")

For more see the gsubfn home page at http://gsubfn.googlecode.com
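
For readers without gsubfn, the same content-based extraction can be approximated in base R with gregexpr() and regmatches(). This is a sketch on a few labels from this thread, not part of the gsubfn demo; the pattern ignores signs and scientific notation, so it assumes non-negative, plainly formatted boundaries.

```r
# A few labels from the thread, including a single-value level.
labs <- c(" 20", "[21, 24)", "[96,270]")

# Pull out every run of digits and dots from each label.
nums <- regmatches(labs, gregexpr("[[:digit:].]+", labs))

# One row per label; a single-value level gets lower == upper.
bounds <- t(sapply(nums, function(s) range(as.numeric(s))))
colnames(bounds) <- c("lower", "upper")
bounds
```

Using range() keeps single-value levels from producing a ragged result, at the cost of assuming the lower bound is always the smaller number.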



Re: [R] cut2 once, bin twice...

2009-10-23 Thread Dieter Menne



sdanzige wrote:
> 
> 
> I'm using the Hmisc cut2 function to bin a set of data.  It produces bins
> that I like with results like this:
> 
> [96,270]:171
> [69, 96): 54
> [49, 69): 40
> [35, 49): 28
> [28, 35): 14
> [24, 28):  8
> (Other) : 48
> 
> I would like to take a second set of data, and assign it to bins based on
> factors defined by my call to cut2.
> 

It used to be quite tricky, but on popular request Brian Ripley has added an
example of how to extract the intervals using a regular expression at the
bottom of the examples for cut (note: cut in base, not cut2 in Hmisc).

If someone knows of an easier way, please correct me. How about adding this
information as an attribute to the standard cut?

Dieter







[R] cut2 once, bin twice...

2009-10-22 Thread sdanzige

Hello,

I'm using the Hmisc cut2 function to bin a set of data.  It produces bins
that I like with results like this:

[96,270]:171
[69, 96): 54
[49, 69): 40
[35, 49): 28
[28, 35): 14
[24, 28):  8
(Other) : 48

I would like to take a second set of data, and assign it to bins based on
factors defined by my call to cut2.

Does anyone know how I can do this?

Thank you,
-S
