Re: [R] Cut intervals (character) to numeric midpoint; regex problem

Gabor Grothendieck Tue, 01 Dec 2009 12:34:38 -0800

Try this:

> library(gsubfn)
> strapply(testvec, "[-+.0-9]+", as.numeric, simplify = ~ colMeans(cbind(...)))
[1] -5.8500 -2.9800 -2.8160 -2.7120 -2.6325 -2.5680
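
If gsubfn is not installed, roughly the same idea can be sketched in base R.
This is only an illustration, not something tested against the real mtcal
data: it assumes every label holds exactly two numbers, and it uses the
longer ten-element test vector quoted further down. The names nums and
midpoints are made up for the example.

```r
# Test vector from the thread (labels as produced by cut()).
testvec <- c("(-8.616,-3.084]", "(-3.084,-2.876]", "(-2.876,-2.756]",
             "(-2.756,-2.668]", "(-2.668,-2.597]", "(-2.597,-2.539]",
             "(-1.008,-1]", "(-1,-0.9922]", "(0.9914,1]", "(1,1.009]")

# Extract every run of sign, digit and decimal-point characters from each
# label; the result is a list with one character vector per label.
nums <- regmatches(testvec, gregexpr("[-+.0-9]+", testvec))

# Convert the endpoints of each interval to numeric and average them.
midpoints <- vapply(nums, function(s) mean(as.numeric(s)), numeric(1))
midpoints
```

The character class is the same one used in the strapply call above; only
the extraction and averaging steps are swapped for base R functions.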



On Tue, Dec 1, 2009 at 3:14 PM, David Winsemius <dwinsem...@comcast.net> wrote:

> I'm sitting here chuckling. Your solution is just so "pure".
>
> I would offer an enhancement. When I tested with my cuts that had "-"
> before the digits, your solution dropped them, so my suggestion for the
> pattern would be:   "[-[:digit:].]+"
>
> I will admit that I thought it might fail with positive numbers but it does
> not seem to:
>
> > interv <- strapply(testvec, "[-[:digit:].]+", as.numeric, simplify = TRUE)
> > interv
>        [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]    [,8]   [,9] [,10]
> [1,] -8.616 -3.084 -2.876 -2.756 -2.668 -2.597 -1.008 -1.0000 0.9914 1.000
> [2,] -3.084 -2.876 -2.756 -2.668 -2.597 -2.539 -1.000 -0.9922 1.0000 1.009
>
> I was not able to get that pattern to give acceptable results in gsubfn, so
> I obviously need to study this more closely.
>
> --
> David.
>
>
> On Dec 1, 2009, at 2:47 PM, Gabor Grothendieck wrote:
>
>> You also might want to look at
>>
>> demo("gsubfn-cut")
>>
>>
>> On Tue, Dec 1, 2009 at 2:41 PM, David Winsemius <dwinsem...@comcast.net>
>> wrote:
>> Starting with the head of a 499 element matrix whose column names are now
>> the labels from a cut() operation, I needed to get to a vector of midpoints
>> to serve as the basis for plotting a calibration curve ( exp(linear
>> predictor) vs.  :
>>
>> > dput(head(dimnames(mtcal)[2][[1]])) # was starting point
>>
>>
>> testvec <- c("(-8.616,-3.084]", "(-3.084,-2.876]", "(-2.876,-2.756]",
>> "(-2.756,-2.668]",
>> "(-2.668,-2.597]", "(-2.597,-2.539]")
>>
>> I started this message with the thought of requesting an answer but kept
>> asking myself if I really had checked the docs and tested my understanding. I
>> eventually solved it using the gsubfn function from the gsubfn package:
>>
>> testintvl <- as.numeric(gsubfn("\\((-?[[:digit:]]+\\.?[[:digit:]]*),(-?[[:digit:]]+\\.?[[:digit:]]*)\\]",
>>     ~ (as.numeric(x)+as.numeric(y))/2, testvec))
>>
>> # I did discover that carriage returns in the middle of the pattern will
>> not give desired results, so if this is broken by your mail-client, be sure
>> to rejoin in the console.
>>
>> The extra "?"'s after the decimal point are in there because I had 4 NA's
>> around the median linear predictor:
>>
>> > dimnames(mtcal)[2][[1]][which(is.na(testintvl))]
>> [1] "(-1.008,-1]"  "(-1,-0.9922]" "(0.9914,1]"   "(1,1.009]"
>>
>> So a better test vector would be:
>>
>> testvec <- c("(-8.616,-3.084]", "(-3.084,-2.876]", "(-2.876,-2.756]",
>> "(-2.756,-2.668]",
>> "(-2.668,-2.597]", "(-2.597,-2.539]", "(-1.008,-1]",  "(-1,-0.9922]",
>> "(0.9914,1]", "(1,1.009]" )
>>
>> > testintvl <- as.numeric(gsubfn("\\((-?[[:digit:]]+\\.?[[:digit:]]*),(-?[[:digit:]]+\\.?[[:digit:]]*)\\]",
>> + ~ (as.numeric(x)+as.numeric(y))/2,  testvec))
>>
>> > testintvl
>>  [1] -5.8500 -2.9800 -2.8160 -2.7120 -2.6325 -2.5680 -1.0040 -0.9961  0.9957  1.0045
>>
>> I offer this to those who may feel regex challenged (as I often do). The
>> gsubfn function is pretty slick. I don't see an author listed for the
>> function, but the author of the package documents is Gabor Grothendieck.
>>
>> --
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
>
