Dear Marna,

If you want to extract the middle of those intervals, please find below an improved variant of Luigi's code.


Note:
- it is more efficient to process the levels of a factor instead of all the individual strings; I expect this to pay off on a large data frame (> 1 million rows), although I have not explicitly checked it;
- the code also handles open/closed intervals better;
- the returned data structure may require some tweaking (it currently returns a data.frame);
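
To illustrate the first point, here is a minimal sketch on made-up data (the vector x and the breaks are only assumptions for the example, not part of the original data). A factor stores each distinct label once, so the labels are parsed once per level and then mapped back through the integer codes:

```r
# Made-up example data: 100000 values cut into 4 intervals
set.seed(1)
x = cut(runif(1e5), breaks = c(0, 0.25, 0.5, 0.75, 1))

# Only the levels need parsing: 4 labels instead of 100000 strings
lvl = levels(x)
lvl = sub("^[(\\[]", "", lvl)   # drop the opening "(" or "["
lvl = sub("[])]$", "", lvl)     # drop the closing ")" or "]"
bounds = lapply(strsplit(lvl, ","), as.numeric)
mid = sapply(bounds, mean)      # one midpoint per level

# Map the per-level midpoints back to every element via the integer codes
mid.all = mid[as.integer(x)]
```

Both bracket types are stripped in the same way, so the sketch does not depend on whether the intervals are open or closed on a given side.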



### Middle of an Interval
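# x = factor whose levels are interval labels, e.g. "(0,10]";
# inf.to = optional vector of length 2: the midpoints to use for
#   intervals containing -Inf (inf.to[1]) and +Inf (inf.to[2]);
# split.str = separator between the 2 endpoints;
mid.factor = function(x, inf.to = NULL, split.str=",") {
    lvl0 = levels(x); lvl = lvl0;
    # strip the opening "(" or "[" and the closing ")" or "]";
    lvl = sub("^[(\\[]", "", lvl);
    lvl = sub("[])]$", "", lvl); # tricky: "]" must come first inside [...];
    # split into the 2 endpoints & convert to numeric;
    lvl = strsplit(lvl, split.str);
    lvl = lapply(lvl, as.numeric);
    if( ! is.null(inf.to)) {
        # flag the intervals with an infinite endpoint;
        FUN = function(x) {
            if(any(x == Inf)) 1
            else if(any(x == - Inf)) -1
            else 0;
        }
        whatInf = sapply(lvl, FUN);
        # TODO: more advanced;
        # replaces both endpoints, so the midpoint becomes inf.to[...];
        lvl[whatInf == -1] = inf.to[1];
        lvl[whatInf ==  1] = inf.to[2];
    }
    mid = sapply(lvl, mean);
    lvl = data.frame(lvl=lvl0, mid=mid);
    # Note: merge() reorders the rows, so the result is NOT aligned with x;
    merge(data.frame(lvl=x), lvl, by="lvl");
}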


# uses the daT data frame;
# requires a factor:
# - this is probably the case with the original data;
daT$group = as.factor(daT$group);
mid.factor(daT$group);
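
One caveat on the returned data.frame: merge() sorts its result by the key column by default, so the rows do not come back in the order of daT. If the midpoints should be attached to daT as a new column, a possible variant is sketched below. The name mid.by.code and the toy daT are only assumptions for this illustration; the variant omits the inf.to handling and looks the midpoints up through the integer codes of the factor, which keeps the original row order:

```r
# Hypothetical order-preserving variant (no inf.to handling)
mid.by.code = function(x, split.str = ",") {
    lvl = levels(x)
    lvl = sub("^[(\\[]", "", lvl)   # drop the opening "(" or "["
    lvl = sub("[])]$", "", lvl)     # drop the closing ")" or "]"
    lvl = lapply(strsplit(lvl, split.str), as.numeric)
    mid = sapply(lvl, mean)         # one midpoint per level
    # as.integer(x) gives the level index of every element of x
    data.frame(lvl = x, mid = mid[as.integer(x)])
}

# Toy stand-in for the daT data frame (made-up values)
daT = data.frame(group = c("(10,20]", "(0,10]", "(10,20]", "(20,30]"))
daT$group = as.factor(daT$group)
res = mid.by.code(daT$group)
daT$mid = res$mid   # rows are in the same order as daT
```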


I have also uploaded this code to my GitHub collection of useful data tools:

https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R


Sincerely,


Leonard
