Thanks Gabor,

It is a very tricky task and your comment helped. I modified the function to return the average of the two numbers when the duration is a range such as "2-3 minutes". I also improved the regex so that it parses decimal values as well. Right now I can parse 100% of one sample.
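The change described above (averaging a range and accepting decimals) could be sketched roughly as follows. This is an illustrative base R version, not the actual modified function, which was not posted. The names duration_to_seconds and unit_seconds are made up for the example, and matching "hr"/"hour" in addition to "hrs" is an assumption.

```r
# Hypothetical helper: seconds per unit, judged from the start of the unit word.
unit_seconds <- function(units) {
  u <- tolower(units)
  if (startsWith(u, "sec")) 1
  else if (startsWith(u, "min")) 60
  else if (startsWith(u, "hr") || startsWith(u, "hour")) 3600
  else NA_real_
}

# Hypothetical helper: convert free-text durations to seconds.
# A range such as "2-3 minutes" is averaged; decimals such as "1.5 hrs" are accepted.
duration_to_seconds <- function(x) {
  # groups: 1 = first number, 2 = optional "-number" part,
  #         3 = second number, 4 = unit word
  pattern <- "([0-9]+\\.?[0-9]*)( *- *([0-9]+\\.?[0-9]*))? *([A-Za-z]+)"
  matches <- regmatches(x, regexec(pattern, x))
  vapply(matches, function(m) {
    if (length(m) == 0) return(NA_real_)  # no number followed by a unit word
    lo <- as.numeric(m[2])
    # m[4] is empty when the input is a single number rather than a range
    hi <- if (!is.na(m[4]) && nzchar(m[4])) as.numeric(m[4]) else lo
    mean(c(lo, hi)) * unit_seconds(m[5])
  }, numeric(1))
}

xx <- c("2 minutes ", "2 min ", "10 seconds ", "2 hrs ", " 2-3 minutes ",
        "5 minutes (when i arrived ", "Flyby approx 20 sec. ",
        "felt like 10 mins but tim ")
duration_to_seconds(xx)
# 120 120 10 7200 150 300 20 600
```

Only the first "number followed by a word" in each string is used, so an entry like "2 or 3 minutes" comes back as NA and would still need handling by hand.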
Thanks
Susanta

On Wed, Oct 27, 2010 at 5:11 AM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:

> On Tue, Oct 26, 2010 at 7:17 PM, Gabor Grothendieck
> <ggrothendi...@gmail.com> wrote:
> > On Tue, Oct 26, 2010 at 3:28 PM, Susanta Mohapatra
> > <mohapatra.susa...@gmail.com> wrote:
> >> Hi,
> >>
> >> I am working with a dataset for some time and I need some help in parsing
> >> some data.
> >>
> >> There is a column called "Duration" which has data like the following:
> >>
> >> 2 minutes => 120
> >> 2 min => 120
> >> 10 seconds => 10
> >> 2 hrs => 7200
> >> 2-3 minutes => 150 or 120
> >> 5 minutes (when i arrived => 300
> >> Flyby approx 20 sec. => 20
> >> felt like 10 mins but tim => 600
> >>
> >> I need to convert them to numerics as given. Any help in this regard will be
> >> highly appreciated.
> >
> > Assuming that "convert to numerics as given" means creating a list of
> > numeric vectors, one per row.
> >
>
> or if => was supposed to mean that this is the desired result then try this:
>
> f <- function(n1, n2, units) {
>   # n2 keeps its leading "-" from the regex (e.g. "-3"), hence the negation
>   if (n2 == "" && substr(units, 1, 3) == "sec") n1
>   else if (n2 == "" && substr(units, 1, 3) == "min") paste(60 * as.numeric(n1))
>   else if (n2 == "" && substr(units, 1, 3) == "hrs") paste(3600 * as.numeric(n1))
>   else if (n2 != "" && substr(units, 1, 3) == "sec") paste(n1, "or", -as.numeric(n2))
>   else if (n2 != "" && substr(units, 1, 3) == "min") paste(60 * as.numeric(n1), "or", -60 * as.numeric(n2))
>   else if (n2 != "" && substr(units, 1, 3) == "hrs") paste(3600 * as.numeric(n1), "or", -3600 * as.numeric(n2))
>   else NA
> }
>
> xx <- c("2 minutes ", "2 min ", "10 seconds ", "2 hrs ", " 2-3 minutes ",
>         "5 minutes (when i arrived ", "Flyby approx 20 sec. ",
>         "felt like 10 mins but tim ")
>
> library(gsubfn)
> out2 <- strapply(xx, "(\\d+)(-\\d+)? (\\S+)", f)
>
> The output looks like this:
>
> > str(out2)
> List of 8
>  $ : chr "120"
>  $ : chr "120"
>  $ : chr "10"
>  $ : chr "7200"
>  $ : chr "120 or 180"
>  $ : chr "300"
>  $ : chr "20"
>  $ : chr "600"
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com