On 5/4/05, Jeff Enos <[EMAIL PROTECTED]> wrote:
> R-devel,
>
> The performance of as.Date differs by a large degree between one of my
> machines with glibc 2.3.2:
>
> > system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"))
> [1] 1.17 0.00 1.18 0.00 0.00
>
> and a comparable machine with glibc 2.3.3:
>
> > system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"))
> [1] 31.20 46.89 81.01 0.00 0.00
>
> both with the same R version:
>
> > R.version
>          _
> platform i686-pc-linux-gnu
> arch     i686
> os       linux-gnu
> system   i686, linux-gnu
> status
> major    2
> minor    1.0
> year     2005
> month    04
> day      18
> language R
>
> I'm focusing on differences in glibc versions because of as.Date's use
> of strptime.
>
> Does it seem likely that the cause of this discrepancy is in fact
> glibc? If so, can anyone tell me how to make the performance of the
> second machine more like the first?
>
> I have verified that using the chron package, which I don't believe
> uses strptime, for the above character conversion performs equally
> well on both machines.

I think it's likely the character processing that is the bottleneck.
You can speed that part up by extracting the substrings directly:

> system.time({
+   dd <- rep("01-01-2005", 10000)
+   year <- as.numeric(substr(dd, 7, 10))
+   mon <- as.numeric(substr(dd, 1, 2))
+   day <- as.numeric(substr(dd, 4, 5))
+   x <- as.Date(ISOdate(year, mon, day))
+ }, gc = TRUE)
[1] 0.42 0.00 0.51   NA   NA

> system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"), gc=TRUE)
[1] 1.08 0.00 1.22   NA   NA

______________________________________________
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
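
The two timings in the reply use different vector lengths (10000 vs 100000), so they are not directly comparable. A minimal sketch of a like-for-like comparison follows, assuming the input is always in fixed-width "mm-dd-YYYY" form. The helper name parse_mdy is hypothetical and not from the thread; it just wraps the substring approach shown above. No timings are quoted here because they will vary by machine and C library.

```r
# Hypothetical helper (not from the thread): parse fixed-width
# "mm-dd-YYYY" strings by position, as in the reply above.
parse_mdy <- function(dd) {
  year <- as.integer(substr(dd, 7, 10))
  mon  <- as.integer(substr(dd, 1, 2))
  day  <- as.integer(substr(dd, 4, 5))
  as.Date(ISOdate(year, mon, day))
}

# Same input vector for both approaches, so the timings are comparable.
dd <- rep("01-01-2005", 100000)

t_substr   <- system.time(x1 <- parse_mdy(dd))
t_strptime <- system.time(x2 <- as.Date(dd, format = "%m-%d-%Y"))

# Both approaches should yield the same dates.
stopifnot(all(x1 == x2))

# Elapsed seconds for each approach.
print(c(substr   = t_substr[["elapsed"]],
        strptime = t_strptime[["elapsed"]]))
```

Running this on both machines would show whether the substring route avoids the slowdown on the glibc 2.3.3 machine, or whether the cost lies elsewhere.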