On 5/5/05, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
> On 5/4/05, Jeff Enos <[EMAIL PROTECTED]> wrote:
> > R-devel,
> >
> > The performance of as.Date differs by a large degree between one of my
> > machines with glibc 2.3.2:
> >
> > > system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"))
> > [1] 1.17 0.00 1.18 0.00 0.00
> >
> > and a comparable machine with glibc 2.3.3:
> >
> > > system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"))
> > [1] 31.20 46.89 81.01  0.00  0.00
> >
> > both with the same R version:
> >
> > > R.version
> >          _
> > platform i686-pc-linux-gnu
> > arch     i686
> > os       linux-gnu
> > system   i686, linux-gnu
> > status
> > major    2
> > minor    1.0
> > year     2005
> > month    04
> > day      18
> > language R
> >
> > I'm focusing on differences in glibc versions because of as.Date's use
> > of strptime.
> >
> > Does it seem likely that the cause of this discrepancy is in fact
> > glibc?  If so, can anyone tell me how to make the performance of the
> > second machine more like the first?
> >
> > I have verified that using the chron package, which I don't believe
> > uses strptime, for the above character conversion performs equally
> > well on both machines.
>
> I think it's likely the character processing that is the bottleneck.  You
> can speed that part up by extracting the substrings directly:
>
> > system.time({
> + dd <- rep("01-01-2005", 10000)
> + year <- as.numeric(substr(dd, 7, 10))
> + mon <- as.numeric(substr(dd, 1, 2))
> + day <- as.numeric(substr(dd, 4, 5))
> + x <- as.Date(ISOdate(year, mon, day))
> + }, gc = TRUE)
> [1] 0.42 0.00 0.51   NA   NA
>
> > system.time(x <- as.Date(rep("01-01-2005", 100000), format = "%m-%d-%Y"), gc=TRUE)
> [1] 1.08 0.00 1.22   NA   NA
>
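Sorry, but I got the number of zeros in the reps wrong: the substr version
above ran on 10000 elements, while the as.Date call ran on 100000. The
substr version is actually slower.

______________________________________________
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel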
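For a like-for-like comparison, both approaches can be timed on the same
input vector. What follows is only a sketch under some assumptions: the
names n, t_format and t_substr are illustrative, n = 100000 is chosen to
match the original as.Date call, and the timings will depend on the machine
and its glibc version, so no numbers are shown.

```r
# Same input for both approaches, so the timings are comparable.
n  <- 100000                      # assumed size, matching the as.Date call
dd <- rep("01-01-2005", n)

# Approach 1: let as.Date parse the strings via the format argument
# (this is the path that goes through strptime).
t_format <- system.time(
  x1 <- as.Date(dd, format = "%m-%d-%Y")
)

# Approach 2: extract the fields by position and build the dates
# from numbers with ISOdate.
t_substr <- system.time({
  year <- as.numeric(substr(dd, 7, 10))
  mon  <- as.numeric(substr(dd, 1, 2))
  day  <- as.numeric(substr(dd, 4, 5))
  x2   <- as.Date(ISOdate(year, mon, day))
})

# Both approaches should produce the same dates.
stopifnot(all(x1 == x2))

# Report elapsed wall-clock time for each approach.
cat("format:", t_format[["elapsed"]], "s\n")
cat("substr:", t_substr[["elapsed"]], "s\n")
```

Building dd once, outside both timed expressions, keeps the cost of rep
out of the measurements and guarantees that the two calls see vectors of
the same length.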