Re: [Rd] Varying as.Date performance

2005-05-06 Thread Jeff Enos
Thanks for this optimization and for adding it to R-devel.  My
original command on the glibc 2.3.3 machine now runs much faster:

> system.time(x <- as.Date(rep("01-01-2005", 10), format = "%m-%d-%Y"))
[1] 0.97 0.91 1.88 0.00 0.00

and is in line with the time seen on the glibc 2.3.2 machine.

Jeff

Prof Brian Ripley writes:
 > On Thu, 5 May 2005, Peter Dalgaard wrote:
 > 
 > > Jeff Enos <[EMAIL PROTECTED]> writes:
 > >
 > >> Thanks for these suggestions.  C-level profiling yields the following:
 > >>
 > >>   %   cumulative   self              self     total
 > >>  time   seconds   seconds    calls   s/call   s/call  name
 > >>  36.01      5.34     5.34       10     0.00     0.00  get_locale_strings
 > >>   4.32      5.98     0.64       10     0.00     0.00  mktime00
 > >>   3.98      6.57     0.59   277462     0.00     0.00  Rf_eval
 > >>   3.71      7.12     0.55   472935     0.00     0.00  Rf_findVarInFrame3
 > >>   3.64      7.66     0.54       10     0.00     0.00  strptime_internal
 > >>   3.51      8.18     0.52        1     0.52     7.51  do_strptime
 > >>
 > >> It looks like strftime is called from get_locale_strings, which might
 > >> be the culprit.  Any suggestions on where I might go from here?
 > >
 > > You might try modifying get_locale_strings (and its wide counterpart)
 > > with a check for an unchanged locale. E.g.
 > >
 > > static char *last_LC_TIME=NULL;
 > >
 > > 
 > >
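 > >  tmp = setlocale(LC_TIME, NULL);
 > >  if (last_LC_TIME && !strcmp(tmp, last_LC_TIME)) return; /* locale unchanged */
 > >
 > >  free(last_LC_TIME);
 > >  last_LC_TIME = strdup(tmp); /* copy: setlocale's buffer may be overwritten */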
 > >
 > >  set the strings 
 > >
 > > if the call to setlocale is considerably faster than 40 calls to
 > > strftime(), you might have a winner.
 > 
 > Yes, I think that would be a worthwhile optimization.  I didn't bother 
 > because I figured it would be fast enough (which at 50 microseconds it almost
 > always is).
 > 
 > However, get_locale_strings is only 36% of the total, and we have at least 
 > another 60% to account for.  (81.01 vs 1.18 secs.)
 > 
 > -- 
 > Brian D. Ripley,      [EMAIL PROTECTED]
 > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 > University of Oxford, Tel:  +44 1865 272861 (self)
 > 1 South Parks Road, +44 1865 272866 (PA)
 > Oxford OX1 3TG, UK Fax:  +44 1865 272595
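
For reference, the unchanged-locale check discussed above could look
roughly like the sketch below.  It is only an illustration under
assumptions: refresh_month_names, month_name and cached_lc_time are
hypothetical names, this is not R's actual get_locale_strings code, and
only the twelve month names are cached.  The private copy of the locale
name matters because the string returned by setlocale() may be
overwritten by a later call.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define N_MONTHS 12
#define NAME_LEN 64

/* Hypothetical cache for illustration; not R's internal data structures. */
static char month_names[N_MONTHS][NAME_LEN];
static char *cached_lc_time = NULL;   /* private copy of the LC_TIME name */

/* Rebuild the cached month names only if LC_TIME changed since the last
   call.  Returns 1 if the names were rebuilt, 0 if the cache was reused. */
int refresh_month_names(void)
{
    const char *current = setlocale(LC_TIME, NULL);   /* query only */
    if (current == NULL)
        current = "";

    if (cached_lc_time != NULL && strcmp(current, cached_lc_time) == 0)
        return 0;                                     /* locale unchanged */

    /* Keep our own copy: setlocale's buffer may be reused by later calls. */
    size_t len = strlen(current) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, current, len);
    free(cached_lc_time);
    cached_lc_time = copy;   /* NULL after a failed malloc forces a rebuild */

    struct tm tm;
    memset(&tm, 0, sizeof tm);
    tm.tm_mday = 1;
    for (int i = 0; i < N_MONTHS; i++) {
        tm.tm_mon = i;
        if (strftime(month_names[i], NAME_LEN, "%B", &tm) == 0)
            month_names[i][0] = '\0';
    }
    return 1;
}

/* Month name for month 0..11 in the current LC_TIME locale, else NULL. */
const char *month_name(int month)
{
    if (month < 0 || month >= N_MONTHS)
        return NULL;
    refresh_month_names();
    return month_names[month];
}
```

With this shape, repeated calls to month_name() under an unchanged
locale cost one setlocale() query and one strcmp() each, instead of a
fresh round of strftime() calls.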

-- 
Jeff Enos
Kane Capital Management
[EMAIL PROTECTED]

__
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Varying as.Date performance

2005-05-05 Thread Jeff Enos
Prof Brian Ripley writes:
 > One other possible difference would be locale, but this is slow on FC3
 > (2.3.4 now) in the C locale.  Almost all the time is in strptime:
 > R profiling shows
 > 
 > > summaryRprof()
 > $by.self
 >                     self.time self.pct total.time total.pct
 > "strptime"              29.58     99.7      29.58      99.7
 > "as.Date.character"      0.10      0.3      29.68     100.0
 > "as.Date"                0.00      0.0      29.68     100.0
 > "eval"                   0.00      0.0      29.68     100.0
 > "system.time"            0.00      0.0      29.68     100.0
 > 
 > Now on a glibc 2.3.x system R's internal replacement for strptime will be 
 > used (to work around bugs) so it must be some other part of the POSIX 
 > time-handling that has changed.
 > 
 > The next step would be to do C-level profiling and then retrofit the 
 > crucial code from glibc 2.3.2.

Thanks for these suggestions.  C-level profiling yields the following:

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 36.01      5.34     5.34       10     0.00     0.00  get_locale_strings
  4.32      5.98     0.64       10     0.00     0.00  mktime00
  3.98      6.57     0.59   277462     0.00     0.00  Rf_eval
  3.71      7.12     0.55   472935     0.00     0.00  Rf_findVarInFrame3
  3.64      7.66     0.54       10     0.00     0.00  strptime_internal
  3.51      8.18     0.52        1     0.52     7.51  do_strptime

It looks like strftime is called from get_locale_strings, which might
be the culprit.  Any suggestions on where I might go from here?
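
One way to test that suspicion outside of R is to time the two
operations in isolation.  The sketch below is a rough, assumption-laden
micro-benchmark: the function names are hypothetical, the figure of 40
strftime() calls per round is the estimate given elsewhere in this
thread rather than something measured here, and clock() only gives
coarse CPU-time resolution.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define CALLS_PER_ROUND 40   /* estimate from the thread, not measured */

/* Accumulating into a volatile keeps the calls from being optimised away. */
static volatile size_t sink;

/* CPU seconds spent on `rounds` rounds of CALLS_PER_ROUND strftime() calls,
   alternating between month names (%B) and weekday names (%A). */
double time_strftime_rounds(long rounds)
{
    char buf[64];
    struct tm tm;
    memset(&tm, 0, sizeof tm);
    tm.tm_mday = 1;

    clock_t start = clock();
    for (long r = 0; r < rounds; r++) {
        for (int i = 0; i < CALLS_PER_ROUND; i++) {
            tm.tm_mon = i % 12;
            tm.tm_wday = i % 7;
            sink += strftime(buf, sizeof buf, (i % 2) ? "%B" : "%A", &tm);
        }
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* CPU seconds spent on `rounds` queries of the current LC_TIME locale. */
double time_setlocale_queries(long rounds)
{
    clock_t start = clock();
    for (long r = 0; r < rounds; r++) {
        const char *name = setlocale(LC_TIME, NULL);   /* query only */
        if (name != NULL)
            sink += (size_t)name[0];
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling both functions with the same round count on each machine and
comparing the two results would indicate whether the strftime() calls
dominate and whether a locale query is cheap enough to guard them.  No
timings are quoted here because they depend on the glibc version in use.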

 > It does seem a pretty unusual application of R for 10^5 date conversions 
 > to be needed and for 30 secs to be an appreciable part of the analysis 
 > time on such a data set.

This is an issue for me when interactively loading a sizable
timeseries dataset into R from Postgres, converting character strings
into objects of class Date.

Thanks,

Jeff

 > 
 > On Wed, 4 May 2005, Jeff Enos wrote:
 > 
 > > R-devel,
 > >
 > > The performance of as.Date differs by a large degree between one of my
 > > machines with glibc 2.3.2:
 > >
 > >> system.time(x <- as.Date(rep("01-01-2005", 10), format = "%m-%d-%Y"))
 > > [1] 1.17 0.00 1.18 0.00 0.00
 > >
 > > and a comparable machine with glibc 2.3.3:
 > >
 > >> system.time(x <- as.Date(rep("01-01-2005", 10), format = "%m-%d-%Y"))
 > > [1] 31.20 46.89 81.01  0.00  0.00
 > >
 > > both with the same R version:
 > >
 > >> R.version
 > >          _
 > > platform i686-pc-linux-gnu
 > > arch     i686
 > > os       linux-gnu
 > > system   i686, linux-gnu
 > > status
 > > major    2
 > > minor    1.0
 > > year     2005
 > > month    04
 > > day      18
 > > language R
 > >
 > > I'm focusing on differences in glibc versions because of as.Date's use
 > > of strptime.
 > >
 > > Does it seem likely that the cause of this discrepancy is in fact
 > > glibc?  If so, can anyone tell me how to make the performance of the
 > > second machine more like the first?
 > >
 > > I have verified that using the chron package, which I don't believe
 > > uses strptime, for the above character conversion performs equally
 > > well on both machines.
 > 
 > -- 
 > Brian D. Ripley,  [EMAIL PROTECTED]
 > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 > University of Oxford, Tel:  +44 1865 272861 (self)
 > 1 South Parks Road, +44 1865 272866 (PA)
 > Oxford OX1 3TG, UK Fax:  +44 1865 272595



[Rd] Varying as.Date performance

2005-05-04 Thread Jeff Enos
R-devel,

The performance of as.Date differs by a large degree between one of my
machines with glibc 2.3.2:

> system.time(x <- as.Date(rep("01-01-2005", 10), format = "%m-%d-%Y"))
[1] 1.17 0.00 1.18 0.00 0.00

and a comparable machine with glibc 2.3.3:

> system.time(x <- as.Date(rep("01-01-2005", 10), format = "%m-%d-%Y"))
[1] 31.20 46.89 81.01  0.00  0.00

both with the same R version:

> R.version
         _
platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status
major    2
minor    1.0
year     2005
month    04
day      18
language R

I'm focusing on differences in glibc versions because of as.Date's use
of strptime.

Does it seem likely that the cause of this discrepancy is in fact
glibc?  If so, can anyone tell me how to make the performance of the
second machine more like the first?

I have verified that using the chron package, which I don't believe
uses strptime, for the above character conversion performs equally
well on both machines.
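
Because every string here has the same purely numeric layout, a
conversion that avoids strptime and the locale machinery altogether is
also conceivable.  The sketch below is an illustration only, not how
chron or R does it: parse_mdy, days_in_month and days_from_civil are
hypothetical names, the format is fixed to mm-dd-yyyy, and the day
count relies on a commonly used proleptic Gregorian day-count formula.

```c
#include <stdio.h>

/* Days from 1970-01-01 to the given proleptic Gregorian date. */
static long days_from_civil(long y, int m, int d)
{
    y -= (m <= 2);                       /* treat Jan/Feb as end of prior year */
    long era = (y >= 0 ? y : y - 399) / 400;
    long yoe = y - era * 400;                                  /* [0, 399] */
    long doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1; /* [0, 365] */
    long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;          /* [0, 146096] */
    return era * 146097 + doe - 719468;
}

static int days_in_month(int y, int m)
{
    static const int len[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    int leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
    return (m == 2 && leap) ? 29 : len[m - 1];
}

/* Parse "mm-dd-yyyy" into days since 1970-01-01.
   Returns 0 on success, -1 on malformed or out-of-range input.
   Note: sscanf is lenient about leading blanks inside each field. */
int parse_mdy(const char *s, long *days_out)
{
    int m, d, y, consumed = 0;

    if (sscanf(s, "%2d-%2d-%4d%n", &m, &d, &y, &consumed) != 3
        || s[consumed] != '\0')
        return -1;
    if (m < 1 || m > 12 || d < 1 || d > days_in_month(y, m))
        return -1;

    *days_out = days_from_civil(y, m, d);
    return 0;
}
```

For "01-01-2005" this yields 12784, the number of days since
1970-01-01, which is the same quantity an R Date object stores.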

Thanks in advance,

Jeff
