Re: DateTime performance
On Saturday 05 May 2012 17:14:52 N Heinrichs wrote:
> ++ to using precomposed TZ object (as you observed, supplying only the name as a string still results in lengthy DT::TZ object creation overhead). If you use them, I would also precompose any necessary Formatter or Locale objects.

Thanks for the hint, but it does not apply to my situation: most of the time, I am creating DateTime objects from Unix epoch seconds; very occasionally from individual pieces (year, month, etc.). So, stuff is already broken down quite nicely.

I'll check out the NO_VALIDATION option.

> Depending on how you're parsing the date strings and composing the DT objects, `local $Params::Validate::NO_VALIDATION = 1;` can speed things up for you.
>
> If you're using a DateTime::Format class to parse strings into DT objects for you, you might investigate using your own regex to chop up the string and call DT->new directly. [...]
Re: DateTime performance
++ to using precomposed TZ object (as you observed, supplying only the name as a string still results in lengthy DT::TZ object creation overhead). If you use them, I would also precompose any necessary Formatter or Locale objects.

Depending on how you're parsing the date strings and composing the DT objects, `local $Params::Validate::NO_VALIDATION = 1;` can speed things up for you.

If you're using a DateTime::Format class to parse strings into DT objects for you, you might investigate using your own regex to chop up the string and call DT->new directly. This code (including comments) is from early 2011 and I unfortunately do not have benchmark data handy:

    # NOTE: This method is faster than using DateTime::Format::MySQL
    # NOTE2: It's also faster than `split m#[ /:T-]#`
    $timestamp =~ m!^
        (?:\s+)?(\d{4,4})[/-](\d{1,2})[/-](\d{1,2})   # Required date portion
        (?:[T\s](\d{1,2}):(\d{1,2}):(\d{1,2}))?       # Optional time portion
        (?:\s?([\w/\+:]+))?                           # Optional timezone
    $!x or die "Unrecognized timestamp: $timestamp\n";   # don't reuse stale $1..$7
    my ($y, $m, $d, $hr, $min, $sec, $tz) = ($1, $2, $3, $4, $5, $6, $7);
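The regex approach can be packaged as a small parser that also checks whether the match succeeded, so a malformed line cannot silently reuse captures from a previous one. This is a sketch, not the poster's benchmarked code: the name `parse_timestamp` and the returned hash layout are assumptions, and no DateTime call is made, so it runs with core Perl only.

```perl
use strict;
use warnings;

# Hypothetical helper: split "YYYY-MM-DD[ HH:MM:SS][ TZ]" into named fields
# using the regex from the post. Returns a hash ref suitable for
# DateTime->new(%$args), or nothing if the string does not match.
sub parse_timestamp {
    my ($timestamp) = @_;
    my @f = $timestamp =~ m!^
        (?:\s+)?(\d{4,4})[/-](\d{1,2})[/-](\d{1,2})   # Required date portion
        (?:[T\s](\d{1,2}):(\d{1,2}):(\d{1,2}))?       # Optional time portion
        (?:\s?([\w/\+:]+))?                           # Optional timezone
    $!x or return;    # no match: never fall back to stale $1..$7

    my @names = qw(year month day hour minute second);
    my %args;
    for my $i (0 .. $#names) {
        $args{ $names[$i] } = 0 + $f[$i] if defined $f[$i];    # "05" -> 5
    }
    # Raw zone name; map it to a cached DateTime::TimeZone object before use.
    $args{time_zone} = $f[6] if defined $f[6];
    return \%args;
}

my $args = parse_timestamp('2012-05-05 17:14:52 America/Chicago');
print join(', ', map { "$_=$args->{$_}" } sort keys %$args), "\n";
# prints "day=5, hour=17, minute=14, month=5, second=52, time_zone=America/Chicago, year=2012"
```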
Re: DateTime performance
On Thursday 03 May 2012 02:10:04 you wrote:
> On 2012.5.1 3:29 PM, Philipp K. Janert wrote:
>> However, when working through files with a few tens of millions of records, DateTime turns into a REAL drag on performance.
>>
>> Is this expected behavior? And are there access patterns that I can use to mitigate this effect? (I tried to supply a time_zone explicitly, but that does not seem to improve things significantly.)
>
> Unfortunately due to the way DateTime is architected it does a lot of precalculation upon object instantiation which is usually not used. So yes, it is expected in that sense.

Ok.

> If all you need is date objects with a sensible interface, try DateTimeX::Lite. It claims to replicate a good chunk of the DateTime interface in a fraction of the memory.

I'll check it out, thanks.

> Given how much time it takes to make a DateTime object, and your scale of tens of millions of records, you could cache DateTime objects for each timestamp and use clone() to get a new instance.

I considered that, but in reality, most of my timestamps are actually different. (There are about 30M seconds in a year, so I won't have much duplication, looking at 10-50M records spread over a year...)

> [...]
Re: DateTime performance
On Thursday 03 May 2012 02:14:45 you wrote:
> From: Philipp K. Janert [mailto:jan...@ieee.org]
> Sent: Wednesday, 2 May 2012 8:29 AM
>
>> Question: When using DateTime for a large number of instances, it becomes a serious performance drag.
>> ...
>> Is this expected behavior? And are there access patterns that I can use to mitigate this effect? (I tried to supply a time_zone explicitly, but that does not seem to improve things significantly.)
>
> Hi Philipp,
>
> My #1 tip is to pre-prepare/cache the DateTime::TimeZone object and pass it in to each creation of a DateTime object (via whatever mechanism you're using to do that).
>
> I have seen a case where we were using time_zone => 'local' in a reasonably tight datetime object creation loop and saw significant speed increases just by cutting out that chunk of processing. In hindsight that was a silly thing to do, but it became an easy win :-)
>
> I apologise if this is what you meant by supplying a time_zone explicitly in your comment above.

I have tried to specify the timezone explicitly as a string:

    $dt = DateTime->new( ..., time_zone => 'America/Chicago' );

which does not seem to help, but I have not tried to do:

    $tz = DateTime::TimeZone->new( name => 'America/Chicago' );
    $dt = DateTime->new( ..., time_zone => $tz );

I'll try that the next time I have to process one of my data sets again. ;-) Thanks for the hint.

> I can't recommend using a tool like NYTProf highly enough on a run of your tool to spot the low-hanging fruit. See https://metacpan.org/module/Devel::NYTProf
>
> Cheers,
> Andrew
DateTime performance
Question: When using DateTime for a large number of instances, it becomes a serious performance drag.

A typical application for me involves things like log files: I use DateTime to translate the timestamps in these files into a canonical format, and then get information such as day-of-week or time-of-day from DateTime.

However, when working through files with a few tens of millions of records, DateTime turns into a REAL drag on performance.

Is this expected behavior? And are there access patterns that I can use to mitigate this effect? (I tried to supply a time_zone explicitly, but that does not seem to improve things significantly.)

Best,
Ph.
Re: DateTime performance
On 2012.5.1 3:29 PM, Philipp K. Janert wrote:
> However, when working through files with a few tens of millions of records, DateTime turns into a REAL drag on performance.
>
> Is this expected behavior? And are there access patterns that I can use to mitigate this effect? (I tried to supply a time_zone explicitly, but that does not seem to improve things significantly.)

Unfortunately due to the way DateTime is architected it does a lot of precalculation upon object instantiation which is usually not used. So yes, it is expected in that sense.

If all you need is date objects with a sensible interface, try DateTimeX::Lite. It claims to replicate a good chunk of the DateTime interface in a fraction of the memory.

Given how much time it takes to make a DateTime object, and your scale of tens of millions of records, you could cache DateTime objects for each timestamp and use clone() to get a new instance.

    use feature 'state';    # needed for state (Perl 5.10+)

    sub get_datetime {
        my $timestamp = shift;
        state $cache = {};
        if ( defined $cache->{$timestamp} ) {
            return $cache->{$timestamp}->clone;
        }
        else {
            # make_datetime_from_timestamp() stands for your own constructor
            $cache->{$timestamp} = make_datetime_from_timestamp($timestamp);
            # return a copy so the caller cannot modify the cached object
            return $cache->{$timestamp}->clone;
        }
    }
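A runnable sketch of the cache-and-clone idea, under stated assumptions: `FakeDT` is a stand-in class providing only `new` and `clone` so the code runs without DateTime, and `make_datetime_from_timestamp` is a hypothetical placeholder, as in the post. It hands out a clone on every call, including the first, so no caller can modify the cached prototype; `state` needs Perl 5.10 or later.

```perl
use strict;
use warnings;
use feature 'state';

# Stand-in for DateTime so the sketch runs anywhere: just new() and clone().
package FakeDT;
my $constructed = 0;    # counts "expensive" constructions
sub new         { my ($class, %args) = @_; $constructed++; return bless {%args}, $class }
sub clone       { my $self = shift; return bless {%$self}, ref $self }
sub constructed { return $constructed }

package main;

# Hypothetical placeholder; with DateTime this might be
# DateTime->from_epoch( epoch => $timestamp, time_zone => $cached_tz ).
sub make_datetime_from_timestamp {
    my ($timestamp) = @_;
    return FakeDT->new(epoch => $timestamp);
}

sub get_datetime {
    my ($timestamp) = @_;
    state %cache;
    # Build once per distinct timestamp, then always hand out a clone.
    my $proto = $cache{$timestamp} //= make_datetime_from_timestamp($timestamp);
    return $proto->clone;
}

my $first  = get_datetime(1336256092);
my $second = get_datetime(1336256092);
print "constructed: ", FakeDT::constructed(), "\n";    # prints "constructed: 1"
```

When nearly every timestamp is distinct, as Philipp notes for his data, the hit rate is low and the hash grows with every record, so the cache may cost more than it saves for that workload.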
RE: DateTime performance
From: Philipp K. Janert [mailto:jan...@ieee.org]
Sent: Wednesday, 2 May 2012 8:29 AM

> Question: When using DateTime for a large number of instances, it becomes a serious performance drag.
> ...
> Is this expected behavior? And are there access patterns that I can use to mitigate this effect? (I tried to supply a time_zone explicitly, but that does not seem to improve things significantly.)

Hi Philipp,

My #1 tip is to pre-prepare/cache the DateTime::TimeZone object and pass it in to each creation of a DateTime object (via whatever mechanism you're using to do that).

I have seen a case where we were using time_zone => 'local' in a reasonably tight datetime object creation loop and saw significant speed increases just by cutting out that chunk of processing. In hindsight that was a silly thing to do, but it became an easy win :-)

I apologise if this is what you meant by supplying a time_zone explicitly in your comment above.

I can't recommend using a tool like NYTProf highly enough on a run of your tool to spot the low-hanging fruit. See https://metacpan.org/module/Devel::NYTProf

Cheers,
Andrew
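Andrew's tip amounts to building each zone object once and passing the object, not the name, into every constructor call. A minimal sketch, assuming a hypothetical `zone_for` helper; the constructor is passed in as a code reference (a counting stub here) so the sketch runs without DateTime installed.

```perl
use strict;
use warnings;

# Hypothetical helper: build each zone object once per name and reuse it.
# $build is the expensive constructor, e.g.
#   sub { DateTime::TimeZone->new( name => $_[0] ) }
{
    my %zone_cache;
    sub zone_for {
        my ($name, $build) = @_;
        return $zone_cache{$name} //= $build->($name);
    }
}

# Counting stub standing in for DateTime::TimeZone->new.
my $builds = 0;
my $build  = sub { $builds++; return { name => $_[0] } };

for my $epoch (1 .. 1000) {
    my $tz = zone_for('America/Chicago', $build);
    # DateTime->from_epoch( epoch => $epoch, time_zone => $tz ) would go here,
    # instead of passing the string 'America/Chicago' on every call.
}
print "zone objects built: $builds\n";    # prints "zone objects built: 1"
```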
Re: DateTime performance
In the spirit of TIMTOWTDI, there's my DateTime::LazyInit module which I wrote for this sort of case. It only inflates to a full DateTime object when you call methods that aren't simple.

http://search.cpan.org/~rickm/DateTime-LazyInit-1.0200/lib/DateTime/LazyInit.pm

Caveat: I haven't tested it against any recent DateTime releases.

Cheers!
Rick Measham
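The lazy-inflation idea can be sketched with AUTOLOAD: simple accessors answer from the stored arguments, and anything else builds the full object once. This is not DateTime::LazyInit's actual code; the class names `LazyDT` and `StubFull`, the list of "simple" accessors, and the injected `$inflate` constructor are assumptions made so the sketch runs without DateTime.

```perl
use strict;
use warnings;

package LazyDT;
our $AUTOLOAD;

# Constructor just stores the arguments and the (expensive) inflater.
sub new {
    my ($class, $inflate, %args) = @_;
    return bless { args => \%args, inflate => $inflate, full => undef }, $class;
}

# "Simple" accessors answer straight from the stored arguments.
for my $field (qw(year month day hour minute second)) {
    no strict 'refs';
    *{"LazyDT::$field"} = sub { $_[0]{args}{$field} };
}

sub DESTROY { }    # keep AUTOLOAD from handling object destruction

# Anything else builds the full object once, then delegates to it.
sub AUTOLOAD {
    my $self = shift;
    (my $method = $AUTOLOAD) =~ s/.*:://;
    $self->{full} //= $self->{inflate}->(%{ $self->{args} });
    return $self->{full}->$method(@_);
}

# Stand-in for DateTime with one "non-simple" method.
package StubFull;
sub new { my ($class, %a) = @_; return bless {%a}, $class }
sub ymd { my $s = shift; return sprintf '%04d-%02d-%02d', @{$s}{qw(year month day)} }

package main;

my $inflations = 0;
my $inflate = sub { $inflations++; return StubFull->new(@_) };    # e.g. DateTime->new(@_)

my $dt   = LazyDT->new($inflate, year => 2012, month => 5, day => 3);
my $year = $dt->year;    # answered from stored args
print "year=$year inflations=$inflations\n";    # prints "year=2012 inflations=0"
my $ymd = $dt->ymd;      # first non-simple call inflates
print "ymd=$ymd inflations=$inflations\n";      # prints "ymd=2012-05-03 inflations=1"
```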
Re: DateTime performance
I love and use DateTime, but for 10s of millions of records at once I would be choosing Date::Calc instead and dealing with any necessary futzy bits manually.
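For the log-file use case from the original question (epoch seconds in; canonical timestamp, day-of-week and time-of-day out), the "futzy bits" can be handled without building any objects. This sketch swaps in core `gmtime` and `POSIX::strftime` rather than Date::Calc, works in UTC only, and `epoch_fields` is a hypothetical name.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

my @DOW = qw(Sun Mon Tue Wed Thu Fri Sat);

# Hypothetical helper: the fields the log-file use case needs, computed
# straight from epoch seconds with core gmtime (UTC only, no objects).
sub epoch_fields {
    my ($epoch) = @_;
    my @t = gmtime $epoch;    # (sec, min, hour, mday, mon, year, wday, ...)
    return {
        canonical      => strftime('%Y-%m-%dT%H:%M:%SZ', @t),
        day_of_week    => $DOW[ $t[6] ],
        seconds_of_day => $t[2] * 3600 + $t[1] * 60 + $t[0],
    };
}

my $fields = epoch_fields(1336256092);
print "$fields->{canonical} $fields->{day_of_week} $fields->{seconds_of_day}\n";
# prints "2012-05-05T22:14:52Z Sat 80092"
```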
Re: DateTime performance
arie.ha...@gmail.com wrote:
> Our project requires getting the time-zone offset for a given time-zone id at the current time.

You can speed things up a bit by using the timezone modules in isolation. You can construct a fake DateTime class, which only provides the methods ->utc_rd_as_seconds and ->utc_year. Use that class to construct an object representing the current time. Then call ->offset_for_datetime on a timezone object, passing in your fake DateTime.

-zefram
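Zefram's fake-DateTime trick might look roughly like this. `NowForTZ` is a hypothetical name, the Rata Die arithmetic assumes DateTime's convention that 0001-01-01 is day 1 (making 1970-01-01 day 719163), and the actual DateTime::TimeZone call is left as a comment so the sketch runs with core Perl alone.

```perl
use strict;
use warnings;

# Hypothetical minimal "fake DateTime" exposing only the two methods the
# post says offset_for_datetime needs.
package NowForTZ;

# Assumption: Rata Die day 1 is 0001-01-01, so 1970-01-01 is day 719163;
# leap seconds are ignored, as they are in epoch time.
use constant UNIX_EPOCH_RD_DAYS => 719_163;
use constant SECONDS_PER_DAY    => 86_400;

sub new {
    my ($class, $epoch) = @_;
    return bless { epoch => defined $epoch ? $epoch : time }, $class;
}

sub utc_rd_as_seconds {
    my $self = shift;
    return UNIX_EPOCH_RD_DAYS * SECONDS_PER_DAY + $self->{epoch};
}

sub utc_year {
    my $self = shift;
    return (gmtime $self->{epoch})[5] + 1900;
}

package main;

my $now = NowForTZ->new;
# With DateTime::TimeZone installed (not loaded by this sketch):
#   my $tz     = DateTime::TimeZone->new( name => 'America/Chicago' );
#   my $offset = $tz->offset_for_datetime($now);    # seconds east of UTC
printf "utc_year=%d utc_rd_as_seconds=%d\n", $now->utc_year, $now->utc_rd_as_seconds;
```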
DateTime performance
Hey!

Why does the DateTime module load so slowly? This simple script, which just imports DateTime, takes approximately 1 second to run:

    use DateTime;

Can I make it faster?

Our project requires getting the time-zone offset for a given time-zone id at the current time. This is the primary reason I use the DateTime family of modules. Is there any alternative to DateTime for solving this task?

Thanks!
Arie
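As a DateTime-free alternative for "offset of zone X right now", the C library's own zone handling can be reached through core modules. This is a swapped-in technique rather than one proposed in the thread: it assumes a Unix-like libc that honours `$ENV{TZ}` (Olson names such as America/Chicago also need the system zoneinfo files), and `tz_offset` is a hypothetical name.

```perl
use strict;
use warnings;
use POSIX ();
use Time::Local qw(timegm);

# Hypothetical helper: UTC offset in seconds (east positive) of $zone at
# $epoch, using the C library's zone handling via $ENV{TZ}.
sub tz_offset {
    my ($zone, $epoch) = @_;
    $epoch = time unless defined $epoch;
    my $offset;
    {
        local $ENV{TZ} = $zone;
        POSIX::tzset();
        my @lt = localtime $epoch;
        # Read the local wall-clock fields back as if they were UTC; the
        # difference from $epoch is the offset. A 4-digit year avoids
        # Time::Local's two-digit-year guessing.
        $offset = timegm(@lt[0 .. 4], $lt[5] + 1900) - $epoch;
    }
    POSIX::tzset();    # re-read the caller's original TZ
    return $offset;
}

printf "America/Chicago: %+d seconds from UTC now\n", tz_offset('America/Chicago');
```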
Re: DateTime performance
On Fri, 23 Jan 2009, arie.ha...@gmail.com wrote:
> Why does the DateTime module load so slowly? This simple script, which just imports DateTime, takes approximately 1 second to run:
>
>     use DateTime;
>
> Can I make it faster?

Yes, you need a faster computer!

    auta...@houseabsolute:~/projects/R2$ time perl -MDateTime -e1

    real    0m0.109s
    user    0m0.096s
    sys     0m0.016s

That's my desktop, which is a Core2 Duo of some sort.

Note that after you do this once, it gets much quicker, because the OS keeps the data in memory until it gets paged out by something else. If you keep using it, it won't get paged out. The results above are _not_ from the first load.

-dave
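To measure the same cost from inside Perl, both the load time and the number of files a `require` pulls in (the strace run elsewhere in the thread shows DateTime loading over a hundred), a small sketch with core Time::HiRes and `%INC` may help; `load_cost` is a hypothetical helper and File::Spec is only a default.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: wall-clock time of a module load, plus how many
# files that load added to %INC (i.e. how many files perl had to find).
sub load_cost {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    my %before = %INC;
    my $t0 = time;
    require $file;
    my $elapsed   = time - $t0;
    my $new_files = grep { !exists $before{$_} } keys %INC;
    return ($elapsed, $new_files);
}

# Try "perl load_cost.pl DateTime"; the default is just a core module.
my ($secs, $files) = load_cost(@ARGV ? $ARGV[0] : 'File::Spec');
printf "%.3f s, %d new file(s) in %%INC\n", $secs, $files;
```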
Re: DateTime performance
On Mon, Jan 16, 2006 at 06:21:54PM -0800, [EMAIL PROTECTED] wrote:
> One might hope that a script like this:
>
> test3
>     [...]
>
> might improve the situation. However, even this has no significant improvement, and from additional traces it doesn't actually stop perl from using the built-in paths.

Then "no lib" isn't doing what you want. Try just:

    BEGIN { @INC = grep !/5\.8\.[0-5]/, @INC }
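The BEGIN-block filter can be generalized a little. This sketch keeps @INC hooks, drops entries matching the post's 5.8.0 to 5.8.5 pattern, and, as an added assumption, also drops directories that are missing or empty (the FC4 situation described in the thread). `prune_inc` and `is_empty_dir` are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: keep @INC hooks, drop the stale 5.8.0-5.8.5 trees
# matched by the pattern in the post, and drop directories that are
# missing or empty, since every later "use" would otherwise search them.
sub prune_inc {
    return grep {
        ref($_) || ( !/5\.8\.[0-5]/ && -d $_ && !is_empty_dir($_) )
    } @_;
}

sub is_empty_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or return 1;    # unreadable: nothing to load from it
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return !@entries;
}

# Must run before "use DateTime;" (or any other use) to have an effect.
BEGIN { @INC = prune_inc(@INC) }

print "$_\n" for @INC;
```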
Re: DateTime performance
On Wed, Jan 18, 2006 at 08:38:13AM -0800, [EMAIL PROTECTED] wrote:
>> Then "no lib" isn't doing what you want.
>
> Agree. But, that is the point. Outside of recompiling perl with new paths or significantly altering DateTime to use far fewer dependencies, nothing can really be done.
>
> test4
>     #!/usr/bin/perl
>     BEGIN { @INC = grep !/5\.8\.[0-5]/, @INC }
>     use DateTime;

Do your traces show it still searching all the removed paths? There's no way the above should be doing that, unless you're loading DateTime earlier, via sitecustomize.pl or $PERL5OPT?
Re: DateTime performance
> Then "no lib" isn't doing what you want.

Agree. But, that is the point. Outside of recompiling perl with new paths or significantly altering DateTime to use far fewer dependencies, nothing can really be done.

test4

    #!/usr/bin/perl
    BEGIN { @INC = grep !/5\.8\.[0-5]/, @INC }
    use DateTime;

    [EMAIL PROTECTED] tmp]$ time perl test4

    real    0m5.780s
    user    0m5.524s
    sys     0m0.188s

Matthew
Re: DateTime performance
> Do your traces show it still searching all the removed paths?

Yes.

> There's no way the above should be doing that, unless you're loading DateTime earlier, via sitecustomize.pl or $PERL5OPT?

Neither of the items you have identified is used in any way during these tests. I would expect that if either of those had been the issue, then even test1 would be slow.

Regards,
Matthew
DateTime performance
I don't consider this to be completely a DateTime issue; however, I thought I would share my findings with this list for consideration.

I'm using the latest release of DateTime with perl 5.8 (standard RPM distro) for FC4 on a very old 166MHz Pentium system, so I don't expect this system to be fast. Using this system, I take the following two scripts:

test1

    #!/usr/bin/perl

test2

    #!/usr/bin/perl
    use DateTime;

The performance of them is like this:

    [EMAIL PROTECTED] tmp]$ time perl test1

    real    0m0.060s
    user    0m0.016s
    sys     0m0.044s

    [EMAIL PROTECTED] tmp]$ time perl test2

    real    0m5.805s
    user    0m5.456s
    sys     0m0.284s

That to me is a huge performance hit just to load a module. This is the distribution of where all the time for test2 is getting spent:

    [EMAIL PROTECTED] tmp]$ strace -c perl test2
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     41.51    0.098303         209       471       420 open
     32.34    0.076580         181       424       420 stat64
     13.73    0.032522         290       112           read
      3.74    0.008861         269        33           old_mmap
      2.20    0.005213          57        91         3 _llseek
      1.56    0.003700          73        51           close
      1.08    0.002549        2549         1           execve
      1.02    0.002408          62        39        36 ioctl
      0.95    0.002241          93        24           brk
      0.46    0.001085         121         9           mprotect
      0.45    0.001054          66        16           fstat64
      0.21    0.000508         508         1           readlink
      0.19    0.000446         149         3           mmap2
      0.13    0.000301         151         2           munmap
      0.09    0.000213         213         1           _sysctl
      0.08    0.000181          45         4           rt_sigaction
      0.05    0.000129         129         1         1 access
      0.04    0.000089          45         2           time
      0.02    0.000059          59         1           futex
      0.02    0.000050          50         1           fcntl64
      0.02    0.000048          48         1           getrlimit
      0.02    0.000044          44         1           set_thread_area
      0.02    0.000042          42         1           rt_sigprocmask
      0.02    0.000038          38         1           getuid32
      0.02    0.000037          37         1           set_tid_address
      0.02    0.000036          36         1           geteuid32
      0.01    0.000035          35         1           getgid32
      0.01    0.000034          34         1           getegid32
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.236806                  1295       880 total

From this, the biggest time consumer is opening and stat-ing files, followed by reading them. This also generates a lot of errors because of how FC4 has decided to support old paths in perl's default @INC. They include a lot of old directories that are empty, but this forces perl to search them all anyway.

So with a path of about 20+ directories, most of them empty, plus DateTime loading 100+ different files just to get started, this turns into a fair amount of searching and loading. One might hope that a script like this:

test3

    #!/usr/bin/perl
    BEGIN {
        no lib qw|
            /usr/lib/perl5/site_perl/5.8.6/i386-linux-thread-multi
            /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
            /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi
            /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
            /usr/lib/perl5/site_perl/5.8.6
            /usr/lib/perl5/site_perl/5.8.5
            /usr/lib/perl5/site_perl/5.8.4
            /usr/lib/perl5/site_perl/5.8.3
            /usr/lib/perl5/site_perl
            /usr/lib/perl5/vendor_perl/5.8.6/i386-linux-thread-multi
            /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi
            /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi
            /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
            /usr/lib/perl5/vendor_perl/5.8.6
            /usr/lib/perl5/vendor_perl/5.8.5
            /usr/lib/perl5/vendor_perl/5.8.4
            /usr/lib/perl5/vendor_perl/5.8.3
            /usr/lib/perl5/vendor_perl
            /usr/lib/perl5/5.8.6/i386-linux-thread-multi
            /usr/lib/perl5/5.8.6
            .
        |;
        use lib qw|
            /usr/lib/perl5/site_perl/5.8.6/i386-linux-thread-multi
            /usr/lib/perl5/site_perl/5.8.6
            /usr/lib/perl5/site_perl
            /usr/lib/perl5/vendor_perl/5.8.6/i386-linux-thread-multi
            /usr/lib/perl5/vendor_perl/5.8.6
            /usr/lib/perl5/vendor_perl
            /usr/lib/perl5/5.8.6/i386-linux-thread-multi
            /usr/lib/perl5/5.8.6
            .
        |;
    }
    use DateTime;

might improve the situation. However, even this has no significant improvement, and from additional traces it doesn't actually stop perl from using the built-in paths.

    [EMAIL PROTECTED] tmp]$ time perl test3

    real    0m5.721s
    user    0m5.424s
    sys     0m0.216s

Not that I expect anyone to fix anything about this, but I just thought I would pass it along. On most fast computers today this busy work probably isn't noticed as a delay, but on this box 5 sec just to get started is a very
Re: DateTime performance
Unfortunately, it's a known problem that CentOS suffers from too (@[EMAIL PROTECTED]). This also makes reading error output incredibly difficult, since a full screen is given to listing @INC. Instead of a few folks who are upgrading systems having to set PERL5LIB, everyone else has to recompile perl or put up with the shit they fed us; funk you very much, Red Hat. (I doubt they cripple gcc in a similar manner, but you never know.)

See also: http://www.perl.com/pub/a/2005/12/21/a_timely_start.html
Re: DateTime performance
Jerrad Pierce wrote:
> [...] Instead of a few folks who are upgrading systems having to set PERL5LIB, everyone else has to recompile perl or put up with the shit they fed us; funk you very much, Red Hat. (I doubt they cripple gcc in a similar manner, but you never know.)

How about python? Seems like Redhat has got very snake-ish in the last 3-4 years. So if perl stands for Practical Extraction and Report Language (not to mention my favorite, Pathologically Eclectic Rubbish Lister), does python mean "slowly squeeze the life out of you"?

> See also: http://www.perl.com/pub/a/2005/12/21/a_timely_start.html

Sorry to vent here. Especially bad day.

Rod
Re: DateTime Performance
John Siracusa wrote:
> Okay, here's a simple implementation of a memoized DateTime::Locale::load().

A solution that is more or less equivalent is to change the DefaultLocale routine. At the moment, $DefaultLocale is saved as a string; every time DT::new() is called without a locale argument, the default locale is loaded again. It should be a bit faster than your version, because DT::Locale::load is never called in DT::new(). (Except if you specify another locale; in that case you should pass the locale object, not the locale name, if you want speed.)

Probably this changes the behaviour if the default locale is aliased. But IMHO, that's probably for the better.

Probably this should happen with the timezone parameter as well: change

    default => 'floating'

to

    default => DateTime::TimeZone->new( name => 'floating' )

Eugene
Re: DateTime Performance
>> A solution that is more or less equivalent is to change the DefaultLocale routine. [...]
>> Probably this changes the behaviour if the default locale is aliased. But IMHO, that's probably for the better.
>
> Yeah, that was my concern: add_aliases() and friends in DateTime::Locale would have to reach back into DateTime and blank the cached locale, which seemed evil to me. But I was just thinking of preserving the existing behavior. If this is not a constraint, then I'm all for the alternative you suggested.

Ack - let's not go around fiddling with caches in other namespaces. The caching mechanism should be _internal_ to DateTime::Locale.

-J
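Keeping the cache inside the locale package, so alias changes can clear it locally and DateTime never reaches into another namespace, could look roughly like this. `LocaleRegistry`, `load`, and `add_alias` are hypothetical names modeled loosely on the thread, not DateTime::Locale's real internals, and the loader is a counting stub.

```perl
use strict;
use warnings;

# Hypothetical registry: the cache is private to this package, so alias
# changes can clear it here without any other module touching it.
package LocaleRegistry;

my %alias_of;     # alias id => real id
my %cache;        # real id  => loaded locale object (private)
my $loads = 0;    # how many expensive loads have happened

sub _really_load {    # stand-in for the string-eval require + constructor
    my ($id) = @_;
    $loads++;
    return bless { id => $id }, 'LocaleRegistry::Locale';
}

sub load {
    my ($class, $id) = @_;
    my $real = exists $alias_of{$id} ? $alias_of{$id} : $id;
    return $cache{$real} //= _really_load($real);
}

sub add_alias {
    my ($class, $alias, $real) = @_;
    $alias_of{$alias} = $real;
    delete $cache{$alias};    # an id that becomes an alias must not serve a stale object
}

sub loads { return $loads }

package main;

LocaleRegistry->load('en_US') for 1 .. 1000;    # only the first call loads
print "expensive loads: ", LocaleRegistry->loads, "\n";    # prints "expensive loads: 1"
```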
Re: DateTime Performance
On Mon, Aug 04, 2003 at 11:32:15PM -0500, Dave Rolsky wrote:
>> Maybe that looks more sane to you?
>
> What makes no sense is for BEGIN to show up as a significant chunk of the time it would take to do anything, since this stuff should only happen once.
>
> I'm somewhat skeptical that Devel::DProf is working, or works properly at all in general.

It's working okay but unhelpfully...

Devel::DProf records the name of the sub only when it's first called (naturally, for performance reasons). The problem with BEGIN blocks is that they're called once *then freed*, and the same address is then reused for the next sub definition.

Probably easy to fix but I've never had the time.

Tim.
Re: DateTime Performance
On 8/4/03 12:26 AM, Dave Rolsky wrote:
>> # ... includes args: year, month, day, hour, minute, second
>> DateTime->new(...): 16 wallclock secs @ 687.29/s (14.48 usr + 0.07 sys = 14.55 CPU)
>
> This does a lot of work, including calculating both UTC and local times, which involves calculating leap seconds, etc.

Does it need to do that? I mean, sure, eventually it might have to do that if I want to do some sort of date manipulation, or even just fetch or print the date. But does it have to really do anything at all during object construction other than stash the args somewhere?

>> DateTime->now(): 21 wallclock secs @ 547.95/s (18.13 usr + 0.12 sys = 18.25 CPU)
>
> Ditto.

I'm assuming now() is slower than new() due to the system call overhead of getting the current time...?

>> Total Elapsed Time = 19.91729 Seconds
>>   User+System Time = 14.60729 Seconds
>> Exclusive Times
>> %Time ExclSec CumulS #Calls sec/call Csec/c  Name
>>  27.6   4.035  4.685  20274   0.0002 0.0002  Params::Validate::_validate
>>  24.0   3.510 17.549  10000   0.0004 0.0018  DateTime::new
>>  18.9   2.770  3.809  10001   0.0003 0.0004  DateTime::Locale::_load_class_from_id
>
> This seems quite odd. It really doesn't do much.
>
>>  8.96   1.309  2.647  10020   0.0001 0.0003  DateTime::TimeZone::BEGIN
>
> And this is completely mystifying. Can you show us your code?

Sure, here it is:

    for (1 .. 10000) {
        my $d = DateTime->new(year => 200, month => 1, day => 1,
                              hour => 2, minute => 3, second => 4);
    }

Those stats were produced on a G3/400 running a development release of OS X that uses some build of Perl 5.8.1, which could explain some oddness. Here is the same code run on a G4/800 using Perl 5.8.0 on the latest released version of OS X 10.2:

    Total Elapsed Time = 8.817281 Seconds
      User+System Time = 5.352659 Seconds
    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c  Name
     60.4   3.236 10.844  10000   0.0003 0.0011  DateTime::new
     44.7   2.395  3.305  10001   0.0002 0.0003  DateTime::Locale::_load_class_from_id
     43.3   2.318  2.127  20274   0.0001 0.0001  Params::Validate::_validate
     22.5   1.207  1.095  10001   0.0001 0.0001  DateTime::Locale::Base::new
     18.4   0.987  1.223  10020   0.0001 0.0001  DateTime::TimeZone::BEGIN
     17.5   0.939  0.465  50000   0.0000 0.0000  DateTime::__ANON__
     15.2   0.818  0.645  10002   0.0001 0.0001  DateTime::_calc_local_components
     12.8   0.687  1.025  10002   0.0001 0.0001  DateTime::_calc_local_rd
     10.6   0.568  0.525  10002   0.0001 0.0001  DateTime::_calc_utc_rd
     8.20   0.439  0.225  10002   0.0000 0.0000  DateTime::_normalize_seconds
     7.83   0.419  0.275  10000   0.0000 0.0000  DateTime::_last_day_of_month
     7.47   0.400  0.115  30006   0.0000 0.0000  DateTime::TimeZone::Floating::is_floating
     7.27   0.389  3.505  10001   0.0000 0.0004  DateTime::Locale::load
     5.79   0.310  0.214  10006   0.0000 0.0000  DateTime::TimeZone::Floating::BEGIN
     4.86   0.260  0.070  20004   0.0000 0.0000  DateTime::TimeZone::OffsetOnly::is_utc

Maybe that looks more sane to you?

>> So, what does everyone else think of the object creation performance situation? Is it simply good enough to be 3x faster than Date::Manip::ParseDate()? Are there any obvious areas that I should consider before I start mucking around with DateTime::new()?
>
> Considering that up til now my concern has been primarily on getting things correct, I wouldn't worry about it. There are definitely some big performance improvements possible.
>
> One possibility is to move the leap second bits into the DateTime XS code, which should help a lot.
>
> The timezone stuff can also benefit from being rewritten as XS, but that won't help the particular cases you benchmarked, since the UTC and floating time zones are quite fast already.

What about what I mentioned earlier: lazy (or lazier) evaluation in the constructor? Basically, I want construction with known values to be as fast as possible, since there's a chance I may not even look at the date fields of my objects. But it's a hassle to have special-case code that either doesn't fetch or doesn't set the date fields of my objects, just so I can avoid the relatively expensive calls to DateTime->new().

-John
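A "just stash the args" constructor can be sketched as an object that computes derived values on first use and memoizes them. `LazyDate` is hypothetical, only day-of-week is derived, and core `Time::Local::timegm`/`gmtime` stand in for DateTime's own calendar math, so this illustrates the idea rather than a drop-in replacement for DateTime->new.

```perl
use strict;
use warnings;

# Hypothetical class: the constructor only stashes its arguments; derived
# values are computed on first use and memoized in the object.
package LazyDate;
use Time::Local qw(timegm);

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;    # no validation, no calendar math
}

sub year  { $_[0]{year} }
sub month { $_[0]{month} }
sub day   { $_[0]{day} }

# Assumes a 4-digit year (Time::Local guesses the century for small ones).
sub epoch {
    my $self = shift;
    return $self->{_epoch} //= timegm(
        $self->{second} // 0, $self->{minute} // 0, $self->{hour} // 0,
        $self->{day}, $self->{month} - 1, $self->{year},
    );
}

# 1 = Monday .. 7 = Sunday, the numbering DateTime's day_of_week uses.
sub day_of_week {
    my $self = shift;
    return $self->{_dow} //= do {
        my $wday = (gmtime $self->epoch)[6];    # 0 = Sunday
        $wday == 0 ? 7 : $wday;
    };
}

package main;

my $date = LazyDate->new(year => 2012, month => 5, day => 5, hour => 17);
print "day_of_week: ", $date->day_of_week, "\n";    # prints "day_of_week: 6" (Saturday)
```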
Re: DateTime Performance
On 8/4/03 10:10 AM, John Siracusa wrote: On 8/4/03 12:26 AM, Dave Rolsky wrote: # ... includes args: year, month, day, hour, minute, second DateTime-new(...): 16 wallclock secs @ 687.29/s (14.48 usr + 0.07 sys = 14.55 CPU) This does a lot of work, including calculating both UTC local times, which involves calculating leap seconds, etc. Does it need to do that? I mean, sure, eventually it might have to do that if I want to do some sort of date manipulation, or even just fetch or print the date. But does it have to really do anything at all during object construction other than stash the args somewhere? I played around with DateTime::new() and found that the biggest culprit is this line: $self-{locale} = DateTime::Locale-load( $p{locale} ); The removal of which more than doubles the performance of calling DateTime::new(...) with ymdhms args. The only way to get a comparable speedup is to remove every line below that one except for these two: bless $self, $class; return $self; And even that only gives a ~90% speedup vs. the 100%+ gained by ditching DateTime::Locale-load(). (Obviously all of this will hose DateTime's actual functionality, but bear with me :) Profiling showed that DateTime::Locale::_load_class_from_id() was being called N+1 times during N calls to DateTime-new(...), and that it was #3 in the dprofpp list (2000 iterations shown): %Time ExclSec CumulS #Calls sec/call Csec/c Name 47.8 0.663 2.135 2000 0.0003 0.0011 DateTime::new 35.2 0.488 0.399 4274 0.0001 0.0001 Params::Validate::_validate 31.6 0.439 0.517 2001 0.0002 0.0003 DateTime::Locale::_load_class_from_id 15.8 0.219 0.313 2020 0.0001 0.0002 DateTime::TimeZone::BEGIN I found that _load_class_from_id() unconditionally executes this code: eval require $real_class; Skipping that line was good for a 30%+ speed boost, but that got me thinking...aren't the Locale objects loaded/created by _load_class_from_id() singletons? 
Replacing calls to _load_class_from_id() within DateTime::Locale::load() with some dumb caching like this:

    $Cache_By_Id{$id} ||= $class->_load_from_id($id)

resulted in an easy 50% speed-up for DateTime->new(...), and _load_class_from_id() dropped completely off the dprofpp output:

    Total Elapsed Time = 0.841889 Seconds
      User+System Time = 0.501889 Seconds
    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c  Name
     116.   0.584  1.290   2000   0.0003 0.0006  DateTime::new
     79.3   0.398  0.287   4274   0.0001 0.0001  Params::Validate::_validate
     41.6   0.209  0.220   2002   0.0001 0.0001  DateTime::_calc_local_rd
     37.6   0.189  0.238   2020   0.0001 0.0001  DateTime::TimeZone::BEGIN
     31.6   0.159  0.150   2002   0.0001 0.0001  DateTime::_calc_utc_rd
     27.8   0.140  0.070   2002   0.0001 0.      DateTime::_calc_local_components
     25.9   0.130  0.030      1   0.     0.      DateTime::__ANON__
     17.9   0.090  0.070   2001   0.     0.      DateTime::DefaultLocale
     15.9   0.080  0.040   4004   0.     0.      DateTime::TimeZone::OffsetOnly::is_utc
     15.9   0.080  0.030   2000   0.     0.      DateTime::_last_day_of_month
     15.9   0.080  0.040   2002   0.     0.      DateTime::_normalize_seconds
     13.9   0.070  0.010   6006   0.     0.      DateTime::TimeZone::Floating::is_floating
     13.9   0.070  0.069   2006   0.     0.      DateTime::TimeZone::Floating::BEGIN
     11.3   0.057  0.115      1   0.0573 0.1145  DateTime::Locale::register
     7.97   0.040  0.154      6   0.0067 0.0257  DateTime::Locale::BEGIN

(An aside: why is DateTime::DefaultLocale on this list at all?)

To test my theory that this kind of dumb caching is valid, I ran all of DateTime::Locale's tests, and then ran DateTime's tests while using the modified DateTime::Locale. Everything passed.

So, assuming I'm not missing a finer point here, I'm thinking that one easy speed-up for DateTime object creation would be to make the various DateTime::Locale::* classes into singletons (using whatever the proper method is for this in the DT project) and avoid repeated string evals and repeated calls to _load_class_from_id(). Going further, if calls to DateTime::Locale->load(...) could be memoized safely, that'd be great too :)

-John
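John's caching idea can be illustrated in plain Perl. This is a minimal sketch under stated assumptions: My::LocaleRegistry, _build_from_id and $Builds are hypothetical names, not DateTime::Locale internals, and the expensive step is faked with a counter. It shows the `||=` memoization keyed on the locale id, which is only safe if, as John assumes, a locale object is never mutated after construction and can be shared by every caller.

```perl
use strict;
use warnings;

# Hypothetical registry, NOT DateTime::Locale's real code: it only shows
# the "build once, then reuse" idea behind $Cache_By_Id{$id} ||= ...
package My::LocaleRegistry;

my %Cache_By_Id;    # locale id => already-built locale object
our $Builds = 0;    # how many times the expensive path ran

# Stand-in for the expensive path (string-eval require, object setup, ...).
sub _build_from_id {
    my ( $class, $id ) = @_;
    $Builds++;
    return bless { id => $id }, 'My::Locale';
}

# Public entry point: the first call per id builds, later calls hit the cache.
sub load {
    my ( $class, $id ) = @_;
    return $Cache_By_Id{$id} ||= $class->_build_from_id($id);
}

package main;

my $en1 = My::LocaleRegistry->load('en_US');
my $en2 = My::LocaleRegistry->load('en_US');
print "same object: ", ( $en1 == $en2 ? 'yes' : 'no' ),
      "; builds so far: $My::LocaleRegistry::Builds\n";
```

Because the cache hands every caller the same object, any code that modified a locale in place would now affect all DateTime objects sharing it; that is the "finer point" worth checking before adopting this pattern.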
Re: DateTime Performance
On 8/4/03 1:25 PM, Ben Bennett wrote:
Why not make your module be lazy about whether or not it creates a DateTime?

I thought of that, but I also use the act of creating a DateTime object to check the validity of date attributes. Anyway, I think there's room for DateTime->new() optimization even without adding lazy evaluation (see earlier posts).

-John
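Ben's lazy-construction suggestion could look roughly like the following sketch. Widget, date, date_args and builder are hypothetical names; in real code the builder would presumably call DateTime->new, but a stub is injected here so the sketch runs without DateTime installed.

```perl
use strict;
use warnings;

# Hypothetical Widget that stores raw date parts and builds the (expensive)
# date object only when something actually asks for it.
package Widget;

sub new {
    my ( $class, %args ) = @_;
    return bless {
        date_args => $args{date_args},    # raw values, cheap to keep
        builder   => $args{builder},      # code ref that makes the date object
    }, $class;
}

# Build on first access, then hand back the cached object.
sub date {
    my $self = shift;
    $self->{date} ||= $self->{builder}->( %{ $self->{date_args} } );
    return $self->{date};
}

package main;

my $calls = 0;

# Stub builder so the sketch runs without DateTime; real code might use
# sub { DateTime->new(@_) } here instead.
my $widget = Widget->new(
    date_args => { year => 2003, month => 8, day => 4 },
    builder   => sub { $calls++; my %p = @_; return \%p },
);

print "builder calls before access: $calls\n";
my $d = $widget->date;
$widget->date;
print "builder calls after two accesses: $calls (year $d->{year})\n";
```

The trade-off John points out still applies: with lazy building, an invalid date is only reported when date() is first called, not when the Widget is constructed.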
DateTime Performance
I was profiling a database-backed mod_perl application recently. A particular request was taking several seconds to complete. At first I thought the database was the bottleneck, but the request included only one database query, and that query completed in about 300 msec when run from a command-line script. Something Perl-ish was the culprit, so I set out to find it.

This task was made more difficult by my inability to get Devel::DProf working in Mac OS X (see my posts to the mod_perl and [EMAIL PROTECTED] lists), so I had to resort to Time::HiRes and a smattering of calls to my own simple timer routines. I eventually narrowed the time-suck down to a loop that looked something like this:

    # bind columns to %row here
    while ($sth->fetch) {
        push(@widgets, Widget->new(%row));
    }

Now I suspected some sort of DBI issue, so I replaced the loop body with a no-op. Suddenly, the request completed in one second or less. Now I suspected my Widget class, and benchmarked its constructor offline. (The constructor just calls $self->$key($value) for each k/v pair in %row.) This eventually led me to find that setting the date fields in the Widget object was the culprit.

I use DateTime objects for my internal date representation, but I have a set of wrapper functions that hide this fact. Now I suspected that my date parsing wrapper code was the problem, so I replaced my parse function's body with a simple call to DateTime->now. The request became slow again, taking several seconds to complete.

There was no avoiding it: the bottleneck for my web app was not the database, not HTML::Mason, not my object classes, not even my date parsing code, but DateTime object creation! (Perl 5.8, latest DateTime from CPAN.)

My quick fix was to make sure that %row only contains a single date field, rather than the four that each object has when completely filled out. This produced a noticeable (~2x) speed increase for the whole request.
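The "own simple timer routines" are not shown in the post. A minimal sketch of what such helpers might look like with Time::HiRes (a core module) follows; timer_start, timer_stop and timer_report are hypothetical names, and the loop body is only a placeholder for the real Widget->new(%row) call.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical timer helpers; start times and accumulated totals per label.
my ( %started, %elapsed );

sub timer_start {
    my $label = shift;
    $started{$label} = [gettimeofday];    # [seconds, microseconds]
}

sub timer_stop {
    my $label = shift;
    # tv_interval with one argument measures from that point until now.
    # Totals accumulate, so a label can cover many loop iterations.
    $elapsed{$label} += tv_interval( $started{$label} );
    return $elapsed{$label};
}

sub timer_report {
    printf "%-20s %8.4f s\n", $_, $elapsed{$_} for sort keys %elapsed;
}

# Usage: wrap the suspect call; the body here is only a placeholder.
for my $i ( 1 .. 30 ) {
    timer_start('widget_new');
    my %row    = ( id => $i );    # stand-in for a row fetched via DBI
    my $widget = {%row};          # stand-in for Widget->new(%row)
    timer_stop('widget_new');
}
timer_report();
```

Accumulating per label (rather than printing each interval) keeps the output readable when the suspect code sits inside a loop over many rows.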
Sorry to provide so many gory details, but I wanted to try to establish exactly how I'm using DateTime, and how its performance came to my attention in the first place.

I benchmarked DateTime's object creation speed against a few random classes, just to get a feel for where it stands:

    CGI->new(''): 5 wallclock secs @ 1869.16/s (5.25 usr + 0.10 sys = 5.35 CPU)
    Date::Manip::ParseDate('now'): 49 wallclock secs @ 223.81/s (44.44 usr 0.24 sys + 0.01 cusr 0.01 csys = 44.70 CPU)
    Date::Simple->new('2003-01-01'): 2 wallclock secs @ 4273.50/s (2.31 usr + 0.03 sys = 2.34 CPU)
    # ... includes args: year, month, day, hour, minute, second
    DateTime->new(...): 16 wallclock secs @ 687.29/s (14.48 usr + 0.07 sys = 14.55 CPU)
    DateTime->now(): 21 wallclock secs @ 547.95/s (18.13 usr + 0.12 sys = 18.25 CPU)

DateTime does well against Date::Manip, but not so well against even a big module like CGI. But for object creation alone, should it really be ~5x as slow as Date::Simple?

My final step was to profile 10,000 calls to DateTime->new(...) using Devel::DProf (which works from the command line in OS X). dprofpp had this to say:

    Total Elapsed Time = 19.91729 Seconds
      User+System Time = 14.60729 Seconds
    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c  Name
     27.6   4.035  4.685  20274   0.0002 0.0002  Params::Validate::_validate
     24.0   3.510 17.549      1   0.0004 0.0018  DateTime::new
     18.9   2.770  3.809  10001   0.0003 0.0004  DateTime::Locale::_load_class_from_id
     8.96   1.309  2.647  10020   0.0001 0.0003  DateTime::TimeZone::BEGIN
     6.44   0.940  1.030  10001   0.0001 0.0001  DateTime::Locale::Base::new
     6.23   0.910  1.190  10002   0.0001 0.0001  DateTime::_calc_local_components
     4.45   0.650  0.650      5   0.     0.      DateTime::__ANON__
     3.90   0.570  1.009  10002   0.0001 0.0001  DateTime::_calc_utc_rd
     2.88   0.420  0.490      1   0.     0.      DateTime::_last_day_of_month
     2.67   0.390  0.399  10006   0.     0.      DateTime::TimeZone::Floating::BEGIN
     2.40   0.350  1.619  10002   0.     0.0002  DateTime::_calc_local_rd
     1.92   0.280  0.299  10001   0.     0.      DateTime::DefaultLocale
     1.64   0.240  0.240  30006   0.     0.      DateTime::TimeZone::Floating::is_floating
     1.51   0.220  0.220      1   0.     0.      DateTime::_rd2ymd
     1.37   0.200  4.009  10001   0.     0.0004  DateTime::Locale::load

These numbers confuse me a bit, because I'm only creating about 30 Widget objects in my mod_perl request, not 10,000. But I see a very significant speed hit, even if I replace my entire Widget->new() call with a simple call to DateTime->new(). Maybe it's some sort of mod_perl/DateTime interaction? Anyway, I don't want to get sidetracked into mod_perl stuff. I'm not sure what (else) to make of the results above, other than a possible wish that I could turn off Params::Validate's validation in certain performance-critical situations.
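The script behind the benchmark numbers above is not included. One way to produce a comparable table is Perl's core Benchmark module. In this sketch, cheap_new and checked_new are hypothetical constructors standing in for "just stash the args" versus "validate first", and a DateTime->new entry is added only if DateTime can be loaded; absolute rates will of course differ from the 2003 figures.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Hypothetical constructor that only stashes its arguments.
sub cheap_new {
    my %p    = @_;
    my $self = {%p};
    return bless $self, 'Cheap';
}

# Hypothetical constructor that checks every argument first (a rough
# stand-in for validation overhead, not DateTime's actual logic).
sub checked_new {
    my %p = @_;
    for my $k (qw(year month day hour minute second)) {
        die "missing $k" unless defined $p{$k};
        die "bad $k"     unless $p{$k} =~ /^\d+$/;
    }
    my $self = {%p};
    return bless $self, 'Checked';
}

my %args = ( year => 2003, month => 8, day => 3,
             hour => 12, minute => 0, second => 0 );

my %tests = (
    cheap   => sub { cheap_new(%args) },
    checked => sub { checked_new(%args) },
);

# Include the real thing only if DateTime happens to be installed.
if ( eval { require DateTime; 1 } ) {
    $tests{datetime} = sub { DateTime->new(%args) };
}

# A negative count means: run each entry for at least that many CPU seconds.
my $results = timethese( -1, \%tests );
cmpthese($results);
```

Running each entry for a fixed CPU time rather than a fixed iteration count keeps slow and fast constructors comparable without hand-tuning the count.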
Re: DateTime Performance
On Sun, 3 Aug 2003, John Siracusa wrote:

    CGI->new(''): 5 wallclock secs @ 1869.16/s (5.25 usr + 0.10 sys = 5.35 CPU)

CGI's constructor really doesn't do much at all, especially if there's no query string or form submission to handle.

    Date::Simple->new('2003-01-01'): 2 wallclock secs @ 4273.50/s (2.31 usr + 0.03 sys = 2.34 CPU)

This also doesn't really do much of anything.

    # ... includes args: year, month, day, hour, minute, second
    DateTime->new(...): 16 wallclock secs @ 687.29/s (14.48 usr + 0.07 sys = 14.55 CPU)

This does a lot of work, including calculating both UTC and local times, which involves calculating leap seconds, etc.

    DateTime->now(): 21 wallclock secs @ 547.95/s (18.13 usr + 0.12 sys = 18.25 CPU)

Ditto.

DateTime does well against Date::Manip, but not so well against even a big module like CGI. But for object creation alone, should it really be ~5x as slow as Date::Simple?

Yeah, probably.

    Total Elapsed Time = 19.91729 Seconds
      User+System Time = 14.60729 Seconds
    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c  Name
     27.6   4.035  4.685  20274   0.0002 0.0002  Params::Validate::_validate
     24.0   3.510 17.549      1   0.0004 0.0018  DateTime::new
     18.9   2.770  3.809  10001   0.0003 0.0004  DateTime::Locale::_load_class_from_id

This seems quite odd. It really doesn't do much.

     8.96   1.309  2.647  10020   0.0001 0.0003  DateTime::TimeZone::BEGIN

And this is completely mystifying. Can you show us your code?

These numbers confuse me a bit, because I'm only creating about 30 Widget objects in my mod_perl request, not 10,000. But I see a very significant speed hit, even if I replace my entire Widget->new() call with a simple call to DateTime->new(). Maybe it's some sort of mod_perl/DateTime interaction?

No, DateTime just does a lot of stuff.

Anyway, I don't want to get sidetracked into mod_perl stuff. I'm not sure what (else) to make of the results above, other than a possible wish that I could turn off Params::Validate's validation in certain performance-critical situations.
You can turn it off for everything by setting the PERL_NO_VALIDATION environment variable to true. There's no way to turn it off and on at runtime currently, though this could be added.

So, what does everyone else think of the object creation performance situation? Is it simply good enough to be 3x faster than Date::Manip::ParseDate()? Are there any obvious areas that I should consider before I start mucking around with DateTime::new()?

Considering that up till now my focus has been primarily on getting things correct, I wouldn't worry about it. There are definitely some big performance improvements possible. One possibility is to move the leap second bits into the DateTime XS code, which should help a lot. The timezone stuff can also benefit from being rewritten as XS, but that won't help the particular cases you benchmarked, since the UTC and floating time zones are quite fast already.

-dave

/*===
House Absolute Consulting
www.houseabsolute.com
===*/
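Dave mentions that a runtime on/off switch "could be added". A minimal sketch of how such a switch can work in plain Perl is a package variable whose default comes from the environment and which callers override with `local` for one dynamic scope. My::Validate, check_ymd and MY_NO_VALIDATION are hypothetical; this is not Params::Validate's code, although the 2012 reply in this thread uses `local $Params::Validate::NO_VALIDATION = 1;`, which suggests later versions gained a similar variable.

```perl
use strict;
use warnings;

# Hypothetical validator (not Params::Validate) with a package-global switch.
# Its default comes from an environment variable, in the spirit of
# PERL_NO_VALIDATION; MY_NO_VALIDATION is a made-up name.
package My::Validate;
our $NO_VALIDATION = $ENV{MY_NO_VALIDATION} ? 1 : 0;

sub check_ymd {
    my %p = @_;
    return 1 if $NO_VALIDATION;    # skip every check when switched off
    for my $k (qw(year month day)) {
        die "invalid $k\n" unless defined $p{$k} && $p{$k} =~ /^\d+$/;
    }
    return 1;
}

package main;

# Turn checks off only for the duration of this call. `local` restores the
# previous value when the sub exits, including when it dies.
sub build_many {
    local $My::Validate::NO_VALIDATION = 1;
    My::Validate::check_ymd( year => 2003, month => 'Aug', day => 4 ) for 1 .. 3;
    return 'done';
}

print build_many(), "\n";
my $ok = eval { My::Validate::check_ymd( year => 2003, month => 'Aug', day => 4 ) };
print( $ok ? "accepted\n" : "rejected: $@" );
```

Using `local` rather than assigning the global directly means a performance-critical loop cannot accidentally leave validation disabled for the rest of the program.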