Re: DateTime performance
I love and use DateTime, but for 10s of millions of records at once I would choose Date::Calc instead and deal with any necessary futzy bits manually.

On Thu, May 3, 2012 at 2:53 AM, Rick Measham wrote:
> In the spirit of TIMTOWTDI, there's my DateTime::LazyInit module which I
> wrote for this sort of case. It only inflates to a full DateTime object
> when you call methods that aren't "simple".
>
> http://search.cpan.org/~rickm/DateTime-LazyInit-1.0200/lib/DateTime/LazyInit.pm
>
> Caveat: I haven't tested it against any recent DateTime releases.
>
> Cheers!
> Rick Measham
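[Editor's note: Date::Calc works with plain (year, month, day) lists and functions such as Day_of_Week() rather than objects, which is why it is so much cheaper per record. For the log-file case described below, a similar object-free approach is possible with core modules alone. A minimal sketch, assuming ISO-style UTC timestamps; parse_stamp() is a hypothetical helper, not part of any module:]

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: parse a "YYYY-MM-DD HH:MM:SS" timestamp and
# return (epoch, day-of-week) without building any object at all.
# Assumes UTC timestamps; apply a fixed offset for other zones.
sub parse_stamp {
    my ($stamp) = @_;
    my ($y, $m, $d, $H, $M, $S) =
        $stamp =~ /^(\d{4})-(\d{2})-(\d{2})[ T](\d{2}):(\d{2}):(\d{2})/
        or return;
    my $epoch = timegm($S, $M, $H, $d, $m - 1, $y);
    my $dow   = (gmtime $epoch)[6];    # 0 = Sunday .. 6 = Saturday
    return ($epoch, $dow);
}

my ($epoch, $dow) = parse_stamp('2012-05-02 08:29:00');
print "epoch=$epoch dow=$dow\n";    # 2012-05-02 was a Wednesday (dow 3)
```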
Re: DateTime performance
In the spirit of TIMTOWTDI, there's my DateTime::LazyInit module which I wrote for this sort of case. It only inflates to a full DateTime object when you call methods that aren't "simple".

http://search.cpan.org/~rickm/DateTime-LazyInit-1.0200/lib/DateTime/LazyInit.pm

Caveat: I haven't tested it against any recent DateTime releases.

Cheers!
Rick Measham

On 02/05/2012, at 8:29, "Philipp K. Janert" wrote:
> When using DateTime for a large number of
> instances, it becomes a serious performance
> drag.
>
> Is this expected behavior? And are there access
> patterns that I can use to mitigate this effect?
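[Editor's note: the lazy-inflation idea can be sketched in plain core Perl. This toy LazyStamp class is NOT the actual DateTime::LazyInit implementation, just an illustration of the technique: store the constructor args cheaply, answer "simple" accessors from them directly, and only do the expensive work on first demand.]

```perl
use strict;
use warnings;

# Toy lazy wrapper illustrating the DateTime::LazyInit idea (not its
# actual implementation): keep the raw constructor args, inflate once.
package LazyStamp;

sub new {
    my ($class, %args) = @_;
    return bless { args => {%args}, real => undef }, $class;
}

# "Simple" accessors read the stored fields directly -- no inflation.
sub year  { $_[0]{args}{year} }
sub month { $_[0]{args}{month} }
sub day   { $_[0]{args}{day} }

# Anything else inflates once and delegates. Here the "real" object is
# a stand-in hash; in DateTime::LazyInit it would be a full DateTime.
sub day_of_week {
    my ($self) = @_;
    $self->{real} //= $self->_inflate;
    return $self->{real}{day_of_week};
}

sub _inflate {
    my ($self) = @_;
    # Zeller's congruence as a stand-in for the heavy precalculation.
    my ($y, $m, $d) = @{ $self->{args} }{qw(year month day)};
    if ($m < 3) { $m += 12; $y-- }       # Jan/Feb count as months 13/14
    my $k = $y % 100;
    my $j = int($y / 100);
    my $h = ($d + int(13 * ($m + 1) / 5) + $k + int($k / 4)
             + int($j / 4) + 5 * $j) % 7;    # 0 = Saturday .. 6 = Friday
    my $dow = (($h + 5) % 7) + 1;            # 1 = Monday .. 7 = Sunday
    return { day_of_week => $dow };
}

package main;

my $ls = LazyStamp->new(year => 2012, month => 5, day => 2);
print $ls->year, "\n";           # cheap: answered without inflating
print $ls->day_of_week, "\n";    # inflates on first call, prints 3 (Wed)
```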
RE: DateTime performance
> From: Philipp K. Janert [mailto:jan...@ieee.org]
> Sent: Wednesday, 2 May 2012 8:29 AM
>
> When using DateTime for a large number of
> instances, it becomes a serious performance
> drag.
...
> Is this expected behavior? And are there access
> patterns that I can use to mitigate this effect?
> (I tried to supply a time_zone explicitly, but that
> does not seem to improve things significantly.)

Hi Philipp,

My #1 tip is to pre-create and cache the DateTime::TimeZone object, and pass it in to each creation of a DateTime object (via whatever mechanism you're using to do that).

I have seen a case where we were using time_zone => 'local' in a reasonably tight DateTime object creation loop, and we saw a significant speed increase just by cutting out that chunk of processing. In hindsight it was a silly thing to do in the first place, but it made for an easy win :-)

I apologise if this is what you meant by supplying a time_zone explicitly in your comment above.

I can't recommend highly enough running a profiler like NYTProf over your tool to spot the low-hanging fruit. See https://metacpan.org/module/Devel::NYTProf

Cheers,
Andrew
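[Editor's note: the tip above can be sketched as follows, assuming DateTime and DateTime::TimeZone are installed. The zone name 'UTC' and the timestamp format are just examples; the point is that the zone object is built once, outside the loop, instead of once per record.]

```perl
use strict;
use warnings;
use DateTime;
use DateTime::TimeZone;

# Build the (expensive) time-zone object once and reuse it for every
# DateTime constructed from the log.
my $tz = DateTime::TimeZone->new( name => 'UTC' );

my @rows;
for my $stamp ( '2012-05-02 08:29:00', '2012-05-03 02:53:00' ) {
    my ($y, $m, $d, $H, $M, $S) = $stamp =~ /(\d+)/g;
    my $dt = DateTime->new(
        year => $y, month  => $m, day    => $d,
        hour => $H, minute => $M, second => $S,
        time_zone => $tz,    # reused object, not a zone-name string
    );
    push @rows, [ $dt->day_of_week, $dt->hms ];
}
print "$_->[0] $_->[1]\n" for @rows;
```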
Re: DateTime performance
On 2012.5.1 3:29 PM, Philipp K. Janert wrote:
> However, when working through files with a few
> tens of millions of records, DateTime turns into a
> REAL drag on performance.
>
> Is this expected behavior? And are there access
> patterns that I can use to mitigate this effect?
> (I tried to supply a time_zone explicitly, but that
> does not seem to improve things significantly.)

Unfortunately, due to the way DateTime is architected, it does a lot of precalculation upon object instantiation which is usually not used. So yes, in that sense it is expected.

If all you need is date objects with a sensible interface, try DateTimeX::Lite. It claims to replicate a good chunk of the DateTime interface in a fraction of the memory.

Given how much time it takes to make a DateTime object, and your scale of tens of millions of records, you could cache a DateTime object for each distinct timestamp and use clone() to hand out fresh instances:

    use feature 'state';

    sub get_datetime {
        my $timestamp = shift;
        state $cache = {};

        # Build the object once per distinct timestamp...
        $cache->{$timestamp} //= make_datetime_from_timestamp($timestamp);

        # ...and always return a clone, so callers can't mutate the
        # cached copy.
        return $cache->{$timestamp}->clone;
    }

(make_datetime_from_timestamp() is whatever parsing you're already doing.)

--
100. Claymore mines are not filled with yummy candy, and it is wrong to tell new soldiers that they are.
    -- The 213 Things Skippy Is No Longer Allowed To Do In The U.S. Army
       http://skippyslist.com/list/
DateTime performance
Question:

When using DateTime for a large number of instances, it becomes a serious performance drag.

A typical application for me involves things like log files: I use DateTime to translate the timestamps in these files into a canonical format, and then get information such as "day-of-week" or "time-of-day" from DateTime.

However, when working through files with a few tens of millions of records, DateTime turns into a REAL drag on performance.

Is this expected behavior? And are there access patterns that I can use to mitigate this effect? (I tried to supply a time_zone explicitly, but that does not seem to improve things significantly.)

Best,

    Ph.