Re: DateTime performance

2012-05-03 Thread Ashley Pond V
I love and use DateTime, but for 10s of millions of records at once I
would be choosing Date::Calc instead and dealing with any necessary
futzy bits manually.
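A minimal sketch of the "do the futzy bits manually" route, assuming log timestamps look like `2012-05-03 14:07:59` and are in UTC. It swaps in the core modules Time::Local and gmtime rather than Date::Calc, and the helper name `parse_ts` is hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my @DOW = qw(Sun Mon Tue Wed Thu Fri Sat);

# Hypothetical helper: parse "YYYY-MM-DD HH:MM:SS" (assumed UTC) into the
# pieces a log report typically needs, without building a DateTime object.
sub parse_ts {
    my ($ts) = @_;
    my ($y, $m, $d, $H, $M, $S) =
        $ts =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
        or die "unrecognised timestamp: $ts\n";

    # timegm() takes a 0-based month; gmtime() returns the weekday (0 = Sunday).
    my $epoch = timegm($S, $M, $H, $d, $m - 1, $y);
    my $wday  = (gmtime $epoch)[6];

    return {
        epoch       => $epoch,
        day_of_week => $DOW[$wday],
        time_of_day => "$H:$M:$S",
    };
}
```

Date::Calc offers ready-made functions for the same calculations (for example its Day_of_Week), so it can replace the `timegm`/`gmtime` pair if you prefer.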



Re: DateTime performance

2012-05-03 Thread Rick Measham
In the spirit of TIMTOWTDI, there's my DateTime::LazyInit module which I wrote 
for this sort of case. It only inflates to a full DateTime object when you call 
methods that aren't "simple". 

http://search.cpan.org/~rickm/DateTime-LazyInit-1.0200/lib/DateTime/LazyInit.pm

Caveat: I haven't tested it against any recent DateTime releases. 
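The lazy-initialisation idea can be sketched without DateTime at all. The class below, `LazyStamp`, is hypothetical and is not DateTime::LazyInit's API: it keeps only the raw fields, answers the "simple" accessors directly, and does the more expensive epoch and weekday work the first time it is asked for, caching the result. It assumes date-only values in UTC.

```perl
package LazyStamp;    # hypothetical class, not DateTime::LazyInit
use strict;
use warnings;
use Time::Local qw(timegm);

# Store only the raw fields; no derived values are computed here.
sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}

# "Simple" accessors: answered straight from the stored fields.
sub year  { $_[0]{year} }
sub month { $_[0]{month} }
sub day   { $_[0]{day} }

# Expensive work happens on first use only, then is cached in the object.
# Assumes midnight UTC on the stored date.
sub epoch {
    my $self = shift;
    $self->{_epoch} //=
        timegm(0, 0, 0, $self->{day}, $self->{month} - 1, $self->{year});
    return $self->{_epoch};
}

# Day of week, 1 = Monday .. 7 = Sunday (the convention DateTime uses).
sub day_of_week {
    my $self = shift;
    my $wday = (gmtime $self->epoch)[6];    # 0 = Sunday
    return $wday == 0 ? 7 : $wday;
}

package main;
```

A real implementation also has to decide which methods count as simple and hand everything else to a full DateTime object, which is the part DateTime::LazyInit takes care of.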

Cheers!
Rick Measham
📱

-- 
Message  protected for iSite by MailGuard: e-mail anti-virus, anti-spam and 
content filtering.http://www.mailguard.com.au



RE: DateTime performance

2012-05-03 Thread Andrew O'Brien
> From: Philipp K. Janert [mailto:jan...@ieee.org]
> Sent: Wednesday, 2 May 2012 8:29 AM
> 
> Question:
> 
> When using DateTime for a large number of
> instances, it becomes a serious performance
> drag.
...
> Is this expected behavior? And are there access
> patterns that I can use to mitigate this effect?
> (I tried to supply a time_zone explicitly, but that
> does not seem to improve things significantly.)

Hi Philipp,

My #1 tip is to create the DateTime::TimeZone object once, cache it, and pass it
in to each creation of a DateTime object (via whatever mechanism you're using to
do that). I have seen a case where we were using time_zone => 'local' in a
reasonably tight DateTime object creation loop and saw a significant speed-up
just by cutting out that repeated time zone lookup.

In hindsight that was a silly thing to do, but it made for an easy win :-)
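A minimal sketch of this tip, assuming the timestamps are already epoch seconds (the values in `@epochs` are illustrative only). The zone object is built once outside the loop and passed to every constructor. 'America/Chicago' is an arbitrary named zone, chosen so the sketch does not depend on the host's configuration; with 'local' the saving is larger, because resolving it may involve reading the TZ environment variable or files under /etc each time.

```perl
use strict;
use warnings;
use DateTime;
use DateTime::TimeZone;

my @epochs = (1336003200, 1336054079, 1336089600);    # illustrative data

# Resolve the zone once, outside the loop ('America/Chicago' is arbitrary).
my $tz = DateTime::TimeZone->new(name => 'America/Chicago');

my @days;
for my $epoch (@epochs) {
    # Pass the ready-made object rather than a zone name string.
    my $dt = DateTime->from_epoch(epoch => $epoch, time_zone => $tz);
    push @days, $dt->day_name;
}
```

The first epoch is midnight UTC on 2012-05-03, which is still the evening of 2 May in Chicago, so the zone really does change the answer.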

I apologise if this is what you meant by supplying a time_zone explicitly in 
your comment above.

I can't recommend highly enough running a profiler such as NYTProf over your
tool to spot the low-hanging fruit. See https://metacpan.org/module/Devel::NYTProf

Cheers,

Andrew


Re: DateTime performance

2012-05-03 Thread Michael G Schwern
On 2012.5.1 3:29 PM, Philipp K. Janert wrote:
> However, when working through files with a few 
> tens of millions of records, DateTime turns into a 
> REAL drag on performance.
> 
> Is this expected behavior? And are there access
> patterns that I can use to mitigate this effect? 
> (I tried to supply a time_zone explicitly, but that
> does not seem to improve things significantly.)

Unfortunately, because of the way DateTime is architected, it does a lot of
precalculation on object instantiation, most of which usually goes unused. So
yes, it is expected in that sense.

If all you need is date objects with a sensible interface, try
DateTimeX::Lite.  It claims to replicate a good chunk of the DateTime
interface in a fraction of the memory.

Given how long it takes to make a DateTime object, and your scale of tens of
millions of records (where many records will likely share a timestamp), you
could cache one DateTime object per timestamp and use clone() to hand out new
instances.

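use feature 'state';

sub get_datetime {
    my $timestamp = shift;

    # One cached DateTime object per distinct timestamp string.
    # Note: this grows without bound if most timestamps are unique.
    state $cache = {};

    # Build the object the first time this timestamp is seen;
    # make_datetime_from_timestamp() stands for your existing parsing code.
    $cache->{$timestamp} //= make_datetime_from_timestamp($timestamp);

    # Always hand out a clone so callers can't modify the cached original.
    return $cache->{$timestamp}->clone;
}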
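If, as in the original question, only the day of week is needed from the date part, the same caching idea can be applied per calendar date instead of per timestamp, so the cache holds at most one entry per day in the file. A sketch using core modules only, assuming "YYYY-MM-DD HH:MM:SS" timestamps in UTC; `day_of_week_cached` is a hypothetical name.

```perl
use strict;
use warnings;
use feature 'state';
use Time::Local qw(timegm);

# Hypothetical helper: day of week (1 = Monday .. 7 = Sunday) for a
# "YYYY-MM-DD ..." timestamp, computed at most once per calendar date.
sub day_of_week_cached {
    my ($ts) = @_;
    my $date = substr $ts, 0, 10;    # "YYYY-MM-DD"

    state %cache;
    return $cache{$date} //= do {
        my ($y, $m, $d) = split /-/, $date;
        my $wday = (gmtime timegm(0, 0, 0, $d, $m - 1, $y))[6];    # 0 = Sunday
        $wday == 0 ? 7 : $wday;
    };
}
```

The time of day needs no calculation at all in this format; `substr $ts, 11, 8` extracts it directly.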


-- 
100. Claymore mines are not filled with yummy candy, and it is wrong
 to tell new soldiers that they are.
-- The 213 Things Skippy Is No Longer Allowed To Do In The U.S. Army
   http://skippyslist.com/list/


DateTime performance

2012-05-03 Thread Philipp K. Janert

Question:

When using DateTime for a large number of
instances, it becomes a serious performance
drag. 

A typical application for me involves things like
log files: I use DateTime to translate the timestamps 
in these files into a canonical format, and then get 
information such as "day-of-week" or "time-of-day" 
from DateTime. 

However, when working through files with a few 
tens of millions of records, DateTime turns into a 
REAL drag on performance.

Is this expected behavior? And are there access
patterns that I can use to mitigate this effect? 
(I tried to supply a time_zone explicitly, but that
does not seem to improve things significantly.)

Best,

Ph.