Re: DateTime::TimeZone issues...
I don't think the memory usage of DT::TZ is really that excessive. It's been over 3 years since I've had a server with less than 1GB of RAM in it. That's easily enough memory for 64 x 12.5MB Apache children that aren't doing page sharing.

-J
Re: DateTime::TimeZone issues...
On Thu, 13 Nov 2003, Matt Sisk wrote:
> Now I'm starting to think we can have our cake and eat it too vis-a-vis
> unique key generation for the spans.
>
> If you don't mind, I'll take a crack at the templating in the tz module
> generation script to construct the modules sharing the common data
> structure.

Go for it.

-dave

/*=== House Absolute Consulting www.houseabsolute.com ===*/
Re: DateTime::TimeZone issues...
Now I'm starting to think we can have our cake and eat it too vis-a-vis unique key generation for the spans.

If you don't mind, I'll take a crack at the templating in the tz module generation script to construct the modules sharing the common data structure. No __DATA__ or external files will be required -- we'll just undef our private data once we're done populating the common structure when the module loads.

Matt
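A minimal sketch of what "populate the common structure, then undef the private data" could look like. Everything here is an assumption for illustration: %SPAN_POOL, span_key() and share_spans() are hypothetical names, the span values are made up, and the key is a plain join of the span's values rather than whatever the generator script would actually emit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical shared pool: span key => array ref. Not part of DateTime::TimeZone.
my %SPAN_POOL;

# Build a string key from a span's values (assumes flat arrays of defined
# scalars with no "|" inside a value).
sub span_key { return join '|', @{ $_[0] } }

# Swap each span for the pooled copy, adding it to the pool if it is new.
sub share_spans {
    my ($private) = @_;
    my @shared = map { $SPAN_POOL{ span_key($_) } ||= $_ } @$private;
    return \@shared;
}

# What a generated zone module might do when it loads (made-up data).
my $private_a = [ [ 0, 3600, 'STD' ], [ 3600, 7200, 'DST' ] ];
my $zone_a    = share_spans($private_a);
undef $private_a;    # the private list is no longer needed

my $private_b = [ [ 0, 3600, 'STD' ], [ 7200, 9000, 'XYZ' ] ];
my $zone_b    = share_spans($private_b);
undef $private_b;

printf "pooled spans: %d\n", scalar keys %SPAN_POOL;
printf "first span shared: %s\n", $zone_a->[0] == $zone_b->[0] ? 'yes' : 'no';
```

Only the duplicate span in the second zone is freed once its private list goes away; the zone ends up holding references into the pool, so the saving depends on how much overlap the loaded zones have.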
Re: DateTime::TimeZone issues...
Matt Sisk wrote:
> If you wanted to avoid the up-front cost, as well as the cost of unique key generation at compile/runtime, another option would be to have the program that generates the modules from Olson data pre-generate unique keys for each span. Then have a 'status' hash in each TZ-specific module that informs whether the module-specific spans have been inflated or not. If not, inflate them and store them in the common pool. Spans on demand, and only at the cost of a small list of unique keys in each TZ module.

I forgot to add... the structure of each span would obviously have to be stored somewhere. __DATA__ might not suffice since it takes up memory, but perhaps a flat text file containing all spans would suffice. Then the unique keys could simply be the offsets into that file.

Matt
Re: DateTime::TimeZone issues...
Dave Rolsky wrote:
> But that only applies when you load _all_ the zones. How would these be shared if you only wanted to load 10-20 zones, or even 150 zones? It seems like the overhead of determining what is shared would outweigh the savings.

As we saw in the beginning of this thread, there are some cases out there where people are faced with loading many, if not all, of the modules. This might be a minority of users, but I'd hate to see the problem become an impediment to the adoption of DateTime.

> Do you have any idea of how to implement this in a way that doesn't require all the zones to be loaded up front

As for overhead and implementation: perhaps a BEGIN block in each TZ module could populate the common data structure with each span array. I'm assuming that would be a hash, and the main thing required would be an efficient way to generate a unique key for each span. Then the same block would populate the private module-specific data structure with references to the shared span arrays. Once the data structure is built (compile time) there should be no further impacts on efficiency, but there is the extra cost of having to generate unique keys for each span at compile time.

If you wanted to avoid the up-front cost, as well as the cost of unique key generation at compile/runtime, another option would be to have the program that generates the modules from Olson data pre-generate unique keys for each span. Then have a 'status' hash in each TZ-specific module that informs whether the module-specific spans have been inflated or not. If not, inflate them and store them in the common pool. Spans on demand, and only at the cost of a small list of unique keys in each TZ module.

I'm probably missing some crucial detail, but on the surface of it I like the smell of the second option since it has virtually no additional performance impact from today's behavior.

Matt
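The second option (pre-generated keys plus a 'status' hash) might be sketched roughly as follows. This is an illustration under stated assumptions, not the module's design: %SPAN_SOURCE stands in for wherever the generator would store span data, the keys k1..k3 and zone names are made up, and spans_for() is a hypothetical accessor.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the pre-generated span store: key => span text.
my %SPAN_SOURCE = (
    k1 => '0,3600,STD',
    k2 => '3600,7200,DST',
    k3 => '7200,9000,XYZ',
);

# Per-zone key lists, as each generated module might carry them.
my %ZONE_KEYS = (
    'Zone/A' => [qw(k1 k2)],
    'Zone/B' => [qw(k1 k3)],
);

my %SPAN_POOL;     # key => inflated span (array ref), shared by all zones
my %INFLATED;      # the 'status' hash: zone name => 1 once inflated
my %ZONE_SPANS;    # zone name => array of references into the pool

# Inflate a zone's spans on first use; later calls return the cached list.
sub spans_for {
    my ($zone) = @_;
    unless ( $INFLATED{$zone} ) {
        my $keys = $ZONE_KEYS{$zone} or die "unknown zone $zone\n";
        $ZONE_SPANS{$zone} =
          [ map { $SPAN_POOL{$_} ||= [ split /,/, $SPAN_SOURCE{$_} ] } @$keys ];
        $INFLATED{$zone} = 1;
    }
    return $ZONE_SPANS{$zone};
}

my $za = spans_for('Zone/A');
printf "after Zone/A: %d spans inflated\n", scalar keys %SPAN_POOL;
my $zb = spans_for('Zone/B');
printf "after Zone/B: %d spans inflated\n", scalar keys %SPAN_POOL;
```

Nothing is inflated until a zone is first asked for, and a span that two zones have in common is built only once.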
Re: DateTime::TimeZone issues...
On Thu, 13 Nov 2003, Matt Sisk wrote:
> Matt Sisk wrote:
> > I have not verified this, but IF there is a lot of overlap of spans
> > between various timezones, perhaps a 'span registry' could be shared
> > between all the zone modules, thereby avoiding duplication of span objects.
>
> I just ran a quick check on TZ 0.2505:
>
> TZ module count: 367
> Span count: 16969
> Crunched: 9333
>
> If you reuse duplicate span arrays by reference, you shave the memory
> footprint by approximately 45%.

But that only applies when you load _all_ the zones. How would these be shared if you only wanted to load 10-20 zones, or even 150 zones? It seems like the overhead of determining what is shared would outweigh the savings.

Do you have any idea of how to implement this in a way that doesn't require all the zones to be loaded up front, because I can't think of one. I like the idea of sharing the data somehow though.

-dave

/*=== House Absolute Consulting www.houseabsolute.com ===*/
Re: DateTime::TimeZone issues...
Here's the script I used if you want to verify...

    #!/usr/bin/perl

    use File::Find;
    use Data::Dumper;

    my $sd = '/usr/lib/perl5/site_perl/5.8.0/DateTime/TimeZone';
    my %Spans;

    -d $sd or die "oops\n";

    my $span_count = 0;
    my $file_count = 0;

    find(\&wanted, $sd);

    print "TZ module count: $file_count\n";
    print " Span count: $span_count\n";
    print " Crunched: ", scalar keys %Spans, "\n";

    exit;

    sub wanted {
        return unless /\.pm$/;
        my $name = $File::Find::name;
        open(M, "<$name") or die "Problem opening $name\n";
        my $olson = 0;
        my $spans = 0;
        my $str;
        my $span;
        while (<M>) {
            ++$olson if /Olson data version/;
            next unless $olson;
            # start collecting at the "my $spans" declaration
            if (/my\s+\$spans/) {
                ++$spans;
                ++$file_count;
                next;
            }
            $str .= $_ if $spans;
            # end of the span list: eval it and count each span
            if (/\];/) {
                $spans = $olson = 0;
                eval "\$span = $str";
                print "oops: $@\n" if $@;
                foreach (@$span) {
                    ++$span_count;
                    # the Dumper() text serves as the uniqueness key
                    my $str = Dumper($_);
                    ++$Spans{$str};
                }
                $str = '';
            }
        }
        close(M);
    }
Re: DateTime::TimeZone issues...
Matt Sisk wrote:
> I have not verified this, but IF there is a lot of overlap of spans between various timezones, perhaps a 'span registry' could be shared between all the zone modules, thereby avoiding duplication of span objects.

I just ran a quick check on TZ 0.2505:

TZ module count: 367
Span count: 16969
Crunched: 9333

If you reuse duplicate span arrays by reference, you shave the memory footprint by approximately 45%.

Matt
Re: DateTime::TimeZone issues...
On Thu, 13 Nov 2003, Matt Sisk wrote:
> > Since the time zone classes are generated, it'd be possible to generate XS
> > code instead of Perl. Patches or a shipment of tuits would be extremely
> > welcome.
>
> The timezone modules use lots of spans, correct?

No, it's just a big data structure (an array of arrays). Each part of the big array represents a span, but it's not actually a DateTime::Span object. That wouldn't provide any added functionality in this case.

-dave

/*=== House Absolute Consulting www.houseabsolute.com ===*/
Re: DateTime::TimeZone issues...
Dave Rolsky wrote:
> I do a heck of a lot of web dev, and I've used DateTime and the time zone classes without any problems.
>
> For example, you could just preload the time zones that your current users are using, which would almost certainly be a small fraction of all possible zones.
>
> Since the time zone classes are generated, it'd be possible to generate XS code instead of Perl. Patches or a shipment of tuits would be extremely welcome.

The timezone modules use lots of spans, correct? I have not verified this, but IF there is a lot of overlap of spans between various timezones, perhaps a 'span registry' could be shared between all the zone modules, thereby avoiding duplication of span objects.

Of course if there is no overlap of spans then the effort would be pointless.

Matt
Re: DateTime::TimeZone issues...
On Thu, 13 Nov 2003, Joshua Hoblitt wrote:
> > Seriously, I'd like to eventually speed up/slim down the time zone stuff
> > but just getting it working took an enormous amount of development effort.
> > Making a super-fast whiz-bang version that still works is not trivial.
>
> Maybe we should ask around to try and determine the level of interest in
> using DT::TZ::* under mod_perl. If it's high we can apply for a
> linuxfund or TPF grant to fund Dave. Of course those that have a
> _financial_ interest in this can fund Dave directly...

There's definitely interest. I do a lot of mod_perl app development myself, and I know that there are various other web app type places using the DateTime code.

-dave

/*=== House Absolute Consulting www.houseabsolute.com ===*/
Re: DateTime::TimeZone issues...
> Seriously, I'd like to eventually speed up/slim down the time zone stuff
> but just getting it working took an enormous amount of development effort.
> Making a super-fast whiz-bang version that still works is not trivial.

Maybe we should ask around to try and determine the level of interest in using DT::TZ::* under mod_perl. If it's high we can apply for a linuxfund or TPF grant to fund Dave. Of course those that have a _financial_ interest in this can fund Dave directly...

-J
Re: DateTime::TimeZone issues...
On Thu, 13 Nov 2003, Rob Mueller wrote:
> The only way really to provide a fast (to initialize, and access) timezone
> DB is to either provide a DB (e.g. CDB, SDBM, etc) with the module, or have
> something in the make process that creates such a DB based on the DBM
> modules available on the user's system (or, as you mentioned, use structs
> with XS).

Sounds great. I welcome patches and/or funding and/or a time machine so I can develop these ;)

Seriously, I'd like to eventually speed up/slim down the time zone stuff but just getting it working took an enormous amount of development effort. Making a super-fast whiz-bang version that still works is not trivial.

-dave

/*=== House Absolute Consulting www.houseabsolute.com ===*/
Re: DateTime::TimeZone issues...
<...>
> > Having a look at the code, I noticed that each timezone has its own class,
> > and also a lot of data in perl structures. I'm not really sure why the
> > timezone classes were developed this way, it seems fine for a simple case
> > where you only need a couple of timezones, but in a case where you can
> > possibly be using ANY timezone in the same script, it seems a HUGE overhead
> > in memory and time to have to load all those structures into memory.
>
> Patches welcome ;)

That's probably how I would have responded too... ;-)

<...>
> > We're fine using our POSIX solution now, but I thought you folks might be
> > interested in this feedback - we chatted about it with Rick Measham at Perl
> > Mongers yesterday and he asked if we could provide this summary. It's
> > probably a good idea to keep web app developers in mind as you develop the
> > DateTime namespace, since it's a place where a lot of date/time calculations
> > in Perl are required.
>
> I do a heck of a lot of web dev, and I've used DateTime and the time zone
> classes without any problems.
>
> For example, you could just preload the time zones that your current users
> are using, which would almost certainly be a small fraction of all
> possible zones.
> <...>

That's OK - we're happy with the solution we are using, and were just trying to give back a bit by providing the feedback. We have users in nearly 200 countries with more all the time, so there are a lot of modules to preload, and we don't know when starting mod_perl the first time which new countries will be represented in the next batch of signups...

The only way really to provide a fast (to initialize, and access) timezone DB is to either provide a DB (e.g. CDB, SDBM, etc) with the module, or have something in the make process that creates such a DB based on the DBM modules available on the user's system (or, as you mentioned, use structs with XS).

Regards,
Rob
Re: DateTime::TimeZone issues...
On Thu, 13 Nov 2003, Rob Mueller wrote:
> ./perlbloat.pl 'use DateTime::TimeZone; $TZ{$_} =
> DateTime::TimeZone->new(name => $_) for (DateTime::TimeZone::all_names)'
> use DateTime::TimeZone; $TZ{$_} = DateTime::TimeZone->new(name => $_) for
> (DateTime::TimeZone::all_names) added 12.7M

But of course you don't actually need all of those time zones. There are quite a number of time zones that only have historical interest, for example. Then there's the ones for various South Pacific islands, the Antarctic, etc.

So if you just loaded the ones you _needed_, it'd use a lot less memory.

> Having a look at the code, I noticed that each timezone has its own class,
> and also a lot of data in perl structures. I'm not really sure why the
> timezone classes were developed this way, it seems fine for a simple case
> where you only need a couple of timezones, but in a case where you can
> possibly be using ANY timezone in the same script, it seems a HUGE overhead
> in memory and time to have to load all those structures into memory.

Patches welcome ;)

I'm not sure how else we could provide an accurate and complete view of the time zone database data, other than generating Perl modules. We can't rely on the database being present on any given system, much less being up to date. The C-level API makes it pretty hard to reasonably do something like convert a datetime from one zone to another, because it's all controlled by one environment variable. Plus that probably doesn't even work on non-Unixy/POSIXy systems. Remember, Perl runs in a lot of places, and there's no reason these modules shouldn't work in all of them.

> In the end, we ended up going with the POSIX timezone related calls.
> Although they're pretty hacky, they give us what we want (a seconds offset
> from GMT for a given timezone name at a particular time) in a simple, quick
> interface without needing 13M of overhead!

Right, but that's _all_ you wanted.

> We're fine using our POSIX solution now, but I thought you folks might be
> interested in this feedback - we chatted about it with Rick Measham at Perl
> Mongers yesterday and he asked if we could provide this summary. It's
> probably a good idea to keep web app developers in mind as you develop the
> DateTime namespace, since it's a place where a lot of date/time calculations
> in Perl are required.

I do a heck of a lot of web dev, and I've used DateTime and the time zone classes without any problems.

For example, you could just preload the time zones that your current users are using, which would almost certainly be a small fraction of all possible zones.

Since the time zone classes are generated, it'd be possible to generate XS code instead of Perl. Patches or a shipment of tuits would be extremely welcome.

-dave

/*=== House Absolute Consulting www.houseabsolute.com ===*/
DateTime::TimeZone issues...
Hi

I help develop an email website (http://www.fastmail.fm) and recently we wanted to move over to providing proper timezone support for users (e.g. give us a location, and we'll keep the time up to date, rather than having to change for daylight savings every 6 months). As part of this, we were going to use the DateTime::TimeZone modules, however it's turned out to be a bit of a problem.

Basically we're using a mod_perl environment, and we have lots of users all around the world. Because we end up wanting to use lots of different timezones, and often a different one for every web request, it's generally a good idea to pre-load the modules in the parent process so that all the data is shared by each child process. Additionally, it's often even worthwhile pre-instantiating each timezone in the parent, and storing it in a hash for later retrieval, rather than constructing a TimeZone object during each request.

So my initial code, in the startup.pl phase of our mod_perl Apache server, was:

    use DateTime::TimeZone;
    $::TZ{$_} = DateTime::TimeZone->new(name => $_) for (DateTime::TimeZone::all_names());

So this will force loading of all timezones at startup, which will then be shared amongst all children. Then in the code you can get a timezone object with just:

    my $TZ = $::TZ{$ZoneName};

The problem was I just didn't realise HOW much timezone data needs to be loaded...

    ./perlbloat.pl 'use DateTime::TimeZone; $TZ{$_} = DateTime::TimeZone->new(name => $_) for (DateTime::TimeZone::all_names)'
    use DateTime::TimeZone; $TZ{$_} = DateTime::TimeZone->new(name => $_) for (DateTime::TimeZone::all_names) added 12.7M

So just loading all those timezone classes takes 12.7M of RAM. That increases our process size by almost 50% over its current size.

Now this all gets shared in the children, but it's still an issue on some of our development machines which have less RAM (they're linux in vmware), and on my 1GHz PIII laptop it takes almost 4 seconds just to load this:

    [robm test]$ time perl -e 'use DateTime::TimeZone; $TZ{$_} = DateTime::TimeZone->new(name => $_) for (DateTime::TimeZone::all_names)'

    real    0m3.821s
    user    0m2.970s
    sys     0m0.500s

Having a look at the code, I noticed that each timezone has its own class, and also a lot of data in perl structures. I'm not really sure why the timezone classes were developed this way, it seems fine for a simple case where you only need a couple of timezones, but in a case where you can possibly be using ANY timezone in the same script, it seems a HUGE overhead in memory and time to have to load all those structures into memory.

In the end, we ended up going with the POSIX timezone related calls. Although they're pretty hacky, they give us what we want (a seconds offset from GMT for a given timezone name at a particular time) in a simple, quick interface without needing 13M of overhead!

I find this a bit of a pity, because we'd really hoped to move more and more to using DateTime modules for all time related work, since it's a really nice looking library. However, for now we've found the TimeZone classes impractical in a persistent Perl environment (e.g. mod_perl, Net::Server daemons, etc).

We're fine using our POSIX solution now, but I thought you folks might be interested in this feedback - we chatted about it with Rick Measham at Perl Mongers yesterday and he asked if we could provide this summary. It's probably a good idea to keep web app developers in mind as you develop the DateTime namespace, since it's a place where a lot of date/time calculations in Perl are required.

Regards,
Rob
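The message does not show the POSIX-based code, so the following is only a guess at the general shape of such a call: a sketch that gets a seconds offset from GMT for a zone name at a given time using core modules only. tz_offset() is a hypothetical helper name; the approach temporarily sets the TZ environment variable, so the result depends on the zone database installed on the system and the change is process-wide while the call runs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();
use Time::Local qw(timegm);

# Seconds east of GMT for $zone at $epoch (hypothetical helper).
# Relies on the system's own zone data via the TZ environment variable.
sub tz_offset {
    my ( $zone, $epoch ) = @_;
    my $offset;
    {
        local $ENV{TZ} = $zone;
        POSIX::tzset();
        my @lt = localtime($epoch);
        $lt[5] += 1900;    # pass a four-digit year to timegm
        # Reading the zone's wall-clock fields as if they were GMT and
        # subtracting the real epoch leaves the zone's offset.
        $offset = timegm( @lt[ 0 .. 5 ] ) - $epoch;
    }
    POSIX::tzset();        # re-read the restored TZ value
    return $offset;
}

# 2003-11-13 00:00:00 GMT
my $when = timegm( 0, 0, 0, 13, 10, 2003 );
for my $zone (qw(UTC Australia/Melbourne America/Chicago)) {
    printf "%-20s %+d\n", $zone, tz_offset( $zone, $when );
}
```

The values printed for named zones come from whatever zone data the host has, which is the portability concern raised earlier in the thread.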