Re: DateTime::TimeZone issues...

2003-11-13 Thread Joshua Hoblitt
I don't think the memory usage of DT::TZ is really that excessive.  It's been over 3 
years since I've had a server with less than 1GB of RAM in it.  That's easily enough 
memory for 64 x 12.5MB Apache children that aren't doing page sharing (about 800MB).

-J

--


Re: DateTime::TimeZone issues...

2003-11-13 Thread Dave Rolsky
On Thu, 13 Nov 2003, Matt Sisk wrote:

> Now I'm starting to think we can have our cake and eat it too vis-a-vis
> unique key generation for the spans.
>
> If you don't mind, I'll take a crack at the templating in the tz module
> generation script to construct the modules sharing the common data
> structure.

Go for it.


-dave

/*===
House Absolute Consulting
www.houseabsolute.com
===*/


Re: DateTime::TimeZone issues...

2003-11-13 Thread Matt Sisk
Now I'm starting to think we can have our cake and eat it too vis-a-vis
unique key generation for the spans.
If you don't mind, I'll take a crack at the templating in the tz module 
generation script to construct the modules sharing the common data 
structure.

No __DATA__ or external files will be required -- we'll just undef our 
private data once we're done populating the common structure when the 
module loads.

Matt


Re: DateTime::TimeZone issues...

2003-11-13 Thread Matt Sisk
Matt Sisk wrote:
> If you wanted to avoid the up-front cost, as well as the cost of unique 
> key generation at compile/runtime, another option would be to have the 
> program that generates the modules from Olson data pre-generate unique 
> keys for each span. Then have a 'status' hash in each TZ-specific module 
> that informs whether the module-specific spans have been inflated or 
> not. If not, inflate them and store them in the common pool. Spans on 
> demand, and only at the cost of a small list of unique keys in each TZ 
> module.

I forgot to add... the structure of each span would obviously have to be 
stored somewhere. __DATA__ might not help, since it still takes up memory, 
but perhaps a flat text file containing all spans would do. Then 
the unique keys could simply be the offsets into that file.
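A rough Perl sketch of the flat-file idea, under assumptions not stated in the message: each distinct span is written as one comma-separated line, the byte offset where that line starts serves as the span's key, and a zone module would carry only those offsets. `write_span_file` and `read_span_at` are hypothetical names, and the span values are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Generator side (hypothetical): write each distinct span as one line of a
# flat text file and record the byte offset at which each line starts.
# Those offsets are what a zone module would carry as its span keys.
sub write_span_file {
    my ($path, @spans) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!";
    my @offsets;
    for my $span (@spans) {
        push @offsets, tell $out;
        print {$out} join(',', @$span), "\n";
    }
    close $out or die "Cannot close $path: $!";
    return @offsets;
}

# Runtime side: inflate a single span on demand by seeking to its offset.
sub read_span_at {
    my ($in, $offset) = @_;
    seek $in, $offset, 0 or die "Cannot seek to $offset: $!";
    my $line = <$in>;
    die "No span at offset $offset" unless defined $line;
    chomp $line;
    return [ split /,/, $line ];
}

# Usage with two made-up spans (the field layout is only illustrative).
my (undef, $path) = tempfile(UNLINK => 1);
my @offsets = write_span_file($path,
    [ 0,      100000, 3600,   103600, 3600, 0, 'CET'  ],
    [ 100000, 200000, 107200, 207200, 7200, 1, 'CEST' ],
);
open my $in, '<', $path or die "Cannot read $path: $!";
print join(' ', @{ read_span_at($in, $offsets[1]) }), "\n";
```

Only the spans a zone actually asks for are read into memory; the cost is a seek and a short read per span the first time it is needed.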

Matt



Re: DateTime::TimeZone issues...

2003-11-13 Thread Matt Sisk
Dave Rolsky wrote:
> But that only applies when you load _all_ the zones.  How would these be
> shared if you only wanted to load 10-20 zones, or even 150 zones?  It
> seems like the overhead of determining what is shared would outweigh the
> savings.

As we saw at the beginning of this thread, there are some cases out 
there where people are faced with loading many, if not all, of the 
modules. This might be a minority of users, but I'd hate to see the 
problem become an impediment to the adoption of DateTime.

> Do you have any idea of how to implement this in a way that doesn't
> require all the zones to be loaded up front

As for overhead and implementation. Perhaps a BEGIN block in each TZ 
module could populate the common data structure with each span array. 
I'm assuming that would be a hash, and the main thing required would be 
an efficient way to generate a unique key for each span. Then the same 
block would populate the private module-specific data structure with 
references to the shared span arrays.

Once the data structure is built (compile time) there should be no 
further impacts on efficiency, but there is the extra cost of having to 
generate unique keys for each span at compile time.
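A minimal sketch of the first option, assuming the unique key is simply the span's fields joined into one string. `SpanPool` and `intern` are hypothetical names, not part of DateTime::TimeZone, and the span values are invented for illustration.

```perl
use strict;
use warnings;

package SpanPool;    # hypothetical shared registry, not part of DateTime::TimeZone

our %POOL;           # unique key => the one shared copy of that span

# Return the pooled copy of $span, storing $span itself if it is new.
# The key is just the span's fields joined together; computing it for
# every span is the load-time cost described above.
sub intern {
    my ($span) = @_;
    my $key = join "\0", map { defined $_ ? $_ : '' } @$span;
    return $POOL{$key} ||= $span;
}

package main;

# What a generated module's BEGIN block might do with its own span data:
# keep only references to the pooled arrays.
my @zone_a = map { SpanPool::intern($_) } (
    [ 0,      100000, 3600,   103600, 3600, 0, 'CET'  ],
    [ 100000, 200000, 107200, 207200, 7200, 1, 'CEST' ],
);
my @zone_b = map { SpanPool::intern($_) } (
    [ 0,      100000, 3600,   103600, 3600, 0, 'CET'  ],
);

print scalar(keys %SpanPool::POOL), " distinct spans\n";
print $zone_a[0] == $zone_b[0] ? "zone_b reuses zone_a's span\n" : "copied\n";
```

The duplicate array built for the second zone is discarded as soon as `intern` returns the pooled copy, so only one copy of each distinct span stays alive.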

If you wanted to avoid the up-front cost, as well as the cost of unique 
key generation at compile/runtime, another option would be to have the 
program that generates the modules from Olson data pre-generate unique 
keys for each span. Then have a 'status' hash in each TZ-specific module 
that informs whether the module-specific spans have been inflated or 
not. If not, inflate them and store them in the common pool. Spans on 
demand, and only at the cost of a small list of unique keys in each TZ 
module.
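The second option (pre-generated keys plus a per-module 'status' flag) might look roughly like this. The keys, the comma-separated `%SPAN_SOURCE` storage and the zone names are all hypothetical; in practice the source could be a flat file or a DBM file rather than a hash.

```perl
use strict;
use warnings;

# Hypothetical generator output: every distinct span stored once, under a
# pre-generated key, in a compact serialized form (made-up values).
my %SPAN_SOURCE = (
    s1 => '0,100000,3600,103600,3600,0,CET',
    s2 => '100000,200000,107200,207200,7200,1,CEST',
);

# Each zone module would carry only its short list of span keys.
my %ZONE_KEYS = (
    'Zone/A' => [qw(s1 s2)],
    'Zone/B' => [qw(s1)],
);

my %POOL;          # key  => inflated span, shared by every zone
my %INFLATED;      # zone => true once its spans have been inflated
my %ZONE_SPANS;    # zone => arrayref of references into %POOL

sub spans_for {
    my ($zone) = @_;
    die "Unknown zone $zone\n" unless $ZONE_KEYS{$zone};
    unless ($INFLATED{$zone}) {
        # Inflate each span only if no other zone has done so already.
        $ZONE_SPANS{$zone} = [
            map { $POOL{$_} ||= [ split /,/, $SPAN_SOURCE{$_} ] } @{ $ZONE_KEYS{$zone} }
        ];
        $INFLATED{$zone} = 1;
    }
    return $ZONE_SPANS{$zone};
}

my $zone_a = spans_for('Zone/A');    # inflates s1 and s2
my $zone_b = spans_for('Zone/B');    # reuses the pooled s1
print $zone_a->[0] == $zone_b->[0] ? "shared\n" : "copied\n";
```

Nothing is inflated until a zone is first asked for, so processes that touch only a few zones pay only for those.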

I'm probably missing some crucial detail, but on the surface of it I 
like the smell of the second option since it has virtually no additional 
performance impact from today's behavior.

Matt



Re: DateTime::TimeZone issues...

2003-11-13 Thread Dave Rolsky
On Thu, 13 Nov 2003, Matt Sisk wrote:

> Matt Sisk wrote:
> > I have not verified this, but IF there is a lot of overlap of spans
> > between various timezones, perhaps a 'span registry' could be shared
> > between all the zone modules, thereby avoiding duplication of span objects.
>
> I just ran a quick check on TZ 0.2505:
>
> TZ module count: 367
>   Span count: 16969
> Crunched: 9333
>
> If you reuse duplicate span arrays by reference, you shave the memory
> footprint by approximately 45%.

But that only applies when you load _all_ the zones.  How would these be
shared if you only wanted to load 10-20 zones, or even 150 zones?  It
seems like the overhead of determining what is shared would outweigh the
savings.

Do you have any idea of how to implement this in a way that doesn't
require all the zones to be loaded up front, because I can't think of one.

I like the idea of sharing the data somehow though.


-dave



Re: DateTime::TimeZone issues...

2003-11-13 Thread Matt Sisk
Here's the script I used if you want to verify...

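#!/usr/bin/perl
# Count the span arrays in the generated DateTime::TimeZone modules and
# how many of them are distinct ("Crunched").
use strict;
use warnings;
use File::Find;
use Data::Dumper;
use DateTime::TimeZone ();    # defines the INFINITY/NEG_INFINITY constants used in the span data

my $sd = '/usr/lib/perl5/site_perl/5.8.0/DateTime/TimeZone';
-d $sd or die "oops: $sd is not a directory\n";

my %Spans;              # Dumper() text of a span => times seen
my $span_count = 0;
my $file_count = 0;
find(\&wanted, $sd);

print "TZ module count: $file_count\n";
print "     Span count: $span_count\n";
print "       Crunched: ", scalar keys %Spans, "\n";
exit;

sub wanted {
    return unless /\.pm$/;
    my $name = $File::Find::name;
    open(my $fh, '<', $name) or die "Problem opening $name: $!\n";
    my $olson = 0;      # past the "Olson data version" comment?
    my $spans = 0;      # inside the "my $spans = [ ... ];" block?
    my $str   = '';
    while (<$fh>) {
        ++$olson if /Olson data version/;
        next unless $olson;
        if (/my\s+\$spans/) {
            ++$spans;
            ++$file_count;
            next;
        }
        $str .= $_ if $spans;
        if ($spans && /\];/) {
            $spans = $olson = 0;
            my $span = eval $str;    # $str holds the "[ ... ];" literal
            print "oops: $@\n" if $@;
            foreach my $s (@{ $span || [] }) {
                ++$span_count;
                ++$Spans{ Dumper($s) };
            }
            $str = '';
        }
    }
    close $fh;
}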




Re: DateTime::TimeZone issues...

2003-11-13 Thread Matt Sisk
Matt Sisk wrote:
> I have not verified this, but IF there is a lot of overlap of spans 
> between various timezones, perhaps a 'span registry' could be shared 
> between all the zone modules, thereby avoiding duplication of span objects.

I just ran a quick check on TZ 0.2505:

TZ module count: 367
 Span count: 16969
   Crunched: 9333

If you reuse duplicate span arrays by reference, you shave the memory 
footprint by approximately 45%.
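The 45% figure follows directly from the two counts. Note that it is a count of span arrays; it becomes a 45% cut in span memory only if the arrays are roughly the same size, and it ignores the small overhead of whatever shared structure holds them.

```perl
use strict;
use warnings;

# Counts reported above for DateTime::TimeZone 0.2505.
my $span_count   = 16969;    # span arrays across all zone modules
my $unique_spans = 9333;     # distinct span arrays ("Crunched")

# Fraction of span arrays that duplicate one already seen.
my $saving = 1 - $unique_spans / $span_count;
printf "%.1f%% of span arrays are duplicates\n", 100 * $saving;
```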

Matt



Re: DateTime::TimeZone issues...

2003-11-13 Thread Dave Rolsky
On Thu, 13 Nov 2003, Matt Sisk wrote:

> > Since the time zone classes are generated, it'd be possible to generate XS
> > code instead of Perl.  Patches or a shipment of tuits would be extremely
> > welcome.
>
> The timezone modules use lots of spans, correct?

No, it's just a big data structure (an array of arrays).  Each part of the
big array represents a span, but it's not actually a DateTime::Span
object.  That wouldn't provide any added functionality in this case.
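For readers unfamiliar with the layout, a lookup over such an array of arrays might look like the sketch below. It assumes each span is `[utc_start, utc_end, local_start, local_end, offset, is_dst, short_name]`, sorted by start and covering the half-open range [utc_start, utc_end); that field order and the `find_span` helper are assumptions for illustration, not the module's actual code.

```perl
use strict;
use warnings;

# Assumed span layout (illustrative, not taken from the module's source).
use constant {
    UTC_START   => 0,
    UTC_END     => 1,
    LOCAL_START => 2,
    LOCAL_END   => 3,
    OFFSET      => 4,
    IS_DST      => 5,
    SHORT_NAME  => 6,
};

# Spans are sorted and non-overlapping, so binary search finds the span
# whose half-open range [UTC_START, UTC_END) contains $utc.
sub find_span {
    my ($spans, $utc) = @_;
    my ($lo, $hi) = (0, $#$spans);
    while ($lo <= $hi) {
        my $mid  = int(($lo + $hi) / 2);
        my $span = $spans->[$mid];
        if    ($utc <  $span->[UTC_START]) { $hi = $mid - 1 }
        elsif ($utc >= $span->[UTC_END])   { $lo = $mid + 1 }
        else                               { return $span }
    }
    return;    # no span covers $utc
}

# Made-up data for a zone with one standard and one DST span.
my $spans = [
    [ 0,      100000, 3600,   103600, 3600, 0, 'CET'  ],
    [ 100000, 200000, 107200, 207200, 7200, 1, 'CEST' ],
];
my $span = find_span($spans, 150000);
print $span->[SHORT_NAME], ' ', $span->[OFFSET], "\n";    # CEST 7200
```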


-dave



Re: DateTime::TimeZone issues...

2003-11-13 Thread Matt Sisk
Dave Rolsky wrote:
> I do a heck of a lot of web dev, and I've used DateTime and the time zone
> classes without any problems.
>
> For example, you could just preload the time zones that your current users
> are using, which would almost certainly be a small fraction of all
> possible zones.
>
> Since the time zone classes are generated, it'd be possible to generate XS
> code instead of Perl.  Patches or a shipment of tuits would be extremely
> welcome.

The timezone modules use lots of spans, correct?

I have not verified this, but IF there is a lot of overlap of spans 
between various timezones, perhaps a 'span registry' could be shared 
between all the zone modules, thereby avoiding duplication of span objects.

Of course if there is no overlap of spans then the effort would be 
pointless.

Matt



Re: DateTime::TimeZone issues...

2003-11-13 Thread Dave Rolsky
On Thu, 13 Nov 2003, Joshua Hoblitt wrote:

> > Seriously, I'd like to eventually speed up/slim down the time zone stuff
> > but just getting it working took an enormous amount of development effort.
> > Making a super-fast whiz-bang version that still works is not trivial.
>
> Maybe we should ask around to try and determine the level of interest in
> using DT::TZ::* under mod_perl.  If it's high we can apply for a
> linuxfund or TPF grant to fund Dave.  Of course those that have a
> _financial_ interest in this can fund Dave directly...

There's definitely interest.  I do a lot of mod_perl app development
myself, and I know that there are various other web app type places using
the DateTime code.


-dave



Re: DateTime::TimeZone issues...

2003-11-13 Thread Joshua Hoblitt
> Seriously, I'd like to eventually speed up/slim down the time zone stuff
> but just getting it working took an enormous amount of development effort.
> Making a super-fast whiz-bang version that still works is not trivial.

Maybe we should ask around to try and determine the level of interest in using 
DT::TZ::* under mod_perl.  If it's high we can apply for a linuxfund or TPF grant to 
fund Dave.  Of course those that have a _financial_ interest in this can fund Dave 
directly...

-J



Re: DateTime::TimeZone issues...

2003-11-12 Thread Dave Rolsky
On Thu, 13 Nov 2003, Rob Mueller wrote:

> The only way really to provide a fast (to initialize, and access) timezone
> DB is to either provide a DB (e.g. CDB, SDBM, etc) with the module, or have
> something in the make process that creates such a DB based on the DBM
> modules available on the user's system (or, as you mentioned, use structs
> with XS).

Sounds great.  I welcome patches and/or funding and/or a time machine so I
can develop these ;)

Seriously, I'd like to eventually speed up/slim down the time zone stuff
but just getting it working took an enormous amount of development effort.
Making a super-fast whiz-bang version that still works is not trivial.


-dave



Re: DateTime::TimeZone issues...

2003-11-12 Thread Rob Mueller
<...>
> > Having a look at the code, I noticed that each timezone has its own class,
> > and also a lot of data in perl structures. I'm not really sure why the
> > timezone classes were developed this way, it seems fine for a simple case
> > where you only need a couple of timezones, but in a case where you can
> > possibly be using ANY timezone in the same script, it seems a HUGE overhead
> > in memory and time to have to load all those structures into memory.
>
> Patches welcome ;)

That's probably how I would have responded too... ;-)
<...>
> > We're fine using our POSIX solution now, but I thought you folks might be
> > interested in this feedback - we chatted about it with Rick Measham at Perl
> > Mongers yesterday and he asked if we could provide this summary. It's
> > probably a good idea to keep web app developers in mind as you develop the
> > DateTime namespace, since it's a place where a lot of date/time calculations
> > in Perl are required.
>
> I do a heck of a lot of web dev, and I've used DateTime and the time zone
> classes without any problems.
>
> For example, you could just preload the time zones that your current users
> are using, which would almost certainly be a small fraction of all
> possible zones.
>
<...>
That's OK - we're happy with the solution we are using, and were just trying
to give back a bit by providing the feedback. We have users in nearly 200
countries with more all the time, so there are a lot of modules to
preload, and we don't know when starting mod_perl the first time which new
countries will be represented in the next batch of signups...

The only way really to provide a fast (to initialize, and access) timezone
DB is to either provide a DB (e.g. CDB, SDBM, etc) with the module, or have
something in the make process that creates such a DB based on the DBM
modules available on the user's system (or, as you mentioned, use structs
with XS).
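A rough sketch of the DBM option using Perl's core SDBM_File, where a zone's spans are read from disk only when that zone is requested. The key scheme ("Zone/Name|index"), `build_db` and `load_zone` are hypothetical; a real version would also have to cope with the DBM flavours available on each system, as Rob notes.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY O_RDWR O_CREAT);
use SDBM_File;
use File::Temp qw(tempdir);

# Build step (hypothetical layout): one DBM record per span, keyed
# "Zone/Name|index", plus a "Zone/Name|count" record. SDBM caps the size
# of each key/value pair at roughly 1KB, so a whole zone is not stored
# under a single key.
sub build_db {
    my ($path, %zones) = @_;
    tie my %db, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666
        or die "Cannot create $path: $!";
    for my $zone (keys %zones) {
        my $spans = $zones{$zone};
        $db{"$zone|count"} = scalar @$spans;
        $db{"$zone|$_"} = join ',', @{ $spans->[$_] } for 0 .. $#$spans;
    }
    untie %db;
}

# Runtime: read only the requested zone's spans from disk.
sub load_zone {
    my ($db, $zone) = @_;
    my $count = $db->{"$zone|count"};
    return unless defined $count;
    return [ map { [ split /,/, $db->{"$zone|$_"} ] } 0 .. $count - 1 ];
}

my $path = tempdir(CLEANUP => 1) . '/tz';
build_db($path, 'Zone/A' => [
    [ 0,      100000, 3600,   103600, 3600, 0, 'CET'  ],
    [ 100000, 200000, 107200, 207200, 7200, 1, 'CEST' ],
]);
tie my %db, 'SDBM_File', $path, O_RDONLY, 0666 or die "Cannot open $path: $!";
my $zone_a = load_zone(\%db, 'Zone/A');
print scalar(@$zone_a), " spans, first is $zone_a->[0][6]\n";
```

Because the tied hash only fetches the records you ask for, memory use grows with the zones actually in use rather than with the size of the whole database.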

Regards,
  Rob



Re: DateTime::TimeZone issues...

2003-11-12 Thread Dave Rolsky
On Thu, 13 Nov 2003, Rob Mueller wrote:

> ./perlbloat.pl 'use DateTime::TimeZone; $TZ{$_} = DateTime::TimeZone->new(name => $_) for (DateTime::TimeZone::all_names)'
> use DateTime::TimeZone; $TZ{$_} = DateTime::TimeZone->new(name => $_) for (DateTime::TimeZone::all_names) added 12.7M

But of course you don't actually need all of those time zones.  There are
quite a number of time zones that only have historical interest, for
example.  Then there are the ones for various South Pacific islands, the
Antarctic, etc.

So if you just loaded the ones you _needed_, it'd use a lot less memory.

> Having a look at the code, I noticed that each timezone has its own class,
> and also a lot of data in perl structures. I'm not really sure why the
> timezone classes were developed this way, it seems fine for a simple case
> where you only need a couple of timezones, but in a case where you can
> possibly be using ANY timezone in the same script, it seems a HUGE overhead
> in memory and time to have to load all those structures into memory.

Patches welcome ;)

I'm not sure how else we could provide an accurate and complete view of
the time zone database data, other than generating Perl modules.  We can't
rely on the database being present on any given system, much less being up
to date.  The C-level API makes it pretty hard to reasonably do something
like convert a datetime from one zone to another, because it's all
controlled by one environment variable.

Plus that probably doesn't even work on non-Unixy/POSIXy systems.
Remember, Perl runs in a lot of places, and there's no reason these
modules shouldn't work in all of them.

> In the end, we ended up going with the POSIX timezone related calls.
> Although they're pretty hacky, they give us what we want (a seconds offset
> from GMT for a given timezone name at a particular time) in a simple, quick
> interface without needing 13M of overhead!

Right, but that's _all_ you wanted.

> We're fine using our POSIX solution now, but I thought you folks might be
> interested in this feedback - we chatted about it with Rick Measham at Perl
> Mongers yesterday and he asked if we could provide this summary. It's
> probably a good idea to keep web app developers in mind as you develop the
> DateTime namespace, since it's a place where a lot of date/time calculations
> in Perl are required.

I do a heck of a lot of web dev, and I've used DateTime and the time zone
classes without any problems.

For example, you could just preload the time zones that your current users
are using, which would almost certainly be a small fraction of all
possible zones.
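A sketch combining this suggestion with Rob's concern about zones nobody has used yet: preload the expected zones in the parent and build anything else on first use. `make_tz_cache` is a hypothetical helper; the constructor is passed in so the sketch runs without DateTime::TimeZone installed. Zones built lazily inside a mod_perl child are private to that child, not shared with the others.

```perl
use strict;
use warnings;

# Hypothetical helper: preload the zones you expect in the parent process,
# and build any other zone on first use. $make_tz turns a zone name into a
# time zone object; with DateTime::TimeZone it would be
#     sub { DateTime::TimeZone->new( name => $_[0] ) }
sub make_tz_cache {
    my ($make_tz) = @_;
    my %cache;
    return {
        # Call from startup.pl so these objects are shared by every child.
        preload => sub { $cache{$_} ||= $make_tz->($_) for @_; return scalar keys %cache },
        # Call per request; zones not preloaded are built lazily in the child.
        get     => sub { my ($name) = @_; return $cache{$name} ||= $make_tz->($name) },
    };
}

# Stand-in constructor so the sketch runs without DateTime::TimeZone.
my $tz = make_tz_cache( sub { return { name => $_[0] } } );
$tz->{preload}->( 'America/Chicago', 'Europe/London' );
print $tz->{get}->('Asia/Tokyo')->{name}, "\n";
```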

Since the time zone classes are generated, it'd be possible to generate XS
code instead of Perl.  Patches or a shipment of tuits would be extremely
welcome.


-dave



DateTime::TimeZone issues...

2003-11-12 Thread Rob Mueller
Hi

I help develop an email website (http://www.fastmail.fm) and recently we
wanted to move over to providing proper timezone support for users (e.g. give
us a location, and we'll keep the time up to date, rather than having to
change for daylight savings every 6 months). As part of this, we were going
to use the DateTime::TimeZone modules, however it's turned out to be a bit
of a problem.

Basically we're using a mod_perl environment, and we have lots of users all
around the world. Because we end up wanting to use lots of different
timezones, and often a different one for every web request, it's generally a
good idea to pre-load the modules in the parent process so that all the data
is shared by each child process. Additionally, it's often even worthwhile
pre-instantiating each timezone in the parent, and storing it in a hash for
later retrieval, rather than constructing a TimeZone object during each
request.

So my initial code, in the startup.pl phase of our mod_perl Apache
server, was:

use DateTime::TimeZone;
$::TZ{$_} = DateTime::TimeZone->new(name => $_)
  for (DateTime::TimeZone::all_names());

So this will force loading of all timezones at startup, which will then be
shared amongst all children.

Then in the code you can get a timezone object with just:

my $TZ = $::TZ{$ZoneName};

The problem was I just didn't realise HOW much timezone data needs to be
loaded...

./perlbloat.pl 'use DateTime::TimeZone; $TZ{$_} = DateTime::TimeZone->new(name => $_) for (DateTime::TimeZone::all_names)'
use DateTime::TimeZone; $TZ{$_} = DateTime::TimeZone->new(name => $_) for (DateTime::TimeZone::all_names) added 12.7M

So just loading all those timezone classes takes 12.7M of RAM. That
increases our process size by almost 50% over its current size. Now this
all gets shared in the children, but it's still an issue on some of our
development machines which have less RAM (they're Linux in VMware), and on
my 1GHz PIII laptop it takes almost 4 seconds just to load this:

[robm test]$ time perl -e 'use DateTime::TimeZone; $TZ{$_} = DateTime::TimeZone->new(name => $_) for (DateTime::TimeZone::all_names)'

real    0m3.821s
user    0m2.970s
sys     0m0.500s

Having a look at the code, I noticed that each timezone has its own class,
and also a lot of data in perl structures. I'm not really sure why the
timezone classes were developed this way, it seems fine for a simple case
where you only need a couple of timezones, but in a case where you can
possibly be using ANY timezone in the same script, it seems a HUGE overhead
in memory and time to have to load all those structures into memory.

In the end, we ended up going with the POSIX timezone related calls.
Although they're pretty hacky, they give us what we want (a seconds offset
from GMT for a given timezone name at a particular time) in a simple, quick
interface without needing 13M of overhead!
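Rob's POSIX code wasn't posted; the general technique he describes (point TZ at the zone, call tzset, read the local time) might look like the sketch below. `tz_offset` is a hypothetical name. It changes process-wide state, so it isn't thread-safe, and names like America/New_York depend on the system's zoneinfo data, which is the portability concern Dave raises in his reply.

```perl
use strict;
use warnings;
use POSIX qw(tzset);
use Time::Local qw(timegm);

# Seconds east of UTC for time zone $name at epoch second $epoch.
# Works by pointing the process-wide TZ variable at the zone.
sub tz_offset {
    my ($name, $epoch) = @_;
    my $saved = $ENV{TZ};
    $ENV{TZ} = $name;
    tzset();
    my @lt = localtime $epoch;    # wall-clock fields in $name
    # Restore the previous setting before returning.
    if (defined $saved) { $ENV{TZ} = $saved } else { delete $ENV{TZ} }
    tzset();
    # Reinterpret the wall-clock fields as UTC; the difference is the offset.
    return timegm(@lt[0 .. 4], $lt[5] + 1900) - $epoch;
}

# 1057017600 is 2003-07-01 00:00:00 UTC; the result depends on system zoneinfo.
print tz_offset('America/New_York', 1057017600), "\n";
```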

I find this a bit of a pity, because we'd really hoped to move more and more
of our time-related work over to the DateTime modules, since it's a really
nice looking library. However, for now we've found the TimeZone classes
impractical in a persistent Perl environment (e.g. mod_perl, Net::Server
daemons, etc).

We're fine using our POSIX solution now, but I thought you folks might be
interested in this feedback - we chatted about it with Rick Measham at Perl
Mongers yesterday and he asked if we could provide this summary. It's
probably a good idea to keep web app developers in mind as you develop the
DateTime namespace, since it's a place where a lot of date/time calculations
in Perl are required.

Regards,
  Rob