The intent of forks::BerkeleyDB is to abstract away the usual grunt
effort of writing an application against SysV, BerkeleyDB, or some
socket-communication framework directly, and instead add this
capability to an interface Perl developers are likely already familiar
with: the ithreads API.  Yet unlike standard perl ithreads, this model
works when the perl library is embedded in a forking application, such
as Apache httpd.
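
For illustration, here is a minimal sketch of that drop-in usage (the
module names are real; the variable names and the 'answer' key are
purely illustrative):

    # A minimal sketch: swap in forks::BerkeleyDB for threads and
    # threads::shared, and the familiar ithreads API carries over, with
    # shared data backed by BerkeleyDB instead of confined to one process.
    use forks::BerkeleyDB;            # drop-in for 'use threads'
    use forks::BerkeleyDB::shared;    # drop-in for 'use threads::shared'

    my %cache : shared;               # visible to parent and all children

    my $thr = threads->create(sub {
        $cache{answer} = 42;          # written in a child thread (process)
    });
    $thr->join;
    print "$cache{answer}\n";         # readable back in the parent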

With Apache2, it is possible to share variables between threads within
a single multi-threaded process, but that does nothing to let all
handlers (processes) see the same shared memory.  The same "data
orphaned in a process" issue applies to native perl hashes.  Thus, you
shouldn't expect anywhere near native-hash performance from an IPC
model like forks::BerkeleyDB: it's an apples-to-oranges comparison.

You could try an existing TCP-oriented cache mechanism like memcached
(via Cache::Memcached, or Cache::Memcached::XS for better performance),
but in my experience TCP daemon models (no matter how well tuned) are
consistently slower for frequent, small data accesses than most
in-memory IPC models, such as SysV shmem or BerkeleyDB (with shared
memory caching enabled).  Some TCP models, memcached among them, do
excel at handling extremely large hashes of data, but they are
inherently limited in raw performance by their socket interface.

With regard to IPC models, my choice of BerkeleyDB over SysV shared
memory (the end result being forks::BerkeleyDB) was based on
BerkeleyDB's native database types, which efficiently support _very_
large arrays and hashes.  It has outstanding tablespace optimizations
that both prevent long-term memory fragmentation as elements are added
and deleted, and reclaim deleted space.  Additionally, BerkeleyDB has
an optimized, transparent shared memory interface that keeps the most
actively accessed data in physical memory, eliminating unnecessary
physical disk overhead.
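
Under the hood this is plain BerkeleyDB; a rough sketch of the kind of
setup involved (the directory, filename, and key here are illustrative,
not necessarily what forks::BerkeleyDB itself uses):

    # Sketch: a BerkeleyDB hash tied through an environment whose memory
    # pool (DB_INIT_MPOOL) keeps hot pages cached in shared memory, so
    # most accesses never touch the disk.
    use BerkeleyDB;

    mkdir '/tmp/bdb-env';             # illustrative environment directory

    my $env = BerkeleyDB::Env->new(
        -Home  => '/tmp/bdb-env',
        -Flags => DB_CREATE | DB_INIT_MPOOL | DB_INIT_CDB,
    ) or die "env: $BerkeleyDB::Error";

    tie my %h, 'BerkeleyDB::Hash',
        -Filename => 'shared.db',
        -Env      => $env,
        -Flags    => DB_CREATE
        or die "tie: $BerkeleyDB::Error";

    $h{key} = 'value';                # cached in the mpool, lazily flushed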

A comparison of BerkeleyDB to many other in-memory Perl cache modules
may be seen here:
        http://cpan.robm.fastmail.fm/cache_perf.html

Here are results from an UltraSPARC III 1.4GHz Solaris 9 system with a
15K RPM drive.  This example fetches every value from a hash of 100K
elements:

perl -Mforks::BerkeleyDB -Mforks::BerkeleyDB::shared -MBenchmark=:all \
  -e 'my %h:shared=(1..100000); my $i; my $a;
      timethis(100000, sub { $a=$h{$i++} })'

timethis 100000:  9 wallclock secs ( 8.79 usr +  0.01 sys =  8.80 CPU)
@ 11363.64/s (n=100000)


Given similar hardware, this does appear to meet your requirement of at
most 1000 hash lookups per request in under 0.8 seconds of CPU time.
To make forks::BerkeleyDB really shine on Linux, you can further reduce
CPU overhead and eliminate disk I/O wait time by allocating a ramdisk
and re-mapping the location of all BerkeleyDB files to it:

http://www.vanemery.com/Linux/Ramdisk/ramdisk.html
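
On a recent Linux kernel, a tmpfs mount is the simplest way to get a
ramdisk (the mount point and size below are illustrative; run as root):

    # Create a RAM-backed filesystem to hold the BerkeleyDB files.
    mkdir -p /mnt/bdb-ram
    mount -t tmpfs -o size=256m tmpfs /mnt/bdb-ram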

Then do:

export TMPDIR=/path/to/ramdisk
perl -Mforks::BerkeleyDB -Mforks::BerkeleyDB::shared yourscript.pl


As always, the perl ithreads model isn't a solution for every
multi-process perl problem.  My hope is that forks::BerkeleyDB allows
problems such as yours to be solved simply and elegantly, without
sacrificing too much performance.

If these modules still don't meet your performance requirements, then
it's unlikely that any IPC model will be a good fit for your problem:
you may need to re-evaluate the architectural design of your large
in-memory hash.

I hope this helps.

-Eric

--- Alvar Freude <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> -- Alvar Freude <[EMAIL PROTECTED]> wrote:
> 
> > hmmm, for a typical page I need between 50 and 1000 hash lookups.
> And
> > sometimes more and sometimes fewer. But I'll make some tests
> 
> so, the results for a first test with small hash lookups are:
> With forks::BerkeleyDB it is about 10 to 20 times slower than with an
> unshared hash, and about 10 times faster than with the usual
> forks::shared.
> 
> That's pretty fast, but for my problem it seems that the solution
> with an extra daemon is better.
> 
> 
> Ciao
>   Alvar
> 
> 
> -- 
> ** Alvar C.H. Freude, http://alvar.a-blast.org/
> ** http://www.assoziations-blaster.de/
> ** http://www.wen-waehlen.de/
> ** http://odem.org/
> 
