Peter,

--- "Lobsingerp,Peter [CIS]" <[EMAIL PROTECTED]> wrote:

> As for sharing resources, the data that I am analysing is 3-4 GB (more
> than half of what I have available in RAM), so I wasn't too keen to
> copy all of this into 101-1001 processes, although forks.pm shows
> promise.
> Sounds a lot easier than convincing my sysadmin to recompile Perl.
> 

Given this memory usage, I also *highly* recommend you evaluate
forks::BerkeleyDB:

http://search.cpan.org/~rybskej/forks-BerkeleyDB-0.03/lib/forks/BerkeleyDB/shared.pm

It is an add-on module for forks.pm that transparently moves all shared
variable data into separate, high-performance BerkeleyDB databases.


This means *all* your shared data will be stored on and accessed from
the physical drive, with very efficient caching of commonly accessed
data via the BerkeleyDB shared memory architecture.  In your case, it
sounds like this should result in huge memory savings, allowing you to
run many more threads.
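
For example, the switch might look something like this (just a sketch;
the file name, record format, and keys are made up, not your actual
code):

    # Sketch only: load a large read-only dataset into a BerkeleyDB-backed
    # shared hash instead of copying it into every worker process.
    use strict;
    use warnings;
    use forks::BerkeleyDB;           # drop-in replacement for threads
    use forks::BerkeleyDB::shared;   # shared vars live in BerkeleyDB, not per-process RAM

    my %data : shared;

    # Load the data once in the main thread.
    open my $fh, '<', 'dataset.txt' or die "open dataset.txt: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;
        $data{$key} = $value;
    }
    close $fh;

    # Workers read the shared hash; they never get their own multi-GB copy.
    my @workers = map {
        threads->new(sub {
            my $seen = 0;
            while (my ($key, $value) = each %data) {
                $seen++;             # stand-in for the real analysis
            }
            return $seen;
        });
    } 1 .. 10;

    print $_->join, " records seen\n" for @workers;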

You could further optimize performance if you were to set up a RAM disk
to logically remap all this data back into memory:

http://search.cpan.org/~rybskej/forks-BerkeleyDB-0.03/lib/forks/BerkeleyDB/shared.pm#Location_of_database_files

which may seem like an odd workaround, except that you then have an
ithreads application that benefits from fork() copy-on-write but still
keeps all shared data in RAM.  I've actually done this for one
forks::BerkeleyDB real-time data processing application that required
fast access to cached data stored in nested forks::BerkeleyDB::shared
hashes.
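
If you do try the RAM disk approach, the only application-side change is
telling forks::BerkeleyDB where to put its database files (the exact
knob is described in the doc section linked above).  Roughly:

    # Sketch only: keep the BerkeleyDB files on a RAM disk (e.g. a Linux
    # tmpfs mounted at /mnt/ramdisk) so the shared data effectively stays
    # in memory while the processes still benefit from copy-on-write.
    # CAVEAT: the environment variable name below is an assumption from
    # memory; verify it against the "Location of database files" section.
    BEGIN {
        $ENV{THREADS_BDB_ENV_ROOT} = '/mnt/ramdisk/forks-bdb';   # assumed name
    }

    use forks::BerkeleyDB;
    use forks::BerkeleyDB::shared;

    # ... the rest of the application is unchanged ...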

forks::BerkeleyDB also relieves forks::shared of its heaviest IPC socket
load (shared data access), so it massively improves overall
forks::shared performance under medium- to high-volume shared data
access (to a level surprisingly comparable to native threads::shared,
in some preliminary analysis I've conducted).

> Another thing to note is that I do not use locking because this is an
> analysis and I do not write to my data.
> 

Both forks::shared and forks::BerkeleyDB::shared support safe,
concurrent reads of shared data. (I'm pretty certain this is also safe
with native threads::shared.)

In general, if you do modify shared data while other threads read it,
you must lock around all associated reads to ensure the data state is
consistent; otherwise, it's "at your own risk" to skip locking (and may
even affect application stability with native threads::shared).
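
To make the rule concrete, the write-plus-read pattern looks roughly
like this (illustrative only; in your read-only analysis none of these
locks are needed):

    # Sketch of the locking rule above: if any thread writes, readers must
    # also lock around their reads so related fields are never seen
    # half-updated.  Variable names are illustrative.
    use strict;
    use warnings;
    use forks;
    use forks::shared;

    my %stats : shared;
    $stats{count} = 0;
    $stats{sum}   = 0;

    my $writer = threads->new(sub {
        for my $i (1 .. 1000) {
            lock(%stats);            # writer holds the lock while updating
            $stats{count} = $i;
            $stats{sum}  += $i;
        }
    });

    my $reader = threads->new(sub {
        for (1 .. 1000) {
            lock(%stats);            # reader locks so count and sum match
            my ($count, $sum) = @stats{qw(count sum)};
            # ... use $count and $sum together here ...
        }
    });

    $_->join for $writer, $reader;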

Regards,
Eric
