> > I'd be interested in any accumulated wisdom on how ithreads scale to
> > large programs wrt the copying required to create a new thread. I
> > suppose I could write some test scripts...if they don't already exist
> > somewhere...
>
> Maybe an addition to Benchmark.pm, which would just benchmark creating an
> empty thread thus:
>
> threads->new( sub {} )->join;
>
> would give some indication?
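The suggestion above could be fleshed out along these lines (just a sketch, assuming a perl built with ithreads; the iteration count of 100 is arbitrary):

```perl
#!/usr/bin/perl
# Sketch of the Benchmark.pm idea: time repeated spawn/join cycles of an
# empty thread. Requires a perl compiled with ithreads support.
use strict;
use warnings;
use threads;
use Benchmark qw(timethis);

# Run 100 spawn/join cycles and report wall-clock and CPU time.
timethis(100, sub { threads->new( sub {} )->join });
```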
What I'm wondering is how execution times for the following compare:
1. a program that just creates and joins a thread
2. a program that allocates a LOT of data and then
creates and joins a thread
3. a program that allocates a LOT of data, makes it shared,
and then creates and joins a thread
4. a program that uses a LOT of packages and then creates
and joins a thread.
?. any other combinations that seem interesting
The question is how thread allocation times change with the scale of the
program that employs them. I'm sure there is some change, but it may in
fact be minimal. Seems like a good thing to know.
-------------------------------------------------
So...I've hacked together a naive test script to see what happens. I'm
attaching it in case anyone else is interested. The script runs itself a
number of times (to avoid having a non-portable script file alongside it)
and collects data for the different cases corresponding to #1-4 above.
For the data allocation tests I create a tree of hash references. The tree
goes X levels deep (X=1..5) and each level fans out by a factor of 6 (why 6?
why not?). The table shows the number of hashes so allocated.
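For the curious, the allocation could look roughly like this (a sketch only; build_tree and count_nodes are my names, not necessarily those used in the attached script):

```perl
use strict;
use warnings;

# A tree of nested hash references, fanned out by 6 at each level.
my $FANOUT = 6;

sub build_tree {
    my ($levels) = @_;            # levels of expansion below the root
    return {} if $levels == 0;
    return { map { $_ => build_tree($levels - 1) } 1 .. $FANOUT };
}

# Count every hash in the tree, root included -- roughly what the spawned
# thread does in test cases 2 and 3.
sub count_nodes {
    my ($node) = @_;
    my $n = 1;
    $n += count_nodes($_) for grep { ref eq 'HASH' } values %$node;
    return $n;
}

print count_nodes(build_tree(2)), "\n";   # 1 + 6 + 36 = 43, matching data=2
```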
Here's the results from my Windows 2000 box:
program (s)  thrd spawn (s)  thrd exec (s)  hashes  test #
-----------------------------------------------------------
00.050074, 00.040059, 00.010015, 000000, simple (1)
02.894278, 01.762605, 00.110163, 000000, packages (4)
00.050074, 00.040059, 00.010015, 000007, data=1 (2)
00.070103, 00.040060, 00.020029, 000043, data=2 (2)
00.100148, 00.050074, 00.030044, 000259, data=3 (2)
00.430636, 00.200296, 00.130192, 001555, data=4 (2)
19.108239, 17.766255, 00.751110, 009331, data=5 (2)
00.050074, 00.040059, 00.010015, 000007, shared data=1 (3)
00.080119, 00.030044, 00.030045, 000043, shared data=2 (3)
00.250370, 00.040059, 00.100148, 000259, shared data=3 (3)
01.211791, 00.030044, 00.540800, 001555, shared data=4 (3)
06.970301, 00.040059, 03.084559, 009331, shared data=5 (3)
It kind of does what I thought it would. For example:
Loading a pile of packages slows down thread allocation. It also slows down
thread execution somehow, which I find strange, since the thread doesn't
actually _use_ the packages that were loaded. Perhaps some of the data
duplication is actually done during thread execution?
The more global data is allocated prior to thread launch, the more time it
takes to launch the thread. It also slows down thread execution, but that
is to be expected, since the thread is actually counting the nodes of the
allocated data tree.
But _shared_ global data allocated prior to thread launch has little or no
effect on thread startup time. This is to be expected. On the other hand,
the thread execution time (involving counting the data nodes) is _way_ up
(looks like maybe a factor of 4), probably due to data locking/unlocking.
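For reference, the shared-data cases differ from the plain ones roughly as below (a sketch, not the attached script: the structure is deep-shared with threads::shared's shared_clone() before the thread starts, so the child thread traverses the same shared aggregates instead of private copies):

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Deep-share a small nested structure before spawning the thread. Every
# access to a shared aggregate goes through perl's internal locking, which
# would account for the slower in-thread traversal.
my $plain  = { a => { x => 1 }, b => { y => 2 } };
my $shared = shared_clone($plain);

my $count = threads->new(sub {
    # Iteratively count the hashes reachable from the (shared) root.
    my $n     = 1;
    my @stack = ($shared);
    while (my $node = pop @stack) {
        for (values %$node) {
            next unless ref eq 'HASH';
            $n++;
            push @stack, $_;
        }
    }
    return $n;
})->join;

print "$count\n";   # 3: the root plus its two child hashes
```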
Note that the _total_ thread or program time for the largest data test is
longer for unshared data. With the largest amount of data it seems to be
better to share than not to share. This holds up at the next highest data
level as well. This might have some interesting implications for programs
with large amounts of preloaded data.
-------------------------------------------------
Having gotten this far, I think that the next set of tests would explore
running more than one thread sequentially for each test, using the thread
pool, and running multiple threads in parallel. It would also be interesting
to look at memory sizes (and GC behavior) for all of these cases. Don't
know when I'll get to that...if ever. Kind of depends on whether or not
there is any interest in this.
mma
threading.pl