On Wed, Dec 10, 2003 at 01:27:08AM -0700 Desree Sowers wrote:
> Hi Tassilo, and thanks for the prompt reply,
>
> I tried your changes and they seemed to work better than what I had.
>
> My only problem is that I have 65+Mb files that I can read in a couple of
> seconds, but I have 200Mb files that take over 4 minutes(!). I suspect the
> problem is in the fread statements. I seem to be spending more time on
> those than anything else.
I am not sure. What you do (in a nutshell):
while (!feof(file)){
...
XPUSHs(sv_2mortal(newRV_noinc((SV*)AoA)));
}
Each array-ref you push onto the stack refers to an array of three
elements. Since the three array elements are just one integer and two
doubles packed as chars, you must push an awful lot of those
references onto the stack. I don't see you fseek()ing forward in the
file, so every record gets read. Each record is
sizeof(int)+2*sizeof(double) bytes, which is 20 bytes on most
platforms. A file of 200MB (== 209,715,200 bytes) divided by 20 yields
10,485,760 records. If you put this many array-references onto the
stack (each one eating considerably more than 20 bytes), you'll run
into trouble with your RAM. So what could happen is that at some point
your OS starts swapping data out to disk, which means a *massive*
slow-down.
You have to think of other ways to make this data accessible. What do
you do with the huge list of array-references that you return from your
XSUB? Maybe you should consider a callback mechanism. Instead of
@huge_array_of_refs = get_data();
you do
sub process_data {
my @rec = @{ shift() };
...
}
get_data(\&process_data);
In other words: your XSUB receives an additional argument, a
reference to a Perl subroutine, which gets called for each record you
extract from your data. This way, you never keep more than one record in
memory. If you need more than one record at a time, the callback
function can store records in an array and later remove those no longer
needed (maybe as a stack?). This model is quite flexible, because
get_data() could, for instance, return only what process_data() returns
(similar to what map() does).
Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval