Wow, got back to this thread this afternoon and I see mucho discussion.

Early this morning I had tried a few things with my test program and
confirmed (to the extent such is possible) that the problem results
from the apparent extra blocks allocated in valgrind's support of my
program's repeated realloc's.

Bart suggested that I try --freelist-vol=0, but when I did so I saw no
difference (I will try again to make sure).

Phillippe suggested I try a patch (
https://bugs.kde.org/show_bug.cgi?id=250101 ).  I will look into how
much effort is involved.  I haven't previously built valgrind from
source.  Sounds like it is in SVN (which I have used for my own
stuff).  Probably easy to build but I have a couple other ideas to try
before I will go that route.

I'm now going to focus on my original program, and I'm not planning to
bother folks here with that until I have something useful to post.
But I will give you a brief overview of why the program has that
allocation pattern, and point you at the source code (only if you are
curious), and then mention one idea I have to address that usage
pattern.

The program is a (fairly popular) DNA sequence aligner called lastz,
written by me, which you can find here:
    http://www.bx.psu.edu/~rsharris/lastz/newer
The latest version is 1.02.37.  I assume it still exhibits the
problem, though not in the normal build.  I discovered the problem
last week (which technically predates the release) during routine
testing on a large-memory build that I have recently added to the
source but have not yet made available in the distributed Makefile.
If anyone is curious and wants to build the configuration that fails,
let me know.  (I certainly don't expect this, which is why I didn't
post any specifics about my real program in the first post.)

Anyway, the reason for its allocation pattern is that I was loading in
the entire human genome, chromosome by chromosome.  m14 in the demo
program corresponds to one large string of ACGT (mostly) which will
hold the concatenation of all the chromosomes.  chr1 is about 250M,
chr2 is a little smaller, and so on.  Since the program doesn't know
the overall size a priori, it reallocs the block as each chromosome
"arrives".  Because of the way the genome file is stored, the realloc
pattern gets the biggest stuff first, and then at the end adds lots of
little pieces (the genome file contains more than the usual 23
chromosomes, with a lot of little "chrumbs").
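
For anyone who wants to picture the growth pattern without opening the
source, here is a minimal sketch.  It is not the lastz code: the seqbuf
type and the seqbuf_append name are made up for illustration, and it
grows the block to exactly the new total on every call, which is my
assumption about the simplest form of the pattern.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; names are illustrative, not from lastz. */
typedef struct {
    char  *data;  /* concatenated sequence so far (NULL when empty) */
    size_t len;   /* number of sequence bytes in use */
} seqbuf;

/* Append one chromosome by reallocating the block to the new total size.
   Returns 0 on success, -1 if realloc fails (the old block stays valid). */
static int seqbuf_append(seqbuf *b, const char *chrom, size_t n)
{
    assert(b != NULL && chrom != NULL);

    /* Keep the result in a temporary so a failure does not lose b->data. */
    char *grown = realloc(b->data, b->len + n + 1);  /* +1 for terminator */
    if (grown == NULL)
        return -1;

    memcpy(grown + b->len, chrom, n);
    b->len += n;
    grown[b->len] = '\0';
    b->data = grown;
    return 0;
}
```

Each call may move the whole block, so with the biggest chromosomes
arriving first, every later "chrumb" can trigger another multi-gigabyte
reallocation.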

The later-allocated 12G block will hold an index of the positions of
all the 12-letter words in the genome.  This size is also related to
the overall size of the genome, but at the point that I allocate it, I
know the size of the genome.
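
To make "12-letter words" concrete: with a four-letter alphabet there
are 4^12 = 16,777,216 possible words, so each word fits in 24 bits at
2 bits per base.  The sketch below shows one common way to turn a word
into such a number.  It is an illustration only; pack_word and the
A/C/G/T to 0/1/2/3 mapping are assumptions, not a description of how
lastz actually builds its index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_LEN 12  /* letters per indexed word */

/* Pack WORD_LEN bases into a 24-bit code, 2 bits per base.
   Returns -1 if the word contains anything other than A, C, G, T
   (including a string that ends before WORD_LEN letters). */
static int32_t pack_word(const char *s)
{
    assert(s != NULL);

    uint32_t code = 0;
    for (size_t i = 0; i < WORD_LEN; i++) {
        uint32_t bits;
        switch (s[i]) {
            case 'A': bits = 0; break;
            case 'C': bits = 1; break;
            case 'G': bits = 2; break;
            case 'T': bits = 3; break;
            default:  return -1;  /* e.g. N, or end of string */
        }
        code = (code << 2) | bits;
    }
    return (int32_t)code;
}
```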

The usual build of the program, the build that people have been using
for years, has a much smaller limit on m14.  Typically the largest
input would be a single chromosome, so about 250M, and the index is
correspondingly smaller.  The program was originally designed to fit
in a 2G footprint.

Now, what I intend to try, to work around the problem, is to predict
the overall size and do a single allocation (not exactly rocket
science, I know).  Depending on the input file format I can derive
this from the file before I start reading chromosomes.  Or, I can
allow the user to estimate it.  For my immediate purpose, the user
(me) will provide it.
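
A minimal sketch of that workaround, assuming the estimate is simply
handed in as a byte count.  The fixedbuf type and both function names
are hypothetical, and an append that would overrun the estimate is
reported as an error instead of falling back to realloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-capacity buffer; names are illustrative only. */
typedef struct {
    char  *data;  /* single block, allocated once */
    size_t len;   /* sequence bytes in use */
    size_t cap;   /* capacity in sequence bytes (the estimate) */
} fixedbuf;

/* Allocate the whole block up front.  Returns 0 on success, -1 on failure. */
static int fixedbuf_init(fixedbuf *b, size_t estimate)
{
    assert(b != NULL);
    b->data = malloc(estimate + 1);  /* +1 for terminator */
    if (b->data == NULL)
        return -1;
    b->data[0] = '\0';
    b->len = 0;
    b->cap = estimate;
    return 0;
}

/* Copy one chromosome into the block; never reallocates.
   Returns -1 if the estimate turns out to be too small. */
static int fixedbuf_append(fixedbuf *b, const char *chrom, size_t n)
{
    assert(b != NULL && chrom != NULL);
    if (n > b->cap - b->len)
        return -1;
    memcpy(b->data + b->len, chrom, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}
```

An estimate that is too large only wastes address space, while one
that is too small shows up as a clean error at the offending
chromosome.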

This won't change the fact that I still need to decide if I really
have a problem.  I think I probably do.  To refresh your memory, the
real program didn't fail without valgrind, but segfaulted with it (see
first post if you are curious).  I don't think it's likely that the
real program's failure is due to me failing to check for a NULL, as it
was in the test case (another refresher: my program is supposed to
check all allocations and commit suicide on any failure).  The more I
think about it, the more I suspect that the reallocation problem is a
red herring, and that there is some other problem with my code in the
larger-memory build.  My hope is that by allocating m14 in one call I
will expose that hypothesized error.
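
For reference, "check every allocation and die on failure" usually
takes the shape of a pair of wrappers like the ones below.  This is a
generic sketch, not the lastz routines; the names malloc_or_die and
realloc_or_die and the message format are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocate or terminate the program; callers never see NULL
   for a nonzero request. */
static void *malloc_or_die(const char *what, size_t bytes)
{
    assert(what != NULL);
    void *p = malloc(bytes);
    if (p == NULL && bytes != 0) {
        fprintf(stderr, "out of memory: %s (%zu bytes)\n", what, bytes);
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Resize or terminate the program; same contract as malloc_or_die. */
static void *realloc_or_die(const char *what, void *old, size_t bytes)
{
    assert(what != NULL);
    void *p = realloc(old, bytes);
    if (p == NULL && bytes != 0) {
        fprintf(stderr, "out of memory: %s (%zu bytes)\n", what, bytes);
        exit(EXIT_FAILURE);
    }
    return p;
}
```

If every allocation goes through wrappers of this kind, a segfault
cannot come from an unchecked NULL return, which is why a different
bug in the larger-memory build looks like the more likely explanation.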

Thanks again for the attention this has gotten!
Bob H

_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
