Micah Cowan [EMAIL PROTECTED] writes:
I agree that it's probably a good idea to move HTML parsing to a model
that doesn't require slurping everything into memory;
Note that Wget mmaps the file whenever possible, so it's not actually
allocated on the heap (slurped). You need some memory to
Hrvoje Niksic wrote:
Micah Cowan [EMAIL PROTECTED] writes:
I agree that it's probably a good idea to move HTML parsing to a model
that doesn't require slurping everything into memory;
Note that Wget mmaps the file whenever possible, so it's
Micah Cowan [EMAIL PROTECTED] writes:
Yes, but when mmap()ping with MAP_PRIVATE, once you actually start
_using_ the mapped space, is there much of a difference?
As long as you don't write to the mapped region, there should be no
difference between shared and private mapped space -- that's
Hrvoje Niksic wrote:
mmap() isn't failing; but wget's memory space gets huge through the
simple use of memchr() (on '<', for instance) on the mapped address
space.
Wget's virtual memory footprint does get huge, but the resident memory
needn't.
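
To make the virtual-versus-resident distinction concrete, here is a minimal sketch of scanning a read-only MAP_PRIVATE mapping with memchr(). The helper name count_byte_mapped is hypothetical and is not taken from Wget's sources. The mapping reserves address space for the whole file up front, but pages become resident only as the scan touches them, and because they are never written they stay clean and file-backed, so the kernel is free to drop them under memory pressure.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not a Wget function): count occurrences of the
   byte CH in the file at PATH by scanning a read-only private mapping.
   Returns the count, or -1 if the file cannot be opened or mapped.  */
long count_byte_mapped(const char *path, int ch)
{
  assert(path != NULL);

  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return -1;

  struct stat st;
  if (fstat(fd, &st) < 0)
    {
      close(fd);
      return -1;
    }

  size_t size = (size_t) st.st_size;
  if (size == 0)
    {
      /* mmap() rejects a zero-length mapping; nothing to scan anyway.  */
      close(fd);
      return 0;
    }

  void *map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);                    /* the mapping outlives the descriptor */
  if (map == MAP_FAILED)
    {
      perror("mmap");
      return -1;
    }

  /* Each memchr() call touches pages sequentially; only touched pages
     become resident, and none of them are ever dirtied.  */
  const char *p = map;
  const char *end = p + size;
  long count = 0;
  while (p < end && (p = memchr(p, ch, (size_t) (end - p))) != NULL)
    {
      count++;
      p++;
    }

  munmap(map, size);
  return count;
}
```

Calling count_byte_mapped("index.html", '<') on a large file and watching the process in a tool such as top should show the virtual size jump by the file size at once, while the resident size depends on how much of the file has been touched and on what the kernel has chosen to keep.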
Sorry, I should've been clearer: specifically,
Micah Cowan [EMAIL PROTECTED] writes:
Actually, I was wrong though: sometimes mmap() _is_ failing for me
(did just now), which of course means that everything is in resident
memory.
I don't understand why mmapping a regular file would fail on Linux. What
error code are you getting?
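
One way to answer the error-code question is to print errno at the point where mmap() fails, before falling back to reading the file into heap memory. The following is only a sketch under stated assumptions: struct file_view, load_file and release_file are hypothetical names for illustration and are not Wget's actual code. It also shows why a failed mmap() matters for memory use: the fallback buffer is ordinary heap memory holding the whole file, rather than clean file-backed pages.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical container: DATA is either a mapping or a malloc'd copy.  */
struct file_view
{
  char *data;
  size_t length;
  int mapped;                   /* nonzero if DATA came from mmap() */
};

/* Try mmap() first; on failure report the errno text and fall back to
   read() into a heap buffer.  Returns 0 on success, -1 on error.  */
int load_file(const char *path, struct file_view *out)
{
  assert(path != NULL && out != NULL);
  out->data = NULL;
  out->length = 0;
  out->mapped = 0;

  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return -1;

  struct stat st;
  if (fstat(fd, &st) < 0)
    {
      close(fd);
      return -1;
    }
  size_t size = (size_t) st.st_size;
  if (size == 0)
    {
      close(fd);
      return 0;                 /* empty file: nothing to map or read */
    }

  void *map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
  if (map != MAP_FAILED)
    {
      close(fd);
      out->data = map;
      out->length = size;
      out->mapped = 1;
      return 0;
    }

  /* This message is the "error code" asked about above.  */
  fprintf(stderr, "mmap(%s): %s; falling back to read()\n",
          path, strerror(errno));

  char *buf = malloc(size);
  if (buf == NULL)
    {
      close(fd);
      return -1;
    }
  size_t got = 0;
  while (got < size)
    {
      ssize_t n = read(fd, buf + got, size - got);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;
          free(buf);
          close(fd);
          return -1;
        }
      if (n == 0)
        break;                  /* file shrank while being read */
      got += (size_t) n;
    }
  close(fd);
  out->data = buf;
  out->length = got;
  return 0;
}

/* Release whichever kind of storage load_file() handed out.  */
void release_file(struct file_view *view)
{
  if (view->data == NULL)
    return;
  if (view->mapped)
    munmap(view->data, view->length);
  else
    free(view->data);
  view->data = NULL;
  view->length = 0;
}
```

Typical errno values worth distinguishing in the message are ENOMEM (address space or mapping limits exhausted) and ENODEV (the underlying filesystem does not support mapping).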
(Wget tries
Hrvoje Niksic wrote:
Micah Cowan [EMAIL PROTECTED] writes:
Actually, I was wrong though: sometimes mmap() _is_ failing for me
(did just now), which of course means that everything is in resident
memory.
I don't understand why mmapping a
Micah Cowan wrote:
A bug report made to Savannah
(https://savannah.gnu.org/bugs/index.php?20496) detailed an example
where wget would download a recursive fetch normally, but then when run
again (with -c), it would eat up vast (_vast_) amounts
Hi List,
Micah Cowan wrote:
Micah Cowan wrote:
I'm expecting that, when a file of such size or greater is encountered,
it would simply be left alone and not parsed, rather than read up to the
limit, and parse up to that point, but if anyone would like to argue for
the latter behavior, I'm
Matthias Vill wrote:
I just converted some project, and an HTML log file 6 MB in size was
created. I agree with you that this is a rare case, and opening this file
with a browser is no fun, but I still don't like hardcoded sizes.
Maybe there will