[PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by)
Read and write each 1024 byte buffer, rather than trying to buffer the entire content of the file. Previous code would crash on all files 2 Gib, when the offset variable became negative (perhaps below the level of perl), resulting in a crash. On a 32 bit system, or a system with low memory it might crash before reaching 2 GiB due to memory exhaustion. Signed-off-by: Joshua Clayton stillcompil...@gmail.com Reviewed-by: Jeff King p...@peff.net --- perl/Git.pm | 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/perl/Git.pm b/perl/Git.pm index 931047c..cc91288 100644 --- a/perl/Git.pm +++ b/perl/Git.pm @@ -949,13 +949,16 @@ sub cat_blob { last unless $bytesLeft; my $bytesToRead = $bytesLeft 1024 ? $bytesLeft : 1024; - my $read = read($in, $blob, $bytesToRead, $bytesRead); + my $read = read($in, $blob, $bytesToRead); unless (defined($read)) { $self-_close_cat_blob(); throw Error::Simple(in pipe went bad); } - $bytesRead += $read; + unless (print $fh $blob) { + $self-_close_cat_blob(); + throw Error::Simple(couldn't write to passed in filehandle); + } } # Skip past the trailing newline. @@ -970,11 +973,6 @@ sub cat_blob { throw Error::Simple(didn't find newline after blob); } - unless (print $fh $blob) { - $self-_close_cat_blob(); - throw Error::Simple(couldn't write to passed in filehandle); - } - return $size; } -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by)
On Fri, Feb 22, 2013 at 09:30:57AM -0800, Joshua Clayton wrote: Read and write each 1024 byte buffer, rather than trying to buffer the entire content of the file. OK. Did you ever repeat your timing with a larger symmetric buffer? That should probably be a separate patch on top, but it might be worth doing while we are thinking about it. Previous code would crash on all files 2 Gib, when the offset variable became negative (perhaps below the level of perl), resulting in a crash. I'm still slightly dubious of this, just because it doesn't match my knowledge of perl (which is admittedly imperfect). I'm curious how you diagnosed it? On a 32 bit system, or a system with low memory it might crash before reaching 2 GiB due to memory exhaustion. Signed-off-by: Joshua Clayton stillcompil...@gmail.com Reviewed-by: Jeff King p...@peff.net The commit message is a good place to mention any side effects, and why they are not a problem. Something like: The previous code buffered the whole blob before writing, so any error reading from cat-file would result in zero bytes being written to the output stream. After this change, the output may be left in a partially written state (or even fully written, if we fail when parsing the final newline from cat-file). However, it's not reasonable for callers to expect anything about the state of the output when we return an error (after all, even with full buffering, we might fail during the writing process). So any caller which cares about this is broken already, and we do not have to worry about them. --- perl/Git.pm | 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) The patch itself looks fine to me. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit
On Fri, Feb 22, 2013 at 09:30:57AM -0800, Joshua Clayton wrote: Subject: Re: [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by) One thing I forgot to note in my other response: the subject is part of the commit message, so information like resubmit with... should go with other cover letter material after the ---. It's also customary to say something like PATCHv2 in the brackets (which does get stripped off), and mention what is changed since v1 after the ---. That can help reviewers remember what happened, and it can help the maintainer understand where in the process of review the patch is. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by)
Joshua Clayton stillcompil...@gmail.com writes: Read and write each 1024 byte buffer, rather than trying to buffer the entire content of the file. Previous code would crash on all files 2 Gib, when the offset variable became negative (perhaps below the level of perl), resulting in a crash. On a 32 bit system, or a system with low memory it might crash before reaching 2 GiB due to memory exhaustion. Signed-off-by: Joshua Clayton stillcompil...@gmail.com Reviewed-by: Jeff King p...@peff.net --- Thanks. Subject: Re: [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by) Please drop the () part. A rule of thumb is to make git show output understandable by people who read it 6 months from now. They do not care if the commit is a re-submission. It seems that this issue was with us since the very beginning of this sub since it was introduced at 7182530d8cad (Git.pm: Add hash_and_insert_object and cat_blob, 2008-05-23). perl/Git.pm | 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/perl/Git.pm b/perl/Git.pm index 931047c..cc91288 100644 --- a/perl/Git.pm +++ b/perl/Git.pm @@ -949,13 +949,16 @@ sub cat_blob { last unless $bytesLeft; my $bytesToRead = $bytesLeft 1024 ? $bytesLeft : 1024; - my $read = read($in, $blob, $bytesToRead, $bytesRead); + my $read = read($in, $blob, $bytesToRead); unless (defined($read)) { $self-_close_cat_blob(); throw Error::Simple(in pipe went bad); } - $bytesRead += $read; + unless (print $fh $blob) { + $self-_close_cat_blob(); + throw Error::Simple(couldn't write to passed in filehandle); + } Corrupt patch, line-wrapped by your MUA. I wonder if we still need $bytesRead variable. You have $size that is the size of the whole blob, so my $bytesLeft = $size; while (1) { my $bytesToRead = $bytesLeft 1024 ? $bytesLeft : 1024; my $bytesRead = read($in, $blob, $bytesToRead); ... check errors and use the $blob ... $bytesLeft -= $bytesRead; } may be simpler and easier to read, no? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by)
Thanks for all the input and patience. On Fri, Feb 22, 2013 at 10:34 AM, Jeff King p...@peff.net wrote: On Fri, Feb 22, 2013 at 09:30:57AM -0800, Joshua Clayton wrote: Read and write each 1024 byte buffer, rather than trying to buffer the entire content of the file. OK. Did you ever repeat your timing with a larger symmetric buffer? That should probably be a separate patch on top, but it might be worth doing while we are thinking about it. Previous code would crash on all files 2 Gib, when the offset variable became negative (perhaps below the level of perl), resulting in a crash. I'm still slightly dubious of this, just because it doesn't match my knowledge of perl (which is admittedly imperfect). I'm curious how you diagnosed it? I first had the memory exhaustion problem running my git repo on a 32 vm. After bumping the memory from 512 to 4 GiB, and that failing to fix it I moved to my workstation with 16 GiB ...reproduced After the initial crash, I added print $size, , $bytesToRead, , $bytesRead, \n; right before the read command, and it does indeed crash right after the $bytesRead variable crosses LONG_MAX ... 2567089913 1024 2147482624 2567089913 1024 2147483648 2567089913 1024 2147484672 Offset outside string at /usr/share/perl5/Git.pm line 901, GEN36 line 2604. Note that $bytesRead is still positive. I know very little perl, but that symptom seems pretty clear On a 32 bit system, or a system with low memory it might crash before reaching 2 GiB due to memory exhaustion. Signed-off-by: Joshua Clayton stillcompil...@gmail.com Reviewed-by: Jeff King p...@peff.net The commit message is a good place to mention any side effects, and why they are not a problem. Something like: The previous code buffered the whole blob before writing, so any error reading from cat-file would result in zero bytes being written to the output stream. After this change, the output may be left in a partially written state (or even fully written, if we fail when parsing the final newline from cat-file). However, it's not reasonable for callers to expect anything about the state of the output when we return an error (after all, even with full buffering, we might fail during the writing process). So any caller which cares about this is broken already, and we do not have to worry about them. --- perl/Git.pm | 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) The patch itself looks fine to me. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html