[PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by)

2013-02-22 Thread Joshua Clayton
Read and write each 1024 byte buffer, rather than trying to buffer
the entire content of the file.
Previous code would crash on all files > 2 GiB, when the offset variable
became negative (perhaps below the level of perl), resulting in a crash.
On a 32 bit system, or a system with low memory it might crash before
reaching 2 GiB due to memory exhaustion.

Signed-off-by: Joshua Clayton <stillcompil...@gmail.com>
Reviewed-by: Jeff King <p...@peff.net>
---
 perl/Git.pm |   12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/perl/Git.pm b/perl/Git.pm
index 931047c..cc91288 100644
--- a/perl/Git.pm
+++ b/perl/Git.pm
@@ -949,13 +949,16 @@ sub cat_blob {
 		last unless $bytesLeft;
 
 		my $bytesToRead = $bytesLeft < 1024 ? $bytesLeft : 1024;
-		my $read = read($in, $blob, $bytesToRead, $bytesRead);
+		my $read = read($in, $blob, $bytesToRead);
 		unless (defined($read)) {
 			$self->_close_cat_blob();
 			throw Error::Simple("in pipe went bad");
 		}
-
 		$bytesRead += $read;
+		unless (print $fh $blob) {
+			$self->_close_cat_blob();
+			throw Error::Simple("couldn't write to passed in filehandle");
+		}
 	}
 
 	# Skip past the trailing newline.
@@ -970,11 +973,6 @@ sub cat_blob {
 		throw Error::Simple("didn't find newline after blob");
 	}
 
-	unless (print $fh $blob) {
-		$self->_close_cat_blob();
-		throw Error::Simple("couldn't write to passed in filehandle");
-	}
-
 	return $size;
 }

--
1.7.10.4


Re: [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by)

2013-02-22 Thread Jeff King
On Fri, Feb 22, 2013 at 09:30:57AM -0800, Joshua Clayton wrote:

> Read and write each 1024 byte buffer, rather than trying to buffer
> the entire content of the file.

OK. Did you ever repeat your timing with a larger symmetric buffer? That
should probably be a separate patch on top, but it might be worth doing
while we are thinking about it.
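
For reference, a timing run could be repeated with something like the
small driver below (purely illustrative, not something posted in this
thread); it assumes the 1024-byte constant in Git.pm has been bumped
locally (say to 64*1024) and that you pass the sha1 of a known large
blob:

  #!/usr/bin/perl
  # Illustrative timing harness: stream one large blob through
  # Git::cat_blob into a file and report the elapsed time.
  # Usage: time-cat-blob.pl <blob-sha1> <output-file>
  use strict;
  use warnings;
  use Git;
  use Time::HiRes qw(gettimeofday tv_interval);

  my ($sha1, $outfile) = @ARGV;
  my $repo = Git->repository();

  open my $fh, '>', $outfile or die "open $outfile: $!";
  my $start = [gettimeofday()];
  my $size = $repo->cat_blob($sha1, $fh);
  close $fh or die "close $outfile: $!";

  die "no such blob: $sha1\n" if $size < 0;
  printf "wrote %d bytes in %.2f seconds\n", $size, tv_interval($start);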

> Previous code would crash on all files > 2 GiB, when the offset variable
> became negative (perhaps below the level of perl), resulting in a crash.

I'm still slightly dubious of this, just because it doesn't match my
knowledge of perl (which is admittedly imperfect). I'm curious how you
diagnosed it?

> On a 32 bit system, or a system with low memory it might crash before
> reaching 2 GiB due to memory exhaustion.
>
> Signed-off-by: Joshua Clayton <stillcompil...@gmail.com>
> Reviewed-by: Jeff King <p...@peff.net>

The commit message is a good place to mention any side effects, and why
they are not a problem. Something like:

  The previous code buffered the whole blob before writing, so any error
  reading from cat-file would result in zero bytes being written to the
  output stream.  After this change, the output may be left in a
  partially written state (or even fully written, if we fail when
  parsing the final newline from cat-file). However, it's not reasonable
  for callers to expect anything about the state of the output when we
  return an error (after all, even with full buffering, we might fail
  during the writing process).  So any caller which cares about this is
  broken already, and we do not have to worry about them.

> ---
>  perl/Git.pm |   12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)

The patch itself looks fine to me.

-Peff


Re: [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by)

2013-02-22 Thread Jeff King
On Fri, Feb 22, 2013 at 09:30:57AM -0800, Joshua Clayton wrote:

> Subject: Re: [PATCH] Fix in Git.pm cat_blob crashes on large files
>   (resubmit with reviewed-by)

One thing I forgot to note in my other response: the subject is part of
the commit message, so information like "resubmit with..." should go
with other cover letter material after the "---".

It's also customary to say something like "PATCHv2" in the brackets
(which does get stripped off), and mention what is changed since v1
after the "---". That can help reviewers remember what happened, and it
can help the maintainer understand where in the process of review the
patch is.
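
For instance (an illustrative resend, not one that was actually posted in
this thread), a second round might be titled

  [PATCHv2] Git.pm: read and write cat_blob data in 1024 byte chunks

with a brief "since v1: ..." note placed after the "---", where it reaches
reviewers but stays out of the recorded commit message.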

-Peff


Re: [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by)

2013-02-22 Thread Junio C Hamano
Joshua Clayton <stillcompil...@gmail.com> writes:

> Read and write each 1024 byte buffer, rather than trying to buffer
> the entire content of the file.
> Previous code would crash on all files > 2 GiB, when the offset variable
> became negative (perhaps below the level of perl), resulting in a crash.
> On a 32 bit system, or a system with low memory it might crash before
> reaching 2 GiB due to memory exhaustion.
>
> Signed-off-by: Joshua Clayton <stillcompil...@gmail.com>
> Reviewed-by: Jeff King <p...@peff.net>
> ---

Thanks.

> Subject: Re: [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit
> with reviewed-by)

Please drop the "(resubmit with reviewed-by)" part.  A rule of thumb is
to make "git show" output understandable by people who read it 6 months
from now.  They do not care if the commit is a re-submission.

It seems that this issue has been with us since the very beginning of
this sub, which was introduced at 7182530d8cad ("Git.pm: Add
hash_and_insert_object and cat_blob", 2008-05-23).

>   perl/Git.pm |   12 +++++-------
>   1 file changed, 5 insertions(+), 7 deletions(-)
>
>  diff --git a/perl/Git.pm b/perl/Git.pm
>  index 931047c..cc91288 100644
>  --- a/perl/Git.pm
>  +++ b/perl/Git.pm
>  @@ -949,13 +949,16 @@ sub cat_blob {
>  		last unless $bytesLeft;
>
>  		my $bytesToRead = $bytesLeft < 1024 ? $bytesLeft : 1024;
>  -		my $read = read($in, $blob, $bytesToRead, $bytesRead);
>  +		my $read = read($in, $blob, $bytesToRead);
>  		unless (defined($read)) {
>  			$self->_close_cat_blob();
>  			throw Error::Simple("in pipe went bad");
>  		}
>  -
>  		$bytesRead += $read;
>  +		unless (print $fh $blob) {
>  +			$self->_close_cat_blob();
>  +			throw Error::Simple("couldn't write to passed
>  in filehandle");
>  +		}

Corrupt patch, line-wrapped by your MUA.

I wonder if we still need the $bytesRead variable.  You have $size that
is the size of the whole blob, so

	my $bytesLeft = $size;
	while (1) {
		my $bytesToRead = $bytesLeft < 1024 ? $bytesLeft : 1024;
		my $bytesRead = read($in, $blob, $bytesToRead);
		... check errors and use the $blob ...
		$bytesLeft -= $bytesRead;
	}

may be simpler and easier to read, no?
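
For concreteness, here is a sketch (not quoted from the list) of that
loop with the "... check errors and use the $blob ..." placeholder
expanded, reusing the error handling cat_blob already has; the extra
"last unless" guards against a premature EOF, which the trailing-newline
check further down would then report:

	my $bytesLeft = $size;
	while ($bytesLeft > 0) {
		my $bytesToRead = $bytesLeft < 1024 ? $bytesLeft : 1024;
		my $bytesRead = read($in, $blob, $bytesToRead);
		unless (defined($bytesRead)) {
			$self->_close_cat_blob();
			throw Error::Simple("in pipe went bad");
		}
		# Premature EOF; stop and let the newline check below complain.
		last unless $bytesRead;
		unless (print $fh $blob) {
			$self->_close_cat_blob();
			throw Error::Simple("couldn't write to passed in filehandle");
		}
		$bytesLeft -= $bytesRead;
	}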


Re: [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by)

2013-02-22 Thread Joshua Clayton
Thanks for all the input and patience.

On Fri, Feb 22, 2013 at 10:34 AM, Jeff King <p...@peff.net> wrote:
> On Fri, Feb 22, 2013 at 09:30:57AM -0800, Joshua Clayton wrote:
>
> > Read and write each 1024 byte buffer, rather than trying to buffer
> > the entire content of the file.
>
> OK. Did you ever repeat your timing with a larger symmetric buffer? That
> should probably be a separate patch on top, but it might be worth doing
> while we are thinking about it.
>
> > Previous code would crash on all files > 2 GiB, when the offset variable
> > became negative (perhaps below the level of perl), resulting in a crash.
>
> I'm still slightly dubious of this, just because it doesn't match my
> knowledge of perl (which is admittedly imperfect). I'm curious how you
> diagnosed it?

I first had the memory exhaustion problem running my git repo on a 32-bit VM.
After bumping the memory from 512 MiB to 4 GiB, and that failing to fix it,
I moved to my workstation with 16 GiB
...reproduced
After the initial crash, I added

print $size, " ", $bytesToRead, " ", $bytesRead, "\n";

right before the read command, and it does indeed crash right after
the $bytesRead variable crosses LONG_MAX
...
2567089913 1024 2147482624
2567089913 1024 2147483648
2567089913 1024 2147484672
Offset outside string at /usr/share/perl5/Git.pm line 901, <GEN36> line 2604.

Note that $bytesRead is still positive.
I know very little perl, but that symptom seems pretty clear.
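
The failure mode can also be reproduced in isolation (a standalone
snippet, not taken from the thread): perl's four-argument read() dies
with that same message when the OFFSET it ends up with is negative and
points before the start of the buffer, which is consistent with the
wrap-around theory above:

  #!/usr/bin/perl
  # A negative OFFSET whose magnitude exceeds the buffer length makes
  # read() die with "Offset outside string", matching the error above.
  use strict;
  use warnings;

  open my $in, '<', $0 or die "open: $!";  # read this script itself
  my $buf = "short";                       # 5-byte buffer
  read($in, $buf, 10, -100);               # dies: Offset outside string at ...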


> > On a 32 bit system, or a system with low memory it might crash before
> > reaching 2 GiB due to memory exhaustion.
> >
> > Signed-off-by: Joshua Clayton <stillcompil...@gmail.com>
> > Reviewed-by: Jeff King <p...@peff.net>
>
> The commit message is a good place to mention any side effects, and why
> they are not a problem. Something like:
>
>   The previous code buffered the whole blob before writing, so any error
>   reading from cat-file would result in zero bytes being written to the
>   output stream.  After this change, the output may be left in a
>   partially written state (or even fully written, if we fail when
>   parsing the final newline from cat-file). However, it's not reasonable
>   for callers to expect anything about the state of the output when we
>   return an error (after all, even with full buffering, we might fail
>   during the writing process).  So any caller which cares about this is
>   broken already, and we do not have to worry about them.
>
> > ---
> >  perl/Git.pm |   12 +++++-------
> >  1 file changed, 5 insertions(+), 7 deletions(-)
>
> The patch itself looks fine to me.
>
> -Peff