Re: [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X

2013-08-17 Thread John Keeping
On Sat, Aug 17, 2013 at 02:40:05PM +0200, Steffen Prohaska wrote:
 Previously, filtering more than 2GB through an external filter (see
 test) failed on Mac OS X 10.8.4 (12E55) with:
 
 error: read from external filter cat failed
 error: cannot feed the input to external filter cat
 error: cat died of signal 13
 error: external filter cat failed 141
 error: external filter cat failed
 
 The reason is that read() immediately returns with EINVAL if len >= 2GB.
 I haven't found any information under which specific conditions this
 occurs.  My suspicion is that it happens when reading from a pipe, while
 reading from a standard file should always be fine.  I haven't tested
 any other version of Mac OS X, though I'd expect that other versions are
 affected as well.
 
 The problem is fixed by always reading less than 2GB in xread().
 xread() doesn't guarantee to read all the requested data at once, and
 callers are expected to gracefully handle partial reads.  Slicing large
 reads into 2GB pieces should not hurt practical performance.
 
 Signed-off-by: Steffen Prohaska proha...@zib.de
 ---
  t/t0021-conversion.sh | 9 +++++++++
  wrapper.c | 8 ++++++++
  2 files changed, 17 insertions(+)
 
 diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
 index e50f0f7..aec1253 100755
 --- a/t/t0021-conversion.sh
 +++ b/t/t0021-conversion.sh
 @@ -190,4 +190,13 @@ test_expect_success 'required filter clean failure' '
   test_must_fail git add test.fc
  '
  
 +test_expect_success 'filter large file' '
 +	git config filter.largefile.smudge cat &&
 +	git config filter.largefile.clean cat &&
 +	dd if=/dev/zero of=2GB count=2097152 bs=1024 &&
 +	echo /2GB filter=largefile >.gitattributes &&
 +	git add 2GB 2>err &&
 +	! grep -q error err
 +'
 +
  test_done
 diff --git a/wrapper.c b/wrapper.c
 index 6a015de..2a2f496 100644
 --- a/wrapper.c
 +++ b/wrapper.c
 @@ -139,6 +139,14 @@ ssize_t xread(int fd, void *buf, size_t len)
  {
   ssize_t nr;
   while (1) {
 +#ifdef __APPLE__
 +	const size_t twoGB = (1l << 31);
 +	/* len >= 2GB immediately fails on Mac OS X with EINVAL when
 +	 * reading from pipe. */
 +	if (len >= twoGB) {
 +		len = twoGB - 1;
 +	}

Please don't use unnecessary curly braces here (see
Documentation/CodingGuidelines).
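For illustration, here is the clamp rewritten in the brace-free form the guideline asks for, pulled into a standalone helper (the function name `clamp_below_2gb` is made up for this sketch, not git's code):

```c
#include <stddef.h>

/* Brace-free form of the hunk above, per Documentation/CodingGuidelines:
 * a single-statement body takes no braces.  Extracted into a helper
 * purely for illustration. */
static size_t clamp_below_2gb(size_t len)
{
	const size_t twoGB = (size_t)1 << 31;

	if (len >= twoGB)
		len = twoGB - 1;
	return len;
}
```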

 +#endif
   nr = read(fd, buf, len);
   if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
   continue;
 -- 
 1.8.4.rc3.5.gfcb973a
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X

2013-08-17 Thread Torsten Bögershausen
On 2013-08-17 14.40, Steffen Prohaska wrote:
 Previously, filtering more than 2GB through an external filter (see
 test) failed on Mac OS X 10.8.4 (12E55) with:
 
 error: read from external filter cat failed
 error: cannot feed the input to external filter cat
 error: cat died of signal 13
 error: external filter cat failed 141
 error: external filter cat failed
 
 The reason is that read() immediately returns with EINVAL if len >= 2GB.
 I haven't found any information under which specific conditions this
 occurs.  My suspicion is that it happens when reading from a pipe, while
 reading from a standard file should always be fine.  I haven't tested
 any other version of Mac OS X, though I'd expect that other versions are
 affected as well.
 
 The problem is fixed by always reading less than 2GB in xread().
 xread() doesn't guarantee to read all the requested data at once, and
 callers are expected to gracefully handle partial reads.  Slicing large
 reads into 2GB pieces should not hurt practical performance.
 
 Signed-off-by: Steffen Prohaska proha...@zib.de
 ---
  t/t0021-conversion.sh | 9 +++++++++
  wrapper.c | 8 ++++++++
  2 files changed, 17 insertions(+)
 
 diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
 index e50f0f7..aec1253 100755
 --- a/t/t0021-conversion.sh
 +++ b/t/t0021-conversion.sh
 @@ -190,4 +190,13 @@ test_expect_success 'required filter clean failure' '
   test_must_fail git add test.fc
  '
  
 +test_expect_success 'filter large file' '
 +	git config filter.largefile.smudge cat &&
 +	git config filter.largefile.clean cat &&
 +	dd if=/dev/zero of=2GB count=2097152 bs=1024 &&
 +	echo /2GB filter=largefile >.gitattributes &&
 +	git add 2GB 2>err &&
 +	! grep -q error err
 +'
 +
  test_done
 diff --git a/wrapper.c b/wrapper.c
 index 6a015de..2a2f496 100644
 --- a/wrapper.c
 +++ b/wrapper.c
 @@ -139,6 +139,14 @@ ssize_t xread(int fd, void *buf, size_t len)
  {
   ssize_t nr;
   while (1) {
 +#ifdef __APPLE__
 +	const size_t twoGB = (1l << 31);
 +	/* len >= 2GB immediately fails on Mac OS X with EINVAL when
 +	 * reading from pipe. */
 +	if (len >= twoGB) {
 +		len = twoGB - 1;
 +	}
 +#endif
   nr = read(fd, buf, len);
   if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
   continue;
 
Thanks for the patch.
I think we can replace the __APPLE__ define with a more generic one.
We had a similar patch for write() some time ago:

config.mak.uname
NEEDS_CLIPPED_WRITE = YesPlease


Makefile
ifdef NEEDS_CLIPPED_WRITE
BASIC_CFLAGS += -DNEEDS_CLIPPED_WRITE
COMPAT_OBJS += compat/clipped-write.o
endif
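
For context, the compat object wired up by those Makefile lines clamps the byte count before calling the real syscall; a read() counterpart could follow the same shape. A sketch under the assumption that the pattern of compat/clipped-write.c carries over (the names `MAX_CLIPPED_IO` and `clipped_read` are illustrative, not git's actual identifiers):

```c
#include <sys/types.h>
#include <unistd.h>

/* Clamp the request below 2GB before handing it to the real read(),
 * mirroring the clipped-write approach described above. */
#define MAX_CLIPPED_IO ((size_t)0x7fffffff)	/* 2GB - 1 */

ssize_t clipped_read(int fd, void *buf, size_t nbyte)
{
	if (nbyte > MAX_CLIPPED_IO)
		nbyte = MAX_CLIPPED_IO;
	return read(fd, buf, nbyte);
}
```

With a NEEDS_CLIPPED_READ knob analogous to the one above, call sites could be routed through the clamp the same way clipped-write does for write(), via a #define in git-compat-util.h.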



Re: [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X

2013-08-17 Thread Johannes Sixt

Am 17.08.2013 14:40, schrieb Steffen Prohaska:

Previously, filtering more than 2GB through an external filter (see
test) failed on Mac OS X 10.8.4 (12E55) with:

 error: read from external filter cat failed
 error: cannot feed the input to external filter cat
 error: cat died of signal 13
 error: external filter cat failed 141
 error: external filter cat failed

The reason is that read() immediately returns with EINVAL if len >= 2GB.
I haven't found any information under which specific conditions this
occurs.  My suspicion is that it happens when reading from a pipe, while
reading from a standard file should always be fine.  I haven't tested
any other version of Mac OS X, though I'd expect that other versions are
affected as well.

The problem is fixed by always reading less than 2GB in xread().
xread() doesn't guarantee to read all the requested data at once, and
callers are expected to gracefully handle partial reads.  Slicing large
reads into 2GB pieces should not hurt practical performance.

Signed-off-by: Steffen Prohaska proha...@zib.de
---
  t/t0021-conversion.sh | 9 +++++++++
  wrapper.c | 8 ++++++++
  2 files changed, 17 insertions(+)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index e50f0f7..aec1253 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -190,4 +190,13 @@ test_expect_success 'required filter clean failure' '
test_must_fail git add test.fc
  '

+test_expect_success 'filter large file' '
+   git config filter.largefile.smudge cat &&
+   git config filter.largefile.clean cat &&
+   dd if=/dev/zero of=2GB count=2097152 bs=1024 &&


We don't have /dev/zero on Windows. Even if we get a file slightly over 
2GB, we can't handle it on Windows, and other 32bit architectures will 
very likely also be handicapped.


Finally, this test (if it remains in some form) should probably be 
protected by EXPENSIVE.



+   echo /2GB filter=largefile >.gitattributes &&


Drop the slash, please; it may confuse our bash on Windows (it doesn't 
currently because echo is a builtin, but better safe than sorry).



+   git add 2GB 2>err &&
+   ! grep -q error err


Executive summary: drop everything starting at 2>err.

Long story: Can it happen that (1) git add succeeds, but still produces 
something on stderr, and (2) we do not care what this something is as long 
as it does not contain error? I don't think this combination of 
conditions makes sense; it's sufficient to check that git add does not fail.


BTW, if you add

... &&
rm -f 2GB &&
git checkout -- 2GB

you would also test the smudge filter code path with a huge file, no?

BTW2, to create a file with slightly over 2GB, you can use

for i in $(test_seq 0 128); do printf %16777216d 1; done >2GB
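
A quick sanity check of the sizes in that one-liner, as plain shell arithmetic (no file is written): test_seq 0 128 gives 129 iterations, and each printf emits 16777216 bytes.

```shell
chunks=129                      # test_seq 0 128 -> 129 iterations
chunk_bytes=16777216            # width of each printf %16777216d
total=$((chunks * chunk_bytes))
echo "$total"                   # 2164260864 bytes
echo $((total > 2147483648))    # 1: strictly more than 2GB (2^31 bytes)
```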


+'
+
  test_done
diff --git a/wrapper.c b/wrapper.c
index 6a015de..2a2f496 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -139,6 +139,14 @@ ssize_t xread(int fd, void *buf, size_t len)
  {
ssize_t nr;
while (1) {
+#ifdef __APPLE__
+	const size_t twoGB = (1l << 31);
+	/* len >= 2GB immediately fails on Mac OS X with EINVAL when
+	 * reading from pipe. */
+	if (len >= twoGB) {
+		len = twoGB - 1;
+	}
+#endif
nr = read(fd, buf, len);
	if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
continue;



-- Hannes



Re: [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X

2013-08-17 Thread Jonathan Nieder
Hi,

Steffen Prohaska wrote:

 --- a/wrapper.c
 +++ b/wrapper.c
 @@ -139,6 +139,14 @@ ssize_t xread(int fd, void *buf, size_t len)
  {
   ssize_t nr;
   while (1) {
 +#ifdef __APPLE__
 +	const size_t twoGB = (1l << 31);
 +	/* len >= 2GB immediately fails on Mac OS X with EINVAL when
 +	 * reading from pipe. */
 +	if (len >= twoGB) {
 +		len = twoGB - 1;
 +	}
 +#endif
   nr = read(fd, buf, len);

See 6c642a87 (compat: large write(2) fails on Mac OS X/XNU,
2013-05-10) for a cleaner way to do this.

Hope that helps,
Jonathan


Re: [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X

2013-08-17 Thread Kyle J. McKay

On Aug 17, 2013, at 05:40, Steffen Prohaska wrote:


Previously, filtering more than 2GB through an external filter (see
test) failed on Mac OS X 10.8.4 (12E55) with:

   error: read from external filter cat failed
   error: cannot feed the input to external filter cat
   error: cat died of signal 13
   error: external filter cat failed 141
   error: external filter cat failed

The reason is that read() immediately returns with EINVAL if len >= 2GB.

I haven't found any information under which specific conditions this
occurs.


According to POSIX [1] for read:

If the value of nbyte is greater than {SSIZE_MAX}, the result is  
implementation-defined.



The write function also has the same restriction [2].

Since OS X still supports running 32-bit executables, and SSIZE_MAX is
2GB - 1 when running 32-bit, it would seem the same limit has been
imposed on 64-bit executables.  In any case, we should avoid
implementation-defined behavior for portability unless we know the
OS we were compiled on has acceptable implementation-defined
behavior, and otherwise never attempt to read or write more than
SSIZE_MAX bytes.


[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html


Re: [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X

2013-08-17 Thread Jonathan Nieder
Kyle J. McKay wrote:

 According to POSIX [1] for read:

 If the value of nbyte is greater than {SSIZE_MAX}, the result is
 implementation-defined.

Sure.

[...]
 Since OS X still supports running 32-bit executables, and SSIZE_MAX is 2GB -
 1 when running 32-bit it would seem the same limit has been imposed on
 64-bit executables.  In any case, we should avoid implementation-defined
 behavior

Wait --- that's a big leap.

In a 64-bit executable, SSIZE_MAX is 2^63 - 1, so the behavior is not
implementation-defined.  I'm not sure if Steffen's copy of git is
32-bit or 64-bit --- my guess would be 64-bit.  So at first glance
this does look like an XNU-specific bug, not a standards thing.

What about the related case where someone does try to git add
a file with a clean filter producing more than SSIZE_MAX and less
than SIZE_MAX bytes?

strbuf_grow() does not directly protect against a strbuf growing to more
than SSIZE_MAX bytes, but in practice on most machines realloc() does.  So
in practice we could never read more than SSIZE_MAX characters in the
strbuf_read() codepath, but it might be worth a check for paranoia
anyway.

While we're here, it's easy to wonder: why is git reading into such a
large buffer anyway?  Normally git uses the streaming codepath for
files larger than big_file_threshold (typically 512 MiB).
Unfortunately there are cases where it doesn't.  For example:

  - convert_to_git() has not been taught to stream, so paths
with a clean filter or requiring crlf conversion are read or
mapped into memory.

  - deflate_to_pack() relies on seeking backward to retry when
a pack would grow too large, so git hash-object --stdin
cannot use that codepath.

  - a clean filter can make a file bigger.

  Perhaps git needs to learn to write to a temporary file
  when asked to keep track of a blob that is larger than fits
  reasonably in memory.  Or maybe not.

So there is room for related work but the codepaths that read()
indefinitely large files do seem to be needed, at least in the short
term.  Working around this Mac OS X-specific limitation at the read()
level like you've done still sounds like the right thing to do.

Thanks, both, for your work tracking this down.  Hopefully the next
version of the patch will be in good shape and then it can be applied
quickly.

Thanks and hope that helps,
Jonathan