Re: [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X
On Sat, Aug 17, 2013 at 02:40:05PM +0200, Steffen Prohaska wrote:

> Previously, filtering more than 2GB through an external filter (see
> test) failed on Mac OS X 10.8.4 (12E55) with:
>
>     error: read from external filter cat failed
>     error: cannot feed the input to external filter cat
>     error: cat died of signal 13
>     error: external filter cat failed 141
>     error: external filter cat failed
>
> The reason is that read() immediately returns with EINVAL if len >= 2GB.
> I haven't found any information under which specific conditions this
> occurs. My suspicion is that it happens when reading from a pipe, while
> reading from a standard file should always be fine. I haven't tested
> any other version of Mac OS X, though I'd expect that other versions
> are affected as well.
>
> The problem is fixed by always reading less than 2GB in xread().
> xread() doesn't guarantee to read all the requested data at once, and
> callers are expected to gracefully handle partial reads. Slicing large
> reads into 2GB pieces should not hurt practical performance.
>
> Signed-off-by: Steffen Prohaska <proha...@zib.de>
> ---
>  t/t0021-conversion.sh | 9 +++++++++
>  wrapper.c             | 8 ++++++++
>  2 files changed, 17 insertions(+)
>
> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
> index e50f0f7..aec1253 100755
> --- a/t/t0021-conversion.sh
> +++ b/t/t0021-conversion.sh
> @@ -190,4 +190,13 @@ test_expect_success 'required filter clean failure' '
>  	test_must_fail git add test.fc
>  '
>
> +test_expect_success 'filter large file' '
> +	git config filter.largefile.smudge cat &&
> +	git config filter.largefile.clean cat &&
> +	dd if=/dev/zero of=2GB count=2097152 bs=1024 &&
> +	echo /2GB filter=largefile >.gitattributes &&
> +	git add 2GB 2>err &&
> +	! grep -q error err
> +'
> +
>  test_done
> diff --git a/wrapper.c b/wrapper.c
> index 6a015de..2a2f496 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -139,6 +139,14 @@ ssize_t xread(int fd, void *buf, size_t len)
>  {
>  	ssize_t nr;
>  	while (1) {
> +#ifdef __APPLE__
> +		const size_t twoGB = (1l << 31);
> +		/* len >= 2GB immediately fails on Mac OS X with EINVAL when
> +		 * reading from pipe. */
> +		if (len >= twoGB) {
> +			len = twoGB - 1;
> +		}

Please don't use unnecessary curly braces here (see
Documentation/CodingGuidelines).

> +#endif
>  		nr = read(fd, buf, len);
>  		if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
>  			continue;
> --
> 1.8.4.rc3.5.gfcb973a
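A note on why clamping inside xread() is safe: as the commit message
says, callers follow a read-in-full pattern, so a short read simply
triggers another iteration. The sketch below is illustrative only, not
git's exact read_in_full(); plain read() stands in for xread():

	#include <errno.h>
	#include <unistd.h>

	/*
	 * Sketch only: keep reading until 'count' bytes arrive, EOF is
	 * seen, or a real error occurs.  A short read just triggers
	 * another loop iteration, so a callee that clamps the request
	 * size changes how many iterations run, not the final result.
	 */
	static ssize_t read_in_full_sketch(int fd, void *buf, size_t count)
	{
		char *p = buf;
		ssize_t total = 0;

		while (count > 0) {
			ssize_t loaded = read(fd, p, count);
			if (loaded < 0) {
				if (errno == EAGAIN || errno == EINTR)
					continue; /* retry, as xread() does */
				return -1;
			}
			if (loaded == 0)
				return total; /* EOF */
			count -= loaded;
			p += loaded;
			total += loaded;
		}
		return total;
	}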
Re: [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X
On 2013-08-17 14.40, Steffen Prohaska wrote:

> Previously, filtering more than 2GB through an external filter (see
> test) failed on Mac OS X 10.8.4 (12E55) with:
>
>     error: read from external filter cat failed
>     error: cannot feed the input to external filter cat
>     error: cat died of signal 13
>     error: external filter cat failed 141
>     error: external filter cat failed
>
> The reason is that read() immediately returns with EINVAL if len >= 2GB.
> I haven't found any information under which specific conditions this
> occurs. My suspicion is that it happens when reading from a pipe, while
> reading from a standard file should always be fine. I haven't tested
> any other version of Mac OS X, though I'd expect that other versions
> are affected as well.
>
> The problem is fixed by always reading less than 2GB in xread().
> xread() doesn't guarantee to read all the requested data at once, and
> callers are expected to gracefully handle partial reads. Slicing large
> reads into 2GB pieces should not hurt practical performance.
>
> Signed-off-by: Steffen Prohaska <proha...@zib.de>
> ---
>  t/t0021-conversion.sh | 9 +++++++++
>  wrapper.c             | 8 ++++++++
>  2 files changed, 17 insertions(+)
>
> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
> index e50f0f7..aec1253 100755
> --- a/t/t0021-conversion.sh
> +++ b/t/t0021-conversion.sh
> @@ -190,4 +190,13 @@ test_expect_success 'required filter clean failure' '
>  	test_must_fail git add test.fc
>  '
>
> +test_expect_success 'filter large file' '
> +	git config filter.largefile.smudge cat &&
> +	git config filter.largefile.clean cat &&
> +	dd if=/dev/zero of=2GB count=2097152 bs=1024 &&
> +	echo /2GB filter=largefile >.gitattributes &&
> +	git add 2GB 2>err &&
> +	! grep -q error err
> +'
> +
>  test_done
> diff --git a/wrapper.c b/wrapper.c
> index 6a015de..2a2f496 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -139,6 +139,14 @@ ssize_t xread(int fd, void *buf, size_t len)
>  {
>  	ssize_t nr;
>  	while (1) {
> +#ifdef __APPLE__
> +		const size_t twoGB = (1l << 31);
> +		/* len >= 2GB immediately fails on Mac OS X with EINVAL when
> +		 * reading from pipe. */
> +		if (len >= twoGB) {
> +			len = twoGB - 1;
> +		}
> +#endif
>  		nr = read(fd, buf, len);
>  		if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
>  			continue;

Thanks for the patch.
I think we can replace the __APPLE__ define with a more generic one.
We had a similar patch for write() some time ago:

config.mak.uname:

	NEEDS_CLIPPED_WRITE = YesPlease

Makefile:

	ifdef NEEDS_CLIPPED_WRITE
	BASIC_CFLAGS += -DNEEDS_CLIPPED_WRITE
	COMPAT_OBJS += compat/clipped-write.o
	endif
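To make the suggestion concrete, a read-side counterpart could mirror
compat/clipped-write.c. The sketch below is hypothetical: the file name
compat/clipped-read.c, a NEEDS_CLIPPED_READ knob, and the INT_MAX clamp
are assumptions for illustration, not part of the posted patch.

	/*
	 * Hypothetical compat/clipped-read.c, modeled on the existing
	 * compat/clipped-write.c.  NEEDS_CLIPPED_READ and the INT_MAX
	 * clamp are illustrative assumptions.
	 */
	#include <limits.h>
	#include <unistd.h>
	#undef read

	/* Version of read() that requests at most INT_MAX bytes at a time. */
	ssize_t clipped_read(int fd, void *buf, size_t nbyte)
	{
		if (nbyte > INT_MAX)
			nbyte = INT_MAX;
		return read(fd, buf, nbyte);
	}

git-compat-util.h would then redirect read() to clipped_read() whenever
the flag is defined, mirroring the clipped-write wiring.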
Re: [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X
Am 17.08.2013 14:40, schrieb Steffen Prohaska:

> Previously, filtering more than 2GB through an external filter (see
> test) failed on Mac OS X 10.8.4 (12E55) with:
>
>     error: read from external filter cat failed
>     error: cannot feed the input to external filter cat
>     error: cat died of signal 13
>     error: external filter cat failed 141
>     error: external filter cat failed
>
> The reason is that read() immediately returns with EINVAL if len >= 2GB.
> I haven't found any information under which specific conditions this
> occurs. My suspicion is that it happens when reading from a pipe, while
> reading from a standard file should always be fine. I haven't tested
> any other version of Mac OS X, though I'd expect that other versions
> are affected as well.
>
> The problem is fixed by always reading less than 2GB in xread().
> xread() doesn't guarantee to read all the requested data at once, and
> callers are expected to gracefully handle partial reads. Slicing large
> reads into 2GB pieces should not hurt practical performance.
>
> Signed-off-by: Steffen Prohaska <proha...@zib.de>
> ---
>  t/t0021-conversion.sh | 9 +++++++++
>  wrapper.c             | 8 ++++++++
>  2 files changed, 17 insertions(+)
>
> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
> index e50f0f7..aec1253 100755
> --- a/t/t0021-conversion.sh
> +++ b/t/t0021-conversion.sh
> @@ -190,4 +190,13 @@ test_expect_success 'required filter clean failure' '
>  	test_must_fail git add test.fc
>  '
>
> +test_expect_success 'filter large file' '
> +	git config filter.largefile.smudge cat &&
> +	git config filter.largefile.clean cat &&
> +	dd if=/dev/zero of=2GB count=2097152 bs=1024 &&

We don't have /dev/zero on Windows. Even if we get a file slightly over
2GB, we can't handle it on Windows, and other 32-bit architectures will
very likely also be handicapped. Finally, this test (if it remains in
some form) should probably be protected by EXPENSIVE.

> +	echo /2GB filter=largefile >.gitattributes &&

Drop the slash, please; it may confuse our bash on Windows (it doesn't
currently because echo is a builtin, but better safe than sorry).

> +	git add 2GB 2>err &&
> +	! grep -q error err

Executive summary: drop everything starting at 2>err.

Long story: Can it happen that (1) git add succeeds, but still produces
something on stderr, and (2) we do not care what this something is as
long as it does not contain "error"? I don't think this combination of
conditions makes sense; it's sufficient to check that git add does not
fail.

BTW, if you add

	rm -f 2GB &&
	git checkout -- 2GB

you would also test the smudge filter code path with a huge file, no?

BTW2, to create a file with slightly over 2GB, you can use

	for i in $(test_seq 0 128); do printf %16777216d 1; done >2GB

> +'
> +
>  test_done
> diff --git a/wrapper.c b/wrapper.c
> index 6a015de..2a2f496 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -139,6 +139,14 @@ ssize_t xread(int fd, void *buf, size_t len)
>  {
>  	ssize_t nr;
>  	while (1) {
> +#ifdef __APPLE__
> +		const size_t twoGB = (1l << 31);
> +		/* len >= 2GB immediately fails on Mac OS X with EINVAL when
> +		 * reading from pipe. */
> +		if (len >= twoGB) {
> +			len = twoGB - 1;
> +		}
> +#endif
>  		nr = read(fd, buf, len);
>  		if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
>  			continue;

-- 
Hannes
Re: [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X
Hi,

Steffen Prohaska wrote:

> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -139,6 +139,14 @@ ssize_t xread(int fd, void *buf, size_t len)
>  {
>  	ssize_t nr;
>  	while (1) {
> +#ifdef __APPLE__
> +		const size_t twoGB = (1l << 31);
> +		/* len >= 2GB immediately fails on Mac OS X with EINVAL when
> +		 * reading from pipe. */
> +		if (len >= twoGB) {
> +			len = twoGB - 1;
> +		}
> +#endif
>  		nr = read(fd, buf, len);

See 6c642a87 (compat: large write(2) fails on Mac OS X/XNU, 2013-05-10)
for a cleaner way to do this.

Hope that helps,
Jonathan
Re: [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X
On Aug 17, 2013, at 05:40, Steffen Prohaska wrote:

> Previously, filtering more than 2GB through an external filter (see
> test) failed on Mac OS X 10.8.4 (12E55) with:
>
>     error: read from external filter cat failed
>     error: cannot feed the input to external filter cat
>     error: cat died of signal 13
>     error: external filter cat failed 141
>     error: external filter cat failed
>
> The reason is that read() immediately returns with EINVAL if len >= 2GB.
> I haven't found any information under which specific conditions this
> occurs.

According to POSIX [1] for read:

    If the value of nbyte is greater than {SSIZE_MAX}, the result is
    implementation-defined.

The write function also has the same restriction [2].

Since OS X still supports running 32-bit executables, and SSIZE_MAX is
2GB - 1 when running 32-bit, it would seem the same limit has been
imposed on 64-bit executables.

In any case, we should avoid implementation-defined behavior for
portability unless we know the OS we were compiled on has acceptable
implementation-defined behavior, and otherwise never attempt to read or
write more than SSIZE_MAX bytes.

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html
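The portability rule above (never pass read() more than SSIZE_MAX
bytes) fits in a tiny wrapper. The following is a minimal sketch under
that assumption; the function name is made up for illustration:

	#include <limits.h>
	#include <unistd.h>

	/*
	 * Sketch of the rule above: clamp the request to SSIZE_MAX so
	 * the implementation-defined case POSIX allows can never be
	 * hit.  The caller must handle the resulting short read as
	 * usual.
	 */
	static ssize_t bounded_read(int fd, void *buf, size_t len)
	{
		if (len > SSIZE_MAX)
			len = SSIZE_MAX;
		return read(fd, buf, len);
	}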
Re: [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X
Kyle J. McKay wrote:

> According to POSIX [1] for read:
>
>     If the value of nbyte is greater than {SSIZE_MAX}, the result is
>     implementation-defined.

Sure.

[...]
> Since OS X still supports running 32-bit executables, and SSIZE_MAX
> is 2GB - 1 when running 32-bit, it would seem the same limit has been
> imposed on 64-bit executables. In any case, we should avoid
> implementation-defined behavior

Wait --- that's a big leap. In a 64-bit executable, SSIZE_MAX is
2^63 - 1, so the behavior is not implementation-defined. I'm not sure
if Steffen's copy of git is 32-bit or 64-bit --- my guess would be
64-bit. So at first glance this does look like an XNU-specific bug,
not a standards thing.

What about the related case where someone does try to "git add" a file
with a clean filter producing more than SSIZE_MAX and less than
SIZE_MAX bytes? strbuf_grow() does not directly protect against a
strbuf growing to SSIZE_MAX bytes, but in practice on most machines
realloc() does. So in practice we could never read more than SSIZE_MAX
characters in the strbuf_read() codepath, but it might be worth a
check for paranoia anyway.

While we're here, it's easy to wonder: why is git reading into such a
large buffer anyway? Normally git uses the streaming codepath for
files larger than big_file_threshold (typically 512 MiB).
Unfortunately there are cases where it doesn't. For example:

 - convert_to_git() has not been taught to stream, so paths with a
   clean filter or requiring crlf conversion are read or mapped into
   memory.

 - deflate_to_pack() relies on seeking backward to retry when a pack
   would grow too large, so "git hash-object --stdin" cannot use that
   codepath.

 - a clean filter can make a file bigger. Perhaps git needs to learn
   to write to a temporary file when asked to keep track of a blob
   that is larger than fits reasonably in memory. Or maybe not.

So there is room for related work, but the codepaths that read()
indefinitely large files do seem to be needed, at least in the short
term. Working around this Mac OS X-specific limitation at the read()
level like you've done still sounds like the right thing to do.

Thanks, both, for your work tracking this down. Hopefully the next
version of the patch will be in good shape and then it can be applied
quickly.

Thanks and hope that helps,
Jonathan