[PATCH 1/1] doc: Mention info/attributes in gitrepository-layout

2017-11-22 Thread Steffen Prohaska
Signed-off-by: Steffen Prohaska <proha...@zib.de>
---
 Documentation/gitrepository-layout.txt | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/gitrepository-layout.txt b/Documentation/gitrepository-layout.txt
index adf9554ad2..c60bcad44a 100644
--- a/Documentation/gitrepository-layout.txt
+++ b/Documentation/gitrepository-layout.txt
@@ -208,6 +208,10 @@ info/exclude::
'git clean' look at it but the core Git commands do not look
at it.  See also: linkgit:gitignore[5].
 
+info/attributes::
+   Defines which attributes to assign to a path, similar to per-directory
+   `.gitattributes` files.  See also: linkgit:gitattributes[5].
+
 info/sparse-checkout::
This file stores sparse checkout patterns.
See also: linkgit:git-read-tree[1].
-- 
2.15.0.5.g43a9009988
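For illustration, a repository-local attribute setup could look like this (the
patterns and attributes are only examples):

```
echo '*.png -diff' >>.git/info/attributes
echo '*.c ident'   >>.git/info/attributes
git check-attr diff ident -- img.png main.c
```

Unlike tracked `.gitattributes` files, these entries stay local to the
repository and are not propagated to clones.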



git-sha-x: idea for stronger cryptographic verification while keeping SHA1 content addresses

2017-02-26 Thread Steffen Prohaska
Hi,

Related to SHAttered, the recent discussion,
<https://public-inbox.org/git/20170223164306.spg2avxzukkgg...@kitenet.net>, the
past
<https://public-inbox.org/git/pine.lnx.4.58.0504291221250.18...@ppc970.osdl.org/>,
and Linus's post <https://plus.google.com/+LinusTorvalds/posts/7tp2gYWQugL>,
the idea below might be interesting.

I skimmed through the discussion but haven't read all the details.  I also
haven't been following the Git list during the last years, so it might very
well be that others have described similar ideas and the general approach has
been rejected for some reason that I'm not aware of.

git-sha-x illustrates a potential solution for stronger cryptographic content
verification in Git while keeping SHA1 content pointers.

git-sha-x is available at <https://github.com/sprohaska/git-sha-x>.

git-sha-x computes a hash tree similar to the SHA1-based Git history but using
a different hash function.  The hashes can be added to the commit message and
signed with GPG to confirm the tree and the entire history with a cryptographic
strength that no longer depends on SHA1 but on the chosen git-sha-x hash and
the GPG configuration.  See `git-sha-x --help`.  Examples:

```
git-sha-x commit HEAD
git-sha-x tree HEAD
git-sha-x --sha512 commit HEAD

git-sha-x amend && git-sha-x --sha512 amend && git commit -S --amend -C HEAD
```

git-sha-x is only a proof of concept to illustrate the idea.  I do not intend
to develop it further.

If a similar approach was chosen for Git, the hashes could be managed by Git
and somehow represented in its object store as supplementary information.  Git
could incrementally compute additional hashes while it constructs history and
verify them when transferring data.

The strength of bare SHA1 ids is obviously not increased.  The strength is only
increased if the additional hashes are communicated in a verifiable way, too.
GPG signatures are one way.  Another way could be to communicate them via
a secure channel and pass them to git fetch for verification.  Assuming such an
implementation, a fetch for a commit from this repo could look like:

```bash
git fetch origin \
--sha256=8a3c72de658a4797e36bb29fc3bdc2a2863c04455a1b394ed9331f11f65ba802 \

--sha512=729de81500ce4ad70716d7641a115bd0a67984acc4d674044b25850e36d940bf631f9f6aa88768743690545ac899888fb54f65840f84853f9a8811aeb9ca
 \
ef2a4b7d216ab79630b9cd17e072a86e57f044fa
```

For practical purposes, supplementary hashes in the commit in combination with
GPG signatures should provide sufficient protection against attackers that try
to manipulate SHA1s.  For convenience, supplementary hashes could be stored in
the commit header, similar to `gpgsig`.  A hypothetical commit object could
look like:

```
tree 365c7e42fd004a1778c6d79c0437f970397a59b8
parent c2bfff12099b32425a3bcc4d0c7e6e6a169392d8
tree-sha256 2f588b9308b5203212d646fb56201608449cb4d83a5ffd6b7e6213d175a8077c
parent-sha256 090d9a3e69aa3369efac968abde859a6e42d05b631ece6d533765a35e998336c
tree-sha512 
12ae91b23733d52fa2f42b8f0bb5aeaeb111335688f387614c3b108a8cb86fa0e2cd6d19bf050f8a9308f8c1e991080507c91df53e0fc4cace3f746ec89a789a
parent-sha512 
d319889a40cf945d8c61dbe6d816e10badd49845c547df85ace4327676275eeb5ba2cd962712ddbb8f08f2db17dbc9eb46b59b5f7b7a7e05eab7df0ef89dec65
author Steffen Prohaska <proha...@zib.de> 1488122961 +0100
committer Steffen Prohaska <proha...@zib.de> 1488123452 +0100
gpgsig ...
```

GPG signatures would automatically cover the supplementary hashes.
Verification code paths would have to be added to compute the hashes from the
content to confirm that it has not been tampered with.

Since content verification would become independent from the content address,
the interpretation of the content address could be changed in the future.  The
size of 160 bits could be kept for simplicity.  But the meaning could be
changed.  For example, the first 160 bits of SHA256 could be used as the
content address.  The remaining bits could be stored in an object supplement.
Verification code paths would combine the content address with the additional
bits to verify the SHA256.  Content pointers would keep their size.  Only the
additional SHA256 bits would be stored and used for verification.
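To make the last paragraph concrete, a rough bash sketch of the splitting
(illustration only; this is not how git-sha-x derives its hashes, and the blob
path is just an example):

```bash
full=$(git cat-file blob HEAD:README | sha256sum | cut -d' ' -f1)
addr=${full:0:40}       # first 160 bits: would serve as the content address
supplement=${full:40}   # remaining 96 bits: stored as an object supplement
# verification: recompute the SHA256 of the content and compare it
# against "$addr$supplement".
```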

Steffen





[PATCH] subtree: fix AsciiDoc list item continuation

2015-01-04 Thread Steffen Prohaska
List items must be continued with '+' (see [asciidoc]).

[asciidoc] AsciiDoc user guide 17.7. List Item Continuation
http://www.methods.co.nz/asciidoc/userguide.html#X15
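For reference, a minimal sketch of the continuation rule (the item text is
made up):

```
merge::
	Merge recent changes into the subtree.
+
This paragraph is attached to the 'merge' item by the '+' line above;
without it, AsciiDoc treats an indented follow-up paragraph as a literal
block instead of a continuation.
```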

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 contrib/subtree/git-subtree.txt | 194 ++--
 1 file changed, 89 insertions(+), 105 deletions(-)

diff --git a/contrib/subtree/git-subtree.txt b/contrib/subtree/git-subtree.txt
index 8272100..54e4b4a 100644
--- a/contrib/subtree/git-subtree.txt
+++ b/contrib/subtree/git-subtree.txt
@@ -81,12 +81,11 @@ merge::
changes into the latest commit.  With '--squash',
creates only one commit that contains all the changes,
rather than merging in the entire history.
-
-   If you use '--squash', the merge direction doesn't
-   always have to be forward; you can use this command to
-   go back in time from v2.5 to v2.4, for example.  If your
-   merge introduces a conflict, you can resolve it in the
-   usual ways.
++
+If you use '--squash', the merge direction doesn't always have to be
+forward; you can use this command to go back in time from v2.5 to v2.4,
+for example.  If your merge introduces a conflict, you can resolve it in
+the usual ways.

 pull::
Exactly like 'merge', but parallels 'git pull' in that
@@ -107,21 +106,19 @@ split::
contents of prefix at the root of the project instead
of in a subdirectory.  Thus, the newly created history
is suitable for export as a separate git repository.
-   
-   After splitting successfully, a single commit id is
-   printed to stdout.  This corresponds to the HEAD of the
-   newly created tree, which you can manipulate however you
-   want.
-   
-   Repeated splits of exactly the same history are
-   guaranteed to be identical (i.e. to produce the same
-   commit ids).  Because of this, if you add new commits
-   and then re-split, the new commits will be attached as
-   commits on top of the history you generated last time,
-   so 'git merge' and friends will work as expected.
-   
-   Note that if you use '--squash' when you merge, you
-   should usually not just '--rejoin' when you split.
++
+After splitting successfully, a single commit id is printed to stdout.
+This corresponds to the HEAD of the newly created tree, which you can
+manipulate however you want.
++
+Repeated splits of exactly the same history are guaranteed to be
+identical (i.e. to produce the same commit ids).  Because of this, if
+you add new commits and then re-split, the new commits will be attached
+as commits on top of the history you generated last time, so 'git merge'
+and friends will work as expected.
++
+Note that if you use '--squash' when you merge, you should usually not
+just '--rejoin' when you split.
 
 
 OPTIONS
@@ -151,109 +148,96 @@ OPTIONS FOR add, merge, push, pull
 --squash::
This option is only valid for add, merge, push and pull
commands.
-
-   Instead of merging the entire history from the subtree
-   project, produce only a single commit that contains all
-   the differences you want to merge, and then merge that
-   new commit into your project.
-   
-   Using this option helps to reduce log clutter. People
-   rarely want to see every change that happened between
-   v1.0 and v1.1 of the library they're using, since none of the
-   interim versions were ever included in their application.
-   
-   Using '--squash' also helps avoid problems when the same
-   subproject is included multiple times in the same
-   project, or is removed and then re-added.  In such a
-   case, it doesn't make sense to combine the histories
-   anyway, since it's unclear which part of the history
-   belongs to which subtree.
-   
-   Furthermore, with '--squash', you can switch back and
-   forth between different versions of a subtree, rather
-   than strictly forward.  'git subtree merge --squash'
-   always adjusts the subtree to match the exactly
-   specified commit, even if getting to that commit would
-   require undoing some changes that were added earlier.
-   
-   Whether or not you use '--squash', changes made in your
-   local repository remain intact and can be later split
-   and send upstream to the subproject.
++
+Instead of merging the entire history from the subtree project, produce
+only a single commit that contains all the differences you want to
+merge, and then merge that new commit into your project.
++
+Using this option helps to reduce log clutter. People rarely want to see
+every change that happened between v1.0 and v1.1 of the library they're
+using, since none of the interim versions were ever included in their
+application.
++
+Using '--squash' also helps avoid problems when the same subproject is
+included multiple times in the same project

[PATCH] sha1_file: don't convert off_t to size_t too early to avoid potential die()

2014-09-21 Thread Steffen Prohaska
xsize_t() checks if an off_t argument can be safely converted to
a size_t return value.  If the check is executed too early, it could
fail for large files on 32-bit architectures even if the size_t code
path is not taken.  Other paths might be able to handle the large file.
Specifically, index_stream_convert_blob() is able to handle a large file
if a filter is configured that returns a small result.

Signed-off-by: Steffen Prohaska proha...@zib.de
---

This patch should be applied on top of sp/stream-clean-filter.

index_stream() might internally also be able to handle large files to
some extent.  But it uses size_t for its third argument, and we must
already die() when calling it.  It might be a good idea to convert its
interface to use off_t and push the size checks further down the stack.
In general, it might be a good idea to carefully consider whether to use
off_t or size_t when passing file-related sizes around.  To me it looks
like a separate issue for a separate patch series (I have no specific
plans to prepare one).

 sha1_file.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 5b0e67a..6f18c22 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3180,17 +3180,22 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
              enum object_type type, const char *path, unsigned flags)
 {
int ret;
-       size_t size = xsize_t(st->st_size);
 
+   /*
+* Call xsize_t() only when needed to avoid potentially unnecessary
+* die() for large files.
+*/
        if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(path))
                ret = index_stream_convert_blob(sha1, fd, path, flags);
        else if (!S_ISREG(st->st_mode))
                ret = index_pipe(sha1, fd, type, path, flags);
-       else if (size <= big_file_threshold || type != OBJ_BLOB ||
+       else if (st->st_size <= big_file_threshold || type != OBJ_BLOB ||
                 (path && would_convert_to_git(path)))
-               ret = index_core(sha1, fd, size, type, path, flags);
+               ret = index_core(sha1, fd, xsize_t(st->st_size), type, path,
+                                flags);
        else
-               ret = index_stream(sha1, fd, size, type, path, flags);
+               ret = index_stream(sha1, fd, xsize_t(st->st_size), type, path,
+                                  flags);
close(fd);
return ret;
 }
-- 
2.1.0.139.g351b19f



[PATCH] Documentation: use single-parameter --cacheinfo in example

2014-09-11 Thread Steffen Prohaska
The single-parameter form is described as the preferred way.  Separate
arguments are only supported for backward compatibility.  Update the
example to the recommended form.
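For reference, the recommended invocation then looks like this (the object
name shown is the empty blob; mode and path are only examples):

```
git update-index --add --cacheinfo 100644,e69de29bb2d1d6434b8b29ae775ad8c2e48c5391,docs/placeholder.txt
```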

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 Documentation/git-update-index.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
index dfc09d9..82eca6f 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -202,7 +202,7 @@ merging.
 To pretend you have a file with mode and sha1 at path, say:
 
 
-$ git update-index --cacheinfo <mode> <sha1> <path>
+$ git update-index --cacheinfo <mode>,<sha1>,<path>
 
 
 '--info-only' is used to register files without placing them in the object
-- 
2.1.0.137.gbf198b9



How to update index stat info without reading file content?

2014-09-11 Thread Steffen Prohaska
Hi,

Is there a way to update the stat information recorded in the index without 
reading the file content from disk?

Starting from a clean working copy with a committed `file`, I'd like 

touch file
git magic-command file

to bring the index into essentially the same state as

touch file
git status

but without reading the content of `file`.  (I'd be willing to wait a bit 
between touch and the magic command, to avoid any racy-git-related code paths.)

`git-update-index --assume-unchanged` is related.  But it requires completely 
manual handling of changes, because git will not look at marked files until 
told otherwise with `--no-assume-unchanged`.  I'd like to tell git only that 
the current file content is known to be up-to-date.  Any future changes should 
be handled as usual.
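For comparison, this is the `--assume-unchanged` behavior described above
(plain shell, not a proposed solution):

```
touch file
git update-index --assume-unchanged file
echo change >>file
git status                                   # modification is not reported
git update-index --no-assume-unchanged file
git status                                   # now it is reported again
```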

In the documentation, `git add --refresh file` sounds a bit like what I'm 
looking for.  The documentation of `--refresh` states: "Don't add the file(s), 
but only refresh their stat() information in the index."  But it doesn't do 
what I want.

I looked a bit into `read-cache.c`.  The comment above `refresh_cache_ent()` 
sounds promising:

 refresh does not calculate a new sha1 file or bring the
 cache up-to-date for mode/content changes. But what it
 _does_ do is to re-match the stat information of a file
 with the cache, so that you can refresh the cache for a
 file that hasn't been changed but where the stat entry is
 out of date.

But it isn't obvious to me whether what I'm looking for is available.  All code 
paths that eventually reach `fill_stat_cache_info()` seem to go through 
`ce_compare_data()` and therefore `index_fd()`, which reads the data from disk.

Steffen


Re: [PATCH v6 2/6] Add git_env_ulong() to parse environment variable

2014-08-28 Thread Steffen Prohaska

On Aug 27, 2014, at 4:47 PM, Junio C Hamano gits...@pobox.com wrote:

 Jeff King p...@peff.net writes:
 
 On Tue, Aug 26, 2014 at 02:54:11PM -0700, Junio C Hamano wrote:
 
 A worse position is to have git_env_bool() that says empty is
 false and add a new git_env_ulong() that says empty is unset.
 
 We should pick one or the other and use it for both.
 
 Yeah, I agree they should probably behave the same.
 
 The middle ground would be to die(). That does not seem super-friendly, but
 then we would also die with GIT_SMART_HTTP=foobar, so perhaps it is not
 unreasonable to just consider it a syntax error.
 
 Hmm, I am not sure if dying is better.  Unless we decide to make
 empty string no longer false everywhere and warn now and then later
 die as part of a 3.0 transition plan or something, that is.
 
 I think it is better in the sense that while it may be unexpected, it
 does not unexpectedly do something that the user cannot easily undo.
 
 I really do not think this topic is worth the effort of a long-term
 deprecation scheme (which I agree _is_ required for a change to the
 config behavior). Let's just leave it as-is. We've seen zero real-world
 complaints, only my own surprise after reading the code (and Steffen's
 patch should be tweaked to match).
 
 OK, then let's do that at least for now and move on.

Ok.  I saw that you tweaked my patch on pu.  Maybe remove the outdated
comment above the function completely:

diff --git a/config.c b/config.c
index 87db755..010bcd0 100644
--- a/config.c
+++ b/config.c
@@ -1122,9 +1122,6 @@ int git_env_bool(const char *k, int def)
return v ? git_config_bool(k, v) : def;
 }

-/*
- * Use default if environment variable is unset or empty string.
- */
 unsigned long git_env_ulong(const char *k, unsigned long val)
 {
const char *v = getenv(k);

Steffen


Re: [PATCH v6 5/6] Change copy_fd() to not close input fd

2014-08-28 Thread Steffen Prohaska

On Aug 26, 2014, at 8:29 PM, Jeff King p...@peff.net wrote:

 On Tue, Aug 26, 2014 at 05:23:24PM +0200, Steffen Prohaska wrote:
 
 The caller opened the fd, so it should be responsible for closing it.
 
 Signed-off-by: Steffen Prohaska proha...@zib.de
 ---
 copy.c | 5 +
 lockfile.c | 3 +++
 2 files changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/copy.c b/copy.c
 index a7f58fd..d0a1d82 100644
 --- a/copy.c
 +++ b/copy.c
 @@ -10,7 +10,6 @@ int copy_fd(int ifd, int ofd)
                        break;
                if (len < 0) {
                        int read_error = errno;
 -                      close(ifd);
                        return error("copy-fd: read returned %s",
                                     strerror(read_error));
                }
 
 This saved errno is not necessary anymore (the problem was that close()
 clobbered the error in the original code). It can go away, and we can
 even drop the curly braces.
 
 @@ -21,17 +20,14 @@ int copy_fd(int ifd, int ofd)
                        len -= written;
                }
                else if (!written) {
 -                      close(ifd);
                        return error("copy-fd: write returned 0");
                } else {
                        int write_error = errno;
 -                      close(ifd);
                        return error("copy-fd: write returned %s",
                                     strerror(write_error));
                }
        }
 
 Ditto here. Actually, isn't this whole write just a reimplementation of
 write_in_full? The latter treats a return of 0 as ENOSPC rather than
 using a custom message, but I think that is sane.
 
 All together:

Makes all sense, and seems sane to me, too.

Junio, I saw that you have the changes on pu with 'SQUASH???...'.  Will you
squash it, or shall I send another complete update of the patch series?

Steffen



 ---
 copy.c | 28 +---
 1 file changed, 5 insertions(+), 23 deletions(-)
 
 diff --git a/copy.c b/copy.c
 index a7f58fd..53a9ece 100644
 --- a/copy.c
 +++ b/copy.c
 @@ -4,34 +4,16 @@ int copy_fd(int ifd, int ofd)
  {
        while (1) {
                char buffer[8192];
 -              char *buf = buffer;
                ssize_t len = xread(ifd, buffer, sizeof(buffer));
                if (!len)
                        break;
 -              if (len < 0) {
 -                      int read_error = errno;
 -                      close(ifd);
 +              if (len < 0)
                        return error("copy-fd: read returned %s",
 -                                   strerror(read_error));
 -              }
 -              while (len) {
 -                      int written = xwrite(ofd, buf, len);
 -                      if (written > 0) {
 -                              buf += written;
 -                              len -= written;
 -                      }
 -                      else if (!written) {
 -                              close(ifd);
 -                              return error("copy-fd: write returned 0");
 -                      } else {
 -                              int write_error = errno;
 -                              close(ifd);
 -                              return error("copy-fd: write returned %s",
 -                                           strerror(write_error));
 -                      }
 -              }
 +                                   strerror(errno));
 +              if (write_in_full(ofd, buffer, len) < 0)
 +                      return error("copy-fd: write returned %s",
 +                                   strerror(errno));
        }
 -      close(ifd);
        return 0;
  }
 



[PATCH v6 2/6] Add git_env_ulong() to parse environment variable

2014-08-26 Thread Steffen Prohaska
The new function will be used to parse GIT_MMAP_LIMIT and
GIT_ALLOC_LIMIT.

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 cache.h  |  1 +
 config.c | 11 +++
 2 files changed, 12 insertions(+)

diff --git a/cache.h b/cache.h
index fcb511d..b820b6a 100644
--- a/cache.h
+++ b/cache.h
@@ -1318,6 +1318,7 @@ extern int git_config_rename_section_in_file(const char *, const char *, const c
 extern const char *git_etc_gitconfig(void);
 extern int check_repository_format_version(const char *var, const char *value, void *cb);
 extern int git_env_bool(const char *, int);
+extern unsigned long git_env_ulong(const char *, unsigned long);
 extern int git_config_system(void);
 extern int config_error_nonbool(const char *);
 #if defined(__GNUC__)
diff --git a/config.c b/config.c
index 058505c..178721f 100644
--- a/config.c
+++ b/config.c
@@ -1122,6 +1122,17 @@ int git_env_bool(const char *k, int def)
return v ? git_config_bool(k, v) : def;
 }
 
+/*
+ * Use default if environment variable is unset or empty string.
+ */
+unsigned long git_env_ulong(const char *k, unsigned long val)
+{
+   const char *v = getenv(k);
+       if (v && *v && !git_parse_ulong(v, &val))
+               die("failed to parse %s", k);
+   return val;
+}
+
 int git_config_system(void)
 {
return !git_env_bool(GIT_CONFIG_NOSYSTEM, 0);
-- 
2.1.0.8.gf3a29c8



[PATCH v6 5/6] Change copy_fd() to not close input fd

2014-08-26 Thread Steffen Prohaska
The caller opened the fd, so it should be responsible for closing it.

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 copy.c | 5 +
 lockfile.c | 3 +++
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/copy.c b/copy.c
index a7f58fd..d0a1d82 100644
--- a/copy.c
+++ b/copy.c
@@ -10,7 +10,6 @@ int copy_fd(int ifd, int ofd)
break;
        if (len < 0) {
                int read_error = errno;
-               close(ifd);
                return error("copy-fd: read returned %s",
                             strerror(read_error));
        }
@@ -21,17 +20,14 @@ int copy_fd(int ifd, int ofd)
len -= written;
}
        else if (!written) {
-               close(ifd);
                return error("copy-fd: write returned 0");
        } else {
                int write_error = errno;
-               close(ifd);
                return error("copy-fd: write returned %s",
                             strerror(write_error));
        }
}
}
-   close(ifd);
return 0;
 }
 
@@ -60,6 +56,7 @@ int copy_file(const char *dst, const char *src, int mode)
return fdo;
}
status = copy_fd(fdi, fdo);
+   close(fdi);
if (close(fdo) != 0)
                return error("%s: close error: %s", dst, strerror(errno));
 
diff --git a/lockfile.c b/lockfile.c
index 2564a7f..2448d30 100644
--- a/lockfile.c
+++ b/lockfile.c
@@ -224,8 +224,11 @@ int hold_lock_file_for_append(struct lock_file *lk, const char *path, int flags)
        } else if (copy_fd(orig_fd, fd)) {
                if (flags & LOCK_DIE_ON_ERROR)
exit(128);
+   close(orig_fd);
close(fd);
return -1;
+   } else {
+   close(orig_fd);
}
return fd;
 }
-- 
2.1.0.8.gf3a29c8



[PATCH v6 1/6] convert: drop arguments other than 'path' from would_convert_to_git()

2014-08-26 Thread Steffen Prohaska
It is only the path that matters in the decision whether to filter or
not.  Clarify this by making path the single argument of
would_convert_to_git().

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 convert.h   | 5 ++---
 sha1_file.c | 2 +-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/convert.h b/convert.h
index 0c2143c..c638b33 100644
--- a/convert.h
+++ b/convert.h
@@ -40,10 +40,9 @@ extern int convert_to_working_tree(const char *path, const char *src,
                                   size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
  struct strbuf *dst);
-static inline int would_convert_to_git(const char *path, const char *src,
-  size_t len, enum safe_crlf checksafe)
+static inline int would_convert_to_git(const char *path)
 {
-   return convert_to_git(path, src, len, NULL, checksafe);
+   return convert_to_git(path, NULL, 0, NULL, 0);
 }
 
 /*
diff --git a/sha1_file.c b/sha1_file.c
index 3f70b1d..00c07f2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3144,7 +3144,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
        if (!S_ISREG(st->st_mode))
                ret = index_pipe(sha1, fd, type, path, flags);
        else if (size <= big_file_threshold || type != OBJ_BLOB ||
-                (path && would_convert_to_git(path, NULL, 0, 0)))
+                (path && would_convert_to_git(path)))
ret = index_core(sha1, fd, size, type, path, flags);
else
ret = index_stream(sha1, fd, size, type, path, flags);
-- 
2.1.0.8.gf3a29c8



[PATCH v6 4/6] Introduce GIT_MMAP_LIMIT to allow testing expected mmap size

2014-08-26 Thread Steffen Prohaska
Similar to testing expectations about malloc with GIT_ALLOC_LIMIT (see
commit d41489 and previous commit), it can be useful to test
expectations about mmap.

This introduces a new environment variable GIT_MMAP_LIMIT to limit the
largest allowed mmap length.  xmmap() is modified to check the limit.
Together with GIT_ALLOC_LIMIT tests can now easily confirm expectations
about memory consumption.

GIT_MMAP_LIMIT will be used in the next commit to test that data will be
streamed to an external filter without mmaping the entire file.

[commit d41489]: d41489a6424308dc9a0409bc2f6845aa08bd4f7d Add more large
blob test cases
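A sketch of how a test could use the variable (file name and sizes are
illustrative, and it uses the test suite's test_must_fail helper; it assumes
the command maps the whole file at once, as index_core() does for blobs below
core.bigFileThreshold):

```
dd if=/dev/zero of=20MB bs=1M count=20 &&
test_must_fail env GIT_MMAP_LIMIT=1m git hash-object 20MB
```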

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 sha1_file.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 00c07f2..d9b5157 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -663,10 +663,26 @@ void release_pack_memory(size_t need)
; /* nothing */
 }
 
+static void mmap_limit_check(size_t length)
+{
+       static size_t limit = 0;
+       if (!limit) {
+               limit = git_env_ulong("GIT_MMAP_LIMIT", 0);
+               if (!limit)
+                       limit = SIZE_MAX;
+       }
+       if (length > limit)
+               die("attempting to mmap %"PRIuMAX" over limit %"PRIuMAX,
+                   (uintmax_t)length, (uintmax_t)limit);
+}
+
 void *xmmap(void *start, size_t length,
int prot, int flags, int fd, off_t offset)
 {
-   void *ret = mmap(start, length, prot, flags, fd, offset);
+   void *ret;
+
+   mmap_limit_check(length);
+   ret = mmap(start, length, prot, flags, fd, offset);
if (ret == MAP_FAILED) {
if (!length)
return NULL;
-- 
2.1.0.8.gf3a29c8



[PATCH v6 3/6] Change GIT_ALLOC_LIMIT check to use git_env_ulong()

2014-08-26 Thread Steffen Prohaska
GIT_ALLOC_LIMIT limits xmalloc()'s size, which is of type size_t.
Better use git_env_ulong() to parse the environment variable, so that
the postfixes 'k', 'm', and 'g' can be used; and use size_t to store the
limit for consistency.  The change to size_t has no direct practical
impact, because we use GIT_ALLOC_LIMIT to test small sizes.

The cast of size in the call to die() is changed to uintmax_t to match
the format string PRIuMAX.
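With git_env_ulong() the limit accepts the usual size suffixes, e.g. (values
are only examples; large1 is the file from the t1050 setup):

```
GIT_ALLOC_LIMIT=1536000 git add large1    # limit given in bytes
GIT_ALLOC_LIMIT=1500k   git add large1    # the same limit with a 'k' suffix
```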

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 t/t1050-large.sh |  2 +-
 wrapper.c| 15 ---
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index aea4936..e7657ab 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -13,7 +13,7 @@ test_expect_success setup '
        echo X | dd of=large2 bs=1k seek=2000 &&
        echo X | dd of=large3 bs=1k seek=2000 &&
        echo Y | dd of=huge bs=1k seek=2500 &&
-       GIT_ALLOC_LIMIT=1500 &&
+       GIT_ALLOC_LIMIT=1500k &&
        export GIT_ALLOC_LIMIT
 '
 
diff --git a/wrapper.c b/wrapper.c
index bc1bfb8..c5204f7 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -11,14 +11,15 @@ static void (*try_to_free_routine)(size_t size) = do_nothing;
 
 static void memory_limit_check(size_t size)
 {
-       static int limit = -1;
-       if (limit == -1) {
-               const char *env = getenv("GIT_ALLOC_LIMIT");
-               limit = env ? atoi(env) * 1024 : 0;
+       static size_t limit = 0;
+       if (!limit) {
+               limit = git_env_ulong("GIT_ALLOC_LIMIT", 0);
+               if (!limit)
+                       limit = SIZE_MAX;
        }
-       if (limit && size > limit)
-               die("attempting to allocate %"PRIuMAX" over limit %d",
-                   (intmax_t)size, limit);
+       if (size > limit)
+               die("attempting to allocate %"PRIuMAX" over limit %"PRIuMAX,
+                   (uintmax_t)size, (uintmax_t)limit);
 }
 
 try_to_free_t set_try_to_free_routine(try_to_free_t routine)
-- 
2.1.0.8.gf3a29c8



[PATCH v6 6/6] convert: stream from fd to required clean filter to reduce used address space

2014-08-26 Thread Steffen Prohaska
The data is streamed to the filter process anyway.  Better avoid mapping
the file if possible.  This is especially useful if a clean filter
reduces the size, for example if it computes a sha1 for binary data,
like git media.  The file size that the previous implementation could
handle was limited by the available address space; large files for
example could not be handled with (32-bit) msysgit.  The new
implementation can filter files of any size as long as the filter output
is small enough.

The new code path is only taken if the filter is required.  The filter
consumes data directly from the fd.  If it fails, the original data is
not immediately available.  The condition can easily be handled as
a fatal error, which is expected for a required filter anyway.

If the filter was not required, the condition would need to be handled
in a different way, like seeking to 0 and reading the data.  But this
would require more restructuring of the code and is probably not worth
it.  The obvious approach of falling back to reading all data would not
help achieving the main purpose of this patch, which is to handle large
files with limited address space.  If reading all data is an option, we
can simply take the old code path right away and mmap the entire file.

The environment variable GIT_MMAP_LIMIT, which has been introduced in
a previous commit, is used to test that the expected code path is taken.
A related test that exercises required filters is modified to verify
that the data actually has been modified on its way from the file system
to the object store.
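For context, a sketch of the kind of setup this series targets; the driver
name and the two filter commands are made up, while filter.<driver>.clean,
filter.<driver>.smudge, filter.<driver>.required and the attributes line are
regular git configuration:

```
git config filter.media.clean  'media-clean %f'
git config filter.media.smudge 'media-smudge %f'
git config filter.media.required true
echo '*.bin filter=media' >>.gitattributes
# "git add big.bin" can now stream big.bin into media-clean instead of
# mmapping it; if the clean filter fails, git dies because it is required.
```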

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 convert.c | 55 +--
 convert.h |  5 +
 sha1_file.c   | 27 -
 t/t0021-conversion.sh | 24 +-
 4 files changed, 99 insertions(+), 12 deletions(-)

diff --git a/convert.c b/convert.c
index cb5fbb4..677d339 100644
--- a/convert.c
+++ b/convert.c
@@ -312,11 +312,12 @@ static int crlf_to_worktree(const char *path, const char *src, size_t len,
 struct filter_params {
const char *src;
unsigned long size;
+   int fd;
const char *cmd;
const char *path;
 };
 
-static int filter_buffer(int in, int out, void *data)
+static int filter_buffer_or_fd(int in, int out, void *data)
 {
/*
 * Spawn cmd and feed the buffer contents through its stdin.
@@ -355,7 +356,12 @@ static int filter_buffer(int in, int out, void *data)
 
sigchain_push(SIGPIPE, SIG_IGN);
 
-       write_err = (write_in_full(child_process.in, params->src, params->size) < 0);
+       if (params->src) {
+               write_err = (write_in_full(child_process.in, params->src, params->size) < 0);
+       } else {
+               write_err = copy_fd(params->fd, child_process.in);
+       }
+
if (close(child_process.in))
write_err = 1;
if (write_err)
@@ -371,7 +377,7 @@ static int filter_buffer(int in, int out, void *data)
return (write_err || status);
 }
 
-static int apply_filter(const char *path, const char *src, size_t len,
+static int apply_filter(const char *path, const char *src, size_t len, int fd,
 struct strbuf *dst, const char *cmd)
 {
/*
@@ -392,11 +398,12 @@ static int apply_filter(const char *path, const char *src, size_t len,
return 1;
 
        memset(&async, 0, sizeof(async));
-       async.proc = filter_buffer;
+       async.proc = filter_buffer_or_fd;
        async.data = &params;
async.out = -1;
params.src = src;
params.size = len;
+   params.fd = fd;
params.cmd = cmd;
params.path = path;
 
@@ -747,6 +754,25 @@ static void convert_attrs(struct conv_attrs *ca, const char *path)
}
 }
 
+int would_convert_to_git_filter_fd(const char *path)
+{
+       struct conv_attrs ca;
+
+       convert_attrs(&ca, path);
+       if (!ca.drv)
+               return 0;
+
+       /*
+        * Apply a filter to an fd only if the filter is required to succeed.
+        * We must die if the filter fails, because the original data before
+        * filtering is not available.
+        */
+       if (!ca.drv->required)
+               return 0;
+
+       return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
+}
+
 int convert_to_git(const char *path, const char *src, size_t len,
struct strbuf *dst, enum safe_crlf checksafe)
 {
@@ -761,7 +787,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
                required = ca.drv->required;
        }
 
-       ret |= apply_filter(path, src, len, dst, filter);
+       ret |= apply_filter(path, src, len, -1, dst, filter);
        if (!ret && required)
                die("%s: clean filter '%s' failed", path, ca.drv->name);
 
@@ -778,6 +804,23 @@ int convert_to_git(const char *path, const char *src, size_t len,
return ret | ident_to_git(path, src, len, dst, ca.ident

[PATCH v6 0/6] Stream fd to clean filter; GIT_MMAP_LIMIT, GIT_ALLOC_LIMIT with git_env_ulong()

2014-08-26 Thread Steffen Prohaska
Changes since v5:

 - Introduce and use git_env_ulong().
 - Change copy_fd() to not close input fd, which simplified changes to convert.
 - More detailed explanation why filter must be marked required in commit
   message of 6/6.
 - Style fixes.

Steffen Prohaska (6):
  convert: drop arguments other than 'path' from would_convert_to_git()
  Add git_env_ulong() to parse environment variable
  Change GIT_ALLOC_LIMIT check to use git_env_ulong()
  Introduce GIT_MMAP_LIMIT to allow testing expected mmap size
  Change copy_fd() to not close input fd
  convert: stream from fd to required clean filter to reduce used
address space

 cache.h   |  1 +
 config.c  | 11 +++
 convert.c | 55 +--
 convert.h | 10 +++---
 copy.c|  5 +
 lockfile.c|  3 +++
 sha1_file.c   | 47 ---
 t/t0021-conversion.sh | 24 +-
 t/t1050-large.sh  |  2 +-
 wrapper.c | 15 +++---
 10 files changed, 144 insertions(+), 29 deletions(-)

-- 
2.1.0.8.gf3a29c8



Re: [PATCH v5 2/4] Change GIT_ALLOC_LIMIT check to use git_parse_ulong()

2014-08-25 Thread Steffen Prohaska

On Aug 25, 2014, at 1:38 PM, Jeff King p...@peff.net wrote:

 On Sun, Aug 24, 2014 at 06:07:44PM +0200, Steffen Prohaska wrote:
 
 diff --git a/wrapper.c b/wrapper.c
 index bc1bfb8..69d1c9b 100644
 --- a/wrapper.c
 +++ b/wrapper.c
  @@ -11,14 +11,18 @@ static void (*try_to_free_routine)(size_t size) = do_nothing;
 
 static void memory_limit_check(size_t size)
 {
 -      static int limit = -1;
 -      if (limit == -1) {
 -              const char *env = getenv("GIT_ALLOC_LIMIT");
 -              limit = env ? atoi(env) * 1024 : 0;
 +      static size_t limit = SIZE_MAX;
 +      if (limit == SIZE_MAX) {
 
 You use SIZE_MAX as the sentinel for not set, and 0 as the sentinel
 for no limit. That seems kind of backwards.
 
 I guess you are inheriting this from the existing code, which lets
 GIT_ALLOC_LIMIT=0 mean no limit. I'm not sure if we want to keep that
 or not (it would be backwards incompatible to change it, but we are
 already breaking compatibility here by assuming bytes rather than
 kilobytes; I think that's OK because this is not a documented feature,
 or one intended to be used externally).

I think it's reasonable that GIT_ALLOC_LIMIT=0 means no limit, so that
the limit can easily be disabled temporarily.

But I could change the sentinel and handle 0 like:

        if (git_parse_ulong(env, &val)) {
                if (!val) {
                        val = SIZE_MAX;
                }
        }

Maybe we should do this.
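That would keep the current escape hatch, for example (illustrative):

```
GIT_ALLOC_LIMIT=1m git add big.bin    # enforce a 1 MB allocation limit
GIT_ALLOC_LIMIT=0  git add big.bin    # 0 (or unset): no limit
```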



 +              const char *var = "GIT_ALLOC_LIMIT";
 +              unsigned long val = 0;
 +              const char *env = getenv(var);
 +              if (env && !git_parse_ulong(env, &val))
 +                      die("Failed to parse %s", var);
 +              limit = val;
  }
 
 This and the next patch both look OK to me, but I notice this part is
 largely duplicated between the two. We already have git_env_bool to do a
 similar thing for boolean environment variables. Should we do something
 similar like:
 
 diff --git a/config.c b/config.c
 index 058505c..11919eb 100644
 --- a/config.c
 +++ b/config.c
 @@ -1122,6 +1122,14 @@ int git_env_bool(const char *k, int def)
   return v ? git_config_bool(k, v) : def;
 }
 
 +unsigned long git_env_ulong(const char *k, unsigned long val)
 +{
 +      const char *v = getenv(k);
 +      if (v && !git_parse_ulong(k, &val))
 +              die("failed to parse %s", k);
 +      return val;
 +}
 +
 int git_config_system(void)
 {
   return !git_env_bool(GIT_CONFIG_NOSYSTEM, 0);
 
 It's not a lot of code, but I think the callers end up being much easier
 to read:
 
        if (limit == SIZE_MAX)
                limit = git_env_ulong("GIT_ALLOC_LIMIT", 0);

I think you're right.  I'll change it.


Steffen


Re: [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap

2014-08-25 Thread Steffen Prohaska

On Aug 25, 2014, at 2:43 PM, Jeff King p...@peff.net wrote:

 On Sun, Aug 24, 2014 at 06:07:46PM +0200, Steffen Prohaska wrote:
 
 The data is streamed to the filter process anyway.  Better avoid mapping
 the file if possible.  This is especially useful if a clean filter
 reduces the size, for example if it computes a sha1 for binary data,
 like git media.  The file size that the previous implementation could
 handle was limited by the available address space; large files for
 example could not be handled with (32-bit) msysgit.  The new
 implementation can filter files of any size as long as the filter output
 is small enough.
 
 The new code path is only taken if the filter is required.  The filter
 consumes data directly from the fd.  The original data is not available
 to git, so it must fail if the filter fails.
 
 Can you clarify this second paragraph a bit more? If I understand
 correctly, we handle a non-required filter failing by just reading the
 data again (which we can do because we either read it into memory
 ourselves, or mmap it).

We don't read the data again.  convert_to_git() assumes that it is already
in memory and simply keeps the original buffer if the filter fails.


 With the streaming approach, we will read the
 whole file through our stream; if that fails we would then want to read
 the stream from the start.
 
 Couldn't we do that with an lseek (or even an mmap with offset 0)? That
 obviously would not work for non-file inputs, but I think we address
 that already in index_fd: we push non-seekable things off to index_pipe,
 where we spool them to memory.

It could be handled that way, but we would be back to the original problem
that 32-bit git fails for large files.  The convert code path currently
assumes that all data is available in a single buffer at some point to apply
crlf and ident filters.

If the initial filter, which is assumed to reduce the file size, fails, we
could seek to 0 and read the entire file.  But git would then fail for large
files with out-of-memory.  We would not gain anything for the use case that
I describe in the commit message's first paragraph.

To implement something like the ideal strategy below, the entire convert 
machinery for crlf and ident would have to be converted to a streaming
approach.  Another option would be to detect that only the clean filter
would be applied and not crlf and ident.  Maybe we could get away with
something simpler then.

But I think that if the clean filter's purpose is to reduce file size, it
does not make sense to try to handle the case of a failing filter with a 
fallback plan.  The filter should simply be marked required, because
any sane operation requires it.


 So it seems like the ideal strategy would be:
 
  1. If it's seekable, try streaming. If not, fall back to lseek/mmap.
 
  2. If it's not seekable and the filter is required, try streaming. We
 die anyway if we fail.
 
  3. If it's not seekable and the filter is not required, decide based
 on file size:
 
   a. If it's small, spool to memory and proceed as we do now.
 
   b. If it's big, spool to a seekable tempfile.
 
 Your patch implements part 2. But I would think part 1 is the most common
 case. And while part 3b seems unpleasant, it is better than the current
 code (with or without your patch), which will do 3a on a large file.
 
 Hmm. Though I guess in (3) we do not have the size up front, so it's
 complicated (we could spool N bytes to memory, then start dumping to a
 file after that). I do not think we necessarily need to implement that
 part, though. It seems like (1) is the thing I would expect to hit the
 most (i.e., people do not always mark their filters are required).

Well, I think they have to mark it if the filter's purpose is to reduce size.

I'll add a bit of the discussion to the commit message.  I'm not convinced
that we should do more at this point.


 +      } else {
 +              /* dup(), because copy_fd() closes the input fd. */
 +              fd = dup(params->fd);
 
 Not a problem you are introducing, but this seem kind of like a
 misfeature in copy_fd. Is it worth fixing? The function only has two
 existing callers.

I found it confusing.  I think it's worth fixing.

Steffen


Re: [PATCH v3 2/3] Introduce GIT_MMAP_LIMIT to allow testing expected mmap size

2014-08-24 Thread Steffen Prohaska
On Aug 22, 2014, at 6:31 PM, Junio C Hamano gits...@pobox.com wrote:

 Steffen Prohaska proha...@zib.de writes:
 
 +      if (limit == -1) {
 +              const char *env = getenv("GIT_MMAP_LIMIT");
 +              limit = env ? atoi(env) * 1024 : 0;
 
 ... this should then be changed to atol(env), and ... 
 
 In the real codepath (not debugging aid like this) we try to avoid
 atoi/atol so that we can catch errors like feeding 123Q and
 parsing it as 123.
 
 But it is OK to be loose in an debugging aid.  If I were doing
 this code, I actually would call git_parse_ulong() and not
 define it in terms of kilobytes, though.

Thanks for suggesting this.  I wasn't aware of git_parse_ulong().
I think it makes very much sense to use it, even though it's only a
testing aid.

I'll send a PATCH v5 series.

Steffen



[PATCH v5 2/4] Change GIT_ALLOC_LIMIT check to use git_parse_ulong()

2014-08-24 Thread Steffen Prohaska
GIT_ALLOC_LIMIT limits xmalloc()'s size, which is of type size_t.
Better use git_parse_ulong() to parse the environment variable, so that
the postfixes 'k', 'm', and 'g' can be used; and use size_t to store the
limit for consistency.  The change to size_t has no direct practical
impact, because we use GIT_ALLOC_LIMIT to test small sizes.

The cast of size in the call to die() is changed to uintmax_t to match
the format string PRIuMAX.

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 t/t1050-large.sh |  2 +-
 wrapper.c| 16 ++--
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index aea4936..e7657ab 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -13,7 +13,7 @@ test_expect_success setup '
        echo X | dd of=large2 bs=1k seek=2000 &&
        echo X | dd of=large3 bs=1k seek=2000 &&
        echo Y | dd of=huge bs=1k seek=2500 &&
-       GIT_ALLOC_LIMIT=1500 &&
+       GIT_ALLOC_LIMIT=1500k &&
        export GIT_ALLOC_LIMIT
 '
 
diff --git a/wrapper.c b/wrapper.c
index bc1bfb8..69d1c9b 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -11,14 +11,18 @@ static void (*try_to_free_routine)(size_t size) = do_nothing;
 
 static void memory_limit_check(size_t size)
 {
-       static int limit = -1;
-       if (limit == -1) {
-               const char *env = getenv("GIT_ALLOC_LIMIT");
-               limit = env ? atoi(env) * 1024 : 0;
+       static size_t limit = SIZE_MAX;
+       if (limit == SIZE_MAX) {
+               const char *var = "GIT_ALLOC_LIMIT";
+               unsigned long val = 0;
+               const char *env = getenv(var);
+               if (env && !git_parse_ulong(env, &val))
+                       die("Failed to parse %s", var);
+               limit = val;
        }
        if (limit && size > limit)
-               die("attempting to allocate %"PRIuMAX" over limit %d",
-                   (intmax_t)size, limit);
+               die("attempting to allocate %"PRIuMAX" over limit %"PRIuMAX,
+                   (uintmax_t)size, (uintmax_t)limit);
 }
 
 try_to_free_t set_try_to_free_routine(try_to_free_t routine)
-- 
2.1.0.8.gd3b6067



[PATCH v5 1/4] convert: Refactor would_convert_to_git() to single arg 'path'

2014-08-24 Thread Steffen Prohaska
It is only the path that matters in the decision whether to filter or
not.  Clarify this by making path the single argument of
would_convert_to_git().

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 convert.h   | 5 ++---
 sha1_file.c | 2 +-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/convert.h b/convert.h
index 0c2143c..c638b33 100644
--- a/convert.h
+++ b/convert.h
@@ -40,10 +40,9 @@ extern int convert_to_working_tree(const char *path, const char *src,
                                   size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
  struct strbuf *dst);
-static inline int would_convert_to_git(const char *path, const char *src,
-  size_t len, enum safe_crlf checksafe)
+static inline int would_convert_to_git(const char *path)
 {
-   return convert_to_git(path, src, len, NULL, checksafe);
+   return convert_to_git(path, NULL, 0, NULL, 0);
 }
 
 /*
diff --git a/sha1_file.c b/sha1_file.c
index 3f70b1d..00c07f2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3144,7 +3144,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
        if (!S_ISREG(st->st_mode))
                ret = index_pipe(sha1, fd, type, path, flags);
        else if (size <= big_file_threshold || type != OBJ_BLOB ||
-                (path && would_convert_to_git(path, NULL, 0, 0)))
+                (path && would_convert_to_git(path)))
ret = index_core(sha1, fd, size, type, path, flags);
else
ret = index_stream(sha1, fd, size, type, path, flags);
-- 
2.1.0.8.gd3b6067



[PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap

2014-08-24 Thread Steffen Prohaska
The data is streamed to the filter process anyway.  Better avoid mapping
the file if possible.  This is especially useful if a clean filter
reduces the size, for example if it computes a sha1 for binary data,
like git media.  The file size that the previous implementation could
handle was limited by the available address space; large files for
example could not be handled with (32-bit) msysgit.  The new
implementation can filter files of any size as long as the filter output
is small enough.

The new code path is only taken if the filter is required.  The filter
consumes data directly from the fd.  The original data is not available
to git, so it must fail if the filter fails.

The environment variable GIT_MMAP_LIMIT, which has been introduced in
the previous commit, is used to test that the expected code path is
taken.  A related test that exercises required filters is modified to
verify that the data actually has been modified on its way from the file
system to the object store.

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 convert.c | 60 +--
 convert.h |  5 +
 sha1_file.c   | 27 ++-
 t/t0021-conversion.sh | 24 -
 4 files changed, 104 insertions(+), 12 deletions(-)

diff --git a/convert.c b/convert.c
index cb5fbb4..463f6de 100644
--- a/convert.c
+++ b/convert.c
@@ -312,11 +312,12 @@ static int crlf_to_worktree(const char *path, const char *src, size_t len,
 struct filter_params {
const char *src;
unsigned long size;
+   int fd;
const char *cmd;
const char *path;
 };
 
-static int filter_buffer(int in, int out, void *data)
+static int filter_buffer_or_fd(int in, int out, void *data)
 {
/*
 * Spawn cmd and feed the buffer contents through its stdin.
@@ -325,6 +326,7 @@ static int filter_buffer(int in, int out, void *data)
struct filter_params *params = (struct filter_params *)data;
int write_err, status;
const char *argv[] = { NULL, NULL };
+   int fd;
 
/* apply % substitution to cmd */
struct strbuf cmd = STRBUF_INIT;
@@ -355,7 +357,17 @@ static int filter_buffer(int in, int out, void *data)
 
        sigchain_push(SIGPIPE, SIG_IGN);
 
-       write_err = (write_in_full(child_process.in, params->src, params->size) < 0);
+       if (params->src) {
+               write_err = (write_in_full(child_process.in, params->src, params->size) < 0);
+       } else {
+               /* dup(), because copy_fd() closes the input fd. */
+               fd = dup(params->fd);
+               if (fd < 0)
+                       write_err = error("failed to dup file descriptor.");
+               else
+                       write_err = copy_fd(fd, child_process.in);
+       }
+
if (close(child_process.in))
write_err = 1;
if (write_err)
@@ -371,7 +383,7 @@ static int filter_buffer(int in, int out, void *data)
return (write_err || status);
 }
 
-static int apply_filter(const char *path, const char *src, size_t len,
+static int apply_filter(const char *path, const char *src, size_t len, int fd,
 struct strbuf *dst, const char *cmd)
 {
/*
@@ -392,11 +404,12 @@ static int apply_filter(const char *path, const char *src, size_t len,
return 1;
 
        memset(&async, 0, sizeof(async));
-       async.proc = filter_buffer;
+       async.proc = filter_buffer_or_fd;
        async.data = &params;
async.out = -1;
params.src = src;
params.size = len;
+   params.fd = fd;
params.cmd = cmd;
params.path = path;
 
@@ -747,6 +760,24 @@ static void convert_attrs(struct conv_attrs *ca, const char *path)
}
 }
 
+int would_convert_to_git_filter_fd(const char *path)
+{
+       struct conv_attrs ca;
+
+       convert_attrs(&ca, path);
+       if (!ca.drv)
+               return 0;
+
+       /* Apply a filter to an fd only if the filter is required to succeed.
+        * We must die if the filter fails, because the original data before
+        * filtering is not available.
+        */
+       if (!ca.drv->required)
+               return 0;
+
+       return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
+}
+
 int convert_to_git(const char *path, const char *src, size_t len,
struct strbuf *dst, enum safe_crlf checksafe)
 {
@@ -761,7 +792,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
                required = ca.drv->required;
        }
 
-       ret |= apply_filter(path, src, len, dst, filter);
+       ret |= apply_filter(path, src, len, -1, dst, filter);
        if (!ret && required)
                die("%s: clean filter '%s' failed", path, ca.drv->name);
 
@@ -778,6 +809,23 @@ int convert_to_git(const char *path, const char *src, size_t len,
return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
 
+void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst

[PATCH v5 0/4] Stream fd to clean filter; GIT_MMAP_LIMIT, GIT_ALLOC_LIMIT with git_parse_ulong()

2014-08-24 Thread Steffen Prohaska
Changes since v4: use git_parse_ulong() to parse env vars.

Steffen Prohaska (4):
  convert: Refactor would_convert_to_git() to single arg 'path'
  Change GIT_ALLOC_LIMIT check to use git_parse_ulong()
  Introduce GIT_MMAP_LIMIT to allow testing expected mmap size
  convert: Stream from fd to required clean filter instead of mmap

 convert.c | 60 +--
 convert.h | 10 ++---
 sha1_file.c   | 50 +++---
 t/t0021-conversion.sh | 24 -
 t/t1050-large.sh  |  2 +-
 wrapper.c | 16 --
 6 files changed, 138 insertions(+), 24 deletions(-)

-- 
2.1.0.8.gd3b6067



[PATCH v5 3/4] Introduce GIT_MMAP_LIMIT to allow testing expected mmap size

2014-08-24 Thread Steffen Prohaska
Similar to testing expectations about malloc with GIT_ALLOC_LIMIT (see
commit d41489 and previous commit), it can be useful to test
expectations about mmap.

This introduces a new environment variable GIT_MMAP_LIMIT to limit the
largest allowed mmap length.  xmmap() is modified to check the limit.
Together with GIT_ALLOC_LIMIT tests can now easily confirm expectations
about memory consumption.

GIT_MMAP_LIMIT will be used in the next commit to test that data will be
streamed to an external filter without mmaping the entire file.

[commit d41489]: d41489a6424308dc9a0409bc2f6845aa08bd4f7d Add more large
blob test cases

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 sha1_file.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 00c07f2..3204f66 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -663,10 +663,29 @@ void release_pack_memory(size_t need)
; /* nothing */
 }
 
+static void mmap_limit_check(size_t length)
+{
+       static size_t limit = SIZE_MAX;
+       if (limit == SIZE_MAX) {
+               const char *var = "GIT_MMAP_LIMIT";
+               unsigned long val = 0;
+               const char *env = getenv(var);
+               if (env && !git_parse_ulong(env, &val))
+                       die("Failed to parse %s", var);
+               limit = val;
+       }
+       if (limit && length > limit)
+               die("attempting to mmap %"PRIuMAX" over limit %"PRIuMAX,
+                   (uintmax_t)length, (uintmax_t)limit);
+}
+
 void *xmmap(void *start, size_t length,
int prot, int flags, int fd, off_t offset)
 {
-   void *ret = mmap(start, length, prot, flags, fd, offset);
+   void *ret;
+
+   mmap_limit_check(length);
+   ret = mmap(start, length, prot, flags, fd, offset);
if (ret == MAP_FAILED) {
if (!length)
return NULL;
-- 
2.1.0.8.gd3b6067



Re: [PATCH v3 2/3] Introduce GIT_MMAP_LIMIT to allow testing expected mmap size

2014-08-22 Thread Steffen Prohaska

On Aug 22, 2014, at 12:26 AM, Junio C Hamano gits...@pobox.com wrote:

 Steffen Prohaska proha...@zib.de writes:
 
 Similar to testing expectations about malloc with GIT_ALLOC_LIMIT (see
 commit d41489), it can be useful to test expectations about mmap.
 
 This introduces a new environment variable GIT_MMAP_LIMIT to limit the
 largest allowed mmap length (in KB).  xmmap() is modified to check the
 limit.  Together with GIT_ALLOC_LIMIT tests can now easily confirm
 expectations about memory consumption.
 
 GIT_ALLOC_LIMIT will be used in the next commit to test that data will
 
 I smell the need for s/ALLOC/MMAP/ here, but perhaps you did mean
 ALLOC (I won't know until I check 3/3 ;-)

You are right.


 diff --git a/sha1_file.c b/sha1_file.c
 index 00c07f2..88d64c0 100644
 --- a/sha1_file.c
 +++ b/sha1_file.c
 @@ -663,10 +663,25 @@ void release_pack_memory(size_t need)
  ; /* nothing */
 }
 
 +static void mmap_limit_check(size_t length)
 +{
 +static int limit = -1;
 
 Perhaps you want ssize_t here?  I see mmap() as a tool to handle a
 lot more data than a single malloc() typically would ;-) so previous
 mistakes by other people would not be a good excuse.

Maybe; and ...


 +      if (limit == -1) {
 +              const char *env = getenv("GIT_MMAP_LIMIT");
 +              limit = env ? atoi(env) * 1024 : 0;

... this should then be changed to atol(env), and ... 


 +      }
 +      if (limit && length > limit)
 +              die("attempting to mmap %"PRIuMAX" over limit %d",
 +                  (intmax_t)length, limit);

... here PRIuMAX and (uintmax_t); (uintmax_t) also for length.

I'll fix it and also GIT_ALLOC_LIMIT.

Steffen


 +}
 +
 void *xmmap(void *start, size_t length,
  int prot, int flags, int fd, off_t offset)
 {
 -void *ret = mmap(start, length, prot, flags, fd, offset);
 +void *ret;
 +
 +mmap_limit_check(length);
 +ret = mmap(start, length, prot, flags, fd, offset);
  if (ret == MAP_FAILED) {
  if (!length)
  return NULL;



[PATCH v4 3/4] Change GIT_ALLOC_LIMIT check to use ssize_t for consistency

2014-08-22 Thread Steffen Prohaska
GIT_ALLOC_LIMIT limits xmalloc()'s size, which is of type size_t.  Use
ssize_t to store the limit for consistency.  The change has no direct
practical impact, because we use GIT_ALLOC_LIMIT to test small sizes.

The cast of size in the call to die() is changed to uintmax_t to match
the format string PRIuMAX.

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 wrapper.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/wrapper.c b/wrapper.c
index bc1bfb8..e91f6e9 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -11,14 +11,14 @@ static void (*try_to_free_routine)(size_t size) = 
do_nothing;
 
 static void memory_limit_check(size_t size)
 {
-   static int limit = -1;
+   static ssize_t limit = -1;
if (limit == -1) {
const char *env = getenv("GIT_ALLOC_LIMIT");
-   limit = env ? atoi(env) * 1024 : 0;
+   limit = env ? atol(env) * 1024 : 0;
}
if (limit && size > limit)
-   die("attempting to allocate %"PRIuMAX" over limit %d",
-   (intmax_t)size, limit);
+   die("attempting to allocate %"PRIuMAX" over limit %"PRIuMAX,
+   (uintmax_t)size, (uintmax_t)limit);
 }
 
 try_to_free_t set_try_to_free_routine(try_to_free_t routine)
-- 
2.1.0.6.gb452461



[PATCH v4 1/4] convert: Refactor would_convert_to_git() to single arg 'path'

2014-08-22 Thread Steffen Prohaska
It is only the path that matters in the decision whether to filter or
not.  Clarify this by making path the single argument of
would_convert_to_git().

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 convert.h   | 5 ++---
 sha1_file.c | 2 +-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/convert.h b/convert.h
index 0c2143c..c638b33 100644
--- a/convert.h
+++ b/convert.h
@@ -40,10 +40,9 @@ extern int convert_to_working_tree(const char *path, const 
char *src,
   size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
  struct strbuf *dst);
-static inline int would_convert_to_git(const char *path, const char *src,
-  size_t len, enum safe_crlf checksafe)
+static inline int would_convert_to_git(const char *path)
 {
-   return convert_to_git(path, src, len, NULL, checksafe);
+   return convert_to_git(path, NULL, 0, NULL, 0);
 }
 
 /*
diff --git a/sha1_file.c b/sha1_file.c
index 3f70b1d..00c07f2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3144,7 +3144,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
if (!S_ISREG(st->st_mode))
ret = index_pipe(sha1, fd, type, path, flags);
else if (size >= big_file_threshold || type != OBJ_BLOB ||
-(path && would_convert_to_git(path, NULL, 0, 0)))
+(path && would_convert_to_git(path)))
ret = index_core(sha1, fd, size, type, path, flags);
else
ret = index_stream(sha1, fd, size, type, path, flags);
-- 
2.1.0.6.gb452461



[PATCH v4 2/4] Introduce GIT_MMAP_LIMIT to allow testing expected mmap size

2014-08-22 Thread Steffen Prohaska
Similar to testing expectations about malloc with GIT_ALLOC_LIMIT (see
commit d41489), it can be useful to test expectations about mmap.

This introduces a new environment variable GIT_MMAP_LIMIT to limit the
largest allowed mmap length (in KB).  xmmap() is modified to check the
limit.  Together with GIT_ALLOC_LIMIT tests can now easily confirm
expectations about memory consumption.

GIT_MMAP_LIMIT will be used in the next commit to test that data will be
streamed to an external filter without mmaping the entire file.

[commit d41489]: d41489a6424308dc9a0409bc2f6845aa08bd4f7d Add more large
blob test cases

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 sha1_file.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 00c07f2..603673b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -663,10 +663,25 @@ void release_pack_memory(size_t need)
; /* nothing */
 }
 
+static void mmap_limit_check(size_t length)
+{
+   static ssize_t limit = -1;
+   if (limit == -1) {
+   const char *env = getenv("GIT_MMAP_LIMIT");
+   limit = env ? atol(env) * 1024 : 0;
+   }
+   if (limit && length > limit)
+   die("attempting to mmap %"PRIuMAX" over limit %"PRIuMAX,
+   (uintmax_t)length, (uintmax_t)limit);
+}
+
 void *xmmap(void *start, size_t length,
int prot, int flags, int fd, off_t offset)
 {
-   void *ret = mmap(start, length, prot, flags, fd, offset);
+   void *ret;
+
+   mmap_limit_check(length);
+   ret = mmap(start, length, prot, flags, fd, offset);
if (ret == MAP_FAILED) {
if (!length)
return NULL;
-- 
2.1.0.6.gb452461



[PATCH v4 0/4] Stream fd to clean filter, GIT_MMAP_LIMIT, GIT_ALLOC_LIMIT

2014-08-22 Thread Steffen Prohaska
Only changes since PATCH v3: Use ssize_t to store limits.

Steffen Prohaska (4):
  convert: Refactor would_convert_to_git() to single arg 'path'
  Introduce GIT_MMAP_LIMIT to allow testing expected mmap size
  Change GIT_ALLOC_LIMIT check to use ssize_t for consistency
  convert: Stream from fd to required clean filter instead of mmap

 convert.c | 60 +--
 convert.h | 10 ++---
 sha1_file.c   | 46 ---
 t/t0021-conversion.sh | 24 -
 wrapper.c |  8 +++
 5 files changed, 127 insertions(+), 21 deletions(-)

-- 
2.1.0.6.gb452461



[PATCH v4 4/4] convert: Stream from fd to required clean filter instead of mmap

2014-08-22 Thread Steffen Prohaska
The data is streamed to the filter process anyway.  Better avoid mapping
the file if possible.  This is especially useful if a clean filter
reduces the size, for example if it computes a sha1 for binary data,
like git media.  The file size that the previous implementation could
handle was limited by the available address space; large files for
example could not be handled with (32-bit) msysgit.  The new
implementation can filter files of any size as long as the filter output
is small enough.

The new code path is only taken if the filter is required.  The filter
consumes data directly from the fd.  The original data is not available
to git, so it must fail if the filter fails.

The environment variable GIT_MMAP_LIMIT, which has been introduced in
the previous commit is used to test that the expected code path is
taken.  A related test that exercises required filters is modified to
verify that the data actually has been modified on its way from the file
system to the object store.
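
As a rough illustration of the "git media"-like use case above, a required
clean filter that shrinks arbitrarily large worktree files to a short hash
could be set up like this (filter name and command are illustrative only; the
filter is deliberately not reversible and exists just to show the size
reduction):

```sh
# The object store then only ever sees ~40 bytes per file,
# no matter how large the worktree file is.
git config filter.hashonly.clean sha1sum
git config filter.hashonly.required true
echo "bigdata.bin filter=hashonly" >>.gitattributes
```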

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 convert.c | 60 +--
 convert.h |  5 +
 sha1_file.c   | 27 ++-
 t/t0021-conversion.sh | 24 -
 4 files changed, 104 insertions(+), 12 deletions(-)

diff --git a/convert.c b/convert.c
index cb5fbb4..463f6de 100644
--- a/convert.c
+++ b/convert.c
@@ -312,11 +312,12 @@ static int crlf_to_worktree(const char *path, const char 
*src, size_t len,
 struct filter_params {
const char *src;
unsigned long size;
+   int fd;
const char *cmd;
const char *path;
 };
 
-static int filter_buffer(int in, int out, void *data)
+static int filter_buffer_or_fd(int in, int out, void *data)
 {
/*
 * Spawn cmd and feed the buffer contents through its stdin.
@@ -325,6 +326,7 @@ static int filter_buffer(int in, int out, void *data)
struct filter_params *params = (struct filter_params *)data;
int write_err, status;
const char *argv[] = { NULL, NULL };
+   int fd;
 
/* apply % substitution to cmd */
struct strbuf cmd = STRBUF_INIT;
@@ -355,7 +357,17 @@ static int filter_buffer(int in, int out, void *data)
 
sigchain_push(SIGPIPE, SIG_IGN);
 
-   write_err = (write_in_full(child_process.in, params->src, params->size) < 0);
+   if (params->src) {
+   write_err = (write_in_full(child_process.in, params->src, params->size) < 0);
+   } else {
+   /* dup(), because copy_fd() closes the input fd. */
+   fd = dup(params->fd);
+   if (fd < 0)
+   write_err = error("failed to dup file descriptor.");
+   else
+   write_err = copy_fd(fd, child_process.in);
+   }
+
if (close(child_process.in))
write_err = 1;
if (write_err)
@@ -371,7 +383,7 @@ static int filter_buffer(int in, int out, void *data)
return (write_err || status);
 }
 
-static int apply_filter(const char *path, const char *src, size_t len,
+static int apply_filter(const char *path, const char *src, size_t len, int fd,
 struct strbuf *dst, const char *cmd)
 {
/*
@@ -392,11 +404,12 @@ static int apply_filter(const char *path, const char 
*src, size_t len,
return 1;
 
memset(&async, 0, sizeof(async));
-   async.proc = filter_buffer;
+   async.proc = filter_buffer_or_fd;
async.data = &params;
async.out = -1;
params.src = src;
params.size = len;
+   params.fd = fd;
params.cmd = cmd;
params.path = path;
 
@@ -747,6 +760,24 @@ static void convert_attrs(struct conv_attrs *ca, const 
char *path)
}
 }
 
+int would_convert_to_git_filter_fd(const char *path)
+{
+   struct conv_attrs ca;
+
+   convert_attrs(&ca, path);
+   if (!ca.drv)
+   return 0;
+
+   /* Apply a filter to an fd only if the filter is required to succeed.
+* We must die if the filter fails, because the original data before
+* filtering is not available.
+*/
+   if (!ca.drv->required)
+   return 0;
+
+   return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
+}
+
 int convert_to_git(const char *path, const char *src, size_t len,
struct strbuf *dst, enum safe_crlf checksafe)
 {
@@ -761,7 +792,7 @@ int convert_to_git(const char *path, const char *src, 
size_t len,
required = ca.drv->required;
}
 
-   ret |= apply_filter(path, src, len, dst, filter);
+   ret |= apply_filter(path, src, len, -1, dst, filter);
if (!ret && required)
die("%s: clean filter '%s' failed", path, ca.drv->name);
 
@@ -778,6 +809,23 @@ int convert_to_git(const char *path, const char *src, 
size_t len,
return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
 
+void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst

[PATCH v3 1/3] convert: Refactor would_convert_to_git() to single arg 'path'

2014-08-21 Thread Steffen Prohaska
It is only the path that matters in the decision whether to filter or
not.  Clarify this by making path the single argument of
would_convert_to_git().

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 convert.h   | 5 ++---
 sha1_file.c | 2 +-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/convert.h b/convert.h
index 0c2143c..c638b33 100644
--- a/convert.h
+++ b/convert.h
@@ -40,10 +40,9 @@ extern int convert_to_working_tree(const char *path, const 
char *src,
   size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
  struct strbuf *dst);
-static inline int would_convert_to_git(const char *path, const char *src,
-  size_t len, enum safe_crlf checksafe)
+static inline int would_convert_to_git(const char *path)
 {
-   return convert_to_git(path, src, len, NULL, checksafe);
+   return convert_to_git(path, NULL, 0, NULL, 0);
 }
 
 /*
diff --git a/sha1_file.c b/sha1_file.c
index 3f70b1d..00c07f2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3144,7 +3144,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
if (!S_ISREG(st->st_mode))
ret = index_pipe(sha1, fd, type, path, flags);
else if (size >= big_file_threshold || type != OBJ_BLOB ||
-(path && would_convert_to_git(path, NULL, 0, 0)))
+(path && would_convert_to_git(path)))
ret = index_core(sha1, fd, size, type, path, flags);
else
ret = index_stream(sha1, fd, size, type, path, flags);
-- 
2.1.0.6.gb452461



[PATCH v3 2/3] Introduce GIT_MMAP_LIMIT to allow testing expected mmap size

2014-08-21 Thread Steffen Prohaska
Similar to testing expectations about malloc with GIT_ALLOC_LIMIT (see
commit d41489), it can be useful to test expectations about mmap.

This introduces a new environment variable GIT_MMAP_LIMIT to limit the
largest allowed mmap length (in KB).  xmmap() is modified to check the
limit.  Together with GIT_ALLOC_LIMIT tests can now easily confirm
expectations about memory consumption.

GIT_ALLOC_LIMIT will be used in the next commit to test that data will
be streamed to an external filter without mmaping the entire file.

[commit d41489]: d41489a6424308dc9a0409bc2f6845aa08bd4f7d Add more large
blob test cases

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 sha1_file.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 00c07f2..88d64c0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -663,10 +663,25 @@ void release_pack_memory(size_t need)
; /* nothing */
 }
 
+static void mmap_limit_check(size_t length)
+{
+   static int limit = -1;
+   if (limit == -1) {
+   const char *env = getenv("GIT_MMAP_LIMIT");
+   limit = env ? atoi(env) * 1024 : 0;
+   }
+   if (limit && length > limit)
+   die("attempting to mmap %"PRIuMAX" over limit %d",
+   (intmax_t)length, limit);
+}
+
 void *xmmap(void *start, size_t length,
int prot, int flags, int fd, off_t offset)
 {
-   void *ret = mmap(start, length, prot, flags, fd, offset);
+   void *ret;
+
+   mmap_limit_check(length);
+   ret = mmap(start, length, prot, flags, fd, offset);
if (ret == MAP_FAILED) {
if (!length)
return NULL;
-- 
2.1.0.6.gb452461



[PATCH v3 3/3] convert: Stream from fd to required clean filter instead of mmap

2014-08-21 Thread Steffen Prohaska
The data is streamed to the filter process anyway.  Better avoid mapping
the file if possible.  This is especially useful if a clean filter
reduces the size, for example if it computes a sha1 for binary data,
like git media.  The file size that the previous implementation could
handle was limited by the available address space; large files for
example could not be handled with (32-bit) msysgit.  The new
implementation can filter files of any size as long as the filter output
is small enough.

The new code path is only taken if the filter is required.  The filter
consumes data directly from the fd.  The original data is not available
to git, so it must fail if the filter fails.

The environment variable GIT_MMAP_LIMIT, which has been introduced in
the previous commit is used to test that the expected code path is
taken.  A related test that exercises required filters is modified to
verify that the data actually has been modified on its way from the file
system to the object store.

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 convert.c | 60 +--
 convert.h |  5 +
 sha1_file.c   | 27 ++-
 t/t0021-conversion.sh | 24 -
 4 files changed, 104 insertions(+), 12 deletions(-)

diff --git a/convert.c b/convert.c
index cb5fbb4..463f6de 100644
--- a/convert.c
+++ b/convert.c
@@ -312,11 +312,12 @@ static int crlf_to_worktree(const char *path, const char 
*src, size_t len,
 struct filter_params {
const char *src;
unsigned long size;
+   int fd;
const char *cmd;
const char *path;
 };
 
-static int filter_buffer(int in, int out, void *data)
+static int filter_buffer_or_fd(int in, int out, void *data)
 {
/*
 * Spawn cmd and feed the buffer contents through its stdin.
@@ -325,6 +326,7 @@ static int filter_buffer(int in, int out, void *data)
struct filter_params *params = (struct filter_params *)data;
int write_err, status;
const char *argv[] = { NULL, NULL };
+   int fd;
 
/* apply % substitution to cmd */
struct strbuf cmd = STRBUF_INIT;
@@ -355,7 +357,17 @@ static int filter_buffer(int in, int out, void *data)
 
sigchain_push(SIGPIPE, SIG_IGN);
 
-   write_err = (write_in_full(child_process.in, params->src, params->size) < 0);
+   if (params->src) {
+   write_err = (write_in_full(child_process.in, params->src, params->size) < 0);
+   } else {
+   /* dup(), because copy_fd() closes the input fd. */
+   fd = dup(params->fd);
+   if (fd < 0)
+   write_err = error("failed to dup file descriptor.");
+   else
+   write_err = copy_fd(fd, child_process.in);
+   }
+
if (close(child_process.in))
write_err = 1;
if (write_err)
@@ -371,7 +383,7 @@ static int filter_buffer(int in, int out, void *data)
return (write_err || status);
 }
 
-static int apply_filter(const char *path, const char *src, size_t len,
+static int apply_filter(const char *path, const char *src, size_t len, int fd,
 struct strbuf *dst, const char *cmd)
 {
/*
@@ -392,11 +404,12 @@ static int apply_filter(const char *path, const char 
*src, size_t len,
return 1;
 
memset(&async, 0, sizeof(async));
-   async.proc = filter_buffer;
+   async.proc = filter_buffer_or_fd;
async.data = &params;
async.out = -1;
params.src = src;
params.size = len;
+   params.fd = fd;
params.cmd = cmd;
params.path = path;
 
@@ -747,6 +760,24 @@ static void convert_attrs(struct conv_attrs *ca, const 
char *path)
}
 }
 
+int would_convert_to_git_filter_fd(const char *path)
+{
+   struct conv_attrs ca;
+
+   convert_attrs(&ca, path);
+   if (!ca.drv)
+   return 0;
+
+   /* Apply a filter to an fd only if the filter is required to succeed.
+* We must die if the filter fails, because the original data before
+* filtering is not available.
+*/
+   if (!ca.drv->required)
+   return 0;
+
+   return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
+}
+
 int convert_to_git(const char *path, const char *src, size_t len,
struct strbuf *dst, enum safe_crlf checksafe)
 {
@@ -761,7 +792,7 @@ int convert_to_git(const char *path, const char *src, 
size_t len,
required = ca.drv->required;
}
 
-   ret |= apply_filter(path, src, len, dst, filter);
+   ret |= apply_filter(path, src, len, -1, dst, filter);
if (!ret && required)
die("%s: clean filter '%s' failed", path, ca.drv->name);
 
@@ -778,6 +809,23 @@ int convert_to_git(const char *path, const char *src, 
size_t len,
return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
 
+void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst

[PATCH v3 0/3] Stream fd to clean filter, GIT_MMAP_LIMIT

2014-08-21 Thread Steffen Prohaska
I revised the testing approach as discussed.  Patch 2/3 adds GIT_MMAP_LIMIT,
which allows testing of memory expectations together with GIT_ALLOC_LIMIT.

The rest is unchanged compared to v2.

Steffen Prohaska (3):
  convert: Refactor would_convert_to_git() to single arg 'path'
  Introduce GIT_MMAP_LIMIT to allow testing expected mmap size
  convert: Stream from fd to required clean filter instead of mmap

 convert.c | 60 +--
 convert.h | 10 ++---
 sha1_file.c   | 46 ---
 t/t0021-conversion.sh | 24 -
 4 files changed, 123 insertions(+), 17 deletions(-)

-- 
2.1.0.6.gb452461



Re: [PATCH v2 2/2] convert: Stream from fd to required clean filter instead of mmap

2014-08-19 Thread Steffen Prohaska

On Aug 19, 2014, at 9:53 AM, Jeff King p...@peff.net wrote:

 For files >2GB on a 32-bit system (e.g. msysgit), filtering with the
 previous code always failed.  Now it works.  I created the patch to
 change git from 'fundamentally doesn't handle this' to 'works as
 expected'.
 
 I deal with similar problem and added $GIT_ALLOC_LIMIT to test large
 blob code. Maybe we could add $GIT_MMAP_LIMIT? I don't suppose we call
 xmmap/xmalloc so often that the extra test would slow git down.
 
 Yeah, I think I'd prefer that approach. We should mmap _way_ less than
 we malloc, and I do not think the malloc check has caused any problems.

I am going to update the patch accordingly.

Steffen



Re: [PATCH v2 2/2] convert: Stream from fd to required clean filter instead of mmap

2014-08-17 Thread Steffen Prohaska

On Aug 16, 2014, at 7:00 PM, Andreas Schwab sch...@linux-m68k.org wrote:

 Steffen Prohaska proha...@zib.de writes:
 
 The test should confirm that the file that is added is not mmapped to
 memory.
 
 RSS doesn't tell you that.  You can mmap a big file without RSS getting
 bigger.

All data are accessed after mapping, so RSS should be fine.  The test always 
did what I expected.

Steffen



Re: [PATCH v2 2/2] convert: Stream from fd to required clean filter instead of mmap

2014-08-17 Thread Steffen Prohaska

On Aug 17, 2014, at 9:27 AM, Jeff King p...@peff.net wrote:

 On Sat, Aug 16, 2014 at 06:26:08PM +0200, Steffen Prohaska wrote:
 
 Is the 15MB limit supposed to be imposed somewhere or is it just a guide
 of how much memory we expect Git to use in this scenario?
 
 The test should confirm that the file that is added is not mmapped
 to memory.  The process size should be relatively small independently
 of the size of the file that is added.  I wanted to keep the file size
 small.  The chosen sizes worked for me on Mac and Linux.
 
 Measuring memory usage seems inherently a bit flaky for the test suite.
 It's also a little out of place, as the test suite is generally about
 correctness and outcomes, and this is a performance issue.

For files >2GB on a 32-bit system (e.g. msysgit), filtering with the previous 
code always failed.  Now it works.  I created the patch to change git from 
'fundamentally doesn't handle this' to 'works as expected'.


 Would it make more sense to construct a t/perf test that shows off the
 change? I suppose the run-time change may not be that impressive, but it
 would be cool if t/perf could measure max memory use, too. Then we can
 just compare results between versions, which is enough to detect
 regressions.

I wasn't aware of t/perf.  Thanks for suggesting this.

I agree that testing memory usage might be a bit flaky.  t/perf might indeed be 
a better place.

I'm not yet entirely convinced, though.  I'm wondering whether the proposed 
test would be robust enough with a large enough threshold to keep it in the 
main test suite.
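
For comparison, a t/perf script for this code path might look roughly like the
sketch below.  It only measures run time, since perf-lib has no max-memory
reporting today; the filter setup mirrors the test above, and the file name and
sizes are illustrative:

```sh
#!/bin/sh
test_description='perf of adding a large file through a required clean filter'
. ./perf-lib.sh

test_perf_default_repo

test_expect_success 'setup file with required clean filter' '
	git config filter.devnull.clean "cat >/dev/null" &&
	git config filter.devnull.required true &&
	for i in $(test_seq 1 30); do printf "%1048576d" 1; done >30MB &&
	echo "30MB filter=devnull" >.gitattributes
'

test_perf 'git add 30MB through required clean filter' '
	# touch invalidates the cached stat info, so the filter runs every round
	touch 30MB &&
	git add 30MB
'

test_done
```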

Steffen


Re: [PATCH v2 2/2] convert: Stream from fd to required clean filter instead of mmap

2014-08-16 Thread Steffen Prohaska

On Aug 16, 2014, at 12:27 PM, John Keeping j...@keeping.me.uk wrote:

 +test_expect_success HAVE_MAX_MEM_USAGE \
 +'filtering large input to small output should use little memory' '
 +git config filter.devnull.clean "cat >/dev/null" &&
 +git config filter.devnull.required true &&
 +for i in $(test_seq 1 30); do printf "%1048576d" 1; done >30MB &&
 +echo "30MB filter=devnull" >.gitattributes &&
 +max_mem_usage_is_lt_KB 15000 git add 30MB
 +'
 
 This test fails for me:
 
 -- >8 --
 expecting success: 
git config filter.devnull.clean "cat >/dev/null" &&
git config filter.devnull.required true &&
for i in $(test_seq 1 30); do printf "%1048576d" 1; done >30MB &&
echo "30MB filter=devnull" >.gitattributes &&
max_mem_usage_is_lt_KB 15000 git add 30MB
 
 Command used too much memory (expected limit 15000KB, actual usage 15808KB).
 not ok 8 - filtering large input to small output should use little memory
 -- >8 --
 
 This is on Linux 3.16 x86_64 with GCC 4.8.3 and glibc 2.19.  My GCC also
 has -fstack-protector and -D_FORTIFY_SOURCE=2 enabled by default;
 turning those off for Git decreased the memory usage by about 500KB but
 not enough to make the test pass.  Of course, all of the libraries Git
 is linking are also compiled with those flags...
 
 Is the 15MB limit supposed to be imposed somewhere or is it just a guide
 of how much memory we expect Git to use in this scenario?

The test should confirm that the file that is added is not mmapped to 
memory.  The process size should be relatively small independently of the size 
of the file that is added.  I wanted to keep the file size small.  The chosen 
sizes worked for me on Mac and Linux.

A simple solution could be to increase the size of the test file and the limit. 
 I'd suggest to squash the diff below.  The test should still run reasonably 
fast with a 50 MB test file.

Junio, shall I send a whole updated patch series?

Steffen


diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 2cb2414..43de87d 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -243,9 +243,9 @@ test_expect_success HAVE_MAX_MEM_USAGE \
 'filtering large input to small output should use little memory' '
git config filter.devnull.clean "cat >/dev/null" &&
git config filter.devnull.required true &&
-   for i in $(test_seq 1 30); do printf "%1048576d" 1; done >30MB &&
-   echo "30MB filter=devnull" >.gitattributes &&
-   max_mem_usage_is_lt_KB 15000 git add 30MB
+   for i in $(test_seq 1 50); do printf "%1048576d" 1; done >50MB &&
+   echo "50MB filter=devnull" >.gitattributes &&
+   max_mem_usage_is_lt_KB 35000 git add 50MB
 '






Re: [PATCH] convert: Stream from fd to required clean filter instead of mmap

2014-08-05 Thread Steffen Prohaska
I'll address the comments about style in a revised patch.


On Aug 4, 2014, at 9:03 PM, Junio C Hamano gits...@pobox.com wrote:

 +return apply_filter(path, 0, 0, -1, 0, ca.drv->clean);
 
 What's the significance of -1 here?  Does somebody in the
 callchain from apply_filter() check if fd < 0 and act differently
 (not a complaint nor rhetoric question)?

The decision in apply_filter() to return before the real work is
taken on the first 0.  Any value for fd would be ok.  The -1 is
only a reminder that the fd is invalid.

Steffen


[PATCH v2 1/2] convert: Refactor would_convert_to_git() to single arg 'path'

2014-08-05 Thread Steffen Prohaska
It is only the path that matters in the decision whether to filter or
not.  Clarify this by making path the single argument of
would_convert_to_git().

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 convert.h   | 5 ++---
 sha1_file.c | 2 +-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/convert.h b/convert.h
index 0c2143c..c638b33 100644
--- a/convert.h
+++ b/convert.h
@@ -40,10 +40,9 @@ extern int convert_to_working_tree(const char *path, const 
char *src,
   size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
  struct strbuf *dst);
-static inline int would_convert_to_git(const char *path, const char *src,
-  size_t len, enum safe_crlf checksafe)
+static inline int would_convert_to_git(const char *path)
 {
-   return convert_to_git(path, src, len, NULL, checksafe);
+   return convert_to_git(path, NULL, 0, NULL, 0);
 }
 
 /*
diff --git a/sha1_file.c b/sha1_file.c
index 3f70b1d..00c07f2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3144,7 +3144,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
if (!S_ISREG(st->st_mode))
ret = index_pipe(sha1, fd, type, path, flags);
else if (size >= big_file_threshold || type != OBJ_BLOB ||
-(path && would_convert_to_git(path, NULL, 0, 0)))
+(path && would_convert_to_git(path)))
ret = index_core(sha1, fd, size, type, path, flags);
else
ret = index_stream(sha1, fd, size, type, path, flags);
-- 
2.1.0.rc1.6.gb569bd8



[PATCH v2 0/2] Stream fd to clean filter

2014-08-05 Thread Steffen Prohaska
The main difference to the previous version is that I've split off the
refactoring into a separate commit.  The rest is polishing the style.

Steffen Prohaska (2):
  convert: Refactor would_convert_to_git() to single arg 'path'
  convert: Stream from fd to required clean filter instead of mmap

 convert.c | 60 -
 convert.h | 10 +---
 sha1_file.c   | 29 --
 t/t0021-conversion.sh | 68 +++
 4 files changed, 151 insertions(+), 16 deletions(-)

-- 
2.1.0.rc1.6.gb569bd8



[PATCH v2 2/2] convert: Stream from fd to required clean filter instead of mmap

2014-08-05 Thread Steffen Prohaska
The data is streamed to the filter process anyway.  Better avoid mapping
the file if possible.  This is especially useful if a clean filter
reduces the size, for example if it computes a sha1 for binary data,
like git media.  The file size that the previous implementation could
handle was limited by the available address space; large files for
example could not be handled with (32-bit) msysgit.  The new
implementation can filter files of any size as long as the filter output
is small enough.

The new code path is only taken if the filter is required.  The filter
consumes data directly from the fd.  The original data is not available
to git, so it must fail if the filter fails.

The test that exercises required filters is modified to verify that the
data actually has been modified on its way from the file system to the
object store.

The expectation on the process size is tested using /usr/bin/time.  An
alternative would have been tcsh, which could be used to print memory
information as follows:

tcsh -c 'set time=(0 %M); cmd'

Although the logic could perhaps be simplified with tcsh, I chose to use
'time' to avoid a dependency on tcsh.
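
A helper in that spirit might look like the following GNU-time-only sketch.
The actual helper added by the patch is not reproduced here and may differ;
macOS's /usr/bin/time -l reports the maximum RSS in bytes rather than KB, so a
portable version needs another branch:

```sh
# Sketch of a max-RSS check built on GNU /usr/bin/time; illustrative only.
max_mem_usage_is_lt_KB () {
	limit_kb=$1
	shift
	# GNU time: %M = maximum resident set size in KB
	/usr/bin/time -f '%M' -o mem.out "$@" &&
	maxrss_kb=$(cat mem.out) &&
	if test "$maxrss_kb" -ge "$limit_kb"
	then
		echo >&2 "Command used too much memory (expected limit ${limit_kb}KB, actual usage ${maxrss_kb}KB)"
		false
	fi
}
```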

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 convert.c | 60 -
 convert.h |  5 
 sha1_file.c   | 27 +++-
 t/t0021-conversion.sh | 68 +++
 4 files changed, 148 insertions(+), 12 deletions(-)

diff --git a/convert.c b/convert.c
index cb5fbb4..463f6de 100644
--- a/convert.c
+++ b/convert.c
@@ -312,11 +312,12 @@ static int crlf_to_worktree(const char *path, const char 
*src, size_t len,
 struct filter_params {
const char *src;
unsigned long size;
+   int fd;
const char *cmd;
const char *path;
 };
 
-static int filter_buffer(int in, int out, void *data)
+static int filter_buffer_or_fd(int in, int out, void *data)
 {
/*
 * Spawn cmd and feed the buffer contents through its stdin.
@@ -325,6 +326,7 @@ static int filter_buffer(int in, int out, void *data)
struct filter_params *params = (struct filter_params *)data;
int write_err, status;
const char *argv[] = { NULL, NULL };
+   int fd;
 
/* apply % substitution to cmd */
struct strbuf cmd = STRBUF_INIT;
@@ -355,7 +357,17 @@ static int filter_buffer(int in, int out, void *data)
 
sigchain_push(SIGPIPE, SIG_IGN);
 
-   write_err = (write_in_full(child_process.in, params->src, params->size) < 0);
+   if (params->src) {
+   write_err = (write_in_full(child_process.in, params->src, params->size) < 0);
+   } else {
+   /* dup(), because copy_fd() closes the input fd. */
+   fd = dup(params->fd);
+   if (fd < 0)
+   write_err = error("failed to dup file descriptor.");
+   else
+   write_err = copy_fd(fd, child_process.in);
+   }
+
if (close(child_process.in))
write_err = 1;
if (write_err)
@@ -371,7 +383,7 @@ static int filter_buffer(int in, int out, void *data)
return (write_err || status);
 }
 
-static int apply_filter(const char *path, const char *src, size_t len,
+static int apply_filter(const char *path, const char *src, size_t len, int fd,
 struct strbuf *dst, const char *cmd)
 {
/*
@@ -392,11 +404,12 @@ static int apply_filter(const char *path, const char 
*src, size_t len,
return 1;
 
memset(&async, 0, sizeof(async));
-   async.proc = filter_buffer;
+   async.proc = filter_buffer_or_fd;
async.data = &params;
async.out = -1;
params.src = src;
params.size = len;
+   params.fd = fd;
params.cmd = cmd;
params.path = path;
 
@@ -747,6 +760,24 @@ static void convert_attrs(struct conv_attrs *ca, const 
char *path)
}
 }
 
+int would_convert_to_git_filter_fd(const char *path)
+{
+   struct conv_attrs ca;
+
+   convert_attrs(&ca, path);
+   if (!ca.drv)
+   return 0;
+
+   /* Apply a filter to an fd only if the filter is required to succeed.
+* We must die if the filter fails, because the original data before
+* filtering is not available.
+*/
+   if (!ca.drv->required)
+   return 0;
+
+   return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
+}
+
 int convert_to_git(const char *path, const char *src, size_t len,
struct strbuf *dst, enum safe_crlf checksafe)
 {
@@ -761,7 +792,7 @@ int convert_to_git(const char *path, const char *src, 
size_t len,
required = ca.drv->required;
}
 
-   ret |= apply_filter(path, src, len, dst, filter);
+   ret |= apply_filter(path, src, len, -1, dst, filter);
if (!ret && required)
die("%s: clean filter '%s' failed", path, ca.drv->name);
 
@@ -778,6 +809,23 @@ int convert_to_git(const char *path

[PATCH] convert: Stream from fd to required clean filter instead of mmap

2014-08-03 Thread Steffen Prohaska
The data is streamed to the filter process anyway.  Better avoid mapping
the file if possible.  This is especially useful if a clean filter
reduces the size, for example if it computes a sha1 for binary data,
like git media.  The file size that the previous implementation could
handle was limited by the available address space; large files for
example could not be handled with (32-bit) msysgit.  The new
implementation can filter files of any size as long as the filter output
is small enough.

The new code path is only taken if the filter is required.  The filter
consumes data directly from the fd.  The original data is not available
to git, so it must fail if the filter fails.

The test that exercises required filters is modified to verify that the
data actually has been modified on its way from the file system to the
object store.

The expectation on the process size is tested using /usr/bin/time.  An
alternative would have been tcsh, which could be used to print memory
information as follows:

tcsh -c 'set time=(0 %M); cmd'

Although the logic could perhaps be simplified with tcsh, I chose to use
'time' to avoid a dependency on tcsh.

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 convert.c | 58 ++-
 convert.h | 10 +---
 sha1_file.c   | 29 --
 t/t0021-conversion.sh | 68 +++
 4 files changed, 149 insertions(+), 16 deletions(-)

diff --git a/convert.c b/convert.c
index cb5fbb4..58a516a 100644
--- a/convert.c
+++ b/convert.c
@@ -312,11 +312,12 @@ static int crlf_to_worktree(const char *path, const char 
*src, size_t len,
 struct filter_params {
const char *src;
unsigned long size;
+   int fd;
const char *cmd;
const char *path;
 };
 
-static int filter_buffer(int in, int out, void *data)
+static int filter_buffer_or_fd(int in, int out, void *data)
 {
/*
 * Spawn cmd and feed the buffer contents through its stdin.
@@ -325,6 +326,7 @@ static int filter_buffer(int in, int out, void *data)
struct filter_params *params = (struct filter_params *)data;
int write_err, status;
const char *argv[] = { NULL, NULL };
+   int fd;
 
/* apply % substitution to cmd */
struct strbuf cmd = STRBUF_INIT;
@@ -355,7 +357,17 @@ static int filter_buffer(int in, int out, void *data)
 
sigchain_push(SIGPIPE, SIG_IGN);
 
-   write_err = (write_in_full(child_process.in, params->src, params->size) < 0);
+   if (params->src) {
+   write_err = (write_in_full(child_process.in, params->src, params->size) < 0);
+   } else {
+   /* dup(), because copy_fd() closes the input fd. */
+   fd = dup(params->fd);
+   if (fd < 0)
+   write_err = error("failed to dup file descriptor.");
+   else
+   write_err = copy_fd(fd, child_process.in);
+   }
+
if (close(child_process.in))
write_err = 1;
if (write_err)
@@ -371,7 +383,7 @@ static int filter_buffer(int in, int out, void *data)
return (write_err || status);
 }
 
-static int apply_filter(const char *path, const char *src, size_t len,
+static int apply_filter(const char *path, const char *src, size_t len, int fd,
 struct strbuf *dst, const char *cmd)
 {
/*
@@ -392,11 +404,12 @@ static int apply_filter(const char *path, const char 
*src, size_t len,
return 1;
 
memset(&async, 0, sizeof(async));
-   async.proc = filter_buffer;
+   async.proc = filter_buffer_or_fd;
async.data = &params;
async.out = -1;
params.src = src;
params.size = len;
+   params.fd = fd;
params.cmd = cmd;
params.path = path;
 
@@ -747,6 +760,22 @@ static void convert_attrs(struct conv_attrs *ca, const 
char *path)
}
 }
 
+int would_convert_to_git_filter_fd(const char *path) {
+   struct conv_attrs ca;
+   convert_attrs(&ca, path);
+
+   if (!ca.drv)
+   return 0;
+
+   /* Apply a filter to an fd only if the filter is required to succeed.
+* We must die if the filter fails, because the original data before
+* filtering is not available. */
+   if (!ca.drv->required)
+   return 0;
+
+   return apply_filter(path, 0, 0, -1, 0, ca.drv->clean);
+}
+
 int convert_to_git(const char *path, const char *src, size_t len,
struct strbuf *dst, enum safe_crlf checksafe)
 {
@@ -761,7 +790,7 @@ int convert_to_git(const char *path, const char *src, 
size_t len,
required = ca.drv->required;
}
 
-   ret |= apply_filter(path, src, len, dst, filter);
+   ret |= apply_filter(path, src, len, -1, dst, filter);
if (!ret && required)
die("%s: clean filter '%s' failed", path, ca.drv->name);
 
@@ -778,6 +807,23 @@ int convert_to_git(const char *path, const char

Re: [PATCH v2] completion: Handle '!f() { ... }; f' aliases

2014-06-12 Thread Steffen Prohaska

On Jun 10, 2014, at 7:27 AM, Junio C Hamano gits...@pobox.com wrote:

 Steffen Prohaska proha...@zib.de writes:
 
 I tend to prefer writing it like so instead:
 
sh -c '...' -
 
 so that I won't clobber f (or any other name).  I wonder if you
 can help users of this other pattern as well.

I'll send an updated patch that handles it.


 +test_expect_success 'completion uses cmd completion for alias !f() { 
 VAR=val git cmd ... }' '
 +test_config alias.co !f() { VAR=val git checkout ... ; } f 
 
 Is it only f that is completed, or can I spell it using another
 arbitrary token, e.g.
 
   test_config alias.co !co () { git checkout ... } co

Any token that starts with ! already worked before. 

The updated patch will also handle spaces before the parens.

Steffen


[PATCH v2] completion: Handle '!f() { ... }; f' and !sh -c '...' aliases

2014-06-12 Thread Steffen Prohaska
'!f() { ... }; f' and "!sh -c '...'" are recommended patterns for
declaring more complex aliases (see git wiki [1]).  This commit teaches
the completion to handle them.

When determining which completion to use for an alias, an opening brace
or single quote is now skipped, and the search for a git command is
continued.  For example, the aliases '!f() { git commit ... }' or
"!sh -c 'git commit ...'" now trigger commit completion.  Previously, the
search stopped on the opening brace or quote, and the completion tried to
use it to determine how to complete, which obviously was useless.

The null command ':' is now skipped, so that it can be used as
a workaround to declare the desired completion style.  For example, the
aliases '!f() { : git commit ; if ... }; f' and "!sh -c ': git commit; if
...'" now trigger commit completion.

Shell function declarations now work with or without space before
the parens, i.e. '!f() ...' and '!f () ...' both work.

[1] https://git.wiki.kernel.org/index.php/Aliases
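
As a concrete illustration (the alias name and body below are made up and not
part of the patch), an alias declared like this now gets checkout-style
completion because of the leading null command:

```sh
# 'git sco <TAB>' completes branch names, as 'git checkout' would,
# because the completion sees ': git checkout ;' first.
git config alias.sco '!f() { : git checkout ; git checkout "$@" && git status -sb; }; f'
```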

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 contrib/completion/git-completion.bash | 10 ++
 t/t9902-completion.sh  | 27 +++
 2 files changed, 37 insertions(+)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 2c59a76..575f8f7 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -21,6 +21,12 @@
 #source ~/.git-completion.sh
 #3) Consider changing your PS1 to also show the current branch,
 #   see git-prompt.sh for details.
+#
+# If you use complex aliases of form '!f() { ... }; f', you can use the null
+# command ':' as the first command in the function body to declare the desired
+# completion style.  For example '!f() { : git commit ; ... }; f' will
+# tell the completion to use commit completion.  This also works with aliases
+# of form "!sh -c '...'".  For example, "!sh -c ': git commit ; ... '".
 
 case $COMP_WORDBREAKS in
 *:*) : great ;;
@@ -781,6 +787,10 @@ __git_aliased_command ()
-*) : option ;;
*=*): setting env ;;
git): git itself ;;
+   \(\))   : skip parens of shell function definition ;;
+   {)  : skip start of shell helper function ;;
+   :)  : skip null command ;;
+   \'*): skip opening quote after sh -c ;;
*)
echo $word
return
diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
index 2d4beb5..1d1c106 100755
--- a/t/t9902-completion.sh
+++ b/t/t9902-completion.sh
@@ -550,6 +550,33 @@ test_expect_success 'complete files' '
test_completion git add mom momified
 '
 
+test_expect_success "completion uses <cmd> completion for alias: !sh -c 'git <cmd> ...'" '
+   test_config alias.co "!sh -c '\''git checkout ...'\''" &&
+   test_completion "git co m" <<-\EOF
+   master Z
+   mybranch Z
+   mytag Z
+   EOF
+'
+
+test_expect_success 'completion uses <cmd> completion for alias: !f () { VAR=val git <cmd> ... }' '
+   test_config alias.co "!f () { VAR=val git checkout ... ; } f" &&
+   test_completion "git co m" <<-\EOF
+   master Z
+   mybranch Z
+   mytag Z
+   EOF
+'
+
+test_expect_success 'completion used <cmd> completion for alias: !f() { : git <cmd> ; ... }' '
+   test_config alias.co "!f() { : git checkout ; if ... } f" &&
+   test_completion "git co m" <<-\EOF
+   master Z
+   mybranch Z
+   mytag Z
+   EOF
+'
+
 test_expect_failure 'complete with tilde expansion' '
git init tmp  cd tmp 
test_when_finished cd ..  rm -rf tmp 
-- 
2.0.0.244.g4e8e734



[PATCH v2] completion: Handle '!f() { ... }; f' aliases

2014-06-09 Thread Steffen Prohaska
'!f() { ... }; f' is a recommended pattern to declare more complex
aliases (see git wiki [1]).  This commit teaches the completion to
handle them.

When determining which completion to use for an alias, the opening brace
is now ignored in order to continue the search for a git command inside
the function body.  For example, the alias '!f() { git commit ... }' now
triggers commit completion.  Previously, the search stopped on '{', and
the completion tried it to determine how to complete, which obviously
was useless.

Furthermore, the null command ':' is now skipped, so that it can be used
as a workaround to declare the desired completion style.  For example,
the alias '!f() { : git commit ; if ...  ' now triggers commit
completion.

[1] https://git.wiki.kernel.org/index.php/Aliases

Signed-off-by: Steffen Prohaska proha...@zib.de
---

I changed the tests to use test_config, as Eric suggested.  Thanks.

 contrib/completion/git-completion.bash |  7 +++
 t/t9902-completion.sh  | 18 ++
 2 files changed, 25 insertions(+)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 2c59a76..aecb975 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -21,6 +21,11 @@
 #source ~/.git-completion.sh
 #3) Consider changing your PS1 to also show the current branch,
 #   see git-prompt.sh for details.
+#
+# If you use complex aliases of form '!f() { ... }; f', you can use the null
+# command ':' as the first command in the function body to declare the desired
+# completion style.  For example '!f() { : git commit ; ... }; f' will
+# tell the completion to use commit completion.
 
 case $COMP_WORDBREAKS in
 *:*) : great ;;
@@ -781,6 +786,8 @@ __git_aliased_command ()
-*) : option ;;
*=*): setting env ;;
git): git itself ;;
+   {)  : skip start of shell helper function ;;
+   :)  : skip null command ;;
*)
echo $word
return
diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
index 2d4beb5..10ceb29 100755
--- a/t/t9902-completion.sh
+++ b/t/t9902-completion.sh
@@ -550,6 +550,24 @@ test_expect_success 'complete files' '
test_completion git add mom momified
 '
 
+test_expect_success 'completion uses <cmd> completion for alias !f() { VAR=val git <cmd> ... }' '
+   test_config alias.co "!f() { VAR=val git checkout ... ; } f" &&
+   test_completion "git co m" <<-\EOF
+   master Z
+   mybranch Z
+   mytag Z
+   EOF
+'
+
+test_expect_success 'completion used <cmd> completion for alias !f() { : git <cmd> ; ... }' '
+   test_config alias.co "!f() { : git checkout ; if ... } f" &&
+   test_completion "git co m" <<-\EOF
+   master Z
+   mybranch Z
+   mytag Z
+   EOF
+'
+
 test_expect_failure 'complete with tilde expansion' '
git init tmp  cd tmp 
test_when_finished cd ..  rm -rf tmp 
-- 
2.0.0.244.g4e8e734



[PATCH] completion: Handle '!f() { ... }; f' aliases

2014-06-07 Thread Steffen Prohaska
'!f() { ... }; f' is a recommended pattern to declare more complex
aliases (see git wiki [1]).  This commit teaches the completion to
handle them.

When determining which completion to use for an alias, the opening brace
is now ignored in order to continue the search for a git command inside
the function body.  For example, the alias '!f() { git commit ... }' now
triggers commit completion.  Previously, the search stopped on '{', and
the completion tried it to determine how to complete, which obviously
was useless.

Furthermore, the null command ':' is now skipped, so that it can be used
as a workaround to declare the desired completion style.  For example,
the alias '!f() { : git commit ; if ...  ' now triggers commit
completion.

[1] https://git.wiki.kernel.org/index.php/Aliases

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 contrib/completion/git-completion.bash |  7 +++
 t/t9902-completion.sh  | 20 
 2 files changed, 27 insertions(+)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 2c59a76..aecb975 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -21,6 +21,11 @@
 #source ~/.git-completion.sh
 #3) Consider changing your PS1 to also show the current branch,
 #   see git-prompt.sh for details.
+#
+# If you use complex aliases of form '!f() { ... }; f', you can use the null
+# command ':' as the first command in the function body to declare the desired
+# completion style.  For example '!f() { : git commit ; ... }; f' will
+# tell the completion to use commit completion.
 
 case $COMP_WORDBREAKS in
 *:*) : great ;;
@@ -781,6 +786,8 @@ __git_aliased_command ()
-*) : option ;;
*=*): setting env ;;
git): git itself ;;
+   {)  : skip start of shell helper function ;;
+   :)  : skip null command ;;
*)
echo $word
return
diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
index 2d4beb5..ea48681 100755
--- a/t/t9902-completion.sh
+++ b/t/t9902-completion.sh
@@ -550,6 +550,26 @@ test_expect_success 'complete files' '
test_completion git add mom momified
 '
 
+test_expect_success 'completion uses <cmd> completion for alias !f() { VAR=val git <cmd> ... }' '
+   git config alias.co "!f() { VAR=val git checkout ... ; } f" &&
+   test_completion "git co m" <<-\EOF &&
+   master Z
+   mybranch Z
+   mytag Z
+   EOF
+   git config --unset alias.co
+'
+
+test_expect_success 'completion used cmd completion for alias !f() { : git 
cmd ; ... }' '
+   git config alias.co !f() { : git checkout ; if ... } f 
+   test_completion git co m -\EOF 
+   master Z
+   mybranch Z
+   mytag Z
+   EOF
+   git config --unset alias.co
+'
+
 test_expect_failure 'complete with tilde expansion' '
git init tmp  cd tmp 
test_when_finished cd ..  rm -rf tmp 
-- 
2.0.0.244.g4e8e734



Re: [ANNOUNCE] Git v1.8.4.1

2013-09-28 Thread Steffen Prohaska
Hello Jonathan,

On Sep 27, 2013, at 8:52 PM, Jonathan Nieder jrnie...@gmail.com wrote:

 The latest maintenance release Git v1.8.4.1 is now available.

I can't find the following three minor doc fixes

  http://article.gmane.org/gmane.comp.version-control.git/235234
  http://article.gmane.org/gmane.comp.version-control.git/235233
  http://article.gmane.org/gmane.comp.version-control.git/235255

on any of your branches.  I'm wondering whether they got overlooked when you 
took over from Junio.

Steffen


[PATCH] git.txt: Fix asciidoc syntax of --*-pathspecs

2013-09-23 Thread Steffen Prohaska
Labeled lists require a double colon.

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 Documentation/git.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 5d68d33..4c2757e 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -475,19 +475,19 @@ example the following invocations are equivalent:
This is equivalent to setting the `GIT_LITERAL_PATHSPECS` environment
variable to `1`.
 
---glob-pathspecs:
+--glob-pathspecs::
Add glob magic to all pathspec. This is equivalent to setting
the `GIT_GLOB_PATHSPECS` environment variable to `1`. Disabling
globbing on individual pathspecs can be done using pathspec
magic :(literal)
 
---noglob-pathspecs:
+--noglob-pathspecs::
Add literal magic to all pathspec. This is equivalent to setting
the `GIT_NOGLOB_PATHSPECS` environment variable to `1`. Enabling
globbing on individual pathspecs can be done using pathspec
magic :(glob)
 
---icase-pathspecs:
+--icase-pathspecs::
Add icase magic to all pathspec. This is equivalent to setting
the `GIT_ICASE_PATHSPECS` environment variable to `1`.
 
-- 
1.8.4.477.gfa286b2



[PATCH] git-prune-packed.txt: Fix reference to GIT_OBJECT_DIRECTORY

2013-09-23 Thread Steffen Prohaska
git-prune-packed operates on GIT_OBJECT_DIRECTORY.  See 'environment.c',

git_object_dir = getenv(DB_ENVIRONMENT);

and cache.h,

#define DB_ENVIRONMENT "GIT_OBJECT_DIRECTORY"
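
For example, pointing the variable at an alternate object store works like
this (the path is illustrative; -n asks for a dry run):

```sh
GIT_OBJECT_DIRECTORY=/path/to/repo.git/objects git prune-packed -n
```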

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 Documentation/git-prune-packed.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-prune-packed.txt 
b/Documentation/git-prune-packed.txt
index 80dc022..6738055 100644
--- a/Documentation/git-prune-packed.txt
+++ b/Documentation/git-prune-packed.txt
@@ -14,7 +14,7 @@ SYNOPSIS
 
 DESCRIPTION
 ---
-This program searches the `$GIT_OBJECT_DIR` for all objects that currently
+This program searches the `$GIT_OBJECT_DIRECTORY` for all objects that 
currently
 exist in a pack file as well as the independent object directories.
 
 All such extra objects are removed.
-- 
1.8.4.477.gfa286b2



[PATCH v6 0/2] Fix IO = 2GB on Mac, fixed typo

2013-08-21 Thread Steffen Prohaska
Fixed typo in comment.

Steffen Prohaska (2):
  xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X
  Revert compate/clipped-write.c: large write(2) fails on Mac OS X/XNU

 Makefile   |  8 
 compat/clipped-write.c | 13 -
 config.mak.uname   |  1 -
 git-compat-util.h  |  5 -
 t/t0021-conversion.sh  | 14 ++
 wrapper.c  | 12 
 6 files changed, 26 insertions(+), 27 deletions(-)
 delete mode 100644 compat/clipped-write.c

-- 
1.8.4.rc3.5.g4f480ff



[PATCH v5 2/2] Revert compate/clipped-write.c: large write(2) fails on Mac OS X/XNU

2013-08-21 Thread Steffen Prohaska
The previous commit introduced a size limit on IO chunks on all
platforms.  The compat clipped_write() is not needed anymore.

This reverts commit 6c642a878688adf46b226903858b53e2d31ac5c3.

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 Makefile   |  8 
 compat/clipped-write.c | 13 -
 config.mak.uname   |  1 -
 git-compat-util.h  |  5 -
 4 files changed, 27 deletions(-)
 delete mode 100644 compat/clipped-write.c

diff --git a/Makefile b/Makefile
index 3588ca1..4026211 100644
--- a/Makefile
+++ b/Makefile
@@ -69,9 +69,6 @@ all::
 # Define NO_MSGFMT_EXTENDED_OPTIONS if your implementation of msgfmt
 # doesn't support GNU extensions like --check and --statistics
 #
-# Define NEEDS_CLIPPED_WRITE if your write(2) cannot write more than
-# INT_MAX bytes at once (e.g. MacOS X).
-#
 # Define HAVE_PATHS_H if you have paths.h and want to use the default PATH
 # it specifies.
 #
@@ -1493,11 +1490,6 @@ ifndef NO_MSGFMT_EXTENDED_OPTIONS
MSGFMT += --check --statistics
 endif
 
-ifdef NEEDS_CLIPPED_WRITE
-   BASIC_CFLAGS += -DNEEDS_CLIPPED_WRITE
-   COMPAT_OBJS += compat/clipped-write.o
-endif
-
 ifneq (,$(XDL_FAST_HASH))
BASIC_CFLAGS += -DXDL_FAST_HASH
 endif
diff --git a/compat/clipped-write.c b/compat/clipped-write.c
deleted file mode 100644
index b8f98ff..000
--- a/compat/clipped-write.c
+++ /dev/null
@@ -1,13 +0,0 @@
-#include ../git-compat-util.h
-#undef write
-
-/*
- * Version of write that will write at most INT_MAX bytes.
- * Workaround a xnu bug on Mac OS X
- */
-ssize_t clipped_write(int fildes, const void *buf, size_t nbyte)
-{
-   if (nbyte > INT_MAX)
-   nbyte = INT_MAX;
-   return write(fildes, buf, nbyte);
-}
diff --git a/config.mak.uname b/config.mak.uname
index b27f51d..7d61531 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -95,7 +95,6 @@ ifeq ($(uname_S),Darwin)
NO_MEMMEM = YesPlease
USE_ST_TIMESPEC = YesPlease
HAVE_DEV_TTY = YesPlease
-   NEEDS_CLIPPED_WRITE = YesPlease
COMPAT_OBJS += compat/precompose_utf8.o
BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
 endif
diff --git a/git-compat-util.h b/git-compat-util.h
index 115cb1d..96d8881 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -185,11 +185,6 @@ typedef unsigned long uintptr_t;
 #define probe_utf8_pathname_composition(a,b)
 #endif
 
-#ifdef NEEDS_CLIPPED_WRITE
-ssize_t clipped_write(int fildes, const void *buf, size_t nbyte);
-#define write(x,y,z) clipped_write((x),(y),(z))
-#endif
-
 #ifdef MKDIR_WO_TRAILING_SLASH
 #define mkdir(a,b) compat_mkdir_wo_trailing_slash((a),(b))
 extern int compat_mkdir_wo_trailing_slash(const char*, mode_t);
-- 
1.8.4.rc3.5.g4f480ff



[PATCH v5 1/2] xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X

2013-08-21 Thread Steffen Prohaska
Previously, filtering 2GB or more through an external filter (see test)
failed on Mac OS X 10.8.4 (12E55) for a 64-bit executable with:

error: read from external filter cat failed
error: cannot feed the input to external filter cat
error: cat died of signal 13
error: external filter cat failed 141
error: external filter cat failed

The reason was that read() immediately returns with EINVAL if nbyte >=
2GB.  According to POSIX [1], if the value of nbyte passed to read() is
greater than SSIZE_MAX, the result is implementation-defined.  The write
function has the same restriction [2].  Since OS X still supports
running 32-bit executables, the 32-bit limit (SSIZE_MAX = INT_MAX
= 2GB - 1) seems to be also imposed on 64-bit executables under certain
conditions.  For write, the problem has been addressed earlier [6c642a].

This commit addresses the problem for read() and write() by limiting
size of IO chunks unconditionally on all platforms in xread() and
xwrite().  Large chunks only cause problems, like triggering the OS
X bug or causing latencies when killing the process.  Reasonably sized
smaller chunks have no negative impact on performance.
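
To make the chunking idea concrete outside the patch context, here is a
minimal, self-contained sketch of reading a full buffer through bounded
read(2) calls.  It is illustrative only; the helper name and the reuse of
the 8 MB limit are assumptions, not part of this patch:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative chunk limit; the patch below picks 8 MB for the same reason. */
#define MAX_IO_SIZE (8*1024*1024)

/*
 * Read up to 'len' bytes into 'buf', never passing more than MAX_IO_SIZE
 * to a single read(2) and restarting on EAGAIN/EINTR.  Returns the number
 * of bytes read (less than 'len' only at EOF), or -1 on error.
 */
static ssize_t read_clipped_full(int fd, void *buf, size_t len)
{
	size_t total = 0;

	while (total < len) {
		size_t chunk = len - total;
		ssize_t nr;

		if (chunk > MAX_IO_SIZE)
			chunk = MAX_IO_SIZE;
		nr = read(fd, (char *)buf + total, chunk);
		if (nr < 0) {
			if (errno == EAGAIN || errno == EINTR)
				continue;
			return -1;
		}
		if (nr == 0)
			break;	/* EOF */
		total += nr;
	}
	return (ssize_t)total;
}
```

Because no single syscall ever sees a length anywhere near INT_MAX, the
OS X EINVAL path cannot be hit, and killing the process never has to wait
for a single multi-gigabyte read or write to finish.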

The compat wrapper clipped_write() introduced earlier [6c642a] is not
needed anymore.  It will be reverted in a separate commit.  The new test
catches read and write problems.

Note that 'git add' exits with 0 even if it prints filtering errors to
stderr.  The test, therefore, checks stderr.  'git add' should probably
be changed (sometime in another commit) to exit with nonzero if
filtering fails.  The test could then be changed to use test_must_fail.

Thanks to the following people for suggestions and testing:

Johannes Sixt j...@kdbg.org
John Keeping j...@keeping.me.uk
Jonathan Nieder jrnie...@gmail.com
Kyle J. McKay mack...@gmail.com
Linus Torvalds torva...@linux-foundation.org
Torsten Bögershausen tbo...@web.de

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html

[6c642a] commit 6c642a878688adf46b226903858b53e2d31ac5c3
compate/clipped-write.c: large write(2) fails on Mac OS X/XNU

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 t/t0021-conversion.sh | 14 ++
 wrapper.c | 12 
 2 files changed, 26 insertions(+)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index e50f0f7..b92e6cb 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -190,4 +190,18 @@ test_expect_success 'required filter clean failure' '
test_must_fail git add test.fc
 '
 
+test -n "$GIT_TEST_LONG" && test_set_prereq EXPENSIVE
+
+test_expect_success EXPENSIVE 'filter large file' '
+   git config filter.largefile.smudge cat &&
+   git config filter.largefile.clean cat &&
+   for i in $(test_seq 1 2048); do printf %1048576d 1; done >2GB &&
+   echo 2GB filter=largefile >.gitattributes &&
+   git add 2GB 2>err &&
+   ! test -s err &&
+   rm -f 2GB &&
+   git checkout -- 2GB 2>err &&
+   ! test -s err
+'
+
 test_done
diff --git a/wrapper.c b/wrapper.c
index 6a015de..66cc727 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -131,6 +131,14 @@ void *xcalloc(size_t nmemb, size_t size)
 }
 
 /*
+ * Limit size of IO chunks, because huge chunks only cause pain.  OS X 64-bit
+ * is buggy, returning EINVAL if len >= INT_MAX; and even in the absence of
+ * bugs, large chunks can result in bad latencies when you decide to kill the
+ * process.
+ */
+#define MAX_IO_SIZE (8*1024*1024)
+
+/*
  * xread() is the same a read(), but it automatically restarts read()
  * operations with a recoverable error (EAGAIN and EINTR). xread()
  * DOES NOT GUARANTEE that len bytes is read even if the data is available.
@@ -138,6 +146,8 @@ void *xcalloc(size_t nmemb, size_t size)
 ssize_t xread(int fd, void *buf, size_t len)
 {
ssize_t nr;
+   if (len > MAX_IO_SIZE)
+   len = MAX_IO_SIZE;
while (1) {
nr = read(fd, buf, len);
if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
@@ -154,6 +164,8 @@ ssize_t xread(int fd, void *buf, size_t len)
 ssize_t xwrite(int fd, const void *buf, size_t len)
 {
ssize_t nr;
+   if (len > MAX_IO_SIZE)
+   len = MAX_IO_SIZE;
while (1) {
nr = write(fd, buf, len);
if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
-- 
1.8.4.rc3.5.g4f480ff



[PATCH v5 2/2] Revert compate/clipped-write.c: large write(2) fails on Mac OS X/XNU

2013-08-20 Thread Steffen Prohaska
The previous commit introduced a size limit on IO chunks on all
platforms.  The compat clipped_write() is not needed anymore.

This reverts commit 6c642a878688adf46b226903858b53e2d31ac5c3.

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 Makefile   |  8 
 compat/clipped-write.c | 13 -
 config.mak.uname   |  1 -
 git-compat-util.h  |  5 -
 4 files changed, 27 deletions(-)
 delete mode 100644 compat/clipped-write.c

diff --git a/Makefile b/Makefile
index 3588ca1..4026211 100644
--- a/Makefile
+++ b/Makefile
@@ -69,9 +69,6 @@ all::
 # Define NO_MSGFMT_EXTENDED_OPTIONS if your implementation of msgfmt
 # doesn't support GNU extensions like --check and --statistics
 #
-# Define NEEDS_CLIPPED_WRITE if your write(2) cannot write more than
-# INT_MAX bytes at once (e.g. MacOS X).
-#
 # Define HAVE_PATHS_H if you have paths.h and want to use the default PATH
 # it specifies.
 #
@@ -1493,11 +1490,6 @@ ifndef NO_MSGFMT_EXTENDED_OPTIONS
MSGFMT += --check --statistics
 endif
 
-ifdef NEEDS_CLIPPED_WRITE
-   BASIC_CFLAGS += -DNEEDS_CLIPPED_WRITE
-   COMPAT_OBJS += compat/clipped-write.o
-endif
-
 ifneq (,$(XDL_FAST_HASH))
BASIC_CFLAGS += -DXDL_FAST_HASH
 endif
diff --git a/compat/clipped-write.c b/compat/clipped-write.c
deleted file mode 100644
index b8f98ff..000
--- a/compat/clipped-write.c
+++ /dev/null
@@ -1,13 +0,0 @@
-#include "../git-compat-util.h"
-#undef write
-
-/*
- * Version of write that will write at most INT_MAX bytes.
- * Workaround a xnu bug on Mac OS X
- */
-ssize_t clipped_write(int fildes, const void *buf, size_t nbyte)
-{
-   if (nbyte > INT_MAX)
-   nbyte = INT_MAX;
-   return write(fildes, buf, nbyte);
-}
diff --git a/config.mak.uname b/config.mak.uname
index b27f51d..7d61531 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -95,7 +95,6 @@ ifeq ($(uname_S),Darwin)
NO_MEMMEM = YesPlease
USE_ST_TIMESPEC = YesPlease
HAVE_DEV_TTY = YesPlease
-   NEEDS_CLIPPED_WRITE = YesPlease
COMPAT_OBJS += compat/precompose_utf8.o
BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
 endif
diff --git a/git-compat-util.h b/git-compat-util.h
index 115cb1d..96d8881 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -185,11 +185,6 @@ typedef unsigned long uintptr_t;
 #define probe_utf8_pathname_composition(a,b)
 #endif
 
-#ifdef NEEDS_CLIPPED_WRITE
-ssize_t clipped_write(int fildes, const void *buf, size_t nbyte);
-#define write(x,y,z) clipped_write((x),(y),(z))
-#endif
-
 #ifdef MKDIR_WO_TRAILING_SLASH
 #define mkdir(a,b) compat_mkdir_wo_trailing_slash((a),(b))
 extern int compat_mkdir_wo_trailing_slash(const char*, mode_t);
-- 
1.8.4.rc3.5.g4f480ff



[PATCH v5 0/2] Fix IO of >=2GB on Mac OS X by limiting IO chunks

2013-08-20 Thread Steffen Prohaska
This is the revised series, which takes the earlier considerations about IO
chunk size into account.  It deletes more than it adds and fixes a bug.  Nice.

Steffen Prohaska (2):
  xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X
  Revert compate/clipped-write.c: large write(2) fails on Mac OS X/XNU

 Makefile   |  8 
 compat/clipped-write.c | 13 -
 config.mak.uname   |  1 -
 git-compat-util.h  |  5 -
 t/t0021-conversion.sh  | 14 ++
 wrapper.c  | 12 
 6 files changed, 26 insertions(+), 27 deletions(-)
 delete mode 100644 compat/clipped-write.c

-- 
1.8.4.rc3.5.g4f480ff



[PATCH v5 1/2] xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X

2013-08-20 Thread Steffen Prohaska
Previously, filtering 2GB or more through an external filter (see test)
failed on Mac OS X 10.8.4 (12E55) for a 64-bit executable with:

error: read from external filter cat failed
error: cannot feed the input to external filter cat
error: cat died of signal 13
error: external filter cat failed 141
error: external filter cat failed

The reason was that read() immediately returns with EINVAL if nbyte >=
2GB.  According to POSIX [1], if the value of nbyte passed to read() is
greater than SSIZE_MAX, the result is implementation-defined.  The write
function has the same restriction [2].  Since OS X still supports
running 32-bit executables, the 32-bit limit (SSIZE_MAX = INT_MAX
= 2GB - 1) seems to also be imposed on 64-bit executables under certain
conditions.  For write, the problem has been addressed earlier [6c642a].

This commit addresses the problem for read() and write() by limiting
size of IO chunks unconditionally on all platforms in xread() and
xwrite().  Large chunks only cause problems, like triggering the OS
X bug or causing latencies when killing the process.  Reasonably sized
smaller chunks have no negative impact on performance.

The compat wrapper clipped_write() introduced earlier [6c642a] is not
needed anymore.  It will be reverted in a separate commit.  The new test
catches read and write problems.

Note that 'git add' exits with 0 even if it prints filtering errors to
stderr.  The test, therefore, checks stderr.  'git add' should probably
be changed (sometime in another commit) to exit with nonzero if
filtering fails.  The test could then be changed to use test_must_fail.

Thanks to the following people for suggestions and testing:

Johannes Sixt j...@kdbg.org
John Keeping j...@keeping.me.uk
Jonathan Nieder jrnie...@gmail.com
Kyle J. McKay mack...@gmail.com
Linus Torvalds torva...@linux-foundation.org
Torsten Bögershausen tbo...@web.de

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html

[6c642a] commit 6c642a878688adf46b226903858b53e2d31ac5c3
compate/clipped-write.c: large write(2) fails on Mac OS X/XNU

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 t/t0021-conversion.sh | 14 ++
 wrapper.c | 12 
 2 files changed, 26 insertions(+)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index e50f0f7..b92e6cb 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -190,4 +190,18 @@ test_expect_success 'required filter clean failure' '
test_must_fail git add test.fc
 '
 
+test -n "$GIT_TEST_LONG" && test_set_prereq EXPENSIVE
+
+test_expect_success EXPENSIVE 'filter large file' '
+   git config filter.largefile.smudge cat &&
+   git config filter.largefile.clean cat &&
+   for i in $(test_seq 1 2048); do printf %1048576d 1; done >2GB &&
+   echo 2GB filter=largefile >.gitattributes &&
+   git add 2GB 2>err &&
+   ! test -s err &&
+   rm -f 2GB &&
+   git checkout -- 2GB 2>err &&
+   ! test -s err
+'
+
 test_done
diff --git a/wrapper.c b/wrapper.c
index 6a015de..97e3cf7 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -131,6 +131,14 @@ void *xcalloc(size_t nmemb, size_t size)
 }
 
 /*
+ * Limit size of IO chunks, because huge chunks only cause pain.  OS X 64-bit
+ * is buggy, returning EINVAL if len >= INT_MAX; and even in the absence of bugs,
+ * large chunks can result in bad latencies when you decide to kill the
+ * process.
+ */
+#define MAX_IO_SIZE (8*1024*1024)
+
+/*
  * xread() is the same a read(), but it automatically restarts read()
  * operations with a recoverable error (EAGAIN and EINTR). xread()
  * DOES NOT GUARANTEE that len bytes is read even if the data is available.
@@ -138,6 +146,8 @@ void *xcalloc(size_t nmemb, size_t size)
 ssize_t xread(int fd, void *buf, size_t len)
 {
ssize_t nr;
+   if (len > MAX_IO_SIZE)
+   len = MAX_IO_SIZE;
while (1) {
nr = read(fd, buf, len);
if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
@@ -154,6 +164,8 @@ ssize_t xread(int fd, void *buf, size_t len)
 ssize_t xwrite(int fd, const void *buf, size_t len)
 {
ssize_t nr;
+   if (len > MAX_IO_SIZE)
+   len = MAX_IO_SIZE;
while (1) {
nr = write(fd, buf, len);
if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
-- 
1.8.4.rc3.5.g4f480ff



[PATCH v2] compat: Fix read() of 2GB and more on Mac OS X

2013-08-19 Thread Steffen Prohaska
Previously, filtering 2GB or more through an external filter (see test)
failed on Mac OS X 10.8.4 (12E55) for a 64-bit executable with:

error: read from external filter cat failed
error: cannot feed the input to external filter cat
error: cat died of signal 13
error: external filter cat failed 141
error: external filter cat failed

The reason was that read() immediately returns with EINVAL if nbyte >=
2GB.  According to POSIX [1], if the value of nbyte passed to read() is
greater than SSIZE_MAX, the result is implementation-defined.  The write
function has the same restriction [2].  Since OS X still supports
running 32-bit executables, the 32-bit limit (SSIZE_MAX = INT_MAX
= 2GB - 1) seems to also be imposed on 64-bit executables under certain
conditions.  For write, the problem has been addressed in an earlier
commit [6c642a].

The problem for read() is addressed in a similar way by introducing
a wrapper function in compat that always reads less than 2GB.
Unfortunately, '#undef read' is needed in a few places to avoid
expanding the compat macro in constructs like 'vtbl->read(...)'.

Note that 'git add' exits with 0 even if it prints filtering errors to
stderr.  The test, therefore, checks stderr.  'git add' should probably
be changed (sometime in another commit) to exit with nonzero if
filtering fails.  The test could then be changed to use test_must_fail.

Thanks to the following people for their suggestions:

Johannes Sixt j...@kdbg.org
John Keeping j...@keeping.me.uk
Jonathan Nieder jrnie...@gmail.com
Kyle J. McKay mack...@gmail.com
Torsten Bögershausen tbo...@web.de

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html

[6c642a] 6c642a878688adf46b226903858b53e2d31ac5c3
compate/clipped-write.c: large write(2) fails on Mac OS X/XNU

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 Makefile  |  8 
 builtin/var.c |  1 +
 config.mak.uname  |  1 +
 git-compat-util.h |  5 +
 streaming.c   |  1 +
 t/t0021-conversion.sh | 14 ++
 6 files changed, 30 insertions(+)

diff --git a/Makefile b/Makefile
index 3588ca1..0f69e24 100644
--- a/Makefile
+++ b/Makefile
@@ -69,6 +69,9 @@ all::
 # Define NO_MSGFMT_EXTENDED_OPTIONS if your implementation of msgfmt
 # doesn't support GNU extensions like --check and --statistics
 #
+# Define NEEDS_CLIPPED_READ if your read(2) cannot read more than
+# INT_MAX bytes at once (e.g. MacOS X).
+#
 # Define NEEDS_CLIPPED_WRITE if your write(2) cannot write more than
 # INT_MAX bytes at once (e.g. MacOS X).
 #
@@ -1493,6 +1496,11 @@ ifndef NO_MSGFMT_EXTENDED_OPTIONS
MSGFMT += --check --statistics
 endif
 
+ifdef NEEDS_CLIPPED_READ
+   BASIC_CFLAGS += -DNEEDS_CLIPPED_READ
+   COMPAT_OBJS += compat/clipped-read.o
+endif
+
 ifdef NEEDS_CLIPPED_WRITE
BASIC_CFLAGS += -DNEEDS_CLIPPED_WRITE
COMPAT_OBJS += compat/clipped-write.o
diff --git a/builtin/var.c b/builtin/var.c
index aedbb53..e59f5ba 100644
--- a/builtin/var.c
+++ b/builtin/var.c
@@ -38,6 +38,7 @@ static struct git_var git_vars[] = {
{ "", NULL },
 };
 
+#undef read
 static void list_vars(void)
 {
struct git_var *ptr;
diff --git a/config.mak.uname b/config.mak.uname
index b27f51d..5c10726 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -95,6 +95,7 @@ ifeq ($(uname_S),Darwin)
NO_MEMMEM = YesPlease
USE_ST_TIMESPEC = YesPlease
HAVE_DEV_TTY = YesPlease
+   NEEDS_CLIPPED_READ = YesPlease
NEEDS_CLIPPED_WRITE = YesPlease
COMPAT_OBJS += compat/precompose_utf8.o
BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
diff --git a/git-compat-util.h b/git-compat-util.h
index 115cb1d..a227127 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -185,6 +185,11 @@ typedef unsigned long uintptr_t;
 #define probe_utf8_pathname_composition(a,b)
 #endif
 
+#ifdef NEEDS_CLIPPED_READ
+ssize_t clipped_read(int fd, void *buf, size_t nbyte);
+#define read(x,y,z) clipped_read((x),(y),(z))
+#endif
+
 #ifdef NEEDS_CLIPPED_WRITE
 ssize_t clipped_write(int fildes, const void *buf, size_t nbyte);
 #define write(x,y,z) clipped_write((x),(y),(z))
diff --git a/streaming.c b/streaming.c
index debe904..c1fe34a 100644
--- a/streaming.c
+++ b/streaming.c
@@ -99,6 +99,7 @@ int close_istream(struct git_istream *st)
return r;
 }
 
+#undef read
 ssize_t read_istream(struct git_istream *st, void *buf, size_t sz)
 {
return st->vtbl->read(st, buf, sz);
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index e50f0f7..b92e6cb 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -190,4 +190,18 @@ test_expect_success 'required filter clean failure' '
test_must_fail git add test.fc
 '
 
+test -n "$GIT_TEST_LONG" && test_set_prereq EXPENSIVE
+
+test_expect_success EXPENSIVE 'filter large file' '
+   git config filter.largefile.smudge cat &&
+   git config

Re: [PATCH v2] compat: Fix read() of 2GB and more on Mac OS X

2013-08-19 Thread Steffen Prohaska

On Aug 19, 2013, at 9:54 AM, John Keeping j...@keeping.me.uk wrote:

 You've created compat/clipped-read.c, but...
 
 Makefile  |  8 
 builtin/var.c |  1 +
 config.mak.uname  |  1 +
 git-compat-util.h |  5 +
 streaming.c   |  1 +
 t/t0021-conversion.sh | 14 ++
 6 files changed, 30 insertions(+)
 
 ... it's not included here.  Did you forget to git add?

Indeed.  How embarrassing.  Thanks for spotting this.  I'll send v3 in a minute.

Steffen


[PATCH v3] compat: Fix read() of 2GB and more on Mac OS X

2013-08-19 Thread Steffen Prohaska
Previously, filtering 2GB or more through an external filter (see test)
failed on Mac OS X 10.8.4 (12E55) for a 64-bit executable with:

error: read from external filter cat failed
error: cannot feed the input to external filter cat
error: cat died of signal 13
error: external filter cat failed 141
error: external filter cat failed

The reason was that read() immediately returns with EINVAL if nbyte >=
2GB.  According to POSIX [1], if the value of nbyte passed to read() is
greater than SSIZE_MAX, the result is implementation-defined.  The write
function has the same restriction [2].  Since OS X still supports
running 32-bit executables, the 32-bit limit (SSIZE_MAX = INT_MAX
= 2GB - 1) seems to also be imposed on 64-bit executables under certain
conditions.  For write, the problem has been addressed in an earlier
commit [6c642a].

The problem for read() is addressed in a similar way by introducing
a wrapper function in compat that always reads less than 2GB.
Unfortunately, '#undef read' is needed in a few places to avoid
expanding the compat macro in constructs like 'vtbl->read(...)'.

Note that 'git add' exits with 0 even if it prints filtering errors to
stderr.  The test, therefore, checks stderr.  'git add' should probably
be changed (sometime in another commit) to exit with nonzero if
filtering fails.  The test could then be changed to use test_must_fail.

Thanks to the following people for their suggestions:

Johannes Sixt j...@kdbg.org
John Keeping j...@keeping.me.uk
Jonathan Nieder jrnie...@gmail.com
Kyle J. McKay mack...@gmail.com
Torsten Bögershausen tbo...@web.de

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html

[6c642a] 6c642a878688adf46b226903858b53e2d31ac5c3
compate/clipped-write.c: large write(2) fails on Mac OS X/XNU

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 Makefile  |  8 
 builtin/var.c |  1 +
 compat/clipped-read.c | 13 +
 config.mak.uname  |  1 +
 git-compat-util.h |  5 +
 streaming.c   |  1 +
 t/t0021-conversion.sh | 14 ++
 7 files changed, 43 insertions(+)
 create mode 100644 compat/clipped-read.c

diff --git a/Makefile b/Makefile
index 3588ca1..0f69e24 100644
--- a/Makefile
+++ b/Makefile
@@ -69,6 +69,9 @@ all::
 # Define NO_MSGFMT_EXTENDED_OPTIONS if your implementation of msgfmt
 # doesn't support GNU extensions like --check and --statistics
 #
+# Define NEEDS_CLIPPED_READ if your read(2) cannot read more than
+# INT_MAX bytes at once (e.g. MacOS X).
+#
 # Define NEEDS_CLIPPED_WRITE if your write(2) cannot write more than
 # INT_MAX bytes at once (e.g. MacOS X).
 #
@@ -1493,6 +1496,11 @@ ifndef NO_MSGFMT_EXTENDED_OPTIONS
MSGFMT += --check --statistics
 endif
 
+ifdef NEEDS_CLIPPED_READ
+   BASIC_CFLAGS += -DNEEDS_CLIPPED_READ
+   COMPAT_OBJS += compat/clipped-read.o
+endif
+
 ifdef NEEDS_CLIPPED_WRITE
BASIC_CFLAGS += -DNEEDS_CLIPPED_WRITE
COMPAT_OBJS += compat/clipped-write.o
diff --git a/builtin/var.c b/builtin/var.c
index aedbb53..e59f5ba 100644
--- a/builtin/var.c
+++ b/builtin/var.c
@@ -38,6 +38,7 @@ static struct git_var git_vars[] = {
{ "", NULL },
 };
 
+#undef read
 static void list_vars(void)
 {
struct git_var *ptr;
diff --git a/compat/clipped-read.c b/compat/clipped-read.c
new file mode 100644
index 000..6962f67
--- /dev/null
+++ b/compat/clipped-read.c
@@ -0,0 +1,13 @@
+#include "../git-compat-util.h"
+#undef read
+
+/*
+ * Version of read that will read at most INT_MAX bytes.
+ * Workaround a xnu bug on Mac OS X
+ */
+ssize_t clipped_read(int fd, void *buf, size_t nbyte)
+{
+   if (nbyte > INT_MAX)
+   nbyte = INT_MAX;
+   return read(fd, buf, nbyte);
+}
diff --git a/config.mak.uname b/config.mak.uname
index b27f51d..5c10726 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -95,6 +95,7 @@ ifeq ($(uname_S),Darwin)
NO_MEMMEM = YesPlease
USE_ST_TIMESPEC = YesPlease
HAVE_DEV_TTY = YesPlease
+   NEEDS_CLIPPED_READ = YesPlease
NEEDS_CLIPPED_WRITE = YesPlease
COMPAT_OBJS += compat/precompose_utf8.o
BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
diff --git a/git-compat-util.h b/git-compat-util.h
index 115cb1d..a227127 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -185,6 +185,11 @@ typedef unsigned long uintptr_t;
 #define probe_utf8_pathname_composition(a,b)
 #endif
 
+#ifdef NEEDS_CLIPPED_READ
+ssize_t clipped_read(int fd, void *buf, size_t nbyte);
+#define read(x,y,z) clipped_read((x),(y),(z))
+#endif
+
 #ifdef NEEDS_CLIPPED_WRITE
 ssize_t clipped_write(int fildes, const void *buf, size_t nbyte);
 #define write(x,y,z) clipped_write((x),(y),(z))
diff --git a/streaming.c b/streaming.c
index debe904..c1fe34a 100644
--- a/streaming.c
+++ b/streaming.c
@@ -99,6 +99,7 @@ int close_istream(struct git_istream *st)
return r

Re: [PATCH v2] compat: Fix read() of 2GB and more on Mac OS X

2013-08-19 Thread Steffen Prohaska

On Aug 19, 2013, at 10:20 AM, Johannes Sixt j...@kdbg.org wrote:

 Am 19.08.2013 08:38, schrieb Steffen Prohaska:
 +test_expect_success EXPENSIVE 'filter large file' '
 +git config filter.largefile.smudge cat 
 +git config filter.largefile.clean cat 
 +for i in $(test_seq 1 2048); do printf %1048576d 1; done 2GB 
 
 Shouldn't you count to 2049 to get a file that is over 2GB?

No.  INT_MAX = 2GB - 1 works.  INT_MAX + 1 = 2GB fails.  The file is
2048 * 1 MiB = 2^31 bytes = INT_MAX + 1, so it tests exactly at the
boundary.

Steffen



[PATCH v4] compat: Fix read() of 2GB and more on Mac OS X

2013-08-19 Thread Steffen Prohaska
Previously, filtering 2GB or more through an external filter (see test)
failed on Mac OS X 10.8.4 (12E55) for a 64-bit executable with:

error: read from external filter cat failed
error: cannot feed the input to external filter cat
error: cat died of signal 13
error: external filter cat failed 141
error: external filter cat failed

The reason was that read() immediately returns with EINVAL if nbyte >=
2GB.  According to POSIX [1], if the value of nbyte passed to read() is
greater than SSIZE_MAX, the result is implementation-defined.  The write
function has the same restriction [2].  Since OS X still supports
running 32-bit executables, the 32-bit limit (SSIZE_MAX = INT_MAX
= 2GB - 1) seems to also be imposed on 64-bit executables under certain
conditions.  For write, the problem has been addressed in an earlier
commit [6c642a].

The problem for read() is addressed in a similar way by introducing
a wrapper function in compat that always reads less than 2GB.  It is
very likely that the read() and write() wrappers are always used
together.  To avoid introducing another option, NEEDS_CLIPPED_WRITE is
changed to NEEDS_CLIPPED_IO and used to activate both wrappers.

To avoid expanding the read compat macro in constructs like
'vtbl->read(...)', 'read' is renamed to 'readfn' in two cases.  The
solution seems more robust than using '#undef read'.
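
For illustration only (the snippet below is not part of the patch, and the
struct is made up): the preprocessor expands a function-like macro named
'read' wherever the identifier 'read' is followed by an argument list, even
after '->', which is exactly what breaks the vtable call sites:

```c
#include <stddef.h>
#include <sys/types.h>

ssize_t clipped_read(int fd, void *buf, size_t nbyte);

/* Compat macro as used by the earlier versions of this series. */
#define read(x,y,z) clipped_read((x),(y),(z))

struct stream_vtbl {
	/*
	 * The declaration itself is safe: 'read' here is followed by ')'
	 * rather than '(', so the macro does not trigger.
	 */
	ssize_t (*read)(void *st, void *buf, size_t sz);
};

/*
 * A call site like
 *
 *     st->vtbl->read(st, buf, sz);
 *
 * matches the three-argument macro and is rewritten to
 *
 *     st->vtbl->clipped_read((st), (buf), (sz));
 *
 * which does not compile.  The options are '#undef read' before such
 * call sites (as in v2/v3) or renaming the member to 'readfn' (this
 * patch).
 */
```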

Note that 'git add' exits with 0 even if it prints filtering errors to
stderr.  The test, therefore, checks stderr.  'git add' should probably
be changed (sometime in another commit) to exit with nonzero if
filtering fails.  The test could then be changed to use test_must_fail.

Thanks to the following people for their suggestions:

Johannes Sixt j...@kdbg.org
John Keeping j...@keeping.me.uk
Jonathan Nieder jrnie...@gmail.com
Kyle J. McKay mack...@gmail.com
Torsten Bögershausen tbo...@web.de
Eric Sunshine sunsh...@sunshineco.com

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html

[6c642a] 6c642a878688adf46b226903858b53e2d31ac5c3
compate/clipped-write.c: large write(2) fails on Mac OS X/XNU

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 Makefile | 10 +-
 builtin/var.c| 10 +-
 compat/{clipped-write.c => clipped-io.c} | 11 ++-
 config.mak.uname |  2 +-
 git-compat-util.h|  5 -
 streaming.c  |  4 ++--
 t/t0021-conversion.sh| 14 ++
 7 files changed, 41 insertions(+), 15 deletions(-)
 rename compat/{clipped-write.c => clipped-io.c} (53%)

diff --git a/Makefile b/Makefile
index 3588ca1..f54134d 100644
--- a/Makefile
+++ b/Makefile
@@ -69,8 +69,8 @@ all::
 # Define NO_MSGFMT_EXTENDED_OPTIONS if your implementation of msgfmt
 # doesn't support GNU extensions like --check and --statistics
 #
-# Define NEEDS_CLIPPED_WRITE if your write(2) cannot write more than
-# INT_MAX bytes at once (e.g. MacOS X).
+# Define NEEDS_CLIPPED_IO if your read(2) and/or write(2) cannot handle more
+# than INT_MAX bytes at once (e.g. Mac OS X).
 #
 # Define HAVE_PATHS_H if you have paths.h and want to use the default PATH
 # it specifies.
@@ -1493,9 +1493,9 @@ ifndef NO_MSGFMT_EXTENDED_OPTIONS
MSGFMT += --check --statistics
 endif
 
-ifdef NEEDS_CLIPPED_WRITE
-   BASIC_CFLAGS += -DNEEDS_CLIPPED_WRITE
-   COMPAT_OBJS += compat/clipped-write.o
+ifdef NEEDS_CLIPPED_IO
+   BASIC_CFLAGS += -DNEEDS_CLIPPED_IO
+   COMPAT_OBJS += compat/clipped-io.o
 endif
 
 ifneq (,$(XDL_FAST_HASH))
diff --git a/builtin/var.c b/builtin/var.c
index aedbb53..06f8459 100644
--- a/builtin/var.c
+++ b/builtin/var.c
@@ -28,7 +28,7 @@ static const char *pager(int flag)
 
 struct git_var {
const char *name;
-   const char *(*read)(int);
+   const char *(*readfn)(int);
 };
 static struct git_var git_vars[] = {
{ "GIT_COMMITTER_IDENT", git_committer_info },
@@ -43,8 +43,8 @@ static void list_vars(void)
struct git_var *ptr;
const char *val;
 
-   for (ptr = git_vars; ptr->read; ptr++)
-   if ((val = ptr->read(0)))
+   for (ptr = git_vars; ptr->readfn; ptr++)
+   if ((val = ptr->readfn(0)))
printf("%s=%s\n", ptr->name, val);
 }
 
@@ -53,9 +53,9 @@ static const char *read_var(const char *var)
struct git_var *ptr;
const char *val;
val = NULL;
-   for (ptr = git_vars; ptr->read; ptr++) {
+   for (ptr = git_vars; ptr->readfn; ptr++) {
if (strcmp(var, ptr->name) == 0) {
-   val = ptr->read(IDENT_STRICT);
+   val = ptr->readfn(IDENT_STRICT);
break;
}
}
diff --git a/compat/clipped-write.c b/compat/clipped-io.c
similarity index 53%
rename from compat/clipped-write.c
rename

Re: [PATCH v4] compat: Fix read() of 2GB and more on Mac OS X

2013-08-19 Thread Steffen Prohaska

On Aug 19, 2013, at 6:04 PM, Linus Torvalds torva...@linux-foundation.org 
wrote:

 I hate your patch for other reasons, though:
 
 The problem for read() is addressed in a similar way by introducing
 a wrapper function in compat that always reads less than 2GB.
 
 Why do you do that? We already _have_ wrapper functions for read(),
 namely xread().  Exactly because you basically have to, in order to
 handle signals on interruptible filesystems (which aren't POSIX
 either, but at least sanely so) or from other random sources. And to
 handle the "you can't do reads that big" issue.
 
 So why isn't the patch much more straightforward? 

The first version was more straightforward [1].  But reviewers suggested
that the compat wrappers would be the right way to do it and showed me
that it has been done like this before [2].

I haven't submitted anything in a while, so I tried to be a kind person
and followed the suggestions.  I started to hate the patch a bit (maybe less
than you), but I wasn't brave enough to reject the suggestions.  This is
why the patch became what it is.

I'm happy to rework it again towards your suggestion.  I would also remove
the compat wrapper for write().  But I got a bit tired.  I'd appreciate if
I received more indication whether a version without compat wrappers would
be accepted.

Steffen

[1] http://article.gmane.org/gmane.comp.version-control.git/232455
[2] 6c642a8 compate/clipped-write.c: large write(2) fails on Mac OS X/XNU


[PATCH 2/2] strbuf_add_wrapped*(): Remove unused return value

2012-12-10 Thread Steffen Prohaska
Since shortlog isn't using the return value anymore (see previous
commit), the functions can be changed to void.

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 utf8.c | 13 ++---
 utf8.h |  4 ++--
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/utf8.c b/utf8.c
index 5c61bbe..a4ee665 100644
--- a/utf8.c
+++ b/utf8.c
@@ -323,7 +323,7 @@ static size_t display_mode_esc_sequence_len(const char *s)
  * If indent is negative, assume that already -indent columns have been
  * consumed (and no extra indent is necessary for the first line).
  */
-int strbuf_add_wrapped_text(struct strbuf *buf,
+void strbuf_add_wrapped_text(struct strbuf *buf,
const char *text, int indent1, int indent2, int width)
 {
int indent, w, assume_utf8 = 1;
@@ -332,7 +332,7 @@ int strbuf_add_wrapped_text(struct strbuf *buf,
 
if (width <= 0) {
strbuf_add_indented_text(buf, text, indent1, indent2);
-   return 1;
+   return;
}
 
 retry:
@@ -356,14 +356,14 @@ retry:
if (w <= width || !space) {
const char *start = bol;
if (!c && text == start)
-   return w;
+   return;
if (space)
start = space;
else
strbuf_addchars(buf, ' ', indent);
strbuf_add(buf, start, text - start);
if (!c)
-   return w;
+   return;
space = text;
if (c == '\t')
w |= 0x07;
@@ -405,13 +405,12 @@ new_line:
}
 }
 
-int strbuf_add_wrapped_bytes(struct strbuf *buf, const char *data, int len,
+void strbuf_add_wrapped_bytes(struct strbuf *buf, const char *data, int len,
 int indent, int indent2, int width)
 {
char *tmp = xstrndup(data, len);
-   int r = strbuf_add_wrapped_text(buf, tmp, indent, indent2, width);
+   strbuf_add_wrapped_text(buf, tmp, indent, indent2, width);
free(tmp);
-   return r;
 }
 
 int is_encoding_utf8(const char *name)
diff --git a/utf8.h b/utf8.h
index 93ef600..a214238 100644
--- a/utf8.h
+++ b/utf8.h
@@ -9,9 +9,9 @@ int is_utf8(const char *text);
 int is_encoding_utf8(const char *name);
 int same_encoding(const char *, const char *);
 
-int strbuf_add_wrapped_text(struct strbuf *buf,
+void strbuf_add_wrapped_text(struct strbuf *buf,
const char *text, int indent, int indent2, int width);
-int strbuf_add_wrapped_bytes(struct strbuf *buf, const char *data, int len,
+void strbuf_add_wrapped_bytes(struct strbuf *buf, const char *data, int len,
 int indent, int indent2, int width);
 
 #ifndef NO_ICONV
-- 
1.8.1.rc1.2.gfb98a3a



[PATCH 1/2] shortlog: Fix wrapping lines of wraplen (was broken since recent off-by-one fix)

2012-12-10 Thread Steffen Prohaska
A recent commit [1] fixed an off-by-one wrapping error.  As
a side-effect, the conditional in add_wrapped_shortlog_msg() that
decides whether to append a newline needs to be removed.
add_wrapped_shortlog_msg() should always append a newline, which was
the case before the off-by-one fix, because strbuf_add_wrapped_text()
never returned a value of wraplen.

[1] 14e1a4e1ff70aff36db3f5d2a8b806efd0134d50 utf8: fix off-by-one
wrapping of text

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 builtin/shortlog.c  |  5 ++---
 t/t4201-shortlog.sh | 24 
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index b316cf3..8360514 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -306,9 +306,8 @@ parse_done:
 static void add_wrapped_shortlog_msg(struct strbuf *sb, const char *s,
 const struct shortlog *log)
 {
-   int col = strbuf_add_wrapped_text(sb, s, log->in1, log->in2, log->wrap);
-   if (col != log->wrap)
-   strbuf_addch(sb, '\n');
+   strbuf_add_wrapped_text(sb, s, log->in1, log->in2, log->wrap);
+   strbuf_addch(sb, '\n');
 }
 
 void shortlog_output(struct shortlog *log)
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 6872ba1..02ac978 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -120,6 +120,30 @@ test_expect_success 'shortlog from non-git directory' '
test_cmp expect out
 '
 
+test_expect_success 'shortlog should add newline when input line matches 
wraplen' '
+   cat >expect <<\EOF &&
+A U Thor (2):
+  bb:  bbb  bbb bb  bbb b bb
+  aa: aa aa   aa  aa aaa
+
+EOF
+   git shortlog -w >out <<\EOF &&
+commit 0001
+Author: A U Thor aut...@example.com
+Date:   Thu Apr 7 15:14:13 2005 -0700
+
+aa: aa aa   aa  aa aaa
+
+commit 0002
+Author: A U Thor aut...@example.com
+Date:   Thu Apr 7 15:14:13 2005 -0700
+
+bb:  bbb  bbb bb  bbb b bb
+
+EOF
+   test_cmp expect out
+'
+
 iconvfromutf8toiso88591() {
printf "%s" "$*" | iconv -f UTF-8 -t ISO8859-1
 }
-- 
1.8.1.rc1.2.gfb98a3a



[PATCH 0/2] Re: [PATCH] shortlog: Fix wrapping lines of wraplen

2012-12-10 Thread Steffen Prohaska
On Dec 9, 2012, at 10:36 AM, Junio C Hamano gits...@pobox.com wrote:

 Steffen Prohaska proha...@zib.de writes:
 
  A recent commit [1] fixed a off-by-one wrapping error.  As
  a side-effect, add_wrapped_shortlog_msg() needs to be changed to always
  append a newline.
 
 Could you clarify "As a side effect" a bit more?  Do you mean
 something like this?

See updated patches below.

Steffen

Steffen Prohaska (2):
  shortlog: Fix wrapping lines of wraplen (was broken since recent
off-by-one fix)
  strbuf_add_wrapped*(): Remove unused return value

 builtin/shortlog.c  |  5 ++---
 t/t4201-shortlog.sh | 24 
 utf8.c  | 13 ++---
 utf8.h  |  4 ++--
 4 files changed, 34 insertions(+), 12 deletions(-)

-- 
1.8.1.rc1.2.gfb98a3a



[PATCH] shortlog: Fix wrapping lines of wraplen (was broken since recent off-by-one fix)

2012-12-08 Thread Steffen Prohaska
A recent commit [1] fixed an off-by-one wrapping error.  As
a side-effect, add_wrapped_shortlog_msg() needs to be changed to always
append a newline.

[1] 14e1a4e1ff70aff36db3f5d2a8b806efd0134d50 utf8: fix off-by-one
wrapping of text

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 builtin/shortlog.c  |  3 +--
 t/t4201-shortlog.sh | 24 
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index b316cf3..db5b57d 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -307,8 +307,7 @@ static void add_wrapped_shortlog_msg(struct strbuf *sb, 
const char *s,
 const struct shortlog *log)
 {
int col = strbuf_add_wrapped_text(sb, s, log->in1, log->in2, log->wrap);
-   if (col != log->wrap)
-   strbuf_addch(sb, '\n');
+   strbuf_addch(sb, '\n');
 }
 
 void shortlog_output(struct shortlog *log)
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 6872ba1..02ac978 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -120,6 +120,30 @@ test_expect_success 'shortlog from non-git directory' '
test_cmp expect out
 '
 
+test_expect_success 'shortlog should add newline when input line matches 
wraplen' '
+   cat >expect <<\EOF &&
+A U Thor (2):
+  bb:  bbb  bbb bb  bbb b bb
+  aa: aa aa   aa  aa aaa
+
+EOF
+   git shortlog -w >out <<\EOF &&
+commit 0001
+Author: A U Thor aut...@example.com
+Date:   Thu Apr 7 15:14:13 2005 -0700
+
+aa: aa aa   aa  aa aaa
+
+commit 0002
+Author: A U Thor aut...@example.com
+Date:   Thu Apr 7 15:14:13 2005 -0700
+
+bb:  bbb  bbb bb  bbb b bb
+
+EOF
+   test_cmp expect out
+'
+
 iconvfromutf8toiso88591() {
printf "%s" "$*" | iconv -f UTF-8 -t ISO8859-1
 }
-- 
1.8.1.rc1.2.gfb98a3a



Re: Is anyone working on a next-gen Git protocol?

2012-10-10 Thread Steffen Prohaska
On Oct 8, 2012, at 6:27 PM, Junio C Hamano wrote:

 Once we go into want/have phase, I do not think there is a need
 for fundamental change in the protocol (by this, I am not counting a
 change to send haves sparsely and possibly backtracking to bisect
 history, etc. as fundamental).

I've recently discovered that the current protocol can be amazingly
inefficient when it comes to transferring binary objects.  Assume two
repositories that are in sync.  After a 'git checkout --orphan && git
commit', a subsequent transfer sends all the blobs attached to the new
commit, although the other side already has all the blobs.
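
The effect can be reproduced with a small experiment (a sketch of the
scenario described above; names and sizes are made up, and the exact
output depends on the git version):

```
git init --bare server.git
git clone server.git client
cd client

# Create a large binary file and push it to the server.
dd if=/dev/urandom of=big.bin bs=1M count=100
git add big.bin
git commit -m "add big.bin"
git push origin master

# Start a new, unrelated history that contains the very same blob.
git checkout --orphan rewrite
git commit -m "restart history"

# server.git already stores the blob for big.bin, yet "Writing objects"
# during this push transfers roughly 100 MB again.
git push origin rewrite
```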

This behavior is especially annoying when (mis)using git to store binary
files.  I was thinking for a while that it might be a reasonable idea to
store binary files in a submodule and frequently cut the history in
order to save space.  The history would have little value anyway, since
diff and merge don't make much sense with binary files.

Eventually, I abandoned the idea due to the current behavior of the
protocol.  I had expected that git would be smarter and behave more like
rsync, for example, by skipping big blobs as soon as it recognizes that
they are already available on both sides.

Maybe the new protocol could include an optimization for the described
case.  I don't know whether this would be a fundamental change.

Steffen



Re: [PATCH] git-web--browse: Fix open HTML help pages from iTerm

2012-09-27 Thread Steffen Prohaska

On Sep 27, 2012, at 9:11 PM, Junio C Hamano wrote:

 Steffen Prohaska proha...@zib.de writes:
 
 iTerm is an alternative to the default terminal emulation program on Mac
 OS X.  git-web--browse wasn't aware of iTerm and failed to open HTML
 help pages when used in a shell session running in iTerm, reporting No
 known browser available.  Now it works as expected.
 
 Signed-off-by: Steffen Prohaska proha...@zib.de
 ---
 git-web--browse.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff --git a/git-web--browse.sh b/git-web--browse.sh
 index 1e82726..95ecf65 100755
 --- a/git-web--browse.sh
 +++ b/git-web--browse.sh
 @@ -120,7 +120,8 @@ if test -z $browser ; then
  fi
  # SECURITYSESSIONID indicates an OS X GUI login session
  if test -n $SECURITYSESSIONID \
 --o $TERM_PROGRAM = Apple_Terminal ; then
 +-o $TERM_PROGRAM = Apple_Terminal \
 +-o $TERM_PROGRAM = iTerm.app ; then
  browser_candidates=open $browser_candidates
  fi
 
 I do not have anything against iTerm, but could we have a solution
 that does not force us to keep adding 47 different terminal program
 names to the list over the longer term (no pun intended)?  For
 example, If on OS-X (which by the way does not seem to be checked
 with the current logic) and environment TERM_PROGRAM is set to any
 value, or something.

I googled a bit and it seems that TERM_PROGRAM is specific to OS X.
So simply testing whether TERM_PROGRAM is set to any value (without
additional check for OS X) might be good enough.
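
For illustration, the check in git-web--browse.sh could then shrink to
something like this (a sketch of the suggestion above, not a tested patch):

```
# SECURITYSESSIONID indicates an OS X GUI login session; TERM_PROGRAM
# appears to be set by OS X terminal emulators in general, so treat any
# non-empty value as "running under some Mac terminal".
if test -n "$SECURITYSESSIONID" -o -n "$TERM_PROGRAM" ; then
	browser_candidates="open $browser_candidates"
fi
```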

I am wondering whether anyone knows if TERM_PROGRAM is used on other
operating systems besides OS X.

Steffen


[PATCH] git-web--browse: Fix open HTML help pages from iTerm

2012-09-25 Thread Steffen Prohaska
iTerm is an alternative to the default terminal emulation program on Mac
OS X.  git-web--browse wasn't aware of iTerm and failed to open HTML
help pages when used in a shell session running in iTerm, reporting "No
known browser available".  Now it works as expected.

Signed-off-by: Steffen Prohaska proha...@zib.de
---
 git-web--browse.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/git-web--browse.sh b/git-web--browse.sh
index 1e82726..95ecf65 100755
--- a/git-web--browse.sh
+++ b/git-web--browse.sh
@@ -120,7 +120,8 @@ if test -z "$browser" ; then
fi
# SECURITYSESSIONID indicates an OS X GUI login session
if test -n "$SECURITYSESSIONID" \
-   -o "$TERM_PROGRAM" = Apple_Terminal ; then
+   -o "$TERM_PROGRAM" = Apple_Terminal \
+   -o "$TERM_PROGRAM" = iTerm.app ; then
browser_candidates="open $browser_candidates"
fi
# /bin/start indicates MinGW
-- 
1.7.12.1.403.g14e83b4
