Re: [PATCH] Undefine strlcpy if needed.
On 25/08/14 02:54, tsuna wrote:
> On Sun, Aug 24, 2014 at 5:32 PM, Ramsay Jones ram...@ramsay1.demon.co.uk wrote:
>> Again, I don't have access to an OS X system, so I don't know which
>> package provides libintl/gettext, but it seems to be missing on your
>> system.
>
> Probably yeah, those libraries don't seem to be provided standard
> with OS X or OS X's development tools, so maybe the Makefile should
> also default to having NO_GETTEXT=YesPlease when on OS X.
>
>> You can avoid the build failure, without running configure, by
>> setting NO_GETTEXT=YesPlease in your config.mak file.
>
> I need to run configure first:
>
>     $ make configure
>     GEN configure
>     $ ./configure
>     configure: Setting lib to 'lib' (the default)
>     [...]
>
>> So, presumably, configure has set NO_GETTEXT=YesPlease in your
>> config.mak.autogen file.
>
> Yes it did. I don't mind running configure, but so far Git has
> compiled fine without doing it. Should we fix the default values of
> NO_STRLCPY / NO_GETTEXT on OS X?

Is NO_STRLCPY still a problem with a fresh clone (and putting
NO_GETTEXT=YesPlease in your config.mak)? I still do not understand why
you were getting those warnings; AFAICT it should not be happening!
Also, Torsten could not reproduce.

As far as NO_GETTEXT is concerned, I have to defer to someone who has
experience on that platform (I have _zero_ experience on OS X).

ATB,
Ramsay Jones
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 2/4] Change GIT_ALLOC_LIMIT check to use git_parse_ulong()
On Sun, Aug 24, 2014 at 06:07:44PM +0200, Steffen Prohaska wrote:

> diff --git a/wrapper.c b/wrapper.c
> index bc1bfb8..69d1c9b 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -11,14 +11,18 @@ static void (*try_to_free_routine)(size_t size) = do_nothing;
>
>  static void memory_limit_check(size_t size)
>  {
> -	static int limit = -1;
> -	if (limit == -1) {
> -		const char *env = getenv("GIT_ALLOC_LIMIT");
> -		limit = env ? atoi(env) * 1024 : 0;
> +	static size_t limit = SIZE_MAX;
> +	if (limit == SIZE_MAX) {

You use SIZE_MAX as the sentinel for "not set", and 0 as the sentinel
for "no limit". That seems kind of backwards. I guess you are inheriting
this from the existing code, which lets GIT_ALLOC_LIMIT=0 mean "no
limit". I'm not sure if we want to keep that or not (it would be
backwards-incompatible to change it, but we are already breaking
compatibility here by assuming bytes rather than kilobytes; I think
that's OK because this is not a documented feature, or one intended to
be used externally).

> +		const char *var = "GIT_ALLOC_LIMIT";
> +		unsigned long val = 0;
> +		const char *env = getenv(var);
> +		if (env && !git_parse_ulong(env, &val))
> +			die("Failed to parse %s", var);
> +		limit = val;
>  	}

This and the next patch both look OK to me, but I notice this part is
largely duplicated between the two. We already have git_env_bool to do a
similar thing for boolean environment variables. Should we do something
similar like:

diff --git a/config.c b/config.c
index 058505c..11919eb 100644
--- a/config.c
+++ b/config.c
@@ -1122,6 +1122,14 @@ int git_env_bool(const char *k, int def)
 	return v ? git_config_bool(k, v) : def;
 }

+unsigned long git_env_ulong(const char *k, unsigned long val)
+{
+	const char *v = getenv(k);
+	if (v && !git_parse_ulong(v, &val))
+		die("failed to parse %s", k);
+	return val;
+}
+
 int git_config_system(void)
 {
 	return !git_env_bool("GIT_CONFIG_NOSYSTEM", 0);

It's not a lot of code, but I think the callers end up being much easier
to read:

	if (limit == SIZE_MAX)
		limit = git_env_ulong("GIT_ALLOC_LIMIT", 0);

> 	if (limit && size > limit)
> -		die("attempting to allocate %"PRIuMAX" over limit %d",
> -		    (intmax_t)size, limit);
> +		die("attempting to allocate %"PRIuMAX" over limit %"PRIuMAX,
> +		    (uintmax_t)size, (uintmax_t)limit);

This part is duplicated, too, though I do not know if the infrastructure
to avoid that is worth the trouble. Unless you wanted to do a whole:

	check_limit(limit, "GIT_ALLOC_LIMIT", size);

or something, but I am also not convinced that is not just obfuscating
things.

-Peff
Re: [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap
On Sun, Aug 24, 2014 at 06:07:46PM +0200, Steffen Prohaska wrote:

> The data is streamed to the filter process anyway. Better avoid
> mapping the file if possible. This is especially useful if a clean
> filter reduces the size, for example if it computes a sha1 for binary
> data, like git media. The file size that the previous implementation
> could handle was limited by the available address space; large files
> for example could not be handled with (32-bit) msysgit. The new
> implementation can filter files of any size as long as the filter
> output is small enough.
>
> The new code path is only taken if the filter is required. The filter
> consumes data directly from the fd. The original data is not available
> to git, so it must fail if the filter fails.

Can you clarify this second paragraph a bit more? If I understand
correctly, we handle a non-required filter failing by just reading the
data again (which we can do because we either read it into memory
ourselves, or mmap it).

With the streaming approach, we will read the whole file through our
stream; if that fails we would then want to read the stream from the
start. Couldn't we do that with an lseek (or even an mmap with offset
0)? That obviously would not work for non-file inputs, but I think we
address that already in index_fd: we push non-seekable things off to
index_pipe, where we spool them to memory.

So it seems like the ideal strategy would be:

  1. If it's seekable, try streaming. If not, fall back to lseek/mmap.

  2. If it's not seekable and the filter is required, try streaming. We
     die anyway if we fail.

  3. If it's not seekable and the filter is not required, decide based
     on file size:

     a. If it's small, spool to memory and proceed as we do now.

     b. If it's big, spool to a seekable tempfile.

Your patch implements part 2. But I would think part 1 is the most
common case. And while part 3b seems unpleasant, it is better than the
current code (with or without your patch), which will do 3a on a large
file.

Hmm. Though I guess in (3) we do not have the size up front, so it's
complicated (we could spool N bytes to memory, then start dumping to a
file after that). I do not think we necessarily need to implement that
part, though. It seems like (1) is the thing I would expect to hit the
most (i.e., people do not always mark their filters as required).

> -	write_err = (write_in_full(child_process.in, params->src, params->size) < 0);
> +	if (params->src) {
> +		write_err = (write_in_full(child_process.in, params->src, params->size) < 0);

Style: 4-space indentation (rather than a tab). There's more of it in
this function (and in would_convert...) that I didn't mark.

> +	} else {
> +		/* dup(), because copy_fd() closes the input fd. */
> +		fd = dup(params->fd);

Not a problem you are introducing, but this seems kind of like a
misfeature in copy_fd. Is it worth fixing? The function only has two
existing callers.

> +	/* Apply a filter to an fd only if the filter is required to succeed.
> +	 * We must die if the filter fails, because the original data before
> +	 * filtering is not available.
> +	 */

Style nit: we have a blank line at the top of our multi-line comments:

	/*
	 * We have a blank line at the top of our
	 * multi-line comments.
	 */

-Peff
Re: [PATCH 1/5] git-prompt: do not look for refs/stash in $GIT_DIR
On Sun, Aug 24, 2014 at 08:22:41PM +0700, Gábor Szeder wrote:
> On Aug 23, 2014 12:26 PM, Jeff King p...@peff.net wrote:
>> Since dd0b72c (bash prompt: use bash builtins to check stash state,
>> 2011-04-01), git-prompt checks whether we have a stash by looking for
>> $GIT_DIR/refs/stash. Generally external programs should never do
>> this, because they would miss packed-refs.
>
> Not sure whether the prompt script is an external program or not, but
> it doesn't matter; this is the right thing to do.

Yeah, by "external" I just meant "nothing outside of refs.c should make
this assumption".

>> That commit claims that packed-refs does not pack refs/stash, but
>> that is not quite true. It does pack the ref, but due to a bug, fails
>> to prune the ref. When we fix that bug, we would want to be doing the
>> right thing here.
>>
>> Signed-off-by: Jeff King p...@peff.net
>> ---
>> I know we are pretty sensitive to forks in the prompt code (after
>> all, that was the point of dd0b72c). This patch is essentially a
>> reversion of this hunk of dd0b72c, and is definitely safe.
>
> I'm not sure, but if I remember correctly (don't have the means to
> check it at the moment, sorry) in that commit I also added a 'git
> pack-refs' invocation to the relevant test(s?) to guard us against
> breakages due to changes in 'git pack-refs'. If that is so, then I
> think those invocations should be removed as well, as they'll become
> useless.

It did add that change (that's actually how I noticed the problem!
Thank you for being thorough in dd0b72c).

My inclination is to leave the pack-refs invocations, as they protect
against a certain class of errors (we are not doing the risky behavior
now, but the purpose of the test suite is to detect regressions; the
next person to touch that code may not be so careful as you were). I
don't feel too strongly, though, so if we want them gone, I'm OK with
that.

-Peff
Re: [PATCH] bisect: save heap memory. allocate only the required amount
On Sun, Aug 24, 2014 at 04:39:37PM -0700, Junio C Hamano wrote:
> On Sun, Aug 24, 2014 at 8:10 AM, Stefan Beller stefanbel...@gmail.com wrote:
>>> 	for (p = list, i = 0; i < cnt; i++) {
>>> -		struct name_decoration *r = xmalloc(sizeof(*r) + 100);
>>> +		char name[100];
>>
>> Would it make sense to convert 'name' into a git strbuf? Please have
>> a look at Documentation/technical/api-strbuf.txt
>
> Why not go one step further and format it twice, once only to measure
> the necessary size to allocate, allocate and then format into it for
> real? Then you do not need to print into a temporary strbuf, copy the
> result and free the strbuf, no?
>
> BUT. The string will always be "dist=" followed by the decimal
> representation of a count that fits in an int anyway, so I actually
> think use of strbuf is way overkill (and formatting it twice also is);
> the patch as posted should be just fine.

I think you are right, and the patch is the right direction (assuming we
want to do this; I question whether there are enough elements in the
list for us to care about the size, and if there are, we are probably
better off storing the int and formatting the strings on the fly).

I wonder if there is a way we could get rid of the magic "100" here,
though. Its meaning is "enough to hold 'dist=' and any integer". But you
have to read carefully to see that this call to sprintf is not a buffer
overflow.

A strbuf is one way to get rid of it, though it is awkward because we
then have to copy the result into a flex-array structure. It would be
nice if there was some way to abstract the idea of "formatting a buffer
directly into a flex-array". That would involve the double-format you
mention, but we could use it in lots of places to make the code nicer.
Maybe like:

	void *fmt_flex_array(size_t base, const char *fmt, ...)
	{
		va_list ap;
		size_t flex;
		unsigned char *ret;

		va_start(ap, fmt);
		flex = vsnprintf(NULL, 0, fmt, ap);
		va_end(ap);

		ret = xmalloc(base + flex + 1);
		va_start(ap, fmt); /* Eek, see below */
		vsnprintf(ret + base, flex + 1, fmt, ap);
		va_end(ap);
		return ret;
	}

and you'd call it like:

	struct name_decoration *r = fmt_flex_array(sizeof(*r), "dist=%d", x);

Except that I don't think we are guaranteed that offsetof(mystruct,
flex_member) is equal to sizeof(mystruct). If FLEX_ARRAY is 0, it should
be, but some platforms use FLEX_ARRAY=1. So you'd have to pass in the
offset like:

	struct name_decoration *r = fmt_flex_array(sizeof(*r),
			offsetof(struct name_decoration, name),
			"dist=%d", x);

which is a little less nice. You could make it nicer with a macro, but
we don't assume variadic macros. *sigh*

-Peff
Re: [PATCH] bisect: save heap memory. allocate only the required amount
On Sun, Aug 24, 2014 at 07:47:24PM +0530, Arjun Sreedharan wrote:

> diff --git a/bisect.c b/bisect.c
> index d6e851d..c96aab0 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -215,10 +215,13 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n
>  	}
>  	qsort(array, cnt, sizeof(*array), compare_commit_dist);
>  	for (p = list, i = 0; i < cnt; i++) {
> -		struct name_decoration *r = xmalloc(sizeof(*r) + 100);
> +		char name[100];
> +		sprintf(name, "dist=%d", array[i].distance);
> +		int name_len = strlen(name);
> +		struct name_decoration *r = xmalloc(sizeof(*r) + name_len);

This allocation should be name_len + 1 for the NUL-terminator, no?

It looks like add_name_decoration in log-tree already handles half of
what you are adding here. Can we just make that available globally (it
is manipulating the already-global "struct decoration name_decoration")?

I also notice that we do not set r->type at all, meaning the decoration
lookup code in log-tree will access uninitialized memory (worse, it will
use it as a pointer offset into the color list; I got a segfault when I
tried to run "git rev-list --bisect-all v1.8.0..v1.9.0"). I think we
need this:

diff --git a/bisect.c b/bisect.c
index d6e851d..e2a7682 100644
--- a/bisect.c
+++ b/bisect.c
@@ -219,6 +219,7 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n
 		struct object *obj = &(array[i].commit->object);

 		sprintf(r->name, "dist=%d", array[i].distance);
+		r->type = 0;
 		r->next = add_decoration(&name_decoration, obj, r);
 		p->item = array[i].commit;
 		p = p->next;

at a minimum.

It looks like this was a regression caused by eb3005e (commit.h: add
'type' to struct name_decoration, 2010-06-19). Which makes me wonder if
anybody actually _uses_ --bisect-all (which AFAICT is the only way to
trigger the problem), but since it's public, I guess we should keep it.

I think the sane thing here is to stop advertising name_decoration as a
global, and make all callers use add_name_decoration. That makes it
easier for callers like this one, and would have caught the regression
caused by eb3005e (the compiler would have noticed that we were not
passing a "type" parameter to the function).

-Peff
Re: Relative submodule URLs
On Fri, Aug 22, 2014 at 11:00 AM, Marc Branchaud marcn...@xiplink.com wrote:

A couple of years ago I started to work on such a thing ([1] [2] [3]),
mainly because when we tried to change to relative submodules we got
bitten when someone used clone's -o option so that his super-repo had no
origin remote *and* it was checked out on a detached HEAD. So
get_default_remote() failed for him. I didn't have time to complete the
work -- it ended up being quite involved. But Junio did come up with an
excellent transition plan [4] for adopting a default remote setting.

[1] (v0) http://thread.gmane.org/gmane.comp.version-control.git/200145
[2] (v1) http://thread.gmane.org/gmane.comp.version-control.git/201065
[3] (v2) http://thread.gmane.org/gmane.comp.version-control.git/201306
[4] http://article.gmane.org/gmane.comp.version-control.git/201332

I think you're on the right path. However I'd suggest something like the
following:

	[submodule]
		remote = <remote_for_relative_submodules>   (e.g. 'upstream')

I think remote.default would be more generally useful, especially when
working with detached checkouts.

Honestly speaking, I don't use remote.default, even now that I know
about it thanks to the discussion ongoing here. The reason is that
sometimes I push my branches to origin, and sometimes I push them to my
fork. I like explicit control as to which one I push to. I also sync my
git config file to dropbox and I use it on multiple projects and
platforms; I don't use the same push-destination workflow on all
projects. It seems to get in the way of my workflow more than it helps.
I really only ever have two needs:

1. Push explicitly to my remote (e.g. `git push fork` or `git push origin`)
2. Push to the tracked branch (e.g. `git push`)

I'm also not sure how `push.default = simple` conflicts with the usage
of `remote.default`, since in the tracked-repo case, you must explicitly
specify the source ref to push. Is this behavior documented somewhere?

(For the record, I would also be happy if clone got rid of its -o option
and "origin" became the sacred, reserved remote name (perhaps translated
into other languages as needed) that clone always uses no matter what.)

	[branch "<name>"]
		submoduleRemote = <remote_for_relative_submodule>

If I understand correctly, you want this so that your branch can be a
fork of only the super-repo while the submodules are not forked, and so
they should keep tracking their original repo.

That's correct. But this is case-by-case. Sometimes I make a change
where I want the submodule forked (rare); most times I don't. Sometimes
I can get away with pushing changes to the submodule and worrying about
it later, since I know the submodule ref won't move forward unless
someone does update --remote (which isn't often, or is only done as
needed).

To me this seems to be going in the opposite direction of having
branches recursively apply to submodules, which I think most of us want.
A branch should fork the entire repo, including its submodules. The
implication is that if you want to push that branch somewhere, that
somewhere needs to be able to accept the forks of the submodules *even
if those submodules aren't changed in your branch*, because at the very
least the branch ref has to exist in the submodules' repositories.

There are many levels on which this can apply. When it comes to
checkouts and such, I agree. However, how will this impact *creating*
branches? What about forking? Do you expect submodule forking/branching
to be automatic as well? Based on your description, it seems so
(although a new branch doesn't necessarily have to correspond to a new
fork, unless I'm misunderstanding you). This seems difficult to do,
especially the forking part, since you would need an API for this
(Github, Atlassian Stash, etc), unless you are thinking of something
clever like local/relative forks. However, the inconvenience of forking
manually isn't the main reason why I avoid forking submodules. It's the
complication of pull requests. There is no uniformity there, which is
unfortunate. Recursive pull requests are something outside the scope of
git, I realize that, but it would still be nice. However, the suggestion
you make here lays the foundation for that, I think.

With absolute-path submodules, the push is as simple as creating the
branch ref in the submodules' home repositories -- even if the main
"somewhere" you're pushing to isn't one of those repositories. With
relative-path submodules, the push's target repo *must* also have the
submodules in their proper places, so that they can get updated.
Furthermore, if you clone a repo that has relative-path submodules you
*must* also clone the submodules.

Robert, I think what you'll say to this is that you still want your
branch to track the latest submodule updates from their home repository.
(BTW, I'm confused with how you're using the terms upstream and origin.
I'll use "home" to refer to the repository where everything
Re: [PATCH] bisect: save heap memory. allocate only the required amount
On Mon, Aug 25, 2014 at 3:35 PM, Jeff King p...@peff.net wrote:
> On Sun, Aug 24, 2014 at 07:47:24PM +0530, Arjun Sreedharan wrote:
>> 	for (p = list, i = 0; i < cnt; i++) {
>> -		struct name_decoration *r = xmalloc(sizeof(*r) + 100);
>> +		char name[100];
>> +		sprintf(name, "dist=%d", array[i].distance);
>> +		int name_len = strlen(name);
>> +		struct name_decoration *r = xmalloc(sizeof(*r) + name_len);
>
> This allocation should be name_len + 1 for the NUL-terminator, no?

I wondered about that too, but as struct name_decoration is defined like
this:

	struct name_decoration {
		struct name_decoration *next;
		int type;
		char name[1];
	};

the .name field of this struct already has one char, so the allocation
above should be ok.

> It looks like add_name_decoration in log-tree already handles half of
> what you are adding here. Can we just make that available globally (it
> is manipulating the already-global "struct decoration name_decoration")?

Yeah, it looks like that would be better. Note that
add_name_decoration() does:

	int nlen = strlen(name);
	struct name_decoration *res = xmalloc(sizeof(struct name_decoration) + nlen);

so it also relies on the fact that .name contains one char.

> I also notice that we do not set r->type at all, meaning the decoration
> lookup code in log-tree will access uninitialized memory (worse, it
> will use it as a pointer offset into the color list; I got a segfault
> when I tried to run "git rev-list --bisect-all v1.8.0..v1.9.0").
>
> I think we need this:
>
> 		struct object *obj = &(array[i].commit->object);
>
> 		sprintf(r->name, "dist=%d", array[i].distance);
> +		r->type = 0;
> 		r->next = add_decoration(&name_decoration, obj, r);
> 		p->item = array[i].commit;
> 		p = p->next;
>
> at a minimum.

Yeah, if we don't use add_name_decoration() we would need that. Thanks
for noticing.

> It looks like this was a regression caused by eb3005e (commit.h: add
> 'type' to struct name_decoration, 2010-06-19). Which makes me wonder if
> anybody actually _uses_ --bisect-all (which AFAICT is the only way to
> trigger the problem), but since it's public, I guess we should keep it.

Yeah, we should probably keep it.

> I think the sane thing here is to stop advertising name_decoration as a
> global, and make all callers use add_name_decoration. That makes it
> easier for callers like this one, and would have caught the regression
> caused by eb3005e (the compiler would have noticed that we were not
> passing a "type" parameter to the function).

I agree.

Thanks,
Christian.
Re: Re: Relative submodule URLs
On Sun, Aug 24, 2014 at 8:34 AM, Heiko Voigt hvo...@hvoigt.net wrote:
> New --with-remote parameter for 'git submodule'
>
> While having said all that about submodule settings, I think a much
> much simpler start is to go ahead with a commandline setting, like
> Robert proposed here[2]. For that we do not have to worry about how it
> can be stored, transported, defined per submodule or on a branch,
> since answers to this are given at the commandline (and current
> repository state). There are still open questions about this though:
>
>  * Should the name in the submodule be 'origin' even though you
>    specified --with-remote=somewhere? For me it is always confusing to
>    have the same/similar remotes named differently in different
>    repositories. That is why I try to keep the names the same in all
>    my clones of repositories (i.e. for my private, github, upstream
>    remotes).
>
>  * When you do a 'git submodule sync --with-remote=somewhere', should
>    the remote be added or replaced?
>
> My opinion on these is: the remote should be named as in the
> superproject, so --with-remote=somewhere adds/replaces the remote
> 'somewhere' in the submodules named on the commandline (or all, in
> case no submodule is specified). In case of a fresh clone of the
> submodule, there would be no origin but only a remote under the new
> name.
>
> Would the --with-remote feature I describe be a feasible start for you
> Robert? What do others think? Is the naming of the parameter
> '--with-remote' alright?
>
> Cheers Heiko
>
> [1] http://article.gmane.org/gmane.comp.version-control.git/255512
> [2] http://article.gmane.org/gmane.comp.version-control.git/255512
> [3] https://github.com/jlehmann/git-submod-enhancements/wiki#special-ref-overriding-gitmodules-values

Hi Heiko,

My last email response was in violation of your request to keep the two
topics separate, sorry about that. I started typing it this weekend and
completed the draft this morning, without having read this response from
you first. At this point my only intention was to start discussion on a
possible short-term solution. I realize the Git developers are working
hard on improving submodule workflow for the long term. In addition, I
do not have the domain expertise to properly make suggestions in regards
to longer-term solutions, so I leave that to you :-)

The --with-remote feature would allow me to begin using relative
submodules because, on a per-submodule basis, I can specify the remote
it will use. When I fork a submodule and need to start tracking it, I
can run `git submodule sync --with-remote fork`, which will take my
super repo's 'fork' remote, REPLACE 'origin' in the submodule with that
URL, and also redo the relative URL calculation. This is ideal since I
use HTTP at home (so I can use my proxy server to access git behind the
firewall at work) and at work I physically use SSH for performance (to
avoid HTTP protocol overhead). I also like the idea of never having to
update my submodule URLs again if the git server moves, the domain name
changes, or whatever else.

Here is what I think would make the feature most usable. I think you
went over some of these ideas, but I just want to clarify to make sure
we're on the same page. Please correct me as needed.

1. Running `git submodule update --with-remote <name>` shall fail the
command unconditionally.

2. Using the `--with-remote` option on submodule `update` or `sync`
will fail if it detects absolute submodule URLs in .gitmodules.

3. Running `git submodule update --init --with-remote <name>` shall
fail the command ONLY if a submodule is being processed that is NOT
also being initialized.

4. The behavior of git submodule's `update` or `sync` commands combined
with `--with-remote` will REPLACE or CREATE the 'origin' remote in each
submodule it is run in. We will not allow the user to configure what
the submodule remote name will end up being (I think this is current
behavior and forces good practice; I consider `origin` an adopted
standard for git, and actually wish it was more enforced for super
projects as well!).

Let me know if I've missed anything. Once we clarify requirements I'll
attempt to start work on this during my free time. I'll start by testing
this through msysgit, since I do not have linux installed, but I have
Linux Mint running in a Virtual Machine so I can test on both platforms
as needed (I don't have a lot of experience on Linux though). I hope you
won't mind me reaching out for questions as needed; however, I will
attempt to be as resourceful as possible since I know you're all busy.

Thanks.
Re: Re: Relative submodule URLs
On Mon, Aug 25, 2014 at 9:29 AM, Robert Dailey rcdailey.li...@gmail.com wrote:
> [previous message quoted in full; trimmed]

Thought of a few more:

5. If `--with-remote` is unspecified, behavior will continue as it
currently does (I'm not clear on the precedence here of various options,
but I assume: `remote.default` first, then `branch.<name>.remote`)

6. `--with-remote` will take
Re: [PATCH] bisect: save heap memory. allocate only the required amount
On Mon, Aug 25, 2014 at 04:06:52PM +0200, Christian Couder wrote: This allocation should be name_len + 1 for the NUL-terminator, no? I wondered about that too, but as struct name_decoration is defined like this: struct name_decoration { struct name_decoration *next; int type; char name[1]; }; the .name field of this struct already has one char, so the allocation above should be ok. Yeah, you're right. I would argue it should just be FLEX_ARRAY for consistency with other spots, though (in which case add_name_decoration needs to be updated with a +1). Running git grep '^char [^ ]*\[[01]]' -- '*.[ch]' shows that this is one of only two spots that don't use FLEX_ARRAY (and the other has a comment explaining why not). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
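The two allocation idioms discussed above can be shown side by side. This is an illustrative sketch of the pattern only, not git's actual struct name_decoration or its FLEX_ARRAY macro; the struct and function names here are made up for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* pre-C99 trick: one char of name[] is already counted in sizeof() */
struct decoration_c89 {
	struct decoration_c89 *next;
	int type;
	char name[1];
};

/* C99 flexible array member, what FLEX_ARRAY expands to on C99 compilers */
struct decoration_flex {
	struct decoration_flex *next;
	int type;
	char name[];
};

static struct decoration_c89 *make_c89(const char *name)
{
	size_t len = strlen(name);
	/* sizeof(*d) already includes 1 byte of name[], enough for the NUL */
	struct decoration_c89 *d = malloc(sizeof(*d) + len);
	memcpy(d->name, name, len + 1);
	return d;
}

static struct decoration_flex *make_flex(const char *name)
{
	size_t len = strlen(name);
	/* the flexible member contributes 0 bytes, so the +1 is needed here */
	struct decoration_flex *d = malloc(sizeof(*d) + len + 1);
	memcpy(d->name, name, len + 1);
	return d;
}
```

This is exactly the "+1" difference Peff mentions: switching a `char name[1]` site to FLEX_ARRAY means the allocation must grow by one byte for the NUL terminator.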
Re: [PATCH v5 2/4] Change GIT_ALLOC_LIMIT check to use git_parse_ulong()
On Aug 25, 2014, at 1:38 PM, Jeff King p...@peff.net wrote: On Sun, Aug 24, 2014 at 06:07:44PM +0200, Steffen Prohaska wrote: diff --git a/wrapper.c b/wrapper.c index bc1bfb8..69d1c9b 100644 --- a/wrapper.c +++ b/wrapper.c @@ -11,14 +11,18 @@ static void (*try_to_free_routine)(size_t size) = do_nothing; static void memory_limit_check(size_t size) { -static int limit = -1; -if (limit == -1) { -const char *env = getenv("GIT_ALLOC_LIMIT"); -limit = env ? atoi(env) * 1024 : 0; +static size_t limit = SIZE_MAX; +if (limit == SIZE_MAX) { You use SIZE_MAX as the sentinel for not set, and 0 as the sentinel for no limit. That seems kind of backwards. I guess you are inheriting this from the existing code, which lets GIT_ALLOC_LIMIT=0 mean no limit. I'm not sure if we want to keep that or not (it would be backwards incompatible to change it, but we are already breaking compatibility here by assuming bytes rather than kilobytes; I think that's OK because this is not a documented feature, or one intended to be used externally). I think it's reasonable that GIT_ALLOC_LIMIT=0 means no limit, so that the limit can easily be disabled temporarily. But I could change the sentinel and handle 0 like: if (git_parse_ulong(env, &val)) { if (!val) { val = SIZE_MAX; } } Maybe we should do this. +const char *var = "GIT_ALLOC_LIMIT"; +unsigned long val = 0; +const char *env = getenv(var); +if (env && !git_parse_ulong(env, &val)) +die("Failed to parse %s", var); +limit = val; } This and the next patch both look OK to me, but I notice this part is largely duplicated between the two. We already have git_env_bool to do a similar thing for boolean environment variables. Should we do something similar like: diff --git a/config.c b/config.c index 058505c..11919eb 100644 --- a/config.c +++ b/config.c @@ -1122,6 +1122,14 @@ int git_env_bool(const char *k, int def) return v ? 
git_config_bool(k, v) : def; } +unsigned long git_env_ulong(const char *k, unsigned long val) +{ + const char *v = getenv(k); + if (v && !git_parse_ulong(v, &val)) + die("failed to parse %s", k); + return val; +} + int git_config_system(void) { return !git_env_bool("GIT_CONFIG_NOSYSTEM", 0); It's not a lot of code, but I think the callers end up being much easier to read: if (limit == SIZE_MAX) limit = git_env_ulong("GIT_ALLOC_LIMIT", 0); I think you're right. I'll change it. Steffen
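The helper discussed above can be sketched as a stand-alone program. This is an assumption-laden mock-up, not git's code: `env_ulong()` and `parse_ulong()` stand in for the proposed git_env_ulong() and for git_parse_ulong(), and the real git_parse_ulong() additionally accepts size suffixes like "1k" that this sketch does not handle.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Stand-in for git_parse_ulong(): strict base-10 parse, rejects "" and junk. */
static int parse_ulong(const char *s, unsigned long *out)
{
	char *end;
	unsigned long v;
	if (!*s)
		return 0;	/* no digits at all: reject the empty string */
	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno || *end)
		return 0;	/* overflow, or trailing garbage */
	*out = v;
	return 1;
}

/* Stand-in for the proposed git_env_ulong(): unset -> default, bad -> bail. */
static unsigned long env_ulong(const char *k, unsigned long val)
{
	const char *v = getenv(k);
	if (v && !parse_ulong(v, &val))
		exit(1);	/* the real helper would die("failed to parse %s", k) */
	return val;
}
```

The caller-side win Peff describes is that each user of an environment limit shrinks to a single `env_ulong(name, default)` call instead of repeating the getenv/parse/die dance.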
Re: [PATCH v5 2/4] Change GIT_ALLOC_LIMIT check to use git_parse_ulong()
On Mon, Aug 25, 2014 at 05:06:22PM +0200, Steffen Prohaska wrote: I think it's reasonable that GIT_ALLOC_LIMIT=0 means no limit, so that the limit can easily be disabled temporarily. IMHO, GIT_ALLOC_LIMIT= (i.e., the empty string) would be a good way to say that (and I guess that even works currently, due to the way atoi works, but I suspect git_parse_ulong might complain). It is probably not worth worrying about too much. This is not even a user-facing interface, and the test scripts just set it to 0. So I'd be OK going that direction, or just leaving it as-is. -Peff 
Help improving the future of debugging
Dear Git Developers, since 1974, researchers and software developers have tried to ease software debugging. Over the last years they have created many new tools and formalized methods. We are interested in whether these advancements have reached professional software developers and how they have influenced their approach. To find this out, we are conducting an online survey for software developers. From the results we expect new insights into debugging practice that will help us suggest new directions for future research. So if you are a software developer or know any software developers, you can really help us. The survey is, of course, fully anonymous and will take about 15 minutes to fill out. Feel free to redistribute this message to anyone who you think might be interested. The survey can be reached at: http://www.uni-potsdam.de/skopie-up/index.php/689349 Thank you for your interest, Benjamin Siegmund 
Re: [BUG] resolved deltas
Am 23.08.2014 um 13:18 schrieb Jeff King: On Sat, Aug 23, 2014 at 07:04:59AM -0400, Jeff King wrote: On Sat, Aug 23, 2014 at 06:56:40AM -0400, Jeff King wrote: So I think your patch is doing the right thing. By the way, if you want to add a test to your patch, there is infrastructure in t5308 to create packs with duplicate objects. If I understand the problem correctly, you could trigger this by having a delta object whose base is duplicated, even without the missing object. This actually turned out to be really easy. The test below fails without your patch and passes with it. Please feel free to squash it in. diff --git a/t/t5308-pack-detect-duplicates.sh b/t/t5308-pack-detect-duplicates.sh index 9c5a876..50f7a69 100755 --- a/t/t5308-pack-detect-duplicates.sh +++ b/t/t5308-pack-detect-duplicates.sh @@ -77,4 +77,19 @@ test_expect_success 'index-pack can reject packs with duplicates' ' test_expect_code 1 git cat-file -e $LO_SHA1 ' +test_expect_success 'duplicated delta base does not trigger assert' ' + clear_packs && + { + A=01d7713666f4de822776c7622c10f1b07de280dc && + B=e68fe8129b546b101aee9510c5328e7f21ca1d18 && + pack_header 3 && + pack_obj $A $B && + pack_obj $B && + pack_obj $B + } >dups.pack && + pack_trailer dups.pack && + git index-pack --stdin <dups.pack && + test_must_fail git index-pack --stdin --strict <dups.pack +' + test_done Thanks, that looks good. But while preparing the patch I noticed that the added test sometimes fails. Helgrind pointed out a race condition. It is not caused by the patch to turn the asserts into regular ifs, however -- here's a Helgrind report for the original code with the new test: ==34949== Helgrind, a thread error detector ==34949== Copyright (C) 2007-2013, and GNU GPL'd, by OpenWorks LLP et al. 
==34949== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info ==34949== Command: /home/lsr/src/git/t/../bin-wrappers/git index-pack --stdin ==34949== ==34949== Helgrind, a thread error detector ==34949== Copyright (C) 2007-2013, and GNU GPL'd, by OpenWorks LLP et al. ==34949== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info ==34949== Command: /home/lsr/src/git/git index-pack --stdin ==34949== ==34949== ---Thread-Announcement-- ==34949== ==34949== Thread #3 was created ==34949==at 0x594DF7E: clone (clone.S:74) ==34949==by 0x544A2B9: do_clone.constprop.3 (createthread.c:75) ==34949==by 0x544B762: pthread_create@@GLIBC_2.2.5 (createthread.c:245) ==34949==by 0x4C2D55D: pthread_create_WRK (hg_intercepts.c:269) ==34949==by 0x43ABB8: cmd_index_pack (index-pack.c:1097) ==34949==by 0x405B6A: handle_builtin (git.c:351) ==34949==by 0x404CE8: main (git.c:575) ==34949== ==34949== ---Thread-Announcement-- ==34949== ==34949== Thread #2 was created ==34949==at 0x594DF7E: clone (clone.S:74) ==34949==by 0x544A2B9: do_clone.constprop.3 (createthread.c:75) ==34949==by 0x544B762: pthread_create@@GLIBC_2.2.5 (createthread.c:245) ==34949==by 0x4C2D55D: pthread_create_WRK (hg_intercepts.c:269) ==34949==by 0x43ABB8: cmd_index_pack (index-pack.c:1097) ==34949==by 0x405B6A: handle_builtin (git.c:351) ==34949==by 0x404CE8: main (git.c:575) ==34949== ==34949== ==34949== ==34949== Possible data race during read of size 4 at 0x5E15910 by thread #3 ==34949== Locks held: none ==34949==at 0x439327: find_unresolved_deltas (index-pack.c:918) ==34949==by 0x439666: threaded_second_pass (index-pack.c:1002) ==34949==by 0x4C2D6F6: mythread_wrapper (hg_intercepts.c:233) ==34949==by 0x544B0A3: start_thread (pthread_create.c:309) ==34949== ==34949== This conflicts with a previous write of size 4 by thread #2 ==34949== Locks held: none ==34949==at 0x4390E2: resolve_delta (index-pack.c:865) ==34949==by 0x439340: find_unresolved_deltas (index-pack.c:919) ==34949==by 0x439666: 
threaded_second_pass (index-pack.c:1002) ==34949==by 0x4C2D6F6: mythread_wrapper (hg_intercepts.c:233) ==34949==by 0x544B0A3: start_thread (pthread_create.c:309) ==34949== ==34949== Address 0x5E15910 is 48 bytes inside a block of size 256 alloc'd ==34949==at 0x4C2A7D0: calloc (vg_replace_malloc.c:618) ==34949==by 0x50CA83: xcalloc (wrapper.c:119) ==34949==by 0x439AF6: cmd_index_pack (index-pack.c:1643) ==34949==by 0x405B6A: handle_builtin (git.c:351) ==34949==by 0x404CE8: main (git.c:575) ==34949== git: builtin/index-pack.c:918: find_unresolved_deltas_1: Assertion `child-real_type == OBJ_REF_DELTA' failed. ==34949== ==34949== For counts of detected and suppressed errors, rerun with: -v ==34949== Use --history-level=approx or =none to gain increased speed, at ==34949== the cost of reduced accuracy of conflicting-access
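The Helgrind report above flags two threads touching the same field with no lock held. As a minimal stand-alone illustration of the class of fix such a report calls for (nothing here is index-pack's real code; the variable and function names are invented for the example), the shared field is only ever touched with a mutex held:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Illustrative only: one writer thread and one reader, both taking
 * the same lock around the shared field, which is what silences a
 * Helgrind "possible data race" report like the one above.
 */
static int shared_type;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *writer(void *arg)
{
	pthread_mutex_lock(&lock);
	shared_type = 42;		/* the resolve_delta side of the race */
	pthread_mutex_unlock(&lock);
	return NULL;
}

static int read_shared(void)
{
	int v;
	pthread_mutex_lock(&lock);	/* the find_unresolved_deltas side */
	v = shared_type;
	pthread_mutex_unlock(&lock);
	return v;
}

static int run_demo(void)
{
	pthread_t t;
	pthread_create(&t, NULL, writer, NULL);
	pthread_join(t, NULL);
	return read_shared();
}
```

Whether index-pack should take a lock here or restructure the work so only one thread owns the field is exactly the design question the report leaves open.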
Re: [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap
On Aug 25, 2014, at 2:43 PM, Jeff King p...@peff.net wrote: On Sun, Aug 24, 2014 at 06:07:46PM +0200, Steffen Prohaska wrote: The data is streamed to the filter process anyway. Better avoid mapping the file if possible. This is especially useful if a clean filter reduces the size, for example if it computes a sha1 for binary data, like git media. The file size that the previous implementation could handle was limited by the available address space; large files for example could not be handled with (32-bit) msysgit. The new implementation can filter files of any size as long as the filter output is small enough. The new code path is only taken if the filter is required. The filter consumes data directly from the fd. The original data is not available to git, so it must fail if the filter fails. Can you clarify this second paragraph a bit more? If I understand correctly, we handle a non-required filter failing by just reading the data again (which we can do because we either read it into memory ourselves, or mmap it). We don't read the data again. convert_to_git() assumes that it is already in memory and simply keeps the original buffer if the filter fails. With the streaming approach, we will read the whole file through our stream; if that fails we would then want to read the stream from the start. Couldn't we do that with an lseek (or even an mmap with offset 0)? That obviously would not work for non-file inputs, but I think we address that already in index_fd: we push non-seekable things off to index_pipe, where we spool them to memory. It could be handled that way, but we would be back to the original problem that 32-bit git fails for large files. The convert code path currently assumes that all data is available in a single buffer at some point to apply crlf and ident filters. If the initial filter, which is assumed to reduce the file size, fails, we could seek to 0 and read the entire file. But git would then fail for large files with out-of-memory. 
We would not gain anything for the use case that I describe in the commit message's first paragraph. To implement something like the ideal strategy below, the entire convert machinery for crlf and ident would have to be converted to a streaming approach. Another option would be to detect that only the clean filter would be applied and not crlf and ident. Maybe we could get away with something simpler then. But I think that if the clean filter's purpose is to reduce file size, it does not make sense to try to handle the case of a failing filter with a fallback plan. The filter should simply be marked required, because any sane operation requires it. So it seems like the ideal strategy would be: 1. If it's seekable, try streaming. If not, fall back to lseek/mmap. 2. If it's not seekable and the filter is required, try streaming. We die anyway if we fail. 3. If it's not seekable and the filter is not required, decide based on file size: a. If it's small, spool to memory and proceed as we do now. b. If it's big, spool to a seekable tempfile. Your patch implements part 2. But I would think part 1 is the most common case. And while part 3b seems unpleasant, it is better than the current code (with or without your patch), which will do 3a on a large file. Hmm. Though I guess in (3) we do not have the size up front, so it's complicated (we could spool N bytes to memory, then start dumping to a file after that). I do not think we necessarily need to implement that part, though. It seems like (1) is the thing I would expect to hit the most (i.e., people do not always mark their filters as required). Well, I think they have to mark it if the filter's purpose is to reduce size. I'll add a bit of the discussion to the commit message. I'm not convinced that we should do more at this point. +} else { +/* dup(), because copy_fd() closes the input fd. */ +fd = dup(params->fd); Not a problem you are introducing, but this seems kind of like a misfeature in copy_fd. Is it worth fixing? 
The function only has two existing callers. I found it confusing. I think it's worth fixing. Steffen 
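Point (1) of the strategy above hinges on being able to rewind a seekable fd and re-read it after a streaming attempt fails. A minimal demonstration of that property (not git's convert code; the function name is invented for the example):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Sketch of the seekable fallback: "stream" the whole fd once, then
 * rewind with lseek() and read it again.  On a pipe, lseek() fails
 * with ESPIPE, which is the case that needs spooling instead.
 */
static int rewind_and_reread(int fd, char *buf, size_t len)
{
	char scratch[64];
	/* first pass: pretend a filter consumed the stream and then failed */
	while (read(fd, scratch, sizeof(scratch)) > 0)
		;
	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;		/* not seekable: a fallback is needed */
	return (int)read(fd, buf, len);	/* second pass starts from offset 0 */
}
```

This is why a regular-file fd never needs the in-memory copy at all: the file itself is the replayable buffer.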
Re: [PATCH 4/5] fast-import: fix buffer overflow in dump_tags
Jeff, We have a fix like this in the next set of transaction updates. https://code-review.googlesource.com/#/c/1012/13/fast-import.c However, if your concerns are the integrity of the servers and not taking any chances, you might not want to wait for my patches to graduate. ronnie sahlberg On Fri, Aug 22, 2014 at 10:32 PM, Jeff King p...@peff.net wrote: When creating a new annotated tag, we sprintf the refname into a static-sized buffer. If we have an absurdly long tagname, like: git init repo && cd repo && git commit --allow-empty -m foo && git tag -m message mytag && git fast-export mytag | perl -lpe '/^tag/ and s/mytag/"a" x 8192/e' | git fast-import <input we'll overflow the buffer. We can fix it by using a strbuf. Signed-off-by: Jeff King p...@peff.net --- I'm not sure how easily exploitable this is. The buffer is on the stack, and we definitely demolish the return address. But we never actually return from the function, since lock_ref_sha1 will fail in such a case and die (it cannot succeed because the name is longer than PATH_MAX, which we check when concatenating it with $GIT_DIR). Still, there is no limit to the size of buffer you can feed it, so it's entirely possible you can overwrite something else and cause some mischief. So I wouldn't call it trivially exploitable, but I would not bet my servers that it is not (and of course it is easy to trigger if you can convince somebody to run fast-import on a stream, so any remote helpers produce a potentially vulnerable situation). 
fast-import.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/fast-import.c b/fast-import.c index f25a4ae..a1479e9 100644 --- a/fast-import.c +++ b/fast-import.c @@ -1734,14 +1734,16 @@ static void dump_tags(void) static const char *msg = "fast-import"; struct tag *t; struct ref_lock *lock; - char ref_name[PATH_MAX]; + struct strbuf ref_name = STRBUF_INIT; for (t = first_tag; t; t = t->next_tag) { - sprintf(ref_name, "tags/%s", t->name); - lock = lock_ref_sha1(ref_name, NULL); + strbuf_reset(&ref_name); + strbuf_addf(&ref_name, "tags/%s", t->name); + lock = lock_ref_sha1(ref_name.buf, NULL); if (!lock || write_ref_sha1(lock, t->sha1, msg) < 0) - failure |= error("Unable to update %s", ref_name); + failure |= error("Unable to update %s", ref_name.buf); } + strbuf_release(&ref_name); } static void dump_marks_helper(FILE *f, -- 2.1.0.346.ga0367b9 
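The overflow fixed above comes from sprintf() into a fixed `char[PATH_MAX]`. The core idea of the fix can be shown stand-alone: size the buffer to the input instead of trusting a compile-time constant. This is a sketch of the principle only, not git's strbuf API; `format_ref_name()` is an invented name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Instead of sprintf() into char buf[PATH_MAX], compute the exact
 * size the formatted name needs and allocate that.  An 8192-char
 * tag name then simply yields an 8197-byte buffer, no overflow.
 */
static char *format_ref_name(const char *prefix, const char *tag)
{
	size_t n = strlen(prefix) + strlen(tag) + 2;	/* '/' plus NUL */
	char *buf = malloc(n);
	snprintf(buf, n, "%s/%s", prefix, tag);
	return buf;					/* caller frees */
}
```

git's strbuf does the same thing with amortized growth so the buffer can be reused across loop iterations, which is why the patch also hoists `ref_name` out of the loop and calls strbuf_reset() each time.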
Re: [PATCH 2/2] Loop index increases monotonically when reading unmerged entries
Jaime Soriano Pastor jsorianopas...@gmail.com writes: I think this line is dangerous, if add_cache_entry is not able to remove higher-stages it will be looping forever, as happens in the case of this thread. I cannot see why it's even needed, and removing it doesn't break any test. On Sun, Aug 24, 2014 at 7:57 PM, Jaime Soriano Pastor jsorianopas...@gmail.com wrote: Signed-off-by: Jaime Soriano Pastor jsorianopas...@gmail.com --- read-cache.c | 1 - 1 file changed, 1 deletion(-) diff --git a/read-cache.c b/read-cache.c index c1a9619..3d70386 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1971,7 +1971,6 @@ int read_index_unmerged(struct index_state *istate) if (add_index_entry(istate, new_ce, 0)) return error("%s: cannot drop to stage #0", new_ce->name); - i = index_name_pos(istate, new_ce->name, len); I think the original idea was that regardless of how many entries with the same name were removed because of the replacement (or addition) of new_ce, by making i point at the newly added new_ce, we would make sure that the loop will continue from the next entry. The if/return expected that add_index_entry() will get rid of all the other entries with the same name as new_ce has, or it will return an error. Without the bug in add_index_entry(), because new_ce always has the same name as ce, the entry we found at i by the loop, we know that index_name_pos() will give the same i we already have, so removing this line should be a no-op. Now, add_index_entry() in your case did not notice that it failed to remove all other entries with the same name as new_ce, resulting in your looping forever. Among the "merged and unmerged entries with the same name exists in the index file" class of index file corruption, we could treat the "merged and unmerged entries with the same name not just exists but next to each other, unmerged ones coming immediately after merged one" case specially (i.e. 
declaring that it is more likely for a broken software to leave both next to each other than otherwise) and try to accommodate it as your original patch did. I am not absolutely sure if such a special case is worth it, and with your updated [1/2] "read_index_from(): check order of entries when reading index" we will not be doing so, which is good. With that safety in place, the bug in add_index_entry() will disappear; it is safe not to adjust i by calling index_name_pos(), and this patch, [2/2] "read_index_unmerged(): remove unnecessary loop index adjustment", will be a good thing to do. Thanks. } return unmerged; } -- 2.0.4.1.g0b8a4f9.dirty 
Re: [PATCH 3/5] fast-import: clean up pack_data pointer in end_packfile
Print an error before returning when pack_data is NULL? Otherwise, LGTM. On Fri, Aug 22, 2014 at 10:27 PM, Jeff King p...@peff.net wrote: We have a global pointer pack_data pointing to the current pack we have open. Inside end_packfile we have two new pointers, old_p and new_p. The latter points to pack_data, and the former points to the newly installed version of the packfile we get when we hand the file off to the regular sha1_file machinery. We then free old_p. Presumably the extra old_p pointer was there so that we could overwrite pack_data with new_p and still free old_p, but we don't do that. We just leave pack_data pointing to bogus memory, and don't overwrite it until we call start_packfile again (if ever). This can cause problems for our die routine, which calls end_packfile to clean things up. If we die at the wrong moment, we can end up looking at invalid memory in pack_data left after the last end_packfile(). Instead, let's make sure we set pack_data to NULL after we free it, and make calling end_packfile() again with a NULL pack_data a noop (there is nothing to end). We can further make things less confusing by dropping old_p entirely, and moving new_p closer to its point of use. Signed-off-by: Jeff King p...@peff.net --- Noticed while running fast-import under valgrind to debug the next commit. 
:) fast-import.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/fast-import.c b/fast-import.c index d73f58c..f25a4ae 100644 --- a/fast-import.c +++ b/fast-import.c @@ -946,10 +946,12 @@ static void unkeep_all_packs(void) static void end_packfile(void) { - struct packed_git *old_p = pack_data, *new_p; + if (!pack_data) + return; clear_delta_base_cache(); if (object_count) { + struct packed_git *new_p; unsigned char cur_pack_sha1[20]; char *idx_name; int i; @@ -991,10 +993,11 @@ static void end_packfile(void) pack_id++; } else { - close(old_p->pack_fd); - unlink_or_warn(old_p->pack_name); + close(pack_data->pack_fd); + unlink_or_warn(pack_data->pack_name); } - free(old_p); + free(pack_data); + pack_data = NULL; /* We can't carry a delta across packfiles. */ strbuf_release(&last_blob.data); -- 2.1.0.346.ga0367b9 
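The pattern the patch above adopts is worth naming: cleanup frees the global, resets it to NULL, and becomes a no-op on a second call (e.g. from a die() handler). A stand-alone sketch of that idiom, with invented names rather than fast-import's real types:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * free-and-NULL cleanup: safe to call end_state() any number of
 * times, and no dangling pointer is left for later observers.
 */
static int *pack_state;
static int cleanup_calls;

static void open_state(void)
{
	pack_state = calloc(1, sizeof(*pack_state));
}

static void end_state(void)
{
	if (!pack_state)
		return;		/* already ended: nothing to do */
	free(pack_state);
	pack_state = NULL;	/* don't leave a pointer to freed memory */
	cleanup_calls++;
}
```

Without the NULL reset, a die-handler calling cleanup after a normal shutdown would free (or inspect) already-freed memory, which is exactly the bug the commit message describes.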
Re: [BUG] resolved deltas
On Sat, Aug 23, 2014 at 3:56 AM, Jeff King p...@peff.net wrote: [+cc spearce, as I think this is a bug in code.google.com's sending side, and he can probably get the attention of the right folks] ... My guess is a bug on the sending side. We have seen duplicate pack objects before, but never (AFAIK) combined with a missing object. This really seems like the sender just sent the wrong data for one object. IIRC, code.google.com is backed by their custom Dulwich implementation which runs on BigTable. We'll see if Shawn can get this to the right people as a bug report. :) Thanks. This is a bug in the code.google.com implementation that is running on Bigtable. I forwarded the report to the team that manages that service so they can investigate further. 
Re: [PATCH 1/2] Check order when reading index
Jaime Soriano Pastor jsorianopas...@gmail.com writes: Subject: Re: [PATCH 1/2] Check order when reading index Please be careful when crafting the commit title. This single line will be the only one that readers will have to identify the change among hundreds of entries in "git shortlog" output when trying to see what kind of change went into the project during the given period. Something like: "read_index_from(): catch out of order entries while reading an index file" perhaps? Signed-off-by: Jaime Soriano Pastor jsorianopas...@gmail.com --- read-cache.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/read-cache.c b/read-cache.c index 7f5645e..c1a9619 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1438,6 +1438,21 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk, return ce; } +void check_ce_order(struct cache_entry *ce, struct cache_entry *next_ce) Does this have to be global, i.e. not "static void ..."? +{ + int name_compare = strcmp(ce->name, next_ce->name); + if (0 < name_compare) + die("Unordered stage entries in index"); + if (!name_compare) { + if (!ce_stage(ce)) + die("Multiple stage entries for merged file '%s'", + ce->name); OK. If ce is at stage #0, no other entry can have the same name regardless of the stage, and next_ce having the same name violates that rule. + if (ce_stage(ce) >= ce_stage(next_ce)) + die("Unordered stage entries for '%s'", + ce->name); Not quite. We do allow multiple higher stage entries; having two or more stage #1 entries is perfectly fine during a merge resolution, and both ce and next_ce may be pointing at the stage #1 entries of the same path. Replacing the >= comparison with > is sufficient, I think. Thanks. + } +} + /* remember to discard_cache() before reading a different cache! 
*/ int read_index_from(struct index_state *istate, const char *path) { @@ -1499,6 +1514,9 @@ int read_index_from(struct index_state *istate, const char *path) ce = create_from_disk(disk_ce, &consumed, previous_name); set_index_entry(istate, i, ce); + if (i > 0) + check_ce_order(istate->cache[i - 1], ce); + src_offset += consumed; } strbuf_release(&previous_name_buf); 
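The ordering rule being checked above, with Junio's suggested relaxation applied (equal stages for the same path are allowed), can be isolated into a tiny predicate. This is an illustrative stand-alone sketch with invented types, not read-cache.c's actual code:

```c
#include <assert.h>
#include <string.h>

/* A minimal stand-in for a cache entry: path name plus merge stage. */
struct entry {
	const char *name;
	int stage;
};

/*
 * Is the adjacent pair (a, b) in valid index order?  Names must be
 * sorted; for equal names, a stage #0 entry may not be duplicated,
 * and stages may repeat (two stage #1 entries during a merge are
 * fine) but must not decrease.
 */
static int pair_ok(const struct entry *a, const struct entry *b)
{
	int cmp = strcmp(a->name, b->name);
	if (cmp > 0)
		return 0;		/* names out of order */
	if (!cmp) {
		if (!a->stage)
			return 0;	/* merged entry duplicated */
		if (a->stage > b->stage)
			return 0;	/* stages decreasing */
	}
	return 1;
}
```

Applying this check to every adjacent pair while reading, as the patch does, turns the infinite loop from the other thread into an immediate, diagnosable die() on the corrupt index.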
Re: [PATCH 5/5] fast-import: stop using lock_ref_sha1
The next ref transaction series does a similar change and ends up removing the function lock_ref_sha1() completely. https://code-review.googlesource.com/#/c/1017/19/refs.c So I think we can drop this patch. ronnie sahlberg On Fri, Aug 22, 2014 at 10:33 PM, Jeff King p...@peff.net wrote: We can use lock_any_ref_for_update instead. Besides being more flexible, the only difference between the two is that lock_ref_sha1 does not allow top-level refs like refs/foo to be updated. However, we know that we do not have such a ref, because we explicitly add the tags/ prefix ourselves. Note that we now must feed the whole name refs/tags/X instead of just tags/X to the function. As a result, our failure error message uses the longer name. This is probably a good thing, though. As an interesting side note, if we forgot to switch this input to the function, the tests do not currently catch it. We import a tag X and then check that we can access it at tags/X. If we accidentally created tags/X at the top-level $GIT_DIR instead of under refs/, we would still find it due to our ref lookup procedure! We do not make such a mistake in this patch, of course, but while we're thinking about it, let's make the fast-import tests more robust by checking for fully qualified refnames. Signed-off-by: Jeff King p...@peff.net --- As I mentioned, I'd be OK with dropping this one in favor of just waiting for Ronnie's transaction patches to graduate. 
fast-import.c | 4 ++-- t/t9300-fast-import.sh | 8 ++++---- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/fast-import.c b/fast-import.c index a1479e9..04a85a4 100644 --- a/fast-import.c +++ b/fast-import.c @@ -1738,8 +1738,8 @@ static void dump_tags(void) for (t = first_tag; t; t = t->next_tag) { strbuf_reset(&ref_name); - strbuf_addf(&ref_name, "tags/%s", t->name); - lock = lock_ref_sha1(ref_name.buf, NULL); + strbuf_addf(&ref_name, "refs/tags/%s", t->name); + lock = lock_any_ref_for_update(ref_name.buf, NULL, 0, NULL); if (!lock || write_ref_sha1(lock, t->sha1, msg) < 0) failure |= error("Unable to update %s", ref_name.buf); } diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh index 5fc9ef2..f4c6673 100755 --- a/t/t9300-fast-import.sh +++ b/t/t9300-fast-import.sh @@ -153,7 +153,7 @@ tag series-A An annotated tag without a tagger EOF test_expect_success 'A: verify tag/series-A' ' - git cat-file tag tags/series-A >actual && + git cat-file tag refs/tags/series-A >actual && test_cmp expect actual ' @@ -165,7 +165,7 @@ tag series-A-blob An annotated tag that annotates a blob. EOF test_expect_success 'A: verify tag/series-A-blob' ' - git cat-file tag tags/series-A-blob >actual && + git cat-file tag refs/tags/series-A-blob >actual && test_cmp expect actual ' @@ -232,8 +232,8 @@ EOF test_expect_success \ 'A: tag blob by sha1' \ 'git fast-import <input && - git cat-file tag tags/series-A-blob-2 >actual && - git cat-file tag tags/series-A-blob-3 >>actual && + git cat-file tag refs/tags/series-A-blob-2 >actual && + git cat-file tag refs/tags/series-A-blob-3 >>actual && test_cmp expect actual' test_tick -- 2.1.0.346.ga0367b9 
Re: [PATCH 0/5] fix pack-refs pruning of refs/foo
On Fri, Aug 22, 2014 at 10:23 PM, Jeff King p...@peff.net wrote: I noticed that git pack-refs --all will pack a top-level ref like refs/foo, but will not actually prune $GIT_DIR/refs/foo. I do not see the point in having a policy not to pack refs/foo if --all is given. But even if we did have such a policy, this seems broken; we should either pack and prune, or leave them untouched. I don't see any indication that the existing behavior was intentional. The problem is that pack-refs's prune_ref calls lock_ref_sha1, which enforces this no toplevel behavior. I am not sure there is any real point to this, given that most callers use lock_any_ref_for_update, which is exactly equivalent but without the toplevel check. The first two patches deal with this by switching pack-refs to use lock_any_ref_for_update. This will conflict slightly with Ronnie's ref-transaction work, as he gets rid of lock_ref_sha1 entirely, and moves the code directly into prune_ref. This can be trivially resolved in favor of my patch, I think. The third patch is a cleanup I noticed while looking around, and I think should not conflict with anyone (and is a good thing to do). The last two are trickier. I wondered if we could get rid of lock_ref_sha1 entirely. After pack-refs, there are two callers: fast-import.c and walker.c. After converting the first, it occurred to me that Ronnie might be touching the same areas, and I see that yes, indeed, there's quite a bit of conflict (and he reaches the same end goal of dropping it entirely). So in that sense I do not mind dropping the final two patches. Ronnie's endpoint is much better, moving to a ref_transaction. However, there is actually a buffer overflow in the existing code. Ronnie's series fixes it in a similar way (moving to a strbuf), and I'm fine with that endpoint. But given that the ref transaction code is not yet merged (and would certainly never be maint-track), I think it is worth applying the buffer overflow fix separately. 
I think the final patch can probably be dropped, then. It is a clean-up, but one that we can just get when Ronnie's series is merged. [1/5]: git-prompt: do not look for refs/stash in $GIT_DIR [2/5]: pack-refs: prune top-level refs like refs/foo [3/5]: fast-import: clean up pack_data pointer in end_packfile [4/5]: fast-import: fix buffer overflow in dump_tags [5/5]: fast-import: stop using lock_ref_sha1 +1 on 1-3 +1 on 4. While I have a similar fix in the transaction series, you should not need to wait for that series to address a security concern. 5: I think this one is not as urgent as the others, so I would prefer if it is dropped, just so it doesn't cause more merge conflicts than are already present in the transaction series. 1-4: Reviewed-by: Ronnie Sahlberg sahlb...@google.com -Peff 
Re: [PATCH v2 1/3] Push the NATIVE_CRLF Makefile variable to C and added a test for native.
Torsten Bögershausen tbo...@web.de writes: On 2014-08-23 00.54, Junio C Hamano wrote: Torsten Bögershausen tbo...@web.de writes: Commit 95f31e9a correctly points out that the NATIVE_CRLF setting is incorrectly set on Mingw git. However, the Makefile variable is not propagated to the C preprocessor and results in no change. This patch pushes the definition to the C code and adds a test to validate that when core.eol as native is crlf, we actually normalize text files to this line ending convention when core.autocrlf is false. Signed-off-by: Pat Thoyts pattho...@users.sourceforge.net Signed-off-by: Stepan Kasal ka...@ucw.cz Signed-off-by: Torsten Bögershausen tbo...@web.de --- Who should I record as the author of this patch? Sorry for missing this, here is what Mingw says:

commit 0caba2cacbb9d8e6a31783b45f1a13e52dec6ce8
Author: Pat Thoyts pattho...@users.sourceforge.net
Date: Mon Nov 26 00:24:00 2012 +

    Push the NATIVE_CRLF Makefile variable to C and added a test for native.

    Commit 95f31e9a correctly points out that the NATIVE_CRLF setting is [...]

When forwarding somebody else's patch, please start the *body* of your message with the in-body header to force the author, followed by a blank line and then the message, i.e.

From: Pat Thoyts pattho...@users.sourceforge.net

Commit 95f31e9a correctly points out that the NATIVE_CRLF setting is incorrectly set on Mingw git. However, the Makefile variable is not...
...

The request applies to other patches in the series as well. I suspect that using send-email on format-patch output may do the right thing automatically. Thanks.
Re: check-ref-format: include refs/ in the argument or to strip it?
On Fri, Aug 22, 2014 at 10:46 PM, Jeff King p...@peff.net wrote: On Fri, Aug 22, 2014 at 11:45:15AM -0700, Jonathan Nieder wrote: Junio C Hamano wrote: implication of which is that the 'at least one slash' rule was to expect things are 'refs/anything' so there will be at least one. Even back then, that 'anything' alone had at least one slash (e.g. heads/master), but the intention was *never* that we would forbid anything that does not have a slash by feeding the 'anything' part alone to check-ref-format, i.e. things like refs/stash were designed to be allowed. Now I'm more confused. Until 5f7b202a (2008-01-01), there was a comment

	if (level < 2)
		return -2; /* at least of form "heads/blah" */

and that behavior has been preserved since the beginning. Why do most old callers pass a string that doesn't start with refs/ (e.g., see the callers in 03feddd6, 2005-10-13)? Has the intent been to relax the requirement since then? Yeah, this weird "do not allow refs/foo" behavior has continually confused me. Coincidentally I just noticed a case today where pack-refs treats refs/foo specially for no good reason: http://thread.gmane.org/gmane.comp.version-control.git/255729 After much head scratching over the years, I am of the opinion that nobody ever really _meant_ to prevent refs/foo, and that code comments like the one you quote above were an attempt to document existing buggy behavior that was really trying to differentiate HEAD from refs/*. That's just my opinion, though. :) I'd be happy if all of the special-treatment of refs/foo went away and check_refname_format always got the full ref. There are also a lot of places where we assume that a ref will start with refs/heads/ and not just refs/: for_each_branch_ref(), log_ref_setup() (so no reflogs), is_branch(), to name a few. This makes the api a bit confusing and hard to predict. Which functions allow refs/foo and which will ignore it? Are there any compelling reasons why refs/foo should be allowed?
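For illustration, the historical "at least one level under refs/" rule being discussed can be sketched like this (helper names are made up; the real check_refname_format also rejects funny characters, "..", and so on):

```python
def levels(refname):
    """Number of slash-separated components, e.g. "heads/master" -> 2."""
    return len(refname.split("/"))

def old_style_check(refname):
    # callers historically passed the part *after* "refs/", and the old
    # code rejected anything with fewer than two levels:
    #     if (level < 2) return -2; /* at least of form "heads/blah" */
    return levels(refname) >= 2

# "refs/heads/master" -> argument "heads/master" passes...
assert old_style_check("heads/master")
# ...but "refs/stash" -> argument "stash" is rejected, which is the
# surprising "do not allow refs/foo" behavior under discussion
assert not old_style_check("stash")
```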
Re: [PATCH v2 3/3] t0026: Add missing &&
Torsten Bögershausen tbo...@web.de writes: Fix the broken && chain Reported-By: Eric Sunshine sunsh...@sunshineco.com Signed-off-by: Torsten Bögershausen tbo...@web.de --- Please fold this kind of "oops" fix for breakages discovered in a version that hasn't reached 'next' into the patch that introduces the breakage, with a Helped-by:. Thanks.

 t/t0026-eol-config.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t0026-eol-config.sh b/t/t0026-eol-config.sh
index 43a580a..4806969 100755
--- a/t/t0026-eol-config.sh
+++ b/t/t0026-eol-config.sh
@@ -84,9 +84,9 @@ test_expect_success NATIVE_CRLF 'eol native is crlf' '
 	rm -rf native_eol
 	mkdir native_eol
 	(
 		cd native_eol
-		printf "*.txt text\n" >.gitattributes
-		printf "one\r\ntwo\r\nthree\r\n" >filedos.txt
-		printf "one\ntwo\nthree\n" >fileunix.txt
+		printf "*.txt text\n" >.gitattributes &&
+		printf "one\r\ntwo\r\nthree\r\n" >filedos.txt &&
+		printf "one\ntwo\nthree\n" >fileunix.txt &&
 		git init
 		git config core.autocrlf false
 		git config core.eol native
Re: [PATCH 18/18] signed push: final protocol update
Shawn Pearce spea...@spearce.org writes: A stateless nonce could look like: nonce = HMAC_SHA1( SHA1(site+path) + '.' + now, site_key ) where site_key is a private key known to the server. It doesn't have to be per-repo. receive-pack would then be willing to accept any nonce whose timestamp is within a window, e.g. 10 minutes, of the current time, and whose signature verifies in the HMAC. The 10 minute window is important to allow clients time to generate the object list, perform delta compression, and begin transmitting to the server. Hmph, don't you finally tell the other end the sequence of "update this ref from old to new" and send the packdata separately? I think we have a FLUSH in between, and the push certificate is given before the FLUSH, so you do not have to wait for 10 minutes.
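The stateless-nonce idea can be sketched in a few lines (a rough illustration, not git's actual implementation; the exact layout of the MAC input here is an assumption, and Shawn's formula additionally hashes site+path first):

```python
import hashlib
import hmac
import time

WINDOW = 600  # accept nonces up to 10 minutes old

def make_nonce(site_path, site_key, now=None):
    # nonce = timestamp "." HMAC-SHA1(site_key, site_path ":" timestamp)
    ts = str(int(time.time() if now is None else now))
    mac = hmac.new(site_key, ("%s:%s" % (site_path, ts)).encode(), hashlib.sha1)
    return ts + "." + mac.hexdigest()

def check_nonce(nonce, site_path, site_key, now):
    ts = nonce.split(".", 1)[0]
    if not ts.isdigit():
        return False
    # recompute with the claimed timestamp; constant-time comparison
    if not hmac.compare_digest(nonce, make_nonce(site_path, site_key, int(ts))):
        return False
    return abs(now - int(ts)) <= WINDOW
```

The point of the construction is that the server keeps no per-client state: any receive-pack that shares site_key can validate the nonce, and the timestamp bound limits how long a captured nonce can be replayed.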
Re: [PATCH 00/18] Signed push
Stefan Beller stefanbel...@gmail.com writes: burden is not an issue, as I'll be signing the push certificate anyway when I push. A signed tag or a signed commit and signed push certificate solves two completely separate and orthogonal issues. What happens if you break into GitHub or k.org and did $ git tag maint_2014_08_22 master_2014_08_22 Ok, I personally haven't used tags a lot. I just tried to

$ git tag -s testbreaktag v2.1.0
$ git show testbreaktag
# However it would still read:
tag v2.1.0
Tagger: Junio C Hamano gits...@pobox.com
Date: Fri Aug 15 15:09:28 2014 -0700

So as I do not possess your private key I could not create signed tags even if I were to break into github/k.org The point was that after I push to 'maint', you break into the site and copy or move that tag as if I pushed to 'master'. You could argue that I could create a signed tag 'maint-2014-08-25', push it, and if you moved it to tags/master-2014-08-25 as an attacker, the refname would not match the tag line in the signed tag object. While that is true, nobody other than fsck checks the contents on the tag line in practice. But more importantly, I may deem a commit a sensible version for the 'master' branch of one repository but it would not be sensible for another repository's 'master' branch. Imagine a world just like the kernel development during the 2.6 era used to be, where there was a separate tree 2.4 maintained with its own 'master' branch. What is appropriate for the tip of 'master' to one repository is not good for the other one, and your timestamped tag line may say for which branch the push was for but does not say for which repository. The exact problem is also shared with the desire to have a branch object expressed elsewhere; as there is no identity for a branch in a distributed world, trying to name a branch as if it is a global entity without mentioning what repository will lead to tears.
Besides, these tags/maint-2014-08-25 tags will be interesting only for those who are auditing and not for general public, and we do not have a good way to hide uninteresting refs until those with narrow niche interest ask yet, which is something we may want to add soon, but I do not want auditable push taken hostage to that unrelated feature.
Re: check-ref-format: include refs/ in the argument or to strip it?
Ronnie Sahlberg wrote: On Fri, Aug 22, 2014 at 10:46 PM, Jeff King p...@peff.net wrote: Yeah, this weird "do not allow refs/foo" behavior has continually confused me. Coincidentally I just noticed a case today where pack-refs treats refs/foo specially for no good reason: http://thread.gmane.org/gmane.comp.version-control.git/255729 After much head scratching over the years, I am of the opinion that nobody ever really _meant_ to prevent refs/foo, and that code comments like the one you quote above were an attempt to document existing buggy behavior that was really trying to differentiate HEAD from refs/*. That's just my opinion, though. :) It's still very puzzling to me. The comment came at the same time as the behavior, in v0.99.9~120 (git-check-ref-format: reject funny ref names, 2005-10-13). Before that, the behavior was even stranger --- it checked that there was exactly one slash in the argument. I'm willing to believe we might not want that check any more, though. [...] There are also a lot of places where we assume that a ref will start with refs/heads/ and not just refs/: for_each_branch_ref(), log_ref_setup() (so no reflogs), is_branch(), to name a few. for_each_branch_ref is for iterating over local branches, which are defined as refs that start with refs/heads/*. Likewise, the only point of is_branch is to check whether a ref is under refs/heads/*. That's not an assumption about all refs. log_ref_setup implements the policy that there are only reflogs for:

* refs where the reflog was explicitly created (git branch --create-reflog does this, but for some reason there's no corresponding git update-ref --create-reflog so people have to use mkdir directly for other refs), plus

* if the '[core] logallrefupdates' configuration is enabled (and it is by default for non-bare repositories), then HEAD, refs/heads/*, refs/notes/*, and refs/remotes/*.

This is documented in git-config(1) --- see core.logAllRefUpdates.
That way, when tools internally use other refs (e.g., FETCH_HEAD), git doesn't have to automatically incur the cost of maintaining the reflog for those. What other refs should there be reflogs for? I haven't thought carefully about this. It definitely isn't an assumption that *all* refs will match that pattern. But it might be worth changing for other reasons. Jonathan
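The policy Jonathan describes boils down to a small predicate (an illustrative sketch, not git's code; the function name is made up):

```python
def reflog_auto_created(refname, log_all_ref_updates=True):
    # with '[core] logallrefupdates' off, no reflog is created implicitly;
    # an explicitly created logs/<ref> file is still honored either way
    if not log_all_ref_updates:
        return False
    # only HEAD and a few well-known hierarchies get a reflog by default
    return refname == "HEAD" or refname.startswith(
        ("refs/heads/", "refs/remotes/", "refs/notes/"))
```

So refs/tags/* and ad-hoc refs like refs/foo fall through to "no reflog", which is exactly the cost-avoidance trade-off being questioned in this thread.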
Re: [PATCH] bisect: save heap memory. allocate only the required amount
Jeff King p...@peff.net writes: On Mon, Aug 25, 2014 at 04:06:52PM +0200, Christian Couder wrote: This allocation should be name_len + 1 for the NUL-terminator, no? I wondered about that too, but as struct name_decoration is defined like this:

struct name_decoration {
	struct name_decoration *next;
	int type;
	char name[1];
};

the .name field of this struct already has one char, so the allocation above should be ok. Yeah, you're right. I would argue it should just be FLEX_ARRAY for consistency with other spots, though (in which case add_name_decoration needs to be updated with a +1). Running git grep '^ char [^ ]*\[[01]]' -- '*.[ch]' shows that this is one of only two spots that don't use FLEX_ARRAY (and the other has a comment explaining why not). Good digging, and I agree that it should use the FLEX_ARRAY for consistency.
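The allocation arithmetic being debated works out as follows (a plain-numbers sketch; the struct size is made up, and real C allocations also involve padding):

```python
def bytes_to_allocate(sizeof_struct, chars_already_in_struct, name_len):
    # we need room for name_len characters plus a NUL terminator;
    # subtract whatever the declared trailing array already contributes
    return sizeof_struct + max(0, name_len + 1 - chars_already_in_struct)

SIZEOF = 16  # hypothetical sizeof(struct name_decoration)

# with 'char name[1]', one byte of the name is already inside
# sizeof(struct), so sizeof(struct) + name_len covers name + NUL:
assert bytes_to_allocate(SIZEOF, 1, 5) == SIZEOF + 5
# with 'char name[FLEX_ARRAY]' (size 0), the +1 must be explicit:
assert bytes_to_allocate(SIZEOF, 0, 5) == SIZEOF + 5 + 1
```

This is why switching the struct to FLEX_ARRAY requires updating add_name_decoration with a +1.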
Re: [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap
Steffen Prohaska proha...@zib.de writes: Couldn't we do that with an lseek (or even an mmap with offset 0)? That obviously would not work for non-file inputs, but I think we address that already in index_fd: we push non-seekable things off to index_pipe, where we spool them to memory. It could be handled that way, but we would be back to the original problem that 32-bit git fails for large files. Correct, and you are making an incremental improvement so that such a large blob can be handled _when_ the filters can successfully munge it back and forth. If we fail due to out of memory when the filters cannot, that would be the same as without your improvement, so you are still making progress. To implement something like the ideal strategy below, the entire convert machinery for crlf and ident would have to be converted to a streaming approach. Yes, that has always been the longer term vision since the day the streaming infrastructure was introduced. So it seems like the ideal strategy would be:

1. If it's seekable, try streaming. If not, fall back to lseek/mmap.

2. If it's not seekable and the filter is required, try streaming. We die anyway if we fail.

Puzzled... Is it assumed that any content for which the filters tell us to "use the contents from the db as-is" by exiting with non-zero status will always be too large to fit in-core? For small contents, isn't this ideal strategy a regression?

3. If it's not seekable and the filter is not required, decide based on file size:

   a. If it's small, spool to memory and proceed as we do now.
   b. If it's big, spool to a seekable tempfile.
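The seekable vs. non-seekable distinction driving that strategy can be probed like this (a sketch; the size cutoff and the returned strategy names are assumptions, not git's actual code):

```python
import io
import os
import tempfile

def is_seekable(f):
    try:
        f.seek(0, os.SEEK_CUR)  # no-op seek, like lseek(fd, 0, SEEK_CUR)
        return True
    except (OSError, io.UnsupportedOperation):
        return False

BIG = 32 * 1024 * 1024  # arbitrary "large file" cutoff for the sketch

def conversion_strategy(f, filter_required, size):
    if is_seekable(f):
        return "stream, rewind and fall back on filter failure"
    if filter_required:
        return "stream; a filter failure dies anyway"
    return "spool to memory" if size < BIG else "spool to seekable tempfile"
```

A pipe (the index_pipe case) fails the probe with ESPIPE, while a regular file or tempfile passes it, which is what makes the rewind-and-retry fallback possible only in the latter case.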
RE: [PATCH 00/18] Signed push
-Original Message- From: Junio C Hamano Sent: Monday, August 25, 2014 13:55 Stefan Beller stefanbel...@gmail.com writes: burden is not an issue, as I'll be signing the push certificate anyway when I push. A signed tag or a signed commit and signed push certificate solves two completely separate and orthogonal issues. What happens if you break into GitHub or k.org and did $ git tag maint_2014_08_22 master_2014_08_22 Ok, I personally haven't used tags a lot. I just tried to git tag -s testbreaktag v2.1.0 git show testbreaktag # However it would still read: tag v2.1.0 Tagger: Junio C Hamano gits...@pobox.com Date: Fri Aug 15 15:09:28 2014 -0700 So as I do not posess your private key I could not create signed tags even if I were to break into github/k.org The point was that after I push to 'maint', you break into the site and copy or move that tag as if I pushed to 'master'. What is needed is not a signed push per se, but rather a need for a set of signed HEADS ... You could argue that I could create a signed tag 'maint-2014-08-25', push it, and if you moved it to tags/master-2014-08-25 as an attacker, the refname would not match the tag line in the signed tag object. While that is true, nobody other thaan fsck checks the contents on the tag line in practice. But more importantly. I may deem a commit a sensible version for the 'master' branch of one repository but it would not be sensible for another repository's 'master' branch. Imagine a world just like the kernel development during 2.6 era used to be, where there was a separate tree 2.4 maintained with its own 'master' branch. What is appropriate for the tip of 'master' to one repository is not good for the other one, ... and these signed HEADS need to be tied to a particular repository instance. AFAIK git does not have any unique identifier per repository instance to leverage. If you were to make a repository instance id you could take that and the branch name as input to a signed hash for verification later. 
But this leads to deeper issues about new workflow, new configuration storage mechanisms, etc. and your timestamped tag line may say for which branch the push was for but does not say for which repository. The exact problem is also shared with the desire to have a branch object expressed elsewhere; as there is no identity for a branch in a distributed world, trying to name a branch as if it is a global entity without mentioning what repository will lead to tears. Besides, these tags/maint-2014-08-25 tags will be interesting only for those who are auditing and not for general public, and we do not have a good way to hide uninteresting refs until those with narrow niche interest ask yet, which is something we may want to add soon, but I do not want auditable push taken hostage to that unrelated feature.
Re: [PATCH 3/5] fast-import: clean up pack_data pointer in end_packfile
On Mon, Aug 25, 2014 at 10:16:52AM -0700, Ronnie Sahlberg wrote: Print an error before returning when pack_data is NULL ? I don't think so. We call end_packfile in some code paths (like the die handler) as "close if it's open". So I think it makes sense for it to be a noop if nothing is open. -Peff
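The "close if it's open" behavior Peff describes is the usual idempotent-cleanup pattern, sketched here generically (not fast-import's actual code):

```python
class PackState:
    def __init__(self):
        self.pack_data = None  # stand-in for fast-import's pack_data pointer
        self.closes = 0

    def start_packfile(self):
        self.pack_data = object()  # pretend we opened a pack

    def end_packfile(self):
        # safe to call from any cleanup path (e.g. a die handler):
        # a silent no-op when no packfile is open
        if self.pack_data is None:
            return
        self.closes += 1
        self.pack_data = None  # clearing the pointer prevents double-close
```

Clearing the pointer at the end is the actual fix in patch 3/5: without it, a second call would operate on an already-closed pack.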
Re: [PATCH 0/5] fix pack-refs pruning of refs/foo
On Mon, Aug 25, 2014 at 10:38:56AM -0700, Ronnie Sahlberg wrote:

[1/5]: git-prompt: do not look for refs/stash in $GIT_DIR
[2/5]: pack-refs: prune top-level refs like refs/foo
[3/5]: fast-import: clean up pack_data pointer in end_packfile
[4/5]: fast-import: fix buffer overflow in dump_tags
[5/5]: fast-import: stop using lock_ref_sha1

+1 on 1-3
+1 on 4. While I have a similar fix in the transaction series, you should not need to wait for that series to address a security concern.
5: I think this one is not as urgent as the others so would prefer if it is dropped, just so it doesn't cause more merge conflicts than is already present in the transaction series.

OK, I think we're in agreement then. Thanks for looking them over. -Peff
Re: check-ref-format: include refs/ in the argument or to strip it?
Ronnie Sahlberg sahlb...@google.com writes: There are also a lot of places where we assume that a ref will start with refs/heads/ and not just refs/: for_each_branch_ref(), log_ref_setup() (so no reflogs), is_branch(), to name a few. for-each-BRANCH-ref and is-BRANCH are explicitly about branches and it is natural that they will work only on refs that start with refs/heads/, no? I am not sure about log-ref-setup (git grep pu does not find it). This makes the api a bit confusing and hard to predict. Which functions allow refs/foo and which will ignore it? Are there any compelling reasons why refs/foo should be allowed? The only one I can think of offhand that we internally use is the stash, but that does not give us any assurance that no third-party tool is using refs/frotz for its own use X-<.
Re: check-ref-format: include refs/ in the argument or to strip it?
On Mon, Aug 25, 2014 at 11:26:36AM -0700, Jonathan Nieder wrote: It's still very puzzling to me. The comment came at the same time as the behavior, in v0.99.9~120 (git-check-ref-format: reject funny ref names, 2005-10-13). Before that, the behavior was even stranger --- it checked that there was exactly one slash in the argument. I'm willing to believe we might not want that check any more, though. Yeah, given that among experienced gits, nobody can figure out a motivation or a history for the feature (and given that it causes problems), I do not see any problem with loosening it. That way, when tools internally use other refs (e.g., FETCH_HEAD), git doesn't have to automatically incur the cost of maintaining the reflog for those. What other refs should there be reflogs for? I haven't thought carefully about this. I think you'd in theory want a reflog for anything. The refs in refs/tags are not meant to change, but if they do (e.g., somebody force-pushes a tag to a server) it is nice to have a log of what happened. I think the same argument could apply to anything in refs/. I think more ephemeral things (like MERGE_HEAD) tend to be in the root, outside of refs. Reflogging those _could_ be useful, but probably isn't (and in the case of something like FETCH_HEAD, would not record all of the information anyway). I wrote the patch below over a year ago and very nearly submitted it. At GitHub we use reflogs frequently for auditing and forensics, and I wanted to have such an audit trail for everything. However, we ended up doing something a little more invasive that I do not think would be that interesting to upstream (though I am happy to submit a patch if people think otherwise): we maintain a separate audit log for all refs that is never pruned, and lives on even when refs are deleted.
-- >8 --
Subject: [PATCH] teach core.logallrefupdates an "always" mode

When core.logallrefupdates is true, we only create a new reflog for refs that are under certain well-known hierarchies. The reason is that we know that some hierarchies (like refs/tags) do not typically change, and that unknown hierarchies might not want reflogs at all (e.g., a hypothetical refs/foo might be meant to change often and drop old history immediately).

However, sometimes it is useful to override this decision and simply log for all refs, because the safety and audit trail is more important than the performance implications of keeping the log around.

This patch introduces a new "always" mode for the core.logallrefupdates option which will log updates to everything under refs/, regardless where in the hierarchy it is (we still will not log things like ORIG_HEAD and FETCH_HEAD, which are known to be transient).

Signed-off-by: Jeff King p...@peff.net
---
Looking over the code, I am not sure that it actually works as advertised with respect to ORIG_HEAD, etc. That would be easy enough to fix, though.

 Documentation/config.txt |  8 +---
 branch.c                 |  2 +-
 builtin/checkout.c       |  2 +-
 builtin/init-db.c        |  2 +-
 cache.h                  |  9 -
 config.c                 |  7 ++-
 environment.c            |  2 +-
 refs.c                   | 23 +--
 t/t1400-update-ref.sh    | 31 +++
 9 files changed, 71 insertions(+), 15 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index c55c22a..27629df 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -405,10 +405,12 @@ core.logAllRefUpdates::
 	$GIT_DIR/logs/ref, by appending the new and old
 	SHA-1, the date/time and the reason of the update, but
 	only when the file exists. If this configuration
-	variable is set to true, missing $GIT_DIR/logs/ref
+	variable is set to `true`, a missing $GIT_DIR/logs/ref
 	file is automatically created for branch heads (i.e. under
-	refs/heads/), remote refs (i.e. under refs/remotes/),
-	note refs (i.e. under refs/notes/), and the symbolic ref HEAD.
+	`refs/heads/`), remote refs (i.e. under `refs/remotes/`),
+	note refs (i.e. under `refs/notes/`), and the symbolic ref `HEAD`.
+	If it is set to `always`, then a missing reflog is automatically
+	created for any ref under `refs/`.
 +
 	This information can be used to determine what commit
 	was the tip of a branch 2 days ago.
diff --git a/branch.c b/branch.c
index 46e8aa8..1d140b7 100644
--- a/branch.c
+++ b/branch.c
@@ -292,7 +292,7 @@ void create_branch(const char *head,
 	}
 	if (reflog)
-		log_all_ref_updates = 1;
+		log_all_ref_updates = LOG_REFS_NORMAL;
 	if (forcing)
 		snprintf(msg, sizeof msg, "branch: Reset to %s",
diff --git a/builtin/checkout.c b/builtin/checkout.c
index f71e745..65bc066 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -586,7 +586,7 @@ static void update_refs_for_switch(const struct checkout_opts *opts,
Re: [PATCH 2/3] Makefile: use `find` to determine static header dependencies
Jeff King p...@peff.net writes: Since we do not use the value $(LIB_H) unless either COMPUTE_HEADER_DEPENDENCIES is turned on or the user is building po/git.pot (where it comes in via $(LOCALIZED_C)), make is smart enough to not even run this find in most cases. However, we do need to stop using the immediate variable assignment := for $(LOCALIZED_C). That's OK, because it was not otherwise useful here. Signed-off-by: Jeff King p...@peff.net --- I cannot see any reason for the :=, but maybe I am missing something. If the right-hand-side were something like $(shell find ...) that was heavy-weight then it might have made sense, but I do not think it is that. It has stayed to be := ever since it was introduced by cd5513a7 (i18n: Makefile: pot target to extract messages marked for translation, 2011-02-22). And now you use LIB_H only once ;-). Also interestingly, I notice that it is very clear that it is not LIB_H but ANY_H ;-)

 Makefile | 140 ---
 1 file changed, 8 insertions(+), 132 deletions(-)

diff --git a/Makefile b/Makefile
index cf0ccdf..f2b85c9 100644
--- a/Makefile
+++ b/Makefile
...
@@ -2128,9 +2004,9 @@ XGETTEXT_FLAGS_C = $(XGETTEXT_FLAGS) --language=C \
 XGETTEXT_FLAGS_SH = $(XGETTEXT_FLAGS) --language=Shell \
 	--keyword=gettextln --keyword=eval_gettextln
 XGETTEXT_FLAGS_PERL = $(XGETTEXT_FLAGS) --keyword=__ --language=Perl
-LOCALIZED_C := $(C_OBJ:o=c) $(LIB_H) $(GENERATED_H)
-LOCALIZED_SH := $(SCRIPT_SH)
-LOCALIZED_PERL := $(SCRIPT_PERL)
+LOCALIZED_C = $(C_OBJ:o=c) $(GENERATED_H)
+LOCALIZED_SH = $(SCRIPT_SH)
+LOCALIZED_PERL = $(SCRIPT_PERL)
 ifdef XGETTEXT_INCLUDE_TESTS
 LOCALIZED_C += t/t0200/test.c
Re: [PATCH 2/3] Makefile: use `find` to determine static header dependencies
On Mon, Aug 25, 2014 at 12:30:51PM -0700, Junio C Hamano wrote: Also interestingly, I notice that it is very clear that it is not LIB_H but ANY_H ;-) Yeah, it has been that way for quite a while. I don't know if it is that big a deal, but it would not be unreasonable to do a patch to rename on top (I am not sure what the new one would be; ANY_H is probably OK). -Peff
Re: [PATCH] bisect: save heap memory. allocate only the required amount
On Mon, Aug 25, 2014 at 11:26:39AM -0700, Junio C Hamano wrote: Good digging, and I agree that it should use the FLEX_ARRAY for consistency. I can produce a patch, but I did not want to steal Arjun's thunder. Arjun, did my proposal make sense? Do you want to try implementing that? -Peff
Re: [PATCH 1/2] Check order when reading index
On Mon, Aug 25, 2014 at 10:21:58AM -0700, Junio C Hamano wrote:

+	if (ce_stage(ce) >= ce_stage(next_ce))
+		die("Unordered stage entries for '%s'",
+		    ce->name);

Not quite. We do allow multiple higher stage entries; having two or more stage #1 entries is perfectly fine during a merge resolution, and both ce and next_ce may be pointing at the stage #1 entries of the same path. Replacing the comparison with > is sufficient, I think. For my own curiosity, how do you get into this situation, and what does it mean to have multiple stage#1 entries for the same path? What would git cat-file :1:path output? -Peff
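The corrected ordering rule can be sketched over (name, stage) pairs (illustrative only; the real check operates on cache entries in read_index):

```python
def check_index_order(entries):
    # entries: (name, stage) pairs in on-disk index order
    for (name, stage), (nname, nstage) in zip(entries, entries[1:]):
        if name > nname:
            raise ValueError("unordered name entries: %r" % nname)
        if name == nname and stage > nstage:
            # '>' rather than '>=': several consecutive entries may share
            # the same higher stage (e.g. multiple stage #1 entries of the
            # same path during a merge resolution)
            raise ValueError("unordered stage entries for %r" % name)

# duplicate stage #1 entries for the same path are accepted:
check_index_order([("a", 1), ("a", 1), ("a", 2), ("b", 0)])
```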
Re: [PATCH 2/3] Makefile: use `find` to determine static header dependencies
Jeff King wrote:

-LOCALIZED_C := $(C_OBJ:o=c) $(LIB_H) $(GENERATED_H)
+LOCALIZED_C = $(C_OBJ:o=c) $(GENERATED_H)

Why is LIB_H dropped here? This would mean that po/git.pot stops including strings from macros and static inline functions in headers (e.g., in parse-options.h). The rest looks good. Thanks, Jonathan
Re: [PATCH 2/3] Makefile: use `find` to determine static header dependencies
On Mon, Aug 25, 2014 at 12:46:41PM -0700, Jonathan Nieder wrote: Jeff King wrote:

-LOCALIZED_C := $(C_OBJ:o=c) $(LIB_H) $(GENERATED_H)
+LOCALIZED_C = $(C_OBJ:o=c) $(GENERATED_H)

Why is LIB_H dropped here? This would mean that po/git.pot stops including strings from macros and static inline functions in headers (e.g., in parse-options.h). Ick, this is an accidental leftover from the earlier iteration of the patch, which moved that part to a separate line (inside my gross GIT_REAL_POT conditional). The extra line went away, but I forgot to add $(LIB_H) back in here. Thanks for noticing. Here's a revised version of the patch which fixes it (and I double-checked to make sure it continues to not execute the find unless you are doing a make pot).

-- >8 --
Subject: Makefile: use `find` to determine static header dependencies

Most modern platforms will use automatically computed header dependencies to figure out when a C file needs rebuilt due to a header changing. With old compilers, however, we fall back to a static list of header files. If any of them changes, we recompile everything. This is overly conservative, but the best we can do on older platforms. It is unfortunately easy for our static header list to grow stale, as none of the regular developers make use of it. Instead of trying to keep it up to date, let's invoke find to generate the list dynamically. Since we do not use the value $(LIB_H) unless either COMPUTE_HEADER_DEPENDENCIES is turned on or the user is building po/git.pot (where it comes in via $(LOCALIZED_C)), make is smart enough to not even run this find in most cases. However, we do need to stop using the immediate variable assignment := for $(LOCALIZED_C). That's OK, because it was not otherwise useful here.
Signed-off-by: Jeff King p...@peff.net
---
 Makefile | 140 ---
 1 file changed, 8 insertions(+), 132 deletions(-)

diff --git a/Makefile b/Makefile
index cf0ccdf..a4fc440 100644
--- a/Makefile
+++ b/Makefile
@@ -432,7 +432,6 @@
 XDIFF_OBJS =
 VCSSVN_OBJS =
 GENERATED_H =
 EXTRA_CPPFLAGS =
-LIB_H =
 LIB_OBJS =
 PROGRAM_OBJS =
 PROGRAMS =
@@ -631,131 +630,11 @@
 VCSSVN_LIB = vcs-svn/lib.a

 GENERATED_H += common-cmds.h

-LIB_H += advice.h
-LIB_H += archive.h
-LIB_H += argv-array.h
-LIB_H += attr.h
-LIB_H += bisect.h
-LIB_H += blob.h
-LIB_H += branch.h
-LIB_H += builtin.h
-LIB_H += bulk-checkin.h
-LIB_H += bundle.h
-LIB_H += cache-tree.h
-LIB_H += cache.h
-LIB_H += color.h
-LIB_H += column.h
-LIB_H += commit.h
-LIB_H += compat/bswap.h
-LIB_H += compat/mingw.h
-LIB_H += compat/obstack.h
-LIB_H += compat/poll/poll.h
-LIB_H += compat/precompose_utf8.h
-LIB_H += compat/terminal.h
-LIB_H += compat/win32/dirent.h
-LIB_H += compat/win32/pthread.h
-LIB_H += compat/win32/syslog.h
-LIB_H += connected.h
-LIB_H += convert.h
-LIB_H += credential.h
-LIB_H += csum-file.h
-LIB_H += decorate.h
-LIB_H += delta.h
-LIB_H += diff.h
-LIB_H += diffcore.h
-LIB_H += dir.h
-LIB_H += exec_cmd.h
-LIB_H += ewah/ewok.h
-LIB_H += ewah/ewok_rlw.h
-LIB_H += fetch-pack.h
-LIB_H += fmt-merge-msg.h
-LIB_H += fsck.h
-LIB_H += gettext.h
-LIB_H += git-compat-util.h
-LIB_H += gpg-interface.h
-LIB_H += graph.h
-LIB_H += grep.h
-LIB_H += hashmap.h
-LIB_H += help.h
-LIB_H += http.h
-LIB_H += kwset.h
-LIB_H += levenshtein.h
-LIB_H += line-log.h
-LIB_H += line-range.h
-LIB_H += list-objects.h
-LIB_H += ll-merge.h
-LIB_H += log-tree.h
-LIB_H += mailmap.h
-LIB_H += merge-blobs.h
-LIB_H += merge-recursive.h
-LIB_H += mergesort.h
-LIB_H += notes-cache.h
-LIB_H += notes-merge.h
-LIB_H += notes-utils.h
-LIB_H += notes.h
-LIB_H += object.h
-LIB_H += pack-objects.h
-LIB_H += pack-revindex.h
-LIB_H += pack.h
-LIB_H += pack-bitmap.h
-LIB_H += parse-options.h
-LIB_H += patch-ids.h
-LIB_H += pathspec.h
-LIB_H += pkt-line.h
-LIB_H += prio-queue.h
-LIB_H += progress.h
-LIB_H += prompt.h
-LIB_H += quote.h
-LIB_H += reachable.h
-LIB_H += reflog-walk.h
-LIB_H += refs.h
-LIB_H += remote.h
-LIB_H += rerere.h
-LIB_H += resolve-undo.h
-LIB_H += revision.h
-LIB_H += run-command.h
-LIB_H += send-pack.h
-LIB_H += sequencer.h
-LIB_H += sha1-array.h
-LIB_H += sha1-lookup.h
-LIB_H += shortlog.h
-LIB_H += sideband.h
-LIB_H += sigchain.h
-LIB_H += strbuf.h
-LIB_H += streaming.h
-LIB_H += string-list.h
-LIB_H += submodule.h
-LIB_H += tag.h
-LIB_H += tar.h
-LIB_H += thread-utils.h
-LIB_H += transport.h
-LIB_H += tree-walk.h
-LIB_H += tree.h
-LIB_H += unpack-trees.h
-LIB_H += unicode_width.h
-LIB_H += url.h
-LIB_H += urlmatch.h
-LIB_H += userdiff.h
-LIB_H += utf8.h
-LIB_H += varint.h
-LIB_H += vcs-svn/fast_export.h
-LIB_H += vcs-svn/line_buffer.h
-LIB_H += vcs-svn/repo_tree.h
-LIB_H += vcs-svn/sliding_window.h
-LIB_H += vcs-svn/svndiff.h
-LIB_H += vcs-svn/svndump.h
-LIB_H += walker.h
-LIB_H += wildmatch.h
-LIB_H += wt-status.h
-LIB_H += xdiff-interface.h
-LIB_H += xdiff/xdiff.h
-LIB_H += xdiff/xdiffi.h
-LIB_H += xdiff/xemit.h
-LIB_H += xdiff/xinclude.h
-LIB_H += xdiff/xmacros.h
-LIB_H +=
Re: [PATCH 2/3] Makefile: use `find` to determine static header dependencies
On Mon, Aug 25, 2014 at 04:00:42PM -0400, Jeff King wrote: On Mon, Aug 25, 2014 at 12:46:41PM -0700, Jonathan Nieder wrote: Jeff King wrote: -LOCALIZED_C := $(C_OBJ:o=c) $(LIB_H) $(GENERATED_H) +LOCALIZED_C = $(C_OBJ:o=c) $(GENERATED_H) Why is LIB_H dropped here? This would mean that po/git.pot stops including strings from macros and static inline functions in headers (e.g., in parse-options.h). Ick, this is an accidental leftover from the earlier iteration of the patch, which moved that part to a separate line (inside my gross GIT_REAL_POT conditional). The extra line went away, but I forgot to add $(LIB_H) back in here. Thanks for noticing. As an aside, this makes an interesting case study for our git.git workflow. In some workflows, I would have made the original unacceptable patch, you would have reviewed it, and then I would have made the followup patch to adjust it, and you would have reviewed that. But it's quite hard to see my mistake in just the followup patch; the fact that $(LIB_H) was originally part of $(LOCALIZED_C) does not appear in that hunk at all. But in our workflow, we squash out the unacceptable diff, and you review from scratch the movement from the original working state (assuming the status quo was working :) ) to the newly proposed state. And there the mistake is quite a bit more obvious. Just an interesting observation. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC] git-imap-send: use libcurl for implementation
On 2014-08-19 19:51, Junio C Hamano wrote: This looks strange and stands out like a sore thumb. Do any of our other sources do this kind of macro tweaking inside C source before including git-compat-util.h (or its equivalent like cache.h)? I haven't checked, but I agree that it's desirable to avoid. I think what you are trying to do is to change the meaning of NO_OPENSSL, which merely means "we do not have the OpenSSL library and do not want to link with it", locally to "we may or may not have and use the OpenSSL library elsewhere in Git, but in the code below we do not want to use the code written to work on top of OpenSSL and instead use libcurl". ...and we don't want to link to OpenSSL in that case. Yeah. Because of that, you define NO_OPENSSL before including any of our headers to cause us not to include the headers, and typedef away SSL, for example. The SSL un-typedef'ing was there before, but it's true that I'm defining NO_OPENSSL at the very top so the included headers don't require OpenSSL (and so we don't have to link to it later).

 #include "cache.h"
 #include "credential.h"
 #include "exec_cmd.h"
@@ -29,6 +33,9 @@
 #ifdef NO_OPENSSL
 typedef void *SSL;
 #endif
+#ifdef USE_CURL_FOR_IMAP_SEND
+#include "http.h"
+#endif

But does it have to be that way? For one thing, doing it this way, the user has to make a compilation-time choice, but if you separate this compilation-time macro into two, one for "can we even link with and use OpenSSL?" (which is what NO_OPENSSL is about) and another for "do we want an ability to talk to imap via libcurl?", wouldn't it make it possible for you to switch between them at runtime (e.g. you might want to go over the direct connection when tunneling, while letting libcurl do the heavy lifting in non-tunneled operation)? Yeah, but I still need to wrap the non-tunneled operation in #ifdefs in case we don't USE_CURL_FOR_IMAP_SEND, in which case we fall back to the handwritten IMAP code, don't I?
Maybe I'm not getting you entirely right, but if I don't define NO_OPENSSL when USE_CURL_FOR_IMAP_SEND is defined, I don't see any way to not link to OpenSSL, even if it's not required. I'm including a slightly modified patch which does that (hoping that I've finally managed to send a usable patch). Sorry it's nothing breathtakingly better than before; even after giving this some thought I didn't arrive at a very elegant new solution... (see below for more on that)

@@ -1417,18 +1476,48 @@ int main(int argc, char **argv)
 	}

 	/* write it to the imap server */
+
+#ifdef USE_CURL_FOR_IMAP_SEND
+	if (!server.tunnel) {
+		curl = setup_curl(&server);
+		curl_easy_setopt(curl, CURLOPT_READDATA, &msgbuf);
+	} else {
+#endif
 	ctx = imap_open_store(&server);
 	if (!ctx) {
 		fprintf(stderr, "failed to open store\n");
 		return 1;
 	}
+	ctx->name = server.folder;
+#ifdef USE_CURL_FOR_IMAP_SEND
+	}
+#endif

 	fprintf(stderr, "sending %d message%s\n", total, (total != 1) ? "s" : "");
-	ctx->name = imap_folder;
 	while (1) {
 		unsigned percent = n * 100 / total;
 		fprintf(stderr, "%4u%% (%d/%d) done\r", percent, n, total);
+#ifdef USE_CURL_FOR_IMAP_SEND
+		if (!server.tunnel) {
...
+		}
+		} else {
+#endif
 		if (!split_msg(&all_msgs, &msg, &ofs))
 			break;
 		if (server.use_html)
@@ -1436,11 +1525,19 @@ int main(int argc, char **argv)
 		r = imap_store_msg(ctx, &msg);
 		if (r != DRV_OK)
 			break;
+#ifdef USE_CURL_FOR_IMAP_SEND
+		}
+#endif
 		n++;
 	}
 	fprintf(stderr, "\n");
+#ifdef USE_CURL_FOR_IMAP_SEND
+	curl_easy_cleanup(curl);
+	curl_global_cleanup();
+#else
 	imap_close_store(ctx);
+#endif

 	return 0;
 }

Ugly. Can we do this with less #ifdef/#else/#endif in the primary code path? It is ugly, but as much as I'd love to put e.g.

+#ifdef USE_CURL_FOR_IMAP_SEND
+	if (!server.tunnel) {
+		curl = setup_curl(&server);

etc. into imap_open_store (and similarly for imap_store_msg etc.), I don't see any easy way to do it; imap_open_store's return type would still be struct imap_store* in the non-tunneling case, and CURL* otherwise.
So the best I can come up with here is merging some of the #ifdef blocks, but that means duplicating the code that applies in both cases. But that isn't any better, is it? If we were to keep these two modes as a choice the users have to make at compilation time, that is. As stated above, I'm not sure how to do entirely without at least those two compile-time switches (NO_OPENSSL and USE_CURL_FOR_IMAP_SEND). Sorry, perhaps I missed something obvious; grateful for any hints on how to do it better. Bernhard

 Documentation/git-imap-send.txt |  3 +-
 INSTALL                         | 15 ---
 Makefile
Re: [PATCH 1/2] Check order when reading index
On Mon, Aug 25, 2014 at 7:21 PM, Junio C Hamano gits...@pobox.com wrote: Jaime Soriano Pastor jsorianopas...@gmail.com writes: Subject: Re: [PATCH 1/2] Check order when reading index Please be careful when crafting the commit title. This single line will be the only one that readers will have to identify the change among hundreds of entries in "git shortlog" output when trying to see what kind of change went into the project during the given period. Something like: "read_index_from(): catch out of order entries while reading an index file" perhaps? Ok, rephrasing it.

+void check_ce_order(struct cache_entry *ce, struct cache_entry *next_ce)

Does this have to be global, i.e. not "static void ..."? Not really, changing it to static.

+	if (ce_stage(ce) >= ce_stage(next_ce))
+		die("Unordered stage entries for '%s'",
+		    ce->name);

Not quite. We do allow multiple higher stage entries; having two or more stage #1 entries is perfectly fine during a merge resolution, and both ce and next_ce may be pointing at the stage #1 entries of the same path. Replacing the comparison with > is sufficient, I think. Ok, but like Jeff, I'm also curious about how to have multiple stage #1 entries for the same path. Thanks.
Re: [PATCH] bisect: save heap memory. allocate only the required amount
On 26 August 2014 01:05, Jeff King p...@peff.net wrote: On Mon, Aug 25, 2014 at 11:26:39AM -0700, Junio C Hamano wrote: Good digging, and I agree that it should use the FLEX_ARRAY for consistency. I can produce a patch, but I did not want to steal Arjun's thunder. Please feel free to do so. Arjun, did my proposal make sense? Do you want to try implementing that? -Peff
Re: [PATCH 2/3] Makefile: use `find` to determine static header dependencies
Jeff King wrote: It is unfortunately easy for our static header list to grow stale, as none of the regular developers make use of it. Instead of trying to keep it up to date, let's invoke find to generate the list dynamically. Yep, I like this. It does mean that people who run make pot have to be a little more vigilant about not keeping around extra .h files by mistake. But I trust them. [...]

+LIB_H = $(shell $(FIND) . \
+	-name .git -prune -o \
+	-name t -prune -o \
+	-name Documentation -prune -o \
+	-name '*.h' -print)

Tiny nit: I might use

	$(shell $(FIND) * \
		-name . -o -name '.*' -prune -o \
		-name t -prune -o \
		-name Documentation -prune -o \
		-name '*.h' -print)

or

	$(shell $(FIND) * \
		-name '.?*' -prune -o \
		-name t -prune -o \
		-name Documentation -prune -o \
		-name '*.h' -print)

to avoid spending time looking in other dot-directories like .svn, .hg, or .snapshot. With or without such a change, Reviewed-by: Jonathan Nieder jrnie...@gmail.com
Re: [PATCH 1/2] Check order when reading index
On Mon, Aug 25, 2014 at 12:44 PM, Jeff King p...@peff.net wrote: For my own curiosity, how do you get into this situation, and what does it mean to have multiple stage #1 entries for the same path? What would git cat-file :1:path output? That is how we natively (read: not with the funky virtual stuff merge-recursive does) express a merge with multiple merge bases. You should also be able to see this in how git merge invokes merge strategies (one or more bases, double-dash and then current HEAD and the other branches). I think there are some tests among the 3-way merge tests that check what should happen when our HEAD matches one of the stage #1 entries while their branch matches a different one of the stage #1 entries, too.
Re: [PATCH 2/3] Makefile: use `find` to determine static header dependencies
Jonathan Nieder jrnie...@gmail.com writes: Jeff King wrote: It is unfortunately easy for our static header list to grow stale, as none of the regular developers make use of it. Instead of trying to keep it up to date, let's invoke find to generate the list dynamically. Yep, I like this. It does mean that people who run make pot have to be a little more vigilant about not keeping around extra .h files by mistake. But I trust them. [...]

+LIB_H = $(shell $(FIND) . \
+	-name .git -prune -o \
+	-name t -prune -o \
+	-name Documentation -prune -o \
+	-name '*.h' -print)

Tiny nit: I might use

	$(shell $(FIND) * \
		-name . -o -name '.*' -prune -o \
		-name t -prune -o \
		-name Documentation -prune -o \
		-name '*.h' -print)

or

	$(shell $(FIND) * \
		-name '.?*' -prune -o \
		-name t -prune -o \
		-name Documentation -prune -o \
		-name '*.h' -print)

to avoid spending time looking in other dot-directories like .svn, .hg, or .snapshot. Wouldn't it be sufficient to start digging not from * but from ??*? That is, something like

	find ??* \( -name Documentation -o -name .\?\* \) -prune -o -name \*.h

;-) With or without such a change, Reviewed-by: Jonathan Nieder jrnie...@gmail.com Thanks.
Re: [PATCH/RFC] git-imap-send: use libcurl for implementation
Bernhard Reiter ock...@raz.or.at writes: Yeah, but I still need to wrap the non-tunneled operation in #ifdefs in case we don't USE_CURL_FOR_IMAP_SEND in which case we fall back to the handwritten IMAP code, don't I? We do not mind multiple implementations of the same helper function that are guarded with #ifdef/#endif, and we do use that style quite a lot. Would it help?
Re: [PATCH] bisect: save heap memory. allocate only the required amount
Jeff King p...@peff.net writes: On Mon, Aug 25, 2014 at 11:26:39AM -0700, Junio C Hamano wrote: Good digging, and I agree that it should use the FLEX_ARRAY for consistency. I can produce a patch, but I did not want to steal Arjun's thunder. Hmph, would it have to overlap? I think we can queue Arjun's patch with +1 fix and FLEX_ARRAY thing separately, and they can go in in any order, no? Arjun, did my proposal make sense? Do you want to try implementing that?
Re: [PATCH] bisect: save heap memory. allocate only the required amount
Arjun Sreedharan arjun...@gmail.com writes: Find and allocate the required amount instead of allocating extra 100 bytes Signed-off-by: Arjun Sreedharan arjun...@gmail.com --- Interesting. How much memory do we typically waste (in other words, how big did cnt grow in your repository where you noticed this)?

 bisect.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/bisect.c b/bisect.c
index d6e851d..c96aab0 100644
--- a/bisect.c
+++ b/bisect.c
@@ -215,10 +215,13 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n
 	}
 	qsort(array, cnt, sizeof(*array), compare_commit_dist);
 	for (p = list, i = 0; i < cnt; i++) {
-		struct name_decoration *r = xmalloc(sizeof(*r) + 100);
+		char name[100];
+		sprintf(name, "dist=%d", array[i].distance);
+		int name_len = strlen(name);

Decl-after-stmt. You do not have to run a separate strlen() for this. The return value from sprintf() should tell you how much you need to allocate, I think.

+		struct name_decoration *r = xmalloc(sizeof(*r) + name_len);
 		struct object *obj = &(array[i].commit->object);
-		sprintf(r->name, "dist=%d", array[i].distance);
+		memcpy(r->name, name, name_len + 1);
 		r->next = add_decoration(&name_decoration, obj, r);
 		p->item = array[i].commit;
 		p = p->next;
Re: [PATCH 2/3] Makefile: use `find` to determine static header dependencies
Junio C Hamano wrote: Jonathan Nieder jrnie...@gmail.com writes: Tiny nit: I might use $(shell $(FIND) * \ -name . -o -name '.*' -prune -o \ -name t -prune -o \ -name Documentation -prune -o \ -name '*.h' -print) or $(shell $(FIND) * \ -name '.?*' -prune -o \ -name t -prune -o \ -name Documentation -prune -o \ -name '*.h' -print) to avoid spending time looking in other dot-directories like .svn, .hg, or .snapshot. Wouldn't it be sufficient to start digging not from * but from ??*? Gah, the * was supposed to be . in my examples (though it doesn't hurt). find ??* \( -name Documentation -o -name .\?\* \) -prune -o -name \*.h Heh. Yeah, that would work. ;-)
Re: [PATCH] bisect: save heap memory. allocate only the required amount
Jeff King p...@peff.net writes: The string will always be "dist=" followed by a decimal representation of a count that fits in int anyway, so I actually think use of strbuf is way overkill (and formatting it twice also is); the patch as posted should be just fine. I think you are right, and the patch is the right direction (assuming we want to do this; I question whether there are enough elements in the list for us to care about the size, and if there are, we are probably better off storing the int and formatting the strings on the fly). ;-) It would be nice if there was some way to abstract the idea of formatting a buffer directly into a flex-array. That would involve the double-format you mention, but we could use it in lots of places to make the code nicer ...

	struct name_decoration *r = fmt_flex_array(sizeof(*r),
						   offsetof(*r, name),
						   "dist=%d", x);

which is a little less nice. You could make it nicer with a macro, but we don't assume variadic macros. *sigh* At first I thought "Yuck. A helper only to format into the flex member that holds a string?", and I tried to change my mind, but I couldn't quite convince myself. At least not yet. Among the flex arrays we use, some are arrays of bools, some others are arrays of object names, and there are many arrays of even more esoteric types. Only a small number of them are "We want a struct with a constant string, and we do not want to do two allocations and to pay an extra dereference cost by using 'const char *'." For them, by the time we allocate a struct, by definition we should have sufficient information to compute how large to make that structure, and a printf-format plus its args would be the preferred form of that sufficient information, I would think. The name fmt_flex_array(), which stresses too much on the formatting side without implying that it is the way to allocate the thing, may be horrible, and I agree with you that without variadic macros the end result may not read very well.
Unless we have a great many places where we can use the helper to make the code that creates these objects look nicer, I am afraid that the pros-and-cons may not be very favourable. Thanks for an interesting tangent.
Re: [PATCH 2/3] Makefile: use `find` to determine static header dependencies
Jonathan Nieder jrnie...@gmail.com writes: Wouldn't it be sufficient to start digging not from * but from ??*? Gah, the * was supposed to be . in my examples (though it doesn't hurt). find ??* \( -name Documentation -o -name .\?\* \) -prune -o -name \*.h Heh. Yeah, that would work. ;-) Continuing useless discussion... Actually as you are not excluding CVS, RCS, etc., and using ??* as the starting point will exclude .git, .hg, etc. at the top, I think we can shorten it even further and say

	find ??* -name Documentation -prune -o -name \*.h

or something. ...and time to go back to something more serious and practical. Don't we want to exclude contrib/ by the way?
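Junio's shortened one-liner can be tried in a scratch directory. The layout below is a made-up miniature of git.git, and an explicit -print is added (without it, find's implicit print would emit the pruned Documentation directory itself). Note how ??* also drops the one-character t/ directory for free, while contrib/ indeed survives:

```shell
# Made-up miniature tree: headers in several places, plus dirs that
# the real rule wants to skip.
mkdir -p scratch/.git scratch/Documentation scratch/ewah scratch/contrib scratch/t
touch scratch/cache.h scratch/ewah/ewok.h scratch/contrib/foo.h \
      scratch/Documentation/tech.h scratch/.git/junk.h scratch/t/helper.h
cd scratch
# ??* never matches dotfiles or single-character names, so .git and t
# are excluded before find even starts; Documentation is pruned.
find ??* -name Documentation -prune -o -name '*.h' -print | sort
# prints: cache.h, contrib/foo.h, ewah/ewok.h (one per line)
cd ..
```

contrib/foo.h showing up is exactly the "don't we want to exclude contrib/?" question at the end of the message.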
Re: [GIT PULL] l10n updates for 2.1.0 round 1
Jiang Xin worldhello@gmail.com writes: The following changes since commit 49f1cb93a2f11845cfa2723611a729d3d7f02f0d: Git 2.1.0-rc0 (2014-07-27 15:22:22 -0700) are available in the git repository at: git://github.com/git-l10n/git-po for you to fetch changes up to f7fbc357f863ecc5323f3fcf2fc9cbf2aa2a8587: l10n: fr.po (2257t) update for version 2.1.0 (2014-08-07 09:07:18 +0200) Thanks. I finally got around to pulling this one. Sorry for the delay.
Re: [PATCH v5 1/4] convert: Refactor would_convert_to_git() to single arg 'path'
Steffen Prohaska proha...@zib.de writes: It is only the path that matters in the decision whether to filter or not. Clarify this by making path the single argument of would_convert_to_git(). Signed-off-by: Steffen Prohaska proha...@zib.de --- I've retitled this as: "convert: drop arguments other than 'path' from would_convert_to_git()" to match the output from "git shortlog --since=3.months --no-merges" by using lowercase 'd' after the "convert:" area name, and also, more importantly, to avoid calling this change a "refactor", which it is not. Thanks.

 convert.h   | 5 ++---
 sha1_file.c | 2 +-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/convert.h b/convert.h
index 0c2143c..c638b33 100644
--- a/convert.h
+++ b/convert.h
@@ -40,10 +40,9 @@ extern int convert_to_working_tree(const char *path, const char *src,
 				   size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src,
 			      size_t len, struct strbuf *dst);
-static inline int would_convert_to_git(const char *path, const char *src,
-				       size_t len, enum safe_crlf checksafe)
+static inline int would_convert_to_git(const char *path)
 {
-	return convert_to_git(path, src, len, NULL, checksafe);
+	return convert_to_git(path, NULL, 0, NULL, 0);
 }

 /*
diff --git a/sha1_file.c b/sha1_file.c
index 3f70b1d..00c07f2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3144,7 +3144,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
 	if (!S_ISREG(st->st_mode))
 		ret = index_pipe(sha1, fd, type, path, flags);
 	else if (size <= big_file_threshold || type != OBJ_BLOB ||
-		 (path && would_convert_to_git(path, NULL, 0, 0)))
+		 (path && would_convert_to_git(path)))
 		ret = index_core(sha1, fd, size, type, path, flags);
 	else
 		ret = index_stream(sha1, fd, size, type, path, flags);
Re: Show containing branches in log?
On Thu, Jul 3, 2014 at 2:41 PM, Øyvind A. Holm su...@sunbase.org wrote: On 2 July 2014 16:50, Robert Dailey rcdailey.li...@gmail.com wrote: I know that with the `git branch` command I can determine which branches contain a commit. Is there a way to represent this graphically with `git log`? Sometimes I just have a commit, and I need to find out what branch contains that commit. The reason why `git branch --contains` doesn't solve this problem for me is that it names almost all branches because of merge commits. Too much ancestry has been built since this commit, so there is no way to find the closest branch that contains that commit. Is there a way to graphically see what is the nearest named ref to the specified commit in the logs? I have created a script for just this functionality which I use very often, and have created a gist with the files at https://gist.github.com/sunny256/2eb583f21e0ffcfe994f, I think it should solve your problem. It contains these files: git-wn ("wn" means "What's New") will create a visual graph of all commits which have a specified ref as ancestor. It also needs the following script, just put it into your $PATH somewhere: git-lc ("lc" means "List branches Containing this commit") generates a list of all branches containing a specified ref. The files originate from https://github.com/sunny256/utils, but I've modified them in the gist to make your life easier. :) Hope that helps, Øyvind I'm finally getting around to trying this out but it isn't working on Windows because there is no fmt command in msysgit. Do you have a workaround I can use? Thanks.
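One possible workaround, not knowing the scripts' exact fmt invocation: approximate fmt(1) with fold(1), assuming fold is available in your msysGit install (an assumption worth checking). Something like this near the top of git-wn/git-lc might do; it is a sketch, not code from the gist:

```shell
# Hypothetical fallback: if fmt(1) is missing, emulate it with
# fold(1). fold -s breaks lines at blanks; 75 mirrors fmt's default
# output width.
if ! command -v fmt >/dev/null 2>&1; then
	fmt() { fold -s -w 75; }
fi

printf 'a rather long line that would need wrapping somewhere sensible\n' | fmt
```

fold does not re-fill short lines the way fmt does, so the result is cruder, but for breaking overlong graph lines it may be close enough.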
Re: Transaction patch series overview
Jonathan Nieder wrote: Ronnie Sahlberg wrote:

ref-transaction-1 (2014-07-16) 20 commits
- Second batch of ref transactions
- refs.c: make delete_ref use a transaction
- refs.c: make prune_ref use a transaction to delete the ref
- refs.c: remove lock_ref_sha1
- refs.c: remove the update_ref_write function
- refs.c: remove the update_ref_lock function
- refs.c: make lock_ref_sha1 static
- walker.c: use ref transaction for ref updates
- fast-import.c: use a ref transaction when dumping tags
- receive-pack.c: use a reference transaction for updating the refs
- refs.c: change update_ref to use a transaction
- branch.c: use ref transaction for all ref updates
- fast-import.c: change update_branch to use ref transactions
- sequencer.c: use ref transactions for all ref updates
- commit.c: use ref transactions for updates
- replace.c: use the ref transaction functions for updates
- tag.c: use ref transactions when doing updates
- refs.c: add transaction.status and track OPEN/CLOSED/ERROR
- refs.c: make ref_transaction_begin take an err argument
- refs.c: update ref_transaction_delete to check for error and return status
- refs.c: change ref_transaction_create to do error checking and return status

(this branch is used by rs/ref-transaction, rs/ref-transaction-multi, rs/ref-transaction-reflog and rs/ref-transaction-rename.) [...]
I'm having trouble keeping track of how patches change, which patches have been reviewed and which haven't, unaddressed comments, and so on, so as an experiment I've pushed this part of the series to the Gerrit server at https://code-review.googlesource.com/#/q/topic:ref-transaction-1 Outcome of the experiment: patches gained some minor changes and then

 1-12	Reviewed-by: Jonathan Nieder jrnie...@gmail.com
 13	Reviewed-by: Michael Haggerty mhag...@alum.mit.edu
	Reviewed-by: Jonathan Nieder jrnie...@gmail.com
 14	Reviewed-by: Jonathan Nieder jrnie...@gmail.com
 15-16	Reviewed-by: Michael Haggerty mhag...@alum.mit.edu
	Reviewed-by: Jonathan Nieder jrnie...@gmail.com
 17	Reviewed-by: Michael Haggerty mhag...@alum.mit.edu
 18	Reviewed-by: Michael Haggerty mhag...@alum.mit.edu
	Reviewed-by: Jonathan Nieder jrnie...@gmail.com
 19	Reviewed-by: Jonathan Nieder jrnie...@gmail.com
 20	Reviewed-by: Michael Haggerty mhag...@alum.mit.edu
	Reviewed-by: Jonathan Nieder jrnie...@gmail.com

I found the web UI helpful in seeing how each patch evolved since I last looked at it. Interdiff relative to the version in pu is below. I'm still hoping for a tweak in response to a minor comment and then I can put up a copy of the updated series to pull. The next series from Ronnie's collection is available at https://code-review.googlesource.com/#/q/topic:ref-transaction in case someone wants a fresh series to look at.
diff --git a/branch.c b/branch.c
index c1eae00..37ac555 100644
--- a/branch.c
+++ b/branch.c
@@ -305,6 +305,7 @@ void create_branch(const char *head,
 		    ref_transaction_commit(transaction, msg, &err))
 			die("%s", err.buf);
 		ref_transaction_free(transaction);
+		strbuf_release(&err);
 	}

 	if (real_ref && track)
diff --git a/builtin/commit.c b/builtin/commit.c
index 668fa6a..9bf1003 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1765,8 +1765,8 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 	transaction = ref_transaction_begin(&err);
 	if (!transaction ||
 	    ref_transaction_update(transaction, "HEAD", sha1,
-				   current_head ?
-				   current_head->object.sha1 : NULL,
+				   current_head
+				   ? current_head->object.sha1 : NULL,
 				   0, !!current_head, &err) ||
 	    ref_transaction_commit(transaction, sb.buf, &err)) {
 		rollback_index_files();
@@ -1801,5 +1801,6 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 	if (!quiet)
 		print_summary(prefix, sha1, !current_head);

+	strbuf_release(&err);
 	return 0;
 }
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 91099ad..224fadc 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -194,7 +194,7 @@ static void write_head_info(void)

 struct command {
 	struct command *next;
-	char *error_string;
+	const char *error_string;
 	unsigned int skip_update:1,
 		     did_not_exist:1;
 	int index;
@@ -468,7 +468,7 @@ static int update_shallow_ref(struct command *cmd, struct shallow_info *si)
 	return 0;
 }

-static char *update(struct command *cmd, struct shallow_info *si)
+static const char *update(struct command *cmd, struct