Re: [PATCH] userdiff: add built-in pattern for CSS

2016-06-02 Thread Johannes Sixt

Am 03.06.2016 um 00:48 schrieb William Duclot:

Logic behind the "pattern" regex is:


The name of the macro parameter is "pattern", but the actual meaning is 
"function name" regex.



diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e3b1de8..81f60ad 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -525,6 +525,8 @@ patterns are available:

  - `csharp` suitable for source code in the C# language.

+- `css` suitable for source code in the CSS language.


CSS is not so much source code. How about "suitable for cascaded style 
sheets"?



diff --git a/t/t4018/css-common b/t/t4018/css-common
new file mode 100644
index 000..84ed754
--- /dev/null
+++ b/t/t4018/css-common
@@ -0,0 +1,4 @@
+RIGHT label.control-label {
+margin-top: 10px!important;
+border : 10px ChangeMe #C6C6C6;
+}



diff --git a/t/t4018/css-rule b/t/t4018/css-rule
new file mode 100644
index 000..84ed754
--- /dev/null
+++ b/t/t4018/css-rule
@@ -0,0 +1,4 @@
+RIGHT label.control-label {
+margin-top: 10px!important;
+border : 10px ChangeMe #C6C6C6;
+}


These two are the same. Please pick only one. I propose the name 
"common" because it is how CSS rules are written most commonly.



+IPATTERN("css",
+"!^.*;\n"


Is there a difference between this and "!;\n"? Is it necessary to anchor 
the pattern at the beginning of the line?


In the commit message you talk about colon (':'), but you actually use a 
semicolon (';'). Thinking a bit more about it, rejecting lines with 
either one would be even better. Consider this case (without the 
indentation):


   h1 {
   color:
   red;
   }

(New test case, hint, hint!) Therefore, it could be: "![:;]\n".


+"^[_a-z0-9].*$",
+/* -- */
+/*
+ * This regex comes from W3C CSS specs. Should theoretically also
+ * allow ISO 10646 characters U+00A0 and higher,
+ * but they are not handled in this regex.
+ */
+"-?[_a-zA-F][-_a-zA-F0-9]*" /* identifiers */


Drop A-F.


+"|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */


Here, too: it is an IPATTERN.

-- Hannes

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (Jun 2016, #01; Thu, 2)

2016-06-02 Thread Torsten Bögershausen

On 06/03/2016 01:13 AM, Mike Hommey wrote:

On Thu, Jun 02, 2016 at 03:52:41PM -0700, Junio C Hamano wrote:

* mh/connect (2016-06-01) 9 commits
  - connect: move ssh command line preparation to a separate function
  - connect: actively reject git:// urls with a user part
  - connect: change the --diag-url output to separate user and host
  - connect: make parse_connect_url() return the user part of the url as a 
separate value
  - connect: group CONNECT_DIAG_URL handling code
  - connect: make parse_connect_url() return separated host and port
  - connect: re-derive a host:port string from the separate host and port 
variables
  - connect: call get_host_and_port() earlier
  - connect: document why we sometimes call get_port after get_host_and_port

  Needs review.

I /think/ Torsten reviewed it all, and his last comments are in
$gmane/295800. It's still not clear to me why he wants to remove the
comment about [].

There where 2 comments in the review.
The most important thing is that now
git://[example.com:123]/path/to/repo is valid, but it shouldn't.
This patch fixes it:

@@ -673,7 +669,7 @@ static enum protocol parse_connect_url(const char 
*url_orig, char **ret_user,
 * "host:port" and NULL.
 * To support this undocumented legacy we still need to split the port.
 */
-   if (!port)
+   if (!port && protocol == PROTO_SSH)


The other thing is that I asked for a test case for
git://[example.com:123]/path/to/repo
which shouldn't be hard to do.

If nobody else things that this comment in the code is stale:
-   /*
-* Don't do destructive transforms as protocol code does
-* '[]' unwrapping in get_host_and_port()
-*/

then just leave it as it is.
Thanks
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] t1308: do not get fooled by symbolic links to the source tree

2016-06-02 Thread Jeff King
On Thu, Jun 02, 2016 at 04:23:00PM -0700, Stefan Beller wrote:

> > To prevent this in the future, I switched my default --root= to point to
> > a symlink. I wonder if we could do something in the test suite, though,
> > as we did long ago by introducing "trash directory" with a space to
> > catch corner cases.
> >
> > I guess it would be something like:
> >
> >   if test_have_prereq SYMLINKS
> 
> IIUC this would need each test to be marked with SYMLINKS
> when testing with symlinks is desired. Marking a test with that
> is easily forgotten, so I'd rather do it by default as:
> 
> if (system supports symlinks):
> >   then
> > mkdir "$TRASH_DIRECTORY.real" &&
> > ln -s "$TRASH_DIRECTORY.real" "$TRASH_DIRECTORY"
> >   else
> > mkdir "$TRASH_DIRECTORY"
> >   fi

I'm not sure I understand. My intent was to say "does the system support
symlinks" in test-lib.sh when setting up the trash directory. The
test_have_prereq function asks that, AFAIK (the "test_" prefix is not
"does _this_ test require it" but just "this function is in the test
namespace").

> I like the idea of testing with symlinks. (Does it have performance issues
> when everything goes through symlinks?)

I didn't notice any on my system (running with --root pointed at a
directory, and at a symlink to that directory). It would be extra work
whenever we determine a canonical absolute path, but I suspect that is
drowned out in the noise of the rest of the test suite. Plus some minor
extra work in the `ln` and `rm` calls during test setup/teardown, but
likewise that should be a small part of the total cost.

> On the other hand if we do tests by default in a symlinked path, we could
> introduce errors more easily in non-symlinked path, but that is what 
> developers
> use for developing (I guess), so it's not as likely?

True, but I'd be surprised if there are bugs that show up in a
non-symlinked path that _do_ show up in a symlinked one.

I'm not sure if we've seen a case where this would find an actual bug in
git, though. The cases I remember were mostly about the test suite being
picky (i.e., git canonicalizes but the expected output doesn't, or vice
versa).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3] pathspec: allow escaped query values

2016-06-02 Thread Stefan Beller
On Thu, Jun 2, 2016 at 4:23 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> However if we add a value restriction here, we need to be as strict in the
>> .gitattributes parsing as well and put a warning there (similar to
>> invalid_attr_name_message) I would think.
>
> Remember, the attribute system is used for many purposes other than
> this new "further limit pathspec".

Right, and in the past we followed a rigid pattern of non-crazy values.
The crazy values will show up once users realize they can
put anything they want into the .gitattributes to query for files.

Before this labeling change, there was no need to put anything
other the officially documented values into the attribute files.

> So I do not think it is necessary or even beneficial to add such a
> warning.

Ok.

>
>> +static char *attr_value_unescape(const char *value)
>> +{
>> + const char *src;
>> + char *dst, *ret;
>> +
>> + ret = xmallocz(strlen(value));
>> + for (src = value, dst = ret; *src; src++, dst++) {
>> + if (*src == '\\') {
>> + if (!src[1])
>> + die(_("Escape character '\\' not allowed as "
>> +   "last character in attr value"));
>> + src++;
>> + }
>> + if (*src && invalid_value_char(*src))
>> + die("cannot use '%c' for value matching", *src);
>> + *dst = *src;
>> + }
>> + *dst = '\0';
>> + return ret;
>> +}
>
> Please sanity-check me.  Just like I said to your original "I doubt
> *i could be NUL here", I now doubt *src could be NUL there where
> invalid_value_char() gets called.
>
> If *src could be NUL there, then *dst gets NUL once, and then after
> loop exits (presumably after incrementing dst), *dst gets another
> NUL, which was the terminating NUL condition being iffy I mentioned,
> but as you said, I do not think it would happen, so we can lose the
> "*src && " before invalid_value_char() is called.

Right, we can lose the *src check before invalid_value_char
as that ought to be caught in  if (!src[1]) die(...) before.

Thanks,
Stefan

>
> Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3] pathspec: allow escaped query values

2016-06-02 Thread Junio C Hamano
Stefan Beller  writes:

> However if we add a value restriction here, we need to be as strict in the
> .gitattributes parsing as well and put a warning there (similar to
> invalid_attr_name_message) I would think.

Remember, the attribute system is used for many purposes other than
this new "further limit pathspec".  While the (hopefully temporary)
limitation hurts here, users will limit the attributes they use for
":(attr:VAR=VAL)" pathspec magic, and it is unlikely that either the
VAR part or its VALue are something the core Git currently uses
anyway.

So I do not think it is necessary or even beneficial to add such a
warning.

> +static char *attr_value_unescape(const char *value)
> +{
> + const char *src;
> + char *dst, *ret;
> +
> + ret = xmallocz(strlen(value));
> + for (src = value, dst = ret; *src; src++, dst++) {
> + if (*src == '\\') {
> + if (!src[1])
> + die(_("Escape character '\\' not allowed as "
> +   "last character in attr value"));
> + src++;
> + }
> + if (*src && invalid_value_char(*src))
> + die("cannot use '%c' for value matching", *src);
> + *dst = *src;
> + }
> + *dst = '\0';
> + return ret;
> +}

Please sanity-check me.  Just like I said to your original "I doubt
*i could be NUL here", I now doubt *src could be NUL there where
invalid_value_char() gets called.

If *src could be NUL there, then *dst gets NUL once, and then after
loop exits (presumably after incrementing dst), *dst gets another
NUL, which was the terminating NUL condition being iffy I mentioned,
but as you said, I do not think it would happen, so we can lose the
"*src && " before invalid_value_char() is called.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] t1308: do not get fooled by symbolic links to the source tree

2016-06-02 Thread Stefan Beller
On Thu, Jun 2, 2016 at 4:16 PM, Jeff King  wrote:
> On Thu, Jun 02, 2016 at 03:15:39PM -0700, Junio C Hamano wrote:
>
>> When your $PWD does not match $(/bin/pwd), e.g. you have your copy
>> of the git source tree in one place, point it with a symbolic link,
>> and then "cd" to that symbolic link before running 'make test', one
>> of the tests in t1308 expects that the per-user configuration was
>> reported to have been read from the true path (i.e. relative to the
>> target of such a symbolic link), but the test-config program reports
>> a path relative to $PWD (i.e. the symbolic link).
>>
>> Instead, expect a path relative to $HOME (aka $TRASH_DIRECTORY), as
>> per-user configuration is read from $HOME/.gitconfig and the test
>> framework sets these shell variables up in such a way to avoid this
>> problem.
>
> Looks good.
>
>   Acked-by: Jeff King 
>
> To prevent this in the future, I switched my default --root= to point to
> a symlink. I wonder if we could do something in the test suite, though,
> as we did long ago by introducing "trash directory" with a space to
> catch corner cases.
>
> I guess it would be something like:
>
>   if test_have_prereq SYMLINKS

IIUC this would need each test to be marked with SYMLINKS
when testing with symlinks is desired. Marking a test with that
is easily forgotten, so I'd rather do it by default as:

if (system supports symlinks):
>   then
> mkdir "$TRASH_DIRECTORY.real" &&
> ln -s "$TRASH_DIRECTORY.real" "$TRASH_DIRECTORY"
>   else
> mkdir "$TRASH_DIRECTORY"
>   fi
>
> but there may be some other tweaks required (e.g., for cleanup).

I like the idea of testing with symlinks. (Does it have performance issues
when everything goes through symlinks?)

On the other hand if we do tests by default in a symlinked path, we could
introduce errors more easily in non-symlinked path, but that is what developers
use for developing (I guess), so it's not as likely?

Thanks,
Stefan

>
> -Peff
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] t1308: do not get fooled by symbolic links to the source tree

2016-06-02 Thread Jeff King
On Thu, Jun 02, 2016 at 03:15:39PM -0700, Junio C Hamano wrote:

> When your $PWD does not match $(/bin/pwd), e.g. you have your copy
> of the git source tree in one place, point it with a symbolic link,
> and then "cd" to that symbolic link before running 'make test', one
> of the tests in t1308 expects that the per-user configuration was
> reported to have been read from the true path (i.e. relative to the
> target of such a symbolic link), but the test-config program reports
> a path relative to $PWD (i.e. the symbolic link).
> 
> Instead, expect a path relative to $HOME (aka $TRASH_DIRECTORY), as
> per-user configuration is read from $HOME/.gitconfig and the test
> framework sets these shell variables up in such a way to avoid this
> problem.

Looks good.

  Acked-by: Jeff King 

To prevent this in the future, I switched my default --root= to point to
a symlink. I wonder if we could do something in the test suite, though,
as we did long ago by introducing "trash directory" with a space to
catch corner cases.

I guess it would be something like:

  if test_have_prereq SYMLINKS
  then
mkdir "$TRASH_DIRECTORY.real" &&
ln -s "$TRASH_DIRECTORY.real" "$TRASH_DIRECTORY"
  else
mkdir "$TRASH_DIRECTORY"
  fi

but there may be some other tweaks required (e.g., for cleanup).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv3] pathspec: allow escaped query values

2016-06-02 Thread Stefan Beller
In our own .gitattributes file we have attributes such as:

*.[ch] whitespace=indent,trail,space

When querying for attributes we want to be able to ask for the exact
value, i.e.

git ls-files :(attr:whitespace=indent,trail,space)

should work, but the commas are used in the attr magic to introduce
the next attr, such that this query currently fails with

fatal: Invalid pathspec magic 'trail' in ':(attr:whitespace=indent,trail,space)'

This change allows escaping characters by a backslash, such that the query

git ls-files :(attr:whitespace=indent\,trail\,space)

will match all path that have the value "indent,trail,space" for the
whitespace attribute. To accomplish this, we need to modify two places.
First `eat_long_magic` needs to not stop early upon seeing a comma or
closing paren that is escaped. As a second step we need to remove any
escaping from the attr value.

Helped-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
---


> That would be true _only_ when "find next escape and copy up to that
> byte" aka "scanning once with optimized strchr(), and copying once
> with optimized memmove()" is faster than "scanning once and copying"
> loop.

Oh right. that would work for larger strings that don't fit into a cache
very well, but in our expected case this will do.

> I was merely reacting to your use of memmove() in a very different
> loop, where if you unescape "a\b\c\defghijk", your memmove() would
> move "efghijk" many times.

Right, at the time of writing I didn't like it, but was ok with it as
we only have a few escapes.

> Also the handling of the terminating NUL may need to be
> updated.

I don't think so.

> That is why I did not say "replace yours with this", but
> merely "along the lines of this" ;-)

This is pretty much your code modulo nits (xmalloz, semicolons) now,
and I am convinced it works by the test.

However if we add a value restriction here, we need to be as strict in the
.gitattributes parsing as well and put a warning there (similar to
invalid_attr_name_message) I would think.

Thanks,
Stefan

 pathspec.c  | 52 +
 t/t6134-pathspec-with-labels.sh | 10 
 2 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/pathspec.c b/pathspec.c
index 326863a..fe53ddf 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -89,6 +89,51 @@ static void prefix_short_magic(struct strbuf *sb, int 
prefixlen,
strbuf_addf(sb, ",prefix:%d)", prefixlen);
 }
 
+static size_t strcspn_escaped(const char *s, const char *stop)
+{
+   const char *i;
+
+   for (i = s; *i; i++) {
+   /* skip the escaped character */
+   if (i[0] == '\\' && i[1]) {
+   i++;
+   continue;
+   }
+
+   if (strchr(stop, *i))
+   break;
+   }
+   return i - s;
+}
+
+static inline int invalid_value_char(const char ch)
+{
+   if (isalnum(ch) || strchr(",-_", ch))
+   return 0;
+   return -1;
+}
+
+static char *attr_value_unescape(const char *value)
+{
+   const char *src;
+   char *dst, *ret;
+
+   ret = xmallocz(strlen(value));
+   for (src = value, dst = ret; *src; src++, dst++) {
+   if (*src == '\\') {
+   if (!src[1])
+   die(_("Escape character '\\' not allowed as "
+ "last character in attr value"));
+   src++;
+   }
+   if (*src && invalid_value_char(*src))
+   die("cannot use '%c' for value matching", *src);
+   *dst = *src;
+   }
+   *dst = '\0';
+   return ret;
+}
+
 static void parse_pathspec_attr_match(struct pathspec_item *item, const char 
*value)
 {
struct string_list_item *si;
@@ -131,10 +176,9 @@ static void parse_pathspec_attr_match(struct pathspec_item 
*item, const char *va
if (attr[attr_len] != '=')
am->match_mode = MATCH_SET;
else {
+   const char *v = [attr_len + 1];
am->match_mode = MATCH_VALUE;
-   am->value = xstrdup([attr_len + 1]);
-   if (strchr(am->value, '\\'))
-   die(_("attr spec values must not 
contain backslashes"));
+   am->value = attr_value_unescape(v);
}
break;
}
@@ -166,7 +210,7 @@ static void eat_long_magic(struct pathspec_item *item, 
const char *elt,
for (copyfrom = elt + 2;
 *copyfrom && *copyfrom != ')';
 copyfrom = nextat) {
-   size_t len = strcspn(copyfrom, ",)");
+   size_t len = strcspn_escaped(copyfrom, ",)");
if (copyfrom[len] 

Re: What's cooking in git.git (Jun 2016, #01; Thu, 2)

2016-06-02 Thread Mike Hommey
On Thu, Jun 02, 2016 at 03:52:41PM -0700, Junio C Hamano wrote:
> * mh/connect (2016-06-01) 9 commits
>  - connect: move ssh command line preparation to a separate function
>  - connect: actively reject git:// urls with a user part
>  - connect: change the --diag-url output to separate user and host
>  - connect: make parse_connect_url() return the user part of the url as a 
> separate value
>  - connect: group CONNECT_DIAG_URL handling code
>  - connect: make parse_connect_url() return separated host and port
>  - connect: re-derive a host:port string from the separate host and port 
> variables
>  - connect: call get_host_and_port() earlier
>  - connect: document why we sometimes call get_port after get_host_and_port
> 
>  Needs review.

I /think/ Torsten reviewed it all, and his last comments are in
$gmane/295800. It's still not clear to me why he wants to remove the
comment about [].

Mike
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] rev-list: disable bitmaps when "-n" is used with listing objects

2016-06-02 Thread Jeff King
You can ask rev-list to use bitmaps to speed up an --objects
traversal, which should generally give you your answers much
faster.

Likewise, you can ask rev-list to limit such a traversal
with `-n`, in which case we'll show only a limited set of
commits (and only the tree and commit objects directly
reachable from those commits).

But if you do both together, the results are nonsensical. We
end up limiting any fallback traversal we do to _find_ the
bitmaps, but the actual set of objects we output will be
picked arbitrarily from the union of any bitmaps we do find,
and will involve the objects of many more commits.

It's possible that somebody might want this as a "show me
what you can, but limit the amount of work you do" flag.
But as with the prior commit clamping "--count", the results
are basically non-deterministic; you'll get the values from
some commits between `n` and the total number, and you can't
tell which.

And unlike the `--count` case, we can't easily generate the
"real" value from the bitmap values (you can't just walk
back `-n` commits and subtract out the reachable objects
from the boundary commits; the bitmaps for `X` record its
total reachability, so you don't know which objects are
directly from `X` itself, which from `X^`, and so on).

So let's just fallback to the non-bitmap code path in this
case, so we always give a sane answer.

Signed-off-by: Jeff King 
---
 builtin/rev-list.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index aaa79a3..9fb6608 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -365,7 +365,8 @@ int cmd_rev_list(int argc, const char **argv, const char 
*prefix)
printf("%d\n", commit_count);
return 0;
}
-   } else if (revs.tag_objects && revs.tree_objects && 
revs.blob_objects) {
+   } else if (!revs.max_count &&
+  revs.tag_objects && revs.tree_objects && 
revs.blob_objects) {
if (!prepare_bitmap_walk()) {
traverse_bitmap_commit_list(_object_fast);
return 0;
-- 
2.9.0.rc1.132.g33c7f30
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] rev-list: "adjust" results of "--count --use-bitmap-index -n"

2016-06-02 Thread Jeff King
If you ask rev-list for:

git rev-list --count --use-bitmap-index HEAD

we optimize out the actual traversal and just give you the
number of bits set in the commit bitmap. This is faster,
which is good.

But if you ask to limit the size of the traversal, like:

git rev-list --count --use-bitmap-index -n 100 HEAD

we'll still output the full bitmapped number we found. On
the surface, that might even seem OK. You explicitly asked
to use the bitmap index, and it was cheap to compute the
real answer, so we gave it to you.

But there's something much more complicated going on under
the hood. If we don't have a bitmap directly for HEAD, then
we have to actually traverse backwards, looking for a
bitmapped commit. And _that_ traversal is bounded by our
`-n` count.

This is a good thing, because it bounds the work we have to
do, which is probably what the user wanted by asking for
`-n`. But now it makes the output quite confusing. You might
get many values:

  - your `-n` value, if we walked back and never found a
bitmap (or fewer if there weren't that many commits)

  - the actual full count, if we found a bitmap root for
every path of our traversal with in the `-n` limit

  - any number in between! We might have walked back and
found _some_ bitmaps, but then cut off the traversal
early with some commits not accounted for in the result.

So you cannot even see a value higher than your `-n` and say
"OK, bitmaps kicked in, this must be the real full count".
The only sane thing is for git to just clamp the value to a
maximum of the `-n` value, which means we should output the
exact same results whether bitmaps are in use or not.

The test in t5310 demonstrates this by using `-n 1`.
Without this patch we fail in the full-bitmap case (where we
do not have to traverse at all) but _not_ in the
partial-bitmap case (where we have to walk down to find an
actual bitmap). With this patch, both cases just work.

I didn't implement the crazy in-between case, just because
it's complicated to set up, and is really a subset of the
full-count case, which we do cover.

Signed-off-by: Jeff King 
---
 builtin/rev-list.c  | 2 ++
 t/t5310-pack-bitmaps.sh | 6 ++
 2 files changed, 8 insertions(+)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 275da0d..aaa79a3 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -360,6 +360,8 @@ int cmd_rev_list(int argc, const char **argv, const char 
*prefix)
uint32_t commit_count;
if (!prepare_bitmap_walk()) {
count_bitmap_commit_list(_count, NULL, 
NULL, NULL);
+   if (revs.max_count && revs.max_count < 
commit_count)
+   commit_count = revs.max_count;
printf("%d\n", commit_count);
return 0;
}
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index d446706..3893afd 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -47,6 +47,12 @@ rev_list_tests() {
test_cmp expect actual
'
 
+   test_expect_success "counting commits with limit ($state)" '
+   git rev-list --count -n 1 HEAD >expect &&
+   git rev-list --use-bitmap-index --count -n 1 HEAD >actual &&
+   test_cmp expect actual
+   '
+
test_expect_success "counting non-linear history ($state)" '
git rev-list --count other...master >expect &&
git rev-list --use-bitmap-index --count other...master >actual 
&&
-- 
2.9.0.rc1.132.g33c7f30

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] corner cases with "rev-list --use-bitmap-index -n"

2016-06-02 Thread Jeff King
This series fixes two corner-cases I found when using "-n" along with
bitmaps. The results do make _some_ sense if you interpret them
correctly, but are sufficiently confusing that I think it's worth
dealing with. See the commit messages for the gory details.

The good news is that in the first case, we can produce a more sensible
answer without spending any extra work.

The bad news is that the second case cannot, and must fall back to a
regular traversal. I doubt anybody even cares about this case in
practice, though, as "--objects -n" is somewhat nonsensical already (I
didn't run across it in practice; I only noticed while I fixing the
first one, which I did see in practice).

  [1/2]: rev-list: "adjust" results of "--count --use-bitmap-index -n"
  [2/2]: rev-list: disable bitmaps when "-n" is used with listing objects

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] userdiff: add built-in pattern for CSS

2016-06-02 Thread Junio C Hamano
William Duclot  writes:

> diff --git a/userdiff.c b/userdiff.c
> index 6bf2505..00fc3bf 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -148,6 +148,18 @@ PATTERNS("csharp",
>"[a-zA-Z_][a-zA-Z0-9_]*"
>"|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>"|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
> +IPATTERN("css",
> +  "!^.*;\n"
> +  "^[_a-z0-9].*$",
> +  /* -- */
> +  /*
> +   * This regex comes from W3C CSS specs. Should theoretically also
> +   * allow ISO 10646 characters U+00A0 and higher,
> +   * but they are not handled in this regex.
> +   */
> +  "-?[_a-zA-F][-_a-zA-F0-9]*" /* identifiers */
> +  "|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */

You could now lose A-F from the above two lines.  In fact, the first
line "identifiers" has "a-zA-F", which probably would work correctly
under IPATTERN() to include 'G-Z' as part of the legit letters for
identifiers, but is indeed misleading.

> +),
>  { "default", NULL, -1, { NULL, 0 } },
>  };
>  #undef PATTERNS
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] pathspec: allow escaped query values

2016-06-02 Thread Junio C Hamano
Stefan Beller  writes:

> Thinking about efficiency, I have the believe that memmove can be faster
> than a `*src=*dst` thing we do ourselves as it may have access to specialized
> assembly instructions to move larger chunks of memory or such.

> So I think ideally we would do a block copy between the escape characters
> (sketched as:)
>
> last = input
> while input not ended:
> current = find next escape character in input
> memcpy from input value in the range of last to current
> last = current + 1
> copy remaining parts if no further escape is found

That would be true _only_ when "find next escape and copy up to that
byte" aka "scanning once with optimized strchr(), and copying once
with optimized memmove()" is faster than "scanning once and copying"
loop.

I was merely reacting to your use of memmove() in a very different
loop, where if you unescape "a\b\c\defghijk", your memmove() would
move "efghijk" many times.

>> ret = xmalloc(strlen(value));
>
> xmallocz at least?

Yes.  Also the handling of the terminating NUL may need to be
updated.  That is why I did not say "replace yours with this", but
merely "along the lines of this" ;-)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] pathspec: allow escaped query values

2016-06-02 Thread Stefan Beller
On Thu, Jun 2, 2016 at 2:54 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> In our own .gitattributes file we have attributes such as:
>>
>> *.[ch] whitespace=indent,trail,space
>>
>> When querying for attributes we want to be able to ask for the exact
>> value, i.e.
>>
>> git ls-files :(attr:whitespace=indent,trail,space)
>>
>> should work, but the commas are used in the attr magic to introduce
>> the next attr, ...
>> ...
>> So here is the "escaping only, but escaping done right" version.
>> (It goes on top of sb/pathspec-label)
>
> The phrase "should work" is probably a bit too strong (I'd have said
> "it would be nice if this worked"), as we do not have to even
> support comma for our immediately expected use cases.  Allowing it
> merely makes a casual test using our own .gitattributes easier.
>
>> +static size_t strcspn_escaped(const char *s, const char *reject)
>
> Perhaps s/reject/stop/?
>
>> +{
>> + const char *i, *j;
>> +
>> + for (i = s; *i; i++) {
>> + /* skip escaped the character */
>> + if (i[0] == '\\' && i[1]) {
>> + i++;
>> + continue;
>> + }
>> + /* see if any of the chars matches the current character */
>> + for (j = reject; *j; j++)
>> + if (!*i || *i == *j)
>> + return i - s;
>
> I somehow doubt that *i can be NUL here.  In any case, this looks
> more like
>
> /* is *i is one of the stop codon? */
> if (strchr(stop, *i))
> break;
>
>> + }
>> + return i - s;
>> +}
>
>> +static char *attr_value_unescape(const char *value)
>> +{
>> + char *i, *ret = xstrdup(value);
>> +
>> + for (i = ret; *i; i++) {
>> + if (i[0] == '\\') {
>> + if (!i[1])
>> + die(_("Escape character '\\' not allowed as "
>> +   "last character in attr value"));
>> +
>> + /* remove the backslash */
>> + memmove(i, i + 1, strlen(i));
>> + /* and ignore the character after that */
>> + i++;
>> + }
>> + }
>> + return ret;
>> +}
>> +
>
> Repeated memmove() and strlen() somehow bothers me.  Would there be
> a more efficient and straight-forward way to do this, perhaps along
> the lines of this instead?

Thinking about efficiency, I have the believe that memmove can be faster
than a `*src=*dst` thing we do ourselves as it may have access to specialized
assembly instructions to move larger chunks of memory or such.

So I think ideally we would do a block copy between the escape characters
(sketched as:)

last = input
while input not ended:
current = find next escape character in input
memcpy from input value in the range of last to current
last = current + 1
copy remaining parts if no further escape is found

It doesn't seem worth the effort to get it right though.

>
> const char *src;
> char *dst, *ret;
>
> ret = xmalloc(strlen(value));

xmallocz at least?

> for (src = value, dst = ret; *src; src++, dst++) {
> if (*src == '\\') {
> if (!src[1])
> die();
> src++;
> }
> if (*src && invalid_value_char(*src))
> die("cannot use '%c' for value matching", *src)
> *dst = *src;
> }
> *dst = '\0'
> return ret;
>

Thanks,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


What's cooking in git.git (Jun 2016, #01; Thu, 2)

2016-06-02 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

2.9-rc2 is scheduled for early next week.  A handful low-impact
topics are expected to be in 'master' by then.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[New Topics]

* et/add-chmod-x (2016-06-01) 2 commits
 - SQUASH??? add: add --chmod=+x / --chmod=-x options
 - add: add --chmod=+x / --chmod=-x options

 Waiting for a reroll (or ack for proposed fixup).


* jc/t2300-setup (2016-06-01) 1 commit
 - t2300: run git-sh-setup in an environment that better mimics the real life

 A test fix.

 Will merge to 'next'.


* jk/shell-portability (2016-06-01) 2 commits
 - t5500 & t7403: lose bash-ism "local"
 - test-lib: add in-shell "env" replacement

 test fixes.

 Will merge to 'next'.


* mh/connect (2016-06-01) 9 commits
 - connect: move ssh command line preparation to a separate function
 - connect: actively reject git:// urls with a user part
 - connect: change the --diag-url output to separate user and host
 - connect: make parse_connect_url() return the user part of the url as a 
separate value
 - connect: group CONNECT_DIAG_URL handling code
 - connect: make parse_connect_url() return separated host and port
 - connect: re-derive a host:port string from the separate host and port 
variables
 - connect: call get_host_and_port() earlier
 - connect: document why we sometimes call get_port after get_host_and_port

 Needs review.


* sb/submodule-helper-list-signal-unmatch-via-exit-status (2016-06-01) 1 commit
  (merged to 'next' on 2016-06-02 at 29064a2)
 + submodule--helper: offer a consistent API
 (this branch is used by sb/submodule-helper-relative-path.)

 The way how "submodule--helper list" signals unmatch error to its
 callers has been updated.

 Will merge to 'master'.


* sb/submodule-helper-relative-path (2016-06-01) 1 commit
  (merged to 'next' on 2016-06-02 at 8a191e1)
 + submodule: remove bashism from shell script
 (this branch uses sb/submodule-helper-list-signal-unmatch-via-exit-status.)

 A bash-ism "local" has been removed from "git submodule" scripted
 Porcelain.

 Will merge to 'master'.


* mr/send-email-doc-gmail-2fa (2016-06-01) 1 commit
  (merged to 'next' on 2016-06-02 at cd2ade8)
 + Documentation/git-send-email: fix typo in gmail 2FA section

 Typofix.

 Will merge to 'master'.


* aq/upload-pack-use-parse-options (2016-05-31) 1 commit
 - upload-pack.c: use parse-options API

 "git upload-pack" command has been updated to use the parse-options
 API.

 Will merge to 'next'.


* jc/clear-pathspec (2016-06-02) 1 commit
 - pathspec: rename free_pathspec() to clear_pathspec()

 We usually call a function that clears the contents a data
 structure X without freeing the structure itself clear_X(), and
 call a function that does clear_X() and also frees it free_X().
 free_pathspec() function has been renamed to clear_pathspec()
 to avoid confusion.

 Will merge to 'next'.

--
[Stalled]

* sb/bisect (2016-04-15) 22 commits
 - SQUASH???
 - bisect: get back halfway shortcut
 - bisect: compute best bisection in compute_relevant_weights()
 - bisect: use a bottom-up traversal to find relevant weights
 - bisect: prepare for different algorithms based on find_all
 - bisect: rename count_distance() to compute_weight()
 - bisect: make total number of commits global
 - bisect: introduce distance_direction()
 - bisect: extract get_distance() function from code duplication
 - bisect: use commit instead of commit list as arguments when appropriate
 - bisect: replace clear_distance() by unique markers
 - bisect: use struct node_data array instead of int array
 - bisect: get rid of recursion in count_distance()
 - bisect: make algorithm behavior independent of DEBUG_BISECT
 - bisect: make bisect compile if DEBUG_BISECT is set
 - bisect: plug the biggest memory leak
 - bisect: add test for the bisect algorithm
 - t6030: generalize test to not rely on current implementation
 - t: use test_cmp_rev() where appropriate
 - t/test-lib-functions.sh: generalize test_cmp_rev
 - bisect: allow 'bisect run' if no good commit is known
 - bisect: write about `bisect next` in documentation

 The internal algorithm used in "git bisect" to find the next commit
 to check has been optimized greatly.

 Expecting a reroll.
 ($gmane/291163)


* nd/shallow-deepen (2016-04-13) 26 commits
 - fetch, upload-pack: --deepen=N extends shallow boundary by N commits
 - upload-pack: add get_reachable_list()
 - upload-pack: split check_unreachable() in two, prep for get_reachable_list()
 - t5500, t5539: tests for shallow depth excluding a ref
 - clone: define shallow 

[PATCH] userdiff: add built-in pattern for CSS

2016-06-02 Thread William Duclot
CSS is widely used, motivating it being included as a built-in pattern.

It must be noted that the word_regex for CSS (i.e. the regex defining
what is a word in the language) does not consider '.' and '#' characters
(in CSS selectors) to be part of the word. This behavior is documented
by the test t/t4018/css-rule.
The logic behind this behavior is the following: identifiers in CSS
selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#'
character are not part of the identifier, but an indicator of the nature
of the identifier in HTML/XML (class or id). Diffing ".class1" and
".class2" must show that the class name is changed, but we still are
selecting a class.

Logic behind the "pattern" regex is:
1. reject lines containing a colon (properties)
2. if a line begins with a name in column 1, pick the whole line

Credits to Johannes Sixt (j...@kdbg.org) for the pattern regex and most
of the tests.

Signed-off-by: William Duclot 
Signed-off-by: Matthieu Moy 
---
Changes since V2:
- pattern regex has changed
- more tests

 Documentation/gitattributes.txt |  2 ++
 t/t4018-diff-funcname.sh|  1 +
 t/t4018/css-brace-in-col-1  |  5 +
 t/t4018/css-common  |  4 
 t/t4018/css-long-selector-list  |  6 ++
 t/t4018/css-prop-sans-indent|  5 +
 t/t4018/css-rule|  4 
 t/t4018/css-short-selector-list |  4 
 t/t4034-diff-words.sh   |  1 +
 t/t4034/css/expect  | 16 
 t/t4034/css/post| 10 ++
 t/t4034/css/pre | 10 ++
 userdiff.c  | 12 
 13 files changed, 80 insertions(+)
 create mode 100644 t/t4018/css-brace-in-col-1
 create mode 100644 t/t4018/css-common
 create mode 100644 t/t4018/css-long-selector-list
 create mode 100644 t/t4018/css-prop-sans-indent
 create mode 100644 t/t4018/css-rule
 create mode 100644 t/t4018/css-short-selector-list
 create mode 100644 t/t4034/css/expect
 create mode 100644 t/t4034/css/post
 create mode 100644 t/t4034/css/pre

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e3b1de8..81f60ad 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -525,6 +525,8 @@ patterns are available:
 
 - `csharp` suitable for source code in the C# language.
 
+- `css` suitable for source code in the CSS language.
+
 - `fortran` suitable for source code in the Fortran language.
 
 - `fountain` suitable for Fountain documents.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 67373dc..1795ffc 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -30,6 +30,7 @@ diffpatterns="
bibtex
cpp
csharp
+   css
fortran
fountain
html
diff --git a/t/t4018/css-brace-in-col-1 b/t/t4018/css-brace-in-col-1
new file mode 100644
index 000..7831577
--- /dev/null
+++ b/t/t4018/css-brace-in-col-1
@@ -0,0 +1,5 @@
+RIGHT label.control-label
+{
+margin-top: 10px!important;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-common b/t/t4018/css-common
new file mode 100644
index 000..84ed754
--- /dev/null
+++ b/t/t4018/css-common
@@ -0,0 +1,4 @@
+RIGHT label.control-label {
+margin-top: 10px!important;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-long-selector-list b/t/t4018/css-long-selector-list
new file mode 100644
index 000..7ccd25d
--- /dev/null
+++ b/t/t4018/css-long-selector-list
@@ -0,0 +1,6 @@
+p.header,
+label.control-label,
+div ul#RIGHT {
+margin-top: 10px!important;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-prop-sans-indent b/t/t4018/css-prop-sans-indent
new file mode 100644
index 000..a9e3c86
--- /dev/null
+++ b/t/t4018/css-prop-sans-indent
@@ -0,0 +1,5 @@
+RIGHT, label.control-label {
+margin-top: 10px!important;
+padding: 0;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-rule b/t/t4018/css-rule
new file mode 100644
index 000..84ed754
--- /dev/null
+++ b/t/t4018/css-rule
@@ -0,0 +1,4 @@
+RIGHT label.control-label {
+margin-top: 10px!important;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-short-selector-list b/t/t4018/css-short-selector-list
new file mode 100644
index 000..6a0bdee
--- /dev/null
+++ b/t/t4018/css-short-selector-list
@@ -0,0 +1,4 @@
+label.control, div ul#RIGHT {
+margin-top: 10px!important;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index f2f55fc..912df91 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -302,6 +302,7 @@ test_language_driver ada
 test_language_driver bibtex
 test_language_driver cpp
 test_language_driver csharp
+test_language_driver css
 test_language_driver fortran
 test_language_driver html
 test_language_driver java
diff --git a/t/t4034/css/expect 

[PATCH] t1308: do not get fooled by symbolic links to the source tree

2016-06-02 Thread Junio C Hamano
When your $PWD does not match $(/bin/pwd), e.g. you have your copy
of the git source tree in one place, point it with a symbolic link,
and then "cd" to that symbolic link before running 'make test', one
of the tests in t1308 expects that the per-user configuration was
reported to have been read from the true path (i.e. relative to the
target of such a symbolic link), but the test-config program reports
a path relative to $PWD (i.e. the symbolic link).

Instead, expect a path relative to $HOME (aka $TRASH_DIRECTORY), as
per-user configuration is read from $HOME/.gitconfig and the test
framework sets these shell variables up in such a way to avoid this
problem.

Signed-off-by: Junio C Hamano 
---

> Yeah, I think it is the same issue. I think the most accurate value
> there would probably be $HOME, though.

Thanks.

 t/t1308-config-set.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t1308-config-set.sh b/t/t1308-config-set.sh
index 065d5eb..cf716b4 100755
--- a/t/t1308-config-set.sh
+++ b/t/t1308-config-set.sh
@@ -237,7 +237,7 @@ test_expect_success 'iteration shows correct origins' '
key=foo.bar
value=from-home
origin=file
-   name=$(pwd)/.gitconfig
+   name=$HOME/.gitconfig
scope=global
 
key=foo.bar
-- 
2.9.0-rc1-228-gd00d833
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] pathspec: allow escaped query values

2016-06-02 Thread Stefan Beller
On Thu, Jun 2, 2016 at 2:54 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> In our own .gitattributes file we have attributes such as:
>>
>> *.[ch] whitespace=indent,trail,space
>>
>> When querying for attributes we want to be able to ask for the exact
>> value, i.e.
>>
>> git ls-files :(attr:whitespace=indent,trail,space)
>>
>> should work, but the commas are used in the attr magic to introduce
>> the next attr, ...
>> ...
>> So here is the "escaping only, but escaping done right" version.
>> (It goes on top of sb/pathspec-label)
>
> The phrase "should work" is probably a bit too strong (I'd have said
> "it would be nice if this worked"), as we do not have to even
> support comma for our immediately expected use cases.  Allowing it
> merely makes a casual test using our own .gitattributes easier.
>
>> +static size_t strcspn_escaped(const char *s, const char *reject)
>
> Perhaps s/reject/stop/?

I took the signature from the strcspn man page, and my version of the
man page has `reject` there, `stop` certainly works too

>
>> +{
>> + const char *i, *j;
>> +
>> + for (i = s; *i; i++) {
>> + /* skip escaped the character */
>> + if (i[0] == '\\' && i[1]) {
>> + i++;
>> + continue;
>> + }
>> + /* see if any of the chars matches the current character */
>> + for (j = reject; *j; j++)
>> + if (!*i || *i == *j)
>> + return i - s;
>
> I somehow doubt that *i can be NUL here.  In any case, this looks
> more like
>
> /* is *i is one of the stop codon? */
> if (strchr(stop, *i))
> break;

right, that seems easier.


>
>> + }
>> + return i - s;
>> +}
>
>> +static char *attr_value_unescape(const char *value)
>> +{
>> + char *i, *ret = xstrdup(value);
>> +
>> + for (i = ret; *i; i++) {
>> + if (i[0] == '\\') {
>> + if (!i[1])
>> + die(_("Escape character '\\' not allowed as "
>> +   "last character in attr value"));
>> +
>> + /* remove the backslash */
>> + memmove(i, i + 1, strlen(i));
>> + /* and ignore the character after that */
>> + i++;
>> + }
>> + }
>> + return ret;
>> +}
>> +
>
> Repeated memmove() and strlen() somehow bothers me.  Would there be
> a more efficient and straight-forward way to do this, perhaps along
> the lines of this instead?
>
> const char *src;
> char *dst, *ret;
>
> ret = xmalloc(strlen(value));
> for (src = value, dst = ret; *src; src++, dst++) {
> if (*src == '\\') {
> if (!src[1])
> die();
> src++;
> }
> if (*src && invalid_value_char(*src))
> die("cannot use '%c' for value matching", *src)
> *dst = *src;
> }
> *dst = '\0'
> return ret;
>
> Even though I earlier said "Now we have an unquote mechanism, we can
> open the floodgate by lifting the "no backslash in value" check, I
> now realize that we do want to keep some escape hatch for us to
> extend the quoting syntax even more later, so perhaps with something
> like this:
>
> static inline int invalid_value_char(const char ch)
> {
> if (isalnum(ch) || strchr(",-_", ch))
> return 0;
> return -1;
> }

Makes sense.

Later on we could have : or ; for an and/or of the values and
parens and such, so the invalid_value_char makes sense.

>
> Thanks.

Thanks,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pathspec: rename free_pathspec() to clear_pathspec()

2016-06-02 Thread Junio C Hamano
Jeff King  writes:

> I think diff_filespec_clear() would not be quite right. It is freeing
> only the allocated _data_, but leaving the other portions intact.

You are (as usual) more right than I am ;-)

Yes, the free_data() is designed to retain enough information about
the filespec so that the data can be re-read by the next person who
needs to access it after free_data() is called; i.e. it was a
measure to reduce the memory pressure by trading it off with I/O
cost.  It is wrong to name it "clear", which is to "clear the slate,
make it into pristine state" whose side effect is to release held
resources.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Mark remote `gc --auto` error messages

2016-06-02 Thread Junio C Hamano
Jeff King  writes:

> So the greater question is not "should this output be marked" but
> "should auto-gc data go over the sideband so that all clients see it
> (and any server-side stderr does not)". And I think the answer is
> probably yes. And that fixes the "remote: " thing as a side effect.

Thanks for stating this a lot more clearly than I could, and I agree
that sending this to the other side regardless of the protocol is
the right thing.  I somehow doubt that server operators would check
Apache logs to decide when to do a proper GC, so I do not consider
it a true loss ;-)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] pathspec: allow escaped query values

2016-06-02 Thread Junio C Hamano
Stefan Beller  writes:

> In our own .gitattributes file we have attributes such as:
>
> *.[ch] whitespace=indent,trail,space
>
> When querying for attributes we want to be able to ask for the exact
> value, i.e.
>
> git ls-files :(attr:whitespace=indent,trail,space)
>
> should work, but the commas are used in the attr magic to introduce
> the next attr, ...
> ...
> So here is the "escaping only, but escaping done right" version.
> (It goes on top of sb/pathspec-label)

The phrase "should work" is probably a bit too strong (I'd have said
"it would be nice if this worked"), as we do not have to even
support comma for our immediately expected use cases.  Allowing it
merely makes a casual test using our own .gitattributes easier.

> +static size_t strcspn_escaped(const char *s, const char *reject)

Perhaps s/reject/stop/?

> +{
> + const char *i, *j;
> +
> + for (i = s; *i; i++) {
> + /* skip escaped the character */
> + if (i[0] == '\\' && i[1]) {
> + i++;
> + continue;
> + }
> + /* see if any of the chars matches the current character */
> + for (j = reject; *j; j++)
> + if (!*i || *i == *j)
> + return i - s;

I somehow doubt that *i can be NUL here.  In any case, this looks
more like

/* is *i is one of the stop codon? */
if (strchr(stop, *i))
break;

> + }
> + return i - s;
> +}

> +static char *attr_value_unescape(const char *value)
> +{
> + char *i, *ret = xstrdup(value);
> +
> + for (i = ret; *i; i++) {
> + if (i[0] == '\\') {
> + if (!i[1])
> + die(_("Escape character '\\' not allowed as "
> +   "last character in attr value"));
> +
> + /* remove the backslash */
> + memmove(i, i + 1, strlen(i));
> + /* and ignore the character after that */
> + i++;
> + }
> + }
> + return ret;
> +}
> +

Repeated memmove() and strlen() somehow bothers me.  Would there be
a more efficient and straight-forward way to do this, perhaps along
the lines of this instead?

const char *src;
char *dst, *ret;

ret = xmalloc(strlen(value));
for (src = value, dst = ret; *src; src++, dst++) {
if (*src == '\\') {
if (!src[1])
die();
src++;
}
if (*src && invalid_value_char(*src))
die("cannot use '%c' for value matching", *src)
*dst = *src;
}
*dst = '\0'
return ret;

Even though I earlier said "Now we have an unquote mechanism, we can
open the floodgate by lifting the "no backslash in value" check, I
now realize that we do want to keep some escape hatch for us to
extend the quoting syntax even more later, so perhaps with something
like this:

static inline int invalid_value_char(const char ch)
{
if (isalnum(ch) || strchr(",-_", ch))
return 0;
return -1;
}

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Mark remote `gc --auto` error messages

2016-06-02 Thread Jeff King
On Thu, Jun 02, 2016 at 01:14:02PM -0700, Junio C Hamano wrote:

> > What exactly are you referring to (you only quoted the introduction)?
> > Do you think we should fix the git-gc issue but keep the general
> > behavior of printing messages unaltered? Do you think it would be
> > worthwhile to make server messages distinguishable in general?
> 
> The latter, which I think was what your implementation was attempting to do
> if I read it correctly.

And btw, I don't think this patch fixes the general case. E.g., if
receive-pack hits any of its die("BUG") lines, they will not be
prefixed. Most clients wouldn't see them, but ssh ones would.

To fix that you'd have to do a whole async process wrapping
`receive-pack` that just reads its stdout and stderr and muxes it back
over the sideband. But I can think of two roadblocks there:

  - I think the original design of receive-pack was _not_ to share all
of stderr with the user, because it might contain secret-ish
server-side things. That's why we have rp_error() which copies to
the sideband.

I don't know how useful that is in practice. We copy the stderr
wholesale from sub-processes like index-pack, so things like file
paths are likely to get leaked there.

  - the implementation is a bit tricky, because the die() will take
down the mux thread, too.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Mark remote `gc --auto` error messages

2016-06-02 Thread Jeff King
On Thu, Jun 02, 2016 at 01:14:02PM -0700, Junio C Hamano wrote:

> On Thu, Jun 2, 2016 at 1:06 PM, Lukas Fleischer  wrote:
> > On Thu, 02 Jun 2016 at 21:33:33, Junio C Hamano wrote:
> >> Lukas Fleischer  writes:
> >>
> >> > When running `git push`, it might occur that error messages are
> >> > transferred from the server to the client. While most messages (those
> >> > explicitly sent on sideband 2) are prefixed with "remote:", it seems
> >> > that error messages printed during the automatic householding performed
> >> > by git-gc(1) are displayed without any additional decoration. Thus, such
> >> > messages can easily be misinterpreted as git-gc failing locally, see [1]
> >> > for an actual example of where that happened.
> >>
> >> Sounds like a sensible goal to me.
> >
> > What exactly are you referring to (you only quoted the introduction)?
> > Do you think we should fix the git-gc issue but keep the general
> > behavior of printing messages unaltered? Do you think it would be
> > worthwhile to make server messages distinguishable in general?
> 
> The latter, which I think was what your implementation was attempting to do
> if I read it correctly.

I think the implementation is doing much more, but it is probably a good
thing.

Right now we do not send auto-gc output over the sideband, and its
stderr goes to receive-pack's stderr. But that is a different place for
different protocols. For git-over-https, it is probably apache's error
log, or /dev/null if the server admin configured it. For ssh, it may be
back over the ssh stderr channel, or it may go to a log or nowhere if
the server admin intercepts receive-pack and redirects it.

So the greater question is not "should this output be marked" but
"should auto-gc data go over the sideband so that all clients see it
(and any server-side stderr does not)". And I think the answer is
probably yes. And that fixes the "remote: " thing as a side effect.

If it were no, then this is not the right solution, and the solution is
to swap out copy_to_sideband() for something that copies to stderr with
"remote: " prepended, or something.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (May 2016, #09; Tue, 31) t1308 broken

2016-06-02 Thread Jeff King
On Thu, Jun 02, 2016 at 02:31:50PM -0700, Junio C Hamano wrote:

> Perhaps like this, taking hint from the log message of 6eafa6d0
> (submodules: don't stumble over symbolic links when cloning
> recursively, 2012-07-12)?
> 
>  t/t1308-config-set.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/t/t1308-config-set.sh b/t/t1308-config-set.sh
> index 065d5eb..1ba9ecb 100755
> --- a/t/t1308-config-set.sh
> +++ b/t/t1308-config-set.sh
> @@ -237,7 +237,7 @@ test_expect_success 'iteration shows correct origins' '
>   key=foo.bar
>   value=from-home
>   origin=file
> - name=$(pwd)/.gitconfig
> + name=$TRASH_DIRECTORY/.gitconfig
>   scope=global
>  
>   key=foo.bar

Yeah, I think it is the same issue. I think the most accurate value
there would probably be $HOME, though.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (May 2016, #09; Tue, 31) t1308 broken

2016-06-02 Thread Junio C Hamano
Junio C Hamano  writes:

> Torsten Bögershausen  writes:
>
>> It seams as  ./t1308-config-set.sh is broken,
>> when the the directory is a soft link:
>>
>> -name=/home/tb/NoBackup/projects/git/git.pu/t/trash
>> directory.t1308-config-set/.gitconfig
>> +name=/home/tb/projects/git/git.pu/t/trash 
>> directory.t1308-config-set/.gitconfig
>>  scope=global
>
> It does seem that way.  Somebody is affected by $PWD when we should
> be consistently using the physical / real path.
>
>>
>>  key=foo.bar
>> not ok 28 - iteration shows correct origins
>>
>> I havent't digged further, too many conflicts in the config code, may be
>> somebody knows it directly ?

Perhaps like this, taking hint from the log message of 6eafa6d0
(submodules: don't stumble over symbolic links when cloning
recursively, 2012-07-12)?

 t/t1308-config-set.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t1308-config-set.sh b/t/t1308-config-set.sh
index 065d5eb..1ba9ecb 100755
--- a/t/t1308-config-set.sh
+++ b/t/t1308-config-set.sh
@@ -237,7 +237,7 @@ test_expect_success 'iteration shows correct origins' '
key=foo.bar
value=from-home
origin=file
-   name=$(pwd)/.gitconfig
+   name=$TRASH_DIRECTORY/.gitconfig
scope=global
 
key=foo.bar
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2] pathspec: allow escaped query values

2016-06-02 Thread Stefan Beller
In our own .gitattributes file we have attributes such as:

*.[ch] whitespace=indent,trail,space

When querying for attributes we want to be able to ask for the exact
value, i.e.

git ls-files :(attr:whitespace=indent,trail,space)

should work, but the commas are used in the attr magic to introduce
the next attr, such that this query currently fails with

fatal: Invalid pathspec magic 'trail' in ':(attr:whitespace=indent,trail,space)'

This change allows escaping characters by a backslash, such that the query

git ls-files :(attr:whitespace=indent\,trail\,space)

will match all path that have the value "indent,trail,space" for the
whitespace attribute. To accomplish this, we need to modify two places.
First `eat_long_magic` needs to not stop early upon seeing a comma or
closing paren that is escaped. As a second step we need to remove any
escaping from the attr value.

Signed-off-by: Stefan Beller 
---

So here is the "escaping only, but escaping done right" version.
(It goes on top of sb/pathspec-label)

Thanks,
Stefan

 pathspec.c  | 44 +
 t/t6134-pathspec-with-labels.sh | 23 +
 2 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/pathspec.c b/pathspec.c
index 326863a..45bd775 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -89,6 +89,43 @@ static void prefix_short_magic(struct strbuf *sb, int 
prefixlen,
strbuf_addf(sb, ",prefix:%d)", prefixlen);
 }
 
+static size_t strcspn_escaped(const char *s, const char *reject)
+{
+   const char *i, *j;
+
+   for (i = s; *i; i++) {
+   /* skip escaped the character */
+   if (i[0] == '\\' && i[1]) {
+   i++;
+   continue;
+   }
+   /* see if any of the chars matches the current character */
+   for (j = reject; *j; j++)
+   if (!*i || *i == *j)
+   return i - s;
+   }
+   return i - s;
+}
+
+static char *attr_value_unescape(const char *value)
+{
+   char *i, *ret = xstrdup(value);
+
+   for (i = ret; *i; i++) {
+   if (i[0] == '\\') {
+   if (!i[1])
+   die(_("Escape character '\\' not allowed as "
+ "last character in attr value"));
+
+   /* remove the backslash */
+   memmove(i, i + 1, strlen(i));
+   /* and ignore the character after that */
+   i++;
+   }
+   }
+   return ret;
+}
+
 static void parse_pathspec_attr_match(struct pathspec_item *item, const char 
*value)
 {
struct string_list_item *si;
@@ -131,10 +168,9 @@ static void parse_pathspec_attr_match(struct pathspec_item 
*item, const char *va
if (attr[attr_len] != '=')
am->match_mode = MATCH_SET;
else {
+   const char *v = [attr_len + 1];
am->match_mode = MATCH_VALUE;
-   am->value = xstrdup([attr_len + 1]);
-   if (strchr(am->value, '\\'))
-   die(_("attr spec values must not 
contain backslashes"));
+   am->value = attr_value_unescape(v);
}
break;
}
@@ -166,7 +202,7 @@ static void eat_long_magic(struct pathspec_item *item, 
const char *elt,
for (copyfrom = elt + 2;
 *copyfrom && *copyfrom != ')';
 copyfrom = nextat) {
-   size_t len = strcspn(copyfrom, ",)");
+   size_t len = strcspn_escaped(copyfrom, ",)");
if (copyfrom[len] == ',')
nextat = copyfrom + len + 1;
else
diff --git a/t/t6134-pathspec-with-labels.sh b/t/t6134-pathspec-with-labels.sh
index a5c9632..cbea858 100755
--- a/t/t6134-pathspec-with-labels.sh
+++ b/t/t6134-pathspec-with-labels.sh
@@ -163,4 +163,27 @@ test_expect_success 'abort on asking for wrong magic' '
test_must_fail git ls-files . ":(attr:!label=foo)"
 '
 
+test_expect_success 'check attribute list' '
+   cat <<-EOF >>.gitattributes &&
+   * whitespace=indent,trail,space
+   EOF
+   cat .gitattributes &&
+   git ls-files ":(attr:whitespace=indent\,trail\,space)" >actual &&
+   git ls-files >expect &&
+   test_cmp expect actual
+'
+test_expect_success 'wrong escaping caught' '
+   # Pass one backslash to git to fail with a missing closing paren
+   test_must_fail git ls-files ":(attr:marked-with-backslash=\\)" 2>actual 
&&
+   test_i18ngrep Missing actual
+'
+test_expect_success 'check escaped backslash' '
+   cat <<-EOF >>.gitattributes &&
+   /sub/* marked-with-backslash=\\
+   EOF
+   git 

Re: [PATCH] pathspec: rename free_pathspec() to clear_pathspec()

2016-06-02 Thread Jeff King
On Thu, Jun 02, 2016 at 02:18:47PM -0700, Junio C Hamano wrote:

> The function takes a pointer to a pathspec structure, and releases
> the resources held by it, but does not free() the structure itself.
> Such a function should be called "clear", not "free".

Hmm, makes sense, though...

>  * This is just something I noticed.  Among the hits in
> 
> $ git grep free_ \*.h
> 
>I think free_notes() is also a candidate for such renaming, but
>because we are not actively working on that subsystem, we may
>want to leave that dog sleeping to avoid unnecessary code churn.
>The same for diff_free_filespec_data(), for which a better name
>would have been diff_filespec_clear().

I think diff_filespec_clear() would not be quite right. It is freeing
only the allocated _data_, but leaving the other portions intact.
Generally our _clear() functions reset the object back to an initial
state, from which it can be reused. I don't see that as a big problem
because there is an other object for the verb "free" here: "data". We
are just freeing its data, but the rest of the object remains intact and
we may fill in the data again later.

But I think pathspec is in similar boat; it has not been cleared back to
its initial state. But it is in a much _worse_ state than the filespec,
which you can continue to use. It is in a totally broken state where
"nr" does not correspond to the actual number of items, the has_wildcard
flag is bogus, etc.

So I think it would be OK to move it to "clear", but we should probably
also zero the whole thing, too.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (May 2016, #09; Tue, 31) t1308 broken

2016-06-02 Thread Junio C Hamano
Torsten Bögershausen  writes:

> It seams as  ./t1308-config-set.sh is broken,
> when the the directory is a soft link:
>
> -name=/home/tb/NoBackup/projects/git/git.pu/t/trash
> directory.t1308-config-set/.gitconfig
> +name=/home/tb/projects/git/git.pu/t/trash 
> directory.t1308-config-set/.gitconfig
>  scope=global

It does seem that way.  Somebody is affected by $PWD when we should
be consistently using the physical / real path.

>
>  key=foo.bar
> not ok 28 - iteration shows correct origins
>
> I havent't digged further, too many conflicts in the config code, may be
> somebody knows it directly ?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] pathspec: rename free_pathspec() to clear_pathspec()

2016-06-02 Thread Junio C Hamano
The function takes a pointer to a pathspec structure, and releases
the resources held by it, but does not free() the structure itself.
Such a function should be called "clear", not "free".

Signed-off-by: Junio C Hamano 
---

 * This is just something I noticed.  Among the hits in

$ git grep free_ \*.h

   I think free_notes() is also a candidate for such renaming, but
   because we are not actively working on that subsystem, we may
   want to leave that dog sleeping to avoid unnecessary code churn.
   The same for diff_free_filespec_data(), for which a better name
   would have been diff_filespec_clear().

 archive.c  | 2 +-
 builtin/blame.c| 6 +++---
 builtin/reset.c| 2 +-
 builtin/update-index.c | 2 +-
 combine-diff.c | 2 +-
 notes-merge.c  | 4 ++--
 pathspec.c | 2 +-
 pathspec.h | 4 ++--
 revision.c | 2 +-
 tree-diff.c| 4 ++--
 10 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/archive.c b/archive.c
index 5d735ae..42df974 100644
--- a/archive.c
+++ b/archive.c
@@ -322,7 +322,7 @@ static int path_exists(struct tree *tree, const char *path)
pathspec.recursive = 1;
ret = read_tree_recursive(tree, "", 0, 0, ,
  reject_entry, );
-   free_pathspec();
+   clear_pathspec();
return ret != 0;
 }
 
diff --git a/builtin/blame.c b/builtin/blame.c
index 21f42b0..759d84a 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -609,7 +609,7 @@ static struct origin *find_origin(struct scoreboard *sb,
}
}
diff_flush(_opts);
-   free_pathspec(_opts.pathspec);
+   clear_pathspec(_opts.pathspec);
return porigin;
 }
 
@@ -651,7 +651,7 @@ static struct origin *find_rename(struct scoreboard *sb,
}
}
diff_flush(_opts);
-   free_pathspec(_opts.pathspec);
+   clear_pathspec(_opts.pathspec);
return porigin;
 }
 
@@ -1343,7 +1343,7 @@ static void find_copy_in_parent(struct scoreboard *sb,
} while (unblamed);
target->suspects = reverse_blame(leftover, NULL);
diff_flush(_opts);
-   free_pathspec(_opts.pathspec);
+   clear_pathspec(_opts.pathspec);
 }
 
 /*
diff --git a/builtin/reset.c b/builtin/reset.c
index 092c3a5..acd6278 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -158,7 +158,7 @@ static int read_from_tree(const struct pathspec *pathspec,
return 1;
diffcore_std();
diff_flush();
-   free_pathspec();
+   clear_pathspec();
 
return 0;
 }
diff --git a/builtin/update-index.c b/builtin/update-index.c
index b8b8522..6cdfd5f 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -759,7 +759,7 @@ static int do_reupdate(int ac, const char **av,
if (save_nr != active_nr)
goto redo;
}
-   free_pathspec();
+   clear_pathspec();
return 0;
 }
 
diff --git a/combine-diff.c b/combine-diff.c
index 8f2313d..5920df8 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1525,7 +1525,7 @@ void diff_tree_combined(const unsigned char *sha1,
free(tmp);
}
 
-   free_pathspec();
+   clear_pathspec();
 }
 
 void diff_tree_combined_merge(const struct commit *commit, int dense,
diff --git a/notes-merge.c b/notes-merge.c
index 34bfac0..b7814c9 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -170,7 +170,7 @@ static struct notes_merge_pair *diff_tree_remote(struct 
notes_merge_options *o,
   sha1_to_hex(mp->remote));
}
diff_flush();
-   free_pathspec();
+   clear_pathspec();
 
*num_changes = len;
return changes;
@@ -256,7 +256,7 @@ static void diff_tree_local(struct notes_merge_options *o,
   sha1_to_hex(mp->local));
}
diff_flush();
-   free_pathspec();
+   clear_pathspec();
 }
 
 static void check_notes_merge_worktree(struct notes_merge_options *o)
diff --git a/pathspec.c b/pathspec.c
index c9e9b6c..24e0dd5 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -489,7 +489,7 @@ void copy_pathspec(struct pathspec *dst, const struct 
pathspec *src)
   sizeof(struct pathspec_item) * dst->nr);
 }
 
-void free_pathspec(struct pathspec *pathspec)
+void clear_pathspec(struct pathspec *pathspec)
 {
free(pathspec->items);
pathspec->items = NULL;
diff --git a/pathspec.h b/pathspec.h
index 0c11262..4a80f6f 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -19,7 +19,7 @@
 #define PATHSPEC_ONESTAR 1 /* the pathspec pattern satisfies GFNM_ONESTAR 
*/
 
 struct pathspec {
-   const char **_raw; /* get_pathspec() result, not freed by 
free_pathspec() */
+   const char **_raw; /* get_pathspec() result, not freed by 
clear_pathspec() */
int nr;
unsigned int has_wildcard:1;
unsigned int recursive:1;
@@ -74,7 +74,7 @@ extern void parse_pathspec(struct 

Re: [PATCH/RFC 0/4] Add option to enable filters in git-svn

2016-06-02 Thread Eric Wong
Matteo Bertini  wrote:
> Il 2016-05-31 20:12 Eric Wong ha scritto:
> >Matteo Bertini  wrote:
> >>Sorry to all, but I missed a Checksum mismatch error, I'll have a
> >>look and submit an update.
> 
> I've had a look, the problem is a missing smudge filter.
> Unluckily the small svn revision range i was using was never updating
> a filtered file.
> 
> The code actually uses `cat-file --batch` to get the blobs,
> but the stored blob is not what SVN::TxDelta::apply need.
> What I need is the smudged blob, but cat-file does not supports it.
> 
> I can modify cat-file with a new option to call
> `convert_to_working_tree`,
> and have the filters applied, for example --use-filters, that expects an
> extra field, like $sha\t$path, in the stdin stream.
> 
> I don't like a lot putting an high level feature in a low level command,
> but --textconv seems very similar to this.
> 
> Opinions or other suggestions?

In the absense of other opinions or suggestions, I suggest you
try it and see what others think :)

I'm not that experienced at the C code of this project,
but I can try to follow along as best as I can.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Mark remote `gc --auto` error messages

2016-06-02 Thread Junio C Hamano
On Thu, Jun 2, 2016 at 1:06 PM, Lukas Fleischer  wrote:
> On Thu, 02 Jun 2016 at 21:33:33, Junio C Hamano wrote:
>> Lukas Fleischer  writes:
>>
>> > When running `git push`, it might occur that error messages are
>> > transferred from the server to the client. While most messages (those
>> > explicitly sent on sideband 2) are prefixed with "remote:", it seems
>> > that error messages printed during the automatic householding performed
>> > by git-gc(1) are displayed without any additional decoration. Thus, such
>> > messages can easily be misinterpreted as git-gc failing locally, see [1]
>> > for an actual example of where that happened.
>>
>> Sounds like a sensible goal to me.
>
> What exactly are you referring to (you only quoted the introduction)?
> Do you think we should fix the git-gc issue but keep the general
> behavior of printing messages unaltered? Do you think it would be
> worthwhile to make server messages distinguishable in general?

The latter, which I think was what your implementation was attempting to do
if I read it correctly.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Mark remote `gc --auto` error messages

2016-06-02 Thread Lukas Fleischer
On Thu, 02 Jun 2016 at 21:33:33, Junio C Hamano wrote:
> Lukas Fleischer  writes:
> 
> > When running `git push`, it might occur that error messages are
> > transferred from the server to the client. While most messages (those
> > explicitly sent on sideband 2) are prefixed with "remote:", it seems
> > that error messages printed during the automatic householding performed
> > by git-gc(1) are displayed without any additional decoration. Thus, such
> > messages can easily be misinterpreted as git-gc failing locally, see [1]
> > for an actual example of where that happened.
> 
> Sounds like a sensible goal to me.

What exactly are you referring to (you only quoted the introduction)?
Do you think we should fix the git-gc issue but keep the general
behavior of printing messages unaltered? Do you think it would be
worthwhile to make server messages distinguishable in general?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [WIP-PATCH 1/2] send-email: create email parser subroutine

2016-06-02 Thread Eric Wong
Samuel GROOT  wrote:
> On 05/29/2016 01:33 AM, Eric Wong wrote:
> >Matthieu Moy  wrote:
> >>Samuel GROOT  writes:
> >>
> >>>Parsing and processing in send-email is done in the same loop.
> >>>
> >>>To make the code more maintainable, we create two subroutines:
> >>>- `parse_email` to separate header and body
> >>>- `parse_header` to retrieve data from header
> >>
> >>These routines are not specific to git send-email, nor to Git.
> >>
> >>Does it make sense to use an external library, like
> >>http://search.cpan.org/~rjbs/Email-Simple-2.210/lib/Email/Simple.pm ,
> >>either by depending on it, or by copying it in Git's source tree ?
> >
> >That might be overkill and increase installation/maintenance
> >burden.  Bundling it would probably be problematic to distros,
> >too.
> 
> We have 5 solutions here:
> 
>   1. Make a new dependence to Email::Simple.
> 
>   2. Bundle Email::Simple in Git's source tree.
> 
>   3. Use Email::Simple if installed, else use our library.
> 
>   4. Making our own email parser library.
> 
>   5. Duplicate parser loop as we did for our patch to implement
>  `--quote-email` as proposed in $gmane/295772 .
> 
> Obviously, option (5) is the easiest one for us, but it leaves refactoring
> for later, and option (1) is also easier but adds a new dependence which is
> not that good.

I would go with (5) for now and leave (4) for later (which
might just be moving the function to a new file).

> Since our project ends next week, we might not have enough time to finish
> developing a custom parser API so (4) is not a viable option for now but
> could be done in the future.
> 
> We could consider bundling Email::Simple as the best option, as it's
> developed since 2003 and might be safer to use than anything we could write
> in several weeks.

In an ideal world, (1) would be nice.  But (IMHO) git-send-email
should remain installable on non-ideal systems which do not
provide Email::Simple as a package.

(2) would probably be non-ideal for distro maintainers
(+Cc: Jonathan for opinions), and (3) is the most complex
and difficult-to-support.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] pathspec: allow escaped query values

2016-06-02 Thread Ramsay Jones


On 02/06/16 20:46, Junio C Hamano wrote:
> Ramsay Jones  writes:
> 
>> I think Junio wants to go with just " quoting (see other thread).
> 
> No.  I meant just \ quoting.

Yes, sorry, I only just read your last email on the other thread.

ATB,
Ramsay Jones


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] pathspec: allow escaped query values

2016-06-02 Thread Ramsay Jones


On 02/06/16 20:29, Junio C Hamano wrote:
> Junio C Hamano  writes:
> 
>> On Thu, Jun 2, 2016 at 11:42 AM, Ramsay Jones
>>  wrote:

 That would be workable, I would think.  Before attr:VAR=VAL
 extention, supported pathspec  were only single lowercase-ascii
 alphabet tokens, so nobody would have used " as a part of magic.  So
 quting with double-quote pair would work.
>>>
>>> I was thinking about both ' and ", so that you could do:
>>
>> Yes, I understood your suggestion as such. Quoting like shells would work
>> without breaking backward compatibility for the same reason quoting with
>> double-quote and backslash only without supporting single-quotes would
>> work.
> 
> Having said that, "It would work" does not have to mean "Hence we
> must do it that way" at all.  Quoting character pairs make the
> parsing and unquoting significantly more complex.
> 
> As you said, not many people used attributes and pathspec magic, and
> I do not think those who want to use the new "further limits with
> attributes" magic, envisioned primarily to be those who want to give
> classes to submodules, have compelling reason to name their classes
> with anything but lowercase-ascii-alphabet tokens.  So for a practical
> purposes, I'd rather see Stefan
> 
>  * just implement backquote-blindly-passes-the-next-byte and nothing
>more elaborate; and
> 
>  * forbid [^-a-z0-9,_] from being used in the value part in the
>attr:VAR=VAL magic.
> 
> at least for now, and concentrate more on the other more important
> parts of the submodule enhancement topic.

OK, that reasonable. I didn't mean to derail Stefan's development!

ATB,
Ramsay Jones

> 
> That way, those who will start using attr:VAR=VAL magic will stick
> themselves to lowercase-ascii-alphabet tokens for now (simply
> because an attribut that has the forbidden characters in its value
> would not be usable with the magic), and we can later extend the
> magic syntax parser in a backward compatible way to allow paired
> quotes and other "more convenient" syntax.
> 
> 
> [Footnote]
> 
> *1* The reason I prefer to keep the initially allowed value
> characters narrow is because I envision that something like
> 
>   :(attr:VAR=())
> 
> may become necessary, and do not want to promise users that open or
> close parentheses will forever be taken literally.
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] pathspec: allow escaped query values

2016-06-02 Thread Junio C Hamano
Ramsay Jones  writes:

> I think Junio wants to go with just " quoting (see other thread).

No.  I meant just \ quoting.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] pathspec: allow escaped query values

2016-06-02 Thread Ramsay Jones


On 02/06/16 20:04, Stefan Beller wrote:
> On Thu, Jun 2, 2016 at 11:42 AM, Ramsay Jones
>  wrote:
>>
>>
>> On 02/06/16 17:10, Junio C Hamano wrote:
>>> Ramsay Jones  writes:
>>>
 So, at risk of annoying you, let me continue in my ignorance a little
 longer and ask: even if you have to protect all of this 'magic' from
 the shell with '/" quoting, could you not use (nested) quotes to
 protect the  part of an ? For example:

 git ls-files ':(attr:whitespace="indent,trail,space",icase)'
>>>
>>> That would be workable, I would think.  Before attr:VAR=VAL
>>> extention, supported pathspec  were only single lowercase-ascii
>>> alphabet tokens, so nobody would have used " as a part of magic.  So
>>> quting with double-quote pair would work.
>>
>> I was thinking about both ' and ", so that you could do:
>>
>>$ ./args ':(attr:whitespace="indent,trail,space",icase)'
>> 1::(attr:whitespace="indent,trail,space",icase)
>>
>>$ ./args ":(attr:whitespace='indent,trail,space',icase)"
>> 1::(attr:whitespace='indent,trail,space',icase)
>>
>>$ p=':(attr:whitespace="indent,trail,space",icase)'
>>$ ./args "$p"
>> 1::(attr:whitespace="indent,trail,space",icase)
>>
>>$ p=":(attr:whitespace=\"indent,trail,space\",icase)"
>>$ ./args "$p"
>> 1::(attr:whitespace="indent,trail,space",icase)
>>
>> but limiting it to " would probably be OK too.
>>
>>> You'd need to come up with a way to quote a double quote that
>>> happens to be a part of VAL somehow, though.
>>
>> Yes I was assuming \ quoting as well - I just want to reduce the
>> need for such quoting (especially on windows).
>>
>>>  I think attribute
>>> value is limited to a string with non-whitespace letters; even
>>> though the built-in attributes that have defined meaning to the Git
>>> itself may not use values with letters beyond [-a-zA-Z0-9,], end
>>> users and projects can add arbitrary values within the allowed
>>> syntax, so it is not unconceivable that some project may have a
>>> custom attribute that lists forbidden characters in a path with
>>>
>>>   === .gitattributes ===
>>> *.txt forbidden=`"
> 
> We restrict the 'forbidden' to follow [-a-zA-Z0-9,], so we could enforce
> it for the values, too.
> 
> 
>>
>>$ ./args ":(attr:*.txt forbidden=\'\\\",icase)"
>> 1::(attr:*.txt forbidden=\'\",icase)
> 
> You should lose the *.txt in there, but put it at the back

Ah, yes, just shows my ignorance of the attribute system!

> 
>>  $ ./args ":(attr:forbidden=\'\\\",icase)*.txt"
> 
>>
>>$ ./args ':(attr:*.txt forbidden=\'\''\",icase)'
>> 1::(attr:*.txt forbidden=\'\",icase)
> 
> I see, so quoting by " or ' is preferred. What if the user
> wants to do a

I think Junio wants to go with just " quoting (see other thread).

> forbidden=',"
> 
> so we have to escape those in there, such as
> 
> ./args ':(attr:"forbidden=\',\"")'

No, that won't work (" is not terminated), try this:

   $ ./args ':(attr:"forbidden='\'',\"")'
1::(attr:"forbidden=',\"")
   $ 

   $ ./args ":(attr:\"forbidden=',\\\"\")"
1::(attr:"forbidden=',\"")
   $ 

[half of the problem is just getting past the shell]

ATB,
Ramsay Jones

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Mark remote `gc --auto` error messages

2016-06-02 Thread Junio C Hamano
Lukas Fleischer  writes:

> When running `git push`, it might occur that error messages are
> transferred from the server to the client. While most messages (those
> explicitly sent on sideband 2) are prefixed with "remote:", it seems
> that error messages printed during the automatic householding performed
> by git-gc(1) are displayed without any additional decoration. Thus, such
> messages can easily be misinterpreted as git-gc failing locally, see [1]
> for an actual example of where that happened.

Sounds like a sensible goal to me.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] pathspec: allow escaped query values

2016-06-02 Thread Junio C Hamano
Junio C Hamano  writes:

> On Thu, Jun 2, 2016 at 11:42 AM, Ramsay Jones
>  wrote:
>>>
>>> That would be workable, I would think.  Before attr:VAR=VAL
>>> extention, supported pathspec  were only single lowercase-ascii
>>> alphabet tokens, so nobody would have used " as a part of magic.  So
>>> quting with double-quote pair would work.
>>
>> I was thinking about both ' and ", so that you could do:
>
> Yes, I understood your suggestion as such. Quoting like shells would work
> without breaking backward compatibility for the same reason quoting with
> double-quote and backslash only without supporting single-quotes would
> work.

Having said that, "It would work" does not have to mean "Hence we
must do it that way" at all.  Quoting character pairs make the
parsing and unquoting significantly more complex.

As you said, not many people used attributes and pathspec magic, and
I do not think those who want to use the new "further limits with
attributes" magic, envisioned primarily to be those who want to give
classes to submodules, have compelling reason to name their classes
with anything but lowercase-ascii-alphabet tokens.  So for a practical
purposes, I'd rather see Stefan

 * just implement backquote-blindly-passes-the-next-byte and nothing
   more elaborate; and

 * forbid [^-a-z0-9,_] from being used in the value part in the
   attr:VAR=VAL magic.

at least for now, and concentrate more on the other more important
parts of the submodule enhancement topic.

That way, those who will start using attr:VAR=VAL magic will stick
themselves to lowercase-ascii-alphabet tokens for now (simply
because an attribut that has the forbidden characters in its value
would not be usable with the magic), and we can later extend the
magic syntax parser in a backward compatible way to allow paired
quotes and other "more convenient" syntax.


[Footnote]

*1* The reason I prefer to keep the initially allowed value
characters narrow is because I envision that something like

:(attr:VAR=())

may become necessary, and do not want to promise users that open or
close parentheses will forever be taken literally.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Mark remote `gc --auto` error messages

2016-06-02 Thread Lukas Fleischer
When running `git push`, it might occur that error messages are
transferred from the server to the client. While most messages (those
explicitly sent on sideband 2) are prefixed with "remote:", it seems
that error messages printed during the automatic householding performed
by git-gc(1) are displayed without any additional decoration. Thus, such
messages can easily be misinterpreted as git-gc failing locally, see [1]
for an actual example of where that happened.

Do we want anything like the following patch (completely untested)?

-- 8< --
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index a744437..15c323a 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1775,9 +1775,20 @@ int cmd_receive_pack(int argc, const char **argv, const 
char *prefix)
const char *argv_gc_auto[] = {
"gc", "--auto", "--quiet", NULL,
};
-   int opt = RUN_GIT_CMD | RUN_COMMAND_STDOUT_TO_STDERR;
+   struct child_process proc = CHILD_PROCESS_INIT;
+
+   proc.no_stdin = 1;
+   proc.stdout_to_stderr = 1;
+   proc.err = use_sideband ? -1 : 0;
+   proc.git_cmd = 1;
+   proc.argv = argv_gc_auto;
+
close_all_packs();
-   run_command_v_opt(argv_gc_auto, opt);
+   if (!start_command()) {
+   if (use_sideband)
+   copy_to_sideband(proc.err, -1, NULL);
+   finish_command();
+   }
}
if (auto_update_server_info)
update_server_info(0);
-- 8< --

More generally, do we care about making *all* "remote" strings easily
distinguishable from "local" strings? Even though it is unlikely to use
this for an actual attack, it seems that a malicious server can
currently trick a user into performing an action by printing a message
that looks like something coming from "local" Git. Prefixing every
server message by "remote:" might look a bit ugly but maybe we can
simply use a different color instead and fall back to the prefix on
terminals without color support. Opinions?

[1] https://lists.archlinux.org/pipermail/aur-general/2016-June/032340.html
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] pathspec: allow escaped query values

2016-06-02 Thread Stefan Beller
On Thu, Jun 2, 2016 at 11:42 AM, Ramsay Jones
 wrote:
>
>
> On 02/06/16 17:10, Junio C Hamano wrote:
>> Ramsay Jones  writes:
>>
>>> So, at risk of annoying you, let me continue in my ignorance a little
>>> longer and ask: even if you have to protect all of this 'magic' from
>>> the shell with '/" quoting, could you not use (nested) quotes to
>>> protect the  part of an ? For example:
>>>
>>> git ls-files ':(attr:whitespace="indent,trail,space",icase)'
>>
>> That would be workable, I would think.  Before attr:VAR=VAL
>> extention, supported pathspec  were only single lowercase-ascii
>> alphabet tokens, so nobody would have used " as a part of magic.  So
>> quting with double-quote pair would work.
>
> I was thinking about both ' and ", so that you could do:
>
>$ ./args ':(attr:whitespace="indent,trail,space",icase)'
> 1::(attr:whitespace="indent,trail,space",icase)
>
>$ ./args ":(attr:whitespace='indent,trail,space',icase)"
> 1::(attr:whitespace='indent,trail,space',icase)
>
>$ p=':(attr:whitespace="indent,trail,space",icase)'
>$ ./args "$p"
> 1::(attr:whitespace="indent,trail,space",icase)
>
>$ p=":(attr:whitespace=\"indent,trail,space\",icase)"
>$ ./args "$p"
> 1::(attr:whitespace="indent,trail,space",icase)
>
> but limiting it to " would probably be OK too.
>
>> You'd need to come up with a way to quote a double quote that
>> happens to be a part of VAL somehow, though.
>
> Yes I was assuming \ quoting as well - I just want to reduce the
> need for such quoting (especially on windows).
>
>>  I think attribute
>> value is limited to a string with non-whitespace letters; even
>> though the built-in attributes that have defined meaning to the Git
>> itself may not use values with letters beyond [-a-zA-Z0-9,], end
>> users and projects can add arbitrary values within the allowed
>> syntax, so it is not unconceivable that some project may have a
>> custom attribute that lists forbidden characters in a path with
>>
>>   === .gitattributes ===
>> *.txt forbidden=`"

We restrict the 'forbidden' to follow [-a-zA-Z0-9,], so we could enforce
it for the values, too.


>
>$ ./args ":(attr:*.txt forbidden=\'\\\",icase)"
> 1::(attr:*.txt forbidden=\'\",icase)

You should lose the *.txt in there, but put it at the back

>  $ ./args ":(attr:forbidden=\'\\\",icase)*.txt"

>
>$ ./args ':(attr:*.txt forbidden=\'\''\",icase)'
> 1::(attr:*.txt forbidden=\'\",icase)

I see, so quoting by " or ' is preferred. What if the user
wants to do a
forbidden=',"

so we have to escape those in there, such as

./args ':(attr:"forbidden=\',\"")'

We need to escape both ' and ", one for the outer
shell and one for Git parsing.

Does that seem ok?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] pathspec: allow escaped query values

2016-06-02 Thread Junio C Hamano
On Thu, Jun 2, 2016 at 11:42 AM, Ramsay Jones
 wrote:
>>
>> That would be workable, I would think.  Before attr:VAR=VAL
>> extention, supported pathspec  were only single lowercase-ascii
>> alphabet tokens, so nobody would have used " as a part of magic.  So
>> quting with double-quote pair would work.
>
> I was thinking about both ' and ", so that you could do:

Yes, I understood your suggestion as such. Quoting like shells would work
without breaking backward compatibility for the same reason quoting with
double-quote and backslash only without supporting single-quotes would
work.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] pathspec: allow escaped query values

2016-06-02 Thread Ramsay Jones


On 02/06/16 17:10, Junio C Hamano wrote:
> Ramsay Jones  writes:
> 
>> So, at risk of annoying you, let me continue in my ignorance a little
>> longer and ask: even if you have to protect all of this 'magic' from
>> the shell with '/" quoting, could you not use (nested) quotes to
>> protect the  part of an ? For example:
>>
>> git ls-files ':(attr:whitespace="indent,trail,space",icase)'
> 
> That would be workable, I would think.  Before attr:VAR=VAL
> extention, supported pathspec  were only single lowercase-ascii
> alphabet tokens, so nobody would have used " as a part of magic.  So
> quting with double-quote pair would work.

I was thinking about both ' and ", so that you could do:

   $ ./args ':(attr:whitespace="indent,trail,space",icase)'
1::(attr:whitespace="indent,trail,space",icase)

   $ ./args ":(attr:whitespace='indent,trail,space',icase)"
1::(attr:whitespace='indent,trail,space',icase)

   $ p=':(attr:whitespace="indent,trail,space",icase)'
   $ ./args "$p"
1::(attr:whitespace="indent,trail,space",icase)

   $ p=":(attr:whitespace=\"indent,trail,space\",icase)"
   $ ./args "$p"
1::(attr:whitespace="indent,trail,space",icase)

but limiting it to " would probably be OK too.

> You'd need to come up with a way to quote a double quote that
> happens to be a part of VAL somehow, though.

Yes I was assuming \ quoting as well - I just want to reduce the
need for such quoting (especially on windows).

>  I think attribute
> value is limited to a string with non-whitespace letters; even
> though the built-in attributes that have defined meaning to the Git
> itself may not use values with letters beyond [-a-zA-Z0-9,], end
> users and projects can add arbitrary values within the allowed
> syntax, so it is not unconceivable that some project may have a
> custom attribute that lists forbidden characters in a path with
> 
>   === .gitattributes ===
> *.txt forbidden=`"
> 
> that tells their documentation cannot have these letters in it, or
> something like that.

Heh, yeah, that gets ugly:

   $ ./args ":(attr:*.txt forbidden=\'\\\",icase)"
1::(attr:*.txt forbidden=\'\",icase)

   $ ./args ':(attr:*.txt forbidden=\'\''\",icase)'
1::(attr:*.txt forbidden=\'\",icase)

[Note the initial ' 1:' above is output from args]

ATB,
Ramsay Jones



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2] completion: add git status

2016-06-02 Thread Junio C Hamano
Thomas Braun  writes:

> + untracked_state="$(__git_find_on_cmdline "--untracked-files=no\
> + --untracked-files=normal --untracked-files=all")"

Just wondering but does this help my use of the command like

$ git status -uno 

or do I now have to spell it out like

$ git status --untracked-files=no 

to take advantage of it?

> + untracked_state=${untracked_state##--untracked-files=}
> +
> + if [ -z "$untracked_state" ]; then
> + untracked_state="$(git --git-dir="$(__gitdir)" config 
> "status.showUntrackedFiles")"
> + fi
> +
> + case "$untracked_state" in
> + no)
> + # --ignored option does not matter

Style.  I see existing case/esac statements that use this style, but
our preference is not to indent case arms like this; rather:

case "$untracked_state" in
no)
# --ignored ...

which saves the indentation one level overall.

> + complete_opt=
> + ;;
> + all|normal|*)
> + complete_opt="--cached --directory --no-empty-directory 
> --others"
> +
> + if [ -n "$(__git_find_on_cmdline "--ignored")" ]; then

Same question as the "--untracked-files=no vs -uno" applies here.

> + complete_opt="$complete_opt --ignored 
> --exclude=*"
> + fi
> + ;;
> + esac
> +
> + __git_complete_index_file "$complete_opt"
> +}
> +
>  __git_config_get_set_variables ()
>  {
>   local prevword word config_file= c=$cword
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] Fix prune/gc problem with multiple worktrees

2016-06-02 Thread Junio C Hamano
Michael Haggerty  writes:

> Fixing reachability via the index and detached HEADs feels relatively
> important.
> ...

I agree with the order of importance above.  But "relatively" is a
very good keyword.  Just like bisection refs, what is in the index
and the commit detached HEAD points at are expected to be tentative.
As a part of still-experimental feature, I'd rather see our
bandwidth spent on fixing it the right way first time, instead of
piling on an unproven quick-fix as a band aid, having to rip it off
and fixing it properly later.

> It's hard for me to predict when the ref-iterator stuff will be merged.
> It is a big change, but so far the feedback seems pretty good. I can
> tell you that pushing it and ref-stores forward is high on my priority list.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 28/39] i18n: config: unfold error messages marked for translation

2016-06-02 Thread Junio C Hamano
Vasco Almeida  writes:

> Às 17:43 de 01-06-2016, Junio C Hamano escreveu:
>> Vasco Almeida  writes:
>> 
>>> Introduced in 473166b ("config: add 'origin_type' to config_source
>>> struct", 2016-02-19), Git can inform the user about the origin of a
>>> config error, but the implementation does not allow translators to
>>> translate the keywords 'file', 'blob, 'standard input', and
>>> 'submodule-blob'. Moreover, for the second message, a reason for the
>>> error is appended to the message, not allowing translators to translate
>>> that reason either.
>> 
>> Good intentions.
>> 
>>> @@ -417,6 +417,7 @@ static int git_parse_source(config_fn_t fn, void *data)
>>> int comment = 0;
>>> int baselen = 0;
>>> struct strbuf *var = >var;
>>> +   char error_msg[128];
>>>  
>>> /* U+FEFF Byte Order Mark in UTF8 */
>>> const char *bomptr = utf8_bom;
>>> @@ -471,10 +472,38 @@ static int git_parse_source(config_fn_t fn, void 
>>> *data)
>>> if (get_value(fn, data, var) < 0)
>>> break;
>>> }
>>> +
>>> +   switch (cf->origin_type) {
>>> +   case CFG_BLOB:
>>> +   xsnprintf(error_msg, sizeof(error_msg),
>>> + _("bad config line %d in blob %s"),
>>> + cf->linenr, cf->name);
>> 
>> Use xstrfmt() intead, perhaps?  That would be cleaner.
>> 
> Wouldn't that create a memory leak?

Yes, I didn't mean to suggest "use xstrfmt() instead of xsnprintf()
without changing anything else".  It was merely to suggest that you
do not have to have 128-byte limit if you used xstrfmt(); having to
free the result was too obvious that I left it unsaid, and as expected,
you noticed the need to do so, which is good ;-)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Segfault in the attr stack

2016-06-02 Thread Stefan Beller
On Thu, Jun 2, 2016 at 8:59 AM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
 Gaah, of course.

 This is coming from the cache preload codepath, where multiple threads
 try to run ce_path_match().
 It used to be OK because pathspec magic never looked at attributes,
 but now it does, and attribute system is not thread-safe.
>
> I'll look into teaching a threadble interface to the attribute
> subsystem, but for now, this should get you unstuck, I think.

Thanks for looking into this!
(I was about to do that if you would not have stepped up, as the bug
only surfaces when using the pathspec labels.)

>
> +   /* Do not preload when pathspec uses non-threadable subsystems */
> +   if (pathspec && pathspec_is_non_threadable(pathspec))
> +   return; /* for now ... */

This fixes the problem for now; I'll look into the comma escaping then.

Thanks,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/6] worktree.c: find_worktree() learns to identify worktrees by basename

2016-06-02 Thread Junio C Hamano
Duy Nguyen  writes:

> On Thu, Jun 2, 2016 at 1:44 AM, Junio C Hamano  wrote:
>>> We would
>>> need to convert or match both '/' and '\' in "to/foo" case because of
>>> Windows, so it's not much easier than basename().
>>
>> I never said "easier to implement".  But can this codepath get
>> backslashed paths in the first place?  I somehow thought that
>> normalization would happen a lot before the control reaches here.
>>
>> You'll be calling into fspathcmp() anyway; shouldn't the function
>> know that '/' and '\' are equivalent on some platforms, or is it
>> legal to only call fspathcmp() on a single path component without
>> directory separator?
>
> We still need to calculate the length to compare, which could be
> problematic when utf-8 is involved, or some other encoding. 

Hmph.  I was unaware that fspathcmp() used here does more than
byte-for-byte processing, which would cause problems due to encoding
issues when you hand code the comparison.

+static struct worktree *find_worktree_by_basename(struct worktree **list,
+ const char *base_name)
+{
+   struct worktree *found = NULL;
+   int nr_found = 0;
+
+   for (; *list && nr_found < 2; list++) {
+   char *path = xstrdup((*list)->path);
+   if (!fspathcmp(base_name, basename(path))) {
+   found = *list;
+   nr_found++;
+   }
+   free(path);
+   }
+   return nr_found == 1 ? found : NULL;
+}


> If we always split at '/' boundary though (e.g. "abc/def/ghi",
> "def/ghi" or "ghi" but never "ef/ghi") then it should be ok.

Does "basename()" used here know '/' and '\' can both be a directory
separator, or does worktree->path have a normalized representation
of the path, i.e. '/' is the only directory separator?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/13] refs: introduce an iterator interface

2016-06-02 Thread Michael Haggerty
On 06/02/2016 12:08 PM, Duy Nguyen wrote:
> On Mon, May 30, 2016 at 2:55 PM, Michael Haggerty  
> wrote:
>> Currently, the API for iterating over references is via a family of
>> for_each_ref()-type functions that invoke a callback function for each
>> selected reference. All of these eventually call do_for_each_ref(),
>> which knows how to do one thing: iterate in parallel through two
>> ref_caches, one for loose and one for packed refs, giving loose
>> references precedence over packed refs. This is rather complicated code,
>> and is quite specialized to the files backend. It also requires callers
>> to encapsulate their work into a callback function, which often means
>> that they have to define and use a "cb_data" struct to manage their
>> context.
>>
>> The current design is already bursting at the seams, and will become
>> even more awkward in the upcoming world of multiple reference storage
>> backends:
>>
>> * Per-worktree vs. shared references are currently handled via a kludge
>>   in git_path() rather than iterating over each part of the reference
>>   namespace separately and merging the results. This kludge will cease
>>   to work when we have multiple reference storage backends.
> 
> Question from a refs user. Right now worktree.c:get_worktrees() peeks
> directly to "$GIT_DIR/worktrees/xxx/HEAD" and parses the content
> itself, something that I  promised to fix but never got around to do
> it. Will we have an iterator to go through all worktrees' HEAD, or
> will  there be an API to say "resolve ref HEAD from worktree XYZ"?

My preference is that there is a way to say "create a ref_store object
representing the loose references stored physically under
"$GIT_DIR/worktrees/xxx". Then that ref_store could be asked to read its
`HEAD` (or iterate over all of the refs under that path or whatever).
You could even write `HEAD` through the same ref_store.

Michael

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] fetch: better alignment in ref summary

2016-06-02 Thread Junio C Hamano
Duy Nguyen  writes:

> On Mon, May 23, 2016 at 7:58 AM, Junio C Hamano  wrote:
>> That is, I wonder if the above can become something like:
>>
>>> From github.com:pclouds/git
>>>  * [new branch]  { -> pclouds/}2nd-index
>>>  * [new branch]  { -> pclouds/}3nd-index
>>>  * [new branch]  { -> pclouds/}file-watcher
>>>  ...
>
> for context of the following quote...
>
> On Thu, May 26, 2016 at 12:18 PM, Jeff King  wrote:
>> I do agree with Junio that we could probably improve the output quite a
>> bit by not showing each refname twice in the common case. I don't quite
>> find the "{ -> origin/}whatever" particularly pretty, but something like
>> that which keeps the variable bits at the end would make this problem
>> just go away.
>
> I tried it out. It's a bit hard to read at the "/}" boundary. Color
> highlight might help. But it occurs to me, could we extend refspec a
> bit to allow "foo/bar:base/..." be be equivalent of
> "foo/bar:base/foo/bar". Then fetch output could become
>
>>>  * [new branch]  2nd-index:pclouds/...
>>>  * [new branch]  3nd-index:pclouds/...
>>>  * [new branch]  file-watcher:pclouds/...
>
> It might be a tiny bit better, and a forced update could be displayed
> with a prefix '+'. Hmm?

I do not find that "..." particularly more readable, but that
probably is very subjective.  It is much less copy, when
compared to pasting "pclouds/}2nd-index" and then removing "}".




--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] pathspec: allow escaped query values

2016-06-02 Thread Junio C Hamano
Ramsay Jones  writes:

> So, at risk of annoying you, let me continue in my ignorance a little
> longer and ask: even if you have to protect all of this 'magic' from
> the shell with '/" quoting, could you not use (nested) quotes to
> protect the  part of an ? For example:
>
> git ls-files ':(attr:whitespace="indent,trail,space",icase)'

That would be workable, I would think.  Before attr:VAR=VAL
extention, supported pathspec  were only single lowercase-ascii
alphabet tokens, so nobody would have used " as a part of magic.  So
quting with double-quote pair would work.

You'd need to come up with a way to quote a double quote that
happens to be a part of VAL somehow, though.  I think attribute
value is limited to a string with non-whitespace letters; even
though the built-in attributes that have defined meaning to the Git
itself may not use values with letters beyond [-a-zA-Z0-9,], end
users and projects can add arbitrary values within the allowed
syntax, so it is not unconceivable that some project may have a
custom attribute that lists forbidden characters in a path with

=== .gitattributes ===
*.txt forbidden=`"

that tells their documentation cannot have these letters in it, or
something like that.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Segfault in the attr stack

2016-06-02 Thread Junio C Hamano
Junio C Hamano  writes:

>>> Gaah, of course.
>>>
>>> This is coming from the cache preload codepath, where multiple threads
>>> try to run ce_path_match().
>>> It used to be OK because pathspec magic never looked at attributes,
>>> but now it does, and attribute system is not thread-safe.

I'll look into teaching a threadble interface to the attribute
subsystem, but for now, this should get you unstuck, I think.

One of the ways preload codepath assures that multiple threads do
not stomp on the shared datastructure is by copying things that are
involved in the operation, and there is copy_pathspec() call.  I
think the "pathspec magic that limits with the attributes" topic,
which adds to the pathspec structure, needs to update the function
so that its shared data structure is properly copied.  Before the
series, I think the pathspec structure only has either pointers to
borrowed strings (e.g. raw) or scalars, and it was sufficient to do
a shallow copy with memcpy().  With the change, that is no longer
the case.

Which would mean we'd need to make sure that any codepath that calls
copy_pathspec() needs to think about how to clean it up.  Before the
change, preload-index codepath didn't have to, but with the change,
it will.  We'd need pathspec_clear() or something like that.

 pathspec.c  | 12 
 pathspec.h  |  2 ++
 preload-index.c |  4 
 3 files changed, 18 insertions(+)

diff --git a/pathspec.c b/pathspec.c
index 0a02255..326863a 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -208,6 +208,18 @@ static void eat_long_magic(struct pathspec_item *item, 
const char *elt,
*copyfrom_ = copyfrom;
 }
 
+int pathspec_is_non_threadable(const struct pathspec *pathspec)
+{
+   int i;
+
+   for (i = 0; i < pathspec->nr; i++) {
+   const struct pathspec_item *item = >items[i];
+   if (item->attr_check)
+   return 1;
+   }
+   return 0;
+}
+
 /*
  * Take an element of a pathspec and check for magic signatures.
  * Append the result to the prefix. Return the magic bitmap.
diff --git a/pathspec.h b/pathspec.h
index 5308137..07d21f0 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -115,4 +115,6 @@ extern void add_pathspec_matches_against_index(const struct 
pathspec *pathspec,
 extern const char *check_path_for_gitlink(const char *path);
 extern void die_if_path_beyond_symlink(const char *path, const char *prefix);
 
+extern int pathspec_is_non_threadable(const struct pathspec *);
+
 #endif /* PATHSPEC_H */
diff --git a/preload-index.c b/preload-index.c
index c1fe3a3..af46878 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -76,6 +76,10 @@ static void preload_index(struct index_state *index,
if (!core_preload_index)
return;
 
+   /* Do not preload when pathspec uses non-threadable subsystems */
+   if (pathspec && pathspec_is_non_threadable(pathspec))
+   return; /* for now ... */
+
threads = index->cache_nr / THREAD_COST;
if (threads < 2)
return;
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] push: deny policy to prevent pushes to unwanted remotes.

2016-06-02 Thread Lars Schneider


> On 30 May 2016, at 06:45, Antoine Queru 
>  wrote:
> 
> Currently, a user wanting to prevent accidental pushes to the wrong remote
> has to create a pre-push hook.
> The feature offers a configuration to allow users to prevent accidental pushes
> to the wrong remote. The user may define a list of whitelisted remotes, a list
> of blacklisted remotes and a default policy ("allow" or "deny"). A push
> is denied if the remote is explicitely blacklisted or if it isn't
> whitelisted and the default policy is "deny".
> 
> This feature is intended as a safety net more than a real security, the user
> will always be able to modify the config if he wants to. It is here for him to
> consciously restrict his push possibilities. For example, it may be useful
> for an unexperimented user fearing to push to the wrong remote, or for
> companies wanting to avoid unintentionnal leaking of private code on public
> repositories.

Thanks for working on this feature. Unfortunately I won't be able to test and 
review it before June 14. I am traveling without laptop and only very sporadic 
internet access :)

One thing that I noticed already: I think a custom warning/error message for 
rejected pushes would be important because, as you wrote above, this feature 
does not provide real security. That means if a push is rejected for someone in 
an organization then the user needs to understand what is going on. E.g. in my 
organization I would point the user to the open source contribution guidelines 
etc.

Thanks,
Lars


> 
> By default the whitelist/blacklist feature is disabled since the default 
> policy
> is "allow".
> 
> Signed-off-by: Antoine Queru 
> Signed-off-by: Francois Beutin 
> Signed-off-by: Matthieu Moy 
> ---
> This is the first implementation of the feature proposed in SoCG 2016.
> The conversation about it can be found here:
> http://thread.gmane.org/gmane.comp.version-control.git/295166
> and http://thread.gmane.org/gmane.comp.version-control.git/289749.
> We have included in cc all the participants in the previous conversation.
> Documentation/config.txt| 17 +++
> Documentation/git-push.txt  | 32 +
> builtin/push.c  | 75 ---
> t/t5544-push-whitelist-blacklist.sh | 90 +
> 4 files changed, 209 insertions(+), 5 deletions(-)
> create mode 100755 t/t5544-push-whitelist-blacklist.sh
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 53f00db..1478ce3 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2517,6 +2517,23 @@ remote.pushDefault::
>`branch..remote` for all branches, and is overridden by
>`branch..pushRemote` for specific branches.
> 
> +remote.pushBlacklisted::
> +The list of remotes the user is forbidden to push to.
> +See linkgit:git-push[1]
> +
> +remote.pushWhitelisted::
> +The list of remotes the user is allowed to push to.
> +See linkgit:git-push[1]
> +
> +remote.pushDefaultPolicy::
> +Can be set to either 'allow' or 'deny', defines the policy to adopt
> +when the user tries to push to a remote not in the whitelist or 
> +the blacklist. Defaults to 'allow'.
> +
> +remote.pushDenyMessage::
> +The message printed when pushing to a forbidden remote.
> +Defaults to "Pushing to this remote has been forbidden".
> +
> remote..url::
>The URL of a remote repository.  See linkgit:git-fetch[1] or
>linkgit:git-push[1].
> diff --git a/Documentation/git-push.txt b/Documentation/git-push.txt
> index cf6ee4a..644bfde 100644
> --- a/Documentation/git-push.txt
> +++ b/Documentation/git-push.txt
> @@ -368,6 +368,38 @@ reason::
>refs, no explanation is needed. For a failed ref, the reason for
>failure is described.
> 
> +
> +Note about denying pushes to the wrong remotes
> +--
> +
> +If you fear accidental pushes to the wrong remotes, you can use the
> +blacklist/whitelist feature. The goal is to catch pushes to unwanted
> +remotes before they happen.
> +
> +The simplest way to forbid pushes to a remote is to add its url in the
> +config file under the 'remote.pushBlacklisted' variable.
> +A more restrictive way is to change 'remote.pushDefaultPolicy' to 'deny',
> +thus denying pushes to every remote not whitelisted. You can then add
> +the right remotes under 'remote.pushWhitelisted'.
> +
> +You can also configure more advanced acceptation/denial behavior following
> +this rule: the more the url in the config prefixes the asked url the more
> +priority it has.
> +
> +For example, if we set up the configuration variables like this:
> +git config --add remote.pushBlacklisted repository.com
> +git config --add remote.pushWhitelisted repository.com/Special_Folder
> +Any push of this form will 

Re: [RFC/PATCH] pathspec: allow escaped query values

2016-06-02 Thread Ramsay Jones


On 02/06/16 06:46, Junio C Hamano wrote:
> Ramsay Jones  writes:
> 
>> Not having given this much thought at all, but the question which comes
>> to mind is: can you use some other separator for the -s rather than
>> a comma? That way you don't need to quote them in the  part of the
>> -spec.
>>
>> (I dunno, maybe use ; or : instead?)
> 
> There are two kinds of comma involved in this discussion.
> 
>  * Multiple pathspec magic can be attached to augment the way
> selects paths. ":(,,...)" is
>the syntax, and  are things like "icase" (select the path
>that matches  case-insensitively), "top" ( must
>match from the top level of the working tree, even when you are
>running the command from a subdirectory).  We added a new kind of
> whose syntax is "attr:VAR=VAL" recently, which says "not
>only  must match the path, in order to be selected, the
>path must have the attribute VAR with value VAL".
> 
>The comma that separates multiple magic is not something you can
>change now; it has been with us since v1.7.6 (Jun 2011)
> 
>  * My example wanted to use the attr:VAR=VAL form to select those
>paths that has one specific string as the value for whitespace
>attribute, i.e. VAR in this case is "whitespace".  The value for
>whitespace attribute determines what kind of whitespace anomalies
>are considered as errors by "git apply" and "git diff", and it is
>formed by concatenating things like "indent-with-non-tab" (starts
>a line with more than 8 consecutive SPs), "space-before-tab" (a
>SP appears immediately before HT in the indent), etc., with a
>comma in between.
> 
>The comma that separates various kinds of whitespace errors is
>not something you can change now; it has been with us since
>v1.5.4 (Feb 2008).
> 
> So using different separator is not a viable solution.

Ah, OK, makes sense. Note that I have not used 'pathspec magic' or
the attribute system in git (never felt/had the need)! ;-)

So, at risk of annoying you, let me continue in my ignorance a little
longer and ask: even if you have to protect all of this 'magic' from
the shell with '/" quoting, could you not use (nested) quotes to
protect the  part of an ? For example:

git ls-files ':(attr:whitespace="indent,trail,space",icase)'

[Hmm, I don't seem to be able to find documentation on the syntax
of these features (apart from the code, of course).]

ATB,
Ramsay Jones


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/2] completion: factor out untracked file modes into a variable

2016-06-02 Thread Thomas Braun
Signed-off-by: Thomas Braun 
---
 contrib/completion/git-completion.bash | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 3402475..addea89 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1098,6 +1098,8 @@ _git_clone ()
esac
 }
 
+__git_untracked_file_modes="all no normal"
+
 _git_commit ()
 {
case "$prev" in
@@ -1119,7 +1121,7 @@ _git_commit ()
return
;;
--untracked-files=*)
-   __gitcomp "all no normal" "" "${cur##--untracked-files=}"
+   __gitcomp "$__git_untracked_file_modes" "" 
"${cur##--untracked-files=}"
return
;;
--*)


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/2] completion: add git status

2016-06-02 Thread Thomas Braun
Signed-off-by: Thomas Braun 
---
 contrib/completion/git-completion.bash | 55 ++
 1 file changed, 55 insertions(+)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index addea89..fa7a03a 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1782,6 +1782,61 @@ _git_stage ()
_git_add
 }
 
+_git_status ()
+{
+   local complete_opt
+   local untracked_state
+
+   case "$cur" in
+   --ignore-submodules=*)
+   __gitcomp "none untracked dirty all" "" 
"${cur##--ignore-submodules=}"
+   return
+   ;;
+   --untracked-files=*)
+   __gitcomp "$__git_untracked_file_modes" "" 
"${cur##--untracked-files=}"
+   return
+   ;;
+   --column=*)
+   __gitcomp "
+   always never auto column row plain dense nodense
+   " "" "${cur##--column=}"
+   return
+   ;;
+   --*)
+   __gitcomp "
+   --short --branch --porcelain --long --verbose
+   --untracked-files= --ignore-submodules= --ignored
+   --column= --no-column
+   "
+   return
+   ;;
+   esac
+
+   untracked_state="$(__git_find_on_cmdline "--untracked-files=no\
+   --untracked-files=normal --untracked-files=all")"
+   untracked_state=${untracked_state##--untracked-files=}
+
+   if [ -z "$untracked_state" ]; then
+   untracked_state="$(git --git-dir="$(__gitdir)" config 
"status.showUntrackedFiles")"
+   fi
+
+   case "$untracked_state" in
+   no)
+   # --ignored option does not matter
+   complete_opt=
+   ;;
+   all|normal|*)
+   complete_opt="--cached --directory --no-empty-directory 
--others"
+
+   if [ -n "$(__git_find_on_cmdline "--ignored")" ]; then
+   complete_opt="$complete_opt --ignored 
--exclude=*"
+   fi
+   ;;
+   esac
+
+   __git_complete_index_file "$complete_opt"
+}
+
 __git_config_get_set_variables ()
 {
local prevword word config_file= c=$cword
-- 
2.8.3.windows.1



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] completion: add git status

2016-06-02 Thread Thomas Braun
Am 01.06.2016 um 14:15 schrieb SZEDER Gábor:
> 
> Quoting Thomas Braun :
> 
>> Signed-off-by: Thomas Braun 
>> ---
>>  contrib/completion/git-completion.bash | 29
>> +
>>  1 file changed, 29 insertions(+)
>>
>> diff --git a/contrib/completion/git-completion.bash
>> b/contrib/completion/git-completion.bash
>> index addea89..77343da 100644
>> --- a/contrib/completion/git-completion.bash
>> +++ b/contrib/completion/git-completion.bash
>> @@ -1782,6 +1782,35 @@ _git_stage ()
>>  _git_add
>>  }
>>
>> +_git_status ()
>> +{
>> +case "$cur" in
>> +--ignore-submodules=*)
>> +__gitcomp "none untracked dirty all" ""
>> "${cur##--ignore-submodules=}"
>> +return
>> +;;
>> +--untracked-files=*)
>> +__gitcomp "$__git_untracked_file_modes" ""
>> "${cur##--untracked-files=}"
>> +return
>> +;;
>> +--column=*)
>> +__gitcomp "
>> +always never auto column row plain dense nodense
>> +" "" "${cur##--column=}"
>> +return
>> +;;
>> +--*)
>> +__gitcomp "
>> +--short --branch --porcelain --long --verbose
>> +--untracked-files= --ignore-submodules= --ignored
>> +--column= --no-column
>> +"
>> +return
>> +;;
>> +esac
>> +__git_complete_file
> 
> __git_complete_file()'s job is to complete the ':' notation,
> e.g. 'master:Mak',  which is not what we want here, because this
> notation doesn't make sense for 'git status' and because 'git status
> ' would then offer refs instead of files.

Correct. I might have been mislead by the name ;)

> I think there are two choices what to do instead:
> 
>   - Don't do anything :)  Bash will then fall back to filename
> completion, which is quite close to what we want here (and in this
> case the return statements from the other case arms can go away as
> well).  The drawback is that all ignored files in the current
> working directory will show up after 'git status '.
> 
>   - use __git_complete_index_file() with appropriate options, perhaps
> '--cached --others', but I didn't think this through.  For bonus
> points pass additional options when certain 'git status' options are
> already present on the command line, e.g. pass '--ignored', too, if
> it is present.

I went for the bonus points way. If that is too involved I can also go
back to "Don't do anything".


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] completion: create variable for untracked file modes

2016-06-02 Thread Thomas Braun
Am 01.06.2016 um 13:59 schrieb SZEDER Gábor:
> 
> This subject would perhaps read better:
> 
>   completion: factor out untracked file modes into a variable

Yes, definitly. Will be included in reroll.

> Quoting Thomas Braun :
> 
>> Signed-off-by: Thomas Braun 
>> ---
>>  contrib/completion/git-completion.bash | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/contrib/completion/git-completion.bash
>> b/contrib/completion/git-completion.bash
>> index 3402475..addea89 100644
>> --- a/contrib/completion/git-completion.bash
>> +++ b/contrib/completion/git-completion.bash
>> @@ -1098,6 +1098,8 @@ _git_clone ()
>>  esac
>>  }
>>
>> +__git_untracked_file_modes="all no normal"
>> +
>>  _git_commit ()
>>  {
>>  case "$prev" in
>> @@ -1119,7 +1121,7 @@ _git_commit ()
>>  return
>>  ;;
>>  --untracked-files=*)
>> -__gitcomp "all no normal" "" "${cur##--untracked-files=}"
>> +__gitcomp "$__git_untracked_file_modes" ""
>> "${cur##--untracked-files=}"
>>  return
>>  ;;
>>  --*)
> 
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 28/39] i18n: config: unfold error messages marked for translation

2016-06-02 Thread Vasco Almeida
Às 17:43 de 01-06-2016, Junio C Hamano escreveu:
> Vasco Almeida  writes:
> 
>> Introduced in 473166b ("config: add 'origin_type' to config_source
>> struct", 2016-02-19), Git can inform the user about the origin of a
>> config error, but the implementation does not allow translators to
>> translate the keywords 'file', 'blob, 'standard input', and
>> 'submodule-blob'. Moreover, for the second message, a reason for the
>> error is appended to the message, not allowing translators to translate
>> that reason either.
> 
> Good intentions.
> 
>> @@ -417,6 +417,7 @@ static int git_parse_source(config_fn_t fn, void *data)
>>  int comment = 0;
>>  int baselen = 0;
>>  struct strbuf *var = >var;
>> +char error_msg[128];
>>  
>>  /* U+FEFF Byte Order Mark in UTF8 */
>>  const char *bomptr = utf8_bom;
>> @@ -471,10 +472,38 @@ static int git_parse_source(config_fn_t fn, void *data)
>>  if (get_value(fn, data, var) < 0)
>>  break;
>>  }
>> +
>> +switch (cf->origin_type) {
>> +case CFG_BLOB:
>> +xsnprintf(error_msg, sizeof(error_msg),
>> +  _("bad config line %d in blob %s"),
>> +  cf->linenr, cf->name);
> 
> Use xstrfmt() intead, perhaps?  That would be cleaner.
> 
Wouldn't that create a memory leak?
Because, in this patch, error_msg is used in the following way

if (cf->die_on_error)
die(error_msg);
else
return error(error_msg);
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 13/39] i18n: git-sh-setup.sh: mark strings for translation

2016-06-02 Thread Vasco Almeida
Às 20:30 de 01-06-2016, Junio C Hamano escreveu:
> Junio C Hamano  writes:
> Would it allow you to lose the $(git --exec-path) prefix in the new
> dot-source to have this patch before applying your patch?
> 
> -- >8 --
> Subject: t2300: run git-sh-setup in an environment that better mimics the 
> real life
> 
> When we run scripted Porcelains, "git" potty has set up the $PATH by
> prepending $GIT_EXEC_PATH, the path given by "git --exec-path=$there
> $cmd", etc. already.  Because of this, scripted Porcelains can
> dot-source shell script library like git-sh-setup with simple dot
> without specifying any path.
> 
> t2300 however dot-sources git-sh-setup without adjusting $PATH like
> the real "git" potty does.  This has not been a problem so far, but
> once git-sh-setup wants to rely on the $PATH adjustment, just like
> any scripted Porcelains already do, it would become one.  It cannot
> for example dot-source another shell library without specifying the
> full path to it by prefixing $(git --exec-path).
> 
> Signed-off-by: Junio C Hamano 
> ---
>  t/t2300-cd-to-toplevel.sh | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/t/t2300-cd-to-toplevel.sh b/t/t2300-cd-to-toplevel.sh
> index 9965bc5..cccd7d9 100755
> --- a/t/t2300-cd-to-toplevel.sh
> +++ b/t/t2300-cd-to-toplevel.sh
> @@ -8,7 +8,8 @@ test_cd_to_toplevel () {
>   test_expect_success $3 "$2" '
>   (
>   cd '"'$1'"' &&
> - . "$(git --exec-path)"/git-sh-setup &&
> + PATH="$(git --exec-path):$PATH" &&
> + . git-sh-setup &&
>   cd_to_toplevel &&
>   [ "$(pwd -P)" = "$TOPLEVEL" ]
>   )
> 
Yes, this patch allows to have . git-sh-i18n instead of
. "$(git --exec-path)"/git-sh-i18n in git-sh-setup.sh file.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 24/39] i18n: bisect: enable l10n of bisect terms in messages

2016-06-02 Thread Vasco Almeida
Às 17:38 de 01-06-2016, Junio C Hamano escreveu:
> Vasco Almeida  writes:
> 
>> +enum term { BAD, GOOD, OLD, NEW };
>> +static const char *term_names[] = {
>> +/* TRANSLATORS: in bisect.c source code file, the following terms are
>> +   used to describe a "bad commit", "good commit", "new revision", etc.
>> +   Please, if you can, check the source when you are not sure if a %s
>> +   would be replaced by one of the following terms. */
>> +N_("bad"), N_("good"), N_("old"), N_("new"),  NULL
>> +};
>> +
>>  /* Remember to update object flag allocation in object.h */
>>  #define COUNTED (1u<<16)
>>  
>> @@ -725,12 +734,12 @@ static void handle_bad_merge_base(void)
>>  if (is_expected_rev(current_bad_oid)) {
>>  char *bad_hex = oid_to_hex(current_bad_oid);
>>  char *good_hex = join_sha1_array_hex(_revs, ' ');
>> -if (!strcmp(term_bad, "bad") && !strcmp(term_good, "good")) {
>> +if (!strcmp(term_bad, term_names[BAD]) && !strcmp(term_good, 
>> term_names[GOOD])) {
>>  fprintf(stderr, _("The merge base %s is bad.\n"
>>  "This means the bug has been fixed "
>>  "between %s and [%s].\n"),
>>  bad_hex, bad_hex, good_hex);
>> -} else if (!strcmp(term_bad, "new") && !strcmp(term_good, 
>> "old")) {
>> +} else if (!strcmp(term_bad, term_names[NEW]) && 
>> !strcmp(term_good, term_names[OLD])) {
>>  fprintf(stderr, _("The merge base %s is new.\n"
>>  "The property has changed "
>>  "between %s and [%s].\n"),
>> @@ -739,7 +748,7 @@ static void handle_bad_merge_base(void)
>>  fprintf(stderr, _("The merge base %s is %s.\n"
>>  "This means the first '%s' commit is "
>>  "between %s and [%s].\n"),
>> -bad_hex, term_bad, term_good, bad_hex, 
>> good_hex);
>> +bad_hex, _(term_bad), _(term_good), bad_hex, 
>> good_hex);
> 
> These "bad" and "good" that are compared with term_bad and term_good
> are the literal tokens the end user gives from the "git bisect"
> command line.  I do not think you would want to catch them with
> 
> $ git bisect novo 
> $ git bisect velho 
> 
> unless the user has done
> 
> $ git bisect --term-old=velho --term-new=novo
> 
> previously.

I may be misunderstanding you, but we do not "catch" those terms with
this patch, although I'm not sure what you mean by "catch them". I think
you forget that no-operation N_("good"), does not affect in any way the
string "good", it only enables xgettext to extract it to .pot file, does
not trigger translation.
Overall, I don't understand what are you trying to tell me here.

> 
> And that "custom bisect terms" case is covered by the last "else"
> clause in this if/elseif cascade (outside the context we can see in
> your message).
> 
> The only thing you need to do around here is to mark the string as
> translatable.  I do not think we need "enum term", or term_names[].

This patch tries to make bisect output those term translated within the
also translated message. To enable this, it is handy to have
term_names[] in order to mark each term, although I could have mark them
anywhere they appeared in the source. It was only for that I chose to
have term_names[].
> 
>> @@ -747,7 +756,7 @@ static void handle_bad_merge_base(void)
>>  fprintf(stderr, _("Some %s revs are not ancestor of the %s rev.\n"
>>  "git bisect cannot work properly in this case.\n"
>>  "Maybe you mistook %s and %s revs?\n"),
>> -term_good, term_bad, term_good, term_bad);
>> +_(term_good), _(term_bad), _(term_good), _(term_bad));
> 
> Likewise for all _(term_good), _(term_bad) and use of term_names[]
> we see in this patch.
> 

Indeed this was more of a PATCH/RFC to see what people would think of
it. If nobody contest and there is no value in it, I'll happily drop
this patch in the next re-roll.

My motivation for this patch was that a user could feel embarrassment
reading a message in her language with those terms untranslated.
Although I do believe that no translating them is also a possibility.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] strbuf: improve API

2016-06-02 Thread William Duclot
On Thu, Jun 02, 2016 at 02:58:22PM +0200, Matthieu Moy wrote:
> Michael Haggerty  writes:
> 
>> 1. The amount of added code complexity is small and quite
>>encapsulated.
> 
> Actually, STRBUF_OWNS_MEMORY can even be seen as a simplification if
> done properly: we already have the case where the strbuf does not own
> the memory with strbuf_slopbuf. I already pointed places in
> strbuf_grow() which could be simplified after the patch. Re-reading the
> code it seems at lesat the call to strbuf_grow(sb, 0); in strbuf_detach
> becomes useless. The same in strbuf_attach() probably is, too.
> 
> So, the final strbuf.[ch] code might not be "worse" that the previous.
> 
> I'm unsure about the complexity of the future code using the new API. I
> don't forsee cases where using the new API would lead to a high
> maintenance cost, but I don't claim I considered all possible uses.
> 
>> 2. The ability to use strbufs without having to allocate memory might
>>make enough *psychological* difference that it encourages some
>>devs to use strbufs where they would otherwise have done manual
>>memory management. I think this would be a *big* win in terms of
>>potential bugs and security vulnerabilities avoided.
> 
> Note that this can also be seen as a counter-argument, since it
> may psychologically encourage people to micro-optimize code and use
> contributors/reviewers neurons to spend time on "shall this be on-stack
> or malloced?".
> 
> I think we already have a tendency to micro-optimize non-critical code
> too much in Git's codebase, so it's not necessarily a step in the right
> direction.
> 
> In conclusion, I don't have a conclusion, sorry ;-).

Thank you all for your input and your tests, those are very valuable!
Me and Simon have to take a decision, as this contribution is part of a
school project that comes to an end. We won't have the time to create
tests that are representative of the use of strbuf in the Git codebase
(partly because we lack knowledge of the codebase) to settle on whether
this API improvement is useful or not.

Jeff made very good points, and the tests we ran ourselves seem to
agree with yours. That being said, the tests made by Michael, more
detailed, suggest that there may be room for an improvement of the
strbuf API (even thought that's to confront to the reality of the
codebase).

Having little time, we will refactor the patch and send it as a V2: this
way, if it appears one day that improving the API with stack-allocated
memory is indeed useful, the patch will be ready to merge :)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (May 2016, #09; Tue, 31) t1308 broken

2016-06-02 Thread Torsten Bögershausen
It seams as  ./t1308-config-set.sh is broken,
when the the directory is a soft link:

-name=/home/tb/NoBackup/projects/git/git.pu/t/trash
directory.t1308-config-set/.gitconfig
+name=/home/tb/projects/git/git.pu/t/trash directory.t1308-config-set/.gitconfig
 scope=global

 key=foo.bar
not ok 28 - iteration shows correct origins

I havent't digged further, too many conflicts in the config code, may be
somebody knows it directly ?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] fetch: better alignment in ref summary

2016-06-02 Thread Duy Nguyen
On Mon, May 23, 2016 at 7:58 AM, Junio C Hamano  wrote:
> That is, I wonder if the above can become something like:
>
>> From github.com:pclouds/git
>>  * [new branch]  { -> pclouds/}2nd-index
>>  * [new branch]  { -> pclouds/}3nd-index
>>  * [new branch]  { -> pclouds/}file-watcher
>>  ...

for context of the following quote...

On Thu, May 26, 2016 at 12:18 PM, Jeff King  wrote:
> I do agree with Junio that we could probably improve the output quite a
> bit by not showing each refname twice in the common case. I don't quite
> find the "{ -> origin/}whatever" particularly pretty, but something like
> that which keeps the variable bits at the end would make this problem
> just go away.

I tried it out. It's a bit hard to read at the "/}" boundary. Color
highlight might help. But it occurs to me, could we extend refspec a
bit to allow "foo/bar:base/..." be be equivalent of
"foo/bar:base/foo/bar". Then fetch output could become

>>  * [new branch]  2nd-index:pclouds/...
>>  * [new branch]  3nd-index:pclouds/...
>>  * [new branch]  file-watcher:pclouds/...

It might be a tiny bit better, and a forced update could be displayed
with a prefix '+'. Hmm?
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] strbuf: improve API

2016-06-02 Thread Matthieu Moy
Michael Haggerty  writes:

> 1. The amount of added code complexity is small and quite
>encapsulated.

Actually, STRBUF_OWNS_MEMORY can even be seen as a simplification if
done properly: we already have the case where the strbuf does not own
the memory with strbuf_slopbuf. I already pointed places in
strbuf_grow() which could be simplified after the patch. Re-reading the
code it seems at lesat the call to strbuf_grow(sb, 0); in strbuf_detach
becomes useless. The same in strbuf_attach() probably is, too.

So, the final strbuf.[ch] code might not be "worse" that the previous.

I'm unsure about the complexity of the future code using the new API. I
don't forsee cases where using the new API would lead to a high
maintenance cost, but I don't claim I considered all possible uses.

> 2. The ability to use strbufs without having to allocate memory might
>make enough *psychological* difference that it encourages some
>devs to use strbufs where they would otherwise have done manual
>memory management. I think this would be a *big* win in terms of
>potential bugs and security vulnerabilities avoided.

Note that this can also be seen as a counter-argument, since it
may psychologically encourage people to micro-optimize code and use
contributors/reviewers neurons to spend time on "shall this be on-stack
or malloced?".

I think we already have a tendency to micro-optimize non-critical code
too much in Git's codebase, so it's not necessarily a step in the right
direction.

In conclusion, I don't have a conclusion, sorry ;-).

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] Triangular Workflow UI improvement: Documentation

2016-06-02 Thread Michael Haggerty
On 05/31/2016 04:33 PM, Matthieu Moy wrote:
> Jordan DE GEA  writes:
> [...]
>> +DESCRIPTION
>> +---
>> +
>> +Triangular Workflow (or Asymmetric Workflow) is a workflow which gives
>> +the possibility to:
>> +
>> +- fetch (or pull) from a repository
>> +- push to another repository
> 
> [...]
> 
> I find Michael Haggerty's definition of triangular workflow much
> clearer:
> 
> https://github.com/blog/2042-git-2-5-including-multiple-worktrees-and-triangular-workflows#improved-support-for-triangular-workflows
> 
> I don't see a licence on the GitHub blog, so I don't think it's legal to
> copy-past directly to our docs, but Michael might allow us to do so?

I'm glad you find that blog post useful!

You are correct that the text of GitHub blog posts is copyrighted, so
indeed you need to ask before using it.

It is OK with me (and more importantly with GitHub, because I wrote this
text in their employ) for you to use the text about triangular workflows
from the blog post mentioned above under the Git project's license.

I strongly suggest that you adapt it to make it fit better with the rest
of the Git documentation, and because having verbatim copies of the same
text in two places seems a little bit silly. But that's up to you.

Signed-off-by: Michael Haggerty 

Michael

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] Fix prune/gc problem with multiple worktrees

2016-06-02 Thread Michael Haggerty
On 06/02/2016 11:53 AM, Duy Nguyen wrote:
> (from patch 4/4 mail)
> 
> On Wed, Jun 1, 2016 at 10:51 PM, Michael Haggerty  
> wrote:
>>> + path = xstrdup(worktree_git_path(wt, "logs/refs/bisect"));
>>> + if (file_exists(path))
>>> + handle_one_reflog(path, NULL, 0, );
>>> + free(path);
>>> +}
>>
>> `refs/bisect` is not a single reference. It is a namespace that contains
>> references with names like `refs/bisect/bad` and
>> `refs/bisect/good-66106691a1b71e445fe5e4d6b8b043dffc7dfe4c`.
> 
> Yeah I missed that. I'm not going to write another directory walker to
> collect all logs/refs/bisect/*. I didn't add pending objects for
> refs/bisect/* of other worktrees either. At that point waiting for the
> new ref iterator makes more sense...
> 
> On Wed, Jun 1, 2016 at 11:06 PM, Junio C Hamano  wrote:
>> Nguyễn Thái Ngọc Duy   writes:
>>
>>> This series makes sure that objects referenced by all worktrees are
>>> marked reachable so that we don't accidentally delete objects that are
>>> being used. Previously per-worktree references in index, detached HEAD
>>> or per-worktree reflogs come from current worktree only, not all
>>> worktrees.
>>
>> I'll let this topic simmer on the list for now, instead of picking
>> it up immediately to 'pu', as Michael in $gmane/296068 makes me
>> wonder if we want to keep piling on the current "worktree ref
>> iterations are bolted on" or if we want to first clean it up, whose
>> natural fallout hopefully would eliminate the bug away.
> 
> So what should be the way forward? My intention was having something
> that can fix the problem for now, even if a bit hacky while waiting
> for ref iterator to be ready, then convert to use it and clean things
> up, because I don't how long ref-iterator would take and losing data
> is serous enough that I'd like to fix it soon. If we go with "fix soon
> then convert to ref-iterator later", then I will drop the
> logs/bisect/* check, checking logs/HEAD alone should be good enough.
> Otherwise I'll prepare a series on top of ref-iterator.

Fixing reachability via the index and detached HEADs feels relatively
important.

Fixing refs/bisect/* feels a bit less urgent, because (1) usually
bisections don't take very long, and (2) if the commits that one is
bisecting are not otherwise reachable anymore, then the bisection is
probably not interesting anymore. However, these are real references, so
if they get broken, the worktree will be seen as corrupt and recovery is
not super obvious (I guess most people would end up deleting the whole
worktree).

Fixing the reflogs for HEAD and (especially) refs/bisect/* in worktrees
feels even less important, because reflogs are not nearly as important
as current ref values, Git is relatively tolerant of broken reflog
entries, and there are easy ways to prune them if breakage occurs.

It's hard for me to predict when the ref-iterator stuff will be merged.
It is a big change, but so far the feedback seems pretty good. I can
tell you that pushing it and ref-stores forward is high on my priority list.

Michael

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] strbuf: improve API

2016-06-02 Thread Michael Haggerty
On 06/01/2016 11:07 PM, Jeff King wrote:
> On Wed, Jun 01, 2016 at 03:42:18AM -0400, Jeff King wrote:
> 
>> I have no idea if those ideas would work. But I wouldn't want to start
>> looking into either of them without some idea of how much time we're
>> actually spending on strbuf mallocs (or how much time we would spend if
>> strbufs were used in some proposed sites).
> 
> So I tried to come up with some numbers.
> 
> Here's an utterly silly use of strbufs, but one that I think should
> over-emphasize the effect of any improvements we make:
> 
> diff --git a/Makefile b/Makefile
> index 7a0551a..72b968a 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -579,6 +579,7 @@ PROGRAM_OBJS += shell.o
>  PROGRAM_OBJS += show-index.o
>  PROGRAM_OBJS += upload-pack.o
>  PROGRAM_OBJS += remote-testsvn.o
> +PROGRAM_OBJS += foo.o
>  
>  # Binary suffix, set to .exe for Windows builds
>  X =
> diff --git a/foo.c b/foo.c
> index e69de29..b62dd97 100644
> --- a/foo.c
> +++ b/foo.c
> @@ -0,0 +1,18 @@
> +#include "git-compat-util.h"
> +#include "strbuf.h"
> +
> +int main(void)
> +{
> + const char *str = "this is a string that we'll repeatedly insert";
> + size_t len = strlen(str);
> +
> + int i;
> + for (i = 0; i < 100; i++) {
> + struct strbuf buf = STRBUF_INIT;
> + int j;
> + for (j = 0; j < 500; j++)
> + strbuf_add(, str, len);
> + strbuf_release();
> + }
> + return 0;
> +}

Thanks for generating actual data.

Your benchmark could do two things to stress malloc()/free()
even more. I'm not claiming that this makes the benchmark more typical
of real-world use, but it maybe gets us closer to the theoretical upper
limit on improvement.

1. Since strbuf_grow() allocates new space in geometrically increasing
sizes, the number of mallocs needed to do N strbuf_add()s increases only
like log(N). So decreasing the "500" would increase the fraction of the
time spent on allocations.

2. Since the size of the string being appended increases the time spent
copying bytes, without appreciably changing the number of allocations
(also because of geometric growth), decreasing the length of the string
would also increase the fraction of the time spent on allocations.

I put those together as options, programmed several variants of the
string-concatenating loop, and added a perf script to run them; you can
see the full patch here:


https://github.com/mhagger/git/commit/b417935a4425e0f2bf62e59a924dc652bb2eae0c

The guts look like this:

>   int j;
>   if (variant == 0) {
>   /* Use buffer allocated a single time */
>   char *buf = big_constant_lifetime_buf;
> 
>   for (j = 0; j < reps; j++)
>   strcpy(buf + j * len, str);
>   } else if (variant == 1) {
>   /* One correct-sized buffer malloc per iteration */
>   char *buf = xmalloc(reps * len + 1);
> 
>   for (j = 0; j < reps; j++)
>   strcpy(buf + j * len, str);
> 
>   free(buf);
>   } else if (variant == 2) {
>   /* Conventional use of strbuf */
>   struct strbuf buf = STRBUF_INIT;
> 
>   for (j = 0; j < reps; j++)
>   strbuf_add(, str, len);
> 
>   strbuf_release();
>   } else if (variant == 3) {
>   /* strbuf initialized to correct size */
>   struct strbuf buf;
>   strbuf_init(, reps * len);
> 
>   for (j = 0; j < reps; j++)
>   strbuf_add(, str, len);
> 
>   strbuf_release();
>   } else if (variant == 4) {
>   /*
>* Simulated fixed strbuf with correct size.
>* This code only works because we know how
>* strbuf works internally, namely that it
>* will never realloc() or free() the buffer
>* that we attach to it.
>*/
>   struct strbuf buf = STRBUF_INIT;
>   strbuf_attach(, big_constant_lifetime_buf, 0, reps * len + 
> 1);
> 
>   for (j = 0; j < reps; j++)
>   strbuf_add(, str, len);
> 
>   /* No strbuf_release() here! */
>   }

I ran this for a short string ("short") and a long string ("this is a
string that we will repeatedly insert"), and also concatenating the
string 5, 20, or 500 times. The number of loops around the whole program
is normalized to make the total number of concatenations approximately
constant. Here are the full results:


   time (s)
Test   0  1  2  3  4

5 short strings 1.64   3.37   8.72   6.08   3.65
20 short strings1.72   2.12   5.43   4.01   3.39
500 short strings   1.62   1.61   3.36   3.26   3.10
5 long strings  2.08   6.64  13.09   8.50   3.79
20 long strings 2.16   3.33   7.03 

Re: [PATCH 09/13] refs: introduce an iterator interface

2016-06-02 Thread Duy Nguyen
On Mon, May 30, 2016 at 2:55 PM, Michael Haggerty  wrote:
> Currently, the API for iterating over references is via a family of
> for_each_ref()-type functions that invoke a callback function for each
> selected reference. All of these eventually call do_for_each_ref(),
> which knows how to do one thing: iterate in parallel through two
> ref_caches, one for loose and one for packed refs, giving loose
> references precedence over packed refs. This is rather complicated code,
> and is quite specialized to the files backend. It also requires callers
> to encapsulate their work into a callback function, which often means
> that they have to define and use a "cb_data" struct to manage their
> context.
>
> The current design is already bursting at the seams, and will become
> even more awkward in the upcoming world of multiple reference storage
> backends:
>
> * Per-worktree vs. shared references are currently handled via a kludge
>   in git_path() rather than iterating over each part of the reference
>   namespace separately and merging the results. This kludge will cease
>   to work when we have multiple reference storage backends.

Question from a refs user. Right now worktree.c:get_worktrees() peeks
directly to "$GIT_DIR/worktrees/xxx/HEAD" and parses the content
itself, something that I  promised to fix but never got around to do
it. Will we have an iterator to go through all worktrees' HEAD, or
will  there be an API to say "resolve ref HEAD from worktree XYZ"?
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] Fix prune/gc problem with multiple worktrees

2016-06-02 Thread Duy Nguyen
(from patch 4/4 mail)

On Wed, Jun 1, 2016 at 10:51 PM, Michael Haggerty  wrote:
>> + path = xstrdup(worktree_git_path(wt, "logs/refs/bisect"));
>> + if (file_exists(path))
>> + handle_one_reflog(path, NULL, 0, );
>> + free(path);
>> +}
>
> `refs/bisect` is not a single reference. It is a namespace that contains
> references with names like `refs/bisect/bad` and
> `refs/bisect/good-66106691a1b71e445fe5e4d6b8b043dffc7dfe4c`.

Yeah I missed that. I'm not going to write another directory walker to
collect all logs/refs/bisect/*. I didn't add pending objects for
refs/bisect/* of other worktrees either. At that point waiting for the
new ref iterator makes more sense...

On Wed, Jun 1, 2016 at 11:06 PM, Junio C Hamano  wrote:
> Nguyễn Thái Ngọc Duy   writes:
>
>> This series makes sure that objects referenced by all worktrees are
>> marked reachable so that we don't accidentally delete objects that are
>> being used. Previously per-worktree references in index, detached HEAD
>> or per-worktree reflogs come from current worktree only, not all
>> worktrees.
>
> I'll let this topic simmer on the list for now, instead of picking
> it up immediately to 'pu', as Michael in $gmane/296068 makes me
> wonder if we want to keep piling on the current "worktree ref
> iterations are bolted on" or if we want to first clean it up, whose
> natural fallout hopefully would eliminate the bug away.

So what should be the way forward? My intention was having something
that can fix the problem for now, even if a bit hacky while waiting
for ref iterator to be ready, then convert to use it and clean things
up, because I don't how long ref-iterator would take and losing data
is serous enough that I'd like to fix it soon. If we go with "fix soon
then convert to ref-iterator later", then I will drop the
logs/bisect/* check, checking logs/HEAD alone should be good enough.
Otherwise I'll prepare a series on top of ref-iterator.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/6] worktree.c: find_worktree() learns to identify worktrees by basename

2016-06-02 Thread Duy Nguyen
On Thu, Jun 2, 2016 at 1:44 AM, Junio C Hamano  wrote:
>> We would
>> need to convert or match both '/' and '\' in "to/foo" case because of
>> Windows, so it's not much easier than basename().
>
> I never said "easier to implement".  But can this codepath get
> backslashed paths in the first place?  I somehow thought that
> normalization would happen a lot before the control reaches here.
>
> You'll be calling into fspathcmp() anyway; shouldn't the function
> know that '/' and '\' are equivalent on some platforms, or is it
> legal to only call fspathcmp() on a single path component without
> directory separator?

We still need to calculate the length to compare, which could be
problematic when utf-8 is involved, or some other encoding. If we
always split at '/' boundary though (e.g. "abc/def/ghi", "def/ghi" or
"ghi" but never "ef/ghi") then it should be ok.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] reachable.c: mark reachable objects in index from all worktrees

2016-06-02 Thread Duy Nguyen
On Thu, Jun 2, 2016 at 1:57 AM, David Turner  wrote:
>> + struct index_state istate;
>> +
>> + memset(, 0, sizeof(istate));
>
>
> Why not just struct index_state istate = {0}; ?
>

My first thought was.. "hmm.. C99?" but then there are 24 of them in
the code base already. Will change.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] reachable.c: mark reachable objects in index from all worktrees

2016-06-02 Thread Duy Nguyen
On Thu, Jun 2, 2016 at 1:13 AM, Eric Sunshine  wrote:
> On Wed, Jun 1, 2016 at 6:45 AM, Nguyễn Thái Ngọc Duy  
> wrote:
>> Current mark_reachable_objects() only marks objects from index from
>> _current_ worktree as reachable instead of all worktrees. Because this
>> function is used for pruning, there is a chance that objects referenced
>> by other worktrees may be deleted. Fix that.
>>
>> Small behavior change in "one worktree" case, the index is read again
>> from file. In the current implementation, if the_index is already
>> loaded, the index file will not be read from file again. This adds some
>> more cost to this operation, hopefully insignificant because
>> reachability test is usually very expensive already.
>
> Could this extra index read be avoided by taking advantage of 'struct
> worktree::is_current'[1] and passing the already-loaded index to
> add_index_objects_to_pending() if true?
>
> Or, am I misunderstanding the issue?
>
> [1]: http://article.gmane.org/gmane.comp.version-control.git/292194

Hah! I broke my worktree changes into many pieces and lost track of
them. I thought that one was still in some unsent series. Will use it.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] pretty: support "mboxrd" output format

2016-06-02 Thread Eric Wong
Eric Wong  wrote:
> Eric Sunshine  wrote:
> > On Tue, May 31, 2016 at 3:45 AM, Eric Wong  wrote:
> > > Eric Sunshine  wrote:
> > >> I wonder if hand-coding, rather than using a regex, could be an 
> > >> improvement:
> > >>
> > >> static int is_mboxrd_from(const char *s, size_t n)
> > >> {
> > >> size_t f = strlen("From ");
> > >> const char *t = s + n;
> > >>
> > >> while (s < t && *s == '>')
> > >> s++;
> > >> return t - s >= f && !memcmp(s, "From ", f);
> > >> }
> > >>
> > >> or something.
> > >
> > > Yikes.  I mostly work in high-level languages and do my best to
> > > avoid string parsing in C; so that scares me.  A lot.
> > 
> > The hand-coded is_mboxrd_from() above is pretty much idiomatic C and
> > (I think) typical of how such a function would be coded in Git itself,
> > so it looks normal and easy to grok to me (but, of course, I'm
> > probably biased since I wrote it).

For reference, here is the gfrom function from qmail (gfrom.c,
source package netqmail-1.06 in Debian, reformatted git style)

int gfrom(char *s, int len)
{
while ((len > 0) && (*s == '>')) {
++s;
--len;
}

return (len >= 5) && !str_diffn(s, "From ", 5);
}

Similar to yours, but a several small things improves
readability for me:

* the avoidance of subtraction from the "return" conditional
* s/n/len/ variable name
* extra parentheses
* removal of "t" variable (t for "terminal/termination"?)

str_diffn is memcmp-like, I assume.  My eyes glazed over
when I saw that function implemented in str_diffn.c, too.

Just thinking out loud, with sufficient tests I could go with
either.  Will reroll when/if I get the chance tomorrow.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html