Re: [PATCH 00/17] Clean up how fetch_pack() handles the heads list

2012-09-02 Thread Michael Haggerty
On 08/23/2012 10:31 PM, Jeff King wrote:
 On Thu, Aug 23, 2012 at 03:56:48PM -0400, Jeff King wrote:
 
 This code blames back to:

  commit 4bcb310c2539b66d535e87508d1b7a90fe29c083
  Author: Alexandre Julliard julli...@winehq.org
  Date:   Fri Nov 24 16:00:13 2006 +0100

 fetch-pack: Do not fetch tags for shallow clones.

 A better fix may be to only fetch tags that point to commits that we
 are downloading, but git-clone doesn't have support for following
 tags. This will happen automatically on the next git-fetch though.

 So it is about making sure that git clone --depth=1 does not
 accidentally pull a single commit from v1.0, v1.1, v1.2, and so forth,
 losing the purpose of using --depth in the first place. These days it is
 largely irrelevant, since this code path is not followed by clone, and
 clone will automatically restrict its list of fetched refs to a single
 branch if --depth is used.
 
 I think part of the confusion of this code is that inside the loop over
 the refs it is sometimes checking aspects of the ref, and sometimes
 checking invariants of the loop (like args.fetch_all). Splitting it into
 separate loops makes it easier to see what is going on, like the patch
 below (on top of Michael's series).
 
 I'm not sure if it ends up being more readable, since the generic cut
 down this linked list function has to operate through callbacks with a
 void pointer. On the other hand, that function could also be used
 elsewhere.
 
 diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
 index 90683ca..877cf38 100644
 --- a/builtin/fetch-pack.c
 +++ b/builtin/fetch-pack.c
 @@ -521,51 +521,80 @@ static void mark_recent_complete_commits(unsigned long 
 cutoff)
   }
  }
  
 -static void filter_refs(struct ref **refs, int *nr_heads, char **heads)
 +static void filter_refs_callback(struct ref **refs,
 +  int (*want)(struct ref *, void *),
 +  void *data)
  {
 - struct ref *newlist = NULL;
 - struct ref **newtail = newlist;
 + struct ref **tail = refs;
   struct ref *ref, *next;
 - int match_pos = 0, unmatched = 0;
  
   for (ref = *refs; ref; ref = next) {
 - int keep_ref = 0;
   next = ref-next;
 - if (!memcmp(ref-name, refs/, 5) 
 - check_refname_format(ref-name, 0))
 - ; /* trash */
 - else if (args.fetch_all 
 -(!args.depth || prefixcmp(ref-name, refs/tags/)))
 - keep_ref = 1;
 - else
 - while (match_pos  *nr_heads) {
 - int cmp = strcmp(ref-name, heads[match_pos]);
 - if (cmp  0) { /* definitely do not have it */
 - break;
 - } else if (cmp == 0) { /* definitely have it */
 - free(heads[match_pos++]);
 - keep_ref = 1;
 - break;
 - } else { /* might have it; keep looking */
 - heads[unmatched++] = heads[match_pos++];
 - }
 - }
 -
 - if (keep_ref) {
 - *newtail = ref;
 - ref-next = NULL;
 - newtail = ref-next;
 - } else {
 + if (want(ref, data))
 + tail = ref-next;
 + else {
   free(ref);
 + *tail = next;
   }
   }
 +}
  
 - /* copy any remaining unmatched heads: */
 - while (match_pos  *nr_heads)
 - heads[unmatched++] = heads[match_pos++];
 - *nr_heads = unmatched;
 +static int ref_name_is_ok(struct ref *ref, void *data)
 +{
 + return memcmp(ref-name, refs/, 5) ||
 + !check_refname_format(ref-name, 0);
 +}
 +
 +static int ref_ok_for_shallow(struct ref *ref, void *data)
 +{
 + return prefixcmp(ref-name, refs/tags/);
 +}
  
 - *refs = newlist;
 +struct filter_by_name_data {
 + char **heads;
 + int nr_heads;
 + int match_pos;
 + int unmatched;
 +};
 +
 +static int want_ref_name(struct ref *ref, void *data)
 +{
 + struct filter_by_name_data *f = data;
 +
 + while (f-match_pos  f-nr_heads) {
 + int cmp = strcmp(ref-name, f-heads[f-match_pos]);
 + if (cmp  0) /* definitely do not have it */
 + return 0;
 + else if (cmp == 0) { /* definitely have it */
 + free(f-heads[f-match_pos++]);
 + return 1;
 + } else /* might have it; keep looking */
 + f-heads[f-unmatched++] = f-heads[f-match_pos++];
 + }
 + return 0;
 +}
 +
 +static void filter_refs(struct ref **refs, int *nr_heads, char **heads)
 +{
 + struct filter_by_name_data f;
 +

Re: [PATCH 9/9] Add git-check-ignores

2012-09-02 Thread Adam Spiers
Hi there,

Firstly, thanks for the quick feedback!

On Sun, Sep 2, 2012 at 11:41 AM, Nguyen Thai Ngoc Duy pclo...@gmail.com wrote:
 On Sun, Sep 2, 2012 at 7:12 AM, Adam Spiers g...@adamspiers.org wrote:
 This works in a similar manner to git-check-attr.  Some code
 was reused from add.c by refactoring out into pathspec.c.

 Thanks, comments from a quick glance. First of all, can we make it
 work (or share code) with .gitattributes? We may need to debug
 .gitattributes as well as .gitignore. A common command would be nice.

I'm no expert on .gitattributes and check-attr, but AFAICS, all the
opportunities to share code in the plumbing and front-end seem to be
taken already, e.g. the directory traversal and path handling.  The
CLI argument parsing is necessarily different because check-attr
requires a list of attributes as well as a list of files, and of
course the output routines have to be different too.

The only opportunity for code reuse which I saw but /didn't/ take was
around the --stdin line parsing code which is duplicated between:

check_attr_stdin_paths
check_ignore_stdin_paths
cmd_checkout_index
cmd_update_index
hash_stdin_paths

I attempted to refactor these, but quickly realised that due to the
lack of proper closures in C, the overheads and complexity incurred by
performing such a refactoring probably outweighed the benefits, so I
gave up on the idea.

Having said that, I'm totally open to suggestions if you can spot
other places where code could be reused :)

 +SYNOPSIS
 +
 +[verse]
 +'git check-ignore' pathname...
 +'git check-ignore' --stdin [-z]  list-of-paths

 Also --quiet option, where check-ignore returns 0 if the given path is
 ignored, 1 otherwise?

I considered that, but couldn't think of appropriate behaviour when
multiple paths are given, so in the end I decided to remain consistent
with check-attr, which always returns 0.  But I'm happy to change it
if you can think of a more useful behaviour.  For example we could
have a --count option which produces no output but has an exit status
corresponding to the number of ignored files.

  - If many paths are given, then perhaps we could print ignored paths
 (no extra info).

How is this different to git ls-files -i -o ?

  - Going to the next level, we could print path and the the location
 of the final exclude/include rule (file and line number).

That's the current behaviour, and I believe it covers the most common
use case.

  - For debugging, given one path, we print all the rules that are
 applied to it, which may help understand how/why it goes wrong.

That would be nice, but I'm not sure it's a tremendously common use
case.  Could you think of a scenario in which it would be useful?  I
guess it could be done by adding a new DIR_DEBUG_IGNORED flag to
dir_struct which would make the exclude matcher functions collect all
matching patterns, rather than just returning the first one.  This in
turn would require another field for collecting all matched patterns.

 @@ -338,6 +338,7 @@ static void handle_internal_command(int argc, const char 
 **argv)
 { bundle, cmd_bundle, RUN_SETUP_GENTLY },
 { cat-file, cmd_cat_file, RUN_SETUP },
 { check-attr, cmd_check_attr, RUN_SETUP },
 +   { check-ignore, cmd_check_ignore, RUN_SETUP | 
 NEED_WORK_TREE },
 { check-ref-format, cmd_check_ref_format },
 { checkout, cmd_checkout, RUN_SETUP | NEED_WORK_TREE },
 { checkout-index, cmd_checkout_index,

 I don't think we really need NEED_WORK_TREE here. .gitignore can be
 read from index only.

I thought about that, but in the end I decided it probably didn't make
sense, because none of the exclude matching routines match against the
index - they all match against the working tree and core.excludesfile.
This would also require changing the matching logic to honor the index,
but I didn't see the benefit in doing that, since all operations which
involve excludes (add, status, etc.) relate to a work tree.

But as with all of the above, please don't hesitate to point out if
I've missed something.  You guys are the experts, not me ;-)

Thanks again,
Adam
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/9] For each exclude pattern, store information about where it came from

2012-09-02 Thread Philip Oakley

From: Adam Spiers g...@adamspiers.org
Sent: Sunday, September 02, 2012 1:12 AM
Subject: [PATCH 6/9] For each exclude pattern, store information about 
where it came from



For exclude patterns read in from files, the filename is stored 
together

with the corresponding line number (counting starting at 1).


Is there a way to identify the config core.excludesfile if present? i.e. 
that it is from that config variable, rather than directory traversal.


This came up in a recent 
http://stackoverflow.com/questions/12199150/effective-gitignore-and-clean-strategy/12205852#12205852 
discussion.




For exclude patterns provided on the command line, the sequence number
is negative, with counting starting at -1, so for example the 2nd
pattern provided via --exclude would be numbered -2.  This allows any
future consumers of that data to easily distinguish between exclude
patterns from files vs. from the CLI.

Signed-off-by: Adam Spiers g...@adamspiers.org
---
builtin/clean.c|  2 +-
builtin/ls-files.c |  3 ++-
dir.c  | 25 +++--
dir.h  |  5 -
4 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 0c7b3d0..f618231 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -99,7 +99,7 @@ int cmd_clean(int argc, const char **argv, const 
char *prefix)


 for (i = 0; i  exclude_list.nr; i++)
 add_exclude(exclude_list.items[i].string, , 0,
- dir.exclude_list[EXC_CMDL]);
+ dir.exclude_list[EXC_CMDL], --exclude option, -(i+1));

 pathspec = get_pathspec(prefix, argv);

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 31b3f2d..420ff40 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -35,6 +35,7 @@ static int error_unmatch;
static char *ps_matched;
static const char *with_tree;
static int exc_given;
+static int exclude_args = 0;

static const char *tag_cached = ;
static const char *tag_unmerged = ;
@@ -423,7 +424,7 @@ static int option_parse_exclude(const struct 
option *opt,

 struct exclude_list *list = opt-value;

 exc_given = 1;
- add_exclude(arg, , 0, list);
+ add_exclude(arg, , 0, list, --exclude option, --exclude_args);

 return 0;
}
diff --git a/dir.c b/dir.c
index 3d438c3..ac8c838 100644
--- a/dir.c
+++ b/dir.c
@@ -311,7 +311,7 @@ static int no_wildcard(const char *string)
}

void add_exclude(const char *string, const char *base,
- int baselen, struct exclude_list *el)
+ int baselen, struct exclude_list *el, const char *src, int srcpos)
{
 struct exclude *x;
 size_t len;
@@ -341,6 +341,8 @@ void add_exclude(const char *string, const char 
*base,

 x-base = base;
 x-baselen = baselen;
 x-flags = flags;
+ x-src = src;
+ x-srcpos = srcpos;
 if (!strchr(string, '/'))
 x-flags |= EXC_FLAG_NODIR;
 x-nowildcardlen = simple_length(string);
@@ -393,7 +395,7 @@ int add_excludes_from_file_to_list(const char 
*fname,

int check_index)
{
 struct stat st;
- int fd, i;
+ int fd, i, lineno = 1;
 size_t size = 0;
 char *buf, *entry;

@@ -436,8 +438,9 @@ int add_excludes_from_file_to_list(const char 
*fname,

 if (buf[i] == '\n') {
 if (entry != buf + i  entry[0] != '#') {
 buf[i - (i  buf[i-1] == '\r')] = 0;
- add_exclude(entry, base, baselen, el);
+ add_exclude(entry, base, baselen, el, fname, lineno);
 }
+ lineno++;
 entry = buf + i + 1;
 }
 }
@@ -472,8 +475,10 @@ static void prep_exclude(struct dir_struct *dir, 
const char *base, int baselen)

 !strncmp(dir-basebuf, base, stk-baselen))
 break;
 dir-exclude_stack = stk-prev;
- while (stk-exclude_ix  el-nr)
- free(el-excludes[--el-nr]);
+ while (stk-exclude_ix  el-nr) {
+ struct exclude *exclude = el-excludes[--el-nr];
+ free(exclude);
+ }
 free(stk-filebuf);
 free(stk);
 }
@@ -500,7 +505,15 @@ static void prep_exclude(struct dir_struct *dir, 
const char *base, int baselen)

 memcpy(dir-basebuf + current, base + current,
stk-baselen - current);
 strcpy(dir-basebuf + stk-baselen, dir-exclude_per_dir);
- add_excludes_from_file_to_list(dir-basebuf,
+
+ /* dir-basebuf gets reused by the traversal, but we
+ * need fname to remain unchanged to ensure the src
+ * member of each struct exclude correctly back-references
+ * its source file.
+ */
+ char *fname = strdup(dir-basebuf);
+
+ add_excludes_from_file_to_list(fname,
dir-basebuf, stk-baselen,
stk-filebuf, el, 1);
 dir-exclude_stack = stk;
diff --git a/dir.h b/dir.h
index 1b4f9dc..81efee4 100644
--- a/dir.h
+++ b/dir.h
@@ -31,6 +31,9 @@ struct exclude_list {
 int baselen;
 int to_exclude;
 int flags;
+ const char *src;
+ int srcpos; /* counting starts from 1 for line numbers in ignore 
files,

+and from -1 decrementing for patterns from CLI (--exclude) */
 } **excludes;
};

@@ -123,7 +126,7 @@ extern int add_excludes_from_file_to_list(const 
char *fname, const char *base, i

   char **buf_p, struct exclude_list *el, int check_index);
extern void add_excludes_from_file(struct dir_struct *, const char 
*fname);

extern void add_exclude(const char *string, const char *base,

Re: [PATCH v4] Thunderbird: fix appp.sh format problems

2012-09-02 Thread Junio C Hamano
Johannes Sixt j...@kdbg.org writes:

 Am 31.08.2012 16:09, schrieb Marco Stornelli:
 +CCS=`perl -e 'local $/=undef; open FILE, $ENV{'PATCHTMP'}; $text=FILE;
 +close FILE; $addr = $1 if $text =~ /Cc: (.*?(,\n .*?)*)\n/s; $addr =~ 
 s/\n//g;
 +print $addr;'`

 The quoting is broken in this line (sq within sq does not work).

As you said later in the thread, perl lets you be loose and say
$ENV{PATCHTMP} without quoting the string PATCHTMP, so it is not
quite _broken_ per-se, but the above gives a false impression to
readers that the author meant to feed perl

open FILE, $ENV{'PATCHTMP'};

which is not happening, so at least it is misleading.

 Am I correct that you intend to treat continuation lines with this
 non-trivial perl script?

As the above regexp seems to try to match

Cc: marco, git,
  j6t, other recipient

I think that indeed is the intent.

 All this processing gets tedious and
 unreadable. Let me suggest this pattern instead:

 # translate to Thunderbird language
 LANG_TO=To:
 LANG_SUBJ=Subject:
 LANG_CC=Cc:

 LF=   # terminates the _previous_ line
 while read -r line
 do
   case $line in
   'To: '*)
   printf ${LF}%s $LANG_TO ${line#To: }
   ;;
   'Cc: '*) ...similar...
   'Subject: '*) ...similar...
   ' '*)   # continuation line
   printf %s $line
   ;;
   '')
   print ${LF}\n
   cat
   ;;
   esac
   LF='\n'
 done $PATCH $1

I think that is much more readable.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] Thunderbird: fix appp.sh format problems

2012-09-02 Thread Junio C Hamano
Marco Stornelli marco.storne...@gmail.com writes:

 I also wonder what would happen if To: and Cc: in the input were
 split into continuation lines, but that was already present in the

 Do you mean To: mail1,.mailN\nCc: mail1,.mailN?

No, I meant To: mail1,...\n mailN\n.

But see my response to J6t's message.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/9] For each exclude pattern, store information about where it came from

2012-09-02 Thread Junio C Hamano
Philip Oakley philipoak...@iee.org writes:

 Is there a way to identify the config core.excludesfile if present?
 i.e. that it is from that config variable, rather than directory
 traversal.

If the code handles $GIT_DIR/info/exclude then that configuration
would also be handled the same way, no?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] Thunderbird: fix appp.sh format problems

2012-09-02 Thread Junio C Hamano
Marco Stornelli marco.storne...@gmail.com writes:

 Il 01/09/2012 15:59, Johannes Sixt ha scritto:

 Look how you write:

perl -e '... $ENV{'PATCHTMP'} ...'

 That is, perl actually sees this script:

... $ENV{PATCHTMP} ...

 (no quotes around PATCHTMP). That my be perfectly valid perl, but is not
 what you intended.

 I don't understand what you mean when you say is not what you
 intended...

When you wrote this:

CCS=`perl -e 'local $/=undef; open FILE, $ENV{'PATCHTMP'};...

which one of the following did you mean to feed Perl?

 (1) open FILE, $ENV{'PATCHTMP'};
 (2) open FILE, $ENV{PATCHTMP};

The patch text makes it look as if you wanted to do (1), but what is
fed to perl is (2), as Johannes points out.

Saying:

CCS=`perl -e 'local $/=undef; open FILE, $ENV{PATCHTMP};...

would have been more natural if you really meant to do (2), don't
you think?  So what you wrote is at least misleading.

But I think I agree with Johannes's rewrite of the loop, so this may
be a moot point.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/9] new git check-ignore sub-command

2012-09-02 Thread Junio C Hamano
Adam Spiers g...@adamspiers.org writes:

 I was browsing stackoverflow the other day and came across this question:

 
 http://stackoverflow.com/questions/12144633/which-gitignore-rule-is-ignoring-my-file/

 A quick google revealed this thread from 2009:

 http://thread.gmane.org/gmane.comp.version-control.git/108671/focus=108815

 where Junio and Jeff discussed the possibility of adding a new `git
 check-ignore' subcommand somewhat analogous to the existing `git
 check-attr', and suggested the beginnings of an implementation.  It
 struck me that it might not be too hard to follow these ideas to their
 natural conclusion, so I decided it would make a fun project :-)

Thanks.  I wish there are more people like you ;-)

As to styles, I spotted only three kinds of Huh?:

 * do not initialise statics to 0 or NULL, e.g.

-static int exclude_args = 0;
+static int exclude_args;

 * avoid unnnecessary braces {} around single statement blocks, e.g.

-if (exclude) {
+if (exclude)
return exclude;
-}

 * else should follow close brace '}' of if clause, e.g.

 if (...) {
 ...
-}
-else {
+} else {
 ...

For reviews on substance, please see other messages from me.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/9] Rename cryptic 'which' variable to more consistent name

2012-09-02 Thread Junio C Hamano
Adam Spiers g...@adamspiers.org writes:

 'el' is only *slightly* less cryptic, but is already used as the
 variable name for a struct exclude_list pointer in numerous other
 places, so this reduces the number of cryptic variable names in use by
 one :-)

The name originally meant to mean to which element of the array
dir_struct.exclude_list[] are we adding this entry? but I agree
el that stands for ExcludeList would be a better name.

Often we use el (or elem) for elements of an iterable we are
iterating on in a loop, and the name of the iterable does not have
to be EsomethingLsomething, by the way.  Because no existing use of
el in this file is of that kind, I do not think this change
introduces new confusion to the code.

 Signed-off-by: Adam Spiers g...@adamspiers.org
 ---
  dir.c | 10 +-
  dir.h |  4 ++--
  2 files changed, 7 insertions(+), 7 deletions(-)

 diff --git a/dir.c b/dir.c
 index c9f341a..57a5d11 100644
 --- a/dir.c
 +++ b/dir.c
 @@ -311,7 +311,7 @@ static int no_wildcard(const char *string)
  }
  
  void add_exclude(const char *string, const char *base,
 -  int baselen, struct exclude_list *which)
 +  int baselen, struct exclude_list *el)
  {
   struct exclude *x;
   size_t len;
 @@ -346,8 +346,8 @@ void add_exclude(const char *string, const char *base,
   x-nowildcardlen = simple_length(string);
   if (*string == '*'  no_wildcard(string+1))
   x-flags |= EXC_FLAG_ENDSWITH;
 - ALLOC_GROW(which-excludes, which-nr + 1, which-alloc);
 - which-excludes[which-nr++] = x;
 + ALLOC_GROW(el-excludes, el-nr + 1, el-alloc);
 + el-excludes[el-nr++] = x;
  }
  
  static void *read_skip_worktree_file_from_index(const char *path, size_t 
 *size)
 @@ -389,7 +389,7 @@ int add_excludes_from_file_to_list(const char *fname,
  const char *base,
  int baselen,
  char **buf_p,
 -struct exclude_list *which,
 +struct exclude_list *el,
  int check_index)
  {
   struct stat st;
 @@ -436,7 +436,7 @@ int add_excludes_from_file_to_list(const char *fname,
   if (buf[i] == '\n') {
   if (entry != buf + i  entry[0] != '#') {
   buf[i - (i  buf[i-1] == '\r')] = 0;
 - add_exclude(entry, base, baselen, which);
 + add_exclude(entry, base, baselen, el);
   }
   entry = buf + i + 1;
   }
 diff --git a/dir.h b/dir.h
 index a226fbc..549a187 100644
 --- a/dir.h
 +++ b/dir.h
 @@ -117,10 +117,10 @@ extern int path_excluded(struct path_exclude_check *, 
 const char *, int namelen,
  
  
  extern int add_excludes_from_file_to_list(const char *fname, const char 
 *base, int baselen,
 -   char **buf_p, struct exclude_list 
 *which, int check_index);
 +   char **buf_p, struct exclude_list 
 *el, int check_index);
  extern void add_excludes_from_file(struct dir_struct *, const char *fname);
  extern void add_exclude(const char *string, const char *base,
 - int baselen, struct exclude_list *which);
 + int baselen, struct exclude_list *el);
  extern void free_excludes(struct exclude_list *el);
  extern int file_exists(const char *);
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 9/9] Add git-check-ignores

2012-09-02 Thread Junio C Hamano
Adam Spiers g...@adamspiers.org writes:

 +SYNOPSIS
 +
 +[verse]
 +'git check-ignore' pathname...
 +'git check-ignore' --stdin [-z]  list-of-paths

 Also --quiet option, where check-ignore returns 0 if the given path is
 ignored, 1 otherwise?

I agree that multiple paths are problematic.

We could error out if multiple paths are given with --quiet until we
figure out what the useful result would be in such a case, and still
give a useful answer to callers that feed a single path, though.

That may encourage suboptimal coding to casual Porcelain writers,
i.e. it would allow

for path in $paths
do
if git check-ignore -q $path
then
do something to $path
fi
done

even though we would rather want to encourage

git check-ignore --name-only $paths |
while read path
do
do something to $path
done

But from lay-scriptors' point of view, being able to easily write a
script (even though it may be inefficient) to do the job at hand is
far better than having to give up writing one because the tool does
not allow easy-and-stupid scripting, so it is not exactly a huge
downside.

  - If many paths are given, then perhaps we could print ignored paths
 (no extra info).

 How is this different to git ls-files -i -o ?

I personally think the parts of ls-files that deal with paths not in
the index outlived its usefulness ;-) and users deserve to be given
a better UI.

  - Going to the next level, we could print path and the the location
 of the final exclude/include rule (file and line number).

 That's the current behaviour, and I believe it covers the most common
 use case.

Yes; I have a reservation on your output format, though.

  - For debugging, given one path, we print all the rules that are
 applied to it, which may help understand how/why it goes wrong.

I do not think that would be terribly useful.  Maybe for people who
are learning how dir.c internally works, but not for people who are
trying to improve the set of .gitignore files in their project.

 I thought about that, but in the end I decided it probably didn't make
 sense, because none of the exclude matching routines match against the
 index - they all match against the working tree and core.excludesfile.
 This would also require changing the matching logic to honor the index,
 but I didn't see the benefit in doing that, since all operations which
 involve excludes (add, status, etc.) relate to a work tree.

The mechanism primarily is to see if a path in the working tree is a
cruft or a valuable still to be added; I am OK with NEED_WORK_TREE;
when we have a useful case to run this in a bare repository, we can
lift it.

As with the what to do with multiple paths and -q, it is better to
start with feature set to cover only the known or easily anticipated
use cases, rejecting the cases for which good semantics are not
thought out.

An alternative would be a code that operates sanely only for known
or anticipated cases and do random things with irrational semantics
in other cases, and people start relying on the irrational behaviour
without realizing their input and the behaviour they are seeing are
not something that the feature is designed to for, but whatever the
code with loose precondition checking happens to do.  We do not want
to repeat that kind of mistake, which is hard to fix in future
versions.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 9/9] Add git-check-ignores

2012-09-02 Thread Junio C Hamano
Adam Spiers g...@adamspiers.org writes:

 +OPTIONS
 +---
 +--stdin::
 + Read file names from stdin instead of from the command-line.
 +
 +-z::
 + Only meaningful with `--stdin`; paths are separated with a
 + NUL character instead of a linefeed character.

On input, or on output, or both?

The answer should be both, otherwise you cannot safely handle
paths with funny character in your script, even if you wanted to.
Which means that this cannot only be meaningful with --stdin, I
think.

 +OUTPUT
 +--
 +
 +The output is a series of lines of the form:
 +
 +path COLON SP type SP pattern SP source SP position LF
 +
 +path is the path of a file being queried, type is either
 +'excluded' or 'included' (for patterns prefixed with '!'), pattern
 +is the matching pattern, source is the pattern's source file (either
 +as an absolute path or relative to the repository root), and
 +position is the position of the pattern within that source.

Let's step back a bit and think what this command is about.  What is
the reason why the user wants to run check-ignore $path in the
first place?  I think there are two (or three, depending on how you
count).

 (1) You have one (or more) paths at hand.  You want to know if it
 is (or some of them are) ignored, but you do not particularly
 care how they are ignored.  Think of implementing your own git
 add as a script.

 (2) You have one or more paths that are ignored but do not want
 them to be, and want to find out why they are.  

For the former, your script may want to see the paths sifted into
ignored bin and not-ignored bin, so

git check-ignore [-z] --name-only $paths

that gives you only the paths without any reason is more useful.
You also may want the opposite (show only paths not ignored), but
that can be computed easily by the script, so it is of lessor
importance.

For the latter, you are debugging the set of exclude sources and
want to learn where the decision to exclude it comes from.  For that
kind of use, it would be more useful if the output mimicked error
messages from the compilers and output from grep -n to show the
source, e.g.

.gitignore:13:/git-am   git-am

Emacs users can use M-x grepRETgit check-ignore -v git-amRET,
see the output, and find the hit in its output (I would imagine vim
would have a similar feature).  The output format would be something
like:

source COLON linenum COLON pattern HT pathname

I do not think you need excluded/included type as a separate item
in the non -z output, as it should be clear from the pattern.
Substitute cmdline (literally) as source for patterns obtained
from the command line.

I would also suggest to

 (1) make --name-only the default (i.e. no need to have the
 --name-only option);

 (2) give the version that mimicks grep -n when -v|--verbose is
 given; and

 (3) support --quiet; the command would exit with status 0 if
 _any_ of the paths given is ignored, or status 1 if none of the
 paths is ignored (or error out with die() when --quiet and
 multiple paths are given).

Regarding -z (for script consumption), I do not object to the
broken down format, e.g., check-ignore -z -v may give a sequence
of

pathname NUL type NUL pattern NUL source NUL position NUL

while check-ignore -z would give a sequence of

pathname NUL
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: diff/merge tool that ignores whitespace changes

2012-09-02 Thread Enrico Weigelt
Hi,

 Would that help ?
 git help diff
 [snip]
  --ignore-space-at-eol
Ignore changes in whitespace at EOL.
 
-b, --ignore-space-change
Ignore changes in amount of whitespace. This ignores
whitespace at
line end, and considers all other sequences of one or more
whitespace characters to be equivalent.
 
-w, --ignore-all-space
Ignore whitespace when comparing lines. This ignores
differences
even if one line has whitespace where the other line has
none.

That might be it :)
Now I yet need to find out how to configure tig for it.

By the way: anything similar for merge/rebase ?


thx
-- 
Mit freundlichen Grüßen / Kind regards 

Enrico Weigelt 
VNC - Virtual Network Consult GmbH 
Head Of Development 

Pariser Platz 4a, D-10117 Berlin
Tel.: +49 (30) 3464615-20
Fax: +49 (30) 3464615-59

enrico.weig...@vnc.biz; www.vnc.de 
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/9] For each exclude pattern, store information about where it came from

2012-09-02 Thread Philip Oakley

From: Junio C Hamano gits...@pobox.com
Sent: Sunday, September 02, 2012 8:02 PM

Philip Oakley philipoak...@iee.org writes:


Is there a way to identify the config core.excludesfile if present?
i.e. that it is from that config variable, rather than directory
traversal.


If the code handles $GIT_DIR/info/exclude then that configuration
would also be handled the same way, no?


Probably not. The $GIT_DIR/info/exclude is directly a path, while the 
core.excludesfile could point anywhere. This assumes the path to the 
relevant ignore file is shown.


Given the suggested report format in the Documentation, this path could 
be reported as 'coreexclude', not just an 'exclude'.


If I've understood the regular code correctly, the core.excludesfile is 
always at one end of the exclude struct so should be easy to check at 
that position. 


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] i18n.pathencoding

2012-09-02 Thread Robin Rosenberg

Torsten Bögershausen skrev 2012-09-01 08.11: Allow path names to be encoded in 
UTF-8 in the repository
 and checkout out as e.g. ISO-8859-1 in the working tree.

Ack for attempting this.

Did it myself if 2007, but times weren't ripe then, I guess.

 +i18n.pathEncoding::
 +  This option is only used by some implementations of git.
 +  When git init sets core.supportspathencoding to true,
 +  i18n.pathEncoding can be set to re-encode path names when
 +  a working tree is checked out.
 +  Path names may be e.g. encoded in ISO-8859-1 and are stored as
 +  UTF-8 encoded in the repository.
 +  When not set, the encoding of path names is the same in working tree
 +  and the repository.

If set, then core.precomposeunicode is ignored on Mac OS X.

 diff --git a/compat/reencode_pathname.c b/compat/reencode_pathname.c
 new file mode 100644
 index 000..3bdc776
 --- /dev/null
 +++ b/compat/reencode_pathname.c
 @@ -0,0 +1,441 @@
 +/*
 + * Converts pathnames from one encoding into another.
 + * The pathnames are stored as UTF-8 in the repository,
 + * and might be checkout out as e.g. ISO-8859-1 in the working tree
 + *
 + * On MacOS X decomposed unicode is converted into precomposed unicode.
, ignoring the setting of core.precomposeunicode.

[...]
 + */
 +
 +#define REENCODE_PATHNAME_C
 +#include cache.h
 +#include utf8.h
 +#include reencode_pathname.h
 +
 +#if defined(OLD_ICONV) || (defined(__sun__)  !defined(_XPG6))
 +  typedef const char *iconv_ibp;
 +#else
 +  typedef char *iconv_ibp;
 +#endif
 +
 +const static char *repo_path_encoding = UTF-8;
 +
 +static iconv_t iconv_open_or_die(const char *tocode, const char *fromcode)
 +{
 +  iconv_t my_iconv;
 +  my_iconv = iconv_open(tocode, fromcode);
join these two lines

 +  if (my_iconv == (iconv_t) -1)
 +  die_errno(_(iconv_open(%s,%s) failed), tocode, fromcode);
 +  return my_iconv;
 +}
 +
 +static size_t has_non_ascii(const char *s, size_t maxlen, size_t *strlen_c)
 +{
 +  const uint8_t *ptr = (const uint8_t *)s;
 +  size_t strlen_chars = 0;
 +  size_t ret = 0;
 +
 +  if (!ptr || !*ptr)
 +  return 0;
 +
 +  while (*ptr  maxlen) {
 +  if (*ptr  0x80)
 +  ret++;
 +  strlen_chars++;
 +  ptr++;
 +  maxlen--;
 +  }
 +  if (strlen_c)
 +  *strlen_c = strlen_chars;
 +
 +  return ret;
 +}
 +
 +#ifdef PRECOMPOSE_UNICODE
 +void probe_utf8_pathname_composition(char *path, int len)
 +{
 +  static const char *auml_nfc = \xc3\xa4;
 +  static const char *auml_nfd = \x61\xcc\x88;
 +  int output_fd;
 +  if (precomposed_unicode != -1)
 +  return; /* We found it defined in the global config, respect it */
a bland line here would be nice

 +  strcpy(path + len, auml_nfc);
 +  output_fd = open(path, O_CREAT|O_EXCL|O_RDWR, 0600);
 +  if (output_fd = 0) {
 +  close(output_fd);
 +  strcpy(path + len, auml_nfd);
 +  /* Indicate to the user, that we can configure it to true */
 +  if (!access(path, R_OK))
 +  git_config_set(core.precomposeunicode, false);
 +  /* To be backward compatible, set precomposed_unicode to 0 */
 +  precomposed_unicode = 0;
 +  strcpy(path + len, auml_nfc);
 +  if (unlink(path))
 +  die_errno(_(failed to unlink '%s'), path);
 +  }
 +}
 +#endif

[...]

 +struct dirent_psx *renc_pn_readdir(RENC_FN_DIR *renc_pn_dir)
 +{
 +  struct dirent *res;
 +  res = readdir(renc_pn_dir-dirp);
 +  if (res) {
 +  size_t namelenz = strlen(res-d_name) + 1; /* \0 */
 +  size_t new_len_needed = 0;
 +  int ret_errno = errno;
 +
 +  renc_pn_dir-dirent_utf8-d_ino= res-d_ino;
 +  renc_pn_dir-dirent_utf8-d_type = res-d_type;
 +  do {
 +   if (new_len_needed  renc_pn_dir-dirent_utf8-max_name_len) {
indent

[...]

 diff --git a/environment.c b/environment.c
 index 85edd7f..ba81575 100644
 --- a/environment.c
 +++ b/environment.c
 @@ -59,6 +59,7 @@ int grafts_replace_parents = 1;
   int core_apply_sparse_checkout;
   int merge_log_config = -1;
   int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 +const char *wt_path_encoding = NULL;
indent

   struct startup_info *startup_info;
   unsigned long pack_size_limit_cfg;

 diff --git a/git-compat-util.h b/git-compat-util.h
 index 35b095e..877b060 100644
 --- a/git-compat-util.h
 +++ b/git-compat-util.h
 @@ -153,13 +153,21 @@
   #endif
   #endif

 -/* used on Mac OS X */
 -#ifdef PRECOMPOSE_UNICODE
 -#include compat/precompose_utf8.h
 +#if defined(PATH_ENCODING) || defined(PRECOMPOSE_UNICODE)
 +#include compat/reencode_pathname.h
   #else
 -#define precompose_str(in,i_nfd2nfc)
 -#define precompose_argv(c,v)
 -#define probe_utf8_pathname_composition(a,b)
 +#define reencode_argv(c,v)
 +#endif
 +
 +/* needed for Mac OS X */
 +#ifndef PRECOMPOSE_UNICODE
 +#define probe_utf8_pathname_composition(a,b);
 +#endif
 +
 +#ifndef PATH_ENCODING
 +#define str_worktree2repolen(in, insz) (NULL)
 +#define 

Re: [PATCH 9/9] Add git-check-ignores

2012-09-02 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Let's step back a bit and think what this command is about.  What is
 the reason why the user wants to run check-ignore $path in the
 first place?  I think there are two (or three, depending on how you
 count).

  (1) You have one (or more) paths at hand.  You want to know if it
  is (or some of them are) ignored, but you do not particularly
  care how they are ignored.  Think of implementing your own git
  add as a script.

  (2) You have one or more paths that are ignored but do not want
  them to be, and want to find out why they are.  

A reason related to (2) is to find out why the paths you want to be
excluded are included, so that you can fix it by disabling an entry
in .gitignore that covers too widely, or by adding a new entry to
override it.

For that to work, the -v mode needs to show all paths that were
given from the command line (or --stdin), to explain why each of
them is ignored or not ignored.  Hence, in addition:

 For the latter, you are debugging the set of exclude sources and
 want to learn where the decision to exclude it comes from.  For that
 kind of use, it would be more useful if the output mimicked error
 messages from the compilers and output from grep -n to show the
 source, e.g.

   .gitignore:13:/git-am   git-am

we would need an entry that shows !include pattern in the output.

A path that does not match any pattern anywhere in the exclude files
is taken as included---I am not sure what is the best way to explain
the reason why they are included.  If we are going to show the entry
that finally matched (either with negative or positive pattern) and
decided the fate as the explanation for both excluded and included
paths, perhaps not showing an entry for such a never matched path
might be a good enough explanation.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Test failures in t4034

2012-09-02 Thread Junio C Hamano
Ramsay Jones ram...@ramsay1.demon.co.uk writes:

 Yes, there was a net increase in the line count when I introduced
 die(), but the main program flow was less cluttered by error handling.
 The net result looked much better, so I thought it was worth it.

 What may not be too obvious, however, is that test-regex.c was written
 to be independent of git.

That part I was very aware of actually; it it is a bit tricky to
tell what the right thing to do, though.  Your test itself needs to
be pretty much portable without the portability help you would get
git-compat-util.h, but the point of this kind of test is to tell if
you want to define preprocessor macros that may affect the behaviour
of such compatibility layer ;-)

 Given that I'm now building it as part of git, I should have simply
 #included git-compat-util.h and used the die() routine from libgit.a
 (since I'm now *relying* on test-regex being linked with libgit.a).

OK.

 +int main(int argc, char **argv)
 +{
 +   char *pat = [^={} \t]+;
 +   char *str = ={}\nfred;
 +   regex_t r;
 +   regmatch_t m[1];
 +
 +   if (regcomp(r, pat, REG_EXTENDED | REG_NEWLINE))
 +   die(failed regcomp() for pattern '%s', pat);
 +   if (regexec(r, str, 1, m, 0))
 +   die(no match of pattern '%s' to string '%s', pat, str);
 +
 +   /* http://sourceware.org/bugzilla/show_bug.cgi?id=3957  */
 +   if (m[0].rm_so == 3) /* matches '\n' when it should not */
 +   exit(1);
 
 This could be the third call site of die() that tells the user to
 build with NO_REGEX=1.  Then cd t  sh t0070-fundamental.sh -i -v would
 give that message directly to the user.

 Hmm, even without -i -v, it's *very* clear what is going on, but sure
 it wouldn't hurt either. (Also, I wanted to be able to distinguish an exit
 via die() from a test failed error return).

OK.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 9/9] Add git-check-ignores

2012-09-02 Thread Nguyen Thai Ngoc Duy
On Sun, Sep 2, 2012 at 9:50 PM, Adam Spiers g...@adamspiers.org wrote:
 I'm no expert on .gitattributes and check-attr, but AFAICS, all the
 opportunities to share code in the plumbing and front-end seem to be
 taken already, e.g. the directory traversal and path handling.  The
 CLI argument parsing is necessarily different because check-attr
 requires a list of attributes as well as a list of files, and of
 course the output routines have to be different too.

 The only opportunity for code reuse which I saw but /didn't/ take was
 around the --stdin line parsing code which is duplicated between:

 check_attr_stdin_paths
 check_ignore_stdin_paths
 cmd_checkout_index
 cmd_update_index
 hash_stdin_paths

 I attempted to refactor these, but quickly realised that due to the
 lack of proper closures in C, the overheads and complexity incurred by
 performing such a refactoring probably outweighed the benefits, so I
 gave up on the idea.

 Having said that, I'm totally open to suggestions if you can spot
 other places where code could be reused :)

Yeah. That was my impression too. I just hoped a new set of eyes might
discover something ;) At lease we could prepare the output format that
can be reused (maybe with little changes) for check-attr debugging if
it comes later. Or make this command part of check-attr..

 Also --quiet option, where check-ignore returns 0 if the given path is
 ignored, 1 otherwise?

 I considered that, but couldn't think of appropriate behaviour when
 multiple paths are given, so in the end I decided to remain consistent
 with check-attr, which always returns 0.  But I'm happy to change it
 if you can think of a more useful behaviour.  For example we could
 have a --count option which produces no output but has an exit status
 corresponding to the number of ignored files.

We could take this opportunity to kill add --ignore-missing, which
is basically .gitignore checking and it accepts multiple paths, I
think.

  - If many paths are given, then perhaps we could print ignored paths
 (no extra info).

 How is this different to git ls-files -i -o ?

I think ls-files requires real files on working directory, but
check-ignore can deal with just non-existing paths.

  - For debugging, given one path, we print all the rules that are
 applied to it, which may help understand how/why it goes wrong.

 That would be nice, but I'm not sure it's a tremendously common use
 case.  Could you think of a scenario in which it would be useful?  I
 guess it could be done by adding a new DIR_DEBUG_IGNORED flag to
 dir_struct which would make the exclude matcher functions collect all
 matching patterns, rather than just returning the first one.  This in
 turn would require another field for collecting all matched patterns.

Mixing include/exclude ignore rules multiple times could be hard to
figure out what goes wrong. But as we haven't seen an actual use case
yet, just leave it out.

 I don't think we really need NEED_WORK_TREE here. .gitignore can be
 read from index only.

 I thought about that, but in the end I decided it probably didn't make
 sense, because none of the exclude matching routines match against the
 index - they all match against the working tree and core.excludesfile.
 This would also require changing the matching logic to honor the index,
 but I didn't see the benefit in doing that, since all operations which
 involve excludes (add, status, etc.) relate to a work tree.

 But as with all of the above, please don't hesitate to point out if
 I've missed something.  You guys are the experts, not me ;-)

Again I was thinking that check-ignore could work with imaginary paths
and could be used by scripts. If a script just wants to check certain
paths are excluded, it should not need to move to working directory
(though it probably is in working directory most of the time).
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html