[bug] Git submodule command interprets switch as argument and switch

2017-08-17 Thread R0b0t1
The issue is as follows:

R0b0t1@host:~/devel/project$ git submodule add https://github.com/user/project -f
Cloning into '/home/R0b0t1/devel/project/-f'...

My .gitignore's first line is *, and then I explicitly allow things.
Despite "project/" being listed in the .gitignore (as one of the
explicitly allowed paths), the submodule command says it is ignored.
The "-f" (force) switch is then interpreted both as a switch and as the
destination directory.

It is possible the argument parsing code for other commands exhibits this error.
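
For illustration, a minimal sketch of the two invocations (the second is
only a possible workaround, assuming the switch is honoured when it is
given before the positional arguments):

    # "-f" after the URL ends up being taken as the <path> argument
    git submodule add https://github.com/user/project -f

    # stating the switch before the repository avoids the ambiguity
    git submodule add -f https://github.com/user/project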

R0b0t1.


Re: Submodule regression in 2.14?

2017-08-17 Thread Stefan Beller
On Thu, Aug 17, 2017 at 7:13 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> Are you saying this might be a design mistake and
>> the .update ought to be respected by all the other
>> commands? For example
>> git reset --recurse-submodules
>> should ignore the .update= none?
>
> I have been under the impression that that has been the traditional
> desire of what .update ought to mean.  I personally do not have a
> strong opinion---at least not yet.

In this context, note v2.14.0-rc1-34-g7463e2ec3
(bw/submodule-config-cleanup~7, "unpack-trees:
don't respect submodule.update"), which goes in the
opposite direction of your impression.

>> When designing this new recursive submodule functionality
>> outside the "submodule" command, I'd want submodules
>> to behave as much as possible like trees.
>
> I think that is sensible as long as the user does not explicitly say
> "this and that submodule behave differently" by giving configuration
> variables.  Perhaps .update is one of those that should countermand
> the default behaviour of "--recurse-submodules"?

Maybe; I'll think about it. However, there is no
equivalent for trees (and AFAICT the need never came up) to
treat a specific directory differently from the rest in worktree
operations.

The problem with the issue in question, however, is that
git-pull is a combination of two other high-level commands
(fetch/merge). The fetch component already had
recursive behavior, and the commit in question
added a bit for the merge component, so the UX
is hard to get right for both of them:

git pull --recurse=fetch-only
git pull --recurse=merge-respects-update-strategy

is what I'd want to avoid.

So maybe we can just respect the update strategy
before starting the local part.
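
For illustration, a minimal sketch of the interaction being discussed
(configuration and commands as in the report later in this thread; the
outcome of the last step is exactly what is being debated):

    git config submodule.sub0.update none   # "submodule update" should skip sub0
    git submodule update --init --recursive # sub0 is skipped, as configured
    git pull --recurse-submodules           # 2.14 recurses into sub0 anyway;
                                            # should .update countermand this too?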

Stefan


Re: [PATCH v4 2/3] builtin/branch: stop supporting the use of --set-upstream option

2017-08-17 Thread Kaartic Sivaraam

On Friday 18 August 2017 01:25 AM, Junio C Hamano wrote:

Martin Ågren  writes:


On 17 August 2017 at 04:54, Kaartic Sivaraam
 wrote:

Helped-by: Martin Ågren ,  Junio C Hamano 

Signed-off-by: Kaartic Sivaraam 

I didn't expect a "Helped-by", all I did was to give some random
comments. :-) I'm not so sure about the comma-separation, that seems to
be a first in the project.

I didn't either ;-)

The line looks odd so I'll remove it while queuing.

Thanks for noticing.

I should have been better with my wording :) How about converting that
line into two lines, using 'Suggested-by:' or 'Reviewed-by:'?


---
Kaartic


Re: [PATCH v4 2/3] builtin/branch: stop supporting the use of --set-upstream option

2017-08-17 Thread Kaartic Sivaraam

On Friday 18 August 2017 01:28 AM, Junio C Hamano wrote:

Kaartic Sivaraam  writes:


diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
index 81bd0a7b7..948d9c9ef 100644
--- a/Documentation/git-branch.txt
+++ b/Documentation/git-branch.txt
@@ -195,10 +195,10 @@ start-point is either a local or remote-tracking branch.
branch.autoSetupMerge configuration variable is true.
  
  --set-upstream::

-   If specified branch does not exist yet or if `--force` has been
-   given, acts exactly like `--track`. Otherwise sets up configuration
-   like `--track` would when creating the branch, except that where
-   branch points to is not changed.
+   As this option had confusing syntax it's no longer supported. Please use
+   --track or --set-upstream-to instead.
++
+Note: This could possibly become an alias of --set-upstream-to in the future.

I'll tweak `--track` and `--set-upstream-to` in the updated text
and remove the 'Note:' thing that does not give any useful
information to the end users while queuing (no need to resend).

I thought I explained the reason for adding the note in one of the
previous mails.

Here's the portion of the mail,

On Monday 14 August 2017 02:20 PM, Kaartic Sivaraam wrote:
>
> On Wednesday 09 August 2017 12:03 AM, Martin Ågren wrote:
>>
>> Maybe the final note could be removed? Someone who is looking up
>> --set-upstream because Git just "crashed" on them will only want to know
>> what they should do instead. Our thoughts about the future are perhaps
>> not that interesting.
>
> I thought it's better to document it to avoid people getting surprised
> when the option *starts working* again.
>

I hope that explains the reason.

---
Kaartic


Re: [PATCH/RFC] git-grep: correct exit code with --quiet and -L

2017-08-17 Thread Junio C Hamano
Anthony Sottile  writes:

> The handling of `status_only` no longer interferes with the handling of
> `unmatch_name_only`.  `--quiet` no longer affects the exit code when using
> `-L`/`--files-without-match`.
>
> Signed-off-by: Anthony Sottile 
> ---

Thanks, will queue.

>  grep.c  | 2 +-
>  t/t7810-grep.sh | 5 +
>  2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/grep.c b/grep.c
> index 2efec0e..c9e7cc7 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -1821,7 +1821,7 @@ static int grep_source_1(struct grep_opt *opt, struct 
> grep_source *gs, int colle
>   return 0;
>  
>   if (opt->status_only)
> - return 0;
> + return opt->unmatch_name_only;
>   if (opt->unmatch_name_only) {
>   /* We did not see any hit, so we want to show this */
>   show_name(opt, gs->name);
> diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
> index f106387..2a6679c 100755
> --- a/t/t7810-grep.sh
> +++ b/t/t7810-grep.sh
> @@ -374,6 +374,11 @@ test_expect_success 'grep -L -C' '
>   test_cmp expected actual
>  '
>  
> +test_expect_success 'grep --files-without-match --quiet' '
> + git grep --files-without-match --quiet nonexistent_string >actual &&
> + test_cmp /dev/null actual
> +'
> +
>  cat >expected <<EOF
>  file:foo mmap bar_mmap
>  EOF
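
For context, a minimal illustration of the behaviour the new test pins
down (assuming a worktree in which no file contains the string):

    git grep -L nonexistent_string; echo $?          # lists every tracked file, exits 0
    git grep -L --quiet nonexistent_string; echo $?  # with this fix: no output,
                                                     # but still exit code 0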


Re: [PATCH v3 0/4] Modernize read_graft_line implementation

2017-08-17 Thread Junio C Hamano
Thanks, will queue.


Re: Submodule regression in 2.14?

2017-08-17 Thread Junio C Hamano
Stefan Beller  writes:

> Are you saying this might be a design mistake and
> the .update ought to be respected by all the other
> commands? For example
> git reset --recurse-submodules
> should ignore the .update= none?

I have been under the impression that that has been the traditional
desire of what .update ought to mean.  I personally do not have a
strong opinion---at least not yet.

> When designing this new recursive submodule functionality
> outside the "submodule" command, I'd want submodules
> to behave as much as possible like trees.

I think that is sensible as long as the user does not explicitly say
"this and that submodule behave differently" by giving configuration
variables.  Perhaps .update is one of those that should countermand
the default behaviour of "--recurse-submodules"?


[PATCH v3 3/4] commit: allocate array using object_id size

2017-08-17 Thread Patryk Obara
struct commit_graft aggregates an array of object_id's, each of which
is at least GIT_MAX_RAWSZ bytes. This change prevents a memory allocation
error when the size of object_id is larger than GIT_SHA1_RAWSZ.

Signed-off-by: Patryk Obara 
---
 commit.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/commit.c b/commit.c
index 019e733..61528a5 100644
--- a/commit.c
+++ b/commit.c
@@ -149,7 +149,8 @@ struct commit_graft *read_graft_line(struct strbuf *line)
if ((len + 1) % entry_size)
goto bad_graft_data;
i = (len + 1) / entry_size - 1;
-   graft = xmalloc(st_add(sizeof(*graft), st_mult(GIT_SHA1_RAWSZ, i)));
+   graft = xmalloc(st_add(sizeof(*graft),
+  st_mult(sizeof(struct object_id), i)));
graft->nr_parent = i;
if (get_oid_hex(buf, &graft->oid))
goto bad_graft_data;
-- 
2.9.5



[PATCH v3 4/4] commit: rewrite read_graft_line

2017-08-17 Thread Patryk Obara
Determine the number of object_id's to parse in a single graft line by
counting separators (whitespace characters) instead of dividing by the
length of the hash representation.

This way the graft parsing code can support different hash sizes
without any further code adaptations.

Signed-off-by: Patryk Obara 
---
 commit.c | 28 +---
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/commit.c b/commit.c
index 61528a5..46ee2db 100644
--- a/commit.c
+++ b/commit.c
@@ -137,29 +137,27 @@ int register_commit_graft(struct commit_graft *graft, int 
ignore_dups)
 struct commit_graft *read_graft_line(struct strbuf *line)
 {
/* The format is just "Commit Parent1 Parent2 ...\n" */
-   int i, len;
-   char *buf = line->buf;
+   int i, n;
+   const char *tail = NULL;
struct commit_graft *graft = NULL;
-   const int entry_size = GIT_SHA1_HEXSZ + 1;
 
strbuf_rtrim(line);
if (line->buf[0] == '#' || line->len == 0)
return NULL;
-   len = line->len;
-   if ((len + 1) % entry_size)
-   goto bad_graft_data;
-   i = (len + 1) / entry_size - 1;
+   /* count number of blanks to determine size of array to allocate */
+   for (i = 0, n = 0; i < line->len; i++)
+   if (isspace(line->buf[i]))
+   n++;
graft = xmalloc(st_add(sizeof(*graft),
-  st_mult(sizeof(struct object_id), i)));
-   graft->nr_parent = i;
-   if (get_oid_hex(buf, &graft->oid))
+  st_mult(sizeof(struct object_id), n)));
+   graft->nr_parent = n;
+   if (parse_oid_hex(line->buf, &graft->oid, &tail))
goto bad_graft_data;
-   for (i = GIT_SHA1_HEXSZ; i < len; i += entry_size) {
-   if (buf[i] != ' ')
-   goto bad_graft_data;
-   if (get_sha1_hex(buf + i + 1, graft->parent[i/entry_size].hash))
+   for (i = 0; i < graft->nr_parent; i++)
+   if (!isspace(*tail++) || parse_oid_hex(tail, &graft->parent[i], &tail))
goto bad_graft_data;
-   }
+   if (tail[0] != '\0')
+   goto bad_graft_data;
return graft;
 
 bad_graft_data:
-- 
2.9.5



[PATCH v3 1/4] sha1_file: fix definition of null_sha1

2017-08-17 Thread Patryk Obara
The array is declared in cache.h as:

  extern const unsigned char null_sha1[GIT_MAX_RAWSZ];

Definition in sha1_file.c must match.

Signed-off-by: Patryk Obara 
---
 sha1_file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index b60ae15..f5b5bec 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -32,7 +32,7 @@
 #define SZ_FMT PRIuMAX
 static inline uintmax_t sz_fmt(size_t s) { return s; }
 
-const unsigned char null_sha1[20];
+const unsigned char null_sha1[GIT_MAX_RAWSZ];
 const struct object_id null_oid;
 const struct object_id empty_tree_oid = {
EMPTY_TREE_SHA1_BIN_LITERAL
-- 
2.9.5



[PATCH v3 2/4] commit: replace the raw buffer with strbuf in read_graft_line

2017-08-17 Thread Patryk Obara
This simplifies the function declaration and allows the use of strbuf_rtrim
instead of modifying the buffer directly.

Signed-off-by: Patryk Obara 
---
 builtin/blame.c |  2 +-
 commit.c| 15 ---
 commit.h|  2 +-
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index bda1a78..d4472e9 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -488,7 +488,7 @@ static int read_ancestry(const char *graft_file)
return -1;
while (!strbuf_getwholeline(, fp, '\n')) {
/* The format is just "Commit Parent1 Parent2 ...\n" */
-   struct commit_graft *graft = read_graft_line(buf.buf, buf.len);
+   struct commit_graft *graft = read_graft_line(&buf);
if (graft)
register_commit_graft(graft, 0);
}
diff --git a/commit.c b/commit.c
index 8b28415..019e733 100644
--- a/commit.c
+++ b/commit.c
@@ -134,17 +134,18 @@ int register_commit_graft(struct commit_graft *graft, int 
ignore_dups)
return 0;
 }
 
-struct commit_graft *read_graft_line(char *buf, int len)
+struct commit_graft *read_graft_line(struct strbuf *line)
 {
/* The format is just "Commit Parent1 Parent2 ...\n" */
-   int i;
+   int i, len;
+   char *buf = line->buf;
struct commit_graft *graft = NULL;
const int entry_size = GIT_SHA1_HEXSZ + 1;
 
-   while (len && isspace(buf[len-1]))
-   buf[--len] = '\0';
-   if (buf[0] == '#' || buf[0] == '\0')
+   strbuf_rtrim(line);
+   if (line->buf[0] == '#' || line->len == 0)
return NULL;
+   len = line->len;
if ((len + 1) % entry_size)
goto bad_graft_data;
i = (len + 1) / entry_size - 1;
@@ -161,7 +162,7 @@ struct commit_graft *read_graft_line(char *buf, int len)
return graft;
 
 bad_graft_data:
-   error("bad graft data: %s", buf);
+   error("bad graft data: %s", line->buf);
free(graft);
return NULL;
 }
@@ -174,7 +175,7 @@ static int read_graft_file(const char *graft_file)
return -1;
while (!strbuf_getwholeline(, fp, '\n')) {
/* The format is just "Commit Parent1 Parent2 ...\n" */
-   struct commit_graft *graft = read_graft_line(buf.buf, buf.len);
+   struct commit_graft *graft = read_graft_line(&buf);
if (!graft)
continue;
if (register_commit_graft(graft, 1))
diff --git a/commit.h b/commit.h
index 6d857f0..baecc0a 100644
--- a/commit.h
+++ b/commit.h
@@ -247,7 +247,7 @@ struct commit_graft {
 };
 typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *);
 
-struct commit_graft *read_graft_line(char *buf, int len);
+struct commit_graft *read_graft_line(struct strbuf *line);
 int register_commit_graft(struct commit_graft *, int);
 struct commit_graft *lookup_commit_graft(const struct object_id *oid);
 
-- 
2.9.5



[PATCH v3 0/4] Modernize read_graft_line implementation

2017-08-17 Thread Patryk Obara
Changes since v2:
- commit implementing free_graft dropped (no longer needed)
- several more lines moved from last commit to commit replacing
  raw buffer with strbuf
- fix for memory allocation separated from change in hash
  parsing
- commit_graft struct uses FLEX_ARRAY again, meaning free_graft,
  memset, and oid_array were not needed after all

Patryk Obara (4):
  sha1_file: fix definition of null_sha1
  commit: replace the raw buffer with strbuf in read_graft_line
  commit: allocate array using object_id size
  commit: rewrite read_graft_line

 builtin/blame.c |  2 +-
 commit.c| 38 +++---
 commit.h|  2 +-
 sha1_file.c |  2 +-
 4 files changed, 22 insertions(+), 22 deletions(-)

-- 
2.9.5

base-commit: b3622a4ee94e4916cd05e6d96e41eeb36b941182


[PATCH/RFC] git-grep: correct exit code with --quiet and -L

2017-08-17 Thread Anthony Sottile
The handling of `status_only` no longer interferes with the handling of
`unmatch_name_only`.  `--quiet` no longer affects the exit code when using
`-L`/`--files-without-match`.

Signed-off-by: Anthony Sottile 
---
 grep.c  | 2 +-
 t/t7810-grep.sh | 5 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 2efec0e..c9e7cc7 100644
--- a/grep.c
+++ b/grep.c
@@ -1821,7 +1821,7 @@ static int grep_source_1(struct grep_opt *opt, struct 
grep_source *gs, int colle
return 0;
 
if (opt->status_only)
-   return 0;
+   return opt->unmatch_name_only;
if (opt->unmatch_name_only) {
/* We did not see any hit, so we want to show this */
show_name(opt, gs->name);
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index f106387..2a6679c 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -374,6 +374,11 @@ test_expect_success 'grep -L -C' '
test_cmp expected actual
 '
 
+test_expect_success 'grep --files-without-match --quiet' '
+   git grep --files-without-match --quiet nonexistent_string >actual &&
+   test_cmp /dev/null actual
+'
+
 cat >expected <<EOF

Re: reftable [v7]: new ref storage format

2017-08-17 Thread Shawn Pearce
On Tue, Aug 15, 2017 at 7:48 PM, Shawn Pearce  wrote:
> 7th iteration of the reftable storage format.
>
> You can read a rendered version of this here:
> https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md

FYI, JGit has adopted this v7 draft as version 1 of the reftable
format. The file reader/writer implementations will be in the next
version of JGit. Transaction support in the $GIT_DIR layout still
needs to be implemented.


Re: [PATCH v3 2/2] File committed with CRLF should roundtrip diff and apply

2017-08-17 Thread Junio C Hamano
tbo...@web.de writes:

> @@ -1712,11 +1726,15 @@ static int parse_fragment(struct apply_state *state,
>   if (!deleted && !added)
>   leading++;
>   trailing++;
> + if (!state->apply_in_reverse)
> + check_old_for_crlf(patch, line, len);
>   if (!state->apply_in_reverse &&
>   state->ws_error_action == correct_ws_error)
>   check_whitespace(state, line, len, 
> patch->ws_rule);
>   break;

This one is wrong.  You are looking at " " (common context) and you
should unconditionally call check_old for them.


>   case '-':
> + if (!state->apply_in_reverse)
> + check_old_for_crlf(patch, line, len);

This is correct.

There is "case '+':" below here you did not touch.  There should be

case '+':
+   if (state->apply_in_reverse)
+   check_old_for_crlf(...);

there.  Note that we call check_old() only when applying in reverse.

> @@ -2268,8 +2286,11 @@ static void show_stats(struct apply_state *state, 
> struct patch *patch)
>   add, pluses, del, minuses);
>  }
>  
> -static int read_old_data(struct stat *st, const char *path, struct strbuf 
> *buf)
> +static int read_old_data(struct stat *st, struct patch *patch,
> +  const char *path, struct strbuf *buf)

The argument order, with the patch structure coming earlier, is
different from my version; what we see here looks much better to me.

>  {
> + enum safe_crlf safe_crlf = patch->crlf_in_old ?
> + SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_RENORMALIZE;
>   switch (st->st_mode & S_IFMT) {
>   case S_IFLNK:
>   if (strbuf_readlink(buf, path, st->st_size) < 0)
> @@ -2278,7 +2299,15 @@ static int read_old_data(struct stat *st, const char 
> *path, struct strbuf *buf)
>   case S_IFREG:
>   if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
>   return error(_("unable to open or read %s"), path);
> - convert_to_git(_index, path, buf->buf, buf->len, buf, 0);
> +/*
> + * "git apply" without "--index/--cached" should never look
> + * at the index; the target file may not have been added to
> + * the index yet, and we may not even be in any Git repository.
> + * Pass NULL to convert_to_git() to stress this; the function
> + * should never look at the index when explicit crlf option
> + * is given.
> + */
> + convert_to_git(NULL, path, buf->buf, buf->len, buf, safe_crlf);

This comment is somewhat strangely indented.  I thought the opening "/*"
aligns with the usual tab stop.

Thanks.


Re: [PATCH v3 2/2] File committed with CRLF should roundtrip diff and apply

2017-08-17 Thread Junio C Hamano
tbo...@web.de writes:

> Changes since v2:
> - Manually integrated all code changes from Junio
>   (Thanks, I hope that I didn't miss something)

I suspect that "apply -R makes '+' preimage" change is not here.

> - Having examples of "git diff" in the commit message confuses "git apply",
>   so that all examples for git diff have a '*' at the beginning of the line
>   (V2 used '$', which is typically used for shell script examples)

Just FYI, we tend to indent them further, just like any other
displayed material in the proposed log message.

> - The official way to apply the CRLF rules without having an index is
>   SAFE_CRLF_RENORMALIZE, which is already working today.

Ah, good find.  I forgot about that thing you added some time ago.



Beloved

2017-08-17 Thread Mrs Susan Patrick
I am Mrs Susan Patrick i have a proposal donation if you are interested get 
back to me via email: susanpatrick...@gmail.com

Thanks 
Hope to hear from you soon
Mrs Susan Patrick




Re: Submodule regression in 2.14?

2017-08-17 Thread Stefan Beller
On Thu, Aug 17, 2017 at 2:21 PM, Lars Schneider
 wrote:
>
>> Oh, wait.
>> $ git log --oneline v2.13.0..v2.14.1 -- builtin/pull.c
>> c9c63ee558 Merge branch 'sb/pull-rebase-submodule'
>> a6d7eb2c7a pull: optionally rebase submodules (remote submodule changes only)
>> could also be a culprit. Do you have pull.rebase set?
>
> I bisected the problem today and "a6d7eb2c7a pull: optionally rebase 
> submodules
> (remote submodule changes only)" is indeed the culprit.
>
> The commit seems to break the following test case.
>
> diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
> index dcac364c5f..24f9729015 100755
> --- a/t/t7400-submodule-basic.sh
> +++ b/t/t7400-submodule-basic.sh
> @@ -1289,4 +1289,19 @@ test_expect_success 'init properly sets the config' '
> test_must_fail git -C multisuper_clone config --get 
> submodule.sub1.active
>  '
>
> +test_expect_success 'submodule update and git pull with disabled submodule' '
> +   test_when_finished "rm -rf multisuper_clone" &&
> +   pwd=$(pwd) &&
> +   git clone file://"$pwd"/multisuper multisuper_clone &&
> +   (
> +   cd multisuper_clone &&
> +   git config --local submodule.sub0.update none &&
> +   git submodule update --init --recursive &&
> +   git pull --recurse-submodules &&
> +   git submodule status | cut -c 1,43- >actual
> +   ) &&
> +   ls &&
> +   test_cmp expect multisuper_clone/actual
> +'

Thanks for providing this test.

cd trash directory.t7400-submodule-basic/multisuper_clone
cat .git/config
[submodule "sub0"]
  update = none
  active = true
  url = file:///.../t/trash directory.t7400-submodule-basic/sub1


submodule.<name>.update
The default update procedure for a submodule.
This variable is populated by git submodule init
from the gitmodules(5) file. See description of
update command in git-submodule(1).

The first sentence of .update is misleading IMHO, as these
settings should strictly apply to the "submodule update"
command. So "pull --recurse-submodules" ought to ignore it;
instead the pull can do whatever it wants, namely treat the
submodule roughly like a tree and merge/rebase
inside the submodule as well. The user *asked* for a recursive
pull after all.

Are you saying this might be a design mistake and
the .update ought to be respected by all the other
commands? For example
git reset --recurse-submodules
should ignore the .update= none?

When designing this new recursive submodule functionality
outside the "submodule" command, I'd want submodules
to behave as much as possible like trees.

ideas?

Thanks,
Stefan

> +
>  test_done
>
>
> I am not familiar with the code. Does anyone see the problem
> right away?
>
> Thanks,
> Lars
>
>


Re: [PATCH] sub-process: print the cmd when a capability is unsupported

2017-08-17 Thread Jonathan Tan
On Thu, 17 Aug 2017 23:34:33 +0200
Lars Schneider  wrote:

> 
> > On 17 Aug 2017, at 23:01, Junio C Hamano  wrote:
> > 
> > Christian Couder  writes:
> > 
> >> ... but I think we should then emphasize more in our test
> >> scripts (maybe by giving a good example) and perhaps also in the doc
> >> that the filters/sub-processes should really pay attention and not
> >> output any capability that are not supported by Git.
> > 
> > Oh, absolutely.  If you know there is such a breakage in our test
> > script, please do fix it.
> > 
> > Thanks.
> 
> Junio's reasoning [1] is spot on from my point of view.

Agreed.

> 
> I intentionally did not add the negotiation to the test code to keep
> the test as simple as possible.

I think this is the correct approach - the test was testing Git's
behavior, not the filter's behavior. (That said, if someone wanted to add
a test for a misbehaving filter, that would be great, although such a
test would have hardcoded output from the filter anyway.)

> However, I wrote this in the
> gitattributes docs [2]:
> 
>   After the version negotiation Git sends a list of all capabilities that
>   it supports and a flush packet. Git expects to read a list of desired
>   capabilities, which must be a subset of the supported capabilities list,
>   and a flush packet as response:
> 
> Maybe we should revive "Documentation/technical/api-sub-process.txt" [3]
> after all to explain these kind of things?

As for reviving that specific file, I saw you wrote a similar comment
but I didn't reply to it - sorry.  The commit in question is 7e2e1bb
("Documentation: migrate sub-process docs to header", 2017-07-26), and
in that commit, everything in api-sub-process.txt was migrated. I think
it's better for such documentation to be in a .h file, rather than a
.txt file.

As for describing the Long Running Process Protocol in a
Documentation/.../txt file, my plan was to create a new file
specifically for that, as you can see in an e-mail I sent [4]. I'm OK
with putting it in a different place, but it probably shouldn't be named
"api", because those are for internal Git APIs.

[4] 
https://public-inbox.org/git/eadce97b6a1e80345a2621e71ce187e9e6bc05bf.1501532294.git.jonathanta...@google.com/

> [1] https://public-inbox.org/git/xmqq8tijpkrv@gitster.mtv.corp.google.com/
> [2] 
> https://github.com/git/git/blob/b3622a4ee94e4916cd05e6d96e41eeb36b941182/Documentation/gitattributes.txt#L408-L411
> [3] 
> https://public-inbox.org/git/20170807102136.30b23...@twelve2.svl.corp.google.com/


[PATCH v3 2/2] File committed with CRLF should roundtrip diff and apply

2017-08-17 Thread tboegi
From: Torsten Bögershausen 

When a file has been committed with CRLF but .gitattributes now says
"* text=auto" (or core.autocrlf is true),
the following does not roundtrip and `git apply` fails:

printf "Added line\r\n" >>file &&
git diff >patch &&
git checkout -- . &&
git apply patch

Before applying the patch, the file from the working tree is converted into
the index format (clean filter, CRLF conversion, ...).
Here, when the file was committed with CRLF, the line endings should not be
converted.

Note that `git apply --index` or `git apply --cached` doesn't call
convert_to_git() because the source material is already in index format.

Analyze the patch to see whether a) there is any context line with CRLF,
or b) any line with CRLF is to be removed.
In that case the patch file `patch` has mixed line endings; for a)
it looks like this (ignore the '*' at the beginning of the line):

* diff --git a/one b/one
* index 533790e..c30dea8 100644
* --- a/one
* +++ b/one
* @@ -1 +1,2 @@
*  a\r
* +b\r

And for b) it looks like this:

* diff --git a/one b/one
* index 533790e..485540d 100644
* --- a/one
* +++ b/one
* @@ -1 +1 @@
* -a\r
* +b\r

If `git apply` detects that the patch itself has CRLF, (look at the line
" a\r" or "-a\r" above), the new flag crlf_in_old is set in "struct patch"
and two things will happen:
- read_old_data() will not convert CRLF into LF by calling
  convert_to_git(..., SAFE_CRLF_KEEP_CRLF);
- The WS_CR_AT_EOL bit is set in the "white space rule",
  CRLF are no longer treated as white space.

Thanks to Junio C Hamano; his input became the base for the changes in t4124.
One test case is split up into three:
- Detect the " a\r" line in the patch
- Detect the "-a\r" line in the patch
- Use LF in the repo and CRLF in the worktree.

Reported-by: Anthony Sottile 
Signed-off-by: Torsten Bögershausen 
---
Changes since v2:
- Manually integrated all code changes from Junio
  (Thanks, I hope that I didn't miss something)
- Having examples of "git diff" in the commit message confuses "git apply",
  so that all examples for git diff have a '*' at the beginning of the line
  (V2 used '$', which is typically used for shell script examples)
- The official way to apply the CRLF rules without having an index is
  SAFE_CRLF_RENORMALIZE, which is already working today.
- Now we have convert_to_git(NULL, ..., safe_crlf) with
  enum safe_crlf safe_crlf = patch->crlf_in_old ?
  SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_RENORMALIZE;

apply.c  | 40 +++-
 t/t4124-apply-ws-rule.sh | 33 +++--
 2 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/apply.c b/apply.c
index f2d599141d..691f47c783 100644
--- a/apply.c
+++ b/apply.c
@@ -220,6 +220,7 @@ struct patch {
unsigned int recount:1;
unsigned int conflicted_threeway:1;
unsigned int direct_to_threeway:1;
+   unsigned int crlf_in_old:1;
struct fragment *fragments;
char *result;
size_t resultsize;
@@ -1662,6 +1663,19 @@ static void check_whitespace(struct apply_state *state,
record_ws_error(state, result, line + 1, len - 2, state->linenr);
 }
 
+/*
+ * Check if the patch has context lines with CRLF or
+ * the patch wants to remove lines with CRLF.
+ */
+static void check_old_for_crlf(struct patch *patch, const char *line, int len)
+{
+   if (len >= 2 && line[len-1] == '\n' && line[len-2] == '\r') {
+   patch->ws_rule |= WS_CR_AT_EOL;
+   patch->crlf_in_old = 1;
+   }
+}
+
+
 /*
  * Parse a unified diff. Note that this really needs to parse each
  * fragment separately, since the only way to know the difference
@@ -1712,11 +1726,15 @@ static int parse_fragment(struct apply_state *state,
if (!deleted && !added)
leading++;
trailing++;
+   if (!state->apply_in_reverse)
+   check_old_for_crlf(patch, line, len);
if (!state->apply_in_reverse &&
state->ws_error_action == correct_ws_error)
check_whitespace(state, line, len, 
patch->ws_rule);
break;
case '-':
+   if (!state->apply_in_reverse)
+   check_old_for_crlf(patch, line, len);
if (state->apply_in_reverse &&
state->ws_error_action != nowarn_ws_error)
check_whitespace(state, line, len, 
patch->ws_rule);
@@ -2268,8 +2286,11 @@ static void show_stats(struct apply_state *state, struct 
patch *patch)
add, pluses, del, minuses);
 }
 
-static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
+static int read_old_data(struct stat *st, struct patch *patch,
+const char *path, struct strbuf *buf)
 {
+   enum 

[PATCH v3 1/2] convert: Add SAFE_CRLF_KEEP_CRLF

2017-08-17 Thread tboegi
From: Torsten Bögershausen 

When convert_to_git() is called, the caller may want CRLF to be kept
as CRLF (and not converted into LF).

This will be used in the next commit, when apply works with files that have
CRLF and patches are applied onto these files.

Add the new value "SAFE_CRLF_KEEP_CRLF" to safe_crlf.

Prepare convert_to_git() to be able to run the clean filter,
skip the CRLF conversion and run the ident filter.

Signed-off-by: Torsten Bögershausen 
---
 convert.c | 10 ++
 convert.h |  3 ++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/convert.c b/convert.c
index deaf0ba7b3..040123b4fe 100644
--- a/convert.c
+++ b/convert.c
@@ -1104,10 +1104,12 @@ int convert_to_git(const struct index_state *istate,
src = dst->buf;
len = dst->len;
}
-   ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, 
checksafe);
-   if (ret && dst) {
-   src = dst->buf;
-   len = dst->len;
+   if (checksafe != SAFE_CRLF_KEEP_CRLF) {
+   ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, 
checksafe);
+   if (ret && dst) {
+   src = dst->buf;
+   len = dst->len;
+   }
}
return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
diff --git a/convert.h b/convert.h
index cecf59d1aa..cabd5ed6dd 100644
--- a/convert.h
+++ b/convert.h
@@ -10,7 +10,8 @@ enum safe_crlf {
SAFE_CRLF_FALSE = 0,
SAFE_CRLF_FAIL = 1,
SAFE_CRLF_WARN = 2,
-   SAFE_CRLF_RENORMALIZE = 3
+   SAFE_CRLF_RENORMALIZE = 3,
+   SAFE_CRLF_KEEP_CRLF = 4
 };
 
 extern enum safe_crlf safe_crlf;
-- 
2.14.1.145.gb3622a4ee9



Re: upgrading GIT

2017-08-17 Thread Bryan Turner
On Mon, Aug 7, 2017 at 4:20 PM, James Wells  wrote:
>
> As you can see my version of git is not supported by the current version of
> Bitbucket. I will have to perform a two-stage upgrade anyway, as the version
> of Stash I am running cannot be upgraded directly to Bitbucket 5.2 either.
>
> Is there an easy way to just install a higher supported version of git, like
> 2.9, on the same server and then move all the repos and basically everything
> across? Can you install a later tarball version on top of another git,
> so it's like overwriting it?

Hey James, Bitbucket Server developer here. Sorry for not responding
sooner. I just wanted to check in with you and see if you'd gotten
this all resolved? If not, feel free to message me directly; I'm happy
to help you off the list. (Or on list, for that matter; I just figure
the general Git list isn't interested in this issue.)

Best regards,
Bryan Turner


Re: [PATCH v2 4/4] commit: rewrite read_graft_line

2017-08-17 Thread Patryk Obara
Understood :) - yes, oid_array is not completely necessary here - and it
gives the wrong impression about usage of this struct.

FLEX_ARRAY will be brought back in v3.

On Thu, Aug 17, 2017 at 11:20 PM, Junio C Hamano  wrote:
> Patryk Obara  writes:
>
>> The previous implementation of read_graft_line used calculations based
>> on GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ to determine the number of commit
>> ids in a single graft line.  New implementation does not depend on these
>> constants, so it adapts to any object_id buffer size.
>>
>> To make this possible, FLEX_ARRAY of object_id in struct was replaced
>> by an oid_array.
>
> There is a leap in logic between the two paragraphs.  Your use of
> parse_oid_hex() is good.  But I do not think moving the array body
> to outside commit_graft structure and forcing it to be separately
> allocated is necessary or beneficial.  When we got a single line, we
> know how many fake parents a child described by that graft line has,
> and you can still use of FLEX_ARRAY to avoid separate allocation
> and need for separate freeing of it.



-- 
| ← Ceci n'est pas une pipe
Patryk Obara


Re: [PATCH 3/5] commit: replace the raw buffer with strbuf in read_graft_line

2017-08-17 Thread Patryk Obara
> In fact, the former is already how we represent the list of fake
> parents in the commit_graft structure, so I think patch 5/5 in this
> series does two unrelated things, one of which is bad (i.e. use of
> parse_oid_hex() is good; turning the FLEX_ARRAY at the end into a
> oid_array that requires a separate allocation of the array is bad).

Agreed; I already split patch 5 into two separate changes (one fixing
the memory allocation issue, one parsing object_ids into the FLEX_ARRAY,
without modifying the graft struct). As a result, patch 4 (free_graft)
can be dropped. I will send these changes as v3.


On Thu, Aug 17, 2017 at 11:17 PM, Junio C Hamano  wrote:
> Jeff King  writes:
>
>> I'd expect most of the GIT_MAX constants to eventually go away in favor
>> of "struct object_id", but that will still be using the same "big enough
>> to hold any hash" size under the hood.
>
> Indeed.  It is good to see major contributors are in agreement ;-)
> I'd expect that an array of "struct object_id" would be how a fixed
> number of object names would be represented, i.e.
>
> struct object_id thing[num_elements];
>
> instead of an array of uchar that is MAX bytes long, i.e.
>
> unsigned char name[num_elements][GIT_MAX_RAWSZ];
>
> In fact, the former is already how we represent the list of fake
> parents in the commit_graft structure, so I think patch 5/5 in this
> series does two unrelated things, one of which is bad (i.e. use of
> parse_oid_hex() is good; turning the FLEX_ARRAY at the end into a
> oid_array that requires a separate allocation of the array is bad).
>
>> Agreed. Most code should be dealing with the abstract concept of a hash
>> and shouldn't have to care about the size. I really like parse_oid_hex()
>> for that reason (and I think parsing is the main place we've found that
>> needs to care).
>
> Yes.



-- 
| ← Ceci n'est pas une pipe
Patryk Obara


Re: [RFC PATCH] Updated "imported object" design

2017-08-17 Thread Jonathan Tan
Thanks for your comments. I'll reply to both your e-mails in this one
e-mail.

> This illustrates another place we need to resolve the
> naming/vocabulary.  We should at least be consistent to make it easier
> to discuss/explain.  We obviously went with "virtual" when building
> GVFS but I'm OK with "lazy" as long as we're consistent.  Some
> examples of how the naming can clarify or confuse:
> 
> 'Promise-enable your repo by setting the "extensions.lazyObject" flag'
> 
> 'Enable your repo to lazily fetch objects by setting the
> "extensions.lazyObject"'
> 
> 'Virtualize your repo by setting the "extensions.virtualize" flag'
> 
> We may want to carry the same name into the filename we use to mark
> the (virtualized/lazy/promised/imported) objects.
> 
> (This reminds me that there are only 2 hard problems in computer
> science...) ;)

Good point about the name. Maybe the 2nd one is the best? (Mainly
because I would expect a "virtualized" repo to have virtual refs too.)

But if there was a good way to refer to the "anti-projection" in a
virtualized system (that is, the "real" thing or "object" behind the
"virtual" thing or "image"), then maybe the "virtualized" language is
the best. (And I would gladly change - I'm having a hard time coming up
with a name for the "anti-projection" in the "lazy" language.)

Also, I should probably standardize on "lazily fetch" instead of "lazily
load". I didn't want to overlap with the existing fetching, but after
some thought, it's probably better to do that. The explanation would
thus be that you can either use the built-in Git fetcher (to be built,
although I have an old version here [1]) or supply a custom fetcher.

[1] https://github.com/jonathantanmy/git/commits/partialclone

> I think this all works and would meet the requirements we've been
> discussing.  The big trade off here vs what we first discussed with
> promises is that we are generating the list of promises on the fly
> when they are needed rather than downloading and maintaining a list
> locally.
> 
> My biggest concern with this model is the cost of opening and parsing
> every imported object (loose and pack for local and alternates) to
> build the oidset of promises.
> 
> In fsck this probably won't be an issue as it already focuses on
> correctness at the expense of speed.  I'm more worried about when we
> add the same/similar logic into check_connected.  That impacts fetch,
> clone, and receive_pack.
> 
> I guess the only way we can know for sure it to do a perf test and
> measure the impact.

As for fetching from the main repo, the connectivity check does not need
to be performed at all because all objects are "imported", so the
performance of the connectivity check does not matter. Same for cloning.

This is not true if you're fetching from another repo or if you're using
receive-pack, but (1) I think these are not used as much in such a
situation, and (2) if you do use them, the slowness only "kicks in" if
you do not have the objects referred to (whether non-"imported" or
"imported") and thus have to check the references in all "imported"
objects.

> I think this topic should continue to move forward so that we can 
> provide reasonable connectivity tests for fsck and check_connected in 
> the face of partial clones.  I'm not sure the prototype implementation 
> of reading/parsing all imported objects to build the promised oidset is 
> the most performant model but we can continue to investigate the best 
> options.

Agreed - I think the most important thing here is settling on the API
(name of extension and the nature of the object mark).

> Given all we need is an existence check for a given oid,

This is true...

> I wonder if it 
> would be faster overall to do a binary search through the list of 
> imported idx files + an existence test for an imported loose object.

...but what we're checking is the existence of a reference, not the
existence of an object. For a concrete example, consider what happens if
we both have an "imported" tree and a non-"imported" tree that
references a blob that we do not have. When checking the non-"imported"
tree for connectivity, we have to iterate through all "imported" trees
to see if any can vouch for the existence of such a blob. We cannot
merely binary-search the .idx file.


Re: [PATCH] files-backend: cheapen refname_available check when locking refs

2017-08-17 Thread Junio C Hamano
Jeff King  writes:

>> diff --git a/refs/files-backend.c b/refs/files-backend.c
>> index e9b95592b6..f2a420c611 100644
>> --- a/refs/files-backend.c
>> +++ b/refs/files-backend.c
>> @@ -631,11 +631,11 @@ static int lock_raw_ref(struct files_ref_store *refs,
>>  
>>  /*
>>   * If the ref did not exist and we are creating it,
>> - * make sure there is no existing ref that conflicts
>> - * with refname:
>> + * make sure there is no existing packed ref that
>> + * conflicts with refname:
>>   */
>>  if (refs_verify_refname_available(
>> -&refs->base, refname,
>> +refs->packed_ref_store, refname,
>>  extras, skip, err))
>>  goto error_return;
>>  }
>
> This seems too easy to be true. :) But I think it matches what we were
> doing before 524a9fdb51 (so it's correct), and the performance numbers
> don't lie.

Thanks, all.  The log message explained the change very well, even
though I agree that the patch text does indeed look too easy to be
true ;-).

Will queue.


Re: Inconsistent exit code for `git grep --files-without-match` with `--quiet`

2017-08-17 Thread Paul Eggert
Anthony reported this issue to bug-g...@gnu.org. From what I can see, 
git-grep's behavior is better, so I changed GNU grep to behave like 
git-grep. Please see:


https://debbugs.gnu.org/28105

I hope this saves you folks a little bit of work.



Re: [PATCH] sub-process: print the cmd when a capability is unsupported

2017-08-17 Thread Lars Schneider

> On 17 Aug 2017, at 23:01, Junio C Hamano  wrote:
> 
> Christian Couder  writes:
> 
>> ... but I think we should then emphasize more in our test
>> scripts (maybe by giving a good example) and perhaps also in the doc
>> that the filters/sub-processes should really pay attention and not
>> output any capability that are not supported by Git.
> 
> Oh, absolutely.  If you know there is such a breakage in our test
> script, please do fix it.
> 
> Thanks.

Junio's reasoning [1] is spot on from my point of view.

I intentionally did not add the negotiation to the test code to keep
the test as simple as possible. However, I wrote this in the
gitattributes docs [2]:

  After the version negotiation Git sends a list of all capabilities that
  it supports and a flush packet. Git expects to read a list of desired
  capabilities, which must be a subset of the supported capabilities list,
  and a flush packet as response:

Maybe we should revive "Documentation/technical/api-sub-process.txt" [3]
after all to explain these kind of things?

- Lars


[1] https://public-inbox.org/git/xmqq8tijpkrv@gitster.mtv.corp.google.com/
[2] 
https://github.com/git/git/blob/b3622a4ee94e4916cd05e6d96e41eeb36b941182/Documentation/gitattributes.txt#L408-L411
[3] 
https://public-inbox.org/git/20170807102136.30b23...@twelve2.svl.corp.google.com/

Re: [PATCH] check-ref-format: require a repository for --branch

2017-08-17 Thread Junio C Hamano
Jeff King  writes:

> On Mon, Jul 17, 2017 at 10:27:09AM -0700, Jonathan Nieder wrote:
>
>> > --- a/t/t1402-check-ref-format.sh
>> > +++ b/t/t1402-check-ref-format.sh
>> > @@ -161,6 +161,10 @@ test_expect_success 'check-ref-format --branch from 
>> > subdir' '
>> >test "$refname" = "$sha1"
>> >  '
>> >  
>> > +test_expect_success 'check-ref-format --branch from non-repo' '
>> > +  test_must_fail nongit git check-ref-format --branch @{-1}
>> > +'
>> > +
>> >  valid_ref_normalized() {
>> >prereq=
>> >case $1 in
>> 
>> I don't think it's right.  Today I can do
>> 
>>  $ cd /tmp
>>  $ git check-ref-format --branch master
>>  master
>> 
>> You might wonder why I'd ever do such a thing.  But that's what "git
>> check-ref-format --branch" is for --- it is for taking a 
>> argument and turning it into a branch name.  For example, if you have
>> a script with an $opt_branch variable that defaults to "master", it
>> may do
>> 
>>  resolved_branch=$(git check-ref-format --branch "$opt_branch")
>> 
>> even though it is in a mode that is not going to have to use
>> $resolved_branch and it is not running from a repository.
>
> I'm not sure I buy that. What does "turning it into a branch name" even
> mean when you are not in a repository? Clearly @{-1} must fail. And
> everything else is just going to output the exact input you provided.

This "just going to output the exact input" is not entirely correct;
there is just one use case for it.

"git check-ref-format --branch a..b" would fail with a helpful error
message, while the command run with "a.b" would tell you the name is
safe.

Use of 'check-ref-format --branch "@{-1}"' *IS* a nonsense, whether
it is inside or outside a repository; it is OK to fail it outside a
repository and I would say it is even OK to fail it inside a
repository.  After all "check-ref-format" is about checking if the
string is a sensible name to use.

I think calling interpret_branch_name() in the codepath is a mistake;
we should instead report whether "refs/heads/@{-1}" literally is a
valid refname we could create.
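
For illustration, a minimal sketch of the distinction drawn above (the
exact messages are not important, the pass/fail split is):

    git check-ref-format --branch a..b   # fails: not a well-formed branch name
    git check-ref-format --branch a.b    # succeeds and prints "a.b"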


Re: Submodule regression in 2.14?

2017-08-17 Thread Lars Schneider

> On 16 Aug 2017, at 20:51, Stefan Beller  wrote:
> 
> On Wed, Aug 16, 2017 at 11:35 AM, Lars Schneider
>  wrote:
>> Hi,
>> 
>> I think we discovered a regression in Git 2.14.1 today.
>> It looks as if "git submodule update --init --recursive" removes
>> the "skip submodules" config.
>> 
>> Consider the following steps:
>> 
>>git clone https://server/repo.git
>>cd repo
>>git config --local submodule.some/other/repo.update none
>>git submodule update --init --recursive
>>git pull --recurse-submodules
>> 
>> With Git 2.14 the last "git pull" will clone the "some/other/repo"
>> submodule. This did not happen with Git 2.13.
>> 
>> Bug or feature? I don't have anymore time for Git today. I am happy to
>> provide a proper test case tomorrow, though.
> 
> $ git log --oneline v2.13.0..v2.14.1 -- git-submodule.sh
> 532139940c add: warn when adding an embedded repository
> (I am confident this is not the suspect, let's keep searching.
> Not a lot happened in submodule land apparently)
> 
> Looking through all commits v2.13..v2.14 doesn't have me
> suspect any of them.
> 
> Any chance the "did not happen with 2.13" was not
> freshly cloned but tested on an existing repo? If so a hot
> candidate for suspicion is a93dcb0a56 (Merge branch
> 'bw/submodule-is-active', 2017-03-30), IMHO, just
> gut feeling, though.
> 
> Oh, wait.
> $ git log --oneline v2.13.0..v2.14.1 -- builtin/pull.c
> c9c63ee558 Merge branch 'sb/pull-rebase-submodule'
> a6d7eb2c7a pull: optionally rebase submodules (remote submodule changes only)
> could also be a culprit. Do you have pull.rebase set?

I bisected the problem today and "a6d7eb2c7a pull: optionally rebase submodules 
(remote submodule changes only)" is indeed the culprit.

The commit seems to break the following test case.

diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index dcac364c5f..24f9729015 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -1289,4 +1289,19 @@ test_expect_success 'init properly sets the config' '
test_must_fail git -C multisuper_clone config --get 
submodule.sub1.active
 '
 
+test_expect_success 'submodule update and git pull with disabled submodule' '
+   test_when_finished "rm -rf multisuper_clone" &&
+   pwd=$(pwd) &&
+   git clone file://"$pwd"/multisuper multisuper_clone &&
+   (
+   cd multisuper_clone &&
+   git config --local submodule.sub0.update none &&
+   git submodule update --init --recursive &&
+   git pull --recurse-submodules &&
+   git submodule status | cut -c 1,43- >actual
+   ) &&
+   ls &&
+   test_cmp expect multisuper_clone/actual
+'
+
 test_done


I am not familiar with the code. Does anyone see the problem
right away?

Thanks,
Lars




Re: [PATCH v2 4/4] commit: rewrite read_graft_line

2017-08-17 Thread Junio C Hamano
Patryk Obara  writes:

> The previous implementation of read_graft_line used calculations based
> on GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ to determine the number of commit
> ids in a single graft line.  New implementation does not depend on these
> constants, so it adapts to any object_id buffer size.
>
> To make this possible, FLEX_ARRAY of object_id in struct was replaced
> by an oid_array.

There is a leap in logic between the two paragraphs.  Your use of
parse_oid_hex() is good.  But I do not think moving the array body
to outside commit_graft structure and forcing it to be separately
allocated is necessary or beneficial.  When we got a single line, we
know how many fake parents a child described by that graft line has,
and you can still use of FLEX_ARRAY to avoid separate allocation
and need for separate freeing of it.


Re: [PATCH 3/5] commit: replace the raw buffer with strbuf in read_graft_line

2017-08-17 Thread Junio C Hamano
Jeff King  writes:

> I'd expect most of the GIT_MAX constants to eventually go away in favor
> of "struct object_id", but that will still be using the same "big enough
> to hold any hash" size under the hood.

Indeed.  It is good to see major contributors are in agreement ;-)
I'd expect that an array of "struct object_id" would be how a fixed
number of object names would be represented, i.e.

struct object_id thing[num_elements];

instead of an array of uchar that is MAX bytes long, i.e.

unsigned char name[num_elements][GIT_MAX_RAWSZ];

In fact, the former is already how we represent the list of fake
parents in the commit_graft structure, so I think patch 5/5 in this
series does two unrelated things, one of which is bad (i.e. use of
parse_oid_hex() is good; turning the FLEX_ARRAY at the end into a
oid_array that requires a separate allocation of the array is bad).

> Agreed. Most code should be dealing with the abstract concept of a hash
> and shouldn't have to care about the size. I really like parse_oid_hex()
> for that reason (and I think parsing is the main place we've found that
> needs to care).

Yes.


Re: [PATCH] sub-process: print the cmd when a capability is unsupported

2017-08-17 Thread Junio C Hamano
Christian Couder  writes:

> ... but I think we should then emphasize more in our test
> scripts (maybe by giving a good example) and perhaps also in the doc
> that the filters/sub-processes should really pay attention and not
> output any capability that are not supported by Git.

Oh, absolutely.  If you know there is such a breakage in our test
script, please do fix it.

Thanks.


Re: [PATCH v2] update revisions doc for quoting in ':/' notation

2017-08-17 Thread Junio C Hamano
ryenus  writes:

> To make sure the `<text>` in `:/<text>` is seen as one search string,
> one should quote/escape `<text>` properly.
>
> In particular, the example given in the manual, `:/fix nasty bug`, does not
> work because of missing quotes when used in a shell. A note about
> quoting/escaping is added along with a working example; however, the
> original example is left as-is to be consistent with other examples.
>
> Signed-off-by: ryenus 
> ---
>  Documentation/revisions.txt | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/revisions.txt b/Documentation/revisions.txt
> index 61277469c..d2862d55d 100644
> --- a/Documentation/revisions.txt
> +++ b/Documentation/revisions.txt
> @@ -185,7 +185,9 @@ existing tag object.
>e.g. ':/^foo'. The special sequence ':/!' is reserved for modifiers to what
>is matched. ':/!-foo' performs a negative match, while ':/!!foo' matches a
>literal '!' character, followed by 'foo'. Any other sequence beginning with
> -  ':/!' is reserved for now.
> +  ':/!' is reserved for now. And make sure to quote/escape for the text to be
> +  interpreted as the expected search string/pattern, e.g., for a commit whose
> +  message matches a literal \'`$`', use `git show :/\\\$` or `git show 
> ':/\$'`.

Hmph.  

This seems to miss the most important point Andreas raised, which is
that the way to quote them is entirely up to the shell and other UI
machinery the user is using.

And I am not sure if those who are using CUI should be told about
shell's quoting rules when they are learning about :/ syntax.  There
are tons of other ways that the user needs to pass a string with
whitespace in it as a single argument to commands, many of which may
not even be related to Git at all.  I was actually expecting a much
milder text, perhaps literal copy of what Andreas gave you in his
message <956ccc83-c291-4217-795c-fcef33fac...@gmail.com>.
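
For what it is worth, a minimal sketch of the shell-level quoting in
question (the quoting itself is the shell's business, not Git's):

    git show ':/fix nasty bug'   # single quotes make the shell pass one argument
    git show ':/\$'              # a commit whose message matches a literal "$"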

By the way, I do not mean to dictate what name and address you use
to communicate with other people, but especially with something that
is supposed to hopefully have some legal value down the line if
somebody starts making SCO noises, it really would be nice to have a
real person to associate things with recorded as the author of a
change and the person who signed off the patch.  It would be
embarrassing later if there is no way to even look you up somehow.

Thanks.


Re: [RFC PATCH] Updated "imported object" design

2017-08-17 Thread Ben Peart



On 8/16/2017 5:35 PM, Jonathan Tan wrote:

On Wed, 16 Aug 2017 13:32:23 -0700
Junio C Hamano  wrote:


Jonathan Tan  writes:


Also, let me know if there's a better way to send out these patches for
review. Some of the code here has been reviewed before, for example.

[1] https://public-inbox.org/git/cover.1502241234.git.jonathanta...@google.com/

[2] 
https://public-inbox.org/git/ffb734d277132802bcc25baa13e8ede3490af62a.1501532294.git.jonathanta...@google.com/

[3] 
https://public-inbox.org/git/20170807161031.7c4ea...@twelve2.svl.corp.google.com/


... and some of the code exists only in the list archive, so we
don't know which other topic if any we may want to eject tentatively
if we wanted to give precedence to move this topic forward over
others.  I'll worry about it later but help from others is also
appreciated.


Thanks - I can help take a look when it is time to move the code in.



I agree that having this depend on patches elsewhere in the list archive 
makes it more difficult to review.  I know I like to see things in 
context to get a better picture.



I think the issue here is whether we want to move this topic forward or
not, that is, if this (special ".imported" objects) is the best way to
solve (at least partially) the connectivity check part of tolerating
missing objects. I hope that we can continue to talk about it.



I think this topic should continue to move forward so that we can 
provide reasonable connectivity tests for fsck and check_connected in 
the face of partial clones.  I'm not sure the prototype implementation 
of reading/parsing all imported objects to build the promised oidset is 
the most performant model but we can continue to investigate the best 
options.



This collects names of the objects that are _directly_ referred to
by imported objects.  An imported pack may have a commit, whose
top-level tree may or may not appear in the same pack, or the tree
may exist locally but not in the same pack.  Or the tree may not be
locally available at all.  In any of these four cases, the top-level
tree is listed in the "promises" set.  Same for trees and tags.

I wonder if all of the calls to oidset_insert() in this function
want to be guarded by "mark it as promised only when the referent
is *not* locally available" to keep the promises set minimally
populated.  The only change needed to fsck in order to make it
refrain from treating a missing but promised object as an error
would be:

 -   if (object is missing)
 +   if (object is missing && object is not promised)
 error("that object must be there but missing");

so there is no point in throwing something that we know we locally
have in this oidset, right?

On the other hand, cost of such additional checks in this function
may outweigh the savings of both memory pressure and look-up cost,
so I do not know how the tradeoff would turn out.


I also don't know how the tradeoff would turn out, so I leaned towards
the slightly simpler solution of not doing the check. In the future,
maybe a t/perf test can be done to decide between the two.


+static int is_promise(const struct object_id *oid)
+{
+   if (!promises_prepared) {
+   if (repository_format_lazy_object)
+   for_each_packed_object(add_promise, NULL,
+  FOR_EACH_OBJECT_IMPORTED_ONLY);
+   promises_prepared = 1;
+   }
+   return oidset_contains(&promises, oid);
+}


Somehow I'm tempted to call this function "is_promised()" but that
is a minor naming issue.




Given all we need is an existence check for a given oid, I wonder if it
would be faster overall to do a binary search through the list of 
imported idx files + an existence test for an imported loose object.


Especially in the check_connected case which isn't verifying every 
object, that should be a lot less IO than loading all the imported 
commits, trees and blobs and pre-computing an oidset of all possible 
objects.  The lookup for each object would be slower than a simple call 
to oidset_contains but we avoid the up front cost.


With some caching of idx files and threading, I suspect this could be 
made pretty fast.



I was trying to be consistent in using the name "promise" instead of
"promised object/tag/commit/tree/blob" everywhere, but we can switch if
need be (for example, if we don't want to limit the generic name
"promise" to merely objects).


  static const char *describe_object(struct object *obj)
  {
static struct strbuf buf = STRBUF_INIT;
@@ -410,7 +472,7 @@ static void fsck_handle_reflog_oid(const char *refname, 
struct object_id *oid,
xstrfmt("%s@{%"PRItime"}", refname, 
timestamp));
obj->used = 1;
mark_object_reachable(obj);
-   } else {
+   } else if (!is_promise(oid)) {

Re: git fetch with refspec does not include tags?

2017-08-17 Thread Kevin Daudt
On Thu, Aug 17, 2017 at 01:38:36PM -0700, Junio C Hamano wrote:
> Kevin Daudt  writes:
> 
> > On Thu, Aug 17, 2017 at 12:38:58PM -0700, Junio C Hamano wrote:
> >> Jeff King  writes:
> >> 
> >> >   # no tags, we just populate FETCH_HEAD because of the bare URL
> >> >   git fetch ../parent
> >> >
> >> >   # this does fetch tags, because we're storing the result according to
> >> >   # the configured refspec ("refs/heads/*:refs/remotes/origin/*").
> >> >   git fetch origin
> >> 
> >> The above two look good.
> >> 
> >> >   # this doesn't fetch tags, as the main command is "just" populating
> >> >   # FETCH_HEAD. But then our logic for "hey, we fetched the ref for
> >> >   # refs/remotes/origin/master, so let's update it on the side" kicks
> >> >   # in. And we end up updating FETCH_HEAD _and_ the tracking branch, but
> >> >   # not the tags. Weird.
> >> >   git fetch origin master
> >> 
> >> Yes, it looks weird, but I suspect that it is probably more correct
> >> not to fetch tags in this case.  I wonder if it would be a solution
> >> not to do the "on the side" thing---after all the user didn't tell
> >> us to update refs/remotes/origin/master with this command line.
> >
> > Isn't that how git fetch used to behave, or am I misunderstanding what
> > you mean? It used to be that git fetch   would not
> > update any remote tracking branches.
> >
> > From the 1.8.4 release notes:
> >
> >> "git fetch origin master" unlike "git fetch origin" or "git fetch"
> >> did not update "refs/remotes/origin/master"; this was an early
> >> design decision to keep the update of remote tracking branches
> >> predictable, but in practice it turns out that people find it more
> >> convenient to opportunistically update them whenever we have a
> >> chance, and we have been updating them when we run "git push" which
> >> already breaks the original "predictability" anyway.
> 
> No, you are not misunderstanding anything.  The "pretend that we
> immediately turned around and fetched" done by "git push" was
> already breaking the predictability, but the change in 1.8.4 made it
> even worse.  I am saying that going back to the old behaviour may be
> one option to address the issue being discussed in this thread.
> 

Ok. The reason I'm bringing this up is that we often had to tell users
in the irc channel that git fetch   did not update the
remote tracking branches, which confused them, so reverting back might
reintroduce this confusion again.


Re: git fetch with refspec does not include tags?

2017-08-17 Thread Junio C Hamano
Kevin Daudt  writes:

> On Thu, Aug 17, 2017 at 12:38:58PM -0700, Junio C Hamano wrote:
>> Jeff King  writes:
>> 
>> >   # no tags, we just populate FETCH_HEAD because of the bare URL
>> >   git fetch ../parent
>> >
>> >   # this does fetch tags, because we're storing the result according to
>> >   # the configured refspec ("refs/heads/*:refs/remotes/origin/*").
>> >   git fetch origin
>> 
>> The above two look good.
>> 
>> >   # this doesn't fetch tags, as the main command is "just" populating
>> >   # FETCH_HEAD. But then our logic for "hey, we fetched the ref for
>> >   # refs/remotes/origin/master, so let's update it on the side" kicks
>> >   # in. And we end up updating FETCH_HEAD _and_ the tracking branch, but
>> >   # not the tags. Weird.
>> >   git fetch origin master
>> 
>> Yes, it looks weird, but I suspect that it is probably more correct
>> not to fetch tags in this case.  I wonder if it would be a solution
>> not to do the "on the side" thing---after all the user didn't tell
>> us to update refs/remotes/origin/master with this command line.
>
> Isn't that how git fetch used to behave, or am I misunderstanding what
> you mean? It used to be that git fetch   would not
> update any remote tracking branches.
>
> From the 1.8.4 release notes:
>
>> "git fetch origin master" unlike "git fetch origin" or "git fetch"
>> did not update "refs/remotes/origin/master"; this was an early
>> design decision to keep the update of remote tracking branches
>> predictable, but in practice it turns out that people find it more
>> convenient to opportunistically update them whenever we have a
>> chance, and we have been updating them when we run "git push" which
>> already breaks the original "predictability" anyway.

No, you are not misunderstanding anything.  The "pretend that we
immediately turned around and fetched" done by "git push" was
already breaking the predictability, but the change in 1.8.4 made it
even worse.  I am saying that going back to the old behaviour may be
one option to address the issue being discussed in this thread.



Re: [Patch size_t V3 00/19] use size_t

2017-08-17 Thread Torsten Bögershausen
On Wed, Aug 16, 2017 at 10:16:12PM +0200, Martin Koegler wrote:
> From: Martin Koegler 
> 
> This patchset is for next [24db08a6e8fed761d3bace7f2d5997806e20b9f7].

I applied it successfully, and ran the test suite on a 32-bit system.
The laptop reported one breakage in t0040:
t0040-parse-options.sh   not ok 19 - OPT_MAGNITUDE() 3giga

Besides some t5561-http-backend.sh failures (which are most probably not
caused by this patch).

The raspi had 2 deadlocks, like "git push --signed dst noop ff +noff"
or
trash directory.t5541-http-push-smart/gpghome --sign -u commit...@example.com
Even this is most probably not caused by the patch (and the test is still running).

Well done, and on which platforms did you test?



Re: [Patch size_t V3 01/19] delta: fix enconding size larger than an "uint" can hold

2017-08-17 Thread Torsten Bögershausen
On Wed, Aug 16, 2017 at 10:16:13PM +0200, Martin Koegler wrote:
> From: Martin Koegler 
> 
> The current delta code produces incorrect pack objects for files > 4GB.

This may need a little bit more info (E.g. it does not fix anything on
a 32 bit Linux)

How about this:
The current delta code produces incorrect pack objects for files > 4GB
when "unsigned long" has 32 bits and size_t 64 bits.



Re: git fetch with refspec does not include tags?

2017-08-17 Thread Kevin Daudt
On Thu, Aug 17, 2017 at 12:38:58PM -0700, Junio C Hamano wrote:
> Jeff King  writes:
> 
> >   # no tags, we just populate FETCH_HEAD because of the bare URL
> >   git fetch ../parent
> >
> >   # this does fetch tags, because we're storing the result according to
> >   # the configured refspec ("refs/heads/*:refs/remotes/origin/*").
> >   git fetch origin
> 
> The above two look good.
> 
> >   # this doesn't fetch tags, as the main command is "just" populating
> >   # FETCH_HEAD. But then our logic for "hey, we fetched the ref for
> >   # refs/remotes/origin/master, so let's update it on the side" kicks
> >   # in. And we end up updating FETCH_HEAD _and_ the tracking branch, but
> >   # not the tags. Weird.
> >   git fetch origin master
> 
> Yes, it looks weird, but I suspect that it is probably more correct
> not to fetch tags in this case.  I wonder if it would be a solution
> not to do the "on the side" thing---after all the user didn't tell
> us to update refs/remotes/origin/master with this command line.

Isn't that how git fetch used to behave, or am I misunderstanding what
you mean? It used to be that git fetch   would not
update any remote tracking branches.

>From the 1.8.4 release notes:

> "git fetch origin master" unlike "git fetch origin" or "git fetch"
> did not update "refs/remotes/origin/master"; this was an early
> design decision to keep the update of remote tracking branches
> predictable, but in practice it turns out that people find it more
> convenient to opportunistically update them whenever we have a
> chance, and we have been updating them when we run "git push" which
> already breaks the original "predictability" anyway.



Re: [RFC PATCH] Updated "imported object" design

2017-08-17 Thread Ben Peart



On 8/15/2017 8:32 PM, Jonathan Tan wrote:

This patch is based on an updated version of my refactoring of
pack-related functions [1].

This corresponds to patch 1 of my previous series [2]. I'm sending this
patch out (before I update the rest) for 2 reasons:
  * to provide a demonstration of how the feature could be implemented,
in the hope of restarting the discussion
  * to obtain comments about this patch to see if I'm heading in the
right direction

In an earlier e-mail [3], I suggested that loose objects can also be
marked as ".imported" (formerly ".remote" - after looking at the code, I
decided to use "imported" throughout, since "remote" can be easily
confused as the opposite of "local", used to represent objects in the
local store as opposed to an alternate store).

However, I have only implemented the imported packed objects part -
imported loose objects can be added later.

It still remains to be discussed whether we should mark the imported
objects or the non-imported objects as the source of promises, but I
still think that we should mark the imported objects. In this way, the
new properties (the provision of promises and the mark) coincide on the
same object, and the same things (locally created objects, fetches from
non-lazy-object-serving remotes) behave in the same way regardless of
whether extensions.lazyObject is set (allowing, for example, a repo to
be converted into a promise-enabled one solely through modifying the
configuration).



This illustrates another place we need to resolve the naming/vocabulary. 
 We should at least be consistent to make it easier to discuss/explain. 
 We obviously went with "virtual" when building GVFS but I'm OK with 
"lazy" as long as we're consistent.  Some examples of how the naming can 
clarify or confuse:


'Promise-enable your repo by setting the "extensions.lazyObject" flag'

'Enable your repo to lazily fetch objects by setting the 
"extensions.lazyObject"'


'Virtualize your repo by setting the "extensions.virtualize" flag'

We may want to carry the same name into the filename we use to mark the 
(virtualized/lazy/promised/imported) objects.


(This reminds me that there are only 2 hard problems in computer 
science...) ;)



It is true that converting a normal repo to a
virtualized/lazy/promise-enabled repo wouldn't have to munge the
object marking, but since this only happens once, I wouldn't make that
the primary decision maker (and switching back would require munging the
objects anyway).  I think we can make either work (i.e. tagging local vs
non-local/remote/imported/promised/lazy/virtual).


I'm fine leaving "how" the objects are marked as an implementation 
detail - the design requires them to be marked, the person writing the 
code can figure out the fastest way to accomplish that.


In the current patch/design, the access time to check for the existence 
of an "imported" file is going to be dwarfed by the cost of opening and 
parsing every object (loose and pack) to get its oid so that isn't 
really an issue.



+/*
+ * Objects that are believed to be loadable by the lazy loader, because
+ * they are referred to by an imported object. If an object that we have
+ * refers to such an object even though we don't have that object, it is
+ * not an error.
+ */
+static struct oidset promises;
+static int promises_prepared;
+
+static int add_promise(const struct object_id *oid, struct packed_git *pack,
+  uint32_t pos, void *data)
+{
+   struct object *obj = parse_object(oid);
+   if (!obj)
+   /*
+* Error messages are given when packs are verified, so
+* do not print any here.
+*/
+   return 0;
+   
+   /*
+* If this is a tree, commit, or tag, the objects it refers
+* to are promises. (Blobs refer to no objects.)
+*/
+   if (obj->type == OBJ_TREE) {
+   struct tree *tree = (struct tree *) obj;
+   struct tree_desc desc;
+   struct name_entry entry;
+   if (init_tree_desc_gently(&desc, tree->buffer, tree->size))
+   /*
+* Error messages are given when packs are
+* verified, so do not print any here.
+*/
+   return 0;
+   while (tree_entry_gently(&desc, &entry))
+   oidset_insert(&promises, entry.oid);
+   } else if (obj->type == OBJ_COMMIT) {
+   struct commit *commit = (struct commit *) obj;
+   struct commit_list *parents = commit->parents;
+
+   oidset_insert(&promises, &commit->tree->object.oid);
+   for (; parents; parents = parents->next)
+   oidset_insert(&promises, &parents->item->object.oid);
+   } else if (obj->type == OBJ_TAG) {
+   struct tag *tag = (struct tag *) obj;
+   oidset_insert(&promises, &tag->tagged->oid);
+   }
+   return 0;
+}
+
+static int 

Re: [PATCH v4 2/3] builtin/branch: stop supporting the use of --set-upstream option

2017-08-17 Thread Junio C Hamano
Kaartic Sivaraam  writes:

> diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
> index 81bd0a7b7..948d9c9ef 100644
> --- a/Documentation/git-branch.txt
> +++ b/Documentation/git-branch.txt
> @@ -195,10 +195,10 @@ start-point is either a local or remote-tracking branch.
>   branch.autoSetupMerge configuration variable is true.
>  
>  --set-upstream::
> - If specified branch does not exist yet or if `--force` has been
> - given, acts exactly like `--track`. Otherwise sets up configuration
> - like `--track` would when creating the branch, except that where
> - branch points to is not changed.
> + As this option had confusing syntax it's no longer supported. Please use
> + --track or --set-upstream-to instead.
> ++
> +Note: This could possibly become an alias of --set-upstream-to in the future.

I'll tweak `--track` and `--set-upstream-to` in the updated text
and remove the 'Note:' thing that does not give any useful
information to the end users while queuing (no need to resend). 

Thanks.


Re: [PATCH v4 2/3] builtin/branch: stop supporting the use of --set-upstream option

2017-08-17 Thread Junio C Hamano
Martin Ågren  writes:

> On 17 August 2017 at 04:54, Kaartic Sivaraam
>  wrote:
>> Helped-by: Martin Ågren ,  Junio C Hamano 
>> 
>> Signed-off-by: Kaartic Sivaraam 
>
> I didn't expect a "Helped-by", all I did was to give some random
> comments. :-) I'm not so sure about the comma-separation, that seems to
> be a first in the project.

I didn't either ;-) 

The line looks odd so I'll remove it while queuing.

Thanks for noticing.


ignoring extra bitmap file?

2017-08-17 Thread Andreas Krey
Hi everyone,

I'm seeing the message

   remote: warning: ignoring extra bitmap file: 
./objects/pack/pack-2943dc24pack

and indeed, there is such a thing (two, actually):

   171736188 Aug 17 08:20 pack-2943dc2477026f87b280ebcefa93fe28412688df.idx
12662268 Aug 17 08:24 pack-2943dc2477026f87b280ebcefa93fe28412688df.bitmap
 12927989355 Aug 17 08:27 pack-2943dc2477026f87b280ebcefa93fe28412688df.pack
   164857412 Aug 17 08:33 pack-8b4a42ca7aa2aca6f354292007910de1110117b2.idx
13164932 Aug 17 08:49 pack-8b4a42ca7aa2aca6f354292007910de1110117b2.bitmap
  281872 Aug 17 09:40 pack-bddb40f984124ba8c2a4e5c55b0d1b2804fd5817.pack
   13280 Aug 17 09:40 pack-bddb40f984124ba8c2a4e5c55b0d1b2804fd5817.idx
7904 Aug 17 15:51 pack-0f8b1478e17174c562d9a52cf577e0e050bdb7c5.idx
 2373948 Aug 17 16:09 pack-23253e17510cacaae3bb38fb5429073b3bc59480.pack
6980 Aug 17 16:09 pack-23253e17510cacaae3bb38fb5429073b3bc59480.idx
  144158 Aug 17 17:03 pack-0f8b1478e17174c562d9a52cf577e0e050bdb7c5.pack
 12927996484 Aug 17 19:19 pack-8b4a42ca7aa2aca6f354292007910de1110117b2.pack
  153332 Aug 17 20:17 pack-65ff13a10c29a6c1604017c50dc9a320044ee605.pack
   14036 Aug 17 20:17 pack-65ff13a10c29a6c1604017c50dc9a320044ee605.idx

But it looks like something went wrong in that repack cycle (that
pack-2943dc247702 is the full repo), and it won't get removed later
in the next repack in the evening.

Question: Can I safely remove the .bitmap file, and repack will then
clean up the .pack and .idx files as well?

(This is still that repo in bitbucket (latest 4.x) server
with git 2.6.2, now with gc.auto=0.)

- Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds 
Date: Fri, 22 Jan 2010 07:29:21 -0800


Re: git fetch with refspec does not include tags?

2017-08-17 Thread Junio C Hamano
Jeff King  writes:

> I think it's a bit more complex because "git pull" uses "git fetch"
> under the hood. In fact, your "git fetch origin master" is exactly what
> gets run when you do:
>
>   git pull origin master
>
> That's maybe OK. But I think one-off pulls like:
>
>   git pull https://example.com/repo.git master
>
> probably wouldn't want it. I'd have to give it some thought.

I agree with both.  If you have a named remote, you presumably are
keeping copies of its branches as remote-tracking branches, and it
may be fine to follow tags.  An explicit URL used for a one-off should
not grab anything but the named thing, I would think.


Re: git fetch with refspec does not include tags?

2017-08-17 Thread Junio C Hamano
Jeff King  writes:

>   # no tags, we just populate FETCH_HEAD because of the bare URL
>   git fetch ../parent
>
>   # this does fetch tags, because we're storing the result according to
>   # the configured refspec ("refs/heads/*:refs/remotes/origin/*").
>   git fetch origin

The above two look good.

>   # this doesn't fetch tags, as the main command is "just" populating
>   # FETCH_HEAD. But then our logic for "hey, we fetched the ref for
>   # refs/remotes/origin/master, so let's update it on the side" kicks
>   # in. And we end up updating FETCH_HEAD _and_ the tracking branch, but
>   # not the tags. Weird.
>   git fetch origin master

Yes, it looks weird, but I suspect that it is probably more correct
not to fetch tags in this case.  I wonder if it would be a solution
not to do the "on the side" thing---after all the user didn't tell
us to update refs/remotes/origin/master with this command line.

Then not following tags will be in line with all of the reasoning
above.

>   # and this one does fetch tags, because we have a real destination.
>   git fetch origin master:foo

Yup, that is expected.

> So what I'd say is:
>
>   1. Definitely these defaults are under-documented. I couldn't find
>  them anywhere in git-fetch(1).

Yup.

>   2. If we continue to follow the "are we storing any refs" rule for the
>  default, possibly it should expand to "did we store anything,
>  including opportunistic tracking-ref updates".

That also is a workable way to make things consistent.


Re: git add -p breaks after split on change at the top of the file

2017-08-17 Thread Junio C Hamano
Jeff King  writes:

> Of course this is a pretty ridiculous input in the first place. In
> theory it _could_ be figured out, but overlapping hunks like this are
> always going to cause problems (in this case, context from the second
> hunk became a removal, and the second hunk no longer applies).

To be honest, I do not think it could be figured out by "git apply",
but "git add -p" _should_ be able to.  I've said already this
earlier in the discussion:

  https://public-inbox.org/git/7vab5cn7wr@alter.siamese.dyndns.org/

"add -p" knows how the hunk _before_ it gave the user to edit looked
like, and it knows how the hunk after the user edited looks like, so
it may have enough information to figure it out.  But the "(e)dit"
feature was done in a way to discard that information and forced an
impossible "guess what the correct shape of the patch would be, out
of this broken patch input" with --recount and --allow-overlap.  

If we want to make "add -p/(e)dit" work in the corner cases and make
it trustworthy, "apply" is a wrong tree to back at.  A major part of
the effort to fix would go to "add -p", not to "apply".



Re: git add -p breaks after split on change at the top of the file

2017-08-17 Thread Junio C Hamano
Jeff King  writes:

> [+cc Junio, as this gets deep into git-apply innards]

I wrote off --recount and --allow-overlap long ago as ugly workarounds
that happen to work some of the time but cannot be trusted.

IIRC, before the "(e)dit" thing was added to "add -p", we counted
the line numbers correctly and merged the adjacent hunks before
applying, and neither of these two kludges was necessary.

These threads may give us a bit more background:

  https://public-inbox.org/git/7viqk1ndlk@alter.siamese.dyndns.org/
  
https://public-inbox.org/git/1304117373-592-1-git-send-email-gits...@pobox.com/

The original that introduced the "(e)dit" thing was in this thread:

  https://public-inbox.org/git/200805232221.45406.tr...@student.ethz.ch/

As you can see, I was very much against it, as it fundamentally
cannot be sanely implemented (which I think is the same
conclusion you reached at the end of the current thread).

I think there should be a better failure mode, though.


Re: [PATCH] add test for bug in git-mv with nested submodules

2017-08-17 Thread Stefan Beller
On Thu, Aug 17, 2017 at 3:34 AM, Heiko Voigt  wrote:
> When using git-mv with a submodule it will detect that and update the
> paths for its configurations (.gitmodules, worktree and gitfile). This
> does not work for nested submodules where a user renames the root
> submodule.
>
> We discovered this fact when working on on-demand fetch for renamed
> submodules. Let's add a test to document it.
>
> Signed-off-by: Heiko Voigt 
> ---
>  t/t7001-mv.sh | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh
> index e365d1f..39f8aed 100755
> --- a/t/t7001-mv.sh
> +++ b/t/t7001-mv.sh
> @@ -491,4 +491,13 @@ test_expect_success 'moving a submodule in nested 
> directories' '
> test_cmp actual expect
>  '
>
> +test_expect_failure 'moving nested submodules' '
> +   git commit -am "cleanup commit" &&
> +   git submodule add ./. sub_nested &&

If possible, I would avoid adding the repo itself
as a submodule as it is unrealistic in the wild.

While it may be ok for the test here, later down the road
other tests making use of it may run into an issue with
the URL of the submodule.

> +   git commit -m "add sub_nested" &&
> +   git submodule update --init --recursive &&
> +   git mv sub_nested sub_nested_moved &&
> +   git status
> +'
> +
>  test_done
> --
> 2.0.0.274.g6b2cd91
>


Re: reftable [v7]: new ref storage format

2017-08-17 Thread Junio C Hamano
Michael Haggerty  writes:

> On Wed, Aug 16, 2017 at 11:05 PM, Junio C Hamano  wrote:
>> I found it slightly odd that we do not insist that update_indices
>> that appear in a single reftable file are consecutive, yet we
>> require that min_update_index of a reftable file must be one greater
>> than the max_update_index of a previous one.  That is not a new
>> issue in v7, though.
>
> I think of `update_index` like a pseudo-time, and the
> `min_update_index` and `max_update_index` to be stating that "this
> reftable covers the time interval specified". So it's reasonable to
> say that the reftable files, together, should cover all time.
>
> But it might be that there are values of `update_index` for which no
> events survived within a reftable file that covers that time interval.
> This can happen if reference update records have been compacted away
> because later reference updates overwrote their effects, and either
>
> * reflogs were turned off for those updates, or
> * the corresponding reflogs have been compacted into a separate file, or
> * the corresponding reflog entries for those updates have been expired.

Yeah, and I think it is reasonable that the specification does not
dictate that indices within a single reftable must be consecutive.

And if update_indices within a single reftable are allowed to be
sparse, e.g. recording three transactions with indices 1, 3 and 5, it
is not immediately obvious to me why the transactions that are
recorded in the next reftable cannot be with indices 7, 8 and 10,
leaving a gap between the max in the first table (i.e. 5) and the
min in the second table (i.e. 7).  That is what I found slightly
odd.





Re: [PATCH v4 2/3] builtin/branch: stop supporting the use of --set-upstream option

2017-08-17 Thread Martin Ågren
On 17 August 2017 at 04:54, Kaartic Sivaraam
 wrote:
> Helped-by: Martin Ågren ,  Junio C Hamano 
> 
> Signed-off-by: Kaartic Sivaraam 

I didn't expect a "Helped-by", all I did was to give some random
comments. :-) I'm not so sure about the comma-separation, that seems to
be a first in the project.

> *  The option has not yet been removed from the synopsis of the 
> documentation and I think
>we can't remove it from the 'Synopsis' portion of the documentation as 
> it doesn't make
>sense (at least to me) to give a description of an option not listed 
> in the synopsis.

The "git interpret-trailers --parse" thread nearby is adding some
options without mentioning them in the synopsis [1], and those options
can actually be useful, whereas "--set-upstream" only results in a fatal
error. So I don't know.

>Moreover, we have to state the reason for not supporting it in some 
> place.
>
>  I guess the phrase 'no longer supported' is equally communicative. Let me 
> know if that was not
>  a right decision.

I think it's ok. Of course, I know exactly what you want to say, and
why, so I'm biased. :-)

[1] 
https://public-inbox.org/git/20170815102334.qc4w7akl44bti...@sigill.intra.peff.net/


Re: [PATCH v4 5/8] interpret-trailers: add --parse convenience option

2017-08-17 Thread Martin Ågren
On 16 August 2017 at 10:20, Jeff King  wrote:
> On Tue, Aug 15, 2017 at 01:26:53PM +0200, Martin Ågren wrote:
>
>> >  This command reads some patches or commit messages from either the
>> > - arguments or the standard input if no  is specified. Then
>> > -this command applies the arguments passed using the `--trailer`
>> > -option, if any, to the commit message part of each input file. The
>> > -result is emitted on the standard output.
>> > + arguments or the standard input if no  is specified. If
>> > +`--parse` is specified, the output consists of the parsed trailers.
>> > +
>> > +Otherwise, the this command applies the arguments passed using the
>> > +`--trailer` option, if any, to the commit message part of each input
>> > +file. The result is emitted on the standard output.
>>
>> "the this"
>
> Thanks.
>
>> I think I get why you use --parse above (and in the synopsis), although
>> it kind of feels like it should be --only-input or perhaps "--only-input
>> (or --parse)".
>
> I really wanted to point people to --parse as the go-to option for the
> parsing mode for the sake of simplicity (in fact I initially considered
> not even exposing them at all). And I hoped that if they jumped to the
> definition of --parse, that would lead them to the other options.
>
I dunno. I agree it does not read particularly well. Probably the
> whole description section could be rewritten to cover the newly dual
> nature of the command a bit better. But I didn't want to disrupt the
> existing flow too much.

Certainly. It's not like I can offer a "better" way. After thinking
about this some more, I think this is a good example of "less is more".
It's possibly less precise or complete in some sense, but it's probably
much more useful and readable.


Re: [Patch size_t V3 17/19] Convert ref-filter to size_t

2017-08-17 Thread Junio C Hamano
Junio C Hamano  writes:

> I did not see anything fishy in the remaining parts I did not
> comment on.
>
> Thanks.

What I meant was "remainder of this patch 17/19".  I am not claiming
that I read all other patches; I haven't, not yet anyway.


Re: [Patch size_t V3 17/19] Convert ref-filter to size_t

2017-08-17 Thread Junio C Hamano
Martin Koegler  writes:

> -static char *copy_subject(const char *buf, unsigned long len)
> +static char *copy_subject(const char *buf, size_t len)
>  {
>   char *r = xmemdupz(buf, len);
>   int i;

This has the same "we are still iterating with 'int i' over an array
that can be 'size_t len' long" issue as I pointed out in 19/19.  The
remedy is similar, i.e.

static char *copy_subject(const char *buf, size_t len)
{
char *ret = xmemdupz(buf, len);
char *cp;

for (cp = ret; cp < ret + len; cp++)
if (*cp == '\n')
*cp = ' ';
return ret;
}

which should come either before the whole series or after the series
settles, not inside this series.

I do also wonder if we want to do more for other control characters
(e.g. CR, HT, etc.) in this function, though.  That is totally outside
the scope of this series and should be done independently.
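
Just to illustrate the shape such an independent follow-up could take
(an untested sketch; whether CR and HT should really become spaces is
exactly the open question above):

static char *copy_subject(const char *buf, size_t len)
{
        char *ret = xmemdupz(buf, len);
        char *cp;

        /* squash LF, CR and HT into a single space each */
        for (cp = ret; cp < ret + len; cp++)
                if (*cp == '\n' || *cp == '\r' || *cp == '\t')
                        *cp = ' ';
        return ret;
}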

> @@ -898,7 +898,7 @@ static void grab_date(const char *buf, struct atom_value 
> *v, const char *atomnam
>  }
>  
>  /* See grab_values */
> -static void grab_person(const char *who, struct atom_value *val, int deref, 
> struct object *obj, void *buf, unsigned long sz)
> +static void grab_person(const char *who, struct atom_value *val, int deref, 
> struct object *obj, void *buf, size_t sz)
>  {
>   int i;
>   int wholen = strlen(who);

Shouldn't this also become size_t, I wonder?

I did not see anything fishy in the remaining parts I did not
comment on.

Thanks.


Re: [PATCH] files-backend: cheapen refname_available check when locking refs

2017-08-17 Thread Brandon Williams
On 08/17, Jeff King wrote:
> On Thu, Aug 17, 2017 at 05:12:50PM +0200, Michael Haggerty wrote:
> 
> > I was testing this using the reporter's recipe (but fetching from a
> > local clone), and found the following surprising timing numbers:
> > 
> > b05855b5bc (before the slowdown): 22.7 s
> > 524a9fdb51 (immediately after the slowdown): 13 minutes
> > 4e81f1ecf1 (after this fix): 14.5 s
> > 
> > The fact that the fetch is now significantly *faster* than before the
> > slowdown seems not to have anything to do with the reference code.
> 
> I bisected this (with some hackery, since the commits in the middle all
> take 13 minutes to run). The other speedup is indeed unrelated, and is
> due to Brandon's aacc5c1a81 (submodule: refactor logic to determine
> changed submodules, 2017-05-01).
> 
> The commit message doesn't mention performance (it's mostly about code
> reduction). I think the speedup comes from using
> diff_tree_combined_merge() instead of manually diffing each commit
> against its parents. But I didn't do further timings to verify that (I'm
> reporting it here mostly as an interesting curiosity for submodule
> folks).

Haha always great to see an unintended improvement in performance!  Yeah
that commit was mostly about removing duplicate code but I'm glad that
it ended up being a benefit to perf too.

> 
> > diff --git a/refs/files-backend.c b/refs/files-backend.c
> > index e9b95592b6..f2a420c611 100644
> > --- a/refs/files-backend.c
> > +++ b/refs/files-backend.c
> > @@ -631,11 +631,11 @@ static int lock_raw_ref(struct files_ref_store *refs,
> >  
> > /*
> >  * If the ref did not exist and we are creating it,
> > -* make sure there is no existing ref that conflicts
> > -* with refname:
> > +* make sure there is no existing packed ref that
> > +* conflicts with refname:
> >  */
> > if (refs_verify_refname_available(
> > -   &refs->base, refname,
> > +   refs->packed_ref_store, refname,
> > extras, skip, err))
> > goto error_return;
> > }
> 
> This seems too easy to be true. :) But I think it matches what we were
> doing before 524a9fdb51 (so it's correct), and the performance numbers
> don't lie.
> 
> -Peff

-- 
Brandon Williams


Re: [Patch size_t V3 18/19] Convert tree-walk to size_t

2017-08-17 Thread Junio C Hamano
Martin Koegler  writes:

> From: Martin Koegler 
>
> Signed-off-by: Martin Koegler 
> ---
>  tree-walk.c | 17 +
>  tree-walk.h |  4 ++--
>  tree.h  |  2 +-
>  3 files changed, 12 insertions(+), 11 deletions(-)

In this one, I did not find anything suspicious.

> diff --git a/tree-walk.c b/tree-walk.c
> index 7c9f9e3..a7d8b2a 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -22,10 +22,11 @@ static const char *get_mode(const char *str, unsigned int 
> *modep)
>   return str;
>  }
>  
> -static int decode_tree_entry(struct tree_desc *desc, const char *buf, 
> unsigned long size, struct strbuf *err)
> +static int decode_tree_entry(struct tree_desc *desc, const char *buf, size_t 
> size, struct strbuf *err)
>  {
>   const char *path;
> - unsigned int mode, len;
> + unsigned int mode;
> + size_t len;
>  
>   if (size < 23 || buf[size - 21]) {
>   strbuf_addstr(err, _("too-short tree object"));

The original was especially bad around here, as it didn't even use
ulong.  The conversion makes sense.

Thanks.


Re: [Patch size_t V3 19/19] Convert xdiff-interface to size_t

2017-08-17 Thread Junio C Hamano
Martin Koegler  writes:

> diff --git a/combine-diff.c b/combine-diff.c
> index acf39ec..ad5d177 100644
> --- a/combine-diff.c
> +++ b/combine-diff.c
> @@ -343,7 +343,7 @@ struct combine_diff_state {
>   struct sline *lost_bucket;
>  };
>  
> -static void consume_line(void *state_, char *line, unsigned long len)
> +static void consume_line(void *state_, char *line, size_t len)
>  {
>   struct combine_diff_state *state = state_;
>   if (5 < len && !memcmp("@@ -", line, 4)) {

This is a correct logical consequence of making consume_fn to take
size_t.

> diff --git a/diff.c b/diff.c
> index c12d062..f665f8d 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -406,7 +406,7 @@ static struct diff_tempfile {
>   struct tempfile tempfile;
>  } diff_temp[2];
>  
> -typedef unsigned long (*sane_truncate_fn)(char *line, unsigned long len);
> +typedef size_t (*sane_truncate_fn)(char *line, size_t len);

It turns out that this type, the member of this type in ecb struct, and
a conditional call to the function when it is not NULL, are all unused.
If it were used, this conversion would be correct ;-)

> @@ -4536,7 +4536,7 @@ struct patch_id_t {
>   int patchlen;
>  };
>  
> -static int remove_space(char *line, int len)
> +static size_t remove_space(char *line, size_t len)
>  {
>   int i;
>   char *dst = line;

This function may also want to be rewritten.  The loop counter and
array index "i" is used this way:

for (i = 0; i < len; i++)
if (!isspace((c = line[i])))
*dst++ = c;
return dst - line;

So you are still risking data loss (later parts of line[] may not be
scanned due to int possibly being narrower than size_t).

Turning "len" and the return type into size_t is absolutely the
right thing to do, and it is tempting to turn "i" also into size_t,
but we may just want to scan the thing with another pointer, i.e.

size_t remove_space(char *line, size_t len)
{
char *src = line;
char *dst = line;

while (src < line + len) {
char c = *src++;
if (!isspace(c))
*dst++ = c;
}
return dst - line;
}

Changes to size_t from types other than "ulong" is worth looking at
twice for a potential issue like this.

> diff --git a/xdiff-interface.c b/xdiff-interface.c
> index d82cd4a..f12285d 100644
> --- a/xdiff-interface.c
> +++ b/xdiff-interface.c
> @@ -26,7 +26,7 @@ static int parse_num(char **cp_p, int *num_p)
>   return 0;
>  }
>  
> -int parse_hunk_header(char *line, int len,
> +int parse_hunk_header(char *line, size_t len,
> int *ob, int *on,
> int *nb, int *nn)
>  {

This is a correct conversion for the purpose of this series, but the
implementation of this function before (and after) this patch is
already incorrect.  It does not pay any attention to "len", and
neither is the helper function it uses, parse_num(), told how many
bytes remain on the line being parsed, so it will happily go beyond
"len".

The two things we may want to fix are obviously outside the scope of
this series, but they should be fixed (not necessarily by you) after
this series once the codebase solidifies, or before this series as
preliminary clean-up.
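
As an illustration only (this is not the current code, and the
signature is merely a guess at what a fix might look like), a
bounds-aware cousin of parse_num() would be told where the line ends
and refuse to step past it:

static int parse_num_bounded(const char **cp_p, const char *end, int *num_p)
{
        const char *cp = *cp_p;
        int num = 0;

        if (cp == end || *cp < '0' || '9' < *cp)
                return -1;                      /* no digits at all */
        while (cp < end && '0' <= *cp && *cp <= '9')
                num = num * 10 + (*cp++ - '0');
        *cp_p = cp;
        *num_p = num;
        return 0;
}

parse_hunk_header() would then pass "line + len" down as "end".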

Thanks.


Re: [RFC PATCH 2/2] submodule: simplify decision tree whether to or not to fetch

2017-08-17 Thread Brandon Williams
On 08/17, Stefan Beller wrote:
> On Thu, Aug 17, 2017 at 4:00 AM, Heiko Voigt  wrote:
> > To make extending this logic later easier.
> >
> > Signed-off-by: Heiko Voigt 
> > ---
> > I am quite sure I replicated the same logic but a few more eyes would be
> > appreciated.
> 
> A code cleanup is appreciated!
> 
> I thought Brandon had a series in flight doing a very similar cleanup here,
> but in master..pu there is nothing to be found.

Yeah there are 2 series in flight which will probably conflict here.
bw/grep-recurse-submodules and bw/submodule-config-cleanup

> 
> > Cheers Heiko
> 
> The code looks good to me.
> 
> Cheers!
> Stefan
> 
> >
> >  submodule.c | 55 +++
> >  1 file changed, 27 insertions(+), 28 deletions(-)
> >
> > diff --git a/submodule.c b/submodule.c
> > index 3ed78ac..a1011f4 100644
> > --- a/submodule.c
> > +++ b/submodule.c
> > @@ -1171,6 +1171,21 @@ int submodule_touches_in_range(struct object_id 
> > *excl_oid,
> > return ret;
> >  }
> >
> > +static int get_fetch_recurse_config(const struct submodule *submodule, int 
> > command_line_option)
> > +{
> > +   if (command_line_option != RECURSE_SUBMODULES_DEFAULT)
> > +   return command_line_option;
> > +
> > +   if (submodule && submodule->fetch_recurse != 
> > RECURSE_SUBMODULES_NONE)
> > +   /* local config overrules everything except commandline */
> > +   return submodule->fetch_recurse;
> > +
> > +   if (gitmodules_is_unmerged)
> > +   return RECURSE_SUBMODULES_OFF;
> > +
> > +   return config_fetch_recurse_submodules;
> > +}
> > +
> >  struct submodule_parallel_fetch {
> > int count;
> > struct argv_array args;
> > @@ -1203,37 +1218,21 @@ static int get_next_submodule(struct child_process 
> > *cp,
> > if (!submodule)
> > submodule = submodule_from_name(_oid, 
> > ce->name);
> >
> > -   default_argv = "yes";
> > -   if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) 
> > {
> > -   if (submodule &&
> > -   submodule->fetch_recurse !=
> > -   RECURSE_SUBMODULES_NONE) {
> > -   if (submodule->fetch_recurse ==
> > -   RECURSE_SUBMODULES_OFF)
> > -   continue;
> > -   if (submodule->fetch_recurse ==
> > -   
> > RECURSE_SUBMODULES_ON_DEMAND) {
> > -   if 
> > (!unsorted_string_list_lookup(&changed_submodule_names,
> > -
> > submodule->name))
> > -   continue;
> > -   default_argv = "on-demand";
> > -   }
> > -   } else {
> > -   if ((config_fetch_recurse_submodules == 
> > RECURSE_SUBMODULES_OFF) ||
> > -   gitmodules_is_unmerged)
> > -   continue;
> > -   if (config_fetch_recurse_submodules == 
> > RECURSE_SUBMODULES_ON_DEMAND) {
> > -   if 
> > (!unsorted_string_list_lookup(&changed_submodule_names,
> > -
> > submodule->name))
> > -   continue;
> > -   default_argv = "on-demand";
> > -   }
> > -   }
> > -   } else if (spf->command_line_option == 
> > RECURSE_SUBMODULES_ON_DEMAND) {
> > -   if 
> > (!unsorted_string_list_lookup(&changed_submodule_names,
> > +   switch (get_fetch_recurse_config(submodule, 
> > spf->command_line_option))
> > +   {
> > +   default:
> > +   case RECURSE_SUBMODULES_DEFAULT:
> > +   case RECURSE_SUBMODULES_ON_DEMAND:
> > +   if (!submodule || 
> > !unsorted_string_list_lookup(&changed_submodule_names,
> >  submodule->name))
> > continue;
> > default_argv = "on-demand";
> > +   break;
> > +   case RECURSE_SUBMODULES_ON:
> > +   default_argv = "yes";
> > +   break;
> > +   case RECURSE_SUBMODULES_OFF:
> > +   continue;
> > }
> >
> > strbuf_addf(_path, "%s/%s", spf->work_tree, 
> > ce->name);
> > --
> > 2.0.0.274.g6b2cd91
> >

-- 
Brandon Williams


Re: [PATCH] t5526: fix some broken && chains

2017-08-17 Thread Stefan Beller
On Thu, Aug 17, 2017 at 3:36 AM, Heiko Voigt  wrote:
> Signed-off-by: Heiko Voigt 

Reviewed-by: Stefan Beller 

Thanks,
Stefan

> ---
>  t/t5526-fetch-submodules.sh | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index ce788e9..22a7358 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -193,7 +193,7 @@ test_expect_success "recurseSubmodules=true propagates 
> into submodules" '
> add_upstream_commit &&
> (
> cd downstream &&
> -   git config fetch.recurseSubmodules true
> +   git config fetch.recurseSubmodules true &&
> git fetch >../actual.out 2>../actual.err
> ) &&
> test_must_be_empty actual.out &&
> @@ -218,7 +218,7 @@ test_expect_success "--no-recurse-submodules overrides 
> config setting" '
> add_upstream_commit &&
> (
> cd downstream &&
> -   git config fetch.recurseSubmodules true
> +   git config fetch.recurseSubmodules true &&
> git fetch --no-recurse-submodules >../actual.out 
> 2>../actual.err
> ) &&
> ! test -s actual.out &&
> @@ -232,7 +232,7 @@ test_expect_success "Recursion doesn't happen when no new 
> commits are fetched in
> cd submodule &&
> git config --unset fetch.recurseSubmodules
> ) &&
> -   git config --unset fetch.recurseSubmodules
> +   git config --unset fetch.recurseSubmodules &&
> git fetch >../actual.out 2>../actual.err
> ) &&
> ! test -s actual.out &&
> @@ -312,7 +312,7 @@ test_expect_success "Recursion picks up all submodules 
> when necessary" '
> ) &&
> head1=$(git rev-parse --short HEAD^) &&
> git add subdir/deepsubmodule &&
> -   git commit -m "new deepsubmodule"
> +   git commit -m "new deepsubmodule" &&
> head2=$(git rev-parse --short HEAD) &&
> echo "Fetching submodule submodule" > ../expect.err.sub &&
> echo "From $pwd/submodule" >> ../expect.err.sub &&
> --
> 2.0.0.274.g6b2cd91
>


Re: [PATCH] diff: retire sane_truncate_fn

2017-08-17 Thread Stefan Beller
On Thu, Aug 17, 2017 at 10:27 AM, Junio C Hamano  wrote:
> Long time ago, 23707811 ("diff: do not chomp hunk-header in the
> middle of a character", 2008-01-02) introduced sane_truncate_line()
> helper function to trim the "function header" line that is shown at
> the end of the hunk header line, in order to avoid chomping it in
> the middle of a single UTF-8 character.  It also added a facility to
> define a custom callback function to make it possible to extend it
> to non UTF-8 encodings.
>
> During the following 8 1/2 years, nobody found need for this custom
> callback facility.
>
> A custom callback function is a wrong design to use here anyway---if
> your contents need support for non UTF-8 encoding, you shouldn't
> have to write a custom function and recompile Git to plumb it in.  A
> better approach would be to extend sane_truncate_line() function and
> have a new member in emit_callback to conditionally trigger it.
>
> Signed-off-by: Junio C Hamano 

This patch is
Reviewed-by: Stefan Beller 

However while strolling around in code nearby, I do
wonder if sane_truncate_line needs to make use
of the return value of utf8_width. But that is not the case
as we're interested in the byte length, not the print length.

Thanks
Stefan


Re: Question about git gc on large text files

2017-08-17 Thread Kai Zhang

> On Aug 17, 2017, at 10:33 AM, Jeff King  wrote:
> 
> On Thu, Aug 17, 2017 at 10:28:00AM -0700, Kai Zhang wrote:
> 
>> I have a git repository maintaining one large json file (along with
>> several other small files). With commits for large json file, the
>> repository become bigger and bigger, so I tried to run command "git gc
>> --prune=now --aggressive" to reduce disk usage, then I found .git
>> folder size did not change. I had wonderful experience with git gc
>> against files around 10M to 20M, and I am wondering if there is any
>> configuration need to tweak for large text files?
>> 
>> Here I have more details:
>> 1. Json file size: 1G
> 
> Git won't try to delta-compress anything over 500MB by default. Try:
> 
>  git config core.bigfilethreshold 2G
>  git gc --aggressive
> 
> -Peff

It works! Thank you so much!

Kai


Re: Question about git gc on large text files

2017-08-17 Thread Jeff King
On Thu, Aug 17, 2017 at 10:28:00AM -0700, Kai Zhang wrote:

> I have a git repository maintaining one large json file (along with
> several other small files). With commits for large json file, the
> repository become bigger and bigger, so I tried to run command "git gc
> --prune=now --aggressive" to reduce disk usage, then I found .git
> folder size did not change. I had wonderful experience with git gc
> against files around 10M to 20M, and I am wondering if there is any
> configuration need to tweak for large text files?
> 
> Here I have more details:
> 1. Json file size: 1G

Git won't try to delta-compress anything over 500MB by default. Try:

  git config core.bigfilethreshold 2G
  git gc --aggressive

-Peff


Question about git gc on large text files

2017-08-17 Thread Kai Zhang
Hi

I have a git repository maintaining one large json file (along with several 
other small files). With commits to the large json file, the repository becomes
bigger and bigger, so I tried running "git gc --prune=now --aggressive"
to reduce disk usage, but then I found the .git folder size did not change. I had
a wonderful experience with git gc against files around 10M to 20M, and I am
wondering if there is any configuration that needs tweaking for large text files?

Here I have more details:
1. Json file size: 1G
2. .git folder size before gc: 2G, after gc: 2G
3. Pack folder content before gc:
pack$ du -sh *
1.5Kpack-087e186e44b4f3dfdbb9479c457f3f4e6cef1c58.idx
146Mpack-087e186e44b4f3dfdbb9479c457f3f4e6cef1c58.pack
1.5Kpack-26c3bc5ee1aee6ab5355cda205a7a071bd464950.idx
146Mpack-26c3bc5ee1aee6ab5355cda205a7a071bd464950.pack
1.5Kpack-2ce66eb23ee42a9ac080fc31d1d476c2f27a8ee7.idx
146Mpack-2ce66eb23ee42a9ac080fc31d1d476c2f27a8ee7.pack
1.5Kpack-523304e1b27f476d1ec8c92356af38762880077c.idx
146Mpack-523304e1b27f476d1ec8c92356af38762880077c.pack
1.5Kpack-53a474a44e2bbc5406c6ac9e2deabdd77fd7c129.idx
146Mpack-53a474a44e2bbc5406c6ac9e2deabdd77fd7c129.pack
1.5Kpack-5b4cc71e4a675cdf72e837d1928af6ae2a7b8d44.idx
146Mpack-5b4cc71e4a675cdf72e837d1928af6ae2a7b8d44.pack
1.5Kpack-84d95c86aca367915fb26a227b32ea08f03a48d5.idx
146Mpack-84d95c86aca367915fb26a227b32ea08f03a48d5.pack
1.5Kpack-85f1f587416cbbe7613496a7efcd0cd01ad262c5.idx
146Mpack-85f1f587416cbbe7613496a7efcd0cd01ad262c5.pack
1.5Kpack-9522c7ced56ec8a379fa79ab8209cc6e8052a75c.idx
146Mpack-9522c7ced56ec8a379fa79ab8209cc6e8052a75c.pack
1.5Kpack-d7e517de6a72557cb68a7a406f1b526698b8f5d9.idx
146Mpack-d7e517de6a72557cb68a7a406f1b526698b8f5d9.pack
1.5Kpack-dd87e41420828d24edaf886ba2a11a49c76fbc4c.idx
146Mpack-dd87e41420828d24edaf886ba2a11a49c76fbc4c.pack
1.5Kpack-e0b47f03d74919a5ca6f8c40b04d49886e2a767a.idx
146Mpack-e0b47f03d74919a5ca6f8c40b04d49886e2a767a.pack
1.5Kpack-ee5deaeb15ae6d713d99059a7361fa51ae21a173.idx
146Mpack-ee5deaeb15ae6d713d99059a7361fa51ae21a173.pack
1.5Kpack-f727952b8f8928a875e35f34549d3f9ebe5f0740.idx
146Mpack-f727952b8f8928a875e35f34549d3f9ebe5f0740.pack

4. Pack folder content after gc:
du -sh *
6.0Kpack-2a8326dbe94de5c62c51b617397ef80729fde92f.idx
2.0Gpack-2a8326dbe94de5c62c51b617397ef80729fde92f.pack

5. Output for git verify-pack after gc:
git verify-pack -v 
repo/.git/objects/pack/pack-2a8326dbe94de5c62c51b617397ef80729fde92f.idx
z73c2ff9093fb1daf96364d49c8d2d025b1da6cf5 commit 245 147 12
eca81268167f362e69136df4f098e83bd49f7c3b commit 83 87 159 1 
73c2ff9093fb1daf96364d49c8d2d025b1da6cf5
b9546502b6a77dc75bb25fc249606880285a1082 commit 80 89 246 1 
73c2ff9093fb1daf96364d49c8d2d025b1da6cf5
dd500e7a39df13dd08eee1ab45d537a14764046c commit 82 90 335 1 
73c2ff9093fb1daf96364d49c8d2d025b1da6cf5
018c0ebbbcd53fa1575b923cfc953a9bf0c6b9e6 commit 245 147 425
a44facd2fa2bc820c7ad0d9ce0a7cf5c9e0bb4ee commit 84 90 572 1 
018c0ebbbcd53fa1575b923cfc953a9bf0c6b9e6
064d040a4baf19469b8d0ffbac8b7b61dc570ef6 commit 79 86 662 2 
a44facd2fa2bc820c7ad0d9ce0a7cf5c9e0bb4ee
965b259ebee154290d59c3bd10b44465521deb17 commit 81 89 748 2 
a44facd2fa2bc820c7ad0d9ce0a7cf5c9e0bb4ee
0bcee651e5575e87d8a2b0c6f4b9058f2b11629d commit 245 148 837
e9362deba68746c1667b9d238e0b0fde3d5a4ec9 commit 82 90 985 1 
0bcee651e5575e87d8a2b0c6f4b9058f2b11629d
684c356682afdf5273e9dba3c8272d330174a464 commit 80 89 1075 2 
e9362deba68746c1667b9d238e0b0fde3d5a4ec9
45cd004175c238d27f1af9d340c3fa5867f6c9e3 commit 80 88 1164 1 
0bcee651e5575e87d8a2b0c6f4b9058f2b11629d
e876b6dde0c407855cd8fa4f25cff13b44c78489 commit 245 148 1252
831a50b8338b2b211b35d29a4210179155022179 commit 84 90 1400 1 
e876b6dde0c407855cd8fa4f25cff13b44c78489
6eaebe76669a4a16d74d53aec605dee3c0427527 commit 86 91 1490 1 
e876b6dde0c407855cd8fa4f25cff13b44c78489
3cccb7d0a997a0dbc887cd47b8b8e9c46fabc66a commit 81 89 1581 2 
6eaebe76669a4a16d74d53aec605dee3c0427527
136592f9618d17712a818f0f84bed9ffd5655c0a commit 80 87 1670 3 
3cccb7d0a997a0dbc887cd47b8b8e9c46fabc66a
9c7cf0528e49f16118afb514709fd392d917769c commit 245 150 1757
c564ffd73eb621349aa8d601acb035990c159103 commit 80 90 1907 1 
9c7cf0528e49f16118afb514709fd392d917769c
db804ffd8b7e3106b2a30c593866549eabbffda8 commit 81 89 1997 1 
9c7cf0528e49f16118afb514709fd392d917769c
6c18d72d3308b3c0a5ab412fe05ce0b01acc5b8b commit 245 149 2086
f3f6d9b9cb798f818a34addc500972e28149c579 commit 82 89 2235 1 
6c18d72d3308b3c0a5ab412fe05ce0b01acc5b8b
c737d0af7fa8fca38c7c8478d6755fbfbf94e947 commit 80 89 2324 2 
f3f6d9b9cb798f818a34addc500972e28149c579
12630d4e5f25a06e95636f773adbad42768587d7 commit 79 89 2413 3 
c737d0af7fa8fca38c7c8478d6755fbfbf94e947
66386f99e1f3c6c1d4fa2a64d3171757ded84a35 commit 82 91 2502 2 
f3f6d9b9cb798f818a34addc500972e28149c579
0f57e9c185de68585feaa46fc88c1031afb0dd73 commit 245 150 2593
f66dc5d75d72474ca84bd23bf40f90b37b6b2d10 commit 83 88 2743 1 

[PATCH] diff: retire sane_truncate_fn

2017-08-17 Thread Junio C Hamano
Long time ago, 23707811 ("diff: do not chomp hunk-header in the
middle of a character", 2008-01-02) introduced sane_truncate_line()
helper function to trim the "function header" line that is shown at
the end of the hunk header line, in order to avoid chomping it in
the middle of a single UTF-8 character.  It also added a facility to
define a custom callback function to make it possible to extend it
to non UTF-8 encodings.  

During the following 8 1/2 years, nobody found need for this custom
callback facility.

A custom callback function is a wrong design to use here anyway---if
your contents need support for non UTF-8 encoding, you shouldn't
have to write a custom function and recompile Git to plumb it in.  A
better approach would be to extend sane_truncate_line() function and
have a new member in emit_callback to conditionally trigger it.

Signed-off-by: Junio C Hamano 
---
 diff.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/diff.c b/diff.c
index a628ac3a95..4bdaed8197 100644
--- a/diff.c
+++ b/diff.c
@@ -408,8 +408,6 @@ static struct diff_tempfile {
struct tempfile tempfile;
 } diff_temp[2];
 
-typedef unsigned long (*sane_truncate_fn)(char *line, unsigned long len);
-
 struct emit_callback {
int color_diff;
unsigned ws_rule;
@@ -417,7 +415,6 @@ struct emit_callback {
int blank_at_eof_in_postimage;
int lno_in_preimage;
int lno_in_postimage;
-   sane_truncate_fn truncate;
const char **label_path;
struct diff_words_data *diff_words;
struct diff_options *opt;
@@ -1246,8 +1243,6 @@ static unsigned long sane_truncate_line(struct 
emit_callback *ecb, char *line, u
unsigned long allot;
size_t l = len;
 
-   if (ecb->truncate)
-   return ecb->truncate(line, len);
cp = line;
allot = l;
while (0 < l) {


Re: [RFC PATCH 2/2] submodule: simplify decision tree whether to or not to fetch

2017-08-17 Thread Stefan Beller
On Thu, Aug 17, 2017 at 4:00 AM, Heiko Voigt  wrote:
> To make extending this logic later easier.
>
> Signed-off-by: Heiko Voigt 
> ---
> I am quite sure I replicated the same logic but a few more eyes would be
> appreciated.

A code cleanup is appreciated!

I thought Brandon had a series in flight doing a very similar cleanup here,
but in master..pu there is nothing to be found.

> Cheers Heiko

The code looks good to me.

Cheers!
Stefan

>
>  submodule.c | 55 +++
>  1 file changed, 27 insertions(+), 28 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index 3ed78ac..a1011f4 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1171,6 +1171,21 @@ int submodule_touches_in_range(struct object_id 
> *excl_oid,
> return ret;
>  }
>
> +static int get_fetch_recurse_config(const struct submodule *submodule, int 
> command_line_option)
> +{
> +   if (command_line_option != RECURSE_SUBMODULES_DEFAULT)
> +   return command_line_option;
> +
> +   if (submodule && submodule->fetch_recurse != RECURSE_SUBMODULES_NONE)
> +   /* local config overrules everything except commandline */
> +   return submodule->fetch_recurse;
> +
> +   if (gitmodules_is_unmerged)
> +   return RECURSE_SUBMODULES_OFF;
> +
> +   return config_fetch_recurse_submodules;
> +}
> +
>  struct submodule_parallel_fetch {
> int count;
> struct argv_array args;
> @@ -1203,37 +1218,21 @@ static int get_next_submodule(struct child_process 
> *cp,
> if (!submodule)
> submodule = submodule_from_name(_oid, ce->name);
>
> -   default_argv = "yes";
> -   if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) {
> -   if (submodule &&
> -   submodule->fetch_recurse !=
> -   RECURSE_SUBMODULES_NONE) {
> -   if (submodule->fetch_recurse ==
> -   RECURSE_SUBMODULES_OFF)
> -   continue;
> -   if (submodule->fetch_recurse ==
> -   RECURSE_SUBMODULES_ON_DEMAND) 
> {
> -   if 
> (!unsorted_string_list_lookup(&changed_submodule_names,
> -
> submodule->name))
> -   continue;
> -   default_argv = "on-demand";
> -   }
> -   } else {
> -   if ((config_fetch_recurse_submodules == 
> RECURSE_SUBMODULES_OFF) ||
> -   gitmodules_is_unmerged)
> -   continue;
> -   if (config_fetch_recurse_submodules == 
> RECURSE_SUBMODULES_ON_DEMAND) {
> -   if 
> (!unsorted_string_list_lookup(&changed_submodule_names,
> -
> submodule->name))
> -   continue;
> -   default_argv = "on-demand";
> -   }
> -   }
> -   } else if (spf->command_line_option == 
> RECURSE_SUBMODULES_ON_DEMAND) {
> -   if 
> (!unsorted_string_list_lookup(&changed_submodule_names,
> +   switch (get_fetch_recurse_config(submodule, 
> spf->command_line_option))
> +   {
> +   default:
> +   case RECURSE_SUBMODULES_DEFAULT:
> +   case RECURSE_SUBMODULES_ON_DEMAND:
> +   if (!submodule || 
> !unsorted_string_list_lookup(&changed_submodule_names,
>  submodule->name))
> continue;
> default_argv = "on-demand";
> +   break;
> +   case RECURSE_SUBMODULES_ON:
> +   default_argv = "yes";
> +   break;
> +   case RECURSE_SUBMODULES_OFF:
> +   continue;
> }
>
> strbuf_addf(_path, "%s/%s", spf->work_tree, 
> ce->name);
> --
> 2.0.0.274.g6b2cd91
>


Re: [RFC PATCH 1/2] implement fetching of moved submodules

2017-08-17 Thread Stefan Beller
On Thu, Aug 17, 2017 at 3:53 AM, Heiko Voigt  wrote:
> We store the changed submodules paths to calculate which submodule needs
> fetching. This does not work for moved submodules since their paths do
> not stay the same in case of a moved submodule. In case of new
> submodules we do not have a path in the current checkout, since they
> just appeared in this fetch.
>
> It is more general to collect the submodule names for changes instead of
> their paths to include the above cases.
>
> With the change described above we implement 'on-demand' fetching of
> changes in moved submodules.

This sounds as if this would also enable fetching new submodules
eventually?

> Note: This does only work when repositories have a .gitmodules file. In
> other words: It breaks if we do not get a name for a repository.
> IIRC, consensus was that this is a requirement to get nice submodule
> handling these days?

I think that should have been the consensus since ~1.7.8 (since the
submodules git dir can live inside the superprojects
/module/).

A gitlink entry without corresponding .gitmodules entry is just a gitlink.
If we happen to have a repository at that path of the gitlink, we can
be nice and pretend like it is a functional submodule, but it really is
not. It's just another repo inside the superproject that happens to live
at the path of a gitlink.

> Signed-off-by: Heiko Voigt 
> ---
>
> I updated the leftover code from my series implementing recursive fetch
> for moved submodules[1] to the current master.
>
> This breaks t5531 and t5545 because they do not use a .gitmodules file.
>
> I also have some code leftover that does fallback on paths in case no
> submodule names can be found. But I do not really like it. The question
> here is how far do we support not using .gitmodules. Is it e.g.
> reasonable to say: "For --recurse-submodules=on-demand you need a
> .gitmodules file?"

I would not intentionally break users here, but any new functionality can
safely assume (a) we have a proper .gitmodules entry or (b) it is not a
submodule, so do nothing/be extra careful.

For example, in a recursive diff it sort of makes sense to also handle
non-submodule gitlinks, but for fetch it is harder to tell.

(just last night I was rereading
https://public-inbox.org/git/CAJo=hjvnapnaddcaawavu9c4rveqdos3ev9wtguhx4fd0v_...@mail.gmail.com/
which I think is a super cute application of gitlinks. If you happen
to checkout such
a tree, you don't want to fetch all of the fake submodules)

>
> [1] 
> https://public-inbox.org/git/f5baa2acc09531a16f4f693eebbe60706bb8ed1e.1361751905.git.hvo...@hvoigt.net/

Oha, that is from way back in the time. :)

>  submodule.c | 92 
> +
>  t/t5526-fetch-submodules.sh | 35 +
>  2 files changed, 86 insertions(+), 41 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index 27de65a..3ed78ac 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -23,7 +23,7 @@
>  static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
>  static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
>  static int parallel_jobs = 1;
> -static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP;
> +static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
>  static int initialized_fetch_ref_tips;
>  static struct oid_array ref_tips_before_fetch;
>  static struct oid_array ref_tips_after_fetch;
> @@ -742,11 +742,11 @@ const struct submodule *submodule_from_ce(const struct 
> cache_entry *ce)
>  }
>
>  static struct oid_array *submodule_commits(struct string_list *submodules,
> -  const char *path)
> +  const char *name)
>  {
> struct string_list_item *item;
>
> -   item = string_list_insert(submodules, path);
> +   item = string_list_insert(submodules, name);
> if (item->util)
> return (struct oid_array *) item->util;
>
> @@ -755,39 +755,34 @@ static struct oid_array *submodule_commits(struct 
> string_list *submodules,
> return (struct oid_array *) item->util;
>  }
>
> +struct collect_changed_submodules_cb_data {
> +   struct string_list *changed;

Here a comment would be helpful or a more concise variable name.
(What is changed?)

> +   const struct object_id *commit_oid;
> +};
> +
>  static void collect_changed_submodules_cb(struct diff_queue_struct *q,
>   struct diff_options *options,
>   void *data)
>  {
> +   struct collect_changed_submodules_cb_data *me = data;
> +   struct string_list *changed = me->changed;
> +   const struct object_id *commit_oid = me->commit_oid;
> int i;
> -   struct string_list *changed = data;
>
> for (i = 0; i < q->nr; i++) {
> struct diff_filepair *p = q->queue[i];
> 

Re: [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case

2017-08-17 Thread Junio C Hamano
Junio C Hamano  writes:

> Torsten Bögershausen  writes:
>
>> Unless we re-define the meaning of "NULL" into "don't do CRLF conversions,
>> like SAFE_CRLF_KEEP_CRLF does.
>
> My preference is not to use NULL as any hint.  Instead, the "flag"
> parameter we already pass to convert_to_git(), just like the updated
> read_old_data() uses SAFE_CRLF_KEEP_CRLF to tell it that it should
> not disturb existing CRLF without looking at the istate, should be
> used to tell convert_to_git() to do the opposite, but do so without
> looking at the istate.
>
> Perhaps SAFE_CRLF_FALSE should be such a flag.  Or perhaps we need
> to invent another flag.  I dunno.

I grepped for SAFE_CRLF_FALSE and found only two callers that
explicitly pass it down the callchain, both of which essentially
says "if we are writing the object out, use core.safecrlf, but if we
are merely hashing, do SAFE_CRLF_FALSE thing".  

I think their use case is quite similar to the codepath we are
discussing---they have a data that come from the outside world, and
they know the index entry that happens to be at the path has nothing
to do with the data they are asking convert_to_git() to massage
(i.e. it is _wrong_ if the contents of the blob that happens to be
in the index at the path affected the outcome of the conversion).

So I think the fix to convert_to_git() can just use SAFE_CRLF_FALSE
as such an instruction that tells the function not do the "safe
crlf" thing, which looks at the contents in the index and decide if
the CRLFs in the contents being converted should be turned into LFs.
And because the function is told not to look at the index, it should
be made safe to pass istate=NULL.  There does not seem to be a need
to invent another flag.
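
Roughly, the guard in the conversion code would then look something like
this (a sketch only, untested; "checksafe" and "convert_crlf_into_lf" are
names assumed here, not quoted from the current code, while
has_cr_in_index(istate, path) is the lookup discussed above):

  if (checksafe == SAFE_CRLF_FALSE || checksafe == SAFE_CRLF_KEEP_CRLF)
          ; /* caller told us not to consult the index; istate may be NULL */
  else if (has_cr_in_index(istate, path))
          convert_crlf_into_lf = 0; /* existing "safe crlf" behaviour */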

Thanks.





Re: [PATCH] files-backend: cheapen refname_available check when locking refs

2017-08-17 Thread Jeff King
On Thu, Aug 17, 2017 at 05:12:50PM +0200, Michael Haggerty wrote:

> I was testing this using the reporter's recipe (but fetching from a
> local clone), and found the following surprising timing numbers:
> 
> b05855b5bc (before the slowdown): 22.7 s
> 524a9fdb51 (immediately after the slowdown): 13 minutes
> 4e81f1ecf1 (after this fix): 14.5 s
> 
> The fact that the fetch is now significantly *faster* than before the
> slowdown seems not to have anything to do with the reference code.

I bisected this (with some hackery, since the commits in the middle all
take 13 minutes to run). The other speedup is indeed unrelated, and is
due to Brandon's aacc5c1a81 (submodule: refactor logic to determine
changed submodules, 2017-05-01).

The commit message doesn't mention performance (it's mostly about code
reduction). I think the speedup comes from using
diff_tree_combined_merge() instead of manually diffing each commit
against its parents. But I didn't do further timings to verify that (I'm
reporting it here mostly as an interesting curiosity for submodule
folks).

> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index e9b95592b6..f2a420c611 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -631,11 +631,11 @@ static int lock_raw_ref(struct files_ref_store *refs,
>  
>   /*
>* If the ref did not exist and we are creating it,
> -  * make sure there is no existing ref that conflicts
> -  * with refname:
> +  * make sure there is no existing packed ref that
> +  * conflicts with refname:
>*/
>   if (refs_verify_refname_available(
> - &refs->base, refname,
> + refs->packed_ref_store, refname,
>   extras, skip, err))
>   goto error_return;
>   }

This seems too easy to be true. :) But I think it matches what we were
doing before 524a9fdb51 (so it's correct), and the performance numbers
don't lie.

-Peff


[PATCH] files-backend: cheapen refname_available check when locking refs

2017-08-17 Thread Michael Haggerty
When locking references in preparation for updating them, we need to
check that none of the newly added references D/F conflict with
existing references (e.g., we don't allow `refs/foo` to be added if
`refs/foo/bar` already exists, or vice versa).

Prior to 524a9fdb51 (refs_verify_refname_available(): use function in
more places, 2017-04-16), conflicts with existing loose references
were checked by looking directly in the filesystem, and then conflicts
with existing packed references were checked by running
`verify_refname_available_dir()` against the packed-refs cache.

But that commit changed the final check to call
`refs_verify_refname_available()` against the *whole* files ref-store,
including both loose and packed references, with the following
comment:

> This means that those callsites now check for conflicts with all
> references rather than just packed refs, but the performance cost
> shouldn't be significant (and will be regained later).

That comment turned out to be too sanguine. User s...@kazlauskas.me
reported that fetches involving a very large number of references in
neighboring directories were slowed down by that change.

The problem is that when fetching, each reference is updated
individually, within its own reference transaction. This is done
because some reference updates might succeed even though others fail.
But every time a reference update transaction is finished,
`clear_loose_ref_cache()` is called. So when it is time to update the
next reference, part of the loose ref cache has to be repopulated for
the `refs_verify_refname_available()` call. If the references are all
in neighboring directories, then the cost of repopulating the
reference cache increases with the number of references, resulting in
O(N²) effort.

The comment above also claims that the performance cost "will be
regained later". The idea was that once the packed-refs were finished
being split out into a separate ref-store, we could limit the
`refs_verify_refname_available()` call to the packed references again.
That is what we do now.

Signed-off-by: Michael Haggerty 
---
This patch applies on top of branch mh/packed-ref-store. It can also
be obtained from my fork [1] as branch "faster-refname-available-check".

I was testing this using the reporter's recipe (but fetching from a
local clone), and found the following surprising timing numbers:

b05855b5bc (before the slowdown): 22.7 s
524a9fdb51 (immediately after the slowdown): 13 minutes
4e81f1ecf1 (after this fix): 14.5 s

The fact that the fetch is now significantly *faster* than before the
slowdown seems not to have anything to do with the reference code.

 refs/files-backend.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index e9b95592b6..f2a420c611 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -631,11 +631,11 @@ static int lock_raw_ref(struct files_ref_store *refs,
 
/*
 * If the ref did not exist and we are creating it,
-* make sure there is no existing ref that conflicts
-* with refname:
+* make sure there is no existing packed ref that
+* conflicts with refname:
 */
if (refs_verify_refname_available(
-   &refs->base, refname,
+   refs->packed_ref_store, refname,
extras, skip, err))
goto error_return;
}
@@ -938,7 +938,7 @@ static struct ref_lock *lock_ref_sha1_basic(struct files_ref_store *refs,
 * our refname.
 */
if (is_null_oid(&lock->old_oid) &&
-   refs_verify_refname_available(&refs->base, refname,
+   refs_verify_refname_available(refs->packed_ref_store, refname,
  extras, skip, err)) {
last_errno = ENOTDIR;
goto error_return;
-- 
2.11.0



Re: git fetch with refspec does not include tags?

2017-08-17 Thread Jeff King
On Thu, Aug 17, 2017 at 11:29:48AM +, Carlsson, Magnus wrote:

> >  2. If we continue to follow the "are we storing any refs" rule for the
> > default, possibly it should expand to "did we store anything,
> > including opportunistic tracking-ref updates".
> 
> Personally I think that I would prefer to always fetch relevant tags.
> If I don't want tags I can simply use the "--no-tags"

I think it's a bit more complex because "git pull" uses "git fetch"
under the hood. In fact, your "git fetch origin master" is exactly what
gets run when you do:

  git pull origin master

That's maybe OK. But I think one-off pulls like:

  git pull https://example.com/repo.git master

probably wouldn't want it. I'd have to give it some thought.

-Peff


(no subject)

2017-08-17 Thread Barr. Richard Williams



--
I'd like you to be in custody of my late client's fortune.
My client died along with his family including his next-of-kin
The funds shall be used for investment under your management

Do reply for details.
Regards
Richard Williams
Email:rich19willi...@gmail.com


Dearest,

2017-08-17 Thread infob...@ono.com
Dearest,

How are you today? I hope my mail finds you in good health. Dear Sir or
Madam, I have decided to get in touch with you, considering that we have
not yet met, but because of a circumstance that obliges me I have decided
to contact you, given the urgency of my present situation here in the
refugee camp, for your rescue and also for a business venture/project for
which I need your assistance in establishing this business in your
country, as my foreign partner as well as my legally appointed trustee.

I am Aisha Muammar Gaddafi, the only daughter of the embattled president
of Libya, Hon. Muammar Gaddafi. I currently live in Burkina Faso,
unfortunately as a refugee. I am writing this mail with tears and sorrow
from my heart, asking for your urgent help. I have gone through pain and
sad moments since the death of my late father.

Meanwhile, my family is the target of the Western nations led by NATO,
which want to destroy my father at all costs. Our investments and bank
accounts in several countries are their targets for freezing. My father,
of blessed memory, deposited the sum of $10.5M (ten million, five hundred
thousand dollars) with the (ECO) Bank in Burkina Faso, using my name as
the next of kin. I have been instructed by the (ECO) Bank to present an
interested foreign investor/partner who can stand as trustee and receive
the funds into his account for a possible investment in his country,
owing to my refugee status here in Burkina Faso.

I am looking for an honest and reliable person who will help me and stand
as trustee, so that I can introduce him to the bank for the transfer of
the funds to his overseas bank account. I decided to contact you after my
prayers, and I believe that you will not betray my trust; rather, take me
as your own sister or daughter. If this transaction interests you, you
must not disclose it to anyone, because of what is happening to my whole
family: if the United Nations learns of this account they will freeze it
as they have frozen others, so please keep this transaction to yourself
until we complete it.

Apologies regarding my pictures; I will include them, and more about
myself, in my next mail once I hear from you. Please, I would like you to
contact me here for further conversation. Thanks and best regards, yours
sincerely.

Aisha Gaddafi


Re: What's cooking in git.git (Aug 2017, #03; Mon, 14)

2017-08-17 Thread Kaartic Sivaraam

On Thursday 17 August 2017 12:58 AM, Junio C Hamano wrote:

Nothing has.  Neither thread seems to have any comment from anybody
but you, and I took it as an indication that people do not think it
is a good change.

I do not find the s/branch/parameter/ too bad (although I would have
said "arguments" instead).

For the other one, I personally think split sentences and lines make
output look worse, but this is obviously subjective (just like the
patch itself).


Thanks! Will keep this in mind in future. BTW, what about the patch that
did get user attention?

https://public-inbox.org/git/<0102015d7ae53b0a-a6505296-9257-4b0d-84d6-2152e17eb070-000...@eu-west-1.amazonses.com>

---
Kaartic



Re: git fetch with refspec does not include tags?

2017-08-17 Thread Carlsson, Magnus
Thanks for your quick answer!

>  1. Definitely these defaults are under-documented. I couldn't find
>them anywhere in git-fetch(1).

Yes, some sort of explanation would be good, especially since one of the first 
sentences state that you always get relevant tags.

>  2. If we continue to follow the "are we storing any refs" rule for the
> default, possibly it should expand to "did we store anything,
> including opportunistic tracking-ref updates".

Personally I think that I would prefer to always fetch relevant tags. If I 
don't want tags I can simply use the "--no-tags"

-- Magnus


MAGNUS CARLSSON
Staff Software Engineer
ARRIS

o: +46 13 36 75 92
e: magnus.carls...@arris.com
w: www.arris.com

ARRIS:  Legal entity: Arris Sweden AB - Registered Office: Teknikringen 2, 583 
30 Linkoping, Sweden - Reg No:556518-5831 - VAT No:SE 556518-583


From: Jeff King 
Sent: Thursday, August 17, 2017 11:28
To: Carlsson, Magnus
Cc: git@vger.kernel.org
Subject: Re: git fetch with refspec does not include tags?

On Thu, Aug 17, 2017 at 09:02:52AM +, Carlsson, Magnus wrote:

> In the git fetch documentation it states that by default you will
> fetch all tags that point into the history to the branches fetched.
>
> "By default, any tag that points into the histories being fetched is
> also fetched; the effect is to fetch tags that point  at branches that
> you are interested in. This default behavior can be changed by using
> the --tags or --no-tags options or by configuring
> remote.<name>.tagOpt. By using a refspec that fetches tags explicitly,
> you can fetch tags that do not point into branches  you are interested
> in as well."
>
> But for me I get tags if I do "git fetch" or "git fetch origin" but if
> I do "git fetch origin master" I don't get tags related to the master
> branch.
>
> I understand that this might be due to me specifying a refspec and
> then it will only get that exact refspec, but for me it's not that
> clear from the documentation what I should expect. I read it as when I
> fetch something all related tags will come along.

I'll admit that our tag-autofollow behavior has often confused me. So
I'll try to untangle what's happening at least if not the reasoning. :)

I think the problem is not that you have a refspec, but that your
refspec has no destination. Looking at the fetch code, we seem to turn
on autotags only when the destination is a "real" ref and not just the
default FETCH_HEAD. Which sort-of makes sense. If you're doing a one-off
into FETCH_HEAD, you probably don't want to create tags, even if you
have the objects they point to.

But this is further complicated by the opportunistic tracking-ref
updates.  You can see some interesting behavior with a setup like this:

  git init parent
  git -C parent commit --allow-empty -m one &&
  git -C parent tag -m foo mytag

  git init child
  cd child
  git remote add origin ../parent

and then:

  # no tags, we just populate FETCH_HEAD because of the bare URL
  git fetch ../parent

  # this does fetch tags, because we're storing the result according to
  # the configured refspec ("refs/heads/*:refs/remotes/origin/*").
  git fetch origin

  # this doesn't fetch tags, as the main command is "just" populating
  # FETCH_HEAD. But then our logic for "hey, we fetched the ref for
  # refs/remotes/origin/master, so let's update it on the side" kicks
  # in. And we end up updating FETCH_HEAD _and_ the tracking branch, but
  # not the tags. Weird.
  git fetch origin master

  # and this one does fetch tags, because we have a real destination.
  git fetch origin master:foo

So what I'd say is:

  1. Definitely these defaults are under-documented. I couldn't find
 them anywhere in git-fetch(1).

  2. If we continue to follow the "are we storing any refs" rule for the
 default, possibly it should expand to "did we store anything,
 including opportunistic tracking-ref updates".

-Peff


[RFC PATCH 2/2] submodule: simplify decision tree whether to or not to fetch

2017-08-17 Thread Heiko Voigt
To make extending this logic later easier.

Signed-off-by: Heiko Voigt 
---
I am quite sure I replicated the same logic but a few more eyes would be
appreciated.

Cheers Heiko

 submodule.c | 55 +++
 1 file changed, 27 insertions(+), 28 deletions(-)

diff --git a/submodule.c b/submodule.c
index 3ed78ac..a1011f4 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1171,6 +1171,21 @@ int submodule_touches_in_range(struct object_id 
*excl_oid,
return ret;
 }
 
+static int get_fetch_recurse_config(const struct submodule *submodule, int 
command_line_option)
+{
+   if (command_line_option != RECURSE_SUBMODULES_DEFAULT)
+   return command_line_option;
+
+   if (submodule && submodule->fetch_recurse != RECURSE_SUBMODULES_NONE)
+   /* local config overrules everything except commandline */
+   return submodule->fetch_recurse;
+
+   if (gitmodules_is_unmerged)
+   return RECURSE_SUBMODULES_OFF;
+
+   return config_fetch_recurse_submodules;
+}
+
 struct submodule_parallel_fetch {
int count;
struct argv_array args;
@@ -1203,37 +1218,21 @@ static int get_next_submodule(struct child_process *cp,
if (!submodule)
submodule = submodule_from_name(_oid, ce->name);
 
-   default_argv = "yes";
-   if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) {
-   if (submodule &&
-   submodule->fetch_recurse !=
-   RECURSE_SUBMODULES_NONE) {
-   if (submodule->fetch_recurse ==
-   RECURSE_SUBMODULES_OFF)
-   continue;
-   if (submodule->fetch_recurse ==
-   RECURSE_SUBMODULES_ON_DEMAND) {
-   if (!unsorted_string_list_lookup(&changed_submodule_names,
-                                    submodule->name))
-   continue;
-   default_argv = "on-demand";
-   }
-   } else {
-   if ((config_fetch_recurse_submodules == RECURSE_SUBMODULES_OFF) ||
-   gitmodules_is_unmerged)
-   continue;
-   if (config_fetch_recurse_submodules == RECURSE_SUBMODULES_ON_DEMAND) {
-   if (!unsorted_string_list_lookup(&changed_submodule_names,
-                                    submodule->name))
-   continue;
-   default_argv = "on-demand";
-   }
-   }
-   } else if (spf->command_line_option == RECURSE_SUBMODULES_ON_DEMAND) {
-   if (!unsorted_string_list_lookup(&changed_submodule_names,
+   switch (get_fetch_recurse_config(submodule, spf->command_line_option))
+   {
+   default:
+   case RECURSE_SUBMODULES_DEFAULT:
+   case RECURSE_SUBMODULES_ON_DEMAND:
+   if (!submodule || !unsorted_string_list_lookup(&changed_submodule_names,
                                     submodule->name))
continue;
default_argv = "on-demand";
+   break;
+   case RECURSE_SUBMODULES_ON:
+   default_argv = "yes";
+   break;
+   case RECURSE_SUBMODULES_OFF:
+   continue;
}
 
strbuf_addf(&submodule_path, "%s/%s", spf->work_tree, ce->name);
-- 
2.0.0.274.g6b2cd91



Re: tsan: t5400: set_try_to_free_routine

2017-08-17 Thread Jeff King
On Tue, Aug 15, 2017 at 02:53:07PM +0200, Martin Ågren wrote:

> Using SANITIZE=thread made t5400-send-pack.sh hit the potential race
> below.
> 
> This is set_try_to_free_routine in wrapper.c. The race relates to the
> reading of the "old" value. The caller doesn't care about the "old"
> value, so this should be harmless right now. But it seems that using
> this mechanism from multiple threads and restoring the earlier value
> will probably not work out every time. (Not necessarily because of the
> race in set_try_to_free_routine, but, e.g., because callers might not
> restore the function pointer in the reverse order of how they
> originally set it.)
> 
> Properly "fixing" this for thread-safety would probably require some
> redesigning, which at the time might not be warranted. I'm just posting
> this for completeness.
> 
> Martin
> 
> WARNING: ThreadSanitizer: data race (pid=21382)
>   Read of size 8 at 0x00979970 by thread T1:
> #0 set_try_to_free_routine wrapper.c:35 (git+0x006cde1c)
> #1 prepare_trace_line trace.c:105 (git+0x006a3bf0)
> #2 trace_strbuf_fl trace.c:185 (git+0x006a3bf0)
> #3 packet_trace pkt-line.c:80 (git+0x005f9f43)
> #4 packet_read pkt-line.c:309 (git+0x005fbe10)
> #5 recv_sideband sideband.c:37 (git+0x00684c5e)
> #6 sideband_demux send-pack.c:216 (git+0x0065a38c)
> #7 run_thread run-command.c:933 (git+0x00655a93)
> #8   (libtsan.so.0+0x000230d9)

I was curious why the trace code would care about the free routine in
the first place. Digging in the mailing list, I didn't find a lot of
discussion. But I think the problem is basically that the trace
infrastructure wants to be thread-safe, but the default free-pack-memory
callback isn't.

It's ironic that we fix the thread-unsafety of the free-pack-memory
function by using the also-thread-unsafe set_try_to_free_routine.

Further irony: the trace routines aren't thread-safe in the first place,
as they do lazy initialization of key->fd using an "initialized" field.
In practice it probably means double-writing key->fd and leaking a
descriptor (though there are no synchronizing operations there, so it's
entirely possible a compiler could reorder the assignments to key->fd
and key->initialized and a simultaneous reader could read a garbage
key->fd value).  We also call getenv(), which isn't thread-safe with
other calls to getenv() or setenv().
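
To make the hazard concrete, the lazy init described above has roughly
this shape (a simplified sketch, not the literal trace.c code; the
struct layout and the open_trace_target() helper are made up here):

  struct trace_key { int fd; unsigned int initialized; };

  static int get_trace_fd(struct trace_key *key)
  {
          if (key->initialized)
                  return key->fd;
          key->fd = open_trace_target(key); /* hypothetical helper */
          key->initialized = 1;
          return key->fd;
  }

Nothing orders the two plain writes, so a concurrent reader may observe
initialized == 1 while key->fd still holds a stale value.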

I can think of a few possible directions:

  1. Make set_try_to_free_routine() skip the write if it would be a
 noop. This is racy if threads are actually changing the value, but
 in practice they aren't (the first trace of any kind will set it to
     NULL, and it will remain there). A rough sketch of this idea
     follows after this list.

  2. Make the free-packed routine thread-safe by taking a lock. It
 should hardly ever be called, so performance wouldn't matter. The
 big question is: _which_ lock.  pack-objects, which uses threads
 already, has a version which does this. But it knows to take the
 big program-wide "I'm accessing unsafe parts of Git" lock that the
 rest of the program uses during its multi-threaded parts.
 There's no notion in the rest of Git for "now we're going into a
 multi-threaded part, so most calls will need to take a big global
 lock before doing anything interesting".

 For parts of Git that are explicitly multi-threaded (like the
 pack-objects delta search, or index-pack's delta resolution) that's
 not so bad. But the example above is just using a sideband demuxer.
 It would be unfortunate if the entire rest of send-pack had to
 start caring about taking that lock.

  3. Is the free-pack-memory thing actually accomplishing much these
 days? It comes from 97bfeb34df (Release pack windows before
 reporting out of memory., 2006-12-24), and the primary issue is not
 actual allocated memory, but mmap'd packs clogging up the address
 space so that malloc can't find a suitable block.

 On 64-bit systems this is likely doing nothing. We have tons of
 address space. But even on 32-bit systems, the default
 core.packedGitLimit is only 256MiB (which was set around the same
 time). You can certainly come up with a corner case where freeing
 up that address space could matter. But I'd be surprised if this
 has actually helped much in practice over the years. And if you
 have a repo which is running so close to the address space limits
 of your system, the right answer is probably: upgrade to a 64-bit
 system. Even if the try-to-free thing helped in one run, it's
 likely that similar runs are not going to be so lucky, and even
 with it you're going to see sporadic out-of-memory failures.
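
For (1), a minimal sketch of the noop-skip (assuming the current shape
of set_try_to_free_routine() in wrapper.c, leaving out any existing
NULL-handling; untested):

  try_to_free_t set_try_to_free_routine(try_to_free_t routine)
  {
          try_to_free_t old = try_to_free_routine;
          if (routine == old)
                  return old; /* skip the racy write when nothing changes */
          try_to_free_routine = routine;
          return old;
  }

That would silence the report for the "keep setting it to the same
value" pattern without pretending the function is generally thread-safe.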

-Peff


[RFC PATCH 1/2] implement fetching of moved submodules

2017-08-17 Thread Heiko Voigt
We store the changed submodules paths to calculate which submodule needs
fetching. This does not work for moved submodules since their paths do
not stay the same in case of a moved submodule. In case of new
submodules we do not have a path in the current checkout, since they
just appeared in this fetch.

It is more general to collect the submodule names for changes instead of
their paths to include the above cases.

With the change described above we implement 'on-demand' fetching of
changes in moved submodules.

Note: This only works when repositories have a .gitmodules file. In
other words: It breaks if we do not get a name for a repository.
IIRC, consensus was that this is a requirement to get nice submodule
handling these days?

Signed-off-by: Heiko Voigt 
---

I updated the leftover code from my series implementing recursive fetch
for moved submodules[1] to the current master.

This breaks t5531 and t5545 because they do not use a .gitmodules file.

I also have some code leftover that does fallback on paths in case no
submodule names can be found. But I do not really like it. The question
here is how far do we support not using .gitmodules. Is it e.g.
reasonable to say: "For --recurse-submodules=on-demand you need a
.gitmodules file?"

[1] 
https://public-inbox.org/git/f5baa2acc09531a16f4f693eebbe60706bb8ed1e.1361751905.git.hvo...@hvoigt.net/

 submodule.c | 92 +
 t/t5526-fetch-submodules.sh | 35 +
 2 files changed, 86 insertions(+), 41 deletions(-)

diff --git a/submodule.c b/submodule.c
index 27de65a..3ed78ac 100644
--- a/submodule.c
+++ b/submodule.c
@@ -23,7 +23,7 @@
 static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
 static int parallel_jobs = 1;
-static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP;
+static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -742,11 +742,11 @@ const struct submodule *submodule_from_ce(const struct 
cache_entry *ce)
 }
 
 static struct oid_array *submodule_commits(struct string_list *submodules,
-  const char *path)
+  const char *name)
 {
struct string_list_item *item;
 
-   item = string_list_insert(submodules, path);
+   item = string_list_insert(submodules, name);
if (item->util)
return (struct oid_array *) item->util;
 
@@ -755,39 +755,34 @@ static struct oid_array *submodule_commits(struct 
string_list *submodules,
return (struct oid_array *) item->util;
 }
 
+struct collect_changed_submodules_cb_data {
+   struct string_list *changed;
+   const struct object_id *commit_oid;
+};
+
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
  struct diff_options *options,
  void *data)
 {
+   struct collect_changed_submodules_cb_data *me = data;
+   struct string_list *changed = me->changed;
+   const struct object_id *commit_oid = me->commit_oid;
int i;
-   struct string_list *changed = data;
 
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
struct oid_array *commits;
+   const struct submodule *submodule;
+
if (!S_ISGITLINK(p->two->mode))
continue;
 
-   if (S_ISGITLINK(p->one->mode)) {
-   /*
-* NEEDSWORK: We should honor the name configured in
-* the .gitmodules file of the commit we are examining
-* here to be able to correctly follow submodules
-* being moved around.
-*/
-   commits = submodule_commits(changed, p->two->path);
-   oid_array_append(commits, &p->two->oid);
-   } else {
-   /* Submodule is new or was moved here */
-   /*
-* NEEDSWORK: When the .git directories of submodules
-* live inside the superprojects .git directory some
-* day we should fetch new submodules directly into
-* that location too when config or options request
-* that so they can be checked out from there.
-*/
+   submodule = submodule_from_path(commit_oid, p->two->path);
+   if (!submodule)
continue;
-   }
+
+   commits = submodule_commits(changed, submodule->name);
+   

Re: [PATCH 1/9] Convert pack-objects to size_t

2017-08-17 Thread Torsten Bögershausen
On Wed, Aug 16, 2017 at 10:22:39PM +0200, Martin Koegler wrote:
> On Mon, Aug 14, 2017 at 10:08:05AM -0700, Junio C Hamano wrote:
> >It may help reducing the maintenance if we introduced obj_size_t
> >that is defined to be size_t for now, so that we can later swap
> >it to ofs_t or some larger type when we know we do need to
> >support objects whose size cannot be expressed in size_t, but I
> >do not offhand know what the pros-and-cons with such an approach
> >would look like.
> 
> Where should the use of obj_size_t end and the use of size_t start? 
> 
> We often determine a object size and then pass it to malloc. 
> We would start with a larger datatyp and then truncate for memory allocation, 
> which use size_t.

The truncation is done with xsize_t:
The "old" sha1_file.c has something like this:
idx_size = xsize_t(st.st_size);

I personally don't think that obj_size_t gives us that much.
There are "objects" which are "on disk", and they have a length of type off_t,
and there are "objects" loaded into memory, and they have a length of type size_t.
That way everybody can check that we use the right type.
Additionally I don't like it very much when the object size in memory is counted
in a 64-bit value (and this will happen if obj_size_t == off_t == 64 bit).

Anyhow, to answer your question:
all calls to xmalloc() must be prepared like this:

p = xmalloc(xsize_t(size_of_object));

That should do the trick.

> 
> Regards,
> Martin


[PATCH] t5526: fix some broken && chains

2017-08-17 Thread Heiko Voigt
Signed-off-by: Heiko Voigt 
---
 t/t5526-fetch-submodules.sh | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index ce788e9..22a7358 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -193,7 +193,7 @@ test_expect_success "recurseSubmodules=true propagates into 
submodules" '
add_upstream_commit &&
(
cd downstream &&
-   git config fetch.recurseSubmodules true
+   git config fetch.recurseSubmodules true &&
git fetch >../actual.out 2>../actual.err
) &&
test_must_be_empty actual.out &&
@@ -218,7 +218,7 @@ test_expect_success "--no-recurse-submodules overrides 
config setting" '
add_upstream_commit &&
(
cd downstream &&
-   git config fetch.recurseSubmodules true
+   git config fetch.recurseSubmodules true &&
git fetch --no-recurse-submodules >../actual.out 2>../actual.err
) &&
! test -s actual.out &&
@@ -232,7 +232,7 @@ test_expect_success "Recursion doesn't happen when no new 
commits are fetched in
cd submodule &&
git config --unset fetch.recurseSubmodules
) &&
-   git config --unset fetch.recurseSubmodules
+   git config --unset fetch.recurseSubmodules &&
git fetch >../actual.out 2>../actual.err
) &&
! test -s actual.out &&
@@ -312,7 +312,7 @@ test_expect_success "Recursion picks up all submodules when 
necessary" '
) &&
head1=$(git rev-parse --short HEAD^) &&
git add subdir/deepsubmodule &&
-   git commit -m "new deepsubmodule"
+   git commit -m "new deepsubmodule" &&
head2=$(git rev-parse --short HEAD) &&
echo "Fetching submodule submodule" > ../expect.err.sub &&
echo "From $pwd/submodule" >> ../expect.err.sub &&
-- 
2.0.0.274.g6b2cd91



[PATCH] add test for bug in git-mv with nested submodules

2017-08-17 Thread Heiko Voigt
When using git-mv with a submodule it will detect that and update the
paths for its configurations (.gitmodules, worktree and gitfile). This
does not work for nested submodules where a user renames the root
submodule.

We discovered this fact when working on on-demand fetch for renamed
submodules. Let's add a test to document this.

Signed-off-by: Heiko Voigt 
---
 t/t7001-mv.sh | 9 +
 1 file changed, 9 insertions(+)

diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh
index e365d1f..39f8aed 100755
--- a/t/t7001-mv.sh
+++ b/t/t7001-mv.sh
@@ -491,4 +491,13 @@ test_expect_success 'moving a submodule in nested 
directories' '
test_cmp actual expect
 '
 
+test_expect_failure 'moving nested submodules' '
+   git commit -am "cleanup commit" &&
+   git submodule add ./. sub_nested &&
+   git commit -m "add sub_nested" &&
+   git submodule update --init --recursive &&
+   git mv sub_nested sub_nested_moved &&
+   git status
+'
+
 test_done
-- 
2.0.0.274.g6b2cd91



Re: [PATCH] check-ref-format: require a repository for --branch

2017-08-17 Thread Jeff King
On Mon, Jul 17, 2017 at 11:18:43PM +0200, Marko Kungla wrote:

> I guess that should only be about the fact that it should not hit a (BUG).
> In my case, in the example I gave, I scan through the directories to
> check repository status; one of the tasks makes use of check-ref-format.
> Since it may hit a directory which is not a git repository it should not
> expose the error (BUG), right?

Right. The BUG should definitely be corrected. Between what Jonathan is
suggesting and my patch, either would be fine for the case you described
originally ("--branch @{-1}" would always fail in a non-repo).

-Peff


Re: [PATCH] check-ref-format: require a repository for --branch

2017-08-17 Thread Jeff King
On Mon, Jul 17, 2017 at 10:27:09AM -0700, Jonathan Nieder wrote:

> > --- a/t/t1402-check-ref-format.sh
> > +++ b/t/t1402-check-ref-format.sh
> > @@ -161,6 +161,10 @@ test_expect_success 'check-ref-format --branch from 
> > subdir' '
> > test "$refname" = "$sha1"
> >  '
> >  
> > +test_expect_success 'check-ref-format --branch from non-repo' '
> > +   test_must_fail nongit git check-ref-format --branch @{-1}
> > +'
> > +
> >  valid_ref_normalized() {
> > prereq=
> > case $1 in
> 
> I don't think it's right.  Today I can do
> 
>   $ cd /tmp
>   $ git check-ref-format --branch master
>   master
> 
> You might wonder why I'd ever do such a thing.  But that's what "git
> check-ref-format --branch" is for --- it is for taking a <branchname>
> argument and turning it into a branch name.  For example, if you have
> a script with an $opt_branch variable that defaults to "master", it
> may do
> 
>   resolved_branch=$(git check-ref-format --branch "$opt_branch")
> 
> even though it is in a mode that is not going to have to use
> $resolved_branch and it is not running from a repository.

I'm not sure I buy that. What does "turning it into a branch name" even
mean when you are not in a repository? Clearly @{-1} must fail. And
everything else is just going to output the exact input you provided.
So any script calling "check-ref-format --branch" outside of a repo
would be better off not calling it at all. At best it does nothing, and
at worst it's going to give a confusing error when $opt_branch is
something like "@{-1}".

A more compelling argument along these lines is something like:

  Accepting --branch outside of a repo is pointless, but it's something
  we've historically accepted. To avoid breaking existing scripts (even
  if they are doing something pointless), we'll continue to allow it.

I'm not sure I buy _that_ line of reasoning either, but it at least
makes sense to me (I just think it isn't worth the complexity of trying
to audit the innards of interpret_branch_name()).

-Peff


Dear Talented

2017-08-17 Thread Blue Sky Studio
Dear Talented,

I am Talent Scout For BLUE SKY FILM STUDIO, Present Blue Sky Studio a Film 
Corporation Located in the United State, is Soliciting for the Right to use 
Your Photo/Face and Personality as One of the Semi -Major Role/ Character in 
our Upcoming ANIMATED Stereoscope 3D Movie-The Story of Anubis (Anubis 2018) 
The Movie is Currently Filming (In Production) Please Note That There Will Be 
No Auditions, Traveling or Any Special / Professional Acting Skills, Since the 
Production of This Movie Will Be Done with our State of Art Computer 
-Generating Imagery Equipment. We Are Prepared to Pay the Total Sum of 
$620,000.00 USD.For More Information/Understanding.


Talent Scout
Kim Sharma


Quick Loans

2017-08-17 Thread Sec Capital Loan
Loan Offer at 3%, Feel Free to REPLY back to us for more info


Re: git fetch with refspec does not include tags?

2017-08-17 Thread Jeff King
On Thu, Aug 17, 2017 at 09:02:52AM +, Carlsson, Magnus wrote:

> In the git fetch documentation it states that by default you will
> fetch all tags that point into the history to the branches fetched.
> 
> "By default, any tag that points into the histories being fetched is
> also fetched; the effect is to fetch tags that point  at branches that
> you are interested in. This default behavior can be changed by using
> the --tags or --no-tags options or by configuring
> remote.<name>.tagOpt. By using a refspec that fetches tags explicitly,
> you can fetch tags that do not point into branches  you are interested
> in as well."
> 
> But for me I get tags if I do "git fetch" or "git fetch origin" but if
> I do "git fetch origin master" I don't get tags related to the master
> branch.
> 
> I understand that this might be due to me specifying a refspec and
> then it will only get that exact refspec, but for me it's not that
> clear from the documentation what I should expect. I read it as when I
> fetch something all related tags will come along.

I'll admit that our tag-autofollow behavior has often confused me. So
I'll try to untangle what's happening at least if not the reasoning. :)

I think the problem is not that you have a refspec, but that your
refspec has no destination. Looking at the fetch code, we seem to turn
on autotags only when the destination is a "real" ref and not just the
default FETCH_HEAD. Which sort-of makes sense. If you're doing a one-off
into FETCH_HEAD, you probably don't want to create tags, even if you
have the objects they point to.

But this is further complicated by the opportunistic tracking-ref
updates.  You can see some interesting behavior with a setup like this:

  git init parent
  git -C parent commit --allow-empty -m one &&
  git -C parent tag -m foo mytag

  git init child
  cd child
  git remote add origin ../parent

and then:

  # no tags, we just populate FETCH_HEAD because of the bare URL
  git fetch ../parent

  # this does fetch tags, because we're storing the result according to
  # the configured refspec ("refs/heads/*:refs/remotes/origin/*").
  git fetch origin

  # this doesn't fetch tags, as the main command is "just" populating
  # FETCH_HEAD. But then our logic for "hey, we fetched the ref for
  # refs/remotes/origin/master, so let's update it on the side" kicks
  # in. And we end up updating FETCH_HEAD _and_ the tracking branch, but
  # not the tags. Weird.
  git fetch origin master

  # and this one does fetch tags, because we have a real destination.
  git fetch origin master:foo

So what I'd say is:

  1. Definitely these defaults are under-documented. I couldn't find
 them anywhere in git-fetch(1).

  2. If we continue to follow the "are we storing any refs" rule for the
 default, possibly it should expand to "did we store anything,
 including opportunistic tracking-ref updates".

-Peff


Re: git add -p breaks after split on change at the top of the file

2017-08-17 Thread Jeff King
On Thu, Aug 17, 2017 at 05:03:08AM -0400, Jeff King wrote:

> But that does the opposite of what we want: it makes this case work when
> --allow-overlap isn't specified. I think my first attempt is probably
> closer to the right direction (but we'd want to have it kick in only
> when --allow-overlap is specified; well formed patches shouldn't overlap
> but we'd want to barf loudly if they do).
> 
> I'll stop here for now before digging any further. I'm not all that
> familiar with the apply code, so I have a feeling Junio's comments may
> stop me from going down an unproductive dead end. :)

OK, last email, I promise. :)

My first attempt which turns off match_beginning for the overlapping
hunk really does feel wrong. It would blindly match a similar hunk later
in the file. So if you did something like:

  1. The first hunk deletes context line "x".

  2. The second overlapping hunk claims to start at the beginning of the
 file (line 1), and then adds a line after "x".

  3. Later in the file, we have another line "x".

Then the second hunk should be rejected, and not find the unrelated "x"
from (3). But with my patch, I suspect it would be found (because we
dropped the requirement to find it at the beginning).

Of course this is a pretty ridiculous input in the first place. In
theory it _could_ be figured out, but overlapping hunks like this are
always going to cause problems (in this case, context from the second
hunk became a removal, and the second hunk no longer applies).

So maybe it's not worth caring about.  But I think the most robust
solution would involve checking the lines between try_lno and the start
to make sure they were all added. I just don't know that we have an easy
way of checking that. Perhaps if we recorded the cumulative number of
lines added in previous hunks, and used that as our starting point
(instead of "line 0").
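
In code, that idea might look something like this in match_fragment()
(sketch only and untested; "lines_added_so_far" is a hypothetical
counter that apply_one_fragment() would have to accumulate over the
earlier hunks of the current file):

  if (match_beginning) {
          /*
           * Everything before try_lno must have been added by an
           * earlier hunk of this same patch; otherwise this hunk does
           * not really start at the "beginning" of the file.
           */
          if (try_lno > lines_added_so_far)
                  return 0;
  }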

I also suspect that match_end has a similar problem, but I couldn't
trigger it in practice (perhaps it's impossible, or perhaps my brain is
just not up to the challenge today).

-Peff


Re: git add -p breaks after split on change at the top of the file

2017-08-17 Thread Jeff King
On Thu, Aug 17, 2017 at 04:41:09AM -0400, Jeff King wrote:

> diff --git a/apply.c b/apply.c
> index 41ee63e1db..606db58218 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -2966,8 +2966,9 @@ static int apply_one_fragment(struct apply_state *state,
>* In other words, a hunk that is (frag->oldpos <= 1) with or
>* without leading context must match at the beginning.
>*/
> - match_beginning = (!frag->oldpos ||
> -(frag->oldpos == 1 && !state->unidiff_zero));
> + match_beginning = (nth_fragment == 1 &&
> +(!frag->oldpos ||
> + (frag->oldpos == 1 && !state->unidiff_zero)));
>  
>   /*
>* A hunk without trailing lines must match at the end.
> 
> 
> But I'm not quite sure if that's right. The rest of the overlap code
> seems to mark patched lines with a flag. Meaning that instead of giving
> up and saying "well, this is the second line so we can't ever try
> matching the beginning", we should be redefining "beginning" in that
> case to allow 0 or more PATCHED lines starting from the beginning of the
> file.

Hmm, actually I was reading that part of the code backwards. We record
the PATCHED flag _only_ when allow_overlap isn't set, not the other way
around. So I had been imagining we'd want something like this:

diff --git a/apply.c b/apply.c
index 41ee63e1db..4048751807 100644
--- a/apply.c
+++ b/apply.c
@@ -2489,8 +2489,11 @@ static int match_fragment(struct apply_state *state,
return 0;
}
 
-   if (match_beginning && try_lno)
-   return 0;
+   if (match_beginning) {
+   for (i = 0; i < try_lno; i++)
+   if (!(img->line[i].flag & LINE_PATCHED))
+   return 0;
+   }
 
/* Quick hash check */
for (i = 0; i < preimage_limit; i++)

But that does the opposite of what we want: it makes this case work when
--allow-overlap isn't specified. I think my first attempt is probably
closer to the right direction (but we'd want to have it kick in only
when --allow-overlap is specified; well formed patches shouldn't overlap
but we'd want to barf loudly if they do).

I'll stop here for now before digging any further. I'm not all that
familiar with the apply code, so I have a feeling Junio's comments may
stop me from going down an unproductive dead end. :)

-Peff


git fetch with refspec does not include tags?

2017-08-17 Thread Carlsson, Magnus
Hi​

In the git fetch documentation it states that by default you will fetch all 
tags that point into the history to the branches fetched.

"By default, any tag that points into the histories being fetched is also 
fetched; the effect is to fetch tags that point  at branches that you are 
interested in. This default behavior can be changed by using the --tags or 
--no-tags options or by configuring remote.<name>.tagOpt. By using a refspec 
that fetches tags explicitly, you can fetch tags that do not point into 
branches  you are interested in as well."

But for me I get tags if I do "git fetch" or "git fetch origin" but if I do 
"git fetch origin master" I don't get tags related to the master branch.

I understand that this might be due to me specifying a refspec and then it will 
only get that exact refspec, but for me it's not that clear from the 
documentation what I should expect. I read it as when I fetch something all 
related tags will come along.

Using: git version 2.13.0.rc1.15.gd2bbb7c

-- Magnus

MAGNUS CARLSSON
Staff Software Engineer
ARRIS

o: +46 13 36 75 92
e: magnus.carls...@arris.com
w: www.arris.com

ARRIS:  Legal entity: Arris Sweden AB - Registered Office: Teknikringen 2, 583 
30 Linkoping, Sweden - Reg No:556518-5831 - VAT No:SE 556518-583


Re: git add -p breaks after split on change at the top of the file

2017-08-17 Thread Jeff King
[+cc Junio, as this gets deep into git-apply innards]

On Wed, Aug 16, 2017 at 10:25:04PM +0200, Simon Ruderich wrote:

> $ git add -p
> diff --git a/file b/file
> index 587be6b..74a69a0 100644
> --- a/file
> +++ b/file
> @@ -1 +1,4 @@
> +a
> +b
>  x
> +c
> Stage this hunk [y,n,q,a,d,/,s,e,?]? <-- press s
> Split into 2 hunks.
> @@ -1 +1,3 @@
> +a
> +b
>  x
> Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? <-- press e
> 
> Now delete the line "+a" in your editor and save.
> 
> error: patch failed: file:1
> error: file: patch does not apply
> 
> I expected git add -p to stage this change without error. It
> works fine without splitting the hunk (by deleting the first and
> last + line in the diff).

Interesting case. The problem isn't in add--interactive itself (I don't
think), but in git-apply. This is the diff we end up feeding to "git
apply --cached --check --recount --allow-overlap" to see if it applies:

  diff --git a/file b/file
  index 587be6b..74a69a0 100644
  --- a/file
  +++ b/file
  @@ -1 +1,3 @@
  +b
   x
  @@ -1 +3,2 @@
   x
  +c

The first hunk (that we edited) applies fine. But the second one does
not. Its hunk header says that it starts at line "1", so we expect to
find it at the beginning of the file. But of course it _isn't_ at the
beginning of the file anymore, because the first hunk added a line
before there.

So this diff is somewhat bogus; it has two hunks which start at the same
spot. But I think that's exactly the sort of thing that
"--allow-overlap" should handle. Doing this makes your case work for me:

diff --git a/apply.c b/apply.c
index 41ee63e1db..606db58218 100644
--- a/apply.c
+++ b/apply.c
@@ -2966,8 +2966,9 @@ static int apply_one_fragment(struct apply_state *state,
 * In other words, a hunk that is (frag->oldpos <= 1) with or
 * without leading context must match at the beginning.
 */
-   match_beginning = (!frag->oldpos ||
-  (frag->oldpos == 1 && !state->unidiff_zero));
+   match_beginning = (nth_fragment == 1 &&
+  (!frag->oldpos ||
+   (frag->oldpos == 1 && !state->unidiff_zero)));
 
/*
 * A hunk without trailing lines must match at the end.


But I'm not quite sure if that's right. The rest of the overlap code
seems to mark patched lines with a flag. Meaning that instead of giving
up and saying "well, this is the second line so we can't ever try
matching the beginning", we should be redefining "beginning" in that
case to allow 0 or more PATCHED lines starting from the beginning of the
file.

-Peff


Re: [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case

2017-08-17 Thread Torsten Bögershausen
On Thu, Aug 17, 2017 at 12:12:36AM -0700, Junio C Hamano wrote:
> Torsten Bögershausen  writes:
> 
> > Unless we re-define the meaning of "NULL" into "don't do CRLF conversions,
> > like SAFE_CRLF_KEEP_CRLF does.
> 
> My preference is not to use NULL as any hint.  Instead, the "flag"
> parameter we already pass to convert_to_git(), just like the updated
> read_old_data() uses SAFE_CRLF_KEEP_CRLF to tell it that it should
> not disturb existing CRLF without looking at the istate, should be
> used to tell convert_to_git() to do the opposite, but do so without
> looking at the istate.
> 
> Perhaps SAFE_CRLF_FALSE should be such a flag.  Or perhaps we need
> to invent another flag.  I dunno.

OK, message taken, in short:
I will come up with a new series in a couple of days - 
thanks for the input.


Re: reftable [v7]: new ref storage format

2017-08-17 Thread Michael Haggerty
On Wed, Aug 16, 2017 at 11:05 PM, Junio C Hamano  wrote:
> I found it a slightly odd that we do not insist that update_indices
> that appear in a single reftable file are consecutive, yet we
> require that min_update_index of a reftable file must be one greater
> than the max_update_index of a previous one.  That is not a new
> issue in v7, though.

I think of `update_index` like a pseudo-time, and the
`min_update_index` and `max_update_index` to be stating that "this
reftable covers the time interval specified". So it's reasonable to
say that the reftable files, together, should cover all time.

But it might be that there are values of `update_index` for which no
events survived within a reftable file that covers that time interval.
This can happen if reference update records have been compacted away
because later reference updates overwrote their effects, and either

* reflogs were turned off for those updates, or
* the corresponding reflogs have been compacted into a separate file, or
* the corresponding reflog entries for those updates have been expired.

Michael


Re: [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case

2017-08-17 Thread Junio C Hamano
Torsten Bögershausen  writes:

> Unless we re-define the meaning of "NULL" into "don't do CRLF conversions,
> like SAFE_CRLF_KEEP_CRLF does.

My preference is not to use NULL as any hint.  Instead, the "flag"
parameter we already pass to convert_to_git(), just like the updated
read_old_data() uses SAFE_CRLF_KEEP_CRLF to tell it that it should
not disturb existing CRLF without looking at the istate, should be
used to tell convert_to_git() to do the opposite, but do so without
looking at the istate.

Perhaps SAFE_CRLF_FALSE should be such a flag.  Or perhaps we need
to invent another flag.  I dunno.


Re: [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case

2017-08-17 Thread Junio C Hamano
Torsten Bögershausen  writes:

> I don't have time to look at this today or tomorrow,
> please give a hint if you are working further.

It is past my bedtime, and generally I prefer not to touch topics
that I know other people are willing to look into, especially when
I know those "other people" are well informed and capable.

Thanks.


Re: [PATCH] fix revisions doc about quoting for ':/' notation

2017-08-17 Thread ryenus
On 17 August 2017 at 05:57, Junio C Hamano  wrote:
> Andreas Heiduk  writes:
>
>> Am 16.08.2017 um 05:21 schrieb ryenus:
>>> To make sure the `<text>` in `:/<text>` is seen as one search string,
>>> one should quote/escape `<text>` properly.
>>>
>>> Especially, the example given in the manual `:/fix nasty bug` does not
>>> work because of missing quotes. The examples are now corrected, and a
>>> note about quoting/escaping is added as well.
>>
>> Right now the documentation describes the syntax as git sees the
>> parameters. This is agnostic of the shell or other UI with their
>> different quoting rules.  For example users of fish must quote
>> `rev@{2}`. A GUI might require no quoting at all. In that case `:/"fix
>> nasty bugs"` would be given to git verbatim and hence not find the revision.
>
> These are all good points that I didn't consider when responding.
>

Makes sense to me, too. I've just sent a v2 patch, which leaves the original
example as-is and adds an example inside the explanation.

Thanks!


[PATCH v2] update revisions doc for quoting in ':/' notation

2017-08-17 Thread ryenus
To make sure the `<text>` in `:/<text>` is seen as one search string,
one should quote/escape `<text>` properly.

Especially, the example given in the manual `:/fix nasty bug` does not
work because of missing quotes when used in shell. A note about
quoting/escaping is added along with a working example; however, the
original example is left as-is to be consistent with the other examples.

Signed-off-by: ryenus 
---
 Documentation/revisions.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/revisions.txt b/Documentation/revisions.txt
index 61277469c..d2862d55d 100644
--- a/Documentation/revisions.txt
+++ b/Documentation/revisions.txt
@@ -185,7 +185,9 @@ existing tag object.
   e.g. ':/^foo'. The special sequence ':/!' is reserved for modifiers to what
   is matched. ':/!-foo' performs a negative match, while ':/!!foo' matches a
   literal '!' character, followed by 'foo'. Any other sequence beginning with
-  ':/!' is reserved for now.
+  ':/!' is reserved for now. And make sure to quote/escape for the text to be
+  interpreted as the expected search string/pattern, e.g., for a commit whose
+  message matches a literal \'`$`', use `git show :/\\\$` or `git show ':/\$'`.

 ':', e.g. 'HEAD:README', ':README', 'master:./README'::
   A suffix ':' followed by a path names the blob or tree
-- 
2.14.1


Re: [PATCH v2 2/2] File commited with CRLF should roundtrip diff and apply

2017-08-17 Thread Junio C Hamano
tbo...@web.de writes:

> Analyze the patch if there is a) any context line with CRLF,
> or b) if any line with CRLF is to be removed.
> Thanks to Junio C Hamano, his input became the base for the changes in t4124.
> One test case is split up into 3:
> - Detect the " a\r" line in the patch
> - Detect the "-a\r" line in the patch
> - Use LF in repo and CRLF in the worktree. (*)
>
> * This one proves that convert_to_git(&the_index,...) still needs to pass
> the , otherwise Git will crash.

I do not understand why you think it proves anything like that.
Forget about "in repo" when you think about "git apply" without
"--index/--cache".  There is *NO* role the index should play in that
mode.

"Otherwise Git will crash" is true, because convert_to_git() tries
to dereference the istate it is given to see if there is CR in the
blob that happens to be in the index at the path.

But step back and *think*.  It only proves that convert_to_git() you
have and/or the way read_old_data() you updated calls it after
applying these two patches are still broken.

The "blob that happens to be in the index at the path" does *NOT*
have anything to do with the file in the working tree that you are
patching.  Such an index entry may not exist (and the code would see
that there is 0 CRs and 0 LFs---so what?), or it may have a blob
full of CRLF, or it may have a blob full of CR not LF, or any random
thing that has NO relation to the file you are patching.  Why should
that random blob (which may not even exist---we are discussing "git
apply" without "--index/--cache" here) affect the outcome?  If it
does not affect the outcome, why should convert_to_git() even look
at it?

It shouldn't be calling down to "has_cr_in_index(istate, path)" that
is called from crlf_to_git() in the first place.  The check for CR
was done in the caller of convert_to_git(), i.e. read_old_data(),
long before convert_to_git() is called, and the caller knows if it
is (or it is not) dealing with data that needs crlf conversion at
that point, based on the contents of the file being patched (which,
again, does not have any relation to the blob that may or may not
happen to be in the index).  

Your updated caller is already telling convert_to_git() codepath
when it needs CRLF and it refrains from peeking in the index with
SAFE_CRLF_KEEP_CRLF flag.  The bug still in the code after applying
these two patches is that the other choice, i.e. SAFE_CRLF_FALSE,
that is passed from read_old_data() to convert_to_git() does *not*
prevent the latter from peeking into the in-core index.  

And that is why "Otherwise Git will crash".
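
A throwaway sketch of that point, with made-up names and default line-ending
settings: whatever is (or is not) staged for the path has no bearing on a
plain "git apply":

    git init --quiet demo && cd demo
    printf 'a\n' >one && git add one && git commit --quiet -m initial
    printf 'a\nb\n' >one && git diff >add-b.patch   # patch taken against the worktree
    git checkout -- one
    git rm --cached --quiet one     # "one" is no longer in the index at all
    git apply add-b.patch           # still has to work from the worktree file alone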


Re: [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case

2017-08-17 Thread Torsten Bögershausen
On Wed, Aug 16, 2017 at 11:34:45AM -0700, Junio C Hamano wrote:
> With the previous fixes to CRLF handling in place, read_old_data()
> knows what it wants convert_to_git() to do with respect to CRLF.  In
> fact, this codepath is about applying a patch to a file in the
> filesystem, which may not exist in the index, or may exist but may
> not match what is recorded in the index, or in the extreme case, we
> may not even be in a Git repository.  If convert_to_git() peeked at
> the index while doing its work, it *would* be a bug.
> 
> Pass NULL instead of &the_index to the function to clarify this, and to
> make sure we catch future bugs.

Thanks for the work, and now our emails crossed.

Calling convert_to_git(NULL,...) makes Git crash today, see t4124, my
latest version, "LF in repo, CRLF in working tree",
unless we re-define the meaning of "NULL" into "don't do CRLF conversions",
like SAFE_CRLF_KEEP_CRLF does.
The combination of convert_to_git(NULL,...,SAFE_CRLF_KEEP_CRLF) is OK,
but all others must supply an istate.

At a first glance, the end result may look like this:
- Take my changes in convert.[ch]
- Take your changes/commit in apply.c (except the NULL, see above)
- Take my changes in t4124.

I don't have time to look at this today or tomorrow;
please give a hint if you are working on this further.
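
For readers who do not have those t4124 changes at hand, the scenario being
referred to is roughly the following sketch (not the actual test):

    git config core.autocrlf false
    printf 'a\n' >one && git add one && git commit -m 'LF in repo'
    git config core.autocrlf true
    rm one && git checkout -- one   # the working tree copy now has CRLF
    printf 'b\r\n' >>one && git diff >patch
    git checkout -- one
    git apply patch                 # exercises read_old_data()/convert_to_git()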


[PATCH v2 1/2] convert: Add SAFE_CRLF_KEEP_CRLF

2017-08-17 Thread tboegi
From: Torsten Bögershausen 

When convert_to_git() is called, the caller may want CRLF
to be kept as CRLF (and not converted into LF).

This will be used in the next commit, when apply works with files that have
CRLF and patches are applied onto these files.

Add the new value "SAFE_CRLF_KEEP_CRLF" to safe_crlf.

Prepare convert_to_git() to be able to run the clean filter,
skip the CRLF conversion and run the ident filter.

Signed-off-by: Torsten Bögershausen 
---
 convert.c | 10 ++++++----
 convert.h |  3 ++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/convert.c b/convert.c
index deaf0ba7b3..040123b4fe 100644
--- a/convert.c
+++ b/convert.c
@@ -1104,10 +1104,12 @@ int convert_to_git(const struct index_state *istate,
src = dst->buf;
len = dst->len;
}
-   ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
-   if (ret && dst) {
-   src = dst->buf;
-   len = dst->len;
+   if (checksafe != SAFE_CRLF_KEEP_CRLF) {
+   ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
+   if (ret && dst) {
+   src = dst->buf;
+   len = dst->len;
+   }
}
return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
diff --git a/convert.h b/convert.h
index cecf59d1aa..cabd5ed6dd 100644
--- a/convert.h
+++ b/convert.h
@@ -10,7 +10,8 @@ enum safe_crlf {
SAFE_CRLF_FALSE = 0,
SAFE_CRLF_FAIL = 1,
SAFE_CRLF_WARN = 2,
-   SAFE_CRLF_RENORMALIZE = 3
+   SAFE_CRLF_RENORMALIZE = 3,
+   SAFE_CRLF_KEEP_CRLF = 4
 };
 
 extern enum safe_crlf safe_crlf;
-- 
2.14.1.145.gb3622a4ee9
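
For context before the next patch: the failing roundtrip it addresses can be
reproduced in a throwaway repository roughly like this (a sketch; the real
coverage lives in t4124):

    git init --quiet crlf-demo && cd crlf-demo
    git config core.autocrlf false
    printf 'first line\r\n' >file
    git add file && git commit --quiet -m 'CRLF committed'
    printf '* text=auto\n' >.gitattributes
    printf 'Added line\r\n' >>file
    git diff >patch
    git checkout -- .
    git apply patch     # reported to fail before this series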



[PATCH v2 2/2] File committed with CRLF should roundtrip diff and apply

2017-08-17 Thread tboegi
From: Torsten Bögershausen 

When a file had been committed with CRLF but .gitattributes now says
"* text=auto" (or core.autocrlf is true),
the following does not roundtrip; `git apply` fails:

printf "Added line\r\n" >>file &&
git diff >patch &&
git checkout -- . &&
git apply patch

Before applying the patch, the file from the working tree is converted into the
index format (clean filter, CRLF conversion, ...).
Here, when the file was committed with CRLF, the line endings should not be converted.

Note that `git apply --index` or `git apply --cached` doesn't call
convert_to_git() because the source material is already in index format.

Analyze the patch to see whether there is a) any context line with CRLF,
or b) any line with CRLF that is to be removed.
In either case the patch file `patch` has mixed line endings; for a)
it looks like this (ignore the $ at the beginning of each line):

$ diff --git a/one b/one
$ index 533790e..c30dea8 100644
$ --- a/one
$ +++ b/one
$ @@ -1 +1,2 @@
$  a\r
$ +b\r

And for b) it looks like this:

$ diff --git a/one b/one
$ index 533790e..485540d 100644
$ --- a/one
$ +++ b/one
$ @@ -1 +1 @@
$ -a\r
$ +b\r

If `git apply` detects that the patch itself has CRLF (look at the lines
" a\r" or "-a\r" above), the new flag has_crlf is set in "struct patch"
and two things will happen:
- read_old_data() will not convert CRLF into LF, by calling
  convert_to_git(..., SAFE_CRLF_KEEP_CRLF);
- the WS_CR_AT_EOL bit is set in the "white space rule", so that
  CRLF is no longer treated as white space.

Thanks to Junio C Hamano, whose input became the base for the changes in t4124.
One test case is split up into three:
- Detect the " a\r" line in the patch
- Detect the "-a\r" line in the patch
- Use LF in repo and CRLF in the worktree. (*)

* This one proves that convert_to_git(&the_index,...) still needs to pass
the istate, otherwise Git will crash.

Reported-by: Anthony Sottile 
Signed-off-by: Torsten Bögershausen 
---
 apply.c  | 28 +++-
 t/t4124-apply-ws-rule.sh | 33 +++--
 2 files changed, 50 insertions(+), 11 deletions(-)

diff --git a/apply.c b/apply.c
index f2d599141d..bebb176099 100644
--- a/apply.c
+++ b/apply.c
@@ -220,6 +220,7 @@ struct patch {
unsigned int recount:1;
unsigned int conflicted_threeway:1;
unsigned int direct_to_threeway:1;
+   unsigned int has_crlf:1;
struct fragment *fragments;
char *result;
size_t resultsize;
@@ -1662,6 +1663,17 @@ static void check_whitespace(struct apply_state *state,
record_ws_error(state, result, line + 1, len - 2, state->linenr);
 }
 
+/* Check if the patch has context lines with CRLF or
+   the patch wants to remove lines with CRLF */
+static void check_old_for_crlf(struct patch *patch, const char *line, int len)
+{
+   if (len >= 2 && line[len-1] == '\n' && line[len-2] == '\r') {
+   patch->ws_rule |= WS_CR_AT_EOL;
+   patch->has_crlf = 1;
+   }
+}
+
+
 /*
  * Parse a unified diff. Note that this really needs to parse each
  * fragment separately, since the only way to know the difference
@@ -1712,11 +1724,13 @@ static int parse_fragment(struct apply_state *state,
if (!deleted && !added)
leading++;
trailing++;
+   check_old_for_crlf(patch, line, len);
if (!state->apply_in_reverse &&
state->ws_error_action == correct_ws_error)
check_whitespace(state, line, len, patch->ws_rule);
break;
case '-':
+   check_old_for_crlf(patch, line, len);
if (state->apply_in_reverse &&
state->ws_error_action != nowarn_ws_error)
check_whitespace(state, line, len, patch->ws_rule);
@@ -2268,8 +2282,11 @@ static void show_stats(struct apply_state *state, struct patch *patch)
add, pluses, del, minuses);
 }
 
-static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
+static int read_old_data(struct stat *st, struct patch *patch,
+const char *path, struct strbuf *buf)
 {
+   enum safe_crlf safe_crlf = patch->has_crlf ?
+   SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_FALSE;
switch (st->st_mode & S_IFMT) {
case S_IFLNK:
if (strbuf_readlink(buf, path, st->st_size) < 0)
@@ -2278,7 +2295,7 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
case S_IFREG:
if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
return error(_("unable to open or read %s"), path);
-   convert_to_git(&the_index, path, buf->buf, buf->len, buf, 0);
+   convert_to_git(&the_index, path, buf->buf, buf->len, buf, safe_crlf);