date:20170808

Re: [PATCH] t4062: stop using repetition in regex

2017-08-08 Thread René Scharfe

Am 09.08.2017 um 00:26 schrieb Junio C Hamano:
> Junio C Hamano  writes:
> 
>> So I find Dscho's concern quite valid, even though I do believe you
>> when you say the code somehow segfaults.  I just can not tell
>> how/why it would segfault, though---it is possible that regexec()
>> implementation is stupid and does not realize that it can return early
>> reporting success without looking at the rest of the buffer, but
>> somehow I find it unlikely.
>>
>> Puzzled.
>>
>>> You get different results?  How is that possible?  The search string is
>>> NUL-terminated in each case, while the point of the test is that the
>>> file contents isn't, right?
> 
> Intellectual curiosity tells me we may want to find out why it
> fails, but in the meantime, I think replacing the test with "0$" to
> force the scanner to find either the end of line or the end of the
> buffer may be a good workaround.  We do not have to care how many of
> random bytes are in front of the last "0" in order to ensure that
> the regexec_buf() does not overstep to 4097th byte, while seeing
> that regexec() that does not know how long the haystack is has to do
> so, no?

Our regexec() calls strlen() (see my other reply).

Using "0$" looks like the best option to me.

René

Re: [PATCH] builtin/add: add a missing newline to an stderr message

2017-08-08 Thread Jonathan Nieder

Ramsay Jones wrote:

> Signed-off-by: Ramsay Jones 

Reviewed-by: Jonathan Nieder 

Thanks for catching it.

Re: [PATCH] builtin/add: add a missing newline to an stderr message

2017-08-08 Thread Jonathan Nieder

Junio C Hamano wrote:
> René Scharfe  writes:

>>> diff --git a/builtin/add.c b/builtin/add.c
>>> index e888fb8c5..385b53ae7 100644
>>> --- a/builtin/add.c
>>> +++ b/builtin/add.c
>>> @@ -43,7 +43,7 @@ static void chmod_pathspec(struct pathspec *pathspec, int 
>>> force_mode)
>>> continue;
>>>   
>>> if (chmod_cache_entry(ce, force_mode) < 0)
>>> -   fprintf(stderr, "cannot chmod '%s'", ce->name);
>>> +   fprintf(stderr, "cannot chmod '%s'\n", ce->name);
>>> }
>>>   }
>>
>> FYI: I brought this up yesterday in the original thread, along with a
>> few other observations:
>>
>>   https://public-inbox.org/git/3c61d9f6-e0fd-22a4-68e0-89fd9ce9b...@web.de/
>>
>> Not sure if the discussion can or should be revived after all this
>> time, though; just sending patches like yours might be the way to go.
>
> Thanks, so it should become
>
>   fprintf(stderr, "cannot chmod %c '%s'\n", force_mode, ce->name);
>
> in the final version to be queued?

I don't believe the force_mode without an 'x' provides a clear signal
to the end user.  Perhaps you meant %cx?

Thanks,
Jonathan

Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte

2017-08-08 Thread Jeff King

On Tue, Aug 08, 2017 at 03:43:13PM -0700, Junio C Hamano wrote:

> > @@ -2812,7 +2812,7 @@ off_t find_pack_entry_one(const unsigned char *sha1,
> > hi = mi;
> > else
> > lo = mi+1;
> > -   } while (lo < hi);
> > +   }
> > return 0;
> >  }
> 
> Interesting.  I see that we still have the conditional code to call
> out to sha1-lookup.c::sha1_entry_pos().  Do we need a similar change
> over there, I wonder?  Alternatively, as we have had the experimental
> sha1-lookup.c::sha1_entry_pos() long enough without anybody using it,
> perhaps we should write it off as a failed experiment and retire it?

There is also sha1_pos(), which seems to have the same problem (and is
used in several places).

-Peff

[PATCH] t1200: remove t1200-tutorial.sh

2017-08-08 Thread Stefan Beller

Nowadays there are better tutorials out there such as "Git from bottom up"
or others, easily found online. Additionally to that a tutorial in our
test suite is not as easy to discover as e.g. online tutorials.

This test/tutorial was discovered by the patch author in the effort to
migrate our tests in preparation to switch the hashing function.
Transforming this tutorial to be agnostic of the underlying hash function
would hurt its readability, hence being even less useful as a tutorial.

Instead delete this test as
(a) the functionality is tested elsewhere as well and
(b) reducing the test suite to its core improves performance, which
aids developers in keeping their development velocity.

Signed-off-by: Stefan Beller 
---
 t/t1200-tutorial.sh | 268 
 1 file changed, 268 deletions(-)
 delete mode 100755 t/t1200-tutorial.sh

diff --git a/t/t1200-tutorial.sh b/t/t1200-tutorial.sh
deleted file mode 100755
index 397ccb6909..00
--- a/t/t1200-tutorial.sh
+++ /dev/null
@@ -1,268 +0,0 @@
-#!/bin/sh
-#
-# Copyright (c) 2005 Johannes Schindelin
-#
-
-test_description='A simple turial in the form of a test case'
-
-. ./test-lib.sh
-
-test_expect_success 'blob'  '
-   echo "Hello World" > hello &&
-   echo "Silly example" > example &&
-
-   git update-index --add hello example &&
-
-   test blob = "$(git cat-file -t 557db03)"
-'
-
-test_expect_success 'blob 557db03' '
-   test "Hello World" = "$(git cat-file blob 557db03)"
-'
-
-echo "It's a new day for git" >>hello
-cat > diff.expect << EOF
-diff --git a/hello b/hello
-index 557db03..263414f 100644
 a/hello
-+++ b/hello
-@@ -1 +1,2 @@
- Hello World
-+It's a new day for git
-EOF
-
-test_expect_success 'git diff-files -p' '
-   git diff-files -p > diff.output &&
-   test_cmp diff.expect diff.output
-'
-
-test_expect_success 'git diff' '
-   git diff > diff.output &&
-   test_cmp diff.expect diff.output
-'
-
-test_expect_success 'tree' '
-   tree=$(git write-tree 2>/dev/null) &&
-   test 8988da15d077d4829fc51d8544c097def6644dbb = $tree
-'
-
-test_expect_success 'git diff-index -p HEAD' '
-   test_tick &&
-   tree=$(git write-tree) &&
-   commit=$(echo "Initial commit" | git commit-tree $tree) &&
-   git update-ref HEAD $commit &&
-   git diff-index -p HEAD > diff.output &&
-   test_cmp diff.expect diff.output
-'
-
-test_expect_success 'git diff HEAD' '
-   git diff HEAD > diff.output &&
-   test_cmp diff.expect diff.output
-'
-
-cat > whatchanged.expect << EOF
-commit VARIABLE
-Author: VARIABLE
-Date:   VARIABLE
-
-Initial commit
-
-diff --git a/example b/example
-new file mode 100644
-index 000..f24c74a
 /dev/null
-+++ b/example
-@@ -0,0 +1 @@
-+Silly example
-diff --git a/hello b/hello
-new file mode 100644
-index 000..557db03
 /dev/null
-+++ b/hello
-@@ -0,0 +1 @@
-+Hello World
-EOF
-
-test_expect_success 'git whatchanged -p --root' '
-   git whatchanged -p --root |
-   sed -e "1s/^\(.\{7\}\).\{40\}/\1VARIABLE/" \
-   -e "2,3s/^\(.\{8\}\).*$/\1VARIABLE/" \
-   > whatchanged.output &&
-   test_cmp whatchanged.expect whatchanged.output
-'
-
-test_expect_success 'git tag my-first-tag' '
-   git tag my-first-tag &&
-   test_cmp .git/refs/heads/master .git/refs/tags/my-first-tag
-'
-
-test_expect_success 'git checkout -b mybranch' '
-   git checkout -b mybranch &&
-   test_cmp .git/refs/heads/master .git/refs/heads/mybranch
-'
-
-cat > branch.expect < branch.output &&
-   test_cmp branch.expect branch.output
-'
-
-test_expect_success 'git resolve now fails' '
-   git checkout mybranch &&
-   echo "Work, work, work" >>hello &&
-   test_tick &&
-   git commit -m "Some work." -i hello &&
-
-   git checkout master &&
-
-   echo "Play, play, play" >>hello &&
-   echo "Lots of fun" >>example &&
-   test_tick &&
-   git commit -m "Some fun." -i hello example &&
-
-   test_must_fail git merge -m "Merge work in mybranch" mybranch
-'
-
-cat > hello << EOF
-Hello World
-It's a new day for git
-Play, play, play
-Work, work, work
-EOF
-
-cat > show-branch.expect << EOF
-* [master] Merge work in mybranch
- ! [mybranch] Some work.
---
--  [master] Merge work in mybranch
-*+ [mybranch] Some work.
-*  [master^] Some fun.
-EOF
-
-test_expect_success 'git show-branch' '
-   test_tick &&
-   git commit -m "Merge work in mybranch" -i hello &&
-   git show-branch --topo-order --more=1 master mybranch \
-   > show-branch.output &&
-   test_cmp show-branch.expect show-branch.output
-'
-
-cat > resolve.expect << EOF
-Updating VARIABLE..VARIABLE
-FASTFORWARD (no commit created; -m option ignored)
- example | 1 +
- hello   | 1 +
- 2 files changed, 2 insertions(+)
-EOF
-
-test_expect_success 'git resolve' '
-   git checkout mybranch &&
-   git merge -m "Merge upstream changes." master |

Re: [PATCH] t1200: remove t1200-tutorial.sh

2017-08-08 Thread Jonathan Nieder

Hi,

Stefan Beller wrote:

> Nowadays there are better tutorials out there such as "Git from bottom up"
> or others, easily found online. Additionally to that a tutorial in our
> test suite is not as easy to discover as e.g. online tutorials.
>
> This test/tutorial was discovered by the patch author in the effort to
> migrate our tests in preparation to switch the hashing function.
> Transforming this tutorial to be agnostic of the underlying hash function
> would hurt its readability, hence being even less useful as a tutorial.
>
> Instead delete this test as
> (a) the functionality is tested elsewhere as well and
> (b) reducing the test suite to its core improves performance, which
> aids developers in keeping their development velocity.
>
> Signed-off-by: Stefan Beller 
> ---
>  t/t1200-tutorial.sh | 268 
> 
>  1 file changed, 268 deletions(-)
>  delete mode 100755 t/t1200-tutorial.sh

Interesting.  When I first saw the diffstat I assumed you were talking
about a test that validates the examples in some manpage are correct.
But this is not that.

There indeed appear to be other good tests for these commands, even
"git whatchanged", so for what it's worth,

Reviewed-by: Jonathan Nieder 

Thanks.

[PATCH] builtin/add: add a missing newline to an stderr message

2017-08-08 Thread Ramsay Jones


Signed-off-by: Ramsay Jones 
---

Hi Junio,

I noticed this while looking into the t3700 failure on cygwin tonight.
Also, I couldn't decide whether or not to add the i18n '_()' brackets
around the message. In the end I didn't, but will happily add them
if you think I should.

Thanks!

ATB,
Ramsay Jones

 builtin/add.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/add.c b/builtin/add.c
index e888fb8c5..385b53ae7 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -43,7 +43,7 @@ static void chmod_pathspec(struct pathspec *pathspec, int 
force_mode)
continue;
 
if (chmod_cache_entry(ce, force_mode) < 0)
-   fprintf(stderr, "cannot chmod '%s'", ce->name);
+   fprintf(stderr, "cannot chmod '%s'\n", ce->name);
}
 }
 
-- 
2.14.0

Re: [PATCH] builtin/add: add a missing newline to an stderr message

2017-08-08 Thread René Scharfe

Am 08.08.2017 um 23:36 schrieb Ramsay Jones:
> 
> Signed-off-by: Ramsay Jones 
> ---
> 
> Hi Junio,
> 
> I noticed this while looking into the t3700 failure on cygwin tonight.
> Also, I couldn't decide whether or not to add the i18n '_()' brackets
> around the message. In the end I didn't, but will happily add them
> if you think I should.
> 
> Thanks!
> 
> ATB,
> Ramsay Jones
> 
>   builtin/add.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/builtin/add.c b/builtin/add.c
> index e888fb8c5..385b53ae7 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -43,7 +43,7 @@ static void chmod_pathspec(struct pathspec *pathspec, int 
> force_mode)
>   continue;
>   
>   if (chmod_cache_entry(ce, force_mode) < 0)
> - fprintf(stderr, "cannot chmod '%s'", ce->name);
> + fprintf(stderr, "cannot chmod '%s'\n", ce->name);
>   }
>   }
>   

FYI: I brought this up yesterday in the original thread, along with a
few other observations:

  https://public-inbox.org/git/3c61d9f6-e0fd-22a4-68e0-89fd9ce9b...@web.de/

Not sure if the discussion can or should be revived after all this
time, though; just sending patches like yours might be the way to go.

René

Re: [PATCH] builtin/add: add a missing newline to an stderr message

2017-08-08 Thread Junio C Hamano

René Scharfe  writes:

>> diff --git a/builtin/add.c b/builtin/add.c
>> index e888fb8c5..385b53ae7 100644
>> --- a/builtin/add.c
>> +++ b/builtin/add.c
>> @@ -43,7 +43,7 @@ static void chmod_pathspec(struct pathspec *pathspec, int 
>> force_mode)
>>  continue;
>>   
>>  if (chmod_cache_entry(ce, force_mode) < 0)
>> -fprintf(stderr, "cannot chmod '%s'", ce->name);
>> +fprintf(stderr, "cannot chmod '%s'\n", ce->name);
>>  }
>>   }
>
> FYI: I brought this up yesterday in the original thread, along with a
> few other observations:
>
>   https://public-inbox.org/git/3c61d9f6-e0fd-22a4-68e0-89fd9ce9b...@web.de/
>
> Not sure if the discussion can or should be revived after all this
> time, though; just sending patches like yours might be the way to go.

Thanks, so it should become

fprintf(stderr, "cannot chmod %c '%s'\n", force_mode, ce->name);

in the final version to be queued?

Re: reftable [v6]: new ref storage format

2017-08-08 Thread Shawn Pearce

On Tue, Aug 8, 2017 at 12:25 PM, Junio C Hamano  wrote:
> Shawn Pearce  writes:
>
>> For `log_type = 0x4..0x7` the `log_chained` section is used instead to
>> compress information that already appeared in a prior log record.  The
>> `log_chained` always includes `old_id` for this record, as `new_id` is
>> implied by the prior (by file order, more recent) record's `old_id`.
>>
>> The `not_same_committer` block appears if `log_type & 0x1` is true,
>> `not_same_message` block appears if `log_type & 0x2` is true.  When
>> one of these blocks is missing, its values are implied by the prior
>> (more recent) log record.
>
> Two comments.
>
>  * not-same-committer would be what I would use when I switch
>timezones, even if I stay to be me, right?

Correct. This is based on the theory that the timezone in a reflog is
actually the system timezone, not your timezone. If you push to a
remote system, that system's reflog will be using that system's
timezone, not your timezone. So you aren't really that different, and
we can compress the timezone part away. Also, if you do move
timezones, you are likely to remain in that timezone for some period
of time, and such we can compress many log records again with the same
timezone+name+email.

Its ancient history from my research with "pack v4", but people don't
really change timezones very often in the Git committer data. I
suspect its even more true with reflog data.

>  I am just wondering
>if it is clear to everybody that "committer" in that phrase is a
>short-hand for "committer information other than the timestamp".

Maybe not. I will try to come up with another shorthand name for this.

>  * Should the set of entries that are allowed to use of "chained"
>log be related to the set of entries that appear in the restart
>table in any way?  For a reader that scans starting at a restart
>point, it would be very cumbersome if the entry were chained from
>the previous entry, as it would force it to backtrack entries to
>find the first non-chained log entry.  A simple "log_chained must
>not be used for an entry that appear in the restart table" rule
>would solve that, but I didn't see it in the document.

Good catch!  This is implemented as you described in JGit (for the
reasons you described), but not documented. I'll fix it.

Re: [PATCH] Convert size datatype to size_t

2017-08-08 Thread Junio C Hamano

Martin Koegler  writes:

> From: Martin Koegler 
>
> It changes the signature of the core object access function
> including any other functions to assure a clean compile if
> sizeof(size_t) != sizeof(unsigned long).
>
> Signed-off-by: Martin Koegler 
> ---
> ...
>  66 files changed, 193 insertions(+), 191 deletions(-)

Wow, that's a lot of changes.  What version did you worked this
patch on?  The reason I ask is because...

> diff --git a/diff-delta.c b/diff-delta.c
> index cd238c8..3d5e1ef 100644

... I do not see any version of Git that had blob cd238c8 at that
path, so "git am -3" is having hard time applying this pach to allow
me reviewing it.

Re: [PATCH] t4062: stop using repetition in regex

2017-08-08 Thread Junio C Hamano

Junio C Hamano  writes:

> So I find Dscho's concern quite valid, even though I do believe you
> when you say the code somehow segfaults.  I just can not tell
> how/why it would segfault, though---it is possible that regexec()
> implementation is stupid and does not realize that it can return early
> reporting success without looking at the rest of the buffer, but
> somehow I find it unlikely.
>
> Puzzled.
>
>> You get different results?  How is that possible?  The search string is
>> NUL-terminated in each case, while the point of the test is that the
>> file contents isn't, right?

Intellectual curiosity tells me we may want to find out why it
fails, but in the meantime, I think replacing the test with "0$" to
force the scanner to find either the end of line or the end of the
buffer may be a good workaround.  We do not have to care how many of
random bytes are in front of the last "0" in order to ensure that
the regexec_buf() does not overstep to 4097th byte, while seeing
that regexec() that does not know how long the haystack is has to do
so, no?

Re: [PATCH] t4062: stop using repetition in regex

2017-08-08 Thread René Scharfe

Am 09.08.2017 um 00:09 schrieb Junio C Hamano:
> René Scharfe  writes:
> 
>> Am 08.08.2017 um 16:49 schrieb Johannes Schindelin:
>>> Hi René,
>>>
>>> On Tue, 8 Aug 2017, René Scharfe wrote:
>>>
 OpenBSD's regex library has a repetition limit (RE_DUP_MAX) of 255.
 That's the minimum acceptable value according to POSIX.  In t4062 we use
 4096 repetitions in the test "-G matches", though, causing it to fail.

 Do the same as the test "-S --pickaxe-regex" in the same file and search
 for a single zero instead.  That still suffices to trigger the buffer
 overrun in older versions (checked with b7d36ffca02^ and --valgrind on
 Linux), simplifies the test a bit, and avoids exceeding OpenBSD's limit.
>>>
>>> I am afraid not. The 4096 is precisely the page size required to trigger
>>> the bug on Windows against which this regression test tries to safeguard.
>>
>> Checked with b7d36ffca02^ on MinGW now as well and found that it
>> segfaults with the proposed change ten out of ten times.
> 
> That is a strange result but I can believe it.
> 
> The reason why I find it strange is that the test wants to run
> diff_grep() in diffcore-pickaxe.c with one == NULL (because we are
> looking at difference between an initial empty commit and the
> current commit that adds 4096-zeroes.txt file), which makes the
> current blob (i.e. a page of '0' that may be mmap(2)ed without
> trailing NUL to terminate it) scanned via regexec() to look for the
> search string.
> 
> I can understand why Dscho originally did "^0{4096}$"; it is to
> ensure that the whole page is scanned for 4096 zeroes and then the
> code would somehow make sure that there is no more byte until the
> end of line, which will force regexec (but not regexec_buf that knows
> where the buffer ends) to look at the 4097th byte that does not exist.
> 
> If you change the pattern to just "0" that is not anchored, I'd expect
> regexec() that does not know how big the haystack is to just find "0"
> at the first byte and happily return without segfaulting (because it
> does not even have to scan the remainder of the buffer).
> 
> So I find Dscho's concern quite valid, even though I do believe you
> when you say the code somehow segfaults.  I just can not tell
> how/why it would segfault, though---it is possible that regexec()
> implementation is stupid and does not realize that it can return early
> reporting success without looking at the rest of the buffer, but
> somehow I find it unlikely.
> 
> Puzzled.

Good point.  Valgrind reports:

==57466== Process terminating with default action of signal 11 (SIGSEGV): 
dumping core
==57466==  Access not within mapped region at address 0x4027000
==57466==at 0x4C2EDF4: strlen (vg_replace_strmem.c:458)
==57466==by 0x59D9F76: regexec@@GLIBC_2.3.4 (regexec.c:240)
==57466==by 0x54D96E: diff_grep (diffcore-pickaxe.c:0)
==57466==by 0x54DAC3: pickaxe_match (diffcore-pickaxe.c:149)

And you can see in our version in compat/regex/regexec.c:241 that the
first thing regexec() does is calling strlen().

So to avoid depending on that implementation detail we'd need to use
a search string that won't be found (e.g. "1") or with unlimited
repetition (e.g. "0*"), right?

René

Re: reftable [v6]: new ref storage format

2017-08-08 Thread Shawn Pearce

On Tue, Aug 8, 2017 at 12:01 PM, Junio C Hamano  wrote:
> I notice that you raised the location of restart table within a
> block in this iteration (or maybe it happened in v5).
>
> This forces you to hold all contents in core before the first byte
> is written out.  You start from the first entry (which will become
> the first restart entry), emit a handful as prefix compressed
> entries, emit a full entry (which will become the next restart
> entry), ... until you have enough to fill both the data and the
> restart table, then start writing from the header (which needs the
> length of the block), restart table and then data.
>
> I think it is OK to do so for the blocks whose size is limited to
> 16M, but I wonder if it is sensible to do the same for the index
> block whose limit is 2G.  If you keep the restart table after the
> data, then you could stream out the entries as you emit, write the
> restart table, and then seek back to fix the length in the header,
> without holding the 2G in core, no?

Yes. I'm torn on the ordering: restart table first or restart table last.

The advantage of it first is the reader can immediately work with it,
without necessarily touching the rest of the block. The disadvantage
is a writer can only stream at block sizes, as the writer is forced to
buffer the entire block. As it happens my implementation in JGit
buffers the entire block anyway, so this didn't really factor as an
issue for me.

Given that the index can now also be multi-level, I don't expect to
see a 2G index. A 2G index forces the reader to load the entire 2G to
take advantage of the restart table. It may be more efficient for such
a reader to have had the writer make a mutli-level index, instead of a
single monster index block. And so perhaps the writer shouldn't make a
2G index block that she is forced to buffer. :)

Perhaps I'll move it back to the tail of the block. I can see the
streaming writer code is maybe more straightforward that way.

Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte

2017-08-08 Thread Jonathan Nieder

René Scharfe wrote:

> find_pack_entry_one() uses the fan-out table of pack indexes to find out
> which entries match the first byte of the searched hash and does a
> binary search on this subset of the main index table.
>
> If there are no matching entries then lo and hi will have the same
> value.  The binary search still starts and compares the hash of the
> following entry (which has a non-matching first byte, so won't cause any
> trouble), or whatever comes after the sorted list of entries.
>
> The probability of that stray comparison matching by mistake is low, but
> let's not take any chances and check when entering the binary search
> loop if we're actually done already.
>
> Signed-off-by: Rene Scharfe 
> ---
>  sha1_file.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Thanks for a clear explanation.

Sanity checking: is this correct in the sha1[0] == 0 case?  In that
case, we have lo = 0, hi = the 0th offset from the fanout table.  The
offsets in the fanout table are defined as "the number of objects in
the corresponding pack, the first byte of whose object name is less
than or equal to N."  So hi == lo would mean there are no objects with
id starting with 0, as hoped.

Or in other words, the [lo, hi) interval we're trying to search is
indeed a half-open interval.

Reviewed-by: Jonathan Nieder

Re: [PATCH] t1200: remove t1200-tutorial.sh

2017-08-08 Thread Jonathan Nieder

Jonathan Nieder wrote:
> Stefan Beller wrote:
>>> Stefan Beller wrote:

 Nowadays there are better tutorials out there such as "Git from bottom up"
 or others, easily found online. Additionally to that a tutorial in our
 test suite is not as easy to discover as e.g. online tutorials.
[...]
>> 2ae6c70674 (Adapt tutorial to cygwin and add test case, 2005-10-13)
>> seemed to imply that it was testing some part for Documentation/tutorial.txt
>> though.
>
> Oh, good point.
>
> v1.2.0~121 (New tutorial, 2006-01-22) means that the test is no longer
> testing what is in the tutorial in any meaningful sense.  That's why
> my search for "git whatchanged -p --root" in manpages didn't find
> anything.

Correction: the tutorial is now called gitcore-tutorial and mostly
survives.  A search for -p --root failed because of v1.5.5.1~19^2
(core-tutorial.txt: Fix showing the current behaviour, 2008-04-10).

That said, the conclusion that this test has mostly bitrotted as far
as its original purpose goes still applies.

An alternative method of addressing the goal you described would be to
fuzz object-id shaped things out of the sample output.  I don't have a
strong preference, given how little this test contributes to coverage
(as you mentioned) and how difficult it is to make it continue to
match the documentation it is trying to test.

Thanks and sorry for the confusion,
Jonathan

[PATCH] builtin/add: add detail to a 'cannot chmod' error message

2017-08-08 Thread Ramsay Jones


In addition to adding the missing newline, add the x-ecutable bit
'mode change' character to the error message. This message now has
the same form as similar messages output by 'update-index'.

Signed-off-by: Ramsay Jones 
---

Hi Junio,

This is v2 of the earlier "add a newline" patch. Thanks!

ATB,
Ramsay Jones

 builtin/add.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index e888fb8c5..5d5773d5c 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -32,7 +32,7 @@ struct update_callback_data {
int add_errors;
 };
 
-static void chmod_pathspec(struct pathspec *pathspec, int force_mode)
+static void chmod_pathspec(struct pathspec *pathspec, char flip)
 {
int i;
 
@@ -42,8 +42,8 @@ static void chmod_pathspec(struct pathspec *pathspec, int 
force_mode)
if (pathspec && !ce_path_match(ce, pathspec, NULL))
continue;
 
-   if (chmod_cache_entry(ce, force_mode) < 0)
-   fprintf(stderr, "cannot chmod '%s'", ce->name);
+   if (chmod_cache_entry(ce, flip) < 0)
+   fprintf(stderr, "cannot chmod %cx '%s'\n", flip, 
ce->name);
}
 }
 
-- 
2.14.0

Re: reftable [v6]: new ref storage format

2017-08-08 Thread Stefan Beller

On Mon, Aug 7, 2017 at 11:30 AM, Shawn Pearce  wrote:
> On Mon, Aug 7, 2017 at 11:27 AM, Stefan Beller  wrote:
>> On Sun, Aug 6, 2017 at 6:47 PM, Shawn Pearce  wrote:
>>> 6th iteration of the reftable storage format.
>>>
>>> You can read a rendered version of this here:
>>> https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md
>>>
>>> The index may be organized into a multi-level index, where ...
>>> which may in turn point to either index blocks (3rd level) or ref blocks 
>>> (leaf level).
>>
>> So we allow 3 levels at most?
>
> No, its just an example. Large ref sets with small block size need 4
> levels. Or more.

A malicious (or buggy) writer can produce indexes pointing to
each other producing a circle. (Who would do that?)

A reader should  - instead of segfaulting due to unbounded
recursion or being stuck in an infinite loop - ignore the indexes
in this case and fallback to the slow non-indexed behavior,
i.e. while the file format allows for unbounded levels, a reader
should not.

Re: [PATCH] Convert size datatype to size_t

2017-08-08 Thread Junio C Hamano

Martin Koegler  writes:

> diff --git a/apply.c b/apply.c
> index f2d5991..ea97fd2 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -3085,7 +3085,7 @@ static int apply_binary_fragment(struct apply_state 
> *state,
>struct patch *patch)
>  {
>   struct fragment *fragment = patch->fragments;
> - unsigned long len;
> + size_t len;
>   void *dst;
>  
>   if (!fragment)

This variable is made size_t because it receives the result size of
patch_delta().  And it later is assigned to img->len field, which
already is size_t before this patch.

Curiously, the patch_delta() invocation with this patch reads like
this:

dst = patch_delta(img->buf, img->len, fragment->patch,
  fragment->size, );

where patch_delta() is updated (correctly) to:

void *patch_delta(const void *src_buf, size_t src_size,
  const void *delta_buf, size_t delta_size,
  size_t *dst_size)

with this patch.  But "size" field in "struct fragment" is still
"int".  So we'd need to update it as well, not necessarily in this
patch (which is already too big) but as part of a larger whole.

> @@ -3174,7 +3174,7 @@ static int apply_binary(struct apply_state *state,
>   if (has_sha1_file(oid.hash)) {
>   /* We already have the postimage */
>   enum object_type type;
> - unsigned long size;
> + size_t size;
>   char *result;
>  
>   result = read_sha1_file(oid.hash, , );

This is to receive the resulting size from read_sha1_file().  It is
assigned to img->len, which is already size_t, so all is good here.

> @@ -3236,7 +3236,7 @@ static int read_blob_object(struct strbuf *buf, const 
> struct object_id *oid, uns
>   strbuf_addf(buf, "Subproject commit %s\n", oid_to_hex(oid));
>   } else {
>   enum object_type type;
> - unsigned long sz;
> + size_t sz;
>   char *result;
>  
>   result = read_sha1_file(oid->hash, , );

By reading the remainder of this function, this conversion also is
good.  sz that is now size_t is used as the size attached to an
existing strbuf like so:

result = read_sha1_file(oid->hash, , );
if (!result)
return -1;
/* XXX read_sha1_file NUL-terminates */
strbuf_attach(buf, result, sz, sz + 1);

in the part beyond the post context of this hunk.  In the longer
term, sz+1 we see here may want to become the overflow-safe variant
st_add().

As you said in the comment after three-dashes in the patch, a lot
more work is needed and your patch is a good starting point.

I am not sure if we can split the patch somehow to make it easier to
review.  The deceptively small part of your patch, i.e.

 apply.c  |  6 +++---

needs the above analysis to see if they are correct and what more
work is necessary, and there are 65 more files with ~190 lines
changed.

[PATCH] sha1_file: avoid comparison if no packed hash matches the first byte

2017-08-08 Thread René Scharfe

find_pack_entry_one() uses the fan-out table of pack indexes to find out
which entries match the first byte of the searched hash and does a
binary search on this subset of the main index table.

If there are no matching entries then lo and hi will have the same
value.  The binary search still starts and compares the hash of the
following entry (which has a non-matching first byte, so won't cause any
trouble), or whatever comes after the sorted list of entries.

The probability of that stray comparison matching by mistake is low, but
let's not take any chances and check when entering the binary search
loop if we're actually done already.

Signed-off-by: Rene Scharfe 
---
 sha1_file.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index b60ae15f70..11ee69a99d 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2799,7 +2799,7 @@ off_t find_pack_entry_one(const unsigned char *sha1,
return nth_packed_object_offset(p, pos);
}
 
-   do {
+   while (lo < hi) {
unsigned mi = (lo + hi) / 2;
int cmp = hashcmp(index + mi * stride, sha1);
 
@@ -2812,7 +2812,7 @@ off_t find_pack_entry_one(const unsigned char *sha1,
hi = mi;
else
lo = mi+1;
-   } while (lo < hi);
+   }
return 0;
 }
 
-- 
2.14.0

Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte

2017-08-08 Thread Jeff King

On Tue, Aug 08, 2017 at 06:52:31PM -0400, Jeff King wrote:

> > Interesting.  I see that we still have the conditional code to call
> > out to sha1-lookup.c::sha1_entry_pos().  Do we need a similar change
> > over there, I wonder?  Alternatively, as we have had the experimental
> > sha1-lookup.c::sha1_entry_pos() long enough without anybody using it,
> > perhaps we should write it off as a failed experiment and retire it?
> 
> There is also sha1_pos(), which seems to have the same problem (and is
> used in several places).

Actually, I take it back. The problem happens when we enter the loop
with no entries to look at. But both sha1_pos() and sha1_entry_pos()
return early before hitting their do-while loops in that case.

-Peff

Re: reftable [v6]: new ref storage format

2017-08-08 Thread Junio C Hamano

Shawn Pearce  writes:

> Given that the index can now also be multi-level, I don't expect to
> see a 2G index. A 2G index forces the reader to load the entire 2G to
> take advantage of the restart table. It may be more efficient for such
> a reader to have had the writer make a mutli-level index, instead of a
> single monster index block. And so perhaps the writer shouldn't make a
> 2G index block that she is forced to buffer. :)

Ah, OK, then it is sensible to have all table blocks to have the
same format, and restart at the beginning to help readers would be a
fine choice.  For the same "let's make them as consistent" sake, I
am tempted to suggest that we lift "the index block can be 2G" and
have it also be within uint_24(), perhaps?  Otherwise the readers
would have to read (or mmap) the whole 2G.

Re: [PATCH] builtin/add: add a missing newline to an stderr message

2017-08-08 Thread Ramsay Jones



On 08/08/17 22:45, René Scharfe wrote:
> Am 08.08.2017 um 23:36 schrieb Ramsay Jones:
>>
>> Signed-off-by: Ramsay Jones 
>> ---
>>
>> Hi Junio,
>>
>> I noticed this while looking into the t3700 failure on cygwin tonight.
>> Also, I couldn't decide whether or not to add the i18n '_()' brackets
>> around the message. In the end I didn't, but will happily add them
>> if you think I should.
>>
>> Thanks!
>>
>> ATB,
>> Ramsay Jones
>>
>>   builtin/add.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/builtin/add.c b/builtin/add.c
>> index e888fb8c5..385b53ae7 100644
>> --- a/builtin/add.c
>> +++ b/builtin/add.c
>> @@ -43,7 +43,7 @@ static void chmod_pathspec(struct pathspec *pathspec, int 
>> force_mode)
>>  continue;
>>   
>>  if (chmod_cache_entry(ce, force_mode) < 0)
>> -fprintf(stderr, "cannot chmod '%s'", ce->name);
>> +fprintf(stderr, "cannot chmod '%s'\n", ce->name);
>>  }
>>   }
>>   
> 
> FYI: I brought this up yesterday in the original thread, along with a
> few other observations:
> 
>   https://public-inbox.org/git/3c61d9f6-e0fd-22a4-68e0-89fd9ce9b...@web.de/

Ah, I missed that.

Hmm, I just looked at the code in builtin/update-index.c. Yes, it
would probably be a good idea to harmonize the messages - but just
where did 'flip' come from? ;-)

ATB,
Ramsay Jones

Re: reftable [v6]: new ref storage format

2017-08-08 Thread Shawn Pearce

On Tue, Aug 8, 2017 at 4:34 PM, Junio C Hamano  wrote:
> Shawn Pearce  writes:
>
>> Given that the index can now also be multi-level, I don't expect to
>> see a 2G index. A 2G index forces the reader to load the entire 2G to
>> take advantage of the restart table. It may be more efficient for such
>> a reader to have had the writer make a mutli-level index, instead of a
>> single monster index block. And so perhaps the writer shouldn't make a
>> 2G index block that she is forced to buffer. :)
>
> Ah, OK, then it is sensible to have all table blocks to have the
> same format, and restart at the beginning to help readers would be a
> fine choice.  For the same "let's make them as consistent" sake, I
> am tempted to suggest that we lift "the index block can be 2G" and
> have it also be within uint_24(), perhaps?  Otherwise the readers
> would have to read (or mmap) the whole 2G.

Gah. I just finished moving the restart table back to the end of the block. :)

However, I think I can agree with the index fitting into the uint24
size of 15M, and asking writers making an index that exceeds that to
use multi-level indexing.

Re: [PATCH] t1200: remove t1200-tutorial.sh

2017-08-08 Thread Jonathan Nieder

Stefan Beller wrote:
> On Tue, Aug 8, 2017 at 5:07 PM, Jonathan Nieder  wrote:
>> Stefan Beller wrote:

>>> Nowadays there are better tutorials out there such as "Git from bottom up"
>>> or others, easily found online. Additionally to that a tutorial in our
>>> test suite is not as easy to discover as e.g. online tutorials.
[...]
>>> Signed-off-by: Stefan Beller 
>>> ---
>>>  t/t1200-tutorial.sh | 268 
>>> 
>>>  1 file changed, 268 deletions(-)
>>>  delete mode 100755 t/t1200-tutorial.sh
>>
>> Interesting.  When I first saw the diffstat I assumed you were talking
>> about a test that validates the examples in some manpage are correct.
>> But this is not that.
[...]
> 2ae6c70674 (Adapt tutorial to cygwin and add test case, 2005-10-13)
> seemed to imply that it was testing some part for Documentation/tutorial.txt
> though.

Oh, good point.

v1.2.0~121 (New tutorial, 2006-01-22) means that the test is no longer
testing what is in the tutorial in any meaningful sense.  That's why
my search for "git whatchanged -p --root" in manpages didn't find
anything.

So what your patch does still seems reasonable (we have lived fine
without a test validating the examples in that tutorial, and if we
really want a test validating the examples then we should find a way
to automatically extract it), but the description is misleading.

With a corrected description, my Reviewed-by would apply.

Thanks for catching it.

Jonathan

Re: [PATCH] t4062: stop using repetition in regex

2017-08-08 Thread Junio C Hamano

René Scharfe  writes:

> Am 08.08.2017 um 16:49 schrieb Johannes Schindelin:
>> Hi René,
>> 
>> On Tue, 8 Aug 2017, René Scharfe wrote:
>> 
>>> OpenBSD's regex library has a repetition limit (RE_DUP_MAX) of 255.
>>> That's the minimum acceptable value according to POSIX.  In t4062 we use
>>> 4096 repetitions in the test "-G matches", though, causing it to fail.
>>>
>>> Do the same as the test "-S --pickaxe-regex" in the same file and search
>>> for a single zero instead.  That still suffices to trigger the buffer
>>> overrun in older versions (checked with b7d36ffca02^ and --valgrind on
>>> Linux), simplifies the test a bit, and avoids exceeding OpenBSD's limit.
>> 
>> I am afraid not. The 4096 is precisely the page size required to trigger
>> the bug on Windows against which this regression test tries to safeguard.
>
> Checked with b7d36ffca02^ on MinGW now as well and found that it
> segfaults with the proposed change ten out of ten times.

That is a strange result but I can believe it.

The reason why I find it strange is that the test wants to run
diff_grep() in diffcore-pickaxe.c with one == NULL (because we are
looking at difference between an initial empty commit and the
current commit that adds 4096-zeroes.txt file), which makes the
current blob (i.e. a page of '0' that may be mmap(2)ed without
trailing NUL to terminate it) scanned via regexec() to look for the
search string.

I can understand why Dscho originally did "^0{4096}$"; it is to
ensure that the whole page is scanned for 4096 zeroes and then the
code would somehow make sure that there is no more byte until the
end of line, which will force regexec (but not regexec_buf that knows
where the buffer ends) to look at the 4097th byte that does not exist.

If you change the pattern to just "0" that is not anchored, I'd expect
regexec() that does not know how big the haystack is to just find "0"
at the first byte and happily return without segfaulting (because it
does not even have to scan the remainder of the buffer).

So I find Dscho's concern quite valid, even though I do believe you
when you say the code somehow segfaults.  I just can not tell
how/why it would segfault, though---it is possible that regexec()
implementation is stupid and does not realize that it can return early
reporting success without looking at the rest of the buffer, but
somehow I find it unlikely.

Puzzled.

> You get different results?  How is that possible?  The search string is
> NUL-terminated in each case, while the point of the test is that the
> file contents isn't, right?

Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte

2017-08-08 Thread Junio C Hamano

René Scharfe  writes:

> find_pack_entry_one() uses the fan-out table of pack indexes to find out
> which entries match the first byte of the searched hash and does a
> binary search on this subset of the main index table.
>
> If there are no matching entries then lo and hi will have the same
> value.  The binary search still starts and compares the hash of the
> following entry (which has a non-matching first byte, so won't cause any
> trouble), or whatever comes after the sorted list of entries.
>
> The probability of that stray comparison matching by mistake is low, but
> let's not take any chances and check when entering the binary search
> loop if we're actually done already.
>
> Signed-off-by: Rene Scharfe 
> ---
>  sha1_file.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index b60ae15f70..11ee69a99d 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2799,7 +2799,7 @@ off_t find_pack_entry_one(const unsigned char *sha1,
>   return nth_packed_object_offset(p, pos);
>   }
>  
> - do {
> + while (lo < hi) {
>   unsigned mi = (lo + hi) / 2;
>   int cmp = hashcmp(index + mi * stride, sha1);
>  
> @@ -2812,7 +2812,7 @@ off_t find_pack_entry_one(const unsigned char *sha1,
>   hi = mi;
>   else
>   lo = mi+1;
> - } while (lo < hi);
> + }
>   return 0;
>  }

Interesting.  I see that we still have the conditional code to call
out to sha1-lookup.c::sha1_entry_pos().  Do we need a similar change
over there, I wonder?  Alternatively, as we have had the experimental
sha1-lookup.c::sha1_entry_pos() long enough without anybody using it,
perhaps we should write it off as a failed experiment and retire it?

Re: [PATCH] t1200: remove t1200-tutorial.sh

2017-08-08 Thread Stefan Beller

On Tue, Aug 8, 2017 at 5:07 PM, Jonathan Nieder  wrote:
> Hi,
>
> Stefan Beller wrote:
>
>> Nowadays there are better tutorials out there such as "Git from bottom up"
>> or others, easily found online. Additionally to that a tutorial in our
>> test suite is not as easy to discover as e.g. online tutorials.
>>
>> This test/tutorial was discovered by the patch author in the effort to
>> migrate our tests in preparation to switch the hashing function.
>> Transforming this tutorial to be agnostic of the underlying hash function
>> would hurt its readability, hence being even less useful as a tutorial.
>>
>> Instead delete this test as
>> (a) the functionality is tested elsewhere as well and
>> (b) reducing the test suite to its core improves performance, which
>> aids developers in keeping their development velocity.
>>
>> Signed-off-by: Stefan Beller 
>> ---
>>  t/t1200-tutorial.sh | 268 
>> 
>>  1 file changed, 268 deletions(-)
>>  delete mode 100755 t/t1200-tutorial.sh
>
> Interesting.  When I first saw the diffstat I assumed you were talking
> about a test that validates the examples in some manpage are correct.
> But this is not that.
>
> There indeed appear to be other good tests for these commands, even
> "git whatchanged", so for what it's worth,
>
> Reviewed-by: Jonathan Nieder 
>
> Thanks.

2ae6c70674 (Adapt tutorial to cygwin and add test case, 2005-10-13)
seemed to imply that it was testing some part for Documentation/tutorial.txt
though.

[PATCHv2 2/4] imap-send: add wrapper to get server credentials if needed

2017-08-08 Thread Nicolas Morey-Chaisemartin

Signed-off-by: Nicolas Morey-Chaisemartin 
---
 imap-send.c | 34 --
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/imap-send.c b/imap-send.c
index 09f29ea95..448a4a0b3 100644
--- a/imap-send.c
+++ b/imap-send.c
@@ -926,6 +926,25 @@ static int auth_cram_md5(struct imap_store *ctx, struct 
imap_cmd *cmd, const cha
return 0;
 }
 
+static void server_fill_credential(struct imap_server_conf *srvc, struct 
credential *cred)
+{
+   if (srvc->user && srvc->pass)
+   return;
+
+   cred->protocol = xstrdup(srvc->use_ssl ? "imaps" : "imap");
+   cred->host = xstrdup(srvc->host);
+
+   cred->username = xstrdup_or_null(srvc->user);
+   cred->password = xstrdup_or_null(srvc->pass);
+
+   credential_fill(cred);
+
+   if (!srvc->user)
+   srvc->user = xstrdup(cred->username);
+   if (!srvc->pass)
+   srvc->pass = xstrdup(cred->password);
+}
+
 static struct imap_store *imap_open_store(struct imap_server_conf *srvc, char 
*folder)
 {
struct credential cred = CREDENTIAL_INIT;
@@ -1078,20 +1097,7 @@ static struct imap_store *imap_open_store(struct 
imap_server_conf *srvc, char *f
}
 #endif
imap_info("Logging in...\n");
-   if (!srvc->user || !srvc->pass) {
-   cred.protocol = xstrdup(srvc->use_ssl ? "imaps" : 
"imap");
-   cred.host = xstrdup(srvc->host);
-
-   cred.username = xstrdup_or_null(srvc->user);
-   cred.password = xstrdup_or_null(srvc->pass);
-
-   credential_fill();
-
-   if (!srvc->user)
-   srvc->user = xstrdup(cred.username);
-   if (!srvc->pass)
-   srvc->pass = xstrdup(cred.password);
-   }
+   server_fill_credential(srvc, );
 
if (srvc->auth_method) {
struct imap_cmd_cb cb;
-- 
2.14.0.rc1.16.gcc208c97c

Re: reftable [v6]: new ref storage format

2017-08-08 Thread Jeff King

On Sun, Aug 06, 2017 at 06:47:06PM -0700, Shawn Pearce wrote:

> Changes from v5:
> - extensions.refStorage = reftable is used to select this format.

Thanks, I think this is a better scheme going forward. Just a few notes
on compatibility while I'm thinking about it:

  - existing versions will complain that they don't know what the
"refStorage" extension is

  - presumably we'd add new code that recognizes the extension, and then
makes sure the value is something we understand.

  - then we'd finally mark "reftable" as understood once we had an
implementation. We _could_ then also check other config (like
"reftable.*") and complain about unknown keys. But I think we could
declare any such keys as optional, and just rely on the version
number inside the reftable file for parsing it.

-Peff

Re: reftable [v5]: new ref storage format

2017-08-08 Thread Jeff King

On Mon, Aug 07, 2017 at 07:41:43AM -0700, Shawn Pearce wrote:

> > As such if JGit wanted to use a longer key size, it is possible to implement
> > similar automatic builds and packaging into JGit.
> 
> I don't know if we need a larger key size. $DAY_JOB limits ref names
> to ~200 bytes in a hook. I think GitHub does similar. But I'm worried
> about the general masses who might be using our software and expect
> ref names thus far to be as long as PATH_MAX on their system. Most
> systems run PATH_MAX around 1024.

GitHub limits to 255 (for the fully-qualified name, so including
"refs/heads/"). I don't recall ever seeing any complaints about that,
though I suppose it's not out of the realm of possibility for somebody
with a multi-byte encoding to hit with a real name (it's configurable,
so I'm not sure if Enterprise customers in Asia might ever bump it).  I
do think something like 1024 would be well into "you're insane if you
really want to name your branch this" territory.

-Peff

Re: reftable [v5]: new ref storage format

2017-08-08 Thread Jeff King

On Mon, Aug 07, 2017 at 03:40:48PM +, David Turner wrote:

> > -Original Message-
> > From: Shawn Pearce [mailto:spea...@spearce.org]
> > In git-core, I'm worried about the caveats related to locking. Git tries to 
> > work
> > nicely on NFS, and it seems LMDB wouldn't. Git also runs fine on a read-only
> > filesystem, and LMDB gets a little weird about that. Finally, Git doesn't 
> > have
> > nearly the risks LMDB has about a crashed reader or writer locking out 
> > future
> > operations until the locks have been resolved. This is especially true with 
> > shared
> > user repositories, where another user might setup and own the semaphore.
> 
> FWIW, git has problems with stale lock file in the event of a crash 
> (refs/foo.lock 
> might still exist, and git does nothing to clean it up).
> 
> In my testing (which involved a *lot* of crashing), I never once had to clean 
> up a
> stale LMDB lock.  That said, I didn't test on a RO filesystem.

Yeah, I'd expect LMDB to do much better than Git in a crash, because it
relies on flock. So when the kernel goes away, so too does your lock
(ditto if a git process dies without remembering to remove the lock,
though I don't think we've ever had such a bug).

But that's also why it may not work well over NFS (though my impression
is that flock _does_ work on modern NFS; I've been lucky enough not to
ever use it). Lack of NFS support wouldn't be a show-stopper for most
people, but it would be for totally replacing the existing code, I'd
think. I'm just not clear on what the state of lmdb-on-nfs is.

Assuming it could work, the interesting tradeoffs to me are:

  - something like reftable is hyper-optimized for high-latency
block-oriented access. It's not clear to me if lmdb would even be
usable for the distributed storage case Shawn has.

  - reftable is more code for us to implement, but we'd "own" the whole
stack down to the filesystem. That could be a big win for debugging
and optimizing for our use case.

  - reftable is re-inventing a lot of the database wheel. lmdb really is
a debugged, turn-key solution.

I'm not opposed to a world where lmdb becomes the standard solution and
Google does their own bespoke thing. But that's easy for me to say
because I'm not Google. I do care about keeping complexity and bugs to a
minimum for most users, and it's possible that lmdb could do that. But
if it can't become the baseline standard (due to NFS issues), then we'd
still want something to replace the current loose/packed storage. And if
reftable does that, then lmdb becomes a lot less interesting.

-Peff

[PATCH] t4062: stop using repetition in regex

2017-08-08 Thread René Scharfe

OpenBSD's regex library has a repetition limit (RE_DUP_MAX) of 255.
That's the minimum acceptable value according to POSIX.  In t4062 we use
4096 repetitions in the test "-G matches", though, causing it to fail.

Do the same as the test "-S --pickaxe-regex" in the same file and search
for a single zero instead.  That still suffices to trigger the buffer
overrun in older versions (checked with b7d36ffca02^ and --valgrind on
Linux), simplifies the test a bit, and avoids exceeding OpenBSD's limit.

Original-patch-by: David Coppa 
Signed-off-by: Rene Scharfe 
---
 t/t4062-diff-pickaxe.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t4062-diff-pickaxe.sh b/t/t4062-diff-pickaxe.sh
index 7c4903f497..c16e5af6fa 100755
--- a/t/t4062-diff-pickaxe.sh
+++ b/t/t4062-diff-pickaxe.sh
@@ -15,7 +15,7 @@ test_expect_success setup '
git commit -m "A 4k file"
 '
 test_expect_success '-G matches' '
-   git diff --name-only -G "^0{4096}$" HEAD^ >out &&
+   git diff --name-only -G0 HEAD^ >out &&
test 4096-zeroes.txt = "$(cat out)"
 '
 
-- 
2.14.0

[PATCHv2 4/4] imap-send: use curl by default

2017-08-08 Thread Nicolas Morey-Chaisemartin

Now that curl is enable by default, use the curl implementation
for imap too.
The goal is to validate feature parity between the legacy and
the curl implementation, deprecate thee legacy implementation
later on and in the long term, hopefully drop it altogether.

Signed-off-by: Nicolas Morey-Chaisemartin 
---
 imap-send.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/imap-send.c b/imap-send.c
index 5e80edaff..138d776f7 100644
--- a/imap-send.c
+++ b/imap-send.c
@@ -35,11 +35,11 @@ typedef void *SSL;
 #include "http.h"
 #endif
 
-#if defined(USE_CURL_FOR_IMAP_SEND) && defined(NO_OPENSSL)
-/* only available option */
+#if defined(USE_CURL_FOR_IMAP_SEND)
+/* Always default to curl if it's available. */
 #define USE_CURL_DEFAULT 1
 #else
-/* strictly opt in */
+/* We don't have curl, so continue to use the historical implementation */
 #define USE_CURL_DEFAULT 0
 #endif
 
-- 
2.14.0.rc1.16.gcc208c97c

[PATCH 1/2] filter-branch: Add --state-branch to hold pickled copy of ref map

2017-08-08 Thread Ian Campbell

Allowing for incremental updates of large trees.

I have been using this as part of the device tree extraction from the Linux
kernel source since 2013, about time I sent the patch upstream!

Signed-off-by: Ian Campbell 
---
 git-filter-branch.sh | 39 ++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 3a74602ef..d07db3fee 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -86,7 +86,7 @@ USAGE="[--setup ] [--env-filter ]
[--parent-filter ] [--msg-filter ]
[--commit-filter ] [--tag-name-filter ]
[--subdirectory-filter ] [--original ]
-   [-d ] [-f | --force]
+   [-d ] [-f | --force] [--state-branch ]
[--] [...]"
 
 OPTIONS_SPEC=
@@ -106,6 +106,7 @@ filter_msg=cat
 filter_commit=
 filter_tag_name=
 filter_subdir=
+state_branch=
 orig_namespace=refs/original/
 force=
 prune_empty=
@@ -181,6 +182,9 @@ do
--original)
orig_namespace=$(expr "$OPTARG/" : '\(.*[^/]\)/*$')/
;;
+   --state-branch)
+   state_branch="$OPTARG"
+   ;;
*)
usage
;;
@@ -252,6 +256,20 @@ export GIT_INDEX_FILE
 # map old->new commit ids for rewriting parents
 mkdir ../map || die "Could not create map/ directory"
 
+if [ -n "$state_branch" ] ; then
+   state_commit=`git show-ref -s "$state_branch"`
+   if [ -n "$state_commit" ] ; then
+   echo "Populating map from $state_branch ($state_commit)" 1>&2
+   git show "$state_commit":filter.map |
+   perl -n -e 'm/(.*):(.*)/ or die;
+   open F, ">../map/$1" or die;
+   print F "$2" or die;
+   close(F) or die'
+   else
+   echo "Branch $state_branch does not exist. Will create" 1>&2
+   fi
+fi
+
 # we need "--" only if there are no path arguments in $@
 nonrevs=$(git rev-parse --no-revs "$@") || exit
 if test -z "$nonrevs"
@@ -544,6 +562,25 @@ if [ "$filter_tag_name" ]; then
done
 fi
 
+if [ -n "$state_branch" ] ; then
+   echo "Saving rewrite state to $state_branch" 1>&2
+   STATE_BLOB=$(ls ../map |
+   perl -n -e 'chomp();
+   open F, "<../map/$_" or die;
+   chomp($f = ); print "$_:$f\n";' |
+   git hash-object -w --stdin )
+   STATE_TREE=$(/bin/echo -e "100644 blob $STATE_BLOB\tfilter.map" | git 
mktree)
+   STATE_PARENT=$(git show-ref -s "$state_branch")
+   unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE
+   unset GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL GIT_COMMITTER_DATE
+   if [ -n "$STATE_PARENT" ] ; then
+   STATE_COMMIT=$(/bin/echo "Sync" | git commit-tree "$STATE_TREE" -p 
"$STATE_PARENT")
+   else
+   STATE_COMMIT=$(/bin/echo "Sync" | git commit-tree "$STATE_TREE" )
+   fi
+   git update-ref "$state_branch" "$STATE_COMMIT"
+fi
+
 cd "$orig_dir"
 rm -rf "$tempdir"
 
-- 
2.11.0

[PATCH 0/2] filter-branch: support for incremental update + fix for ancient tag format

2017-08-08 Thread Ian Campbell

Hi,

I've long (since 2013, urk!) been carrying these two changes to git-
filter-branch in the split out devicetree source tree[0] which extracts
all the device tree sources from the Linux kernel source tree.

I think it's about time I sent them here, sorry for the rather extreme
delay! I've rebased to 2.14 and retested, I've also pushed a copy to
[1] where Travis seems happy.

Ian.

[0] 
https://git.kernel.org/pub/scm/linux/kernel/git/devicetree/devicetree-rebasing.git/
[1] https://github.com/ijc/git/tree/git-filter-branch

[PATCH 2/2] filter-branch: Handle rewritting (very) old style tags which lack tagger

2017-08-08 Thread Ian Campbell

Such as v2.6.12-rc2..v2.6.13-rc3 in the Linux kernel source tree.

Insert a fake tag header, since newer `git mktag` wont accept the input
otherwise:

$ git cat-file tag v2.6.12-rc2
object 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2
type commit
tag v2.6.12-rc2

Linux v2.6.12-rc2 release
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQBCbW8ZF3YsRnbiHLsRAgFRAKCq/TkuDaEombFABkPqYgGCgWN2lQCcC0qc
wznDbFU45A54dZC8RZ5JxyE=
=ESRP
-END PGP SIGNATURE-

$ git cat-file tag v2.6.12-rc2 | git mktag
error: char76: could not find "tagger "
fatal: invalid tag signature file
$ git cat-file tag v2.6.13-rc4 | git mktag
7eab951de91d95875ba34ec4c599f37e1208db93

Signed-off-by: Ian Campbell 
---
 git-filter-branch.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index d07db3fee..6927aa2da 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -540,6 +540,9 @@ if [ "$filter_tag_name" ]; then
new_sha1=$( ( printf 'object %s\ntype commit\ntag %s\n' 
\
"$new_sha1" "$new_ref"
git cat-file tag "$ref" |
+   awk '/^tagger/  { tagged=1 }
+/^$/   { if (!tagged && !done) { print 
"tagger Unknown  0 +" } ; done=1 }
+// { print }' |
sed -n \
-e '1,/^$/{
  /^object /d
-- 
2.11.0

Re: reftable [v5]: new ref storage format

2017-08-08 Thread Shawn Pearce

On Tue, Aug 8, 2017 at 12:52 AM, Jeff King  wrote:
> On Mon, Aug 07, 2017 at 03:40:48PM +, David Turner wrote:
>
>> > -Original Message-
>> > From: Shawn Pearce [mailto:spea...@spearce.org]
>> > In git-core, I'm worried about the caveats related to locking. Git tries 
>> > to work
>> > nicely on NFS, and it seems LMDB wouldn't. Git also runs fine on a 
>> > read-only
>> > filesystem, and LMDB gets a little weird about that. Finally, Git doesn't 
>> > have
>> > nearly the risks LMDB has about a crashed reader or writer locking out 
>> > future
>> > operations until the locks have been resolved. This is especially true 
>> > with shared
>> > user repositories, where another user might setup and own the semaphore.
>>
>> FWIW, git has problems with stale lock file in the event of a crash 
>> (refs/foo.lock
>> might still exist, and git does nothing to clean it up).
>>
>> In my testing (which involved a *lot* of crashing), I never once had to 
>> clean up a
>> stale LMDB lock.  That said, I didn't test on a RO filesystem.
>
> Yeah, I'd expect LMDB to do much better than Git in a crash, because it
> relies on flock. So when the kernel goes away, so too does your lock
> (ditto if a git process dies without remembering to remove the lock,
> though I don't think we've ever had such a bug).
>
> But that's also why it may not work well over NFS (though my impression
> is that flock _does_ work on modern NFS; I've been lucky enough not to
> ever use it). Lack of NFS support wouldn't be a show-stopper for most
> people, but it would be for totally replacing the existing code, I'd
> think. I'm just not clear on what the state of lmdb-on-nfs is.
>
> Assuming it could work, the interesting tradeoffs to me are:
>
>   - something like reftable is hyper-optimized for high-latency
> block-oriented access. It's not clear to me if lmdb would even be
> usable for the distributed storage case Shawn has.
>
>   - reftable is more code for us to implement, but we'd "own" the whole
> stack down to the filesystem. That could be a big win for debugging
> and optimizing for our use case.
>
>   - reftable is re-inventing a lot of the database wheel. lmdb really is
> a debugged, turn-key solution.
>
> I'm not opposed to a world where lmdb becomes the standard solution and
> Google does their own bespoke thing. But that's easy for me to say
> because I'm not Google. I do care about keeping complexity and bugs to a
> minimum for most users, and it's possible that lmdb could do that. But
> if it can't become the baseline standard (due to NFS issues), then we'd
> still want something to replace the current loose/packed storage. And if
> reftable does that, then lmdb becomes a lot less interesting.

Peff, thank you for this summary. It echos my opinions as well.

On the one hand, I love the idea of offloading the database stuff to
lmdb. But its got two technical blockers for me: behavior on NFS, and
virtualizing onto a different filesystem in userspace.

I really need a specialized reference store on a virtualized
distributed storage. The JGit reftable implementation fits that need
today. So we're probably going to go ahead and deploy that in our
environment.

I'd like to start writing a prototype reftable in C for git-core soon,
but I've been distracted by the JGit version first. It would be good
to have something to compare against the lmdb approach for git-core
before we make any decisions about what git-core wants to promote as
the new standard for ref storage.

Re: [RFC] imap-send: escape backslash in password

2017-08-08 Thread Jeff King

On Sun, Aug 06, 2017 at 06:34:22PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > That is fine by me. AFAIK, we already build the curl support by default
> > when a sufficiently-advanced version of curl is available. So if there
> > were feature-parity problems hopefully somebody would have reported it.
> >
> > I think the deprecation here can be relatively fast because we're not
> > actually dropping support for any feature. We're just requiring that
> > they install curl to get the same functionality (which might be
> > inconvenient, but it's a heck of a lot less inconvenient than "there's
> > no way to do what you want anymore").
> 
> Yeah, as long as imap-supporting libcurl is not too recent and are
> available everywhere, we should be OK.

I think we're not quite ready to switch to curl based on comments in the
nearby thread. But just for reference, since I started looking into
this...

The defines in the Makefile turn on USE_CURL_FOR_IMAP_SEND want curl
7.34.0. That's only from 2013, which is probably recent enough that it
may cause a problem (I had originally thought it was a few years older,
but I forgot the curl version hex encoding; 072200 is 7.34.0).

For comparison, nothing older than curl 7.19.4 will work for building
Git since v2.12.0, as we added some unconditional uses of CURLPROTO_*
there. Nobody seems to have noticed or complained. I pointed this out a
few months ago[1] and suggested we clean up some of the more antiquated
#if blocks in http.c that don't even build. There was some complaint
that we should keep even these ancient versions working, but the
compile error is still in "master".

So it's not clear to me that anybody cares about going that far back
(which is mid-2009), but I'd guess that 2013 might cause some problems.

[1] 
https://public-inbox.org/git/20170404025438.bgxz5sfmrawqs...@sigill.intra.peff.net/
if you're curious (you were offline for a while at that time, I
think).

[PATCHv2 0/4] imap-send: Fix and enable curl by default

2017-08-08 Thread Nicolas Morey-Chaisemartin

Changes since v1:

- Add patch fo fix return value of the curl_append_msgs_to_imap
- Patch #2: server_fill_credentials takes a credential struct as a parameter so 
they can be approved later
- Dropped the s/server/srvc/ cleanup (previous patch #3)
- Patch #4: Only use curl as the default if it's available

Nicolas Morey-Chaisemartin (4):
  imap-send: return with error if curl failed
  imap-send: add wrapper to get server credentials if needed
  imap_send: setup_curl: retreive credentials if not set in config file
  imap-send: use curl by default

 imap-send.c | 52 
 1 file changed, 32 insertions(+), 20 deletions(-)

[PATCHv2 1/4] imap-send: return with error if curl failed

2017-08-08 Thread Nicolas Morey-Chaisemartin

curl_append_msgs_to_imap always returned 0, whether curl failed or not.
Return a proper status so git imap-send will exit with an error code
if womething wrong happened.

Signed-off-by: Nicolas Morey-Chaisemartin 
---
 imap-send.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/imap-send.c b/imap-send.c
index b2d0b849b..09f29ea95 100644
--- a/imap-send.c
+++ b/imap-send.c
@@ -1490,7 +1490,7 @@ static int curl_append_msgs_to_imap(struct 
imap_server_conf *server,
curl_easy_cleanup(curl);
curl_global_cleanup();
 
-   return 0;
+   return res == CURLE_OK;
 }
 #endif
 
-- 
2.14.0.rc1.16.gcc208c97c

[PATCHv2 3/4] imap_send: setup_curl: retreive credentials if not set in config file

2017-08-08 Thread Nicolas Morey-Chaisemartin

Up to this point, the curl mode only supported getting the username
and password from the gitconfig file while the legacy mode could also
fetch them using the credential API.

Signed-off-by: Nicolas Morey-Chaisemartin 
---
 imap-send.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/imap-send.c b/imap-send.c
index 448a4a0b3..5e80edaff 100644
--- a/imap-send.c
+++ b/imap-send.c
@@ -1398,7 +1398,7 @@ static int append_msgs_to_imap(struct imap_server_conf 
*server,
 }
 
 #ifdef USE_CURL_FOR_IMAP_SEND
-static CURL *setup_curl(struct imap_server_conf *srvc)
+static CURL *setup_curl(struct imap_server_conf *srvc, struct credential *cred)
 {
CURL *curl;
struct strbuf path = STRBUF_INIT;
@@ -1411,6 +1411,7 @@ static CURL *setup_curl(struct imap_server_conf *srvc)
if (!curl)
die("curl_easy_init failed");
 
+   server_fill_credential(, cred);
curl_easy_setopt(curl, CURLOPT_USERNAME, server.user);
curl_easy_setopt(curl, CURLOPT_PASSWORD, server.pass);
 
@@ -1460,8 +1461,9 @@ static int curl_append_msgs_to_imap(struct 
imap_server_conf *server,
struct buffer msgbuf = { STRBUF_INIT, 0 };
CURL *curl;
CURLcode res = CURLE_OK;
+   struct credential cred = CREDENTIAL_INIT;
 
-   curl = setup_curl(server);
+   curl = setup_curl(server, );
curl_easy_setopt(curl, CURLOPT_READDATA, );
 
fprintf(stderr, "sending %d message%s\n", total, (total != 1) ? "s" : 
"");
@@ -1496,6 +1498,10 @@ static int curl_append_msgs_to_imap(struct 
imap_server_conf *server,
curl_easy_cleanup(curl);
curl_global_cleanup();
 
+   if (res == CURLE_OK && cred.username)
+   credential_approve();
+   credential_clear();
+
return res == CURLE_OK;
 }
 #endif
-- 
2.14.0.rc1.16.gcc208c97c

Re: [PATCH] Fix delta integer overflows

2017-08-08 Thread Martin Koegler

On Mon, Aug 07, 2017 at 06:44:16PM -0700, Junio C Hamano wrote:
> Having said that, I am a bit curious how you came to this patch.
> Was the issue found by code inspection, or did you actually have a
> real life use case to raise the core.bigFileThreshold configuration
> to a value above 4GB?

Real life use - tracking changes in larger files.

Raising the limit above 4GB suddenly resulted in a broken pack files in the 
repository and
aborts of various git commands.

Data is still recoverable with all size sanity checks disabled.

Regards,
Martin

Re: [PATCH] Fix delta integer overflows

2017-08-08 Thread Martin Koegler

On Mon, Aug 07, 2017 at 09:39:12PM +0200, Johannes Schindelin wrote:
> If you want to work on data in memory, then size_t is the appropriate data
> type. We already use it elsewhere. Let's use it here, too, without the
> intermediate bump from the incorrect `int` to the equally incorrect
> `long`.

I disagree with "We already use it elsewhere.". The whole delta code uses 
"unsigned long" -
look at delta.h. Look at unpack-objects.c. Or cache.h. Or pack-objects.c. Or 
index-pack.c.

Other possible cases:
git grep "unsigned long" |grep size

So the codebase still suggests, that "unsigned long" is the data type for 
storing object sizes.

I would be fine with resubmitting a patch using size_t/off_t for the touched 
parts - changing the whole
core code is a too invasive change for a bug fix.

Regards,
Martin
>From d97a7810ff679dd939972c151bcf23c122cdc570 Mon Sep 17 00:00:00 2001
From: Martin Koegler 
Date: Mon, 7 Aug 2017 20:00:30 +0200
Subject: [PATCH] Fix delta integer overflows

The current delta code produces incorrect pack objects for files > 4GB.

Signed-off-by: Martin Koegler 
---
 diff-delta.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/diff-delta.c b/diff-delta.c
index 3797ce6..cd238c8 100644
--- a/diff-delta.c
+++ b/diff-delta.c
@@ -319,7 +319,9 @@ create_delta(const struct delta_index *index,
 	 const void *trg_buf, unsigned long trg_size,
 	 unsigned long *delta_size, unsigned long max_size)
 {
-	unsigned int i, outpos, outsize, moff, msize, val;
+	unsigned int i, val;
+	off_t outpos, moff;
+	size_t l, outsize, msize;
 	int inscnt;
 	const unsigned char *ref_data, *ref_top, *data, *top;
 	unsigned char *out;
@@ -336,20 +338,20 @@ create_delta(const struct delta_index *index,
 		return NULL;
 
 	/* store reference buffer size */
-	i = index->src_size;
-	while (i >= 0x80) {
-		out[outpos++] = i | 0x80;
-		i >>= 7;
+	l = index->src_size;
+	while (l >= 0x80) {
+		out[outpos++] = l | 0x80;
+		l >>= 7;
 	}
-	out[outpos++] = i;
+	out[outpos++] = l;
 
 	/* store target buffer size */
-	i = trg_size;
-	while (i >= 0x80) {
-		out[outpos++] = i | 0x80;
-		i >>= 7;
+	l = trg_size;
+	while (l >= 0x80) {
+		out[outpos++] = l | 0x80;
+		l >>= 7;
 	}
-	out[outpos++] = i;
+	out[outpos++] = l;
 
 	ref_data = index->src_buf;
 	ref_top = ref_data + index->src_size;
-- 
2.1.4

Re: [PATCH] t1200: remove t1200-tutorial.sh

2017-08-08 Thread Junio C Hamano

Jonathan Nieder  writes:

> Correction: the tutorial is now called gitcore-tutorial and mostly
> survives.  A search for -p --root failed because of v1.5.5.1~19^2
> (core-tutorial.txt: Fix showing the current behaviour, 2008-04-10).

Yeah, I was wondering why neither of you find it while reading your
exchanges on a phone.

> That said, the conclusion that this test has mostly bitrotted as far
> as its original purpose goes still applies.
>
> An alternative method of addressing the goal you described would be to
> fuzz object-id shaped things out of the sample output.  I don't have a
> strong preference, given how little this test contributes to coverage
> (as you mentioned) and how difficult it is to make it continue to
> match the documentation it is trying to test.
>
> Thanks and sorry for the confusion,

OK, so can one of you give the final version of the patch with
updated description?

It _would_ be nice to have an executable tutorial/documentation or
docs with built-in tests like some other systems have, but I realize
that it would be a different matter.  We'd need some toolchain to
mark up tests in our tutorial, extract and run them, and compare the
result, which we currently do not have and it won't fit under our
test framework very well anyway.

Thanks.

Re: [PATCH] builtin/add: add detail to a 'cannot chmod' error message

2017-08-08 Thread Junio C Hamano

Ramsay Jones  writes:

> In addition to adding the missing newline, add the x-ecutable bit
> 'mode change' character to the error message. This message now has
> the same form as similar messages output by 'update-index'.
>
> Signed-off-by: Ramsay Jones 
> ---
>
> Hi Junio,
>
> This is v2 of the earlier "add a newline" patch. Thanks!

Thanks; here is me reminding myself to apply this with Jonathan's
reviewed-by in the morning.

>
> ATB,
> Ramsay Jones
>
>  builtin/add.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index e888fb8c5..5d5773d5c 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -32,7 +32,7 @@ struct update_callback_data {
>   int add_errors;
>  };
>  
> -static void chmod_pathspec(struct pathspec *pathspec, int force_mode)
> +static void chmod_pathspec(struct pathspec *pathspec, char flip)
>  {
>   int i;
>  
> @@ -42,8 +42,8 @@ static void chmod_pathspec(struct pathspec *pathspec, int 
> force_mode)
>   if (pathspec && !ce_path_match(ce, pathspec, NULL))
>   continue;
>  
> - if (chmod_cache_entry(ce, force_mode) < 0)
> - fprintf(stderr, "cannot chmod '%s'", ce->name);
> + if (chmod_cache_entry(ce, flip) < 0)
> + fprintf(stderr, "cannot chmod %cx '%s'\n", flip, 
> ce->name);
>   }
>  }

[PATCH v2 23/25] pack: move has_sha1_pack()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 builtin/prune-packed.c | 1 +
 cache.h| 2 --
 diff.c | 1 +
 pack.h | 2 ++
 packfile.c | 6 ++
 revision.c | 1 +
 sha1_file.c| 6 --
 7 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/builtin/prune-packed.c b/builtin/prune-packed.c
index ac978ad40..79130aa2e 100644
--- a/builtin/prune-packed.c
+++ b/builtin/prune-packed.c
@@ -2,6 +2,7 @@
 #include "cache.h"
 #include "progress.h"
 #include "parse-options.h"
+#include "pack.h"
 
 static const char * const prune_packed_usage[] = {
N_("git prune-packed [-n | --dry-run] [-q | --quiet]"),
diff --git a/cache.h b/cache.h
index 06a8caae6..d96d36d50 100644
--- a/cache.h
+++ b/cache.h
@@ -1190,8 +1190,6 @@ extern int check_sha1_signature(const unsigned char 
*sha1, void *buf, unsigned l
 
 extern int finalize_object_file(const char *tmpfile, const char *filename);
 
-extern int has_sha1_pack(const unsigned char *sha1);
-
 /*
  * Open the loose object at path, check its sha1, and return the contents,
  * type, and size. If the object is a blob, then "contents" may return NULL,
diff --git a/diff.c b/diff.c
index 85e714f6c..6bbc46326 100644
--- a/diff.c
+++ b/diff.c
@@ -20,6 +20,7 @@
 #include "string-list.h"
 #include "argv-array.h"
 #include "graph.h"
+#include "pack.h"
 
 #ifdef NO_FAST_WORKING_DIRECTORY
 #define FAST_WORKING_DIRECTORY 0
diff --git a/pack.h b/pack.h
index 1021a781c..ce0e15deb 100644
--- a/pack.h
+++ b/pack.h
@@ -223,4 +223,6 @@ extern struct packed_git *find_sha1_pack(const unsigned 
char *sha1,
 
 extern int find_pack_entry(const unsigned char *sha1, struct pack_entry *e);
 
+extern int has_sha1_pack(const unsigned char *sha1);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 0f1e3338b..507f65236 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1849,3 +1849,9 @@ int find_pack_entry(const unsigned char *sha1, struct 
pack_entry *e)
}
return 0;
 }
+
+int has_sha1_pack(const unsigned char *sha1)
+{
+   struct pack_entry e;
+   return find_pack_entry(sha1, );
+}
diff --git a/revision.c b/revision.c
index 6603af944..2868c4fc8 100644
--- a/revision.c
+++ b/revision.c
@@ -19,6 +19,7 @@
 #include "dir.h"
 #include "cache-tree.h"
 #include "bisect.h"
+#include "pack.h"
 
 volatile show_early_output_fn_t show_early_output;
 
diff --git a/sha1_file.c b/sha1_file.c
index 1a505eae5..2610ea057 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1629,12 +1629,6 @@ int has_pack_index(const unsigned char *sha1)
return 1;
 }
 
-int has_sha1_pack(const unsigned char *sha1)
-{
-   struct pack_entry e;
-   return find_pack_entry(sha1, );
-}
-
 int has_sha1_file_with_flags(const unsigned char *sha1, int flags)
 {
if (!startup_info->have_repository)
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 24/25] pack: move has_pack_index()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h | 2 --
 pack.h  | 2 ++
 packfile.c  | 8 
 sha1_file.c | 8 
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index d96d36d50..656b39d51 100644
--- a/cache.h
+++ b/cache.h
@@ -1225,8 +1225,6 @@ extern int has_object_file_with_flags(const struct 
object_id *oid, int flags);
  */
 extern int has_loose_object_nonlocal(const unsigned char *sha1);
 
-extern int has_pack_index(const unsigned char *sha1);
-
 extern void assert_sha1_type(const unsigned char *sha1, enum object_type 
expect);
 
 /* Helper to check and "touch" a file */
diff --git a/pack.h b/pack.h
index ce0e15deb..2c2a347ba 100644
--- a/pack.h
+++ b/pack.h
@@ -225,4 +225,6 @@ extern int find_pack_entry(const unsigned char *sha1, 
struct pack_entry *e);
 
 extern int has_sha1_pack(const unsigned char *sha1);
 
+extern int has_pack_index(const unsigned char *sha1);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 507f65236..28a16206c 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1855,3 +1855,11 @@ int has_sha1_pack(const unsigned char *sha1)
struct pack_entry e;
return find_pack_entry(sha1, );
 }
+
+int has_pack_index(const unsigned char *sha1)
+{
+   struct stat st;
+   if (stat(sha1_pack_index_name(sha1), ))
+   return 0;
+   return 1;
+}
diff --git a/sha1_file.c b/sha1_file.c
index 2610ea057..8584f6cf2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1621,14 +1621,6 @@ int force_object_loose(const unsigned char *sha1, time_t 
mtime)
return ret;
 }
 
-int has_pack_index(const unsigned char *sha1)
-{
-   struct stat st;
-   if (stat(sha1_pack_index_name(sha1), ))
-   return 0;
-   return 1;
-}
-
 int has_sha1_file_with_flags(const unsigned char *sha1, int flags)
 {
if (!startup_info->have_repository)
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 17/25] pack: move packed_object_info(), unpack_entry()

2017-08-08 Thread Jonathan Tan

Both sha1_file.c and packfile.c now need read_object(), so a copy of
read_object() was created in packfile.c.

This patch makes both mark_bad_packed_object() and has_packed_and_bad()
global. Unlike most of the other patches in this series, these 2
functions need to remain global.

Signed-off-by: Jonathan Tan 
---
 cache.h |   7 -
 pack.h  |  11 +
 packfile.c  | 660 ++
 sha1_file.c | 676 ++--
 4 files changed, 685 insertions(+), 669 deletions(-)

diff --git a/cache.h b/cache.h
index 1aea0..b14098bf1 100644
--- a/cache.h
+++ b/cache.h
@@ -1186,9 +1186,6 @@ extern void *map_sha1_file(const unsigned char *sha1, 
unsigned long *size);
 extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, 
unsigned long mapsize, void *buffer, unsigned long bufsiz);
 extern int parse_sha1_header(const char *hdr, unsigned long *sizep);
 
-/* global flag to enable extra checks when accessing packed objects */
-extern int do_check_packed_object_crc;
-
 extern int check_sha1_signature(const unsigned char *sha1, void *buf, unsigned 
long size, const char *type);
 
 extern int finalize_object_file(const char *tmpfile, const char *filename);
@@ -1621,8 +1618,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-extern void clear_delta_base_cache(void);
-
 /*
  * Make sure that a pointer access into an mmap'd index file is within bounds,
  * and can provide at least 8 bytes of data.
@@ -1660,7 +1655,6 @@ extern off_t nth_packed_object_offset(const struct 
packed_git *, uint32_t n);
 extern off_t find_pack_entry_one(const unsigned char *sha1, struct packed_git 
*);
 
 extern int is_pack_valid(struct packed_git *);
-extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, 
unsigned long *);
 
 /*
  * Iterate over the files in the loose-object parts of the object
@@ -1771,7 +1765,6 @@ struct object_info {
 /* Do not retry packed storage after checking packed and loose storage */
 #define OBJECT_INFO_QUICK 8
 extern int sha1_object_info_extended(const unsigned char *, struct object_info 
*, unsigned flags);
-extern int packed_object_info(struct packed_git *pack, off_t offset, struct 
object_info *);
 
 /* Dumb servers support */
 extern int update_server_info(int);
diff --git a/pack.h b/pack.h
index 5e3552392..2e6f357c3 100644
--- a/pack.h
+++ b/pack.h
@@ -171,4 +171,15 @@ extern unsigned long unpack_object_header_buffer(const 
unsigned char *buf, unsig
 extern unsigned long get_size_from_delta(struct packed_git *, struct 
pack_window **, off_t);
 extern int unpack_object_header(struct packed_git *, struct pack_window **, 
off_t *, unsigned long *);
 
+extern void clear_delta_base_cache(void);
+
+/* global flag to enable extra checks when accessing packed objects */
+extern int do_check_packed_object_crc;
+
+extern int packed_object_info(struct packed_git *pack, off_t offset, struct 
object_info *);
+extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, 
unsigned long *);
+
+extern void mark_bad_packed_object(struct packed_git *p, const unsigned char 
*sha1);
+extern const struct packed_git *has_packed_and_bad(const unsigned char *sha1);
+
 #endif
diff --git a/packfile.c b/packfile.c
index a4db78ea0..a3745f9df 100644
--- a/packfile.c
+++ b/packfile.c
@@ -4,6 +4,8 @@
 #include "dir.h"
 #include "mergesort.h"
 #include "delta.h"
+#include "list.h"
+#include "streaming.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -974,3 +976,661 @@ int unpack_object_header(struct packed_git *p,
 
return type;
 }
+
+void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1)
+{
+   unsigned i;
+   for (i = 0; i < p->num_bad_objects; i++)
+   if (!hashcmp(sha1, p->bad_object_sha1 + GIT_SHA1_RAWSZ * i))
+   return;
+   p->bad_object_sha1 = xrealloc(p->bad_object_sha1,
+ st_mult(GIT_MAX_RAWSZ,
+ st_add(p->num_bad_objects, 1)));
+   hashcpy(p->bad_object_sha1 + GIT_SHA1_RAWSZ * p->num_bad_objects, sha1);
+   p->num_bad_objects++;
+}
+
+const struct packed_git *has_packed_and_bad(const unsigned char *sha1)
+{
+   struct packed_git *p;
+   unsigned i;
+
+   for (p = packed_git; p; p = p->next)
+   for (i = 0; i < p->num_bad_objects; i++)
+   if (!hashcmp(sha1, p->bad_object_sha1 + 20 * i))
+   return p;
+   return NULL;
+}
+
+static off_t get_delta_base(struct packed_git *p,
+   struct pack_window **w_curs,
+   off_t *curpos,
+   enum object_type type,
+   off_t delta_obj_offset)
+{
+

[PATCH v2 16/25] sha1_file: remove read_packed_sha1()

2017-08-08 Thread Jonathan Tan

Use read_object() in its place instead. This avoids duplication of code.

This makes force_object_loose() slightly slower (because of a redundant
check of loose object storage), but only in the error case.

Signed-off-by: Jonathan Tan 
---
 sha1_file.c | 26 +-
 1 file changed, 1 insertion(+), 25 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 9eadda388..9e5444334 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2091,30 +2091,6 @@ int sha1_object_info(const unsigned char *sha1, unsigned 
long *sizep)
return type;
 }
 
-static void *read_packed_sha1(const unsigned char *sha1,
- enum object_type *type, unsigned long *size)
-{
-   struct pack_entry e;
-   void *data;
-
-   if (!find_pack_entry(sha1, ))
-   return NULL;
-   data = cache_or_unpack_entry(e.p, e.offset, size, type);
-   if (!data) {
-   /*
-* We're probably in deep shit, but let's try to fetch
-* the required object anyway from another pack or loose.
-* This should happen only in the presence of a corrupted
-* pack, and is better than failing outright.
-*/
-   error("failed to read object %s at offset %"PRIuMAX" from %s",
- sha1_to_hex(sha1), (uintmax_t)e.offset, e.p->pack_name);
-   mark_bad_packed_object(e.p, sha1);
-   data = read_object(sha1, type, size);
-   }
-   return data;
-}
-
 int pretend_sha1_file(void *buf, unsigned long len, enum object_type type,
  unsigned char *sha1)
 {
@@ -2497,7 +2473,7 @@ int force_object_loose(const unsigned char *sha1, time_t 
mtime)
 
if (has_loose_object(sha1))
return 0;
-   buf = read_packed_sha1(sha1, , );
+   buf = read_object(sha1, , );
if (!buf)
return error("cannot read sha1_file for %s", sha1_to_hex(sha1));
hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %lu", typename(type), len) + 1;
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 18/25] pack: move nth_packed_object_{sha1,oid}

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h | 14 --
 pack.h  | 14 ++
 packfile.c  | 31 +++
 sha1_file.c | 31 ---
 4 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/cache.h b/cache.h
index b14098bf1..f083d532e 100644
--- a/cache.h
+++ b/cache.h
@@ -1628,20 +1628,6 @@ extern int odb_pack_keep(const char *name);
  */
 extern void check_pack_index_ptr(const struct packed_git *p, const void *ptr);
 
-/*
- * Return the SHA-1 of the nth object within the specified packfile.
- * Open the index if it is not already open.  The return value points
- * at the SHA-1 within the mmapped index.  Return NULL if there is an
- * error.
- */
-extern const unsigned char *nth_packed_object_sha1(struct packed_git *, 
uint32_t n);
-/*
- * Like nth_packed_object_sha1, but write the data into the object specified by
- * the the first argument.  Returns the first argument on success, and NULL on
- * error.
- */
-extern const struct object_id *nth_packed_object_oid(struct object_id *, 
struct packed_git *, uint32_t n);
-
 /*
  * Return the offset of the nth object within the specified packfile.
  * The index must already be opened.
diff --git a/pack.h b/pack.h
index 2e6f357c3..023c97b37 100644
--- a/pack.h
+++ b/pack.h
@@ -182,4 +182,18 @@ extern void *unpack_entry(struct packed_git *, off_t, enum 
object_type *, unsign
 extern void mark_bad_packed_object(struct packed_git *p, const unsigned char 
*sha1);
 extern const struct packed_git *has_packed_and_bad(const unsigned char *sha1);
 
+/*
+ * Return the SHA-1 of the nth object within the specified packfile.
+ * Open the index if it is not already open.  The return value points
+ * at the SHA-1 within the mmapped index.  Return NULL if there is an
+ * error.
+ */
+extern const unsigned char *nth_packed_object_sha1(struct packed_git *, 
uint32_t n);
+/*
+ * Like nth_packed_object_sha1, but write the data into the object specified by
+ * the the first argument.  Returns the first argument on success, and NULL on
+ * error.
+ */
+extern const struct object_id *nth_packed_object_oid(struct object_id *, 
struct packed_git *, uint32_t n);
+
 #endif
diff --git a/packfile.c b/packfile.c
index a3745f9df..b16cf648a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1634,3 +1634,34 @@ void *unpack_entry(struct packed_git *p, off_t 
obj_offset,
 
return data;
 }
+
+const unsigned char *nth_packed_object_sha1(struct packed_git *p,
+   uint32_t n)
+{
+   const unsigned char *index = p->index_data;
+   if (!index) {
+   if (open_pack_index(p))
+   return NULL;
+   index = p->index_data;
+   }
+   if (n >= p->num_objects)
+   return NULL;
+   index += 4 * 256;
+   if (p->index_version == 1) {
+   return index + 24 * n + 4;
+   } else {
+   index += 8;
+   return index + 20 * n;
+   }
+}
+
+const struct object_id *nth_packed_object_oid(struct object_id *oid,
+ struct packed_git *p,
+ uint32_t n)
+{
+   const unsigned char *hash = nth_packed_object_sha1(p, n);
+   if (!hash)
+   return NULL;
+   hashcpy(oid->hash, hash);
+   return oid;
+}
diff --git a/sha1_file.c b/sha1_file.c
index fe7e0db76..4cd2b1809 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1073,37 +1073,6 @@ int parse_sha1_header(const char *hdr, unsigned long 
*sizep)
return parse_sha1_header_extended(hdr, , 0);
 }
 
-const unsigned char *nth_packed_object_sha1(struct packed_git *p,
-   uint32_t n)
-{
-   const unsigned char *index = p->index_data;
-   if (!index) {
-   if (open_pack_index(p))
-   return NULL;
-   index = p->index_data;
-   }
-   if (n >= p->num_objects)
-   return NULL;
-   index += 4 * 256;
-   if (p->index_version == 1) {
-   return index + 24 * n + 4;
-   } else {
-   index += 8;
-   return index + 20 * n;
-   }
-}
-
-const struct object_id *nth_packed_object_oid(struct object_id *oid,
- struct packed_git *p,
- uint32_t n)
-{
-   const unsigned char *hash = nth_packed_object_sha1(p, n);
-   if (!hash)
-   return NULL;
-   hashcpy(oid->hash, hash);
-   return oid;
-}
-
 void check_pack_index_ptr(const struct packed_git *p, const void *vptr)
 {
const unsigned char *ptr = vptr;
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 25/25] pack: move for_each_packed_object()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 builtin/cat-file.c |  1 +
 cache.h|  7 +--
 pack.h | 11 +++
 packfile.c | 40 
 reachable.c|  1 +
 sha1_file.c| 40 
 6 files changed, 54 insertions(+), 46 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 96b786e48..316ef5c98 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -12,6 +12,7 @@
 #include "streaming.h"
 #include "tree-walk.h"
 #include "sha1-array.h"
+#include "pack.h"
 
 struct batch_options {
int enabled;
diff --git a/cache.h b/cache.h
index 656b39d51..6c3822783 100644
--- a/cache.h
+++ b/cache.h
@@ -1660,17 +1660,12 @@ int for_each_loose_file_in_objdir_buf(struct strbuf 
*path,
  void *data);
 
 /*
- * Iterate over loose and packed objects in both the local
+ * Iterate over loose objects in both the local
  * repository and any alternates repositories (unless the
  * LOCAL_ONLY flag is set).
  */
 #define FOR_EACH_OBJECT_LOCAL_ONLY 0x1
-typedef int each_packed_object_fn(const struct object_id *oid,
- struct packed_git *pack,
- uint32_t pos,
- void *data);
 extern int for_each_loose_object(each_loose_object_fn, void *, unsigned flags);
-extern int for_each_packed_object(each_packed_object_fn, void *, unsigned 
flags);
 
 struct object_info {
/* Request */
diff --git a/pack.h b/pack.h
index 2c2a347ba..905b05be5 100644
--- a/pack.h
+++ b/pack.h
@@ -227,4 +227,15 @@ extern int has_sha1_pack(const unsigned char *sha1);
 
 extern int has_pack_index(const unsigned char *sha1);
 
+/*
+ * Iterate over packed objects in both the local
+ * repository and any alternates repositories (unless the
+ * FOR_EACH_OBJECT_LOCAL_ONLY flag, defined in cache.h, is set).
+ */
+typedef int each_packed_object_fn(const struct object_id *oid,
+ struct packed_git *pack,
+ uint32_t pos,
+ void *data);
+extern int for_each_packed_object(each_packed_object_fn, void *, unsigned 
flags);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 28a16206c..031a40828 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1863,3 +1863,43 @@ int has_pack_index(const unsigned char *sha1)
return 0;
return 1;
 }
+
+static int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn 
cb, void *data)
+{
+   uint32_t i;
+   int r = 0;
+
+   for (i = 0; i < p->num_objects; i++) {
+   struct object_id oid;
+
+   if (!nth_packed_object_oid(, p, i))
+   return error("unable to get sha1 of object %u in %s",
+i, p->pack_name);
+
+   r = cb(, p, i, data);
+   if (r)
+   break;
+   }
+   return r;
+}
+
+int for_each_packed_object(each_packed_object_fn cb, void *data, unsigned 
flags)
+{
+   struct packed_git *p;
+   int r = 0;
+   int pack_errors = 0;
+
+   prepare_packed_git();
+   for (p = packed_git; p; p = p->next) {
+   if ((flags & FOR_EACH_OBJECT_LOCAL_ONLY) && !p->pack_local)
+   continue;
+   if (open_pack_index(p)) {
+   pack_errors = 1;
+   continue;
+   }
+   r = for_each_object_in_pack(p, cb, data);
+   if (r)
+   break;
+   }
+   return r ? r : pack_errors;
+}
diff --git a/reachable.c b/reachable.c
index c62efbfd4..ef606ae17 100644
--- a/reachable.c
+++ b/reachable.c
@@ -9,6 +9,7 @@
 #include "cache-tree.h"
 #include "progress.h"
 #include "list-objects.h"
+#include "pack.h"
 
 struct connectivity_progress {
struct progress *progress;
diff --git a/sha1_file.c b/sha1_file.c
index 8584f6cf2..3f3f9174f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2013,46 +2013,6 @@ int for_each_loose_object(each_loose_object_fn cb, void 
*data, unsigned flags)
return foreach_alt_odb(loose_from_alt_odb, );
 }
 
-static int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn 
cb, void *data)
-{
-   uint32_t i;
-   int r = 0;
-
-   for (i = 0; i < p->num_objects; i++) {
-   struct object_id oid;
-
-   if (!nth_packed_object_oid(, p, i))
-   return error("unable to get sha1 of object %u in %s",
-i, p->pack_name);
-
-   r = cb(, p, i, data);
-   if (r)
-   break;
-   }
-   return r;
-}
-
-int for_each_packed_object(each_packed_object_fn cb, void *data, unsigned 
flags)
-{
-   struct packed_git *p;
-   int r = 0;
-   int pack_errors = 0;
-
-   prepare_packed_git();
-

[PATCH v2 11/25] pack: move {,re}prepare_packed_git and approximate_object_count

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 builtin/gc.c   |   1 +
 cache.h|  15 
 http-backend.c |   1 +
 pack.h |  15 
 packfile.c | 216 +
 path.c |   1 +
 server-info.c  |   1 +
 sha1_file.c| 214 
 8 files changed, 235 insertions(+), 229 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index e6b84475a..f4fe023d3 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -19,6 +19,7 @@
 #include "sigchain.h"
 #include "argv-array.h"
 #include "commit.h"
+#include "pack.h"
 
 #define FAILED_RUN "failed to run %s"
 
diff --git a/cache.h b/cache.h
index 41562dc0b..f020dfade 100644
--- a/cache.h
+++ b/cache.h
@@ -1603,21 +1603,6 @@ struct pack_entry {
struct packed_git *p;
 };
 
-/* A hook to report invalid files in pack directory */
-#define PACKDIR_FILE_PACK 1
-#define PACKDIR_FILE_IDX 2
-#define PACKDIR_FILE_GARBAGE 4
-extern void (*report_garbage)(unsigned seen_bits, const char *path);
-
-extern void prepare_packed_git(void);
-extern void reprepare_packed_git(void);
-
-/*
- * Give a rough count of objects in the repository. This sacrifices accuracy
- * for speed.
- */
-unsigned long approximate_object_count(void);
-
 extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
 struct packed_git *packs);
 
diff --git a/http-backend.c b/http-backend.c
index 519025d2c..12f7d421f 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -9,6 +9,7 @@
 #include "string-list.h"
 #include "url.h"
 #include "argv-array.h"
+#include "pack.h"
 
 static const char content_type[] = "Content-Type";
 static const char content_length[] = "Content-Length";
diff --git a/pack.h b/pack.h
index 576c4fc7c..cad5ed488 100644
--- a/pack.h
+++ b/pack.h
@@ -152,4 +152,19 @@ extern struct packed_git *add_packed_git(const char *path, 
size_t path_len, int
 
 extern void install_packed_git(struct packed_git *pack);
 
+/* A hook to report invalid files in pack directory */
+#define PACKDIR_FILE_PACK 1
+#define PACKDIR_FILE_IDX 2
+#define PACKDIR_FILE_GARBAGE 4
+extern void (*report_garbage)(unsigned seen_bits, const char *path);
+
+extern void prepare_packed_git(void);
+extern void reprepare_packed_git(void);
+
+/*
+ * Give a rough count of objects in the repository. This sacrifices accuracy
+ * for speed.
+ */
+unsigned long approximate_object_count(void);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 4eb65e460..a517172f7 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1,6 +1,8 @@
 #include "cache.h"
 #include "mru.h"
 #include "pack.h"
+#include "dir.h"
+#include "mergesort.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -667,3 +669,217 @@ void install_packed_git(struct packed_git *pack)
pack->next = packed_git;
packed_git = pack;
 }
+
+void (*report_garbage)(unsigned seen_bits, const char *path);
+
+static void report_helper(const struct string_list *list,
+ int seen_bits, int first, int last)
+{
+   if (seen_bits == (PACKDIR_FILE_PACK|PACKDIR_FILE_IDX))
+   return;
+
+   for (; first < last; first++)
+   report_garbage(seen_bits, list->items[first].string);
+}
+
+static void report_pack_garbage(struct string_list *list)
+{
+   int i, baselen = -1, first = 0, seen_bits = 0;
+
+   if (!report_garbage)
+   return;
+
+   string_list_sort(list);
+
+   for (i = 0; i < list->nr; i++) {
+   const char *path = list->items[i].string;
+   if (baselen != -1 &&
+   strncmp(path, list->items[first].string, baselen)) {
+   report_helper(list, seen_bits, first, i);
+   baselen = -1;
+   seen_bits = 0;
+   }
+   if (baselen == -1) {
+   const char *dot = strrchr(path, '.');
+   if (!dot) {
+   report_garbage(PACKDIR_FILE_GARBAGE, path);
+   continue;
+   }
+   baselen = dot - path + 1;
+   first = i;
+   }
+   if (!strcmp(path + baselen, "pack"))
+   seen_bits |= 1;
+   else if (!strcmp(path + baselen, "idx"))
+   seen_bits |= 2;
+   }
+   report_helper(list, seen_bits, first, list->nr);
+}
+
+static void prepare_packed_git_one(char *objdir, int local)
+{
+   struct strbuf path = STRBUF_INIT;
+   size_t dirnamelen;
+   DIR *dir;
+   struct dirent *de;
+   struct string_list garbage = STRING_LIST_INIT_DUP;
+
+   strbuf_addstr(, objdir);
+   strbuf_addstr(, "/pack");
+   dir = opendir(path.buf);
+   if (!dir) {
+   if (errno != ENOENT)
+   error_errno("unable to open

[PATCH v2 06/25] pack: move pack-closing functions

2017-08-08 Thread Jonathan Tan

The function close_pack_fd() needs to be temporarily made global. Its
scope will be restored to static in a subsequent commit.

Signed-off-by: Jonathan Tan 
---
 builtin/am.c|  1 +
 builtin/clone.c |  1 +
 builtin/fetch.c |  1 +
 builtin/merge.c |  1 +
 cache.h |  8 
 pack.h  |  9 +
 packfile.c  | 54 ++
 sha1_file.c | 55 ---
 8 files changed, 67 insertions(+), 63 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index c973bd96d..c38dd10a3 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -31,6 +31,7 @@
 #include "mailinfo.h"
 #include "apply.h"
 #include "string-list.h"
+#include "pack.h"
 
 /**
  * Returns 1 if the file is empty or does not exist, 0 otherwise.
diff --git a/builtin/clone.c b/builtin/clone.c
index 08b5cc433..53410a45d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -25,6 +25,7 @@
 #include "remote.h"
 #include "run-command.h"
 #include "connected.h"
+#include "pack.h"
 
 /*
  * Overall FIXMEs:
diff --git a/builtin/fetch.c b/builtin/fetch.c
index c87e59f3b..196a3bfc4 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -17,6 +17,7 @@
 #include "connected.h"
 #include "argv-array.h"
 #include "utf8.h"
+#include "pack.h"
 
 static const char * const builtin_fetch_usage[] = {
N_("git fetch [] [ [...]]"),
diff --git a/builtin/merge.c b/builtin/merge.c
index 900bafdb4..9cff4b276 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -32,6 +32,7 @@
 #include "gpg-interface.h"
 #include "sequencer.h"
 #include "string-list.h"
+#include "pack.h"
 
 #define DEFAULT_TWOHEAD (1<<0)
 #define DEFAULT_OCTOPUS (1<<1)
diff --git a/cache.h b/cache.h
index 5d6839525..25a21a61f 100644
--- a/cache.h
+++ b/cache.h
@@ -1637,15 +1637,7 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-/*
- * munmap the index file for the specified packfile (if it is
- * currently mmapped).
- */
-extern void close_pack_index(struct packed_git *);
-
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, unsigned long *);
-extern void close_pack_windows(struct packed_git *);
-extern void close_all_packs(void);
 extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
diff --git a/pack.h b/pack.h
index c16220586..fd4668528 100644
--- a/pack.h
+++ b/pack.h
@@ -147,4 +147,13 @@ extern int unuse_one_window(struct packed_git *current);
 
 extern void release_pack_memory(size_t);
 
+extern void close_pack_windows(struct packed_git *);
+extern int close_pack_fd(struct packed_git *);
+/*
+ * munmap the index file for the specified packfile (if it is
+ * currently mmapped).
+ */
+extern void close_pack_index(struct packed_git *);
+extern void close_all_packs(void);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 8daa74ad1..c8e2dbdee 100644
--- a/packfile.c
+++ b/packfile.c
@@ -257,3 +257,57 @@ void release_pack_memory(size_t need)
while (need >= (cur - pack_mapped) && unuse_one_window(NULL))
; /* nothing */
 }
+
+void close_pack_windows(struct packed_git *p)
+{
+   while (p->windows) {
+   struct pack_window *w = p->windows;
+
+   if (w->inuse_cnt)
+   die("pack '%s' still has open windows to it",
+   p->pack_name);
+   munmap(w->base, w->len);
+   pack_mapped -= w->len;
+   pack_open_windows--;
+   p->windows = w->next;
+   free(w);
+   }
+}
+
+int close_pack_fd(struct packed_git *p)
+{
+   if (p->pack_fd < 0)
+   return 0;
+
+   close(p->pack_fd);
+   pack_open_fds--;
+   p->pack_fd = -1;
+
+   return 1;
+}
+
+void close_pack_index(struct packed_git *p)
+{
+   if (p->index_data) {
+   munmap((void *)p->index_data, p->index_size);
+   p->index_data = NULL;
+   }
+}
+
+static void close_pack(struct packed_git *p)
+{
+   close_pack_windows(p);
+   close_pack_fd(p);
+   close_pack_index(p);
+}
+
+void close_all_packs(void)
+{
+   struct packed_git *p;
+
+   for (p = packed_git; p; p = p->next)
+   if (p->do_not_close)
+   die("BUG: want to close pack marked 'do-not-close'");
+   else
+   close_pack(p);
+}
diff --git a/sha1_file.c b/sha1_file.c
index 644876e4e..e2927244f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -717,53 +717,6 @@ void *xmmap(void *start, size_t length,
return ret;
 }
 
-void close_pack_windows(struct packed_git *p)
-{
-   while (p->windows) {
-   struct pack_window *w = p->windows;
-
-   if (w->inuse_cnt)
-   die("pack '%s' still has open windows to

[PATCH v2 10/25] pack: move install_packed_git()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  1 -
 pack.h  |  4 ++--
 packfile.c  | 11 ++-
 sha1_file.c |  9 -
 4 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/cache.h b/cache.h
index bf93477e8..41562dc0b 100644
--- a/cache.h
+++ b/cache.h
@@ -1611,7 +1611,6 @@ extern void (*report_garbage)(unsigned seen_bits, const 
char *path);
 
 extern void prepare_packed_git(void);
 extern void reprepare_packed_git(void);
-extern void install_packed_git(struct packed_git *pack);
 
 /*
  * Give a rough count of objects in the repository. This sacrifices accuracy
diff --git a/pack.h b/pack.h
index c1f3ff32d..576c4fc7c 100644
--- a/pack.h
+++ b/pack.h
@@ -124,8 +124,6 @@ extern char *sha1_pack_name(const unsigned char *sha1);
  */
 extern char *sha1_pack_index_name(const unsigned char *sha1);
 
-extern unsigned int pack_open_fds;
-
 extern void pack_report(void);
 
 /*
@@ -152,4 +150,6 @@ extern unsigned char *use_pack(struct packed_git *, struct 
pack_window **, off_t
 extern void unuse_pack(struct pack_window **);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
 
+extern void install_packed_git(struct packed_git *pack);
+
 #endif
diff --git a/packfile.c b/packfile.c
index efe0ed3e8..4eb65e460 100644
--- a/packfile.c
+++ b/packfile.c
@@ -28,7 +28,7 @@ static unsigned int pack_used_ctr;
 static unsigned int pack_mmap_calls;
 static unsigned int peak_pack_open_windows;
 static unsigned int pack_open_windows;
-unsigned int pack_open_fds;
+static unsigned int pack_open_fds;
 static unsigned int pack_max_fds;
 static size_t peak_pack_mapped;
 static size_t pack_mapped;
@@ -658,3 +658,12 @@ struct packed_git *add_packed_git(const char *path, size_t 
path_len, int local)
hashclr(p->sha1);
return p;
 }
+
+void install_packed_git(struct packed_git *pack)
+{
+   if (pack->pack_fd != -1)
+   pack_open_fds++;
+
+   pack->next = packed_git;
+   packed_git = pack;
+}
diff --git a/sha1_file.c b/sha1_file.c
index 7f12b1ee0..b956ca0c9 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -717,15 +717,6 @@ void *xmmap(void *start, size_t length,
return ret;
 }
 
-void install_packed_git(struct packed_git *pack)
-{
-   if (pack->pack_fd != -1)
-   pack_open_fds++;
-
-   pack->next = packed_git;
-   packed_git = pack;
-}
-
 void (*report_garbage)(unsigned seen_bits, const char *path);
 
 static void report_helper(const struct string_list *list,
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 01/25] pack: move pack name-related functions

2017-08-08 Thread Jonathan Tan

Currently, sha1_file.c and cache.h contain many functions, both related
to and unrelated to packfiles. This makes both files very large and
causes an unclear separation of concerns.

Create a new file, packfile.c, to hold all packfile-related functions
currently in sha1_file.c, and designate pack.h to hold these
packfile-related functions.

In this commit, the pack name-related functions are moved. Subsequent
commits will move the other functions.

Signed-off-by: Jonathan Tan 
---
 Makefile |  1 +
 builtin/pack-redundant.c |  1 +
 cache.h  | 23 ---
 pack.h   | 23 +++
 packfile.c   | 23 +++
 sha1_file.c  | 22 --
 6 files changed, 48 insertions(+), 45 deletions(-)
 create mode 100644 packfile.c

diff --git a/Makefile b/Makefile
index 461c845d3..5cdecaa17 100644
--- a/Makefile
+++ b/Makefile
@@ -816,6 +816,7 @@ LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
 LIB_OBJS += oidset.o
+LIB_OBJS += packfile.o
 LIB_OBJS += pack-bitmap.o
 LIB_OBJS += pack-bitmap-write.o
 LIB_OBJS += pack-check.o
diff --git a/builtin/pack-redundant.c b/builtin/pack-redundant.c
index cb1df1c76..df36d10e7 100644
--- a/builtin/pack-redundant.c
+++ b/builtin/pack-redundant.c
@@ -7,6 +7,7 @@
 */
 
 #include "builtin.h"
+#include "pack.h"
 
 #define BLKSIZE 512
 
diff --git a/cache.h b/cache.h
index 71fe09264..1f0f47819 100644
--- a/cache.h
+++ b/cache.h
@@ -902,20 +902,6 @@ extern void check_repository_format(void);
  */
 extern const char *sha1_file_name(const unsigned char *sha1);
 
-/*
- * Return the name of the (local) packfile with the specified sha1 in
- * its name.  The return value is a pointer to memory that is
- * overwritten each time this function is called.
- */
-extern char *sha1_pack_name(const unsigned char *sha1);
-
-/*
- * Return the name of the (local) pack index file with the specified
- * sha1 in its name.  The return value is a pointer to memory that is
- * overwritten each time this function is called.
- */
-extern char *sha1_pack_index_name(const unsigned char *sha1);
-
 /*
  * Return an abbreviated sha1 unique within this repository's object database.
  * The result will be at least `len` characters long, and will be NUL
@@ -1648,15 +1634,6 @@ extern void pack_report(void);
  */
 extern int odb_mkstemp(struct strbuf *template, const char *pattern);
 
-/*
- * Generate the filename to be used for a pack file with checksum "sha1" and
- * extension "ext". The result is written into the strbuf "buf", overwriting
- * any existing contents. A pointer to buf->buf is returned as a convenience.
- *
- * Example: odb_pack_name(out, sha1, "idx") => 
".git/objects/pack/pack-1234..idx"
- */
-extern char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, 
const char *ext);
-
 /*
  * Create a pack .keep file named "name" (which should generally be the output
  * of odb_pack_name). Returns a file descriptor opened for writing, or -1 on
diff --git a/pack.h b/pack.h
index 8294341af..63bfde00c 100644
--- a/pack.h
+++ b/pack.h
@@ -101,4 +101,27 @@ extern int read_pack_header(int fd, struct pack_header *);
 extern struct sha1file *create_tmp_packfile(char **pack_tmp_name);
 extern void finish_tmp_packfile(struct strbuf *name_buffer, const char 
*pack_tmp_name, struct pack_idx_entry **written_list, uint32_t nr_written, 
struct pack_idx_option *pack_idx_opts, unsigned char sha1[]);
 
+/*
+ * Generate the filename to be used for a pack file with checksum "sha1" and
+ * extension "ext". The result is written into the strbuf "buf", overwriting
+ * any existing contents. A pointer to buf->buf is returned as a convenience.
+ *
+ * Example: odb_pack_name(out, sha1, "idx") => 
".git/objects/pack/pack-1234..idx"
+ */
+extern char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, 
const char *ext);
+
+/*
+ * Return the name of the (local) packfile with the specified sha1 in
+ * its name.  The return value is a pointer to memory that is
+ * overwritten each time this function is called.
+ */
+extern char *sha1_pack_name(const unsigned char *sha1);
+
+/*
+ * Return the name of the (local) pack index file with the specified
+ * sha1 in its name.  The return value is a pointer to memory that is
+ * overwritten each time this function is called.
+ */
+extern char *sha1_pack_index_name(const unsigned char *sha1);
+
 #endif
diff --git a/packfile.c b/packfile.c
new file mode 100644
index 0..0d191dfd6
--- /dev/null
+++ b/packfile.c
@@ -0,0 +1,23 @@
+#include "cache.h"
+
+char *odb_pack_name(struct strbuf *buf,
+   const unsigned char *sha1,
+   const char *ext)
+{
+   strbuf_reset(buf);
+   strbuf_addf(buf, "%s/pack/pack-%s.%s", get_object_directory(),
+   sha1_to_hex(sha1), ext);
+   return buf->buf;
+}
+
+char *sha1_pack_name(const unsigned char *sha1)
+{
+

[PATCH v2 02/25] pack: move static state variables

2017-08-08 Thread Jonathan Tan

sha1_file.c declares some static variables that store packfile-related
state. Move them to packfile.c.

They are temporarily made global, but subsequent commits will restore
their scope back to static.

Signed-off-by: Jonathan Tan 
---
 pack.h  |  9 +
 packfile.c  | 14 ++
 sha1_file.c | 13 -
 3 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/pack.h b/pack.h
index 63bfde00c..7fcd45f7b 100644
--- a/pack.h
+++ b/pack.h
@@ -124,4 +124,13 @@ extern char *sha1_pack_name(const unsigned char *sha1);
  */
 extern char *sha1_pack_index_name(const unsigned char *sha1);
 
+extern unsigned int pack_used_ctr;
+extern unsigned int pack_mmap_calls;
+extern unsigned int peak_pack_open_windows;
+extern unsigned int pack_open_windows;
+extern unsigned int pack_open_fds;
+extern unsigned int pack_max_fds;
+extern size_t peak_pack_mapped;
+extern size_t pack_mapped;
+
 #endif
diff --git a/packfile.c b/packfile.c
index 0d191dfd6..0f46e0617 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "mru.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -21,3 +22,16 @@ char *sha1_pack_index_name(const unsigned char *sha1)
static struct strbuf buf = STRBUF_INIT;
return odb_pack_name(, sha1, "idx");
 }
+
+unsigned int pack_used_ctr;
+unsigned int pack_mmap_calls;
+unsigned int peak_pack_open_windows;
+unsigned int pack_open_windows;
+unsigned int pack_open_fds;
+unsigned int pack_max_fds;
+size_t peak_pack_mapped;
+size_t pack_mapped;
+struct packed_git *packed_git;
+
+static struct mru packed_git_mru_storage;
+struct mru *packed_git_mru = _git_mru_storage;
diff --git a/sha1_file.c b/sha1_file.c
index 7e511ce9e..4d95e21eb 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -682,19 +682,6 @@ static int has_loose_object(const unsigned char *sha1)
return check_and_freshen(sha1, 0);
 }
 
-static unsigned int pack_used_ctr;
-static unsigned int pack_mmap_calls;
-static unsigned int peak_pack_open_windows;
-static unsigned int pack_open_windows;
-static unsigned int pack_open_fds;
-static unsigned int pack_max_fds;
-static size_t peak_pack_mapped;
-static size_t pack_mapped;
-struct packed_git *packed_git;
-
-static struct mru packed_git_mru_storage;
-struct mru *packed_git_mru = _git_mru_storage;
-
 void pack_report(void)
 {
fprintf(stderr,
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 00/25] Move exported packfile funcs to its own file

2017-08-08 Thread Jonathan Tan

Here is the complete patch set. I have only moved the exported functions
that operate with packfiles and their static helpers - for example,
static functions like freshen_packed_object() that are used only by
non-pack-specific functions are not moved.

In the end, 3 functions needed to be made global. They are
find_pack_entry(), mark_bad_packed_object(), and has_packed_and_bad().

Of the 3, find_pack_entry() is probably legitimately promoted. But I
think that the latter two functions needing to be accessed from
sha1_file.c points to a design that could be improved - they are only
used when packed_object_info() detects corruption, and used for marking
as bad and printing messages to the user respectively, which
packed_object_info() should probably do itself. But I have not made this
change in this patch set.

(Other than the 3 functions above, there are some variables and
functions that are temporarily made global, but reduced back to static
when the wide scope is no longer needed.)

Jonathan Tan (25):
  pack: move pack name-related functions
  pack: move static state variables
  pack: move pack_report()
  pack: move open_pack_index(), parse_pack_index()
  pack: move release_pack_memory()
  pack: move pack-closing functions
  pack: move use_pack()
  pack: move unuse_pack()
  pack: move add_packed_git()
  pack: move install_packed_git()
  pack: move {,re}prepare_packed_git and approximate_object_count
  pack: move unpack_object_header()
  pack: move get_size_from_delta()
  pack: move unpack_object_header()
  sha1_file: set whence in storage-specific info fn
  sha1_file: remove read_packed_sha1()
  pack: move packed_object_info(), unpack_entry()
  pack: move nth_packed_object_{sha1,oid}
  pack: move check_pack_index_ptr(), nth_packed_object_offset()
  pack: move find_pack_entry_one(), is_pack_valid()
  pack: move find_sha1_pack()
  pack: move find_pack_entry() and make it global
  pack: move has_sha1_pack()
  pack: move has_pack_index()
  pack: move for_each_packed_object()

 Makefile |1 +
 builtin/am.c |1 +
 builtin/cat-file.c   |1 +
 builtin/clone.c  |1 +
 builtin/count-objects.c  |1 +
 builtin/fetch.c  |1 +
 builtin/gc.c |1 +
 builtin/merge.c  |1 +
 builtin/pack-redundant.c |1 +
 builtin/prune-packed.c   |1 +
 cache.h  |  122 +--
 connected.c  |1 +
 diff.c   |1 +
 git-compat-util.h|2 -
 http-backend.c   |1 +
 http-push.c  |1 +
 http-walker.c|1 +
 pack.h   |  137 +++
 packfile.c   | 1905 +++
 path.c   |1 +
 reachable.c  |1 +
 revision.c   |1 +
 server-info.c|1 +
 sha1_file.c  | 2484 ++
 sha1_name.c  |1 +
 streaming.c  |1 +
 26 files changed, 2350 insertions(+), 2321 deletions(-)
 create mode 100644 packfile.c

-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 08/25] pack: move unuse_pack()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h | 1 -
 pack.h  | 1 +
 packfile.c  | 9 +
 sha1_file.c | 9 -
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index dd9f9a9ae..4812f3a63 100644
--- a/cache.h
+++ b/cache.h
@@ -1637,7 +1637,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
 
diff --git a/pack.h b/pack.h
index bf2b99bf9..3876e9ae6 100644
--- a/pack.h
+++ b/pack.h
@@ -149,5 +149,6 @@ extern void close_all_packs(void);
 extern int open_packed_git(struct packed_git *p);
 
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, unsigned long *);
+extern void unuse_pack(struct pack_window **);
 
 #endif
diff --git a/packfile.c b/packfile.c
index 85cb65558..93526ea7b 100644
--- a/packfile.c
+++ b/packfile.c
@@ -596,3 +596,12 @@ unsigned char *use_pack(struct packed_git *p,
*left = win->len - xsize_t(offset);
return win->base + offset;
 }
+
+void unuse_pack(struct pack_window **w_cursor)
+{
+   struct pack_window *w = *w_cursor;
+   if (w) {
+   w->inuse_cnt--;
+   *w_cursor = NULL;
+   }
+}
diff --git a/sha1_file.c b/sha1_file.c
index 8f17a07e9..12501ef06 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -717,15 +717,6 @@ void *xmmap(void *start, size_t length,
return ret;
 }
 
-void unuse_pack(struct pack_window **w_cursor)
-{
-   struct pack_window *w = *w_cursor;
-   if (w) {
-   w->inuse_cnt--;
-   *w_cursor = NULL;
-   }
-}
-
 static struct packed_git *alloc_packed_git(int extra)
 {
struct packed_git *p = xmalloc(st_add(sizeof(*p), extra));
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 14/25] pack: move unpack_object_header()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  1 -
 pack.h  |  1 +
 packfile.c  | 26 ++
 sha1_file.c | 26 --
 4 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/cache.h b/cache.h
index e29918c75..1aea0 100644
--- a/cache.h
+++ b/cache.h
@@ -1661,7 +1661,6 @@ extern off_t find_pack_entry_one(const unsigned char 
*sha1, struct packed_git *)
 
 extern int is_pack_valid(struct packed_git *);
 extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, 
unsigned long *);
-extern int unpack_object_header(struct packed_git *, struct pack_window **, 
off_t *, unsigned long *);
 
 /*
  * Iterate over the files in the loose-object parts of the object
diff --git a/pack.h b/pack.h
index 69c92d8d2..5e3552392 100644
--- a/pack.h
+++ b/pack.h
@@ -169,5 +169,6 @@ unsigned long approximate_object_count(void);
 
 extern unsigned long unpack_object_header_buffer(const unsigned char *buf, 
unsigned long len, enum object_type *type, unsigned long *sizep);
 extern unsigned long get_size_from_delta(struct packed_git *, struct 
pack_window **, off_t);
+extern int unpack_object_header(struct packed_git *, struct pack_window **, 
off_t *, unsigned long *);
 
 #endif
diff --git a/packfile.c b/packfile.c
index 511afad85..a4db78ea0 100644
--- a/packfile.c
+++ b/packfile.c
@@ -948,3 +948,29 @@ unsigned long get_size_from_delta(struct packed_git *p,
/* Read the result size */
return get_delta_hdr_size(, delta_head+sizeof(delta_head));
 }
+
+int unpack_object_header(struct packed_git *p,
+struct pack_window **w_curs,
+off_t *curpos,
+unsigned long *sizep)
+{
+   unsigned char *base;
+   unsigned long left;
+   unsigned long used;
+   enum object_type type;
+
+   /* use_pack() assures us we have [base, base + 20) available
+* as a range that we can look at.  (Its actually the hash
+* size that is assured.)  With our object header encoding
+* the maximum deflated object size is 2^137, which is just
+* insane, so we know won't exceed what we have been given.
+*/
+   base = use_pack(p, w_curs, *curpos, );
+   used = unpack_object_header_buffer(base, left, , sizep);
+   if (!used) {
+   type = OBJ_BAD;
+   } else
+   *curpos += used;
+
+   return type;
+}
diff --git a/sha1_file.c b/sha1_file.c
index 7d354d9b6..f3bcdae17 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1170,32 +1170,6 @@ static const unsigned char *get_delta_base_sha1(struct 
packed_git *p,
return NULL;
 }
 
-int unpack_object_header(struct packed_git *p,
-struct pack_window **w_curs,
-off_t *curpos,
-unsigned long *sizep)
-{
-   unsigned char *base;
-   unsigned long left;
-   unsigned long used;
-   enum object_type type;
-
-   /* use_pack() assures us we have [base, base + 20) available
-* as a range that we can look at.  (Its actually the hash
-* size that is assured.)  With our object header encoding
-* the maximum deflated object size is 2^137, which is just
-* insane, so we know won't exceed what we have been given.
-*/
-   base = use_pack(p, w_curs, *curpos, );
-   used = unpack_object_header_buffer(base, left, , sizep);
-   if (!used) {
-   type = OBJ_BAD;
-   } else
-   *curpos += used;
-
-   return type;
-}
-
 static int retry_bad_packed_offset(struct packed_git *p, off_t obj_offset)
 {
int type;
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 03/25] pack: move pack_report()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  2 --
 pack.h  |  2 ++
 packfile.c  | 24 
 sha1_file.c | 24 
 4 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/cache.h b/cache.h
index 1f0f47819..c7f802e4a 100644
--- a/cache.h
+++ b/cache.h
@@ -1624,8 +1624,6 @@ unsigned long approximate_object_count(void);
 extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
 struct packed_git *packs);
 
-extern void pack_report(void);
-
 /*
  * Create a temporary file rooted in the object database directory, or
  * die on failure. The filename is taken from "pattern", which should have the
diff --git a/pack.h b/pack.h
index 7fcd45f7b..6098bfe40 100644
--- a/pack.h
+++ b/pack.h
@@ -133,4 +133,6 @@ extern unsigned int pack_max_fds;
 extern size_t peak_pack_mapped;
 extern size_t pack_mapped;
 
+extern void pack_report(void);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 0f46e0617..60d9fc3b0 100644
--- a/packfile.c
+++ b/packfile.c
@@ -35,3 +35,27 @@ struct packed_git *packed_git;
 
 static struct mru packed_git_mru_storage;
 struct mru *packed_git_mru = _git_mru_storage;
+
+#define SZ_FMT PRIuMAX
+static inline uintmax_t sz_fmt(size_t s) { return s; }
+
+void pack_report(void)
+{
+   fprintf(stderr,
+   "pack_report: getpagesize()= %10" SZ_FMT "\n"
+   "pack_report: core.packedGitWindowSize = %10" SZ_FMT "\n"
+   "pack_report: core.packedGitLimit  = %10" SZ_FMT "\n",
+   sz_fmt(getpagesize()),
+   sz_fmt(packed_git_window_size),
+   sz_fmt(packed_git_limit));
+   fprintf(stderr,
+   "pack_report: pack_used_ctr= %10u\n"
+   "pack_report: pack_mmap_calls  = %10u\n"
+   "pack_report: pack_open_windows= %10u / %10u\n"
+   "pack_report: pack_mapped  = "
+   "%10" SZ_FMT " / %10" SZ_FMT "\n",
+   pack_used_ctr,
+   pack_mmap_calls,
+   pack_open_windows, peak_pack_open_windows,
+   sz_fmt(pack_mapped), sz_fmt(peak_pack_mapped));
+}
diff --git a/sha1_file.c b/sha1_file.c
index 4d95e21eb..0de39f480 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -29,9 +29,6 @@
 #include "mergesort.h"
 #include "quote.h"
 
-#define SZ_FMT PRIuMAX
-static inline uintmax_t sz_fmt(size_t s) { return s; }
-
 const unsigned char null_sha1[20];
 const struct object_id null_oid;
 const struct object_id empty_tree_oid = {
@@ -682,27 +679,6 @@ static int has_loose_object(const unsigned char *sha1)
return check_and_freshen(sha1, 0);
 }
 
-void pack_report(void)
-{
-   fprintf(stderr,
-   "pack_report: getpagesize()= %10" SZ_FMT "\n"
-   "pack_report: core.packedGitWindowSize = %10" SZ_FMT "\n"
-   "pack_report: core.packedGitLimit  = %10" SZ_FMT "\n",
-   sz_fmt(getpagesize()),
-   sz_fmt(packed_git_window_size),
-   sz_fmt(packed_git_limit));
-   fprintf(stderr,
-   "pack_report: pack_used_ctr= %10u\n"
-   "pack_report: pack_mmap_calls  = %10u\n"
-   "pack_report: pack_open_windows= %10u / %10u\n"
-   "pack_report: pack_mapped  = "
-   "%10" SZ_FMT " / %10" SZ_FMT "\n",
-   pack_used_ctr,
-   pack_mmap_calls,
-   pack_open_windows, peak_pack_open_windows,
-   sz_fmt(pack_mapped), sz_fmt(peak_pack_mapped));
-}
-
 /*
  * Open and mmap the index file at path, perform a couple of
  * consistency checks, then record its information to p.  Return 0 on
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 05/25] pack: move release_pack_memory()

2017-08-08 Thread Jonathan Tan

The function unuse_one_window() needs to be temporarily made global. Its
scope will be restored to static in a subsequent commit.

Signed-off-by: Jonathan Tan 
---
 git-compat-util.h |  2 --
 pack.h|  4 
 packfile.c| 49 +
 sha1_file.c   | 49 -
 4 files changed, 53 insertions(+), 51 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index db9c22de7..201056e2d 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -749,8 +749,6 @@ const char *inet_ntop(int af, const void *src, char *dst, 
size_t size);
 extern int git_atexit(void (*handler)(void));
 #endif
 
-extern void release_pack_memory(size_t);
-
 typedef void (*try_to_free_t)(size_t);
 extern try_to_free_t set_try_to_free_routine(try_to_free_t);
 
diff --git a/pack.h b/pack.h
index 5be0ed42a..c16220586 100644
--- a/pack.h
+++ b/pack.h
@@ -143,4 +143,8 @@ extern int open_pack_index(struct packed_git *);
 
 extern struct packed_git *parse_pack_index(unsigned char *sha1, const char 
*idx_path);
 
+extern int unuse_one_window(struct packed_git *current);
+
+extern void release_pack_memory(size_t);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 6edc43228..8daa74ad1 100644
--- a/packfile.c
+++ b/packfile.c
@@ -208,3 +208,52 @@ struct packed_git *parse_pack_index(unsigned char *sha1, 
const char *idx_path)
 
return p;
 }
+
+static void scan_windows(struct packed_git *p,
+   struct packed_git **lru_p,
+   struct pack_window **lru_w,
+   struct pack_window **lru_l)
+{
+   struct pack_window *w, *w_l;
+
+   for (w_l = NULL, w = p->windows; w; w = w->next) {
+   if (!w->inuse_cnt) {
+   if (!*lru_w || w->last_used < (*lru_w)->last_used) {
+   *lru_p = p;
+   *lru_w = w;
+   *lru_l = w_l;
+   }
+   }
+   w_l = w;
+   }
+}
+
+int unuse_one_window(struct packed_git *current)
+{
+   struct packed_git *p, *lru_p = NULL;
+   struct pack_window *lru_w = NULL, *lru_l = NULL;
+
+   if (current)
+   scan_windows(current, _p, _w, _l);
+   for (p = packed_git; p; p = p->next)
+   scan_windows(p, _p, _w, _l);
+   if (lru_p) {
+   munmap(lru_w->base, lru_w->len);
+   pack_mapped -= lru_w->len;
+   if (lru_l)
+   lru_l->next = lru_w->next;
+   else
+   lru_p->windows = lru_w->next;
+   free(lru_w);
+   pack_open_windows--;
+   return 1;
+   }
+   return 0;
+}
+
+void release_pack_memory(size_t need)
+{
+   size_t cur = pack_mapped;
+   while (need >= (cur - pack_mapped) && unuse_one_window(NULL))
+   ; /* nothing */
+}
diff --git a/sha1_file.c b/sha1_file.c
index 2e414f5f5..644876e4e 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -679,55 +679,6 @@ static int has_loose_object(const unsigned char *sha1)
return check_and_freshen(sha1, 0);
 }
 
-static void scan_windows(struct packed_git *p,
-   struct packed_git **lru_p,
-   struct pack_window **lru_w,
-   struct pack_window **lru_l)
-{
-   struct pack_window *w, *w_l;
-
-   for (w_l = NULL, w = p->windows; w; w = w->next) {
-   if (!w->inuse_cnt) {
-   if (!*lru_w || w->last_used < (*lru_w)->last_used) {
-   *lru_p = p;
-   *lru_w = w;
-   *lru_l = w_l;
-   }
-   }
-   w_l = w;
-   }
-}
-
-static int unuse_one_window(struct packed_git *current)
-{
-   struct packed_git *p, *lru_p = NULL;
-   struct pack_window *lru_w = NULL, *lru_l = NULL;
-
-   if (current)
-   scan_windows(current, _p, _w, _l);
-   for (p = packed_git; p; p = p->next)
-   scan_windows(p, _p, _w, _l);
-   if (lru_p) {
-   munmap(lru_w->base, lru_w->len);
-   pack_mapped -= lru_w->len;
-   if (lru_l)
-   lru_l->next = lru_w->next;
-   else
-   lru_p->windows = lru_w->next;
-   free(lru_w);
-   pack_open_windows--;
-   return 1;
-   }
-   return 0;
-}
-
-void release_pack_memory(size_t need)
-{
-   size_t cur = pack_mapped;
-   while (need >= (cur - pack_mapped) && unuse_one_window(NULL))
-   ; /* nothing */
-}
-
 static void mmap_limit_check(size_t length)
 {
static size_t limit = 0;
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 15/25] sha1_file: set whence in storage-specific info fn

2017-08-08 Thread Jonathan Tan

Move the setting of oi->whence to sha1_loose_object_info() and
packed_object_info(). This allows sha1_object_info_extended() to not
need to know about the delta base cache.

Signed-off-by: Jonathan Tan 
---
 sha1_file.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index f3bcdae17..9eadda388 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1473,6 +1473,9 @@ int packed_object_info(struct packed_git *p, off_t 
obj_offset,
hashclr(oi->delta_base_sha1);
}
 
+   oi->whence = in_delta_base_cache(p, obj_offset) ? OI_DBCACHED :
+ OI_PACKED;
+
 out:
unuse_pack(_curs);
return type;
@@ -2002,6 +2005,7 @@ static int sha1_loose_object_info(const unsigned char 
*sha1,
if (oi->sizep == _scratch)
oi->sizep = NULL;
strbuf_release();
+   oi->whence = OI_LOOSE;
return (status < 0) ? status : 0;
 }
 
@@ -2039,10 +2043,8 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi,
 
if (!find_pack_entry(real, )) {
/* Most likely it's a loose object. */
-   if (!sha1_loose_object_info(real, oi, flags)) {
-   oi->whence = OI_LOOSE;
+   if (!sha1_loose_object_info(real, oi, flags))
return 0;
-   }
 
/* Not a loose object; someone else may have just packed it. */
if (flags & OBJECT_INFO_QUICK) {
@@ -2065,10 +2067,7 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi,
if (rtype < 0) {
mark_bad_packed_object(e.p, real);
return sha1_object_info_extended(real, oi, 0);
-   } else if (in_delta_base_cache(e.p, e.offset)) {
-   oi->whence = OI_DBCACHED;
-   } else {
-   oi->whence = OI_PACKED;
+   } else if (oi->whence == OI_PACKED) {
oi->u.packed.offset = e.offset;
oi->u.packed.pack = e.p;
oi->u.packed.is_delta = (rtype == OBJ_REF_DELTA ||
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 20/25] pack: move find_pack_entry_one(), is_pack_valid()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  8 --
 pack.h  | 10 ++--
 packfile.c  | 85 +
 sha1_file.c | 84 
 4 files changed, 93 insertions(+), 94 deletions(-)

diff --git a/cache.h b/cache.h
index 7686ccb30..b944aca69 100644
--- a/cache.h
+++ b/cache.h
@@ -1618,14 +1618,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-/*
- * If the object named sha1 is present in the specified packfile,
- * return its offset within the packfile; otherwise, return 0.
- */
-extern off_t find_pack_entry_one(const unsigned char *sha1, struct packed_git 
*);
-
-extern int is_pack_valid(struct packed_git *);
-
 /*
  * Iterate over the files in the loose-object parts of the object
  * directory "path", triggering the following callbacks:
diff --git a/pack.h b/pack.h
index e0e206e3c..f5bd94813 100644
--- a/pack.h
+++ b/pack.h
@@ -144,8 +144,6 @@ extern void close_pack_windows(struct packed_git *);
 extern void close_pack_index(struct packed_git *);
 extern void close_all_packs(void);
 
-extern int open_packed_git(struct packed_git *p);
-
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, unsigned long *);
 extern void unuse_pack(struct pack_window **);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
@@ -212,4 +210,12 @@ extern void check_pack_index_ptr(const struct packed_git 
*p, const void *ptr);
  */
 extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t n);
 
+/*
+ * If the object named sha1 is present in the specified packfile,
+ * return its offset within the packfile; otherwise, return 0.
+ */
+extern off_t find_pack_entry_one(const unsigned char *sha1, struct packed_git 
*);
+
+extern int is_pack_valid(struct packed_git *);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 94c8af991..71017d2ec 100644
--- a/packfile.c
+++ b/packfile.c
@@ -6,6 +6,7 @@
 #include "delta.h"
 #include "list.h"
 #include "streaming.h"
+#include "sha1-lookup.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -1698,3 +1699,87 @@ off_t nth_packed_object_offset(const struct packed_git 
*p, uint32_t n)
   ntohl(*((uint32_t *)(index + 4)));
}
 }
+
+off_t find_pack_entry_one(const unsigned char *sha1,
+ struct packed_git *p)
+{
+   const uint32_t *level1_ofs = p->index_data;
+   const unsigned char *index = p->index_data;
+   unsigned hi, lo, stride;
+   static int use_lookup = -1;
+   static int debug_lookup = -1;
+
+   if (debug_lookup < 0)
+   debug_lookup = !!getenv("GIT_DEBUG_LOOKUP");
+
+   if (!index) {
+   if (open_pack_index(p))
+   return 0;
+   level1_ofs = p->index_data;
+   index = p->index_data;
+   }
+   if (p->index_version > 1) {
+   level1_ofs += 2;
+   index += 8;
+   }
+   index += 4 * 256;
+   hi = ntohl(level1_ofs[*sha1]);
+   lo = ((*sha1 == 0x0) ? 0 : ntohl(level1_ofs[*sha1 - 1]));
+   if (p->index_version > 1) {
+   stride = 20;
+   } else {
+   stride = 24;
+   index += 4;
+   }
+
+   if (debug_lookup)
+   printf("%02x%02x%02x... lo %u hi %u nr %"PRIu32"\n",
+  sha1[0], sha1[1], sha1[2], lo, hi, p->num_objects);
+
+   if (use_lookup < 0)
+   use_lookup = !!getenv("GIT_USE_LOOKUP");
+   if (use_lookup) {
+   int pos = sha1_entry_pos(index, stride, 0,
+lo, hi, p->num_objects, sha1);
+   if (pos < 0)
+   return 0;
+   return nth_packed_object_offset(p, pos);
+   }
+
+   do {
+   unsigned mi = (lo + hi) / 2;
+   int cmp = hashcmp(index + mi * stride, sha1);
+
+   if (debug_lookup)
+   printf("lo %u hi %u rg %u mi %u\n",
+  lo, hi, hi - lo, mi);
+   if (!cmp)
+   return nth_packed_object_offset(p, mi);
+   if (cmp > 0)
+   hi = mi;
+   else
+   lo = mi+1;
+   } while (lo < hi);
+   return 0;
+}
+
+int is_pack_valid(struct packed_git *p)
+{
+   /* An already open pack is known to be valid. */
+   if (p->pack_fd != -1)
+   return 1;
+
+   /* If the pack has one window completely covering the
+* file size, the pack is known to be valid even if
+* the descriptor is not currently open.
+*/
+   if (p->windows) {
+   struct pack_window *w = p->windows;
+
+   if (!w->offset && w->len ==

[PATCH v2 21/25] pack: move find_sha1_pack()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h   |  3 ---
 http-push.c   |  1 +
 http-walker.c |  1 +
 pack.h|  3 +++
 packfile.c| 13 +
 sha1_file.c   | 13 -
 6 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/cache.h b/cache.h
index b944aca69..06a8caae6 100644
--- a/cache.h
+++ b/cache.h
@@ -1600,9 +1600,6 @@ struct pack_entry {
struct packed_git *p;
 };
 
-extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
-struct packed_git *packs);
-
 /*
  * Create a temporary file rooted in the object database directory, or
  * die on failure. The filename is taken from "pattern", which should have the
diff --git a/http-push.c b/http-push.c
index c91f40a61..4e8a227d1 100644
--- a/http-push.c
+++ b/http-push.c
@@ -11,6 +11,7 @@
 #include "list-objects.h"
 #include "sigchain.h"
 #include "argv-array.h"
+#include "pack.h"
 
 #ifdef EXPAT_NEEDS_XMLPARSE_H
 #include 
diff --git a/http-walker.c b/http-walker.c
index ee049cb13..d6f0af944 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -4,6 +4,7 @@
 #include "http.h"
 #include "list.h"
 #include "transport.h"
+#include "pack.h"
 
 struct alt_base {
char *base;
diff --git a/pack.h b/pack.h
index f5bd94813..0517d6542 100644
--- a/pack.h
+++ b/pack.h
@@ -218,4 +218,7 @@ extern off_t find_pack_entry_one(const unsigned char *sha1, 
struct packed_git *)
 
 extern int is_pack_valid(struct packed_git *);
 
+extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
+struct packed_git *packs);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 71017d2ec..f16b56262 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1783,3 +1783,16 @@ int is_pack_valid(struct packed_git *p)
/* Force the pack to open to prove its valid. */
return !open_packed_git(p);
 }
+
+struct packed_git *find_sha1_pack(const unsigned char *sha1,
+ struct packed_git *packs)
+{
+   struct packed_git *p;
+
+   for (p = packs; p; p = p->next) {
+   if (find_pack_entry_one(sha1, p))
+   return p;
+   }
+   return NULL;
+
+}
diff --git a/sha1_file.c b/sha1_file.c
index 75b9ceb39..229358663 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1126,19 +1126,6 @@ static int find_pack_entry(const unsigned char *sha1, 
struct pack_entry *e)
return 0;
 }
 
-struct packed_git *find_sha1_pack(const unsigned char *sha1,
- struct packed_git *packs)
-{
-   struct packed_git *p;
-
-   for (p = packs; p; p = p->next) {
-   if (find_pack_entry_one(sha1, p))
-   return p;
-   }
-   return NULL;
-
-}
-
 static int sha1_loose_object_info(const unsigned char *sha1,
  struct object_info *oi,
  int flags)
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 19/25] pack: move check_pack_index_ptr(), nth_packed_object_offset()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h | 16 
 pack.h  | 16 
 packfile.c  | 33 +
 sha1_file.c | 33 -
 4 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/cache.h b/cache.h
index f083d532e..7686ccb30 100644
--- a/cache.h
+++ b/cache.h
@@ -1618,22 +1618,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-/*
- * Make sure that a pointer access into an mmap'd index file is within bounds,
- * and can provide at least 8 bytes of data.
- *
- * Note that this is only necessary for variable-length segments of the file
- * (like the 64-bit extended offset table), as we compare the size to the
- * fixed-length parts when we open the file.
- */
-extern void check_pack_index_ptr(const struct packed_git *p, const void *ptr);
-
-/*
- * Return the offset of the nth object within the specified packfile.
- * The index must already be opened.
- */
-extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t n);
-
 /*
  * If the object named sha1 is present in the specified packfile,
  * return its offset within the packfile; otherwise, return 0.
diff --git a/pack.h b/pack.h
index 023c97b37..e0e206e3c 100644
--- a/pack.h
+++ b/pack.h
@@ -196,4 +196,20 @@ extern const unsigned char *nth_packed_object_sha1(struct 
packed_git *, uint32_t
  */
 extern const struct object_id *nth_packed_object_oid(struct object_id *, 
struct packed_git *, uint32_t n);
 
+/*
+ * Make sure that a pointer access into an mmap'd index file is within bounds,
+ * and can provide at least 8 bytes of data.
+ *
+ * Note that this is only necessary for variable-length segments of the file
+ * (like the 64-bit extended offset table), as we compare the size to the
+ * fixed-length parts when we open the file.
+ */
+extern void check_pack_index_ptr(const struct packed_git *p, const void *ptr);
+
+/*
+ * Return the offset of the nth object within the specified packfile.
+ * The index must already be opened.
+ */
+extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t n);
+
 #endif
diff --git a/packfile.c b/packfile.c
index b16cf648a..94c8af991 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1665,3 +1665,36 @@ const struct object_id *nth_packed_object_oid(struct 
object_id *oid,
hashcpy(oid->hash, hash);
return oid;
 }
+
+void check_pack_index_ptr(const struct packed_git *p, const void *vptr)
+{
+   const unsigned char *ptr = vptr;
+   const unsigned char *start = p->index_data;
+   const unsigned char *end = start + p->index_size;
+   if (ptr < start)
+   die(_("offset before start of pack index for %s (corrupt 
index?)"),
+   p->pack_name);
+   /* No need to check for underflow; .idx files must be at least 8 bytes 
*/
+   if (ptr >= end - 8)
+   die(_("offset beyond end of pack index for %s (truncated 
index?)"),
+   p->pack_name);
+}
+
+off_t nth_packed_object_offset(const struct packed_git *p, uint32_t n)
+{
+   const unsigned char *index = p->index_data;
+   index += 4 * 256;
+   if (p->index_version == 1) {
+   return ntohl(*((uint32_t *)(index + 24 * n)));
+   } else {
+   uint32_t off;
+   index += 8 + p->num_objects * (20 + 4);
+   off = ntohl(*((uint32_t *)(index + 4 * n)));
+   if (!(off & 0x8000))
+   return off;
+   index += p->num_objects * 4 + (off & 0x7fff) * 8;
+   check_pack_index_ptr(p, index);
+   return (((uint64_t)ntohl(*((uint32_t *)(index + 0 << 32) |
+  ntohl(*((uint32_t *)(index + 4)));
+   }
+}
diff --git a/sha1_file.c b/sha1_file.c
index 4cd2b1809..0f4d68c5a 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1073,39 +1073,6 @@ int parse_sha1_header(const char *hdr, unsigned long 
*sizep)
return parse_sha1_header_extended(hdr, , 0);
 }
 
-void check_pack_index_ptr(const struct packed_git *p, const void *vptr)
-{
-   const unsigned char *ptr = vptr;
-   const unsigned char *start = p->index_data;
-   const unsigned char *end = start + p->index_size;
-   if (ptr < start)
-   die(_("offset before start of pack index for %s (corrupt 
index?)"),
-   p->pack_name);
-   /* No need to check for underflow; .idx files must be at least 8 bytes 
*/
-   if (ptr >= end - 8)
-   die(_("offset beyond end of pack index for %s (truncated 
index?)"),
-   p->pack_name);
-}
-
-off_t nth_packed_object_offset(const struct packed_git *p, uint32_t n)
-{
-   const unsigned char *index = p->index_data;
-   index += 4 * 256;
-   if (p->index_version == 1) {
-   return ntohl(*((uint32_t *)(index + 24 * n)));
-   } else {
-

[PATCH v2 22/25] pack: move find_pack_entry() and make it global

2017-08-08 Thread Jonathan Tan

This function needs to be global as it is used by sha1_file.c and will
be used by packfile.c.

Signed-off-by: Jonathan Tan 
---
 pack.h  |  2 ++
 packfile.c  | 53 +
 sha1_file.c | 53 -
 3 files changed, 55 insertions(+), 53 deletions(-)

diff --git a/pack.h b/pack.h
index 0517d6542..1021a781c 100644
--- a/pack.h
+++ b/pack.h
@@ -221,4 +221,6 @@ extern int is_pack_valid(struct packed_git *);
 extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
 struct packed_git *packs);
 
+extern int find_pack_entry(const unsigned char *sha1, struct pack_entry *e);
+
 #endif
diff --git a/packfile.c b/packfile.c
index f16b56262..0f1e3338b 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1796,3 +1796,56 @@ struct packed_git *find_sha1_pack(const unsigned char 
*sha1,
return NULL;
 
 }
+
+static int fill_pack_entry(const unsigned char *sha1,
+  struct pack_entry *e,
+  struct packed_git *p)
+{
+   off_t offset;
+
+   if (p->num_bad_objects) {
+   unsigned i;
+   for (i = 0; i < p->num_bad_objects; i++)
+   if (!hashcmp(sha1, p->bad_object_sha1 + 20 * i))
+   return 0;
+   }
+
+   offset = find_pack_entry_one(sha1, p);
+   if (!offset)
+   return 0;
+
+   /*
+* We are about to tell the caller where they can locate the
+* requested object.  We better make sure the packfile is
+* still here and can be accessed before supplying that
+* answer, as it may have been deleted since the index was
+* loaded!
+*/
+   if (!is_pack_valid(p))
+   return 0;
+   e->offset = offset;
+   e->p = p;
+   hashcpy(e->sha1, sha1);
+   return 1;
+}
+
+/*
+ * Iff a pack file contains the object named by sha1, return true and
+ * store its location to e.
+ */
+int find_pack_entry(const unsigned char *sha1, struct pack_entry *e)
+{
+   struct mru_entry *p;
+
+   prepare_packed_git();
+   if (!packed_git)
+   return 0;
+
+   for (p = packed_git_mru->head; p; p = p->next) {
+   if (fill_pack_entry(sha1, e, p->item)) {
+   mru_mark(packed_git_mru, p);
+   return 1;
+   }
+   }
+   return 0;
+}
diff --git a/sha1_file.c b/sha1_file.c
index 229358663..1a505eae5 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1073,59 +1073,6 @@ int parse_sha1_header(const char *hdr, unsigned long 
*sizep)
return parse_sha1_header_extended(hdr, , 0);
 }
 
-static int fill_pack_entry(const unsigned char *sha1,
-  struct pack_entry *e,
-  struct packed_git *p)
-{
-   off_t offset;
-
-   if (p->num_bad_objects) {
-   unsigned i;
-   for (i = 0; i < p->num_bad_objects; i++)
-   if (!hashcmp(sha1, p->bad_object_sha1 + 20 * i))
-   return 0;
-   }
-
-   offset = find_pack_entry_one(sha1, p);
-   if (!offset)
-   return 0;
-
-   /*
-* We are about to tell the caller where they can locate the
-* requested object.  We better make sure the packfile is
-* still here and can be accessed before supplying that
-* answer, as it may have been deleted since the index was
-* loaded!
-*/
-   if (!is_pack_valid(p))
-   return 0;
-   e->offset = offset;
-   e->p = p;
-   hashcpy(e->sha1, sha1);
-   return 1;
-}
-
-/*
- * Iff a pack file contains the object named by sha1, return true and
- * store its location to e.
- */
-static int find_pack_entry(const unsigned char *sha1, struct pack_entry *e)
-{
-   struct mru_entry *p;
-
-   prepare_packed_git();
-   if (!packed_git)
-   return 0;
-
-   for (p = packed_git_mru->head; p; p = p->next) {
-   if (fill_pack_entry(sha1, e, p->item)) {
-   mru_mark(packed_git_mru, p);
-   return 1;
-   }
-   }
-   return 0;
-}
-
 static int sha1_loose_object_info(const unsigned char *sha1,
  struct object_info *oi,
  int flags)
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 12/25] pack: move unpack_object_header()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  1 -
 pack.h  |  2 ++
 packfile.c  | 25 +
 sha1_file.c | 25 -
 4 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/cache.h b/cache.h
index f020dfade..9c70759a6 100644
--- a/cache.h
+++ b/cache.h
@@ -1661,7 +1661,6 @@ extern off_t find_pack_entry_one(const unsigned char 
*sha1, struct packed_git *)
 
 extern int is_pack_valid(struct packed_git *);
 extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, 
unsigned long *);
-extern unsigned long unpack_object_header_buffer(const unsigned char *buf, 
unsigned long len, enum object_type *type, unsigned long *sizep);
 extern unsigned long get_size_from_delta(struct packed_git *, struct 
pack_window **, off_t);
 extern int unpack_object_header(struct packed_git *, struct pack_window **, 
off_t *, unsigned long *);
 
diff --git a/pack.h b/pack.h
index cad5ed488..4a7f88a38 100644
--- a/pack.h
+++ b/pack.h
@@ -167,4 +167,6 @@ extern void reprepare_packed_git(void);
  */
 unsigned long approximate_object_count(void);
 
+extern unsigned long unpack_object_header_buffer(const unsigned char *buf, 
unsigned long len, enum object_type *type, unsigned long *sizep);
+
 #endif
diff --git a/packfile.c b/packfile.c
index a517172f7..6e4f1c6e3 100644
--- a/packfile.c
+++ b/packfile.c
@@ -883,3 +883,28 @@ void reprepare_packed_git(void)
prepare_packed_git_run_once = 0;
prepare_packed_git();
 }
+
+unsigned long unpack_object_header_buffer(const unsigned char *buf,
+   unsigned long len, enum object_type *type, unsigned long *sizep)
+{
+   unsigned shift;
+   unsigned long size, c;
+   unsigned long used = 0;
+
+   c = buf[used++];
+   *type = (c >> 4) & 7;
+   size = c & 15;
+   shift = 4;
+   while (c & 0x80) {
+   if (len <= used || bitsizeof(long) <= shift) {
+   error("bad object header");
+   size = used = 0;
+   break;
+   }
+   c = buf[used++];
+   size += (c & 0x7f) << shift;
+   shift += 7;
+   }
+   *sizep = size;
+   return used;
+}
diff --git a/sha1_file.c b/sha1_file.c
index bbce60f1c..1f4b4ba2c 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -913,31 +913,6 @@ void *map_sha1_file(const unsigned char *sha1, unsigned 
long *size)
return map_sha1_file_1(NULL, sha1, size);
 }
 
-unsigned long unpack_object_header_buffer(const unsigned char *buf,
-   unsigned long len, enum object_type *type, unsigned long *sizep)
-{
-   unsigned shift;
-   unsigned long size, c;
-   unsigned long used = 0;
-
-   c = buf[used++];
-   *type = (c >> 4) & 7;
-   size = c & 15;
-   shift = 4;
-   while (c & 0x80) {
-   if (len <= used || bitsizeof(long) <= shift) {
-   error("bad object header");
-   size = used = 0;
-   break;
-   }
-   c = buf[used++];
-   size += (c & 0x7f) << shift;
-   shift += 7;
-   }
-   *sizep = size;
-   return used;
-}
-
 static int unpack_sha1_short_header(git_zstream *stream,
unsigned char *map, unsigned long mapsize,
void *buffer, unsigned long bufsiz)
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 09/25] pack: move add_packed_git()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  1 -
 connected.c |  1 +
 pack.h  |  1 +
 packfile.c  | 53 +
 sha1_file.c | 61 -
 5 files changed, 55 insertions(+), 62 deletions(-)

diff --git a/cache.h b/cache.h
index 4812f3a63..bf93477e8 100644
--- a/cache.h
+++ b/cache.h
@@ -1638,7 +1638,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
 extern int odb_pack_keep(const char *name);
 
 extern void clear_delta_base_cache(void);
-extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
 
 /*
  * Make sure that a pointer access into an mmap'd index file is within bounds,
diff --git a/connected.c b/connected.c
index 136c2ac16..3e3f0148c 100644
--- a/connected.c
+++ b/connected.c
@@ -3,6 +3,7 @@
 #include "sigchain.h"
 #include "connected.h"
 #include "transport.h"
+#include "pack.h"
 
 /*
  * If we feed all the commits we want to verify to this command
diff --git a/pack.h b/pack.h
index 3876e9ae6..c1f3ff32d 100644
--- a/pack.h
+++ b/pack.h
@@ -150,5 +150,6 @@ extern int open_packed_git(struct packed_git *p);
 
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, unsigned long *);
 extern void unuse_pack(struct pack_window **);
+extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
 
 #endif
diff --git a/packfile.c b/packfile.c
index 93526ea7b..efe0ed3e8 100644
--- a/packfile.c
+++ b/packfile.c
@@ -605,3 +605,56 @@ void unuse_pack(struct pack_window **w_cursor)
*w_cursor = NULL;
}
 }
+
+static void try_to_free_pack_memory(size_t size)
+{
+   release_pack_memory(size);
+}
+
+struct packed_git *add_packed_git(const char *path, size_t path_len, int local)
+{
+   static int have_set_try_to_free_routine;
+   struct stat st;
+   size_t alloc;
+   struct packed_git *p;
+
+   if (!have_set_try_to_free_routine) {
+   have_set_try_to_free_routine = 1;
+   set_try_to_free_routine(try_to_free_pack_memory);
+   }
+
+   /*
+* Make sure a corresponding .pack file exists and that
+* the index looks sane.
+*/
+   if (!strip_suffix_mem(path, _len, ".idx"))
+   return NULL;
+
+   /*
+* ".pack" is long enough to hold any suffix we're adding (and
+* the use xsnprintf double-checks that)
+*/
+   alloc = st_add3(path_len, strlen(".pack"), 1);
+   p = alloc_packed_git(alloc);
+   memcpy(p->pack_name, path, path_len);
+
+   xsnprintf(p->pack_name + path_len, alloc - path_len, ".keep");
+   if (!access(p->pack_name, F_OK))
+   p->pack_keep = 1;
+
+   xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack");
+   if (stat(p->pack_name, ) || !S_ISREG(st.st_mode)) {
+   free(p);
+   return NULL;
+   }
+
+   /* ok, it looks sane as far as we can check without
+* actually mapping the pack file.
+*/
+   p->pack_size = st.st_size;
+   p->pack_local = local;
+   p->mtime = st.st_mtime;
+   if (path_len < 40 || get_sha1_hex(path + path_len - 40, p->sha1))
+   hashclr(p->sha1);
+   return p;
+}
diff --git a/sha1_file.c b/sha1_file.c
index 12501ef06..7f12b1ee0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -717,67 +717,6 @@ void *xmmap(void *start, size_t length,
return ret;
 }
 
-static struct packed_git *alloc_packed_git(int extra)
-{
-   struct packed_git *p = xmalloc(st_add(sizeof(*p), extra));
-   memset(p, 0, sizeof(*p));
-   p->pack_fd = -1;
-   return p;
-}
-
-static void try_to_free_pack_memory(size_t size)
-{
-   release_pack_memory(size);
-}
-
-struct packed_git *add_packed_git(const char *path, size_t path_len, int local)
-{
-   static int have_set_try_to_free_routine;
-   struct stat st;
-   size_t alloc;
-   struct packed_git *p;
-
-   if (!have_set_try_to_free_routine) {
-   have_set_try_to_free_routine = 1;
-   set_try_to_free_routine(try_to_free_pack_memory);
-   }
-
-   /*
-* Make sure a corresponding .pack file exists and that
-* the index looks sane.
-*/
-   if (!strip_suffix_mem(path, _len, ".idx"))
-   return NULL;
-
-   /*
-* ".pack" is long enough to hold any suffix we're adding (and
-* the use xsnprintf double-checks that)
-*/
-   alloc = st_add3(path_len, strlen(".pack"), 1);
-   p = alloc_packed_git(alloc);
-   memcpy(p->pack_name, path, path_len);
-
-   xsnprintf(p->pack_name + path_len, alloc - path_len, ".keep");
-   if (!access(p->pack_name, F_OK))
-   p->pack_keep = 1;
-
-   xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack");
-   if (stat(p->pack_name, ) || !S_ISREG(st.st_mode)) {
-

Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte

2017-08-08 Thread Junio C Hamano

Jeff King  writes:

> On Tue, Aug 08, 2017 at 06:52:31PM -0400, Jeff King wrote:
>
>> > Interesting.  I see that we still have the conditional code to call
>> > out to sha1-lookup.c::sha1_entry_pos().  Do we need a similar change
>> > over there, I wonder?  Alternatively, as we have had the experimental
>> > sha1-lookup.c::sha1_entry_pos() long enough without anybody using it,
>> > perhaps we should write it off as a failed experiment and retire it?
>> 
>> There is also sha1_pos(), which seems to have the same problem (and is
>> used in several places).
>
> Actually, I take it back. The problem happens when we enter the loop
> with no entries to look at. But both sha1_pos() and sha1_entry_pos()
> return early before hitting their do-while loops in that case.

Ah, I was not looking at that part of the code.  Thanks.

I still wonder if we want to retire that conditional invocation of
sha1_entry_pos(), though.

[PATCH v2 13/25] pack: move get_size_from_delta()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  1 -
 pack.h  |  1 +
 packfile.c  | 40 
 sha1_file.c | 39 ---
 4 files changed, 41 insertions(+), 40 deletions(-)

diff --git a/cache.h b/cache.h
index 9c70759a6..e29918c75 100644
--- a/cache.h
+++ b/cache.h
@@ -1661,7 +1661,6 @@ extern off_t find_pack_entry_one(const unsigned char 
*sha1, struct packed_git *)
 
 extern int is_pack_valid(struct packed_git *);
 extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, 
unsigned long *);
-extern unsigned long get_size_from_delta(struct packed_git *, struct 
pack_window **, off_t);
 extern int unpack_object_header(struct packed_git *, struct pack_window **, 
off_t *, unsigned long *);
 
 /*
diff --git a/pack.h b/pack.h
index 4a7f88a38..69c92d8d2 100644
--- a/pack.h
+++ b/pack.h
@@ -168,5 +168,6 @@ extern void reprepare_packed_git(void);
 unsigned long approximate_object_count(void);
 
 extern unsigned long unpack_object_header_buffer(const unsigned char *buf, 
unsigned long len, enum object_type *type, unsigned long *sizep);
+extern unsigned long get_size_from_delta(struct packed_git *, struct 
pack_window **, off_t);
 
 #endif
diff --git a/packfile.c b/packfile.c
index 6e4f1c6e3..511afad85 100644
--- a/packfile.c
+++ b/packfile.c
@@ -3,6 +3,7 @@
 #include "pack.h"
 #include "dir.h"
 #include "mergesort.h"
+#include "delta.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -908,3 +909,42 @@ unsigned long unpack_object_header_buffer(const unsigned 
char *buf,
*sizep = size;
return used;
 }
+
+unsigned long get_size_from_delta(struct packed_git *p,
+ struct pack_window **w_curs,
+ off_t curpos)
+{
+   const unsigned char *data;
+   unsigned char delta_head[20], *in;
+   git_zstream stream;
+   int st;
+
+   memset(, 0, sizeof(stream));
+   stream.next_out = delta_head;
+   stream.avail_out = sizeof(delta_head);
+
+   git_inflate_init();
+   do {
+   in = use_pack(p, w_curs, curpos, _in);
+   stream.next_in = in;
+   st = git_inflate(, Z_FINISH);
+   curpos += stream.next_in - in;
+   } while ((st == Z_OK || st == Z_BUF_ERROR) &&
+stream.total_out < sizeof(delta_head));
+   git_inflate_end();
+   if ((st != Z_STREAM_END) && stream.total_out != sizeof(delta_head)) {
+   error("delta data unpack-initial failed");
+   return 0;
+   }
+
+   /* Examine the initial part of the delta to figure out
+* the result size.
+*/
+   data = delta_head;
+
+   /* ignore base size */
+   get_delta_hdr_size(, delta_head+sizeof(delta_head));
+
+   /* Read the result size */
+   return get_delta_hdr_size(, delta_head+sizeof(delta_head));
+}
diff --git a/sha1_file.c b/sha1_file.c
index 1f4b4ba2c..7d354d9b6 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1099,45 +1099,6 @@ int parse_sha1_header(const char *hdr, unsigned long 
*sizep)
return parse_sha1_header_extended(hdr, , 0);
 }
 
-unsigned long get_size_from_delta(struct packed_git *p,
- struct pack_window **w_curs,
- off_t curpos)
-{
-   const unsigned char *data;
-   unsigned char delta_head[20], *in;
-   git_zstream stream;
-   int st;
-
-   memset(, 0, sizeof(stream));
-   stream.next_out = delta_head;
-   stream.avail_out = sizeof(delta_head);
-
-   git_inflate_init();
-   do {
-   in = use_pack(p, w_curs, curpos, _in);
-   stream.next_in = in;
-   st = git_inflate(, Z_FINISH);
-   curpos += stream.next_in - in;
-   } while ((st == Z_OK || st == Z_BUF_ERROR) &&
-stream.total_out < sizeof(delta_head));
-   git_inflate_end();
-   if ((st != Z_STREAM_END) && stream.total_out != sizeof(delta_head)) {
-   error("delta data unpack-initial failed");
-   return 0;
-   }
-
-   /* Examine the initial part of the delta to figure out
-* the result size.
-*/
-   data = delta_head;
-
-   /* ignore base size */
-   get_delta_hdr_size(, delta_head+sizeof(delta_head));
-
-   /* Read the result size */
-   return get_delta_hdr_size(, delta_head+sizeof(delta_head));
-}
-
 static off_t get_delta_base(struct packed_git *p,
struct pack_window **w_curs,
off_t *curpos,
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 04/25] pack: move open_pack_index(), parse_pack_index()

2017-08-08 Thread Jonathan Tan

alloc_packed_git() in packfile.c is duplicated from sha1_file.c. In a
subsequent commit, alloc_packed_git() will be removed from sha1_file.c.

Signed-off-by: Jonathan Tan 
---
 builtin/count-objects.c |   1 +
 cache.h |   8 ---
 pack.h  |   8 +++
 packfile.c  | 149 
 sha1_file.c | 140 -
 sha1_name.c |   1 +
 6 files changed, 159 insertions(+), 148 deletions(-)

diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index 1d82e61f2..185d3190a 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -10,6 +10,7 @@
 #include "builtin.h"
 #include "parse-options.h"
 #include "quote.h"
+#include "pack.h"
 
 static unsigned long garbage;
 static off_t size_garbage;
diff --git a/cache.h b/cache.h
index c7f802e4a..5d6839525 100644
--- a/cache.h
+++ b/cache.h
@@ -1603,8 +1603,6 @@ struct pack_entry {
struct packed_git *p;
 };
 
-extern struct packed_git *parse_pack_index(unsigned char *sha1, const char 
*idx_path);
-
 /* A hook to report invalid files in pack directory */
 #define PACKDIR_FILE_PACK 1
 #define PACKDIR_FILE_IDX 2
@@ -1639,12 +1637,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-/*
- * mmap the index file for the specified packfile (if it is not
- * already mmapped).  Return 0 on success.
- */
-extern int open_pack_index(struct packed_git *);
-
 /*
  * munmap the index file for the specified packfile (if it is
  * currently mmapped).
diff --git a/pack.h b/pack.h
index 6098bfe40..5be0ed42a 100644
--- a/pack.h
+++ b/pack.h
@@ -135,4 +135,12 @@ extern size_t pack_mapped;
 
 extern void pack_report(void);
 
+/*
+ * mmap the index file for the specified packfile (if it is not
+ * already mmapped).  Return 0 on success.
+ */
+extern int open_pack_index(struct packed_git *);
+
+extern struct packed_git *parse_pack_index(unsigned char *sha1, const char 
*idx_path);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 60d9fc3b0..6edc43228 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "mru.h"
+#include "pack.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -59,3 +60,151 @@ void pack_report(void)
pack_open_windows, peak_pack_open_windows,
sz_fmt(pack_mapped), sz_fmt(peak_pack_mapped));
 }
+
+/*
+ * Open and mmap the index file at path, perform a couple of
+ * consistency checks, then record its information to p.  Return 0 on
+ * success.
+ */
+static int check_packed_git_idx(const char *path, struct packed_git *p)
+{
+   void *idx_map;
+   struct pack_idx_header *hdr;
+   size_t idx_size;
+   uint32_t version, nr, i, *index;
+   int fd = git_open(path);
+   struct stat st;
+
+   if (fd < 0)
+   return -1;
+   if (fstat(fd, )) {
+   close(fd);
+   return -1;
+   }
+   idx_size = xsize_t(st.st_size);
+   if (idx_size < 4 * 256 + 20 + 20) {
+   close(fd);
+   return error("index file %s is too small", path);
+   }
+   idx_map = xmmap(NULL, idx_size, PROT_READ, MAP_PRIVATE, fd, 0);
+   close(fd);
+
+   hdr = idx_map;
+   if (hdr->idx_signature == htonl(PACK_IDX_SIGNATURE)) {
+   version = ntohl(hdr->idx_version);
+   if (version < 2 || version > 2) {
+   munmap(idx_map, idx_size);
+   return error("index file %s is version %"PRIu32
+" and is not supported by this binary"
+" (try upgrading GIT to a newer version)",
+path, version);
+   }
+   } else
+   version = 1;
+
+   nr = 0;
+   index = idx_map;
+   if (version > 1)
+   index += 2;  /* skip index header */
+   for (i = 0; i < 256; i++) {
+   uint32_t n = ntohl(index[i]);
+   if (n < nr) {
+   munmap(idx_map, idx_size);
+   return error("non-monotonic index %s", path);
+   }
+   nr = n;
+   }
+
+   if (version == 1) {
+   /*
+* Total size:
+*  - 256 index entries 4 bytes each
+*  - 24-byte entries * nr (20-byte sha1 + 4-byte offset)
+*  - 20-byte SHA1 of the packfile
+*  - 20-byte SHA1 file checksum
+*/
+   if (idx_size != 4*256 + nr * 24 + 20 + 20) {
+   munmap(idx_map, idx_size);
+   return error("wrong index v1 file size in %s", path);
+   }
+   } else if (version == 2) {
+   /*
+* Minimum size:
+

[PATCH v2 07/25] pack: move use_pack()

2017-08-08 Thread Jonathan Tan

The function open_packed_git() needs to be temporarily made global. Its
scope will be restored to static in a subsequent commit.

Signed-off-by: Jonathan Tan 
---
 cache.h |   1 -
 pack.h  |  14 +--
 packfile.c  | 303 ++--
 sha1_file.c | 285 
 streaming.c |   1 +
 5 files changed, 299 insertions(+), 305 deletions(-)

diff --git a/cache.h b/cache.h
index 25a21a61f..dd9f9a9ae 100644
--- a/cache.h
+++ b/cache.h
@@ -1637,7 +1637,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, unsigned long *);
 extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
diff --git a/pack.h b/pack.h
index fd4668528..bf2b99bf9 100644
--- a/pack.h
+++ b/pack.h
@@ -124,14 +124,7 @@ extern char *sha1_pack_name(const unsigned char *sha1);
  */
 extern char *sha1_pack_index_name(const unsigned char *sha1);
 
-extern unsigned int pack_used_ctr;
-extern unsigned int pack_mmap_calls;
-extern unsigned int peak_pack_open_windows;
-extern unsigned int pack_open_windows;
 extern unsigned int pack_open_fds;
-extern unsigned int pack_max_fds;
-extern size_t peak_pack_mapped;
-extern size_t pack_mapped;
 
 extern void pack_report(void);
 
@@ -143,12 +136,9 @@ extern int open_pack_index(struct packed_git *);
 
 extern struct packed_git *parse_pack_index(unsigned char *sha1, const char 
*idx_path);
 
-extern int unuse_one_window(struct packed_git *current);
-
 extern void release_pack_memory(size_t);
 
 extern void close_pack_windows(struct packed_git *);
-extern int close_pack_fd(struct packed_git *);
 /*
  * munmap the index file for the specified packfile (if it is
  * currently mmapped).
@@ -156,4 +146,8 @@ extern int close_pack_fd(struct packed_git *);
 extern void close_pack_index(struct packed_git *);
 extern void close_all_packs(void);
 
+extern int open_packed_git(struct packed_git *p);
+
+extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, unsigned long *);
+
 #endif
diff --git a/packfile.c b/packfile.c
index c8e2dbdee..85cb65558 100644
--- a/packfile.c
+++ b/packfile.c
@@ -24,14 +24,14 @@ char *sha1_pack_index_name(const unsigned char *sha1)
return odb_pack_name(, sha1, "idx");
 }
 
-unsigned int pack_used_ctr;
-unsigned int pack_mmap_calls;
-unsigned int peak_pack_open_windows;
-unsigned int pack_open_windows;
+static unsigned int pack_used_ctr;
+static unsigned int pack_mmap_calls;
+static unsigned int peak_pack_open_windows;
+static unsigned int pack_open_windows;
 unsigned int pack_open_fds;
-unsigned int pack_max_fds;
-size_t peak_pack_mapped;
-size_t pack_mapped;
+static unsigned int pack_max_fds;
+static size_t peak_pack_mapped;
+static size_t pack_mapped;
 struct packed_git *packed_git;
 
 static struct mru packed_git_mru_storage;
@@ -228,7 +228,7 @@ static void scan_windows(struct packed_git *p,
}
 }
 
-int unuse_one_window(struct packed_git *current)
+static int unuse_one_window(struct packed_git *current)
 {
struct packed_git *p, *lru_p = NULL;
struct pack_window *lru_w = NULL, *lru_l = NULL;
@@ -274,7 +274,7 @@ void close_pack_windows(struct packed_git *p)
}
 }
 
-int close_pack_fd(struct packed_git *p)
+static int close_pack_fd(struct packed_git *p)
 {
if (p->pack_fd < 0)
return 0;
@@ -311,3 +311,288 @@ void close_all_packs(void)
else
close_pack(p);
 }
+
+/*
+ * The LRU pack is the one with the oldest MRU window, preferring packs
+ * with no used windows, or the oldest mtime if it has no windows allocated.
+ */
+static void find_lru_pack(struct packed_git *p, struct packed_git **lru_p, 
struct pack_window **mru_w, int *accept_windows_inuse)
+{
+   struct pack_window *w, *this_mru_w;
+   int has_windows_inuse = 0;
+
+   /*
+* Reject this pack if it has windows and the previously selected
+* one does not.  If this pack does not have windows, reject
+* it if the pack file is newer than the previously selected one.
+*/
+   if (*lru_p && !*mru_w && (p->windows || p->mtime > (*lru_p)->mtime))
+   return;
+
+   for (w = this_mru_w = p->windows; w; w = w->next) {
+   /*
+* Reject this pack if any of its windows are in use,
+* but the previously selected pack did not have any
+* inuse windows.  Otherwise, record that this pack
+* has windows in use.
+*/
+   if (w->inuse_cnt) {
+   if (*accept_windows_inuse)
+   has_windows_inuse = 1;
+   else
+

Re: [PATCH] t4062: stop using repetition in regex

2017-08-08 Thread Junio C Hamano

René Scharfe  writes:

> Am 09.08.2017 um 00:26 schrieb Junio C Hamano:
>> ... but in the meantime, I think replacing the test with "0$" to
>> force the scanner to find either the end of line or the end of the
>> buffer may be a good workaround.  We do not have to care how many of
>> random bytes are in front of the last "0" in order to ensure that
>> the regexec_buf() does not overstep to 4097th byte, while seeing
>> that regexec() that does not know how long the haystack is has to do
>> so, no?
>
> Our regexec() calls strlen() (see my other reply).
>
> Using "0$" looks like the best option to me.

Yeah, it seems that way.  If we want to be close/faithful to the
original, we could do "^0*$", but the part that is essential to
trigger the old bug is not the "we have many zeroes" (or "we have
4096 zeroes") part, but "zero is at the end of the string" part, so
"0$" would be the minimal pattern that also would work for OBSD.

Dscho?

Re: [PATCH] builtin/add: add a missing newline to an stderr message

2017-08-08 Thread Junio C Hamano

Jonathan Nieder  writes:

> I don't believe the force_mode without an 'x' provides a clear signal
> to the end user.  Perhaps you meant %cx?

Indeed you are right.  I think I saw Ramsay's v2 that has the 'x',
so let's use that version.

Thanks.

Re: [RFC] clang-format: outline the git project's coding style

2017-08-08 Thread Johannes Schindelin

Hi Brandon,

On Mon, 7 Aug 2017, Brandon Williams wrote:

> Add a '.clang-format' file which outlines the git project's coding
> style.  This can be used with clang-format to auto-format .c and .h
> files to conform with git's style.
> 
> Signed-off-by: Brandon Williams 
> ---
> 
> I'm sure this sort of thing comes up every so often on the list but back at
> git-merge I mentioned how it would be nice to not have to worry about style
> when reviewing patches as that is something mechanical and best left to a
> machine (for the most part).

Amen.

If I never have to see a review mentioning an unwrapped line, I am quite
certain I will be quite content.

Ciao,
Dscho

Re: Can the '--set-upstream' option of branch be removed ?

2017-08-08 Thread Kaartic Sivaraam

On Mon, 2017-08-07 at 13:59 -0700, Junio C Hamano wrote:
> Kaartic Sivaraam  writes:
> 
> > I refactored builtin/branch.c to remove the '--set-upstream'
> > option,successfully. The corresponding patch follows. 
> > 
> > There's just one issue with the version of git that doesn't
> > have the '--set-upstream' option. It's described in the commit
> > log message of the following patch.
> 
> which is...
> 
> > Note that, 'git branch' still *accepts* '--set-upstream' as a consequence
> > of "unique prefix can be abbrievated in option names". '--set-upstream'
> > is a unique prefix of '--set-upstream-to' after '--set-upstream' has
> > been removed.
> 
> ... this.
> 
> Thanks for spotting the issue.  
> 
Oh, I would have to thank you for enlightening me about,

"unique prefix can be abbrievated in option names"

If I didn't know that, it would taken me some time (or an email) to
find why 'git' accepted  '--set-upstream' even after it's removal!

> I think in the longer term we still want to remove --set-upstream as
> many people seem to say that its behaviour has been uttering
> confusing to them and that is why we keep giving the warning any
> time it is used.
> 
I do accept that. The behaviour of '--set-upstream' is awkward.

> > I guess it would be difficult to detect the removal of the option in
> > case it's used in scripts and might cause confusion to users?
> 
> If we want to follow through the transition, because of the issue
> you spotted, we'd need one extra step to make sure users won't be
> hurt before removal: we would need to still recognize --set-upstream
> as an option distinct from --set-upstream-to, and actively fail thes
> request, telling them that the former option no longer is supported.
> 
There's no issue in doing that if people don't shout at us for the
behaviour :)

Just to be sure, you mean "die() with a good message" when you say
"fail these requests, telling them that the former option no longer is
supported."

> Then after waiting for a few years, we may be able to re-introduce
> the "--set-upstream" option that takes the arguments in the same
> order as "--set-upstream-to", which would be the ideal endgame
> (assuming that the reason why we started deprecating "--set-upstream"
> and encouraged users to use "--set-upstream-to" still holds).
> 
It's pretty surprising it takes almost a decade to *stop accepting* a
bad option though many users are confused by it. 

"It's easier to do things than to undo them!"

-- 
Kaartic

Re: [RFC] clang-format: outline the git project's coding style

2017-08-08 Thread Brandon Williams

On 08/08, Stefan Beller wrote:
> On Tue, Aug 8, 2017 at 5:05 AM, Johannes Schindelin
>  wrote:
> > Hi Brandon,
> >
> > On Mon, 7 Aug 2017, Brandon Williams wrote:
> >
> >> Add a '.clang-format' file which outlines the git project's coding
> >> style.  This can be used with clang-format to auto-format .c and .h
> >> files to conform with git's style.
> >>
> >> Signed-off-by: Brandon Williams 
> >> ---
> >>
> >> I'm sure this sort of thing comes up every so often on the list but back at
> >> git-merge I mentioned how it would be nice to not have to worry about style
> >> when reviewing patches as that is something mechanical and best left to a
> >> machine (for the most part).
> >
> > Amen.
> >
> > If I never have to see a review mentioning an unwrapped line, I am quite
> > certain I will be quite content.
> >
> > Ciao,
> > Dscho
> 
> As a thought experiment I'd like to propose to take it one step further:
> 
>   If the code was formatted perfectly in one style such that a formatter for
>   this style would not produce changes when rerun again on the code, then
>   each individual could have a clean/smudge filter to work in their preferred
>   style, and only the exchange and storage of code is in a mutual agreed
>   style. If the mutually agreed style is close to what I prefer, I don't have 
> to
>   use clean/smudge filters.
> 
> Additionally to this patch, we'd want to either put a note into
> SubmittingPatches or Documentation/gitworkflows.txt to hint at
> how to use this formatting to just affect the patch that is currently
> worked on or rather a pre-commit hook?

I'm sure both of these things will probably happen if we decide to make
use of a code-formatter.  This RFC is really just trying to ask the
question: "Is this something we want to entertain doing?"
My response would be 'Yes' but there's many other opinions to consider
first :)

-- 
Brandon Williams

Re: What's cooking in git.git (Jul 2017, #09; Mon, 31)

2017-08-08 Thread Igor Djordjevic

On 08/08/2017 15:14, Ævar Arnfjörð Bjarmason wrote:
> On Mon, Aug 07 2017, Igor Djordjevic jotted:
>> On 07/08/2017 23:25, Igor Djordjevic wrote:
>>> On 06/08/2017 22:26, Ævar Arnfjörð Bjarmason wrote:
 On Sat, Aug 05 2017, Junio C. Hamano jotted:
> I actually consider "branch" to *never* invoking a checkout.  Even
> when "git branch -m A B" happens to be done when your checked out
> branch is A and you end up being on B.  That is not a "checkout".

 I think we just have a different mental model of what "checkout"
 means. In my mind any operation that updates the HEAD to point to a new
 branch is a checkout of that branch.
>>>
>>> If I may, from a side-viewer`s point of view, it seems you`re
>>> thinking in low-level implementation details, where what Junio
>>> describes seems more as a high-level, conceptual/end-user`s point of
>>> view.
> 
> Yeah, I think that's a fair summary. Also I didn't mean to de-rail this
> whole thread on what "checkout" really means, just explain what I meant
> with previous comments, since there seemed to be confusion about that.
> 
>>> Needing to update HEAD reference once we "rename" a branch, too, what
>>> you consider a "checkout", seems to be required only because branch
>>> name _is_ the branch reference in Git, so we need to update HEAD to
>>> point to a new/renamed branch reference -- but it`s still the same
>>> branch, conceptually.
> 
> It's not *required* we could do one of three things:
> 
>  1) Do what we do now, i.e. rename the branch/reflog & check out the new
> name.
> 
>  2) Rename the branch/reflog and checkout HEAD^0, i.e. say "the branch
> is now elsewhere, but we haven't moved your commit".
> 
>  3) Just not run replace_each_worktree_head_symref() which would end up
>  on a branch with no commits, i.e. an orphan branch.
> 
> Now, I think 2 & 3 are pretty nonsensical and wouldn't ever propose we
> should do that, but it's illustrative that #1 is not some required
> inevitability in terms of explaining what's happening with the new name
> being checked out (i.e. HEAD being updated).

I think we agree, but we`re talking from different standpoints again.

You say "it *can* be done *but* it doesn`t make sense", where I say 
"it *can`t* be done *because* it doesn`t make sense".

Implementation wise, of course it can be done differently, but 
conceptually it would be wrong (or confusing, at least), thus 
different implementation, while theoretically possible, may not be a 
sane option, thus it`s impractical (not to say "impossible").

By "required", I really meant "required in order to be conceptually 
sane".

>>> Documentation for "git-checkout" states that it is used to "*Switch
>>> branches*...[snip]", and that is not what happens here.
> 
> That's just the summary at the top but not the full story of what
> git-checkout does. E.g. you can checkout a bare SHA1 which is not
> switching branches, or a tag or whatever.

I disagree, by checking out a bare SHA1 (or tag or whatever) I`d say 
you *are* still switching branches - conceptually, at least.

When you move from a named branch to (yet) unnamed one, you are 
switching branches. Same goes for when you switch from one unnamed 
branch to another (named or unnamed).

Might be "detached HEAD" is not something we call an "unnamed 
branch", but that`s what it practically is.

>>> Implementation-wise it does because we can`t do it differently at the
>>> moment, but in user`s eyes it`s still the same branch, so no switch
>>> is made as far as the user is concerned.
> 
> Kind of, it's also worthwhile to think about that in some sense no
> switch would be performed as far as the user is concerned by taking
> option #2, i.e. we'd be in the same working tree / you could still make
> commits.
> 
> You just couldn't make new commits on your "master" which is now called
> "topic" and get new commits on "topic". I think it makes sense to do
> that, but again, it's illustrative that it's not inevitable for
> discussing the implementation.

But your second paragraph exactly explains why it`s not the same with 
option #2 (I guess you mean "no HEAD update to point to renamed 
branch" here) - if user renames branch "master" to "topic", it`s 
natural to expect that next commit is made on "topic" (which is now a 
new name for "master", conceptually still being the same, just 
renamed branch).

Ending up in limbo, in a detached HEAD state, or even worse, still 
being on a branch with an old name "master" (which conceptually 
shouldn`t exist anymore, being known as "topic" now), would be 
confusing, to say the least, and conceptually wrong, I would add.

>>> In a different implementation, where branches would have permanent
>>> references other than their names, no HEAD update would be needed as
>>> the reference would still be the same, no matter the name change,
>>> making the `git branch -m` situation clear even from your standpoint,
>>> I`d say.
>>>
> Really from the end-user's point of

Re: [PATCH v2 2/2 / RFC] branch: quote branch/ref names to improve readability

2017-08-08 Thread Stefan Beller

On Tue, Aug 8, 2017 at 10:11 AM, Kaartic Sivaraam
 wrote:
> Signed-off-by: Kaartic Sivaraam 
> ---
>  branch.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)

I like this patch.

In submodule.c we quote a lot of things (branches, submodules, paths), so
this is another step to make the output as a whole more consistent.
(Though wondering for non-submodule users, if they perceive it as
inconsistency as other parts of the code may not follow the rigorous quoting)

>
> diff --git a/branch.c b/branch.c
> index ad5a2299b..a40721f3c 100644
> --- a/branch.c
> +++ b/branch.c
> @@ -90,24 +90,24 @@ int install_branch_config(int flag, const char *local, 
> const char *origin, const
> if (shortname) {
> if (origin)
> printf_ln(rebasing ?
> - _("Branch %s set up to track remote 
> branch %s from %s by rebasing.") :
> - _("Branch %s set up to track remote 
> branch %s from %s."),
> + _("Branch '%s' set up to track 
> remote branch '%s' from '%s' by rebasing.") :
> + _("Branch '%s' set up to track 
> remote branch '%s' from '%s'."),
>   local, shortname, origin);
> else
> printf_ln(rebasing ?
> - _("Branch %s set up to track local 
> branch %s by rebasing.") :
> - _("Branch %s set up to track local 
> branch %s."),
> + _("Branch '%s' set up to track 
> local branch '%s' by rebasing.") :
> + _("Branch '%s' set up to track 
> local branch '%s'."),
>   local, shortname);
> } else {
> if (origin)
> printf_ln(rebasing ?
> - _("Branch %s set up to track remote 
> ref %s by rebasing.") :
> - _("Branch %s set up to track remote 
> ref %s."),
> + _("Branch '%s' set up to track 
> remote ref '%s' by rebasing.") :
> + _("Branch '%s' set up to track 
> remote ref '%s'."),
>   local, remote);
> else
> printf_ln(rebasing ?
> - _("Branch %s set up to track local 
> ref %s by rebasing.") :
> - _("Branch %s set up to track local 
> ref %s."),
> + _("Branch '%s' set up to track 
> local ref '%s' by rebasing.") :
> + _("Branch '%s' set up to track 
> local ref '%s'."),
>   local, remote);
> }
> }
> --
> 2.14.0.rc1.434.g6eded367a
>

Re: [PATCH] git svn fetch: Create correct commit timestamp when using --localtime

2017-08-08 Thread Urs Thuermann

Junio C Hamano  writes:

> > Yep, seems alright.  Can you apply directly?
> > Been a bit preoccupied as of late.  Thanks.
> 
> Surely, I'll just add your Reviewed-by: myself ;-)

OK, thanks.  This will fix the bug I've reported here a week or so ago
(see the References header).

urs

Re: reftable [v6]: new ref storage format

2017-08-08 Thread Junio C Hamano

I notice that you raised the location of restart table within a
block in this iteration (or maybe it happened in v5).  

This forces you to hold all contents in core before the first byte
is written out.  You start from the first entry (which will become
the first restart entry), emit a handful as prefix compressed
entries, emit a full entry (which will become the next restart
entry), ... until you have enough to fill both the data and the
restart table, then start writing from the header (which needs the
length of the block), restart table and then data.

I think it is OK to do so for the blocks whose size is limited to
16M, but I wonder if it is sensible to do the same for the index
block whose limit is 2G.  If you keep the restart table after the
data, then you could stream out the entries as you emit, write the
restart table, and then seek back to fix the length in the header,
without holding the 2G in core, no?

Re: [RFC] clang-format: outline the git project's coding style

2017-08-08 Thread Junio C Hamano

Brandon Williams  writes:

>> > +# Add a line break after the return type of top-level functions
>> > +# int
>> > +# foo();
>> > +AlwaysBreakAfterReturnType: TopLevel
>> 
>> We do that?
>
> Haha So generally no we don't do this.  Though there are definitely many
> places in our code base where we do.  Personally this makes it a bit
> easier to read when you end up having long function names.  I also
> worked on a code base which did this and it made it incredible easy to
> grep for the definition of a function.  If you grep for 'foo()' then
> you'd get all the uses of the function including the definition but if
> you grep for '^foo()' you'd get only the definition.
>
> But that's my preference, if we end up using this tool it would probably
> make sense to change this.

Yeah, I even know people who did

int
foo(void)

for greppability of "^foo".  It took some effort to get used to that
style.

>> > +# Insert a space after a cast
>> > +# x = (int32) y;notx = (int32)y;
>> > +SpaceAfterCStyleCast: true
>> 
>> Hmph, I thought we did the latter, i.e. cast sticks to the casted
>> expression without SP.
>
> I've seen both and I wasn't sure which was the correct form to use.

We do the latter because checkpatch.pl from the kernel project tells
us to, I think.

Re: [PATCH v2 1/2 / RFC] builtin/branch: stop supporting the use of --set-upstream option

2017-08-08 Thread Martin Ågren

On 8 August 2017 at 19:11, Kaartic Sivaraam
 wrote:
> The '--set-upstream' option of branch was deprecated in,
>
> b347d06bf branch: deprecate --set-upstream and show help if we
> detect possible mistaken use (Thu, 30 Aug 2012 19:23:13 +0200)
>
> It was deprecated for the reasons specified in the commit message of
> the referenced commit.
>
> Refactor 'branch' so that it dies with an appropraite error message
> when the '--set-upstream' is used.

appropriate. (Also, is this really a refactoring?)

>
> Note that there's a reason behind "dying with an error message" instead of
> "not accepting the '--set-upstream'". ;git branch' would still *accept*
> '--set-upstream' even after it's removal as a consequence of "unique
> prefix can be abbrievated in option names" AND '--set-upstream' is a unique
> prefix of '--set-upstream-to' when '--set-upstream' has been removed. In
> order to smooth the transition for users due to the "prefix issue" it was
> decided to make branch die when seeing the '--set-upstream' flag for a few
> years and let the users know that it would be removed some time in the future.
>
> The before/after behaviour for a simple case follows,
>
> $ git remote
> origin
>
> Before,
>
> $ git branch
> * master
>
> $ git branch --set-upstream origin/master
> The --set-upstream flag is deprecated and will be removed. Consider using 
> --track or --set-upstream-to
> Branch origin/master set up to track local branch master.
>
> $ echo $?
> 0
>
> $ git branch
> * master
>   origin/master
>
> After,
>
> $ git branch
> * master
>
> $ git branch --set-upstream origin/master
> fatal: the '--set-upstream' flag is no longer supported and will be 
> removed. Consider using '--track' or '--set-upstream-to'
>
> $ echo $?
> 128
>
> $ git branch
> * master
>
> Signed-off-by: Kaartic Sivaraam 
> ---
>  Changes in v2:
>
>  The previous patch removed the concerned option while the current patch
>  makes 'git branch' die on seeing the option.
>
>  The possibility of '--set-upstream' becoming an alias of  '--set-upstream-to'
>  was documented.
>
>  Documentation/git-branch.txt |  8 +++
>  builtin/branch.c | 21 +--
>  t/t3200-branch.sh| 50 
> 
>  t/t6040-tracking-info.sh | 16 +++---
>  4 files changed, 17 insertions(+), 78 deletions(-)
>
> diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
> index 81bd0a7b7..372107e0c 100644
> --- a/Documentation/git-branch.txt
> +++ b/Documentation/git-branch.txt
> @@ -195,10 +195,10 @@ start-point is either a local or remote-tracking branch.
> branch.autoSetupMerge configuration variable is true.
>
>  --set-upstream::
> -   If specified branch does not exist yet or if `--force` has been
> -   given, acts exactly like `--track`. Otherwise sets up configuration
> -   like `--track` would when creating the branch, except that where
> -   branch points to is not changed.
> +   This option is no longer supported and will be removed in the future.
> +   Consider using --track or --set-upstream-to instead.
> ++
> +Note: This could possibly become an alias of --set-upstream-to in the future.

Maybe the final note could be removed? Someone who is looking up
--set-upstream because Git just "crashed" on them will only want to know
what they should do instead. Our thoughts about the future are perhaps
not that interesting. (I sort of wonder if this option needs to be
documented at all, especially if this doesn't say anything more than
the die() just did.)

Also, I'm wondering if it should be "has been removed" instead of "will
be removed"? /Implementation-wise/, it has not been removed yet, but to
the user, it has. So maybe just "This option has been removed. Consider
using --track or --set-upstream-to instead." The same below.

I don't know if it's worth trying to use PARSE_OPT_HIDDEN in the
options-struct?

Martin

>  -u ::
>  --set-upstream-to=::
> diff --git a/builtin/branch.c b/builtin/branch.c
> index a3bd2262b..2fcb6f7e5 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -755,8 +755,6 @@ int cmd_branch(int argc, const char **argv, const char 
> *prefix)
> strbuf_release();
> } else if (argc > 0 && argc <= 2) {
> struct branch *branch = branch_get(argv[0]);
> -   int branch_existed = 0, remote_tracking = 0;
> -   struct strbuf buf = STRBUF_INIT;
>
> if (!strcmp(argv[0], "HEAD"))
> die(_("it does not make sense to create 'HEAD' 
> manually"));
> @@ -768,28 +766,11 @@ int cmd_branch(int argc, const char **argv, const char 
> *prefix)
> die(_("-a and -r options to 'git branch' do not make 
> sense with a branch name"));
>
> if (track ==

Re: [PATCH 0/6] clean up parsing of maybe_bool

2017-08-08 Thread Martin Ågren

On 8 August 2017 at 19:04, Junio C Hamano  wrote:
> Martin Ågren  writes:
>
>> Thanks, both of you. I could wait a couple of days to see if there are
>> other things to address, then send a v2 with a more aggressive patch 5?
>
> Sounds like a plan.  If there aren't anything else, I personally do
> not mind using what is already on the list without v2, though.  That
> would be less work/churn for me ;-)

Sure. I'll try to remember to send a patch to kill git_config_maybe_bool
some time from now. (At which point you'll get more work.. ;) )

Re: [RFC] clang-format: outline the git project's coding style

2017-08-08 Thread Stefan Beller

On Tue, Aug 8, 2017 at 5:05 AM, Johannes Schindelin
 wrote:
> Hi Brandon,
>
> On Mon, 7 Aug 2017, Brandon Williams wrote:
>
>> Add a '.clang-format' file which outlines the git project's coding
>> style.  This can be used with clang-format to auto-format .c and .h
>> files to conform with git's style.
>>
>> Signed-off-by: Brandon Williams 
>> ---
>>
>> I'm sure this sort of thing comes up every so often on the list but back at
>> git-merge I mentioned how it would be nice to not have to worry about style
>> when reviewing patches as that is something mechanical and best left to a
>> machine (for the most part).
>
> Amen.
>
> If I never have to see a review mentioning an unwrapped line, I am quite
> certain I will be quite content.
>
> Ciao,
> Dscho

As a thought experiment I'd like to propose to take it one step further:

  If the code was formatted perfectly in one style such that a formatter for
  this style would not produce changes when rerun again on the code, then
  each individual could have a clean/smudge filter to work in their preferred
  style, and only the exchange and storage of code is in a mutual agreed
  style. If the mutually agreed style is close to what I prefer, I don't have to
  use clean/smudge filters.

Additionally to this patch, we'd want to either put a note into
SubmittingPatches or Documentation/gitworkflows.txt to hint at
how to use this formatting to just affect the patch that is currently
worked on or rather a pre-commit hook?

Re: [PATCHv2 3/4] imap_send: setup_curl: retreive credentials if not set in config file

2017-08-08 Thread Martin Ågren

On 8 August 2017 at 09:48, Nicolas Morey-Chaisemartin
 wrote:
> Up to this point, the curl mode only supported getting the username
> and password from the gitconfig file while the legacy mode could also
> fetch them using the credential API.
>
> Signed-off-by: Nicolas Morey-Chaisemartin 
> ---
>  imap-send.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/imap-send.c b/imap-send.c
> index 448a4a0b3..5e80edaff 100644
> --- a/imap-send.c
> +++ b/imap-send.c
> @@ -1398,7 +1398,7 @@ static int append_msgs_to_imap(struct imap_server_conf 
> *server,
>  }
>
>  #ifdef USE_CURL_FOR_IMAP_SEND
> -static CURL *setup_curl(struct imap_server_conf *srvc)
> +static CURL *setup_curl(struct imap_server_conf *srvc, struct credential 
> *cred)

Hmm, yeah, that really did pollute the interface. :)

>  {
> CURL *curl;
> struct strbuf path = STRBUF_INIT;
> @@ -1411,6 +1411,7 @@ static CURL *setup_curl(struct imap_server_conf *srvc)
> if (!curl)
> die("curl_easy_init failed");
>
> +   server_fill_credential(, cred);
> curl_easy_setopt(curl, CURLOPT_USERNAME, server.user);
> curl_easy_setopt(curl, CURLOPT_PASSWORD, server.pass);
>
> @@ -1460,8 +1461,9 @@ static int curl_append_msgs_to_imap(struct 
> imap_server_conf *server,
> struct buffer msgbuf = { STRBUF_INIT, 0 };
> CURL *curl;
> CURLcode res = CURLE_OK;
> +   struct credential cred = CREDENTIAL_INIT;
>
> -   curl = setup_curl(server);
> +   curl = setup_curl(server, );
> curl_easy_setopt(curl, CURLOPT_READDATA, );
>
> fprintf(stderr, "sending %d message%s\n", total, (total != 1) ? "s" : 
> "");
> @@ -1496,6 +1498,10 @@ static int curl_append_msgs_to_imap(struct 
> imap_server_conf *server,
> curl_easy_cleanup(curl);
> curl_global_cleanup();
>
> +   if (res == CURLE_OK && cred.username)
> +   credential_approve();

Maybe a similar call to credential_reject() here? I guess all we'll
know is that some sort of error happened, possibly credentials-related,
possibly not. Just a thought.

> +   credential_clear();
> +
> return res == CURLE_OK;
>  }
>  #endif
> --
> 2.14.0.rc1.16.gcc208c97c
>
>

[PATCH v2] am: fix signoff when other trailers are present

2017-08-08 Thread Phillip Wood

From: Phillip Wood 

If there was no 'Signed-off-by:' trailer but another trailer such as
'Reported-by:' then 'git am --signoff' would add a blank line between
the existing trailers and the added 'Signed-off-by:' line. e.g.

Rebase accepts '--rerere-autoupdate' as an option but only honors
it if '-m' is also given. Fix it for a non-interactive rebase by
passing on the option to 'git am' and 'git cherry-pick'.

Reported-by: Junio C Hamano 

Signed-off-by: Phillip Wood 

Fix by using the code provided for this purpose in sequencer.c.
Change the tests so that they check the formatting of the
'Signed-off-by:' lines rather than just grepping for them.

Signed-off-by: Phillip Wood 
---
I've removed the call to ignore_non_trailer(), otherwise this is
unchanged from v1

 builtin/am.c  | 26 +--
 t/t4150-am.sh | 83 +--
 2 files changed, 64 insertions(+), 45 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 
c973bd96dcb5d630d56e935733bfa4530ccd2872..3aaef59676452fd2e7c6d4a375dc7c95558744c6
 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1181,34 +1181,10 @@ static void NORETURN die_user_resolve(const struct 
am_state *state)
  */
 static void am_append_signoff(struct am_state *state)
 {
-   char *cp;
-   struct strbuf mine = STRBUF_INIT;
struct strbuf sb = STRBUF_INIT;
 
strbuf_attach(, state->msg, state->msg_len, state->msg_len);
-
-   /* our sign-off */
-   strbuf_addf(, "\n%s%s\n",
-   sign_off_header,
-   fmt_name(getenv("GIT_COMMITTER_NAME"),
-getenv("GIT_COMMITTER_EMAIL")));
-
-   /* Does sb end with it already? */
-   if (mine.len < sb.len &&
-   !strcmp(mine.buf, sb.buf + sb.len - mine.len))
-   goto exit; /* no need to duplicate */
-
-   /* Does it have any Signed-off-by: in the text */
-   for (cp = sb.buf;
-cp && *cp && (cp = strstr(cp, sign_off_header)) != NULL;
-cp = strchr(cp, '\n')) {
-   if (sb.buf == cp || cp[-1] == '\n')
-   break;
-   }
-
-   strbuf_addstr(, mine.buf + !!cp);
-exit:
-   strbuf_release();
+   append_signoff(, 0, 0);
state->msg = strbuf_detach(, >msg_len);
 }
 
diff --git a/t/t4150-am.sh b/t/t4150-am.sh
index 
44807e218d7016f58bd41b89af71104a37f31a8b..73b67b4280b99e0328e201e6b69c3d88b766ea84
 100755
--- a/t/t4150-am.sh
+++ b/t/t4150-am.sh
@@ -40,6 +40,8 @@ test_expect_success 'setup: messages' '
dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio
dignissim qui blandit praesent luptatum zzril delenit augue duis dolore 
te
feugait nulla facilisi.
+
+   Reported-by: A N Other 
EOF
 
cat >failmail <<-\EOF &&
@@ -93,7 +95,7 @@ test_expect_success setup '
echo world >>file &&
git add file &&
test_tick &&
-   git commit -s -F msg &&
+   git commit -F msg &&
git tag second &&
 
git format-patch --stdout first >patch1 &&
@@ -124,8 +126,6 @@ test_expect_success setup '
echo "Date: $GIT_AUTHOR_DATE" &&
echo &&
sed -e "1,2d" msg &&
-   echo &&
-   echo "Signed-off-by: $GIT_COMMITTER_NAME 
<$GIT_COMMITTER_EMAIL>" &&
echo "---" &&
git diff-tree --no-commit-id --stat -p second
} >patch1-stgit.eml &&
@@ -144,8 +144,6 @@ test_expect_success setup '
echo "# Parent  $_z40" &&
cat msg &&
echo &&
-   echo "Signed-off-by: $GIT_COMMITTER_NAME 
<$GIT_COMMITTER_EMAIL>" &&
-   echo &&
git diff-tree --no-commit-id -p second
} >patch1-hg.eml &&
 
@@ -470,13 +468,15 @@ test_expect_success 'am --signoff adds Signed-off-by: 
line' '
git reset --hard &&
git checkout -b master2 first &&
git am --signoff expected &&
-   echo "Signed-off-by: $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>" 
>>expected &&
-   git cat-file commit HEAD^ | grep "Signed-off-by:" >actual &&
-   test_cmp expected actual &&
-   echo "Signed-off-by: $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>" 
>expected &&
-   git cat-file commit HEAD | grep "Signed-off-by:" >actual &&
-   test_cmp expected actual
+   {
+   printf "third\n\nSigned-off-by: %s <%s>\n\n" \
+   "$GIT_COMMITTER_NAME" "$GIT_COMMITTER_EMAIL" &&
+   cat msg &&
+   printf "Signed-off-by: %s <%s>\n\n" \
+   "$GIT_COMMITTER_NAME" "$GIT_COMMITTER_EMAIL"
+   } >expected-log &&
+   git log --pretty=%B -2 HEAD >actual &&
+   test_cmp expected-log actual
 '
 
 test_expect_success 'am stays in branch' '
@@ -486,17

Re: [PATCH] Fix delta integer overflows

2017-08-08 Thread Johannes Schindelin

Hi Martin,

On Tue, 8 Aug 2017, Martin Koegler wrote:

> On Mon, Aug 07, 2017 at 09:39:12PM +0200, Johannes Schindelin wrote:
> > If you want to work on data in memory, then size_t is the appropriate data
> > type. We already use it elsewhere. Let's use it here, too, without the
> > intermediate bump from the incorrect `int` to the equally incorrect
> > `long`.
> 
> I disagree with "We already use it elsewhere.".

By "it" I meant "size_t".

> The whole delta code uses "unsigned long" - look at delta.h. Look at
> unpack-objects.c. Or cache.h. Or pack-objects.c. Or index-pack.c.

I know that. It is a major bug in the source code.

> Other possible cases:
> git grep "unsigned long" |grep size

Yes, even more bugs.

> So the codebase still suggests, that "unsigned long" is the data type
> for storing object sizes.

And it would be wrong, and we know it already. Most importantly, Junio
knows it already.

> I would be fine with resubmitting a patch using size_t/off_t for the
> touched parts - changing the whole core code is a too invasive change
> for a bug fix.

Sorry, my mistake: I never meant to burden you with the invasive change.
I only wanted you to change the `int` to `size_t` right away.

Thanks,
Johannes

Re: [PATCHv2 3/4] imap_send: setup_curl: retreive credentials if not set in config file

2017-08-08 Thread Nicolas Morey-Chaisemartin



Le 08/08/2017 à 12:09, Martin Ågren a écrit :
> On 8 August 2017 at 09:48, Nicolas Morey-Chaisemartin
>  wrote:
>> Up to this point, the curl mode only supported getting the username
>> and password from the gitconfig file while the legacy mode could also
>> fetch them using the credential API.
>>
>> Signed-off-by: Nicolas Morey-Chaisemartin 
>> ---
>>  imap-send.c | 10 --
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/imap-send.c b/imap-send.c
>> index 448a4a0b3..5e80edaff 100644
>> --- a/imap-send.c
>> +++ b/imap-send.c
>> @@ -1398,7 +1398,7 @@ static int append_msgs_to_imap(struct imap_server_conf 
>> *server,
>>  }
>>
>>  #ifdef USE_CURL_FOR_IMAP_SEND
>> -static CURL *setup_curl(struct imap_server_conf *srvc)
>> +static CURL *setup_curl(struct imap_server_conf *srvc, struct credential 
>> *cred)
> Hmm, yeah, that really did pollute the interface. :)
>
>>  {
>> CURL *curl;
>> struct strbuf path = STRBUF_INIT;
>> @@ -1411,6 +1411,7 @@ static CURL *setup_curl(struct imap_server_conf *srvc)
>> if (!curl)
>> die("curl_easy_init failed");
>>
>> +   server_fill_credential(, cred);
>> curl_easy_setopt(curl, CURLOPT_USERNAME, server.user);
>> curl_easy_setopt(curl, CURLOPT_PASSWORD, server.pass);
>>
>> @@ -1460,8 +1461,9 @@ static int curl_append_msgs_to_imap(struct 
>> imap_server_conf *server,
>> struct buffer msgbuf = { STRBUF_INIT, 0 };
>> CURL *curl;
>> CURLcode res = CURLE_OK;
>> +   struct credential cred = CREDENTIAL_INIT;
>>
>> -   curl = setup_curl(server);
>> +   curl = setup_curl(server, );
>> curl_easy_setopt(curl, CURLOPT_READDATA, );
>>
>> fprintf(stderr, "sending %d message%s\n", total, (total != 1) ? "s" 
>> : "");
>> @@ -1496,6 +1498,10 @@ static int curl_append_msgs_to_imap(struct 
>> imap_server_conf *server,
>> curl_easy_cleanup(curl);
>> curl_global_cleanup();
>>
>> +   if (res == CURLE_OK && cred.username)
>> +   credential_approve();
> Maybe a similar call to credential_reject() here? I guess all we'll
> know is that some sort of error happened, possibly credentials-related,
> possibly not. Just a thought.

Checking the doc, there is actually a CURLE_LOGIN_DENIED return code which 
means authentication failed.
I'll fix this in v3

Nicolas

Re: [PATCH v1] am: fix signoff when other trailers are present

2017-08-08 Thread Phillip Wood

On 07/08/17 18:49, Junio C Hamano wrote:
> Phillip Wood  writes:
> 
>> From: Phillip Wood 
>>
>> If there was no 'Signed-off-by:' trailer but another trailer such as
>> 'Reported-by:' then 'git am --signoff' would add a blank line between
>> the existing trailers and the added 'Signed-off-by:' line. e.g.
>>
>> Rebase accepts '--rerere-autoupdate' as an option but only honors
>> it if '-m' is also given. Fix it for a non-interactive rebase by
>> passing on the option to 'git am' and 'git cherry-pick'.
>>
>> Reported-by: Junio C Hamano 
>>
>> Signed-off-by: Phillip Wood 
>>
>> Fix by using the code provided for this purpose in sequencer.c.
>> Change the tests so that they check the formatting of the
>> 'Signed-off-by:' lines rather than just grepping for them.
>>
>> Signed-off-by: Phillip Wood 
>> ---
>> I'm not sure if this should be calling ignore_non_trailer() or not -
>> git commit does but git cherry-pick does not. This follows commit and
>> cherry-pick in ignoring the value of trailer.ifExists for the signoff.
>> I'm a bit surprised they do that - is it correct?
> 
> These built-in "sign-off" machinery long predates the "trailer"
> thing, so I am not surprised if they do not behave the same.  I
> vaguely recall having discussions on this earlier this year, but
> details escape me.  

Ah, that explains it. I did a quick search on public-inbox but didn't
turn up any obvious discussion about --signoff and trailer config. If
that's the behavior people are used to then lets leave it as it is
unless there is a call for it to be changed.

> Asking Jonathan, who did a series that ends at 44dc738a ("sequencer:
> add newline before adding footers", 2017-04-26), and Christian, who
> is the original contirbutor to the "trailer" machinery, for input.
>

Re: [PATCH v1] am: fix signoff when other trailers are present

2017-08-08 Thread Phillip Wood

On 07/08/17 19:08, Jonathan Tan wrote:
> On Mon, 07 Aug 2017 10:49:28 -0700
> Junio C Hamano  wrote:
> 
>> Phillip Wood  writes:
>>
>>> From: Phillip Wood 
>>>
>>> If there was no 'Signed-off-by:' trailer but another trailer such as
>>> 'Reported-by:' then 'git am --signoff' would add a blank line between
>>> the existing trailers and the added 'Signed-off-by:' line. e.g.
>>>
>>> Rebase accepts '--rerere-autoupdate' as an option but only honors
>>> it if '-m' is also given. Fix it for a non-interactive rebase by
>>> passing on the option to 'git am' and 'git cherry-pick'.
>>>
>>> Reported-by: Junio C Hamano 
>>>
>>> Signed-off-by: Phillip Wood 
>>>
>>> Fix by using the code provided for this purpose in sequencer.c.
>>> Change the tests so that they check the formatting of the
>>> 'Signed-off-by:' lines rather than just grepping for them.
>>>
>>> Signed-off-by: Phillip Wood 
>>> ---
>>> I'm not sure if this should be calling ignore_non_trailer() or not -
>>> git commit does but git cherry-pick does not. This follows commit and
>>> cherry-pick in ignoring the value of trailer.ifExists for the signoff.
>>> I'm a bit surprised they do that - is it correct?
>>
>> These built-in "sign-off" machinery long predates the "trailer"
>> thing, so I am not surprised if they do not behave the same.  I
>> vaguely recall having discussions on this earlier this year, but
>> details escape me.  
>>
>> Asking Jonathan, who did a series that ends at 44dc738a ("sequencer:
>> add newline before adding footers", 2017-04-26), and Christian, who
>> is the original contirbutor to the "trailer" machinery, for input.
> 
> Regarding ignore_non_trailer(), I believe that's because "git commit"
> wants to tolerate blank lines and comments after the "real" commit
> message, whereas "git cherry-pick" doesn't need to. As far as I can
> tell, this "git am" case is similar to "git cherry-pick".
> 
> Regarding trailer.ifExists, the then existing behavior was to refrain
> from writing a new sign-off line only if it would be a duplicate of the
> last one, regardless of trailer.ifExists (as Junio says, back then, the
> sign-off mechanism and the trailer mechanism were independent). I
> preserved that behavior.
> 
Hi Jonathan

Thanks for the background. I'll remove the call to ignore_non_trailer()

Re: Partial clone design (with connectivity check for locally-created objects)

2017-08-08 Thread Ben Peart




On 8/7/2017 3:21 PM, Jonathan Nieder wrote:

Hi,

Ben Peart wrote:

On Fri, 04 Aug 2017 15:51:08 -0700
Junio C Hamano  wrote:

Jonathan Tan  writes:



"Imported" objects must be in a packfile that has a ".remote"
file with arbitrary text (similar to the ".keep" file). They come from
clones, fetches, and the object loader (see below).
...

A "homegrown" object is valid if each object it references:
  1. is a "homegrown" object,
  2. is an "imported" object, or
  3. is referenced by an "imported" object.


Overall it captures what was discussed, and I think it is a good
start.


I missed the offline discussion and so am trying to piece together
what this latest design is trying to do.  Please let me know if I'm
not understanding something correctly.


I believe
https://public-inbox.org/git/cover.1501532294.git.jonathanta...@google.com/
and the surrounding thread (especially
https://public-inbox.org/git/xmqqefsudjqk@gitster.mtv.corp.google.com/)
is the discussion Junio is referring to.

[...]

This segmentation is what is driving the need for the object loader
to build a new local pack file for every command that has to fetch a
missing object.  For example, we can't just write a tree object from
a "partial" clone into the loose object store as we have no way for
fsck to treat them differently and ignore any missing objects
referenced by that tree object.


That's related and how it got lumped into this proposal, but it's not
the only motivation.

Other aspects:

  1. using pack files instead of loose objects means we can use deltas.
 This is the primary motivation.

  2. pack files can use reachability bitmaps (I realize there are
 obstacles to getting benefit out of this because git's bitmap
 format currently requires a pack to be self-contained, but I
 thought it was worth mentioning for completeness).

  3. existing git servers are oriented around pack files; they can
 more cheaply serve objects from pack files in pack format,
 including reusing deltas from them.

  4. file systems cope better with a few large files than many small
 files

[...]

We all know that git doesn't scale well with a lot of pack files as
it has to do a linear search through all the pack files when
attempting to find an object.  I can see that very quickly, there
would be a lot of pack files generated and with gc ignoring
"partial" pack files, this would never get corrected.


Yes, that's an important point.  Regardless of this proposal, we need
to get more aggressive about concatenating pack files (e.g. by
implementing exponential rollup in "git gc --auto").


In our usage scenarios, _all_ of the objects come from "partial"
clones so all of our objects would end up in a series of "partial"
pack files and would have pretty poor performance as a result.


Can you say more about this?  Why would the pack files (or loose
objects, for that matter) never end up being consolidated into few
pack files?



Our initial clone is very sparse - we only pull down the commit we are 
about to checkout and none of the blobs. All missing objects are then 
downloaded on demand (and in this proposal, would end up in a "partial" 
pack file).  For performance reasons, we also (by default) download a 
server computed pack file of commits and trees to pre-populate the local 
cache.


Without modification, fsck, repack, prune, gc will trigger every object 
in the repo to be downloaded.  We punted for now and just block those 
commands but eventually they need to be aware of missing objects so that 
they do not cause them to be downloaded.  Jonathan is already working on 
this for fsck in another patch series.



[...]

That thinking did lead me back to wondering again if we could live
with a repo specific flag.  If any clone/fetch was "partial" the
flag is set and fsck ignore missing objects whether they came from a
"partial" remote or not.

I'll admit it isn't as robust if someone is mixing and matching
remotes from different servers some of which are partial and some of
which are not.  I'm not sure how often that would actually happen
but I _am_ certain a single repo specific flag is a _much_ simpler
model than anything else we've come up with so far.


The primary motivation in this thread is locally-created objects, not
objects obtained from other remotes.  Objects obtained from other
remotes are more of an edge case.



Thank you - that helps me to better understand the requirements of the 
problem we're trying to solve.  In short, that means what we really need 
is a way to identify locally created objects so that fsck can do a 
complete connectivity check on them.  I'll have to think about a good 
way to do that - we've talked about a few but each has a different set 
of trade-offs and none of them are great (yet :)).



Thanks for your thoughtful comments.

Jonathan

Re: What's cooking in git.git (Jul 2017, #09; Mon, 31)

2017-08-08 Thread Ævar Arnfjörð Bjarmason

On Mon, Aug 07 2017, Igor Djordjevic jotted:

> On 07/08/2017 23:25, Igor Djordjevic wrote:
>> On 06/08/2017 22:26, Ævar Arnfjörð Bjarmason wrote:
>>> On Sat, Aug 05 2017, Junio C. Hamano jotted:
 I actually consider "branch" to *never* invoking a checkout.  Even
 when "git branch -m A B" happens to be done when your checked out
 branch is A and you end up being on B.  That is not a "checkout".
>>>
>>> I think we just have a different mental model of what "checkout"
>>> means. In my mind any operation that updates the HEAD to point to a new
>>> branch is a checkout of that branch.
>>
>> If I may, from a side-viewer`s point of view, it seems you`re
>> thinking in low-level implementation details, where what Junio
>> describes seems more as a high-level, conceptual/end-user`s point of
>> view.

Yeah, I think that's a fair summary. Also I didn't mean to de-rail this
whole thread on what "checkout" really means, just explain what I meant
with previous comments, since there seemed to be confusion about that.

>> Needing to update HEAD reference once we "rename" a branch, too, what
>> you consider a "checkout", seems to be required only because branch
>> name _is_ the branch reference in Git, so we need to update HEAD to
>> point to a new/renamed branch reference -- but it`s still the same
>> branch, conceptually.

It's not *required* we could do one of three things:

 1) Do what we do now, i.e. rename the branch/reflog & check out the new
name.

 2) Rename the branch/reflog and checkout HEAD^0, i.e. say "the branch
is now elsewhere, but we haven't moved your commit".

 3) Just not run replace_each_worktree_head_symref() which would end up
 on a branch with no commits, i.e. an orphan branch.

Now, I think 2 & 3 are pretty nonsensical and wouldn't ever propose we
should do that, but it's illustrative that #1 is not some required
inevitability in terms of explaining what's happening with the new name
being checked out (i.e. HEAD being updated).

>> Documentation for "git-checkout" states that it is used to "*Switch
>> branches*...[snip]", and that is not what happens here.

That's just the summary at the top but not the full story of what
git-checkout does. E.g. you can checkout a bare SHA1 which is not
switching branches, or a tag or whatever.

>> Implementation-wise it does because we can`t do it differently at the
>> moment, but in user`s eyes it`s still the same branch, so no switch
>> is made as far as the user is concerned.

Kind of, it's also worthwhile to think about that in some sense no
switch would be performed as far as the user is concerned by taking
option #2, i.e. we'd be in the same working tree / you could still make
commits.

You just couldn't make new commits on your "master" which is now called
"topic" and get new commits on "topic". I think it makes sense to do
that, but again, it's illustrative that it's not inevitable for
discussing the implementation.

>> In a different implementation, where branches would have permanent
>> references other than their names, no HEAD update would be needed as
>> the reference would still be the same, no matter the name change,
>> making the `git branch -m` situation clear even from your standpoint,
>> I`d say.
>>
 Really from the end-user's point of view that is not a checkout.
 The user renamed the branch A and the same conceptual entity, which
 is a branch, is now called B.  If that branch was what was checked
 out (IOW, if that branch was what would be grown by one commit if
 the user did "git commit"), then now that branch's name is B.  It is
 natural if you ask "symbolic-ref HEAD" what branch is checked out
 after renaming A to B (and A happened to be what was checked out),
 the answer chould be B.

 It's like the city you live in changed the name of the street your
 house is on.  You do not call movers, you do not do anything, but
 your address changes.
>>>
>>> Yeah I see what you mean, although this analogy rapidly breaks down when
>>> you poke at it as shown above. My house (a sha1) can be on any number of
>>> streets and new ones can be added/removed all the time without changing
>>> where my house is at.
>>
>> I may be missing something, but I find the house/address analogy a
>> good one, actually, as I understood that "house" resembles a branch
>> reference HEAD is pointing to, not a sha1.
>>
>> Even further, and that might be the point of confusion, "house" seems
>> to be more like a "permanent branch reference" I mentioned above,
>> where your address can change (branch being renamed), but you would
>> still be in the same house (HEAD would still point to the same
>> permanent branch reference).
>>
>> If you move to another house, only then would HEAD change to point to
>> another (permanent) branch reference (a different house), and that
>> would be a checkout.

I've yet to see a good real-world analogy of "DAG with labels", which is
all git really is, that doesn't break down

Re: Partial clone design (with connectivity check for locally-created objects)

2017-08-08 Thread Ben Peart




On 8/7/2017 3:41 PM, Junio C Hamano wrote:

Ben Peart  writes:


My concern with this proposal is the combination of 1) writing a new
pack file for every git command that ends up bringing down a missing
object and 2) gc not compressing those pack files into a single pack
file.


Your noticing these is a sign that you read the outline of the
design correctly, I think.

The basic idea is that the local fsck should tolerate missing
objects when they are known to be obtainable from that external
service, but should still be able to diagnose missing objects that
we do not know if the external service has, especially the ones that
have been newly created locally and not yet made available to them
by pushing them back.



This helps me a lot as now I think I understand the primary requirement 
we're trying to solve for.  Let me rephrase it and see if this makes sense:


We need to be able to identify whether an object was created locally 
(and should pass more strict fsck/connectivity tests) or whether it came 
from a remote (and so any missing objects could presumably be fetched 
from the server).


I agree it would be nice to solve this (and not just punt fsck - even if 
it is an opt-in behavior).


We've discussed a couple of different possible solutions, each of which 
have different tradeoffs.  Let me try to summarize here and perhaps 
suggest some other possibilities:



Promised list
-
This provides an external data structure that allowed us to flag objects 
that came from a remote server (vs created locally).


The biggest drawback is that this data structure can get very large and 
become difficult/expensive to generate/transfer/maintain.


It also (at least in one proposal) required protocol and server side 
changes to support it.



Annotated via filename
--
This idea is to annotate the file names of objects that came from a 
remote server (pack files and loose objects) with a unique file 
extension (.remote) that indicates whether they are locally created or not.


To make this work, git must understand about both types of loose objects 
and pack files and search in both locations when looking for objects.


Another drawback of this is that commands (repack, gc) that optimize 
loose objects and pack files must now be aware of the different 
extensions and handle both while not merging remote and non-remote objects.


In short, we're creating separate object stores - one for locally 
created objects and one for everything else.



Now a couple of different ideas:

Annotated via flags
===
The fundamental idea here is that we add the ability to flag locally 
created objects on the object itself.


Given that at the core, "Git is a simple key-value data store" can we 
take advantage of that fact and include a "locally created" bit as a 
property on every object?


I could not think of a good way to accomplish this as it is ultimately 
changing the object format which creates rapidly expanding ripples of 
change.


For example, The object header currently includes the type a space, the 
length and a null. Even if we could add a "local" property (either by 
adding a 5th item, taking over the space, creating new object types, 
etc), the fact that the header is included in the sha1 means that push 
would become problematic as flipping the bit would change the sha and 
the trees and commits that reference it.



Local list
--
Given the number of locally created objects is usually very small in 
comparison to the total number of objects (even just due to history), it 
makes more sense to track locally created objects instead of 
promised/remote objects.


The biggest advantage of this over the "promised list" is that the 
"local list" being maintained is _significantly_ smaller (often orders 
of magnitude smaller).


Another advantage over the "promised list" solution is that it doesn't 
require any server side or protocol changes.


On the client when objects are created (write_loose_object?) the new 
objects are added to the "local list" and in the connectivity check 
(fsck) if the object is not in the "local list," the connectivity check 
can be skipped as any missing object can presumably be retrieved from 
the server.


A simple file format could be used (header + list of SHA1 values) and 
write_loose_object could do a trivial append. In fsck, the file could be 
loaded into a hashmap to make for fast existence checks.


Entries could be removed from the "local list" for objects later fetched 
from a server (though I had a hard time contriving a scenario where this 
would happen so I consider this optional).


On the surface, this seems like the simplest solution that meets the 
stated requirements.



Object DB
-
This is a different way of providing separate object stores than the 
"Annotated via filename" proposal. It should be a cleaner/more elegant 
solution that enables several other capabilities but it is also more

Re: [RFC] imap-send: escape backslash in password

2017-08-08 Thread Junio C Hamano

Jeff King  writes:

> I think we're not quite ready to switch to curl based on comments in the
> nearby thread. But just for reference, since I started looking into
> this...
>
> The defines in the Makefile turn on USE_CURL_FOR_IMAP_SEND want curl
> 7.34.0. That's only from 2013, which is probably recent enough that it
> may cause a problem (I had originally thought it was a few years older,
> but I forgot the curl version hex encoding; 072200 is 7.34.0).
>
> For comparison, nothing older than curl 7.19.4 will work for building
> Git since v2.12.0, as we added some unconditional uses of CURLPROTO_*
> there. Nobody seems to have noticed or complained. I pointed this out a
> few months ago[1] and suggested we clean up some of the more antiquated
> #if blocks in http.c that don't even build. There was some complaint
> that we should keep even these ancient versions working, but the
> compile error is still in "master".
>
> So it's not clear to me that anybody cares about going that far back
> (which is mid-2009), but I'd guess that 2013 might cause some problems.
>
> [1] 
> https://public-inbox.org/git/20170404025438.bgxz5sfmrawqs...@sigill.intra.peff.net/
> if you're curious (you were offline for a while at that time, I
> think).

Thanks for digging.  It would not help the issue on this thread at
all.  While I agree with your conclusion in the quoted thread:

I think it might be nice to declare a "too old" version, though,
just so we can stop adding _new_ ifdefs. Maybe 7.11.1 is that
version now, and in another few years we can bump to 7.16.0. :)

it appears that we silently declared it to 7.19.4 and found out that
nobody complained, without us having to wait for a few years?

Re: [PATCH v2 0/5] Fsck for lazy objects, and (now) actual invocation of loader

2017-08-08 Thread Ben Peart




On 7/31/2017 5:02 PM, Jonathan Tan wrote:

Besides review changes, this patch set now includes my rewritten
lazy-loading sha1_file patch, so you can now do this (excerpted from one
of the tests):

 test_create_repo server
 test_commit -C server 1 1.t abcdefgh
 HASH=$(git hash-object server/1.t)
 
 test_create_repo client

 test_must_fail git -C client cat-file -p "$HASH"
 git -C client config core.repositoryformatversion 1
 git -C client config extensions.lazyobject \
 "\"$TEST_DIRECTORY/t0410/lazy-object\" \"$(pwd)/server/.git\""
 git -C client cat-file -p "$HASH"

with fsck still working. Also, there is no need for a list of promised
blobs, and the long-running process protocol is being used.

Changes from v1:
  - added last patch that supports lazy loading
  - clarified documentation in "introduce lazyobject extension" patch
(following Junio's comments [1])

As listed in the changes above, I have rewritten my lazy-loading
sha1_file patch to no longer use the list of promises. Also, I have
added documentation about the protocol used to (hopefully) the
appropriate places.


Glad to see the removal of the promises.  Given the ongoing 
conversation, I'm interested to see how you are detecting locally create 
objects vs those downloaded from a server.




This is a minimal implementation, hopefully enough of a foundation to be
built upon. In particular, I haven't added the environment variable to
suppress lazy loading, and the lazy loading protocol only supports one
object at a time.


We can add multiple object support to the protocol when we get to the 
point that we have code that will utilize it.




Other work
--

This differs slightly from Ben Peart's patch [2] in that the
lazy-loading functionality is provided through a configured shell
command instead of a hook shell script. I envision commands like "git
clone", in the future, needing to pre-configure lazy loading, and I
think that it will be less surprising to the user if "git clone" wrote a
default configuration instead of a default hook.


This was on my "todo" list to investigate as I've been told it can 
enable people to use taskset to set CPU affinity and get some 
significant performance wins. I'd be interested to see if it actually 
helps here at all.




This also differs from Christian Couder's patch set [3] that implement a
larger-scale object database, in that (i) my patch set does not support
putting objects into external databases, and (ii) my patch set requires
the lazy loader to make the objects available in the local repo, instead
of allowing the objects to only be stored in the external database.


This is the model we're using today so I'm confident it will meet our 
requirements.




[1] https://public-inbox.org/git/xmqqzibpn1zh@gitster.mtv.corp.google.com/
[2] https://public-inbox.org/git/20170714132651.170708-2-benpe...@microsoft.com/
[3] https://public-inbox.org/git/20170620075523.26961-1-chrisc...@tuxfamily.org/

Jonathan Tan (5):
   environment, fsck: introduce lazyobject extension
   fsck: support refs pointing to lazy objects
   fsck: support referenced lazy objects
   fsck: support lazy objects as CLI argument
   sha1_file: support loading lazy objects

  Documentation/Makefile |   1 +
  Documentation/gitattributes.txt|  54 ++
  Documentation/gitrepository-layout.txt |   3 +
  .../technical/long-running-process-protocol.txt|  50 +
  Documentation/technical/repository-version.txt |  23 +
  Makefile   |   1 +
  builtin/cat-file.c |   2 +
  builtin/fsck.c |  25 -
  cache.h|   4 +
  environment.c  |   1 +
  lazy-object.c  |  80 +++
  lazy-object.h  |  12 +++
  object.c   |   7 ++
  object.h   |  13 +++
  setup.c|   7 +-
  sha1_file.c|  44 +---
  t/t0410-lazy-object.sh | 113 +
  t/t0410/lazy-object| 102 +++
  18 files changed, 478 insertions(+), 64 deletions(-)
  create mode 100644 Documentation/technical/long-running-process-protocol.txt
  create mode 100644 lazy-object.c
  create mode 100644 lazy-object.h
  create mode 100755 t/t0410-lazy-object.sh
  create mode 100755 t/t0410/lazy-object

Re: [PATCH] git svn fetch: Create correct commit timestamp when using --localtime

2017-08-08 Thread Junio C Hamano

Eric Wong  writes:

> Junio C Hamano  wrote:
>> Urs Thuermann  writes:
>> 
>> > In parse_svn_date() prepend the correct UTC offset to the timestamp
>> > returned.  This is the offset in effect at the commit time instead of
>> > the offset in effect at calling time.
>> >
>> > Signed-off-by: Urs Thuermann 
>> > ---
>> >  perl/Git/SVN.pm | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> Thanks; sounds sensible.  
>> 
>> Eric?
>
> Yep, seems alright.  Can you apply directly?
> Been a bit preoccupied as of late.  Thanks.

Surely, I'll just add your Reviewed-by: myself ;-)

Thanks.

Dear Beloved Friend

2017-08-08 Thread Mrs Marois

Dear Beloved Friend,

I am Mrs Nicole Benoite Marois and I have been suffering from ovarian cancer 
disease and the doctor says that i have just few weeks to leave.  I am from 
(Paris) France but based in Benin republic since eleven years ago as a business 
woman dealing with gold exportation before the death of my husband many years 
ago.

I have $4.5 Million US Dollars at Eco-Bank here in Benin republic and I 
instructed the bank to transfer the fund to you as foreigner that will apply to 
the bank after I have gone, that they should release the fund to him/her, but 
you will assure me that you will take 50 % of the fund and give 50% to the 
orphanages home in your country for my heart to rest.

Yours fairly friend,
Mrs. Nicole Benoite Marois

Re: Can the '--set-upstream' option of branch be removed ?

2017-08-08 Thread Junio C Hamano

Kaartic Sivaraam  writes:

> Just to be sure, you mean "die() with a good message" when you say
> "fail these requests, telling them that the former option no longer is
> supported."

Yes.

> It's pretty surprising it takes almost a decade to *stop accepting* a
> bad option though many users are confused by it.
>
> "It's easier to do things than to undo them!"

Yes, that is why we try to be extra cautious when adding new things.

1 2 >

1 - 100 of 140 matches

Mail list logo