date:20181113

Re: [RFC PATCH 0/2] Fix scissors bug during merge conflict

2018-11-13 Thread Junio C Hamano

Denton Liu  writes:

> With this fix, the message becomes the following:
>
>   Merge branch 'master' into new
>
>   #  >8 
>   # Do not modify or remove the line above.
>   # Everything below it will be ignored.
>   #
>   # Conflicts:
>   #   a

I have a mixed feeling about this change and I certainly would not
call it a "fix".

The reason why we give the list of conflicted paths that is
commented out is to remind the user of them in order to help them
describe what area of the codebase had overlapping changes, why, and
how the overlap was resolved, which is relevant information when
somebody else later needs to dig into the history to understand how
the code came into today's shape and why.  For that reason, if a
careless user left conflicts list behind without describing these
details about the merge, it might be better to have the unexplained
list in the merge than nothing.

In theory, the above argument applies the same way for the paths to
be committed, but the list is fairly trivial to recreate with "git
diff $it^!", unlike "which paths had conflict", which can only be
found out by recreating the auto-merge.  So in practice, the paths
that had conflicts is more worth showing than the paths that were
modified.

So, I dunno.  If we value the "more expensive list to reproduce",
the fix might be not to place it (and possibly the comments and
everything under the scissors line) behind a "# " comment char on
the line, without moving its position.

.

Re: [PATCH] git-compat-util: prefer poll.h to sys/poll.h

2018-11-13 Thread Junio C Hamano

Đoàn Trần Công Danh   writes:

> -#ifndef NO_SYS_POLL_H
> +#if !defined(NO_POLL_H)
> +#include 
> +#elif !defined(NO_SYS_POLL_H)
>  #include 
>  #else
> +/* Pull the compat stuff */
>  #include 
>  #endif

The last comment would help readers who got "Huh?  When NO_POLL_H
and NO_SYS_POLL_H is given, we still include " without it.

Looks good.  Thanks.

Re: [PATCH 0/1] Some left-over add-on for bw/config-h

2018-11-13 Thread Jeff King

On Wed, Nov 14, 2018 at 02:52:24PM +0900, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" 
> writes:
> 
> > Back when bw/config-h was developed (and backported to Git for Windows), I
> > came up with this patch. It seems to not be strictly necessary, but I like
> > the safety of falling back to the Git directory when no common directory is
> > configured (for whatever reason).
> 
> Shouldn't that be diagnosed as an error with BUG()?  That is, if we
> know there is the current repository, and if commondir is not set,
> isn't it a bug that we want to look into in the start-up sequence?
> 
> The phrase "for whatever reason" makes me wonder if this is truly
> "defensive programming", rather than just sweeping the bug under the
> rug.
> 
> FWIW, with this toy patch applied on top of this patch, all tests
> that I usually run seem to pass.

Heh, I independently made the same change after reading the patch and
came here to report similar results.

I think the key thing here is that git_dir is never meant to be used as
a source for config. It is there to let the "includeIf gitdir" directive
work. So it would indeed be surprising and a bug if we had one but not
the other.

-Peff

Re: [PATCH 0/1] rebase: understand -C again, refactor

2018-11-13 Thread Jeff King

On Tue, Nov 13, 2018 at 04:38:24AM -0800, Johannes Schindelin via GitGitGadget 
wrote:

> Phillip Wood reported a problem where the built-in rebase did not understand
> options like -C1, i.e. it did not expect the option argument.
> 
> While investigating how to address this best, I stumbled upon 
> OPT_PASSTHRU_ARGV (which I was so far happily unaware of).

I was unaware of it, too. Looking at the OPT_PASSTHRU and its ARGV
counterpart, I think the original intent was that you'd pass through
normal last-one-wins individual options with OPT_PASSTHRU, and then
list-like options with OPT_PASSTHRU_ARGV. But here you've used the
latter to pass sets of individual last-one-wins options.

That said, I think what you've done here is way simpler and more
readable than using a bunch of OPT_PASSTHRUs would have been. And even
if it was not the original intent of the ARGV variant, I can't see any
reason to avoid doing it this way.

-Peff

Re: [PATCH] diff: align move detection error handling with other options

2018-11-13 Thread Junio C Hamano

Stefan Beller  writes:

>Subject: Re: [PATCH] diff: align move detection error handling with other 
>options

When sending an updated version of existing topic, please make sure
you indicate as such with v$n etc.  I will assume that this is to
replace the patch queued on sb/diff-color-moved-config-option-fixup
topic.  Please do not assume that all messages on the References:
header are visible in the readers' MUA to show which thread it is a
response to.  At least a hint like v$n (or mentioning the name of
the topic branch if the previous round is already in 'pu') would
make the reader realize that the References: header can be used if
the reader wants to find in what context the patch is relevant,
especially when redoing a change whose previous round is more than a
week old, as we see too many changes in a day already.

> This changes the error handling for the options --color-moved-ws
> and --color-moved-ws to be like the rest of the options.
>
> Move the die() call out of parse_color_moved_ws into the parsing
> of command line options. As the function returns a bit field, change
> its signature to return an unsigned instead of an int; add a new bit
> to signal errors. Once the error is signaled, we discard the other
> bits, such that it doesn't matter if the error bit overlaps with any
> other bit.

OK.  That sound better than the original.

>
> Signed-off-by: Stefan Beller 
> ---
>
> c.f.
> ./git -c diff.colormovedws=bogus diff HEAD
> error: unknown color-moved-ws mode 'bogus'
> fatal: unable to parse 'diff.colormovedws' from command-line config

These double messages may be something we want to fix eventually,
but I think that the issue is not specific to the config callback of
diff API, but something the config API needs to support to allow its
users produce a better single message (the first part is merely
giving a more detailed explanation why Git was unable to parse, and
should ideally be folded into the second part).

More generally, even if git_diff_ui_config() was called, as long as
we do not do the "--color-moved" and "--color-moved-ws" operations,
the user shouldn't even get an "unknown mode" message, let alone
"fatal".  The above (and other existing uses of "return -1"s in the
same config callback) should actually become an example of what not
to do, as "diff HEAD" does not *care* what value that variable has.

But again, fixing that anti-pattern is a much larger change.  Once
we move diff.c to the more modern git_config_get*() based interface,
instead of the old style git_config() callback based interface, it
would become easier to update it so that the complaint will be given
(and kill the command) only when a variable whose value actually
_matters_ is unparsable.

Thanks.

Re: [PATCH v2 00/11] fast export and import fixes and features

2018-11-13 Thread Jeff King

On Tue, Nov 13, 2018 at 04:25:49PM -0800, Elijah Newren wrote:

> This is a series of small fixes and features for fast-export and
> fast-import, mostly on the fast-export side.

I looked over this, and I think you've addressed all of my questions.

A few quick comments:

> Changes since v1 (full range-diff below):
>   - used {tilde} in asciidoc documentation to avoid subscripting and
> escaping problems

I think just using backticks would make the source more readable, as
well as make the output prettier. But that's pretty minor.

>   - renamed ABORT/ERROR enum values to help avoid further misusage

This is an improvement, I think. It's a little funny that we still have
bare names for the non-ABORT bits, though (there's less semantic
overlap, but if it's a good practice to use qualified enum names, we
should probably just do so consistently).

>   - multiple small testcase cleanups (use $ZERO_OID, remove grep -A, etc.)

Looks good.

>   - add FIXME comment to code about string_list usage

Makes sense.

>   - record Peff's idea for a future optimization in patch 8 commit message
> (is there a better place to put that??)

Seems like a reasonable place (though you are welcome to restate it if
you like).

>   - New patch (9/11): remove the unmaintained copy of fast-import stream
> format documentation at the beginning of fast-import.c

Looks good. I wondered if there might be bits that need migrated, but
given the length of time that comment has been there, it's unlikely. And
in the worst case, if somebody finds some information missing from
git-fast-import.txt, they can still consult the history.

>   - Rewrite commit message for 10/11 to match the wording Peff liked
> better, s/originally/original-oid/, and add documentation to
> git-fast-import.txt

Looks good.

>   - Rewrite commit message for 11/11; the last one didn't make sense to
> Peff.  I hope this one does.

Thanks for your patience in getting me to understand what you're trying
to do. At this point I still think that using rev-list and diff-tree is
probably the right solution for your use case.

-Peff

Re: [PATCH 10/10] fast-export: add --always-show-modify-after-rename

2018-11-13 Thread Jeff King

On Tue, Nov 13, 2018 at 09:10:36AM -0800, Elijah Newren wrote:

> > I am looking at this problem as "how do you answer question X in a
> > repository". And I think you are looking at as "I am receiving a
> > fast-export stream, and I need to answer question X on the fly".
> >
> > And that would explain why you want to get extra annotations into the
> > fast-export stream. Is that right?
> 
> I'm not trying to get information on the fly during a rewrite or
> anything like that.  This is an optional pre-rewrite step (from a
> separate invocation of the tool) where I have multiple questions I
> want to answer.  I'd like to answer them all relatively quickly, if
> possible, and I think all of them should be answerable with a single
> history traversal (plus a cat-file --batch-all-objects call to get
> object sizes, since I don't know of another way to get those).  I'd be
> fine with switching from fast-export to log or something else if it
> met the needs better.

Ah, OK. Yes, if we're just trying to query, then I think you should be
able to do what you want with the existing traversal and diff tools. And
if not, we should think about a new feature there, and not try to
shoe-horn it into fast-export.

> As far as I can tell, you're trying to split each question apart and
> do a history traversal for each, and I don't see why that's better.
> Simpler, perhaps, but it seems worse for performance.  Am I missing
> something?

I was only trying to address each possible query individually. I agree
that if you are querying both things, you should be able to do it in a
single traversal (and that is strictly better). It may require a little
more parsing of the output (e.g., `--find-object` is easy to implement
yourself looking at --raw output).

> Ah, I didn't know renames were on by default; I somehow missed that.
> Also, the rev-list to diff-tree pipe is nice, but I also need parent
> and commit timestamp information.

diff-tree will format the commit info as well (before git-log was a C
builtin, it was just a rev-list/diff-tree pipeline in a shell script).
So you can do:

  git rev-list ... |
  git diff-tree --stdin --format='%h %ct %p' --raw -r -M

and get dump very similar to what fast-export would give you.

> > >   git log -M --diff-filter=RAMD --no-abbrev --raw
> >
> > What is there besides RAMD? :)
> 
> Well, as you pointed out above, log detects renames by default,
> whereas it didn't used to.
> So, if someone had written some similar-ish history walking/parsing
> tool years ago that didn't depend need renames and was based on log
> output, there's a good chance their tool might start failing when
> rename detection was turned on by default, because instead of getting
> both a 'D' and an 'M' change, they'd get an unexpected 'R'.

Mostly I just meant: your diff-filter includes basically everything, so
why bother filtering? You're going to have to parse the result anyway,
and you can throw away uninteresting bits there.

> For my case, do I have to worry about similar future changes?  Will
> copy detection ('C') or break detection ('B') become the default in
> the future?  Do I have to worry about typechanges ('T")?  Will new
> change types be added?  I mean, the fast-export output could maybe
> change too, but it seems much less likely than with log.

If you use diff-tree, then it won't ever enable copy or break detection
without you explicitly asking for it.

> Let me try to put it as briefly as I can.  With as few traversals as
> possible, I want to:
>   * Get all blob sizes
>   * Map blob shas to filename(s) they appeared under in the history
>   * Find when files and directories were deleted (and whether they
> were later reinstated, since that means they aren't actually gone)
>   * Find sets of filenames referring to the same logical 'file'. (e.g.
> foo->bar in commit A and bar->baz in commit B mean that {foo,bar,baz}
> refer to the same 'file' so that a user has an easy report to look at
> to find out that if they just want to "keep baz and its history" then
> they need foo & bar & baz.  I need to know about things like another
> foo or bar being introduced after the rename though, since that breaks
> the connection between filenames)
>   * Do a few aggregations on the above data as well (e.g. all copies
> of postgres.exe add up to 20M -- why were those checked in anyway?,
> *.webm files in aggregate are .5G, your long-deleted src/video-server/
> directory from that aborted experimental project years ago takes up 2G
> of your history, etc.)
> 
> Right now, my best solution for this combination of questions is
> 'cat-file --batch-all-objects' plus fast-export, if I get patch 10/10
> in place.  I'm totally open to better solutions, including ones that
> don't use fast-export.

OK, I think I understand your problem better now. I don't think there's
anything fast-export can show that log/diff-tree could not, aside from
actual blob contents. But I don't think you want them (and if you did,
you can use

Re: [PATCH v3] index-pack: add ability to disable SHA-1 collision check

2018-11-13 Thread Junio C Hamano

Ævar Arnfjörð Bjarmason   writes:

> Add a new core.checkCollisions setting. On by default, it can be set
> to 'false' to disable the check for existing objects in sha1_object().
> ...
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 2004e25da2..4a3508aa9f 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -791,23 +791,24 @@ static void sha1_object(const void *data, struct 
> object_entry *obj_entry,
>  {
>   void *new_data = NULL;
>   int collision_test_needed = 0;
> + int do_coll_check = git_config_get_collision_check();
>  
>   assert(data || obj_entry);
>  
> - if (startup_info->have_repository) {
> + if (do_coll_check && startup_info->have_repository) {
>   read_lock();
>   collision_test_needed =
>   has_sha1_file_with_flags(oid->hash, OBJECT_INFO_QUICK);
>   read_unlock();
>   }
>  
> - if (collision_test_needed && !data) {
> + if (do_coll_check && collision_test_needed && !data) {
>   read_lock();
>   if (!check_collison(obj_entry))
>   collision_test_needed = 0;
>   read_unlock();
>   }
> - if (collision_test_needed) {
> + if (do_coll_check && collision_test_needed) {

If I am reading the patch correctly, The latter two changes are
totally unnecessary.  c-t-needed is true only when dO-coll_check
allowed the initial "do we even have that object?" check to kick in
and never set otherwise.

I am not all that enthused to the idea of sending a wrong message to
our users, i.e. it is sometimes OK to sacrifice the security of
collision detection.

A small change like this is easy to adjust to apply to the codebase,
even after today's codebase undergoes extensive modifications; quite
honestly, I'd prefer not having to worry about it so close to the
pre-release feature freeze.

Re: [PATCH] remote-curl: die on server-side errors

2018-11-13 Thread Jeff King

On Tue, Nov 13, 2018 at 07:49:15PM -0500, Jeff King wrote:

> Yes, the packet_read_line_buf() interface will both advance the buf
> pointer and decrement the length.  So if we want to "peek", we have to
> do so with a copy (there's a peek function if you use the packet_reader
> interface, but that might be overkill here).
> 
> You can rewrite it like this, which is a pretty faithful conversion and
> passes the tests (but see below).
> [...]

Here's a version which is less faithful, but I think does sensible
things in all cases, and is much easier to follow. I get a little
nervous just because it tightens some cases, and one never knows if
other implementations might be relying on the looseness. E.g.:

  - in the current code we'd still drop back to dumb http if the server
tells us "application/x-git-upload-pack" but the initial pktline
doesn't start with "#" (even though if it _does_ have "#" at
position 5 but isn't a valid pktline, we'd complain!)

  - right now the v2 protocol does not require the server to say
"application/x-git-upload-pack" for the content-type

This patch tightens both of those (I also made a few stylistic tweaks,
and added the ERR condition to show where it would go). I dunno. Part of
me sees this as a nice cleanup, but maybe it is better to just leave it
alone. A lot of these behaviors are just how it happens to work now, and
not part of the spec, but we don't know what might be relying on them.

diff --git a/remote-curl.c b/remote-curl.c
index 762a55a75f..1adb96311b 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -330,9 +330,61 @@ static int get_protocol_http_header(enum protocol_version 
version,
return 0;
 }
 
+static void check_smart_http(struct discovery *d, const char *service,
+struct strbuf *type)
+{
+   char *src_buf;
+   size_t src_len;
+   char *line;
+   const char *p;
+
+   if (!skip_prefix(type->buf, "application/x-", ) ||
+   !skip_prefix(p, service, ) ||
+   strcmp(p, "-advertisement"))
+   return;
+
+   /*
+* We speculatively try to read a packet, which means we must preserve
+* the original buf/len pair in some cases.
+*/
+   src_buf = d->buf;
+   src_len = d->len;
+   line = packet_read_line_buf(_buf, _len, NULL);
+   if (!line)
+   die("invalid server response; expected service, got flush 
packet");
+
+   if (skip_prefix(line, "# service=", ) && !strcmp(p, service)) {
+   /*
+* The header can include additional metadata lines, up
+* until a packet flush marker.  Ignore these now, but
+* in the future we might start to scan them.
+*/
+   while (packet_read_line_buf(_buf, _len, NULL))
+   ;
+
+   /*
+* v0 smart http; callers expect us to soak up the
+* service and header packets
+*/
+   d->buf = src_buf;
+   d->len = src_len;
+   d->proto_git = 1;
+
+   } else if (starts_with(line, "version 2")) { /* should be strcmp()? */
+   /*
+* v2 smart http; do not consume version packet, which will
+* be handled elsewhere.
+*/
+   d->proto_git = 1;
+   } else if (skip_prefix(line, "ERR ", )) {
+   die(_("remote error: %s"), p);
+   } else {
+   die("invalid server response; got '%s'", line);
+   }
+}
+
 static struct discovery *discover_refs(const char *service, int for_push)
 {
-   struct strbuf exp = STRBUF_INIT;
struct strbuf type = STRBUF_INIT;
struct strbuf charset = STRBUF_INIT;
struct strbuf buffer = STRBUF_INIT;
@@ -405,38 +457,8 @@ static struct discovery *discover_refs(const char 
*service, int for_push)
last->buf_alloc = strbuf_detach(, >len);
last->buf = last->buf_alloc;
 
-   strbuf_addf(, "application/x-%s-advertisement", service);
-   if (maybe_smart &&
-   (5 <= last->len && last->buf[4] == '#') &&
-   !strbuf_cmp(, )) {
-   char *line;
-
-   /*
-* smart HTTP response; validate that the service
-* pkt-line matches our request.
-*/
-   line = packet_read_line_buf(>buf, >len, NULL);
-   if (!line)
-   die("invalid server response; expected service, got 
flush packet");
-
-   strbuf_reset();
-   strbuf_addf(, "# service=%s", service);
-   if (strcmp(line, exp.buf))
-   die("invalid server response; got '%s'", line);
-   strbuf_release();
-
-   /* The header can include additional metadata lines, up
-* until a packet flush marker.  Ignore these now, but
-* in the future we might start to scan them.
-

Re: [PATCH v4 0/7] fixes for unqualified push

2018-11-13 Thread Junio C Hamano

Thanks for an update; will replace.

报名参展中国展团、拓展日本的促销礼品市场

2018-11-13 Thread MARKETING & SALES PROMOTION EXPO Show Management

尊敬的 市场经理,
Zhejiang Wuchuan Industrial Co., Ltd

您好！这里是日本国际促销礼品&营销展Marketing & Sales Promotino Expo展会主办方。

日本国际促销礼品&营销展是日本最大的促销礼品及营销展。
2019年展会将汇聚650家展商和44,000名专业观众。

本展在日本已经有10年的历史，为了满足日本当地买家对产品新颖，价格低廉的需求，
自2016年开始正式接受国际买家报名，国际展商数目日益增加。

本邮件是推荐您报名参加展会的中国展团，以进一步拓展日本的促销礼品市场。
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
参展中国咱团的优势包括:
- 中国代理提供中文服务
- 中国馆专属展位装潢
- 提供申请政府补助金的协助
- 协助填写电子产品目录
- 协助参展商簽证手续办理   等.

其他的付费服务选项:
- 找寻专业口译人员
- 展后活动安排  等
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*

如果您有兴趣参展，请您联系我们，我们会给贵司提供中国官方代理的联系方式。

谢谢

MARKETING & SALES PROMOTION EXPO 展会主办方
Reed Exhibitions Japan Ltd.
TEL: +81-3-3349-8505
mailto:spwo...@reedexpo.co.jp


Marketing & Sales Promotino Expo 
　- 日本最大的促销礼品&营销展

时间：2019年6月19日-21日
地点：日本东京有明国际展览中心
官网：https://www.sp-world.jp/en/
 

*数字为预期数字


ID:E36-G1402-0075



This message is delivered to you to provide details of exhibitions and 
conferences organised, co-organised, or managed by Reed Exhibitions Japan Ltd.
If you would like to change your contact information, or prefer not to receive 
further information on this exhibition/conference, please follow the directions 
below.


Please click the URL below and follow the directions on the website to update 
your e-mail and other information.
https://contact.reedexpo.co.jp/expo/REED/?lg=en=ch=CHANGE


Please reply to this mail changing the subject to "Remove from list".
You will not receive any further information on this exhibition/conference.

Re: [PATCH] push: change needlessly ambiguous example in error

2018-11-13 Thread Matthieu Moy

"Junio C Hamano"  wrote:

> > Where 'topic' is a tracking branch of 'origin/master' (I use
> > push.default=upstream). I only recently discovered that I could push to
> > 'HEAD" to do the same thing. So one ulterior motive is to make that more
> > prominent.
[...]
> Do we consider the current behaviour useful? Is it documented

Yes, since 1750783 (Documentation: more git push examples, 2009-01-26).

It may be an accident that the doc says "to the same name on the
remote." since it predates the introduction of push.default, but it
does say so and it's the actual behavior.

> already and widely known?

https://stackoverflow.com/questions/14031970/git-push-current-branch-shortcut

458 votes for the answer suggesting it.

-- 
Matthieu Moy
https://matthieu-moy.fr/

Re: [PATCH 0/1] Some left-over add-on for bw/config-h

2018-11-13 Thread Junio C Hamano

"Johannes Schindelin via GitGitGadget" 
writes:

> Back when bw/config-h was developed (and backported to Git for Windows), I
> came up with this patch. It seems to not be strictly necessary, but I like
> the safety of falling back to the Git directory when no common directory is
> configured (for whatever reason).

Shouldn't that be diagnosed as an error with BUG()?  That is, if we
know there is the current repository, and if commondir is not set,
isn't it a bug that we want to look into in the start-up sequence?

The phrase "for whatever reason" makes me wonder if this is truly
"defensive programming", rather than just sweeping the bug under the
rug.

FWIW, with this toy patch applied on top of this patch, all tests
that I usually run seem to pass.

 config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/config.c b/config.c
index c5726af14d..e667e2ebce 100644
--- a/config.c
+++ b/config.c
@@ -1669,7 +1669,7 @@ static int do_git_config_sequence(const struct 
config_options *opts,
if (opts->commondir)
repo_config = mkpathdup("%s/config", opts->commondir);
else if (opts->git_dir)
-   repo_config = mkpathdup("%s/config", opts->git_dir);
+   BUG("git-dir exists but no commondir?");
else
repo_config = NULL;

Re: What's cooking in git.git (Nov 2018, #04; Tue, 13)

2018-11-13 Thread Junio C Hamano

Ævar Arnfjörð Bjarmason  writes:

>> * ab/dynamic-gettext-poison (2018-11-09) 2 commits
>>  - Makefile: ease dynamic-gettext-poison transition
>>  - i18n: make GETTEXT_POISON a runtime option
>>
>>  On hold.
>>  The tip one may be controversial, but at least it would get me going.
>>  cf. 
>
> So just to clarify the state of this. I sent a v4 with the patch you're
> referring to included,...

Clearly I failed to update the comments when I replaced the patch
(v3 plus SQUASH fix) with what is queued which is v4.  Thanks for
noticing.  I think this one is ready to move forward, if it is
deemed worth doing.

> I'm a bit surprised that people aren't excited about this whole "you
> don't need to recompile git just to run this one special test mode",

I actually do not think it is surprising at all.  Only a few people
care about i18n breakages so they are not daily running the poison
test in the first place.  I myself only "see" its result in Travis
not because I am particularly interested in the poison test, but
because I cannot selectively not seeing it when visiting the summary
page like https://travis-ci.org/git/git/builds/454503815 (I do look
at its log if it fails, though).

So a true improvement for most of the developers is to somehow make
it unnecessary to even worry about poison test.  How poison test is
done is secondary to them and I think that is why we see nobody is
enthused when the way it is run is improved.  Having to carry code
that is only used while running poison test in the production binary
is an annoyance, not an improvement, to them.

As I do not immediately see a way to satisfy the pie-in-the-sky wish
"the world would be a great place if it were impossible to introduce
a bug that marks plumbing messages with _() so that we did not even
have to do a poison test", I am neutral, not negative, on this topic.
I am also hoping that the dynamic-poisoning may allow us to do a
"readable but still poisoned" fake translation more easily than the
separate build method we have used so far, so that adds a slight
positive to the topic.

Thanks.

Re: [PATCH] git-compat-util: prefer poll.h to sys/poll.h

2018-11-13 Thread Danh Doan

"brian m. carlson"  writes:

>> - t1308.23 is failing because musl `fopen` is success when open directory
>> in readonly mode. POSIX allows this behavior:
>> http://pubs.opengroup.org/onlinepubs/7908799/xsh/fopen.html
>> [EISDIR]
>> The named file is a directory and mode requires write access.
>
> Does setting FREAD_READS_DIRECTORIES fix this?

Yes, setting FREAD_READS_DIRECTORIES fixes this.
And, `configure` can set that itself.

I blindly followed the Void's build template which explicitly unset it
in the configure script without investigating it.
My bad!

I'll send them a patch to fix this downstream.

> -- 
> brian m. carlson: Houston, Texas, US
> OpenPGP: https://keybase.io/bk2204

-- 
Danh

[RFC PATCH 1/2] commit: don't add scissors line if one exists

2018-11-13 Thread Denton Liu

If commit.cleanup = scissors is specified, don't produce a scissors line
if one already exists in the commit message.

Signed-off-by: Denton Liu 
---
 builtin/commit.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 0d9828e29e..e486246329 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -659,6 +659,7 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
const char *hook_arg2 = NULL;
int clean_message_contents = (cleanup_mode != COMMIT_MSG_CLEANUP_NONE);
int old_display_comment_prefix;
+   int contains_scissors = 0;
 
/* This checks and barfs if author is badly specified */
determine_author_info(author_ident);
@@ -742,6 +743,10 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
clean_message_contents = 0;
}
 
+   if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS &&
+   wt_status_locate_end(sb.buf, sb.len) != sb.len)
+   contains_scissors = 1;
+
/*
 * The remaining cases don't modify the template message, but
 * just set the argument(s) to the prepare-commit-msg hook.
@@ -798,7 +803,8 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
struct ident_split ci, ai;
 
if (whence != FROM_COMMIT) {
-   if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS)
+   if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS &&
+   !contains_scissors)
wt_status_add_cut_line(s->fp);
status_printf_ln(s, GIT_COLOR_NORMAL,
whence == FROM_MERGE
@@ -824,7 +830,8 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
  " Lines starting\nwith '%c' will be ignored, 
and an empty"
  " message aborts the commit.\n"), 
comment_line_char);
else if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS &&
-whence == FROM_COMMIT)
+whence == FROM_COMMIT &&
+!contains_scissors)
wt_status_add_cut_line(s->fp);
else /* COMMIT_MSG_CLEANUP_SPACE, that is. */
status_printf(s, GIT_COLOR_NORMAL,
-- 
2.19.1

[RFC PATCH 2/2] merge: add scissors line on merge conflict

2018-11-13 Thread Denton Liu

This fixes a bug where the scissors line is placed after the Conflicts:
section, in the case where a merge conflict occurs and
merge.cleanup = scissors.

Signed-off-by: Denton Liu 
---
 builtin/merge.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/builtin/merge.c b/builtin/merge.c
index 8f4a5065c2..c5010cee5e 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -36,6 +36,7 @@
 #include "packfile.h"
 #include "tag.h"
 #include "alias.h"
+#include "wt-status.h"
 
 #define DEFAULT_TWOHEAD (1<<0)
 #define DEFAULT_OCTOPUS (1<<1)
@@ -96,6 +97,9 @@ enum ff_type {
 
 static enum ff_type fast_forward = FF_ALLOW;
 
+const char *cleanup_arg;
+int put_scissors;
+
 static int option_parse_message(const struct option *opt,
const char *arg, int unset)
 {
@@ -243,6 +247,7 @@ static struct option builtin_merge_options[] = {
N_("perform a commit if the merge succeeds (default)")),
OPT_BOOL('e', "edit", _edit,
N_("edit message before committing")),
+   OPT_STRING(0, "cleanup", _arg, N_("default"), N_("how to strip 
spaces and #comments from message")),
OPT_SET_INT(0, "ff", _forward, N_("allow fast-forward (default)"), 
FF_ALLOW),
OPT_SET_INT_F(0, "ff-only", _forward,
  N_("abort if fast-forward is not possible"),
@@ -606,6 +611,8 @@ static int git_merge_config(const char *k, const char *v, 
void *cb)
return git_config_string(_twohead, k, v);
else if (!strcmp(k, "pull.octopus"))
return git_config_string(_octopus, k, v);
+   else if (!strcmp(k, "commit.cleanup"))
+   return git_config_string(_arg, k, v);
else if (!strcmp(k, "merge.renormalize"))
option_renormalize = git_config_bool(k, v);
else if (!strcmp(k, "merge.ff")) {
@@ -894,6 +901,13 @@ static int suggest_conflicts(void)
filename = git_path_merge_msg(the_repository);
fp = xfopen(filename, "a");
 
+   if (put_scissors) {
+   fputc('\n', fp);
+   wt_status_add_cut_line(fp);
+   /* comments out the newline from append_conflicts_hint */
+   fputc(comment_line_char, fp);
+   }
+
append_conflicts_hint();
fputs(msgbuf.buf, fp);
strbuf_release();
@@ -1402,6 +1416,8 @@ int cmd_merge(int argc, const char **argv, const char 
*prefix)
if (option_edit < 0)
option_edit = default_edit_option();
 
+   put_scissors = cleanup_arg && !strcmp(cleanup_arg, "scissors");
+
if (!use_strategies) {
if (!remoteheads)
; /* already up-to-date */
-- 
2.19.1

[RFC PATCH 0/2] Fix scissors bug during merge conflict

2018-11-13 Thread Denton Liu

I discovered a bug in Git a while ago where if one is using
commit.cleanup = scissors, if making a commit after a merge conflict,
the scissors line will be placed after the `Conflicts:` section. As a
result, a careless Git user (e.g. me) may accidentally commit the
`Conflicts:` section.

Here is a test case to replicate the behaviour:

mkdir test
cd test/
git init
git config commit.cleanup scissors
touch a
git add a
git commit -m 'test'
echo a > a
git commit -am 'test2'
git checkout -b new HEAD^
echo b > a
git commit -am 'test3'
git merge master
echo c > a
git add a
git commit # look at the commit message here

And the resulting message that's sent to the text editor:

Merge branch 'master' into new

# Conflicts:
#   a
#  >8 
# Do not modify or remove the line above.
# Everything below it will be ignored.
#
# It looks like you may be committing a merge.
# If this is not correct, please remove the file
#   .git/MERGE_HEAD
# and try again.


# Please enter the commit message for your changes. Lines starting
# with '#' will be kept; you may remove them yourself if you want to.
# An empty message aborts the commit.
#
# On branch new
# All conflicts fixed but you are still merging.
#
# Changes to be committed:
#   modified:   a
#

With this fix, the message becomes the following:

Merge branch 'master' into new

#  >8 
# Do not modify or remove the line above.
# Everything below it will be ignored.
#
# Conflicts:
#   a
#
# It looks like you may be committing a merge.
# If this is not correct, please remove the file
#   .git/MERGE_HEAD
# and try again.


# Please enter the commit message for your changes. Lines starting
# with '#' will be kept; you may remove them yourself if you want to.
# An empty message aborts the commit.
#
# On branch new
# All conflicts fixed but you are still merging.
#
# Changes to be committed:
#   modified:   a
#

Let me know what you think of the change. Of course, documentation and
testing will come after this leaves the RFC phase.

Denton Liu (2):
  commit: don't add scissors line if one exists
  merge: add scissors line on merge conflict

 builtin/commit.c | 11 +--
 builtin/merge.c  | 16 
 2 files changed, 25 insertions(+), 2 deletions(-)

-- 
2.19.1

Re: [PATCH 5/5] tests: explicitly use `git.exe` on Windows

2018-11-13 Thread Junio C Hamano

"Johannes Schindelin via GitGitGadget" 
writes:

> diff --git a/Makefile b/Makefile
> index bbfbb4292d..5df0118ce9 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2590,6 +2590,7 @@ GIT-BUILD-OPTIONS: FORCE
>   @echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst 
> ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+
>   @echo PAGER_ENV=\''$(subst ','\'',$(subst ','\'',$(PAGER_ENV)))'\' >>$@+
>   @echo DC_SHA1=\''$(subst ','\'',$(subst ','\'',$(DC_SHA1)))'\' >>$@+
> + @echo X=\'$(X)\' >>$@+

Made me wonder if a single letter $(X) is a bit too cute to expose
to the outside world; as a narrowly confined local convention in
this single Makefile, it was perfectly fine.  But it should do for
now.  Its terseness is attractive, and our eyes (I do not speak for
those new to our codebase and build structure) are already used to
seeing $X suffix.  Somebody may later come and complain but I am OK
to rename it to something like $EXE at that time, not now.

>  ifdef TEST_OUTPUT_DIRECTORY
>   @echo TEST_OUTPUT_DIRECTORY=\''$(subst ','\'',$(subst 
> ','\'',$(TEST_OUTPUT_DIRECTORY)))'\' >>$@+
>  endif
> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
> index 801cc9b2ef..c167b2e1af 100644
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -900,7 +900,7 @@ test_create_repo () {
>   mkdir -p "$repo"
>   (
>   cd "$repo" || error "Cannot setup test environment"
> - "${GIT_TEST_INSTALLED:-$GIT_EXEC_PATH}/git" init \
> + "${GIT_TEST_INSTALLED:-$GIT_EXEC_PATH}/git$X" init \

Good.

>   "--template=$GIT_BUILD_DIR/templates/blt/" >&3 2>&4 ||
>   error "cannot run git init -- have you built things yet?"
>   mv .git/hooks .git/hooks-disabled
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 1ea20dc2dc..3e2a9ce76d 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -49,18 +49,23 @@ export ASAN_OPTIONS
>  : ${LSAN_OPTIONS=abort_on_error=1}
>  export LSAN_OPTIONS
>  
> +if test ! -f "$GIT_BUILD_DIR"/GIT-BUILD-OPTIONS
> +then
> + echo >&2 'error: GIT-BUILD-OPTIONS missing (has Git been built?).'
> + exit 1
> +fi

OK, this tells us that we at least attempted to build git once, even
under test-installed mode, and hopefully people won't run $(MAKE) and
immediately ^C it only to fool us by leaving this file while keeping
git binary and t/helpers/ binary unbuilt.

> +. "$GIT_BUILD_DIR"/GIT-BUILD-OPTIONS
> +export PERL_PATH SHELL_PATH
> +
>  
>  # It appears that people try to run tests without building...
> -test -n "$GIT_TEST_INSTALLED" || "$GIT_BUILD_DIR/git" >/dev/null ||
> +test -n "$GIT_TEST_INSTALLED" || "$GIT_BUILD_DIR/git$X" >/dev/null ||

The latter half of this change is a good one.  Given what the
proposed log message of this patch says

Note also: the many, many calls to `git this` and `git that` are
unaffected, as the regular PATH search will find the `.exe` files on
Windows (and not be confused by a directory of the name `git` that is
in one of the directories listed in the `PATH` variable), while
`/path/to/git` would not, per se, know that it is looking for an
executable and happily prefer such a directory.

which I read as "path-prefixed invocation, i.e. some/path/to/git, is
bad, it must be spelled some/path/to/git.exe", I am surprised that
tests ever worked on Windows without these five patches, as this
line used to read like this:

"$GIT_BUILD_DIR/git" >/dev/null
if test $? != 1
then
...

Wouldn't "$GIT_BUILD_DIR/git" have failed (and "executable not
found" hopefully won't produce exit code 1) and stopped the test
suite from running even after you built git and not under the
test-installed-git mode?

>  if test $? != 1
>  then
>   echo >&2 'error: you do not seem to have built git yet.'
>   exit 1
>  fi
>  
> -. "$GIT_BUILD_DIR"/GIT-BUILD-OPTIONS
> -export PERL_PATH SHELL_PATH
> -
>  # if --tee was passed, write the output not only to the terminal, but
>  # additionally to the file test-results/$BASENAME.out, too.
>  case "$GIT_TEST_TEE_STARTED, $* " in

Re: [PATCH 4/5] tests: do not require Git to be built when testing an installed Git

2018-11-13 Thread Junio C Hamano

"Johannes Schindelin via GitGitGadget" 
writes:

> From: Johannes Schindelin 
>
> We really only need the test helpers in that case, but that is not what
> we test for. So let's skip the test for now when we know that we want to
> test an installed Git.

True, but...  hopefully we are making sure t/helpers/ has been built
in some other ways, though, right?  I do not offhand see how tests
would work in a pristine checkout with "cd t && sh t-*.sh" as
t/test-lib.sh is not running $(MAKE) itself (and the design of the
test-lib.sh, as can be seen in the check this part of it makes, is
to just abort if we cannot test) with this patch (and PATCH 1/5)
under the test-installed mode, though.

> Signed-off-by: Johannes Schindelin 
> ---
>  t/test-lib.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 832ede5099..1ea20dc2dc 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -51,7 +51,7 @@ export LSAN_OPTIONS
>  
>  
>  # It appears that people try to run tests without building...
> -"$GIT_BUILD_DIR/git" >/dev/null
> +test -n "$GIT_TEST_INSTALLED" || "$GIT_BUILD_DIR/git" >/dev/null ||
>  if test $? != 1
>  then
>   echo >&2 'error: you do not seem to have built git yet.'

Re: [PATCH 2/5] tests: respect GIT_TEST_INSTALLED when initializing repositories

2018-11-13 Thread Junio C Hamano

"Johannes Schindelin via GitGitGadget" 
writes:

> From: Johannes Schindelin 
>
> It really makes very, very little sense to use a different git
> executable than the one the caller indicated via setting the environment
> variable GIT_TEST_INSTALLED.

Makes perfect sense.  Shouldn't we be asking where the template
directory of the installed version is and using it instead of the
freshly built one, no?

>
> Signed-off-by: Johannes Schindelin 
> ---
>  t/test-lib-functions.sh | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
> index 78d8c3783b..801cc9b2ef 100644
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -900,7 +900,8 @@ test_create_repo () {
>   mkdir -p "$repo"
>   (
>   cd "$repo" || error "Cannot setup test environment"
> - "$GIT_EXEC_PATH/git-init" 
> "--template=$GIT_BUILD_DIR/templates/blt/" >&3 2>&4 ||
> + "${GIT_TEST_INSTALLED:-$GIT_EXEC_PATH}/git" init \
> + "--template=$GIT_BUILD_DIR/templates/blt/" >&3 2>&4 ||
>   error "cannot run git init -- have you built things yet?"
>   mv .git/hooks .git/hooks-disabled
>   ) || exit

Re: [PATCH 3/5] t/lib-gettext: test installed git-sh-i18n if GIT_TEST_INSTALLED is set

2018-11-13 Thread Junio C Hamano

"Johannes Schindelin via GitGitGadget" 
writes:

> From: Johannes Schindelin 
>
> It makes very, very little sense to test the built git-sh-i18n when the
> user asked specifically to test another one.
>
> Signed-off-by: Johannes Schindelin 
> ---

Yup.  Makes sense.

>  t/lib-gettext.sh | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/t/lib-gettext.sh b/t/lib-gettext.sh
> index eec757f104..9eb160c997 100644
> --- a/t/lib-gettext.sh
> +++ b/t/lib-gettext.sh
> @@ -10,7 +10,12 @@ GIT_TEXTDOMAINDIR="$GIT_BUILD_DIR/po/build/locale"
>  GIT_PO_PATH="$GIT_BUILD_DIR/po"
>  export GIT_TEXTDOMAINDIR GIT_PO_PATH
>  
> -. "$GIT_BUILD_DIR"/git-sh-i18n
> +if test -n "$GIT_TEST_INSTALLED"
> +then
> + . "$(git --exec-path)"/git-sh-i18n
> +else
> + . "$GIT_BUILD_DIR"/git-sh-i18n
> +fi
>  
>  if test_have_prereq GETTEXT && ! test_have_prereq GETTEXT_POISON
>  then

Re: [PATCH 1/5] tests: fix GIT_TEST_INSTALLED's PATH to include t/helper/

2018-11-13 Thread Junio C Hamano

"Johannes Schindelin via GitGitGadget" 
writes:

> From: Johannes Schindelin 
>
> We really need to be able to find the test helpers... Really. This
> change was forgotten when we moved the test helpers into t/helper/

Yeah, it is unfortunate that more and more things have internal
whitebox test that can only be checked with test-helper binaries,
which makes it impossible to fully testing installed git, but that
is not a new issue this fix introduces.  This at least allows us to
keep using the test-helpers from freshly built Git while using the
installed version of Git for most of the tests.

Thanks.

> Signed-off-by: Johannes Schindelin 
> ---
>  t/test-lib.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 47a99aa0ed..832ede5099 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -957,7 +957,7 @@ elif test -n "$GIT_TEST_INSTALLED"
>  then
>   GIT_EXEC_PATH=$($GIT_TEST_INSTALLED/git --exec-path)  ||
>   error "Cannot run git from $GIT_TEST_INSTALLED."
> - PATH=$GIT_TEST_INSTALLED:$GIT_BUILD_DIR:$PATH
> + PATH=$GIT_TEST_INSTALLED:$GIT_BUILD_DIR/t/helper:$PATH
>   GIT_EXEC_PATH=${GIT_TEST_EXEC_PATH:-$GIT_EXEC_PATH}
>  else # normal case, use ../bin-wrappers only unless $with_dashes:
>   git_bin_dir="$GIT_BUILD_DIR/bin-wrappers"

Re: [PATCH] git-compat-util: prefer poll.h to sys/poll.h

2018-11-13 Thread Junio C Hamano

"brian m. carlson"  writes:

> On Wed, Nov 14, 2018 at 08:10:43AM +0700, Đoàn Trần Công Danh wrote:
>> POSIX specifies that  is the correct header for poll(2)
>> whereas  is only needed for some old libc.
>> 
>> Let's follow the POSIX way by default.
>> 
>> This effectively eliminates musl's warning:
>> 
>> warning redirecting incorrect #include  to 
>> 
>> Signed-off-by: Đoàn Trần Công Danh 
>
> I think this patch is fine.  This was in SUSv2, and I don't feel bad
> about siding with a spec that's at least 17 years old.

Yup, I agree.  Thanks, both.

Re: [PATCH] push: change needlessly ambiguous example in error

2018-11-13 Thread Junio C Hamano

Ævar Arnfjörð Bjarmason  writes:

>> This "git push origin HEAD" is IMHO less common. It may confuse users.
>> Or users may learn it and be happy thanks to your message. I don't know.
>
> I was hoping for the latter. I'm slightly embarrassed to say that for
> the entire time I've been using git I've been doing:
>
> # on branch 'topic'
> git push origin topic:topic
>
> Where 'topic' is a tracking branch of 'origin/master' (I use
> push.default=upstream). I only recently discovered that I could push to
> 'HEAD" to do the same thing. So one ulterior motive is to make that more
> prominent.

Personally I do not think you should feel embarrassed at all.  I
didn't know it myself, and I do not recall designing such a
behaviour, so I would not be surprised if it is merely what the code
right now happens to do.  FWIW, I would have expected that a "git
push origin HEAD" when HEAD is a symref to a branch 'x' would behave
as if "git push origin x" were given, going through the usual
simple/upstream/matching etc. dance.

Do we consider the current behaviour useful?  Is it documented
already and widely known?  If the answers are yes/no, of course we
should document it, but if they are no/no, it may not be a bad idea
to deprecate it and replace it with a saner version (i.e. deref HEAD
to branch name and behave the same way as pushing that branch) over
time.

Re: [PATCH v4] commit: add a commit.allowEmpty config variable

2018-11-13 Thread Junio C Hamano

Ævar Arnfjörð Bjarmason  writes:

> Agreed. I'm happy to see the test for-loop gone as I noted in
> https://public-inbox.org/git/87d0rm7zeo@evledraar.gmail.com/ but as
> noted in that v3 feedback the whole "why would anyone want this?"
> explanation is still missing, and this still smells like a workaround
> for a bug we should be fixing elsewhere in the sequencing code.

Thanks.  I share the same impression that this is sweeping a bug
under a wrong rug.

Re: [PATCH] Makefile: add pending semantic patches

2018-11-13 Thread Junio C Hamano

SZEDER Gábor  writes:

>> > inifinite recursion)?  Or are they "correct but not immediately
>> > necessary" (e.g. because calling read_cache() does not hurt until
>> > that function gets removed, so rewriting the callers to call
>> > read_index() with _index may be correct but not immediately
>> > necessary)?
>> 
>> the latter. I assume correctness (of the semantic patch) to be a given,
>
> I'm afraid we can't assume that.  As far as correctness is concerned,
> I think semantic patches are not different from any other code we
> write, i.e. they are potentially buggy.  Perhaps even more so than the
> "usual" Git code, because we have long experience writing C and shell
> code (with all their quirks and error-proneness), but none of us is
> really an expert in writing semantic patches.

All correct.

And applying semantic patches generated from buggy sources can
produce buggy code, just like merging a buggy topic branch does.

These days, I ran coccicheck at the tip of 'pu' (even though this
cost me quite a lot a few times every day) and feed its findings
back to authors of topics that introduce new ones, so that their
topics next time do not need the fix-up at the tip of 'pu' in the
next integration cycle.  That way, the changes mechanically
suggested by coccicheck can still be reviewed in small bite-sized
chunks and hopefully possible problems caused by buggy sources can
either be found in the review process, or discovered at the tip of
'pu'.

[PATCH v6 11/12] sha256: add an SHA-256 implementation using libgcrypt

2018-11-13 Thread brian m. carlson

Generally, one gets better performance out of cryptographic routines
written in assembly than C, and this is also true for SHA-256.  In
addition, most Linux distributions cannot distribute Git linked against
OpenSSL for licensing reasons.

Most systems with GnuPG will also have libgcrypt, since it is a
dependency of GnuPG.  libgcrypt is also faster than the SHA1DC
implementation for messages of a few KiB and larger.

For comparison, on a Core i7-6600U, this implementation processes 16 KiB
chunks at 355 MiB/s while SHA1DC processes equivalent chunks at 337
MiB/s.

In addition, libgcrypt is licensed under the LGPL 2.1, which is
compatible with the GPL.  Add an implementation of SHA-256 that uses
libgcrypt.

Signed-off-by: brian m. carlson 
---
 Makefile| 13 +++--
 hash.h  |  4 
 sha256/gcrypt.h | 30 ++
 3 files changed, 45 insertions(+), 2 deletions(-)
 create mode 100644 sha256/gcrypt.h

diff --git a/Makefile b/Makefile
index 1302c07f5c..6c99844aa8 100644
--- a/Makefile
+++ b/Makefile
@@ -179,6 +179,10 @@ all::
 # in one call to the platform's SHA1_Update(). e.g. APPLE_COMMON_CRYPTO
 # wants 'SHA1_MAX_BLOCK_SIZE=1024L*1024L*1024L' defined.
 #
+# Define BLK_SHA256 to use the built-in SHA-256 routines.
+#
+# Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
+#
 # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
@@ -1647,8 +1651,13 @@ endif
 endif
 endif
 
-LIB_OBJS += sha256/block/sha256.o
-BASIC_CFLAGS += -DSHA256_BLK
+ifdef GCRYPT_SHA256
+   BASIC_CFLAGS += -DSHA256_GCRYPT
+   EXTLIBS += -lgcrypt
+else
+   LIB_OBJS += sha256/block/sha256.o
+   BASIC_CFLAGS += -DSHA256_BLK
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
LIB_OBJS += compat/sha1-chunked.o
diff --git a/hash.h b/hash.h
index a9bc624020..2ef098052d 100644
--- a/hash.h
+++ b/hash.h
@@ -15,7 +15,11 @@
 #include "block-sha1/sha1.h"
 #endif
 
+#if defined(SHA256_GCRYPT)
+#include "sha256/gcrypt.h"
+#else
 #include "sha256/block/sha256.h"
+#endif
 
 #ifndef platform_SHA_CTX
 /*
diff --git a/sha256/gcrypt.h b/sha256/gcrypt.h
new file mode 100644
index 00..09bd8bb200
--- /dev/null
+++ b/sha256/gcrypt.h
@@ -0,0 +1,30 @@
+#ifndef SHA256_GCRYPT_H
+#define SHA256_GCRYPT_H
+
+#include 
+
+#define SHA256_DIGEST_SIZE 32
+
+typedef gcry_md_hd_t gcrypt_SHA256_CTX;
+
+inline void gcrypt_SHA256_Init(gcrypt_SHA256_CTX *ctx)
+{
+   gcry_md_open(ctx, GCRY_MD_SHA256, 0);
+}
+
+inline void gcrypt_SHA256_Update(gcrypt_SHA256_CTX *ctx, const void *data, 
size_t len)
+{
+   gcry_md_write(*ctx, data, len);
+}
+
+inline void gcrypt_SHA256_Final(unsigned char *digest, gcrypt_SHA256_CTX *ctx)
+{
+   memcpy(digest, gcry_md_read(*ctx, GCRY_MD_SHA256), SHA256_DIGEST_SIZE);
+}
+
+#define platform_SHA256_CTX gcrypt_SHA256_CTX
+#define platform_SHA256_Init gcrypt_SHA256_Init
+#define platform_SHA256_Update gcrypt_SHA256_Update
+#define platform_SHA256_Final gcrypt_SHA256_Final
+
+#endif

[PATCH v6 09/12] commit-graph: convert to using the_hash_algo

2018-11-13 Thread brian m. carlson

Instead of using hard-coded constants for object sizes, use
the_hash_algo to look them up.  In addition, use a function call to look
up the object ID version and produce the correct value.  For now, we use
version 1, which means to use the default algorithm used in the rest of
the repository.

Signed-off-by: brian m. carlson 
---
 commit-graph.c | 33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..6763d19288 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -23,16 +23,11 @@
 #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
 #define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
 
-#define GRAPH_DATA_WIDTH 36
+#define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
 
 #define GRAPH_VERSION_1 0x1
 #define GRAPH_VERSION GRAPH_VERSION_1
 
-#define GRAPH_OID_VERSION_SHA1 1
-#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ
-#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
-#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
-
 #define GRAPH_OCTOPUS_EDGES_NEEDED 0x8000
 #define GRAPH_PARENT_MISSING 0x7fff
 #define GRAPH_EDGE_LAST_MASK 0x7fff
@@ -44,13 +39,18 @@
 #define GRAPH_FANOUT_SIZE (4 * 256)
 #define GRAPH_CHUNKLOOKUP_WIDTH 12
 #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
-   + GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
+   + GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
 
 char *get_commit_graph_filename(const char *obj_dir)
 {
return xstrfmt("%s/info/commit-graph", obj_dir);
 }
 
+static uint8_t oid_version(void)
+{
+   return 1;
+}
+
 static struct commit_graph *alloc_commit_graph(void)
 {
struct commit_graph *g = xcalloc(1, sizeof(*g));
@@ -125,15 +125,15 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
}
 
hash_version = *(unsigned char*)(data + 5);
-   if (hash_version != GRAPH_OID_VERSION) {
+   if (hash_version != oid_version()) {
error(_("hash version %X does not match version %X"),
- hash_version, GRAPH_OID_VERSION);
+ hash_version, oid_version());
goto cleanup_fail;
}
 
graph = alloc_commit_graph();
 
-   graph->hash_len = GRAPH_OID_LEN;
+   graph->hash_len = the_hash_algo->rawsz;
graph->num_chunks = *(unsigned char*)(data + 6);
graph->graph_fd = fd;
graph->data = graph_map;
@@ -149,7 +149,7 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
 
chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
 
-   if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
+   if (chunk_offset > graph_size - the_hash_algo->rawsz) {
error(_("improper chunk offset %08x%08x"), 
(uint32_t)(chunk_offset >> 32),
  (uint32_t)chunk_offset);
goto cleanup_fail;
@@ -764,6 +764,7 @@ void write_commit_graph(const char *obj_dir,
int num_extra_edges;
struct commit_list *parent;
struct progress *progress = NULL;
+   const unsigned hashsz = the_hash_algo->rawsz;
 
if (!commit_graph_compatible(the_repository))
return;
@@ -909,7 +910,7 @@ void write_commit_graph(const char *obj_dir,
hashwrite_be32(f, GRAPH_SIGNATURE);
 
hashwrite_u8(f, GRAPH_VERSION);
-   hashwrite_u8(f, GRAPH_OID_VERSION);
+   hashwrite_u8(f, oid_version());
hashwrite_u8(f, num_chunks);
hashwrite_u8(f, 0); /* unused padding byte */
 
@@ -924,8 +925,8 @@ void write_commit_graph(const char *obj_dir,
 
chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
-   chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
-   chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
+   chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr;
+   chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr;
chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
 
for (i = 0; i <= num_chunks; i++) {
@@ -938,8 +939,8 @@ void write_commit_graph(const char *obj_dir,
}
 
write_graph_chunk_fanout(f, commits.list, commits.nr);
-   write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
-   write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
+   write_graph_chunk_oids(f, hashsz, commits.list, commits.nr);
+   write_graph_chunk_data(f, hashsz, commits.list, commits.nr);
write_graph_chunk_large_edges(f, commits.list, commits.nr);
 
close_commit_graph(the_repository);

[PATCH v6 02/12] sha1-file: provide functions to look up hash algorithms

2018-11-13 Thread brian m. carlson

There are several ways we might refer to a hash algorithm: by name, such
as in the config file; by format ID, such as in a pack; or internally,
by a pointer to the hash_algos array.  Provide functions to look up hash
algorithms based on these various forms and return the internal constant
used for them.  If conversion to another form is necessary, this
internal constant can be used to look up the proper data in the
hash_algos array.

Signed-off-by: brian m. carlson 
---
 hash.h  | 13 +
 sha1-file.c | 21 +
 2 files changed, 34 insertions(+)

diff --git a/hash.h b/hash.h
index 7c8238bc2e..80881eea47 100644
--- a/hash.h
+++ b/hash.h
@@ -98,4 +98,17 @@ struct git_hash_algo {
 };
 extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS];
 
+/*
+ * Return a GIT_HASH_* constant based on the name.  Returns GIT_HASH_UNKNOWN if
+ * the name doesn't match a known algorithm.
+ */
+int hash_algo_by_name(const char *name);
+/* Identical, except based on the format ID. */
+int hash_algo_by_id(uint32_t format_id);
+/* Identical, except for a pointer to struct git_hash_algo. */
+static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
+{
+   return p - hash_algos;
+}
+
 #endif
diff --git a/sha1-file.c b/sha1-file.c
index 3b9c042691..93ed8c8686 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -122,6 +122,27 @@ const char *empty_blob_oid_hex(void)
return oid_to_hex_r(buf, the_hash_algo->empty_blob);
 }
 
+int hash_algo_by_name(const char *name)
+{
+   int i;
+   if (!name)
+   return GIT_HASH_UNKNOWN;
+   for (i = 1; i < GIT_HASH_NALGOS; i++)
+   if (!strcmp(name, hash_algos[i].name))
+   return i;
+   return GIT_HASH_UNKNOWN;
+}
+
+int hash_algo_by_id(uint32_t format_id)
+{
+   int i;
+   for (i = 1; i < GIT_HASH_NALGOS; i++)
+   if (format_id == hash_algos[i].format_id)
+   return i;
+   return GIT_HASH_UNKNOWN;
+}
+
+
 /*
  * This is meant to hold a *small* number of objects that you would
  * want read_sha1_file() to be able to return, but yet you do not want

[PATCH v6 10/12] Add a base implementation of SHA-256 support

2018-11-13 Thread brian m. carlson

SHA-1 is weak and we need to transition to a new hash function.  For
some time, we have referred to this new function as NewHash.  Recently,
we decided to pick SHA-256 as NewHash.  The reasons behind the choice of
SHA-256 are outlined in the thread starting at [1] and in the commit
history for the hash function transition document.

Add a basic implementation of SHA-256 based off libtomcrypt, which is in
the public domain.  Optimize it and restructure it to meet our coding
standards.  Pull in the update and final functions from the SHA-1 block
implementation, as we know these function correctly with all compilers.
This implementation is slower than SHA-1, but more performant
implementations will be introduced in future commits.

Wire up SHA-256 in the list of hash algorithms, and add a test that the
algorithm works correctly.

Note that with this patch, it is still not possible to switch to using
SHA-256 in Git.  Additional patches are needed to prepare the code to
handle a larger hash algorithm and further test fixes are needed.

[1] 
https://public-inbox.org/git/20180609224913.gc38...@genre.crustytoothpaste.net/

Signed-off-by: brian m. carlson 
---
 Makefile   |   4 +
 cache.h|  12 ++-
 hash.h |  19 +++-
 sha1-file.c|  45 ++
 sha256/block/sha256.c  | 196 +
 sha256/block/sha256.h  |  24 +
 t/helper/test-sha256.c |   7 ++
 t/helper/test-tool.c   |   1 +
 t/helper/test-tool.h   |   1 +
 t/t0015-hash.sh|  26 ++
 10 files changed, 331 insertions(+), 4 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 t/helper/test-sha256.c

diff --git a/Makefile b/Makefile
index c6f06bf50b..1302c07f5c 100644
--- a/Makefile
+++ b/Makefile
@@ -749,6 +749,7 @@ TEST_BUILTINS_OBJS += test-run-command.o
 TEST_BUILTINS_OBJS += test-scrap-cache-tree.o
 TEST_BUILTINS_OBJS += test-sha1.o
 TEST_BUILTINS_OBJS += test-sha1-array.o
+TEST_BUILTINS_OBJS += test-sha256.o
 TEST_BUILTINS_OBJS += test-sigchain.o
 TEST_BUILTINS_OBJS += test-strcmp-offset.o
 TEST_BUILTINS_OBJS += test-string-list.o
@@ -1646,6 +1647,9 @@ endif
 endif
 endif
 
+LIB_OBJS += sha256/block/sha256.o
+BASIC_CFLAGS += -DSHA256_BLK
+
 ifdef SHA1_MAX_BLOCK_SIZE
LIB_OBJS += compat/sha1-chunked.o
BASIC_CFLAGS += -DSHA1_MAX_BLOCK_SIZE="$(SHA1_MAX_BLOCK_SIZE)"
diff --git a/cache.h b/cache.h
index 62b2f3a5e8..b0dfe42736 100644
--- a/cache.h
+++ b/cache.h
@@ -48,11 +48,17 @@ unsigned long git_deflate_bound(git_zstream *, unsigned 
long);
 /* The block size of SHA-1. */
 #define GIT_SHA1_BLKSZ 64
 
+/* The length in bytes and in hex digits of an object name (SHA-256 value). */
+#define GIT_SHA256_RAWSZ 32
+#define GIT_SHA256_HEXSZ (2 * GIT_SHA256_RAWSZ)
+/* The block size of SHA-256. */
+#define GIT_SHA256_BLKSZ 64
+
 /* The length in byte and in hex digits of the largest possible hash value. */
-#define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
-#define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+#define GIT_MAX_RAWSZ GIT_SHA256_RAWSZ
+#define GIT_MAX_HEXSZ GIT_SHA256_HEXSZ
 /* The largest possible block size for any supported hash. */
-#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ
+#define GIT_MAX_BLKSZ GIT_SHA256_BLKSZ
 
 struct object_id {
unsigned char hash[GIT_MAX_RAWSZ];
diff --git a/hash.h b/hash.h
index 1bcf7ab6fd..a9bc624020 100644
--- a/hash.h
+++ b/hash.h
@@ -15,6 +15,8 @@
 #include "block-sha1/sha1.h"
 #endif
 
+#include "sha256/block/sha256.h"
+
 #ifndef platform_SHA_CTX
 /*
  * platform's underlying implementation of SHA-1; could be OpenSSL,
@@ -34,6 +36,18 @@
 #define git_SHA1_Updateplatform_SHA1_Update
 #define git_SHA1_Final platform_SHA1_Final
 
+#ifndef platform_SHA256_CTX
+#define platform_SHA256_CTXSHA256_CTX
+#define platform_SHA256_Init   SHA256_Init
+#define platform_SHA256_Update SHA256_Update
+#define platform_SHA256_Final  SHA256_Final
+#endif
+
+#define git_SHA256_CTX platform_SHA256_CTX
+#define git_SHA256_Initplatform_SHA256_Init
+#define git_SHA256_Update  platform_SHA256_Update
+#define git_SHA256_Final   platform_SHA256_Final
+
 #ifdef SHA1_MAX_BLOCK_SIZE
 #include "compat/sha1-chunked.h"
 #undef git_SHA1_Update
@@ -52,12 +66,15 @@
 #define GIT_HASH_UNKNOWN 0
 /* SHA-1 */
 #define GIT_HASH_SHA1 1
+/* SHA-256  */
+#define GIT_HASH_SHA256 2
 /* Number of algorithms supported (including unknown). */
-#define GIT_HASH_NALGOS (GIT_HASH_SHA1 + 1)
+#define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1)
 
 /* A suitably aligned type for stack allocations of hash contexts. */
 union git_hash_ctx {
git_SHA_CTX sha1;
+   git_SHA256_CTX sha256;
 };
 typedef union git_hash_ctx git_hash_ctx;
 
diff --git a/sha1-file.c b/sha1-file.c
index c47349a5f8..e7764822ea 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -40,10 +40,20 @@
 #define EMPTY_TREE_SHA1_BIN_LITERAL \
 "\x4b\x82\x5d\xc6\x42\xcb\x6e\xb9\xa0\x60" \

[PATCH v6 12/12] hash: add an SHA-256 implementation using OpenSSL

2018-11-13 Thread brian m. carlson

We already have OpenSSL routines available for SHA-1, so add routines
for SHA-256 as well.

On a Core i7-6600U, this SHA-256 implementation compares favorably to
the SHA1DC SHA-1 implementation:

SHA-1: 157 MiB/s (64 byte chunks); 337 MiB/s (16 KiB chunks)
SHA-256: 165 MiB/s (64 byte chunks); 408 MiB/s (16 KiB chunks)

Signed-off-by: brian m. carlson 
---
 Makefile | 7 +++
 hash.h   | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/Makefile b/Makefile
index 6c99844aa8..6844deb92d 100644
--- a/Makefile
+++ b/Makefile
@@ -183,6 +183,8 @@ all::
 #
 # Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
 #
+# Define OPENSSL_SHA256 to use the SHA-256 routines in OpenSSL.
+#
 # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
@@ -1651,6 +1653,10 @@ endif
 endif
 endif
 
+ifdef OPENSSL_SHA256
+   EXTLIBS += $(LIB_4_CRYPTO)
+   BASIC_CFLAGS += -DSHA256_OPENSSL
+else
 ifdef GCRYPT_SHA256
BASIC_CFLAGS += -DSHA256_GCRYPT
EXTLIBS += -lgcrypt
@@ -1658,6 +1664,7 @@ else
LIB_OBJS += sha256/block/sha256.o
BASIC_CFLAGS += -DSHA256_BLK
 endif
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
LIB_OBJS += compat/sha1-chunked.o
diff --git a/hash.h b/hash.h
index 2ef098052d..adde708cf2 100644
--- a/hash.h
+++ b/hash.h
@@ -17,6 +17,8 @@
 
 #if defined(SHA256_GCRYPT)
 #include "sha256/gcrypt.h"
+#elif defined(SHA256_OPENSSL)
+#include 
 #else
 #include "sha256/block/sha256.h"
 #endif

[PATCH v6 08/12] t/helper: add a test helper to compute hash speed

2018-11-13 Thread brian m. carlson

Add a utility (which is less for the testsuite and more for developers)
that can compute hash speeds for whatever hash algorithms are
implemented.  This allows developers to test their personal systems to
determine the performance characteristics of various algorithms.

Signed-off-by: brian m. carlson 
---
 Makefile   |  1 +
 t/helper/test-hash-speed.c | 61 ++
 t/helper/test-tool.c   |  1 +
 t/helper/test-tool.h   |  1 +
 4 files changed, 64 insertions(+)
 create mode 100644 t/helper/test-hash-speed.c

diff --git a/Makefile b/Makefile
index daf0b198ec..c6f06bf50b 100644
--- a/Makefile
+++ b/Makefile
@@ -726,6 +726,7 @@ TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-genrandom.o
 TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
+TEST_BUILTINS_OBJS += test-hash-speed.o
 TEST_BUILTINS_OBJS += test-index-version.o
 TEST_BUILTINS_OBJS += test-json-writer.o
 TEST_BUILTINS_OBJS += test-lazy-init-name-hash.o
diff --git a/t/helper/test-hash-speed.c b/t/helper/test-hash-speed.c
new file mode 100644
index 00..432233c7f0
--- /dev/null
+++ b/t/helper/test-hash-speed.c
@@ -0,0 +1,61 @@
+#include "test-tool.h"
+#include "cache.h"
+
+#define NUM_SECONDS 3
+
+static inline void compute_hash(const struct git_hash_algo *algo, git_hash_ctx 
*ctx, uint8_t *final, const void *p, size_t len)
+{
+   algo->init_fn(ctx);
+   algo->update_fn(ctx, p, len);
+   algo->final_fn(final, ctx);
+}
+
+int cmd__hash_speed(int ac, const char **av)
+{
+   git_hash_ctx ctx;
+   unsigned char hash[GIT_MAX_RAWSZ];
+   clock_t initial, start, end;
+   unsigned bufsizes[] = { 64, 256, 1024, 8192, 16384 };
+   int i;
+   void *p;
+   const struct git_hash_algo *algo = NULL;
+
+   if (ac == 2) {
+   for (i = 1; i < GIT_HASH_NALGOS; i++) {
+   if (!strcmp(av[1], hash_algos[i].name)) {
+   algo = _algos[i];
+   break;
+   }
+   }
+   }
+   if (!algo)
+   die("usage: test-tool hash-speed algo_name");
+
+   /* Use this as an offset to make overflow less likely. */
+   initial = clock();
+
+   printf("algo: %s\n", algo->name);
+
+   for (i = 0; i < ARRAY_SIZE(bufsizes); i++) {
+   unsigned long j, kb;
+   double kb_per_sec;
+   p = xcalloc(1, bufsizes[i]);
+   start = end = clock() - initial;
+   for (j = 0; ((end - start) / CLOCKS_PER_SEC) < NUM_SECONDS; 
j++) {
+   compute_hash(algo, , hash, p, bufsizes[i]);
+
+   /*
+* Only check elapsed time every 128 iterations to avoid
+* dominating the runtime with system calls.
+*/
+   if (!(j & 127))
+   end = clock() - initial;
+   }
+   kb = j * bufsizes[i];
+   kb_per_sec = kb / (1024 * ((double)end - start) / 
CLOCKS_PER_SEC);
+   printf("size %u: %lu iters; %lu KiB; %0.2f KiB/s\n", 
bufsizes[i], j, kb, kb_per_sec);
+   free(p);
+   }
+
+   exit(0);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index bfb195b1a8..c222d532a2 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -20,6 +20,7 @@ static struct test_cmd cmds[] = {
{ "example-decorate", cmd__example_decorate },
{ "genrandom", cmd__genrandom },
{ "hashmap", cmd__hashmap },
+   { "hash-speed", cmd__hash_speed },
{ "index-version", cmd__index_version },
{ "json-writer", cmd__json_writer },
{ "lazy-init-name-hash", cmd__lazy_init_name_hash },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index b5b7712a1d..70dafb7649 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -16,6 +16,7 @@ int cmd__dump_untracked_cache(int argc, const char **argv);
 int cmd__example_decorate(int argc, const char **argv);
 int cmd__genrandom(int argc, const char **argv);
 int cmd__hashmap(int argc, const char **argv);
+int cmd__hash_speed(int argc, const char **argv);
 int cmd__index_version(int argc, const char **argv);
 int cmd__json_writer(int argc, const char **argv);
 int cmd__lazy_init_name_hash(int argc, const char **argv);

[PATCH v6 03/12] hex: introduce functions to print arbitrary hashes

2018-11-13 Thread brian m. carlson

Currently, we have functions that turn an arbitrary SHA-1 value or an
object ID into hex format, either using a static buffer or with a
user-provided buffer.  Add variants of these functions that can handle
an arbitrary hash algorithm, specified by constant.  Update the
documentation as well.

While we're at it, remove the "extern" declaration from this family of
functions, since it's not needed and our style now recommends against
it.

We use the variant taking the algorithm structure pointer as the
internal variant, since taking an algorithm pointer is the easiest way
to handle all of the variants in use.

Note that we maintain these functions because there are hashes which
must change based on the hash algorithm in use but are not object IDs
(such as pack checksums).

Signed-off-by: brian m. carlson 
---
 cache.h | 15 +--
 hex.c   | 32 
 2 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/cache.h b/cache.h
index ca36b44ee0..cb7268deea 100644
--- a/cache.h
+++ b/cache.h
@@ -1365,9 +1365,9 @@ extern int get_oid_hex(const char *hex, struct object_id 
*sha1);
 extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
 
 /*
- * Convert a binary sha1 to its hex equivalent. The `_r` variant is reentrant,
+ * Convert a binary hash to its hex equivalent. The `_r` variant is reentrant,
  * and writes the NUL-terminated output to the buffer `out`, which must be at
- * least `GIT_SHA1_HEXSZ + 1` bytes, and returns a pointer to out for
+ * least `GIT_MAX_HEXSZ + 1` bytes, and returns a pointer to out for
  * convenience.
  *
  * The non-`_r` variant returns a static buffer, but uses a ring of 4
@@ -1375,10 +1375,13 @@ extern int hex_to_bytes(unsigned char *binary, const 
char *hex, size_t len);
  *
  *   printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two));
  */
-extern char *sha1_to_hex_r(char *out, const unsigned char *sha1);
-extern char *oid_to_hex_r(char *out, const struct object_id *oid);
-extern char *sha1_to_hex(const unsigned char *sha1);   /* static buffer 
result! */
-extern char *oid_to_hex(const struct object_id *oid);  /* same static buffer 
as sha1_to_hex */
+char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash, const 
struct git_hash_algo *);
+char *sha1_to_hex_r(char *out, const unsigned char *sha1);
+char *oid_to_hex_r(char *out, const struct object_id *oid);
+char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo 
*);  /* static buffer result! */
+char *sha1_to_hex(const unsigned char *sha1);  
/* same static buffer */
+char *hash_to_hex(const unsigned char *hash);  
/* same static buffer */
+char *oid_to_hex(const struct object_id *oid); 
/* same static buffer */
 
 /*
  * Parse a 40-character hexadecimal object ID starting from hex, updating the
diff --git a/hex.c b/hex.c
index 10af1a29e8..7850a8879d 100644
--- a/hex.c
+++ b/hex.c
@@ -73,14 +73,15 @@ int parse_oid_hex(const char *hex, struct object_id *oid, 
const char **end)
return ret;
 }
 
-char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
+char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
+ const struct git_hash_algo *algop)
 {
static const char hex[] = "0123456789abcdef";
char *buf = buffer;
int i;
 
-   for (i = 0; i < the_hash_algo->rawsz; i++) {
-   unsigned int val = *sha1++;
+   for (i = 0; i < algop->rawsz; i++) {
+   unsigned int val = *hash++;
*buf++ = hex[val >> 4];
*buf++ = hex[val & 0xf];
}
@@ -89,20 +90,35 @@ char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
return buffer;
 }
 
-char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
 {
-   return sha1_to_hex_r(buffer, oid->hash);
+   return hash_to_hex_algop_r(buffer, sha1, _algos[GIT_HASH_SHA1]);
 }
 
-char *sha1_to_hex(const unsigned char *sha1)
+char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+{
+   return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo);
+}
+
+char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo 
*algop)
 {
static int bufno;
static char hexbuffer[4][GIT_MAX_HEXSZ + 1];
bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer);
-   return sha1_to_hex_r(hexbuffer[bufno], sha1);
+   return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop);
+}
+
+char *sha1_to_hex(const unsigned char *sha1)
+{
+   return hash_to_hex_algop(sha1, _algos[GIT_HASH_SHA1]);
+}
+
+char *hash_to_hex(const unsigned char *hash)
+{
+   return hash_to_hex_algop(hash, the_hash_algo);
 }
 
 char *oid_to_hex(const struct object_id *oid)
 {
-   return sha1_to_hex(oid->hash);
+   return hash_to_hex_algop(oid->hash, the_hash_algo);

[PATCH v6 00/12] Base SHA-256 implementation

2018-11-13 Thread brian m. carlson

This series provides a functional SHA-256 implementation and wires it
up, along with some housekeeping patches to make it suitable for
testing.

Changes from v5:
* Remove inclusion of "git-compat-util.h" in header.
* Remove "inline" from definition of hash_to_hex_algop_r.
* Switch perl invocations in t0015 to use -e instead of -E.
* Switch perl invocations in t0015 to use autoflush and postfix for.

Changes from v4:
* Downcase hex constants for consistency.
* Remove needless parentheses in return statement.
* Remove braces for single statement loops.
* Switch to +=.
* Add references to rationale for SHA-256.

Changes from v3:
* Switch to using inline functions instead of macros in many cases.
* Undefine remaining macros at the top.

Changes from v2:
* Improve commit messages to include timing and performance information.
* Improve commit messages to be less ambiguous and more friendly to a
  wider variety of English speakers.
* Prefer functions taking struct git_hash_algo in hex.c.
* Port pieces of the block-sha1 implementation over to the block-sha256
  implementation for better compatibility.
* Drop patch 13 in favor of further discussion about the best way
  forward for versioning commit graph.
* Rename the test so as to have a different number from other tests.
* Rebase on master.

Changes from v1:
* Add a hash_to_hex function mirroring sha1_to_hex, but for
  the_hash_algo.
* Strip commit message explanation about why we chose SHA-256.
* Rebase on master
* Strip leading whitespace from commit message.
* Improve commit-graph patch to cover new code added since v1.
* Be more honest about the scope of work involved in porting the SHA-256
  implementation out of libtomcrypt.
* Revert change to limit hashcmp to 20 bytes.

brian m. carlson (12):
  sha1-file: rename algorithm to "sha1"
  sha1-file: provide functions to look up hash algorithms
  hex: introduce functions to print arbitrary hashes
  cache: make hashcmp and hasheq work with larger hashes
  t: add basic tests for our SHA-1 implementation
  t: make the sha1 test-tool helper generic
  sha1-file: add a constant for hash block size
  t/helper: add a test helper to compute hash speed
  commit-graph: convert to using the_hash_algo
  Add a base implementation of SHA-256 support
  sha256: add an SHA-256 implementation using libgcrypt
  hash: add an SHA-256 implementation using OpenSSL

 Makefile  |  22 +++
 cache.h   |  51 ---
 commit-graph.c|  33 ++---
 hash.h|  41 +-
 hex.c |  32 +++--
 sha1-file.c   |  70 -
 sha256/block/sha256.c | 196 ++
 sha256/block/sha256.h |  24 
 sha256/gcrypt.h   |  30 
 t/helper/test-hash-speed.c|  61 
 t/helper/{test-sha1.c => test-hash.c} |  19 +--
 t/helper/test-sha1.c  |  52 +--
 t/helper/test-sha256.c|   7 +
 t/helper/test-tool.c  |   2 +
 t/helper/test-tool.h  |   4 +
 t/t0015-hash.sh   |  55 
 16 files changed, 595 insertions(+), 104 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 sha256/gcrypt.h
 create mode 100644 t/helper/test-hash-speed.c
 copy t/helper/{test-sha1.c => test-hash.c} (65%)
 create mode 100644 t/helper/test-sha256.c
 create mode 100755 t/t0015-hash.sh

Range-diff against v5:
 1:  a004a4c982 <  -:  -- :hash-impl
 2:  cf9f7f5620 =  1:  fa21b5948c sha1-file: rename algorithm to "sha1"
 3:  0144deaebe =  2:  4e146c92af sha1-file: provide functions to look up hash 
algorithms
 4:  b74858fb03 !  3:  10e7893242 hex: introduce functions to print arbitrary 
hashes
@@ -64,8 +64,8 @@
  }
  
 -char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
-+inline char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
-+  const struct git_hash_algo *algop)
++char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
++const struct git_hash_algo *algop)
  {
static const char hex[] = "0123456789abcdef";
char *buf = buffer;
 5:  e9703017a4 =  4:  6396a0ff57 cache: make hashcmp and hasheq work with 
larger hashes
 6:  ab85a834fd !  5:  a51a8b5638 t: add basic tests for our SHA-1 
implementation
@@ -33,7 +33,7 @@
 +  grep c12252ceda8be8994d5fa0290a47231c1d16aae3 actual &&
 +  printf "abcdefghijklmnopqrstuvwxyz" | test-tool sha1 >actual &&
 +  grep 32d10c7b8cf96570ca04ce37f2a19d84240d3a89 actual &&
-+  perl -E "for (1..10) { print q{aa}; }" | \
++  perl -e "$| = 1; print q{aa} for 1..10;" | \
 +  test-tool sha1 >actual &&
 +  grep 34aa973cd4c4daa4f61eeb2bdbad27316534016f actual &&

[PATCH v6 06/12] t: make the sha1 test-tool helper generic

2018-11-13 Thread brian m. carlson

Since we're going to have multiple hash algorithms to test, it makes
sense to share as much of the test code as possible.  Convert the sha1
helper for the test-tool to be generic and move it out into its own
module.  This will allow us to share most of this code with our NewHash
implementation.

Signed-off-by: brian m. carlson 
---
 Makefile  |  1 +
 t/helper/{test-sha1.c => test-hash.c} | 19 +-
 t/helper/test-sha1.c  | 52 +--
 t/helper/test-tool.h  |  2 ++
 4 files changed, 14 insertions(+), 60 deletions(-)
 copy t/helper/{test-sha1.c => test-hash.c} (65%)

diff --git a/Makefile b/Makefile
index 016fdcdb81..daf0b198ec 100644
--- a/Makefile
+++ b/Makefile
@@ -724,6 +724,7 @@ TEST_BUILTINS_OBJS += test-dump-split-index.o
 TEST_BUILTINS_OBJS += test-dump-untracked-cache.o
 TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-genrandom.o
+TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
 TEST_BUILTINS_OBJS += test-index-version.o
 TEST_BUILTINS_OBJS += test-json-writer.o
diff --git a/t/helper/test-sha1.c b/t/helper/test-hash.c
similarity index 65%
copy from t/helper/test-sha1.c
copy to t/helper/test-hash.c
index 1ba0675c75..0a31de66f3 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-hash.c
@@ -1,13 +1,14 @@
 #include "test-tool.h"
 #include "cache.h"
 
-int cmd__sha1(int ac, const char **av)
+int cmd_hash_impl(int ac, const char **av, int algo)
 {
-   git_SHA_CTX ctx;
-   unsigned char sha1[20];
+   git_hash_ctx ctx;
+   unsigned char hash[GIT_MAX_HEXSZ];
unsigned bufsz = 8192;
int binary = 0;
char *buffer;
+   const struct git_hash_algo *algop = _algos[algo];
 
if (ac == 2) {
if (!strcmp(av[1], "-b"))
@@ -26,7 +27,7 @@ int cmd__sha1(int ac, const char **av)
die("OOPS");
}
 
-   git_SHA1_Init();
+   algop->init_fn();
 
while (1) {
ssize_t sz, this_sz;
@@ -38,20 +39,20 @@ int cmd__sha1(int ac, const char **av)
if (sz == 0)
break;
if (sz < 0)
-   die_errno("test-sha1");
+   die_errno("test-hash");
this_sz += sz;
cp += sz;
room -= sz;
}
if (this_sz == 0)
break;
-   git_SHA1_Update(, buffer, this_sz);
+   algop->update_fn(, buffer, this_sz);
}
-   git_SHA1_Final(sha1, );
+   algop->final_fn(hash, );
 
if (binary)
-   fwrite(sha1, 1, 20, stdout);
+   fwrite(hash, 1, algop->rawsz, stdout);
else
-   puts(sha1_to_hex(sha1));
+   puts(hash_to_hex_algop(hash, algop));
exit(0);
 }
diff --git a/t/helper/test-sha1.c b/t/helper/test-sha1.c
index 1ba0675c75..d860c387c3 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-sha1.c
@@ -3,55 +3,5 @@
 
 int cmd__sha1(int ac, const char **av)
 {
-   git_SHA_CTX ctx;
-   unsigned char sha1[20];
-   unsigned bufsz = 8192;
-   int binary = 0;
-   char *buffer;
-
-   if (ac == 2) {
-   if (!strcmp(av[1], "-b"))
-   binary = 1;
-   else
-   bufsz = strtoul(av[1], NULL, 10) * 1024 * 1024;
-   }
-
-   if (!bufsz)
-   bufsz = 8192;
-
-   while ((buffer = malloc(bufsz)) == NULL) {
-   fprintf(stderr, "bufsz %u is too big, halving...\n", bufsz);
-   bufsz /= 2;
-   if (bufsz < 1024)
-   die("OOPS");
-   }
-
-   git_SHA1_Init();
-
-   while (1) {
-   ssize_t sz, this_sz;
-   char *cp = buffer;
-   unsigned room = bufsz;
-   this_sz = 0;
-   while (room) {
-   sz = xread(0, cp, room);
-   if (sz == 0)
-   break;
-   if (sz < 0)
-   die_errno("test-sha1");
-   this_sz += sz;
-   cp += sz;
-   room -= sz;
-   }
-   if (this_sz == 0)
-   break;
-   git_SHA1_Update(, buffer, this_sz);
-   }
-   git_SHA1_Final(sha1, );
-
-   if (binary)
-   fwrite(sha1, 1, 20, stdout);
-   else
-   puts(sha1_to_hex(sha1));
-   exit(0);
+   return cmd_hash_impl(ac, av, GIT_HASH_SHA1);
 }
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 042f12464b..b5b7712a1d 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -51,4 +51,6 @@ int cmd__windows_named_pipe(int argc, const char **argv);
 #endif
 int cmd__write_cache(int argc, const char **argv);

[PATCH v6 05/12] t: add basic tests for our SHA-1 implementation

2018-11-13 Thread brian m. carlson

We have in the past had some unfortunate endianness issues with some
SHA-1 implementations we ship, especially on big-endian machines.  Add
an explicit test using the test helper to catch these issues and point
them out prominently.  This test can also be used as a staging ground
for people testing additional algorithms to verify that their
implementations are working as expected.

Signed-off-by: brian m. carlson 
---
 t/t0015-hash.sh | 29 +
 1 file changed, 29 insertions(+)
 create mode 100755 t/t0015-hash.sh

diff --git a/t/t0015-hash.sh b/t/t0015-hash.sh
new file mode 100755
index 00..884fe5acd1
--- /dev/null
+++ b/t/t0015-hash.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+test_description='test basic hash implementation'
+. ./test-lib.sh
+
+
+test_expect_success 'test basic SHA-1 hash values' '
+   test-tool sha1 actual &&
+   grep da39a3ee5e6b4b0d3255bfef95601890afd80709 actual &&
+   printf "a" | test-tool sha1 >actual &&
+   grep 86f7e437faa5a7fce15d1ddcb9eaeaea377667b8 actual &&
+   printf "abc" | test-tool sha1 >actual &&
+   grep a9993e364706816aba3e25717850c26c9cd0d89d actual &&
+   printf "message digest" | test-tool sha1 >actual &&
+   grep c12252ceda8be8994d5fa0290a47231c1d16aae3 actual &&
+   printf "abcdefghijklmnopqrstuvwxyz" | test-tool sha1 >actual &&
+   grep 32d10c7b8cf96570ca04ce37f2a19d84240d3a89 actual &&
+   perl -e "$| = 1; print q{aa} for 1..10;" | \
+   test-tool sha1 >actual &&
+   grep 34aa973cd4c4daa4f61eeb2bdbad27316534016f actual &&
+   printf "blob 0\0" | test-tool sha1 >actual &&
+   grep e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 actual &&
+   printf "blob 3\0abc" | test-tool sha1 >actual &&
+   grep f2ba8f84ab5c1bce84a7b441cb1959cfc7093b7f actual &&
+   printf "tree 0\0" | test-tool sha1 >actual &&
+   grep 4b825dc642cb6eb9a060e54bf8d69288fbee4904 actual
+'
+
+test_done

[PATCH v6 04/12] cache: make hashcmp and hasheq work with larger hashes

2018-11-13 Thread brian m. carlson

In 183a638b7d ("hashcmp: assert constant hash size", 2018-08-23), we
modified hashcmp to assert that the hash size was always 20 to help it
optimize and inline calls to memcmp.  In a future series, we replaced
many calls to hashcmp and oidcmp with calls to hasheq and oideq to
improve inlining further.

However, we want to support hash algorithms other than SHA-1, namely
SHA-256.  When doing so, we must handle the case where these values are
32 bytes long as well as 20.  Adjust hashcmp to handle two cases:
20-byte matches, and maximum-size matches.  Therefore, when we include
SHA-256, we'll automatically handle it properly, while at the same time
teaching the compiler that there are only two possible options to
consider.  This will allow the compiler to write the most efficient
possible code.

Copy similar code into hasheq and perform an identical transformation.
At least with GCC 8.2.0, making hasheq defer to hashcmp when there are
two branches prevents the compiler from inlining the comparison, while
the code in this patch is inlined properly.  Add a comment to avoid an
accidental performance regression from well-intentioned refactoring.

Signed-off-by: brian m. carlson 
---
 cache.h | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index cb7268deea..8607878dbc 100644
--- a/cache.h
+++ b/cache.h
@@ -1028,16 +1028,12 @@ extern const struct object_id null_oid;
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
/*
-* This is a temporary optimization hack. By asserting the size here,
-* we let the compiler know that it's always going to be 20, which lets
-* it turn this fixed-size memcmp into a few inline instructions.
-*
-* This will need to be extended or ripped out when we learn about
-* hashes of different sizes.
+* Teach the compiler that there are only two possibilities of hash size
+* here, so that it can optimize for this case as much as possible.
 */
-   if (the_hash_algo->rawsz != 20)
-   BUG("hash size not yet supported by hashcmp");
-   return memcmp(sha1, sha2, the_hash_algo->rawsz);
+   if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
+   return memcmp(sha1, sha2, GIT_MAX_RAWSZ);
+   return memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id 
*oid2)
@@ -1047,7 +1043,13 @@ static inline int oidcmp(const struct object_id *oid1, 
const struct object_id *o
 
 static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
 {
-   return !hashcmp(sha1, sha2);
+   /*
+* We write this here instead of deferring to hashcmp so that the
+* compiler can properly inline it and avoid calling memcmp.
+*/
+   if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
+   return !memcmp(sha1, sha2, GIT_MAX_RAWSZ);
+   return !memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
 }
 
 static inline int oideq(const struct object_id *oid1, const struct object_id 
*oid2)

[PATCH v6 07/12] sha1-file: add a constant for hash block size

2018-11-13 Thread brian m. carlson

There is one place we need the hash algorithm block size: the HMAC code
for push certs.  Expose this constant in struct git_hash_algo and expose
values for SHA-1 and for the largest value of any hash.

Signed-off-by: brian m. carlson 
---
 cache.h | 4 
 hash.h  | 3 +++
 sha1-file.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/cache.h b/cache.h
index 8607878dbc..62b2f3a5e8 100644
--- a/cache.h
+++ b/cache.h
@@ -45,10 +45,14 @@ unsigned long git_deflate_bound(git_zstream *, unsigned 
long);
 /* The length in bytes and in hex digits of an object name (SHA-1 value). */
 #define GIT_SHA1_RAWSZ 20
 #define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)
+/* The block size of SHA-1. */
+#define GIT_SHA1_BLKSZ 64
 
 /* The length in byte and in hex digits of the largest possible hash value. */
 #define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
 #define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+/* The largest possible block size for any supported hash. */
+#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ
 
 struct object_id {
unsigned char hash[GIT_MAX_RAWSZ];
diff --git a/hash.h b/hash.h
index 80881eea47..1bcf7ab6fd 100644
--- a/hash.h
+++ b/hash.h
@@ -81,6 +81,9 @@ struct git_hash_algo {
/* The length of the hash in hex characters. */
size_t hexsz;
 
+   /* The block size of the hash. */
+   size_t blksz;
+
/* The hash initialization function. */
git_hash_init_fn init_fn;
 
diff --git a/sha1-file.c b/sha1-file.c
index 93ed8c8686..c47349a5f8 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -90,6 +90,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
0x,
0,
0,
+   0,
git_hash_unknown_init,
git_hash_unknown_update,
git_hash_unknown_final,
@@ -102,6 +103,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
0x73686131,
GIT_SHA1_RAWSZ,
GIT_SHA1_HEXSZ,
+   GIT_SHA1_BLKSZ,
git_hash_sha1_init,
git_hash_sha1_update,
git_hash_sha1_final,

[PATCH v6 01/12] sha1-file: rename algorithm to "sha1"

2018-11-13 Thread brian m. carlson

The transition plan anticipates us using a syntax such as "^{sha1}" for
disambiguation.  Since this is a syntax some people will be typing a
lot, it makes sense to provide a short, easy-to-type syntax.  Omitting
the dash doesn't create any ambiguity; however, it does make the syntax
shorter and easier to type, especially for touch typists.  In addition,
the transition plan already uses "sha1" in this context.

Rename the name of SHA-1 implementation to "sha1".

Note that this change creates no backwards compatibility concerns, since
we haven't yet used this field in any configuration settings.

Signed-off-by: brian m. carlson 
---
 sha1-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1-file.c b/sha1-file.c
index 2daf7d9935..3b9c042691 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -97,7 +97,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
NULL,
},
{
-   "sha-1",
+   "sha1",
/* "sha1", big-endian */
0x73686131,
GIT_SHA1_RAWSZ,

Re: [PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms

2018-11-13 Thread Ramsay Jones




On 14/11/2018 02:11, brian m. carlson wrote:
> On Wed, Nov 14, 2018 at 12:11:07AM +, Ramsay Jones wrote:
>>
>>
>> On 13/11/2018 18:42, Derrick Stolee wrote:
>>> On 11/4/2018 6:44 PM, brian m. carlson wrote:
 +int hash_algo_by_name(const char *name)
 +{
 +    int i;
 +    if (!name)
 +    return GIT_HASH_UNKNOWN;
 +    for (i = 1; i < GIT_HASH_NALGOS; i++)
 +    if (!strcmp(name, hash_algos[i].name))
 +    return i;
 +    return GIT_HASH_UNKNOWN;
 +}
 +
>>>
>>> Today's test coverage report [1] shows this method is not covered in the 
>>> test suite. Looking at 'pu', it doesn't have any callers.
>>>
>>> Do you have a work in progress series that will use this? Could we add a 
>>> test-tool to exercise this somehow?
>>
>> There are actually 4 unused external symbols resulting from Brian's
>> 'bc/sha-256' branch. The new unused externals in 'pu' looks like:
>>
>> $ diff nsc psc
>> 37a38,39
>> > hex.o  - hash_to_hex
> 
> I have code that uses this in my object-id-part15 series.  I also have
> another series coming after this one that makes heavy use of it.
> 
>> > hex.o  - hash_to_hex_algop_r
> 
> I believe this is because it's inline, since it is indeed used just a
> few lines below its definition.  I'll drop the inline, since it's meant
> to be externally visible.

No, this has nothing to do with the 'inline', it is simply not
called outside of hex.c (at present). If you look at the assembler
(objdump -d hex.o), you will find practically all of the function
calls in that file are inlined (even those not marked with 'inline').

[I think the external declaration in cache.h forces the compiler to
add the external definition, despite the 'inline'. If you remove the
'inline' and re-compile and disassemble again, the result is identical.]

Thanks for confirming upcoming patches will add uses for all of
these functions - I suspected that would be the case.

Thanks!

ATB,
Ramsay Jones

> 
>> > sha1-file.o- hash_algo_by_id
> 
> This will be used when I write pack index v3, which will be in my
> object-id-part15 series.
> 
>> > sha1-file.o- hash_algo_by_name
> 
> This is used in my object-id-part15 series.
>

Re: rebase-in-C stability for 2.20

2018-11-13 Thread Junio C Hamano

Ævar Arnfjörð Bjarmason  writes:

> According to Junio's calendar we're now 2 days from 2.20-rc0. We have
> the js/rebase-autostash-detach-fix bug I reported sitting in "pu" still,
> and then this.
>
> I see we still have rebase.useBuiltin in the code as an escape hatch,
> but it's undocumented.
>
> Given that we're still finding regressions bugs in the rebase-in-C
> version should we be considering reverting 5541bd5b8f ("rebase: default
> to using the builtin rebase", 2018-08-08)?
>
> I love the feature, but fear that the current list of known regressions
> serve as a canary for a larger list which we'd discover if we held off
> for another major release (and would re-enable rebase.useBuiltin=true in
> master right after 2.20 is out the door).
>
> But maybe I'm being overly paranoid. What do those more familiar with
> this think?

I was hoping that having it early in GfW 2.19 would have smoked out
all the remaining issues, but it seems that those building from the
source and testing have usage patterns different enough to find more
issues.  This is a normal part of the development process, and
hopefully the remaining bugs are minor and can be flushed out in the
-rc testing period---this kind of thing is the whole reason why we
code-freeze and test rcs.

It unfortunately is too late to depend on the rebase.useBuiltin as
an escape hatch, as 'master', 'next', or anywhere else had the
"rebase and rebase -i in C" series merged without it set to false
and used in any seriousness.  Quite honestly, I think we can trust
the rest of the "rebase and rebase -i in C" series much more than
its "use the scripted one even though we have built-in one" part.

So if we have to seriously consider holding back, we may be better
off ripping the whole thing out.  I do not offhand know how involved
such a reversion would be, though, and I am *NOT* looking forward to
having do such a major surgery right before the release.

Re: [PATCH 3/3] index: do not warn about unrecognized extensions

2018-11-13 Thread Junio C Hamano

Jonathan Nieder  writes:

> We cannot change the past, but for index extensions of the future,
> there is a straightforward improvement: silence that message except
> when tracing.  This way, the message is still available when
> debugging, but in everyday use it does not show up so (once most Git
> users have this patch) we can turn on new optional extensions right
> away without alarming people.

That argument ignores the "let the users know they are using a stale
version when they did use (either by accident or deliberately) a
more recent one" value, though.

Even if we consider that this is only for debugging, I am not sure
if tracing is the right place to add.  As long as the "optional
extensions can be ignored without affecting the correctness" rule
holds, there is nothing gained by letting these messages shown for
debugging purposes, and if there is such a bug (e.g. we introduced
an optional extension but the code that wrote an index with an
optional extension wrote the non-optional part in such a way that it
cannot be correctly handled without the extension that is supposed
to be optional) we'd probably want to let users notice without
having to explicitly go into a debugging session.  If Googling for
"ignoring FNCY ext" gives "discard your index with 'reset HEAD',
because an index file with FNCY ext cannot be read without
understanding it", that may prevent damages from happening in the
first place.  On the other hand, hiding it behind tracing would mean
the user first need to exprience an unknown breakage first and then
has to enable tracing among other 47 different things to diagnose
and drill down to the root cause.

Re: [PATCH 2/3] ieot: default to not writing IEOT section

2018-11-13 Thread Junio C Hamano

Ben Peart  writes:

> Why introduce a new setting to disable writing the IEOT extension
> instead of just using the existing index.threads setting?

But index.threads is about what the reader does, not about the
writer who does not even know who will be reading the resulting
index, no?

Re: [PATCH] Makefile: use CXXFLAGS for linking fuzzers

2018-11-13 Thread Junio C Hamano

Josh Steadmon  writes:

> OSS-Fuzz only provides one set of CXXFLAGS for use on both compiling
> project C++ project files as well linking the fuzzers themselves. So in
> the event that Git ever added any C++ sources, they would need to use
> the same set of CXXFLAGS.

OK.

> Given that, do you agree with Stefan that it is more intuitive to define
> CXXFLAGS next to the fuzzer build rules, since that's the only place
> it's used for now?

I am not sure.  Until we gain other C++ targets (in other words,
while linking with fuzzer is the only consumer of CXXFLAGS), I'd
consider it similar to SPARSE_FLAGS and SPATCH_FLAGS, i.e. settings
specific to an auxiliary tool that supports our development process,
and it would make more sense to define it near them higher in the
Makefile.

I'd probably feel differently if this were called FUZZ_CXXFLAGS or
something like that, which would make its natural home next to
the rule to build $(FUZZ_PROGRAMS), though.

Re: [PATCH v3 1/1] protocol: advertise multiple supported versions

2018-11-13 Thread Junio C Hamano

SZEDER Gábor  writes:

>> +if (tmp_allowed_versions[0] != config_version)
>> +for (int i = 1; i < nr_allowed_versions; i++)
>
> We don't do C99 yet, thus the declaration of a loop variable like this
> is not allowed and triggers compiler errors.

I thought we did a weather-balloon to see if this bothers people who
build on minority platforms but

git grep 'for (int'

is coming up empty.

We have been trying designated initializers with weather-balloon
changes (both arrays and struct fields) and I somehow thought that
we already were trying this out, but apparently that is not the
case.  Also trailing comma in the enum definition was what we
discussed earlier but I do not know if we had weather-balloon for it
or not offhand.  

Perhaps we should write these down in CodingGuidelines.

Re: [PATCH v3 1/1] protocol: advertise multiple supported versions

2018-11-13 Thread Junio C Hamano

Josh Steadmon  writes:

> On 2018.11.13 13:01, Junio C Hamano wrote:
>> stead...@google.com writes:
>> 
>> > Currently the client advertises that it supports the wire protocol
>> > version set in the protocol.version config. However, not all services
>> > support the same set of protocol versions. When connecting to
>> > git-receive-pack, the client automatically downgrades to v0 if
>> > config.protocol is set to v2, but this check is not performed for other
>> > services.
>> 
>> "downgrades to v0 even if ... is set to v2" you mean?  Otherwise it
>> is unclear why asking for v2 leads to using v0.
>
> The downgrade on push happens only when the the configured version is
> v2. If v1 is requested, no downgrade is triggered. I'll clarify the
> commit message in the next version.

OK, then it will still be confusing unless "we downgrade v2 to v0
because ..."gives the reason.

> In any case, the ordering of the server's allowed versions won't matter;
> we'll pick the the first version in the client's list which is also
> allowed on the server.

That sounds like a very sensible semantics.

>
>> I am wondering if the code added by this patch outside this
>> function, with if (strcmp(client_ad.buf, "version=0") sprinkled all
>> over the place, works sensibly when the other side says "I prefer
>> version=0 but I do not mind talking version=1".
>
> Depends on what you mean by "sensibly" :). In the current case, if the
> client prefers v0, it will always end up using v0. After the fixes
> described above, it will always use v0 unless the server no longer
> supports v0. Is that what you would expect?

Yes, that sounds like a sensible behaviour.

What I was alludding to was a lot simpler, though.  An advert string
"version=0:version=1" from a client that prefers version 0 won't be
!strcmp("version=0", advert) but seeing many strcmp() in the patch
made me wonder.

Re: Re* [PATCH v3 1/1] protocol: advertise multiple supported versions

2018-11-13 Thread Junio C Hamano

Stefan Beller  writes:

> "Nobody reads documentation any more, we have too much of it."

Maybe, but spelling it out and writing it down, you can point at it
(or hope that other people point at it without making a bad or incorrect
rephrase).

> I would have hoped to have a .cocci patch in the bad style category,
> i.e. disallowing the _() function inside the context of BUG().
>
> That said, I like the patch below. Thanks for writing it.

That was a mere weather-baloon, and too early to thank about.
Hopefully people can pick nits and come to a final version that can
be applied.

I have a few things in it that I think may be debatable.  I am not
sure if we should talk about being careful not to send localized
strings over the wire in that comment, for example.

[PATCH v4 1/1] protocol: advertise multiple supported versions

2018-11-13 Thread Josh Steadmon

Currently the client advertises that it supports the wire protocol
version set in the protocol.version config. However, not all services
support the same set of protocol versions. For example, git-receive-pack
supports v1 and v0, but not v2. If a client connects to git-receive-pack
and requests v2, it will instead be downgraded to v0. Other services,
such as git-upload-archive, do not do any version negotiation checks.

This patch creates a protocol version registry. Individual client and
server programs register all the protocol versions they support prior to
communicating with a remote instance. Versions should be listed in
preference order; the version specified in protocol.version will
automatically be moved to the front of the registry.

The protocol version registry is passed to remote helpers via the
GIT_PROTOCOL environment variable.

Clients now advertise the full list of registered versions. Servers
select the first allowed version from this advertisement.

While we're at it, remove unnecessary externs from function declarations
in protocol.h.

Signed-off-by: Josh Steadmon 
---
 builtin/archive.c|   3 +
 builtin/clone.c  |   4 ++
 builtin/fetch-pack.c |   4 ++
 builtin/fetch.c  |   5 ++
 builtin/ls-remote.c  |   5 ++
 builtin/pull.c   |   5 ++
 builtin/push.c   |   4 ++
 builtin/receive-pack.c   |   3 +
 builtin/send-pack.c  |   3 +
 builtin/upload-archive.c |   3 +
 builtin/upload-pack.c|   4 ++
 connect.c|  47 +++
 protocol.c   | 122 ---
 protocol.h   |  23 +++-
 remote-curl.c|  28 ++---
 t/t5570-git-daemon.sh|   2 +-
 t/t5700-protocol-v1.sh   |   8 +--
 t/t5702-protocol-v2.sh   |  16 +++--
 transport-helper.c   |   6 ++
 19 files changed, 237 insertions(+), 58 deletions(-)

diff --git a/builtin/archive.c b/builtin/archive.c
index e74f675390..8adb9f381b 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -8,6 +8,7 @@
 #include "transport.h"
 #include "parse-options.h"
 #include "pkt-line.h"
+#include "protocol.h"
 #include "sideband.h"
 
 static void create_output_file(const char *output_file)
@@ -94,6 +95,8 @@ int cmd_archive(int argc, const char **argv, const char 
*prefix)
OPT_END()
};
 
+   register_allowed_protocol_version(protocol_v0);
+
argc = parse_options(argc, argv, prefix, local_opts, NULL,
 PARSE_OPT_KEEP_ALL);
 
diff --git a/builtin/clone.c b/builtin/clone.c
index fd2c3ef090..1651a950b6 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -900,6 +900,10 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
struct refspec rs = REFSPEC_INIT_FETCH;
struct argv_array ref_prefixes = ARGV_ARRAY_INIT;
 
+   register_allowed_protocol_version(protocol_v2);
+   register_allowed_protocol_version(protocol_v1);
+   register_allowed_protocol_version(protocol_v0);
+
fetch_if_missing = 0;
 
packet_trace_identity("clone");
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 1a1bc63566..cba935e4d3 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -57,6 +57,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
 
fetch_if_missing = 0;
 
+   register_allowed_protocol_version(protocol_v2);
+   register_allowed_protocol_version(protocol_v1);
+   register_allowed_protocol_version(protocol_v0);
+
packet_trace_identity("fetch-pack");
 
memset(, 0, sizeof(args));
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 61bec5d213..2a20cf8bfd 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -21,6 +21,7 @@
 #include "argv-array.h"
 #include "utf8.h"
 #include "packfile.h"
+#include "protocol.h"
 #include "list-objects-filter-options.h"
 
 static const char * const builtin_fetch_usage[] = {
@@ -1476,6 +1477,10 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
int prune_tags_ok = 1;
struct argv_array argv_gc_auto = ARGV_ARRAY_INIT;
 
+   register_allowed_protocol_version(protocol_v2);
+   register_allowed_protocol_version(protocol_v1);
+   register_allowed_protocol_version(protocol_v0);
+
packet_trace_identity("fetch");
 
fetch_if_missing = 0;
diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index 1a25df7ee1..ea685e8bb9 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -1,5 +1,6 @@
 #include "builtin.h"
 #include "cache.h"
+#include "protocol.h"
 #include "transport.h"
 #include "ref-filter.h"
 #include "remote.h"
@@ -80,6 +81,10 @@ int cmd_ls_remote(int argc, const char **argv, const char 
*prefix)
 
memset(_array, 0, sizeof(ref_array));
 
+   register_allowed_protocol_version(protocol_v2);
+   register_allowed_protocol_version(protocol_v1);
+   register_allowed_protocol_version(protocol_v0);
+
argc = parse_options(argc, argv,

[PATCH v4 0/1] Advertise multiple supported proto versions

2018-11-13 Thread Josh Steadmon

Fix several bugs identified in v3, clarify commit message, and clean up
extern keyword in protocol.h.

Josh Steadmon (1):
  protocol: advertise multiple supported versions

 builtin/archive.c|   3 +
 builtin/clone.c  |   4 ++
 builtin/fetch-pack.c |   4 ++
 builtin/fetch.c  |   5 ++
 builtin/ls-remote.c  |   5 ++
 builtin/pull.c   |   5 ++
 builtin/push.c   |   4 ++
 builtin/receive-pack.c   |   3 +
 builtin/send-pack.c  |   3 +
 builtin/upload-archive.c |   3 +
 builtin/upload-pack.c|   4 ++
 connect.c|  47 +++
 protocol.c   | 122 ---
 protocol.h   |  23 +++-
 remote-curl.c|  28 ++---
 t/t5570-git-daemon.sh|   2 +-
 t/t5700-protocol-v1.sh   |   8 +--
 t/t5702-protocol-v2.sh   |  16 +++--
 transport-helper.c   |   6 ++
 19 files changed, 237 insertions(+), 58 deletions(-)

Range-diff against v3:
1:  b9968e3fb0 ! 1:  3c023991c0 protocol: advertise multiple supported versions
@@ -4,22 +4,25 @@
 
 Currently the client advertises that it supports the wire protocol
 version set in the protocol.version config. However, not all services
-support the same set of protocol versions. When connecting to
-git-receive-pack, the client automatically downgrades to v0 if
-config.protocol is set to v2, but this check is not performed for other
-services.
+support the same set of protocol versions. For example, 
git-receive-pack
+supports v1 and v0, but not v2. If a client connects to 
git-receive-pack
+and requests v2, it will instead be downgraded to v0. Other services,
+such as git-upload-archive, do not do any version negotiation checks.
 
-This patch creates a protocol version registry. Individual operations
-register all the protocol versions they support prior to communicating
-with a server. Versions should be listed in preference order; the
-version specified in protocol.version will automatically be moved to 
the
-front of the registry.
+This patch creates a protocol version registry. Individual client and
+server programs register all the protocol versions they support prior 
to
+communicating with a remote instance. Versions should be listed in
+preference order; the version specified in protocol.version will
+automatically be moved to the front of the registry.
 
 The protocol version registry is passed to remote helpers via the
 GIT_PROTOCOL environment variable.
 
 Clients now advertise the full list of registered versions. Servers
-select the first recognized version from this advertisement.
+select the first allowed version from this advertisement.
+
+While we're at it, remove unnecessary externs from function 
declarations
+in protocol.h.
 
 Signed-off-by: Josh Steadmon 
@@ -165,6 +168,20 @@
git_config(git_push_config, );
argc = parse_options(argc, argv, prefix, options, push_usage, 0);
 
+ diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
+ --- a/builtin/receive-pack.c
+ +++ b/builtin/receive-pack.c
+@@
+ 
+   packet_trace_identity("receive-pack");
+ 
++  register_allowed_protocol_version(protocol_v1);
++  register_allowed_protocol_version(protocol_v0);
++
+   argc = parse_options(argc, argv, prefix, options, receive_pack_usage, 
0);
+ 
+   if (argc > 1)
+
  diff --git a/builtin/send-pack.c b/builtin/send-pack.c
  --- a/builtin/send-pack.c
  +++ b/builtin/send-pack.c
@@ -179,6 +196,42 @@
argc = parse_options(argc, argv, prefix, options, send_pack_usage, 0);
if (argc > 0) {
 
+ diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
+ --- a/builtin/upload-archive.c
+ +++ b/builtin/upload-archive.c
+@@
+ #include "builtin.h"
+ #include "archive.h"
+ #include "pkt-line.h"
++#include "protocol.h"
+ #include "sideband.h"
+ #include "run-command.h"
+ #include "argv-array.h"
+@@
+   if (argc == 2 && !strcmp(argv[1], "-h"))
+   usage(upload_archive_usage);
+ 
++  register_allowed_protocol_version(protocol_v0);
++
+   /*
+* Set up sideband subprocess.
+*
+
+ diff --git a/builtin/upload-pack.c b/builtin/upload-pack.c
+ --- a/builtin/upload-pack.c
+ +++ b/builtin/upload-pack.c
+@@
+   packet_trace_identity("upload-pack");
+   read_replace_refs = 0;
+ 
++  register_allowed_protocol_version(protocol_v2);
++  register_allowed_protocol_version(protocol_v1);
++  register_allowed_protocol_version(protocol_v0);
++
+   argc = parse_options(argc, argv, NULL, options, upload_pack_usage, 0);
+ 
+   if (argc != 1)
+

Re: [PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms

2018-11-13 Thread brian m. carlson

On Wed, Nov 14, 2018 at 12:11:07AM +, Ramsay Jones wrote:
> 
> 
> On 13/11/2018 18:42, Derrick Stolee wrote:
> > On 11/4/2018 6:44 PM, brian m. carlson wrote:
> >> +int hash_algo_by_name(const char *name)
> >> +{
> >> +    int i;
> >> +    if (!name)
> >> +    return GIT_HASH_UNKNOWN;
> >> +    for (i = 1; i < GIT_HASH_NALGOS; i++)
> >> +    if (!strcmp(name, hash_algos[i].name))
> >> +    return i;
> >> +    return GIT_HASH_UNKNOWN;
> >> +}
> >> +
> > 
> > Today's test coverage report [1] shows this method is not covered in the 
> > test suite. Looking at 'pu', it doesn't have any callers.
> > 
> > Do you have a work in progress series that will use this? Could we add a 
> > test-tool to exercise this somehow?
> 
> There are actually 4 unused external symbols resulting from Brian's
> 'bc/sha-256' branch. The new unused externals in 'pu' looks like:
> 
> $ diff nsc psc
> 37a38,39
> > hex.o   - hash_to_hex

I have code that uses this in my object-id-part15 series.  I also have
another series coming after this one that makes heavy use of it.

> > hex.o   - hash_to_hex_algop_r

I believe this is because it's inline, since it is indeed used just a
few lines below its definition.  I'll drop the inline, since it's meant
to be externally visible.

> > sha1-file.o - hash_algo_by_id

This will be used when I write pack index v3, which will be in my
object-id-part15 series.

> > sha1-file.o - hash_algo_by_name

This is used in my object-id-part15 series.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature

Re: [PATCH] git-compat-util: prefer poll.h to sys/poll.h

2018-11-13 Thread brian m. carlson

On Wed, Nov 14, 2018 at 08:10:43AM +0700, Đoàn Trần Công Danh wrote:
> POSIX specifies that  is the correct header for poll(2)
> whereas  is only needed for some old libc.
> 
> Let's follow the POSIX way by default.
> 
> This effectively eliminates musl's warning:
> 
> warning redirecting incorrect #include  to 
> 
> Signed-off-by: Đoàn Trần Công Danh 

I think this patch is fine.  This was in SUSv2, and I don't feel bad
about siding with a spec that's at least 17 years old.

> t0028, t1308.23, t3900.34 is failing under musl,
> Those test cases in question also fails without this patch.
> 
> - t0028 is failing because musl `iconv` output UTF-16 without BOM.
> I'm not sure if my installation is broken, or it's musl's default behavior.
> But, I think RFC2781, section 4.3 allows the missing BOM

While the spec may allow this, we cannot for practical reasons.  There
are a large number of broken Windows programs that don't honor the spec
when it says to interpret UTF-16 byte sequences without a BOM as
big-endian, and instead use little-endian.  Since we cannot fix all the
broken Windows programs people use, we need to emit the BOM in UTF-16
mode.

> - t1308.23 is failing because musl `fopen` is success when open directory
> in readonly mode. POSIX allows this behavior:
> http://pubs.opengroup.org/onlinepubs/7908799/xsh/fopen.html
> [EISDIR]
> The named file is a directory and mode requires write access.

Does setting FREAD_READS_DIRECTORIES fix this?
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature

Re: [PATCH 1/3] eoie: default to not writing EOIE section

2018-11-13 Thread Junio C Hamano

Jonathan Nieder  writes:

>  1. Using multiple versions of Git on a single machine.  For example,
> some IDEs bundle a particular version of Git, which can be a
> different version from the system copy, or on a Mac, /usr/bin/git
> quickly goes out of sync with the Homebrew git in
> /usr/local/bin/git.

Exactly this, especially the latter, is the answer to your 
question in an earlier message:

>> Am I understanding correclty?  Can you give an example of when a user
>> would *want* to see this message and what they would do in response?

The user may not be even aware of using another version of Git that
does not know how to take advantage of the version of Git you have
used in the repository, and it can be a mistake the user may want to
fix (e.g. by futzing with PATH).  The message would help the user
notice the situation and take corrective action.  Users of IDEs that
bundle stale version of Git cannot even bug the supplier of the IDE
to make them more up-to-date if they aren't aware of it.

[PATCH] git-compat-util: prefer poll.h to sys/poll.h

2018-11-13 Thread Đoàn Trần Công Danh

POSIX specifies that  is the correct header for poll(2)
whereas  is only needed for some old libc.

Let's follow the POSIX way by default.

This effectively eliminates musl's warning:

warning redirecting incorrect #include  to 

Signed-off-by: Đoàn Trần Công Danh 
---
t0028, t1308.23, t3900.34 is failing under musl,
Those test cases in question also fails without this patch.

- t0028 is failing because musl `iconv` output UTF-16 without BOM.
I'm not sure if my installation is broken, or it's musl's default behavior.
But, I think RFC2781, section 4.3 allows the missing BOM

- t1308.23 is failing because musl `fopen` is success when open directory
in readonly mode. POSIX allows this behavior:
http://pubs.opengroup.org/onlinepubs/7908799/xsh/fopen.html
[EISDIR]
The named file is a directory and mode requires write access.

- t3900.34: from what I understand, musl haven't supported ISO-2022-JP, yet.
https://wiki.musl-libc.org/functional-differences-from-glibc.html#iconv

 Makefile  | 8 +++-
 configure.ac  | 6 ++
 git-compat-util.h | 5 -
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index bbfbb4292..5734efe74 100644
--- a/Makefile
+++ b/Makefile
@@ -207,10 +207,12 @@ all::
 # Define MMAP_PREVENTS_DELETE if a file that is currently mmapped cannot be
 # deleted or cannot be replaced using rename().
 #
+# Define NO_POLL_H if you don't have poll.h.
+#
 # Define NO_SYS_POLL_H if you don't have sys/poll.h.
 #
 # Define NO_POLL if you do not have or don't want to use poll().
-# This also implies NO_SYS_POLL_H.
+# This also implies NO_POLL_H and NO_SYS_POLL_H.
 #
 # Define NEEDS_SYS_PARAM_H if you need to include sys/param.h to compile,
 # *PLEASE* REPORT to git@vger.kernel.org if your platform needs this;
@@ -1459,6 +1461,7 @@ ifdef NO_GETTEXT
USE_GETTEXT_SCHEME ?= fallthrough
 endif
 ifdef NO_POLL
+   NO_POLL_H = YesPlease
NO_SYS_POLL_H = YesPlease
COMPAT_CFLAGS += -DNO_POLL -Icompat/poll
COMPAT_OBJS += compat/poll/poll.o
@@ -1497,6 +1500,9 @@ endif
 ifdef NO_SYS_SELECT_H
BASIC_CFLAGS += -DNO_SYS_SELECT_H
 endif
+ifdef NO_POLL_H
+   BASIC_CFLAGS += -DNO_POLL_H
+endif
 ifdef NO_SYS_POLL_H
BASIC_CFLAGS += -DNO_SYS_POLL_H
 endif
diff --git a/configure.ac b/configure.ac
index e11b7976a..908e66a97 100644
--- a/configure.ac
+++ b/configure.ac
@@ -792,6 +792,12 @@ AC_CHECK_HEADER([sys/select.h],
 [NO_SYS_SELECT_H=UnfortunatelyYes])
 GIT_CONF_SUBST([NO_SYS_SELECT_H])
 #
+# Define NO_POLL_H if you don't have poll.h
+AC_CHECK_HEADER([poll.h],
+[NO_POLL_H=],
+[NO_POLL_H=UnfortunatelyYes])
+GIT_CONF_SUBST([NO_POLL_H])
+#
 # Define NO_SYS_POLL_H if you don't have sys/poll.h
 AC_CHECK_HEADER([sys/poll.h],
 [NO_SYS_POLL_H=],
diff --git a/git-compat-util.h b/git-compat-util.h
index 96a3f86d8..65f229f10 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -180,9 +180,12 @@
 #include 
 #include 
 #include 
-#ifndef NO_SYS_POLL_H
+#if !defined(NO_POLL_H)
+#include 
+#elif !defined(NO_SYS_POLL_H)
 #include 
 #else
+/* Pull the compat stuff */
 #include 
 #endif
 #ifdef HAVE_BSD_SYSCTL
-- 
2.19.1.856.g8858448bb.dirty

Re: [PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms

2018-11-13 Thread brian m. carlson

On Tue, Nov 13, 2018 at 07:45:41PM +0100, Duy Nguyen wrote:
> On Tue, Nov 13, 2018 at 7:42 PM Derrick Stolee  wrote:
> >
> > On 11/4/2018 6:44 PM, brian m. carlson wrote:
> > > +int hash_algo_by_name(const char *name)
> > > +{
> > > + int i;
> > > + if (!name)
> > > + return GIT_HASH_UNKNOWN;
> > > + for (i = 1; i < GIT_HASH_NALGOS; i++)
> > > + if (!strcmp(name, hash_algos[i].name))
> > > + return i;
> > > + return GIT_HASH_UNKNOWN;
> > > +}
> > > +
> >
> > Today's test coverage report [1] shows this method is not covered in the
> > test suite. Looking at 'pu', it doesn't have any callers.
> >
> > Do you have a work in progress series that will use this? Could we add a
> > test-tool to exercise this somehow?
> 
> Going by the function name, it should be used when hash-object or
> other commands learn about --object-format=.

Correct.  I have extensions.objectFormat code that will use this.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature

Re: [PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms

2018-11-13 Thread Jeff King

On Wed, Nov 14, 2018 at 12:42:45AM +, Ramsay Jones wrote:

> BTW, if you were puzzling over the 3rd symbol from sha1-file.o
> (which I wasn't counting in the 4 symbols above! ;-) ), then I
> believe that is because Jeff's commit 3a2e08245c ("object-store: 
> provide helpers for loose_objects_cache", 2018-11-12) effectively
> moved the only call outside of sha1-file.c (in sha1-name.c) back
> into sha1-file.c
> 
> So, for_each_file_in_obj_subdir() could now be marked 'static'.
> (whether it should is a different issue).

I think it would be reasonable to do so. Most code shouldn't have to
care about the on-disk hashing structure, so ideally it wouldn't be part
of the public interface.

-Peff

Re: [PATCH] remote-curl: die on server-side errors

2018-11-13 Thread Jeff King

On Tue, Nov 13, 2018 at 02:25:40PM -0800, Josh Steadmon wrote:

> > The magic "4" here and in the existing "version 2" check is because we
> > are expecting pkt-lines. The original conditional always calls
> > packed_read_line_buf() and will complain if we didn't actually get a
> > pkt-line.
> > 
> > Should we confirm that we got a real packet-line? Or at least that those
> > first four are even plausible hex chars?
> > 
> > I admit that it's pretty unlikely that the server is going to fool us
> > here. It would need something like "foo ERRORS ARE FUN!". And even then
> > we'd report an error (whereas the correct behavior is falling back to
> > dumb http, but we know that won't work anyway because that's not a valid
> > ref advertisement). So I doubt this is really a bug per se, but it might
> > make it easier to understand what's going on if we actually parsed the
> > packet.
> 
> Unfortunately we can't just directly parse the data in last->buf,
> because other parts of the code are expecting to see the raw pkt-line
> data there. I tried adding a duplicate pointer and length variable for
> this data and parsing that through packet_read_line_buf(), but even
> without using the results this apparently has side-effects that break
> all of t5550 (and probably other tests as well). It also fails if I
> completely duplicate last->buf into a new char* and call
> packet_readline_buf() on that, so there's clearly some interaction in
> the pkt-line guts that I'm not properly accounting for.

Yes, the packet_read_line_buf() interface will both advance the buf
pointer and decrement the length.  So if we want to "peek", we have to
do so with a copy (there's a peek function if you use the packet_reader
interface, but that might be overkill here).

You can rewrite it like this, which is a pretty faithful conversion and
passes the tests (but see below).

diff --git a/remote-curl.c b/remote-curl.c
index 762a55a75f..66c57c9d62 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -330,9 +330,78 @@ static int get_protocol_http_header(enum protocol_version 
version,
return 0;
 }
 
+static void check_smart_http(struct discovery *d, const char *service,
+struct strbuf *type)
+{
+   char *src_buf;
+   size_t src_len;
+   char pkt[LARGE_PACKET_MAX];
+   int pkt_len;
+   enum packet_read_status r;
+
+   /*
+* We speculatively try to read a packet, which means we must preserve
+* the original buf/len pair in some cases.
+*/
+   src_buf = d->buf;
+   src_len = d->len;
+   r = packet_read_with_status(-1, _buf, _len,
+   pkt, sizeof(pkt), _len,
+   PACKET_READ_GENTLE |
+   PACKET_READ_CHOMP_NEWLINE);
+
+   /*
+* This could probably just be handled as "not smart" like all the
+* other pkt-line error cases, but traditionally we've complained
+* about it (technically we used to do so only when we got the right
+* content-type, but it probably doesn't matter).
+*/
+   if (r == PACKET_READ_FLUSH)
+   die("invalid server response; expected service, got flush 
packet");
+   if (r != PACKET_READ_NORMAL)
+   return; /* not smart */
+
+   if (pkt[0] == '#') {
+   /* v0 smart http */
+   struct strbuf exp = STRBUF_INIT;
+
+   strbuf_addf(, "application/x-%s-advertisement", service);
+   if (strbuf_cmp(, type)) {
+   strbuf_release();
+   return;
+   }
+
+   strbuf_reset();
+   strbuf_addf(, "# service=%s", service);
+   if (strcmp(pkt, exp.buf))
+   die("invalid server response; got '%s'", pkt);
+
+   strbuf_release();
+
+   /*
+* The header can include additional metadata lines, up
+* until a packet flush marker.  Ignore these now, but
+* in the future we might start to scan them.
+*/
+   while (packet_read_line_buf(_buf, _len, NULL))
+   ;
+
+   d->buf = src_buf;
+   d->len = src_len;
+   d->proto_git = 1;
+
+   } else if (starts_with(pkt, "version 2")) {
+   /*
+* v2 smart http; do not consume version packet, which will
+* be handled elsewhere. Should we be checking the content-type
+* in this code-path, too?
+*/
+   d->proto_git = 1;
+   }
+}
+
 static struct discovery *discover_refs(const char *service, int for_push)
 {
-   struct strbuf exp = STRBUF_INIT;
struct strbuf type = STRBUF_INIT;
struct strbuf charset = STRBUF_INIT;
struct strbuf buffer = STRBUF_INIT;
@@ -405,38 +474,8 @@ static struct discovery *discover_refs(const char 
*service, int

Re: [PATCH v3 1/1] protocol: advertise multiple supported versions

2018-11-13 Thread SZEDER Gábor

On Tue, Nov 13, 2018 at 03:03:47PM -0800, Josh Steadmon wrote:
> > > + for (int i = 1; i < nr_allowed_versions; i++)
> > 
> > We don't do C99 yet, thus the declaration of a loop variable like this
> > is not allowed and triggers compiler errors.

> Sorry about that. Will fix in v4. Out of curiousity, do you have a
> config.mak snippet that will make these into errors? I played around
> with adding combinations of -ansi, -std=c89, and -pedantic to CFLAGS,
> but I couldn't get anything that detect the problem without also
> breaking on other parts of the build.

Unfortunately, I don't have such an universal CFLAGS.

With gcc-4.8 the default CFLAGS is sufficient:

  $ make V=1 CC=gcc-4.8 protocol.o
  gcc-4.8 -o protocol.o -c -MF ./.depend/protocol.o.d -MQ protocol.o -MMD -MP  
-g -O2 -Wall -I. -DHAVE_SYSINFO -DGIT_HOST_CPU="\"x86_64\"" -DHAVE_ALLOCA_H 
-DUSE_CURL_FOR_IMAP_SEND -DSHA1_DC -DSHA1DC_NO_STANDARD_INCLUDES 
-DSHA1DC_INIT_SAFE_HASH_DEFAULT=0 -DSHA1DC_CUSTOM_INCLUDE_SHA1_C="\"cache.h\"" 
-DSHA1DC_CUSTOM_INCLUDE_UBC_CHECK_C="\"git-compat-util.h\""  -DHAVE_PATHS_H 
-DHAVE_DEV_TTY -DHAVE_CLOCK_GETTIME -DHAVE_CLOCK_MONOTONIC -DHAVE_GETDELIM 
'-DPROCFS_EXECUTABLE_PATH="/proc/self/exe"'  -DFREAD_READS_DIRECTORIES 
-DNO_STRLCPY -DSHELL_PATH='"/bin/sh"' -DPAGER_ENV='"LESS=FRX LV=-c"'  protocol.c
  protocol.c: In function ‘get_client_protocol_version_advertisement’:
  protocol.c:112:3: error: ‘for’ loop initial declarations are only allowed in 
C99 mode
 for (int i = 1; i < nr_allowed_versions; i++)
 ^
  < ... snip ... >

I couldn't get this error with any newer GCC or Clang, and using
options like -std=c89 trigger many other errors as well, just like you
mentioned.

Re: [PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms

2018-11-13 Thread Ramsay Jones




On 14/11/2018 00:11, Ramsay Jones wrote:
> 
> 
> On 13/11/2018 18:42, Derrick Stolee wrote:
>> On 11/4/2018 6:44 PM, brian m. carlson wrote:
>>> +int hash_algo_by_name(const char *name)
>>> +{
>>> +    int i;
>>> +    if (!name)
>>> +    return GIT_HASH_UNKNOWN;
>>> +    for (i = 1; i < GIT_HASH_NALGOS; i++)
>>> +    if (!strcmp(name, hash_algos[i].name))
>>> +    return i;
>>> +    return GIT_HASH_UNKNOWN;
>>> +}
>>> +
>>
>> Today's test coverage report [1] shows this method is not covered in the 
>> test suite. Looking at 'pu', it doesn't have any callers.
>>
>> Do you have a work in progress series that will use this? Could we add a 
>> test-tool to exercise this somehow?
> 
> There are actually 4 unused external symbols resulting from Brian's
> 'bc/sha-256' branch. The new unused externals in 'pu' looks like:
> 
> $ diff nsc psc
> 37a38,39
> > hex.o   - hash_to_hex
> > hex.o   - hash_to_hex_algop_r
> 48a51
> > parse-options.o - optname
> 71a75
> > sha1-file.o - for_each_file_in_obj_subdir
> 72a77,78
> > sha1-file.o - hash_algo_by_id
> > sha1-file.o - hash_algo_by_name
> $ 
> 
> The symbols from hex.o and sha1-file.o being the 4 symbols from
> this branch.
> 
> I suspect that upcoming patches will make use of them. ;-)

BTW, if you were puzzling over the 3rd symbol from sha1-file.o
(which I wasn't counting in the 4 symbols above! ;-) ), then I
believe that is because Jeff's commit 3a2e08245c ("object-store: 
provide helpers for loose_objects_cache", 2018-11-12) effectively
moved the only call outside of sha1-file.c (in sha1-name.c) back
into sha1-file.c

So, for_each_file_in_obj_subdir() could now be marked 'static'.
(whether it should is a different issue).

ATB,
Ramsay Jones

Re: rebase-in-C stability for 2.20

2018-11-13 Thread Elijah Newren

On Tue, Nov 13, 2018 at 1:52 PM Ævar Arnfjörð Bjarmason
 wrote:
> According to Junio's calendar we're now 2 days from 2.20-rc0. We have
> the js/rebase-autostash-detach-fix bug I reported sitting in "pu" still,
> and then this.
>
> I see we still have rebase.useBuiltin in the code as an escape hatch,
> but it's undocumented.
>
> Given that we're still finding regressions bugs in the rebase-in-C
> version should we be considering reverting 5541bd5b8f ("rebase: default
> to using the builtin rebase", 2018-08-08)?

That date feels slightly misleading; it has a commit date of Oct 11,
and only merged to master less than two weeks ago.  Given how big the
two different rebase in C series were, I'd expect a couple small
things to fall out after it hit master, which is what appears to be
happening.

> I love the feature, but fear that the current list of known regressions
> serve as a canary for a larger list which we'd discover if we held off
> for another major release (and would re-enable rebase.useBuiltin=true in
> master right after 2.20 is out the door).
>
> But maybe I'm being overly paranoid. What do those more familiar with
> this think?

I probably don't qualify as more familar, but giving my $0.02
anyway...  I'm happy that setting rebase.useBuiltin=false by default
exists an escape hatch if things don't settle down as we get closer to
the release, but I'd rather wait until further into the RC cycle
before going that route, personally.

[PATCH v2 10/11] fast-export: add a --show-original-ids option to show original names

2018-11-13 Thread Elijah Newren

Knowing the original names (hashes) of commits can sometimes enable
post-filtering that would otherwise be difficult or impossible.  In
particular, the desire to rewrite commit messages which refer to other
prior commits (on top of whatever other filtering is being done) is
very difficult without knowing the original names of each commit.

In addition, knowing the original names (hashes) of blobs can allow
filtering by blob-id without requiring re-hashing the content of the
blob, and is thus useful as a small optimization.

Once we add original ids for both commits and blobs, we may as well
add them for tags too for completeness.  Perhaps someone will have a
use for them.

This commit teaches a new --show-original-ids option to fast-export
which will make it add a 'original-oid ' line to blob, commits,
and tags.  It also teaches fast-import to parse (and ignore) such
lines.

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-export.txt |  7 +++
 Documentation/git-fast-import.txt | 16 
 builtin/fast-export.c | 20 +++-
 fast-import.c | 12 
 t/t9350-fast-export.sh| 17 +
 5 files changed, 67 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-fast-export.txt 
b/Documentation/git-fast-export.txt
index f65026662a..64c01ba918 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -122,6 +122,13 @@ marks the same across runs.
repository which already contains the necessary parent
commits.
 
+--show-original-ids::
+   Add an extra directive to the output for commits and blobs,
+   `original-oid `.  While such directives will likely be
+   ignored by importers such as git-fast-import, it may be useful
+   for intermediary filters (e.g. for rewriting commit messages
+   which refer to older commits, or for stripping blobs by id).
+
 --refspec::
Apply the specified refspec to each ref exported. Multiple of them can
be specified.
diff --git a/Documentation/git-fast-import.txt 
b/Documentation/git-fast-import.txt
index 7ab97745a6..43ab3b1637 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -385,6 +385,7 @@ change to the project.
 
'commit' SP  LF
mark?
+   original-oid?
('author' (SP )? SP LT  GT SP  LF)?
'committer' (SP )? SP LT  GT SP  LF
data
@@ -741,6 +742,19 @@ New marks are created automatically.  Existing marks can 
be moved
 to another object simply by reusing the same `` in another
 `mark` command.
 
+`original-oid`
+~~
+Provides the name of the object in the original source control system.
+fast-import will simply ignore this directive, but filter processes
+which operate on and modify the stream before feeding to fast-import
+may have uses for this information
+
+
+   'original-oid' SP  LF
+
+
+where `` is any string not containing LF.
+
 `tag`
 ~
 Creates an annotated tag referring to a specific commit.  To create
@@ -749,6 +763,7 @@ lightweight (non-annotated) tags see the `reset` command 
below.
 
'tag' SP  LF
'from' SP  LF
+   original-oid?
'tagger' (SP )? SP LT  GT SP  LF
data
 
@@ -823,6 +838,7 @@ assigned mark.
 
'blob' LF
mark?
+   original-oid?
data
 
 
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 3cc98c31ad..e0f794811e 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -38,6 +38,7 @@ static int use_done_feature;
 static int no_data;
 static int full_tree;
 static int reference_excluded_commits;
+static int show_original_ids;
 static struct string_list extra_refs = STRING_LIST_INIT_NODUP;
 static struct string_list tag_refs = STRING_LIST_INIT_NODUP;
 static struct refspec refspecs = REFSPEC_INIT_FETCH;
@@ -271,7 +272,10 @@ static void export_blob(const struct object_id *oid)
 
mark_next_object(object);
 
-   printf("blob\nmark :%"PRIu32"\ndata %lu\n", last_idnum, size);
+   printf("blob\nmark :%"PRIu32"\n", last_idnum);
+   if (show_original_ids)
+   printf("original-oid %s\n", oid_to_hex(oid));
+   printf("data %lu\n", size);
if (size && fwrite(buf, size, 1, stdout) != 1)
die_errno("could not write blob '%s'", oid_to_hex(oid));
printf("\n");
@@ -634,8 +638,10 @@ static void handle_commit(struct commit *commit, struct 
rev_info *rev,
reencoded = reencode_string(message, "UTF-8", encoding);
if (!commit->parents)
printf("reset %s\n", refname);
-   printf("commit %s\nmark :%"PRIu32"\n%.*s\n%.*s\ndata %u\n%s",
-  refname, last_idnum,
+   printf("commit %s\nmark :%"PRIu32"\n", refname, last_idnum);
+   if (show_original_ids)
+   printf("original-oid %s\n", oid_to_hex(>object.oid));
+   printf("%.*s\n%.*s\ndata %u\n%s",

[PATCH v2 05/11] fast-export: move commit rewriting logic into a function for reuse

2018-11-13 Thread Elijah Newren

Logic to replace a filtered commit with an unfiltered ancestor is useful
elsewhere; put it into a function we can call.

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c | 37 ++---
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index b984a44224..7888fc98b5 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -187,6 +187,22 @@ static int get_object_mark(struct object *object)
return ptr_to_mark(decoration);
 }
 
+static struct commit *rewrite_commit(struct commit *p)
+{
+   for (;;) {
+   if (p->parents && p->parents->next)
+   break;
+   if (p->object.flags & UNINTERESTING)
+   break;
+   if (!(p->object.flags & TREESAME))
+   break;
+   if (!p->parents)
+   return NULL;
+   p = p->parents->item;
+   }
+   return p;
+}
+
 static void show_progress(void)
 {
static int counter = 0;
@@ -766,21 +782,12 @@ static void handle_tag(const char *name, struct tag *tag)
oid_to_hex(>object.oid),
type_name(tagged->type));
}
-   p = (struct commit *)tagged;
-   for (;;) {
-   if (p->parents && p->parents->next)
-   break;
-   if (p->object.flags & UNINTERESTING)
-   break;
-   if (!(p->object.flags & TREESAME))
-   break;
-   if (!p->parents) {
-   printf("reset %s\nfrom %s\n\n",
-  name, sha1_to_hex(null_sha1));
-   free(buf);
-   return;
-   }
-   p = p->parents->item;
+   p = rewrite_commit((struct commit *)tagged);
+   if (!p) {
+   printf("reset %s\nfrom %s\n\n",
+  name, sha1_to_hex(null_sha1));
+   free(buf);
+   return;
}
tagged_mark = get_object_mark(>object);
}
-- 
2.19.1.1063.g2b8e4a4f82.dirty

[PATCH v2 11/11] fast-export: add --always-show-modify-after-rename

2018-11-13 Thread Elijah Newren

I wanted a way to gather all the following information efficiently
(with as few history traversals as possible):
  * Get all blob sizes
  * Map blob shas to filename(s) they appeared under in the history
  * Find when files and directories were deleted (and whether they
were later reinstated, since that means they aren't actually gone)
  * Find sets of filenames referring to the same logical 'file'. (e.g.
foo->bar in commit A and bar->baz in commit B mean that
{foo,bar,baz} refer to the same 'file', so someone wanting to just
"keep baz and its history" need all versions of those three
filenames).  I need to know about things like another foo or bar
being introduced after the rename though, since that breaks the
connection between filenames)
and then I would generate various aggregations on the data and display
some type of report for the user.

The only way I know of to get blob sizes is via
  cat-file --batch-all-objects --batch-check

The rest of the data would traditionally be gathered from a log command,
e.g.

  git log --format='%H%n%P%n%cd' --date=short --topo-order --reverse \
  -M --diff-filter=RAMD --no-abbrev --raw -c

however, parsing log output seems slightly dangerous given that it is a
porcelain command.  While we have specified --format and --raw to try
to avoid the most obvious problems, I'm still slightly concerned about
--date=short, the combinations of --raw and -c, options that might
colorize the output, and also the --diff-filter (there is no current
option named --no-find-copies or --no-break-rewrites, but what if those
turn on by default in the future much as we changed the default with
detecting renames?).  Each of those is a small worry, but they add up.

A command meant for data serialization, such as fast-export, seems like
a better candidate for this job.  There's just one missing item: in
order to connect blob sizes to filenames, I need fast-export to tell me
the blob sha1sum of any file changes.  It does this for modifies, but
not always for renames.  In particular, if a file is a 100% rename, it
only prints
R oldname newname
instead of
R oldname newname
M 100644 $SHA1 newname
as occurs when there is a rename+modify.  Add an option which allows us
to force the latter output even when commits have exact renames of
files.

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-export.txt | 11 ++
 builtin/fast-export.c |  7 +-
 t/t9350-fast-export.sh| 36 +++
 3 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-fast-export.txt 
b/Documentation/git-fast-export.txt
index 64c01ba918..b663b6f8af 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -129,6 +129,17 @@ marks the same across runs.
for intermediary filters (e.g. for rewriting commit messages
which refer to older commits, or for stripping blobs by id).
 
+--always-show-modify-after-rename::
+   When a rename is detected, fast-export normally issues both a
+   'R' (rename) and a 'M' (modify) directive.  However, if the
+   contents of the old and new filename match exactly, it will
+   only issue the rename directive.  Use this flag to have it
+   always issue the modify directive after the rename, which may
+   be useful for tools which are using the fast-export stream as
+   a mechanism for gathering statistics about a repository.  Note
+   that this option only has effect when rename detection is
+   active (see the -M option).
+
 --refspec::
Apply the specified refspec to each ref exported. Multiple of them can
be specified.
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index e0f794811e..31ad43077a 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -38,6 +38,7 @@ static int use_done_feature;
 static int no_data;
 static int full_tree;
 static int reference_excluded_commits;
+static int always_show_modify_after_rename;
 static int show_original_ids;
 static struct string_list extra_refs = STRING_LIST_INIT_NODUP;
 static struct string_list tag_refs = STRING_LIST_INIT_NODUP;
@@ -407,7 +408,8 @@ static void show_filemodify(struct diff_queue_struct *q,
putchar('\n');
 
if (oideq(>oid, >oid) &&
-   ospec->mode == spec->mode)
+   ospec->mode == spec->mode &&
+   !always_show_modify_after_rename)
break;
}
/* fallthrough */
@@ -1105,6 +1107,9 @@ int cmd_fast_export(int argc, const char **argv, const 
char *prefix)
 _excluded_commits, N_("Reference parents 
which are not in fast-export stream by sha1sum")),
OPT_BOOL(0, "show-original-ids", _original_ids,

[PATCH v2 00/11] fast export and import fixes and features

2018-11-13 Thread Elijah Newren

This is a series of small fixes and features for fast-export and
fast-import, mostly on the fast-export side.

Changes since v1 (full range-diff below):
  - used {tilde} in asciidoc documentation to avoid subscripting and
escaping problems
  - renamed ABORT/ERROR enum values to help avoid further misusage
  - multiple small testcase cleanups (use $ZERO_OID, remove grep -A, etc.)
  - add FIXME comment to code about string_list usage
  - record Peff's idea for a future optimization in patch 8 commit message
(is there a better place to put that??)
  - New patch (9/11): remove the unmaintained copy of fast-import stream
format documentation at the beginning of fast-import.c
  - Rewrite commit message for 10/11 to match the wording Peff liked
better, s/originally/original-oid/, and add documentation to
git-fast-import.txt
  - Rewrite commit message for 11/11; the last one didn't make sense to
Peff.  I hope this one does.

Elijah Newren (11):
  git-fast-import.txt: fix documentation for --quiet option
  git-fast-export.txt: clarify misleading documentation about rev-list
args
  fast-export: use value from correct enum
  fast-export: avoid dying when filtering by paths and old tags exist
  fast-export: move commit rewriting logic into a function for reuse
  fast-export: when using paths, avoid corrupt stream with non-existent
mark
  fast-export: ensure we export requested refs
  fast-export: add --reference-excluded-parents option
  fast-import: remove unmaintained duplicate documentation
  fast-export: add a --show-original-ids option to show original names
  fast-export: add --always-show-modify-after-rename

 Documentation/git-fast-export.txt |  34 +-
 Documentation/git-fast-import.txt |  23 +++-
 builtin/fast-export.c | 172 ++
 fast-import.c | 166 +++-
 t/t9350-fast-export.sh| 116 +++-
 5 files changed, 308 insertions(+), 203 deletions(-)

 1:  0744f65b0d =  1:  8870fb1340 git-fast-import.txt: fix documentation for 
--quiet option
 2:  aba1e22fdd !  2:  16d1c3e22d git-fast-export.txt: clarify misleading 
documentation about rev-list args
@@ -13,7 +13,7 @@
current master reference to be exported along with all objects
 -  added since its 10th ancestor commit.
 +  added since its 10th ancestor commit and all files common to
-+  master\~9 and master~10.
++  master{tilde}9 and master{tilde}10.
  
  EXAMPLES
  
 3:  6983e845b2 <  -:  -- fast-export: use value from correct enum
 -:  -- >  3:  e19f6b36f9 fast-export: use value from correct enum
 4:  761ba324d5 !  4:  2b305561d5 fast-export: avoid dying when filtering by 
paths and old tags exist
@@ -49,18 +49,14 @@
 +  (
 +  cd rewrite_tag_predating_pathspecs &&
 +
-+  touch ignored &&
-+  git add ignored &&
 +  test_commit initial &&
 +
 +  git tag -a -m "Some old tag" v0.0.0.0.0.0.1 &&
 +
-+  echo foo >bar &&
-+  git add bar &&
-+  test_commit add-bar &&
++  test_commit bar &&
 +
-+  git fast-export --tag-of-filtered-object=rewrite --all -- bar 
>output &&
-+  grep -A 1 refs/tags/v0.0.0.0.0.0.1 output | grep -E ^from.0{40}
++  git fast-export --tag-of-filtered-object=rewrite --all -- bar.t 
>output &&
++  grep from.$ZERO_OID output
 +  )
 +'
 +
 5:  64e9f0d360 =  5:  607b1dc2b2 fast-export: move commit rewriting logic into 
a function for reuse
 6:  fd14d9749a !  6:  ec1862e858 fast-export: when using paths, avoid corrupt 
stream with non-existent mark
@@ -54,22 +54,18 @@
 +  (
 +  cd avoid_non_existent_mark &&
 +
-+  touch important-path &&
-+  git add important-path &&
-+  test_commit initial &&
++  test_commit important-path &&
 +
-+  touch ignored &&
-+  git add ignored &&
-+  test_commit whatever &&
++  test_commit ignored &&
 +
 +  git branch A &&
 +  git branch B &&
 +
-+  echo foo >>important-path &&
-+  git add important-path &&
++  echo foo >>important-path.t &&
++  git add important-path.t &&
 +  test_commit more changes &&
 +
-+  git fast-export --all -- important-path | git fast-import 
--force
++  git fast-export --all -- important-path.t | git fast-import 
--force
 +  )
 +'
 +
 7:  4e67a2bc7f !  7:  9da26e3ccb fast-export: ensure we export requested refs
@@ -21,9 +21,6 @@
 were just silently omitted from being exported despite having been
 explicitly requested for export.
 
-NOTE: The usage of string_list should really be replaced with the
-strmap proposal, once

[PATCH v2 06/11] fast-export: when using paths, avoid corrupt stream with non-existent mark

2018-11-13 Thread Elijah Newren

If file paths are specified to fast-export and multiple refs point to a
commit that does not touch any of the relevant file paths, then
fast-export can hit problems.  fast-export has a list of additional refs
that it needs to explicitly set after exporting all blobs and commits,
and when it tries to get_object_mark() on the relevant commit, it can
get a mark of 0, i.e. "not found", because the commit in question did
not touch the relevant paths and thus was not exported.  Trying to
import a stream with a mark corresponding to an unexported object will
cause fast-import to crash.

Avoid this problem by taking the commit the ref points to and finding an
ancestor of it that was exported, and make the ref point to that commit
instead.

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c  | 13 -
 t/t9350-fast-export.sh | 20 
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 7888fc98b5..2eafe351ea 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -900,7 +900,18 @@ static void handle_tags_and_duplicates(void)
if (anonymize)
name = anonymize_refname(name);
/* create refs pointing to already seen commits */
-   commit = (struct commit *)object;
+   commit = rewrite_commit((struct commit *)object);
+   if (!commit) {
+   /*
+* Neither this object nor any of its
+* ancestors touch any relevant paths, so
+* it has been filtered to nothing.  Delete
+* it.
+*/
+   printf("reset %s\nfrom %s\n\n",
+  name, sha1_to_hex(null_sha1));
+   continue;
+   }
printf("reset %s\nfrom :%d\n\n", name,
   get_object_mark(>object));
show_progress();
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 3400ebeb51..299120ba70 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -382,6 +382,26 @@ test_expect_success 'path limiting with import-marks does 
not lose unmodified fi
grep file0 actual
 '
 
+test_expect_success 'avoid corrupt stream with non-existent mark' '
+   test_create_repo avoid_non_existent_mark &&
+   (
+   cd avoid_non_existent_mark &&
+
+   test_commit important-path &&
+
+   test_commit ignored &&
+
+   git branch A &&
+   git branch B &&
+
+   echo foo >>important-path.t &&
+   git add important-path.t &&
+   test_commit more changes &&
+
+   git fast-export --all -- important-path.t | git fast-import 
--force
+   )
+'
+
 test_expect_success 'full-tree re-shows unmodified files''
git checkout -f simple &&
git fast-export --full-tree simple >actual &&
-- 
2.19.1.1063.g2b8e4a4f82.dirty

[PATCH v2 03/11] fast-export: use value from correct enum

2018-11-13 Thread Elijah Newren

ABORT and ERROR happen to have the same value, but come from differnt
enums.  Use the one from the correct enum, and while at it, rename the
values to avoid such problems.

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 456797c12a..af724e9937 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -31,8 +31,8 @@ static const char *fast_export_usage[] = {
 };
 
 static int progress;
-static enum { ABORT, VERBATIM, WARN, WARN_STRIP, STRIP } signed_tag_mode = 
ABORT;
-static enum { ERROR, DROP, REWRITE } tag_of_filtered_mode = ERROR;
+static enum { SIGNED_TAG_ABORT, VERBATIM, WARN, WARN_STRIP, STRIP } 
signed_tag_mode = SIGNED_TAG_ABORT;
+static enum { TAG_FILTERING_ABORT, DROP, REWRITE } tag_of_filtered_mode = 
TAG_FILTERING_ABORT;
 static int fake_missing_tagger;
 static int use_done_feature;
 static int no_data;
@@ -46,7 +46,7 @@ static int parse_opt_signed_tag_mode(const struct option *opt,
 const char *arg, int unset)
 {
if (unset || !strcmp(arg, "abort"))
-   signed_tag_mode = ABORT;
+   signed_tag_mode = SIGNED_TAG_ABORT;
else if (!strcmp(arg, "verbatim") || !strcmp(arg, "ignore"))
signed_tag_mode = VERBATIM;
else if (!strcmp(arg, "warn"))
@@ -64,7 +64,7 @@ static int parse_opt_tag_of_filtered_mode(const struct option 
*opt,
  const char *arg, int unset)
 {
if (unset || !strcmp(arg, "abort"))
-   tag_of_filtered_mode = ERROR;
+   tag_of_filtered_mode = TAG_FILTERING_ABORT;
else if (!strcmp(arg, "drop"))
tag_of_filtered_mode = DROP;
else if (!strcmp(arg, "rewrite"))
@@ -727,7 +727,7 @@ static void handle_tag(const char *name, struct tag *tag)
   "\n-BEGIN PGP 
SIGNATURE-\n");
if (signature)
switch(signed_tag_mode) {
-   case ABORT:
+   case SIGNED_TAG_ABORT:
die("encountered signed tag %s; use "
"--signed-tags= to handle it",
oid_to_hex(>object.oid));
@@ -752,7 +752,7 @@ static void handle_tag(const char *name, struct tag *tag)
tagged_mark = get_object_mark(tagged);
if (!tagged_mark) {
switch(tag_of_filtered_mode) {
-   case ABORT:
+   case TAG_FILTERING_ABORT:
die("tag %s tags unexported object; use "
"--tag-of-filtered-object= to handle it",
oid_to_hex(>object.oid));
-- 
2.19.1.1063.g2b8e4a4f82.dirty

[PATCH v2 01/11] git-fast-import.txt: fix documentation for --quiet option

2018-11-13 Thread Elijah Newren

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-import.txt | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-fast-import.txt 
b/Documentation/git-fast-import.txt
index e81117d27f..7ab97745a6 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -40,9 +40,10 @@ OPTIONS
not contain the old commit).
 
 --quiet::
-   Disable all non-fatal output, making fast-import silent when it
-   is successful.  This option disables the output shown by
-   --stats.
+   Disable the output shown by --stats, making fast-import usually
+   be silent when it is successful.  However, if the import stream
+   has directives intended to show user output (e.g. `progress`
+   directives), the corresponding messages will still be shown.
 
 --stats::
Display some basic statistics about the objects fast-import has
-- 
2.19.1.1063.g2b8e4a4f82.dirty

[PATCH v2 04/11] fast-export: avoid dying when filtering by paths and old tags exist

2018-11-13 Thread Elijah Newren

If --tag-of-filtered-object=rewrite is specified along with a set of
paths to limit what is exported, then any tags pointing to old commits
that do not contain any of those specified paths cause problems.  Since
the old tagged commit is not exported, fast-export attempts to rewrite
such tags to an ancestor commit which was exported.  If no such commit
exists, then fast-export currently die()s.  Five years after the tag
rewriting logic was added to fast-export (see commit 2d8ad4691921,
"fast-export: Add a --tag-of-filtered-object  option for newly dangling
tags", 2009-06-25), fast-import gained the ability to delete refs (see
commit 4ee1b225b99f, "fast-import: add support to delete refs",
2014-04-20), so now we do have a valid option to rewrite the tag to.
Delete these tags instead of dying.

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c  |  9 ++---
 t/t9350-fast-export.sh | 16 
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index af724e9937..b984a44224 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -774,9 +774,12 @@ static void handle_tag(const char *name, struct tag *tag)
break;
if (!(p->object.flags & TREESAME))
break;
-   if (!p->parents)
-   die("can't find replacement commit for 
tag %s",
-oid_to_hex(>object.oid));
+   if (!p->parents) {
+   printf("reset %s\nfrom %s\n\n",
+  name, sha1_to_hex(null_sha1));
+   free(buf);
+   return;
+   }
p = p->parents->item;
}
tagged_mark = get_object_mark(>object);
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 6a392e87bc..3400ebeb51 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -325,6 +325,22 @@ test_expect_success 'rewriting tag of filtered out object' 
'
 )
 '
 
+test_expect_success 'rewrite tag predating pathspecs to nothing' '
+   test_create_repo rewrite_tag_predating_pathspecs &&
+   (
+   cd rewrite_tag_predating_pathspecs &&
+
+   test_commit initial &&
+
+   git tag -a -m "Some old tag" v0.0.0.0.0.0.1 &&
+
+   test_commit bar &&
+
+   git fast-export --tag-of-filtered-object=rewrite --all -- bar.t 
>output &&
+   grep from.$ZERO_OID output
+   )
+'
+
 cat > limit-by-paths/expected << EOF
 blob
 mark :1
-- 
2.19.1.1063.g2b8e4a4f82.dirty

[PATCH v2 09/11] fast-import: remove unmaintained duplicate documentation

2018-11-13 Thread Elijah Newren

fast-import.c has started with a comment for nine and a half years
re-directing the reader to Documentation/git-fast-import.txt for
maintained documentation.  Instead of leaving the unmaintained
documentation in place, just excise it.

Signed-off-by: Elijah Newren 
---
 fast-import.c | 154 --
 1 file changed, 154 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 95600c78e0..555d49ad23 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1,157 +1,3 @@
-/*
-(See Documentation/git-fast-import.txt for maintained documentation.)
-Format of STDIN stream:
-
-  stream ::= cmd*;
-
-  cmd ::= new_blob
-| new_commit
-| new_tag
-| reset_branch
-| checkpoint
-| progress
-;
-
-  new_blob ::= 'blob' lf
-mark?
-file_content;
-  file_content ::= data;
-
-  new_commit ::= 'commit' sp ref_str lf
-mark?
-('author' (sp name)? sp '<' email '>' sp when lf)?
-'committer' (sp name)? sp '<' email '>' sp when lf
-commit_msg
-('from' sp commit-ish lf)?
-('merge' sp commit-ish lf)*
-(file_change | ls)*
-lf?;
-  commit_msg ::= data;
-
-  ls ::= 'ls' sp '"' quoted(path) '"' lf;
-
-  file_change ::= file_clr
-| file_del
-| file_rnm
-| file_cpy
-| file_obm
-| file_inm;
-  file_clr ::= 'deleteall' lf;
-  file_del ::= 'D' sp path_str lf;
-  file_rnm ::= 'R' sp path_str sp path_str lf;
-  file_cpy ::= 'C' sp path_str sp path_str lf;
-  file_obm ::= 'M' sp mode sp (hexsha1 | idnum) sp path_str lf;
-  file_inm ::= 'M' sp mode sp 'inline' sp path_str lf
-data;
-  note_obm ::= 'N' sp (hexsha1 | idnum) sp commit-ish lf;
-  note_inm ::= 'N' sp 'inline' sp commit-ish lf
-data;
-
-  new_tag ::= 'tag' sp tag_str lf
-'from' sp commit-ish lf
-('tagger' (sp name)? sp '<' email '>' sp when lf)?
-tag_msg;
-  tag_msg ::= data;
-
-  reset_branch ::= 'reset' sp ref_str lf
-('from' sp commit-ish lf)?
-lf?;
-
-  checkpoint ::= 'checkpoint' lf
-lf?;
-
-  progress ::= 'progress' sp not_lf* lf
-lf?;
-
- # note: the first idnum in a stream should be 1 and subsequent
- # idnums should not have gaps between values as this will cause
- # the stream parser to reserve space for the gapped values.  An
- # idnum can be updated in the future to a new object by issuing
- # a new mark directive with the old idnum.
- #
-  mark ::= 'mark' sp idnum lf;
-  data ::= (delimited_data | exact_data)
-lf?;
-
-# note: delim may be any string but must not contain lf.
-# data_line may contain any data but must not be exactly
-# delim.
-  delimited_data ::= 'data' sp '<<' delim lf
-(data_line lf)*
-delim lf;
-
- # note: declen indicates the length of binary_data in bytes.
- # declen does not include the lf preceding the binary data.
- #
-  exact_data ::= 'data' sp declen lf
-binary_data;
-
- # note: quoted strings are C-style quoting supporting \c for
- # common escapes of 'c' (e..g \n, \t, \\, \") or \nnn where nnn
- # is the signed byte value in octal.  Note that the only
- # characters which must actually be escaped to protect the
- # stream formatting is: \, " and LF.  Otherwise these values
- # are UTF8.
- #
-  commit-ish  ::= (ref_str | hexsha1 | sha1exp_str | idnum);
-  ref_str ::= ref;
-  sha1exp_str ::= sha1exp;
-  tag_str ::= tag;
-  path_str::= path| '"' quoted(path)'"' ;
-  mode::= '100644' | '644'
-| '100755' | '755'
-| '12'
-;
-
-  declen ::= # unsigned 32 bit value, ascii base10 notation;
-  bigint ::= # unsigned integer value, ascii base10 notation;
-  binary_data ::= # file content, not interpreted;
-
-  when ::= raw_when | rfc2822_when;
-  raw_when ::= ts sp tz;
-  rfc2822_when ::= # Valid RFC 2822 date and time;
-
-  sp ::= # ASCII space character;
-  lf ::= # ASCII newline (LF) character;
-
- # note: a colon (':') must precede the numerical value assigned to
- # an idnum.  This is to distinguish it from a ref or tag name as
- # GIT does not permit ':' in ref or tag strings.
- #
-  idnum   ::= ':' bigint;
-  path::= # GIT style file path, e.g. "a/b/c";
-  ref ::= # GIT ref name, e.g. "refs/heads/MOZ_GECKO_EXPERIMENT";
-  tag ::= # GIT tag name, e.g. "FIREFOX_1_5";
-  sha1exp ::= # Any valid GIT SHA1 expression;
-  hexsha1 ::= # SHA1 in hexadecimal format;
-
- # note: name and email are UTF8 strings, however name must not
- # contain '<' or lf and email must not contain any of the
- # following: '<', '>', lf.
- #
-  name  ::= # valid GIT author/committer name;
-  email ::= # valid GIT author/committer email;
-  ts::= # time since the epoch in seconds, ascii base10 notation;
-  tz::= # GIT style timezone;
-
- # note: comments, get-mark, ls-tree, and cat-blob requests may
- # appear anywhere in the input, except within a data

[PATCH v2 08/11] fast-export: add --reference-excluded-parents option

2018-11-13 Thread Elijah Newren

git filter-branch has a nifty feature allowing you to rewrite, e.g. just
the last 8 commits of a linear history
  git filter-branch $OPTIONS HEAD~8..HEAD

If you try the same with git fast-export, you instead get a history of
only 8 commits, with HEAD~7 being rewritten into a root commit.  There
are two alternatives:

  1) Don't use the negative revision specification, and when you're
 filtering the output to make modifications to the last 8 commits,
 just be careful to not modify any earlier commits somehow.

  2) First run 'git fast-export --export-marks=somefile HEAD~8', then
 run 'git fast-export --import-marks=somefile HEAD~8..HEAD'.

Both are more error prone than I'd like (the first for obvious reasons;
with the second option I have sometimes accidentally included too many
revisions in the first command and then found that the corresponding
extra revisions were not exported by the second command and thus were
not modified as I expected).  Also, both are poor from a performance
perspective.

Add a new --reference-excluded-parents option which will cause
fast-export to refer to commits outside the specified rev-list-args
range by their sha1sum.  Such a stream will only be useful in a
repository which already contains the necessary commits (much like the
restriction imposed when using --no-data).

Note from Peff:
  I think we might be able to do a little more optimization here. If
  we're exporting HEAD^..HEAD and there's an object in HEAD^ which is
  unchanged in HEAD, I think we'd still print it (because it would not
  be marked SHOWN), but we could omit it (by walking the tree of the
  boundary commits and marking them shown).  I don't think it's a
  blocker for what you're doing here, but just a possible future
  optimization.

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-export.txt | 17 +++--
 builtin/fast-export.c | 42 +++
 t/t9350-fast-export.sh| 11 
 3 files changed, 58 insertions(+), 12 deletions(-)

diff --git a/Documentation/git-fast-export.txt 
b/Documentation/git-fast-export.txt
index fda55b3284..f65026662a 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -110,6 +110,18 @@ marks the same across runs.
the shape of the history and stored tree.  See the section on
`ANONYMIZING` below.
 
+--reference-excluded-parents::
+   By default, running a command such as `git fast-export
+   master~5..master` will not include the commit master{tilde}5
+   and will make master{tilde}4 no longer have master{tilde}5 as
+   a parent (though both the old master{tilde}4 and new
+   master{tilde}4 will have all the same files).  Use
+   --reference-excluded-parents to instead have the the stream
+   refer to commits in the excluded range of history by their
+   sha1sum.  Note that the resulting stream can only be used by a
+   repository which already contains the necessary parent
+   commits.
+
 --refspec::
Apply the specified refspec to each ref exported. Multiple of them can
be specified.
@@ -119,8 +131,9 @@ marks the same across runs.
'git rev-list', that specifies the specific objects and references
to export.  For example, `master~10..master` causes the
current master reference to be exported along with all objects
-   added since its 10th ancestor commit and all files common to
-   master{tilde}9 and master{tilde}10.
+   added since its 10th ancestor commit and (unless the
+   --reference-excluded-parents option is specified) all files
+   common to master{tilde}9 and master{tilde}10.
 
 EXAMPLES
 
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 2fef00436b..3cc98c31ad 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -37,6 +37,7 @@ static int fake_missing_tagger;
 static int use_done_feature;
 static int no_data;
 static int full_tree;
+static int reference_excluded_commits;
 static struct string_list extra_refs = STRING_LIST_INIT_NODUP;
 static struct string_list tag_refs = STRING_LIST_INIT_NODUP;
 static struct refspec refspecs = REFSPEC_INIT_FETCH;
@@ -596,7 +597,8 @@ static void handle_commit(struct commit *commit, struct 
rev_info *rev,
message += 2;
 
if (commit->parents &&
-   get_object_mark(>parents->item->object) != 0 &&
+   (get_object_mark(>parents->item->object) != 0 ||
+reference_excluded_commits) &&
!full_tree) {
parse_commit_or_die(commit->parents->item);
diff_tree_oid(get_commit_tree_oid(commit->parents->item),
@@ -644,13 +646,21 @@ static void handle_commit(struct commit *commit, struct 
rev_info *rev,
unuse_commit_buffer(commit, commit_buffer);
 
for (i = 0, p = commit->parents; p; p = p->next) {
-   int mark = get_object_mark(>item->object);
-   if (!mark)
+

[PATCH v2 07/11] fast-export: ensure we export requested refs

2018-11-13 Thread Elijah Newren

If file paths are specified to fast-export and a ref points to a commit
that does not touch any of the relevant paths, then that ref would
sometimes fail to be exported.  (This depends on whether any ancestors
of the commit which do touch the relevant paths would be exported with
that same ref name or a different ref name.)  To avoid this problem,
put *all* specified refs into extra_refs to start, and then as we export
each commit, remove the refname used in the 'commit $REFNAME' directive
from extra_refs.  Then, in handle_tags_and_duplicates() we know which
refs actually do need a manual reset directive in order to be included.

This means that we do need some special handling for excluded refs; e.g.
if someone runs
   git fast-export ^master master
then they've asked for master to be exported, but they have also asked
for the commit which master points to and all of its history to be
excluded.  That logically means ref deletion.  Previously, such refs
were just silently omitted from being exported despite having been
explicitly requested for export.

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c  | 54 --
 t/t9350-fast-export.sh | 16 ++---
 2 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 2eafe351ea..2fef00436b 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -38,6 +38,7 @@ static int use_done_feature;
 static int no_data;
 static int full_tree;
 static struct string_list extra_refs = STRING_LIST_INIT_NODUP;
+static struct string_list tag_refs = STRING_LIST_INIT_NODUP;
 static struct refspec refspecs = REFSPEC_INIT_FETCH;
 static int anonymize;
 static struct revision_sources revision_sources;
@@ -611,6 +612,13 @@ static void handle_commit(struct commit *commit, struct 
rev_info *rev,
export_blob(_queued_diff.queue[i]->two->oid);
 
refname = *revision_sources_at(_sources, commit);
+   /*
+* FIXME: string_list_remove() below for each ref is overall
+* O(N^2).  Compared to a history walk and diffing trees, this is
+* just lost in the noise in practice.  However, theoretically a
+* repo may have enough refs for this to become slow.
+*/
+   string_list_remove(_refs, refname, 0);
if (anonymize) {
refname = anonymize_refname(refname);
anonymize_ident_line(, _end);
@@ -814,7 +822,7 @@ static struct commit *get_commit(struct rev_cmdline_entry 
*e, char *full_name)
/* handle nested tags */
while (tag && tag->object.type == OBJ_TAG) {
parse_object(the_repository, >object.oid);
-   string_list_append(_refs, full_name)->util = tag;
+   string_list_append(_refs, full_name)->util = tag;
tag = (struct tag *)tag->tagged;
}
if (!tag)
@@ -873,25 +881,30 @@ static void get_tags_and_duplicates(struct 
rev_cmdline_info *info)
}
 
/*
-* This ref will not be updated through a commit, lets make
-* sure it gets properly updated eventually.
+* Make sure this ref gets properly updated eventually, whether
+* through a commit or manually at the end.
 */
-   if (*revision_sources_at(_sources, commit) ||
-   commit->object.flags & SHOWN)
+   if (e->item->type != OBJ_TAG)
string_list_append(_refs, full_name)->util = 
commit;
+
if (!*revision_sources_at(_sources, commit))
*revision_sources_at(_sources, commit) = 
full_name;
}
+
+   string_list_sort(_refs);
+   string_list_remove_duplicates(_refs, 0);
 }
 
-static void handle_tags_and_duplicates(void)
+static void handle_tags_and_duplicates(struct string_list *extras)
 {
struct commit *commit;
int i;
 
-   for (i = extra_refs.nr - 1; i >= 0; i--) {
-   const char *name = extra_refs.items[i].string;
-   struct object *object = extra_refs.items[i].util;
+   for (i = extras->nr - 1; i >= 0; i--) {
+   const char *name = extras->items[i].string;
+   struct object *object = extras->items[i].util;
+   int mark;
+
switch (object->type) {
case OBJ_TAG:
handle_tag(name, (struct tag *)object);
@@ -912,8 +925,24 @@ static void handle_tags_and_duplicates(void)
   name, sha1_to_hex(null_sha1));
continue;
}
-   printf("reset %s\nfrom :%d\n\n", name,
-  get_object_mark(>object));
+
+   mark = get_object_mark(>object);
+   if (!mark) {
+

[PATCH v2 02/11] git-fast-export.txt: clarify misleading documentation about rev-list args

2018-11-13 Thread Elijah Newren

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-export.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-fast-export.txt 
b/Documentation/git-fast-export.txt
index ce954be532..fda55b3284 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -119,7 +119,8 @@ marks the same across runs.
'git rev-list', that specifies the specific objects and references
to export.  For example, `master~10..master` causes the
current master reference to be exported along with all objects
-   added since its 10th ancestor commit.
+   added since its 10th ancestor commit and all files common to
+   master{tilde}9 and master{tilde}10.
 
 EXAMPLES
 
-- 
2.19.1.1063.g2b8e4a4f82.dirty

[PATCH 21/23] commit: prepare free_commit_buffer and release_commit_memory for any repo

2018-11-13 Thread Stefan Beller

Pass the object pool to free_commit_buffer and release_commit_memory,
such that we can eliminate access to 'the_repository'.

Also remove the TODO in release_commit_memory, as commit->util was
removed in 9d2c97016f (commit.h: delete 'util' field in struct commit,
2018-05-19)

Signed-off-by: Stefan Beller 
---
 builtin/fsck.c | 3 ++-
 builtin/log.c  | 6 --
 builtin/rev-list.c | 3 ++-
 commit.c   | 9 -
 commit.h   | 4 ++--
 object.c   | 2 +-
 6 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 06eb421720..c476ac6983 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -382,7 +382,8 @@ static int fsck_obj(struct object *obj, void *buffer, 
unsigned long size)
if (obj->type == OBJ_TREE)
free_tree_buffer((struct tree *)obj);
if (obj->type == OBJ_COMMIT)
-   free_commit_buffer((struct commit *)obj);
+   free_commit_buffer(the_repository->parsed_objects,
+  (struct commit *)obj);
return err;
 }
 
diff --git a/builtin/log.c b/builtin/log.c
index 061d4fd864..64c2649c7c 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -395,7 +395,8 @@ static int cmd_log_walk(struct rev_info *rev)
 * We may show a given commit multiple times when
 * walking the reflogs.
 */
-   free_commit_buffer(commit);
+   free_commit_buffer(the_repository->parsed_objects,
+  commit);
free_commit_list(commit->parents);
commit->parents = NULL;
}
@@ -1922,7 +1923,8 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
open_next_file(rev.numbered_files ? NULL : commit, NULL, 
, quiet))
die(_("Failed to create output files"));
shown = log_tree_commit(, commit);
-   free_commit_buffer(commit);
+   free_commit_buffer(the_repository->parsed_objects,
+  commit);
 
/* We put one extra blank line between formatted
 * patches and this flag is used by log-tree code
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index cc1b70522f..2b301fa315 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -196,7 +196,8 @@ static void finish_commit(struct commit *commit, void *data)
free_commit_list(commit->parents);
commit->parents = NULL;
}
-   free_commit_buffer(commit);
+   free_commit_buffer(the_repository->parsed_objects,
+  commit);
 }
 
 static inline void finish_object__ma(struct object *obj)
diff --git a/commit.c b/commit.c
index 7d2f3a9a93..4fe74aa4bc 100644
--- a/commit.c
+++ b/commit.c
@@ -328,10 +328,10 @@ void repo_unuse_commit_buffer(struct repository *r,
free((void *)buffer);
 }
 
-void free_commit_buffer(struct commit *commit)
+void free_commit_buffer(struct parsed_object_pool *pool, struct commit *commit)
 {
struct commit_buffer *v = buffer_slab_peek(
-   the_repository->parsed_objects->buffer_slab, commit);
+   pool->buffer_slab, commit);
if (v) {
FREE_AND_NULL(v->buffer);
v->size = 0;
@@ -354,13 +354,12 @@ struct object_id *get_commit_tree_oid(const struct commit 
*commit)
return _commit_tree(commit)->object.oid;
 }
 
-void release_commit_memory(struct commit *c)
+void release_commit_memory(struct parsed_object_pool *pool, struct commit *c)
 {
c->maybe_tree = NULL;
c->index = 0;
-   free_commit_buffer(c);
+   free_commit_buffer(pool, c);
free_commit_list(c->parents);
-   /* TODO: what about commit->util? */
 
c->object.parsed = 0;
 }
diff --git a/commit.h b/commit.h
index 2e6b799b26..d2779a23f6 100644
--- a/commit.h
+++ b/commit.h
@@ -140,7 +140,7 @@ void repo_unuse_commit_buffer(struct repository *r,
 /*
  * Free any cached object buffer associated with the commit.
  */
-void free_commit_buffer(struct commit *);
+void free_commit_buffer(struct parsed_object_pool *pool, struct commit *);
 
 struct tree *get_commit_tree(const struct commit *);
 struct object_id *get_commit_tree_oid(const struct commit *);
@@ -149,7 +149,7 @@ struct object_id *get_commit_tree_oid(const struct commit 
*);
  * Release memory related to a commit, including the parent list and
  * any cached object buffer.
  */
-void release_commit_memory(struct commit *c);
+void release_commit_memory(struct parsed_object_pool *pool, struct commit *c);
 
 /*
  * Disassociate any cached object buffer from the commit, but do not free it.
diff --git a/object.c b/object.c
index 003f870484..c4170d2d0c 100644
--- a/object.c
+++ b/object.c
@@ -540,7 +540,7 @@ void parsed_object_pool_clear(struct parsed_object_pool

[PATCH 18/23] submodule: use submodule repos for object lookup

2018-11-13 Thread Stefan Beller

This converts the 'show_submodule_header' function to use
the repository API properly, such that the submodule objects
are not added to the main object store.

Signed-off-by: Stefan Beller 
---
 submodule.c | 73 ++---
 1 file changed, 58 insertions(+), 15 deletions(-)

diff --git a/submodule.c b/submodule.c
index d9d3046689..262f03f118 100644
--- a/submodule.c
+++ b/submodule.c
@@ -443,7 +443,7 @@ static int prepare_submodule_summary(struct rev_info *rev, 
const char *path,
return prepare_revision_walk(rev);
 }
 
-static void print_submodule_summary(struct rev_info *rev, struct diff_options 
*o)
+static void print_submodule_summary(struct repository *r, struct rev_info 
*rev, struct diff_options *o)
 {
static const char format[] = "  %m %s";
struct strbuf sb = STRBUF_INIT;
@@ -454,7 +454,8 @@ static void print_submodule_summary(struct rev_info *rev, 
struct diff_options *o
ctx.date_mode = rev->date_mode;
ctx.output_encoding = get_log_output_encoding();
strbuf_setlen(, 0);
-   format_commit_message(commit, format, , );
+   repo_format_commit_message(r, commit, format, ,
+ );
strbuf_addch(, '\n');
if (commit->object.flags & SYMMETRIC_LEFT)
diff_emit_submodule_del(o, sb.buf);
@@ -481,14 +482,44 @@ void prepare_submodule_repo_env(struct argv_array *out)
 DEFAULT_GIT_DIR_ENVIRONMENT);
 }
 
-/* Helper function to display the submodule header line prior to the full
- * summary output. If it can locate the submodule objects directory it will
- * attempt to lookup both the left and right commits and put them into the
- * left and right pointers.
+/*
+ * Initialize 'out' based on the provided submodule path.
+ *
+ * Unlike repo_submodule_init, this tolerates submodules not present
+ * in .gitmodules. This function exists only to preserve historical behavior,
+ *
+ * Returns 0 on success, -1 when the submodule is not present.
  */
-static void show_submodule_header(struct diff_options *o, const char *path,
+static struct repository *open_submodule(const char *path)
+{
+   struct strbuf sb = STRBUF_INIT;
+   struct repository *out = xmalloc(sizeof(*out));
+
+   if (submodule_to_gitdir(, path) || repo_init(out, sb.buf, NULL)) {
+   strbuf_release();
+   free(out);
+   return NULL;
+   }
+
+   out->submodule_prefix = xstrdup(path);
+
+   strbuf_release();
+   return out;
+}
+
+/*
+ * Helper function to display the submodule header line prior to the full
+ * summary output.
+ *
+ * If it can locate the submodule git directory it will create a repository
+ * handle for the submodule and lookup both the left and right commits and
+ * put them into the left and right pointers.
+ */
+static void show_submodule_header(struct diff_options *o,
+   const char *path,
struct object_id *one, struct object_id *two,
unsigned dirty_submodule,
+   struct repository *sub,
struct commit **left, struct commit **right,
struct commit_list **merge_bases)
 {
@@ -507,7 +538,7 @@ static void show_submodule_header(struct diff_options *o, 
const char *path,
else if (is_null_oid(two))
message = "(submodule deleted)";
 
-   if (add_submodule_odb(path)) {
+   if (!sub) {
if (!message)
message = "(commits not present)";
goto output_header;
@@ -517,8 +548,8 @@ static void show_submodule_header(struct diff_options *o, 
const char *path,
 * Attempt to lookup the commit references, and determine if this is
 * a fast forward or fast backwards update.
 */
-   *left = lookup_commit_reference(the_repository, one);
-   *right = lookup_commit_reference(the_repository, two);
+   *left = lookup_commit_reference(sub, one);
+   *right = lookup_commit_reference(sub, two);
 
/*
 * Warn about missing commits in the submodule project, but only if
@@ -528,7 +559,7 @@ static void show_submodule_header(struct diff_options *o, 
const char *path,
 (!is_null_oid(two) && !*right))
message = "(commits not present)";
 
-   *merge_bases = get_merge_bases(*left, *right);
+   *merge_bases = repo_get_merge_bases(sub, *left, *right);
if (*merge_bases) {
if ((*merge_bases)->item == *left)
fast_forward = 1;
@@ -562,16 +593,18 @@ void show_submodule_summary(struct diff_options *o, const 
char *path,
struct rev_info rev;
struct commit *left = NULL, *right = NULL;
struct commit_list *merge_bases = NULL;
+   struct repository *sub;
 
+   sub = open_submodule(path);
show_submodule_header(o, path, one, two,

[PATCH 22/23] path.h: make REPO_GIT_PATH_FUNC repository agnostic

2018-11-13 Thread Stefan Beller

git_pathdup uses the_repository internally, but the macro
REPO_GIT_PATH_FUNC is specifically made for arbitrary repositories.
Switch to repo_git_path which works on arbitrary repositories.

Signed-off-by: Stefan Beller 
---
 path.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/path.h b/path.h
index b654ea8ff5..651e6157fc 100644
--- a/path.h
+++ b/path.h
@@ -165,7 +165,7 @@ extern void report_linked_checkout_garbage(void);
const char *git_path_##var(struct repository *r) \
{ \
if (!r->cached_paths.var) \
-   r->cached_paths.var = git_pathdup(filename); \
+   r->cached_paths.var = repo_git_path(r, filename); \
return r->cached_paths.var; \
}
 
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 20/23] commit-graph: convert remaining functions to handle any repo

2018-11-13 Thread Stefan Beller

Convert all functions to handle arbitrary repositories in commit-graph.c
that are used by functions taking a repository argument already.

Notable exclusion is write_commit_graph and its local functions as that
only works on the_repository.

Signed-off-by: Stefan Beller 
---
 commit-graph.c | 40 
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..f78a8e96b5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -292,7 +292,8 @@ static int bsearch_graph(struct commit_graph *g, struct 
object_id *oid, uint32_t
g->chunk_oid_lookup, g->hash_len, pos);
 }
 
-static struct commit_list **insert_parent_or_die(struct commit_graph *g,
+static struct commit_list **insert_parent_or_die(struct repository *r,
+struct commit_graph *g,
 uint64_t pos,
 struct commit_list **pptr)
 {
@@ -303,7 +304,7 @@ static struct commit_list **insert_parent_or_die(struct 
commit_graph *g,
die("invalid parent position %"PRIu64, pos);
 
hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
-   c = lookup_commit(the_repository, );
+   c = lookup_commit(r, );
if (!c)
die(_("could not find commit %s"), oid_to_hex());
c->graph_pos = pos;
@@ -317,7 +318,9 @@ static void fill_commit_graph_info(struct commit *item, 
struct commit_graph *g,
item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 }
 
-static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, 
uint32_t pos)
+static int fill_commit_in_graph(struct repository *r,
+   struct commit *item,
+   struct commit_graph *g, uint32_t pos)
 {
uint32_t edge_value;
uint32_t *parent_data_ptr;
@@ -341,13 +344,13 @@ static int fill_commit_in_graph(struct commit *item, 
struct commit_graph *g, uin
edge_value = get_be32(commit_data + g->hash_len);
if (edge_value == GRAPH_PARENT_NONE)
return 1;
-   pptr = insert_parent_or_die(g, edge_value, pptr);
+   pptr = insert_parent_or_die(r, g, edge_value, pptr);
 
edge_value = get_be32(commit_data + g->hash_len + 4);
if (edge_value == GRAPH_PARENT_NONE)
return 1;
if (!(edge_value & GRAPH_OCTOPUS_EDGES_NEEDED)) {
-   pptr = insert_parent_or_die(g, edge_value, pptr);
+   pptr = insert_parent_or_die(r, g, edge_value, pptr);
return 1;
}
 
@@ -355,7 +358,7 @@ static int fill_commit_in_graph(struct commit *item, struct 
commit_graph *g, uin
  4 * (uint64_t)(edge_value & GRAPH_EDGE_LAST_MASK));
do {
edge_value = get_be32(parent_data_ptr);
-   pptr = insert_parent_or_die(g,
+   pptr = insert_parent_or_die(r, g,
edge_value & GRAPH_EDGE_LAST_MASK,
pptr);
parent_data_ptr++;
@@ -374,7 +377,9 @@ static int find_commit_in_graph(struct commit *item, struct 
commit_graph *g, uin
}
 }
 
-static int parse_commit_in_graph_one(struct commit_graph *g, struct commit 
*item)
+static int parse_commit_in_graph_one(struct repository *r,
+struct commit_graph *g,
+struct commit *item)
 {
uint32_t pos;
 
@@ -382,7 +387,7 @@ static int parse_commit_in_graph_one(struct commit_graph 
*g, struct commit *item
return 1;
 
if (find_commit_in_graph(item, g, ))
-   return fill_commit_in_graph(item, g, pos);
+   return fill_commit_in_graph(r, item, g, pos);
 
return 0;
 }
@@ -391,7 +396,7 @@ int parse_commit_in_graph(struct repository *r, struct 
commit *item)
 {
if (!prepare_commit_graph(r))
return 0;
-   return parse_commit_in_graph_one(r->objects->commit_graph, item);
+   return parse_commit_in_graph_one(r, r->objects->commit_graph, item);
 }
 
 void load_commit_graph_info(struct repository *r, struct commit *item)
@@ -403,19 +408,22 @@ void load_commit_graph_info(struct repository *r, struct 
commit *item)
fill_commit_graph_info(item, r->objects->commit_graph, pos);
 }
 
-static struct tree *load_tree_for_commit(struct commit_graph *g, struct commit 
*c)
+static struct tree *load_tree_for_commit(struct repository *r,
+struct commit_graph *g,
+struct commit *c)
 {
struct object_id oid;
const unsigned char *commit_data = g->chunk_commit_data +
   GRAPH_DATA_WIDTH * (c->graph_pos);
 
hashcpy(oid.hash, commit_data);
-

[PATCH 23/23] t/helper/test-repository: celebrate independence from the_repository

2018-11-13 Thread Stefan Beller

dade47c06c (commit-graph: add repo arg to graph readers, 2018-07-11)
brought more independence from the_repository to the commit graph, however
it was not completely independent of the_repository, as the previous
patches show.

To ensure we're not accessing the_repository by accident, we'd ideally
assign NULL to the_repository to trigger a segfault on access.

We currently have a temporary hack in cache.h, which relies on
the_hash_algo (which is a short form of the_repository->hash_algo) to
be set, so we cannot do that. The next best thing is to set all fields of
the_repository to 0, so any accidental access is more likely to be found.

Signed-off-by: Stefan Beller 
---
 cache.h|  2 ++
 t/helper/test-repository.c | 10 ++
 2 files changed, 12 insertions(+)

diff --git a/cache.h b/cache.h
index 59c8a93046..8864d7ec15 100644
--- a/cache.h
+++ b/cache.h
@@ -1033,6 +1033,8 @@ static inline int hashcmp(const unsigned char *sha1, 
const unsigned char *sha2)
 *
 * This will need to be extended or ripped out when we learn about
 * hashes of different sizes.
+*
+* When ripping this out, see TODO in test-repository.c.
 */
if (the_hash_algo->rawsz != 20)
BUG("hash size not yet supported by hashcmp");
diff --git a/t/helper/test-repository.c b/t/helper/test-repository.c
index 6a84a53efb..f7f8618445 100644
--- a/t/helper/test-repository.c
+++ b/t/helper/test-repository.c
@@ -17,6 +17,11 @@ static void test_parse_commit_in_graph(const char *gitdir, 
const char *worktree,
 
setup_git_env(gitdir);
 
+   memset(the_repository, 0, sizeof(*the_repository));
+
+   /* TODO: Needed for temporary hack in hashcmp, see 183a638b7da. */
+   repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
if (repo_init(, gitdir, worktree))
die("Couldn't init repo");
 
@@ -43,6 +48,11 @@ static void test_get_commit_tree_in_graph(const char *gitdir,
 
setup_git_env(gitdir);
 
+   memset(the_repository, 0, sizeof(*the_repository));
+
+   /* TODO: Needed for temporary hack in hashcmp, see 183a638b7da. */
+   repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
if (repo_init(, gitdir, worktree))
die("Couldn't init repo");
 
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 02/23] packfile: allow has_packed_and_bad to handle arbitrary repositories

2018-11-13 Thread Stefan Beller

has_packed_and_bad is not widely used, so just migrate it all at once.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 packfile.c  | 5 +++--
 packfile.h  | 2 +-
 sha1-file.c | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/packfile.c b/packfile.c
index 841b36182f..bc2e0f5043 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1131,12 +1131,13 @@ void mark_bad_packed_object(struct packed_git *p, const 
unsigned char *sha1)
p->num_bad_objects++;
 }
 
-const struct packed_git *has_packed_and_bad(const unsigned char *sha1)
+const struct packed_git *has_packed_and_bad(struct repository *r,
+   const unsigned char *sha1)
 {
struct packed_git *p;
unsigned i;
 
-   for (p = the_repository->objects->packed_git; p; p = p->next)
+   for (p = r->objects->packed_git; p; p = p->next)
for (i = 0; i < p->num_bad_objects; i++)
if (hasheq(sha1,
   p->bad_object_sha1 + the_hash_algo->rawsz * 
i))
diff --git a/packfile.h b/packfile.h
index 442625723d..7a62d72231 100644
--- a/packfile.h
+++ b/packfile.h
@@ -146,7 +146,7 @@ extern int packed_object_info(struct repository *r,
  off_t offset, struct object_info *);
 
 extern void mark_bad_packed_object(struct packed_git *p, const unsigned char 
*sha1);
-extern const struct packed_git *has_packed_and_bad(const unsigned char *sha1);
+extern const struct packed_git *has_packed_and_bad(struct repository *r, const 
unsigned char *sha1);
 
 /*
  * Iff a pack file in the given repository contains the object named by sha1,
diff --git a/sha1-file.c b/sha1-file.c
index b8ce21cbaf..856e000ee1 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1432,7 +1432,7 @@ void *read_object_file_extended(const struct object_id 
*oid,
die(_("loose object %s (stored in %s) is corrupt"),
oid_to_hex(repl), path);
 
-   if ((p = has_packed_and_bad(repl->hash)) != NULL)
+   if ((p = has_packed_and_bad(the_repository, repl->hash)) != NULL)
die(_("packed object %s (stored in %s) is corrupt"),
oid_to_hex(repl), p->pack_name);
 
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 17/23] pretty: prepare format_commit_message to handle arbitrary repositories

2018-11-13 Thread Stefan Beller

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 contrib/coccinelle/the_repository.pending.cocci | 10 ++
 pretty.c| 15 ---
 pretty.h|  7 ++-
 3 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index f5b42cfc62..2ee702ecf7 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -132,3 +132,13 @@ expression G;
 - logmsg_reencode(
 + repo_logmsg_reencode(the_repository,
   E, F, G);
+
+@@
+expression E;
+expression F;
+expression G;
+expression H;
+@@
+- format_commit_message(
++ repo_format_commit_message(the_repository,
+  E, F, G, H);
diff --git a/pretty.c b/pretty.c
index b359b68750..3240495308 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1508,9 +1508,10 @@ void userformat_find_requirements(const char *fmt, 
struct userformat_want *w)
strbuf_release();
 }
 
-void format_commit_message(const struct commit *commit,
-  const char *format, struct strbuf *sb,
-  const struct pretty_print_context *pretty_ctx)
+void repo_format_commit_message(struct repository *r,
+   const struct commit *commit,
+   const char *format, struct strbuf *sb,
+   const struct pretty_print_context *pretty_ctx)
 {
struct format_commit_context context;
const char *output_enc = pretty_ctx->output_encoding;
@@ -1524,9 +1525,9 @@ void format_commit_message(const struct commit *commit,
 * convert a commit message to UTF-8 first
 * as far as 'format_commit_item' assumes it in UTF-8
 */
-   context.message = logmsg_reencode(commit,
- _encoding,
- utf8);
+   context.message = repo_logmsg_reencode(r, commit,
+  _encoding,
+  utf8);
 
strbuf_expand(sb, format, format_commit_item, );
rewrap_message_tail(sb, , 0, 0, 0);
@@ -1550,7 +1551,7 @@ void format_commit_message(const struct commit *commit,
}
 
free(context.commit_encoding);
-   unuse_commit_buffer(commit, context.message);
+   repo_unuse_commit_buffer(r, commit, context.message);
 }
 
 static void pp_header(struct pretty_print_context *pp,
diff --git a/pretty.h b/pretty.h
index 7359d318a9..e6625269cf 100644
--- a/pretty.h
+++ b/pretty.h
@@ -103,9 +103,14 @@ void pp_remainder(struct pretty_print_context *pp, const 
char **msg_p,
  * Put the result to "sb".
  * Please use this function for custom formats.
  */
-void format_commit_message(const struct commit *commit,
+void repo_format_commit_message(struct repository *r,
+   const struct commit *commit,
const char *format, struct strbuf *sb,
const struct pretty_print_context *context);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define format_commit_message(c, f, s, con) \
+   repo_format_commit_message(the_repository, c, f, s, con)
+#endif
 
 /*
  * Parse given arguments from "arg", check it for correctness and
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 16/23] commit: prepare logmsg_reencode to handle arbitrary repositories

2018-11-13 Thread Stefan Beller

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 commit.h|  8 
 contrib/coccinelle/the_repository.pending.cocci |  9 +
 pretty.c| 13 +++--
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/commit.h b/commit.h
index 57375e3239..2e6b799b26 100644
--- a/commit.h
+++ b/commit.h
@@ -180,6 +180,14 @@ extern int has_non_ascii(const char *text);
 extern const char *logmsg_reencode(const struct commit *commit,
   char **commit_encoding,
   const char *output_encoding);
+const char *repo_logmsg_reencode(struct repository *r,
+const struct commit *commit,
+char **commit_encoding,
+const char *output_encoding);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define logmsg_reencode(c, enc, out) repo_logmsg_reencode(the_repository, c, 
enc, out)
+#endif
+
 extern const char *skip_blank_lines(const char *msg);
 
 /** Removes the first commit from a list sorted by date, and adds all
diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index 516f19ffee..f5b42cfc62 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -123,3 +123,12 @@ expression F;
 - unuse_commit_buffer(
 + repo_unuse_commit_buffer(the_repository,
   E, F);
+
+@@
+expression E;
+expression F;
+expression G;
+@@
+- logmsg_reencode(
++ repo_logmsg_reencode(the_repository,
+  E, F, G);
diff --git a/pretty.c b/pretty.c
index 8ca29e9281..b359b68750 100644
--- a/pretty.c
+++ b/pretty.c
@@ -595,14 +595,15 @@ static char *replace_encoding_header(char *buf, const 
char *encoding)
return strbuf_detach(, NULL);
 }
 
-const char *logmsg_reencode(const struct commit *commit,
-   char **commit_encoding,
-   const char *output_encoding)
+const char *repo_logmsg_reencode(struct repository *r,
+const struct commit *commit,
+char **commit_encoding,
+const char *output_encoding)
 {
static const char *utf8 = "UTF-8";
const char *use_encoding;
char *encoding;
-   const char *msg = get_commit_buffer(commit, NULL);
+   const char *msg = repo_get_commit_buffer(r, commit, NULL);
char *out;
 
if (!output_encoding || !*output_encoding) {
@@ -630,7 +631,7 @@ const char *logmsg_reencode(const struct commit *commit,
 * the cached copy from get_commit_buffer, we need to duplicate 
it
 * to avoid munging the cached copy.
 */
-   if (msg == get_cached_commit_buffer(the_repository, commit, 
NULL))
+   if (msg == get_cached_commit_buffer(r, commit, NULL))
out = xstrdup(msg);
else
out = (char *)msg;
@@ -644,7 +645,7 @@ const char *logmsg_reencode(const struct commit *commit,
 */
out = reencode_string(msg, output_encoding, use_encoding);
if (out)
-   unuse_commit_buffer(commit, msg);
+   repo_unuse_commit_buffer(r, commit, msg);
}
 
/*
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 14/23] commit: prepare get_commit_buffer to handle any repo

2018-11-13 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 commit.c| 8 +---
 commit.h| 7 ++-
 contrib/coccinelle/the_repository.pending.cocci | 8 
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/commit.c b/commit.c
index 7a931d7fd4..4034def16c 100644
--- a/commit.c
+++ b/commit.c
@@ -297,13 +297,15 @@ const void *get_cached_commit_buffer(struct repository 
*r, const struct commit *
return v->buffer;
 }
 
-const void *get_commit_buffer(const struct commit *commit, unsigned long 
*sizep)
+const void *repo_get_commit_buffer(struct repository *r,
+  const struct commit *commit,
+  unsigned long *sizep)
 {
-   const void *ret = get_cached_commit_buffer(the_repository, commit, 
sizep);
+   const void *ret = get_cached_commit_buffer(r, commit, sizep);
if (!ret) {
enum object_type type;
unsigned long size;
-   ret = read_object_file(>object.oid, , );
+   ret = repo_read_object_file(r, >object.oid, , 
);
if (!ret)
die("cannot read commit object %s",
oid_to_hex(>object.oid));
diff --git a/commit.h b/commit.h
index 08935f9a19..591a77a5bb 100644
--- a/commit.h
+++ b/commit.h
@@ -117,7 +117,12 @@ const void *get_cached_commit_buffer(struct repository *, 
const struct commit *,
  * from disk. The resulting memory should not be modified, and must be given
  * to unuse_commit_buffer when the caller is done.
  */
-const void *get_commit_buffer(const struct commit *, unsigned long *size);
+const void *repo_get_commit_buffer(struct repository *r,
+  const struct commit *,
+  unsigned long *size);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define get_commit_buffer(c, s) repo_get_commit_buffer(the_repository, c, s)
+#endif
 
 /*
  * Tell the commit subsytem that we are done with a particular commit buffer.
diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index 8c6a71bf64..4018e6eaf7 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -107,3 +107,11 @@ expression G;
 - in_merge_bases_many(
 + repo_in_merge_bases_many(the_repository,
   E, F, G);
+
+@@
+expression E;
+expression F;
+@@
+- get_commit_buffer(
++ repo_get_commit_buffer(the_repository,
+  E, F);
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 07/23] commit: allow parse_commit* to handle any repo

2018-11-13 Thread Stefan Beller

Just like the previous commit, parse_commit and friends are used a lot
and are found in new patches, so we cannot change their signature easily.

Re-introduce these function prefixed with 'repo_' that take a repository
argument and keep the original as a shallow macro.

Signed-off-by: Stefan Beller 
---
 commit.c  | 18 --
 commit.h  | 17 +
 .../coccinelle/the_repository.pending.cocci   | 24 +++
 3 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/commit.c b/commit.c
index dc8a39d52a..7a931d7fd4 100644
--- a/commit.c
+++ b/commit.c
@@ -443,7 +443,10 @@ int parse_commit_buffer(struct repository *r, struct 
commit *item, const void *b
return 0;
 }
 
-int parse_commit_internal(struct commit *item, int quiet_on_missing, int 
use_commit_graph)
+int repo_parse_commit_internal(struct repository *r,
+  struct commit *item,
+  int quiet_on_missing,
+  int use_commit_graph)
 {
enum object_type type;
void *buffer;
@@ -454,9 +457,9 @@ int parse_commit_internal(struct commit *item, int 
quiet_on_missing, int use_com
return -1;
if (item->object.parsed)
return 0;
-   if (use_commit_graph && parse_commit_in_graph(the_repository, item))
+   if (use_commit_graph && parse_commit_in_graph(r, item))
return 0;
-   buffer = read_object_file(>object.oid, , );
+   buffer = repo_read_object_file(r, >object.oid, , );
if (!buffer)
return quiet_on_missing ? -1 :
error("Could not read %s",
@@ -467,18 +470,19 @@ int parse_commit_internal(struct commit *item, int 
quiet_on_missing, int use_com
 oid_to_hex(>object.oid));
}
 
-   ret = parse_commit_buffer(the_repository, item, buffer, size, 0);
+   ret = parse_commit_buffer(r, item, buffer, size, 0);
if (save_commit_buffer && !ret) {
-   set_commit_buffer(the_repository, item, buffer, size);
+   set_commit_buffer(r, item, buffer, size);
return 0;
}
free(buffer);
return ret;
 }
 
-int parse_commit_gently(struct commit *item, int quiet_on_missing)
+int repo_parse_commit_gently(struct repository *r,
+struct commit *item, int quiet_on_missing)
 {
-   return parse_commit_internal(item, quiet_on_missing, 1);
+   return repo_parse_commit_internal(r, item, quiet_on_missing, 1);
 }
 
 void parse_commit_or_die(struct commit *item)
diff --git a/commit.h b/commit.h
index 1d260d62f5..08935f9a19 100644
--- a/commit.h
+++ b/commit.h
@@ -79,12 +79,21 @@ struct commit *lookup_commit_reference_by_name(const char 
*name);
 struct commit *lookup_commit_or_die(const struct object_id *oid, const char 
*ref_name);
 
 int parse_commit_buffer(struct repository *r, struct commit *item, const void 
*buffer, unsigned long size, int check_graph);
-int parse_commit_internal(struct commit *item, int quiet_on_missing, int 
use_commit_graph);
-int parse_commit_gently(struct commit *item, int quiet_on_missing);
-static inline int parse_commit(struct commit *item)
+int repo_parse_commit_internal(struct repository *r, struct commit *item,
+  int quiet_on_missing, int use_commit_graph);
+int repo_parse_commit_gently(struct repository *r,
+struct commit *item,
+int quiet_on_missing);
+static inline int repo_parse_commit(struct repository *r, struct commit *item)
 {
-   return parse_commit_gently(item, 0);
+   return repo_parse_commit_gently(r, item, 0);
 }
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define parse_commit_internal(item, quiet, use) 
repo_parse_commit_internal(the_repository, item, quiet, use)
+#define parse_commit_gently(item, quiet) 
repo_parse_commit_gently(the_repository, item, quiet)
+#define parse_commit(item) repo_parse_commit(the_repository, item)
+#endif
+
 void parse_commit_or_die(struct commit *item);
 
 struct buffer_slab;
diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index 46f3a1b23a..b185fe0a1d 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -40,3 +40,27 @@ expression F;
 - has_object_file_with_flags(
 + repo_has_object_file_with_flags(the_repository,
   E)
+
+@@
+expression E;
+expression F;
+expression G;
+@@
+- parse_commit_internal(
++ repo_parse_commit_internal(the_repository,
+  E, F, G)
+
+@@
+expression E;
+expression F;
+@@
+- parse_commit_gently(
++ repo_parse_commit_gently(the_repository,
+  E, F)
+
+@@
+expression E;
+@@
+- parse_commit(
++ repo_parse_commit(the_repository,
+  E)
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 19/23] submodule: don't add submodule as odb for push

2018-11-13 Thread Stefan Beller

In push_submodule(), because we do not actually need access to objects
in the submodule, do not invoke add_submodule_odb().
(for_each_remote_ref_submodule() does not require access to those
objects, and the actual push is done by spawning another process,
which handles object access by itself.)

This code of push_submodule() is exercised in t5531 and continues
to work, showing that the submodule odbc is not needed.

Signed-off-by: Stefan Beller 
---
 submodule.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/submodule.c b/submodule.c
index 262f03f118..5818088df2 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1021,9 +1021,6 @@ static int push_submodule(const char *path,
  const struct string_list *push_options,
  int dry_run)
 {
-   if (add_submodule_odb(path))
-   return 1;
-
if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
struct child_process cp = CHILD_PROCESS_INIT;
argv_array_push(, "push");
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 08/23] commit-reach.c: allow paint_down_to_common to handle any repo

2018-11-13 Thread Stefan Beller

As the function is file local and not widely used, migrate it all at once.

Signed-off-by: Stefan Beller 
---
 commit-reach.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 9f79ce0a22..67c2e43d5e 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -30,7 +30,8 @@ static int queue_has_nonstale(struct prio_queue *queue)
 }
 
 /* all input commits in one and twos[] must have been parsed! */
-static struct commit_list *paint_down_to_common(struct commit *one, int n,
+static struct commit_list *paint_down_to_common(struct repository *r,
+   struct commit *one, int n,
struct commit **twos,
int min_generation)
 {
@@ -83,7 +84,7 @@ static struct commit_list *paint_down_to_common(struct commit 
*one, int n,
parents = parents->next;
if ((p->object.flags & flags) == flags)
continue;
-   if (parse_commit(p))
+   if (repo_parse_commit(r, p))
return NULL;
p->object.flags |= flags;
prio_queue_put(, p);
@@ -116,7 +117,7 @@ static struct commit_list *merge_bases_many(struct commit 
*one, int n, struct co
return NULL;
}
 
-   list = paint_down_to_common(one, n, twos, 0);
+   list = paint_down_to_common(the_repository, one, n, twos, 0);
 
while (list) {
struct commit *commit = pop_commit();
@@ -187,8 +188,8 @@ static int remove_redundant(struct commit **array, int cnt)
if (array[j]->generation < min_generation)
min_generation = array[j]->generation;
}
-   common = paint_down_to_common(array[i], filled, work,
- min_generation);
+   common = paint_down_to_common(the_repository, array[i], filled,
+ work, min_generation);
if (array[i]->object.flags & PARENT2)
redundant[i] = 1;
for (j = 0; j < filled; j++)
@@ -322,7 +323,9 @@ int in_merge_bases_many(struct commit *commit, int 
nr_reference, struct commit *
if (commit->generation > min_generation)
return ret;
 
-   bases = paint_down_to_common(commit, nr_reference, reference, 
commit->generation);
+   bases = paint_down_to_common(the_repository, commit,
+nr_reference, reference,
+commit->generation);
if (commit->object.flags & PARENT2)
ret = 1;
clear_commit_marks(commit, all_flags);
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 05/23] object-store: prepare has_{sha1, object}_file to handle any repo

2018-11-13 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 .../coccinelle/the_repository.pending.cocci   | 30 +++
 object-store.h| 22 ++
 sha1-file.c   | 15 ++
 3 files changed, 56 insertions(+), 11 deletions(-)

diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index a7ac9e0c46..46f3a1b23a 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -10,3 +10,33 @@ expression G;
 - read_object_file(
 + repo_read_object_file(the_repository,
   E, F, G)
+
+@@
+expression E;
+@@
+- has_sha1_file(
++ repo_has_sha1_file(the_repository,
+  E)
+
+@@
+expression E;
+expression F;
+@@
+- has_sha1_file_with_flags(
++ repo_has_sha1_file_with_flags(the_repository,
+  E)
+
+@@
+expression E;
+@@
+- has_object_file(
++ repo_has_object_file(the_repository,
+  E)
+
+@@
+expression E;
+expression F;
+@@
+- has_object_file_with_flags(
++ repo_has_object_file_with_flags(the_repository,
+  E)
diff --git a/object-store.h b/object-store.h
index 00a64622e6..2b5e6ff1ed 100644
--- a/object-store.h
+++ b/object-store.h
@@ -212,15 +212,27 @@ int read_loose_object(const char *path,
  * object_info. OBJECT_INFO_SKIP_CACHED is automatically set; pass
  * nonzero flags to also set other flags.
  */
-extern int has_sha1_file_with_flags(const unsigned char *sha1, int flags);
-static inline int has_sha1_file(const unsigned char *sha1)
+int repo_has_sha1_file_with_flags(struct repository *r,
+ const unsigned char *sha1, int flags);
+static inline int repo_has_sha1_file(struct repository *r,
+const unsigned char *sha1)
 {
-   return has_sha1_file_with_flags(sha1, 0);
+   return repo_has_sha1_file_with_flags(r, sha1, 0);
 }
 
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define has_sha1_file_with_flags(sha1, flags) 
repo_has_sha1_file_with_flags(the_repository, sha1, flags)
+#define has_sha1_file(sha1) repo_has_sha1_file(the_repository, sha1)
+#endif
+
 /* Same as the above, except for struct object_id. */
-extern int has_object_file(const struct object_id *oid);
-extern int has_object_file_with_flags(const struct object_id *oid, int flags);
+int repo_has_object_file(struct repository *r, const struct object_id *oid);
+int repo_has_object_file_with_flags(struct repository *r,
+   const struct object_id *oid, int flags);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define has_object_file(oid) repo_has_object_file(the_repository, oid)
+#define has_object_file_with_flags(oid, flags) 
repo_has_object_file_with_flags(the_repository, oid, flags)
+#endif
 
 /*
  * Return true iff an alternate object database has a loose object
diff --git a/sha1-file.c b/sha1-file.c
index c5b704aec5..e77273ccfd 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1768,24 +1768,27 @@ int force_object_loose(const struct object_id *oid, 
time_t mtime)
return ret;
 }
 
-int has_sha1_file_with_flags(const unsigned char *sha1, int flags)
+int repo_has_sha1_file_with_flags(struct repository *r,
+ const unsigned char *sha1, int flags)
 {
struct object_id oid;
if (!startup_info->have_repository)
return 0;
hashcpy(oid.hash, sha1);
-   return oid_object_info_extended(the_repository, , NULL,
+   return oid_object_info_extended(r, , NULL,
flags | OBJECT_INFO_SKIP_CACHED) >= 0;
 }
 
-int has_object_file(const struct object_id *oid)
+int repo_has_object_file(struct repository *r,
+const struct object_id *oid)
 {
-   return has_sha1_file(oid->hash);
+   return repo_has_sha1_file(r, oid->hash);
 }
 
-int has_object_file_with_flags(const struct object_id *oid, int flags)
+int repo_has_object_file_with_flags(struct repository *r,
+   const struct object_id *oid, int flags)
 {
-   return has_sha1_file_with_flags(oid->hash, flags);
+   return repo_has_sha1_file_with_flags(r, oid->hash, flags);
 }
 
 static void check_tree(const void *buf, size_t size)
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 15/23] commit: prepare repo_unuse_commit_buffer to handle any repo

2018-11-13 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 commit.c| 6 --
 commit.h| 7 ++-
 contrib/coccinelle/the_repository.pending.cocci | 8 
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/commit.c b/commit.c
index 4034def16c..7d2f3a9a93 100644
--- a/commit.c
+++ b/commit.c
@@ -318,10 +318,12 @@ const void *repo_get_commit_buffer(struct repository *r,
return ret;
 }
 
-void unuse_commit_buffer(const struct commit *commit, const void *buffer)
+void repo_unuse_commit_buffer(struct repository *r,
+ const struct commit *commit,
+ const void *buffer)
 {
struct commit_buffer *v = buffer_slab_peek(
-   the_repository->parsed_objects->buffer_slab, commit);
+   r->parsed_objects->buffer_slab, commit);
if (!(v && v->buffer == buffer))
free((void *)buffer);
 }
diff --git a/commit.h b/commit.h
index 591a77a5bb..57375e3239 100644
--- a/commit.h
+++ b/commit.h
@@ -130,7 +130,12 @@ const void *repo_get_commit_buffer(struct repository *r,
  * from an earlier call to get_commit_buffer.  The buffer may or may not be
  * freed by this call; callers should not access the memory afterwards.
  */
-void unuse_commit_buffer(const struct commit *, const void *buffer);
+void repo_unuse_commit_buffer(struct repository *r,
+ const struct commit *,
+ const void *buffer);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define unuse_commit_buffer(c, b) repo_unuse_commit_buffer(the_repository, c, 
b)
+#endif
 
 /*
  * Free any cached object buffer associated with the commit.
diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index 4018e6eaf7..516f19ffee 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -115,3 +115,11 @@ expression F;
 - get_commit_buffer(
 + repo_get_commit_buffer(the_repository,
   E, F);
+
+@@
+expression E;
+expression F;
+@@
+- unuse_commit_buffer(
++ repo_unuse_commit_buffer(the_repository,
+  E, F);
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 06/23] object: parse_object to honor its repository argument

2018-11-13 Thread Stefan Beller

In 8e4b0b6047 (object.c: allow parse_object to handle
arbitrary repositories, 2018-06-28), we forgot to pass the
repository down to the read_object_file.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 object.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/object.c b/object.c
index e54160550c..003f870484 100644
--- a/object.c
+++ b/object.c
@@ -259,8 +259,8 @@ struct object *parse_object(struct repository *r, const 
struct object_id *oid)
if (obj && obj->parsed)
return obj;
 
-   if ((obj && obj->type == OBJ_BLOB && has_object_file(oid)) ||
-   (!obj && has_object_file(oid) &&
+   if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
+   (!obj && repo_has_object_file(r, oid) &&
 oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
if (check_object_signature(repl, NULL, 0, NULL) < 0) {
error(_("sha1 mismatch %s"), oid_to_hex(oid));
@@ -270,7 +270,7 @@ struct object *parse_object(struct repository *r, const 
struct object_id *oid)
return lookup_object(r, oid->hash);
}
 
-   buffer = read_object_file(oid, , );
+   buffer = repo_read_object_file(r, oid, , );
if (buffer) {
if (check_object_signature(repl, buffer, size, type_name(type)) 
< 0) {
free(buffer);
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 12/23] commit-reach: prepare get_merge_bases to handle any repo

2018-11-13 Thread Stefan Beller

Similarly to previous patches, the get_merge_base functions are used
often in the code base, which makes migrating them hard.

Implement the new functions, prefixed with 'repo_' and hide the old
functions behind a wrapper macro.

Signed-off-by: Stefan Beller 
---
 commit-reach.c| 24 ++---
 commit-reach.h| 26 ---
 .../coccinelle/the_repository.pending.cocci   | 26 +++
 3 files changed, 56 insertions(+), 20 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index b3b1f62aba..657a4e9b5a 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -258,23 +258,27 @@ static struct commit_list *get_merge_bases_many_0(struct 
repository *r,
return result;
 }
 
-struct commit_list *get_merge_bases_many(struct commit *one,
-int n,
-struct commit **twos)
+struct commit_list *repo_get_merge_bases_many(struct repository *r,
+ struct commit *one,
+ int n,
+ struct commit **twos)
 {
-   return get_merge_bases_many_0(the_repository, one, n, twos, 1);
+   return get_merge_bases_many_0(r, one, n, twos, 1);
 }
 
-struct commit_list *get_merge_bases_many_dirty(struct commit *one,
-  int n,
-  struct commit **twos)
+struct commit_list *repo_get_merge_bases_many_dirty(struct repository *r,
+   struct commit *one,
+   int n,
+   struct commit **twos)
 {
-   return get_merge_bases_many_0(the_repository, one, n, twos, 0);
+   return get_merge_bases_many_0(r, one, n, twos, 0);
 }
 
-struct commit_list *get_merge_bases(struct commit *one, struct commit *two)
+struct commit_list *repo_get_merge_bases(struct repository *r,
+struct commit *one,
+struct commit *two)
 {
-   return get_merge_bases_many_0(the_repository, one, 1, , 1);
+   return get_merge_bases_many_0(r, one, 1, , 1);
 }
 
 /*
diff --git a/commit-reach.h b/commit-reach.h
index 7d313e2975..52667d64ac 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -8,17 +8,23 @@ struct commit_list;
 struct contains_cache;
 struct ref_filter;
 
-struct commit_list *get_merge_bases_many(struct commit *one,
-int n,
-struct commit **twos);
-struct commit_list *get_merge_bases_many_dirty(struct commit *one,
-  int n,
-  struct commit **twos);
-struct commit_list *get_merge_bases(struct commit *one, struct commit *two);
-struct commit_list *get_octopus_merge_bases(struct commit_list *in);
-
+struct commit_list *repo_get_merge_bases(struct repository *r,
+struct commit *rev1,
+struct commit *rev2);
+struct commit_list *repo_get_merge_bases_many(struct repository *r,
+ struct commit *one, int n,
+ struct commit **twos);
 /* To be used only when object flags after this call no longer matter */
-struct commit_list *get_merge_bases_many_dirty(struct commit *one, int n, 
struct commit **twos);
+struct commit_list *repo_get_merge_bases_many_dirty(struct repository *r,
+   struct commit *one, int n,
+   struct commit **twos);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define get_merge_bases(r1, r2)   repo_get_merge_bases(the_repository, 
r1, r2)
+#define get_merge_bases_many(one, n, two) 
repo_get_merge_bases_many(the_repository, one, n, two)
+#define get_merge_bases_many_dirty(one, n, twos) 
repo_get_merge_bases_many_dirty(the_repository, one, n, twos)
+#endif
+
+struct commit_list *get_octopus_merge_bases(struct commit_list *in);
 
 int is_descendant_of(struct commit *commit, struct commit_list *with_commit);
 int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit 
**reference);
diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index b185fe0a1d..f6c2915a4e 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -64,3 +64,29 @@ expression E;
 - parse_commit(
 + repo_parse_commit(the_repository,
   E)
+
+@@
+expression E;
+expression F;
+@@
+- get_merge_bases(
++ repo_get_merge_bases(the_repository,
+  E, F);
+
+@@
+expression E;
+expression F;
+expression G;
+@@

[PATCH 04/23] object-store: prepare read_object_file to deal with any repo

2018-11-13 Thread Stefan Beller

As read_object_file is a widely used function (which is also regularly used
in new code in flight between master..pu), changing its signature is painful
is hard, as other series in flight rely on the original signature. It would
burden the maintainer if we'd just change the signature.

Introduce repo_read_object_file which takes the repository argument, and
hide the original read_object_file as a macro behind
NO_THE_REPOSITORY_COMPATIBILITY_MACROS, similar to
e675765235 (diff.c: remove implicit dependency on the_index, 2018-09-21)

Add a coccinelle patch to convert existing callers, but do not apply
the resulting patch to keep the diff of this patch small.

Signed-off-by: Stefan Beller 
---
 contrib/coccinelle/the_repository.pending.cocci | 12 
 object-store.h  | 10 --
 2 files changed, 20 insertions(+), 2 deletions(-)
 create mode 100644 contrib/coccinelle/the_repository.pending.cocci

diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
new file mode 100644
index 00..a7ac9e0c46
--- /dev/null
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -0,0 +1,12 @@
+// This file is used for the ongoing refactoring of
+// bringing the index or repository struct in all of
+// our code base.
+
+@@
+expression E;
+expression F;
+expression G;
+@@
+- read_object_file(
++ repo_read_object_file(the_repository,
+  E, F, G)
diff --git a/object-store.h b/object-store.h
index 3d98a682b2..00a64622e6 100644
--- a/object-store.h
+++ b/object-store.h
@@ -165,10 +165,16 @@ extern void *read_object_file_extended(struct repository 
*r,
   const struct object_id *oid,
   enum object_type *type,
   unsigned long *size, int lookup_replace);
-static inline void *read_object_file(const struct object_id *oid, enum 
object_type *type, unsigned long *size)
+static inline void *repo_read_object_file(struct repository *r,
+ const struct object_id *oid,
+ enum object_type *type,
+ unsigned long *size)
 {
-   return read_object_file_extended(the_repository, oid, type, size, 1);
+   return read_object_file_extended(r, oid, type, size, 1);
 }
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define read_object_file(oid, type, size) 
repo_read_object_file(the_repository, oid, type, size)
+#endif
 
 /* Read and unpack an object file into memory, write memory to an object file 
*/
 int oid_object_info(struct repository *r, const struct object_id *, unsigned 
long *);
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 13/23] commit-reach: prepare in_merge_bases[_many] to handle any repo

2018-11-13 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 commit-reach.c  | 15 +--
 commit-reach.h  | 12 ++--
 contrib/coccinelle/the_repository.pending.cocci | 17 +
 3 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 657a4e9b5a..8715008fef 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -312,16 +312,17 @@ int is_descendant_of(struct commit *commit, struct 
commit_list *with_commit)
 /*
  * Is "commit" an ancestor of one of the "references"?
  */
-int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit 
**reference)
+int repo_in_merge_bases_many(struct repository *r, struct commit *commit,
+int nr_reference, struct commit **reference)
 {
struct commit_list *bases;
int ret = 0, i;
uint32_t min_generation = GENERATION_NUMBER_INFINITY;
 
-   if (parse_commit(commit))
+   if (repo_parse_commit(r, commit))
return ret;
for (i = 0; i < nr_reference; i++) {
-   if (parse_commit(reference[i]))
+   if (repo_parse_commit(r, reference[i]))
return ret;
if (reference[i]->generation < min_generation)
min_generation = reference[i]->generation;
@@ -330,7 +331,7 @@ int in_merge_bases_many(struct commit *commit, int 
nr_reference, struct commit *
if (commit->generation > min_generation)
return ret;
 
-   bases = paint_down_to_common(the_repository, commit,
+   bases = paint_down_to_common(r, commit,
 nr_reference, reference,
 commit->generation);
if (commit->object.flags & PARENT2)
@@ -344,9 +345,11 @@ int in_merge_bases_many(struct commit *commit, int 
nr_reference, struct commit *
 /*
  * Is "commit" an ancestor of (i.e. reachable from) the "reference"?
  */
-int in_merge_bases(struct commit *commit, struct commit *reference)
+int repo_in_merge_bases(struct repository *r,
+   struct commit *commit,
+   struct commit *reference)
 {
-   return in_merge_bases_many(commit, 1, );
+   return repo_in_merge_bases_many(r, commit, 1, );
 }
 
 struct commit_list *reduce_heads(struct commit_list *heads)
diff --git a/commit-reach.h b/commit-reach.h
index 52667d64ac..a0d4a29d25 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -27,8 +27,16 @@ struct commit_list *repo_get_merge_bases_many_dirty(struct 
repository *r,
 struct commit_list *get_octopus_merge_bases(struct commit_list *in);
 
 int is_descendant_of(struct commit *commit, struct commit_list *with_commit);
-int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit 
**reference);
-int in_merge_bases(struct commit *commit, struct commit *reference);
+int repo_in_merge_bases(struct repository *r,
+   struct commit *commit,
+   struct commit *reference);
+int repo_in_merge_bases_many(struct repository *r,
+struct commit *commit,
+int nr_reference, struct commit **reference);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define in_merge_bases(c1, c2) repo_in_merge_bases(the_repository, c1, c2)
+#define in_merge_bases_many(c1, n, cs) 
repo_in_merge_bases_many(the_repository, c1, n, cs)
+#endif
 
 /*
  * Takes a list of commits and returns a new list where those
diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index f6c2915a4e..8c6a71bf64 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -90,3 +90,20 @@ expression G;
 - get_merge_bases_many_dirty(
 + repo_get_merge_bases_many_dirty(the_repository,
   E, F, G);
+
+@@
+expression E;
+expression F;
+@@
+- in_merge_bases(
++ repo_in_merge_bases(the_repository,
+  E, F);
+
+@@
+expression E;
+expression F;
+expression G;
+@@
+- in_merge_bases_many(
++ repo_in_merge_bases_many(the_repository,
+  E, F, G);
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 11/23] commit-reach.c: allow get_merge_bases_many_0 to handle any repo

2018-11-13 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 commit-reach.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 81015830cb..b3b1f62aba 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -216,7 +216,8 @@ static int remove_redundant(struct repository *r, struct 
commit **array, int cnt
return filled;
 }
 
-static struct commit_list *get_merge_bases_many_0(struct commit *one,
+static struct commit_list *get_merge_bases_many_0(struct repository *r,
+ struct commit *one,
  int n,
  struct commit **twos,
  int cleanup)
@@ -226,7 +227,7 @@ static struct commit_list *get_merge_bases_many_0(struct 
commit *one,
struct commit_list *result;
int cnt, i;
 
-   result = merge_bases_many(the_repository, one, n, twos);
+   result = merge_bases_many(r, one, n, twos);
for (i = 0; i < n; i++) {
if (one == twos[i])
return result;
@@ -249,7 +250,7 @@ static struct commit_list *get_merge_bases_many_0(struct 
commit *one,
clear_commit_marks(one, all_flags);
clear_commit_marks_many(n, twos, all_flags);
 
-   cnt = remove_redundant(the_repository, rslt, cnt);
+   cnt = remove_redundant(r, rslt, cnt);
result = NULL;
for (i = 0; i < cnt; i++)
commit_list_insert_by_date(rslt[i], );
@@ -261,19 +262,19 @@ struct commit_list *get_merge_bases_many(struct commit 
*one,
 int n,
 struct commit **twos)
 {
-   return get_merge_bases_many_0(one, n, twos, 1);
+   return get_merge_bases_many_0(the_repository, one, n, twos, 1);
 }
 
 struct commit_list *get_merge_bases_many_dirty(struct commit *one,
   int n,
   struct commit **twos)
 {
-   return get_merge_bases_many_0(one, n, twos, 0);
+   return get_merge_bases_many_0(the_repository, one, n, twos, 0);
 }
 
 struct commit_list *get_merge_bases(struct commit *one, struct commit *two)
 {
-   return get_merge_bases_many_0(one, 1, , 1);
+   return get_merge_bases_many_0(the_repository, one, 1, , 1);
 }
 
 /*
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 03/23] object-store: allow read_object_file_extended to read from any repo

2018-11-13 Thread Stefan Beller

read_object_file_extended is not widely used, so migrate it all at once.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 object-store.h |  5 +++--
 sha1-file.c| 11 ++-
 streaming.c|  2 +-
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/object-store.h b/object-store.h
index 63b7605a3e..3d98a682b2 100644
--- a/object-store.h
+++ b/object-store.h
@@ -161,12 +161,13 @@ void sha1_file_name(struct repository *r, struct strbuf 
*buf, const unsigned cha
 
 void *map_sha1_file(struct repository *r, const unsigned char *sha1, unsigned 
long *size);
 
-extern void *read_object_file_extended(const struct object_id *oid,
+extern void *read_object_file_extended(struct repository *r,
+  const struct object_id *oid,
   enum object_type *type,
   unsigned long *size, int lookup_replace);
 static inline void *read_object_file(const struct object_id *oid, enum 
object_type *type, unsigned long *size)
 {
-   return read_object_file_extended(oid, type, size, 1);
+   return read_object_file_extended(the_repository, oid, type, size, 1);
 }
 
 /* Read and unpack an object file into memory, write memory to an object file 
*/
diff --git a/sha1-file.c b/sha1-file.c
index 856e000ee1..c5b704aec5 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1403,7 +1403,8 @@ int pretend_object_file(void *buf, unsigned long len, 
enum object_type type,
  * deal with them should arrange to call read_object() and give error
  * messages themselves.
  */
-void *read_object_file_extended(const struct object_id *oid,
+void *read_object_file_extended(struct repository *r,
+   const struct object_id *oid,
enum object_type *type,
unsigned long *size,
int lookup_replace)
@@ -1413,10 +1414,10 @@ void *read_object_file_extended(const struct object_id 
*oid,
const char *path;
struct stat st;
const struct object_id *repl = lookup_replace ?
-   lookup_replace_object(the_repository, oid) : oid;
+   lookup_replace_object(r, oid) : oid;
 
errno = 0;
-   data = read_object(the_repository, repl->hash, type, size);
+   data = read_object(r, repl->hash, type, size);
if (data)
return data;
 
@@ -1428,11 +1429,11 @@ void *read_object_file_extended(const struct object_id 
*oid,
die(_("replacement %s not found for %s"),
oid_to_hex(repl), oid_to_hex(oid));
 
-   if (!stat_sha1_file(the_repository, repl->hash, , ))
+   if (!stat_sha1_file(r, repl->hash, , ))
die(_("loose object %s (stored in %s) is corrupt"),
oid_to_hex(repl), path);
 
-   if ((p = has_packed_and_bad(the_repository, repl->hash)) != NULL)
+   if ((p = has_packed_and_bad(r, repl->hash)) != NULL)
die(_("packed object %s (stored in %s) is corrupt"),
oid_to_hex(repl), p->pack_name);
 
diff --git a/streaming.c b/streaming.c
index d1e6b2dce6..c843a1230f 100644
--- a/streaming.c
+++ b/streaming.c
@@ -490,7 +490,7 @@ static struct stream_vtbl incore_vtbl = {
 
 static open_method_decl(incore)
 {
-   st->u.incore.buf = read_object_file_extended(oid, type, >size, 0);
+   st->u.incore.buf = read_object_file_extended(the_repository, oid, type, 
>size, 0);
st->u.incore.read_ptr = 0;
st->vtbl = _vtbl;
 
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 09/23] commit-reach.c: allow merge_bases_many to handle any repo

2018-11-13 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 commit-reach.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 67c2e43d5e..a53b31e6a2 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -95,7 +95,9 @@ static struct commit_list *paint_down_to_common(struct 
repository *r,
return result;
 }
 
-static struct commit_list *merge_bases_many(struct commit *one, int n, struct 
commit **twos)
+static struct commit_list *merge_bases_many(struct repository *r,
+   struct commit *one, int n,
+   struct commit **twos)
 {
struct commit_list *list = NULL;
struct commit_list *result = NULL;
@@ -110,14 +112,14 @@ static struct commit_list *merge_bases_many(struct commit 
*one, int n, struct co
return commit_list_insert(one, );
}
 
-   if (parse_commit(one))
+   if (repo_parse_commit(r, one))
return NULL;
for (i = 0; i < n; i++) {
-   if (parse_commit(twos[i]))
+   if (repo_parse_commit(r, twos[i]))
return NULL;
}
 
-   list = paint_down_to_common(the_repository, one, n, twos, 0);
+   list = paint_down_to_common(r, one, n, twos, 0);
 
while (list) {
struct commit *commit = pop_commit();
@@ -224,7 +226,7 @@ static struct commit_list *get_merge_bases_many_0(struct 
commit *one,
struct commit_list *result;
int cnt, i;
 
-   result = merge_bases_many(one, n, twos);
+   result = merge_bases_many(the_repository, one, n, twos);
for (i = 0; i < n; i++) {
if (one == twos[i])
return result;
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 10/23] commit-reach.c: allow remove_redundant to handle any repo

2018-11-13 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 commit-reach.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index a53b31e6a2..81015830cb 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -156,7 +156,7 @@ struct commit_list *get_octopus_merge_bases(struct 
commit_list *in)
return ret;
 }
 
-static int remove_redundant(struct commit **array, int cnt)
+static int remove_redundant(struct repository *r, struct commit **array, int 
cnt)
 {
/*
 * Some commit in the array may be an ancestor of
@@ -174,7 +174,7 @@ static int remove_redundant(struct commit **array, int cnt)
ALLOC_ARRAY(filled_index, cnt - 1);
 
for (i = 0; i < cnt; i++)
-   parse_commit(array[i]);
+   repo_parse_commit(r, array[i]);
for (i = 0; i < cnt; i++) {
struct commit_list *common;
uint32_t min_generation = array[i]->generation;
@@ -190,7 +190,7 @@ static int remove_redundant(struct commit **array, int cnt)
if (array[j]->generation < min_generation)
min_generation = array[j]->generation;
}
-   common = paint_down_to_common(the_repository, array[i], filled,
+   common = paint_down_to_common(r, array[i], filled,
  work, min_generation);
if (array[i]->object.flags & PARENT2)
redundant[i] = 1;
@@ -249,7 +249,7 @@ static struct commit_list *get_merge_bases_many_0(struct 
commit *one,
clear_commit_marks(one, all_flags);
clear_commit_marks_many(n, twos, all_flags);
 
-   cnt = remove_redundant(rslt, cnt);
+   cnt = remove_redundant(the_repository, rslt, cnt);
result = NULL;
for (i = 0; i < cnt; i++)
commit_list_insert_by_date(rslt[i], );
@@ -370,7 +370,7 @@ struct commit_list *reduce_heads(struct commit_list *heads)
p->item->object.flags &= ~STALE;
}
}
-   num_head = remove_redundant(array, num_head);
+   num_head = remove_redundant(the_repository, array, num_head);
for (i = 0; i < num_head; i++)
tail = _list_insert(array[i], tail)->next;
free(array);
-- 
2.19.1.1215.g8438c0b245-goog

[PATCHv3 00/23] Bring more repository handles into our code base

2018-11-13 Thread Stefan Beller

This resends origin/sb/more-repo-in-api.

Unlike the previous resend (v2), this is not rebased to a newer base.
This doesn't contain the patch for pending semantic changes, as that
seems to go in its own topic branch.

Please have a look at the last 4 patches specifically as they were new in
the last iteration (but did not receive any comment), as they demonstrate
and fix a problem that is only exposed when using GIT_TEST_COMMIT_GRAPH=1
for the test suite.

Previous discussion at
https://public-inbox.org/git/20181030220817.61691-1-sbel...@google.com/

Thanks,
Stefan

Stefan Beller (23):
  sha1_file: allow read_object to read objects in arbitrary repositories
  packfile: allow has_packed_and_bad to handle arbitrary repositories
  object-store: allow read_object_file_extended to read from any repo
  object-store: prepare read_object_file to deal with any repo
  object-store: prepare has_{sha1, object}_file to handle any repo
  object: parse_object to honor its repository argument
  commit: allow parse_commit* to handle any repo
  commit-reach.c: allow paint_down_to_common to handle any repo
  commit-reach.c: allow merge_bases_many to handle any repo
  commit-reach.c: allow remove_redundant to handle any repo
  commit-reach.c: allow get_merge_bases_many_0 to handle any repo
  commit-reach: prepare get_merge_bases to handle any repo
  commit-reach: prepare in_merge_bases[_many] to handle any repo
  commit: prepare get_commit_buffer to handle any repo
  commit: prepare repo_unuse_commit_buffer to handle any repo
  commit: prepare logmsg_reencode to handle arbitrary repositories
  pretty: prepare format_commit_message to handle arbitrary repositories
  submodule: use submodule repos for object lookup
  submodule: don't add submodule as odb for push
  commit-graph: convert remaining functions to handle any repo
  commit: prepare free_commit_buffer and release_commit_memory for any
repo
  path.h: make REPO_GIT_PATH_FUNC repository agnostic
  t/helper/test-repository: celebrate independence from the_repository

 builtin/fsck.c|   3 +-
 builtin/log.c |   6 +-
 builtin/rev-list.c|   3 +-
 cache.h   |   2 +
 commit-graph.c|  40 +++--
 commit-reach.c|  73 +
 commit-reach.h|  38 +++--
 commit.c  |  41 ++---
 commit.h  |  43 +-
 .../coccinelle/the_repository.pending.cocci   | 144 ++
 object-store.h|  35 -
 object.c  |   8 +-
 packfile.c|   5 +-
 packfile.h|   2 +-
 path.h|   2 +-
 pretty.c  |  28 ++--
 pretty.h  |   7 +-
 sha1-file.c   |  34 +++--
 streaming.c   |   2 +-
 submodule.c   |  76 ++---
 t/helper/test-repository.c|  10 ++
 21 files changed, 452 insertions(+), 150 deletions(-)
 create mode 100644 contrib/coccinelle/the_repository.pending.cocci

Range-diff:
 1:  1b9b5c695e =  1:  ca9fece80e sha1_file: allow read_object to read objects 
in arbitrary repositories
 2:  33b94066f2 =  2:  8eac25fe32 packfile: allow has_packed_and_bad to handle 
arbitrary repositories
 3:  5217b6b1e1 !  3:  06e5f83b66 object-store: allow read_object_file_extended 
to read from arbitrary repositories
@@ -1,6 +1,6 @@
 Author: Stefan Beller 
 
-object-store: allow read_object_file_extended to read from arbitrary 
repositories
+object-store: allow read_object_file_extended to read from any repo
 
 read_object_file_extended is not widely used, so migrate it all at 
once.
 
 4:  2b7239b55b !  4:  6167722608 object-store: prepare read_object_file to 
deal with arbitrary repositories
@@ -1,6 +1,6 @@
 Author: Stefan Beller 
 
-object-store: prepare read_object_file to deal with arbitrary 
repositories
+object-store: prepare read_object_file to deal with any repo
 
 As read_object_file is a widely used function (which is also regularly 
used
 in new code in flight between master..pu), changing its signature is 
painful
@@ -13,16 +13,14 @@
 e675765235 (diff.c: remove implicit dependency on the_index, 
2018-09-21)
 
 Add a coccinelle patch to convert existing callers, but do not apply
-the resulting patch from 'make coccicheck' to keep the diff of this
-patch small.
+the resulting patch to keep the diff of this patch small.
 
 Signed-off-by: Stefan Beller 
-Signed-off-by: Junio C Hamano

[PATCH 01/23] sha1_file: allow read_object to read objects in arbitrary repositories

2018-11-13 Thread Stefan Beller

Allow read_object (a file local functon in sha1_file) to
handle arbitrary repositories by passing the repository down
to oid_object_info_extended.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 sha1-file.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/sha1-file.c b/sha1-file.c
index dd0b6aa873..b8ce21cbaf 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1361,7 +1361,9 @@ int oid_object_info(struct repository *r,
return type;
 }
 
-static void *read_object(const unsigned char *sha1, enum object_type *type,
+static void *read_object(struct repository *r,
+const unsigned char *sha1,
+enum object_type *type,
 unsigned long *size)
 {
struct object_id oid;
@@ -1373,7 +1375,7 @@ static void *read_object(const unsigned char *sha1, enum 
object_type *type,
 
hashcpy(oid.hash, sha1);
 
-   if (oid_object_info_extended(the_repository, , , 0) < 0)
+   if (oid_object_info_extended(r, , , 0) < 0)
return NULL;
return content;
 }
@@ -1414,7 +1416,7 @@ void *read_object_file_extended(const struct object_id 
*oid,
lookup_replace_object(the_repository, oid) : oid;
 
errno = 0;
-   data = read_object(repl->hash, type, size);
+   data = read_object(the_repository, repl->hash, type, size);
if (data)
return data;
 
@@ -1755,7 +1757,7 @@ int force_object_loose(const struct object_id *oid, 
time_t mtime)
 
if (has_loose_object(oid))
return 0;
-   buf = read_object(oid->hash, , );
+   buf = read_object(the_repository, oid->hash, , );
if (!buf)
return error(_("cannot read sha1_file for %s"), 
oid_to_hex(oid));
hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %lu", type_name(type), len) + 
1;
-- 
2.19.1.1215.g8438c0b245-goog

Re: [PATCH v5 02/12] sha1-file: provide functions to look up hash algorithms

2018-11-13 Thread Ramsay Jones




On 13/11/2018 18:42, Derrick Stolee wrote:
> On 11/4/2018 6:44 PM, brian m. carlson wrote:
>> +int hash_algo_by_name(const char *name)
>> +{
>> +    int i;
>> +    if (!name)
>> +    return GIT_HASH_UNKNOWN;
>> +    for (i = 1; i < GIT_HASH_NALGOS; i++)
>> +    if (!strcmp(name, hash_algos[i].name))
>> +    return i;
>> +    return GIT_HASH_UNKNOWN;
>> +}
>> +
> 
> Today's test coverage report [1] shows this method is not covered in the test 
> suite. Looking at 'pu', it doesn't have any callers.
> 
> Do you have a work in progress series that will use this? Could we add a 
> test-tool to exercise this somehow?

There are actually 4 unused external symbols resulting from Brian's
'bc/sha-256' branch. The new unused externals in 'pu' looks like:

$ diff nsc psc
37a38,39
> hex.o - hash_to_hex
> hex.o - hash_to_hex_algop_r
48a51
> parse-options.o   - optname
71a75
> sha1-file.o   - for_each_file_in_obj_subdir
72a77,78
> sha1-file.o   - hash_algo_by_id
> sha1-file.o   - hash_algo_by_name
$ 

The symbols from hex.o and sha1-file.o being the 4 symbols from
this branch.

I suspect that upcoming patches will make use of them. ;-)

ATB,
Ramsay Jones

Re: rebase-in-C stability for 2.20

2018-11-13 Thread Stefan Beller

> But maybe I'm being overly paranoid. What do those more familiar with
> this think?

I am not too worried,
* as rebase is a main porcelain, that is even hard to use in a script.
  so any failures are not deep down in some automation,
  but when found exposed quickly (and hopefully reported).
* 5541bd5b8f was merged to next a month ago; internally we
   distribute the next branch to Googlers (on a weekly basis)
   and we have not had any bug reports regarding rebase.
   (Maybe our environment is too strict for the wide range
of bugs reported)
* Johannes reported that the rebase is used in GfW
   
https://public-inbox.org/git/nycvar.qro.7.76.6.1808241320540...@tvgsbejvaqbjf.bet/
   https://github.com/git-for-windows/git/pull/1800
   and from my cursory reading it is part of
   2.19.windows, which has a large user base.

> (and would re-enable rebase.useBuiltin=true in
> master right after 2.20 is out the door).

That would be fine with me as well, but I'd rather
document rebase.useBuiltin instead of flip-flopping
the switch around the release.

Have there been any fixes that are only in
the C version (has the shell version already bitrotted)?

Stefan

Re: [PATCH 02/10] git-fast-export.txt: clarify misleading documentation about rev-list args

2018-11-13 Thread Elijah Newren

On Tue, Nov 13, 2018 at 3:39 PM Jonathan Nieder  wrote:
> Elijah Newren wrote:
> > Actually, no, it actually needs to be inconsistent.
> >
> > Different Input Choices (neither backslashed, both backslashed, then just 
> > one):
> >   master~9 and master~10
> >   master\~9 and master\~10
> >   master\~9 and master~10
> >
> > What the outputs look like:
> >   master9 and master10
> >   master~9 and master\~10
> >   master~9 and master~10
> >
> > I have no idea why asciidoc behaves this way, but it appears my
> > backslash escaping of just one of the two was necessary.
>
> {tilde} should work consistently.

Indeed it does (well, outside of `backtick blocks`); thanks for the tip.

Re: git-scm.com GUI client links

2018-11-13 Thread Jonathan Nieder

+git@vger.kernel.org, git-secur...@googlegroups.com -> bcc
Paul J Sanchez wrote:
> On Nov 13, 2018, at 2:25 PM, Stefan Beller  wrote:

>> The link seems to be https://aurees.com/ ?
>>
>> They seem to have
>> https://aurees.com/legal/license-agreement
>> which is a hilarious read:
>>
>> You agree that each and every e-mail address, which You use during
>> registration or to commit changes into a Git repository, is
>> automatically sent to and stored by Nezaboodka for verification
>> purposes;
>> You agree with Nezaboodka's and their partners' advertising to be
>> shown by the Software and to be sent to Your registration e-mail;
>> You may neither disable nor block automatic updates of the Software;
>> You may neither disable nor block sending of anonymous usage
>> statistics to Nezaboodka;
>> You may download, install and any number of copies of the Software
>> registered under Free License;
>>
>> Further:
>>   The Agreement is a public agreement (offer) as defined by the law
>> of Republic of Belarus (article 396 of Civil Code of Republic of
>> Belarus). This Agreement is governed by the laws of Republic of
>> Belarus
>>
>> ... I did not know English is an official language in Belarus.
>
> I saw the link on git-scm.com.com.  You’re correct, the site is
> https://aurees.com.com/>.
>
> And no, I hadn’t yet gotten as far as the license-agreement.  Egad!
> Total show-stopper.
>
> After seeing the licensing terms, I’d agree with Sophos.  Software
> which harvests my e-mail addresses and usage data and has
> autoupdates which cannot be disabled or blocked qualifies as malware
> in my opinion.

1 2 3 >

1 - 100 of 201 matches

Mail list logo