Re: Automatic core.autocrlf?

2018-08-29 Thread Jonathan Nieder
Hi,

Robert Dailey wrote:

> Is there an 'auto' setting for the 'core.autocrlf' config? Reason I
> ask is, I want that setting to be 'input' on linux but 'true' on
> Windows.

Others are exploring your question about the configuration language,
but I want to emphasize some other ramifications.

Why do we still have 'core.autocrlf'?  Do 'core.eol' and related
settings take care of that need, or is autocrlf still needed?  If
core.eol etc do not take care of this need, what should we do to get
them to?
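
(For concreteness, the attributes-based route that the core.eol machinery
supports would look roughly like this -- a sketch, not necessarily a drop-in
replacement for every autocrlf use:

  # .gitattributes: opt text files into end-of-line normalization
  *	text=auto

  # core.eol then controls the working-tree convention; "native" (the
  # default) means CRLF on Windows and LF elsewhere, so no per-OS
  # conditional configuration is needed:
  #   git config core.eol native
)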

Thanks, after having run into a few too many autocrlf-related messes,
Jonathan


Re: Thank you for public-inbox!

2018-08-29 Thread Jonathan Nieder
Jeff King wrote:

> I guess I just wonder if I set up a mirror on another domain, would
> anybody actually _use_ it? I'd think most people would just go to
> public-inbox.org as the de facto URL.

If it's faster than public-inbox.org and you don't mind the traffic I
would send, then I'll use it. :)

[...]
> That would be neat, but I think it actually makes references less useful
> in a lot of cases. URLs are universally understood, which means:
>
>  - people who don't know about public-inbox can just follow the link
>(and in fact, that's how they learn how useful it is!)

I agree: please don't stop using URLs.

Having the message-id in the URL is very useful for being able to
migrate to another server.

In an ideal world, we'd be habitually using URLs that reference a
redirector, in the same spirit as https://www.kernel.org/lore.html.
That way, there is a very restricted syntax to use (e.g. just
message-ids) that is stable and can be reconfigured to redirect to
another service as appropriate.  In principle it also allows tricks
like redirecting based on geography or making the redirector go to the
user's preferred archive interface based on a cookie.
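
Such a redirector is mostly string manipulation. A rough sketch (the
lore.kernel.org "/r/<message-id>" endpoint is an assumption based on the
page above, and the message-id is made up):

```python
from urllib.parse import quote

def lore_url(message_id: str) -> str:
    # Accept "<id@host>" or bare "id@host"; percent-encode for a URL path,
    # keeping the characters message-ids commonly contain.
    mid = message_id.strip().lstrip("<").rstrip(">")
    return "https://lore.kernel.org/r/" + quote(mid, safe="@.-_")

print(lore_url("<20180829.GA1234@example.org>"))
# → https://lore.kernel.org/r/20180829.GA1234@example.org
```

The point is that the stable part is just the message-id; everything before
it can be swapped out when the archive moves.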

Thanks,
Jonathan


Re: Thank you for public-inbox!

2018-08-29 Thread Jeff King
On Wed, Aug 29, 2018 at 10:02:43AM +, Eric Wong wrote:

> Jeff King  wrote:
> > I've thought about mirroring it to a public server as well, just for
> > redundancy. But without the same domain, I'm not sure it would be all
> > that useful as a community resource.
> 
> I wouldn't get too attached to the domain, "public-inbox.org" is
> too long for my tastes anyways.  "peff.net/git/$MESSAGE_ID"
> would actually be more user-friendly :>
> 
> A generic Message-ID redirection/finding service would be good,
> (maybe some DHT thing, but... has that taken off for git blobs, yet?)

Yes, and I agree that the URL portability is one of the things I really
love about public-inbox (after all, I do have my own archive and now I
can follow people's public-inbox links into my very-fast local copy).

I guess I just wonder if I set up a mirror on another domain, would
anybody actually _use_ it? I'd think most people would just go to
public-inbox.org as the de facto URL.

> Anyways I hope to teach public-inbox to auto-linkify Message-ID-looking
> strings "" into URLs for domain-portability,
> (but it's ambiguous with email addresses).  But yeah, I don't
> like things being tied to domain names.

That would be neat, but I think it actually makes references less useful
in a lot of cases. URLs are universally understood, which means:

 - people who don't know about public-inbox can just follow the link
   (and in fact, that's how they learn how useful it is!)

 - even for people who do know about it, they are likely to read mails
   in their MUA. And most MUAs have some mechanism for easily following
   a URL, but won't know how to auto-linkify a message-id.
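
To make the auto-linkify idea concrete, here is a toy version (the archive
base URL is only an example). Note that it still mis-fires on
"Name <user@host>" address forms, which is exactly the ambiguity Eric
mentions:

```python
import re

# Anything that looks like "<token@token>" is treated as a message-id and
# turned into an archive URL; bare user@host tokens are left alone.
MSGID = re.compile(r"<([^<>@\s]+@[^<>@\s]+)>")

def linkify(text: str, base: str = "https://public-inbox.org/git/") -> str:
    return MSGID.sub(lambda m: base + m.group(1) + "/", text)

print(linkify("see <abc123@example.com> for details"))
# → see https://public-inbox.org/git/abc123@example.com/ for details
```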

So I too dream of a world where I can say "give me more information on
this identifier" and my tools search a peer to peer distributed hash
table for it. But I don't think we live in that world yet.

At the very least, I think if we plan to reference without an http URL
that we would use something like URI-ish, like . That gives
tools a better chance to say "OK, I know how to find message-ids"
(though I still think that it's much less helpful out of the box
compared to an http URL).

-Peff


Re: Contributor Summit planning

2018-08-29 Thread Jeff King
On Wed, Aug 29, 2018 at 04:46:29PM +0200, Johannes Schindelin wrote:

> On Wed, 29 Aug 2018, Jeff King wrote:
> 
> > On Mon, Aug 27, 2018 at 03:34:16PM +0200, Johannes Schindelin wrote:
> > 
> > > Rather than have a "hack day", I would actually prefer to work with
> > > other contributors in a way that we have not done before, but which I
> > > had the pleasure of "test ballooning" with Pratik: using Visual Studio
> > > Code Live Share. This allows multiple users to work on the same code
> > > base, in the same worktree, seeing what each other is doing. It
> > > requires a separate communication channel to talk; Pratik & I used
> > > IRC, but I think Google Hangout (or Skype or WhatsApp or
> > > ) would have worked a bit better. It's
> > > kind of pair programming, but with some of the limitations removed.
> > 
> > OK, I said in my earlier email that I would give any scheme a try, but I
> > really don't like pair programming. ;)
> 
> Tastes do differ, and that's okay.
> 
> I found it pretty invaluable a tool for intense 1:1 mentoring. It
> definitely helps a lot with getting over the initial hurdles.

Actually, I would agree that mentoring is probably one of the best uses
for pairing like that. In retrospect, I think we should have done more
real-time collaboration during Outreachy last winter.  It's still not my
taste, but I think we could have accomplished more overall.

Not that I need to convince you, but just seconding your advice for the
benefit of people who are thinking about mentoring in the future.

-Peff


Re: Missing Tagger Entry

2018-08-29 Thread Stephen & Linda Smith
On Wednesday, August 29, 2018 7:43:05 PM MST Ramsay Jones wrote:
> 
> Hope this helps.
> 
> ATB,
> Ramsay Jones

I was working on the patch for wt-status.c and thought I screwed up my git 
database.  So I ran fsck and ran into the tag issue.

Thanks
sps


Re: Git in Outreachy Dec-Mar?

2018-08-29 Thread Jeff King
On Wed, Aug 29, 2018 at 03:12:37PM +0200, Ævar Arnfjörð Bjarmason wrote:

> >   2. To get our landing page and list of projects in order (and also
> >  micro-projects for applicants). This can probably build on the
> >  previous round at:
> >
> >https://git.github.io/Outreachy-15/
> >
> >  and on the project/microprojects lists for GSoC (which will need
> >  some updating and culling).
> [...]
> I just have a "yes" to the first one of those. Which tells you how much
> skin I have in the game (and how much you should(n't) listen to me) :)

Yes, if nobody steps up to do 2, then it won't happen. :)

For myself, I don't think I have time to commit to mentoring this round.
And IMHO the people signing up to mentor should be the ones contributing
to the project list (since they will ultimately be on the hook for
working on those projects with the intern).

> Just a question: It seems to me that #1 and #2 is not tied up to the
> Outreachy process. I agree that finding a qualified intern to work on
> Git would be a good use of project funds.
> 
> What's not clear to me is if/how tied up this needs to be to a specific
> external program such as Outreachy. I.e. do we as a project need to go
> through that organization, or can that be just one of the ways in which
> we send out a call for interns?

It doesn't need to be. As far as I know, the main reasons (from the
perspective of a project) to do it through Outreachy are:

 - being part of a larger program generates attention and gets the
   interest of intern candidates (free advertising, if you will)

 - Outreachy handles payment, invoicing for external funds, and any
   legal stuff

 - it's possibly easier to secure external funding if it's earmarked for a
   program like Outreachy, since that program provides a framework with
   particular goals, conditions, oversight, etc.

I think there's some general value in having a group, too. Because there
are many interns all participating at the same time, they can offer each
other support or advice, show off their work to each other via blog
posts, etc. And it may be easier for them to communicate about their
accomplishments and status for future work, since it's part of an
established program that can easily be explained.

As for reasons _not_ to do it, I don't think the requirements are particularly
onerous. Mostly it's:

  - it has to happen at a specific time, which might not be convenient
    for mentors or interns (last year I found it hard to get focused
    starting in December, with all of the holidays)

  - it naturally limits the candidate pool to under-represented groups
    (which is the whole point of the program, but if you don't
    actually care about that, then it's just a complication)

So IMHO it's easily worth the trouble.

> With GSoC we don't have a choice in the matter, since Google's paying
> the bills and runs the show, but it sounds like in this case we at least
> partially do.

I think that the autonomy and level of responsibility for the
mentors/project is about the same between GSoC and Outreachy. The main
difference is just the funding model (but again, I suspect we would not
have too much trouble securing external funding).

-Peff


Re: Possible bug: identical lines added/removed in git diff

2018-08-29 Thread Jeff King
On Wed, Aug 29, 2018 at 10:10:25PM -0400, Gabriel Holodak wrote:

> > Could you cut down to a real minimal reproduction, i.e. just these 20
> > lines or so?
> 
> I'm working on getting down to a minimal reproduction, a few lines at
> a time. One thing that seems strange: as I've removed lines, there are
> a bunch of lines that don't matter. Then I'll find some lines that, if
> removed, completely fix the issue. But the ordering for these
> apparently important lines doesn't matter. They just have to be
> somewhere in the file to cause the duplicated diffs.
> 
> I'll upload again when I've figured out all the unimportant lines to remove.

Yeah, I reproduced based on your initial post, but noticed that when I
cut it down the problem went away.

An easy and pretty mechanical reproduction is:

  git diff --no-index unitera_bold_italic.bdf.{old,new} |
  sed -ne '/STARTCHAR U+00F0/,/ENDCHAR/p'

which shows a hunk that could easily be reduced by its first line
("DWIDTH 8 0"), and which has a common line in the middle of -/+ run.
But if we cut it down to the lines in that hunk, like this:

  for i in unitera_bold_italic.bdf.{old,new}; do
    sed -ne '/STARTCHAR U+00F0/,/ENDCHAR/p' <$i >$i.cut
  done
  git diff --no-index unitera_bold_italic.bdf.{old,new}.cut

then those two lines become context.

I note also that GNU "diff -u" gets this case right.

> > Do you have any smudge filters or configuration regarding
> > line endings?
> 
> No filters, I did have core.autocrlf = input. But as I mentioned, I
> can also reproduce with an empty config.

Me too. Amusingly, if you have diff.colorMoved configured, the context
lines appear as moves, showing that we really do know they're the same
(but that happens as post-diff processing, so I am not at all surprised
that it is orthogonal to the original issue).
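
For anyone who wants to see that, it is just (with old/new standing in for
the attached files; moved-line detection needs Git 2.15 or later):

  git diff --color-moved=zebra --no-index old new

  # or persistently:
  git config diff.colorMoved zebra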

-Peff


Re: Missing Tagger Entry

2018-08-29 Thread Ramsay Jones



On 30/08/18 02:56, Stephen & Linda Smith wrote:
> I am getting the following warning when running a git fsck command against tag
> v0.99.

Yes, that is expected.

> 
> $ git --version
> git version 2.18.0
> 
> $ git fsck
> checking object directories: 100% (256/256), done.
> warning in tag d6602ec5194c87b0fc87103ca4d67251c76f233a: missingTaggerEntry: 
> invalid format - expected 'tagger' line
> Checking objects: 100% (254339/254339), done.
> Checking connectivity: 254329, done.

This tag is so old that it dates to before a change in
the tag object format - in particular, before the 'tagger'
line was included. Viz:

  $ git cat-file tag v0.99
  object a3eb250f996bf5e12376ec88622c4ccaabf20ea8
  type commit
  tag v0.99
  
  Test-release for wider distribution.
  
  I'll make the first public RPM's etc, thus the tag.
  -BEGIN PGP SIGNATURE-
  Version: GnuPG v1.4.1 (GNU/Linux)
  
  iD8DBQBC0b9oF3YsRnbiHLsRAlUUAKCJEyvw8tewGFKd/A3aCd82Wi/zAgCgl7z4
  GYPjO+Xio0IvuEYsrhFc2KI=
  =TVVN
  -END PGP SIGNATURE-
  $ 

Note the lack of a 'tagger' line, unlike a more up-to-date tag:

  $ git cat-file tag v2.18.0
  object 53f9a3e157dbbc901a02ac2c73346d375e24978c
  type commit
  tag v2.18.0
  tagger Junio C Hamano  1529600438 -0700
  
  Git 2.18
  -BEGIN PGP SIGNATURE-
  
  iQIzBAABCAAdFiEE4fA2sf7nIh/HeOzvsLXohpav5ssFAlsr2bYACgkQsLXohpav
  5suqEhAAgDu2A1n9G7ik+HdKoH2VNGwDqaRu/3k8znLPR6NmcOpHqopCgaxPYN4T
  gH69ff+8Le8NiOYcoWaOE2WdpGGY9Gu12N65MpxYbEhehEGo7ze4T8jDNlHz7q5B
  XC55FKHAwqy51NtdzvqNgsptc3bASy+ThxNM5XS0GSeqz00ublquHhiGTzhkBKm2
  KbexWhGWjzq0zP+wOrRIX4zU1lAOHXzjVV7G8vo3pTcg+GgK0BmiAz8zmOlef2au
  SYlU2LJCcQFm12j7pdDx42qCfZYM3QB0vJkHAcEdKYlcSEKRYUdOEnIQHxHwPPvB
  A/uogytfeExnpBd/aHA/YBKlr8FNBMZeDKGHiwxWsBK5yExxfelIFnOg27YBIxl2
  zzbMnHubBqHs5luo2Yv9JmFCbmuqV6ei6qgDKn2BXtJkuXVqYI1FYuKQyO26b3cz
  C6hF5n3OIixL0wv1S+44QqDEc/ss8kvqosT2Ypjd56dNeZripTe3jC+bqUouHblD
  NGaUn+V2YGBKc3rPw1UE3WnXgqOcbyvxn8AoZIKhJveaq7z89CbcvQYpqNjGhmrp
  OvqSVG3NUoOKGXiMAg4/a4wx6JWTyu5SLHY269tC3cPfxQkD3br6hMsBy+AXrCwq
  5yk6A3kQ2d6S9QfbWr6PGT7FI/AhG9CftFXPjpF0h9W9xbPJfkE=
  =ResM
  -END PGP SIGNATURE-
  $ 

You can suppress this warning by using an fsck 'skiplist':

  1) Add the following lines to your .git/config file:
      [fsck]
          skiplist = .git/skip

  2) Add the object-id of the v0.99 tag to the skiplist file:

$ echo d6602ec5194c87b0fc87103ca4d67251c76f233a >.git/skip
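
(The same setup without hand-editing the config file, run from inside the
repository -- this assumes v0.99 resolves to that tag object:

  $ git config fsck.skipList .git/skip
  $ git rev-parse v0.99 >.git/skip
  $ git fsck
)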

Hope this helps.

ATB,
Ramsay Jones


Re: [RFC PATCH 00/12] Base SHA-256 algorithm implementation

2018-08-29 Thread brian m. carlson
On Thu, Aug 30, 2018 at 02:21:51AM +, brian m. carlson wrote:
> On Wed, Aug 29, 2018 at 11:37:25AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> > It seems to me that aside from t/helper/test-hash-speed.c and
> > t/t0014-hash.sh everything being added here modifies existing files with
> > many authors, and would thus also need their permission to re-license as
> > anything except GPLv2.
> > 
> > Or do you mean whatever fixes/changes you did to libtomcrypt (living in
> sha256/block/ in this series) you consider e.g. LGPL instead of GPL?
> 
> Yes, that's what I mean, specifically the code in sha256/block.  libgit2
> is GPLv2 with a linking exception, I believe, but either way I'd be fine
> with it.
> 
> It wasn't my intention to offer code I didn't wholly author, so thanks
> for clarifying.

I should clarify further: I obviously didn't wholly author the code.  I
do, however, have permission to license it under alternative terms,
since the original was public domain.  I apologize for my
mischaracterization of the situation.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [RFC PATCH 08/12] commit-graph: convert to using the_hash_algo

2018-08-29 Thread brian m. carlson
On Wed, Aug 29, 2018 at 08:41:36AM -0400, Derrick Stolee wrote:
> On 8/28/2018 8:58 PM, brian m. carlson wrote:
> > Instead of using hard-coded constants for object sizes, use
> > the_hash_algo to look them up.  In addition, use a function call to look
> > up the object ID version and produce the correct value.
> 
> The C code in this patch looks good to me. The only issue is that I predict
> failure in the 'git commit-graph verify' tests in t5318-commit-graph.sh.
> Squashing in this commit should help (assuming that test_oid works, it
> doesn't at my current branch):

Yeah, this is a separate series not based on the other one.  If I
finally submit this after the other series lands, I'll squash that
change in.

Thanks for the patch.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


Re: [RFC PATCH 00/12] Base SHA-256 algorithm implementation

2018-08-29 Thread brian m. carlson
On Wed, Aug 29, 2018 at 11:37:25AM +0200, Ævar Arnfjörð Bjarmason wrote:

> It seems to me that aside from t/helper/test-hash-speed.c and
> t/t0014-hash.sh everything being added here modifies existing files with
> many authors, and would thus also need their permission to re-license as
> anything except GPLv2.
> 
> Or do you mean whatever fixes/changes you did to libtomcrypt (living in
> sha256/block/ in this series) you consider e.g. LGPL instead of GPL?

Yes, that's what I mean, specifically the code in sha256/block.  libgit2
is GPLv2 with a linking exception, I believe, but either way I'd be fine
with it.

It wasn't my intention to offer code I didn't wholly author, so thanks
for clarifying.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


Re: Possible bug: identical lines added/removed in git diff

2018-08-29 Thread Gabriel Holodak
I did have some possibly interfering settings in my .gitconfig
previously. I removed everything, so all the commands I'll describe
were run with an empty config.

On Mon, Aug 27, 2018 at 1:47 PM Stefan Beller  wrote:
> I suspected you might have a different diff algorithm configured,
> so I tested
> git diff --no-index old new
> git diff --patience --no-index old new
> git diff --histogram --no-index old new
>
> all of which do not reproduce the issue.

I don't believe I had any settings to change the algorithm. Using any
of --minimal, --patience, or --histogram seems to fix the issue. It
still occurs with --diff-algorithm=default. I also tried
--no-indent-heuristic, which had no effect.

> Are there any encoding issues locally (Which platform
> are you on?) or in transit (Could you re-download the files
> from [1] and confirm it still produces this bug?)
>
> [1] 
> https://public-inbox.org/git/CAE6=wb_4_phjfqpubfcyknkejfdr22s-y0npqkw5yd4gvan...@mail.gmail.com/

I should have mentioned before, this is on Arch Linux. I can reproduce
from the downloaded files. I can also reproduce on macOS 10.13.6, with
git 2.18.0.

> Could you cut down to a real minimal reproduction, i.e. just these 20
> lines or so?

I'm working on getting down to a minimal reproduction, a few lines at
a time. One thing that seems strange: as I've removed lines, there are
a bunch of lines that don't matter. Then I'll find some lines that, if
removed, completely fix the issue. But the ordering for these
apparently important lines doesn't matter. They just have to be
somewhere in the file to cause the duplicated diffs.

I'll upload again when I've figured out all the unimportant lines to remove.

> Do you have any smudge filters or configuration regarding
> line endings?

No filters, I did have core.autocrlf = input. But as I mentioned, I
can also reproduce with an empty config.

> Are the lines really different or the same ? (Can you inspect with a
> hex editor, maybe there are different kinds of invisible white spaces?)

Yep, the lines in question are identical.

Thanks for looking into this.
Gabriel


Missing Tagger Entry

2018-08-29 Thread Stephen & Linda Smith
I am getting the following warning when running a git fsck command against tag
v0.99.

$ git --version
git version 2.18.0

$ git fsck
checking object directories: 100% (256/256), done.
warning in tag d6602ec5194c87b0fc87103ca4d67251c76f233a: missingTaggerEntry: 
invalid format - expected 'tagger' line
Checking objects: 100% (254339/254339), done.
Checking connectivity: 254329, done.


improved diff tool

2018-08-29 Thread Piers Titus van der Torren
Dear git people,
I've created a diff algorithm that focuses on creating readable diffs,
see https://github.com/pierstitus/klondiff

The git integration as an external diff command works quite well,
though it would be nice to integrate it deeper in git, and also in
git-gui and gitk. Any way to use ext-diff from the gui tools?
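
(For reference, wiring an external diff into plain git works like this --
the path is hypothetical:

  # per invocation:
  GIT_EXTERNAL_DIFF=/path/to/klondiff git diff

  # or persistently:
  git config diff.external /path/to/klondiff
)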

Is there interest to incorporate this algorithm in the main git
codebase? And if so, any hints on how to proceed?

Piers


Re: [RFC PATCH 09/12] Add a base implementation of SHA-256 support

2018-08-29 Thread brian m. carlson
On Wed, Aug 29, 2018 at 11:32:08AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Aug 29 2018, brian m. carlson wrote:
> 
> > SHA-1 is weak and we need to transition to a new hash function.  For
> > some time, we have referred to this new function as NewHash.
> >
> > The selection criteria for NewHash specify that it should (a) be 256
> > bits in length, (b) have high quality implementations available, (c)
> > should match Git's needs in terms of security, and (d) ideally, be fast
> > to compute.
> >
> > SHA-256 has a variety of high quality implementations across various
> > libraries.  It is implemented by every cryptographic library we support
> > and is available on every platform and in almost every programming
> > language.  It is often highly optimized, since it is commonly used in
> > TLS and elsewhere.  Additionally, there are various command line
> > utilities that implement it, which is useful for educational and testing
> > purposes.
> >
> > SHA-256 is presently considered secure and has received a reasonable
> > amount of cryptanalysis in the literature.  It is, admittedly, not
> > resistant to length extension attacks, but Git object storage is immune
> > to those due to the length field at the beginning.
> >
> > SHA-256 is somewhat slower to compute than SHA-1 in software.  However,
> > since our default SHA-1 implementation is collision-detecting, a
> > reasonable cryptographic library implementation of SHA-256 will actually
> > be faster than SHA-256.  In addition, modern ARM and AMD processors (and
> > some Intel processors) contain instructions for implementing SHA-256 in
> > hardware, making it the fastest possible option.
> >
> > There are other reasons to select SHA-256.  With signed commits and
> > tags, it's possible to use SHA-256 for signatures and therefore have to
> > rely on only one hash algorithm for security.
> 
> None of this is wrong, but I think this would be better off as a simple
> "See Documentation/technical/hash-function-transition.txt for why we're
> switching to SHA-256", and to the extent that something is said here
> that isn't said there it could be a patch to amend that document.

I can certainly shorten this somewhat.  I wrote this back when there
wasn't a consensus on hash algorithm and Junio was going to leave it to
me to make a decision.  I was therefore obligated to provide a coherent
rationale for that decision.
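The length-field point in the quoted message is easy to see from how Git
computes object ids -- a sketch mirroring the documented
"<type> <size>\0<payload>" object format:

```python
import hashlib

def object_id(kind: bytes, payload: bytes, algo=hashlib.sha256) -> str:
    # Git hashes "<type> <size>\0<payload>"; the length field up front is
    # what makes object storage immune to length-extension attacks.
    header = kind + b" " + str(len(payload)).encode("ascii") + b"\0"
    return algo(header + payload).hexdigest()

# With SHA-1 this reproduces `git hash-object` for a blob:
print(object_id(b"blob", b"hello\n", hashlib.sha1))
# → ce013625030ba8dba906f756967f9e9ca394464a
```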

> > Add a basic implementation of SHA-256 based off libtomcrypt, which is in
> > the public domain.  Optimize it and tidy it somewhat.
> 
> For future changes & maintenance of this, let's do that in two
> steps. One where we add the upstream code as-is, and another where the
> tidying / cleanup / git specific stuff is wired, which makes it easy to
> audit upstream as-is v.s. our changes in isolation. Also in the first of
> those commits, say in the commit message "add a [libtomcrypt] copy from
> such-and-such a URL at such-and-such a version", so that it's easy to
> reproduce the import & find out how to re-update it.

Doing what you suggest basically means importing a large amount of
libtomcrypt into our codebase, since there are a large number of reused
macros all over libtomcrypt (such as for processing a generic hash and
for memcpy).

This isn't surprising for a general purpose crypto library, but I did a
decent amount to change it, condense it into a small number of files,
and make it meet our code standards.  The phrase "somewhat" may have
been an understatement.

This is also why I added tests: because I'm human and making a small
change in a crypto library can result in wrong output very quickly.

> Is this something we see ourselves perma-forking? Or as with sha1dc are
> we likely to pull in upstream changes from time-to-time? SHA-256 obviously
> isn't under active development, but there's been some churn in the
> upstream code since it was added, and if you're doing some optimizing /
> tidying that's presumably something upstream could benefit from as well,
> as well as just us being nicer open source citizens feeding
> e.g. portability fixes to upstream (since git tends to get ported a
> lot).

This is a permafork.  We need a basic SHA-256 implementation, and this
one was faster than the one I'd written some time ago.  Similarly to the
block-sha1 implementation, I see this as something that we'll be
shipping forever with little updating.

I expect with the amount of changes we're making, they're unlikely to
want our code.  Also, any changes to our code would be under the GPLv2,
which would be unappealing to a public domain library.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


Re: [PATCH 1/3] t3401: add another directory rename testcase for rebase and am

2018-08-29 Thread Elijah Newren
Hi Junio,

On Wed, Aug 29, 2018 at 3:12 PM Junio C Hamano  wrote:
>
> Elijah Newren  writes:
>
> > +test_expect_success 'rebase --interactive: NO directory rename' '
> > + test_when_finished "git -C no-dir-rename rebase --abort" &&
> > + (
> > + cd no-dir-rename &&
> > +
> > + git checkout B^0 &&
> > +
> > + set_fake_editor &&
> > + FAKE_LINES="1" test_must_fail git rebase --interactive A &&
>
> Is this a single-shot environment assignment?  That would have been
> caught with the test linter.

Ugh, yes.  Sorry.

I was trying to allow backporting to 2.18, so tried to build my series
on that...but moved forward slightly to en/rebase-consistency in order
to re-use the same testfile (as mentioned in my cover letter).  But
that meant my branch was missing a0a630192dca
("t/check-non-portable-shell: detect "FOO=bar shell_func"",
2018-07-13), so the test-linter couldn't catch this for me.
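
For anyone who hasn't hit this: the construct the linter flags is a one-shot
assignment in front of a shell *function*, where POSIX leaves it unspecified
whether the variable persists after the call (and shells genuinely differ).
A minimal sketch:

```shell
f() { printf 'inside: %s\n' "${FAKE_LINES-unset}"; }
FAKE_LINES=1 f                               # the function does see FAKE_LINES=1 ...
printf 'after: %s\n' "${FAKE_LINES-unset}"   # ... but whether it leaks here varies by shell

# For an external command, prefixing with "env" is portable, since the
# assignment is scoped to a separate process:
env FAKE_LINES=1 sh -c 'printf "%s\n" "$FAKE_LINES"'
```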

> Perhaps squashing this in would be sufficient fix?

Thanks for fixing it up.  That works...although now I've spotted
another minor issue.  The FAKE_LINES setting is only needed for the
interactive rebase case and can be removed for the other two.  Oops.
It doesn't hurt, but might confuse readers of the testcase.

Would you like me to resend a fixup on top of your fixup, resend the
whole series with the fixes squashed, or just ignore this final
(cosmetic) issue since it doesn't affect correctness and I've caused
you enough extra work already?

>  t/t3401-rebase-and-am-rename.sh | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/t/t3401-rebase-and-am-rename.sh b/t/t3401-rebase-and-am-rename.sh
> index 94bdfbd69c..13e09afec0 100755
> --- a/t/t3401-rebase-and-am-rename.sh
> +++ b/t/t3401-rebase-and-am-rename.sh
> @@ -141,7 +141,7 @@ test_expect_success 'rebase --interactive: NO directory rename' '
> git checkout B^0 &&
>
> set_fake_editor &&
> -   FAKE_LINES="1" test_must_fail git rebase --interactive A &&
> +   test_must_fail env FAKE_LINES="1" git rebase --interactive A &&
>
> git ls-files -s >out &&
> test_line_count = 6 out &&
> @@ -160,7 +160,7 @@ test_expect_success 'rebase (am): NO directory rename' '
> git checkout B^0 &&
>
> set_fake_editor &&
> -   FAKE_LINES="1" test_must_fail git rebase A &&
> +   test_must_fail env FAKE_LINES="1" git rebase A &&
>
> git ls-files -s >out &&
> test_line_count = 6 out &&
> @@ -179,7 +179,7 @@ test_expect_success 'rebase --merge: NO directory rename' '
> git checkout B^0 &&
>
> set_fake_editor &&
> -   FAKE_LINES="1" test_must_fail git rebase --merge A &&
> +   test_must_fail env FAKE_LINES="1" git rebase --merge A &&
>
> git ls-files -s >out &&
> test_line_count = 6 out &&


Re: How is the ^{sha256} peel syntax supposed to work?

2018-08-29 Thread Jeff King
On Wed, Aug 29, 2018 at 01:46:23PM -0700, Jonathan Nieder wrote:

> > Can you elaborate on that?
> 
> What I'm saying is, regardless of the syntax used, as a user I *need*
> a way to look up $some_hash as a sha256-name, with zero risk of Git
> trying to outsmart me and treating $some_hash as a sha1-name instead.
> 
> Any design without that capability is a non-starter.

Right, this is IMHO the only thing that makes sense for ^{hash} to do:
it disambiguates the sha1 that you just gave it. Nothing more, nothing
less.

> > I.e. if I'm using this in a script I'd need:
> >
> > if x = git rev-parse $some_hash^{sha256}^{commit}
> > hash = x
> > elsif x = git rev-parse $some_hash^{sha1}^{commit}
> > hash = x
> > endif
> 
> Why wouldn't you use "git rev-parse $some_hash^{commit}" instead?

Yes, the sane rules seem to me to be:

  # try any available hash for $some_hash
  git rev-parse $some_hash

  # look _only_ for $some_hash as a sha1
  git rev-parse $some_hash^{sha1}

  # ditto for sha256
  git rev-parse $some_hash^{sha256}

  # ditto, but then peel the result to a commit
  git rev-parse $some_hash^{sha256}^{commit}

  # this is nonsense, and should produce an error
  git rev-parse $some_hash^{commit}^{sha256}

For convenience of scripts, we may also want:

  git rev-parse --input-hash=sha256 $some_hash

to pretend as if "^{sha256}" was appended to each command-line hash we
try to resolve (e.g., consider a case where a script is feeding 0 or
more hashes).

-Peff


Re: [RFC PATCH 10/12] sha256: add an SHA-256 implementation using libgcrypt

2018-08-29 Thread brian m. carlson
On Wed, Aug 29, 2018 at 10:53:01AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Aug 29 2018, brian m. carlson wrote:
> 
> > Generally, one gets better performance out of cryptographic routines
> > written in assembly than C, and this is also true for SHA-256
> 
> It makes sense to have a libgcrypt implementation...
> 
> > In addition, most Linux distributions cannot distribute Git linked
> > against OpenSSL for licensing reasons.
> 
> ...but I'm curious to know what licensing reasons these are, e.g. Debian
> who's usually the most strict about these things distributes git linked
> to OpenSSL:

On my Debian system, that's linked to libgnutls.

The reason is section 3 of the GPLv2 (emphasis mine):

  3. You may copy and distribute the Program (or a work based on it,
  under Section 2) in object code or executable form **under the terms
  of Sections 1 and 2 above** provided that you also do one of the
  following:

  [provide source somehow]

  The source code for a work means the preferred form of the work for
  making modifications to it.  For an executable work, complete source
  code means all the source code for all modules it contains, plus any
  associated interface definition files, plus the scripts used to
  control compilation and installation of the executable.  **However, as
  a special exception, the source code distributed need not include
  anything that is normally distributed (in either source or binary
  form) with the major components (compiler, kernel, and so on) of the
  operating system on which the executable runs, unless that component
  itself accompanies the executable.**

Basically, you can only distribute binary versions of Git under the
terms of the GPLv2, and you have to distribute source for the entire
thing under those terms.  OpenSSL is licensed incompatibly with the
GPLv2, so you can't legally comply with that part, but if you use the
system OpenSSL and don't distribute that OpenSSL with Git, you're
exempt.  This is called the system library exception.

Debian (and Red Hat, and every other Linux distro) ships Git and OpenSSL
side by side on the same mirrors, and hence "that component [OpenSSL]
accompanies the executable."  Consequently, they can't take advantage of
the exception, and must link it to a GPLv2 compatible library.  Debian
uses GnuTLS for libcurl, and Red Hat uses NSS.

A more comprehensive explanation of the whole thing is here:
https://people.gnome.org/~markmc/openssl-and-the-gpl.html
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204




Re: [RFC] revision: Don't let ^ cancel out the default

2018-08-29 Thread Jeff King
On Wed, Aug 29, 2018 at 02:03:34PM -0700, Junio C Hamano wrote:

> Andreas Gruenbacher  writes:
> 
> > Some commands like 'log' default to HEAD if no other revisions are
> > specified on the command line or otherwise.  Unfortunately, excludes
> > (^) cancel out that default, so when a command only excludes
> > revisions (e.g., 'git log ^origin/master'), the command will not produce
> > any result.
> >
> > If all the specified revisions are excludes, it seems more useful to
> > stick with the default revision instead.
> >
> > This makes writing wrappers that exclude certain revisions much easier:
> > for example, a simple alias l='git log ^origin/master' will show the
> > revisions between origin/master and HEAD by default, and 'l foo' will
> > show the revisions between origin/master and foo, as you would usually
> > expect.
> 
> That is a _huge_ departure from the behaviour established for the
> past 10 years, but it would certainly be more useful.
> 
> As long as we can prove that a command line with only negative
> revs is absolutely useless, the backward incompatibility may be OK
> to swallow, especially for commands like "git log" that implicitly
> use "--default HEAD", as they are meant for human consumption, and
> not for scripts.
> 
> I am not offhand 100% sure that a rev list with only negative revs
> is totally useless, though.

Yeah, I'm uneasy that somebody is relying on the current behavior,
especially for scripting where we often do something like:

  git rev-list $new --not --all

and an empty $new really should return an empty result (that's our usual
connectivity check, but I'd imagine some pre-receive hooks may end up
doing similar things). For rev-list I think you'd have to specify
--default to trigger this behavior, so that helps. But it still makes me
nervous.

I'm _less_ uneasy with it for git-log, though I think people do have a
habit of scripting around it (because it knows some tricks that rev-list
doesn't).

Given that the mentioned use case was writing wrappers, could we hide
this behind:

  git config alias.l 'git log --default-on-negative ^origin/master'

That's obviously less convenient to type, but I think it would usually
be part of scripted calls (otherwise you'd know already that you have no
positive refs and would just add ".." or HEAD or whatever).

-Peff


Re: [PATCH v3 01/11] t: add tool to translate hash-related values

2018-08-29 Thread brian m. carlson
On Tue, Aug 28, 2018 at 09:05:48PM -0700, Jonathan Nieder wrote:
> > Add two additional helpers,
> > test_oid_cache, which can be used to load data for test_oid from
> > standard input, and test_oid_init, which can be used to load certain
> > fixed values from lookup charts.  Check that these functions work in
> > t, as the rest of the testsuite will soon come to depend on them.
> >
> > Implement two basic lookup charts, one for common invalid or synthesized
> > object IDs, and one for various facts about the hash function in use.
> > Provide versions for both SHA-1 and SHA-256.
> 
> What do test_oid_cache and test_oid_init do?  How can I use them?
> 
> Judging from t-basic.sh, the idea looks something like
> 
>   Add a test function helper, test_oid, that ...
> 
>   test_oid allows looking up arbitrary information about an object format:
>   the length of object ids, values of well known object ids, etc.  Before
>   calling it, a test script must invoke test_oid_cache (either directly
>   or indirectly through test_oid_init) to load the lookup charts.
> 
>   See t for an example, which also serves as a sanity-check that
>   these functions work in preparation for using them in the rest of the
>   test suite.
> 
>   There are two basic lookup charts for now: oid-info/oid, with common
>   invalid or synthesized object IDs; and oid-info/hash-info, with facts
>   such as object id length about the formats in use.  The charts contain
>   information about both SHA-1 and SHA-256.
> 
>   So now you can update existing tests to be format-independent by (1)
>   adding an invocation of test_oid_init to test setup and (2) replacing
>   format dependencies with $(test_oid foo).
> 
>   Since values are stored as shell variables, names used for lookup can
>   only consist of shell identifier characters.  If this is a problem in
>   the future, we can hash the names before use.
> 
>   Improved-by: Eric Sunshine 
> 
> Do these always use sha1 for now?  Ah, t answers but it might be
> worth mentioning in the commit message, too:
> 
>   test_set_hash allows setting which object format test_oid should look
>   up information for, and test_detect_hash returns to the default format.

I'll expand the commit message.  They do use SHA-1 for now, but I have a
branch that makes them use SHA-256 instead.

> [...]
> > --- /dev/null
> > +++ b/t/oid-info/hash-info
> > @@ -0,0 +1,8 @@
> > +rawsz sha1:20
> > +rawsz sha256:32
> 
> Can there be a README in this directory describing the files and format?

Sure, if you like.

> [...]
> > --- a/t/t-basic.sh
> > +++ b/t/t-basic.sh
> > @@ -821,6 +821,41 @@ test_expect_success 'tests clean up even on failures' "
> > EOF
> >  "
> >  
> > +test_oid_init
> 
> Can this be wrapped in test_expect_success?  That way, if it fails or
> prints an error message then the usual test machinery would handle it.

Sure.

> > +
> > +test_expect_success 'test_oid provides sane info by default' '
> > +   test_oid zero >actual &&
> > +   grep "^00*$" actual &&
> 
> nit: can save the reader some confusion by escaping the $.

Good point.

> > +   rawsz="$(test_oid rawsz)" &&
> > +   hexsz="$(test_oid hexsz)" &&
> 
> optional: no need for these quotation marks --- a command substitution
> assigned to a shell variable is treated as if it were quoted.

That's good to know.  I will honestly say that looking through the Git
codebase and getting reviews on the list has taught me huge amounts
about the intricacies of shell.

> [...]
> > --- a/t/test-lib-functions.sh
> > +++ b/t/test-lib-functions.sh
> > @@ -1155,3 +1155,47 @@ depacketize () {
> [...]
> > +test_oid_cache () {
> > +   test -n "$test_hash_algo" || test_detect_hash
> 
> Should this use an uninterrupted &&-chain?

Yes.  Will fix.

> > +   while read _tag _rest
> 
> This appears to be the first use of this naming convention.  I wonder
> if we can use "local" instead.

We probably can.  There was a discussion about this elsewhere, and we
determined that it's probably safe, and if it's not, it should be
relatively easy to back out.

> > +   esac &&
> > +
> > +   _k="${_rest%:*}" &&
> > +   _v="${_rest#*:}" &&
> > +   { echo "$_k" | egrep '^[a-z0-9]+$' >/dev/null ||
> > +   error 'bug in the test script: bad hash algorithm'; } &&
> > +   eval "test_oid_${_k}_$_tag=\"\$_v\"" || return 1
> 
> This is dense, so I'm having trouble taking it in at a glance.
> 
> I think the idea is
> 
>   key=${rest%%:*} &&
>   val=${rest#*:} &&
> 
>   if ! expr "$key" : '[a-z0-9]*$' >/dev/null
>   then
>   error ...
>   fi &&
>   eval "test_oid_${key}_${tag}=\${val}"

Yes.  I take it that you think that's easier to read, so I'll rewrite it
that way.  I will admit a tendency to write code that is more 

Re: [PATCH 2/3] merge-recursive: add ability to turn off directory rename detection

2018-08-29 Thread Elijah Newren
On Wed, Aug 29, 2018 at 5:54 AM Johannes Schindelin wrote:
>
> Hi Elijah,
>
> On Wed, 29 Aug 2018, Elijah Newren wrote:
>
> > Signed-off-by: Elijah Newren 
> > ---
> >  merge-recursive.c | 18 +-
> >  merge-recursive.h |  1 +
> >  2 files changed, 14 insertions(+), 5 deletions(-)
> >
> > diff --git a/merge-recursive.c b/merge-recursive.c
> > index f110e1c5ec..bf3cb03d3a 100644
> > --- a/merge-recursive.c
> > +++ b/merge-recursive.c
> > @@ -2843,12 +2843,19 @@ static int handle_renames(struct merge_options *o,
> >   head_pairs = get_diffpairs(o, common, head);
> >   merge_pairs = get_diffpairs(o, common, merge);
> >
> > - dir_re_head = get_directory_renames(head_pairs, head);
> > - dir_re_merge = get_directory_renames(merge_pairs, merge);
> > + if (o->detect_directory_renames) {
> > + dir_re_head = get_directory_renames(head_pairs, head);
> > + dir_re_merge = get_directory_renames(merge_pairs, merge);
> >
> > - handle_directory_level_conflicts(o,
> > -  dir_re_head, head,
> > -  dir_re_merge, merge);
> > + handle_directory_level_conflicts(o,
> > +  dir_re_head, head,
> > +  dir_re_merge, merge);
> > + } else {
> > + dir_re_head  = xmalloc(sizeof(*dir_re_head));
> > + dir_re_merge = xmalloc(sizeof(*dir_re_merge));
>
> This is not a suggestion to change anything, but a genuine question out of
> curiosity: would it make sense to put the `dir_re_head` and `dir_re_merge`
> structures into `struct merge_options` to avoid these extra `malloc()`s?
> Or would that cause issues with the recursive nature of the recursive
> merge?

That would work to avoid the extra `malloc()`s, and be inline with the
current usage of merge_options.  However, I'm not sure I like the
current usage of merge_options. That struct is supposed to be public
API, but it's got a lot of private internal-only use stuff (and
putting dir_re_head and dir_re_merge there would add more).  I'm
tempted to go the other way and eject some of the other internal-only
stuff from merge_options (or wrap it inside an opaque struct
merge_options_internal* internal field, or something like that).


Re: [PATCH v3 01/11] t: add tool to translate hash-related values

2018-08-29 Thread brian m. carlson
On Wed, Aug 29, 2018 at 08:37:48AM -0400, Derrick Stolee wrote:
> On 8/28/2018 8:56 PM, brian m. carlson wrote:
> > +   rawsz="$(test_oid rawsz)" &&
> > +   hexsz="$(test_oid hexsz)" &&
> 
> These are neat helpers! The 'git commit-graph verify' tests in
> t5318-commit-graph.sh should automatically work if we use these for HASH_LEN
> instead of 20. I'll use a similar pattern when writing 'git multi-pack-index
> verify'.

Thanks.  In the future, test_oid will learn about internal versus
external object names, since we'll have what's printed by the UI (which
might be SHA-1) and what's under the hood (which might be SHA-256).
However, it should be easy enough to update all those places, so if you
use them now, it should be easy enough to update them in the future to
refer to the right thing.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204




Re: [PATCH v2 1/1] builtin rebase: prepare for builtin rebase -i

2018-08-29 Thread Junio C Hamano
"Johannes Schindelin via GitGitGadget" 
writes:

> From: Johannes Schindelin 
>
> The builtin rebase and the builtin interactive rebase have been
> developed independently, on purpose: Google Summer of Code rules
> specifically state that students have to work on independent projects,
> they cannot collaborate on the same project.

A much better description, especially with the less relevant "the
reason probably is..." omitted from here.  The author's personal
guess, while it does not help understanding of the above paragraph
one iota, is still fine reading material in the cover letter 0/1,
though.

> One fallout is that the rebase-in-c and rebase-i-in-c patches cause no
> merge conflicts but a royal number of tests in the test suite to fail.
>
> It is easy to explain why: rebase-in-c was developed under the
> assumption that all rebase backends are implemented in Unix shell script
> and can be sourced via `. git-rebase--`, which is no longer
> true with rebase-i-in-c, where git-rebase--interactive is a hard-linked
> builtin.
>
> This patch fixes that.
>
> Signed-off-by: Johannes Schindelin 
> ---
>  builtin/rebase.c | 81 
>  1 file changed, 81 insertions(+)


Will replace by doing:

$ git checkout js/rebase-in-c-5.5-work-with-rebase-i-in-c
$ git checkout HEAD^
$ git am -s mbox
$ git range-diff @{-1}...
$ git checkout -B @{-1}

$ git checkout pk/rebase-i-in-c-6-final
$ git rebase --onto js/rebase-in-c-5.5-work-with-rebase-i-in-c \
  js/rebase-in-c-5.5-work-with-rebase-i-in-c@{1} HEAD^0
$ git range-diff @{-1}...
$ git checkout -B @{-1}

to update the two topics and then rebuilding the integration
branches the usual way.  I also need to replace the "other" topic
used in this topic, so the actual procedure would be a bit more
involved than the above, though.

Thanks.


Re: [PATCH] mailinfo: support format=flowed

2018-08-29 Thread Junio C Hamano
René Scharfe  writes:

> Add best-effort support for patches sent using format=flowed (RFC 3676).
> Remove leading spaces ("unstuff"), remove soft line breaks (indicated
> by space + newline), but leave the signature separator (dash dash space
> newline) alone.
>
> Warn in git am when encountering a format=flowed patch, because any
> trailing spaces would most probably be lost, as the sending MUA is
> encouraged to remove them when preparing the email.

The warning is a very good idea, but I wonder if it is loud enough
when mixed with other noise (e.g. "--whitespace=warn" for a short
series, or patch titles when applying a very long series).

Lossage of trailing spaces may even be a feature (just joking), when
the project policy makes it OK to use "am --whitespace=fix".  I
usually use "am --whitespace=fix" but I used "am --whitespace=warn"
while applying this patch to preserve lines that begin with "SP SP
HT" in the sample patch.



Re: [PATCH v2 2/3] read-cache: load cache extensions on worker thread

2018-08-29 Thread Junio C Hamano
Ben Peart  writes:

> There isn't any change in behavior with unknown extensions and this
> patch.  If an unknown extension exists it will just get ignored and
> reported as an "unknown extension" or "die" if it is marked as
> "required."

OK.


Re: [PATCH v2 1/2] rerere: mention caveat about unmatched conflict markers

2018-08-29 Thread Junio C Hamano
Thomas Gummerer  writes:

> Yeah that makes sense.  Maybe something like this?
>
> (replying to  here to keep
> the patches in one thread)
>
>  Documentation/technical/rerere.txt | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
> index e65ba9b0c6..8fefe51b00 100644
> --- a/Documentation/technical/rerere.txt
> +++ b/Documentation/technical/rerere.txt
> @@ -149,7 +149,10 @@ version, and the sorting the conflict hunks, both for the outer and the
>  inner conflict.  This is done recursively, so any number of nested
>  conflicts can be handled.
>  
> +Note that this only works for conflict markers that "cleanly nest".  If
> +there are any unmatched conflict markers, rerere will fail to handle
> +the conflict and record a conflict resolution.
> +
>  The only difference is in how the conflict ID is calculated.  For the
>  inner conflict, the conflict markers themselves are not stripped out
>  before calculating the sha1.

Looks good to me except for the line count on the @@ line.  The
preimage ought to have 6 (not 7) lines and adding 4 new lines makes
it a 10 line postimage.  I wonder who miscounted the hunk---it is
immediately followed by the signature cut mark "-- \n" and some
tools (including Emacs's patch editing mode) are known to
misinterpret it as a preimage line that was removed.

What is curious is that your 2/2 counts the preimage lines
correctly.

In any case, both patches look good.  Will apply.

Thanks.


Re: [PATCH v3] doc: Don't echo sed command for manpage-base-url.xsl

2018-08-29 Thread Junio C Hamano
Jonathan Nieder  writes:

> Tim Schumacher wrote:
>
>> Subject: doc: Don't echo sed command for manpage-base-url.xsl
>
> Cribbing from my review of v2: a description like
>
>   Documentation/Makefile: make manpage-base-url.xsl generation quieter
>
> would make it more obvious what this does when viewed in "git log
> --oneline".

Sounds good; let's take it.

>> Previously, the sed command for generating manpage-base-url.xsl
>> was printed to the console when being run.

The convention is that we talk about the state before the current
series in question is applied in the present tense, so "previously"
is not needed.  Perhaps

The exact sed command to generate manpage-base-url.xsl appears in
the output, unlike the rules for other files that by default only
show summary.

is sufficient.  The output is not always going to "the console", and
it is not like we change behaviour depending on where the output is
going, so it is misleading to say "the console" (iow, the phrase "to
the console" has negative information density in the above
sentence).

>> Make the console output for this rule similar to all the
>> other rules by printing a short status message instead of
>> the whole command.

Likewise, s/console //;

I'll do all the above tweaks while queueing.

Thanks, both.

>>
>> Signed-off-by: Tim Schumacher 
>> ---
>>  Documentation/Makefile | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Oh!  Ignore my reply to v2; looks like you anticipated what I was
> going to suggest already.  For next time, if you include a note about
> what changed between versions after the --- delimiter, that can help
> save some time.
>
> With or without the suggested commit message tweak,
>
> Reviewed-by: Jonathan Nieder 
>
> Thank you.


Re: [RFC] revision: Don't let ^ cancel out the default

2018-08-29 Thread Junio C Hamano
Andreas Gruenbacher  writes:

> Some commands like 'log' default to HEAD if no other revisions are
> specified on the command line or otherwise.  Unfortunately, excludes
> (^) cancel out that default, so when a command only excludes
> revisions (e.g., 'git log ^origin/master'), the command will not produce
> any result.
>
> If all the specified revisions are excludes, it seems more useful to
> stick with the default revision instead.
>
> This makes writing wrappers that exclude certain revisions much easier:
> for example, a simple alias l='git log ^origin/master' will show the
> revisions between origin/master and HEAD by default, and 'l foo' will
> show the revisions between origin/master and foo, as you would usually
> expect.

That is a _huge_ departure from the behaviour established for the
past 10 years, but it would certainly be more useful.

As long as we can prove that a command line with only negative
revs is absolutely useless, the backward incompatibility may be OK
to swallow, especially for commands like "git log" that implicitly
use "--default HEAD", as they are meant for human consumption, and
not for scripts.

I am not offhand 100% sure that a rev list with only negative revs
is totally useless, though.


Re: [PATCH] chainlint: match "quoted" here-doc tags

2018-08-29 Thread Junio C Hamano
Eric Sunshine  writes:

> This is an extract from v3 of es/chain-lint-more[1] which never got
> picked up. Instead, v2 of that series[2] got merged to 'next' and later
> 'master'. The only change between v2 and v3 was to make 'chainlint' also
> recognize double-quoted here-doc tags. This solo patch makes that change
> incrementally atop v2.

Thanks.


What's cooking in git.git (Aug 2018, #06; Wed, 29)

2018-08-29 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

Git 2.19-rc1 is out.  Hopefully the tip of 'master' is more or less
identical to the final one without needing many updates.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* ab/test-must-be-empty-for-master (2018-08-22) 1 commit
  (merged to 'next' on 2018-08-22 at 580dfd1024)
 + t6018-rev-list-glob: fix 'empty stdin' test
 (this branch is used by jk/rev-list-stdin-noop-is-ok.)

 Test fixes.


* ab/unconditional-free-and-null (2018-08-17) 1 commit
  (merged to 'next' on 2018-08-22 at 2661a3cb96)
 + refactor various if (x) FREE_AND_NULL(x) to just FREE_AND_NULL(x)

 Code clean-up.


* ds/commit-graph-fsck (2018-08-23) 1 commit
  (merged to 'next' on 2018-08-23 at cb27eada82)
 + config: fix commit-graph related config docs

 Finishing touches to doc.


* ep/worktree-quiet-option (2018-08-17) 1 commit
  (merged to 'next' on 2018-08-22 at 4a0a85e907)
 + worktree: add --quiet option

 "git worktree" command learned "--quiet" option to make it less
 verbose.


* ja/i18n-message-fixes (2018-08-23) 1 commit
  (merged to 'next' on 2018-08-23 at 907c1f69a2)
 + i18n: fix mistakes in translated strings

 Messages fix.


* jk/hashcmp-optim-for-2.19 (2018-08-23) 1 commit
  (merged to 'next' on 2018-08-23 at e7e8abe0e3)
 + hashcmp: assert constant hash size

 Partially revert the support for multiple hash functions to regain
 hash comparison performance; we'd think of a way to do this better
 in the next cycle.


* jk/use-compat-util-in-test-tool (2018-08-21) 1 commit
  (merged to 'next' on 2018-08-22 at 98c3acd8df)
 + test-tool.h: include git-compat-util.h

 Dev tool update.


* js/larger-timestamps (2018-08-21) 1 commit
  (merged to 'next' on 2018-08-22 at 23a3d5bcf8)
 + commit: use timestamp_t for author_date_slab

 Portability fix.


* js/range-diff (2018-08-27) 1 commit
  (merged to 'next' on 2018-08-27 at ca29413b20)
 + range-diff: update stale summary of --no-dual-color

 Finishing touches to help string.


* nd/complete-config-vars (2018-08-21) 1 commit
  (merged to 'next' on 2018-08-22 at 758f937bef)
 + generate-cmdlist.sh: collect config from all config.txt files

 "git help --config" (which is used in command line completion)
 missed the configuration variables not described in the main
 config.txt file but are described in another file that is included
 by it, which has been corrected.


* nd/config-core-checkstat-doc (2018-08-17) 1 commit
  (merged to 'next' on 2018-08-22 at 3663026a41)
 + config.txt: clarify core.checkStat

 The meaning of the possible values the "core.checkStat"
 configuration variable can take were not adequately documented,
 which has been fixed.


* nd/pack-deltify-regression-fix (2018-07-23) 1 commit
  (merged to 'next' on 2018-08-02 at f3b2bf0fef)
 + pack-objects: fix performance issues on packing large deltas

 In a recent update in 2.18 era, "git pack-objects" started
 producing larger-than-necessary packfiles by missing
 opportunities to use large deltas, which has been fixed.


* rs/opt-updates (2018-08-21) 3 commits
  (merged to 'next' on 2018-08-22 at 1703b9b489)
 + parseopt: group literal string alternatives in argument help
 + remote: improve argument help for add --mirror
 + checkout-index: improve argument help for --stage

 "git cmd -h" updates.


* sg/t0020-conversion-fix (2018-08-22) 1 commit
  (merged to 'next' on 2018-08-22 at 79508284c0)
 + t0020-crlf: check the right file

 Test fixes.


* sg/t3420-autostash-fix (2018-08-22) 1 commit
  (merged to 'next' on 2018-08-22 at 569cf2dc3d)
 + t3420-rebase-autostash: don't try to grep non-existing files

 Test fixes.


* sg/t3903-missing-fix (2018-08-22) 1 commit
  (merged to 'next' on 2018-08-22 at 22f1240ed5)
 + t3903-stash: don't try to grep non-existing file

 Test fixes.


* sg/t4051-fix (2018-08-22) 1 commit
  (merged to 'next' on 2018-08-22 at e1c9adcc14)
 + t4051-diff-function-context: read the right file

 Test fixes.


* sg/t7501-thinkofix (2018-08-22) 1 commit
  (merged to 'next' on 2018-08-22 at 69a9b472c7)
 + t7501-commit: drop silly command substitution

 Test fixes.


* sg/test-must-be-empty (2018-08-21) 4 commits
  (merged to 'next' on 2018-08-22 at a376680200)
 + tests: use 'test_must_be_empty' instead of 'test_cmp  '
 + tests: use 'test_must_be_empty' instead of 'test_cmp /dev/null '
 + tests: use 'test_must_be_empty' instead of 'test ! -s'
 + tests: use 'test_must_be_empty' instead of '! test -s'

 Test fixes.


* sg/test-rebase-editor-fix (2018-08-23) 1 commit
  (merged to 'next' on 2018-08-23 at 504538d005)
 + t/lib-rebase.sh: support explicit 'pick' 

Re: [PATCH 3/3] t5303: add tests for corrupted deltas

2018-08-29 Thread Jeff King
On Wed, Aug 29, 2018 at 06:03:53PM -0400, Jeff King wrote:

> Without your second patch applied, this complains about mmap-ing
> /dev/null (or any zero-length file).
> 
> Also, \x escapes are sadly not portable (dash, for example, does not
> respect them). You have to use octal instead (which is not too onerous
> for these small numbers).
> 
> I needed the patch below to get it to behave as expected (I also
> annotated the deltas to make it more comprehensible to somebody who
> hasn't just been digging in the patch code ;) ).
> 
> I wonder if we should more fully test the 4 cases I outlined in my
> earlier mail, too.

So here's a version which checks each of those cases (plus your minimal
base-case and the trailing garbage case from your original).

I did it as a straight patch rather than on top of yours, since I think
that's easier to read (and I'd expect us to squash together whatever we
end up with anyway).

This doesn't cover out-of-bounds reads while parsing the offset/size.
It's dinner-time here, but I may take a look at that later tonight.

---
diff --git a/t/t5303-pack-corruption-resilience.sh b/t/t5303-pack-corruption-resilience.sh
index 3634e258f8..305922eeb3 100755
--- a/t/t5303-pack-corruption-resilience.sh
+++ b/t/t5303-pack-corruption-resilience.sh
@@ -311,4 +311,81 @@ test_expect_success \
  test_must_fail git cat-file blob $blob_2 > /dev/null &&
  test_must_fail git cat-file blob $blob_3 > /dev/null'
 
+# Base sanity check; this is the smallest possible delta.
+#
+# \5 - five bytes in base (though we do not use it)
+# \1 - one byte in result
+# \1 - copy one byte (X)
+test_expect_success \
+'apply good minimal delta' \
+'printf "\5\1\1X" > minimal_delta &&
+ echo base >base &&
+ test-tool delta -p base minimal_delta /dev/null
+ '
+
+# This delta has too many literal bytes to fit in destination.
+#
+# \5 - five bytes in base (though we do not use it)
+# \1 - 1 byte in result
+# \2 - copy two bytes (one too many)
+test_expect_success \
+'apply delta with too many literal bytes' \
+'printf "\5\1\2XX" > too_big_literal &&
+ echo base >base &&
+ test_must_fail test-tool delta -p base too_big_literal /dev/null
+ '
+
+# This delta has too many copy bytes to fit in destination.
+#
+# \5 - five bytes in base
+# \1 - one byte in result
+# \221 - copy, one byte offset, one byte size
+#   \0 - copy from offset 0
+#   \2 - copy two bytes (one too many)
+test_expect_success \
+'apply delta with too many copied bytes' \
+'printf "\5\1\221\0\2" > too_big_copy &&
+ echo base >base &&
+ test_must_fail test-tool delta -p base too_big_copy /dev/null
+ '
+
+# This delta has too few bytes in the delta itself.
+#
+# \5 - five bytes in base (though we do not use it)
+# \2 - two bytes in result
+# \2 - copy two bytes (we are short one)
+test_expect_success \
+'apply truncated delta' \
+'printf "\5\2\2X" > truncated_delta &&
+ echo base >base &&
+ test_must_fail test-tool delta -p base truncated_delta /dev/null
+ '
+
+# This delta has too few bytes in the base.
+#
+# \5 - five bytes in base
+# \6 - six bytes in result
+# \221 - copy, one byte offset, one byte size
+#   \0 - copy from offset 0
+#   \6 - copy six bytes (one too many)
+test_expect_success \
+'apply delta against too-short base' \
+'printf "\5\6\221\0\6" > truncated_base &&
+ echo base >base &&
+ test_must_fail test-tool delta -p base truncated_base /dev/null
+ '
+
+# Trailing garbage command.
+#
+# \5 - five bytes in base (though we do not use it)
+# \1 - one byte in result
+# \1 - copy one byte (X)
+# \1 - trailing garbage command
+test_expect_success \
+'apply delta with trailing garbage command' \
+'printf "\5\1\1X\1" > tail_garbage_delta &&
+ echo base >base &&
+ test_must_fail test-tool delta -p base tail_garbage_delta /dev/null
+ '
+
 test_done


Re: [PATCH 1/3] patch-delta: fix oob read

2018-08-29 Thread Jeff King
On Wed, Aug 29, 2018 at 05:20:25PM -0400, Jeff King wrote:

> Nice catch. The patch looks good to me, but just to lay out my thought
> process looking for other related problems:
> 
> We have two types of instructions:
> 
>   1. Take N bytes from position P within the source.
> 
>   2. Take the next N bytes from the delta stream.
> 
> In both cases we need to check that:
> 
>   a. We have enough space in the destination buffer.
> 
>   b. We have enough source bytes.

Actually, there's one other case to consider: reading the actual offset
for a copy operation. E.g., if we see '0x80' at the end of the delta,
I think we'd read garbage into cp_offset? That would typically generate
nonsense that would be caught by the other checks, but I think it's
possible that the memory could happen to produce a valid copy
instruction from the base, leading to a confusing result (though it
would still need to match the expected result size).

Thinking on it more, one other interesting tidbit here: in actual git
code (i.e., not test-delta), we'd always be reading from an mmap'd
packfile. And we're guaranteed to have at least 20 trailing bytes in the
mmap due to the pack format (which is enforced when we parse the pack).

So I don't think this can ever read truly out-of-bounds memory, though
obviously it's something we should fix regardless.

-Peff


Re: [PATCH 1/3] t3401: add another directory rename testcase for rebase and am

2018-08-29 Thread Junio C Hamano
Elijah Newren  writes:

> +test_expect_success 'rebase --interactive: NO directory rename' '
> + test_when_finished "git -C no-dir-rename rebase --abort" &&
> + (
> + cd no-dir-rename &&
> +
> + git checkout B^0 &&
> +
> + set_fake_editor &&
> + FAKE_LINES="1" test_must_fail git rebase --interactive A &&

Is this a single-shot environment assignment?  That would have been
caught with the test linter.

Perhaps squashing this in would be a sufficient fix?

 t/t3401-rebase-and-am-rename.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t3401-rebase-and-am-rename.sh b/t/t3401-rebase-and-am-rename.sh
index 94bdfbd69c..13e09afec0 100755
--- a/t/t3401-rebase-and-am-rename.sh
+++ b/t/t3401-rebase-and-am-rename.sh
@@ -141,7 +141,7 @@ test_expect_success 'rebase --interactive: NO directory rename' '
git checkout B^0 &&
 
set_fake_editor &&
-   FAKE_LINES="1" test_must_fail git rebase --interactive A &&
+   test_must_fail env FAKE_LINES="1" git rebase --interactive A &&
 
git ls-files -s >out &&
test_line_count = 6 out &&
@@ -160,7 +160,7 @@ test_expect_success 'rebase (am): NO directory rename' '
git checkout B^0 &&
 
set_fake_editor &&
-   FAKE_LINES="1" test_must_fail git rebase A &&
+   test_must_fail env FAKE_LINES="1" git rebase A &&
 
git ls-files -s >out &&
test_line_count = 6 out &&
@@ -179,7 +179,7 @@ test_expect_success 'rebase --merge: NO directory rename' '
git checkout B^0 &&
 
set_fake_editor &&
-   FAKE_LINES="1" test_must_fail git rebase --merge A &&
+   test_must_fail env FAKE_LINES="1" git rebase --merge A &&
 
git ls-files -s >out &&
test_line_count = 6 out &&


Re: [PATCH 3/3] t5303: add tests for corrupted deltas

2018-08-29 Thread Jeff King
On Wed, Aug 29, 2018 at 10:58:57PM +0200, Jann Horn wrote:

> This verifies the changes from commit "patch-delta: fix oob read".

A minor nit, but usually we'd either introduce tests along with the
fix, or introduce them beforehand as test_expect_failure and then flip
them to success along with the fix.

> diff --git a/t/t5303-pack-corruption-resilience.sh 
> b/t/t5303-pack-corruption-resilience.sh
> index 3634e258f..7152376b6 100755
> --- a/t/t5303-pack-corruption-resilience.sh
> +++ b/t/t5303-pack-corruption-resilience.sh
> @@ -311,4 +311,22 @@ test_expect_success \
>   test_must_fail git cat-file blob $blob_2 > /dev/null &&
>   test_must_fail git cat-file blob $blob_3 > /dev/null'
>  
> +test_expect_success \
> +'apply good minimal delta' \
> +'printf "\x00\x01\x01X" > minimal_delta &&
> + test-tool delta -p /dev/null minimal_delta /dev/null
> + '

Without your second patch applied, this complains about mmap-ing
/dev/null (or any zero-length file).

Also, \x escapes are sadly not portable (dash, for example, does not
respect them). You have to use octal instead (which is not too onerous
for these small numbers).

I needed the patch below to get it to behave as expected (I also
annotated the deltas to make it more comprehensible to somebody who
hasn't just been digging in the patch code ;) ).

I wonder if we should more fully test the 4 cases I outlined in my
earlier mail, too.

-Peff

---
diff --git a/t/t5303-pack-corruption-resilience.sh 
b/t/t5303-pack-corruption-resilience.sh
index 7152376b67..df28cce68b 100755
--- a/t/t5303-pack-corruption-resilience.sh
+++ b/t/t5303-pack-corruption-resilience.sh
@@ -311,22 +311,35 @@ test_expect_success \
  test_must_fail git cat-file blob $blob_2 > /dev/null &&
  test_must_fail git cat-file blob $blob_3 > /dev/null'
 
+# \5 - five bytes in base (though we do not use it)
+# \1 - one byte in result
+# \1 - copy one byte (X)
 test_expect_success \
 'apply good minimal delta' \
-'printf "\x00\x01\x01X" > minimal_delta &&
- test-tool delta -p /dev/null minimal_delta /dev/null
+'printf "\5\1\1X" > minimal_delta &&
+ echo base >base &&
+ test-tool delta -p base minimal_delta /dev/null
  '
 
+# \5 - five bytes in base (though we do not use it)
+# \2 - two bytes in result
+# \2 - copy two bytes (we are short one)
 test_expect_success \
 'apply truncated delta' \
-'printf "\x00\x02\x02X" > truncated_delta &&
- test_must_fail test-tool delta -p /dev/null truncated_delta /dev/null
+'printf "\5\2\2X" > truncated_delta &&
+ echo base >base &&
+ test_must_fail test-tool delta -p base truncated_delta /dev/null
  '
 
+# \5 - five bytes in base (though we do not use it)
+# \1 - one byte in result
+# \1 - copy one byte (X)
+# \1 - trailing garbage command
 test_expect_success \
 'apply delta with trailing garbage command' \
-'printf "\x00\x01\x01X\x01" > tail_garbage_delta &&
- test_must_fail test-tool delta -p /dev/null tail_garbage_delta /dev/null
+'printf "\5\1\1X\1" > tail_garbage_delta &&
+ echo base >base &&
+ test_must_fail test-tool delta -p base tail_garbage_delta /dev/null
  '
 
 test_done


Re: [PATCH 2/3] t/helper/test-delta: segfault on OOB access

2018-08-29 Thread Jeff King
On Wed, Aug 29, 2018 at 05:46:21PM -0400, Jeff King wrote:

> > I think even with ASAN, you'd still need read_in_full() or an mmap()
> > wrapper that fiddles with the ASAN shadow, because mmap() always maps
> > whole pages:
> > 
> > $ cat mmap-read-asan-blah.c
> > #include <stddef.h>
> > #include <sys/mman.h>
> > int main(void) {
> >   volatile char *p = mmap(NULL, 1, PROT_READ|PROT_WRITE,
> > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
> >   p[200] = 1;
> > }
> > $ gcc -o mmap-read-asan-blah mmap-read-asan-blah.c -fsanitize=address
> > $ ./mmap-read-asan-blah
> > $
> 
> Yeah, I was just trying to run your tests with ASan and couldn't
> convince it to complain. I also tried MSan, but no luck.
> 
> > But that aside, you do have a point about having some custom hack for
> > a single patch.
> 
> I'm also not sure how portable it is. Looks like we have a Windows
> wrapper for getpagesize(), but I don't see any other uses of mprotect().

Actually, there is no real need for this test helper to use mmap. I
suppose we could swap it out for malloc + read_in_full() and let ASan
(or even valgrind) handle the tricky parts.

-Peff


Re: [PATCH 2/3] t/helper/test-delta: segfault on OOB access

2018-08-29 Thread Jeff King
On Wed, Aug 29, 2018 at 11:40:41PM +0200, Jann Horn wrote:

> > If we want to detect this kind of thing in tests, we should probably be
> > relying on tools like ASan, which would cover all mmaps.
> >
> > It would be nice if there was a low-cost way to detect this in
> > production use, but it looks like this replaces mmap with
> > read_in_full(), which I think is a non-starter for most uses.
> 
> I think even with ASAN, you'd still need read_in_full() or an mmap()
> wrapper that fiddles with the ASAN shadow, because mmap() always maps
> whole pages:
> 
> $ cat mmap-read-asan-blah.c
> #include <stddef.h>
> #include <sys/mman.h>
> int main(void) {
>   volatile char *p = mmap(NULL, 1, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
>   p[200] = 1;
> }
> $ gcc -o mmap-read-asan-blah mmap-read-asan-blah.c -fsanitize=address
> $ ./mmap-read-asan-blah
> $

Yeah, I was just trying to run your tests with ASan and couldn't
convince it to complain. I also tried MSan, but no luck.

> But that aside, you do have a point about having some custom hack for
> a single patch.

I'm also not sure how portable it is. Looks like we have a Windows
wrapper for getpagesize(), but I don't see any other uses of mprotect().

-Peff


Re: [PATCH v2 2/3] read-cache: load cache extensions on worker thread

2018-08-29 Thread Ben Peart




On 8/29/2018 1:12 PM, Junio C Hamano wrote:

Ben Peart  writes:


This is possible because the current extensions don't access the cache
entries in the index_state structure so are OK that they don't all exist
yet.

The CACHE_EXT_TREE, CACHE_EXT_RESOLVE_UNDO, and CACHE_EXT_UNTRACKED
extensions don't even get a pointer to the index so don't have access to the
cache entries.

CACHE_EXT_LINK only uses the index_state to initialize the split index.
CACHE_EXT_FSMONITOR only uses the index_state to save the fsmonitor last
update and dirty flags.


Good to see such an analysis here.  Once we define an extension
section, which requires us to have the cache entries before
populating it, this scheme would fall down, of course, but the
extension mechanism is all about protecting ourselves from the
future changes, so we'd at least need a good feel for how we read an
unknown extension from the future with the current code.  Perhaps
just like the main cache entries were pre-scanned to apportion them
to worker threads, we can pre-scan the sections and compare them
with a white-list built into our binary before deciding that it is
safe to read them in parallel (and otherwise, we ask the last thread
for reading extensions to wait until the workers that read the main
index all return)?



Yes, when we add a new extension that requires the cache entries to 
exist and be parsed, we will need to add a mechanism to ensure that 
happens for that extension.  I agree a white list is probably the right 
way to deal with it.  Until we have that need, it would just add 
unnecessary complexity so I think we should wait till it is actually needed.


There isn't any change in behavior with unknown extensions and this 
patch.  If an unknown extension exists it will just get ignored and 
reported as an "unknown extension" or "die" if it is marked as "required."


I'll fix the rest of your suggestions - thanks for the close review.


Re: [PATCH 2/3] t/helper/test-delta: segfault on OOB access

2018-08-29 Thread Jann Horn
On Wed, Aug 29, 2018 at 11:34 PM Jeff King  wrote:
>
> On Wed, Aug 29, 2018 at 10:58:56PM +0200, Jann Horn wrote:
>
> > This ensures that any attempts to access memory directly after the input
> > buffer or delta buffer in a delta test will cause a segmentation fault.
> >
> > Inspired by vsftpd.
>
> Neat trick, but it seems funny to protect this one buffer in
> non-production code. Obviously you were interested in demonstrating the
> issue for your tests, but do we want to carry this all the time?
>
> If we want to detect this kind of thing in tests, we should probably be
> relying on tools like ASan, which would cover all mmaps.
>
> It would be nice if there was a low-cost way to detect this in
> production use, but it looks like this replaces mmap with
> read_in_full(), which I think is a non-starter for most uses.

I think even with ASAN, you'd still need read_in_full() or an mmap()
wrapper that fiddles with the ASAN shadow, because mmap() always maps
whole pages:

$ cat mmap-read-asan-blah.c
#include <stddef.h>
#include <sys/mman.h>
int main(void) {
  volatile char *p = mmap(NULL, 1, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
  p[200] = 1;
}
$ gcc -o mmap-read-asan-blah mmap-read-asan-blah.c -fsanitize=address
$ ./mmap-read-asan-blah
$

But that aside, you do have a point about having some custom hack for
a single patch.


Re: [PATCH v2 1/3] read-cache: speed up index load through parallelization

2018-08-29 Thread Ben Peart




On 8/29/2018 1:14 PM, Junio C Hamano wrote:

Ben Peart  writes:


diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1c42364988..79f8296d9c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2391,6 +2391,12 @@ imap::
The configuration variables in the 'imap' section are described
in linkgit:git-imap-send[1].
  



Adding something like

You can disable multi-threaded code by setting this variable
to 'false' (or 1).

may reduce the risk of a similar "Huh?" reaction by other readers.



Will do


+struct load_cache_entries_thread_data
+{
+   pthread_t pthread;
+   struct index_state *istate;
+   struct mem_pool *ce_mem_pool;
+   int offset, nr;
+   void *mmap;
+   unsigned long start_offset;
+   struct strbuf previous_name_buf;
+   struct strbuf *previous_name;
+   unsigned long consumed; /* return # of bytes in index file processed */
+};


We saw that Duy's "let's not use strbuf to remember the previous
name but instead use the previous ce" approach gave us a nice
performance boost; I wonder if we can build on that idea here?

One possible approach might be to create one ce per "block" in the
pre-scanning thread and use that ce as the "previous one" in the
per-thread data before spawning a worker.



Yes, I believe this can be done.  I was planning to wait until both 
patches settled down a bit before adapting it to threads.  It's a little 
trickier because the previous ce doesn't yet exist but I believe one can 
be fabricated enough to make the optimization work.



+static unsigned long load_cache_entries(struct index_state *istate,
+   void *mmap, size_t mmap_size, unsigned long src_offset)
+{
+   struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+   struct load_cache_entries_thread_data *data;
+   int nr_threads, cpus, ce_per_thread;
+   unsigned long consumed;
+   int i, thread;
+
+   nr_threads = git_config_get_index_threads();
+   if (!nr_threads) {
+   cpus = online_cpus();
+   nr_threads = istate->cache_nr / THREAD_COST;


Here, nr_threads could become 0 with a small index, but any value
below 2 makes us call load_all_cache_entries() by the main thread
(and the value of nr_threads is not used anymore), it is fine.  Of
course, forced test will set it to 2 so there is no problem, either.

OK.


+   /* a little sanity checking */
+   if (istate->name_hash_initialized)
+   die("the name hash isn't thread safe");


If it is a programming error to call into this codepath without
initializing the name_hash, which I think is the case, this is
better done with BUG("").



Will do


The remainder of the patch looked good.  Thanks.



Re: [PATCH 2/3] t/helper/test-delta: segfault on OOB access

2018-08-29 Thread Jeff King
On Wed, Aug 29, 2018 at 10:58:56PM +0200, Jann Horn wrote:

> This ensures that any attempts to access memory directly after the input
> buffer or delta buffer in a delta test will cause a segmentation fault.
> 
> Inspired by vsftpd.

Neat trick, but it seems funny to protect this one buffer in
non-production code. Obviously you were interested in demonstrating the
issue for your tests, but do we want to carry this all the time?

If we want to detect this kind of thing in tests, we should probably be
relying on tools like ASan, which would cover all mmaps.

It would be nice if there was a low-cost way to detect this in
production use, but it looks like this replaces mmap with
read_in_full(), which I think is a non-starter for most uses.

-Peff


Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-29 Thread Jeff King
On Wed, Aug 29, 2018 at 02:09:13PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> > On Tue, Aug 28, 2018 at 02:35:25PM -0700, Stefan Beller wrote:
> 
> >> Yeah, then let's just convert '/' with as little overhead as possible.
> >
> > Do you care about case-folding issues (e.g., submodules "FOO" and "foo"
> > colliding)?
> >
> > I'm OK if the answer is "no", but if you do want to deal with it, the
> > time is probably now.
> 
> Have we rejected the config approach?  I really liked the attribute of
> not having to solve everything right away.  I'm getting scared that
> we've forgotten that goal.

I personally have no problem with that approach, but I also haven't
thought that hard about it (I was mostly ignoring the discussion since
it seemed like submodule-interested folks, but I happened to see what
looked like a potentially bad idea cc'd to me ;) ).

-Peff


Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-29 Thread Jeff King
On Wed, Aug 29, 2018 at 02:10:37PM -0700, Stefan Beller wrote:

> > Yes, that makes even the capitalized "CON" issues go away. It's not a
> > one-to-one mapping, though ("foo-" and "foo_" map to the same entity).
> 
> foo_ would map to foo__, and foo- would map to something else.
> (foo- as we do not rewrite dashes, yet?)

Ah, OK, I took your:

>   [A-Z]  -> _[a-z]

to mean "A-Z becomes a-z, and everything else becomes underscore".

If you mean a real one-to-one mapping that allows a-z and only a few
safe metacharacters, then yeah, that's what I was thinking, too.

> > If we want that, too, I think something like url-encoding is fine, with
> > the caveat that we simply urlencode _more_ things (i.e., anything not in
> > [a-z_]).
> 
> Yeah I think we need more than url encoding now.

If you take "url encoding" to only be the mechanical transformation of
quoting, not the set of _what_ gets quoting, we can still stick with it.
We don't need to, but it's probably no worse than inventing our own
set of quoting rules.

-Peff


Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-29 Thread Stefan Beller
On Wed, Aug 29, 2018 at 2:18 PM Jonathan Nieder  wrote:
>
> Hi,
>
> Stefan Beller wrote:
>
> >> Yes, that makes even the capitalized "CON" issues go away. It's not a
> >> one-to-one mapping, though ("foo-" and "foo_" map to the same entity).
> >
> > foo_ would map to foo__, and foo- would map to something else.
> > (foo- as we do not rewrite dashes, yet?)
> >
> >> If we want that, too, I think something like url-encoding is fine, with
> >> the caveat that we simply urlencode _more_ things (i.e., anything not in
> >> [a-z_]).
> >
> > Yeah I think we need more than url encoding now.
>
> Can you say more?

https://public-inbox.org/git/CAGZ79kZv4BjRq=kq_1UeT2Kn38OZwYFgnMsTe6X_WP41=hb...@mail.gmail.com/

> Can you spell out for me what problem we're solving with something more 
> custom?

case sensitivity for example.

> the ability to tweak things later.

That is unrelated to the choice of encoding, but more related to
https://public-inbox.org/git/20180816181940.46114-1-bmw...@google.com/


Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-29 Thread Brandon Williams
On 08/29, Stefan Beller wrote:
> On Wed, Aug 29, 2018 at 2:09 PM Jonathan Nieder  wrote:
> >
> > Jeff King wrote:
> > > On Tue, Aug 28, 2018 at 02:35:25PM -0700, Stefan Beller wrote:
> >
> > >> Yeah, then let's just convert '/' with as little overhead as possible.
> > >
> > > Do you care about case-folding issues (e.g., submodules "FOO" and "foo"
> > > colliding)?
> > >
> > > I'm OK if the answer is "no", but if you do want to deal with it, the
> > > time is probably now.
> >
> > Have we rejected the config approach?
> 
> I did not reject that approach, but am rather waiting for patches. ;-)

Note I did send out a patch using this approach, so no need to wait any
longer! :D

https://public-inbox.org/git/20180816181940.46114-1-bmw...@google.com/

-- 
Brandon Williams


Re: [PATCH 1/3] patch-delta: fix oob read

2018-08-29 Thread Jeff King
On Wed, Aug 29, 2018 at 10:58:55PM +0200, Jann Horn wrote:

> If `cmd` is in the range [0x01,0x7f] and `cmd > top-data`, the
> `memcpy(out, data, cmd)` can copy out-of-bounds data from after `delta_buf`
> into `dst_buf`.
> 
> This is not an exploitable bug because triggering the bug increments the
> `data` pointer beyond `top`, causing the `data != top` sanity check after
> the loop to trigger and discard the destination buffer - which means that
> the result of the out-of-bounds read is never used for anything.
> 
> Also, directly jump into the error handler instead of just breaking out of
> the loop - otherwise, data corruption would be silently ignored if the
> delta buffer ends with a command and the destination buffer is already
> full.

Nice catch. The patch looks good to me, but just to lay out my thought
process looking for other related problems:

We have two types of instructions:

  1. Take N bytes from position P within the source.

  2. Take the next N bytes from the delta stream.

In both cases we need to check that:

  a. We have enough space in the destination buffer.

  b. We have enough source bytes.

So this:

> diff --git a/patch-delta.c b/patch-delta.c
> index 56e0a5ede..283fb4b75 100644
> --- a/patch-delta.c
> +++ b/patch-delta.c
> @@ -51,13 +51,13 @@ void *patch_delta(const void *src_buf, unsigned long 
> src_size,
>   if (unsigned_add_overflows(cp_off, cp_size) ||
>   cp_off + cp_size > src_size ||
>   cp_size > size)
> - break;
> + goto bad_length;
>   memcpy(out, (char *) src_buf + cp_off, cp_size);

Covers 1a (cp_size > size) and 1b (cp_off + cp_size > src_size), plus
the obvious overflow possibility.

And then here:

>   } else if (cmd) {
> - if (cmd > size)
> - break;
> + if (cmd > size || cmd > top - data)
> + goto bad_length;
>   memcpy(out, data, cmd);

We had previously dealt with 2a (cmd > size), but failed to handle 2b,
which you've added. We don't need to deal with over/underflow here,
because our subtraction is on pointers to the same buffer.

> @@ -75,6 +75,7 @@ void *patch_delta(const void *src_buf, unsigned long 
> src_size,
>  
>   /* sanity check */
>   if (data != top || size != 0) {
> + bad_length:
>   error("delta replay has gone wild");
>   bad:
>   free(dst_buf);

And I agree that jumping straight here is a good idea.

Overall, very nicely done.

-Peff


Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-29 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

>> Yes, that makes even the capitalized "CON" issues go away. It's not a
>> one-to-one mapping, though ("foo-" and "foo_" map to the same entity).
>
> foo_ would map to foo__, and foo- would map to something else.
> (foo- as we do not rewrite dashes, yet?)
>
>> If we want that, too, I think something like url-encoding is fine, with
>> the caveat that we simply urlencode _more_ things (i.e., anything not in
>> [a-z_]).
>
> Yeah I think we need more than url encoding now.

Can you say more?  Perhaps my expectations have been poisoned by tools
like dpkg-buildpackage that use urlencode.  As far as I can tell, it
works fine.

Moreover, urlencode has some attributes that make it a good potential
fit: it's intuitive, it's unambiguous (yes, it's one-to-many, but at
least it's not many-to-many), and people know how to deal with it from
their lives using browsers.  Can you spell out for me what problem
we're solving with something more custom?

Stepping back, I am very worried about any design that doesn't give us
the ability to tweak things later.  See [1] and [2] for more on that
subject.

Thanks,
Jonathan

[1] 
https://public-inbox.org/git/20180816023446.ga127...@aiede.svl.corp.google.com/
[2] 
https://public-inbox.org/git/20180829210913.gf7...@aiede.svl.corp.google.com/


Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-29 Thread Stefan Beller
On Wed, Aug 29, 2018 at 2:09 PM Jonathan Nieder  wrote:
>
> Jeff King wrote:
> > On Tue, Aug 28, 2018 at 02:35:25PM -0700, Stefan Beller wrote:
>
> >> Yeah, then let's just convert '/' with as little overhead as possible.
> >
> > Do you care about case-folding issues (e.g., submodules "FOO" and "foo"
> > colliding)?
> >
> > I'm OK if the answer is "no", but if you do want to deal with it, the
> > time is probably now.
>
> Have we rejected the config approach?

I did not reject that approach, but am rather waiting for patches. ;-)

> I really liked the attribute of
> not having to solve everything right away.  I'm getting scared that
> we've forgotten that goal.

Eh, sorry for side tracking this issue.

I am just under the impression that the URL encoding is not particularly
good for our use case as it solves just one out of many things, whereas
the one thing (having no slashes) can also be solved in an easier way.

> It mixes well with Stefan's idea of setting up a new .git/submodules/
> directory.  We could require that everything in .git/submodules/ have
> configuration (or that everything in that directory either have
> configuration or be the result of a "very simple" transformation) and
> that way, all ambiguity goes away.

I would not want to have a world where we require that config, but I
would agree to the latter, hence we would need to discuss "very simple".
I guess there are 2 or 3 rules at most.

> Part of the definition of "very simple" could be that the submodule
> name must consist of some whitelisted list of characters (including no
> uppercase), for example.

Good catch!

Thanks for chiming in,
Stefan


Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-29 Thread Stefan Beller
> Yes, that makes even the capitalized "CON" issues go away. It's not a
> one-to-one mapping, though ("foo-" and "foo_" map to the same entity).

foo_ would map to foo__, and foo- would map to something else.
(foo- as we do not rewrite dashes, yet?)

>
> If we want that, too, I think something like url-encoding is fine, with
> the caveat that we simply urlencode _more_ things (i.e., anything not in
> [a-z_]).

Yeah I think we need more than url encoding now.


Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-29 Thread Jonathan Nieder
Jeff King wrote:
> On Tue, Aug 28, 2018 at 02:35:25PM -0700, Stefan Beller wrote:

>> Yeah, then let's just convert '/' with as little overhead as possible.
>
> Do you care about case-folding issues (e.g., submodules "FOO" and "foo"
> colliding)?
>
> I'm OK if the answer is "no", but if you do want to deal with it, the
> time is probably now.

Have we rejected the config approach?  I really liked the attribute of
not having to solve everything right away.  I'm getting scared that
we've forgotten that goal.

It mixes well with Stefan's idea of setting up a new .git/submodules/
directory.  We could require that everything in .git/submodules/ have
configuration (or that everything in that directory either have
configuration or be the result of a "very simple" transformation) and
that way, all ambiguity goes away.

Part of the definition of "very simple" could be that the submodule
name must consist of some whitelisted list of characters (including no
uppercase), for example.

Thanks,
Jonathan


Re: [PATCH 1/2] t2013: add test for missing but active submodule

2018-08-29 Thread SZEDER Gábor
On Mon, Aug 27, 2018 at 03:12:56PM -0700, Stefan Beller wrote:
> When cloning a superproject with the option
>  --recurse-submodules='.', it is easy to find yourself wanting
> a submodule active, but not having that submodule present in
> the modules directory.
> 
> Signed-off-by: Stefan Beller 
> ---
>  t/t2013-checkout-submodule.sh | 24 
>  1 file changed, 24 insertions(+)
> 
> diff --git a/t/t2013-checkout-submodule.sh b/t/t2013-checkout-submodule.sh
> index 6ef15738e44..c69640fc341 100755
> --- a/t/t2013-checkout-submodule.sh
> +++ b/t/t2013-checkout-submodule.sh
> @@ -63,6 +63,30 @@ test_expect_success '"checkout " honors 
> submodule.*.ignore from .git/
>   ! test -s actual
>  '
>  
> +test_expect_success 'setup superproject with historic submodule' '
> + test_create_repo super1 &&
> + test_create_repo sub1 &&
> + test_commit -C sub1 sub_content &&
> + git -C super1 submodule add ../sub1 &&
> + git -C super1 commit -a -m "sub1 added" &&
> + test_commit -C super1 historic_state &&
> + git -C super1 rm sub1 &&
> + git -C super1 commit -a -m "deleted sub" &&
> + test_commit -C super1 new_state &&

These six consecutive commands above all specify the '-C super1'
options ...

> + test_path_is_missing super1/sub &&
> +
> + # The important part is to ensure sub1 is not in there any more.
> + # There is another series in flight, that may remove an
> + # empty .gitmodules file entirely.
> + test_must_be_empty super1/.gitmodules

... and both of these two checks use the 'super1/' path prefix.  I
think it would be more readable to simply 'cd super1' first.

> +'
> +
> +test_expect_failure 'checkout old state with deleted submodule' '
> + test_when_finished "rm -rf super1 sub1 super1_clone" &&
> + git clone --recurse-submodules super1 super1_clone &&
> + git -C super1_clone checkout --recurse-submodules historic_state
> +'
> +
>  KNOWN_FAILURE_DIRECTORY_SUBMODULE_CONFLICTS=1
>  test_submodule_switch_recursing_with_args "checkout"
>  
> -- 
> 2.18.0
> 


Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-29 Thread Jeff King
On Wed, Aug 29, 2018 at 11:10:51AM -0700, Stefan Beller wrote:

> > Do you care about case-folding issues (e.g., submodules "FOO" and "foo"
> > colliding)?
> 
> I do. :(
> 
> 2d84f13dcb6 (config: fix case sensitive subsection names on writing, 
> 2018-08-08)
> explains the latest episode of case folding with submodules involved.
> 
> > I'm OK if the answer is "no", but if you do want to deal with it, the
> > time is probably now.
> 
> Good point. But as soon as we start discussing case sensitivity, we
> are drawn down the rabbit hole of funny file names. (Try naming
a submodule "CON1" and obtaining it on Windows, for example.)
> So we would need to have a file system specific encoding function for
submodule names, which sounds like a maintenance nightmare.

Hmph. I'd hoped that simply escaping metacharacters and doing some
obvious case-folding would be enough. And I think that would cover most
accidental cases. But yeah, Windows reserved names are basically
indistinguishable from reasonable names. They'd probably need
special-cased.

OTOH, I'm not sure how we handle those for entries in the actual tree.
Poking around git-for-windows/git, I think it uses the magic "\\?"
marker to tell the OS to interpret the name literally.

So I wonder if it might be sufficient to just deal with the more obvious
folding issues. Or as you noted, if we just choose lowercase names as
the normalized form, that might also be enough. :)

> So if I was thinking in the scheme presented above, we could just
> have another rule that is
> 
>   [A-Z]  -> _[a-z]
> 
> (lowercase capital letters and escape them with an underscore)

Yes, that makes even the capitalized "CON" issues go away. It's not a
one-to-one mapping, though ("foo-" and "foo_" map to the same entity).

If we want that, too, I think something like url-encoding is fine, with
the caveat that we simply urlencode _more_ things (i.e., anything not in
[a-z_]).

-Peff


Re: How is the ^{sha256} peel syntax supposed to work?

2018-08-29 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> In other words, I want the input format and output format completely
>> decoupled.
>
> I thought that the original suggestion was to use "hashname:" as a
> prefix to specify input format.  In other words
>
>   sha1:abababab
>   sha256:abababab

That's fine with me too, and it's probably easier to understand than
^{sha1}.  The disadvantage is that it clashes with existing meaning of
"path abababab in branch sha1".  If we're okay with that change, then
it's a good syntax.

If we have a collection of proposed syntaxes, I can get some help from
a UI designer here, too, to help find any ramifications we've missed.

[...]
> I do not think ^{hashname} mixes well with ^{objecttype} syntax at
> all as an output specifier, either.  It would make sense to be more
> explicit, I would think, e.g.
>
>   git rev-parse --output=sha1 sha256:abababab

Agreed.  I don't think it makes sense to put output specifiers in
revision names.  It would create a lot of unnecessary complexity and
ambiguity.

Thanks,
Jonathan


[PATCH 2/3] t/helper/test-delta: segfault on OOB access

2018-08-29 Thread Jann Horn
This ensures that any attempts to access memory directly after the input
buffer or delta buffer in a delta test will cause a segmentation fault.

Inspired by vsftpd.

Signed-off-by: Jann Horn 
---
 t/helper/test-delta.c | 78 +--
 1 file changed, 53 insertions(+), 25 deletions(-)

diff --git a/t/helper/test-delta.c b/t/helper/test-delta.c
index 34c725924..64d0ec902 100644
--- a/t/helper/test-delta.c
+++ b/t/helper/test-delta.c
@@ -16,45 +16,73 @@
 static const char usage_str[] =
	"test-tool delta (-d|-p) <from_file> <delta_file> <out_file>";
 
-int cmd__delta(int argc, const char **argv)
+/*
+ * We want to detect OOB reads behind the resulting buffer, even in non-ASAN
+ * builds. This helper reads some data into memory, aligns the *end* of the
+ * buffer on a page boundary, and reserves the next virtual page. This ensures
+ * that a single-byte OOB access segfaults.
+ */
+static void *map_with_adjacent_trailing_guard(const char *path,
+ unsigned long *sizep)
 {
int fd;
struct stat st;
-   void *from_buf, *data_buf, *out_buf;
-   unsigned long from_size, data_size, out_size;
+   unsigned long page_size = getpagesize();
+   unsigned long padded_size, padding;
 
-   if (argc != 5 || (strcmp(argv[1], "-d") && strcmp(argv[1], "-p"))) {
-   fprintf(stderr, "usage: %s\n", usage_str);
-   return 1;
+   fd = open(path, O_RDONLY);
+   if (fd < 0) {
+   perror(path);
+   return NULL;
}
+   if (fstat(fd, &st)) {
+   perror(path);
+   close(fd);
+   return NULL;
+   }
+   *sizep = st.st_size;
 
-   fd = open(argv[2], O_RDONLY);
-   if (fd < 0 || fstat(fd, &st)) {
-   perror(argv[2]);
-   return 1;
+   /* pad in front for alignment and add trailing page */
+   padded_size = ((page_size-1) + st.st_size + page_size) & ~(page_size-1);
+   padding = padded_size - (st.st_size + page_size);
+
+   char *mapping = mmap(NULL, padded_size, PROT_READ|PROT_WRITE,
+MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
+   if (mapping == MAP_FAILED ||
+   mprotect(mapping + padded_size - page_size, page_size, PROT_NONE)) {
+   perror("mmap");
+   close(fd);
+   return NULL;
}
-   from_size = st.st_size;
-   from_buf = mmap(NULL, from_size, PROT_READ, MAP_PRIVATE, fd, 0);
-   if (from_buf == MAP_FAILED) {
-   perror(argv[2]);
+   if (read_in_full(fd, mapping + padding, st.st_size) != st.st_size) {
+   perror("read_in_full");
+   munmap(mapping, padded_size);
close(fd);
-   return 1;
+   return NULL;
}
+   mprotect(mapping, padded_size - page_size, PROT_READ);
close(fd);
+   return mapping + padding;
+}
 
-   fd = open(argv[3], O_RDONLY);
-   if (fd < 0 || fstat(fd, &st)) {
-   perror(argv[3]);
+int cmd__delta(int argc, const char **argv)
+{
+   int fd;
+   void *from_buf, *data_buf, *out_buf;
+   unsigned long from_size, data_size, out_size;
+
+   if (argc != 5 || (strcmp(argv[1], "-d") && strcmp(argv[1], "-p"))) {
+   fprintf(stderr, "usage: %s\n", usage_str);
return 1;
}
-   data_size = st.st_size;
-   data_buf = mmap(NULL, data_size, PROT_READ, MAP_PRIVATE, fd, 0);
-   if (data_buf == MAP_FAILED) {
-   perror(argv[3]);
-   close(fd);
+
+   from_buf = map_with_adjacent_trailing_guard(argv[2], &from_size);
+   if (from_buf == NULL)
+   return 1;
+
+   data_buf = map_with_adjacent_trailing_guard(argv[3], &data_size);
+   if (data_buf == NULL)
return 1;
-   }
-   close(fd);
 
if (argv[1][1] == 'd')
out_buf = diff_delta(from_buf, from_size,
-- 
2.19.0.rc0.228.g281dcd1b4d0-goog



[PATCH 3/3] t5303: add tests for corrupted deltas

2018-08-29 Thread Jann Horn
This verifies the changes from commit "patch-delta: fix oob read".

Signed-off-by: Jann Horn 
---
 t/t5303-pack-corruption-resilience.sh | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/t/t5303-pack-corruption-resilience.sh b/t/t5303-pack-corruption-resilience.sh
index 3634e258f..7152376b6 100755
--- a/t/t5303-pack-corruption-resilience.sh
+++ b/t/t5303-pack-corruption-resilience.sh
@@ -311,4 +311,22 @@ test_expect_success \
  test_must_fail git cat-file blob $blob_2 > /dev/null &&
  test_must_fail git cat-file blob $blob_3 > /dev/null'
 
+test_expect_success \
+'apply good minimal delta' \
+'printf "\x00\x01\x01X" > minimal_delta &&
+ test-tool delta -p /dev/null minimal_delta /dev/null
+ '
+
+test_expect_success \
+'apply truncated delta' \
+'printf "\x00\x02\x02X" > truncated_delta &&
+ test_must_fail test-tool delta -p /dev/null truncated_delta /dev/null
+ '
+
+test_expect_success \
+'apply delta with trailing garbage command' \
+'printf "\x00\x01\x01X\x01" > tail_garbage_delta &&
+ test_must_fail test-tool delta -p /dev/null tail_garbage_delta /dev/null
+ '
+
 test_done
-- 
2.19.0.rc0.228.g281dcd1b4d0-goog



[PATCH 1/3] patch-delta: fix oob read

2018-08-29 Thread Jann Horn
If `cmd` is in the range [0x01,0x7f] and `cmd > top-data`, the
`memcpy(out, data, cmd)` can copy out-of-bounds data from after `delta_buf`
into `dst_buf`.

This is not an exploitable bug because triggering the bug increments the
`data` pointer beyond `top`, causing the `data != top` sanity check after
the loop to trigger and discard the destination buffer - which means that
the result of the out-of-bounds read is never used for anything.

Also, directly jump into the error handler instead of just breaking out of
the loop - otherwise, data corruption would be silently ignored if the
delta buffer ends with a command and the destination buffer is already
full.

Signed-off-by: Jann Horn 
---
 patch-delta.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/patch-delta.c b/patch-delta.c
index 56e0a5ede..283fb4b75 100644
--- a/patch-delta.c
+++ b/patch-delta.c
@@ -51,13 +51,13 @@ void *patch_delta(const void *src_buf, unsigned long src_size,
if (unsigned_add_overflows(cp_off, cp_size) ||
cp_off + cp_size > src_size ||
cp_size > size)
-   break;
+   goto bad_length;
memcpy(out, (char *) src_buf + cp_off, cp_size);
out += cp_size;
size -= cp_size;
} else if (cmd) {
-   if (cmd > size)
-   break;
+   if (cmd > size || cmd > top - data)
+   goto bad_length;
memcpy(out, data, cmd);
out += cmd;
data += cmd;
@@ -75,6 +75,7 @@ void *patch_delta(const void *src_buf, unsigned long src_size,
 
/* sanity check */
if (data != top || size != 0) {
+   bad_length:
error("delta replay has gone wild");
bad:
free(dst_buf);
-- 
2.19.0.rc0.228.g281dcd1b4d0-goog



Re: How is the ^{sha256} peel syntax supposed to work?

2018-08-29 Thread Junio C Hamano
Jonathan Nieder  writes:

> In other words, I want the input format and output format completely
> decoupled.

I thought that the original suggestion was to use "hashname:" as a
prefix to specify input format.  In other words

sha1:abababab
sha256:abababab

And an unadorned abababab is first looked up in sha256 space for
uniqueness, and if and only if there is only one object whose sha256
name begins with abababab and there is *no* object whose sha1 name
begins with that hexstring (or vice versa), that string will be
resolved to an object name.

I do not think ^{hashname} mixes well with ^{objecttype} syntax at
all as an output specifier, either.  It would make sense to be more
explicit, I would think, e.g.

git rev-parse --output=sha1 sha256:abababab

(or would that be the job for name-rev?)


Re: How is the ^{sha256} peel syntax supposed to work?

2018-08-29 Thread Jonathan Nieder
Hi,

Ævar Arnfjörð Bjarmason wrote:
> On Wed, Aug 29 2018, Jonathan Nieder wrote:

>> In other words, I want the input format and output format completely
>> decoupled.  If I pass ^{sha1}, I am indicating the input format.  To
>> specify the output format, I'd use --output-format instead.
>
> This is also a reasonable thing to want, but I don't see how it can be
> sensibly squared with the existing peel syntax.

All the weight here is on the word "sensibly".  Currently, ^{thing}
means "act on the object" and @{thing} means "act on the ref".  This
^{sha1} syntax is really a new kind of modifier, ~{thing}, meaning
"act on the string".

That said, we can make it do anything we want.  There is nothing
forcing us to make it more similar to ^{commit} than to
^{/searchstring}, say.

In that context:

[...]
>> Ævar Arnfjörð Bjarmason wrote:

>>> Similarly, I think it would be very useful if we just make this work:
>>>
>>> git rev-parse $some_hash^{sha256}^{commit}
>>>
>>> And not care whether $some_hash is SHA-1 or SHA-256, if it's the former
>>> we'd consult the SHA-1 <-> SHA-256 lookup table and go from there, and
>>> always return a useful value.
>>
>> The opposite of this. :)
>
> Can you elaborate on that?

What I'm saying is, regardless of the syntax used, as a user I *need*
a way to look up $some_hash as a sha256-name, with zero risk of Git
trying to outsmart me and treating $some_hash as a sha1-name instead.

Any design without that capability is a non-starter.

[...]
> I.e. if I'm using this in a script I'd need:
>
> if x = git rev-parse $some_hash^{sha256}^{commit}
> hash = x
> elsif x = git rev-parse $some_hash^{sha1}^{commit}
> hash = x
> endif

Why wouldn't you use "git rev-parse $some_hash^{commit}" instead?

Thanks,
Jonathan


Re: [PATCH 2/2] submodule.c: warn about missing submodule git directories

2018-08-29 Thread Stefan Beller
On Wed, Aug 29, 2018 at 10:17 AM Junio C Hamano  wrote:
>
> Stefan Beller  writes:
>
> > Not quite, as this is ...
> >
> > Looking at the test in the previous patch, I would think a reasonable workflow
> > in the test is ...
> >
> >> The MOVE_HEAD_FORCE codepath that follows this hunk is, eh, already
> >> forcing to correct the situation, so there is no need to touch that,
> >> which makes sense, if I understand correctly.
> >
> > No, that is not executed for now as it depends on 'old_head'.
>
> All explanation worth having in the log message to help future
> readers, don't you think?

ok, will do.

Thanks,
Stefan


Re: [PATCH 1/2] tests: fix non-portable "${var:-"str"}" construct

2018-08-29 Thread Ann T Ropea
Ævar Arnfjörð Bjarmason  writes:

> Fix this by removing the redundant quotes. There's no need for them,
> and the resulting code works under all the aforementioned shells. This
> fixes a regression in c2f1d3989 ("t4013: test new output from diff
> --abbrev --raw", 2017-12-03) first released with Git v2.16.0.

Acked-by: Ann T Ropea 

Thanks.


Re: Git Unrelated Histories

2018-08-29 Thread Stefan Beller
+cc the mailing list

> > >
> > > git merge development_feature_branch
> > > fatal: refusing to merge unrelated histories
> > >
> >
> > See the git merge man page for the
> >   --allow-unrelated-histories switch.
>
> I have tried that switch, however when merging a small feature branch,
> the merge tries to merge ALL of the differences between both branches,
> and they have diverged quite a long while ago.
> I am not getting the expected behaviour of just merging the changes
> from the feature branch.
>
> git-merge-base shows a common ancestor from 2 months ago,  btw.
>


Fwd: Git Unrelated Histories

2018-08-29 Thread Tomas Zubiri
Thank you for taking the time to help me Stefan


On Aug 29, 2018 15:15, "Stefan Beller"  wrote:
>
> On Wed, Aug 29, 2018 at 9:49 AM Tomas Zubiri  wrote:
> >
> > Hello all,
> >
> > I have recently joined a team and there seem to be a couple of issues
> > with the git repositories:
> >
> >
> > 1- A branch created from development cannot be merged into the
> > production branch.
> >
> >
> >
> > (production)
> >
> > git merge development_feature_branch
> >
> >
> >
> > fatal: refusing to merge unrelated histories
> >
>
> See the git merge man page for the
>   --allow-unrelated-histories switch.

I have tried that switch, however when merging a small feature branch,
the merge tries to merge ALL of the differences between both branches,
and they have diverged quite a long while ago.
I am not getting the expected behaviour of just merging the changes
from the feature branch.

git-merge-base shows a common ancestor from 2 months ago,  btw.


>
> >
> >
> >
> > 2- If there is a file that only has a 1 line difference in production,
> > a git diff will return that the whole file is different:
> >
> > git diff production:folder development:folder
> >
> >
> > “
> > diff --git a/folder/file.py b/folder/file.py
> >
> > index 9bfd6612..20cce520 100644
> >
> > --- a/folder/file.py
> >
> > +++ b/folder/file.py
> >
> > @@ -1,245 +1,245 @@
> >
> > “
> >
> > I’m not 100% sure what happened here. But it seems that changes and
> > added files are copied and pasted into the production branch and
> > uploaded independently as separate files, contributing to a huge
> > difference between branches.
>
> It sounds to me as if there would be line ending issues or some sort
> of whitespace issues (tab vs spaces).
>
> >
> > How can I confirm this hypothesis, and what steps can I take to solve it?
>
> git diff --ignore-all-space
> or --ignore-space-at-eol

This worked. Thank you!

>
> Look at gitattributes to set your flavor for files so you don't have
> to pass these flags all the time
>
> Stefan


[RFC] revision: Don't let ^ cancel out the default

2018-08-29 Thread Andreas Gruenbacher
Some commands like 'log' default to HEAD if no other revisions are
specified on the command line or otherwise.  Unfortunately, excludes
(^) cancel out that default, so when a command only excludes
revisions (e.g., 'git log ^origin/master'), the command will not produce
any result.

If all the specified revisions are excludes, it seems more useful to
stick with the default revision instead.

This makes writing wrappers that exclude certain revisions much easier:
for example, a simple alias l='git log ^origin/master' will show the
revisions between origin/master and HEAD by default, and 'l foo' will
show the revisions between origin/master and foo, as you would usually
expect.

Signed-off-by: Andreas Gruenbacher 
---
 revision.c | 18 ++
 t/t4202-log.sh |  6 ++
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/revision.c b/revision.c
index de4dce600..c2c51bd5d 100644
--- a/revision.c
+++ b/revision.c
@@ -1158,6 +1158,18 @@ static void add_rev_cmdline(struct rev_info *revs,
info->nr++;
 }
 
+static int has_interesting_revisions(struct rev_info *revs)
+{
+   struct rev_cmdline_info *info = &revs->cmdline;
+   unsigned int n;
+
+   for (n = 0; n < info->nr; n++) {
+   if (!(info->rev[n].flags & UNINTERESTING))
+   return 1;
+   }
+   return 0;
+}
+
 static void add_rev_cmdline_list(struct rev_info *revs,
 struct commit_list *commit_list,
 int whence,
@@ -2318,7 +2330,7 @@ static void NORETURN diagnose_missing_default(const char *def)
  */
 int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct setup_revision_opt *opt)
 {
-   int i, flags, left, seen_dashdash, read_from_stdin, got_rev_arg = 0, revarg_opt;
+   int i, flags, left, seen_dashdash, read_from_stdin, revarg_opt;
struct argv_array prune_data = ARGV_ARRAY_INIT;
const char *submodule = NULL;
 
@@ -2401,8 +2413,6 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
argv_array_pushv(&prune_data, argv + i);
break;
}
-   else
-   got_rev_arg = 1;
}
 
if (prune_data.argc) {
@@ -2431,7 +2441,7 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
opt->tweak(revs, opt);
if (revs->show_merge)
prepare_show_merge(revs);
-   if (revs->def && !revs->pending.nr && !revs->rev_input_given && !got_rev_arg) {
+   if (revs->def && !revs->rev_input_given && !has_interesting_revisions(revs)) {
struct object_id oid;
struct object *object;
struct object_context oc;
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 153a50615..e6670859c 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -213,6 +213,12 @@ test_expect_success 'git show  leaves list of commits as given' '
test_cmp expect actual
 '
 
+printf "sixth\nfifth\n" > expect
+test_expect_success '^' '
+   git log --pretty="tformat:%s" ^HEAD~2 > actual &&
+   test_cmp expect actual
+'
+
 test_expect_success 'setup case sensitivity tests' '
echo case >one &&
test_tick &&
-- 
2.17.1



Re: How is the ^{sha256} peel syntax supposed to work?

2018-08-29 Thread Ævar Arnfjörð Bjarmason


On Wed, Aug 29 2018, Jonathan Nieder wrote:

> Hi,
>
> Ævar Arnfjörð Bjarmason wrote:
>> On Wed, Aug 29 2018, Jonathan Nieder wrote:
>
>>> what objects would you expect the following to refer to?
>>>
>>>   abcdabcd^{sha1}
>>>   abcdabcd^{sha256}
>>>   ef01ef01^{sha1}
>>>   ef01ef01^{sha256}
>>
>> I still can't really make any sense of why anyone would even want #2 as
>> described above, but for this third case I think we should do this:
>>
>> abcdabcd^{sha1}   = abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd
>> abcdabcd^{sha256} = ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
>> ef01ef01^{sha1}   = ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
>> ef01ef01^{sha256} = abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd...
>>
>> I.e. a really useful thing about this peel syntax is that it's
>> forgiving, and will try to optimistically look up what you want.
>
> Sorry, I'm still not understanding.
>
> I am not attached to any particular syntax, but what I really want is
> the following:
>
>   Someone who only uses SHA-256 sent me the commit id
>   abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd... out of band.
>   Show me that commit.

This is reasonable.

>   I don't care what object id you show me when you show that
>   commit.  If I pass --output-format=sha1, then that means I
>   care, and show me the SHA-1.
>
> In other words, I want the input format and output format completely
> decoupled.  If I pass ^{sha1}, I am indicating the input format.  To
> specify the output format, I'd use --output-format instead.

This is also a reasonable thing to want, but I don't see how it can be
sensibly squared with the existing peel syntax.

The peel syntax <rev>^{commit} doesn't mean <rev> is a commit, it
means that thing might be some thing (commit, tag), and it should be
(recursively if needed) *resolved* as the thing on the RHS.

So to be consistent <rev>^{sha1} shouldn't mean <rev> is SHA-1, but
that I want a SHA-1 out of <rev>.

> That lets me mix both hash functions in my input:
>
>   git --output-format=sha256 diff abcdabcd^{sha1} abcdabcd^{sha256}

Presumably you mean something like:

 git diff-tree --raw -r -p abcdabcd^{sha1} abcdabcd^{sha256}

I.e. we don't show any sort of SHAs in diff output, so what would this
--output-format=sha256 mean?

> I learned about these two commits out of band from different users,
> one who only uses SHA-1 and the other who only uses SHA-256.

I think for those cases we would just support:

 git diff-tree --raw -r -p abcdabcd abcdabcd

I.e. there's no need to specify the hash type, unless the two happen to
be ambiguous, but yeah, if that's the case we'd need to peel them (or
supply more hexdigits).

> In other words:
>
> [...]
>> Similarly, I think it would be very useful if we just make this work:
>>
>> git rev-parse $some_hash^{sha256}^{commit}
>>
>> And not care whether $some_hash is SHA-1 or SHA-256, if it's the former
>> we'd consult the SHA-1 <-> SHA-256 lookup table and go from there, and
>> always return a useful value.
>
> The opposite of this. :)

Can you elaborate on that? What do you think that should do? Return an
error if $some_hash is SHA-1, even though we have a $some_hash =
$some_hash_256 mapping?

I.e. if I'm using this in a script I'd need:

if x = git rev-parse $some_hash^{sha256}^{commit}
hash = x
elsif x = git rev-parse $some_hash^{sha1}^{commit}
hash = x
endif

As opposed to the thing I'm saying is the redeeming quality of the peel
syntax:

hash = git rev-parse $some_hash^{sha256}^{commit}


Re: How is the ^{sha256} peel syntax supposed to work?

2018-08-29 Thread Jonathan Nieder
Hi,

Ævar Arnfjörð Bjarmason wrote:
> On Wed, Aug 29 2018, Jonathan Nieder wrote:

>> what objects would you expect the following to refer to?
>>
>>   abcdabcd^{sha1}
>>   abcdabcd^{sha256}
>>   ef01ef01^{sha1}
>>   ef01ef01^{sha256}
>
> I still can't really make any sense of why anyone would even want #2 as
> described above, but for this third case I think we should do this:
>
> abcdabcd^{sha1}   = abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd
> abcdabcd^{sha256} = ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
> ef01ef01^{sha1}   = ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
> ef01ef01^{sha256} = abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd...
>
> I.e. a really useful thing about this peel syntax is that it's
> forgiving, and will try to optimistically look up what you want.

Sorry, I'm still not understanding.

I am not attached to any particular syntax, but what I really want is
the following:

Someone who only uses SHA-256 sent me the commit id
abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd... out of band.
Show me that commit.

I don't care what object id you show me when you show that
commit.  If I pass --output-format=sha1, then that means I
care, and show me the SHA-1.

In other words, I want the input format and output format completely
decoupled.  If I pass ^{sha1}, I am indicating the input format.  To
specify the output format, I'd use --output-format instead.

That lets me mix both hash functions in my input:

git --output-format=sha256 diff abcdabcd^{sha1} abcdabcd^{sha256}

I learned about these two commits out of band from different users,
one who only uses SHA-1 and the other who only uses SHA-256.

In other words:

[...]
> Similarly, I think it would be very useful if we just make this work:
>
> git rev-parse $some_hash^{sha256}^{commit}
>
> And not care whether $some_hash is SHA-1 or SHA-256, if it's the former
> we'd consult the SHA-1 <-> SHA-256 lookup table and go from there, and
> always return a useful value.

The opposite of this. :)

Thanks,
Jonathan


Re: Trivial enhancement: All commands which require an author should accept --author

2018-08-29 Thread Junio C Hamano
Johannes Schindelin  writes:

> The `stash` command only incidentally requires that the author is set, as
> it calls `git commit` internally (which records the author). As stashes
> are intended to be local only, that author information was never meant to
> be a vital part of the `stash`.
>
> I could imagine that an even better enhancement request would ask for `git
> stash` to work even if `user.name` is not configured.

This would make a good bite-sized microproject, worth marking it as
#leftoverbits unless somebody is already working on it ;-)


Re: How is the ^{sha256} peel syntax supposed to work?

2018-08-29 Thread Ævar Arnfjörð Bjarmason


On Wed, Aug 29 2018, Jonathan Nieder wrote:

> Stefan Beller wrote:
>
>>  And with that model, <hexvalue>^{sha256}^{tree}
>> could mean to obtain the sha256 value of <hexvalue> and then derive
>> the tree from that object,
>
> What does "the sha256 value of <hexvalue>" mean?
>
> For example, in a repository with two objects:
>
>  1. an object with sha1-name abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd
> and sha256-name ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
>
>  2. an object with sha1-name ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
> and sha256-name abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd...

I'm not saying this makes sense, or that it doesn't; honestly my head's
still spinning a bit from this mail exchange (these are the patches I
need to re-submit):
https://public-inbox.org/git/878t8txfyf@evledraar.gmail.com/#t

But paraphrasing my understanding of what Junio & Jeff are saying in
that thread, basically what the peel syntax means is different in the
two completely different scenarios it's used:

 1. When it's being used as <sha1>^{<type>}[...^{<type>}] AND
<sha1> is an unambiguous SHA1 it's fairly straightforward, i.e. if
<sha1> is a commit and you say ^{tree} it lists the tree SHA-1,
but if <sha1> is e.g. a tree and you say ^{blob} it produces an
error, since there's no one blob.

 2. When it's used in the same way, but <sha1> is an ambiguous SHA1 we
fall back on a completely different sort of behavior.

Now it's (or well, is supposed to be; I haven't worked through the
feedback and rewritten the patches) this weird sort of filter syntax
where <sha1>^{<type>} will return SHA1s of <type> starting with
a prefix of <sha1> IF the types of such SHA1s could be
contained within that type of object.

So e.g. abcabc^{tree} is supposed to list all tree and blob objects
starting with a prefix of abcabc, even though some of the blobs
could not be reachable from those trees.

It doesn't make sense to me, but there it is.

Now, because of this SHA1 v.s. SHA256 thing we have a third case.

> what objects would you expect the following to refer to?
>
>   abcdabcd^{sha1}
>   abcdabcd^{sha256}
>   ef01ef01^{sha1}
>   ef01ef01^{sha256}

I still can't really make any sense of why anyone would even want #2 as
described above, but for this third case I think we should do this:

abcdabcd^{sha1}   = abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd
abcdabcd^{sha256} = ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
ef01ef01^{sha1}   = ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
ef01ef01^{sha256} = abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd...

I.e. a really useful thing about this peel syntax is that it's
forgiving, and will try to optimistically look up what you want.

So e.g. <rev>^{commit} is not an error if <rev> is already a commit,
it could be (why are you trying to peel something already peeled!),
because it's useful to be able to feed it a set of things, some of which
are commits, some of which are tags, and have it always resolve things
without error handling on the caller side.

Similarly, I think it would be very useful if we just make this work:

git rev-parse $some_hash^{sha256}^{commit}

And not care whether $some_hash is SHA-1 or SHA-256, if it's the former
we'd consult the SHA-1 <-> SHA-256 lookup table and go from there, and
always return a useful value.


Re: How is the ^{sha256} peel syntax supposed to work?

2018-08-29 Thread Stefan Beller
On Wed, Aug 29, 2018 at 10:59 AM Jonathan Nieder  wrote:
>
> Stefan Beller wrote:
>
> >  And with that model, <hexvalue>^{sha256}^{tree}
> > could mean to obtain the sha256 value of <hexvalue> and then derive
> > the tree from that object,
>
> What does "the sha256 value of <hexvalue>" mean?

s/hexvalue/hexdigits/
..

And with that model, <hexdigits>^{sha256}^{tree}
could mean to obtain the object using sha256 descriptors
(for trees/blobs/commits/tags) of <hexdigits> (as defined by
the step of the transition plan, it could mean <hexdigits>
to be interpreted as SHA1 or SHA256 or DWIM).

>
> For example, in a repository with two objects:
>
>  1. an object with sha1-name abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd
> and sha256-name ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
>
>  2. an object with sha1-name ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
> and sha256-name abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd...
>

It would be super cool to have hash values to match vice versa,
but for the sake of the example, let's go with that.

> what objects would you expect the following to refer to?

That generally depends on the step of the transition plan
in that specific Git repository.

I thought these format specifiers would only describe how to
output the object names correctly for now, so it could be
possible to have:

$ git show abcdabcd^{sha1}
commit abcdabcd...


$ git show abcdabcd^{sha256}
commit ef01ef01e...


in one step and

$ git show abcdabcd^{sha1}
commit ef01ef01e...


$ git show abcdabcd^{sha256}
commit abcdabcd...


in another step, and in yet another step it could mean

$ git show abcdabcd^{sha1}
commit abcdabcd[...]^{sha1}
...

But my question was more hinting to the point that we should not
overload the syntax to mean much more than either output formatting
or hash selection.

The third meaning could be used for verifying objects as we could
use this syntax to mean

  "please verify the signature of the object (as given by ^{hash})"

or it could mean

  "please verify the signature of the object as given and ensure that
it was signed in this ^{hash} and not in a weaker hash world".

And I would think all the verification should not be folded into this
notation for now, but we only want to ask for the output to be
one or the other hash, or we could ask for an object that is
<hexdigits> in the specified hash, but these two modes depend
on the step in the transition plan.

Stefan


Re: Git Unrelated Histories

2018-08-29 Thread Stefan Beller
On Wed, Aug 29, 2018 at 9:49 AM Tomas Zubiri  wrote:
>
> Hello all,
>
> I have recently joined a team and there seem to be a couple of issues
> with the git repositories:
>
>
> 1- A branch created from development cannot be merged into the
> production branch.
>
>
>
> (production)
>
> git merge development_feature_branch
>
>
>
> fatal: refusing to merge unrelated histories
>

See the git merge man page for the
  --allow-unrelated-histories switch.

>
>
>
> 2- If there is a file that only has a 1 line difference in production,
> a git diff will return that the whole file is different:
>
> git diff production:folder development:folder
>
>
> “
> diff --git a/folder/file.py b/folder/file.py
>
> index 9bfd6612..20cce520 100644
>
> --- a/folder/file.py
>
> +++ b/folder/file.py
>
> @@ -1,245 +1,245 @@
>
> “
>
> I’m not 100% sure what happened here. But it seems that changes and
> added files are copied and pasted into the production branch and
> uploaded independently as separate files, contributing to a huge
> difference between branches.

It sounds to me as if there would be line ending issues or some sort
of whitespace issues (tab vs spaces).

>
> How can I confirm this hypothesis, and what steps can I take to solve it?

git diff --ignore-all-space
or --ignore-space-at-eol

Look at gitattributes to set your flavor for files so you don't have
to pass these flags all the time

Stefan


Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-29 Thread Stefan Beller
On Tue, Aug 28, 2018 at 10:25 PM Jeff King  wrote:
>
> On Tue, Aug 28, 2018 at 02:35:25PM -0700, Stefan Beller wrote:
>
> > 3) (optional) instead of putting it all in modules/, use another
> >directory gitmodules/ for example. this will make sure we can tell
> >if a repository has been converted or is stuck with a setup of a
> >current git.
>
> I actually kind of like that idea, as it makes the interaction between
> old and new names much simpler to reason about.
>
> And since old code won't know about the new names anyway, there's in
> theory no downside. In practice, of course, the encoding may often be a
> noop, and lazy scripts would continue to work most of the time if you
> didn't change out the prefix directory. I'm not sure if that is an
> argument for the scheme (because it will suss out broken scripts more
> consistently) or against it (because 99% of the time those old scripts
> would just happen to work).
>
> > > This is exactly the reason why I wanted to get some opinions on what the
> > > best thing to do here would be.  I _think_ the best thing would probably
> > > be to write a specific routine to do the conversion, and it wouldn't
> > > even have to be all that complex.  Basically I'm just interested in
> > > converting '/' characters so that things no longer behave like
> > > nested directories.
> >
> > Yeah, then let's just convert '/' with as little overhead as possible.
>
> Do you care about case-folding issues (e.g., submodules "FOO" and "foo"
> colliding)?

I do. :(

2d84f13dcb6 (config: fix case sensitive subsection names on writing, 2018-08-08)
explains the latest episode of case folding with submodules involved.

> I'm OK if the answer is "no", but if you do want to deal with it, the
> time is probably now.

Good point. But as soon as we start discussing case sensitivity, we
are drawn down the rabbit hole of funny file names. (Try naming
a submodule "CON1" and obtain it on Windows for example)
So we would need to have a file system specific encoding function for
submodule names, which sounds like a maintenance nightmare.

The CON1 example shows that URL encoding may not be enough
on Windows and we'd have to extend the encoding if we care about
FS issues.

Another example would be "a" and "a\b", which would be a mess
on Windows as the '\' would work as a dir separator, whereas these
two names are ok on Linux. This would be fixed with URL encoding.

URL encoding would not fix the case-folding issue that you
mentioned above.

So if I was thinking in the scheme presented above, we could just
have another rule that is

  [A-Z]  -> _[a-z]

(lowercase capital letters and escape them with an underscore)

But with that rule added, we are inventing a really complicated
encoding scheme already.


Re: How is the ^{sha256} peel syntax supposed to work?

2018-08-29 Thread Jonathan Nieder
Stefan Beller wrote:

>  And with that model, <hexvalue>^{sha256}^{tree}
> could mean to obtain the sha256 value of <hexvalue> and then derive
> the tree from that object,

What does "the sha256 value of <hexvalue>" mean?

For example, in a repository with two objects:

 1. an object with sha1-name abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd
and sha256-name ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01

 2. an object with sha1-name ef01ef01ef01ef01ef01ef01ef01ef01ef01ef01
and sha256-name abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd...

what objects would you expect the following to refer to?

  abcdabcd^{sha1}
  abcdabcd^{sha256}
  ef01ef01^{sha1}
  ef01ef01^{sha256}

Thanks,
Jonathan


Re: How is the ^{sha256} peel syntax supposed to work?

2018-08-29 Thread Jonathan Nieder
Hi,

Ævar Arnfjörð Bjarmason wrote:
> On Fri, Aug 24 2018, Jonathan Nieder wrote:
>> Ævar Arnfjörð Bjarmason wrote:

>>> Or is this expected to be chained, as e.g. ^{tag}^{sha256} ?
>>
>> Great question.  The latter (well, ^{sha256}^{tag}, not the
>> other way around).
>
> Since nobody's chimed in with an answer, and I suspect many have an
> adversion to that big thread I thought I'd spin out just this small
> question into its own thread.
>
> brian m. carlson did some prep work for this in his just-submitted
> https://public-inbox.org/git/20180829005857.980820-2-sand...@crustytoothpaste.net/
>
> I was going to work on some of the peel code soon (digging up the type
> disambiguation patches I still need to re-submit), so could do this
> while I'm at it, i.e. implement ^{sha1}.

Cool!

> But as noted above it's not clear how it should work. Jonathan's
> chaining suggestion (^{sha256}^{tag} not
> ^{tag}^{sha256}) makes more sense than mine, but is that what
> we're going for, or ^{sha256:tag}?

I don't have a strong opinion about this, but since it affects the
interpretation of <rev>, my assumption has been that, in the
spirit of referential transparency, you would put
'<rev>^{format}' and could put any additional specifiers after
that.

In other words, <rev>^{format} changes the interpretation of <rev> so
my assumption is that people would want it to be close by.

But if something else is easier to implement, we can start with that
something else and figure out whether we like it in review.

Thanks,
Jonathan


Re: How is the ^{sha256} peel syntax supposed to work?

2018-08-29 Thread Stefan Beller
On Wed, Aug 29, 2018 at 2:13 AM Ævar Arnfjörð Bjarmason
 wrote:
>
>
> On Fri, Aug 24 2018, Jonathan Nieder wrote:
>
> > Hi,
> >
> > Ævar Arnfjörð Bjarmason wrote:
> >
> >>> git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256}
> >>
> >> How is this going to interact with other peel syntax? I.e. now we have
> >> <rev>^{commit} <rev>^{tag} etc. It seems to me we'll need not <rev>^{sha1}
> >> but <rev>^{sha1:<type>}, e.g. <rev>^{sha1:commit} or <rev>^{sha1:tag}, with
> >> current <rev>^{<type>} being a synonym for <rev>^{sha1:<type>}.
> >>
> >> Or is this expected to be chained, as e.g. <rev>^{tag}^{sha256} ?
> >
> > Great question.  The latter (well, <rev>^{sha256}^{tag}, not the
> > other way around).
>
> Since nobody's chimed in with an answer, and I suspect many have an
> aversion to that big thread I thought I'd spin out just this small
> question into its own thread.
>
> brian m. carlson did some prep work for this in his just-submitted
> https://public-inbox.org/git/20180829005857.980820-2-sand...@crustytoothpaste.net/
>
> I was going to work on some of the peel code soon (digging up the type
> disambiguation patches I still need to re-submit), so could do this
> while I'm at it, i.e. implement <rev>^{sha1}.
>
> But as noted above it's not clear how it should work. Jonathan's
> chaining suggestion (<rev>^{sha256}^{tag} not
> <rev>^{tag}^{sha256}) makes more sense than mine, but is that what
> we're going for, or <rev>^{sha256:tag}?

The choice of hash seems position independent to me, so as a user
I would expect both to work at first. Though when looking at more
syntax of these expressions, e.g. b9dfa238d5c34~1^2^^, it is
read left to right, i.e. you arrive at the destination by evaluating
the next part of the expression and then jumping around based on
each expression. And with that model, <rev>^{sha256}^{tree}
could mean to obtain the sha256 value of <rev> and then derive
the tree from that object, so it is unclear whether the tree object would
also come in sha256 or if we could just return the tree in sha1 notation
(as it would be correctly - though confusingly - described that way; the
sha256 conversion happened at an intermediate step).

So with that said, I would expect the hash specifier at the end of the chain.
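That left-to-right reading can be sketched over a toy object graph (Python used only for illustration; the ^{sha256} specifier is hypothetical and the graph is made up):

```python
# A toy three-object chain: tag -> commit -> tree. Each object carries
# a type; 'target' points at what it dereferences to (the commit's tree
# pointer is simplified into 'target' here).
OBJECTS = {
    "tag1":    {"type": "tag",    "target": "commit1"},
    "commit1": {"type": "commit", "target": "tree1"},
    "tree1":   {"type": "tree"},
}

def evaluate(name, specs):
    """Apply peel/format specifiers left to right, as described above."""
    fmt = "sha1"
    for spec in specs:
        if spec in ("sha1", "sha256"):
            fmt = spec  # switch which hash namespace names are given in
        else:
            while OBJECTS[name]["type"] != spec:
                name = OBJECTS[name]["target"]
    return name, fmt
```

Both evaluate("tag1", ["sha256", "tree"]) and evaluate("tag1", ["tree", "sha256"]) land on the same tree object; the open question is only which hash names the intermediate and final objects.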

Would the position of the hash specifier make any difference for
verifying signed tags/commits ? (subtle asking to verify the sha1
signature or the sha256 signature explicitly vs asking to verify an object
that is given with  in sha1 or in sha256)

Thanks,
Stefan


Re: [PATCH v3] doc: Don't echo sed command for manpage-base-url.xsl

2018-08-29 Thread Tim Schumacher

On 29.08.18 18:55, Jonathan Nieder wrote:

Tim Schumacher wrote:


Subject: doc: Don't echo sed command for manpage-base-url.xsl


Cribbing from my review of v2: a description like

Documentation/Makefile: make manpage-base-url.xsl generation quieter

would make it more obvious what this does when viewed in "git log
--oneline".


IMHO, the "Documentation/Makefile" prefix is a bit too long (about 1/3 of the
first line). Would it be advisable to keep the previous prefix of "doc"
(which would shorten the line from 68 characters down to 49) and
to use only the second part of your proposed first line? Depending on
the response I would address this in v4 of this patch.




Previously, the sed command for generating manpage-base-url.xsl
was printed to the console when being run.

Make the console output for this rule similar to all the
other rules by printing a short status message instead of
the whole command.

Signed-off-by: Tim Schumacher 
---
  Documentation/Makefile | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Oh!  Ignore my reply to v2; looks like you anticipated what I was
going to suggest already.  For next time, if you include a note about
what changed between versions after the --- delimiter, that can help
save some time.



The change to QUIET_GEN was proposed by Junio, but that E-Mail
wasn't CC'ed to the mailing list, probably due to him typing
the response on a phone.

I originally included a note about the change as well, but I
forgot to copy it over to the new patch after I generated a
second version of v3.


With or without the suggested commit message tweak,

Reviewed-by: Jonathan Nieder 

Thank you.



Thanks for the review!


Re: [PATCH 2/2] submodule.c: warn about missing submodule git directories

2018-08-29 Thread Junio C Hamano
Stefan Beller  writes:

> Not quite, as this is ...
>
> Looking at the test in the previous patch, I would think a reasonable workflow
> in the test is ...
> 
>> The MOVE_HEAD_FORCE codepath that follows this hunk is, eh, already
>> forcing to correct the situation, so there is no need to touch that,
>> which makes sense, if I understand correctly.
>
> No, that is not executed for now as it depends on 'old_head'.

All explanation worth having in the log message to help future
readers, don't you think?

Thanks.


Re: [PATCH v2 1/3] read-cache: speed up index load through parallelization

2018-08-29 Thread Junio C Hamano
Ben Peart  writes:

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 1c42364988..79f8296d9c 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2391,6 +2391,12 @@ imap::
>   The configuration variables in the 'imap' section are described
>   in linkgit:git-imap-send[1].
>  
> +index.threads::
> + Specifies the number of threads to spawn when loading the index.
> + This is meant to reduce index load time on multiprocessor machines.
> + Specifying 0 or 'true' will cause Git to auto-detect the number of
> + CPUs and set the number of threads accordingly. Defaults to 'true'.

"0 or 'true' means 'auto'" made me go "Huh?"

The "Huh?"  I initially felt comes from the fact that usually 0 and
false are interchangeable, but for this particular application,
"disabling" the threading means setting the count to one (not zero),
leaving us zero as a usable "special value" to signal 'auto'.

So the end result does make sense, especially with this bit ...

> diff --git a/config.c b/config.c
> index 9a0b10d4bc..3bda124550 100644
> --- a/config.c
> +++ b/config.c
> @@ -2289,6 +2289,20 @@ int git_config_get_fsmonitor(void)
> ...
> + if (!git_config_get_bool_or_int("index.threads", &is_bool, &val)) {
> + if (is_bool)
> + return val ? 0 : 1;
> + else
> + return val;

... which says "'0' and 'true' are the same and yields 0, '1' and
'false' yields 1, and '2' and above will give the int".  

Adding something like

You can disable multi-threaded code by setting this variable
to 'false' (or 1).

may reduce the risk of a similar "Huh?" reaction by other readers.
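Modeled as a tiny function (a sketch in Python, not the actual C code), the rule Junio spells out reads:

```python
def index_threads_setting(value):
    """Model the index.threads parsing discussed above: a boolean
    'true' (or the integer 0) means auto-detect, a boolean 'false'
    (or the integer 1) means single-threaded, and any larger integer
    is the thread count itself."""
    if isinstance(value, bool):
        return 0 if value else 1  # true -> auto (0), false -> 1 thread
    return int(value)             # integers pass through unchanged
```

So 0 and 'true' are interchangeable (auto), while 'false' is interchangeable with 1 (disabled), which is exactly the asymmetry that prompted the "Huh?" above.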

> +struct load_cache_entries_thread_data
> +{
> + pthread_t pthread;
> + struct index_state *istate;
> + struct mem_pool *ce_mem_pool;
> + int offset, nr;
> + void *mmap;
> + unsigned long start_offset;
> + struct strbuf previous_name_buf;
> + struct strbuf *previous_name;
> + unsigned long consumed; /* return # of bytes in index file processed */
> +};

We saw that Duy's "let's not use strbuf to remember the previous
name but instead use the previous ce" approach gave us a nice
performance boost; I wonder if we can build on that idea here?

One possible approach might be to create one ce per "block" in the
pre-scanning thread and use that ce as the "previous one" in the
per-thread data before spawning a worker.

> +static unsigned long load_cache_entries(struct index_state *istate,
> + void *mmap, size_t mmap_size, unsigned long src_offset)
> +{
> + struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
> + struct load_cache_entries_thread_data *data;
> + int nr_threads, cpus, ce_per_thread;
> + unsigned long consumed;
> + int i, thread;
> +
> + nr_threads = git_config_get_index_threads();
> + if (!nr_threads) {
> + cpus = online_cpus();
> + nr_threads = istate->cache_nr / THREAD_COST;

Here, nr_threads could become 0 with a small index, but any value
below 2 makes us call load_all_cache_entries() by the main thread
(and the value of nr_threads is not used anymore), so it is fine.  Of
course, a forced test will set it to 2 so there is no problem, either.

OK.
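The fallback just walked through can be sketched like this; THREAD_COST's value and the CPU cap are assumptions based on the hunk quoted above, which is truncated:

```python
THREAD_COST = 10000  # assumed entries-per-thread threshold; illustrative

def choose_nr_threads(configured, cache_nr, cpus):
    """0 from the config means auto-detect: scale the thread count with
    index size, never exceeding the CPU count. Any result below 2 means
    the caller falls back to the single-threaded
    load_all_cache_entries() path."""
    nr = configured
    if nr == 0:
        nr = min(cpus, cache_nr // THREAD_COST)
    return nr if nr >= 2 else 1
```

A small index thus never pays thread-spawn overhead, while a huge one is bounded by the number of CPUs.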

> + /* a little sanity checking */
> + if (istate->name_hash_initialized)
> + die("the name hash isn't thread safe");

If it is a programming error to call into this codepath without
initializing the name_hash, which I think is the case, this is
better done with BUG("").

The remainder of the patch looked good.  Thanks.


Re: [PATCH] read-cache.c: optimize reading index format v4

2018-08-29 Thread Junio C Hamano
Duy Nguyen  writes:

> Yeah I kinda hated dummy_entry too but the feeling wasn't strong
> enough to move towards the index->version check. I guess I'm going to
> do it now.

Sounds like a plan.  Thanks again for a pleasant read.


Re: [PATCH v2 2/3] read-cache: load cache extensions on worker thread

2018-08-29 Thread Junio C Hamano
Ben Peart  writes:

> This is possible because the current extensions don't access the cache
> entries in the index_state structure so are OK that they don't all exist
> yet.
>
> The CACHE_EXT_TREE, CACHE_EXT_RESOLVE_UNDO, and CACHE_EXT_UNTRACKED
> extensions don't even get a pointer to the index so don't have access to the
> cache entries.
>
> CACHE_EXT_LINK only uses the index_state to initialize the split index.
> CACHE_EXT_FSMONITOR only uses the index_state to save the fsmonitor last
> update and dirty flags.

Good to see such an analysis here.  Once we define an extension
section, which requires us to have the cache entries before
populating it, this scheme would fall down, of course, but the
extension mechanism is all about protecting ourselves from the
future changes, so we'd at least need a good feel for how we read an
unknown extension from the future with the current code.  Perhaps
just like the main cache entries were pre-scanned to apportion them
to worker threads, we can pre-scan the sections and compare them
with a white-list built into our binary before deciding that it is
safe to read them in parallel (and otherwise, we ask the last thread
for reading extensions to wait until the workers that read the main
index all return)?
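The white-list pre-scan suggested here could look roughly like the following. The record layout (4-byte name, 4-byte network-order size, payload) matches the code quoted below, while the set of "parallel-safe" names is only illustrative:

```python
import struct

# Extension names the analysis above says are safe to read before the
# cache entries exist; treat this set as illustrative, not normative.
PARALLEL_SAFE = {b"TREE", b"REUC", b"UNTR", b"link", b"FSMN"}

def scan_extensions(buf):
    """Walk the extension section and report (names, all_known)."""
    names, off = [], 0
    while off + 8 <= len(buf):
        name = buf[off:off + 4]
        (size,) = struct.unpack(">I", buf[off + 4:off + 8])
        names.append(name)
        off += 8 + size  # skip header and payload
    return names, all(n in PARALLEL_SAFE for n in names)
```

An index written by a future Git containing an unknown extension would make all_known false, at which point the extension thread could wait for the entry-loading workers instead of running in parallel.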

> -/*
> -* A thread proc to run the load_cache_entries() computation
> -* across multiple background threads.
> -*/

This one was mis-indented (lacking SP before '*') but they are gone
so ... ;-)

> @@ -1978,6 +1975,36 @@ static void *load_cache_entries_thread(void *_data)
>   return NULL;
>  }
>  
> +static void *load_index_extensions_thread(void *_data)
> +{
> + struct load_cache_entries_thread_data *p = _data;
> + unsigned long src_offset = p->start_offset;
> +
> + while (src_offset <= p->mmap_size - the_hash_algo->rawsz - 8) {
> + /* After an array of active_nr index entries,
> +  * there can be arbitrary number of extended
> +  * sections, each of which is prefixed with
> +  * extension name (4-byte) and section length
> +  * in 4-byte network byte order.
> +  */
> + uint32_t extsize;
> + memcpy(&extsize, (char *)p->mmap + src_offset + 4, 4);
> + extsize = ntohl(extsize);
> + if (read_index_extension(p->istate,
> + (const char *)p->mmap + src_offset,
> + (char *)p->mmap + src_offset + 8,
> + extsize) < 0) {

Overly deep indentation.  Used a wrong tab-width?

> + /* allocate an extra thread for loading the index extensions */
>   ce_per_thread = DIV_ROUND_UP(istate->cache_nr, nr_threads);
> - data = xcalloc(nr_threads, sizeof(struct load_cache_entries_thread_data));
> + data = xcalloc(nr_threads + 1, sizeof(struct load_cache_entries_thread_data));
>  
>   /*
>* Loop through index entries starting a thread for every ce_per_thread
> -  * entries. Exit the loop when we've created the final thread (no need
> -  * to parse the remaining entries.
> +  * entries.
>*/

I see.  Now the pre-parsing process needs to go through all the
cache entries to find the beginning of the extensions section.

>   consumed = thread = 0;
> - for (i = 0; ; i++) {
> + for (i = 0; i < istate->cache_nr; i++) {
>   struct ondisk_cache_entry *ondisk;
>   const char *name;
>   unsigned int flags;
> @@ -2055,9 +2082,7 @@ static unsigned long load_cache_entries(struct index_state *istate,
>   if (pthread_create(&p->pthread, NULL, load_cache_entries_thread, p))
>   die("unable to create load_cache_entries_thread");
>  
> - /* exit the loop when we've created the last thread */
> - if (++thread == nr_threads)
> - break;
> + ++thread;

This is not C++, and in (void) context, the codebase always prefers
post-increment.

> @@ -2086,7 +2111,18 @@ static unsigned long load_cache_entries(struct index_state *istate,
>   src_offset += (name - ((char *)ondisk)) + expand_name_field(previous_name, name);
>   }
>  
> - for (i = 0; i < nr_threads; i++) {
> + /* create a thread to load the index extensions */
> + struct load_cache_entries_thread_data *p = &data[thread];

This probably triggers decl-after-statement.

> + p->istate = istate;
> + mem_pool_init(&p->ce_mem_pool, 0);
> + p->mmap = mmap;
> + p->mmap_size = mmap_size;
> + p->start_offset = src_offset;
> +
> + if (pthread_create(&p->pthread, NULL, load_index_extensions_thread, p))
> + die("unable to create load_index_extensions_thread");
> +
> + for (i = 0; i < nr_threads + 1; i++) {
>   struct load_cache_entries_thread_data 

Re: [PATCH v3] doc: Don't echo sed command for manpage-base-url.xsl

2018-08-29 Thread Jonathan Nieder
Tim Schumacher wrote:

> Subject: doc: Don't echo sed command for manpage-base-url.xsl

Cribbing from my review of v2: a description like

Documentation/Makefile: make manpage-base-url.xsl generation quieter

would make it more obvious what this does when viewed in "git log
--oneline".

> Previously, the sed command for generating manpage-base-url.xsl
> was printed to the console when being run.
>
> Make the console output for this rule similar to all the
> other rules by printing a short status message instead of
> the whole command.
>
> Signed-off-by: Tim Schumacher 
> ---
>  Documentation/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Oh!  Ignore my reply to v2; looks like you anticipated what I was
going to suggest already.  For next time, if you include a note about
what changed between versions after the --- delimiter, that can help
save some time.

With or without the suggested commit message tweak,

Reviewed-by: Jonathan Nieder 

Thank you.


Re: [PATCH v2] doc: Don't echo sed command for manpage-base-url.xsl

2018-08-29 Thread Jonathan Nieder
Tim Schumacher wrote:

> Subject: doc: Don't echo sed command for manpage-base-url.xsl

At first glance, I thought this was going to change the rendered
documentation in some way.  Maybe this should say

Makefile: make 'sed' commands quieter

?  That led me to look in the Makefile, where it appears we already do
this for some sed commands, using QUIET_GEN.  What would you think of
using the same for manpage-base-url.xsl?

Thanks,
Jonathan


Git Unrelated Histories

2018-08-29 Thread Tomas Zubiri
Hello all,

I have recently joined a team where there seem to be a couple of issues
with the git repositories:


1- A branch created from development cannot be merged into the
production branch.



(production)

git merge development_feature_branch



fatal: refusing to merge unrelated histories




2- If there is a file that only has a 1 line difference in production,
a git diff will return that the whole file is different:

git diff production:folder development:folder


“
diff --git a/folder/file.py b/folder/file.py

index 9bfd6612..20cce520 100644

--- a/folder/file.py

+++ b/folder/file.py

@@ -1,245 +1,245 @@

“

I’m not 100% sure what happened here. But it seems that changes and
added files are copied and pasted into the production branch and
uploaded independently as separate files, contributing to a huge
difference between branches.

How can I confirm this hypothesis, and what steps can I take to solve it?

Thank you.
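On question 1: "refusing to merge unrelated histories" means the two branch tips share no common ancestor, which is what an empty result from `git merge-base production development_feature_branch` would confirm. A toy model of that check, over a hypothetical commit graph (not Git's actual algorithm):

```python
def ancestors(parents, commit):
    """All commits reachable from `commit` through parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def related(parents, a, b):
    """True iff a merge base exists, i.e. the histories are related."""
    return bool(ancestors(parents, a) & ancestors(parents, b))
```

If the production branch really was rebuilt from pasted files (the hypothesis above), its root commit differs from development's, the ancestor sets are disjoint, and related() is False.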


Re: Thank you for public-inbox!

2018-08-29 Thread Ævar Arnfjörð Bjarmason


On Wed, Aug 29 2018, Andrei Rybak wrote:

> On 2018-08-29 12:02, Eric Wong wrote:
>> Anyways I hope to teach public-inbox to auto-linkify Message-ID-looking
>> strings "" into URLs for domain-portability,
>> (but it's ambiguous with email addresses).  But yeah, I don't
>> like things being tied to domain names.
>
> This would be very useful for people who use MUAs without
> Message-ID lookup capabilities, even without domain-portability.

FWIW Many MUAs have some hidden way to do this, for example to find the
E-Mail I'm replying to (your E-Mail) on GMail:
https://mail.google.com/mail/u/0/#search/rfc822msgid%3A3eb1c5e8-3e89-0d2e-30b1-339f38c4c703%40gmail.com

I.e. rfc822msgid:

Confusingly, Google Groups accepts the same syntax but will barf if you
include the <>'s - though that's from memory; maybe I misrecall or they've
fixed it.


Re: Trivial enhancement: All commands which require an author should accept --author

2018-08-29 Thread Johannes Schindelin
Hi Ulrich,

On Tue, 28 Aug 2018, Ulrich Gemkow wrote:

> A trivial enhancement request:
> 
> All commands which require that the author is set (and complain if
> it is not set) should accept the option --author.
> 
> At least the command stash does not accept this option. We are using
> git version 2.17.1 (Ubuntu 18.04).

The `stash` command only incidentally requires that the author is set, as
it calls `git commit` internally (which records the author). As stashes
are intended to be local only, that author information was never meant to
be a vital part of the `stash`.

I could imagine that an even better enhancement request would ask for `git
stash` to work even if `user.name` is not configured.

However, to get you unblocked: what you ask for exists already, in some
form:

git \
-c user.name="Ulrich Gemkow" \
-c user.email=ulrich.gem...@ikr.uni-stuttgart.de \
stash

Granted, this is not the nicest way to specify it, but you are probably
scripting things for environments where you do not really want to
configure an author, right?

Ciao,
Johannes


[PATCH v3] doc: Don't echo sed command for manpage-base-url.xsl

2018-08-29 Thread Tim Schumacher
Previously, the sed command for generating manpage-base-url.xsl
was printed to the console when being run.

Make the console output for this rule similar to all the
other rules by printing a short status message instead of
the whole command.

Signed-off-by: Tim Schumacher 
---
 Documentation/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index a42dcfc74..95f6a321f 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -344,7 +344,7 @@ $(OBSOLETE_HTML): %.html : %.txto asciidoc.conf
mv $@+ $@
 
 manpage-base-url.xsl: manpage-base-url.xsl.in
-   sed "s|@@MAN_BASE_URL@@|$(MAN_BASE_URL)|" $< > $@
+   $(QUIET_GEN)sed "s|@@MAN_BASE_URL@@|$(MAN_BASE_URL)|" $< > $@
 
 %.1 %.5 %.7 : %.xml manpage-base-url.xsl
$(QUIET_XMLTO)$(RM) $@ && \
-- 
2.19.0.rc1.1.g093671f86



Re: Thank you for public-inbox!

2018-08-29 Thread Andrei Rybak
On 2018-08-29 12:02, Eric Wong wrote:
> Anyways I hope to teach public-inbox to auto-linkify Message-ID-looking
> strings "" into URLs for domain-portability,
> (but it's ambiguous with email addresses).  But yeah, I don't
> like things being tied to domain names.

This would be very useful for people who use MUAs without
Message-ID lookup capabilities, even without domain-portability.


[PATCH v2 0/3] read-cache: speed up index load through parallelization

2018-08-29 Thread Ben Peart
The big changes in this iteration are:

- Switched to index.threads to provide control over the use of threading

- Added another worker thread to load the index extensions in parallel

- Applied optimization expand_name_field() suggested by Duy

The net result of these optimizations is a savings of 25.8% (1,000,000 files)
to 38.1% (100,000 files) as measured by p0002-read-cache.sh.

This patch conflicts with Duy's patch to remove the double memory copy and
pass in the previous ce instead.  The two will need to be merged/reconciled
once they settle down a bit.


Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/39f2b0f5fe
Checkout: git fetch https://github.com/benpeart/git read-index-multithread-v2 && git checkout 39f2b0f5fe


### Interdiff (v1..v2):

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 3344685cc4..79f8296d9c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -899,14 +899,6 @@ relatively high IO latencies.  When enabled, Git will do the
 index comparison to the filesystem data in parallel, allowing
 overlapping IO's.  Defaults to true.
 
-core.fastIndex::
-   Enable parallel index loading
-+
-This can speed up operations like 'git diff' and 'git status' especially
-when the index is very large.  When enabled, Git will do the index
-loading from the on disk format to the in-memory format in parallel.
-Defaults to true.
-
 core.createObject::
You can set this to 'link', in which case a hardlink followed by
a delete of the source are used to make sure that object creation
@@ -2399,6 +2391,12 @@ imap::
The configuration variables in the 'imap' section are described
in linkgit:git-imap-send[1].
 
+index.threads::
+   Specifies the number of threads to spawn when loading the index.
+   This is meant to reduce index load time on multiprocessor machines.
+   Specifying 0 or 'true' will cause Git to auto-detect the number of
+   CPUs and set the number of threads accordingly. Defaults to 'true'.
+
 index.version::
Specify the version with which new index files should be
initialized.  This does not affect existing repositories.
diff --git a/config.c b/config.c
index 883092fdd3..3bda124550 100644
--- a/config.c
+++ b/config.c
@@ -2289,17 +2289,18 @@ int git_config_get_fsmonitor(void)
return 0;
 }
 
-int git_config_get_fast_index(void)
+int git_config_get_index_threads(void)
 {
-   int val;
+   int is_bool, val;
 
-   if (!git_config_get_maybe_bool("core.fastindex", &val))
+   if (!git_config_get_bool_or_int("index.threads", &is_bool, &val)) {
+   if (is_bool)
+   return val ? 0 : 1;
+   else
return val;
+   }
 
-   if (getenv("GIT_FASTINDEX_TEST"))
-   return 1;
-
-   return -1; /* default value */
+   return 0; /* auto-detect */
 }
 
 NORETURN
diff --git a/config.h b/config.h
index 74ca4e7db5..a06027e69b 100644
--- a/config.h
+++ b/config.h
@@ -250,7 +250,7 @@ extern int git_config_get_untracked_cache(void);
 extern int git_config_get_split_index(void);
 extern int git_config_get_max_percent_split_change(void);
 extern int git_config_get_fsmonitor(void);
-extern int git_config_get_fast_index(void);
+extern int git_config_get_index_threads(void);
 
 /* This dies if the configured or default date is in the future */
 extern int git_config_get_expiry(const char *key, const char **output);
diff --git a/read-cache.c b/read-cache.c
index 0fa7e1a04c..f5e7c86c42 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -24,10 +24,6 @@
 #include "utf8.h"
 #include "fsmonitor.h"
 
-#ifndef min
-#define min(a,b) (((a) < (b)) ? (a) : (b))
-#endif
-
 /* Mask for the name length in ce_flags in the on-disk index */
 
 #define CE_NAMEMASK  (0x0fff)
@@ -1758,9 +1754,8 @@ static unsigned long expand_name_field(struct strbuf *name, const char *cp_)
 
if (name->len < len)
die("malformed name field in the index");
-   strbuf_remove(name, name->len - len, len);
-   for (ep = cp; *ep; ep++)
-   ; /* find the end */
+   strbuf_setlen(name, name->len - len);
+   ep = cp + strlen((const char *)cp);
strbuf_add(name, cp, ep - cp);
return (const char *)ep + 1 - cp_;
 }
@@ -1893,7 +1888,13 @@ static size_t estimate_cache_size(size_t ondisk_size, unsigned int entries)
return ondisk_size + entries * per_entry;
 }
 
-static unsigned long load_cache_entry_block(struct index_state *istate, struct mem_pool *ce_mem_pool, int offset, int nr, void *mmap, unsigned long start_offset, struct strbuf *previous_name)
+/*
+ * A helper function that will load the specified range of cache entries
+ * from the memory mapped file and add them to the given index.
+ */
+static unsigned long load_cache_entry_block(struct index_state *istate,
+   struct mem_pool *ce_mem_pool, int offset, int nr, void *mmap,
+   

[PATCH v2 1/3] read-cache: speed up index load through parallelization

2018-08-29 Thread Ben Peart
This patch helps address the CPU cost of loading the index by creating
multiple threads to divide the work of loading and converting the cache
entries across all available CPU cores.

It accomplishes this by having the primary thread loop across the index file
tracking the offset and (for V4 indexes) expanding the name. It creates a
thread to process each block of entries as it comes to them. Once the
threads are complete and the cache entries are loaded, the rest of the
extensions can be loaded and processed normally on the primary thread.
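The blocking scheme described above amounts to ceiling-division of the entry count (DIV_ROUND_UP in the patch); a sketch of how the primary thread carves the index into per-thread (offset, count) blocks, not the actual C code:

```python
def partition_entries(cache_nr, nr_threads):
    """Split cache_nr entries into blocks of ceil(cache_nr / nr_threads)
    entries each; every block becomes one worker thread's work item."""
    per_thread = -(-cache_nr // nr_threads)  # ceiling division
    blocks, offset = [], 0
    while offset < cache_nr:
        blocks.append((offset, min(per_thread, cache_nr - offset)))
        offset += per_thread
    return blocks
```

Only the last block can be short, and every entry lands in exactly one block.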

I used p0002-read-cache.sh to generate some performance data:

100,000 entries

Test                                 HEAD~3           HEAD~2
---------------------------------------------------------------------------
read_cache/discard_cache 1000 times  14.02(0.01+0.12) 9.81(0.01+0.07) -30.0%

1,000,000 entries

Test                                 HEAD~3            HEAD~2
----------------------------------------------------------------------------
read_cache/discard_cache 1000 times  202.06(0.06+0.09) 155.72(0.03+0.06) -22.9%

Signed-off-by: Ben Peart 
---
 Documentation/config.txt |   6 +
 config.c |  14 +++
 config.h |   1 +
 read-cache.c | 240 +++
 4 files changed, 237 insertions(+), 24 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1c42364988..79f8296d9c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2391,6 +2391,12 @@ imap::
The configuration variables in the 'imap' section are described
in linkgit:git-imap-send[1].
 
+index.threads::
+   Specifies the number of threads to spawn when loading the index.
+   This is meant to reduce index load time on multiprocessor machines.
+   Specifying 0 or 'true' will cause Git to auto-detect the number of
+   CPUs and set the number of threads accordingly. Defaults to 'true'.
+
 index.version::
Specify the version with which new index files should be
initialized.  This does not affect existing repositories.
diff --git a/config.c b/config.c
index 9a0b10d4bc..3bda124550 100644
--- a/config.c
+++ b/config.c
@@ -2289,6 +2289,20 @@ int git_config_get_fsmonitor(void)
return 0;
 }
 
+int git_config_get_index_threads(void)
+{
+   int is_bool, val;
+
+   if (!git_config_get_bool_or_int("index.threads", &is_bool, &val)) {
+   if (is_bool)
+   return val ? 0 : 1;
+   else
+   return val;
+   }
+
+   return 0; /* auto-detect */
+}
+
 NORETURN
 void git_die_config_linenr(const char *key, const char *filename, int linenr)
 {
diff --git a/config.h b/config.h
index ab46e0165d..a06027e69b 100644
--- a/config.h
+++ b/config.h
@@ -250,6 +250,7 @@ extern int git_config_get_untracked_cache(void);
 extern int git_config_get_split_index(void);
 extern int git_config_get_max_percent_split_change(void);
 extern int git_config_get_fsmonitor(void);
+extern int git_config_get_index_threads(void);
 
 /* This dies if the configured or default date is in the future */
 extern int git_config_get_expiry(const char *key, const char **output);
diff --git a/read-cache.c b/read-cache.c
index 7b1354d759..c30346388a 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1889,16 +1889,229 @@ static size_t estimate_cache_size(size_t ondisk_size, unsigned int entries)
return ondisk_size + entries * per_entry;
 }
 
+/*
+ * A helper function that will load the specified range of cache entries
+ * from the memory mapped file and add them to the given index.
+ */
+static unsigned long load_cache_entry_block(struct index_state *istate,
+   struct mem_pool *ce_mem_pool, int offset, int nr, void *mmap,
+   unsigned long start_offset, struct strbuf *previous_name)
+{
+   int i;
+   unsigned long src_offset = start_offset;
+
+   for (i = offset; i < offset + nr; i++) {
+   struct ondisk_cache_entry *disk_ce;
+   struct cache_entry *ce;
+   unsigned long consumed;
+
+   disk_ce = (struct ondisk_cache_entry *)((char *)mmap + src_offset);
+   ce = create_from_disk(ce_mem_pool, disk_ce, &consumed, previous_name);
+   set_index_entry(istate, i, ce);
+
+   src_offset += consumed;
+   }
+   return src_offset - start_offset;
+}
+
+static unsigned long load_all_cache_entries(struct index_state *istate,
+   void *mmap, size_t mmap_size, unsigned long src_offset)
+{
+   struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+   unsigned long consumed;
+
+   if (istate->version == 4) {
+   previous_name = &previous_name_buf;
+   mem_pool_init(&istate->ce_mem_pool,
+ estimate_cache_size_from_compressed(istate->cache_nr));
+   } else {
+   previous_name = NULL;
+   

[PATCH v2 3/3] read-cache: micro-optimize expand_name_field() to speed up V4 index parsing.

2018-08-29 Thread Ben Peart
 - strbuf_remove() in expand_name_field() is not exactly a good fit
   for stripping a part at the end, _setlen() would do the same job
   and is much cheaper.

 - the open-coded loop to find the end of the string in
   expand_name_field() can't beat an optimized strlen()
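For context, index v4 stores each path as "strip N bytes from the end of the previous path, then append a NUL-terminated suffix"; the optimization above replaces strbuf_remove() with strbuf_setlen() and the open-coded end-of-string scan with strlen(). The expansion step itself looks like this (a Python sketch; the on-disk varint decoding is omitted):

```python
def expand_name(previous, strip, suffix):
    """Reconstruct an index v4 path: drop `strip` bytes from the end of
    the previous entry's path, then append this entry's suffix."""
    if len(previous) < strip:
        raise ValueError("malformed name field in the index")
    return previous[:len(previous) - strip] + suffix
```

Truncating in place and appending is cheap, which is why the strbuf_setlen() + strlen() combination beats the original code.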

I used p0002-read-cache.sh to generate some performance data on the
cumulative impact:

100,000 files

Test                                 HEAD~3           HEAD
--------------------------------------------------------------------------
read_cache/discard_cache 1000 times  14.08(0.03+0.09) 8.71(0.01+0.09) -38.1%

1,000,000 files

Test                                 HEAD~3            HEAD
---------------------------------------------------------------------------
read_cache/discard_cache 1000 times  201.77(0.03+0.07) 149.68(0.04+0.07) -25.8%

Suggested by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Ben Peart 
---
 read-cache.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index f768004617..f5e7c86c42 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1754,9 +1754,8 @@ static unsigned long expand_name_field(struct strbuf *name, const char *cp_)
 
if (name->len < len)
die("malformed name field in the index");
-   strbuf_remove(name, name->len - len, len);
-   for (ep = cp; *ep; ep++)
-   ; /* find the end */
+   strbuf_setlen(name, name->len - len);
+   ep = cp + strlen((const char *)cp);
strbuf_add(name, cp, ep - cp);
return (const char *)ep + 1 - cp_;
 }
-- 
2.18.0.windows.1



[PATCH v2 2/3] read-cache: load cache extensions on worker thread

2018-08-29 Thread Ben Peart
This patch helps address the CPU cost of loading the index by loading
the cache extensions on a worker thread in parallel with loading the cache
entries.

This is possible because the current extensions don't access the cache
entries in the index_state structure so are OK that they don't all exist
yet.

The CACHE_EXT_TREE, CACHE_EXT_RESOLVE_UNDO, and CACHE_EXT_UNTRACKED
extensions don't even get a pointer to the index so don't have access to the
cache entries.

CACHE_EXT_LINK only uses the index_state to initialize the split index.
CACHE_EXT_FSMONITOR only uses the index_state to save the fsmonitor last
update and dirty flags.

I used p0002-read-cache.sh to generate some performance data on the
cumulative impact:

100,000 entries

Test                                 HEAD~3           HEAD~2
---------------------------------------------------------------------------
read_cache/discard_cache 1000 times  14.08(0.01+0.10) 9.72(0.03+0.06) -31.0%

1,000,000 entries

Test                                 HEAD~3            HEAD~2
----------------------------------------------------------------------------
read_cache/discard_cache 1000 times  202.95(0.01+0.07) 154.14(0.03+0.06) -24.1%

Signed-off-by: Ben Peart 
---
 read-cache.c | 60 +---
 1 file changed, 48 insertions(+), 12 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index c30346388a..f768004617 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1959,16 +1959,13 @@ struct load_cache_entries_thread_data
struct mem_pool *ce_mem_pool;
int offset, nr;
void *mmap;
+   size_t mmap_size;
unsigned long start_offset;
struct strbuf previous_name_buf;
struct strbuf *previous_name;
unsigned long consumed; /* return # of bytes in index file processed */
 };
 
-/*
-* A thread proc to run the load_cache_entries() computation
-* across multiple background threads.
-*/
 static void *load_cache_entries_thread(void *_data)
 {
struct load_cache_entries_thread_data *p = _data;
@@ -1978,6 +1975,36 @@ static void *load_cache_entries_thread(void *_data)
return NULL;
 }
 
+static void *load_index_extensions_thread(void *_data)
+{
+   struct load_cache_entries_thread_data *p = _data;
+   unsigned long src_offset = p->start_offset;
+
+   while (src_offset <= p->mmap_size - the_hash_algo->rawsz - 8) {
+   /* After an array of active_nr index entries,
+* there can be arbitrary number of extended
+* sections, each of which is prefixed with
+* extension name (4-byte) and section length
+* in 4-byte network byte order.
+*/
+   uint32_t extsize;
+   memcpy(&extsize, (char *)p->mmap + src_offset + 4, 4);
+   extsize = ntohl(extsize);
+   if (read_index_extension(p->istate,
+   (const char *)p->mmap + src_offset,
+   (char *)p->mmap + src_offset + 8,
+   extsize) < 0) {
+   munmap(p->mmap, p->mmap_size);
+   die("index file corrupt");
+   }
+   src_offset += 8;
+   src_offset += extsize;
+   }
+   p->consumed += src_offset - p->start_offset;
+
+   return NULL;
+}
+
 static unsigned long load_cache_entries(struct index_state *istate,
void *mmap, size_t mmap_size, unsigned long src_offset)
 {
@@ -2012,16 +2039,16 @@ static unsigned long load_cache_entries(struct index_state *istate,
else
previous_name = NULL;
 
+   /* allocate an extra thread for loading the index extensions */
ce_per_thread = DIV_ROUND_UP(istate->cache_nr, nr_threads);
-   data = xcalloc(nr_threads, sizeof(struct load_cache_entries_thread_data));
+   data = xcalloc(nr_threads + 1, sizeof(struct load_cache_entries_thread_data));
 
/*
 * Loop through index entries starting a thread for every ce_per_thread
-* entries. Exit the loop when we've created the final thread (no need
-* to parse the remaining entries.
+* entries.
 */
consumed = thread = 0;
-   for (i = 0; ; i++) {
+   for (i = 0; i < istate->cache_nr; i++) {
struct ondisk_cache_entry *ondisk;
const char *name;
unsigned int flags;
@@ -2055,9 +2082,7 @@ static unsigned long load_cache_entries(struct index_state *istate,
if (pthread_create(&p->pthread, NULL, load_cache_entries_thread, p))
die("unable to create load_cache_entries_thread");
 
-   /* exit the loop when we've created the last thread */
-   if (++thread == nr_threads)
-   break;
+  

Git for Windows v2.19.0-rc1, was Re: [ANNOUNCE] Git v2.19.0-rc1

2018-08-29 Thread Johannes Schindelin
Team,

the corresponding Git for Windows v2.19.0-rc1 (most notably with the
Experimental Options page in the installer letting you opt into using the
built-in `stash` and `rebase`) was built by Jameson Miller (and me) last
night and can be found here:

https://github.com/git-for-windows/git/releases/v2.19.0-rc1.windows.1

Ciao,
Johannes

On Tue, 28 Aug 2018, Junio C Hamano wrote:

> A release candidate Git v2.19.0-rc1 is now available for testing
> at the usual places.  It is comprised of 735 non-merge commits
> since v2.18.0, contributed by 64 people, 15 of which are new faces.
> 
> The tarballs are found at:
> 
> https://www.kernel.org/pub/software/scm/git/testing/
> 
> The following public repositories all have a copy of the
> 'v2.19.0-rc1' tag and the 'master' branch that the tag points at:
> 
>   url = https://kernel.googlesource.com/pub/scm/git/git
>   url = git://repo.or.cz/alt-git.git
>   url = https://github.com/gitster/git
> 
> New contributors whose contributions weren't in v2.18.0 are as follows.
> Welcome to the Git development community!
> 
>   Aleksandr Makarov, Andrei Rybak, Chen Bin, Henning Schild,
>   Isabella Stephens, Josh Steadmon, Jules Maselbas, Kana Natsuno,
>   Marc Strapetz, Masaya Suzuki, Nicholas Guriev, Samuel Maftoul,
>   Sebastian Kisela, Vladimir Parfinenko, and William Chargin.
> 
> Returning contributors who helped this release are as follows.
> Thanks for your continued support.
> 
>   Aaron Schrab, Ævar Arnfjörð Bjarmason, Alban Gruin, Alejandro
>   R. Sedeño, Anthony Sottile, Antonio Ospite, Beat Bolli, Ben
>   Peart, Brandon Williams, brian m. carlson, Christian Couder,
>   Derrick Stolee, Elia Pinto, Elijah Newren, Eric Sunshine,
>   Han-Wen Nienhuys, Jameson Miller, Jean-Noël Avila, Jeff
>   Hostetler, Jeff King, Johannes Schindelin, Johannes Sixt,
>   Jonathan Nieder, Jonathan Tan, Junio C Hamano, Kim Gybels,
>   Kirill Smelkov, Kyle Meyer, Luis Marsano, Łukasz Stelmach,
>   Luke Diamand, Martin Ågren, Max Kirillov, Michael Barabanov,
>   Mike Hommey, Nguyễn Thái Ngọc Duy, Olga Telezhnaya,
>   Phillip Wood, Prathamesh Chavan, Ramsay Jones, René Scharfe,
>   Stefan Beller, SZEDER Gábor, Taylor Blau, Thomas Rast, Tobias
>   Klauser, Todd Zullinger, Ville Skyttä, and Xiaolong Ye.
> 
> 
> 
> Git 2.19 Release Notes (draft)
> ==============================
> 
> Updates since v2.18
> -------------------
> 
> UI, Workflows & Features
> 
>  * "git diff" compares the index and the working tree.  For paths
>added with intent-to-add bit, the command shows the full contents
>of them as added, but the paths themselves were not marked as new
>files.  They are now shown as new by default.
> 
>"git apply" learned the "--intent-to-add" option so that an
>otherwise working-tree-only application of a patch will add new
>paths to the index marked with the "intent-to-add" bit.
> 
>  * "git grep" learned the "--column" option that gives not just the
>line number but the column number of the hit.
> 
>  * The "-l" option in "git branch -l" is an unfortunate short-hand for
>"--create-reflog", but many users, both old and new, somehow expect
>it to be something else, perhaps "--list".  This step warns when "-l"
>is used as a short-hand for "--create-reflog" and warns about the
>future repurposing of it when it is used.
> 
>  * The userdiff pattern for .php has been updated.
> 
>  * The content-transfer-encoding of the message "git send-email" sends
>out by default was 8bit, which can cause trouble when there is an
>overlong line to bust the RFC 5322/2822 limit.  A new option 'auto' to
>automatically switch to quoted-printable when there is such a line
>in the payload has been introduced and is made the default.
> 
>  * "git checkout" and "git worktree add" learned to honor
>checkout.defaultRemote when auto-vivifying a local branch out of a
>remote tracking branch in a repository with multiple remotes that
>have tracking branches that share the same names.
>(merge 8d7b558bae ab/checkout-default-remote later to maint).
> 
>  * "git grep" learned the "--only-matching" option.
> 
>  * "git rebase --rebase-merges" mode now handles octopus merges as
>well.
> 
>  * Add a server-side knob to skip commits in exponential/Fibonacci
>stride in an attempt to cover a wider swath of history with a smaller
>number of iterations, potentially accepting a larger packfile
>transfer, instead of going back one commit at a time during common
>ancestor discovery during the "git fetch" transaction.
>(merge 42cc7485a2 jt/fetch-negotiator-skipping later to maint).
> 
>  * A new configuration variable core.usereplacerefs has been added,
>primarily to help server installations that want to ignore the
>replace mechanism altogether.
> 
>  * Teach "git tag -s" etc. a few configuration variables (gpg.format
>that can be set to "openpgp" or "x509", 

Re: Contributor Summit planning

2018-08-29 Thread Johannes Schindelin
Hi Peff,

On Wed, 29 Aug 2018, Jeff King wrote:

> On Mon, Aug 27, 2018 at 03:34:16PM +0200, Johannes Schindelin wrote:
> 
> > Rather than have a "hack day", I would actually prefer to work with
> > other contributors in a way that we have not done before, but which I
> > had the pleasure of "test ballooning" with Pratik: using Visual Studio
> > Code Live Share. This allows multiple users to work on the same code
> > base, in the same worktree, seeing what each other is doing. It
> > requires a separate communication channel to talk; Pratik & I used
> > IRC, but I think Google Hangout (or Skype or WhatsApp or
> > ) would have worked a bit better. It's
> > kind of pair programming, but with some of the limitations removed.
> 
> OK, I said in my earlier email that I would give any scheme a try, but I
> really don't like pair programming. ;)

Tastes do differ, and that's okay.

I found it a pretty invaluable tool for intense 1:1 mentoring. It
definitely helps a lot with getting over the initial hurdles.

Ciao,
Dscho


Re: Contributor Summit planning

2018-08-29 Thread Johannes Schindelin
Hi Peff,

On Wed, 29 Aug 2018, Jeff King wrote:

> On Mon, Aug 27, 2018 at 03:22:39PM +0200, Johannes Schindelin wrote:
> 
> > Having said that, I believe that we core contributors can learn to have a
> > fruitful online meeting. With 30+ participants, too.
> > 
> > Learning from my past life in academia (it is hard for me to imagine a
> > less disciplined crowd than a bunch of white, male, old scientists), we
> > would need a moderator, and some forum that allows one to "give somebody the
> > mic". That software/platform should exist somewhere.
> 
> Yes, I agree that software tools could help a lot with a crowd that
> size. I have used various "virtual classroom" tools before, and I think
> the core of the idea is there, but I was often unimpressed by the
> execution (and expense). So if you know of a good tool, it might be
> worth trying.

I don't, and I will keep looking.

> > I would love to have the best of both worlds. For example, it is an annual
> > annoyance that we are discussing all kinds of things regarding Git, trying
> > to steer the direction, to form collaborations on certain features, and
> > the person at the helm is not even there.
> > 
> > Maybe *two* meetings per year, one attached to GitMerge, and one purely
> > online, would help.
> 
> I'm somewhat skeptical of the utility of an online meeting. That said,
> I'm willing give it a try (or any other scheme people want to come up
> with, for that matter).

I am glad you say that.

> > Point in favor of the pure-online meeting: the informal standup on IRC
> > every second Friday. I really try to attend it (it is a bit awkward
> > because it is on a Friday evening in my timezone, right at the time when I
> > want to unwind from the work week), as it does have a similar effect to
> > in-person standups: surprising collaborations spring up, unexpected help,
> > and a general sense of belonging.
> 
> Yes, I've been meaning to make it to another one (I popped in for one a
> month or two ago, and it didn't seem like much of anything was
> happening).
> 
> What time is it, again?

It is supposed to happen every two weeks, on Fridays, 17:00-17:30 UTC:
https://public-inbox.org/git/20180713170018.ga139...@aiede.svl.corp.google.com/

The latest one is logged here:
http://colabti.org/irclogger/irclogger_log/git-devel?date=2018-08-24#l144

> > Such an online summit as I suggested above would really only work if
> > enough frequent contributors would attend. If enough people like you,
> > Junio, and the standup regulars would say: yep, we're willing to plan and
> > attend an online summit, where we try to have a timezone-friendly
> > "unconference"-style meeting on one day (on which we would of course try
> > to free ourselves from our regular work obligations).
> > 
> > I guess I am asking for a "raise your hands", with mine high up in the
> > air.
> 
> I'll come if you want to organize it.

That's a good idea. I am a week away from taking a vacation, and I will
try to come back to that idea afterwards and see what I can do there.

The more I think about it, the more excited I get about it.

Ciao,
Dscho


Re: Questions about the hash function transition

2018-08-29 Thread Derrick Stolee

On 8/29/2018 9:27 AM, Derrick Stolee wrote:

On 8/29/2018 9:09 AM, Johannes Schindelin wrote:


What I meant was to leverage the midx code, not the .midx files.

My comment was motivated by my realizing that both the SHA-1 <-> SHA-256
mapping and the MIDX code have to look up (in a *fast* way) information
with hash values as keys. *And* this information is immutable. *And* the
amount of information should grow with new objects being added to the
database.


I'm unsure what this means, as the multi-pack-index simply uses 
bsearch_hash() to find hashes in the list. The same method is used for 
IDX lookups.


I talked with Johannes privately, and we found differences in our 
understanding of the current multi-pack-index feature. Johannes thought 
the feature was farther along than it is, specifically related to how 
much we value the data in the multi-pack-index when adding objects to 
pack-files or repacking. Some of this misunderstanding is due to how the 
equivalent feature works in VSTS (where there is no IDX-file equivalent, 
every object in the repo is tracked by a multi-pack-index).


I'd like to point out a few things about how the multi-pack-index works 
now, and how we hope to extend it in the future.


Currently:

1. Objects are added to the multi-pack-index by adding a new set of 
.idx/.pack file pairs. We scan the .idx file for the objects and offsets 
to add.


2. We re-use the information in the multi-pack-index only to write the 
new one without re-reading the .pack files that are already covered.


3. If a 'git repack' command deletes a pack-file, then we delete the 
multi-pack-index. It must be regenerated by 'git multi-pack-index write' 
later.


In the current world, the multi-pack-index is completely secondary to 
the .idx files.


In the future, I hope these features exist in the multi-pack-index:

1. A stable object order. As objects are added to the multi-pack-index, 
we assign a distinct integer value to each. As we add objects, those 
integers values do not change. We can then pair the reachability bitmap 
to the multi-pack-index instead of a specific pack-file (allowing repack 
and bitmap computations to happen asynchronously). The data required to 
store this object order is very similar to storing the bijection between 
SHA-1 and SHA-256 hashes.


2. Incremental multi-pack-index: Currently, we have only one 
multi-pack-index file per object directory. We can use a mechanism 
similar to the split-index to keep a small number of multi-pack-index 
files (at most 3, probably) such that the 
'.git/objects/pack/multi-pack-index' file is small and easy to rewrite, 
while it refers to larger '.git/objects/pack/*.midx' files that change 
infrequently.


3. Multi-pack-index-aware repack: The repacker only knows about the 
multi-pack-index enough to delete it. We could instead directly 
manipulate the multi-pack-index during repack, and we could decide to do 
more incremental repacks based on data stored in the multi-pack-index.


In conclusion: please keep the multi-pack-index in mind as we implement 
the transition plan. I'll continue building the feature as planned (the 
next thing to do after the current series of cleanups is 'git 
multi-pack-index verify') but am happy to look into other applications 
as we need it.


Thanks,

-Stolee



Re: Contributor Summit planning

2018-08-29 Thread Johannes Schindelin
Hi Jonathan,

On Tue, 28 Aug 2018, Jonathan Nieder wrote:

> [... talking about the IRC channel ...]
> 
> My offer to +v anyone affected by the channel's current settings still
> stands (just /msg me).  Zero people have taken me up on this offer so
> far.

I did have problems seeing any private messages from anybody
non-authenticated, recently. Pratik & Paul switched to Gitter for our
private conversations for that reason.

Ciao,
Dscho


[PATCH v2 1/1] builtin rebase: prepare for builtin rebase -i

2018-08-29 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

The builtin rebase and the builtin interactive rebase have been
developed independently, on purpose: Google Summer of Code rules
specifically state that students have to work on independent projects,
they cannot collaborate on the same project.

One fallout is that the rebase-in-c and rebase-i-in-c patches cause no
merge conflicts but a royal number of tests in the test suite to fail.

It is easy to explain why: rebase-in-c was developed under the
assumption that all rebase backends are implemented in Unix shell script
and can be sourced via `. git-rebase--<backend>`, which is no longer
true with rebase-i-in-c, where git-rebase--interactive is a hard-linked
builtin.

This patch fixes that.

Signed-off-by: Johannes Schindelin 
---
 builtin/rebase.c | 81 
 1 file changed, 81 insertions(+)

diff --git a/builtin/rebase.c b/builtin/rebase.c
index 4e69458161..99fd5d4017 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -326,6 +326,13 @@ static void add_var(struct strbuf *buf, const char *name, 
const char *value)
}
 }
 
+static const char *resolvemsg =
+N_("Resolve all conflicts manually, mark them as resolved with\n"
+"\"git add/rm \", then run \"git rebase --continue\".\n"
+"You can instead skip this commit: run \"git rebase --skip\".\n"
+"To abort and get back to the state before \"git rebase\", run "
+"\"git rebase --abort\".");
+
 static int run_specific_rebase(struct rebase_options *opts)
 {
const char *argv[] = { NULL, NULL };
@@ -333,6 +340,79 @@ static int run_specific_rebase(struct rebase_options *opts)
int status;
const char *backend, *backend_func;
 
+   if (opts->type == REBASE_INTERACTIVE) {
+   /* Run builtin interactive rebase */
+   struct child_process child = CHILD_PROCESS_INIT;
+
+   argv_array_pushf(&child.env_array, "GIT_CHERRY_PICK_HELP=%s",
+resolvemsg);
+   if (!(opts->flags & REBASE_INTERACTIVE_EXPLICIT)) {
+   argv_array_push(&child.env_array, "GIT_EDITOR=:");
+   opts->autosquash = 0;
+   }
+
+   child.git_cmd = 1;
+   argv_array_push(&child.args, "rebase--interactive");
+
+   if (opts->action)
+   argv_array_pushf(&child.args, "--%s", opts->action);
+   if (opts->keep_empty)
+   argv_array_push(&child.args, "--keep-empty");
+   if (opts->rebase_merges)
+   argv_array_push(&child.args, "--rebase-merges");
+   if (opts->rebase_cousins)
+   argv_array_push(&child.args, "--rebase-cousins");
+   if (opts->autosquash)
+   argv_array_push(&child.args, "--autosquash");
+   if (opts->flags & REBASE_VERBOSE)
+   argv_array_push(&child.args, "--verbose");
+   if (opts->flags & REBASE_FORCE)
+   argv_array_push(&child.args, "--no-ff");
+   if (opts->restrict_revision)
+   argv_array_pushf(&child.args,
+"--restrict-revision=^%s",
+oid_to_hex(&opts->restrict_revision->object.oid));
+   if (opts->upstream)
+   argv_array_pushf(&child.args, "--upstream=%s",
+oid_to_hex(&opts->upstream->object.oid));
+   if (opts->onto)
+   argv_array_pushf(&child.args, "--onto=%s",
+oid_to_hex(&opts->onto->object.oid));
+   if (opts->squash_onto)
+   argv_array_pushf(&child.args, "--squash-onto=%s",
+oid_to_hex(opts->squash_onto));
+   if (opts->onto_name)
+   argv_array_pushf(&child.args, "--onto-name=%s",
+opts->onto_name);
+   argv_array_pushf(&child.args, "--head-name=%s",
+opts->head_name ?
+opts->head_name : "detached HEAD");
+   if (opts->strategy)
+   argv_array_pushf(&child.args, "--strategy=%s",
+opts->strategy);
+   if (opts->strategy_opts)
+   argv_array_pushf(&child.args, "--strategy-opts=%s",
+opts->strategy_opts);
+   if (opts->switch_to)
+   argv_array_pushf(&child.args, "--switch-to=%s",
+opts->switch_to);
+   if (opts->cmd)
+   argv_array_pushf(&child.args, "--cmd=%s", opts->cmd);
+   if (opts->allow_empty_message)
+   argv_array_push(&child.args, "--allow-empty-message");
+   if (opts->allow_rerere_autoupdate > 0)
+   argv_array_push(&child.args, "--rerere-autoupdate");
+   else if (opts->allow_rerere_autoupdate == 0)
+   argv_array_push(&child.args, 

[PATCH v2 0/1] Teach the builtin rebase about the builtin interactive rebase

2018-08-29 Thread Johannes Schindelin via GitGitGadget
The builtin rebase and the builtin interactive rebase have been developed
independently, on purpose: Google Summer of Code rules specifically state
that students have to work on independent projects, they cannot collaborate
on the same project.

The reason is probably the very fine tradition in academia to prohibit
teamwork, which makes grading easier (at the expense of not exactly
preparing the students for the real world, unless they want to stay in
academia).

One fallout is that the rebase-in-c and rebase-i-in-c patches cause no merge
conflicts but a royal number of tests in the test suite to fail.

It is easy to explain why: rebase-in-c was developed under the assumption
that all rebase backends are implemented in Unix shell script and can be
sourced via . git-rebase--<backend>, which is no longer true with 
rebase-i-in-c, where git-rebase--interactive is a hard-linked builtin.

This patch fixes that.

Note: while this patch targets pk/rebase-in-c-6-final, it will not work
correctly without ag/rebase-i-in-c. So my suggestion is to rewrite the 
pk/rebas-in-c-6-final branch by first merging ag/rebase-i-in-c, then
applying this here patch, and only then cherry-pick "rebase: default to
using the builtin rebase".

Changes since v1:

 * replaced the too-terse commit message by a copy-edited version of this
   cover letter (leaving out only the rant about disallowing teamwork).

Johannes Schindelin (1):
  builtin rebase: prepare for builtin rebase -i

 builtin/rebase.c | 81 
 1 file changed, 81 insertions(+)


base-commit: ae497a044508ebaac1794dcdd7ad04f8685686b2
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-23%2Fdscho%2Frebase-in-c-6-final-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-23/dscho/rebase-in-c-6-final-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/23

Range-diff vs v1:

 1:  29d49819fa ! 1:  5403014be7 builtin rebase: prepare for builtin rebase -i
 @@ -2,8 +2,21 @@
  
  builtin rebase: prepare for builtin rebase -i
  
 -It is no longer a shell script, so we need to call it in a different way
 -than the other backends.
 +The builtin rebase and the builtin interactive rebase have been
 +developed independently, on purpose: Google Summer of Code rules
 +specifically state that students have to work on independent projects,
 +they cannot collaborate on the same project.
 +
 +One fallout is that the rebase-in-c and rebase-i-in-c patches cause no
 +merge conflicts but a royal number of tests in the test suite to fail.
 +
 +It is easy to explain why: rebase-in-c was developed under the
 +assumption that all rebase backends are implemented in Unix shell script
 +and can be sourced via `. git-rebase--<backend>`, which is no longer
 +true with rebase-i-in-c, where git-rebase--interactive is a hard-linked
 +builtin.
 +
 +This patch fixes that.
  
  Signed-off-by: Johannes Schindelin 
  

-- 
gitgitgadget


Re: [PATCH 0/1] Teach the builtin rebase about the builtin interactive rebase

2018-08-29 Thread Johannes Schindelin
Hi Jonathan,

On Tue, 28 Aug 2018, Jonathan Nieder wrote:

> Johannes Schindelin wrote:
> > On Mon, 27 Aug 2018, Junio C Hamano wrote:
> >> Johannes Schindelin  writes:
> >>> Jonathan Nieder wrote:
> 
>  Please include this information in the commit message.  It's super
>  helpful to find this kind of information about why a patch does what
>  it does when encountering a patch later "in the wild" (in git log -S
>  output).
> [...]
> >> I think what Jonathan finds helpful is the other half of the story
> >
> > I will await Jonathan's clarification.
> 
> Junio's understanding is correct.
> 
> More generally, I greatly appreciate the kind of motivation and
> backstory that you write in cover letters, and I wish that more of
> that would find its way into the commit messages instead.  Really I
> wish (and don't take this the wrong way --- I am not asking you to
> write it unless it's your own itch) that GitGitGadget would put the
> cover letter in single-patch series after the "---" line in the
> individual patches, since that would make it easier for reviewers to
> point out what content from the cover letter would be useful to add to
> the commit message.
> 
> That said, this is minor and not a reason to reroll this patch.  It was
> more that I wanted to give the hint for later patches.
> 
> Thanks much and hope that helps,

It does.

I'll "rick-roll" a new iteration, as I just realized that (contrary to my
recollection; I guess I'll need that vacation) the commit message is
*seriously* lacking. I thought I had remembered that I copy-edited the
commit message into the PR description. Clearly that was not the case.

Thanks for the clarification that triggered my looking,
Dscho


[PATCH v2] doc: Don't echo sed command for manpage-base-url.xsl

2018-08-29 Thread Tim Schumacher
Previously, the sed command for generating manpage-base-url.xsl
was printed to the console when being run.

Make the console output for this rule similar to all the
other rules by printing a short status message instead of
the whole command.

Signed-off-by: Tim Schumacher 
---

To Junio C Hamano:
The rule does now print "SED manpage-base-url.xsl"
to the console, which is similar to other QUIET_$TOOL
rules.

To Eric Sunshine:
I reworded the commit message to focus more on _why_
the patch is relevant.

---

 Documentation/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index a42dcfc74..cbf33c5a7 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -225,6 +225,7 @@ ifndef V
QUIET_XSLTPROC  = @echo '   ' XSLTPROC $@;
QUIET_GEN   = @echo '   ' GEN $@;
QUIET_LINT  = @echo '   ' LINT $@;
+   QUIET_SED   = @echo '   ' SED $@;
QUIET_STDERR= 2> /dev/null
QUIET_SUBDIR0   = +@subdir=
QUIET_SUBDIR1   = ;$(NO_SUBDIR) echo '   ' SUBDIR $$subdir; \
@@ -344,7 +345,7 @@ $(OBSOLETE_HTML): %.html : %.txto asciidoc.conf
mv $@+ $@
 
 manpage-base-url.xsl: manpage-base-url.xsl.in
-   sed "s|@@MAN_BASE_URL@@|$(MAN_BASE_URL)|" $< > $@
+   $(QUIET_SED)sed "s|@@MAN_BASE_URL@@|$(MAN_BASE_URL)|" $< > $@
 
 %.1 %.5 %.7 : %.xml manpage-base-url.xsl
$(QUIET_XMLTO)$(RM) $@ && \
-- 
2.19.0.rc1.1.g093671f86



Re: Questions about the hash function transition

2018-08-29 Thread Derrick Stolee

On 8/29/2018 9:09 AM, Johannes Schindelin wrote:

Hi Jonathan,

On Tue, 28 Aug 2018, Jonathan Nieder wrote:


Johannes Schindelin wrote:

On Thu, 23 Aug 2018, Jonathan Nieder wrote:

Ævar Arnfjörð Bjarmason wrote:

Are we going to need a midx version of these mapping files? How does
midx fit into this picture? Perhaps it's too obscure to worry
about...

That's a great question!  I think the simplest answer is to have a
midx only for the primary object format and fall back to using
ordinary idx files for the others.

The midx format already has a field for hash function (thanks,
Derrick!).

Related: I wondered whether we could simply leverage the midx code for
the bidirectional SHA-1 <-> SHA-256 mapping, as it strikes me as very
similar in concept and challenges.

Interesting: tell me more.

My first instinct is to prefer the idx-based design that is already
described in the design doc.  If we want to change that, we should
have a motivating reason.

Midx is designed to be optional and to not necessarily cover all
objects, so it doesn't seem like a good fit.


It is optional, but shouldn't this mode where a Git repo that needs to 
know about two different versions of all files be optional? Or at least 
temporary?


The multi-pack-index is intended to cover all packed objects, so covers 
the same number of objects as an IDX-based strategy. If we are 
rebuilding the repo from scratch by translating the hashes, then "being 
too big to repack" is probably not a problem, so we would expect a 
single IDX file anyway.


In my opinion, whatever we do for the IDX-based approach will need to be 
duplicated in the multi-pack-index. The multi-pack-index does have a 
natural mechanism (optional chunks) for inserting this data without 
incrementing the version number.



Right.

What I meant was to leverage the midx code, not the .midx files.

My comment was motivated by my realizing that both the SHA-1 <-> SHA-256
mapping and the MIDX code have to look up (in a *fast* way) information
with hash values as keys. *And* this information is immutable. *And* the
amount of information should grow with new objects being added to the
database.


I'm unsure what this means, as the multi-pack-index simply uses 
bsearch_hash() to find hashes in the list. The same method is used for 
IDX lookups.



I know that Stolee performed a bit of performance testing regarding
different data structures to use in MIDX. We could benefit from that
testing by using not only the results from those tests, but also the code.


I did test ways to use something other than bsearch_hash(), such as 
using a 65,536-entry fanout table for lookups using the first two bytes 
of a hash (tl;dr: it speeds things up a bit, but the super-small 
improvement is probably not worth the space and complexity). I've also 
toyed with the idea of using interpolation search inside bsearch_hash(), 
but I haven't had time to do that.



IIRC one of the insights was that packs are a natural structure that
can be used for the MIDX mapping, too (you could, for example, store the
SHA-1 <-> SHA-256 mapping *only* for objects inside packs, and re-generate
them on the fly for loose objects all the time).

Stolee can speak with much more competence and confidence about this,
though, whereas all of what I said above is me waving my hands quite
frantically.


I understand the hesitation to pair such an important feature (hash 
transition) to a feature that hasn't even shipped. We will need to see 
how things progress on both fronts to see how mature the 
multi-pack-index is when we need this transition table.


Thanks,

-Stolee


