Re: [ANNOUNCE] git-series: track changes to a patch series over time

2016-07-29 Thread Richard Ipsum
On Fri, Jul 29, 2016 at 04:04:26AM -0700, Josh Triplett wrote:
[snip]
> 
> These definitely seem like a family of related problems.  I'd like to
> use git-series as a format for storing iterations on things like GitHub
> pull-requests or Gerrit patch versions (in the latter case, overcoming
> Gerrit's limitations on only handling one patch at a time).  Integrating
> reviews with that seems helpful.

Worth noting here that Gerrit's one patch per change format isn't
intrinsic to Notedb, since we just need to track the sha we want
to merge and optionally the branch we intend to merge into.

> 
> > The prototype library I have is partly the result of some discussion and 
> > work
> > with the Gerrit folks, since they were thinking about this problem
> > before I even started writing git-candidate, and solved it with Notedb.[5]
> > 
> > Let me know if you'd like to work together on this,
> 
> I'd love to.
> 
> I'll be presenting git-series at LinuxCon North America; will you be
> there by any chance?  If not, perhaps we could meet by IRC or some other
> medium and talk about this family of problems.

Cool :)

I didn't plan to be at LinuxCon North America,
but I can certainly send contact details out of band.

> 
> I hope to use git notes with git-series in the future, by putting
> another gitlink under the git-series for notes related to the series.
> I'd intended that for more persistent notes; putting them in the series
> solves some of the problems related to notes refs, pushing/pulling, and
> collaboration.  Using notes for review comments makes sense as well,
> whether in a series or in a separate ref.

Sounds interesting, can you explain how this works in more detail?
I ended up solving the push/pull issue with a custom merge driver
that effectively runs the Notedb parser on each side of the merge
and emits the union of the two sets of change notes.

> 
> > I've been considering taking the perl-notedb prototype and writing
> > a C library for it with bindings for other languages (i.e. Rust).
> 
> A C library based on libgit2 seems like a good idea; ideally the
> bindings could interoperate with git2-rs.  (Alternatively, Rust can
> *export* a C interface, so you could write directly with git2-rs. :) )

Certainly a fair alternative, though it may arguably be safer to write
the C and export to other languages, as cool as Rust looks it's not
established the way C is, so may be a slightly riskier foundation,
in my view.

And ofcourse in C we have native access to libgit2.

> 
> One of the items on my long-term TODO list is a completely federated
> GitHub; I've been looking at other aspects of that, but federated
> reviews/comments/etc seem critical to that as well.
> 

I agree.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] git-series: track changes to a patch series over time

2016-07-29 Thread Richard Ipsum
On Thu, Jul 28, 2016 at 11:40:55PM -0700, Josh Triplett wrote:
[snip]
> 
> I'd welcome any feedback, whether on the interface and workflow, the
> internals and collaboration, ideas on presenting diffs of patch series,
> or anything else.
> 

This looks awesome!

I've been working on some similar stuff for a while also.[1][2]

I'm particularly interested in trying to establish a standard for
storing review data in git. I've got a prototype for doing that[3],
and an example tool that uses it[4]. The tool is still incomplete/buggy though.

There seem to be a number of us trying to solve this in our different ways,
it would be great to coordinate our efforts.

The prototype library I have is partly the result of some discussion and work
with the Gerrit folks, since they were thinking about this problem
before I even started writing git-candidate, and solved it with Notedb.[5]

Let me know if you'd like to work together on this,
I've been considering taking the perl-notedb prototype and writing
a C library for it with bindings for other languages (i.e. Rust).

[1]: http://www.mail-archive.com/git%40vger.kernel.org/msg79461.html
[2]: http://www.mail-archive.com/git%40vger.kernel.org/msg80972.html

[3]: https://bitbucket.org/richardipsum/perl-notedb
[4]: https://bitbucket.org/richardipsum/git-candidate

[5]: https://storage.googleapis.com/gerrit-talks/summit/2015/NoteDB.pdf
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 5/5] convert: add filter..process option

2016-07-29 Thread Jakub Narębski
W dniu 2016-07-29 o 12:38, Lars Schneider pisze:
> On 27 Jul 2016, at 11:41, Eric Wong  wrote:
>> larsxschnei...@gmail.com wrote:

>>> +static off_t multi_packet_read(struct strbuf *sb, const int fd, const 
>>> size_t size)
>> 
>> I'm no expert in C, but this might be const-correctness taken
>> too far.  I think basing this on the read(2) prototype is less
>> surprising:
>> 
>>   static ssize_t multi_packet_read(int fd, struct strbuf *sb, size_t size)
>
> Hm... ok. I like `const` because I think it is usually easier to 
> read/understand
> functions that do not change their input variables. This way I can communicate
> my intention to future people modifying this function!

Well, scalar types like `size_t` are always passed by value, so here `const`
doesn't matter, and it makes line longer.  I think library functions do not
use `const` for `size_t` parameters.

You are reading from the file descriptor `fd`, so it state would change.
Using `const` feels a bit like lying.  Also, it is scalar type.
 
[...] 
> I agree with your reordering of the parameters, though!
> 
> Speaking of coding style... convert.c is already big and gets only bigger 
> with this patch (1720 lines). Would it make sense to add a new file 
> "convert-pipe-protocol.c"
> or something for my additions?

I wonder if it would be possible to enhance existing functions, instead
of redoing them (at least in part) for per-command filter driver protocol.

Best,
-- 
Jakub Narębski

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] git-series: track changes to a patch series over time

2016-07-29 Thread Josh Triplett
On Fri, Jul 29, 2016 at 11:10:11AM +0100, Richard Ipsum wrote:
> On Thu, Jul 28, 2016 at 11:40:55PM -0700, Josh Triplett wrote:
> [snip]
> > 
> > I'd welcome any feedback, whether on the interface and workflow, the
> > internals and collaboration, ideas on presenting diffs of patch series,
> > or anything else.
> > 
> 
> This looks awesome!
> 
> I've been working on some similar stuff for a while also.[1][2]
> 
> I'm particularly interested in trying to establish a standard for
> storing review data in git. I've got a prototype for doing that[3],
> and an example tool that uses it[4]. The tool is still incomplete/buggy 
> though.

Looks promising, though!

> There seem to be a number of us trying to solve this in our different ways,
> it would be great to coordinate our efforts.

These definitely seem like a family of related problems.  I'd like to
use git-series as a format for storing iterations on things like GitHub
pull-requests or Gerrit patch versions (in the latter case, overcoming
Gerrit's limitations on only handling one patch at a time).  Integrating
reviews with that seems helpful.

> The prototype library I have is partly the result of some discussion and work
> with the Gerrit folks, since they were thinking about this problem
> before I even started writing git-candidate, and solved it with Notedb.[5]
> 
> Let me know if you'd like to work together on this,

I'd love to.

I'll be presenting git-series at LinuxCon North America; will you be
there by any chance?  If not, perhaps we could meet by IRC or some other
medium and talk about this family of problems.

I hope to use git notes with git-series in the future, by putting
another gitlink under the git-series for notes related to the series.
I'd intended that for more persistent notes; putting them in the series
solves some of the problems related to notes refs, pushing/pulling, and
collaboration.  Using notes for review comments makes sense as well,
whether in a series or in a separate ref.

> I've been considering taking the perl-notedb prototype and writing
> a C library for it with bindings for other languages (i.e. Rust).

A C library based on libgit2 seems like a good idea; ideally the
bindings could interoperate with git2-rs.  (Alternatively, Rust can
*export* a C interface, so you could write directly with git2-rs. :) )

One of the items on my long-term TODO list is a completely federated
GitHub; I've been looking at other aspects of that, but federated
reviews/comments/etc seem critical to that as well.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 5/5] convert: add filter..process option

2016-07-29 Thread Lars Schneider

> On 29 Jul 2016, at 13:24, Jakub Narębski  wrote:
> 
> W dniu 2016-07-29 o 12:38, Lars Schneider pisze:
>> On 27 Jul 2016, at 11:41, Eric Wong  wrote:
>>> larsxschnei...@gmail.com wrote:
> 
 +static off_t multi_packet_read(struct strbuf *sb, const int fd, const 
 size_t size)
>>> 
>>> I'm no expert in C, but this might be const-correctness taken
>>> too far.  I think basing this on the read(2) prototype is less
>>> surprising:
>>> 
>>>  static ssize_t multi_packet_read(int fd, struct strbuf *sb, size_t size)
>> 
>> Hm... ok. I like `const` because I think it is usually easier to 
>> read/understand
>> functions that do not change their input variables. This way I can 
>> communicate
>> my intention to future people modifying this function!
> 
> Well, scalar types like `size_t` are always passed by value, so here `const`
> doesn't matter, and it makes line longer.  I think library functions do not
> use `const` for `size_t` parameters.
> 
> You are reading from the file descriptor `fd`, so it state would change.
> Using `const` feels a bit like lying.  Also, it is scalar type.

OK, since you are the second reviewer arguing against `const` I will remove it.


> 
> [...] 
>> I agree with your reordering of the parameters, though!
>> 
>> Speaking of coding style... convert.c is already big and gets only bigger 
>> with this patch (1720 lines). Would it make sense to add a new file 
>> "convert-pipe-protocol.c"
>> or something for my additions?
> 
> I wonder if it would be possible to enhance existing functions, instead
> of redoing them (at least in part) for per-command filter driver protocol.

I think I reused as much as possible.

Thanks,
Lars
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 5/5] convert: add filter..process option

2016-07-29 Thread Lars Schneider

> On 27 Jul 2016, at 11:41, Eric Wong  wrote:
> 
> larsxschnei...@gmail.com wrote:
>> +static off_t multi_packet_read(struct strbuf *sb, const int fd, const 
>> size_t size)
> 
> I'm no expert in C, but this might be const-correctness taken
> too far.  I think basing this on the read(2) prototype is less
> surprising:
> 
>   static ssize_t multi_packet_read(int fd, struct strbuf *sb, size_t size)

Hm... ok. I like `const` because I think it is usually easier to read/understand
functions that do not change their input variables. This way I can communicate
my intention to future people modifying this function!

If this is frowned upon in the Git community then I will add a comment to the
CodingGuidelines and remove the const :)

I agree with your reordering of the parameters, though!

Speaking of coding style... convert.c is already big and gets only bigger 
with this patch (1720 lines). Would it make sense to add a new file 
"convert-pipe-protocol.c"
or something for my additions?


> Also what Jeff said about off_t vs size_t, but my previous
> emails may have confused you w.r.t. off_t usage...
> 
>> +static int multi_packet_write(const char *src, size_t len, const int in, 
>> const int out)
> 
> Same comment about over const ints above.
> len can probably be off_t based on what is below; but you need
> to process the loop in ssize_t-friendly chunks.

I think I would prefer to keep it an size_t because that is the
type we get from Git initially. The code will be more clear in v3.


> 
>> +{
>> +int ret = 1;
>> +char header[4];
>> +char buffer[8192];
>> +off_t bytes_to_write;
> 
> What Jeff said, this should be ssize_t to match read(2) and xread

Agreed.


> 
>> +while (ret) {
>> +if (in >= 0) {
>> +bytes_to_write = xread(in, buffer, sizeof(buffer));
>> +if (bytes_to_write < 0)
>> +ret &= 0;
>> +src = buffer;
>> +} else {
>> +bytes_to_write = len > LARGE_PACKET_MAX - 4 ? 
>> LARGE_PACKET_MAX - 4 : len;
>> +len -= bytes_to_write;
>> +}
>> +if (!bytes_to_write)
>> +break;
> 
> The whole ret &= .. style error handling is hard-to-follow and
> here, a source of bugs.  I think the expected convention on
> hitting errors is:
> 
>   1) stop whatever you're doing
>   2) cleanup
>   3) propagate the error to callers
> 
> "goto" is an acceptable way of accomplishing this.
> 
> For example, byte_to_write may still be negative at this point
> (and interpreted as a really big number when cast to unsigned
> size_t) and src/buffer could be stack garbage.

I changed the implementation here so that the &= style
is not necessary anymore. However, I will look into "goto"
for the other areas!


>> +set_packet_header(header, bytes_to_write + 4);
>> +ret &= write_in_full(out, , sizeof(header)) == 
>> sizeof(header);
>> +ret &= write_in_full(out, src, bytes_to_write) == 
>> bytes_to_write;
>> +}
>> +ret &= write_in_full(out, "", 4) == 4;
>> +return ret;
>> +}
>> +
> 
>> +static int apply_protocol_filter(const char *path, const char *src, size_t 
>> len,
>> +int fd, struct strbuf *dst, 
>> const char *cmd,
>> +const char *filter_type)
>> +{
> 
> 
> 
>> +if (fd >= 0 && !src) {
>> +ret &= fstat(fd, _stat) != -1;
>> +len = file_stat.st_size;
> 
> Same truncation bug I noticed earlier; what I originally meant
> is the `len' arg probably ought to be off_t, here, not size_t.
> 32-bit x86 Linux systems have 32-bit size_t (unsigned), but
> large file support means off_t is 64-bits (signed).

OK. Would it be OK to keep size_t for this patch series?


> Also, is it worth continuing this function if fstat fails?

No :-)


>> +}
>> +
>> +sigchain_push(SIGPIPE, SIG_IGN);
>> +
>> +packet_write(process->in, "%s\n", filter_type);
>> +packet_write(process->in, "%s\n", path);
>> +packet_write(process->in, "%zu\n", len);
> 
> I'm not sure if "%zu" is portable since we don't do C99 (yet?)
> For 64-bit signed off_t, you can probably do:
> 
>   packet_write(process->in, "%"PRIuMAX"\n", (uintmax_t)len);
> 
> Since we don't have PRIiMAX or intmax_t, here, and a negative
> len would be a bug (probably from failed fstat) anyways.

OK. "%zu" is not used in the entire code base. I will go with
your suggestion!


>> +ret &= multi_packet_write(src, len, fd, process->in);
> 
> multi_packet_write will probably fail if fstat failed above...

Yes. The error handling is bogus... I thought bitwise "and" would
act the same way as logical "and" (a bit embarrassing to admit that...).


> 
>> +strbuf = packet_read_line(process->out, NULL);
> 
> And this may just block or timeout if multi_packet_write failed.

True, but unless there is 

[PATCH] Fix failing test t3700-add.sh

2016-07-29 Thread Ingo Brückl
At the time of the test xfoo1 already exists and is a link.
As a result, the check for file mode 100644 fails.

Create not yet existing file xfoo instead.

Signed-off-by: Ingo Brückl 
---
 t/t3700-add.sh | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index 4865304..aee61b9 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -342,12 +342,12 @@ test_expect_success 'git add --chmod=+x stages a 
non-executable file with +x' '
 '

 test_expect_success 'git add --chmod=-x stages an executable file with -x' '
-   echo foo >xfoo1 &&
-   chmod 755 xfoo1 &&
-   git add --chmod=-x xfoo1 &&
-   case "$(git ls-files --stage xfoo1)" in
-   100644" "*xfoo1) echo pass;;
-   *) echo fail; git ls-files --stage xfoo1; (exit 1);;
+   echo foo >xfoo &&
+   chmod 755 xfoo &&
+   git add --chmod=-x xfoo &&
+   case "$(git ls-files --stage xfoo)" in
+   100644" "*xfoo) echo pass;;
+   *) echo fail; git ls-files --stage xfoo; (exit 1);;
esac
 '

--
2.9.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ANNOUNCE] git-series: track changes to a patch series over time

2016-07-29 Thread Josh Triplett
I'd like to announce a project I've been working on for a while:

git-series provides a tool for managing patch series with git, tracking
the "history of history". git series tracks changes to the patch series
over time, including rebases and other non-fast-forwarding changes. git
series also tracks a cover letter for the patch series, formats the
series for email, and prepares pull requests.

This makes it easier to collaborate on a patch series, distribution
package, backport, or any other development process that includes
rebasing or non-fast-forward development.

A patch series typically goes through multiple iterations before
submission; the path from idea to RFC to [PATCHv12 1/8] includes many
invocations of git rebase -i. However, while Git tracks and organizes
commits quite well, it doesn't actually track changes to a patch series
at all, outside of the ephemeral reflog. This makes it a challenge to
collaborate on a patch series, distribution package, backport, or any
other development process that includes rebasing or non-fast-forward
development.

Typically, tracking the evolution of a patch series over time involves
moving part of the version control outside of git. You can move the
patch series from git into quilt or a distribution package, and then
version the patch files with git, losing the power of git's tools. Or,
you can keep the patch series in git, and version it via multiple named
branches; however, names like feature-v2, feature-v3-typofix, and
feature-v8-rebased-4.6-alice-fix sound like filenames from corporate
email, not modern version control. And either way, git doesn't track
your cover letter at all.

git-series tracks both a patch series and its evolution within the same
git repository. git-series works entirely with existing git features,
allowing git to push and pull a series to any git repository along with
other branches and tags. Each time you change the patch series, whether
fast-forwarding or not, you can "git series commit" a new version of the
patch series, complete with commit message.

You can rebase a patch series with "git series rebase -i", format it for
submission with "git series format", or send a "please pull" request with
"git series req".  git-series knows the base of your series, so you
don't need to count patches or find a commit hash to run rebase or
format.

If you're interested in trying git-series, see
https://github.com/git-series/git-series for installation instructions
and a "getting started" guide.

I've also documented the internal storage format of git-series at
https://github.com/git-series/git-series/blob/master/INTERNALS.md ,
including the details for how git-series ensures git can always reach,
push, and pull a series.

I'd welcome any feedback, whether on the interface and workflow, the
internals and collaboration, ideas on presenting diffs of patch series,
or anything else.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/5] Git filter protocol

2016-07-29 Thread Lars Schneider

> On 29 Jul 2016, at 09:40, Jakub Narębski  wrote:
> 
> W dniu 2016-07-28 o 15:29, Jeff King pisze:
>> On Thu, Jul 28, 2016 at 09:16:18AM +0200, Lars Schneider wrote:
>> 
>>> But Peff ($gmane/299902), Duy, and Eric, seemed to prefer the pkt-line
>>> solution (gmane is down - otherwise I would have given you the links).
>> 
>> FWIW, I think there are arguments for transmitting size + content
>> (namely, that it is simpler); the downside is that it doesn't allow
>> streaming.
> 
> And that it requires for the filter to know the size of its output
> upfront (which, as I wrote, might be easy to do based on size of input
> and data stored elsewhere, or might need generating whole output to
> know).
> 
> I don't know how parallel Git is, but if it is parallel enough,
> and other limits do not apply (limited amount of CPU cores, I/O limits),
> without streaming new filter protocol might be slower, unless startup
> time dominates (MS Windows?):
> 
> Current parallel:
> 
>   |   startup   | processing 1 |
>|  startup| processing 2  |
>   | startup |  processing 3 |
> |  startup  |  processing 4  |
> 
> Protocol v2:
> 
>   |  startup  | processing 1 | processing 2 | processing 3 | processing 4 |

Based on the current filter design the "single-shot" invocations are
not executed in parallel.


>> So I think there are two viable alternatives:
>> 
>>  1. Total size of data in ASCII decimal, newline, then that many bytes
>> of content.
>> 
>>  2. No size header, then a series of pkt-lines followed by a flush
>> packet.
> 
>3. Optional size header[2][3], then a series of pkt-lines followed
>   by a flush packet[4].
> 
> [2] Git should always provide size, because it is easy to do, and
>I think quite cheap (stored with blob, stored in index, or stat()
>on file away).  Filter can provide size if it is easy to calculate,
>or approximation of size / size hint[5] - it helps to avoid
>reallocation.

Agreed!


> [3] It is also a place where filter can pass error conditions that
>are known before starting processing a file.

I am not sure I understand what you mean. Can you think of an example?


> [4] On one hand you need to catch cases where real size is larger than
>size sent upfront, or smaller than size sent upfront; on the
>other hand it might be a place where to send warnings and errors...
>unless we utilize stderr of a process (but then there is a problem
>of deadlocking, I think).
> [5] I suggest
> 
>
>"approx" SPC 
>"unknown"
>"fail"

My current implementation supports only two cases. Either the filter
knows the size and sends it back. Or the filter doesn't know the size
and Git reads until the flush packet (your "unknown" case). "Approx" is 
probably hard to do and fail shouldn't be part of the size, no?

That being said a "fail" response is a very good idea! This allows
the filter to communicate to git that a non required filter process
failed. I will add that to the protocol. Thanks :) 

- Lars


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


⊙¤Hi

2016-07-29 Thread hi
hi
this is an electronics shop
bike,brand guitar,camera,tv,samsung product free shipping
www .slooone .com
if you do not want to receive our email,please reply to us, we will never send 
email to you

Re: Small trivial annoyance with the nice new builtin "git am"

2016-07-29 Thread Christian Couder
On Fri, Jul 29, 2016 at 2:32 AM, Linus Torvalds
 wrote:
> On Thu, Jul 28, 2016 at 5:21 PM, Jeff King  wrote:
>>
>> I do wonder if you would be happier giving each commit a "fake"
>> monotonically increasing time, so they are correctly ordered by commit
>> date.
>
> No, that would be nasty, partly for the corner case you mention, but
> partly because I just think it's wrong to try to lie about reality.
>
> The reason I noticed this in the first place was actually that I just
> wanted to take a look whether things had gotten slower or faster over
> time, and see how many patches per second I get from the patch-bombs
> Andrew sends me.
>
> So getting real time was what I was looking for.

Great!

> Also, before somebody asks: the reason git has always cached the
> "default time" string is because there's a reverse annoying thing,
> which is looking up time twice, and getting a difference of a second
> between author times and committer times just because of bad luck.
> That makes no sense either, and is actually why we have that
> "ident_default_date()" cache thing going on.
>
> So we do want to cache things for a single commit, it's just that for
> things like "git am" (or, like Junio wondered, "git rebase" - I didn't
> check) we probabyl just just flush the cache in between commits.

When Junio wrote:

> I have a suspicion that we would see more and more breakages like
> this in the near future, as there are folks trying to redo multi-commit
> commands in a single process.

he was maybe talking about my patch series to libify `git apply` and
use that in `git am` (which is itself used by `git rebase` when not in
interactive mode):

https://public-inbox.org/git/20160627182429.31550-1-chriscool%40tuxfamily.org/

that will likely mean more patches with the same real time anyway
especially with split-index on (see "Performance numbers" in that
email).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pack-objects: Use reachability bitmap index when generating non-stdout pack too

2016-07-29 Thread Kirill Smelkov
On Thu, Jul 28, 2016 at 02:18:29PM -0700, Junio C Hamano wrote:
> Kirill Smelkov  writes:
> 
> > I'm waiting so long for main patch to be at least queued to pu, that I'm
> > now a bit frustrated and ready to do something not related to main goal :)
> 
> Perhaps the first step would be to stop putting multiple patches in
> a single e-mail buried after a few pages of discussion.  I will not
> even find that there _are_ multiple patches in the message if I am
> not involved directly in the discussion, and the discussion is still
> ongoing, because it is likely that I'd skim just a few paragraphs at
> the top before going on to other messages.
> 
> I won't touch the message I am responding to, as your -- 8< -- cut
> mark does not even seem to be a reliable marker between patches
> (i.e.  I see something like this that is clearly not a message
> boundary:
> 
> than `git pack-objects file.pack`. Extracting erp5.git pack from
> lab.nexedi.com backup repository:
> 
>  8< 
> $ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack
> 
> real0m22.309s
> ...
> )

Ok, makes sense and my fault. I'm resending each patch as separate
message in reply to this mail.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/5] Git filter protocol

2016-07-29 Thread Jakub Narębski
W dniu 2016-07-28 o 15:29, Jeff King pisze:
> On Thu, Jul 28, 2016 at 09:16:18AM +0200, Lars Schneider wrote:
> 
>> But Peff ($gmane/299902), Duy, and Eric, seemed to prefer the pkt-line
>> solution (gmane is down - otherwise I would have given you the links).
> 
> FWIW, I think there are arguments for transmitting size + content
> (namely, that it is simpler); the downside is that it doesn't allow
> streaming.

And that it requires for the filter to know the size of its output
upfront (which, as I wrote, might be easy to do based on size of input
and data stored elsewhere, or might need generating whole output to
know).

I don't know how parallel Git is, but if it is parallel enough,
and other limits do not apply (limited amount of CPU cores, I/O limits),
without streaming new filter protocol might be slower, unless startup
time dominates (MS Windows?):

Current parallel:

   |   startup   | processing 1 |
|  startup| processing 2  |
   | startup |  processing 3 |
 |  startup  |  processing 4  |

Protocol v2:

   |  startup  | processing 1 | processing 2 | processing 3 | processing 4 |

> 
> So I think there are two viable alternatives:
> 
>   1. Total size of data in ASCII decimal, newline, then that many bytes
>  of content.
> 
>   2. No size header, then a series of pkt-lines followed by a flush
>  packet.

3. Optional size header[2][3], then a series of pkt-lines followed
   by a flush packet[4].

[2] Git should always provide size, because it is easy to do, and
I think quite cheap (stored with blob, stored in index, or stat()
on file away).  Filter can provide size if it is easy to calculate,
or approximation of size / size hint[5] - it helps to avoid
reallocation.
[3] It is also a place where filter can pass error conditions that
are known before starting processing a file.
[4] On one hand you need to catch cases where real size is larger than
size sent upfront, or smaller than size sent upfront; on the
other hand it might be a place where to send warnings and errors...
unless we utilize stderr of a process (but then there is a problem
of deadlocking, I think).
[5] I suggest


"approx" SPC 
"unknown"
"fail"

> And you should choose between the two based on whether it's more
> important to allow streaming, or more important to make the filter
> implementations simple[1].
> 
> Any solution that is in between those (like sending a size header and
> then using pktlines anyway) is sacrificing simplicity but not getting
> the streaming benefits.
> 
> -Peff
> 
> [1] I haven't thought hard enough about it to have a real opinion. My
> gut says to go with the streaming, just because we've had to
> retrofit streaming in other areas when dealing with blobs, so I
> think we'll end up there eventually. So choosing a simpler protocol
> like (1) would probably mean eventually implementing a next-version
> protocol that does (2), and having to support both.
> 
> PS Jakub asked for links, but gmane is down. Here are the relevant threads:
> 
>http://public-inbox.org/git/20160720134916.gb19...@sigill.intra.peff.net
> 
>
> http://public-inbox.org/git/20160722154900.19477-1-larsxschneider%40gmail.com/t/#u
> 

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] diff-highlight: add support for git log --graph output.

2016-07-29 Thread Jakub Narębski
W dniu 2016-07-28 o 18:27, Brian Henderson pisze:
> Hi, I've been working with Jeff King on this patch, but he encouraged me to
> email the list.
> 
> I recently learned about the diff-highlight tool and find it very helpful,
> however, I frequently use the --graph option to git log which breaks the tool.
> This patch looks to fix this.
> 
> I wanted to try and add some tests as well, but I was unsure what number to
> pick for the second digit. As there were limited tests in the contrib 
> directory
> (only t93xx and one t7900) I just chose the next number in the 9xxx range.
> Please let me know if I need to change it.
> 
> I'm also not super happy about my test case for the graph option. If anyone 
> has
> any better ideas, please let me know!
> 
> Brian Henderson (3):
>   diff-highlight: add some tests.
>   diff-highlight: add failing test for handling --graph output.
>   diff-highlight: add support for --graph output.

Could you please follow SubmittingPatches, and send patches in separate emails,
inline, with those emails being replies to cover letter (like `git format-patch`
and `git send-email` would do), and not as attachments to cover letter?

With email as it stands it is really difficult to review individual
patches.

> 
>  contrib/diff-highlight/Makefile  |   5 +
>  contrib/diff-highlight/diff-highlight|  13 +--
>  contrib/diff-highlight/t/Makefile|  19 
>  contrib/diff-highlight/t/t9400-diff-highlight.sh |  76 
>  contrib/diff-highlight/t/test-diff-highlight.sh  | 111 
> +++
>  5 files changed, 218 insertions(+), 6 deletions(-)
>  create mode 100644 contrib/diff-highlight/Makefile
>  create mode 100644 contrib/diff-highlight/t/Makefile
>  create mode 100644 contrib/diff-highlight/t/t9400-diff-highlight.sh
>  create mode 100644 contrib/diff-highlight/t/test-diff-highlight.sh
> 

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 2/2] pack-objects: Teach it to use reachability bitmap index when generating non-stdout pack too

2016-07-29 Thread Kirill Smelkov
Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects)
if a repository has bitmap index, pack-objects can nicely speedup
"Counting objects" graph traversal phase. That however was done only for
case when resultant pack is sent to stdout, not written into a file.

The reason here is for on-disk repack by default we want:

- to produce good pack (with bitmap index not-yet-packed objects are
  emitted to pack in suboptimal order).

- to use more robust pack-generation codepath (avoiding possible
  bugs in bitmap code and possible bitmap index corruption).

Jeff Kind further explains:

The reason for this split is that pack-objects tries to determine how
"careful" it should be based on whether we are packing to disk or to
stdout. Packing to disk implies "git repack", and that we will likely
delete the old packs after finishing. We want to be more careful (so
as not to carry forward a corruption, and to generate a more optimal
pack), and we presumably run less frequently and can afford extra CPU.
Whereas packing to stdout implies serving a remote via "git fetch" or
"git push". This happens more frequently (e.g., a server handling many
fetching clients), and we assume the receiving end takes more
responsibility for verifying the data.

But this isn't always the case. One might want to generate on-disk
packfiles for a specialized object transfer. Just using "--stdout" and
writing to a file is not optimal, as it will not generate the matching
pack index.

So it would be useful to have some way of overriding this heuristic:
to tell pack-objects that even though it should generate on-disk
files, it is still OK to use the reachability bitmaps to do the
traversal.

So we can teach pack-objects to use bitmap index for initial object
counting phase when generating resultant pack file too:

- if we care it is not activated under git-repack:

  See above about repack robustness and not forward-carrying corruption.

- if we know bitmap index generation is not enabled for resultant pack:

  Current code has singleton bitmap_git so cannot work simultaneously
  with two bitmap indices.

  We also want to avoid (at least with current implementation)
  generating bitmaps off of bitmaps. The reason here is: when generating
  a pack, not-yet-packed objects will be emitted into pack in
  suboptimal order and added to tail of the bitmap as "extended entries".
  When the resultant pack + some new objects in associated repository
  are in turn used to generate another pack with bitmap, the situation
  repeats: new objects are again not emitted optimally and just added to
  bitmap tail - not in recency order.

  So the pack badness can grow over time when at each step we have
  bitmapped pack + some other objects. That's why we want to avoid
  generating bitmaps off of bitmaps, not to let pack badness grow.

- if we keep pack reuse enabled still only for "send-to-stdout" case:

  Because on pack reuse raw entries are directly written out to destination
  pack by write_reused_pack() bypassing needed for pack index generation
  bookkeeping done by regular codepath in write_one() and friends.

This way all git tests pass, and for pack-objects -> file we get nice
speedup:

erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup
repository managed by git-backup[2] via

time echo 0186ac99 | git pack-objects --revs erp5pack

before:  37.2s
after:   26.2s

And for `git repack -adb` packed git.git

time echo 5c589a73 | git pack-objects --revs gitpack

before:   7.1s
after:3.6s

i.e. it can be 30% - 50% speedup for pack extraction.

git-backup extracts many packs on repositories restoration. That was my
initial motivation for the patch.

[1] https://lab.nexedi.com/nexedi/erp5
[2] https://lab.nexedi.com/kirr/git-backup

NOTE

Jeff also suggests that pack.useBitmaps was probably a mistake to
introduce originally. This way we are not adding another config point,
but instead just always default to-file pack-objects not to use bitmap
index: Tools which need to generate on-disk packs with using bitmap, can
pass --use-bitmap-index explicitly. And git-repack does never pass
--use-bitmap-index, so this way we can be sure regular on-disk repacking
remains robust.

NOTE2

`git pack-objects --stdout >file.pack` + `git index-pack file.pack` is much 
slower
than `git pack-objects file.pack`. Extracting erp5.git pack from
lab.nexedi.com backup repository:

$ time echo 0186ac99 | git pack-objects --stdout --revs 
>erp5pack-stdout.pack

real0m22.309s
user0m21.148s
sys 0m0.932s

$ time git index-pack erp5pack-stdout.pack

real0m50.873s   <-- more than 2 times slower than time to generate pack 
itself!
user0m49.300s
sys 0m1.360s

So the time for

`pack-object --stdout >file.pack` + `index-pack file.pack`  is  72s,

while

`pack-objects file.pack` which does both pack and index is  

[PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental

2016-07-29 Thread Kirill Smelkov
Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there
are two codepaths in pack-objects: with & without using bitmap
reachability index.

However add_object_entry_from_bitmap(), despite its non-bitmapped
counterpart add_object_entry(), in no way does check for whether --local
or --honor-pack-keep or --incremental should be respected. In
non-bitmapped codepath this is handled in want_object_in_pack(), but
bitmapped codepath has simply no such checking at all.

The bitmapped codepath however was allowing to pass in all those options
and with bitmap indices still being used under such conditions -
potentially giving wrong output (e.g. including objects from non-local or
.keep'ed pack).

We can easily fix this by noting the following: when an object comes to
add_object_entry_from_bitmap() it can come for two reasons:

1. entries coming from main pack covered by bitmap index, and
2. object coming from, possibly alternate, loose or other packs.

For "2" we always have pack not yet found by bitmap traversal code, and
thus we can simply reuse non-bitmapped want_object_in_pack() to find in
which pack an object lives and also for taking omitting decision.

For "1" we always have pack already found by bitmap traversal code and we
only need to check that pack for same criteria used in
want_object_in_pack() for found_pack.

Suggested-by: Junio C Hamano 
Discussed-with: Jeff King 
---
 builtin/pack-objects.c  |  39 +++
 t/t5310-pack-bitmaps.sh | 100 
 2 files changed, 139 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a2f8cfd..34b3019 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -987,6 +987,42 @@ static int want_object_in_pack(const unsigned char *sha1,
return 1;
 }
 
+/* Like want_object_in_pack() but for objects coming from-under bitmapped 
traversal */
+static int want_object_in_pack_bitmap(const unsigned char *sha1,
+ struct packed_git **found_pack,
+ off_t *found_offset)
+{
+   struct packed_git *p = *found_pack;
+
+   /*
+* There are two types of requests coming here:
+* 1. entries coming from main pack covered by bitmap index, and
+* 2. object coming from, possibly alternate, loose or other packs.
+*
+* For "1" we always have *found_pack != NULL passed here from
+* traverse_bitmap_commit_list(). (*found_pack is bitmap_git.pack
+* actually).
+*
+* For "2" we always have *found_pack == NULL passed here from
+* traverse_bitmap_commit_list() - since this is the way bitmap
+* traversal passes here "extended" bitmap entries.
+*/
+
+   /* objects not covered by bitmap */
+   if (!p)
+   return want_object_in_pack(sha1, 0, found_pack, found_offset);
+
+   /* objects covered by bitmap - we only have to check p wrt local and 
.keep */
+   if (incremental)
+   return 0;
+   if (local && !p->pack_local)
+   return 0;
+   if (ignore_packed_keep && p->pack_local && p->pack_keep)
+   return 0;
+
+   return 1;
+}
+
 static void create_object_entry(const unsigned char *sha1,
enum object_type type,
uint32_t hash,
@@ -1055,6 +1091,9 @@ static int add_object_entry_from_bitmap(const unsigned 
char *sha1,
if (have_duplicate_entry(sha1, 0, _pos))
return 0;
 
+   if (!want_object_in_pack_bitmap(sha1, , ))
+   return 0;
+
create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, 
offset);
 
display_progress(progress_state, nr_result);
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 3893afd..a76f6ca 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -118,6 +118,88 @@ test_expect_success 'incremental repack can disable 
bitmaps' '
git repack -d --no-write-bitmap-index
 '
 
+test_expect_success 'pack-objects respects --local (non-local loose)' '
+   mkdir -p alt_objects/pack &&
+   echo $(pwd)/alt_objects > .git/objects/info/alternates &&
+   echo content1 > file1 &&
+   objsha1=$(GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w file1) &&
+   git add file1 &&
+   test_tick &&
+   git commit -m commit_file1 &&
+   echo HEAD | \
+   git pack-objects --local --stdout --revs >1.pack &&
+   git index-pack 1.pack &&
+   git verify-pack -v 1.pack >1.objects &&
+   if egrep "^$objsha1" 1.objects; then
+   echo "Non-local object present in pack generated with --local: 
$objsha1"
+   return 1
+   fi
+'
+
+test_expect_success 'pack-objects respects --honor-pack-keep (local 
non-bitmapped pack)' '
+   echo content2 > file2 &&
+   objsha2=$(git hash-object -w 

Re: [PATCH v2 5/5] convert: add filter..process option

2016-07-29 Thread Lars Schneider

> On 28 Jul 2016, at 01:31, Jakub Narębski  wrote:
> 
> W dniu 2016-07-27 o 02:06, larsxschnei...@gmail.com pisze:
>> From: Lars Schneider 
>> 
>> Git's clean/smudge mechanism invokes an external filter process for every
>> single blob that is affected by a filter. If Git filters a lot of blobs
>> then the startup time of the external filter processes can become a
>> significant part of the overall Git execution time.
> 
> It is not strictly necessary... but do we have any benchmarks for this,
> or is it just the feeling?  That is, in what situations Git may filter
> a large number of files (initial checkout? initial add?, switching
> to unrelated branch? getting large files from LFS solution?, and when
> startup time might become significant part of execution time (MS Windows?
> fast filters?)?

All the operations you mentioned are slow because they all cause filter
process invocations. I benchmarked the new filter protocol and it is 
almost 70x faster when switching branches on my local machine (i7, SSD, 
OS X) with a test repository containing 12,000 files that need to be 
filtered. 
Details here: https://github.com/github/git-lfs/pull/1382

Based on my previous experience with Git filter invocations I expect even
more dramatic results on Windows (I will benchmark this, too, as soon as
the list agrees on this approach).


>> This patch adds the filter..process string option which, if used,
> 
> String option... what are possible values?  What happens if you use
> value that is not recognized by Git (it is "if used", isn't it)?  That's
> not obvious from the commit message (though it might be from the docs).

Then the process invocation will fail in the same way current filter
process invocations fail. If "filter..required" is set then
Git will fail, otherwise not. 


> What is missing is the description that it is set to a command, and
> how it interacts with `clean` and `smudge` options.

Right, I'll add that! I implemented it in a way that the 
"filter..clean" 
and "filter..smudge" take presence over "filter..process".

This allows the use of a `single-shot` clean filter and a `long-running` 
smudge as suggested by Junio in the v1 discussion (no ref, gmane down).


>> keeps the external filter process running and processes all blobs with
>> the following packet format (pkt-line) based protocol over standard input
>> and standard output.
>> 
>> Git starts the filter on first usage and expects a welcome
>> message, protocol version number, and filter capabilities
>> seperated by spaces:
> 
> s/seperated/separated/

Thanks!


> Is there any handling of misconfigured one-shot filters, or would
> they still hang the execution of a Git command?

They would still hang. Would it be sufficient to mention that in the
docs?


>> 
>> packet:  git< git-filter-protocol
>> packet:  git< version 2
>> packet:  git< clean smudge
> 
> Wouldn't "capabilities clean smudge" be better?  Or is it the
> "clean smudge" proposal easier to handle?

Good suggestion! This is easy to handle and I think it mimics
the pack-protocol a bit more closely.


>> 
>> Supported filter capabilities are "clean" and "smudge".
>> 
>> Afterwards Git sends a command (e.g. "smudge" or "clean" - based
>> on the supported capabilities), the filename, the content size as
>> ASCII number in bytes, and the content in packet format with a
>> flush packet at the end:
>> 
>> packet:  git> smudge
>> packet:  git> testfile.dat
> 
> And here we don't have any problems with files containing embedded
> newlines etc.  Also Git should not be sending invalid file names.
> The question remains: is it absolute file path, or basename?

It sends a path absolute in context of the Git repo (e.g. subdir/testfile.dat).
I'll add that to the docs and I'll add a test case to demonstrate it.

> 
>> packet:  git> 7
>> packet:  git> CONTENT
> 
> Can Git send file contents using more than one packet?  I think
> it should be stated upfront.

OK


>> packet:  git> 
>> 
> 
> Why we need to send content size upfront?  Well, it is not a problem
> for Git, but (as I wrote in reply to the cover letter for this
> series) might be a problem for filter scripts.

I think sending it upfront is nice for buffer allocations of big files
and it doesn't cost us anything to do it.


>> The filter is expected to respond with the result content size as
>> ASCII number in bytes and the result content in packet format with
>> a flush packet at the end:
>> 
>> packet:  git< 57
> 
> This is not neccessary (and might be hard for scripts to do) if
> pkt-line protocol is used.
> 
> In short: I think pkt-line is not worth the complication on
> the Git side and on the filter size, unless it is used for
> streaming, or at least filter not having to calculate output
> 

git bisect for reachable commits only

2016-07-29 Thread Oleg Taranenko
Hi all,


I just found an interesting post how git bisect can trouble humans.
Full story is here
https://groups.google.com/forum/#!topic/git-users/v3__t42qbKE


My suggestion to fix it (ditto)


What I suggest change logic of bisecting to something like


  git config bisect.reachable true


This behaviour will reduce bisecting only to reachable commits between
bad/good points and will never goes away.

Maybe if commit will be found at merge commit git bisect can suggest
open another round trip to looking into not reachable way, but anyway
it should be explicit confirmed by user.


BTW, if bad commit is merge point with more than 2 parents, how git
bisect will find "first wrong commit" at the current state?


Thanks for the attention,


Oleg
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Issue with global config defaults "user.useConfigOnly = true" + "pull.rebase = preserve" - "user.email"

2016-07-29 Thread Dakota Hawkins
Hello,

I have a question which may be a bug (I'm a bit skeptical), but here goes:

In my global .gitconfig, I have "user.useConfigOnly = true" and
user.email isn't set there (I prefer to be forced to set it on a
per-repo basis, as I use different emails for work and personal
repos). I ALSO have "pull.rebase = preserve" set.

An example of the problem I have is with tools like golang (I filed an
issue there, they closed it and suggested the problem is with git or
my config: https://github.com/golang/go/issues/16516#issuecomment-235800085)
that use git to pull in package repos without any real user
interaction. When something like that runs a git pull for me (to
update a package repo) my global config makes it try to rebase, which
fails because git doesn't know who I am.

With golang, at least, there's no help setting-up all of its cloned
repos with my user information in local configs, it's a manual step
and I have to repeat it anytime anything decided to clone a new repo
to use as a package, so it's at least persistently frustrating. Hence
the golang bug, which I think is still a bug because they should
handle this better.

In those cases specifically, I never have local commits that differ
from the remote, so a "pull --ff-only" should leave me in the same
state as a "pull --rebase".

Is this a case of rebase trying to make sure it has enough information
for me to be a committer before knowing whether I even need to rewrite
any commits, and could/should that be avoided? Alternatively (or also)
could/should rebase detect that a fast-forward is possible and prefer
to do that instead?

I would not be surprised to learn that there's something I don't know
or haven't considered that explains why those things aren't really
do-able, but if that's the case I'd be interested to know what they
are (I'd love to go back to the golang people and tell them to re-open
my bug because there's nothing wrong with git).

Thanks for your time, and please let me know if you'd like any
additional information.

- Dakota Hawkins
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] git-series: track changes to a patch series over time

2016-07-29 Thread Josh Triplett
On Fri, Jul 29, 2016 at 01:44:44PM +0100, Richard Ipsum wrote:
> On Fri, Jul 29, 2016 at 04:04:26AM -0700, Josh Triplett wrote:
> > I hope to use git notes with git-series in the future, by putting
> > another gitlink under the git-series for notes related to the series.
> > I'd intended that for more persistent notes; putting them in the series
> > solves some of the problems related to notes refs, pushing/pulling, and
> > collaboration.  Using notes for review comments makes sense as well,
> > whether in a series or in a separate ref.
> 
> Sounds interesting, can you explain how this works in more detail?

The tree within a git-series commit includes a blob "cover" for the
cover letter, a gitlink "base" for the base commit, and a gitlink
"series" for the top of the series.  I could add a gitlink "notes",
which acts like a notes ref; then, each version of the series would have
its own notes ref.  As with the series, git-series would track the
"history of history"; since git-notes themselves use git history to
store a set of notes, git-series would store the history of the notes.
So if you add, remove, or change a note, git-series would track that as
a change to the notes ref.  If you merge/rebase/etc the notes ref to
merge notes, git-series would track that too.  A different series would
have a different set of notes, so you wouldn't be limited to
one notes ref per repository.

This doesn't solve the problem of merging notes, but it *does* mean you
have a full history of the changes to notes, not just the notes
themselves.

Something similar might work for the Gerrit notesdb.

> > > I've been considering taking the perl-notedb prototype and writing
> > > a C library for it with bindings for other languages (i.e. Rust).
> > 
> > A C library based on libgit2 seems like a good idea; ideally the
> > bindings could interoperate with git2-rs.  (Alternatively, Rust can
> > *export* a C interface, so you could write directly with git2-rs. :) )
> 
> Certainly a fair alternative, though it may arguably be safer to write
> the C and export to other languages, as cool as Rust looks it's not
> established the way C is, so may be a slightly riskier foundation,
> in my view.

I was mostly joking there.  Rust makes that potentially reasonable,
unlike most languages that can consume but not easily provide a C API,
but that doesn't make it the ideal solution quite yet. :)

> And ofcourse in C we have native access to libgit2.

Right.

> > One of the items on my long-term TODO list is a completely federated
> > GitHub; I've been looking at other aspects of that, but federated
> > reviews/comments/etc seem critical to that as well.
>
> I agree.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 32/41] environment: add set_index_file()

2016-07-29 Thread Christian Couder
On Wed, Jul 27, 2016 at 5:14 PM, Duy Nguyen  wrote:
> On Tue, Jul 26, 2016 at 9:28 PM, Junio C Hamano  wrote:
>> Christian Couder  writes:
>>
>>> Introduce set_index_file() to be able to temporarily change the index file.
>>>
>>> It should be used like this:
>>>
>>> /* Save current index file */
>>> old_index_file = get_index_file();
>>> set_index_file((char *)tmp_index_file);
>>>
>>> /* Do stuff that will use tmp_index_file as the index file */
>>> ...
>>>
>>> /* When finished reset the index file */
>>> set_index_file(old_index_file);
>>
>> WHY is glaringly missing.
>
> It's used in a4aaa23 (builtin/am: use apply api in run_apply() -
> 2016-06-27) as a straight conversion of "GIT_INDEX_FILE=xxx git apply"
> .

Yeah, I would add something like the following to clarify the WHY:

"It is intended to be used by builtins commands to temporarily change the
index file used by libified code.

This is useful when libified code uses the global index, but a builtin
command wants another index file to be used instead."

>> Doesn't calling this have a global effect that is flowned upon when
>> used by a library-ish function?  Who is the expected caller in this
>> series that wants to "libify" a part of the system?

The expected caller is a builtin, like "builtin/am.c".

> I agree we should avoid this. There's a bunch of cache_name_pos() (and
> even read_cache()) in the libified apply.c, those will need to be
> converted  to take struct index_state* directly (read_index_from or
> index_name_pos).

There is a lot of other "libified" code that is calling these functions:

$ git grep -l cache_name_pos -- '*.c' | grep -v builtin
apply.c
diff.c
dir.c
merge-recursive.c
pathspec.c
rerere.c
sha1_name.c
submodule.c
wt-status.c

$ git grep -l read_cache -- '*.c' | grep -v builtin | egrep -v '^t/'
apply.c
check-racy.c
diff.c
dir.c
merge-recursive.c
merge.c
read-cache.c
rerere.c
revision.c
sequencer.c
sha1_name.c
submodule.c

and some of that code is perhaps directly and indirectly called by the
libified apply code.

Also I am not sure that read_cache() and cache_name_pos() are the only
functions that should be changed.

So it looks like it is a very big and different project to make the
current libified code be explicit about which index it is using.
And by the way perhaps this other project should just remove the
"the_index" global altogether.

> But at least read-cache.c has supported multiple
> indexes for a long time.

This is great, but it is unfortunately not the only lib file involved.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [[PATCH v2] 2/4] patch-ids: replace the seen indicator with a commit pointer

2016-07-29 Thread Junio C Hamano
Kevin Willford  writes:

> From: Kevin Willford 
>
> The cherry_pick_list was looping through the original side checking the
> seen indicator and setting the cherry_flag on the commit.  If we save
> off the commit in the patch_id we can set the cherry_flag on the correct
> commit when running through the other side when a patch_id match is found.
>
> Signed-off-by: Kevin Willford 
> ---
>  patch-ids.c |  1 +
>  patch-ids.h |  2 +-
>  revision.c  | 18 +++---
>  3 files changed, 5 insertions(+), 16 deletions(-)

We lose the final loop to fix up the shorter side, because the
second loop now marks both sides.

And as a side effect, we do not use commit->util which is a scarce
shared resource (aka historical API wart).

Makes sense.  Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [[PATCH v2] 0/4] Use header data patch ids for rebase to avoid loading file content

2016-07-29 Thread Junio C Hamano
Kevin Willford  writes:

> This patch series is to remove the hand rolled hashmap in the patch_ids
> and use the hashmap.h implementation.  It also introduces the idea of having
> a header only patch id so that the contents of the files do not have to be
> loaded in order to determine if two commits are different.

> Subject: Re: [[PATCH v2] 0/4] Use header data patch ids for rebase to avoid 
> loading file content

Did you do "format-patch --subject-prefix='[PATCH v2]'" or something
like that?  When applied, that would result in a commit title like
this:

1/4] patch-ids: stop using a ...

because we stop at the first ']', and we do not bother to count up
to "matching bracket".

git format-patch --subject-prefix='PATCH v2'

or even better

git format-patch -v2

would have been more appropriate.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [[PATCH v2] 1/4] patch-ids: stop using a hand-rolled hashmap implementation

2016-07-29 Thread Junio C Hamano
Kevin Willford  writes:

> From: Kevin Willford 
>
> This change will use the hashmap from the hashmap.h to keep track of the
> patch_ids that have been encountered instead of using an internal
> implementation.  This simplifies the implementation of the patch ids.
>
> Signed-off-by: Kevin Willford 
> ---
>  patch-ids.c | 86 
> +
>  patch-ids.h |  7 +++--
>  2 files changed, 32 insertions(+), 61 deletions(-)

The patch text itself is almost unreadble because of a lot of
verbose code it had to carry before this change, and the removal of
that unreadable code of course is the point of this very welcome
clean-up ;-).  The resulting code is very readable.

>  struct patch_id *has_commit_patch_id(struct commit *commit,
>struct patch_ids *ids)
>  {
> - return add_commit(commit, ids, 1);
> + struct patch_id patch;
> +
> + memset(, 0, sizeof(patch));
> + if (init_patch_id_entry(, commit, ids))
> + return NULL;
> + return hashmap_get(>patches, , NULL);
>  }
>  
>  struct patch_id *add_commit_patch_id(struct commit *commit,
>struct patch_ids *ids)
>  {
> - return add_commit(commit, ids, 0);
> + struct patch_id *key = xcalloc(1, sizeof(*key));
> +
> + if (init_patch_id_entry(key, commit, ids)) {
> + free(key);
> + return NULL;
> + }

This is a tangent, but this made me wonder if it is safe to simply
free(3) the result of calling hashmap_entry_init() which is called
in init_patch_id_entry().  It would obviously become a resource
leak, if a hashmap_entry (which the api documentation says is "an
opaque structure") holds any allocated resource.

The fact that hashmap_entry_init() is there but there is no
corresponding hashmap_entry_clear() hints that there is nothing to
be worried about and I can see from the implementation of
hashmap_entry_init() that no extra resource is held inside, but an
API user should not have to guess.  We may want to do one of the two
things:

 * document that an embedded hashmap_entry does not hold any
   resource that need to be released and it is safe to free the user
   structure that embeds one; or

 * implement hashmap_entry_clear() that currently is a no-op.

If we anticipate that the hashmap implementation may gain more
fields in this "opaque" structure, the latter might be a more
future-proof approach, as all the callers of hashmap_entry_init()
would already be calling hashmap_entry_clear() to clean it up when
such a change to the hashmap implementation happens.  On the other
hand, a caller that does not call hashmap_entry_clear() would not be
noticed by anybody as leaking resources until such a change happens,
so the future-proofing may not have much practical value (iow, the
existing callers of _init() would need to be audited anyway to make
sure they also call _clear()).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 32/41] environment: add set_index_file()

2016-07-29 Thread Duy Nguyen
On Fri, Jul 29, 2016 at 4:21 PM, Christian Couder
 wrote:
>> I agree we should avoid this. There's a bunch of cache_name_pos() (and
>> even read_cache()) in the libified apply.c, those will need to be
>> converted  to take struct index_state* directly (read_index_from or
>> index_name_pos).
>
> There is a lot of other "libified" code that is calling these functions:
>
> $ git grep -l cache_name_pos -- '*.c' | grep -v builtin
> apply.c
> diff.c
> dir.c
> merge-recursive.c
> pathspec.c
> rerere.c
> sha1_name.c
> submodule.c
> wt-status.c

Irrelevant?

> $ git grep -l read_cache -- '*.c' | grep -v builtin | egrep -v '^t/'
> apply.c
> check-racy.c
> diff.c
> dir.c
> merge-recursive.c
> merge.c
> read-cache.c
> rerere.c
> revision.c
> sequencer.c
> sha1_name.c
> submodule.c
>
> and some of that code is perhaps directly and indirectly called by the
> libified apply code.

Yeah. If the libification movement is going strong, we can start
converting and at some point should be able to define
NO_THE_INDEX_COMPATIBILITY_MACROS globally (and avoid the_index along
the way)

Without that, there is a risk. I looked at 'nm apply.o | grep ' [Uu]
'' and I don't see any external functions that would potentially
access the index, except ll_merge. Again there's a good chance I may
have missed something.

> So it looks like it is a very big and different project to make the
> current libified code be explicit about which index it is using.
> And by the way perhaps this other project should just remove the
> "the_index" global altogether.

This is probably the way to go. But it's the boring sort of work that
nobody wants to do :(
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] diff-highlight: add support for git log --graph output.

2016-07-29 Thread Brian Henderson
(resending as thread instead of attachments)

Hi, I've been working with Jeff King on this patch, but he encouraged me to
email the list.

I recently learned about the diff-highlight tool, and find it very helpful,
however, I frequently use the --graph option to git log which breaks the tool.
This patch looks to fix this.

I wanted to try and add some tests as well, but I was unsure what number to
pick for the second digit. As there were limited tests in the contrib directory
(only t93xx and one t7900) I just chose the next number in the 9xxx range.
Please let me know if I need to change it.

I'm also not super happy about my test case for the graph option. If anyone has
any better ideas, please let me know!

Brian Henderson (3):
  diff-highlight: add some tests.
  diff-highlight: add failing test for handling --graph output.
  diff-highlight: add support for --graph output.

 contrib/diff-highlight/Makefile  |   5 +
 contrib/diff-highlight/diff-highlight|  13 +--
 contrib/diff-highlight/t/Makefile|  19 
 contrib/diff-highlight/t/t9400-diff-highlight.sh |  76 
 contrib/diff-highlight/t/test-diff-highlight.sh  | 111 +++
 5 files changed, 218 insertions(+), 6 deletions(-)
 create mode 100644 contrib/diff-highlight/Makefile
 create mode 100644 contrib/diff-highlight/t/Makefile
 create mode 100644 contrib/diff-highlight/t/t9400-diff-highlight.sh
 create mode 100644 contrib/diff-highlight/t/test-diff-highlight.sh

-- 
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC 0/7] Add possibility to clone specific subdirectories

2016-07-29 Thread Duy Nguyen
On Thu, Jul 28, 2016 at 10:33 PM, Junio C Hamano  wrote:
> Duy Nguyen  writes:
>
>>> 4. Fsck complains about missing blobs. Should be fairly easy to fix.
>>
>> Not really. You'll have to associate path information with blobs
>> before you decide that a blob should exist or not.
>
> Also the same blob or the tree can exist both inside and outside the
> narrowed area, as people reorganize their trees all the time.  I am
> not quite convinced a path-based approach (either yours or Robin's)
> is workable in the longer term.

I think it should be ok. What I meant was when we travel the trees to
find connected blobs, if a tree points to paths outside the narrowed
area, we do not add those blobs to our fsck list. Trees inside the
narrow area work as usually so those shared blobs are added to fsck
list anyway. Object islands are going to be problem (because we can't
assign paths to them)...
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] gitweb: escape link body in format_ref_marker

2016-07-29 Thread Andreas Brauchli
Fix a case where an html link can be generated from unescaped input
resulting in invalid strict xhtml or potentially injected code.

An overview of a repo with a tag "1.0.0&0.0.1" would previously result
in an unescaped amperstand in the link body.

Signed-off-by: Andreas Brauchli 
---
 gitweb/gitweb.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 2fddf75..33d701d 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2090,7 +2090,7 @@ sub format_ref_marker {
-href => href(
action=>$dest_action,
hash=>$dest
-   )}, $name);
+   )}, esc_html($name));

$markers .= " " .
$link . "";
-- 
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 7/7] pack-objects: use mru list when iterating over packs

2016-07-29 Thread Junio C Hamano
Jeff King  writes:

> On Fri, Jul 29, 2016 at 12:15:24AM -0400, Jeff King wrote:
>
> But because this series switches the order of pack-lookup between
> objects, it is possible for us to find a `B` which is a delta against
> `A` in one pack, and then another copy of `A` which is a delta against
> another copy of `B` from another pack. We add both of the deltas to our
> packing list, but at write time when we try to write out all of the
> bases for `A`, we realize that whoops, we are recursing infinitely.
>
> As it turns out, Git actually handles this pretty well! Upon noticing
> the recursion, it breaks the delta chain and writes out one of the
> objects as a full base. This is due to Junio's f63c79d (pack-object:
> tolerate broken packs that have duplicated objects, 2011-11-16), though
> I think we later decided that duplicated objects were simply insane.
>
> So one option is to simply silence the warning, because the resulting
> pack is perfectly fine.

Thanks for a great analysis.

My gut feeling is that keeping the warning is preferred if possible,
because f63c79db (pack-object: tolerate broken packs that have
duplicated objects, 2011-11-16) was made as the last ditch effort to
warn about the presence of the problem in the delta-base selection
code without harming the users.

> So it's possible that the resulting pack
> is not as small as it could be (i.e., we break the chain with a base
> object, but it's possible if we looked that we could have broken the
> chain by making a delta against an existing base object). So I wonder if
> it's possible to detect this case earlier, during the "can we reuse this
> delta" bits of check_object().

I'd let the issue simmer in my mind a bit for now, as I do not
think of a neat trick offhand.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/5] Git filter protocol

2016-07-29 Thread Jeff King
On Fri, Jul 29, 2016 at 10:14:17AM +0200, Lars Schneider wrote:

> My current implementation supports only two cases. Either the filter
> knows the size and sends it back. Or the filter doesn't know the size
> and Git reads until the flush packet (your "unknown" case). "Approx" is 
> probably hard to do and fail shouldn't be part of the size, no?

Ah, OK, I missed that you could handle both cases. I think that is a
reasonable approach. It means the filter has to bother with pkt-lines,
but beyond that, it can choose the simple or streaming approach as
appropriate.

> That being said a "fail" response is a very good idea! This allows
> the filter to communicate to git that a non required filter process
> failed. I will add that to the protocol. Thanks :) 

Maybe just send "ok ", "ok -1" (for streaming), or "fail "
followed by the content? That is similar to other Git protocols, though
I am not sure they are good models for sanity or extensibility. :)

I don't know if you would want to leave room for other "headers" in the
response, but you could also do something more HTTP-like, with a status
code, and arbitrary headers. And presumably git would just ignore
headers it doesn't know about. I think that's what Jakub's example was
leaning towards. I'm just not sure what other headers are really useful,
but it does leave room for extensibility.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Small trivial annoyance with the nice new builtin "git am"

2016-07-29 Thread Jeff King
On Thu, Jul 28, 2016 at 05:37:08PM -0700, Linus Torvalds wrote:

> > and then to sprinkle calls liberally through builtin-ified programs when
> > they move from one unit of work to the next.
> 
> Maybe we can just add it to the end of commit_tree_extended(), and
> just say "the cache is reset between commits".
> 
> That way there is no sprinking in random places.

Hmm, yeah, that might work. As you mentioned, there are cases where we
really do want the timestamps to match (especially between author and
committer). So we would not want this reset to kick in where callers
would not want it.

So I'm trying to play devil's advocate and think of a case where
somebody would not want the time reset after creating a commit.

One obvious impact would be reflog entries, since we would reset the
time between the object creation and the ref write (so your reflog entry
would sometimes be a second or more after the commit time it writes).
I'm not sure how much anybody _cares_ about that; they're much less
intimate than author/committer times.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/3] Simplify test t3700-add.sh

2016-07-29 Thread Ingo Brückl
Utilize the test_mode_in_index helper function.

Signed-off-by: Ingo Brückl 
---
 t/t3700-add.sh | 30 ++
 1 file changed, 6 insertions(+), 24 deletions(-)

diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index c08ec9e..0e21a52 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -25,18 +25,12 @@ test_expect_success \
 echo foo >xfoo1 &&
 chmod 755 xfoo1 &&
 git add xfoo1 &&
-case "$(git ls-files --stage xfoo1)" in
-100644" "*xfoo1) echo pass;;
-*) echo fail; git ls-files --stage xfoo1; (exit 1);;
-esac'
+test_mode_in_index 100644 xfoo1'

 test_expect_success 'git add: filemode=0 should not get confused by symlink' '
rm -f xfoo1 &&
test_ln_s_add foo xfoo1 &&
-   case "$(git ls-files --stage xfoo1)" in
-   12" "*xfoo1) echo pass;;
-   *) echo fail; git ls-files --stage xfoo1; (exit 1);;
-   esac
+   test_mode_in_index 12 xfoo1
 '

 test_expect_success \
@@ -45,28 +39,19 @@ test_expect_success \
 echo foo >xfoo2 &&
 chmod 755 xfoo2 &&
 git update-index --add xfoo2 &&
-case "$(git ls-files --stage xfoo2)" in
-100644" "*xfoo2) echo pass;;
-*) echo fail; git ls-files --stage xfoo2; (exit 1);;
-esac'
+test_mode_in_index 100644 xfoo2'

 test_expect_success 'git add: filemode=0 should not get confused by symlink' '
rm -f xfoo2 &&
test_ln_s_add foo xfoo2 &&
-   case "$(git ls-files --stage xfoo2)" in
-   12" "*xfoo2) echo pass;;
-   *) echo fail; git ls-files --stage xfoo2; (exit 1);;
-   esac
+   test_mode_in_index 12 xfoo2
 '

 test_expect_success \
'git update-index --add: Test that executable bit is not used...' \
'git config core.filemode 0 &&
 test_ln_s_add xfoo2 xfoo3 &&   # runs git update-index --add
-case "$(git ls-files --stage xfoo3)" in
-12" "*xfoo3) echo pass;;
-*) echo fail; git ls-files --stage xfoo3; (exit 1);;
-esac'
+test_mode_in_index 12 xfoo3'

 test_expect_success '.gitignore test setup' '
echo "*.ig" >.gitignore &&
@@ -347,10 +332,7 @@ test_expect_success POSIXPERM,SYMLINKS 'git add --chmod=+x 
with symlinks' '
rm -f foo2 &&
echo foo >foo2 &&
git add --chmod=+x foo2 &&
-   case "$(git ls-files --stage foo2)" in
-   100755" "*foo2) echo pass;;
-   *) echo fail; git ls-files --stage foo2; (exit 1);;
-   esac
+   test_mode_in_index 100755 foo2
 '

 test_done
--
2.9.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [[PATCH v2] 1/4] patch-ids: stop using a hand-rolled hashmap implementation

2016-07-29 Thread Junio C Hamano
Kevin Willford  writes:

> +static int patch_id_cmp(struct patch_id *a,
> + struct patch_id *b,
> + void *keydata)
>  {
> + return hashcmp(a->patch_id, b->patch_id);
>  }
>  
>  int init_patch_ids(struct patch_ids *ids)
>  {
>   memset(ids, 0, sizeof(*ids));
>   diff_setup(>diffopts);
>   DIFF_OPT_SET(>diffopts, RECURSIVE);
>   diff_setup_done(>diffopts);
> + hashmap_init(>patches, (hashmap_cmp_fn)patch_id_cmp, 256);
>   return 0;
>  }

This is a tangent, and I do not suggest to change patch 1/4 to flip
the style, but I am not sure if this is a good style, or casting it
the other way around is better from the type-checking point of view,
i.e.

static int cmp_fn(const void *a_, const void *b_, const void *keydata)
{
struct patch_id *a = a_;
struct patch_id *b = b_;
return hashcmp(a->patch_id, b->patch_id);
}

...
hashmap_init(..., cmp_fn, ...);
...

I see many existing calls to hashmap_init() follow this pattern, so
as I said, patch 1/4 is fine as-is.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/3] Enhance test t3700-add.sh

2016-07-29 Thread Junio C Hamano
Ingo Brückl  writes:

> Subject: Re: [PATCH v2 1/3] Enhance test t3700-add.sh
>
> The files to be tested may already exist and be links which will make
> the test fail. So ensure that we are working in a clean environment.

"Enhance", like "Update" or "Change", is a more-or-less useless
noiseword in a commit title; it does not say much about in what way
it was enhanced.

If we clarify the body of the explanation, a better word emerges, I
would think.  How about this instead?

t3700: remove unwanted leftover files before running new tests

When an earlier test that has prerequisite is skipped, files
used by later tests may be left in the working tree in an
unexpected state.  For example, a test runs this sequence:

echo foo >xfoo1 && chmod 755 xfoo1

to create an executable file xfoo1, expecting that xfoo1
does not exist before it runs in the test sequence.
However, the absense of this file depends on "git reset
--hard" done in an earlier test, that is skipped when SANITY
prerequisite is not met, and worse yet, xfoo1 originally is
created as a symbolic link, which means the chmod does not
affect the modes of xfoo1 as this test expects.

Fix this by starting the test with "rm -f xfoo1" to make
sure the file is created from scratch, and do the same to
other similar tests.

>
> Signed-off-by: Ingo Brückl 
> ---
>  t/t3700-add.sh | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/t/t3700-add.sh b/t/t3700-add.sh
> index 4865304..494f5b8 100755
> --- a/t/t3700-add.sh
> +++ b/t/t3700-add.sh
> @@ -333,6 +333,7 @@ test_expect_success 'git add --dry-run --ignore-missing 
> of non-existing file out
>  '
>
>  test_expect_success 'git add --chmod=+x stages a non-executable file with 
> +x' '
> + rm -f foo1 &&
>   echo foo >foo1 &&
>   git add --chmod=+x foo1 &&
>   case "$(git ls-files --stage foo1)" in
> @@ -342,6 +343,7 @@ test_expect_success 'git add --chmod=+x stages a 
> non-executable file with +x' '
>  '
>
>  test_expect_success 'git add --chmod=-x stages an executable file with -x' '
> + rm -f xfoo1 &&
>   echo foo >xfoo1 &&
>   chmod 755 xfoo1 &&
>   git add --chmod=-x xfoo1 &&
> @@ -354,6 +356,7 @@ test_expect_success 'git add --chmod=-x stages an 
> executable file with -x' '
>  test_expect_success POSIXPERM,SYMLINKS 'git add --chmod=+x with symlinks' '
>   git config core.filemode 1 &&
>   git config core.symlinks 1 &&
> + rm -f foo2 &&
>   echo foo >foo2 &&
>   git add --chmod=+x foo2 &&
>   case "$(git ls-files --stage foo2)" in
> --
> 2.9.2
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Issue with global config defaults "user.useConfigOnly = true" + "pull.rebase = preserve" - "user.email"

2016-07-29 Thread Jeff King
On Fri, Jul 29, 2016 at 11:37:59AM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > ... So I do think there may be a bug to be fixed,
> > but it is simply commands being over-eager to make sure we have an
> > ident when they might not need it.
> 
> 36267854 (pull: fast-forward "pull --rebase=true", 2016-06-29) may
> be a part of a good solution for that, perhaps?

Hmm. So the solution I came up with is below, and I think is the "most
correct" in the sense that it leaves the check to the low-level code
which actually creates commits, and so we know that we will absolutely
only ever complain when we actually need an ident.

But I could also see an argument along the lines of: if you do not have
an ident set up, you don't really have any business running "git rebase
-i" in the first place, and it is OK for us to complain even there's a
chance your rebase might be a noop. So we should prefer consistency over
absolute flexibility in rebase, and just fix the common case of invoking
"rebase" at all when we know it is a fast-forward.

Which is what your patch is doing. I can buy that argument in general
for rebase, though it is a little funny that "git pull --rebase" would
then behave differently than "git fetch && git rebase".

There's a middle ground, too, which goes something like: it is OK to
invoke "rebase" (interactive or not) without a valid ident if you have
no commits to rebase. But as soon as we know there are commits to pick,
we should do the ident check and bail, canceling the rebase entirely.

That seems to be what happens in the non-interactive case (we call "git
am --rebasing" and it bails immediately without leaving any state).
Implementing that would mean shifting rebase--interactive's ident check
to the right spot (after seeing the insn sheet has entries) and bailing
appropriately.

TBH, I'm not sure anybody cares that much between the three. Even before
user.useConfigOnly, this could be an issue on machines where the ident
could not be auto-configured, and it seems like nobody ran across it.
It's only the funny interaction with pull.rebase that makes it likely to
come up, so as long as that code path is fixed (one way or another), I
doubt anybody would bring it up again.

Anyway, here's my patch.

-- >8 --
Subject: rebase-interactive: drop early check for valid ident

Since the very inception of interactive-rebase in 1b1dce4
(Teach rebase an interactive mode, 2007-06-25), there has
been a preemptive check, before looking at any commits, to
see whether the user has a valid name/email combination.

This is convenient, because it means that we abort the
operation before even beginning (rather than just
complaining that we are unable to pick a particular commit).

However, it does the wrong thing when the rebase does not
actually need to generate any new commits (e.g., a
fast-forward with no commits to pick, or one where the base
stays the same, and we just pick the same commits without
rewriting anything). In this case it may complain about the
lack of ident, even though one would not be needed to
complete the operation.

This may seem like mere nit-picking, but because interactive
rebase underlies the "preserve-merges" rebase, somebody who
has set "pull.rebase" to "preserve" cannot make even a
fast-forward pull without a valid ident, as we bail before
even realizing the fast-forward nature.

This commit drops the extra ident check entirely. This means
we rely on individual commands that generate commit objects
to complain. So we will continue to notice and prevent cases
that actually do create commits, but with one important
difference: we fail while actually executing the "pick"
operations, and leave the rebase in a conflicted, half-done
state.

In some ways this is less convenient, but in some ways it is
more so; the user can then manually commit or even "git
rebase --continue" after setting up their ident (or
providing it as a one-off on the command line).

Reported-by: Dakota Hawkins 
Signed-off-by: Jeff King 
---
 git-rebase--interactive.sh |  3 ---
 t/t7517-per-repo-email.sh  | 47 ++
 2 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index ded4595..f0f4777 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -1180,9 +1180,6 @@ To continue rebase after editing, run:
;;
 esac
 
-git var GIT_COMMITTER_IDENT >/dev/null ||
-   die "$(gettext "You need to set your committer info first")"
-
 comment_for_reflog start
 
 if test ! -z "$switch_to"
diff --git a/t/t7517-per-repo-email.sh b/t/t7517-per-repo-email.sh
index 337e6e3..2a22fa7 100755
--- a/t/t7517-per-repo-email.sh
+++ b/t/t7517-per-repo-email.sh
@@ -36,4 +36,51 @@ test_expect_success 'succeeds cloning if global email is not 
set' '
git clone . clone
 '
 
+test_expect_success 'set up rebase scenarios' '
+   # temporarily 

Re: Issue with global config defaults "user.useConfigOnly = true" + "pull.rebase = preserve" - "user.email"

2016-07-29 Thread Junio C Hamano
Jeff King  writes:

> TBH, I'm not sure anybody cares that much between the three. Even before
> user.useConfigOnly, this could be an issue on machines where the ident
> could not be auto-configured, and it seems like nobody ran across it.
> It's only the funny interaction with pull.rebase that makes it likely to
> come up, so as long as that code path is fixed (one way or another), I
> doubt anybody would bring it up again.

Yup, I do not think the choice among the three would make all that
much difference in practice.  If I really have to pick one of them,
I think the one in your message I am responding to would make the
most sense.

The one I sent, which I wrote as a response to some end-user request
on the list back then, has been sitting on 'pu' for quite a while
because I didn't see a real use or positive support for it, and the
only reason why I sent it is because this might be that one
real use it wanted to see.


> In some ways this is less convenient, but in some ways it is
> more so; the user can then manually commit or even "git
> rebase --continue" after setting up their ident (or
> providing it as a one-off on the command line).

Yup, that is the controvercial bit, and I suspect Dscho's original
was siding for the "set up ident first, as you will need it anyway
eventually", so I'll let others with viewpoints different from us to
chime in first before picking it up.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Does anything document --date=tea/yesterday/noon/midnight/never/now etc.?

2016-07-29 Thread Ævar Arnfjörð Bjarmason
These were originally added by Linus back in v0.99.9-230-ga8aca41, a
bit was added to them since then. See
https://github.com/git/git/blob/v2.9.2/date.c#L927

The v0.99.9-214-g3c07b1d commit also has some info on the warty edge
cases these have to deal with.

I was checking what documented "git commit --amend --date=now", but
Documentation/date-formats.txt only mentions ISO/RFC formats.

I thought I'd just submit a trivial patch to document this but a)
maybe it's in some mysterious part of the docs I've missed b) after
reading v0.99.9-214-g3c07b1d and some of the date.c code it looks like
it won't be that trivial to describe this.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 5/5] convert: add filter..process option

2016-07-29 Thread Jakub Narębski
W dniu 2016-07-29 o 19:35, Junio C Hamano pisze:
> Lars Schneider  writes:
> 
>> I think sending it upfront is nice for buffer allocations of big files
>> and it doesn't cost us anything to do it.
> 
> While I do NOT think "total size upfront" MUST BE avoided at all costs,
> I do not think the above statement to justify it makes ANY sense.
> 
> Big files are by definition something you cannot afford to hold its
> entirety in core, so you do not want to be told that you'd be fed 40GB
> and ask xmalloc to allocate that much.

I don't know much how filter driver work internally, but in some cases
Git reads or writes from file (file descriptor), in other cases it reads
or writes from str+len pair (it probably predates strbuf) - I think in
those cases file needs to fit in memory (in size_t).  So in some cases
Git reads file into memory.  Whether it uses xmalloc or mmap, I don't
know.

> 
> It allows the reader to be lazy for buffer allocations as long as
> you know the file fits in-core, at the cost of forcing the writer to
> somehow come up with the total number of bytes even before sending a
> single byte (in other words, if the writer cannot produce and hold
> the data in-core, it may even have to spool the data in a temporary
> file only to count, and then play it back after showing the total
> size).

For some types of filters you can know the size upfront:
 - for filters such as rot13, with 1-to-1 transformation, you know
   that the output size is the same as the input size
 - for block encodings, and for constant-width to constant-width
   encoding conversion, filter can calculate output size from the
   input size (e.g.  = 2*)
 - filter may have get size from somewhere, for example LFS filter
   stub is constant size, and files are stored in artifactory with
   their length 

> 
> It is good that you allow both mode of operations and the size of
> the data can either be given upfront (which allows a single fixed
> allocation upfront without realloc, as long as the data fits in
> core), or be left "(atend)".

I think the protocol should be either:  + , or
 +  + , that is do not use flush
packet if size is known upfront -- it would be a second point
of truth (SPOT principle).
 
> I just don't want to see it oversold as a "feature" that the size
> has to come before data.  That is a limitation, not a feature.
> 
> Thanks.
> 


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 5/5] convert: add filter..process option

2016-07-29 Thread Lars Schneider

> On 30 Jul 2016, at 01:11, Jakub Narębski  wrote:
> 
> W dniu 2016-07-29 o 19:35, Junio C Hamano pisze:
>> Lars Schneider  writes:
>> 
>>> I think sending it upfront is nice for buffer allocations of big files
>>> and it doesn't cost us anything to do it.
>> 
>> While I do NOT think "total size upfront" MUST BE avoided at all costs,
>> I do not think the above statement to justify it makes ANY sense.
>> 
>> Big files are by definition something you cannot afford to hold its
>> entirety in core, so you do not want to be told that you'd be fed 40GB
>> and ask xmalloc to allocate that much.
> 
> I don't know much how filter driver work internally, but in some cases
> Git reads or writes from file (file descriptor), in other cases it reads
> or writes from str+len pair (it probably predates strbuf) - I think in
> those cases file needs to fit in memory (in size_t).  So in some cases
> Git reads file into memory.  Whether it uses xmalloc or mmap, I don't
> know.
> 
>> 
>> It allows the reader to be lazy for buffer allocations as long as
>> you know the file fits in-core, at the cost of forcing the writer to
>> somehow come up with the total number of bytes even before sending a
>> single byte (in other words, if the writer cannot produce and hold
>> the data in-core, it may even have to spool the data in a temporary
>> file only to count, and then play it back after showing the total
>> size).
> 
> For some types of filters you can know the size upfront:
> - for filters such as rot13, with 1-to-1 transformation, you know
>   that the output size is the same as the input size
> - for block encodings, and for constant-width to constant-width
>   encoding conversion, filter can calculate output size from the
>   input size (e.g.  = 2*)
> - filter may have get size from somewhere, for example LFS filter
>   stub is constant size, and files are stored in artifactory with
>   their length 
> 
>> 
>> It is good that you allow both mode of operations and the size of
>> the data can either be given upfront (which allows a single fixed
>> allocation upfront without realloc, as long as the data fits in
>> core), or be left "(atend)".
> 
> I think the protocol should be either:  + , or
>  +  + , that is do not use flush
> packet if size is known upfront -- it would be a second point
> of truth (SPOT principle).

As I mentioned elsewhere a  packet is always send right now.
I have no strong opinion if this is good or bad. The implementation
was a little bit simpler and that's why I did it. I will implement 
whatever option the majority prefers :-)

Cheers,
Lars

> 
>> I just don't want to see it oversold as a "feature" that the size
>> has to come before data.  That is a limitation, not a feature.
>> 
>> Thanks.
>> 
> 

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


What's cooking in git.git (Jul 2016, #09; Fri, 29)

2016-07-29 Thread Junio C Hamano
What's cooking in git.git (Jul 2016, #09; Fri, 29)
--

Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

On the 'master' front, the individual commit count now exceeds 400
since the last major release, which is a good pace.  We may want to
start slowing down once the current crop of topics in 'next' hits
the 'master' and switch our attention to regression hunting.  The
'maint' branch has been accumulating enough material to make it the
next maintenance release v2.9.3.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* ew/daemon-socket-keepalive (2016-07-22) 2 commits
  (merged to 'next' on 2016-07-22 at d39c827)
 + Windows: add missing definition of ENOTSOCK
  (merged to 'next' on 2016-07-19 at 0140849)
 + daemon: ignore ENOTSOCK from setsockopt

 Recent update to "git daemon" tries to enable the socket-level
 KEEPALIVE, but when it is spawned via inetd, the standard input
 file descriptor may not necessarily be connected to a socket.
 Suppress an ENOTSOCK error from setsockopt().


* ew/find-perl-on-freebsd-in-local (2016-07-26) 1 commit
  (merged to 'next' on 2016-07-26 at f76a962)
 + config.mak.uname: correct perl path on FreeBSD

 Recent FreeBSD stopped making perl available at /usr/bin/perl;
 switch the default the built-in path to /usr/local/bin/perl on not
 too ancient FreeBSD releases.


* js/rebase-i-tests (2016-07-07) 3 commits
  (merged to 'next' on 2016-07-13 at b06b28f)
 + rebase -i: we allow extra spaces after fixup!/squash!
 + rebase -i: demonstrate a bug with --autosquash
 + t3404: add a test for the --gpg-sign option

 A few tests that specifically target "git rebase -i" have been
 added.


* nd/pack-ofs-4gb-limit (2016-07-13) 7 commits
  (merged to 'next' on 2016-07-13 at 91e217d)
 + fsck: use streaming interface for large blobs in pack
 + pack-objects: do not truncate result in-pack object size on 32-bit systems
 + index-pack: correct "offset" type in unpack_entry_data()
 + index-pack: report correct bad object offsets even if they are large
 + index-pack: correct "len" type in unpack_data()
 + sha1_file.c: use type off_t* for object_info->disk_sizep
 + pack-objects: pass length to check_pack_crc() without truncation

 "git pack-objects" and "git index-pack" mostly operate with off_t
 when talking about the offset of objects in a packfile, but there
 were a handful of places that used "unsigned long" to hold that
 value, leading to an unintended truncation.


* nd/worktree-lock (2016-07-08) 6 commits
  (merged to 'next' on 2016-07-13 at c768a85)
 + worktree.c: find_worktree() search by path suffix
 + worktree: add "unlock" command
 + worktree: add "lock" command
 + worktree.c: add is_worktree_locked()
 + worktree.c: add is_main_worktree()
 + worktree.c: add find_worktree()

 "git worktree prune" protected worktrees that are marked as
 "locked" by creating a file in a known location.  "git worktree"
 command learned a dedicated command pair to create and remove such
 a file, so that the users do not have to do this with editor.


* rs/notes-merge-no-toctou (2016-07-07) 1 commit
  (merged to 'next' on 2016-07-13 at f08b530)
 + notes-merge: use O_EXCL to avoid overwriting existing files

 "git notes merge" had a code to see if a path exists (and fails if
 it does) and then open the path for writing (when it doesn't).
 Replace it with open with O_EXCL.


* sb/submodule-deinit-all (2016-07-26) 1 commit
  (merged to 'next' on 2016-07-26 at ca0b067)
 + submodule deinit: remove outdated comment

 A comment update for a topic that was merged to Git v2.8.

--
[New Topics]

* da/subtree-modernize (2016-07-27) 2 commits
 - subtree: adjust function definitions to match CodingGuidelines
 - subtree: adjust style to match CodingGuidelines
 (this branch uses da/subtree-2.9-regression.)

 Style fixes for "git subtree" (in contrib/).

 Will merge to 'next'.


* js/rebase-i-progress-tidy (2016-07-28) 1 commit
 - rebase-interactive: trim leading whitespace from progress count

 Regression fix for an i18n topic already in 'master'.

 Will merge to 'next'.


* jk/t4205-cleanup (2016-07-27) 2 commits
 - t4205: indent here documents
 - t4205: drop top-level &&-chaining

 Test modernization.

 Will merge to 'next'.


* jk/pack-objects-optim (2016-07-29) 7 commits
 - pack-objects: use mru list when iterating over packs
 - pack-objects: compute local/ignore_pack_keep early
 - pack-objects: break out of want_object loop early
 - find_pack_entry: replace last_found_pack with MRU cache
 - add generic most-recently-used 

Re: [PATCH] reset cached ident date before creating objects

2016-07-29 Thread Paul Tan
Hi Jeff,

On Sat, Jul 30, 2016 at 2:05 AM, Jeff King  wrote:
> When we compute the date to put in author/committer lines of
> commits, or tagger lines of tags, we get the current date
> once and then cache it for the rest of the program.  This is
> a good thing in some cases, like "git commit", because it
> means we do not racily assign different times to the
> author/committer fields of a single commit object.

So commits created with "git commit" should have the same author and
committer timestamps...

> diff --git a/commit.c b/commit.c
> index 71a360d..7ddbffe 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -1548,6 +1548,7 @@ int commit_tree_extended(const char *msg, size_t 
> msg_len,
> }
>
> /* Person/date information */
> +   reset_ident_date();
> if (!author)
> author = git_author_info(IDENT_STRICT);
> strbuf_addf(, "author %s\n", author);

But since builtin/commit.c constructs its author ident string before
calling the editor and then commit_tree_extended(), this would cause
the resulting commits to have committer timestamps which differ from
their author timestamps.

So maybe we would have to put reset_ident_date() at the end of the
function instead, at least after git_committer_info() is called.

Regards,
Paul
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/3] Enhance test t3700-add.sh

2016-07-29 Thread Ingo Brückl
The files to be tested may already exist and be links which will make
the test fail. So ensure that we are working in a clean environment.

Signed-off-by: Ingo Brückl 
---
 t/t3700-add.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index 4865304..494f5b8 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -333,6 +333,7 @@ test_expect_success 'git add --dry-run --ignore-missing of 
non-existing file out
 '

 test_expect_success 'git add --chmod=+x stages a non-executable file with +x' '
+   rm -f foo1 &&
echo foo >foo1 &&
git add --chmod=+x foo1 &&
case "$(git ls-files --stage foo1)" in
@@ -342,6 +343,7 @@ test_expect_success 'git add --chmod=+x stages a 
non-executable file with +x' '
 '

 test_expect_success 'git add --chmod=-x stages an executable file with -x' '
+   rm -f xfoo1 &&
echo foo >xfoo1 &&
chmod 755 xfoo1 &&
git add --chmod=-x xfoo1 &&
@@ -354,6 +356,7 @@ test_expect_success 'git add --chmod=-x stages an 
executable file with -x' '
 test_expect_success POSIXPERM,SYMLINKS 'git add --chmod=+x with symlinks' '
git config core.filemode 1 &&
git config core.symlinks 1 &&
+   rm -f foo2 &&
echo foo >foo2 &&
git add --chmod=+x foo2 &&
case "$(git ls-files --stage foo2)" in
--
2.9.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/3] Make test t3700-add.sh more robust

2016-07-29 Thread Ingo Brückl
Don't rely on chmod to work on the underlying platform (although it
wouldn't harm the result of the '--chmod=-x' test). Directly check the
result of the --chmod option.

Add a test_mode_in_index helper function in order to check for success.

Signed-off-by: Ingo Brückl 
---
 t/t3700-add.sh  | 20 
 t/test-lib-functions.sh | 14 ++
 2 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index 494f5b8..c08ec9e 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -332,25 +332,13 @@ test_expect_success 'git add --dry-run --ignore-missing 
of non-existing file out
test_i18ncmp expect.err actual.err
 '

-test_expect_success 'git add --chmod=+x stages a non-executable file with +x' '
+test_expect_success 'git add --chmod=[+-]x stages correctly' '
rm -f foo1 &&
echo foo >foo1 &&
git add --chmod=+x foo1 &&
-   case "$(git ls-files --stage foo1)" in
-   100755" "*foo1) echo pass;;
-   *) echo fail; git ls-files --stage foo1; (exit 1);;
-   esac
-'
-
-test_expect_success 'git add --chmod=-x stages an executable file with -x' '
-   rm -f xfoo1 &&
-   echo foo >xfoo1 &&
-   chmod 755 xfoo1 &&
-   git add --chmod=-x xfoo1 &&
-   case "$(git ls-files --stage xfoo1)" in
-   100644" "*xfoo1) echo pass;;
-   *) echo fail; git ls-files --stage xfoo1; (exit 1);;
-   esac
+   test_mode_in_index 100755 foo1 &&
+   git add --chmod=-x foo1 &&
+   test_mode_in_index 100644 foo1
 '

 test_expect_success POSIXPERM,SYMLINKS 'git add --chmod=+x with symlinks' '
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 4f7eadb..0e6652b 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -990,3 +990,17 @@ test_copy_bytes () {
}
' - "$1"
 }
+
+# Test the file mode "$1" of the file "$2" in the index.
+test_mode_in_index () {
+   case "$(git ls-files --stage "$2")" in
+   $1\ *"$2")
+   echo pass
+   ;;
+   *)
+   echo fail
+   git ls-files --stage "$2"
+   return 1
+   ;;
+   esac
+}
--
2.9.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 03/10] pkt-line: add packet_flush_gentle()

2016-07-29 Thread larsxschneider
From: Lars Schneider 

packet_flush() would die in case of a write error even though for some callers
an error would be acceptable. Add packet_flush_gentle() which writes a pkt-line
flush packet and returns `0` for success and `1` for failure.

Signed-off-by: Lars Schneider 
---
 pkt-line.c | 6 ++
 pkt-line.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 6fae508..1728690 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -91,6 +91,12 @@ void packet_flush(int fd)
write_or_die(fd, "", 4);
 }
 
+int packet_flush_gentle(int fd)
+{
+   packet_trace("", 4, 1);
+   return !write_or_whine_pipe(fd, "", 4, "flush packet");
+}
+
 void packet_buf_flush(struct strbuf *buf)
 {
packet_trace("", 4, 1);
diff --git a/pkt-line.h b/pkt-line.h
index 02dcced..3953c98 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -23,6 +23,7 @@ void packet_flush(int fd);
 void packet_write(int fd, const char *fmt, ...) __attribute__((format (printf, 
2, 3)));
 void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
+int packet_flush_gentle(int fd);
 int direct_packet_write(int fd, char *buf, size_t size, int gentle);
 int direct_packet_write_data(int fd, const char *buf, size_t size, int gentle);
 
-- 
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 08/10] convert: modernize tests

2016-07-29 Thread larsxschneider
From: Lars Schneider 

Use `test_config` to set the config, check that files are empty with
`test_must_be_empty`, compare files with `test_cmp`, and remove spaces
after ">" and "<".

Signed-off-by: Lars Schneider 
---
 t/t0021-conversion.sh | 62 +--
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 7bac2bc..7b45136 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -13,8 +13,8 @@ EOF
 chmod +x rot13.sh
 
 test_expect_success setup '
-   git config filter.rot13.smudge ./rot13.sh &&
-   git config filter.rot13.clean ./rot13.sh &&
+   test_config filter.rot13.smudge ./rot13.sh &&
+   test_config filter.rot13.clean ./rot13.sh &&
 
{
echo "*.t filter=rot13"
@@ -38,8 +38,8 @@ script='s/^\$Id: \([0-9a-f]*\) \$/\1/p'
 
 test_expect_success check '
 
-   cmp test.o test &&
-   cmp test.o test.t &&
+   test_cmp test.o test &&
+   test_cmp test.o test.t &&
 
# ident should be stripped in the repository
git diff --raw --exit-code :test :test.i &&
@@ -47,10 +47,10 @@ test_expect_success check '
embedded=$(sed -ne "$script" test.i) &&
test "z$id" = "z$embedded" &&
 
-   git cat-file blob :test.t > test.r &&
+   git cat-file blob :test.t >test.r &&
 
-   ./rot13.sh < test.o > test.t &&
-   cmp test.r test.t
+   ./rot13.sh test.t &&
+   test_cmp test.r test.t
 '
 
 # If an expanded ident ever gets into the repository, we want to make sure that
@@ -130,7 +130,7 @@ test_expect_success 'filter shell-escaped filenames' '
 
# delete the files and check them out again, using a smudge filter
# that will count the args and echo the command-line back to us
-   git config filter.argc.smudge "sh ./argc.sh %f" &&
+   test_config filter.argc.smudge "sh ./argc.sh %f" &&
rm "$normal" "$special" &&
git checkout -- "$normal" "$special" &&
 
@@ -141,7 +141,7 @@ test_expect_success 'filter shell-escaped filenames' '
test_cmp expect "$special" &&
 
# do the same thing, but with more args in the filter expression
-   git config filter.argc.smudge "sh ./argc.sh %f --my-extra-arg" &&
+   test_config filter.argc.smudge "sh ./argc.sh %f --my-extra-arg" &&
rm "$normal" "$special" &&
git checkout -- "$normal" "$special" &&
 
@@ -154,9 +154,9 @@ test_expect_success 'filter shell-escaped filenames' '
 '
 
 test_expect_success 'required filter should filter data' '
-   git config filter.required.smudge ./rot13.sh &&
-   git config filter.required.clean ./rot13.sh &&
-   git config filter.required.required true &&
+   test_config filter.required.smudge ./rot13.sh &&
+   test_config filter.required.clean ./rot13.sh &&
+   test_config filter.required.required true &&
 
echo "*.r filter=required" >.gitattributes &&
 
@@ -165,17 +165,17 @@ test_expect_success 'required filter should filter data' '
 
rm -f test.r &&
git checkout -- test.r &&
-   cmp test.o test.r &&
+   test_cmp test.o test.r &&
 
./rot13.sh expected &&
git cat-file blob :test.r >actual &&
-   cmp expected actual
+   test_cmp expected actual
 '
 
 test_expect_success 'required filter smudge failure' '
-   git config filter.failsmudge.smudge false &&
-   git config filter.failsmudge.clean cat &&
-   git config filter.failsmudge.required true &&
+   test_config filter.failsmudge.smudge false &&
+   test_config filter.failsmudge.clean cat &&
+   test_config filter.failsmudge.required true &&
 
echo "*.fs filter=failsmudge" >.gitattributes &&
 
@@ -186,9 +186,9 @@ test_expect_success 'required filter smudge failure' '
 '
 
 test_expect_success 'required filter clean failure' '
-   git config filter.failclean.smudge cat &&
-   git config filter.failclean.clean false &&
-   git config filter.failclean.required true &&
+   test_config filter.failclean.smudge cat &&
+   test_config filter.failclean.clean false &&
+   test_config filter.failclean.required true &&
 
echo "*.fc filter=failclean" >.gitattributes &&
 
@@ -197,8 +197,8 @@ test_expect_success 'required filter clean failure' '
 '
 
 test_expect_success 'filtering large input to small output should use little 
memory' '
-   git config filter.devnull.clean "cat >/dev/null" &&
-   git config filter.devnull.required true &&
+   test_config filter.devnull.clean "cat >/dev/null" &&
+   test_config filter.devnull.required true &&
for i in $(test_seq 1 30); do printf "%1048576d" 1; done >30MB &&
echo "30MB filter=devnull" >.gitattributes &&
GIT_MMAP_LIMIT=1m GIT_ALLOC_LIMIT=1m git add 30MB
@@ -207,7 +207,7 @@ test_expect_success 'filtering large input to small output 
should use little mem
 

[PATCH v3 02/10] pkt-line: add direct_packet_write() and direct_packet_write_data()

2016-07-29 Thread larsxschneider
From: Lars Schneider 

Sometimes pkt-line data is already available in a buffer and it would
be a waste of resources to write the packet using packet_write() which
would copy the existing buffer into a strbuf before writing it.

If the caller has control over the buffer creation then the
PKTLINE_DATA_START macro can be used to skip the header and write
directly into the data section of a pkt-line (PKTLINE_DATA_LEN bytes
would be the maximum). direct_packet_write() would take this buffer,
adjust the pkt-line header and write it.

If the caller has no control over the buffer creation then
direct_packet_write_data() can be used. This function creates a pkt-line
header. Afterwards the header and the data buffer are written using two
consecutive write calls.

Both functions have a gentle parameter that indicates if Git should die
in case of a write error (gentle set to 0) or return with a error (gentle
set to 1).

Signed-off-by: Lars Schneider 
---
 pkt-line.c | 30 ++
 pkt-line.h |  5 +
 2 files changed, 35 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 445b8e1..6fae508 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -135,6 +135,36 @@ void packet_write(int fd, const char *fmt, ...)
write_or_die(fd, buf.buf, buf.len);
 }
 
+int direct_packet_write(int fd, char *buf, size_t size, int gentle)
+{
+   int ret = 0;
+   packet_trace(buf + 4, size - 4, 1);
+   set_packet_header(buf, size);
+   if (gentle)
+   ret = !write_or_whine_pipe(fd, buf, size, "pkt-line");
+   else
+   write_or_die(fd, buf, size);
+   return ret;
+}
+
+int direct_packet_write_data(int fd, const char *buf, size_t size, int gentle)
+{
+   int ret = 0;
+   char hdr[4];
+   set_packet_header(hdr, sizeof(hdr) + size);
+   packet_trace(buf, size, 1);
+   if (gentle) {
+   ret = (
+   !write_or_whine_pipe(fd, hdr, sizeof(hdr), "pkt-line 
header") ||
+   !write_or_whine_pipe(fd, buf, size, "pkt-line data")
+   );
+   } else {
+   write_or_die(fd, hdr, sizeof(hdr));
+   write_or_die(fd, buf, size);
+   }
+   return ret;
+}
+
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
 {
va_list args;
diff --git a/pkt-line.h b/pkt-line.h
index 3cb9d91..02dcced 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -23,6 +23,8 @@ void packet_flush(int fd);
 void packet_write(int fd, const char *fmt, ...) __attribute__((format (printf, 
2, 3)));
 void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
+int direct_packet_write(int fd, char *buf, size_t size, int gentle);
+int direct_packet_write_data(int fd, const char *buf, size_t size, int gentle);
 
 /*
  * Read a packetized line into the buffer, which must be at least size bytes
@@ -77,6 +79,9 @@ char *packet_read_line_buf(char **src_buf, size_t *src_len, 
int *size);
 
 #define DEFAULT_PACKET_MAX 1000
 #define LARGE_PACKET_MAX 65520
+#define PKTLINE_HEADER_LEN 4
+#define PKTLINE_DATA_START(pkt) ((pkt) + PKTLINE_HEADER_LEN)
+#define PKTLINE_DATA_LEN (LARGE_PACKET_MAX - PKTLINE_HEADER_LEN)
 extern char packet_buffer[LARGE_PACKET_MAX];
 
 #endif
-- 
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 04/10] pkt-line: call packet_trace() only if a packet is actually send

2016-07-29 Thread larsxschneider
From: Lars Schneider 

The packet_trace() call is not ideal in format_packet() as we would print
a trace when a packet is formatted and (potentially) when the packet is
actually send. This was no problem up until now because format_packet()
was only used by one function. Fix it by moving the trace call into the
function that actally sends the packet.

Signed-off-by: Lars Schneider 
---
 pkt-line.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pkt-line.c b/pkt-line.c
index 1728690..32c0a34 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -126,7 +126,6 @@ static void format_packet(struct strbuf *out, const char 
*fmt, va_list args)
die("protocol error: impossibly long line");
 
set_packet_header(>buf[orig_len], n);
-   packet_trace(out->buf + orig_len + 4, n - 4, 1);
 }
 
 void packet_write(int fd, const char *fmt, ...)
@@ -138,6 +137,7 @@ void packet_write(int fd, const char *fmt, ...)
va_start(args, fmt);
format_packet(, fmt, args);
va_end(args);
+   packet_trace(buf.buf + 4, buf.len - 4, 1);
write_or_die(fd, buf.buf, buf.len);
 }
 
-- 
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 09/10] convert: generate large test files only once

2016-07-29 Thread larsxschneider
From: Lars Schneider 

Generate more interesting large test files with pseudo random characters
in between and reuse these test files in multiple tests. Run tests formerly
marked as EXPENSIVE every time but with a smaller data set.

Signed-off-by: Lars Schneider 
---
 t/t0021-conversion.sh | 48 ++--
 1 file changed, 38 insertions(+), 10 deletions(-)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 7b45136..34c8eb9 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -4,6 +4,15 @@ test_description='blob conversion via gitattributes'
 
 . ./test-lib.sh
 
+if test_have_prereq EXPENSIVE
+then
+   T0021_LARGE_FILE_SIZE=2048
+   T0021_LARGISH_FILE_SIZE=100
+else
+   T0021_LARGE_FILE_SIZE=30
+   T0021_LARGISH_FILE_SIZE=2
+fi
+
 cat test.i &&
git add test test.t test.i &&
rm -f test test.t test.i &&
-   git checkout -- test test.t test.i
+   git checkout -- test test.t test.i &&
+
+   mkdir generated-test-data &&
+   for i in $(test_seq 1 $T0021_LARGE_FILE_SIZE)
+   do
+   RANDOM_STRING="$(test-genrandom end $i | tr -dc "A-Za-z0-9" )"
+   ROT_RANDOM_STRING="$(echo $RANDOM_STRING | ./rot13.sh )"
+   # Generate 1MB of empty data and 100 bytes of random characters
+   # printf "$(test-genrandom start $i)"
+   printf "%1048576d" 1 >>generated-test-data/large.file &&
+   printf "$RANDOM_STRING" >>generated-test-data/large.file &&
+   printf "%1048576d" 1 >>generated-test-data/large.file.rot13 &&
+   printf "$ROT_RANDOM_STRING" 
>>generated-test-data/large.file.rot13 &&
+
+   if test $i = $T0021_LARGISH_FILE_SIZE
+   then
+   cat generated-test-data/large.file 
>generated-test-data/largish.file &&
+   cat generated-test-data/large.file.rot13 
>generated-test-data/largish.file.rot13
+   fi
+   done
 '
 
 script='s/^\$Id: \([0-9a-f]*\) \$/\1/p'
@@ -199,9 +227,9 @@ test_expect_success 'required filter clean failure' '
 test_expect_success 'filtering large input to small output should use little 
memory' '
test_config filter.devnull.clean "cat >/dev/null" &&
test_config filter.devnull.required true &&
-   for i in $(test_seq 1 30); do printf "%1048576d" 1; done >30MB &&
-   echo "30MB filter=devnull" >.gitattributes &&
-   GIT_MMAP_LIMIT=1m GIT_ALLOC_LIMIT=1m git add 30MB
+   cp generated-test-data/large.file large.file &&
+   echo "large.file filter=devnull" >.gitattributes &&
+   GIT_MMAP_LIMIT=1m GIT_ALLOC_LIMIT=1m git add large.file
 '
 
 test_expect_success 'filter that does not read is fine' '
@@ -214,15 +242,15 @@ test_expect_success 'filter that does not read is fine' '
test_cmp expect actual
 '
 
-test_expect_success EXPENSIVE 'filter large file' '
+test_expect_success 'filter large file' '
test_config filter.largefile.smudge cat &&
test_config filter.largefile.clean cat &&
-   for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&
-   echo "2GB filter=largefile" >.gitattributes &&
-   git add 2GB 2>err &&
+   echo "large.file filter=largefile" >.gitattributes &&
+   cp generated-test-data/large.file large.file &&
+   git add large.file 2>err &&
test_must_be_empty err &&
-   rm -f 2GB &&
-   git checkout -- 2GB 2>err &&
+   rm -f large.file &&
+   git checkout -- large.file 2>err &&
test_must_be_empty err
 '
 
-- 
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 10/10] convert: add filter..process option

2016-07-29 Thread larsxschneider
From: Lars Schneider 

Git's clean/smudge mechanism invokes an external filter process for every
single blob that is affected by a filter. If Git filters a lot of blobs
then the startup time of the external filter processes can become a
significant part of the overall Git execution time.

This patch adds the filter..process string option which, if used,
keeps the external filter process running and processes all blobs with
the following packet format (pkt-line) based protocol over standard input
and standard output.

Git starts the filter on first usage and expects a welcome
message, protocol version number, and filter capabilities
separated by spaces:

packet:  git< git-filter-protocol\n
packet:  git< version 2\n
packet:  git< capabilities clean smudge\n

Supported filter capabilities are "clean", "smudge", "stream",
and "shutdown".

Afterwards Git sends a command (based on the supported
capabilities), the filename including its path
relative to the repository root, the content size as ASCII number
in bytes, the content split in zero or many pkt-line packets,
and a flush packet at the end:

packet:  git> smudge\n
packet:  git> filename=path/testfile.dat\n
packet:  git> size=7\n
packet:  git> CONTENT
packet:  git> 


The filter is expected to respond with the result content size as
ASCII number in bytes. If the capability "stream" is defined then
the filter must not send the content size. Afterwards the result
content in send in zero or many pkt-line packets and a flush packet
at the end. Finally a "success" packet is send to indicate that
everything went well.

packet:  git< size=57\n   (omitted with capability "stream")
packet:  git< SMUDGED_CONTENT
packet:  git< 
packet:  git< success\n


In case the filter cannot process the content, it is expected
to respond with the result content size 0 (only if "stream" is
not defined) and a "reject" packet.

packet:  git< size=0\n(omitted with capability "stream")
packet:  git< reject\n


After the filter has processed a blob it is expected to wait for
the next command. A demo implementation can be found in
`t/t0021/rot13-filter.pl` located in the Git core repository.

If the filter supports the "shutdown" capability then Git will
send the "shutdown" command and wait until the filter answers
with "done". This gives the filter the opportunity to perform
cleanup tasks. Afterwards the filter is expected to exit.

packet:  git> shutdown\n
packet:  git< done\n


If a filter..clean or filter..smudge command
is configured then these commands always take precedence over
a configured filter..process command.

Please note that you cannot use an existing filter..clean
or filter..smudge command as filter..process
command. As soon as Git would detect a file that needs to be
processed by this filter, it would stop responding.

Signed-off-by: Lars Schneider 
Helped-by: Martin-Louis Bright 
---
 Documentation/gitattributes.txt |  84 -
 convert.c   | 400 +--
 t/t0021-conversion.sh   | 405 
 t/t0021/rot13-filter.pl | 177 ++
 4 files changed, 1053 insertions(+), 13 deletions(-)
 create mode 100755 t/t0021/rot13-filter.pl

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 8882a3e..e3fbcc2 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -300,7 +300,11 @@ checkout, when the `smudge` command is specified, the 
command is
 fed the blob object from its standard input, and its standard
 output is used to update the worktree file.  Similarly, the
 `clean` command is used to convert the contents of worktree file
-upon checkin.
+upon checkin. By default these commands process only a single
+blob and terminate. If a long running filter process (see section
+below) is used then Git can process all blobs with a single filter
+invocation for the entire life of a single Git command (e.g.
+`git add .`).
 
 One use of the content filtering is to massage the content into a shape
 that is more convenient for the platform, filesystem, and the user to use.
@@ -375,6 +379,84 @@ substitution.  For example:
 
 
 
+Long Running Filter Process
+^^^
+
+If the filter command (string value) is defined via
+filter..process then Git can process all blobs with a
+single filter invocation for the entire life of a single Git
+command. This is achieved by using the following packet
+format (pkt-line, see 

[PATCH v3 07/10] convert: quote filter names in error messages

2016-07-29 Thread larsxschneider
From: Lars Schneider 

Git filter driver commands with spaces (e.g. `filter.sh foo`) are hard to
read in error messages. Quote them to improve the readability.

Signed-off-by: Lars Schneider 
---
 convert.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/convert.c b/convert.c
index b1614bf..522e2c5 100644
--- a/convert.c
+++ b/convert.c
@@ -397,7 +397,7 @@ static int filter_buffer_or_fd(int in, int out, void *data)
child_process.out = out;
 
if (start_command(_process))
-   return error("cannot fork to run external filter %s", 
params->cmd);
+   return error("cannot fork to run external filter '%s'", 
params->cmd);
 
sigchain_push(SIGPIPE, SIG_IGN);
 
@@ -415,13 +415,13 @@ static int filter_buffer_or_fd(int in, int out, void 
*data)
if (close(child_process.in))
write_err = 1;
if (write_err)
-   error("cannot feed the input to external filter %s", 
params->cmd);
+   error("cannot feed the input to external filter '%s'", 
params->cmd);
 
sigchain_pop(SIGPIPE);
 
status = finish_command(_process);
if (status)
-   error("external filter %s failed %d", params->cmd, status);
+   error("external filter '%s' failed %d", params->cmd, status);
 
strbuf_release();
return (write_err || status);
@@ -462,15 +462,15 @@ static int apply_filter(const char *path, const char 
*src, size_t len, int fd,
return 0;   /* error was already reported */
 
if (strbuf_read(, async.out, len) < 0) {
-   error("read from external filter %s failed", cmd);
+   error("read from external filter '%s' failed", cmd);
ret = 0;
}
if (close(async.out)) {
-   error("read from external filter %s failed", cmd);
+   error("read from external filter '%s' failed", cmd);
ret = 0;
}
if (finish_async()) {
-   error("external filter %s failed", cmd);
+   error("external filter '%s' failed", cmd);
ret = 0;
}
 
-- 
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 01/10] pkt-line: extract set_packet_header()

2016-07-29 Thread larsxschneider
From: Lars Schneider 

set_packet_header() converts an integer to a 4 byte hex string. Make
this function locally available so that other pkt-line functions can
use it.

Signed-off-by: Lars Schneider 
---
 pkt-line.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index 62fdb37..445b8e1 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -98,9 +98,17 @@ void packet_buf_flush(struct strbuf *buf)
 }
 
 #define hex(a) (hexchar[(a) & 15])
-static void format_packet(struct strbuf *out, const char *fmt, va_list args)
+static void set_packet_header(char *buf, const int size)
 {
static char hexchar[] = "0123456789abcdef";
+   buf[0] = hex(size >> 12);
+   buf[1] = hex(size >> 8);
+   buf[2] = hex(size >> 4);
+   buf[3] = hex(size);
+}
+
+static void format_packet(struct strbuf *out, const char *fmt, va_list args)
+{
size_t orig_len, n;
 
orig_len = out->len;
@@ -111,10 +119,7 @@ static void format_packet(struct strbuf *out, const char 
*fmt, va_list args)
if (n > LARGE_PACKET_MAX)
die("protocol error: impossibly long line");
 
-   out->buf[orig_len + 0] = hex(n >> 12);
-   out->buf[orig_len + 1] = hex(n >> 8);
-   out->buf[orig_len + 2] = hex(n >> 4);
-   out->buf[orig_len + 3] = hex(n);
+   set_packet_header(>buf[orig_len], n);
packet_trace(out->buf + orig_len + 4, n - 4, 1);
 }
 
-- 
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 06/10] run-command: add clean_on_exit_handler

2016-07-29 Thread larsxschneider
From: Lars Schneider 

Some commands might need to perform cleanup tasks on exit. Let's give
them an interface for doing this.

Signed-off-by: Lars Schneider 
---
 run-command.c | 12 
 run-command.h |  1 +
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/run-command.c b/run-command.c
index 33bc63a..197b534 100644
--- a/run-command.c
+++ b/run-command.c
@@ -21,6 +21,7 @@ void child_process_clear(struct child_process *child)
 
 struct child_to_clean {
pid_t pid;
+   void (*clean_on_exit_handler)(pid_t);
struct child_to_clean *next;
 };
 static struct child_to_clean *children_to_clean;
@@ -30,6 +31,8 @@ static void cleanup_children(int sig, int in_signal)
 {
while (children_to_clean) {
struct child_to_clean *p = children_to_clean;
+   if (p->clean_on_exit_handler)
+   p->clean_on_exit_handler(p->pid);
children_to_clean = p->next;
kill(p->pid, sig);
if (!in_signal)
@@ -49,10 +52,11 @@ static void cleanup_children_on_exit(void)
cleanup_children(SIGTERM, 0);
 }
 
-static void mark_child_for_cleanup(pid_t pid)
+static void mark_child_for_cleanup(pid_t pid, void 
(*clean_on_exit_handler)(pid_t))
 {
struct child_to_clean *p = xmalloc(sizeof(*p));
p->pid = pid;
+   p->clean_on_exit_handler = clean_on_exit_handler;
p->next = children_to_clean;
children_to_clean = p;
 
@@ -422,7 +426,7 @@ int start_command(struct child_process *cmd)
if (cmd->pid < 0)
error_errno("cannot fork() for %s", cmd->argv[0]);
else if (cmd->clean_on_exit)
-   mark_child_for_cleanup(cmd->pid);
+   mark_child_for_cleanup(cmd->pid, cmd->clean_on_exit_handler);
 
/*
 * Wait for child's execvp. If the execvp succeeds (or if fork()
@@ -483,7 +487,7 @@ int start_command(struct child_process *cmd)
if (cmd->pid < 0 && (!cmd->silent_exec_failure || errno != ENOENT))
error_errno("cannot spawn %s", cmd->argv[0]);
if (cmd->clean_on_exit && cmd->pid >= 0)
-   mark_child_for_cleanup(cmd->pid);
+   mark_child_for_cleanup(cmd->pid, cmd->clean_on_exit_handler);
 
argv_array_clear();
cmd->argv = sargv;
@@ -752,7 +756,7 @@ int start_async(struct async *async)
exit(!!async->proc(proc_in, proc_out, async->data));
}
 
-   mark_child_for_cleanup(async->pid);
+   mark_child_for_cleanup(async->pid, NULL);
 
if (need_in)
close(fdin[0]);
diff --git a/run-command.h b/run-command.h
index 5066649..59d21ea 100644
--- a/run-command.h
+++ b/run-command.h
@@ -43,6 +43,7 @@ struct child_process {
unsigned stdout_to_stderr:1;
unsigned use_shell:1;
unsigned clean_on_exit:1;
+   void (*clean_on_exit_handler)(pid_t);
 };
 
 #define CHILD_PROCESS_INIT { NULL, ARGV_ARRAY_INIT, ARGV_ARRAY_INIT }
-- 
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 05/10] pack-protocol: fix maximum pkt-line size

2016-07-29 Thread larsxschneider
From: Lars Schneider 

According to LARGE_PACKET_MAX in pkt-line.h the maximal lenght of a
pkt-line packet is 65520 bytes. The pkt-line header takes 4 bytes and
therefore the pkt-line data component must not exceed 65516 bytes.

Signed-off-by: Lars Schneider 
---
 Documentation/technical/protocol-common.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/technical/protocol-common.txt 
b/Documentation/technical/protocol-common.txt
index bf30167..ecedb34 100644
--- a/Documentation/technical/protocol-common.txt
+++ b/Documentation/technical/protocol-common.txt
@@ -67,9 +67,9 @@ with non-binary data the same whether or not they contain the 
trailing
 LF (stripping the LF if present, and not complaining when it is
 missing).
 
-The maximum length of a pkt-line's data component is 65520 bytes.
-Implementations MUST NOT send pkt-line whose length exceeds 65524
-(65520 bytes of payload + 4 bytes of length data).
+The maximum length of a pkt-line's data component is 65516 bytes.
+Implementations MUST NOT send pkt-line whose length exceeds 65520
+(65516 bytes of payload + 4 bytes of length data).
 
 Implementations SHOULD NOT send an empty pkt-line ("0004").
 
-- 
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [[PATCH v2] 4/4] rebase: avoid computing unnecessary patch IDs

2016-07-29 Thread Junio C Hamano
Kevin Willford  writes:

>  static int patch_id_cmp(struct patch_id *a,
>   struct patch_id *b,
> - void *keydata)
> + struct diff_options *opt)
>  {
> + if (is_null_sha1(a->patch_id) &&
> + commit_patch_id(a->commit, opt, a->patch_id, 0))
> + return error("Could not get patch ID for %s",
> + oid_to_hex(>commit->object.oid));
> + if (is_null_sha1(b->patch_id) &&
> + commit_patch_id(b->commit, opt, b->patch_id, 0))
> + return error("Could not get patch ID for %s",
> + oid_to_hex(>commit->object.oid));
>   return hashcmp(a->patch_id, b->patch_id);
>  }

These error returns initially looks slightly iffy in that in general
the caller of any_cmp_fn() wants to know how a/b compares, but by
returning error(), it always says "a is smaller than b".  This
however may be OK because the callers in hashmap_get* implementation
only wants to know "are they equal?", and we are saying "no they
cannot possibly be equal" here.  The original that ran a full
commit_patch_id() in now-removed add_commit() helper function didn't
even diagnose this error and silently omitted the commit from the
candidate list, so this may be even seen as an improvement.

The idea of using the two level hash, computing the more expensive
one only when the hashmap hashes of the result of the cheaper hash
function collide, is excellent.  Thanks.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/3] Make test t3700-add.sh more robust

2016-07-29 Thread Junio C Hamano
Ingo Brückl  writes:

> Subject: Re: [PATCH v2 2/3] Make test t3700-add.sh more robust

Please check output from "git shortlog --no-merges -100" to see how
your titles play well with others.  We typically prefix the title
with a specific area, a colon, and a sentence that does not begin in
a capital letter and does not end with a full-stop.

> Don't rely on chmod to work on the underlying platform (although it
> wouldn't harm the result of the '--chmod=-x' test). Directly check the
> result of the --chmod option.
>
> Add a test_mode_in_index helper function in order to check for success.

Hmph, I do not immediately see the point of having the helper in
test-lib-functions.sh, though.  This helper looks more or less
specific to this test script.

In any case, I think addition of the "test_mode_in_index" helper and
conversion from case/esac to the helper should be a separate patch
from what this patch wants to do, which is to merge two more-or-less
redundant tests into one.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git-format-patch: default to --from to avoid spoofed mails?

2016-07-29 Thread Jeff King
On Thu, Jul 28, 2016 at 07:08:02PM -0700, Josh Triplett wrote:

> > I think on the whole that defaulting to "--from" would help more people
> > than hurt them, but if we do believe there are scripts that would be
> > regressed, it probably needs a deprecation period.
> 
> I don't think it's likely that there are scripts that would be regressed
> (and I think it's likely that there are scripts that would be
> progressed), but I'd also have no objection to a deprecation period.

I'm on the fence, so I'd leave the final decision on whether to deal
with deprecation to you (who will write the patch) and Junio (as the
maintainer).

> I just confirmed that with the default changed, --no-from works to
> return to the current behavior, so we don't need a new option.  And
> --no-from has worked for a long time, so scripts won't need to care if
> they're working with an old version of git.
> 
> I can provide a patch implementing a new config option to set the
> format-patch --from default ("false" for --no-from, "true" for --from,
> or a string value for --from=value).

I'd be surprised if the config option is actually useful to anybody in
practice (the distinction is not "this my preference" as much as "in
what context am I calling the program?"). It is a convenient way to do
deprecations (introduce an option, then flip the default, leaving the
option as an escape hatch for anybody who has been broken), though.

> Do you think this needs the kind of very noisy deprecation period that
> push.default had, where anyone without the git-config option set gets a
> warning to stderr?  Or do you think it would suffice to provide a
> warning in the release notes for a while and then change the default?

IMHO the noisy deprecations haven't really been all that beneficial.
Even after the length push.default one, I think we still had people who
had skipped enough versions to miss it, and quite a few people became
confused or annoyed by the spew of text.

OTOH, I'm not sure most people read the release notes carefully, either.
It would be nice if we could make the change and count on people to
notice it in 'next' or even 'master' before the release, but empirically
that does not happen.

So I dunno. If we consider the risk minor, perhaps a mention in the
release notes for version X, and then the change in X+1 would be fine.
That gives some opportunity for release-note readers to pipe up. It's
not foolproof, but it would give us more confidence.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 00/10] Git filter protocol

2016-07-29 Thread larsxschneider
From: Lars Schneider 

Hi,

thanks a lot to Jakub, Peff, Torsten, Eric, and Junio for comments
and reviews.

Here is what has changed since v2:

* replace `/dev/urandom` with `test-genrandom` (Torsten, Peff)
* improve commit message "Git filter driver command with spaces" (Jakub)
* use proper types for memory and disk (Peff)
* create packet read buffer with overflow check (Peff)
* change capabilities format: "capabilities clean smudge" (Jakub)
* replace "%zu" (Eric)
* remove &= error handling (Eric, Peff)
* initialize *argv[] with  { cmd, NULL } (Jakub)
* reorder multi_packet_read() parameters to match read(2) (Eric)
* do not continue if fstat fails (Eric)
* filter: add reject response
* add functions to pkt-line.h/c that: (Jakub, Peff)
- can write a packet without creating a new buffer
- do not die in case of a failure
* add function to pkt-line.h/c that writes a pkt-line flush and does not die on 
error
* add filter stream capability
* add filter shutdown capability
* docs: fix LARGE_PACKET_MAX documentaion
see 
http://public-inbox.org/git/20160726134257.GB19277%40sigill.intra.peff.net/
* docs: fix s/seperated/separated/ (Jakub)
* docs: "mis-configured one-shot filters would hang" (Jakub)
* docs: filter protocol filename absolute (Jakub)
* docs: state that Git can use more than one packet (Jabub)
* docs: add "\n" to lines (Jakub)
* docs: filter precedence (Jakub)

Cheers,
Lars

PS: If you prefer checkout the code from a Git repo instead then you can find
it here: https://github.com/larsxschneider/git/tree/protocol-filter/v3


Lars Schneider (10):
  pkt-line: extract set_packet_header()
  pkt-line: add direct_packet_write() and direct_packet_write_data()
  pkt-line: add packet_flush_gentle()
  pkt-line: call packet_trace() only if a packet is actually send
  pack-protocol: fix maximum pkt-line size
  run-command: add clean_on_exit_handler
  convert: quote filter names in error messages
  convert: modernize tests
  convert: generate large test files only once
  convert: add filter..process option

 Documentation/gitattributes.txt |  84 -
 Documentation/technical/protocol-common.txt |   6 +-
 convert.c   | 412 +-
 pkt-line.c  |  53 ++-
 pkt-line.h  |   6 +
 run-command.c   |  12 +-
 run-command.h   |   1 +
 t/t0021-conversion.sh   | 515 +---
 t/t0021/rot13-filter.pl | 177 ++
 9 files changed, 1193 insertions(+), 73 deletions(-)
 create mode 100755 t/t0021/rot13-filter.pl

--
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Issue with global config defaults "user.useConfigOnly = true" + "pull.rebase = preserve" - "user.email"

2016-07-29 Thread Jeff King
On Fri, Jul 29, 2016 at 03:45:35PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > TBH, I'm not sure anybody cares that much between the three. Even before
> > user.useConfigOnly, this could be an issue on machines where the ident
> > could not be auto-configured, and it seems like nobody ran across it.
> > It's only the funny interaction with pull.rebase that makes it likely to
> > come up, so as long as that code path is fixed (one way or another), I
> > doubt anybody would bring it up again.
> 
> Yup, I do not think the choice among the three would make all that
> much difference in practice.  If I really have to pick one of them,
> I think the one in your message I am responding to would make the
> most sense.
> 
> The one I sent, which I wrote as a response to some end-user request
> on the list back then, has been sitting on 'pu' for quite a while
> because I didn't see a real use or positive support for it, and the
> only reason why I sent it is because this might be that one
> real use it wanted to see.

BTW, I didn't actually test yours, but if we do go that route I suspect
you can reuse the tests I posted by just replacing "git rebase" with
"git pull --rebase= . master".

> > In some ways this is less convenient, but in some ways it is
> > more so; the user can then manually commit or even "git
> > rebase --continue" after setting up their ident (or
> > providing it as a one-off on the command line).
> 
> Yup, that is the controvercial bit, and I suspect Dscho's original
> was siding for the "set up ident first, as you will need it anyway
> eventually", so I'll let others with viewpoints different from us to
> chime in first before picking it up.

Very sensible. Thanks.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git-format-patch: default to --from to avoid spoofed mails?

2016-07-29 Thread Josh Triplett
On Sat, Jul 30, 2016 at 01:47:42AM -0400, Jeff King wrote:
> On Fri, Jul 29, 2016 at 09:50:55PM -0700, Josh Triplett wrote:
> 
> > I would propose the following then:
> > 
> > - I'll write a patch adding a config option format.from, along with
> >   documentation, without changing the default.
> > - The release notes for the version of git introducing that config
> >   option should mention, prominently, the plan to change the default in
> >   a future version of git.
> > - A subsequent release can change the default.  No major rush to do
> >   this.
> > 
> > Does that sound reasonable?
> 
> That sounds fine to me.
> 
> I do have to admit that after reading through the "format.*" section of
> git-config(1), there is quite a bit that is configurable in it. So
> perhaps we do not need to be as careful about behavior changes as I
> thought.
> 
> So if you wanted to squish steps 2 and 3 together, that would also be OK
> by me.

I'll send two separate patches, and I'll leave it up to Junio whether to
apply the second one right away or wait a release.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git-format-patch: default to --from to avoid spoofed mails?

2016-07-29 Thread Jeff King
On Fri, Jul 29, 2016 at 09:50:55PM -0700, Josh Triplett wrote:

> I would propose the following then:
> 
> - I'll write a patch adding a config option format.from, along with
>   documentation, without changing the default.
> - The release notes for the version of git introducing that config
>   option should mention, prominently, the plan to change the default in
>   a future version of git.
> - A subsequent release can change the default.  No major rush to do
>   this.
> 
> Does that sound reasonable?

That sounds fine to me.

I do have to admit that after reading through the "format.*" section of
git-config(1), there is quite a bit that is configurable in it. So
perhaps we do not need to be as careful about behavior changes as I
thought.

So if you wanted to squish steps 2 and 3 together, that would also be OK
by me.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] reset cached ident date before creating objects

2016-07-29 Thread Jeff King
On Sat, Jul 30, 2016 at 10:11:56AM +0800, Paul Tan wrote:

> > diff --git a/commit.c b/commit.c
> > index 71a360d..7ddbffe 100644
> > --- a/commit.c
> > +++ b/commit.c
> > @@ -1548,6 +1548,7 @@ int commit_tree_extended(const char *msg, size_t 
> > msg_len,
> > }
> >
> > /* Person/date information */
> > +   reset_ident_date();
> > if (!author)
> > author = git_author_info(IDENT_STRICT);
> > strbuf_addf(, "author %s\n", author);
> 
> But since builtin/commit.c constructs its author ident string before
> calling the editor and then commit_tree_extended(), this would cause
> the resulting commits to have committer timestamps which differ from
> their author timestamps.

Hrm, yeah. I assumed it would only pass in the author string for things
like "-c" or "--amend". But it looks like it unconditionally passes in
the author.  And it would be slightly difficult to have it pass NULL,
because it may actually have _part_ of an author (e.g., "--author" will
come up with name and email but not the date), so it has to sometimes
combine those bits with things like ident_default_date() itself.

I guess one option would be to commit_tree_extended() to take the
broken-down author bits and call fmt_ident() itself. That's what all of
the callers are doing (that, or just passing NULL). It would make the
interface a bit clunkier, but I think the end result would be more
flexible.

I suppose that would be tricky for git-commit, because in addition to
passing the result of fmt_ident() to commit_tree_extended(), it wants to
take the pieces and put them in the environment for hooks to see. And if
the data is available only inside commit_tree_extended(), we don't have
it for the hooks.

> So maybe we would have to put reset_ident_date() at the end of the
> function instead, at least after git_committer_info() is called.

Yes, although "reset and end" still feels a bit weird to me.

I'd almost prefer to just have long-running programs insert resets at
strategic points.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git-format-patch: default to --from to avoid spoofed mails?

2016-07-29 Thread Josh Triplett
On Fri, Jul 29, 2016 at 06:58:00PM -0400, Jeff King wrote:
> On Thu, Jul 28, 2016 at 07:08:02PM -0700, Josh Triplett wrote:
> 
> > > I think on the whole that defaulting to "--from" would help more people
> > > than hurt them, but if we do believe there are scripts that would be
> > > regressed, it probably needs a deprecation period.
> > 
> > I don't think it's likely that there are scripts that would be regressed
> > (and I think it's likely that there are scripts that would be
> > progressed), but I'd also have no objection to a deprecation period.
> 
> I'm on the fence, so I'd leave the final decision on whether to deal
> with deprecation to you (who will write the patch) and Junio (as the
> maintainer).

OK, see the plan proposed below then.

> > I just confirmed that with the default changed, --no-from works to
> > return to the current behavior, so we don't need a new option.  And
> > --no-from has worked for a long time, so scripts won't need to care if
> > they're working with an old version of git.
> > 
> > I can provide a patch implementing a new config option to set the
> > format-patch --from default ("false" for --no-from, "true" for --from,
> > or a string value for --from=value).
> 
> I'd be surprised if the config option is actually useful to anybody in
> practice (the distinction is not "this my preference" as much as "in
> what context am I calling the program?"). It is a convenient way to do
> deprecations (introduce an option, then flip the default, leaving the
> option as an escape hatch for anybody who has been broken), though.

It also allows people to change their local default in advance of git
changing the default.

> > Do you think this needs the kind of very noisy deprecation period that
> > push.default had, where anyone without the git-config option set gets a
> > warning to stderr?  Or do you think it would suffice to provide a
> > warning in the release notes for a while and then change the default?
> 
> IMHO the noisy deprecations haven't really been all that beneficial.
> Even after the length push.default one, I think we still had people who
> had skipped enough versions to miss it, and quite a few people became
> confused or annoyed by the spew of text.
> 
> OTOH, I'm not sure most people read the release notes carefully, either.
> It would be nice if we could make the change and count on people to
> notice it in 'next' or even 'master' before the release, but empirically
> that does not happen.
> 
> So I dunno. If we consider the risk minor, perhaps a mention in the
> release notes for version X, and then the change in X+1 would be fine.
> That gives some opportunity for release-note readers to pipe up. It's
> not foolproof, but it would give us more confidence.

I would propose the following then:

- I'll write a patch adding a config option format.from, along with
  documentation, without changing the default.
- The release notes for the version of git introducing that config
  option should mention, prominently, the plan to change the default in
  a future version of git.
- A subsequent release can change the default.  No major rush to do
  this.

Does that sound reasonable?

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[[PATCH v2] 0/4] Use header data patch ids for rebase to avoid loading file content

2016-07-29 Thread Kevin Willford
This patch series is to remove the hand rolled hashmap in the patch_ids
and use the hashmap.h implementation.  It also introduces the idea of having
a header only patch id so that the contents of the files do not have to be
loaded in order to determine if two commits are different.


Kevin Willford (4):
  patch-ids: stop using a hand-rolled hashmap implementation
  patch-ids: replace the seen indicator with a commit pointer
  patch-ids: add flag to create the diff patch id using header only data
  rebase: avoid computing unnecessary patch IDs

 builtin/log.c|  2 +-
 diff.c   | 16 +++---
 diff.h   |  2 +-
 patch-ids.c  | 99 +++-
 patch-ids.h  | 11 ++--
 revision.c   | 18 ++-
 t/perf/p3400-rebase.sh   | 36 +
 t/t6007-rev-list-cherry-pick-file.sh | 17 +++
 8 files changed, 114 insertions(+), 87 deletions(-)
 create mode 100644 t/perf/p3400-rebase.sh

-- 
2.9.2.windows.1

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[[PATCH v2] 3/4] patch-ids: add flag to create the diff patch id using header only data

2016-07-29 Thread Kevin Willford
From: Kevin Willford 

This will allow a diff patch id to be created using only the header data
so that the contents of the file will not have to be loaded.

Signed-off-by: Kevin Willford 
---
 diff.c  | 16 ++--
 diff.h  |  2 +-
 patch-ids.c |  2 +-
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 7d03419..28a4190 100644
--- a/diff.c
+++ b/diff.c
@@ -4455,7 +4455,7 @@ static void patch_id_consume(void *priv, char *line, 
unsigned long len)
 }
 
 /* returns 0 upon success, and writes result into sha1 */
-static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1)
+static int diff_get_patch_id(struct diff_options *options, unsigned char 
*sha1, int diff_header_only)
 {
struct diff_queue_struct *q = _queued_diff;
int i;
@@ -4490,9 +4490,6 @@ static int diff_get_patch_id(struct diff_options 
*options, unsigned char *sha1)
 
diff_fill_sha1_info(p->one);
diff_fill_sha1_info(p->two);
-   if (fill_mmfile(, p->one) < 0 ||
-   fill_mmfile(, p->two) < 0)
-   return error("unable to read files to diff");
 
len1 = remove_space(p->one->path, strlen(p->one->path));
len2 = remove_space(p->two->path, strlen(p->two->path));
@@ -4527,6 +4524,13 @@ static int diff_get_patch_id(struct diff_options 
*options, unsigned char *sha1)
len2, p->two->path);
git_SHA1_Update(, buffer, len1);
 
+   if (diff_header_only)
+   continue;
+
+   if (fill_mmfile(, p->one) < 0 ||
+   fill_mmfile(, p->two) < 0)
+   return error("unable to read files to diff");
+
if (diff_filespec_is_binary(p->one) ||
diff_filespec_is_binary(p->two)) {
git_SHA1_Update(, oid_to_hex(>one->oid),
@@ -4549,11 +4553,11 @@ static int diff_get_patch_id(struct diff_options 
*options, unsigned char *sha1)
return 0;
 }
 
-int diff_flush_patch_id(struct diff_options *options, unsigned char *sha1)
+int diff_flush_patch_id(struct diff_options *options, unsigned char *sha1, int 
diff_header_only)
 {
struct diff_queue_struct *q = _queued_diff;
int i;
-   int result = diff_get_patch_id(options, sha1);
+   int result = diff_get_patch_id(options, sha1, diff_header_only);
 
for (i = 0; i < q->nr; i++)
diff_free_filepair(q->queue[i]);
diff --git a/diff.h b/diff.h
index 125447b..7883729 100644
--- a/diff.h
+++ b/diff.h
@@ -342,7 +342,7 @@ extern int run_diff_files(struct rev_info *revs, unsigned 
int option);
 extern int run_diff_index(struct rev_info *revs, int cached);
 
 extern int do_diff_cache(const unsigned char *, struct diff_options *);
-extern int diff_flush_patch_id(struct diff_options *, unsigned char *);
+extern int diff_flush_patch_id(struct diff_options *, unsigned char *, int);
 
 extern int diff_result_code(struct diff_options *, int);
 
diff --git a/patch-ids.c b/patch-ids.c
index bafaae2..69a14a3 100644
--- a/patch-ids.c
+++ b/patch-ids.c
@@ -13,7 +13,7 @@ int commit_patch_id(struct commit *commit, struct 
diff_options *options,
else
diff_root_tree_sha1(commit->object.oid.hash, "", options);
diffcore_std(options);
-   return diff_flush_patch_id(options, sha1);
+   return diff_flush_patch_id(options, sha1, 0);
 }
 
 static int patch_id_cmp(struct patch_id *a,
-- 
2.9.2.gvfs.2.42.gb7633a3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[[PATCH v2] 2/4] patch-ids: replace the seen indicator with a commit pointer

2016-07-29 Thread Kevin Willford
From: Kevin Willford 

The cherry_pick_list was looping through the original side checking the
seen indicator and setting the cherry_flag on the commit.  If we save
off the commit in the patch_id we can set the cherry_flag on the correct
commit when running through the other side when a patch_id match is found.

Signed-off-by: Kevin Willford 
---
 patch-ids.c |  1 +
 patch-ids.h |  2 +-
 revision.c  | 18 +++---
 3 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/patch-ids.c b/patch-ids.c
index db31fa6..bafaae2 100644
--- a/patch-ids.c
+++ b/patch-ids.c
@@ -43,6 +43,7 @@ static int init_patch_id_entry(struct patch_id *patch,
   struct commit *commit,
   struct patch_ids *ids)
 {
+   patch->commit = commit;
if (commit_patch_id(commit, >diffopts, patch->patch_id))
return -1;
 
diff --git a/patch-ids.h b/patch-ids.h
index 9569ee0..dea1ecd 100644
--- a/patch-ids.h
+++ b/patch-ids.h
@@ -4,7 +4,7 @@
 struct patch_id {
struct hashmap_entry ent;
unsigned char patch_id[GIT_SHA1_RAWSZ];
-   char seen;
+   struct commit *commit;
 };
 
 struct patch_ids {
diff --git a/revision.c b/revision.c
index edba5b7..19cc067 100644
--- a/revision.c
+++ b/revision.c
@@ -846,7 +846,7 @@ static void cherry_pick_list(struct commit_list *list, 
struct rev_info *revs)
 */
if (left_first != !!(flags & SYMMETRIC_LEFT))
continue;
-   commit->util = add_commit_patch_id(commit, );
+   add_commit_patch_id(commit, );
}
 
/* either cherry_mark or cherry_pick are true */
@@ -873,21 +873,9 @@ static void cherry_pick_list(struct commit_list *list, 
struct rev_info *revs)
id = has_commit_patch_id(commit, );
if (!id)
continue;
-   id->seen = 1;
-   commit->object.flags |= cherry_flag;
-   }
 
-   /* Now check the original side for seen ones */
-   for (p = list; p; p = p->next) {
-   struct commit *commit = p->item;
-   struct patch_id *ent;
-
-   ent = commit->util;
-   if (!ent)
-   continue;
-   if (ent->seen)
-   commit->object.flags |= cherry_flag;
-   commit->util = NULL;
+   commit->object.flags |= cherry_flag;
+   id->commit->object.flags |= cherry_flag;
}
 
free_patch_ids();
-- 
2.9.2.gvfs.2.42.gb7633a3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix failing test t3700-add.sh

2016-07-29 Thread Johannes Sixt

Am 29.07.2016 um 14:31 schrieb Ingo Brückl:

At the time of the test xfoo1 already exists and is a link.
As a result, the check for file mode 100644 fails.

Create not yet existing file xfoo instead.

Signed-off-by: Ingo Brückl 
---
 t/t3700-add.sh | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index 4865304..aee61b9 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -342,12 +342,12 @@ test_expect_success 'git add --chmod=+x stages a 
non-executable file with +x' '
 '

 test_expect_success 'git add --chmod=-x stages an executable file with -x' '
-   echo foo >xfoo1 &&
-   chmod 755 xfoo1 &&
-   git add --chmod=-x xfoo1 &&
-   case "$(git ls-files --stage xfoo1)" in
-   100644" "*xfoo1) echo pass;;
-   *) echo fail; git ls-files --stage xfoo1; (exit 1);;
+   echo foo >xfoo &&
+   chmod 755 xfoo &&
+   git add --chmod=-x xfoo &&
+   case "$(git ls-files --stage xfoo)" in
+   100644" "*xfoo) echo pass;;
+   *) echo fail; git ls-files --stage xfoo; (exit 1);;
esac
 '


The commit that added this test is already 2 months old. How could that 
have been missed?


In fact, I cannot verify that there is xfoo1 in the directory or in the 
index before this test case runs. The general statement that the commit 
message makes is clearly not correct. What am I missing?


-- Hannes

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git-format-patch: default to --from to avoid spoofed mails?

2016-07-29 Thread Junio C Hamano
Josh Triplett  writes:

> So, it seems exceedingly unlikely to me that this would result in
> unnecessary in-body "From:" headers.

Yup, Peff elsewhere on the thread gave the same conclusion, and
after reading the codepaths involved again, I agree.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix failing test t3700-add.sh

2016-07-29 Thread Jeff King
[+cc Ed, who wrote 4e55ed3 (add: add --chmod=+x / --chmod=-x options,
2016-05-31)]

On Fri, Jul 29, 2016 at 02:31:28PM +0200, Ingo Brückl wrote:

> At the time of the test xfoo1 already exists and is a link.
> As a result, the check for file mode 100644 fails.
> 
> Create not yet existing file xfoo instead.

Hrm. So in the original code:

>  test_expect_success 'git add --chmod=-x stages an executable file with -x' '
> - echo foo >xfoo1 &&
> - chmod 755 xfoo1 &&
> - git add --chmod=-x xfoo1 &&
> - case "$(git ls-files --stage xfoo1)" in
> - 100644" "*xfoo1) echo pass;;
> - *) echo fail; git ls-files --stage xfoo1; (exit 1);;

I would have expected "git add --chmod" to drop the "-x" bit in addition
to actually overwriting the file contents (and switching a symlink to a
file). And it does. The culprit is actually the "echo foo >xfoo1" line.
If "xfoo1" is a symlink, then it silently writes to the symlink
destination, and xfoo1 remains a symlink (and thus tweaking its execute
bit is a noop).

I was also puzzled why the test fails for you; it does not for me.
Running the test script as root does make it fail. There are some
earlier tests which are skipped in this case, which run "git reset
--hard" with xfoo1 in the index, which cleans it up.

> + echo foo >xfoo &&
> + chmod 755 xfoo &&
> + git add --chmod=-x xfoo &&
> + case "$(git ls-files --stage xfoo)" in
> + 100644" "*xfoo) echo pass;;
> + *) echo fail; git ls-files --stage xfoo; (exit 1);;

Here you just pick another name, "xfoo", which does happen to work. But
it seems like that has the same potential for flakiness if earlier tests
get adjusted or skipped, since they also use that name.

How about just:

  rm -f xfoo1

at the top of the test, which explicitly documents the state we are
looking for?

I also wondered if this test, which calls "chmod 755 xfoo1", should be
marked with the POSIXPERM prerequisite. But I guess since its goal is to
strip the executable bit, it "works" even on systems where that chmod is
a noop (the "git add --chmod" doesn't do anything, but one way or the
other we end up at the end state we expect).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Small trivial annoyance with the nice new builtin "git am"

2016-07-29 Thread Junio C Hamano
Jeff King  writes:

> On Thu, Jul 28, 2016 at 04:47:17PM -0700, Junio C Hamano wrote:
>
>> Also makes me wonder if "git cherry-pick A..B" shares the same
>> breakage.
>
> Probably.

It seems that "cherry-pick A..B" leads to sequencer.c::run_git_commit()
that uses run_command_v_opt() to drive "git commit", so we are safe.

> I guess we want something like:
>
> +void reset_ident_date(void)
> +{
> + strbuf_reset(_default_date);
> +}
> +
>
> and then to sprinkle calls liberally through builtin-ified programs when
> they move from one unit of work to the next.

ident_default_date() is currently the only one that sets this to be
cached, and that is to be used only when there is no user-specified
date.

When I saw the suggestion first time, I was worried if this had
interaction with things like GIT_COMMITTER_DATE environment (because
I didn't have easy access to the source) but it is not the case, so
the change looks very sensible.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[[PATCH v2] 1/4] patch-ids: stop using a hand-rolled hashmap implementation

2016-07-29 Thread Kevin Willford
From: Kevin Willford 

This change will use the hashmap from the hashmap.h to keep track of the
patch_ids that have been encountered instead of using an internal
implementation.  This simplifies the implementation of the patch ids.

Signed-off-by: Kevin Willford 
---
 patch-ids.c | 86 +
 patch-ids.h |  7 +++--
 2 files changed, 32 insertions(+), 61 deletions(-)

diff --git a/patch-ids.c b/patch-ids.c
index a4d0016..db31fa6 100644
--- a/patch-ids.c
+++ b/patch-ids.c
@@ -16,90 +16,62 @@ int commit_patch_id(struct commit *commit, struct 
diff_options *options,
return diff_flush_patch_id(options, sha1);
 }
 
-static const unsigned char *patch_id_access(size_t index, void *table)
+static int patch_id_cmp(struct patch_id *a,
+   struct patch_id *b,
+   void *keydata)
 {
-   struct patch_id **id_table = table;
-   return id_table[index]->patch_id;
+   return hashcmp(a->patch_id, b->patch_id);
 }
 
-static int patch_pos(struct patch_id **table, int nr, const unsigned char *id)
-{
-   return sha1_pos(id, table, nr, patch_id_access);
-}
-
-#define BUCKET_SIZE 190 /* 190 * 21 = 3990, with slop close enough to 4K */
-struct patch_id_bucket {
-   struct patch_id_bucket *next;
-   int nr;
-   struct patch_id bucket[BUCKET_SIZE];
-};
-
 int init_patch_ids(struct patch_ids *ids)
 {
memset(ids, 0, sizeof(*ids));
diff_setup(>diffopts);
DIFF_OPT_SET(>diffopts, RECURSIVE);
diff_setup_done(>diffopts);
+   hashmap_init(>patches, (hashmap_cmp_fn)patch_id_cmp, 256);
return 0;
 }
 
 int free_patch_ids(struct patch_ids *ids)
 {
-   struct patch_id_bucket *next, *patches;
-
-   free(ids->table);
-   for (patches = ids->patches; patches; patches = next) {
-   next = patches->next;
-   free(patches);
-   }
+   hashmap_free(>patches, 1);
return 0;
 }
 
-static struct patch_id *add_commit(struct commit *commit,
-  struct patch_ids *ids,
-  int no_add)
+static int init_patch_id_entry(struct patch_id *patch,
+  struct commit *commit,
+  struct patch_ids *ids)
 {
-   struct patch_id_bucket *bucket;
-   struct patch_id *ent;
-   unsigned char sha1[20];
-   int pos;
-
-   if (commit_patch_id(commit, >diffopts, sha1))
-   return NULL;
-   pos = patch_pos(ids->table, ids->nr, sha1);
-   if (0 <= pos)
-   return ids->table[pos];
-   if (no_add)
-   return NULL;
-
-   pos = -1 - pos;
+   if (commit_patch_id(commit, >diffopts, patch->patch_id))
+   return -1;
 
-   bucket = ids->patches;
-   if (!bucket || (BUCKET_SIZE <= bucket->nr)) {
-   bucket = xcalloc(1, sizeof(*bucket));
-   bucket->next = ids->patches;
-   ids->patches = bucket;
-   }
-   ent = >bucket[bucket->nr++];
-   hashcpy(ent->patch_id, sha1);
-
-   ALLOC_GROW(ids->table, ids->nr + 1, ids->alloc);
-   if (pos < ids->nr)
-   memmove(ids->table + pos + 1, ids->table + pos,
-   sizeof(ent) * (ids->nr - pos));
-   ids->nr++;
-   ids->table[pos] = ent;
-   return ids->table[pos];
+   hashmap_entry_init(patch, sha1hash(patch->patch_id));
+   return 0;
 }
 
 struct patch_id *has_commit_patch_id(struct commit *commit,
 struct patch_ids *ids)
 {
-   return add_commit(commit, ids, 1);
+   struct patch_id patch;
+
+   memset(, 0, sizeof(patch));
+   if (init_patch_id_entry(, commit, ids))
+   return NULL;
+
+   return hashmap_get(>patches, , NULL);
 }
 
 struct patch_id *add_commit_patch_id(struct commit *commit,
 struct patch_ids *ids)
 {
-   return add_commit(commit, ids, 0);
+   struct patch_id *key = xcalloc(1, sizeof(*key));
+
+   if (init_patch_id_entry(key, commit, ids)) {
+   free(key);
+   return NULL;
+   }
+
+   hashmap_add(>patches, key);
+   return key;
 }
diff --git a/patch-ids.h b/patch-ids.h
index eeb56b3..9569ee0 100644
--- a/patch-ids.h
+++ b/patch-ids.h
@@ -2,15 +2,14 @@
 #define PATCH_IDS_H
 
 struct patch_id {
-   unsigned char patch_id[20];
+   struct hashmap_entry ent;
+   unsigned char patch_id[GIT_SHA1_RAWSZ];
char seen;
 };
 
 struct patch_ids {
+   struct hashmap patches;
struct diff_options diffopts;
-   int nr, alloc;
-   struct patch_id **table;
-   struct patch_id_bucket *patches;
 };
 
 int commit_patch_id(struct commit *commit, struct diff_options *options,
-- 
2.9.2.gvfs.2.42.gb7633a3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message 

Re: [PATCH v2 0/5] Git filter protocol

2016-07-29 Thread Lars Schneider

> On 29 Jul 2016, at 17:57, Jeff King  wrote:
> 
> On Fri, Jul 29, 2016 at 10:14:17AM +0200, Lars Schneider wrote:
> 
>> My current implementation supports only two cases. Either the filter
>> knows the size and sends it back. Or the filter doesn't know the size
>> and Git reads until the flush packet (your "unknown" case). "Approx" is 
>> probably hard to do and fail shouldn't be part of the size, no?
> 
> Ah, OK, I missed that you could handle both cases. I think that is a
> reasonable approach. It means the filter has to bother with pkt-lines,
> but beyond that, it can choose the simple or streaming approach as
> appropriate.

Right.


>> That being said a "fail" response is a very good idea! This allows
>> the filter to communicate to git that a non required filter process
>> failed. I will add that to the protocol. Thanks :) 
> 
> Maybe just send "ok ", "ok -1" (for streaming), or "fail "
> followed by the content? That is similar to other Git protocols, though
> I am not sure they are good models for sanity or extensibility. :)
> 
> I don't know if you would want to leave room for other "headers" in the
> response, but you could also do something more HTTP-like, with a status
> code, and arbitrary headers. And presumably git would just ignore
> headers it doesn't know about. I think that's what Jakub's example was
> leaning towards. I'm just not sure what other headers are really useful,
> but it does leave room for extensibility.

Well, "ok " wouldn't make much sense as we already transmitted
the size upfront I think. Right now I have implemented the following options:

"success\n" --> everything was alright
"reject\n" --> the filter rejected the operation but this is no error 
   if "filter..required = false"
 --> failure that stops/restarts the filter process

I don't think sending any failure reason makes sense because if a failure
happens then we are likely in a bad state already (that's why I restart the
filter process. I think the filter can report trouble on its own via stdout,
no? I think this is what Git-LFS already does.

I am working on the docs right now and afterwards I will send a v3 :-)

- Lars
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[[PATCH v2] 4/4] rebase: avoid computing unnecessary patch IDs

2016-07-29 Thread Kevin Willford
From: Kevin Willford 

The `rebase` family of Git commands avoid applying patches that were
already integrated upstream. They do that by using the revision walking
option that computes the patch IDs of the two sides of the rebase
(local-only patches vs upstream-only ones) and skipping those local
patches whose patch ID matches one of the upstream ones.

In many cases, this causes unnecessary churn, as already the set of
paths touched by a given commit would suffice to determine that an
upstream patch has no local equivalent.

This hurts performance in particular when there are a lot of upstream
patches, and/or large ones.

Therefore, let's introduce the concept of a "diff-header-only" patch ID,
compare those first, and only evaluate the "full" patch ID lazily.

Please note that in contrast to the "full" patch IDs, those
"diff-header-only" patch IDs are prone to collide with one another, as
adjacent commits frequently touch the very same files. Hence we now
have to be careful to allow multiple hash entries with the same hash.
We accomplish that by using the hashmap_add() function that does not even
test for hash collisions.  This also allows us to evaluate the full patch ID
lazily, i.e. only when we found commits with matching diff-header-only
patch IDs.

We add a performance test that demonstrates ~1-6% improvement.  In
practice this will depend on various factors such as how many upstream
changes and how big those changes are along with whether file system
caches are cold or warm.  As Git's test suite has no way of catching
performance regressions, we also add a regression test that verifies
that the full patch ID computation is skipped when the diff-header-only
computation suffices.

Signed-off-by: Kevin Willford 
---
 builtin/log.c|  2 +-
 patch-ids.c  | 22 --
 patch-ids.h  |  2 +-
 t/perf/p3400-rebase.sh   | 36 
 t/t6007-rev-list-cherry-pick-file.sh | 17 +
 5 files changed, 71 insertions(+), 8 deletions(-)
 create mode 100644 t/perf/p3400-rebase.sh

diff --git a/builtin/log.c b/builtin/log.c
index fd1652f..b076993 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1331,7 +1331,7 @@ static void prepare_bases(struct base_tree_info *bases,
struct object_id *patch_id;
if (commit->util)
continue;
-   if (commit_patch_id(commit, , sha1))
+   if (commit_patch_id(commit, , sha1, 0))
die(_("cannot get patch id"));
ALLOC_GROW(bases->patch_id, bases->nr_patch_id + 1, 
bases->alloc_patch_id);
patch_id = bases->patch_id + bases->nr_patch_id;
diff --git a/patch-ids.c b/patch-ids.c
index 69a14a3..0a4828a 100644
--- a/patch-ids.c
+++ b/patch-ids.c
@@ -5,7 +5,7 @@
 #include "patch-ids.h"
 
 int commit_patch_id(struct commit *commit, struct diff_options *options,
-   unsigned char *sha1)
+   unsigned char *sha1, int diff_header_only)
 {
if (commit->parents)
diff_tree_sha1(commit->parents->item->object.oid.hash,
@@ -13,13 +13,21 @@ int commit_patch_id(struct commit *commit, struct 
diff_options *options,
else
diff_root_tree_sha1(commit->object.oid.hash, "", options);
diffcore_std(options);
-   return diff_flush_patch_id(options, sha1, 0);
+   return diff_flush_patch_id(options, sha1, diff_header_only);
 }
 
 static int patch_id_cmp(struct patch_id *a,
struct patch_id *b,
-   void *keydata)
+   struct diff_options *opt)
 {
+   if (is_null_sha1(a->patch_id) &&
+   commit_patch_id(a->commit, opt, a->patch_id, 0))
+   return error("Could not get patch ID for %s",
+   oid_to_hex(>commit->object.oid));
+   if (is_null_sha1(b->patch_id) &&
+   commit_patch_id(b->commit, opt, b->patch_id, 0))
+   return error("Could not get patch ID for %s",
+   oid_to_hex(>commit->object.oid));
return hashcmp(a->patch_id, b->patch_id);
 }
 
@@ -43,11 +51,13 @@ static int init_patch_id_entry(struct patch_id *patch,
   struct commit *commit,
   struct patch_ids *ids)
 {
+   unsigned char header_only_patch_id[GIT_SHA1_RAWSZ];
+
patch->commit = commit;
-   if (commit_patch_id(commit, >diffopts, patch->patch_id))
+   if (commit_patch_id(commit, >diffopts, header_only_patch_id, 1))
return -1;
 
-   hashmap_entry_init(patch, sha1hash(patch->patch_id));
+   hashmap_entry_init(patch, sha1hash(header_only_patch_id));
return 0;
 }
 
@@ -60,7 +70,7 @@ struct patch_id *has_commit_patch_id(struct commit *commit,
if (init_patch_id_entry(, commit, ids))
   

Re: [PATCH v2 0/5] Git filter protocol

2016-07-29 Thread Jeff King
On Fri, Jul 29, 2016 at 06:20:51PM +0200, Lars Schneider wrote:

> >> That being said a "fail" response is a very good idea! This allows
> >> the filter to communicate to git that a non required filter process
> >> failed. I will add that to the protocol. Thanks :) 
> > 
> > Maybe just send "ok ", "ok -1" (for streaming), or "fail "
> > followed by the content? That is similar to other Git protocols, though
> > I am not sure they are good models for sanity or extensibility. :)
> > 
> > I don't know if you would want to leave room for other "headers" in the
> > response, but you could also do something more HTTP-like, with a status
> > code, and arbitrary headers. And presumably git would just ignore
> > headers it doesn't know about. I think that's what Jakub's example was
> > leaning towards. I'm just not sure what other headers are really useful,
> > but it does leave room for extensibility.
> 
> Well, "ok " wouldn't make much sense as we already transmitted
> the size upfront I think. Right now I have implemented the following options:

Maybe I'm confused about where in the protocol we are. I was imagining:

  git> smudge
  git> 
  git> 
  git> ...pkt-lines...
  git> pktline-flush

  git< ok 
  git< ...pkt-lines...
  git< flush

That is, we should say "I have something for you" or "I do not" before
sending a size, because in the "I do not" case we have no size to send.

A more extensible protocol might look like:

  git> smudge
  git> filename=
  git> size=
  git> pktline-flush
  git> ...pkt-lines of data...
  git> pktline-flush

  git< ok (or success, or whatever status code you like)
  git< size=
  git< pkt-line-flush
  git< ...pkt-lines of data...
  git< pktline-flush

That leaves room for new "keys" to be added before the first pkt-flush,
without having to change the parsing at all.

> "success\n" --> everything was alright
> "reject\n" --> the filter rejected the operation but this is no error 
>if "filter..required = false"
>  --> failure that stops/restarts the filter process
> 
> I don't think sending any failure reason makes sense because if a failure
> happens then we are likely in a bad state already (that's why I restart the
> filter process. I think the filter can report trouble on its own via stdout,
> no? I think this is what Git-LFS already does.

Git-LFS sends to stderr because there's no other option. I wonder if it
would be nicer to make it Git's responsibility to talk to the user,
because then it could respect things like "--quiet". I guess error
messages are generally printed regardless of verbosity, though, so
printing them unconditionally is OK.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git bisect for reachable commits only

2016-07-29 Thread Junio C Hamano
Oleg Taranenko  writes:

> What I suggest change logic of bisecting to something like
>
>   git config bisect.reachable true

Such a configuration should not be needed.

When a history with this shape is given to "git bisect":

o---o---X---Y---B
 \ /
  o---G

and you gave G as good, and B as bad, it is a BUG that needs to be
fixed if bisect strayed outside G, X, Y and B.  Setting your new
configuration to false would mean "please run a buggy version of
bisect", which does not make much sense, I would think.





--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] reset cached ident date before creating objects

2016-07-29 Thread Jeff King
On Fri, Jul 29, 2016 at 10:15:26AM -0700, Junio C Hamano wrote:

> > One obvious impact would be reflog entries, since we would reset the
> > time between the object creation and the ref write (so your reflog entry
> > would sometimes be a second or more after the commit time it writes).
> > I'm not sure how much anybody _cares_ about that; they're much less
> > intimate than author/committer times.
> 
> As long as it is understood that a commit object is created and then
> a ref is updated to point at it in this order, I do not think there
> is any confusion on the party who reads the reflog, I would think.

Actually, I think we can trivially keep this the same.

Linus suggested resetting the timestamp after making the commit, to
clear the way for the next commit. But if we reset any cached value
_before_ making the commit, this has a few advantages:

  - the cached timestamp remains the same afterwards for anything which
wants to look at it (like reflog updates, but also potentially
anything that wants to report the ident it just used).

  - this gives a more accurate timestamp if the distance between caching
and the actual commit creation is more than a second (and this does
happen; we call git_committer_info() in many places besides creating
an actual commit).

So here's a patch. I gave tags the same treatment, though I don't know
if there are any cases that create a series of tags. I grepped through
all the calls to git_committer_info(), and I didn't see any others that
would want to reset the date.

It does feel a little backwards to cache by default, and then try to
catch all the places that want to reset. Another way of thinking about
it would be to almost _never_ cache, but let a few callsites like (the
commit object creation) explicitly ask for a stable timestamp between
the author and committer. That would be a lot more invasive, though. And
it just gives us the opposite problem: finding all sites which care
about stability and annotating them.

(In fact, even this patch may regress some cases that want stability,
though I could not think of any. The test suite does not complain, but
that's not surprising; it has to avoid looking at this kind of thing
entirely, or it would be racy).

-- >8 --
Subject: reset cached ident date before creating objects

When we compute the date to put in author/committer lines of
commits, or tagger lines of tags, we get the current date
once and then cache it for the rest of the program.  This is
a good thing in some cases, like "git commit", because it
means we do not racily assign different times to the
author/committer fields of a single commit object.

But as more programs start to make many commits in a single
process (e.g., the recently builtin "git am"), it means that
you'll get long strings of commits with identical committer
timestamps (whereas before, we invoked "git commit" many
times and got true timestamps).

This patch addresses it by letting callers reset the cached
time, which means they'll get a fresh time on their next
call to git_committer_info() or git_author_info().  We do so
automatically before filling in the ident fields of commit
and tag objects. That retains the property that committers
and authors in a single object will match, but means that
separate objects we create should always get their own
fresh timestamps.

There's no automated test, because it would be inherently
racy (it depends on whether the program takes multiple
seconds to run). But you can see the effect with something
like:

  # make a fake 100-patch series; use --first-parent
  # so that we pretend merges are just more patches
  top=$(git rev-parse HEAD)
  bottom=$(git rev-list --first-parent -100 HEAD | tail -n 1)
  git log --format=email --reverse --first-parent \
  --binary -m -p $bottom..$top >patch

  # now apply it; this presumably takes multiple seconds
  git checkout --detach $bottom
  git am 
Signed-off-by: Jeff King 
---
 builtin/tag.c | 1 +
 cache.h   | 1 +
 commit.c  | 1 +
 ident.c   | 5 +
 4 files changed, 8 insertions(+)

diff --git a/builtin/tag.c b/builtin/tag.c
index 50e4ae5..3025e7f 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -225,6 +225,7 @@ static void create_tag(const unsigned char *object, const 
char *tag,
if (type <= OBJ_NONE)
die(_("bad object type."));
 
+   reset_ident_date();
header_len = snprintf(header_buf, sizeof(header_buf),
  "object %s\n"
  "type %s\n"
diff --git a/cache.h b/cache.h
index b5f76a4..31e65f9 100644
--- a/cache.h
+++ b/cache.h
@@ -1269,6 +1269,7 @@ extern const char *ident_default_email(void);
 extern const char *git_editor(void);
 extern const char *git_pager(int stdout_is_tty);
 extern int git_ident_config(const char *, const char *, void *);
+extern void reset_ident_date(void);
 
 struct ident_split {
const char *name_begin;
diff --git a/commit.c b/commit.c
index 

Re: Small trivial annoyance with the nice new builtin "git am"

2016-07-29 Thread Junio C Hamano
On Thu, Jul 28, 2016 at 5:37 PM, Linus Torvalds
 wrote:
> On Thu, Jul 28, 2016 at 5:29 PM, Jeff King  wrote:
>>
>> I guess we want something like:
>>
>> +void reset_ident_date(void)
>> +{
>> +   strbuf_reset(_default_date);
>> +}
>
> Looks sane.
>
>> and then to sprinkle calls liberally through builtin-ified programs when
>> they move from one unit of work to the next.
>
> Maybe we can just add it to the end of commit_tree_extended(), and
> just say "the cache is reset between commits".
>
> That way there is no sprinking in random places.

Hmph, wouldn't that equally work well if we cleared at the beginning of the
function, instead of at the end?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Issue with global config defaults "user.useConfigOnly = true" + "pull.rebase = preserve" - "user.email"

2016-07-29 Thread Jeff King
On Fri, Jul 29, 2016 at 10:47:27AM -0700, Junio C Hamano wrote:

> > In those cases specifically, I never have local commits that differ
> > from the remote, so a "pull --ff-only" should leave me in the same
> > state as a "pull --rebase".
> >
> > Is this a case of rebase trying to make sure it has enough information
> > for me to be a committer before knowing whether I even need to rewrite
> > any commits, and could/should that be avoided?  Alternatively (or also)
> > could/should rebase detect that a fast-forward is possible and prefer
> > to do that instead?
> 
> I think that is a reasonable argument, but to solve this for a more
> general case, shouldn't we be discussing a solution that would also
> work when rebase _does_ need to create a new commit?  And when the
> latter is solved, I would imagine that "this rebase happens to be
> fast-forward, and not having an ident shouldn't be an issue for this
> special case" would become moot.

Wouldn't it be wrong to create a commit with non-config ident when
user.useConfigOnly is set, though? That is the exact point when it
should kick in, to tell the user "you thought it would not matter here,
but in this case we _do_ need your real ident; what should we do?"

If the user is doing a one-off thing where they do not care if their
crappy, fake ident makes it into a commit object, then the right thing
is:

  git -c user.useConfigOnly=false pull --rebase

or even:

  git -c user.email=fake-but...@example.com pull --rebase

And they can do that preemptively for commands like the golang example
here. They shouldn't _have_ to do that, though, if the command wouldn't
actually create a commit. So I do think there may be a bug to be fixed,
but it is simply commands being over-eager to make sure we have an
ident when they might not need it.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] reset cached ident date before creating objects

2016-07-29 Thread Linus Torvalds
On Fri, Jul 29, 2016 at 11:05 AM, Jeff King  wrote:
>
> Linus suggested resetting the timestamp after making the commit, to
> clear the way for the next commit. But if we reset any cached value
> _before_ making the commit, this has a few advantages:

Looks fine to me. It should trivially fix the git-am issue, and I
can't see what it could break. Famous last words.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] reset cached ident date before creating objects

2016-07-29 Thread Junio C Hamano
Jeff King  writes:

> Linus suggested resetting the timestamp after making the commit, to
> clear the way for the next commit. But if we reset any cached value
> _before_ making the commit, this has a few advantages:

I guess our mails crossed ;-).

> It does feel a little backwards to cache by default, and then try to
> catch all the places that want to reset. Another way of thinking about
> it would be to almost _never_ cache, but let a few callsites like (the
> commit object creation) explicitly ask for a stable timestamp between
> the author and committer.

... and the reflog?

I would say that the approach taken by this version is perfectly
sensible, if we don't look at it as a "cache" and instead look at it
as a "snapshot" of the clock for the duration of the operation.
"reset" is like "now we are starting another operation, so grab a
snapshot please".

The changes to both tagging and committing look sensible.  Thanks.

> -- >8 --
> Subject: reset cached ident date before creating objects
>
> When we compute the date to put in author/committer lines of
> commits, or tagger lines of tags, we get the current date
> once and then cache it for the rest of the program.  This is
> a good thing in some cases, like "git commit", because it
> means we do not racily assign different times to the
> author/committer fields of a single commit object.
>
> But as more programs start to make many commits in a single
> process (e.g., the recently builtin "git am"), it means that
> you'll get long strings of commits with identical committer
> timestamps (whereas before, we invoked "git commit" many
> times and got true timestamps).
>
> This patch addresses it by letting callers reset the cached
> time, which means they'll get a fresh time on their next
> call to git_committer_info() or git_author_info().  We do so
> automatically before filling in the ident fields of commit
> and tag objects. That retains the property that committers
> and authors in a single object will match, but means that
> separate objects we create should always get their own
> fresh timestamps.
>
> There's no automated test, because it would be inherently
> racy (it depends on whether the program takes multiple
> seconds to run). But you can see the effect with something
> like:
>
>   # make a fake 100-patch series; use --first-parent
>   # so that we pretend merges are just more patches
>   top=$(git rev-parse HEAD)
>   bottom=$(git rev-list --first-parent -100 HEAD | tail -n 1)
>   git log --format=email --reverse --first-parent \
>   --binary -m -p $bottom..$top >patch
>
>   # now apply it; this presumably takes multiple seconds
>   git checkout --detach $bottom
>   git am 
>   # now count the number of distinct committer times;
>   # prior to this patch, there would only be one, but
>   # now we'd typically see several.
>   git log --format=%ct $bottom.. | sort -u
>
> Suggested-by: Linus Torvalds 
> Signed-off-by: Jeff King 
> ---
>  builtin/tag.c | 1 +
>  cache.h   | 1 +
>  commit.c  | 1 +
>  ident.c   | 5 +
>  4 files changed, 8 insertions(+)
>
> diff --git a/builtin/tag.c b/builtin/tag.c
> index 50e4ae5..3025e7f 100644
> --- a/builtin/tag.c
> +++ b/builtin/tag.c
> @@ -225,6 +225,7 @@ static void create_tag(const unsigned char *object, const 
> char *tag,
>   if (type <= OBJ_NONE)
>   die(_("bad object type."));
>  
> + reset_ident_date();
>   header_len = snprintf(header_buf, sizeof(header_buf),
> "object %s\n"
> "type %s\n"
> diff --git a/cache.h b/cache.h
> index b5f76a4..31e65f9 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1269,6 +1269,7 @@ extern const char *ident_default_email(void);
>  extern const char *git_editor(void);
>  extern const char *git_pager(int stdout_is_tty);
>  extern int git_ident_config(const char *, const char *, void *);
> +extern void reset_ident_date(void);
>  
>  struct ident_split {
>   const char *name_begin;
> diff --git a/commit.c b/commit.c
> index 71a360d..7ddbffe 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -1548,6 +1548,7 @@ int commit_tree_extended(const char *msg, size_t 
> msg_len,
>   }
>  
>   /* Person/date information */
> + reset_ident_date();
>   if (!author)
>   author = git_author_info(IDENT_STRICT);
>   strbuf_addf(, "author %s\n", author);
> diff --git a/ident.c b/ident.c
> index 139c528..e20a772 100644
> --- a/ident.c
> +++ b/ident.c
> @@ -184,6 +184,11 @@ static const char *ident_default_date(void)
>   return git_default_date.buf;
>  }
>  
> +void reset_ident_date(void)
> +{
> + strbuf_reset(_default_date);
> +}
> +
>  static int crud(unsigned char c)
>  {
>   return  c <= 32  ||
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Issue with global config defaults "user.useConfigOnly = true" + "pull.rebase = preserve" - "user.email"

2016-07-29 Thread Junio C Hamano
Junio C Hamano  writes:

> Dakota Hawkins  writes:
>
>> In those cases specifically, I never have local commits that differ
>> from the remote, so a "pull --ff-only" should leave me in the same
>> state as a "pull --rebase".
>>
>> Is this a case of rebase trying to make sure it has enough information
>> for me to be a committer before knowing whether I even need to rewrite
>> any commits, and could/should that be avoided?  Alternatively (or also)
>> could/should rebase detect that a fast-forward is possible and prefer
>> to do that instead?
>
> I think that is a reasonable argument,...

There is one that still wants to know who you are, I think.  The
reflog entries record who moved the tip of the ref and when, and
obviously a fast-forward is also recorded.

I _think_ our intention was to allow a bogus ident in reflog entries
(even though we want to avoid a bogus ident in commits and tags), so
perhaps additional code/logic for user.useConfigOnly may need to know
about that (I didn't dig)?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 32/41] environment: add set_index_file()

2016-07-29 Thread Christian Couder
On Fri, Jul 29, 2016 at 5:34 PM, Duy Nguyen  wrote:
>
> Yeah. If the libification movement is going strong, we can start
> converting and at some point should be able to define
> NO_THE_INDEX_COMPATIBILITY_MACROS globally (and avoid the_index along
> the way)
>
> Without that, there is a risk. I looked at 'nm apply.o | grep ' [Uu]
> '' and I don't see any external functions that would potentially
> access the index, except ll_merge. Again there's a good chance I may
> have missed something.
>
>> So it looks like it is a very big and different project to make the
>> current libified code be explicit about which index it is using.
>> And by the way perhaps this other project should just remove the
>> "the_index" global altogether.
>
> This is probably the way to go. But it's the boring sort of work that
> nobody wants to do :(

Do you mean that it might be a source of micro-projects for the next
GSoC or Outreachy? ;-)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] reset cached ident date before creating objects

2016-07-29 Thread Jeff King
On Fri, Jul 29, 2016 at 11:15:39AM -0700, Junio C Hamano wrote:

> > It does feel a little backwards to cache by default, and then try to
> > catch all the places that want to reset. Another way of thinking about
> > it would be to almost _never_ cache, but let a few callsites like (the
> > commit object creation) explicitly ask for a stable timestamp between
> > the author and committer.
> 
> ... and the reflog?

I left that part out. I can't decide if it is a bug or a feature that
the reflog retains the same timestamp.

I.e., I'm not sure it would be wrong to do:

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 12290d2..465cfc5 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2886,6 +2886,8 @@ static int log_ref_write_1(const char *refname, const 
unsigned char *old_sha1,
logfd = open(logfile->buf, oflags);
if (logfd < 0)
return 0;
+
+   reset_ident_date();
result = log_ref_write_fd(logfd, old_sha1, new_sha1,
  git_committer_info(0), msg);
if (result) {

on top. If somebody writes a lot of refs without updating any commit
objects, should those all have a stable timestamp or not?

On the one hand, moving the timestamp reflects reality.

On the other, I have done dirty things in the past like "undoing" the
results of somebody's mistaken:

  git clone ...
  git push --mirror ;# oops, this deletes everything except master!

by grouping ref updates according to their reflog mtimes.

So I kind of punted by not changing it but also not making any claims
either way. :)

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/5] Git filter protocol

2016-07-29 Thread Jeff King
On Fri, Jul 29, 2016 at 07:43:49PM +0200, Lars Schneider wrote:

> Here is the reject case (non-streaming):
> 
> git> smudge
> git> 
> git> 
> git> ...pkt-lines...
> git> pktline-flush
> 
> git< 0
> git< reject
> 
> 
> Do you see a problem with this approach?

Only that it seemed a little weird to me to have to write a meaningless
"0" when "reject" covers the situation entirely. I don't think it's
wrong, though (and even in some ways right, because it decouples the
meaning of "reject" from the syntax of parsing, but I think it's OK for
the protocol parser to understand the difference between success and
failure codes).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Small trivial annoyance with the nice new builtin "git am"

2016-07-29 Thread Junio C Hamano
Jeff King  writes:

> On Thu, Jul 28, 2016 at 05:37:08PM -0700, Linus Torvalds wrote:
>
>> > and then to sprinkle calls liberally through builtin-ified programs when
>> > they move from one unit of work to the next.
>> 
>> Maybe we can just add it to the end of commit_tree_extended(), and
>> just say "the cache is reset between commits".
>> 
>> That way there is no sprinking in random places.
>
> Hmm, yeah, that might work. As you mentioned, there are cases where we
> really do want the timestamps to match (especially between author and
> committer). So we would not want this reset to kick in where callers
> would not want it.
>
> So I'm trying to play devil's advocate and think of a case where
> somebody would not want the time reset after creating a commit.
>
> One obvious impact would be reflog entries, since we would reset the
> time between the object creation and the ref write (so your reflog entry
> would sometimes be a second or more after the commit time it writes).
> I'm not sure how much anybody _cares_ about that; they're much less
> intimate than author/committer times.

As long as it is understood that a commit object is created and then
a ref is updated to point at it in this order, I do not think there
is any confusion on the party who reads the reflog, I would think.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Small trivial annoyance with the nice new builtin "git am"

2016-07-29 Thread Junio C Hamano
Linus Torvalds  writes:

> So we do want to cache things for a single commit, it's just that for
> things like "git am" (or, like Junio wondered, "git rebase" - I didn't
> check) we probabyl just just flush the cache in between commits.

What I cautioned was indeed "git rebase", and the codepath that uses
"git am" internally would be fixed when "git am" is fixed, so it is
OK.

What I wondered was actually "git cherry-pick A..B", but I think it
runs "git commit" via the run_command() API as external process, so
it should also be safe.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix failing test t3700-add.sh

2016-07-29 Thread Junio C Hamano
Jeff King  writes:

> I was also puzzled why the test fails for you; it does not for me.
> Running the test script as root does make it fail. There are some
> earlier tests which are skipped in this case, which run "git reset
> --hard" with xfoo1 in the index, which cleans it up.
>
>> +echo foo >xfoo &&
>> +chmod 755 xfoo &&
>> +git add --chmod=-x xfoo &&
>> +case "$(git ls-files --stage xfoo)" in
>> +100644" "*xfoo) echo pass;;
>> +*) echo fail; git ls-files --stage xfoo; (exit 1);;
>
> Here you just pick another name, "xfoo", which does happen to work. But
> it seems like that has the same potential for flakiness if earlier tests
> get adjusted or skipped, since they also use that name.
>
> How about just:
>
>   rm -f xfoo1
>
> at the top of the test, which explicitly documents the state we are
> looking for?

That's much more sensible.

> I also wondered if this test, which calls "chmod 755 xfoo1", should be
> marked with the POSIXPERM prerequisite. But I guess since its goal is to
> strip the executable bit, it "works" even on systems where that chmod is
> a noop (the "git add --chmod" doesn't do anything, but one way or the
> other we end up at the end state we expect).

We could make sure --chmod=[-+]x works both ways, which would be
more robust on either type of underlying platform.  Something along
the lines of

echo foo >xfoo1 &&
git add --chmod=+x xfoo1 &&
test_mode_in_index 100755 xfoo1 &&
git add --chmod=-x xfoo1 &&
test_mode_in_index 100644 xfoo1

with an obvious addition of a test_mode_in_index helper function as
the same "case $(ls-files -s) in ... esac" pattern appears number of
times.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/5] Git filter protocol

2016-07-29 Thread Lars Schneider

> On 29 Jul 2016, at 18:50, Jeff King  wrote:
> 
> On Fri, Jul 29, 2016 at 06:20:51PM +0200, Lars Schneider wrote:
> 
 That being said a "fail" response is a very good idea! This allows
 the filter to communicate to git that a non required filter process
 failed. I will add that to the protocol. Thanks :) 
>>> 
>>> Maybe just send "ok ", "ok -1" (for streaming), or "fail "
>>> followed by the content? That is similar to other Git protocols, though
>>> I am not sure they are good models for sanity or extensibility. :)
>>> 
>>> I don't know if you would want to leave room for other "headers" in the
>>> response, but you could also do something more HTTP-like, with a status
>>> code, and arbitrary headers. And presumably git would just ignore
>>> headers it doesn't know about. I think that's what Jakub's example was
>>> leaning towards. I'm just not sure what other headers are really useful,
>>> but it does leave room for extensibility.
>> 
>> Well, "ok " wouldn't make much sense as we already transmitted
>> the size upfront I think. Right now I have implemented the following options:
> 
> Maybe I'm confused about where in the protocol we are. I was imagining:
> 
>  git> smudge
>  git> 
>  git> 
>  git> ...pkt-lines...
>  git> pktline-flush
> 
>  git< ok 
>  git< ...pkt-lines...
>  git< flush
> 
> That is, we should say "I have something for you" or "I do not" before
> sending a size, because in the "I do not" case we have no size to send.


Right now the protocol is like that in the happy case (non-streaming):

git> smudge
git> 
git> 
git> ...pkt-lines...
git> pktline-flush

git< 
git< ...pkt-lines...
git< flush
git< success

(diff to your version: no "ok" in front of size answer ... plus the
size answer is not present in the streaming case)


Here is the reject case (non-streaming):

git> smudge
git> 
git> 
git> ...pkt-lines...
git> pktline-flush

git< 0
git< reject


Do you see a problem with this approach?


> A more extensible protocol might look like:
> 
>  git> smudge
>  git> filename=
>  git> size=
>  git> pktline-flush
>  git> ...pkt-lines of data...
>  git> pktline-flush
> 
>  git< ok (or success, or whatever status code you like)
>  git< size=
>  git< pkt-line-flush
>  git< ...pkt-lines of data...
>  git< pktline-flush
> 
> That leaves room for new "keys" to be added before the first pkt-flush,
> without having to change the parsing at all.

Alright. Will be in v3!


>> "success\n" --> everything was alright
>> "reject\n" --> the filter rejected the operation but this is no error 
>>   if "filter..required = false"
>>  --> failure that stops/restarts the filter process
>> 
>> I don't think sending any failure reason makes sense because if a failure
>> happens then we are likely in a bad state already (that's why I restart the
>> filter process. I think the filter can report trouble on its own via stdout,
>> no? I think this is what Git-LFS already does.
> 
> Git-LFS sends to stderr because there's no other option. I wonder if it
> would be nicer to make it Git's responsibility to talk to the user,
> because then it could respect things like "--quiet". I guess error
> messages are generally printed regardless of verbosity, though, so
> printing them unconditionally is OK.

OK!

Thanks,
Lars
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Issue with global config defaults "user.useConfigOnly = true" + "pull.rebase = preserve" - "user.email"

2016-07-29 Thread Junio C Hamano
Dakota Hawkins  writes:

> I have a question which may be a bug (I'm a bit skeptical), but here goes:
>
> In my global .gitconfig, I have "user.useConfigOnly = true" and
> user.email isn't set there (I prefer to be forced to set it on a
> per-repo basis, as I use different emails for work and personal
> repos). I ALSO have "pull.rebase = preserve" set.
>
> An example of the problem I have is with tools like golang (I filed an
> issue there, they closed it and suggested the problem is with git or
> my config: https://github.com/golang/go/issues/16516#issuecomment-235800085)
> that use git to pull in package repos without any real user
> interaction. When something like that runs a git pull for me (to
> update a package repo) my global config makes it try to rebase, which
> fails because git doesn't know who I am.

It's an interesting chicken-and-egg problem that user.useConfigOnly
introduces.  It seems that the design of that configuration variable
is not perfect and has room for improvement.

> In those cases specifically, I never have local commits that differ
> from the remote, so a "pull --ff-only" should leave me in the same
> state as a "pull --rebase".
>
> Is this a case of rebase trying to make sure it has enough information
> for me to be a committer before knowing whether I even need to rewrite
> any commits, and could/should that be avoided?  Alternatively (or also)
> could/should rebase detect that a fast-forward is possible and prefer
> to do that instead?

I think that is a reasonable argument, but to solve this for a more
general case, shouldn't we be discussing a solution that would also
work when rebase _does_ need to create a new commit?  And when the
latter is solved, I would imagine that "this rebase happens to be
fast-forward, and not having an ident shouldn't be an issue for this
special case" would become moot.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] diff-highlight: add support for git log --graph output.

2016-07-29 Thread Jeff King
On Fri, Jul 29, 2016 at 07:51:27AM -0700, Brian Henderson wrote:

> (resending as thread instead of attachments)

Hmm, the actual patches don't seem to have made it to the list. :(

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] git-series: track changes to a patch series over time

2016-07-29 Thread Stefan Beller
On Fri, Jul 29, 2016 at 5:44 AM, Richard Ipsum
 wrote:
>>
>> These definitely seem like a family of related problems.  I'd like to
>> use git-series as a format for storing iterations on things like GitHub
>> pull-requests or Gerrit patch versions (in the latter case, overcoming
>> Gerrit's limitations on only handling one patch at a time).  Integrating
>> reviews with that seems helpful.
>
> Worth noting here that Gerrit's one patch per change format isn't
> intrinsic to Notedb, since we just need to track the sha we want
> to merge and optionally the branch we intend to merge into.

Note that Gerrit started to lose the "one patch at a time" notion.
It is possible to at least submit multiple changes coupled together
(even across project boundaries) via the topic. Some sort of cover
letter is missing though, that could be used e.g. for the merge commit.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 5/5] convert: add filter..process option

2016-07-29 Thread Junio C Hamano
Lars Schneider  writes:

> I think sending it upfront is nice for buffer allocations of big files
> and it doesn't cost us anything to do it.

While I do NOT think "total size upfront" MUST BE avoided at all costs,
I do not think the above statement to justify it makes ANY sense.

Big files are by definition something you cannot afford to hold its
entirety in core, so you do not want to be told that you'd be fed 40GB
and ask xmalloc to allocate that much.

It allows the reader to be lazy for buffer allocations as long as
you know the file fits in-core, at the cost of forcing the writer to
somehow come up with the total number of bytes even before sending a
single byte (in other words, if the writer cannot produce and hold
the data in-core, it may even have to spool the data in a temporary
file only to count, and then play it back after showing the total
size).

It is good that you allow both mode of operations and the size of
the data can either be given upfront (which allows a single fixed
allocation upfront without realloc, as long as the data fits in
core), or be left "(atend)".

I just don't want to see it oversold as a "feature" that the size
has to come before data.  That is a limitation, not a feature.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Issue with global config defaults "user.useConfigOnly = true" + "pull.rebase = preserve" - "user.email"

2016-07-29 Thread Jeff King
On Fri, Jul 29, 2016 at 11:20:49AM -0700, Junio C Hamano wrote:

> There is one that still wants to know who you are, I think.  The
> reflog entries record who moved the tip of the ref and when, and
> obviously a fast-forward is also recorded.
> 
> I _think_ our intention was to allow a bogus ident in reflog entries
> (even though we want to avoid a bogus ident in commits and tags), so
> perhaps additional code/logic for user.useConfigOnly may need to know
> about that (I didn't dig)?

I think we handle this case OK, or you would not even be able to "git
fetch" into a repository.

It works because the check is predicated on the "strict" flag, and the
reflog writer passes IDENT_NO_STRICT.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Issue with global config defaults "user.useConfigOnly = true" + "pull.rebase = preserve" - "user.email"

2016-07-29 Thread Junio C Hamano
Jeff King  writes:

>> > Is this a case of rebase trying to make sure it has enough information
>> > for me to be a committer before knowing whether I even need to rewrite
>> > any commits, and could/should that be avoided?  Alternatively (or also)
>> > could/should rebase detect that a fast-forward is possible and prefer
>> > to do that instead?
>> 
>> I think that is a reasonable argument, but to solve this for a more
>> general case, shouldn't we be discussing a solution that would also
>> work when rebase _does_ need to create a new commit?  And when the
>> latter is solved, I would imagine that "this rebase happens to be
>> fast-forward, and not having an ident shouldn't be an issue for this
>> special case" would become moot.
>
> Wouldn't it be wrong to create a commit with non-config ident when
> user.useConfigOnly is set, though?

That is exactly what I was getting at.

> If the user is doing a one-off thing where they do not care if their
> crappy, fake ident makes it into a commit object, then the right thing
> is:
>
>   git -c user.useConfigOnly=false pull --rebase
>
> or even:
>
>   git -c user.email=fake-but...@example.com pull --rebase

Hmm, I somehow had an impression that these git commands are not
what the end-user runs from the command line, but wrapper tools like
"go get" has a hardcoded invocation of "git pull".

If a user sets useconfigonly globally, each repository must have
ident the user wants to use in it configured, so I would think that
a solution should be something that makes it easy to do so.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 32/41] environment: add set_index_file()

2016-07-29 Thread Duy Nguyen
On Fri, Jul 29, 2016 at 8:23 PM, Christian Couder
 wrote:
> On Fri, Jul 29, 2016 at 5:34 PM, Duy Nguyen  wrote:
>>
>> Yeah. If the libification movement is going strong, we can start
>> converting and at some point should be able to define
>> NO_THE_INDEX_COMPATIBILITY_MACROS globally (and avoid the_index along
>> the way)
>>
>> Without that, there is a risk. I looked at 'nm apply.o | grep ' [Uu]
>> '' and I don't see any external functions that would potentially
>> access the index, except ll_merge. Again there's a good chance I may
>> have missed something.
>>
>>> So it looks like it is a very big and different project to make the
>>> current libified code be explicit about which index it is using.
>>> And by the way perhaps this other project should just remove the
>>> "the_index" global altogether.
>>
>> This is probably the way to go. But it's the boring sort of work that
>> nobody wants to do :(
>
> Do you mean that it might be a source of micro-projects for the next
> GSoC or Outreachy? ;-)

No that's what I meant by boring. This is not something to interest
GSoC candidates, and while it looks simple, sometimes it needs a good
understanding of git overall, and it's definitely not small enough for
a micro project. It's very similar to i18n-izing the code base.
Luckily Vasco Almeida came out of nowhere and did (still do) that.
There may be another Vasco somewhere ;-)
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Issue with global config defaults "user.useConfigOnly = true" + "pull.rebase = preserve" - "user.email"

2016-07-29 Thread Junio C Hamano
Jeff King  writes:

> ... So I do think there may be a bug to be fixed,
> but it is simply commands being over-eager to make sure we have an
> ident when they might not need it.

36267854 (pull: fast-forward "pull --rebase=true", 2016-06-29) may
be a part of a good solution for that, perhaps?

-- >8 --
From: Junio C Hamano 
Date: Wed, 29 Jun 2016 10:22:31 -0700
Subject: [PATCH] pull: fast-forward "pull --rebase=true"

Signed-off-by: Junio C Hamano 
---
 builtin/pull.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/builtin/pull.c b/builtin/pull.c
index bf3fd3f..2a41d41 100644
--- a/builtin/pull.c
+++ b/builtin/pull.c
@@ -878,10 +878,24 @@ int cmd_pull(int argc, const char **argv, const char 
*prefix)
if (merge_heads.nr > 1)
die(_("Cannot merge multiple branches into empty 
head."));
return pull_into_void(*merge_heads.sha1, curr_head);
-   } else if (opt_rebase) {
-   if (merge_heads.nr > 1)
-   die(_("Cannot rebase onto multiple branches."));
+   }
+   if (opt_rebase && merge_heads.nr > 1)
+   die(_("Cannot rebase onto multiple branches."));
+
+   if (opt_rebase) {
+   struct commit_list *list = NULL;
+   struct commit *merge_head, *head;
+
+   head = lookup_commit_reference(orig_head);
+   commit_list_insert(head, );
+   merge_head = lookup_commit_reference(merge_heads.sha1[0]);
+   if (is_descendant_of(merge_head, list)) {
+   /* we can fast-forward this without invoking rebase */
+   opt_ff = "--ff-only";
+   return run_merge();
+   }
return run_rebase(curr_head, *merge_heads.sha1, 
rebase_fork_point);
-   } else
+   } else {
return run_merge();
+   }
 }
-- 
2.9.2-685-g483c9ea

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Issue with global config defaults "user.useConfigOnly = true" + "pull.rebase = preserve" - "user.email"

2016-07-29 Thread Jeff King
On Fri, Jul 29, 2016 at 11:32:40AM -0700, Junio C Hamano wrote:

> > Wouldn't it be wrong to create a commit with non-config ident when
> > user.useConfigOnly is set, though?
> 
> That is exactly what I was getting at.

Ah, OK, I thought you were trying to explore the opposite direction.

> > If the user is doing a one-off thing where they do not care if their
> > crappy, fake ident makes it into a commit object, then the right thing
> > is:
> >
> >   git -c user.useConfigOnly=false pull --rebase
> >
> > or even:
> >
> >   git -c user.email=fake-but...@example.com pull --rebase
> 
> Hmm, I somehow had an impression that these git commands are not
> what the end-user runs from the command line, but wrapper tools like
> "go get" has a hardcoded invocation of "git pull".

Yeah, the right person or entity to set those options is the one who
knows "the operation I am doing is OK even with bogus ident". I had
assumed if "go get" fell under that category, that it should be the one
to tell it to git (via the config above).

But I am not really sure that is the case. In general "go get" shouldn't
make commits if you aren't doing active work on the repo (AFAIK), and it
should just work.

>From my limited testing, "git pull --rebase" is perfectly fine. The
culprit is "--rebase=preverse", which complains even if it would be a
fast-forward.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Issue with global config defaults "user.useConfigOnly = true" + "pull.rebase = preserve" - "user.email"

2016-07-29 Thread Jeff King
On Fri, Jul 29, 2016 at 02:39:11PM -0400, Jeff King wrote:

> From my limited testing, "git pull --rebase" is perfectly fine. The
> culprit is "--rebase=preverse", which complains even if it would be a
> fast-forward.

That should be preserve, of course. :)

And I think I see what is happening. "preserve" implies
interactive-rebase, which makes an early check that we have valid
committer info, even though we might not actually write any new commits.

So doing this:

diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index ded4595..f0f4777 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -1180,9 +1180,6 @@ To continue rebase after editing, run:
;;
 esac
 
-git var GIT_COMMITTER_IDENT >/dev/null ||
-   die "$(gettext "You need to set your committer info first")"
-
 comment_for_reflog start
 
 if test ! -z "$switch_to"

fixes it for me. I can't figure out if that would have any bad side
effects, though. That check comes from Dscho's original 1b1dce4 (Teach
rebase an interactive mode, 2007-06-25), so there's not much comment on
why it was added specifically.

We would notice the bogus ident later when we actually do try to create
a commit object, but I can guess that this up-front check might give us
a better error message. You get warned up-front, rather than something
like:

  Rebasing (1/1)
  *** Please tell me who you are.
  [...]
  fatal: no name was given and auto-detection is disabled
  Could not pick 8ebea123853128ca2411b2b449f76a1a4b0d026c

and dumped in the middle of an interactive rebase that you cannot
complete. OTOH, that is how a regular non-interactive merge works. And
if your next step is to set up your ident, then it's natural to do:

  git config user.email whatever
  git rebase --continue

So I'd lean towards dropping it, but maybe there are other hidden
gotchas.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 32/41] environment: add set_index_file()

2016-07-29 Thread Junio C Hamano
Duy Nguyen  writes:

> On Fri, Jul 29, 2016 at 8:23 PM, Christian Couder
> 
>> Do you mean that it might be a source of micro-projects for the next
>> GSoC or Outreachy? ;-)
>
> No that's what I meant by boring. This is not something to interest
> GSoC candidates, and while it looks simple, sometimes it needs a good
> understanding of git overall, and it's definitely not small enough for
> a micro project.

I think "that's not what I meant" is what you meant ;-) but anyway,
I actually view that as part of the same "libify" project, and not
solving it and building an "am that makes an internal call to apply
that is not libified" is adding technical debt to the codebase.

It may be a good trade-off between having "am that internally calls
apply" earlier and the additional technical debt, but is not a good
thing to do to the overall health of the project in the longer term
to pretend as if this new set_index_file() is part of a good API.

Instead it probably should be marked with "the libification of
'apply' took a short-circuit by adding this technical debt; please
do not call this function in new codepaths", or something like that.



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html