Re: [RFC 0/1] Tolerate broken headers in `packed-refs` files

2018-03-26 Thread Edward Thomson
On Mon, Mar 26, 2018 at 2:08 PM, Derrick Stolee  wrote:
> Since most heavily-used tools that didn't spawn Git processes use
> LibGit2 to interact with Git repos, I added Ed Thomson to CC to see
> if libgit2 could ever write these bad header comments.

We added the `sorted` capability to our `packed-refs` header relatively
recently (approximately two months ago, v0.27.0 will be the first release
to include it as of today).  So, at the moment, libgit2 writes:

  "# pack-refs with: peeled fully-peeled sorted "

Prior to this change, libgit2's header was stable for the last five years
as:

  "# pack-refs with: peeled fully-peeled "

And prior to that, we advertised only `peeled`:

  "# pack-refs with: peeled "

Thanks for thinking of us!

-ed


Re: [RFC 0/1] Tolerate broken headers in `packed-refs` files

2018-03-26 Thread Jeff King
On Mon, Mar 26, 2018 at 09:08:04AM -0400, Derrick Stolee wrote:

> Since most heavily-used tools that didn't spawn Git processes use LibGit2 to
> interact with Git repos, I added Ed Thomson to CC to see if libgit2 could
> ever write these bad header comments.

Ed can probably answer more definitively, but I dug in the libgit2
history a little, and I think it has only ever generated correct
"pack-refs with:" lines.

Ditto for JGit, though there it blames down to 1a6964c82 (Initial JGit
contribution to eclipse.org, 2009-09-29). I didn't dig further into the
historical JGit repository, but I think that's probably far enough to
feel good about it.

-Peff


Re: [RFC 0/1] Tolerate broken headers in `packed-refs` files

2018-03-26 Thread Derrick Stolee

On 3/26/2018 8:42 AM, Michael Haggerty wrote:

[...]

But there might be some tools out in the wild that have been writing
broken headers. In that case, users who upgrade Git might suddenly
find that they can't read repositories that they could read before. In
fact, a tool that we wrote and use internally at GitHub was doing
exactly that, which is how we discovered this "problem".

This patch shows what it would look like to relax the parsing again,
albeit *only* for the first line of the file, and *only* for lines
that start with '#'.

The problem with this patch is that it would make it harder for people
who implement broken tools in the future to discover their mistakes.
The only result of the error would be that it is slower to work with
the `packed-refs` files that they wrote. Such an error could go
undiscovered for a long time.


My opinion is that we shouldn't maintain back-compat with formats that 
may have been written by another tool because Git wasn't strict about 
it. As long as Git never wrote files with these formats, then they 
shouldn't be supported.


You are absolutely right that staying strict will help discover the 
tools that are writing an incorrect format.


Since most heavily-used tools that didn't spawn Git processes use 
LibGit2 to interact with Git repos, I added Ed Thomson to CC to see if 
libgit2 could ever write these bad header comments.


Thanks for writing this RFC so we can have the discussion and more 
quickly identify this issue if/when users are broken.


Thanks,
-Stolee


[RFC 0/1] Tolerate broken headers in `packed-refs` files

2018-03-26 Thread Michael Haggerty
Prior to

9308b7f3ca read_packed_refs(): die if `packed-refs` contains bogus data, 
2017-07-01

we silently ignored pretty much any bogus data in a `packed-refs`
file. I think that was pretty clearly a bad policy. The above commit
made parsing quite a bit stricter, calling `die()` if it found any
lines that it didn't understand.

But there's one situation that is maybe not quite so clear-cut. The
first line of a `packed-refs` file can be a header that enumerates
some traits of the file containing it; for example,

# pack-refs with: peeled fully-peeled 

The old code would have tolerated lots of breakage in that line. For
example, any of the following headers would have just been ignored:

# arbitrary data that looks like a comment
# pack-refs with peeled fully-peeled  ← note: missing colon
# pack-refs

Now, any of the above lines are considered errors and cause git to
die.

In my opinion, that is a good thing and we *shouldn't* tolerate broken
header lines; i.e., the status quo is good and we *shouldn't* apply
this patch.

But there might be some tools out in the wild that have been writing
broken headers. In that case, users who upgrade Git might suddenly
find that they can't read repositories that they could read before. In
fact, a tool that we wrote and use internally at GitHub was doing
exactly that, which is how we discovered this "problem".

This patch shows what it would look like to relax the parsing again,
albeit *only* for the first line of the file, and *only* for lines
that start with '#'.

The problem with this patch is that it would make it harder for people
who implement broken tools in the future to discover their mistakes.
The only result of the error would be that it is slower to work with
the `packed-refs` files that they wrote. Such an error could go
undiscovered for a long time.

The tighter check was released quite a while ago, and AFAIK we haven't
had any bug reports from people tripped up by this consistency check.
So I'm inclined to believe that the existing tools are OK and this
patch would be counterproductive. But I wanted to share it with the
list anyway.

Michael

Michael Haggerty (1):
  packed-backend: ignore broken headers

 refs/packed-backend.c | 21 +
 t/t3210-pack-refs.sh  | 33 -
 2 files changed, 41 insertions(+), 13 deletions(-)

-- 
2.14.2