Re: [PATCH 1/2] introduce "banned function" list

2018-07-20 Thread Theodore Y. Ts'o
On Thu, Jul 19, 2018 at 09:08:08PM -0400, Jeff King wrote:
> Ditto for sprintf, where you should _always_ be using at least xsnprintf
> (or some better tool, depending on the situation).  And for strncpy,
> strlcpy (or again, some better tool) is strictly an improvement.

Nitpick: this may be true for git, but it's not strictly true in all
cases.  I actually have a (non-git) case where strncpy *is* the right
tool.  And this is one where I'm copying a NUL-terminated string into
a fixed-length character array (in the ext4 superblock) which is *not*
NUL-terminated.  In that case, strncpy() works, but strlcpy() does not
do what I want.
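To make the distinction concrete, here is a minimal C sketch. It assumes a hypothetical 16-byte label field that is not required to be NUL-terminated, in the spirit of the ext4 superblock's volume name; the struct and helper names are illustrative, not ext4's. Because strlcpy() is a BSD extension rather than ISO C, the sketch uses a small hand-written `my_strlcpy()` (hypothetical) with the same truncate-and-terminate semantics.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical on-disk record with a fixed-width label. The field is NOT
 * required to be NUL-terminated: a 16-character label uses all 16 bytes. */
#define LABEL_LEN 16

struct disk_record {
    char label[LABEL_LEN];
};

/* strncpy() copies at most LABEL_LEN bytes and zero-pads the remainder,
 * which matches the on-disk format: short labels are NUL-padded, and a
 * full-length label fills the field with no terminator. */
void set_label_strncpy(struct disk_record *r, const char *name)
{
    strncpy(r->label, name, sizeof(r->label));
}

/* strlcpy-style copy: always NUL-terminates, so it stores at most
 * size - 1 characters, and it does not zero the bytes after the NUL. */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);
    if (size > 0) {
        size_t n = len < size - 1 ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len; /* like strlcpy, returns the length of src */
}

void set_label_strlcpy(struct disk_record *r, const char *name)
{
    my_strlcpy(r->label, name, sizeof(r->label));
}
```

With a full 16-character name, strncpy() fills every byte of the field, which is what the on-disk format wants; the strlcpy-style copy spends the last byte on a terminator and silently drops the final character.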

So I used strncpy() advisedly, and I ignore people running Coccinelle
scripts and blindly sending patches to "fix" ext4.

But perhaps that's also a solution for git?  You don't necessarily
have to put them on a banned list; you could instead have some
handy, pre-set scripts that scan the entire code base looking for
"bad" functions with cleanups automatically suggested.  This could be
run at patch review time, and/or periodically (especially before a
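A scan of that kind could start out as simple as the following C sketch. It is a hypothetical helper, not an existing git tool: the table of discouraged functions and suggested replacements just mirrors the examples in the quoted message, and because it only reports candidates, a reviewer can still decide that a particular strncpy() is the right tool.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: a discouraged function and the cleanup a reviewer
 * might suggest. It only suggests; a human makes the final call. */
struct banned_fn {
    const char *name;
    const char *suggestion;
};

static const struct banned_fn banned_fns[] = {
    { "sprintf", "xsnprintf" },
    { "strncpy", "strlcpy" },
};

static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Returns the first table entry (in table order) that has a call in `line`,
 * or NULL. A match must be a whole identifier followed, possibly after
 * spaces or tabs, by '(' so that vsprintf( or my_strncpy( are not flagged. */
const struct banned_fn *find_banned_call(const char *line)
{
    for (size_t i = 0; i < sizeof(banned_fns) / sizeof(banned_fns[0]); i++) {
        const char *name = banned_fns[i].name;
        size_t len = strlen(name);
        for (const char *p = strstr(line, name); p; p = strstr(p + 1, name)) {
            const char *q = p + len;
            if (p > line && is_ident_char(p[-1]))
                continue; /* part of a longer identifier */
            while (*q == ' ' || *q == '\t')
                q++;
            if (*q == '(')
                return &banned_fns[i];
        }
    }
    return NULL;
}
```

A driver would feed it each line of each source file and print the suggestion. In practice `git grep` or a Coccinelle semantic patch would be more robust, since this sketch does not understand comments, string literals, or macros.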

- Ted

Re: de-alphabetizing the documentation

2018-07-06 Thread Theodore Y. Ts'o
On Fri, Jul 06, 2018 at 04:21:47PM -0700, wrote:
> I don't think that it's really important to find a "best" ordering for
> commands or glossary terms; it's more a matter of finding someone who
> is willing to take responsibility for choosing a reasonable ordering.
> Presumably the head maintainer of this project could delegate the task
> to a qualified volunteer, not a newbie like myself but not necessarily
> someone with expert knowledge either. It's too bad that a policy of
> not listing things alphabetically wasn't adopted from the beginning of
> this project, but I guess that's life.

That wasn't that portion of the man page, for better or for worse.  We
can debate whether using a non-alphabetical order would be better or
worse for everyone; personally, I think the much better pointer is at
the very beginning of the git man page, which points people at "man
gittutorial" and "man giteveryday".

It seems to me that for your stated goal, "git everyday" has a good
list of commands that people should learn, complete with a proposed

That's probably the biggest stumbling block of finding an ideal
ordering.  What's reasonable really depends on your workflow, and
there are many different workflows depending on what a particular
developer is trying to do.  Consider carpentry; for some use cases, a
screwdriver is an absolutely critical tool.  For others, they might
never use it, and instead almost exclusively join two pieces of wood
using a mortise and tenon joint.  Others might use a butt joint, plus
glue and nails.  All of these different techniques can be used to make
a wooden box, and they all involve a very different set of tools.


- Ted

Re: GDPR compliance best practices?

2018-06-13 Thread Theodore Y. Ts'o
On Tue, Jun 12, 2018 at 09:12:19PM +0200, Peter Backes wrote:
> This incorrect claim is completely inverting the logic of Art. 17.
> The logic is clarly that if ANY of lit (a) to (f) is satisfied, the 
> data must be deleted.
> It is not necessary for ALL of them to be satisfied.
> In particular, if the data is no longer necessary for the purpose for 
> which it was collected, then THAT ALONE is grounds for erasure ((1) 
> lit. a). It does not matter at all whether processing was consent-based 
> or whether such consent was withdrawn.

Sure, but given that you are the one trying to claim that people need
to do all sorts of extra development work (I don't see any patches
from you) and suffer performance degradation, the burden of proof is
on _you_ to show that this is a problem that github et al. are
likely to run into.

In particular, keep in mind that distribution of open source code can
only be done under the terms of an open source license --- and a
license is a contract.  So in particular, your claim that the data is
no longer necessary (point a) is at the very least going to be subject
to dispute and is a legal question.  I can think of any number of ways
that this could be considered necessary in order to ensure open source
license compliance, the public interest in terms of allowing forking,

The bottom line is I'm sure the lawyers at github and Microsoft have
very carefully done their due diligence, and if they are concerned,
I'm sure we'll see patches from them, since after all, they would not
be interested in seeing the imperial European bureaucrats trying to
assess 4% of Microsoft's world-wide revenues --- that's $3.6 billion,
by the way.  I'm sure if they think it's a concern, their
programmers will be right on it.

- Ted

Re: GDPR compliance best practices?

2018-06-09 Thread Theodore Y. Ts'o
On Sat, Jun 09, 2018 at 11:50:32PM +0100, Philip Oakley wrote:
> I just want to remind folks that Gmane disappeared as a regular list because
> of a legal challenge, the SCO v IBM Unix court case keeps rumbling on, so
> clarifying the legal case for:
> a) holding the 'personal git meta data', and
> b) disclosing (publishing) 'personal git meta data'
> under various copyright and other legal issue scenarios relative to GDPR is
> worth clarifying.

And I suspect the best way of clarifying things is for lawyers at the
major corporations (e.g., Red Hat, Microsoft now that it owns github,
Google since it publishes Android sources at,
Canonical, etc.) to figure it out.

Those situations may very well differ depending on whether they have a
CLA or Copyright Assignment Agreement which they require of
contributors.  But fortunately, those organizations are also best set
up to send patches.   :-)

If those organizations are not choosing to send patches, I suspect
that might be a strong hint as to what those lawyers have concluded.

- Ted

Re: GDPR compliance best practices?

2018-06-08 Thread Theodore Y. Ts'o
On Fri, Jun 08, 2018 at 08:26:57AM +0200, Peter Backes wrote:
> If you run a website where the world can access a repository, you are 
> responsible for obeying the GDPR with respect to that repository. If 
> you receive a request to be forgotten, you have to make sure you stop 
> publishing that author's identity as part of the repository.

*Anyone* can run a repository.  It's not just github and gitlab.  The
hobbyist in New Zealand, who might never visit Europe (so she can't
be arrested when she visits the fair shores of Europe) and who has no
business interests in Europe, can host such a web site.

So the person trying to engage in censorship would need to contact
*everyone*.  And someone who has a git note in their private repo who
then pushes to github/gitlab would end up pushing that note back up to
the web server.

> You do NOT need to
> - delete it from a private copy you have
> - care about others who publish that data
> - or even make sure the data is deleted from private copies others may 
> have, even if the number of copies is in the thousands.

Great, so you can get github and gitlab to get rid of the information.
But it's *pointless*.  And given that real developers really do care
about who authored a patch, and regularly will do operations that
reference the authorship information, the fact that it is stored
somewhere else (e.g., in a git note, per your proposal), *will* slow
down those operations.
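A toy C model of that argument, using hypothetical structs that do not reflect git's real object format: when the author is stored inside the commit, filtering by author touches each commit once; when authorship is moved to a side table (standing in for the proposed notes), every commit also needs a lookup in that table. The `reads` counter only tallies logical lookups; it is not a benchmark, and any real slowdown would depend on how such notes were stored and indexed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical commit record. In the inline model the author is part of
 * the commit; in the notes model it is NULL and lives in a side table. */
struct commit {
    const char *id;
    const char *author;
};

/* Hypothetical side table standing in for notes-based author storage. */
struct author_note {
    const char *commit_id;
    const char *author;
};

/* Inline model: one read per commit is enough (requires author != NULL). */
size_t count_by_author_inline(const struct commit *c, size_t n,
                              const char *who, size_t *reads)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        (*reads)++;                        /* read the commit */
        if (strcmp(c[i].author, who) == 0)
            hits++;
    }
    return hits;
}

/* Notes model: every commit also needs a lookup in the side table. The
 * search is linear here; an index would make it cheaper, but the extra
 * lookup per commit would remain. */
size_t count_by_author_notes(const struct commit *c, size_t n,
                             const struct author_note *notes, size_t nnotes,
                             const char *who, size_t *reads)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        (*reads)++;                        /* read the commit */
        (*reads)++;                        /* look up its author note */
        for (size_t j = 0; j < nnotes; j++) {
            if (strcmp(notes[j].commit_id, c[i].id) == 0) {
                if (strcmp(notes[j].author, who) == 0)
                    hits++;
                break;
            }
        }
    }
    return hits;
}
```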

> In practical terms, if someone wishes to exercise his right to be 
> forgotten, he will usually send the request to the maintainer and stop 
> him from distributing the information, and perhaps to a third party he 
> might use as a platform for publication, such as github.

Your problem is in the word: "a"

- Ted

Re: GDPR compliance best practices?

2018-06-07 Thread Theodore Y. Ts'o
On Fri, Jun 08, 2018 at 01:21:29AM +0200, Peter Backes wrote:
> On Thu, Jun 07, 2018 at 03:38:49PM -0700, David Lang wrote:
> > > Again: The GDPR certainly allows you to keep a proof of copyright
> > > privately if you have it. However, it does not allow you to keep
> > > publishing it if someone exercises his right to be forgotten.
> > someone is granting the world the right to use the code and you are claiming
> > that the evidence that they have granted this right is illegal to have?
> Hell no! Please read what I wrote:
> - "allows you to keep a proof ... privately"
> - "However, it does not allow you to keep publishing it"

The problem is that you've left "you" undefined.  With an open
source project, anyone who has contributed to the project has
a copyright interest.  That hobbyist in Germany who submitted a patch?
They have a copyright interest.  That US Company based in Redmond,
Washington?  They own a copyright interest.  Huawei in China?  They
have a copyright interest.

So there is no "privately".  And "you" numbers in the thousands and
thousands of copyright holders of portions of the open source code.

And of course, that's the other thing you seem to fundamentally not
understand about how git works.  Every developer in the world working
on that open source project has their own copy.  There is
fundamentally no way that you can expunge that information from every
single git repository in the world.  You can remove a git note from a
single repository.  But that doesn't affect my copy of the repository
on my laptop.  And if I push that repository to my server, that git note
will be out there for the whole world to see.

So someone could *try* sending a public request to the entire world,
saying, "I am a European and I demand that you disassociate commit
DEADBEF12345 from my name".  They could try serving legal papers on
everyone.  But at this point, it's going to trigger something called
the "Streisand Effect".  If you haven't heard of it, I suggest you
look it up:


- Ted

Re: GDPR compliance best practices?

2018-06-04 Thread Theodore Y. Ts'o
On Mon, Jun 04, 2018 at 12:16:16AM +0200, Peter Backes wrote:
> Verifying the commit ID by itself wouldn't be any less efficient than 
> before. Admitteldly, it wouldn't verify the author and authordate 
> integrity anymore without additional work. That would be some overhead, 
> sure, and could be done on demand, and would mostly affect clones.

For people who are doing real work on git repos, other commands that
we very much care about include "git log --author=", "git
tag --contains", "git blame", etc.

At least for any repo that *I* control, if you slow those down, I
wouldn't downgrade my git binary/repo just to make some imperialistic
European bureaucrats happy.


- Ted

Re: GDPR compliance best practices?

2018-06-03 Thread Theodore Y. Ts'o
On Sun, Jun 03, 2018 at 10:52:33PM +0200, Peter Backes wrote:
> But I will take your message as saying you at least don't see any 
> obvious criticism leading to complete rejection of the approach.

If you don't think a potential 2x -- 10x performance hit is a
blocking factor --- sure, go ahead and try implementing it.  And good
luck to you.  And this is not a guarantee that it won't get rejected.
I certainly don't have the power to make that guarantee.

If you don't have time to implement it, why do you think it's fair to
inflict on everyone else the request for time to do a design review
for something for which the need hasn't even been established?


 - Ted

Re: GDPR compliance best practices?

2018-06-03 Thread Theodore Y. Ts'o
On Sun, Jun 03, 2018 at 09:24:17PM +0200, Peter Backes wrote:
> He said: It would be a tyranny of lawyers.
> Let's not have a tyranny of lawyers. Let us, the engineers and hackers, 
> exercise the necessary control over those pesky lawyers by defining and 
> redefining the state of the art in technology, and prevent them from 
> defining it by themselves. For a hammer, everything looks like a nail. 
> What is the better options: To suggest people to pay for legal advice 
> by lawyers, who only offer lengthy disclaimers and such for bypassing 
> the right to be forgotten, or simply discuss technical changes for git 
> which enable its easy implementation, without legal excuses for not 
> doing supporting it?

Why don't you try to implement your proposal, and then benchmark
it?  After you find out how much of a performance disaster it's going
to be, especially for large git repos, we can discuss who is being
It may very well be that different people and companies will get
different legal advice, and one of the interesting things about many
git repos for open source projects is that they are not owned by any one
company.  A change in the git repo format is one that has to be
adopted by the entire open source project, and if a portion of the
community isn't interested in paying the overhead cost, and sticks
with the existing git repo format, I wonder what the "imperialistic"
(your word, not mine) EU will do --- try to lock up or sue everyone
from outside the EU that refuses to pay the 2x-10x performance
overhead and sticks with the original repo format, such that anyone
who wants to interoperate has to send git pushes in the original

But in any case, why don't you send a patch and we can discuss?  As
the old saying goes, "code talks, bullshit walks".   :-)


 - Ted

Re: GDPR compliance best practices?

2018-06-03 Thread Theodore Y. Ts'o
On Sun, Jun 03, 2018 at 07:46:17PM +0200, Peter Backes wrote:
> Let's be honest: We do not know what legitimization exactly in each 
> specific case the git metadata is being distributed under.

It seems like you are engaging in something even more dangerous than a
hardware engineer pretending they know how to program, or a software
engineer pretending they know how to use a soldering iron --- and that's a
programmer pretending they know enough that they can speculate on the

I would gently suggest that if you really want to engage in something
more practical than speculating on how GDPR compliance will work out in
actual practice, that you contact a lawyer and get official legal

After getting that advice, if you or your company wants to implement it,
you can then send patches, and those can get debated using the usual
patch submission process.


- Ted