Re: GDPR compliance best practices?
On Wed, Jun 13, 2018 at 10:12:18AM -0400, Theodore Y. Ts'o wrote:
> Sure, but given that you are the one trying to claim that people need
> to do all sorts of extra development work (I don't see any patches

No. I am not. I said it is desirable to have a convenient solution for the problem. I did not demand development work or patches from anyone; I just kindly asked for comments on a possible solution.

> from you) and suffer performance degradation, the burden of proof is
> on _you_ to show that this is a problem that github et al. are
> likely to run into.

*You* claimed there would be performance degradation, not me.

That github et al. will sooner or later receive such erasure requests is a practical certainty. Google receives them every day in large quantities. Just think of someone who committed smelly code on github, now wants to get a new job, and wants to get rid of all association with those smells.

> In particular, keep in mind that distribution of open source code can
> only be done under the terms of an open source license --- and a
> license is a contract.

Not that it would be relevant here, but depending on jurisdiction, it is highly controversial whether open source licenses really constitute contracts (as opposed to, for example, promissory estoppel).

For the right to erasure, it does not matter whether a contract exists or not. The GDPR explicitly prohibits any use of contracts in a way that undermines the GDPR. Making it an irrevocable contractual obligation to publish the data is thus not going to be an excuse.

And Free Software licenses have nothing whatsoever to do with repository metadata. Such software existed long before version control became so popular.

> So in particular, your claim that the data is
> no longer necessary (point a) is at the very least going to be subject

No, it is github's claim that the data must no longer be necessary in order to be erased, not mine! I clearly stated that if ANY point (not: ALL points) is satisfied, the data must be deleted. Thus, points b, c, d or any of the others are just as good as point a.

> to dispute and is a legal question. I can think of any number of ways
> that this could be considered necessary in order to assure open source
> license compliance, the public interest in terms of allowing forking,
> etc.

To claim that the data is necessary (which is, as I said, irrelevant) and then say it's not, because you can just as well use a dummy user string, is self-contradictory.

> The bottom line is I'm sure the lawyers at github and Microsoft have
> very carefully done their due diligence, and if they are concerned,
> I'm sure we'll see patches from them, since after all, they would not

Why should they be concerned? They can rewrite history if necessary. They have a solution, though an inconvenient one. As far as the lawyers are concerned, that solution is perfectly fine.

Best wishes
Peter

--
Peter Backes, r...@helen.plasma.xg8.de
Re: GDPR compliance best practices?
On Tue, Jun 12, 2018 at 09:12:19PM +0200, Peter Backes wrote:
> This incorrect claim is completely inverting the logic of Art. 17.
>
> The logic is clearly that if ANY of lit (a) to (f) is satisfied, the
> data must be deleted.
>
> It is not necessary for ALL of them to be satisfied.
>
> In particular, if the data is no longer necessary for the purpose for
> which it was collected, then THAT ALONE is grounds for erasure ((1)
> lit. a). It does not matter at all whether processing was consent-based
> or whether such consent was withdrawn.

Sure, but given that you are the one trying to claim that people need to do all sorts of extra development work (I don't see any patches from you) and suffer performance degradation, the burden of proof is on _you_ to show that this is a problem that github et al. are likely to run into.

In particular, keep in mind that distribution of open source code can only be done under the terms of an open source license --- and a license is a contract. So in particular, your claim that the data is no longer necessary (point a) is at the very least going to be subject to dispute and is a legal question. I can think of any number of ways that this could be considered necessary in order to assure open source license compliance, the public interest in terms of allowing forking, etc.

The bottom line is I'm sure the lawyers at github and Microsoft have very carefully done their due diligence, and if they are concerned, I'm sure we'll see patches from them, since after all, they would not be interested in seeing the imperial European bureaucrats trying to assess 4% of Microsoft's world-wide revenues --- that's $3.6 billion, by the way. I'm sure if they think it's a concern, their programmers will be right on it.

- Ted
Re: GDPR compliance best practices?
On Tuesday, June 12, 2018 09:12:19 PM Peter Backes wrote:
> So? If a thousand lawyers claim 1+1=3, it becomes a
> mathematical truth?

No, but probably a legal "truth". :)

--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: GDPR compliance best practices?
On Tue, Jun 12, 2018 at 11:56:13AM -0700, David Lang wrote:
> [quoting github]
>
> It's important to remember that the Right to Erasure only applies to
> personal data, not all data. It only applies to data a controller (GitHub,
> for example) is processing _solely_ on the basis of consent.

This is very obviously wrong. See Art. 17 GDPR. Consent is only one of the explicitly mentioned grounds for deletion (it is (1) lit. b, but there are also lit. a and c to f).

> And it only
> applies when there's not another legal reason to keep the data -- for
> instance, if the data is no longer necessary for the purpose for which it
> was collected.

This incorrect claim is completely inverting the logic of Art. 17.

The logic is clearly that if ANY of lit (a) to (f) is satisfied, the data must be deleted.

It is not necessary for ALL of them to be satisfied.

In particular, if the data is no longer necessary for the purpose for which it was collected, then THAT ALONE is grounds for erasure ((1) lit. a). It does not matter at all whether processing was consent-based or whether such consent was withdrawn.

> We do not process Git commit history on the basis of consent. We have a
> legitimate business purpose for collecting Git commit history: to maintain
> the integrity of the Git commit record. It remains necessary for its purpose
> for as long as a commit needs to be attributable to its committer.

Right, but this merely justifies storing the data, not publishing it or keeping it published, as I have already explained at length.

> At GitHub, as part of our Privacy By Design work, we offer ways for users to
> set their own Git commit email data, so if an individual wants to remain
> anonymous or pseudonymous, he or she can do so.

Not only does this fundamentally contradict what they said in the previous sentence, it is not a justification for ignoring the right to erasure either. The very purpose of the right to erasure is to get the data erased *after* the fact.

> We also explain, in our
> [Privacy
> Statement](https://help.github.com/articles/github-privacy-statement), that
> we are not able to delete personal data from the Git commit history once it
> has been recorded.

Privacy Statements are not a justification under the GDPR for processing data or ignoring the right to erasure.

And oh yes, they are able to. Rewriting history is a possibility, though an inconvenient one. I have pointed towards more convenient solutions.

> I'll point out that not only did the Github lawyers need to sign off on this
> stance, but the Microsoft lawyers would have looked at it as well as part of
> their purchase of Github.

So? If a thousand lawyers claim 1+1=3, it becomes a mathematical truth?

Best wishes
Peter

--
Peter Backes, r...@helen.plasma.xg8.de
Re: GDPR compliance best practices?
Adding one more datapoint here: I reached out to Github to find out their stance. Here is what I got back.

Quote:

Thanks for reaching out to us about this.

It's important to remember that the Right to Erasure only applies to personal data, not all data. It only applies to data a controller (GitHub, for example) is processing _solely_ on the basis of consent. And it only applies when there's not another legal reason to keep the data -- for instance, if the data is no longer necessary for the purpose for which it was collected.

We do not process Git commit history on the basis of consent. We have a legitimate business purpose for collecting Git commit history: to maintain the integrity of the Git commit record. It remains necessary for its purpose for as long as a commit needs to be attributable to its committer.

At GitHub, as part of our Privacy By Design work, we offer ways for users to set their own Git commit email data, so if an individual wants to remain anonymous or pseudonymous, he or she can do so. We also explain, in our [Privacy Statement](https://help.github.com/articles/github-privacy-statement), that we are not able to delete personal data from the Git commit history once it has been recorded.

End Quote

I'll point out that not only did the Github lawyers need to sign off on this stance, but the Microsoft lawyers would have looked at it as well as part of their purchase of Github.

David Lang
Re: GDPR compliance best practices?
On Sat, Jun 09, 2018 at 11:50:32PM +0100, Philip Oakley wrote:
> I just want to remind folks that Gmane disappeared as a regular list because
> of a legal challenge, the SCO v IBM Unix court case keeps rumbling on, so
> clarifying the legal case for:
> a) holding the 'personal git meta data', and
> b) disclosing (publishing) 'personal git meta data'
> under various copyright and other legal issue scenarios relative to GDPR is
> worth clarifying.

And I suspect the best way of clarifying things is for lawyers at the major corporations (e.g., Red Hat, Microsoft now that it owns github, Google since it publishes Android sources at sources.android.com, Canonical, etc.) to figure it out. Those situations may very well differ depending on whether they have a CLA or Copyright Assignment Agreement which they require of contributors. But fortunately, those organizations are also best set up to send patches. :-)

If those organizations are not choosing to send patches, I suspect that might be a strong hint as to what those lawyers have concluded.

- Ted
Re: GDPR compliance best practices?
From: "Theodore Y. Ts'o"
Sent: Friday, June 08, 2018 3:53 AM

On Fri, Jun 08, 2018 at 01:21:29AM +0200, Peter Backes wrote:
> On Thu, Jun 07, 2018 at 03:38:49PM -0700, David Lang wrote:
> > > Again: The GDPR certainly allows you to keep a proof of copyright
> > > privately if you have it. However, it does not allow you to keep
> > > publishing it if someone exercises his right to be forgotten.
> > someone is granting the world the right to use the code and you are claiming
> > that the evidence that they have granted this right is illegal to have?
>
> Hell no! Please read what I wrote:
>
> - "allows you to keep a proof ... privately"
> - "However, it does not allow you to keep publishing it"

The problem is that you've left undefined who "you" is. With an open source project, anyone who has contributed to the project has a copyright interest. That hobbyist in Germany who submitted a patch? They have a copyright interest. That US company based in Redmond, Washington? They own a copyright interest. Huawei in China? They have a copyright interest.

So there is no "privately". And "you" numbers in the thousands and thousands of copyright holders of portions of the open source code.

And of course, that's the other thing you seem to fundamentally not understand about how git works. Every developer in the world working on that open source project has their own copy. There is fundamentally no way that you can expunge that information from every single git repository in the world.

You can remove a git note from a single repository. But that doesn't affect my copy of the repository on my laptop. And if I push that repository to my server, the git note will be out there for the whole world to see.

So someone could *try* sending a public request to the entire world, saying, "I am a European and I demand that you disassociate commit DEADBEF12345 from my name". They could try serving legal papers on everyone. But at this point, it's going to trigger something called the "Streisand Effect". If you haven't heard of it, I suggest you look it up:

http://mentalfloss.com/article/67299/how-barbra-streisand-inspired-streisand-effect

Regards,

- Ted

Hi Ted,

I just want to remind folks that Gmane disappeared as a regular list because of a legal challenge, the SCO v IBM Unix court case keeps rumbling on, so the legal case for:

a) holding the 'personal git meta data', and
b) disclosing (publishing) 'personal git meta data'

under various copyright and other legal issue scenarios relative to GDPR is worth clarifying.

I'm of the opinion that the GPL should be able to allow both holding and disclosing that data, though it may need a few more clarifications as to verifying that the author is 'correct' (e.g. not a child) and whether a DCO is needed, etc.

We are already looking at a change to the hash, so the technical challenge could be addressed, but it may create too many logical conflicts if 'right to be forgotten' is allowed (one hash change is enough ;-)

Philip
Re: GDPR compliance best practices?
On Fri, Jun 08 2018, Jonathan Nieder wrote:
> Separate from that legal context, though, I think it's an interesting
> feature request. I don't think it goes far enough: I would like a way
> to erase arbitrary information from the history in a repository. For
> example, if I accidentally check in an encryption key in my repository
> as content or a commit message, I would like a way to remove it,
> assuming that others who fetch from the same repo are willing to
> cooperate with me, of course (i.e. in place of the object, the server
> would store a placeholder and an _advisory_ token allowing clients to
> know (1) that this object was deleted, (2) what object to use instead,
> and (3) an explanatory note about why the deletion occurred; clients
> could make whatever use of this information they choose).
>
> I've seen some discussion on this subject at
> https://www.mercurial-scm.org/pipermail/mercurial/2008-March/017802.html
> long ago and have some ideas of my own, but nothing concrete yet.
> Anyway, I thought it might be useful to get people's minds working on
> it.

You may find it interesting to look at how git-annex-forget does this:

https://git-annex.branchable.com/git-annex-forget/
http://git-annex.branchable.com/devblog/day_-4__forgetting/
Re: GDPR compliance best practices?
On Fri, Jun 08 2018, Peter Backes wrote:
> On Fri, Jun 08, 2018 at 10:13:20AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Can you walk us through how anyone would be expected to fork (as create
>> a new project, not the github-ism) existing projects under such a
>> regime?
>
> I don't see your point. Copy the repository to fork. Nothing changes
> about that. Nothing prevents anyone from forking a repository which had
> some of its author names removed from the commits.

This is basically the same as saying the whole notion of Signed-off-by should be abandoned entirely, since in this case the fork will only have a partial set of these.

The point is that we're recording information so each line in the repository can be traced back to a SOB. These sorts of take-downs would destroy that information, and the proposed solution of having some party retain it creates a special class of free software users who are capable of following that line of attribution.
Re: GDPR compliance best practices?
Hi,

Peter Backes wrote:

> I'd like to ask whether anyone has best practices for achieving GDPR
> compliance for git repos? The GDPR will come into effect in the EU next
> month.

This is a reasonable question to ask other Git users on this list to share ideas, so thanks for asking it.

> In particular, how do you cope with the "Right to erasure" concerning
> entries in the history of your git repos?

Later in the thread you discussed some changes you would like to make to Git or in front of Git to ensure that people can erase their authorship information from a repository after the fact in a non-disruptive way. I have no opinion about how that relates to GDPR requirements. I tend to expect any legal advice a person gets to be situation-specific; it's much harder to get legal advice that is useful to share.

Separate from that legal context, though, I think it's an interesting feature request. I don't think it goes far enough: I would like a way to erase arbitrary information from the history in a repository. For example, if I accidentally check in an encryption key in my repository as content or a commit message, I would like a way to remove it, assuming that others who fetch from the same repo are willing to cooperate with me, of course (i.e. in place of the object, the server would store a placeholder and an _advisory_ token allowing clients to know (1) that this object was deleted, (2) what object to use instead, and (3) an explanatory note about why the deletion occurred; clients could make whatever use of this information they choose).

I've seen some discussion on this subject at https://www.mercurial-scm.org/pipermail/mercurial/2008-March/017802.html long ago and have some ideas of my own, but nothing concrete yet. Anyway, I thought it might be useful to get people's minds working on it.

Thanks,
Jonathan
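The placeholder idea in the message above can be sketched as a small data structure. This is a toy model under stated assumptions, not an existing Git feature or wire format: `Tombstone` and `resolve` are hypothetical names, object ids are plain strings, and how the tokens would travel during a fetch is left open, as in the original proposal. The `honor_advisories` flag reflects the point that clients may make whatever use of the advisory they choose.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Tombstone:
    """Hypothetical advisory token a server might store in place of a deleted object."""
    deleted_oid: str                # (1) which object was deleted
    replacement_oid: Optional[str]  # (2) what object to use instead, if any
    note: str                       # (3) why the deletion occurred


def resolve(oid: str, tombstones: Dict[str, Tombstone],
            honor_advisories: bool = True) -> Optional[str]:
    """Follow advisory tokens to the object a cooperating client would use.

    Returns None when an object was deleted without a replacement.
    A client that ignores advisories just keeps the original id.
    """
    if not honor_advisories:
        return oid
    seen = set()
    current: Optional[str] = oid
    while current is not None and current in tombstones:
        if current in seen:
            raise ValueError(f"cycle in tombstones at {current}")
        seen.add(current)
        current = tombstones[current].replacement_oid
    return current
```

Chains of replacements are followed until an id without a tombstone is reached, so a second cleanup of an already-replaced object still resolves to the latest replacement.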
Re: GDPR compliance best practices?
On 08.06.2018 at 04:53, Theodore Y. Ts'o wrote:
> And of course, that's the other thing you seem to fundamentally not
> understand about how git works. Every developer in the world working
> on that open source project has their own copy.

Everyone here understands how Git works, of course. "*shrug* but that's how Git works" does *NOT* override the GDPR.

--
Hannes
Re: GDPR compliance best practices?
On Fri, 8 Jun 2018, Peter Backes wrote:
> On Fri, Jun 08, 2018 at 12:42:54AM -0700, David Lang wrote:
> > Wrong, if you have to delete info, you are not allowed to keep a private
> > copy.
>
> Yes you are allowed. See Art. 17 (3) lit e GDPR.
>
> > There is _nothing_ in the GDPR about publishing information,
> > everything in it is about what you are allowed to store privately, how you
> > are required to protect it (or more precisely, what you are required to do
> > if private data gets hacked), and how you are required to keep it available.
>
> Nope, the GDPR is not at all restricted to private copies.

If the GDPR doesn't restrict private copies, then Google and Facebook are free to keep all data about everyone. That is explicitly what the GDPR is trying to prevent.

> The GDPR has special jargon for publishing; the GDPR calls it "disclosure
> (Art. 4 (2) GDPR) to an unspecified number of unspecified recipients
> (Art. 4 (9) GDPR), including ones in third countries (Chapter 5) in a
> repetitive (Art 49 (1) GDPR) fashion".

Disclosure is what the person who submits the patch is doing. Torturing the language of the GDPR to say that hanging on to data that people want you to delete is legal, while echoing public data that people have asked to be public is not legal, is not going to be a successful line of argument; it's the exact opposite of the stated goals of the GDPR.

David Lang
Re: GDPR compliance best practices?
On Fri, Jun 08, 2018 at 10:45:51AM -0400, Theodore Y. Ts'o wrote:
> *Anyone* can run a repository. It's not just github and gitlab. The
> hobbyist in New Zealand, who might never visit Europe (so she can't
> be arrested when she visits the fair shores of Europe) and who has no
> business interests in Europe, can host such a web site.

Just because letters of request are hardly ever enforced doesn't make it legal to break the GDPR. For sure, a hobbyist would not have much to fear, even if he is violating the GDPR and coming to Europe. The GDPR is mostly about taming the megacorporations, not about arresting tourists.

> So the person trying to engage in censorship

Censorship? The GDPR is not about censorship. If you want to write an opinion about someone by name, the GDPR gives you full legitimization to do so, against that person's will. This is about removing the data under ordinary circumstances.

> would need to contact *everyone*.

This is the subject's problem, not the repository provider's.

> And someone who has a git note in their private repo who
> then pushes to github/gitlab would end up pushing that note back up to
> the web server.

If that note has been deleted based on the right to be forgotten, you as the repository provider have to make sure you don't publish it again. Since you are allowed to keep a private copy, ensuring that shouldn't be a problem for you.

> Great, so you can get github and gitlab to get rid of the information.
> But it's *pointless*.

It's up to the subject to decide whether it is pointless to exercise his rights...

> Your problem is in the word: "a"

...and against whom: one repository provider, the major ones, or all of them he can find.

Best wishes
Peter

--
Peter Backes, r...@helen.plasma.xg8.de
Re: GDPR compliance best practices?
On Fri, Jun 08, 2018 at 08:26:57AM +0200, Peter Backes wrote:
>
> If you run a website where the world can access a repository, you are
> responsible for obeying the GDPR with respect to that repository. If
> you receive a request to be forgotten, you have to make sure you stop
> publishing that author's identity as part of the repository.
>

*Anyone* can run a repository. It's not just github and gitlab. The hobbyist in New Zealand, who might never visit Europe (so she can't be arrested when she visits the fair shores of Europe) and who has no business interests in Europe, can host such a web site. So the person trying to engage in censorship would need to contact *everyone*. And someone who has a git note in their private repo who then pushes to github/gitlab would end up pushing that note back up to the web server.

> You do NOT need to
>
> - delete it from a private copy you have
> - care about others who publish that data
> - or even make sure the data is deleted from private copies others may
> have, even if the number of copies is in the thousands.

Great, so you can get github and gitlab to get rid of the information. But it's *pointless*.

And given that real developers really do care about who authored a patch, and regularly perform operations that reference the authorship information, the fact that it is stored somewhere else (e.g., in a git note, per your proposal) *will* slow down those operations.

> In practical terms, if someone wishes to exercise his right to be
> forgotten, he will usually send the request to the maintainer and stop
> him from distributing the information, and perhaps to a third party he
> might use as a platform for publication, such as github.

Your problem is in the word: "a"

- Ted
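Ted's two technical objections above (an extra lookup whenever authorship is queried, and erased notes coming back from other clones) can be illustrated with a toy model. This is a hypothetical sketch, not how `git notes` is actually implemented: here each commit carries only an opaque author token, and the real identity lives in a separate, erasable side table. `Repo`, `author_of`, `forget` and `push_notes_to` are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Repo:
    """Toy repository: commits carry an opaque author token; identities live in a side table."""
    commits: List[str] = field(default_factory=list)            # commit ids, oldest first
    author_token: Dict[str, str] = field(default_factory=dict)  # commit id -> opaque token
    notes: Dict[str, str] = field(default_factory=dict)         # token -> "Name <email>", erasable

    def clone(self) -> "Repo":
        # Every clone gets its own independent copy of all three tables.
        return Repo(list(self.commits), dict(self.author_token), dict(self.notes))

    def author_of(self, commit: str) -> str:
        # Two lookups per query instead of reading the author straight from the commit.
        token = self.author_token[commit]
        return self.notes.get(token, f"<erased {token}>")

    def forget(self, token: str) -> None:
        # Erasure only touches this repository's side table.
        self.notes.pop(token, None)

    def push_notes_to(self, other: "Repo") -> None:
        # Pushing notes from a clone restores anything the other side had erased.
        other.notes.update(self.notes)


origin = Repo(["c1"], {"c1": "t42"}, {"t42": "Alice <alice@example.org>"})
laptop = origin.clone()
origin.forget("t42")
print(origin.author_of("c1"))  # <erased t42>
print(laptop.author_of("c1"))  # Alice <alice@example.org>
laptop.push_notes_to(origin)
print(origin.author_of("c1"))  # Alice <alice@example.org>
```

Whether the extra indirection is measurably slower in a real repository depends on how the side table is stored; the sketch only shows that the lookup exists and that erasure is local to one copy.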
Re: GDPR compliance best practices?
On Fri, Jun 08, 2018 at 10:13:20AM +0200, Ævar Arnfjörð Bjarmason wrote:
> Can you walk us through how anyone would be expected to fork (as create
> a new project, not the github-ism) existing projects under such a
> regime?

I don't see your point. Copy the repository to fork. Nothing changes about that. Nothing prevents anyone from forking a repository which had some of its author names removed from the commits.

> As David Lang notes upthread, "the license is granted to the world, so
> the world has an interest in it". I wouldn't be so sure that this line
> of argument wouldn't work.

As I already stressed, having an interest is not enough. You need to have overriding legitimate grounds.

Best wishes
Peter

--
Peter Backes, r...@helen.plasma.xg8.de
Re: GDPR compliance best practices?
On Fri, Jun 08, 2018 at 12:42:54AM -0700, David Lang wrote:
> Wrong, if you have to delete info, you are not allowed to keep a private
> copy.

Yes, you are allowed. See Art. 17 (3) lit. e GDPR.

> There is _nothing_ in the GDPR about publishing information,
> everything in it is about what you are allowed to store privately, how you
> are required to protect it (or more precisely, what you are required to do
> if private data gets hacked), and how you are required to keep it available.

Nope, the GDPR is not at all restricted to private copies.

The GDPR has special jargon for publishing; the GDPR calls it "disclosure (Art. 4 (2) GDPR) to an unspecified number of unspecified recipients (Art. 4 (9) GDPR), including ones in third countries (Chapter 5) in a repetitive (Art 49 (1) GDPR) fashion".

Best wishes
Peter

--
Peter Backes, r...@helen.plasma.xg8.de
Re: GDPR compliance best practices?
On Fri, Jun 08 2018, Peter Backes wrote:
> On Thu, Jun 07, 2018 at 10:53:13PM -0400, Theodore Y. Ts'o wrote:
>> The problem is that you've left undefined who "you" is. With an open
>> source project, anyone who has contributed to the project has
>> a copyright interest. That hobbyist in Germany who submitted a patch?
>> They have a copyright interest. That US company based in Redmond,
>> Washington? They own a copyright interest. Huawei in China? They
>> have a copyright interest.
>>
>> So there is no "privately". And "you" numbers in the thousands and
>> thousands of copyright holders of portions of the open source code.
>
> Of course there is "privately". Every single one of those who have the
> author information can keep it, privately, for themselves. But those
> that have received a request to be forgotten must not keep publishing
> it on the Internet for download or distribute it to others.

Can you walk us through how anyone would be expected to fork (as create a new project, not the github-ism) existing projects under such a regime?

E.g. in git.git we have SOB lines for the whole history. In lieu of GNU-style copyright assignment (which is how things mainly worked back in the CVS days), someone can just clone the repository and create a hostile fork, which is one of the central ideas of free software.

In the world you're describing, the history would have been expunged publicly, and no hosting site would be willing to host it. It might be gone in practical terms to anyone who just doesn't like how (in this example) the Git project is run, and thinks they can do it better.

Maybe (again, in this example) the Software Freedom Conservancy's scope would have to expand to retain this private history (right now they have nothing to do with copyright). But then how am I going to fork the Git project if the SFC decides they don't want to cooperate with me?

As David Lang notes upthread, "the license is granted to the world, so the world has an interest in it". I wouldn't be so sure that this line of argument wouldn't work.
Re: GDPR compliance best practices?
On Fri, 8 Jun 2018, Peter Backes wrote:
> > you are the one arguing that the GDPR prohibits Git from storing and
> > revealing this license granting data, not me.
>
> It prohibits publishing, and only after a request to be forgotten. It
> does not prohibit storing your private copy.

Wrong, if you have to delete info, you are not allowed to keep a private copy.

There is _nothing_ in the GDPR about publishing information; everything in it is about what you are allowed to store privately, how you are required to protect it (or more precisely, what you are required to do if private data gets hacked), and how you are required to keep it available.
Re: GDPR compliance best practices?
On Thu, Jun 07, 2018 at 10:53:13PM -0400, Theodore Y. Ts'o wrote:
> The problem is that you've left undefined who "you" is. With an open
> source project, anyone who has contributed to the project has
> a copyright interest. That hobbyist in Germany who submitted a patch?
> They have a copyright interest. That US company based in Redmond,
> Washington? They own a copyright interest. Huawei in China? They
> have a copyright interest.
>
> So there is no "privately". And "you" numbers in the thousands and
> thousands of copyright holders of portions of the open source code.

Of course there is "privately". Every single one of those who have the author information can keep it, privately, for themselves. But those that have received a request to be forgotten must not keep publishing it on the Internet for download or distribute it to others.

> And of course, that's the other thing you seem to fundamentally not
> understand about how git works. Every developer in the world working
> on that open source project has their own copy. There is
> fundamentally no way that you can expunge that information from every
> single git repository in the world.

The misunderstanding is on your side.

If you run a website where the world can access a repository, you are responsible for obeying the GDPR with respect to that repository. If you receive a request to be forgotten, you have to make sure you stop publishing that author's identity as part of the repository.

You do NOT need to

- delete it from a private copy you have
- care about others who publish that data
- or even make sure the data is deleted from private copies others may have, even if the number of copies is in the thousands.

In practical terms, if someone wishes to exercise his right to be forgotten, he will usually send the request to the maintainer to stop him from distributing the information, and perhaps to a third party he might use as a platform for publication, such as github.

Best wishes
Peter

--
Peter Backes, r...@helen.plasma.xg8.de
Re: GDPR compliance best practices?
On Thu, Jun 07, 2018 at 04:53:16PM -0700, David Lang wrote:
> the license is granted to the world, so the world has an interest in it.

Certainly, but you need to have overriding legitimate grounds. An interest is not enough for justification. You have to weigh your interests against those of the subject.

> Unless you are going to argue that the GDPR outlawed open source
> development.

No, it certainly did not, and I don't see how it could. All the GDPR arguably demands is that the author's identity be deleted from a public repository if he so wishes. Just assume it were a CVS repo. Then removal would not be an issue at all. It is a technical peculiarity of git that makes removal so intricate to implement, and that is not at all an intrinsic property of open source development.

> you are the one arguing that the GDPR prohibits Git from storing and
> revealing this license granting data, not me.

It prohibits publishing, and only after a request to be forgotten. It does not prohibit storing your private copy.

Best wishes
Peter

--
Peter Backes, r...@helen.plasma.xg8.de
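The git-specific property referred to above can be made concrete. The sketch below computes SHA-1 object ids the way Git does for commit objects (a `commit <size>` header and a NUL byte, followed by the `tree`/`parent`/`author`/`committer` lines, a blank line and the message), but it is simplified under stated assumptions: committer equals author, every commit uses one fixed timestamp and the empty tree, and `commit_id`/`build_history` are illustrative helpers, not Git APIs. Replacing one author string changes that commit's id and, through the `parent` lines, the id of every later commit, which is why erasure in git amounts to rewriting history.

```python
import hashlib
from typing import List, Optional

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's SHA-1 id for the empty tree
FIXED_TIME = "1528400000 +0200"  # assumption: one fixed timestamp for every commit


def commit_id(tree: str, parent: Optional[str], ident: str, message: str) -> str:
    """SHA-1 of a Git-style commit object (simplified: committer == author)."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines.append(f"author {ident} {FIXED_TIME}")
    lines.append(f"committer {ident} {FIXED_TIME}")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")
    header = f"commit {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def build_history(authors: List[str]) -> List[str]:
    """Linear history: each commit's id covers its parent's id."""
    ids: List[str] = []
    parent: Optional[str] = None
    for i, ident in enumerate(authors):
        parent = commit_id(EMPTY_TREE, parent, ident, f"change {i}")
        ids.append(parent)
    return ids


original = build_history(["Alice <alice@example.org>",
                          "Bob <bob@example.org>",
                          "Carol <carol@example.org>"])
# Erase only Bob's identity from the middle commit.
erased = build_history(["Alice <alice@example.org>",
                        "Anonymous <anonymous@example.invalid>",
                        "Carol <carol@example.org>"])

# Prints "same" for the first commit and "changed" for the other two.
for before, after in zip(original, erased):
    print(before[:12], after[:12], "same" if before == after else "changed")
```

Carol's commit content is untouched, yet its id still changes, because its `parent` line now names a different commit.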
Re: GDPR compliance best practices?
On Fri, Jun 08, 2018 at 01:21:29AM +0200, Peter Backes wrote:
> On Thu, Jun 07, 2018 at 03:38:49PM -0700, David Lang wrote:
> > > Again: The GDPR certainly allows you to keep a proof of copyright
> > > privately if you have it. However, it does not allow you to keep
> > > publishing it if someone exercises his right to be forgotten.
> > someone is granting the world the right to use the code and you are claiming
> > that the evidence that they have granted this right is illegal to have?
>
> Hell no! Please read what I wrote:
>
> - "allows you to keep a proof ... privately"
> - "However, it does not allow you to keep publishing it"

The problem is that you've left undefined who "you" is. With an open source project, anyone who has contributed to the project has a copyright interest. That hobbyist in Germany who submitted a patch? They have a copyright interest. That US company based in Redmond, Washington? They own a copyright interest. Huawei in China? They have a copyright interest.

So there is no "privately". And "you" numbers in the thousands and thousands of copyright holders of portions of the open source code.

And of course, that's the other thing you seem to fundamentally not understand about how git works. Every developer in the world working on that open source project has their own copy. There is fundamentally no way that you can expunge that information from every single git repository in the world.

You can remove a git note from a single repository. But that doesn't affect my copy of the repository on my laptop. And if I push that repository to my server, the git note will be out there for the whole world to see.

So someone could *try* sending a public request to the entire world, saying, "I am a European and I demand that you disassociate commit DEADBEF12345 from my name". They could try serving legal papers on everyone. But at this point, it's going to trigger something called the "Streisand Effect". If you haven't heard of it, I suggest you look it up:

http://mentalfloss.com/article/67299/how-barbra-streisand-inspired-streisand-effect

Regards,

- Ted
Re: GDPR compliance best practices?
On Fri, 8 Jun 2018, Peter Backes wrote: > On Thu, Jun 07, 2018 at 03:38:49PM -0700, David Lang wrote: > > > Again: The GDPR certainly allows you to keep a proof of copyright privately if you have it. However, it does not allow you to keep publishing it if someone exercises his right to be forgotten. > > someone is granting the world the right to use the code and you are claiming that the evidence that they have granted this right is illegal to have? > Hell no! Please read what I wrote: > - "allows you to keep a proof ... privately" > - "However, it does not allow you to keep publishing it" > > And you are incorrect to say that the GDPR lets you keep records privately and only applies to publishing them. The GDPR is specifically targeted at companies like Facebook and Google that want to keep lots of data privately. It does no good to ask Facebook to not publish your info, they don't want to publish it in the first place, they want to keep it internally and use it. > How can you misunderstand what I wrote so badly? Sure, the GDPR does not let you keep records privately at will. You ultimately need to have overriding legitimate grounds for doing so. However, overriding legitimate grounds for keeping private records are rarely overriding legitimate grounds for publishing them. the license is granted to the world, so the world has an interest in it. Unless you are going to argue that the GDPR outlawed open source development. > In case of git history metadata, for publishing, you may have consent or even legitimate interests, but not overriding legitimate grounds. For keeping a private copy of the metadata, you probably have overriding legitimate grounds, however. The GDPR is not an "all or nothing" thing. Facebook and Google certainly do not have overriding legitimate grounds for most of the data they keep privately. Is that so hard to understand? you are the one arguing that the GDPR prohibits Git from storing and revealing this license granting data, not me. David Lang
Re: GDPR compliance best practices?
On Thu, Jun 07, 2018 at 03:38:49PM -0700, David Lang wrote: > > Again: The GDPR certainly allows you to keep a proof of copyright > > privately if you have it. However, it does not allow you to keep > > publishing it if someone exercises his right to be forgotten. > someone is granting the world the right to use the code and you are claiming > that the evidence that they have granted this right is illegal to have? Hell no! Please read what I wrote: - "allows you to keep a proof ... privately" - "However, it does not allow you to keep publishing it" > And you are incorrect to say that the GDPR lets you keep records privately > and only applies to publishing them. The GDPR is specifically targeted at > companies like Facebook and Google that want to keep lots of data privately. > It does no good to ask Facebook to not publish your info, they don't want to > publish it in the first place, they want to keep it internally and use it. How can you misunderstand what I wrote so badly? Sure, the GDPR does not let you keep records privately at will. You ultimately need to have overriding legitimate grounds for doing so. However, overriding legitimate grounds for keeping private records are rarely overriding legitimate grounds for publishing them. In case of git history metadata, for publishing, you may have consent or even legitimate interests, but not overriding legitimate grounds. For keeping a private copy of the metadata, you probably have overriding legitimate grounds, however. The GDPR is not an "all or nothing" thing. Facebook and Google certainly do not have overriding legitimate grounds for most of the data they keep privately. Is that so hard to understand? Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
Re: GDPR compliance best practices?
On Fri, 8 Jun 2018, Peter Backes wrote: > On Thu, Jun 07, 2018 at 10:28:47PM +0100, Philip Oakley wrote: > > Some of Peter's fine distinctions may be technically valid, but that does not stop there being legal grounds. The proof of copyright is a legal ground. > Again: The GDPR certainly allows you to keep a proof of copyright privately if you have it. However, it does not allow you to keep publishing it if someone exercises his right to be forgotten. someone is granting the world the right to use the code and you are claiming that the evidence that they have granted this right is illegal to have? the GDPR recognizes that there are legal reasons why records need to be kept and does not insist that they be deleted. you can't sign a deal to buy something, then insist that the GDPR allows your name to be removed from the contract. And you are incorrect to say that the GDPR lets you keep records privately and only applies to publishing them. The GDPR is specifically targeted at companies like Facebook and Google that want to keep lots of data privately. It does no good to ask Facebook to not publish your info, they don't want to publish it in the first place, they want to keep it internally and use it. David Lang > There is simply no justification for publishing against the explicit will of the subject, except for the rare circumstances where there are overriding legitimate grounds for doing so. I hardly see those for the average author entry in your everyday git repo. Such a justification is extremely fragile. > > Unfortunately once one gets into legal nitpicking the wording becomes tortuous and helps no-one. > That's not nitpicking. If what you say were true, the GDPR would be without any practical validity at all. > > If one starts from an absolute "right to be forgotten" perspective one can demand all evidence of wrongdoing, or authority to do something, be forgotten. The GDPR has the right to retain such evidence. > Yes, but not to keep it published.
> > I'll try and comment where I see the distinctions to be. > You're essentially repeating what you already said there. > > Publishing (the meta data) is *distinct* from having it. > Absolutely right. That is my point. > > You either start off public and stay public, or you start off private and stay there. > Nope. The GDPR says you have to go from public to private if the subject wishes so and there are no overriding legitimate grounds. That is the entire purpose of the GDPR's right to be forgotten. > Best wishes > Peter
Re: GDPR compliance best practices?
On Thu, Jun 07, 2018 at 10:28:47PM +0100, Philip Oakley wrote: > Some of Peter's fine distinctions may be technically valid, but that does > not stop there being legal grounds. The proof of copyright is a legal > ground. Again: The GDPR certainly allows you to keep a proof of copyright privately if you have it. However, it does not allow you to keep publishing it if someone exercises his right to be forgotten. There is simply no justification for publishing against the explicit will of the subject, except for the rare circumstances where there are overriding legitimate grounds for doing so. I hardly see those for the average author entry in your everyday git repo. Such a justification is extremely fragile. > Unfortunately once one gets into legal nitpicking the wording becomes > tortuous and helps no-one. That's not nitpicking. If what you say were true, the GDPR would be without any practical validity at all. > If one starts from an absolute "right to be forgotten" perspective one can > demand all evidence of wrongdoing, or authority to do something, be > forgotten. The GDPR has the right to retain such evidence. Yes, but not to keep it published. > I'll try and comment where I see the distinctions to be. You're essentially repeating what you already said there. > Publishing (the meta data) is *distinct* from having it. Absolutely right. That is my point. > You either start off public and stay public, or you start off private and > stay there. Nope. The GDPR says you have to go from public to private if the subject wishes so and there are no overriding legitimate grounds. That is the entire purpose of the GDPR's right to be forgotten. Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
Re: GDPR compliance best practices?
Hi Peter, David, I thought that the legal notice (aka 'disclaimer') was pretty reasonable. Some of Peter's fine distinctions may be technically valid, but that does not stop there being legal grounds. The proof of copyright is a legal ground. Unfortunately once one gets into legal nitpicking the wording becomes tortuous and helps no-one. If one starts from an absolute "right to be forgotten" perspective one can demand all evidence of wrongdoing, or authority to do something, be forgotten. The GDPR has the right to retain such evidence. I'll try and comment where I see the distinctions to be. From: "Peter Backes" > Hi David, thanks for your input on the issue. > > LEGAL GDPR NOTICE: According to the European data protection laws (GDPR), we would like to make you aware that contributing to rsyslog via git will permanently store the name and email address you provide as well as the actual commit and the time and date you made it inside git's version history. This is simply an information statement. > > This is inevitable, because it is a main feature git. The "inevitable" word creates a point of argument within the GDPR. Removing the word (and 'because/main') brings the sentence back to being an informative statement without a GDPR claim. > As we can see, rsyslog tries to solve the issue by the already discussed legal "technology" of disclaimers (which is certainly not accepted as state of the art technology by the GDPR). In essence, they are giving excuses for why they are not honoring the right to be forgotten. Disclaimers do not work. They have no legal effect, they are placebos. The GDPR does not accept such excuses. If it did, companies could arbitrarily design their data storage so as to make it "the main feature" to not honor the right to be forgotten and/or other GDPR rights. It is obvious that this cannot work, as it would completely undermine those rights. The GDPR honors technology as a means to protect the individual's rights, not as a means to subvert them.
> > If you are concerned about your privacy, we strongly recommend to use --author "anonymous " together with your commit. The [key] missing information here is whether rsyslog has a DCO (Developer Certificate of Origin) and what that contains. The git.git DCO is here https://github.com/git/git/blob/master/Documentation/SubmittingPatches#L304-L349 This will also help discriminate between the "name" part and the identifier, as both could be separately anonymised (given the right DCO). Thus it may be that the name is recorded as "anonymous", but with a that bridges the legal evidence/right to be forgotten bridge. > This can only be a solution if the project rejects any commits which are not anonymous. > > However, we have valid reasons why we cannot remove that information later on. The reasons are: > > * this would break git history and make future merges unworkable > This is not a valid excuse (see above). Within the GDPR, that is correct. It (breaking history validation), of itself, should not be the reason. > The technology has to be designed or applied in such a way that the individual's rights are honored, not the other way around. In the absence of other means, the project has to rewrite history if it gets a valid request by someone exercising his right to be forgotten, even if that causes a lot of hassle for everyone. > > * the rsyslog projects has legitimate interest to keep a permanent record of the contributor identity, once given, for - copyright verification - being able to provide proof should a malicious commit be made > True, but that doesn't justify publishing that information and keeping it published even when someone exercises his right to be forgotten. Publishing (the meta data) is *distinct* from having it. However publishing the content and its legal copyright is also associated with identifying the copyright holder (who has released it). This can be the uid if they hide behind a legal entity. This creates the catch 22 scenario.
You either start off public and stay public, or you start off private and stay there. Whether the rsyslog folk want to accept copyrighted work without appropriate legal release (who guards the guards, what's their badge number?) is part of the same information requirement. Malicious intent makes the submission (commit) part of the legal evidence one needs to retain, so is supported by GDPR. > In that case, "legitimate interest" is not enough. There need to be "overriding legitimate grounds". I don't see them here. > > Please also note that your commit is public and as such will potentially be processed by many third-parties. Git's distributed nature makes it impossible to track where exactly your commit, and thus your personal data, will be stored and be processed. If you would not like to accept this risk, please do either commit anonymously or refrain from contributing to the rsyslog project. The onward publishing and release s
Re: GDPR compliance best practices?
Hi David, thanks for your input on the issue. > LEGAL GDPR NOTICE: > According to the European data protection laws (GDPR), we would like to make > you > aware that contributing to rsyslog via git will permanently store the > name and email address you provide as well as the actual commit and the > time and date you made it inside git's version history. This is inevitable, > because it is a main feature git. As we can see, rsyslog tries to solve the issue by the already discussed legal "technology" of disclaimers (which is certainly not accepted as state of the art technology by the GDPR). In essence, they are giving excuses for why they are not honoring the right to be forgotten. Disclaimers do not work. They have no legal effect, they are placebos. The GDPR does not accept such excuses. If it did, companies could arbitrarily design their data storage so as to make it "the main feature" to not honor the right to be forgotten and/or other GDPR rights. It is obvious that this cannot work, as it would completely undermine those rights. The GDPR honors technology as a means to protect the individual's rights, not as a means to subvert them. > If you are concerned about your > privacy, we strongly recommend to use > > --author "anonymous " > > together with your commit. This can only be a solution if the project rejects any commits which are not anonymous. > However, we have valid reasons why we cannot remove that information > later on. The reasons are: > > * this would break git history and make future merges unworkable This is not a valid excuse (see above). The technology has to be designed or applied in such a way that the individual's rights are honored, not the other way around. In the absence of other means, the project has to rewrite history if it gets a valid request by someone exercising his right to be forgotten, even if that causes a lot of hassle for everyone.
> * the rsyslog projects has legitimate interest to keep a permanent record of > the > contributor identity, once given, for > - copyright verification > - being able to provide proof should a malicious commit be made True, but that doesn't justify publishing that information and keeping it published even when someone exercises his right to be forgotten. In that case, "legitimate interest" is not enough. There need to be "overriding legitimate grounds". I don't see them here. > Please also note that your commit is public and as such will potentially be > processed by many third-parties. Git's distributed nature makes it impossible > to track where exactly your commit, and thus your personal data, will be > stored > and be processed. If you would not like to accept this risk, please do either > commit anonymously or refrain from contributing to the rsyslog project. This is one of those statements that ultimately say "we do not honor the GDPR; either accept that or don't submit". That's the old, arguably ignorant mentality, and it won't stand. The project has to have a legal basis for publishing the personal metadata contained in the repository. When in doubt, it needs to be consent-based, as that is practically the only basis that allows putting the data on the internet for everyone to download. And consent can be withdrawn at any time. The GDPR's transitional period started over two years ago. There was enough time to get everything GDPR compliant. It might be possible to implement my solution without changing git, btw. Simply use the anonymous hash as the author name, and store the random number and the author in a git note. Git notes can be rewritten or deleted at any time without changing the commit ID. I am currently looking into this solution. One just needs to add something that can verify and resolve those anonymous hashes. Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
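The git-notes variant described in the last paragraph might look roughly like the following minimal sketch. Assumptions, not part of any existing tool: the commit's author name holds the SHA3-256 digest of "<salt> <author>", and the note attached to that commit (for example via `git notes add -m`) holds "<salt> <author>" in exactly that format. The helper names `make_anonymous_author` and `resolve_author` are hypothetical; only the hashing and checking are modelled, not the git plumbing.

```python
import hashlib
import secrets
from typing import Optional, Tuple


def _digest(salt_hex: str, author: str) -> str:
    # SHA3-256 over "<salt> <author>". The random salt keeps the digest from
    # being matched by simply hashing a list of known contributor identities.
    return hashlib.sha3_256(f"{salt_hex} {author}".encode("utf-8")).hexdigest()


def make_anonymous_author(author: str) -> Tuple[str, str]:
    """Return (value for the commit's author name, text for the git note)."""
    salt_hex = secrets.token_hex(32)  # 256 random bits, fresh for every commit
    return _digest(salt_hex, author), f"{salt_hex} {author}"


def resolve_author(author_field: str, note: Optional[str]) -> Optional[str]:
    """Recover the real author from a note, or None if it is gone or forged."""
    if note is None:
        return None  # erased: the note was removed, only the digest remains
    salt_hex, sep, author = note.partition(" ")
    if not sep or _digest(salt_hex, author) != author_field:
        return None  # the note does not match what the commit committed to
    return author
```

Erasure would then amount to `git notes remove` on that commit, leaving the commit ID untouched, while anyone still holding the note can check it against the digest the commit carries. As Ted points out elsewhere in the thread, removing a note in one repository does nothing to copies others have already fetched.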
Re: GDPR compliance best practices?
I'm going to take the risk of inserting actual real-world data into the mix rather than just speculation :-) Here is an example of what the rsyslog project is doing (main developers based in Germany). I'll say, as someone whose day job has been very involved with GDPR stuff recently, that this looks like a very reasonable statement to me. But I am not a lawyer. I will also say that I think it would be very reasonable for projects to not accept code from someone who doesn't give them any way to contact them later in case there is a question about authorship or licensing. David Lang https://github.com/rsyslog/rsyslog/pull/2746/files LEGAL GDPR NOTICE: According to the European data protection laws (GDPR), we would like to make you aware that contributing to rsyslog via git will permanently store the name and email address you provide as well as the actual commit and the time and date you made it inside git's version history. This is inevitable, because it is a main feature git. If you are concerned about your privacy, we strongly recommend to use --author "anonymous " together with your commit. Also please do NOT sign your commit in this case, as that potentially could lead back to you. Please note that if you use your real identity, the GDPR grants you the right to have this information removed later. However, we have valid reasons why we cannot remove that information later on. The reasons are: * this would break git history and make future merges unworkable * the rsyslog projects has legitimate interest to keep a permanent record of the contributor identity, once given, for - copyright verification - being able to provide proof should a malicious commit be made Please also note that your commit is public and as such will potentially be processed by many third-parties. Git's distributed nature makes it impossible to track where exactly your commit, and thus your personal data, will be stored and be processed.
If you would not like to accept this risk, please do either commit anonymously or refrain from contributing to the rsyslog project.
Re: GDPR compliance best practices?
On Mon, Jun 04, 2018 at 09:47:18AM -0400, Theodore Y. Ts'o wrote: > For people who are doing real work on git repos, other commands that > we very much care about include "git log --author=", "git > tag --contains", "git blame", etc. I do not see how those, or anything but git clone (and even that only if author verification is requested) could possibly be affected in any significant way. Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
Re: GDPR compliance best practices?
On Mon, Jun 04, 2018 at 12:16:16AM +0200, Peter Backes wrote: > > Verifying the commit ID by itself wouldn't be any less efficient than > before. Admittedly, it wouldn't verify the author and authordate > integrity anymore without additional work. That would be some overhead, > sure, and could be done on demand, and would mostly affect clones. For people who are doing real work on git repos, other commands that we very much care about include "git log --author=", "git tag --contains", "git blame", etc. At least for any repo that *I* control, slow those down, and I wouldn't downgrade my git binary/repo just to make some imperialistic European bureaucrats happy. Cheers, - Ted
Re: GDPR compliance best practices?
Hi Peter, (lost the cc's) From: "Peter Backes" > On Sun, Jun 03, 2018 at 11:28:43PM +0100, Philip Oakley wrote: > > It is here that Article 6 kicks in as to whether the 'organisation' can retain the data and continue to use it. > Article 6 is not about continuing to use data. Article 6 is about having and even obtaining it in the first place. Correct, and that is the part I was referring to. Recipients of the particular meta data require it for the licencing purpose. Thus they can continue to have (and 'need') that data. It is that 'other side of the fence' view I mentioned. > Article 17 and article 21 are about continuing to use data. > > For an open source project with an open source licence then an implicit DCO applies for the meta data. It is the legal basis for the release. > Neither article 6 nor 17 or 21 have anything remotely like an "implicit DCO" as a legitimization for publishing employee data. I was referring to 'implicit' in the reverse direction, that is, the DCO supports the legal basis to have and hold the data. The express licence terms in the various open source licences give the permission, and become one of these legally conflicting aspects. > The GDPR is very explicit about implicit stuff never being a basis for consent, if you want to imply that is your basis. And consent can be withdrawn at any time anyway. An open source license has nothing whatsoever to do with the question of version control metadata. A public version control system is not necessary to publish open source software. > > > - copyright is about distributing the program, not about distributing version control metadata. > > It is specifically about giving that right to copy by Jane Doe (but git gives no other information other than that supposedly globally unique 'author email'). > I don't get what you are saying. As I said, a public version control system is not necessary to publish open source software. The two things may be intimately related in practice, but not in theory. Such is the law.
It's the practice that is legal/illegal, decided in court (if it gets there). > > > - Being named is a right, not an obligation of the author. Hence, if the author doesn't want his name published, the company doesn't have legitimate grounds based in copyright for doing it anyway, against his or her will. > > Git for Open Source is about open licencing by name. I'd agree that a closed corporate licence stays closed, but not forgotten. > Again I don't get what you are saying. The author has a right to be named as the author, not an obligation. This has nothing whatsoever to do with the question of Open Source vs. closed corporate licenses. The question is which clause is being used to justify an action. Those corporate organisations want a legal basis for holding data, not a voluntary permission (because folk may try and rescind that permission...). Those in open source want to ensure that their licence is a legal basis for other folk to have copies, and that folk can show they have that permission. Those with a personal data view will focus on the hope that they can remove permission, especially for companies that are doing things they find unacceptable, and maybe 'illegal' or unethical. The GDPR attempts to balance the different sets of expectations, and the overlaps will need to be negotiated. Different nations (and individuals) have different perceptions as to what is normal and reasonable, and thus focus on different aspects, not appreciating the Competing Values that are present in the different Frameworks of their Weltanschauung. If a closed source corporate does publish their closed data, they have real internal problems anyway regarding that contradiction! > > > Let's be honest: We do not know what legitimization exactly in each specific case the git metadata is being distributed under. > > We should know, already. A specific licence [or limit] should be in place.
> > We don't really want to have to let a court decide ;-) > It is insufficient to have a license for distributing the program. The license is not a GDPR legitimization for git metadata. Distributing the program can be done without distributing the author's identity as part of the metadata of his commits. > > The law is never decided by technical means, unfortunately. > It is. The GDPR refers to the state of the art of technology without defining it. Thus, technical means are very important in the GDPR. This may be something new for lawyers. If technology changes tomorrow, even without anything else changing, you may be breaking the GDPR by this simple fact tomorrow, while not breaking it today. They will still argue about what is the state of the art, and that if the art is hidden in some lab, then it's not available to meet the criteria. > Again: Technology is very important in the GDPR. We know quantum computing can crack the codes, but when does it become the state of the art? SHA1 has been 'cracked' once in one special case, but that doesn't make it state of
Re: GDPR compliance best practices?
On Sun, Jun 03, 2018 at 11:28:43PM +0100, Philip Oakley wrote: > It is here that Article 6 kicks in as to whether the 'organisation' can > retain the data and continue to use it. Article 6 is not about continuing to use data. Article 6 is about having and even obtaining it in the first place. Article 17 and article 21 are about continuing to use data. > For an open source project with an open source licence then an implicit DCO > applies for the meta data. It is the legal basis for the release. Neither article 6 nor 17 or 21 have anything remotely like an "implicit DCO" as a legitimization for publishing employee data. The GDPR is very explicit about implicit stuff never being a basis for consent, if you want to imply that is your basis. And consent can be withdrawn at any time anyway. An open source license has nothing whatsoever to do with the question of version control metadata. A public version control system is not necessary to publish open source software. > > - copyright is about distributing the program, not about distributing > > version control metadata. > It is specifically about giving that right to copy by Jane Doe (but git gives > no other information other than that supposedly globally unique 'author > email'). I don't get what you are saying. As I said, a public version control system is not necessary to publish open source software. The two things may be intimately related in practice, but not in theory. > > - Being named is a right, not an obligation of the author. Hence, if > > the author doesn't want his name published, the company doesn't have > > legitimate grounds based in copyright for doing it anyway, against his > > or her will. > Git for Open Source is about open licencing by name. I'd agree that a closed > corporate licence stays closed, but not forgotten. Again I don't get what you are saying. The author has a right to be named as the author, not an obligation. This has nothing whatsoever to do with the question of Open Source vs. closed corporate licenses. > > Let's be honest: We do not know what legitimization exactly in each > > specific case the git metadata is being distributed under. > We should know, already. A specific licence [or limit] should be in place. > We don't really want to have to let a court decide ;-) It is insufficient to have a license for distributing the program. The license is not a GDPR legitimization for git metadata. Distributing the program can be done without distributing the author's identity as part of the metadata of his commits. > The law is never decided by technical means, unfortunately. It is. The GDPR refers to the state of the art of technology without defining it. Thus, technical means are very important in the GDPR. This may be something new for lawyers. If technology changes tomorrow, even without anything else changing, you may be breaking the GDPR by this simple fact tomorrow, while not breaking it today. Again: Technology is very important in the GDPR. > Regular git users should have no issues - they just need to point > their finger at the responsible authority. If git users are putting commits online for global download, they are the responsible authority. > The DCO/GPL2 are the legitimate data record that recipients should have for > their copy. There is no right to be forgotten at that point. What do you mean by "should have for their copy"? Why shouldn't there be a right to be forgotten? Open Source Software has been distributed a lot without detailed version control history information. Having this information as a record is certainly in the interest of the recipient, but it is very, very questionable whether it constitutes overriding legitimate grounds as per Art. 17 for keeping that data. > I see the solution to be elsewhere, and that it is in some ways a strawman > discussion: "if someone has the right to be forgotten, how do we delete the > meta data", when that right (to delete the meta data in a properly licensed > repo) does not exist.
See, this kind of shady legal argument is what lawyers are selling you. Why not put the energy into designing a technical solution? They tell you: "Ignore the GDPR. I will give you backup by giving you lots of disclaimers and excuses for doing so. Just give me a lot of money." Having the ability to validate yet erase data from repositories is desirable from a technical point of view. It has a lot of uses, not necessarily only legal ones. The objection of efficiency raised by Ted is a valid one. The strawman argument is not. Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
Re: GDPR compliance best practices?
From: "Peter Backes" On Sun, Jun 03, 2018 at 04:28:31PM +0100, Philip Oakley wrote: In most Git cases that legal/legitimate purpose is the copyright licence, and/or corporate employment. That is, Jane wrote it, hence X has a legal rights of use, and we need to have a record of that (Jane wrote it) as evidence of that (I'm X, I can use it) right. That would mean that Jane cannot just ask to have that record removed and expect it to be removed. Re corporate employment: For sure nobody would dare to quesion that a company has a right to keep an internal record that Jane wrote it. The issue is publishing that information. This is an entirely different story. It is here that Article 6 kicks in as to whether the 'organisation' can retain the data and continue to use it. https://gdpr-info.eu/art-6-gdpr/ https://ico.org.uk/for-organisations/guide-to-the-general-data-protection-regulation-gdpr/lawful-basis-for-processing/ https://www.lawscot.org.uk/news-and-events/news/gdpr-legal-basis-and-why-it-matters/ For an open source project with an open source licence then an implict DCO applies for the meta data. It is the legal basis for the the release. If a corporate project has a closed source project, then yes, open publishing of that personal data within a repo's meta data would be incorrect, even though the internal repo would be kept. I already stressed that from the very beginning. Re copyright license: No, a copyright license does not provide a legitimization. - copyright is about distributing the program, not about distributing version control metadata. It is specificaly about giving that right to copy by Jane Doe (but git gives no other information other than that supposedly globally unique 'author email'. - Being named is a right, not an obligation of the author. Hence, if the author doesn't want his name published, the company doesn't have legitimate grounds based in copyright for doing it anyway, against his or her will. 
Git for Open Source is about open licencing by name. I'd agree that a closed corporate licence stays closed, but not forgotten. From a personal view, many folk want it to be that corporates (and open source organisations) should hold no personal information with having explicit permission that can then be withdrawn, with deletion to follow. However that 'legal' clause does [generally] win. Let's be honest: We do not know what legitimization exactly in each specific case the git metadata is being distributed under. We should know, already. A specific licence [or limit] should be in place. We don't really want to have to let a court decide ;-) It may be copyright, it may be employment, but it may also be revocable consent. This is, we cannot safely assume that no git user will ever have to deal with a legitimate request based on the right to be forgotten. The law is never decided by technical means, unfortunately. Regular git users should have no issues - they just need to point their finger at the responsible authority. (beware though, of the oneway trap door that the users mistakes can become the problem for the responsible authority!) In the git.git case (and linux.git) there is the DCO (to back up the GLP2) as an explicit requirement/certification that puts the information into the legal evidence category. IIUC almost all copyright ends up with a similar evidentail trail for the meta data. This makes things more complicated, not less. You have yet more meta data to cope with, yet more opportunities to be bitten by the right to be forgotten. Since I proposed a list of metadata where each entry can be anonymized independently of each other, it would be able to deal with this perfectly. The DCO/GPL2 are the legitimate data record that recipients should have for their copy. There is no right to be forgotten at that point. 
The more likely problem is if the content of the repo, rather than the meta data, is subject to GDPR, and that could easily ruin any storage method. Being able to mark an object as would help here(*).

My proposal supports any part of the commit, including the contents of individual files, as erasable, yet verifiable data.

Also remember that most EU legislation is 'intent' based, rather than 'letter of', for the style of legal arguments (which is where some of the UK Brexit misunderstandings come from), so it is more than possible to get into the situation where an action is both mandated and illegal at the same time, so plenty of snake oil salesmen continue to sell magic fixes according to the customers' local biases.

This may be true. I am not trying to sell snake oil, however. To have erasure and verifiability at the same time is a highly generic feature that may be desirable for a multitude of reasons, including but not limited to legal ones like GDPR and copyright violations.

I do not believe Git has anything to worry about that wasn't already an issue.

Yes, but it definitely
Re: GDPR compliance best practices?
On Sun, Jun 03, 2018 at 05:03:44PM -0400, Theodore Y. Ts'o wrote:
> If you don't think a potential 2x -- 10x performance hit is a
> blocking factor --- sure, go ahead and try implementing it. And good
> luck to you. And this is not a guarantee that it won't get rejected.
> I certainly don't have the power to make that guarantee.

I do not want or expect a guarantee, or even a probability, of course. I am just trying to avoid "STRONG REJECT. We could have told you before you even started implementing. Why didn't you discuss this beforehand?"

One would simply change something like

  author A U Thor 1465982009 +

into something like

  author 21bbba8e9ce9734022d2c23df247a2704c0320ad7d43c02e8bdecdfae27e23b4 A U Thor
  author-hash 469bb107e38f8e59dddb3bbd6f8646e052bf73d48427865563c7358a64467f2c
  authordate c444f739ca317e09dbd3dae1207065585ae2c2e18cd0fc434b5bde08df1e0569 1465982009 +
  authordate-hash 199875e5aedb6cb164a2b40c16209dc5bb37f34c059a56c6d96766440fb0fe68

and then compute the commit id without the "author" and the "authordate" lines. The *-hash values were obtained as follows:

  echo -n '21bbba8e9ce9734022d2c23df247a2704c0320ad7d43c02e8bdecdfae27e23b4 A U Thor ' | sha3sum -a 256
  echo -n 'c444f739ca317e09dbd3dae1207065585ae2c2e18cd0fc434b5bde08df1e0569 1465982009 +' | sha3sum -a 256

The first hex value on each of the "author" and "authordate" lines is simply the $huge_random_number.

Verifying the commit ID by itself wouldn't be any less efficient than before. Admittedly, it wouldn't verify the integrity of author and authordate anymore without additional work. That would be some overhead, sure, but it could be done on demand, and it would mostly affect clones. I don't think it would be that much of a problem.

It can be parallelized easily. The hashes for each field are independent of each other, so they can all be verified in parallel in different threads running on different cores.
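The per-field scheme above can be sketched in Python. This is a minimal sketch under stated assumptions, not the proposal's definitive implementation:

- SHA3-256 comes from `hashlib.sha3_256`.
- A 256-bit salt from `secrets.token_hex(32)` stands in for $huge_random_number.
- Each hashed input is `'<salt hex> <field value>'`, mirroring the `echo -n ... | sha3sum -a 256` commands.
- The commit ID is computed over the non-erasable lines plus only the `*-hash` lines.

The function names, the dict layout and the `tree <tree-id>` placeholder are hypothetical. The sketch does not try to reproduce the exact hex values quoted above.

```python
import hashlib
import secrets


def field_hash(salt_hex, value):
    """SHA3-256 over '<salt> <value>', like `echo -n '<salt> <value>' | sha3sum -a 256`."""
    return hashlib.sha3_256(f"{salt_hex} {value}".encode("utf-8")).hexdigest()


def make_field(name, value):
    """Create an erasable field with a fresh random salt (hypothetical layout)."""
    salt = secrets.token_hex(32)  # 32 bytes = 256 bits, as many bits as the hash
    return {"name": name, "salt": salt, "value": value,
            "hash": field_hash(salt, value)}


def commit_id(fixed_lines, fields):
    """Hash the non-erasable lines plus only the '<name>-hash <hex>' lines.

    The salted plaintext lines ('author', 'authordate') are left out, so
    erasing them later cannot change the commit ID.
    """
    hash_lines = [f"{f['name']}-hash {f['hash']}" for f in fields]
    body = "\n".join(list(fixed_lines) + hash_lines)
    return hashlib.sha3_256(body.encode("utf-8")).hexdigest()


def verify_field(field):
    """On-demand integrity check; an erased field has nothing left to check."""
    if field["value"] is None:
        return True
    return field_hash(field["salt"], field["value"]) == field["hash"]


def erase(field):
    """Honour an erasure request: drop salt and plaintext, keep the hash."""
    field["salt"] = None
    field["value"] = None


if __name__ == "__main__":
    fields = [make_field("author", "A U Thor "),
              make_field("authordate", "1465982009 +")]
    fixed = ["tree <tree-id>"]  # placeholder for the other commit headers
    before = commit_id(fixed, fields)
    erase(fields[0])
    print(before == commit_id(fixed, fields))    # True: ID unaffected by erasure
    print(all(verify_field(f) for f in fields))  # True: remaining field still checks
```

Each `verify_field` call depends only on its own field. That independence is what allows the per-field checks to be spread across cores, as the message suggests.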
On djb's typical 2015 Skylake machine, the SUPERCOP benchmark tells us that sha3-256 (~=keccakc512) has a speed of about 20 cycles/byte for blocks of 64 bytes of data, see https://bench.cr.yp.to/results-sha3.html#amd64-skylake

Let's say we have 128 bytes of data on average for the author field, so conservatively speaking it takes about 3000 cycles (> 128*20 = 2560) to hash and compare the hash. At 3000 MHz, we can thus do roughly a million (3*10^9 / 3000) verifications per second per core. Let's assume we have 10 anonymizable fields of this kind per commit. Then the overhead would be about one second per 100,000 x ncores commits.

How many commits are we talking about in a huge repository? And how long does a clone of such a huge repository take at the moment? Do you have any numbers?

> If you don't have time to implement, why do you think it's fair to
> inflict on everyone else the request for time to do a design review
> for something for which the need hasn't even been established?

I do not ask anyone even to reply to my messages. I just see a lot of time being wasted on discussing aspects of my proposal that are technically irrelevant. If that time were put into reviewing the design, it would be better spent.

Please don't devalue a proposal. It is not true that the only value is in actual code and that proposals are "bullshit".

I was not the first to raise the issue, as I clearly showed in my initial email. The demand is in fact high; very high. At present, that demand is satisfied by lawyers, who are writing snake oil disclaimers and such for enormous sums of money, in a lot of companies, to "solve" a technical issue by pseudo-legal means: finding excuses for why the "right to be forgotten" doesn't have to be implemented in specific cases such as git. What if all that lawyer money were put into actually solving the technical issues as technical issues? Engineers are apparently bad at marketing; the lawyers seem more successful in that respect.
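The overhead estimate can be re-derived from the figures given in the message:

- about 20 cycles/byte for SHA3-256;
- 128-byte fields, rounded up to 3000 cycles per check;
- a 3000 MHz clock;
- 10 anonymizable fields per commit.

This is back-of-the-envelope arithmetic, not a benchmark. Note that 3 x 10^9 cycles per second divided by 3000 cycles per check comes to about one million checks per second per core. The 1,000,000-commit repository in the last line is a hypothetical example.

```python
CYCLES_PER_BYTE = 20          # SHA3-256 figure from the SUPERCOP results cited above
BYTES_PER_FIELD = 128         # assumed average size of one anonymizable field
CYCLES_PER_CHECK = 3000       # conservative round-up of 128 * 20 = 2560 cycles
CLOCK_HZ = 3_000_000_000      # 3000 MHz
FIELDS_PER_COMMIT = 10        # assumed number of anonymizable fields per commit

checks_per_second_per_core = CLOCK_HZ // CYCLES_PER_CHECK                       # 1_000_000
commits_per_second_per_core = checks_per_second_per_core // FIELDS_PER_COMMIT  # 100_000


def verification_overhead_seconds(n_commits, n_cores):
    """Estimated wall-clock time to re-verify every field of every commit."""
    return n_commits / (commits_per_second_per_core * n_cores)


print(verification_overhead_seconds(1_000_000, 4))  # 2.5 (hypothetical 1M-commit repo, 4 cores)
```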
Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
Re: GDPR compliance best practices?
On Sun, Jun 03, 2018 at 10:52:33PM +0200, Peter Backes wrote:
> But I will take your message as saying you at least don't see any
> obvious criticism leading to complete rejection of the approach.

If you don't think a potential 2x -- 10x performance hit is a blocking factor --- sure, go ahead and try implementing it. And good luck to you. And this is not a guarantee that it won't get rejected. I certainly don't have the power to make that guarantee.

If you don't have time to implement, why do you think it's fair to inflict on everyone else the request for time to do a design review for something for which the need hasn't even been established?

Regards,

- Ted
Re: GDPR compliance best practices?
On Sun, Jun 03, 2018 at 04:07:39PM -0400, Theodore Y. Ts'o wrote:
> Why don't you try to implement your proposal then, and then benchmark
> it. After you find out how much of a performance disaster it's going
> to be, especially for large git repos, we can discuss who is being
> tyrannical.

See, Ted, I have this other hobby project, git stash preserving timestamps, which is 90% done but not yet finished. I am a very busy person. I might implement it, but it's not the topmost priority. So I want to discuss it first, so as not to waste too much time implementing something that then gets rejected for valid criticism that could have been raised beforehand.

Perhaps I can convince my employer to work on it on their time. But there's so much to do at the moment. I have a PhD in very complex things like static program analysis by abstract interpretation. I love hacking very much, but I can mostly only do it as a hobby, because humanity is better served by my doing the complex things that not every hacker can do. I know I am being whiny, but that's how it is.

But I will take your message as saying you at least don't see any obvious criticism leading to complete rejection of the approach.

Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
Re: GDPR compliance best practices?
On Sun, Jun 03, 2018 at 09:48:16PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Sure, but what I'm pointing out is a) you can't focus on git as the
> technology because it tells you nothing about what's being done with it
> (e.g. the log file case I mentioned), b) nobody who came up with the GDPR
> was concerned with some free software projects or the SCM used by
> companies, so this is very unlikely to be enforced.

As I already said, the GDPR refers to the state of the art in technology, without defining it.

The GDPR provides a generic framework. It covers everyone, from a single person running a small blog to an S&P 500 enterprise. It also covers non-profits and state authorities. Everyone is covered, including the SCMs they use.

The GDPR will be enforced against SCMs. The question is just who will be the first to be affected. I suspect it will be a mega-corporation that fired one of its developers, who then wants to fight back and exercise his right to be forgotten against the company's public git repos.

> So nobody can be GDPR compliant in the face of archive.org and the like?

The GDPR has special exceptions for archives and the like.

> It does if you've got the ref. Maybe I just don't get your proposal,
> quote:
>
>   Do not hash anything directly to obtain the commit ID. Instead, hash a
>   list of hashes of [$random_number, $information] pairs. $information
>   could be an author id, a commit date, a comment, or anything else. Then
>   store the commit id, the list of hashes, and the list of pairs to form
>   the commit.
>
> You're just proposing (if I've read this correctly) that the commit
> object should have some list of headers pointing to other SHA1s, and
> that fsck and the like be OK with these going away. Right?

Certainly not SHA1. SHA1 is completely broken. I know Linus has a somewhat different opinion. But there's really no defense for SHA1. It's an utterly broken algorithm and should not be used at all anymore.
> How is this intrinsically different from referring to something in the
> ref namespace that may be deleted in the future?

I guess I am partly repeating myself, but:

1. Having fsck be OK with erasure is not enough. It tells you nothing about anonymization. If the hash is the same in 5000 instances, that's pseudonymization, not anonymization. You need to ensure a different hash in each instance, and you need to ensure there's no easy way to reconstruct the data from its hash. Hence $random_number (or let's call it $huge_random_number; it should have x bits if the hash has x bits). If you have the SHA1 64ca93f83bb29b51d8cbd6f3e6a8daff2e08d3ec, it's too easy to figure out the plaintext (it's "Peter" BTW).

2. If you use a random UUID, you cannot reconstruct the data from its hash, but you have the same issue with UUID reuse. Plus, you lose the ability to verify the author's name as part of the commit.

> Okay, so you're not reading the GDPR in some literal sense, but you're
> coming to a conclusion that's supported by ... what? To echo Theodore
> Y. Ts'o's e-mail, have you consulted with someone who's an actual lawyer
> on this subject?

I'm replying to this one in private. It's not relevant for this discussion.

Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
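Point 1 above, pseudonymization versus anonymization, can be illustrated with a small Python sketch. It assumes SHA3-256 (rather than SHA1) and a 256-bit salt from the `secrets` module; the helper names and the candidate list are hypothetical.

- An unsalted hash of a short name is identical everywhere it appears, and trying likely names recovers it.
- A freshly salted hash differs on every use, so without the salt it can be neither linked across instances nor guessed.

```python
import hashlib
import secrets


def unsalted_hash(value):
    """Plain SHA3-256: the same input always gives the same digest."""
    return hashlib.sha3_256(value.encode("utf-8")).hexdigest()


def salted_hash(value):
    """SHA3-256 over '<salt> <value>' with a fresh 256-bit salt; returns (salt, digest)."""
    salt = secrets.token_hex(32)
    return salt, hashlib.sha3_256(f"{salt} {value}".encode("utf-8")).hexdigest()


def guess_plaintext(digest, candidates):
    """Dictionary attack against an unsalted digest: hash each guess and compare."""
    for candidate in candidates:
        if unsalted_hash(candidate) == digest:
            return candidate
    return None


names = ["Alice", "Bob", "Peter"]
print(guess_plaintext(unsalted_hash("Peter"), names))  # Peter
print(guess_plaintext(salted_hash("Peter")[1], names))  # None
```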
Re: GDPR compliance best practices?
On Sun, Jun 03, 2018 at 09:24:17PM +0200, Peter Backes wrote:
> He said: It would be a tyranny of lawyers.
>
> Let's not have a tyranny of lawyers. Let us, the engineers and hackers,
> exercise the necessary control over those pesky lawyers by defining and
> redefining the state of the art in technology, and prevent them from
> defining it by themselves. For a hammer, everything looks like a nail.
> What is the better option: To suggest people pay for legal advice
> by lawyers, who only offer lengthy disclaimers and such for bypassing
> the right to be forgotten, or simply discuss technical changes for git
> which enable its easy implementation, without legal excuses for not
> supporting it?

Why don't you try to implement your proposal then, and then benchmark it. After you find out how much of a performance disaster it's going to be, especially for large git repos, we can discuss who is being tyrannical.

It may very well be that different people and companies will get different legal advice, and one of the interesting things about many git repos for open source projects is that they are not owned by any one company. A change in the git repo format is one that has to be adopted by the entire open source project, and if a portion of the community isn't interested in paying the overhead cost, and sticks with the existing git repo format, I wonder what the "imperialistic" (your word, not mine) EU will do --- try to lock up or sue everyone from outside the EU that refuses to pay the 2x-10x performance overhead and sticks with the original repo format, such that anyone who wants to interoperate has to send git pushes in the original format?

But in any case, why don't you send a patch and we can discuss? As the old saying goes, "code talks, bullshit walks". :-)

Regards,

- Ted
Re: GDPR compliance best practices?
On Sun, Jun 03 2018, Peter Backes wrote:
> On Sun, Jun 03, 2018 at 02:59:26PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> I'm not trying to be selfish, I'm just trying to counter your literal
>> reading of the law with a comment of "it'll depend".
>>
>> Just like there's a law against public urination in many places, but
>> this is applied very differently to someone taking a piss in front of
>> parliament v.s. someone taking a piss in the forest on a hike, even
>> though the law itself usually makes no distinction about the two.
>
> We have huge companies using git now. This is not the tool used by a
> few kernel hackers anymore.

Sure, but what I'm pointing out is a) you can't focus on git as the technology because it tells you nothing about what's being done with it (e.g. the log file case I mentioned), b) nobody who came up with the GDPR was concerned with some free software projects or the SCM used by companies, so this is very unlikely to be enforced.

>> In this example once you'd delete the UUID ref you don't have the
>> UUID -> author mapping anymore (and b.t.w. that could be a many to one
>> mapping).
>
> It is not relevant whether you have that mapping or not, it is enough
> that with additional information you could obtain it. For example, say,
> you have 5000 commits with the same UUID. Now you delete the mapping.
> But your friend still has it on his local copy. Now your friend
> merely needs to tell you who is behind that UUID and instantly you can
> associate all 5000 commits with that person again.

So nobody can be GDPR compliant in the face of archive.org and the like? If the law says that you need to delete information you published in the past, and you do so, how is it your problem that someone mirrored & re-published it? That's their compliance problem at that point.

> The GDPR is very explicit about this, see recital 26. It says that
> pseudonymization is not enough, you need anonymization if you want to
> be free from regulation.
>
> In addition, and in contrast to my proposal, your solution doesn't
> allow verification of the author field.

It does if you've got the ref. Maybe I just don't get your proposal, quote:

    Do not hash anything directly to obtain the commit ID. Instead, hash a
    list of hashes of [$random_number, $information] pairs. $information
    could be an author id, a commit date, a comment, or anything else. Then
    store the commit id, the list of hashes, and the list of pairs to form
    the commit.

You're just proposing (if I've read this correctly) that the commit object should have some list of headers pointing to other SHA1s, and that fsck and the like be OK with these going away. Right?

How is this intrinsically different from referring to something in the ref namespace that may be deleted in the future? In both cases you're just trying to solve the problem of trying to somehow encode data into a git repository today, that may go away tomorrow. Similar to how a reference to some LFS object today going away doesn't fail "git fsck".

>> I think again that this is taking too much of a literalist view. The
>> intent of that policy is to ensure that companies like Google can't just
>> close down their EU offices and weasel out of compliance by saying "we're
>> just doing business from the US, it doesn't apply to us".
>>
>> It will not be used against anyone who's taking every reasonable
>> precaution against doing business with EU customers.
>
> I think you are underestimating the political intention behind the
> GDPR. It has kind of an imperialist goal, to set international
> standards, to enforce them against foreign companies and to pressure
> other nations to establish the same standards.
>
> If I read the GDPR in a literal sense, I would in fact come to
> the same conclusion as you: It's about companies doing substantial
> business in the EU. But the GDPR is carefully constructed in such a way
> that it is hard not to be affected by the GDPR in one way or another,
> and the obvious way to cope with that risk is to more or less obey the
> GDPR rules even if one does not have substantial business interests in
> the EU.

Okay, so you're not reading the GDPR in some literal sense, but you're coming to a conclusion that's supported by ... what? To echo Theodore Y. Ts'o's e-mail, have you consulted with someone who's an actual lawyer on this subject?

I haven't, but I'm not suggesting that the git data format needs to change because of some new EU law. You are; what's your basis for that opinion?

It seems to me that the git project doesn't need to do anything about this. There are plenty of things that are illegal to publish, some of which may be made illegal after the fact (e.g. national security related information). If those things are incidentally saved in git repositories the parties involved may need to run git-filter-branch.

Of course if they need to do that on a weekly basis because of some overzealous law we may need to have some "native" suppo
Re: GDPR compliance best practices?
Addendum:

I once discussed with a philosopher the question: What is your argument against libertarianism? He said: It would be a tyranny of lawyers.

Let's not have a tyranny of lawyers. Let us, the engineers and hackers, exercise the necessary control over those pesky lawyers by defining and redefining the state of the art in technology, and prevent them from defining it by themselves. For a hammer, everything looks like a nail.

What is the better option: To suggest people pay for legal advice by lawyers, who only offer lengthy disclaimers and such for bypassing the right to be forgotten, or simply discuss technical changes for git which enable its easy implementation, without legal excuses for not supporting it?

Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
Re: GDPR compliance best practices?
On Sun, Jun 03, 2018 at 02:18:07PM -0400, Theodore Y. Ts'o wrote:
> I would gently suggest that if you really want to engage in something
> more practical than speculating how the GDPR compliance will work out in
> actual practice, that you contact a lawyer and get official legal
> advice?

I completely disagree. Erasure is a technical issue to be solved by engineers, not by lawyers. And that's completely in line with the GDPR.

The GDPR is ultimately not a legal thing to be solved by lawyers writing lengthy legal argumentation and disclaimers and such. They are not even the ones to take the lead in GDPR implementation. All that would be simply snake oil.

Some legal documentation may be necessary, and having a competent lawyer in a GDPR compliance task force is certainly a must. But that only gets 20% of the job done; 80% is engineering. Every lawyer who claims to give you shady legal tricks to get the job 100% done in no time is a liar.

The GDPR's ultimate goal is to push the world to improve how data protection is implemented on a technical level. The GDPR contains several blanket clauses that refer to the "state of the art" of technology, which the GDPR itself of course does not define and which is of course nothing a lawyer has any competence in.

My proposal is a technical, not a legal one: Provide a generic possibility of having erasability and verifiability at the same time in git. Improve the state of the art in version control such that it is more in line with the GDPR's idea that people have a right to be forgotten, but also useful for a multitude of other applications.

The lawyers can then build on this.

Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
Re: GDPR compliance best practices?
On Sun, Jun 03, 2018 at 07:46:17PM +0200, Peter Backes wrote:
>
> Let's be honest: We do not know what legitimization exactly in each
> specific case the git metadata is being distributed under.

It seems like you are engaging in something even more dangerous than a hardware engineer pretending they know how to program, or a software engineer pretending they know how to use a soldering iron --- and that's a programmer pretending they know enough that they can speculate on the law.

I would gently suggest that if you really want to engage in something more practical than speculating how the GDPR compliance will work out in actual practice, that you contact a lawyer and get official legal advice?

After getting that advice, if you or your company wants to implement, you can then send patches, and those can get debated using the usual patch submission process.

Cheers,

- Ted
Re: GDPR compliance best practices?
correcting a negative /with/without/ and inserting a comma. - Original Message - From: "Philip Oakley" [snip] From a personal view, many folk want it to be that corporates (and open source organisations) should hold no personal information with having s/with/without/ explicit permission that can then be withdrawn, with deletion to follow. s/permission/permission,/ However that 'legal' clause does [generally] win.
Re: GDPR compliance best practices?
On Sun, Jun 03, 2018 at 04:28:31PM +0100, Philip Oakley wrote:
> In most Git cases that legal/legitimate purpose is the copyright licence,
> and/or corporate employment. That is, Jane wrote it, hence X has a legal
> right of use, and we need to have a record of that (Jane wrote it) as
> evidence of that (I'm X, I can use it) right. That would mean that Jane
> cannot just ask to have that record removed and expect it to be removed.

Re corporate employment: For sure nobody would dare to question that a company has a right to keep an internal record that Jane wrote it. The issue is publishing that information. This is an entirely different story. I already stressed that from the very beginning.

Re copyright license: No, a copyright license does not provide a legitimization.

- copyright is about distributing the program, not about distributing version control metadata.
- Being named is a right, not an obligation of the author. Hence, if the author doesn't want his name published, the company doesn't have legitimate grounds based in copyright for doing it anyway, against his or her will.

> From a personal view, many folk want it to be that corporates (and open
> source organisations) should hold no personal information with having
> explicit permission that can then be withdrawn, with deletion to follow.
> However that 'legal' clause does [generally] win.

Let's be honest: We do not know what legitimization exactly in each specific case the git metadata is being distributed under. It may be copyright, it may be employment, but it may also be revocable consent. That is, we cannot safely assume that no git user will ever have to deal with a legitimate request based on the right to be forgotten.

> In the git.git case (and linux.git) there is the DCO (to back up the GPL2)
> as an explicit requirement/certification that puts the information into the
> legal evidence category. IIUC almost all copyright ends up with a similar
> evidential trail for the meta data.
This makes things more complicated, not less. You have yet more meta data to cope with, yet more opportunities to be bitten by the right to be forgotten. Since my proposal is a list of metadata entries, each of which can be anonymized independently, it would be able to deal with this perfectly.

> The more likely problem is if the content of the repo, rather than the meta
> data, is subject to GDPR, and that could easily ruin any storage method.
> Being able to mark an object as would help here(*).

My proposal supports any part of the commit, including the contents of individual files, as erasable, yet verifiable data.

> Also remember that most EU legislation is 'intent' based, rather than
> 'letter of', for the style of legal arguments (which is where some of the UK
> Brexit misunderstandings come from), so it is more than possible to get into
> the situation where an action is both mandated and illegal at the same time,
> so plenty of snake oil salesmen continue to sell magic fixes according to the
> customers' local biases.

This may be true. I am not trying to sell snake oil, however. To have erasure and verifiability at the same time is a highly generic feature that may be desirable for a multitude of reasons, including but not limited to legal ones like GDPR and copyright violations.

> I do not believe Git has anything to worry about that wasn't already an
> issue.

Yes, but it definitely had, and still does have, something to worry about. git should provide technical means to deal with this. I provided a proposal based on anonymization that has no drawback whatsoever compared to the status quo, except a slight increase in metadata size and various degrees of backwards incompatibility, depending on how it is implemented.

What do you think about my proposal as a solution for the problem?
You provide a lot of arguments about why it is not a necessity to have this, but let's assume it is; is there any actual problem you see with the proposal, except that someone would have to implement it? Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
Re: GDPR compliance best practices?
From: "Peter Backes"
On Sun, Jun 03, 2018 at 02:59:26PM +0200, Ævar Arnfjörð Bjarmason wrote:
I'm not trying to be selfish, I'm just trying to counter your literal reading of the law with a comment of "it'll depend". Just like there's a law against public urination in many places, but this is applied very differently to someone taking a piss in front of parliament v.s. someone taking a piss in the forest on a hike, even though the law itself usually makes no distinction about the two.

We have huge companies using git now. This is not the tool used by a few kernel hackers anymore.

In this example once you'd delete the UUID ref you don't have the UUID -> author mapping anymore (and b.t.w. that could be a many to one mapping).

It is not relevant whether you have that mapping or not, it is enough that with additional information you could obtain it. For example, say, you have 5000 commits with the same UUID. Now you delete the mapping. But your friend still has it on his local copy. Now your friend merely needs to tell you who is behind that UUID and instantly you can associate all 5000 commits with that person again. The GDPR is very explicit about this, see recital 26. It says that pseudonymization is not enough, you need anonymization if you want to be free from regulation. In addition, and in contrast to my proposal, your solution doesn't allow verification of the author field.

I think again that this is taking too much of a literalist view. The intent of that policy is to ensure that companies like Google can't just close down their EU offices and weasel out of compliance by saying "we're just doing business from the US, it doesn't apply to us". It will not be used against anyone who's taking every reasonable precaution against doing business with EU customers.

I think you are underestimating the political intention behind the GDPR.
It has kind of an imperialist goal, to set international standards, to enforce them against foreign companies and to pressure other nations to establish the same standards. If I read the GDPR in a literal sense, I would in fact come to the same conclusion as you: It's about companies doing substantial business in the EU. But the GDPR is carefully constructed in such a way that it is hard not to be affected by the GDPR in one way or another, and the obvious way to cope with that risk is to more or less obey the GDPR rules even if one does not have substantial business interests in the EU.

What do you imagine that this is going to be like? That some EU citizen is going to walk into a small business in South America one day, which somehow is violating the GDPR, and when that business owner goes on holiday to the EU they're going to get detained? Not even the US policy against Cuba is anywhere remotely close to that.

Well, not if he's locally interacting with that business, a situation which I am sure is not regulated by the GDPR. However, if a large US website accepts users from the EU and uses the data gathered in conflict with the GDPR, perhaps selling it for use in political campaigns, and it gets several fines for this by EU authorities but ignores them and doesn't pay them, and the CEO one day takes a flight to Frankfurt to continue by train to Switzerland to get some cash from his bank account, then he will most likely not reach Swiss territory.

--

Having been through corporate training and read up on a number of the conflicting views in the press, one of the issues is that there are two viewpoints, one from each side of the fence. From a corporate/organisation viewpoint, it is best if every case of holding user information is for a legitimate purpose, which then means the company has 'protection' from requests for removal because the data *is* held legally/legitimately (which includes acting as evidence).
In most Git cases that legal/legitimate purpose is the copyright licence, and/or corporate employment. That is, Jane wrote it, hence X has a legal right of use, and we need to have a record of that (Jane wrote it) as evidence of that (I'm X, I can use it) right. That would mean that Jane cannot just ask to have that record removed and expect it to be removed.

From a personal view, many folk want it to be that corporates (and open source organisations) should hold no personal information with having explicit permission that can then be withdrawn, with deletion to follow. However that 'legal' clause does [generally] win.

In the git.git case (and linux.git) there is the DCO (to back up the GPL2) as an explicit requirement/certification that puts the information into the legal evidence category. IIUC almost all copyright ends up with a similar evidential trail for the meta data.

The more likely problem is if the content of the repo, rather than the meta data, is subject to GDPR, and that could easily ruin any storage method. Being able to mark an object as would help here(*).

Also remember that most EU legislation is 'intent' based, rather
Re: GDPR compliance best practices?
On Sun, Jun 03, 2018 at 02:59:26PM +0200, Ævar Arnfjörð Bjarmason wrote: > I'm not trying to be selfish, I'm just trying to counter your literal > reading of the law with a comment of "it'll depend". > > Just like there's a law against public urination in many places, but > this is applied very differently to someone taking a piss in front of > parliament v.s. someone taking a piss in the forest on a hike, even > though the law itself usually makes no distinction about the two. We have huge companies using git now. This is not the tool used by a few kernel hackers anymore. > In this example once you'd delete the UUID ref you don't have the UUID > -> author mapping anymore (and b.t.w. that could be a many to one > mapping). It is not relevant whether you have that mapping or not, it is enough that with additional information you could obtain it. For example, say, you have 5000 commits with the same UUID. Now your delete the mapping. But your friend still has it on his local copy. Now your friendly merely needs to tell you who is behind that UUID and instantly you can associate all 5000 commits with that person again. The GDPR is very explict about this, see recital 26. It says that pseudonymization is not enough, you need anonymization if you want to be free from regulation. In addition, and in contrast to my proposal, your solution doesn't allow verification of the author field. > I think again that this is taking too much of a literalist view. The > intent of that policy is to ensure that companies like Google can't just > close down their EU offices weasel out of compliance be saying "we're > just doing business from the US, it doesn't apply to us". > > It will not be used against anyone who's taking every reasonable > precaution from doing business with EU customers. I think you are underestimating the political intention behind the GDPR. 
It has a somewhat imperialist goal: to set international standards, to enforce them against foreign companies and to pressure other nations to establish the same standards. If I read the GDPR literally, I would in fact come to the same conclusion as you: It's about companies doing substantial business in the EU. But the GDPR is carefully constructed in such a way that it is hard not to be affected by the GDPR in one way or another, and the obvious way to cope with that risk is to more or less obey the GDPR rules even if one does not have substantial business interests in the EU. > What do you imagine that this is going to be like? That some EU citizen > is going to walk into a small business in South America one day, which > somehow is violating the GDPR, and when that business owner goes on > holiday to the EU they're going to get detained? Not even the US policy > against Cuba is anywhere remotely close to that. Well, not if he's only interacting locally with that business, a situation which I am sure is not regulated by the GDPR. However, if a large US website accepts users from the EU and uses the data gathered in conflict with the GDPR, perhaps selling it for use in political campaigns, and it gets fined several times for this by EU authorities but ignores the fines and doesn't pay them, and the CEO one day takes a flight to Frankfurt to continue by train to Switzerland to get some cash from his bank account, then he will most likely not reach Swiss territory. Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
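The 5000-commit argument can be made concrete with a small sketch. It assumes the refval:<uuid> author convention discussed in this thread and models both the server's (deleted) mapping and a friend's local copy as plain dicts; the function name `reidentify` is hypothetical. The point it shows: because every commit carries the same stable token, one leaked mapping entry re-links all of them at once.

```python
from typing import Dict, List


def reidentify(commit_authors: List[str], mapping: Dict[str, str]) -> Dict[str, List[int]]:
    """Group commit indices by real identity, using whatever copy of the
    token -> identity mapping is available (hypothetical helper)."""
    linked: Dict[str, List[int]] = {}
    for index, author in enumerate(commit_authors):
        identity = mapping.get(author)
        if identity is not None:
            linked.setdefault(identity, []).append(index)
    return linked


token = "refval:5c679eda-b4e5-4f35-b691-8e13862d4f79"
authors = [token] * 5000                          # 5000 commits, one pseudonym
server_mapping: Dict[str, str] = {}               # the server deleted its ref
friend_copy = {token: "Jane Doe <jane@example.org>"}  # but a clone kept it

print(len(reidentify(authors, server_mapping)))   # nothing resolvable here
print(len(reidentify(authors, friend_copy)["Jane Doe <jane@example.org>"]))
```

By contrast, in the salted per-field hashing proposed earlier in the thread, each commit's author entry would be hashed together with its own random value, so no shared token would be left in the history to re-link.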
Re: GDPR compliance best practices?
On Sun, Jun 03 2018, Peter Backes wrote: > On Sun, Jun 03, 2018 at 12:45:25PM +0200, Ævar Arnfjörð Bjarmason wrote: >> protection". I.e. regulators / prosecutors are much more likely to go after >> some advertising company than some project using a Git repo. > > Well, it is indeed rather unlikely that one particular git repo project > will be targeted, but I guess it is basically certain that at least > some of them will be. > > It is the same as a lottery, it's very unlikely you win the jackpot, > yet someone wins it every few months. We should care about the entire > community, not be too selfish. I'm not trying to be selfish, I'm just trying to counter your literal reading of the law with a comment of "it'll depend". Just like there's a law against public urination in many places, but this is applied very differently to someone taking a piss in front of parliament vs. someone taking a piss in the forest on a hike, even though the law itself usually makes no distinction about the two. >> Since the Author is free-form this sort of thing doesn't need to be part >> of the git data format. You can just generate a UUID like >> "5c679eda-b4e5-4f35-b691-8e13862d4f79" and then set user.name to >> "refval:5c679eda-b4e5-4f35-b691-8e13862d4f79" and user.email to >> "refval:5c679eda-b4e5-4f35-b691-8e13862d4f79". > > Well, this is merely pseudonymization, not anonymization. Note that the > UUID, innocent as it may look, is not in any way less "personal data" > than the author string itself. Your proposal would thus not actually > solve the problem, only slightly transform it. Only when you truly > anonymize (see my proposal about one way to do it) can you completely > evade the GDPR. In this example once you'd delete the UUID ref you don't have the UUID -> author mapping anymore (and b.t.w. that could be a many to one mapping).
This seems perfectly acceptable to me since the spirit of the GDPR is to prevent easy Googling of who did what in the past, not to prevent someone with tremendous resources from, say, doing a textual analysis of all git.git commits to find out who authored what. >> Sites that are paranoid about the GDPR could have a pre-receive hook >> rejecting any pushes from EU customers unless their commits were in this >> format. > > This won't work either. The GDPR makes each data processor directly > responsible in relation to the data subject. So it does not matter at > all who is pushing, it matters who is in the author field of the > commits that were pushed. And since you don't have any information > about whether those authors are residing within the EU or not, you have > to assume they are and you have to obey the GDPR. Even if you are > outside the EU and do not have any subsidiaries within the EU, the GDPR > still applies as long as you are processing personal data of EU citizens. > Perhaps the authorities in your country will refuse to obey letters of > request if the EU authorities try to enforce the GDPR on an > international scope, but if you have a record of GDPR violation and you > ever set foot on EU territory, you are fair game. I think again that this is taking too much of a literalist view. The intent of that policy is to ensure that companies like Google can't just close down their EU offices and weasel out of compliance by saying "we're just doing business from the US, it doesn't apply to us". It will not be used against anyone who's taking every reasonable precaution against doing business with EU customers. What do you imagine that this is going to be like? That some EU citizen is going to walk into a small business in South America one day, which somehow is violating the GDPR, and when that business owner goes on holiday to the EU they're going to get detained? Not even the US policy against Cuba is anywhere remotely close to that.
>> Instead I'll have a daily UUID issued from a government API > > Heaven forbid. ;) There is an old German proverb, warning that even > humorous trolling might be dangerous: "Man soll den Teufel nicht an die > Wand malen!" ;)
Re: GDPR compliance best practices?
On Sun, Jun 03, 2018 at 12:45:25PM +0200, Ævar Arnfjörð Bjarmason wrote: > protection". I.e. regulators / prosecutors are much more likely to go after > some advertising company than some project using a Git repo. Well, it is indeed rather unlikely that one particular git repo project will be targeted, but I guess it is basically certain that at least some of them will be. It is the same as a lottery, it's very unlikely you win the jackpot, yet someone wins it every few months. We should care about the entire community, not be too selfish. > Since the Author is free-form this sort of thing doesn't need to be part > of the git data format. You can just generate a UUID like > "5c679eda-b4e5-4f35-b691-8e13862d4f79" and then set user.name to > "refval:5c679eda-b4e5-4f35-b691-8e13862d4f79" and user.email to > "refval:5c679eda-b4e5-4f35-b691-8e13862d4f79". Well, this is merely pseudonymization, not anonymization. Note that the UUID, innocent as it may look, is not in any way less "personal data" than the author string itself. Your proposal would thus not actually solve the problem, only slightly transform it. Only when you truly anonymize (see my proposal about one way to do it) can you completely evade the GDPR. > Sites that are paranoid about the GDPR could have a pre-receive hook > rejecting any pushes from EU customers unless their commits were in this > format. This won't work either. The GDPR makes each data processor directly responsible in relation to the data subject. So it does not matter at all who is pushing, it matters who is in the author field of the commits that were pushed. And since you don't have any information about whether those authors are residing within the EU or not, you have to assume they are and you have to obey the GDPR. Even if you are outside the EU and do not have any subsidiaries within the EU, the GDPR still applies as long as you are processing personal data of EU citizens.
Perhaps the authorities in your country will refuse to obey letters of request if the EU authorities try to enforce the GDPR on an international scope, but if you have a record of GDPR violation and you ever set foot on EU territory, you are fair game. > Instead I'll have a daily UUID issued from a government API Heaven forbid. ;) There is an old German proverb, warning that even humorous trolling might be dangerous: "Man soll den Teufel nicht an die Wand malen!" ;) Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
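Following Peter's point that what matters is the author field of the pushed commits, not who pushes, a hook enforcing the refval: convention would have to inspect the ident of every new commit. A minimal sketch, assuming a Python pre-receive hook: Git feeds such a hook one "<old> <new> <ref>" line per updated ref on stdin, and `git log <new> --not --all` lists commits not yet reachable from existing refs. Checking the committer ident as well as the author is this sketch's own assumption, and all helper names are hypothetical.

```python
import subprocess
import sys
from typing import Iterable, List, Tuple

PREFIX = "refval:"  # hypothetical pseudonym convention from the thread


def new_commit_idents(new: str) -> List[Tuple[str, ...]]:
    """(sha, author name, author email, committer name, committer email)
    for commits reachable from `new` but from no existing ref."""
    out = subprocess.run(
        ["git", "log", "--format=%H%x00%an%x00%ae%x00%cn%x00%ce",
         new, "--not", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [tuple(line.split("\0")) for line in out.splitlines() if line]


def violations(records: Iterable[Tuple[str, ...]]) -> List[str]:
    """Return the shas whose author or committer ident is not refval: form."""
    bad = []
    for sha, an, ae, cn, ce in records:
        if not all(field.startswith(PREFIX) for field in (an, ae, cn, ce)):
            bad.append(sha)
    return bad


def main(stdin: Iterable[str] = sys.stdin) -> int:
    bad: List[str] = []
    for line in stdin:
        old, new, ref = line.split()
        if set(new) == {"0"}:  # all-zero new value: ref deletion, nothing to check
            continue
        bad += violations(new_commit_idents(new))
    for sha in bad:
        print(f"rejected {sha}: author/committer must be {PREFIX}<uuid>",
              file=sys.stderr)
    return 1 if bad else 0  # non-zero exit makes git reject the whole push
```

Installed as `hooks/pre-receive`, the script would end with `sys.exit(main())`. Note that it can only check the format of the ident; as Peter says, it cannot tell where the authors live.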
Re: GDPR compliance best practices?
On Sun, Jun 03 2018, Peter Backes wrote: > Unfortunately this important topic of GDPR compliance has not seen much > interest. I don't think you can infer that there's not much interest; maybe people just don't have anything to say about it. There have been a lot of discussions about this that I've seen, but what they all have in common is that nobody really knows. Just like nobody really knew what the "cookie law" would be like. So I think all of us are just waiting to see. I took the bait and tried to paraphrase some stuff I've read about it, but as you pointed out in 20180417232504.ga4...@helen.plasma.xg8.de I incorrectly surmised some stuff, although I very much suspect that *in practice* the GDPR is going to be more about "consumer protection". I.e. regulators / prosecutors are much more likely to go after some advertising company than some project using a Git repo. Just like nobody's going after some local computer club's internal-only website because it sets cookies without asking, but they might go after Facebook for doing the same. > [...] > In the course of this, anonymization could also be added. My idea would be > as follows: > > Do not hash anything directly to obtain the commit ID. Instead, hash a > list of hashes of [$random_number, $information] pairs. $information > could be an author id, a commit date, a comment, or anything else. Then > store the commit id, the list of hashes, and the list of pairs to form > the commit. > > If someone requests erasure, simply empty the corresponding pair in the > list. All that would be left would be the hash of the pair, which is > completely anonymous (not more useful than a random number) and thus > not covered by the GDPR. The history could still be completely > verified, and when displaying the log, the erased entry could be > displayed as "<>". > > What do you think about this? Since the Author is free-form this sort of thing doesn't need to be part of the git data format.
You can just generate a UUID like "5c679eda-b4e5-4f35-b691-8e13862d4f79" and then set user.name to "refval:5c679eda-b4e5-4f35-b691-8e13862d4f79" and user.email to "refval:5c679eda-b4e5-4f35-b691-8e13862d4f79". Then you'd create a ref on the server like refs/refval/5c679eda-b4e5-4f35-b691-8e13862d4f79 containing the real "$user <$email>". If you then wanted to erase that field you'd just delete the ref, and it would be much easier to teach stuff that renders the likes of git-log to look up these refs than to change the data format. Sites that are paranoid about the GDPR could have a pre-receive hook rejecting any pushes from EU customers unless their commits were in this format. Perhaps some variation of this is where the GDPR v2 will go. It'll be an "obligation to be forgotten", and I won't be allowed to use my own name anymore. Instead I'll have a daily UUID issued from a government API to use on various forms, and the only way for anyone to resolve that will be going through a webservice that'll reject UUID lookups older than N months, and caching those requests will be met with the death penalty. We'll all be free at last. Okay, that last paragraph is just trolling, but I think that refval: -> ref convention is something worth considering if things *really* go in this direction.
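The refval: convention above can be sketched in a few lines. This is a minimal model, assuming a plain dict stands in for the refs/refval/<uuid> namespace on the server; `RefvalStore` and its methods are hypothetical names, and `render` only illustrates what a refval-aware git-log might print.

```python
import uuid
from typing import Dict, Tuple

PREFIX = "refval:"


class RefvalStore:
    """Hypothetical stand-in for refs/refval/<uuid> on the server."""

    def __init__(self) -> None:
        self.refs: Dict[str, str] = {}  # "refs/refval/<uuid>" -> "$user <$email>"

    def new_identity(self, user: str, email: str) -> Tuple[str, str]:
        """Record the real ident and return the (user.name, user.email) to commit with."""
        token = str(uuid.uuid4())
        self.refs[f"refs/refval/{token}"] = f"{user} <{email}>"
        value = PREFIX + token
        return value, value

    def erase(self, token: str) -> None:
        """Erasure is just deleting the ref; commits and their IDs are untouched."""
        self.refs.pop(f"refs/refval/{token}", None)

    def render(self, name: str) -> str:
        """Resolve a refval: author name for display, if the ref still exists."""
        if not name.startswith(PREFIX):
            return name
        token = name[len(PREFIX):]
        return self.refs.get(f"refs/refval/{token}", f"<erased {name}>")


store = RefvalStore()
name, email = store.new_identity("Jane Doe", "jane@example.org")
print(store.render(name))  # resolves while the ref exists
store.erase(name[len(PREFIX):])
print(store.render(name))  # falls back to the erased placeholder
```

In a real repository a ref must point at an object, so one plausible (assumed, not proposed in the thread) layout is a blob holding the ident, written with `git hash-object -w --stdin`, pointed to with `git update-ref`, and erased with `git update-ref -d`. The now-unreachable blob would still linger until `git gc` prunes it.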
Re: GDPR compliance best practices?
Hi, Unfortunately this important topic of GDPR compliance has not seen much interest. After asking github about how they would cope with the issue of erasing the author field, they changed their privacy policy, which now clarifies that this won't be done. My guess is that this would ultimately rely on "overriding legitimate grounds for the processing" (Art. 17 (1) point (a) GDPR), which is one of the most fragile legitimizations available in the GDPR. The GDPR emphasizes the importance of using state-of-the-art technology, including anonymization, as far as possible to ensure privacy. At https://public-inbox.org/git/CA+dhYEViN4-boZLN+5QJyE7RtX+q6a92p0C2O6TA53==bzf...@mail.gmail.com/T/ there is already some discussion about transitioning to a different hashing algorithm to get more in line with the state of the art in hashing. (My clear favourite would be SHA-3.) In the course of this, anonymization could also be added. My idea would be as follows: Do not hash anything directly to obtain the commit ID. Instead, hash a list of hashes of [$random_number, $information] pairs. $information could be an author id, a commit date, a comment, or anything else. Then store the commit id, the list of hashes, and the list of pairs to form the commit. If someone requests erasure, simply empty the corresponding pair in the list. All that would be left would be the hash of the pair, which is completely anonymous (not more useful than a random number) and thus not covered by the GDPR. The history could still be completely verified, and when displaying the log, the erased entry could be displayed as "<>". What do you think about this? Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
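The salted-pair proposal above can be sketched as follows. Assumptions not fixed by the proposal: SHA3-256 as the hash (the stated favourite), a 32-byte random salt per field, fields as plain UTF-8 strings, and the commit ID taken over the concatenated leaf hashes. This is a toy model of the idea, not Git's object format; `make_commit`, `erase`, `verify` and `render` are hypothetical names, and the "<>" placeholder is the one from the text above.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import List, Optional, Tuple

Pair = Tuple[bytes, str]  # (random salt, field content)


def leaf_hash(pair: Pair) -> bytes:
    salt, info = pair
    # Length-prefix the salt so the salt/content boundary is unambiguous.
    data = len(salt).to_bytes(2, "big") + salt + info.encode("utf-8")
    return hashlib.sha3_256(data).digest()


@dataclass
class Commit:
    commit_id: str
    leaf_hashes: List[bytes]
    pairs: List[Optional[Pair]]  # None marks an erased field


def make_commit(fields: List[str]) -> Commit:
    pairs: List[Optional[Pair]] = [(os.urandom(32), f) for f in fields]
    leaves = [leaf_hash(p) for p in pairs if p is not None]
    # Leaf digests are fixed-size, so plain concatenation is unambiguous.
    commit_id = hashlib.sha3_256(b"".join(leaves)).hexdigest()
    return Commit(commit_id, leaves, pairs)


def erase(commit: Commit, index: int) -> None:
    # Drop only the salted pair; its leaf hash, and thus the ID, stays.
    commit.pairs[index] = None


def verify(commit: Commit) -> bool:
    if len(commit.pairs) != len(commit.leaf_hashes):
        return False
    if hashlib.sha3_256(b"".join(commit.leaf_hashes)).hexdigest() != commit.commit_id:
        return False
    return all(p is None or leaf_hash(p) == h
               for p, h in zip(commit.pairs, commit.leaf_hashes))


def render(commit: Commit) -> List[str]:
    return ["<>" if p is None else p[1] for p in commit.pairs]
```

The random salt is what keeps a surviving leaf hash from being matched against a list of candidate names: without it, anyone could hash "Jane Doe <jane@example.org>" and compare. After `erase`, the commit ID is unchanged and `verify` still succeeds, while tampering with any remaining field makes it fail.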
Re: GDPR compliance best practices?
On Tue, Apr 17, 2018 at 11:38:26PM +0200, Ævar Arnfjörð Bjarmason wrote: > I've been loosely following a similar discussion around blockchains and > my understanding of the situation is that for a project such as say > Linux the GDPR gives you this potential out for that[1]: > > "the personal data are no longer necessary in relation to the > purposes for which they were collected or otherwise processed" > > I.e. you understand that when you submit a patch to linux.git how it's > going to get used, and that it's in a storage system that isn't going to > be pruned just because you ask for it. > [...] > You can make a compelling case that for say submitting your data to the > Bitcoin blockchain the above quote from article 17 overrides it Well, you're quoting from lit. a but there's also lit. b to f! It says "one of the following grounds applies", not "all of ...". > This is very different from you say joining a company, committing to its > internal git repo, and your name being there in perpetuity, or choosing > to submit a patch to linux.git or git.git. > > I'd think that would be handled the same way as a structural engineering > firm being able to record in perpetuity who it was that drew up the > design for some bridge. An internal repo is entirely unproblematic, since you don't need consent for doing that. It is covered by Art. 6 (1) lit. f. The problem is public repos. Publishing employee information is generally considered not to be covered by Art. 6 (1) lit. f. After all, you can easily publish the software but not the repo. > I don't think it's plausible that the GDPR, > which is probably mainly going to be about consumer protection, is going > to concern itself with that in practice. Oh, no, GDPR is about privacy in general. It's not only about consumer protection.
It applies in the same way to employees in relation to their employer and to citizens in relation to the authorities, and to open source contributors in relation to the projects, or to any other data processing outside family and friends (Art. 2 (2) lit. c). I am inclined to assume that Art. 6 (1) lit. b might be the solution, since the licenses typically demand a history of changes to be distributed with the program (for example, GPLv3 section 5 a). After all, the author generally wants to be given credit for his changes and it can be assumed that this is one of the conditions for licensing the work in the first place. On the other hand, of course, the author could waive the condition at any time, which means Art. 6 (1) lit. b wouldn't apply anymore and you'd have the same issue as with consent-based processing of the information (lit. a). Best wishes Peter -- Peter Backes, r...@helen.plasma.xg8.de
Re: GDPR compliance best practices?
On Tue, Apr 17 2018, Peter Backes wrote: > I'd like to ask whether anyone has best practices for achieving GDPR > compliance for git repos? The GDPR will come into effect in the EU next > month. > > In particular, how do you cope with the "Right to erasure" concerning > entries in the history of your git repos? > > Erasing author names from the history changes the commit hashes. It is > well known that this leads to a lot of problems. So I don't consider > this a workable solution. > > And how do you justify publishing your employee's name/email as part of > a git commit under GDPR rules in the first place? > > gitlab has the following page mentioning the "Right to erasure" but > AFAICS nothing about how it will be implemented > https://about.gitlab.com/gdpr/ > > Here are discussions I found but they do not really provide a solution: > https://law.stackexchange.com/questions/24623/gdpr-git-history > https://news.ycombinator.com/item?id=16509755 [Not a lawyer and all that] I've been loosely following a similar discussion around blockchains and my understanding of the situation is that for a project such as say Linux the GDPR gives you this potential out for that[1]: "the personal data are no longer necessary in relation to the purposes for which they were collected or otherwise processed" I.e. you understand that when you submit a patch to linux.git how it's going to get used, and that it's in a storage system that isn't going to be pruned just because you ask for it. In combination with the "Conditions for consent"[2] this becomes a bit more tricky. I.e. "The data subject shall have the right to withdraw his or her consent at any time". You can make a compelling case that for say submitting your data to the Bitcoin blockchain the above quote from article 17 overrides it, but can you for other hash-based-on-hash systems like linux.git? Maybe, maybe not. I think nobody really knows at this point.
What I do think is certain is that there's not going to be any one-size-fits-all solution based on the underlying technology. If I start storing my webserver access logs with IP information in a git repo, I don't get to say "sorry git stores stuff this way, I don't want to rebase it". No court's going to buy that; I've just gone out of my way to use technology that circumvents the GDPR for no particularly good reason. This is very different from you say joining a company, committing to its internal git repo, and your name being there in perpetuity, or choosing to submit a patch to linux.git or git.git. I'd think that would be handled the same way as a structural engineering firm being able to record in perpetuity who it was that drew up the design for some bridge. I don't think it's plausible that the GDPR, which is probably mainly going to be about consumer protection, is going to concern itself with that in practice. There's a lot of middle ground in between those two though. E.g. children are specially protected under the GDPR. Is Linus going to say he doesn't want to rebase linux.git after some 14-year-old who regrets submitting code doesn't want his name there anymore? Who knows. Depending on such common cases maybe git itself should eventually support some ways to work around the issues. E.g. we could have some mode to always supply a fake name/e-mail, or make the notice implicit_ident_advice() spews out somewhat scarier. 1. https://gdpr-info.eu/art-17-gdpr/ 2. https://gdpr-info.eu/art-7-gdpr/