Re: problem with not being able to enforce git content filters
On 2018-10-16 05:54 PM, brian m. carlson wrote: [...] >> It doesn't help with direct commits to master, since CI would be >> detecting it after it was committed. And when that happens we all know >> that already because 'git pull' fails. > > Typically projects that have CI set up don't allow direct pushes to > master, since that tends to allow to bypass CI, or they allow it only in > exceptional circumstances (e.g., reverts). Also, typically most > projects want to have some sort of review before code (resp. documents, > other contributions) are merged. Most Git hosting platforms, including > GitHub, offer protected branches, so it's something to consider. > > There is a possible alternative, and that's that if your project has > some sort of build file (e.g., a Makefile), you can set it up to > automatically insert hooks or git configuration into developers' > systems, optionally only if it's in a Git repository. For example, you > could add a pre-commit hook that fails if the filters are disabled. > > These are the approaches that tend to work well for most projects. I > tend to prefer the CI approach with restricted branches because often I > want to customize my hooks, but your project will decide what works best > for it. Yes, Brian, what you're sharing makes total sense when things are setup this way, but this is not the case with the project I'm contributing to. This one is setup, commit first, review later, due to having too few hands. And I have already setup a CI to detect misconfigured systems. It'll catch RPs in time, and everything else post-commit. Let's hope the developers will watch the status of their commits and will react quickly to fix their setup and mend the broken commit, when they see their commit broke things. That's as good as it can get in this particular situation I understand. Thank you, Brian and Ævar for your support and very helpful suggestions. -- Stas Bekman <'>< <'>< https://stasosphere.com https://chestofbooks.com https://experientialsexlab.com https://stason.org https://stasosphere.com/experience-life/my-books
Re: problem with not being able to enforce git content filters
On Tue, Oct 16, 2018 at 04:48:13PM -0700, Stas Bekman wrote: > > >> And the devs honestly try to do their best to remember to configure the > >> filters, but for some reason they disappear for them, don't ask me why, > >> I don't know. This is an open source project team, not a work place. > > > > This sounds like it could be easily solved by continuous integration. > > You could set up a job on any of a variety of services that checks that > > a pull request or other commit is clean when when the filter runs. If > > it doesn't pass, the code doesn't merge. > > This is an excellent idea wrt to PRs. Thank you, Brian! I will implement > that. > > It doesn't help with direct commits to master, since CI would be > detecting it after it was committed. And when that happens we all know > that already because 'git pull' fails. Typically projects that have CI set up don't allow direct pushes to master, since that tends to allow to bypass CI, or they allow it only in exceptional circumstances (e.g., reverts). Also, typically most projects want to have some sort of review before code (resp. documents, other contributions) are merged. Most Git hosting platforms, including GitHub, offer protected branches, so it's something to consider. There is a possible alternative, and that's that if your project has some sort of build file (e.g., a Makefile), you can set it up to automatically insert hooks or git configuration into developers' systems, optionally only if it's in a Git repository. For example, you could add a pre-commit hook that fails if the filters are disabled. These are the approaches that tend to work well for most projects. I tend to prefer the CI approach with restricted branches because often I want to customize my hooks, but your project will decide what works best for it. -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
Re: problem with not being able to enforce git content filters
>> And the devs honestly try to do their best to remember to configure the >> filters, but for some reason they disappear for them, don't ask me why, >> I don't know. This is an open source project team, not a work place. > > This sounds like it could be easily solved by continuous integration. > You could set up a job on any of a variety of services that checks that > a pull request or other commit is clean when when the filter runs. If > it doesn't pass, the code doesn't merge. This is an excellent idea wrt to PRs. Thank you, Brian! I will implement that. It doesn't help with direct commits to master, since CI would be detecting it after it was committed. And when that happens we all know that already because 'git pull' fails. -- Stas Bekman <'>< <'>< https://stasosphere.com https://chestofbooks.com https://experientialsexlab.com https://stason.org https://stasosphere.com/experience-life/my-books
Re: problem with not being able to enforce git content filters
On Tue, Oct 16, 2018 at 01:36:37PM -0700, Stas Bekman wrote: > The problem is that it can't be enforced. > > When it's not enforced, we end up with some devs using it and others > don't, or more often is the same dev sometimes doesn't have it configured. > > When a person has a stripped out notebook checked out, when another > person commits un-stripped out notebook, it leads to: invalid `git > status` reports, `git pull` breaks, `git stash` doesn't work, since it > tries to stash using the filters, and `git pull' can never succeed > because it thinks that it'll overwrite the local changes, but `git diff` > returns no changes. > > So the only solution when this happens is to disable the filters, clean > up the mess, re-enable the filters. Many people just make a new clone - > ouch! > > And the biggest problem is that it affects all users who may have the > filters enabled, e.g. because they worked on a PR, and not just devs - > i.e. the repercussions are much bigger than just a few devs affected. > > We can't use server-side hooks to enforce this because the project is on > github. > > And the devs honestly try to do their best to remember to configure the > filters, but for some reason they disappear for them, don't ask me why, > I don't know. This is an open source project team, not a work place. This sounds like it could be easily solved by continuous integration. You could set up a job on any of a variety of services that checks that a pull request or other commit is clean when when the filter runs. If it doesn't pass, the code doesn't merge. This is what other projects do for style-related and tidiness issues. Similar approaches can be used in other situations to enforce that all line endings are LF, or whatever your project desires. I don't think it's a good idea to provide Git configuration to end users, even with prompting, since there are many novice users who don't know what the security implications of various config options are. I also personally never would want to be prompted for such a thing, so even if that were a feature, people would turn if off, and you'd be no better off than you were before. -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
Re: problem with not being able to enforce git content filters
On 2018-10-16 02:17 PM, Ævar Arnfjörð Bjarmason wrote: [...] >> We can't use server-side hooks to enforce this because the project is on >> github. > > Ultimately git's a distributed system and we won't ever be able to > enforce that users in their local copies use filters, and they might > edit stuff e.g. in the GitHub UI directly, or with some other git > implementation. In this particular case github won't be a problem, since for the problem to appear it has to be executed on the user side. Editing directly in github UI is not a problem. Just to give a little big more context to the issue in first place. A jupyter notebook is a json file that contains the source code, the outputs from executing that code and the unique user environment bits. We want to "git" the source code but not the rest. Otherwise merging is a hell. Hence the stripping. There are other approaches to solve this problem, besides stripping, but they all require some kind of pre-processing before committing the file. > So if you have such special needs maybe consider hosting your own setup > where you can have pre-receive hooks, or within GitHub e.g. enforce that > all things must go through merge requests, and have some robot that > checks that the content to be merged has gone through filters before > being approved. Yes, that would be ideal. But I doubt this is going to happen, I'm just a contributing developer, not the creator/owner of the project. And as I said this affects anybody who collaborates on jupyter notebooks, not just in our project. I think there are several millions of them on github. > *Picks up flag*. For the purposes of us trying to understand this report > it would be really useful to boil what's happening down to some > step-by-step reproducible recipe. > > I.e. with some dummy filter configured & not configured how does "git > pull" end up breaking, and in this case you're alluding to git simply > forgetting config, how would that happen? The problem of 'git pull' and 'git status' and 'git stash' is presented in details here: https://stackoverflow.com/questions/51883227/git-pull-stash-conflicts-with-a-git-filter and more here: https://github.com/kynan/nbstripout/issues/65 https://github.com/jupyter/nbdime/issues/410#issuecomment-412999758 The problem of configuration disappearing I sadly have no input. It doesn't happen to me, and all I get from others is that it first works, and then it doesn't. Perhaps it has something to do with some of them using windows. I don't know, sorry. >> The bottom line it sucks and I hope that you can help with offering a >> programmatic solution, rather than recommending creating a police >> department. > > I think it would be great to have .gitconfig in-repo support, but a lot > of security & UI problems need to be surmounted before that would > happen, here's some previous ramblings of mine on that issue: > https://public-inbox.org/git/?q=87zi6eakkt.fsf%40evledraar.gmail.com > > It now occurs to me that a rather minimal proof-of-concept version of > that would be: > > 1. On repository clone, detect if HEAD:.gitconfig exists > > 2. If it does, and we trust $(git config -f -l) > enough, emit some advice() output saying the project suggests > setting config so-and-so, and that you could run the following one > liner(s) to set it if you agree. > > 3. Once we have that we could eventually nudge our way towards > something like what I suggested in the linked threads above, > i.e. consuming some subset of that config directly from the repo's > HEAD:.gitconfig I like it, Ævar! I feel doing the check and prompting the user on the first push/commit after clone would be a better. It'd be too easy for the user to skip that step on git clone. In our particular case we want it where the problem is introduced, which is on commit, and not on clone. I hope it makes sense. -- Stas Bekman <'>< <'>< https://stasosphere.com https://chestofbooks.com https://experientialsexlab.com https://stason.org https://stasosphere.com/experience-life/my-books
Re: problem with not being able to enforce git content filters
On Tue, Oct 16 2018, Stas Bekman wrote: > When a person has a stripped out notebook checked out, when another > person commits un-stripped out notebook, it leads to: invalid `git > status` reports, `git pull` breaks, `git stash` doesn't work, since it > tries to stash using the filters, and `git pull' can never succeed > because it thinks that it'll overwrite the local changes, but `git diff` > returns no changes. Planting a flag here. Let's get to this later. > So the only solution when this happens is to disable the filters, clean > up the mess, re-enable the filters. Many people just make a new clone - > ouch! > > And the biggest problem is that it affects all users who may have the > filters enabled, e.g. because they worked on a PR, and not just devs - > i.e. the repercussions are much bigger than just a few devs affected. > > We can't use server-side hooks to enforce this because the project is on > github. Ultimately git's a distributed system and we won't ever be able to enforce that users in their local copies use filters, and they might edit stuff e.g. in the GitHub UI directly, or with some other git implementation. So if you have such special needs maybe consider hosting your own setup where you can have pre-receive hooks, or within GitHub e.g. enforce that all things must go through merge requests, and have some robot that checks that the content to be merged has gone through filters before being approved. > And the devs honestly try to do their best to remember to configure the > filters, but for some reason they disappear for them, don't ask me why, > I don't know. This is an open source project team, not a work place. *Picks up flag*. For the purposes of us trying to understand this report it would be really useful to boil what's happening down to some step-by-step reproducible recipe. I.e. with some dummy filter configured & not configured how does "git pull" end up breaking, and in this case you're alluding to git simply forgetting config, how would that happen? > There needs to be a way for a project to define content filters w/o > going via user's .gitconfig configuration, since due to git's security > it can't be distributed to all users and must be enabled manually by > each user, which is just not the right solution in this case. > > Of course, I'm open to other suggestions that may help this issue. > > "Tell your developers they must configure the filters" is not it - I > tried it for a long time and in vain. If you look at our install > instructions: https://github.com/fastai/fastai#developer-install > > git clone https://github.com/fastai/fastai > cd fastai > tools/run-after-git-clone > > It already includes an instruction to run a script which enables the > filters, but this doesn't seem to help (and no, it's not a problem with > the script). The devs report that the configuration is there for a > while, and then suddenly it is not there, I don't know why. Perhaps they > make a new clone and forget to re-enable the filters, perhaps they > disable them to clean up and forget to reenable them, I can't tell. FWIW I tried following these on my Debian install and both on python2 and python3 I get either syntax errors or a combo of that and a missing pathlib from that script. I don't know Python well enough to debug it. Maybe that's part of the issue? > The bottom line it sucks and I hope that you can help with offering a > programmatic solution, rather than recommending creating a police > department. I think it would be great to have .gitconfig in-repo support, but a lot of security & UI problems need to be surmounted before that would happen, here's some previous ramblings of mine on that issue: https://public-inbox.org/git/?q=87zi6eakkt.fsf%40evledraar.gmail.com It now occurs to me that a rather minimal proof-of-concept version of that would be: 1. On repository clone, detect if HEAD:.gitconfig exists 2. If it does, and we trust $(git config -f -l) enough, emit some advice() output saying the project suggests setting config so-and-so, and that you could run the following one liner(s) to set it if you agree. 3. Once we have that we could eventually nudge our way towards something like what I suggested in the linked threads above, i.e. consuming some subset of that config directly from the repo's HEAD:.gitconfig
problem with not being able to enforce git content filters
Hi, TL;DR Our open source project dev team has a continuous problem with git content filters, because developers don't always have them configured. We need a way for git to support content filters w/o using user's .gitconfig. Otherwise it leads to an inconsistent behavior and messed up git checkouts. = Full story: We use a version of the nbstripout content filter (https://github.com/kynan/nbstripout), which removes user-specific information from the jupyter notebooks during commit. But the problem would be the same with any one way clean filter. First the setup: https://github.com/kynan/nbstripout#manual-filter-installation = Set up a git filter using nbstripout as follows: git config filter.nbstripout.clean '/path/to/nbstripout' git config filter.nbstripout.smudge cat git config filter.nbstripout.required true Create a file .gitattributes or .git/info/attributes with: *.ipynb filter=nbstripout Apply the filter for git diff of *.ipynb files: git config diff.ipynb.textconv '/path/to/nbstripout -t' In file .gitattributes or .git/info/attributes add: *.ipynb diff=ipynb = The problem is that it can't be enforced. When it's not enforced, we end up with some devs using it and others don't, or more often is the same dev sometimes doesn't have it configured. When a person has a stripped out notebook checked out, when another person commits un-stripped out notebook, it leads to: invalid `git status` reports, `git pull` breaks, `git stash` doesn't work, since it tries to stash using the filters, and `git pull' can never succeed because it thinks that it'll overwrite the local changes, but `git diff` returns no changes. So the only solution when this happens is to disable the filters, clean up the mess, re-enable the filters. Many people just make a new clone - ouch! And the biggest problem is that it affects all users who may have the filters enabled, e.g. because they worked on a PR, and not just devs - i.e. the repercussions are much bigger than just a few devs affected. We can't use server-side hooks to enforce this because the project is on github. And the devs honestly try to do their best to remember to configure the filters, but for some reason they disappear for them, don't ask me why, I don't know. This is an open source project team, not a work place. You can see some related complaints here: https://github.com/kynan/nbstripout/issues/65#issuecomment-430346894 https://stackoverflow.com/questions/51883227/git-pull-stash-conflicts-with-a-git-filter and I can find you a whole bunch more if you need more evidence. == Proposed solution: There needs to be a way for a project to define content filters w/o going via user's .gitconfig configuration, since due to git's security it can't be distributed to all users and must be enabled manually by each user, which is just not the right solution in this case. Of course, I'm open to other suggestions that may help this issue. "Tell your developers they must configure the filters" is not it - I tried it for a long time and in vain. If you look at our install instructions: https://github.com/fastai/fastai#developer-install git clone https://github.com/fastai/fastai cd fastai tools/run-after-git-clone It already includes an instruction to run a script which enables the filters, but this doesn't seem to help (and no, it's not a problem with the script). The devs report that the configuration is there for a while, and then suddenly it is not there, I don't know why. Perhaps they make a new clone and forget to re-enable the filters, perhaps they disable them to clean up and forget to reenable them, I can't tell. The bottom line it sucks and I hope that you can help with offering a programmatic solution, rather than recommending creating a police department. Thank you for listening. -- Stas Bekman <'>< <'>< https://stasosphere.com https://chestofbooks.com https://experientialsexlab.com https://stason.org https://stasosphere.com/experience-life/my-books