Re: [PATCH v2 0/9] Teach 'run' perf script to read config files
On 26 September 2017 at 16:40, Christian Couder wrote:
> On Sun, Sep 24, 2017 at 9:59 AM, Junio C Hamano wrote:
>> Christian Couder writes:
>>
>>> (It looks like smtp.gmail.com isn't working anymore for me, so I am
>>> trying to send this using Gmail for the cover letter and Submitgit for
>>> the patches.)
>>
>> SubmitGit may want to learn the "change the timestamps of the
>> individual patches by 1 second" trick from "git send-email" to help
>> threading (you can view inbox/comp.version-control.git/ group over
>> nntp and tell your newsreader to sort-by-date).
>
> Roberto is now in CC. I will let him answer about that.

I had a quick look at git-send-email.perl, I see the trick is the
`time++` one introduced with https://github.com/git/git/commit/a5370b16 -
seems reasonable!

SubmitGit makes all emails in-reply-to the initial email, which I think
is correct behaviour, but I can see that offsetting the times would
probably give a more reliable sorting in a newsreader.

Unfortunately the documentation for AWS Simple Email Service (SES) says:

"Note: Amazon SES overrides any Date header you provide with the time
that Amazon SES accepts the message."
http://docs.aws.amazon.com/ses/latest/DeveloperGuide/header-fields.html

...so the only way SubmitGit can offset the times is to literally delay
the sending of the emails, which is a bit unfortunate for patchbombs
more than a few dozen commits long!

I'll take a further look at this when I get a bit more free time.

Roberto
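As a rough illustration of the timestamp-offset trick discussed in this message, the following shell sketch prints one epoch timestamp per patch, each a second later than the previous one. The function name `patch_timestamps` and its arguments are hypothetical and are not taken from git-send-email.perl or SubmitGit; it only shows the ordering idea, and it would not help a sender whose mail service overrides the Date header, as described for Amazon SES.

```shell
# Hypothetical helper: print one epoch timestamp per patch, each one second
# later than the last, so that a sort-by-date view keeps the series in order.
patch_timestamps() {
    base=$1    # epoch seconds to use for the first message
    count=$2   # number of messages in the series
    i=0
    while [ "$i" -lt "$count" ]; do
        echo $((base + i))
        i=$((i + 1))
    done
}
```

For example, `patch_timestamps "$(date +%s)" 9` would give nine consecutive values that could be formatted into Date headers.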
[PATCH 1/2] Partition SubmittingPatches doc into two files
No editorial changes in this commit, the text that is transferred into the second file is unchanged apart from minor chunk re-ordering. The split is based on: * Information needed for all users, whether using `git send-email` or submitGit (ie good commit practice, mailing list etiquette) * Information needed just for `git send-email`/MUA users (generating the right kind of diff, avoid MIME & PGP, send-email & MUA specific hints) --- Documentation/SubmittingPatches | 137 - Documentation/SubmittingPatchesByMUA | 142 +++ 2 files changed, 142 insertions(+), 137 deletions(-) create mode 100644 Documentation/SubmittingPatchesByMUA diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index 98fc4cc..6dca41d 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches @@ -119,11 +119,6 @@ archive, summarize the relevant points of the discussion. (3) Generate your patch using Git tools out of your commits. -Git based diff tools generate unidiff which is the preferred format. - -You do not have to be afraid to use -M option to "git diff" or -"git format-patch", if your patch involves file renames. The -receiving end can handle them just fine. Please make sure your patch does not add commented out debugging code, or include any extra files which do not relate to what your patch @@ -136,11 +131,6 @@ that is fine, but please mark it as such. (4) Sending your patches. -Learn to use format-patch and send-email if possible. These commands -are optimized for the workflow of sending patches, avoiding many ways -your existing e-mail client that is optimized for "multipart/*" mime -type e-mails to corrupt and render your patches unusable. - People on the Git mailing list need to be able to read and comment on the changes you are submitting. It is important for a developer to be able to "quote" your changes, using standard @@ -148,18 +138,8 @@ e-mail tools, so that they may comment on specific portions of your code. 
For this reason, each patch should be submitted "inline" in a separate message. -Multiple related patches should be grouped into their own e-mail -thread to help readers find all parts of the series. To that end, -send them as replies to either an additional "cover letter" message -(see below), the first patch, or the respective preceding patch. -If your log message (including your name on the -Signed-off-by line) is not writable in ASCII, make sure that -you send off a message in the correct encoding. -WARNING: Be wary of your MUAs word-wrap -corrupting your patch. Do not cut-n-paste your patch; you can -lose tabs that way if you are not careful. It is a common convention to prefix your subject line with [PATCH]. This lets people easily distinguish patches from other @@ -187,31 +167,6 @@ an explanation of changes between each iteration can be kept in Git-notes and inserted automatically following the three-dash line via `git format-patch --notes`. -Do not attach the patch as a MIME attachment, compressed or not. -Do not let your e-mail client send quoted-printable. Do not let -your e-mail client send format=flowed which would destroy -whitespaces in your patches. Many -popular e-mail applications will not always transmit a MIME -attachment as plain text, making it impossible to comment on -your code. A MIME attachment also takes a bit more time to -process. This does not decrease the likelihood of your -MIME-attached change being accepted, but it makes it more likely -that it will be postponed. - -Exception: If your mailer is mangling patches then someone may ask -you to re-send them using MIME, that is OK. - -Do not PGP sign your patch, at least for now. Most likely, your -maintainer or other people on the list would not have your PGP -key and would not bother obtaining it anyway. 
Your patch is not -judged by who you are; a good patch from an unknown origin has a -far better chance of being accepted than a patch from a known, -respected origin that is done poorly or does incorrect things. - -If you really really really really want to do a PGP signed -patch, format it as "multipart/signed", not a text/plain message -that starts with '-BEGIN PGP SIGNED MESSAGE-'. That is -not a text/plain, it's something else. Send your patch with "To:" set to the mailing list, with "cc:" listing people who are involved in the area you are touching (the output from @@ -370,95 +325,3 @@ Know the status of your patch after submission entitled "What's cooking in git.git" and "What's in git.git" giving the status of various proposed changes. - -MUA specific hints - -Some of patches I receive or pick up from the list share common -patterns of breakage. Please make sure your MUA is set up -properly not to corrupt whitespaces. - -See the DISCUSSION section of git-format-patch(1) for hints on
[PATCH 2/2] Add submitGit patch-submission information
Most of the guidance on how to use submitGit will stay with the tool itself, so the edits here are mostly to make the choice clear to users. Because generation of patches is quite different for MUA-users and submitGit users, I've merged section 3 and 4 together: section 3 - 'Generate your patch using Git tools out of your commits.' + section 4 - 'Sending your patches.' = new section 3 - 'Generate and send your patch to the Git mailing list' I've edited the text of old section 3 to make it more concise (using 'make sure' for emphasis just once before presenting the requirements list). --- Documentation/SubmittingPatches | 44 +++-- 1 file changed, 29 insertions(+), 15 deletions(-) diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index 6dca41d..9735236 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches @@ -117,29 +117,43 @@ without external resources. Instead of giving a URL to a mailing list archive, summarize the relevant points of the discussion. -(3) Generate your patch using Git tools out of your commits. - - -Please make sure your patch does not add commented out debugging code, -or include any extra files which do not relate to what your patch -is trying to achieve. Make sure to review -your patch after generating it, to ensure accuracy. Before -sending out, please make sure it cleanly applies to the "master" -branch head. If you are preparing a work based on "next" branch, -that is fine, but please mark it as such. - - -(4) Sending your patches. +(3) Generate and send your patch to the Git mailing list People on the Git mailing list need to be able to read and comment on the changes you are submitting. It is important for a developer to be able to "quote" your changes, using standard e-mail tools, so that they may comment on specific portions of your code. For this reason, each patch should be submitted -"inline" in a separate message. +"inline" (not as an attachment) in a separate message. 
+ +There can be unexpected problems in sending patches: + + . Webmail clients like Gmail generally corrupt whitespace in patches. + . messages using HTML-formatting (used by default in many webmail +clients) is automatically rejected by the Git mailing list server. + +Because of these factors, it's recommended that you use one of these +specific methods to generate and send your patchs: + + - Generate mail-ready patch files using "git format-patch" and +send them using "git send-email" to the Git mailing list. +See SubmittingPatchesByMUA for further details. + - Create a pull request on https://github.com/git/git and +use https://submitgit.herokuapp.com/ to send it as a patch series +to the mailing list. Note that the PR is just the place where your +patch is born - discussion of the patch should still take place on +the Git mailing list. +Please make sure to review your patch before sending it, to ensure that +it: + . accurately reflects the change you want to make + . does not add commented-out debugging code, or include any extra +files which do not relate to what your patch is trying to achieve. + . cleanly applies to the "master" branch head. If you are preparing +a work based on "next" branch, that is fine, but please mark it as +such. It is a common convention to prefix your subject line with [PATCH]. This lets people easily distinguish patches from other @@ -186,7 +200,7 @@ patch. *2* The mailing list: git@vger.kernel.org -(5) Sign your work +(4) Sign your work To improve tracking of who did what, we've borrowed the "sign-off" procedure from the Linux kernel project on patches -- https://github.com/git/git/pull/223 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] commit: add a commit.verbose config variable
On 11 March 2016 at 05:44, Eric Sunshine wrote:
> On Fri, Mar 11, 2016 at 05:45:27AM +0530, Pranit Bauva wrote:
>> Actually I am sending the patches with submitGit herokuapp because my
>> institute proxy does not allow IMAP/POP3 connections.

Really glad to hear this is helping you Pranit - I hadn't even thought
of the blocked IMAP/POP3 connections problem, I'm not sure what other
method you could have easily used to get round this.

> That's unfortunate. Your separate "cover letter" often arrives hours
> later than the patch itself. Perhaps Roberto can comment on submitGit
> and per-patch commentary.

This sounds like an improvement I need to make to submitGit, I've
created an issue here:

https://github.com/rtyley/submitgit/issues/30
Re: [PATCH] Update diff-highlight
On 22 February 2016 at 04:49, Eric Sunshine wrote:
> On Sun, Feb 21, 2016 at 11:14 PM, Peter Dave Hello wrote:
>> From: Peter Dave Hello
>
> This "From:" line looks suspiciously incorrect. If anything, you'd
> probably want to drop the line altogether or use:
>
> From: Peter Dave Hello

Peter's commit (https://github.com/git/git/commit/15415c6e) had an
author of 'peterdavehe...@users.noreply.github.com' (perhaps because the
commit was generated through GitHub's interface?), and submitGit added
it as an in-body 'From: ' line because it differed from the address used
to send the email (h...@peterdavehello.org - submitGit always uses the
user's primary-email-address-in-GitHub to send the email).

A 'noreply' address is obviously not wanted in this context though, so
I've updated submitGit to disregard them when deciding whether or not to
generate an in-body 'From: ' header:

https://github.com/rtyley/submitgit/pull/29

Roberto
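The decision described in this message can be sketched in shell as follows. This is only an illustration under stated assumptions: the function name `needs_in_body_from` is hypothetical, the check simply matches the `users.noreply.github.com` domain seen above, and it is not the code from the submitGit pull request.

```shell
# Hypothetical check: succeed (exit 0) when an in-body "From:" line is wanted,
# i.e. the commit author differs from the sending address and the author
# address is not a GitHub noreply address.
needs_in_body_from() {
    author=$1   # email address recorded on the commit
    sender=$2   # email address the message is sent from
    case "$author" in
        *@users.noreply.github.com) return 1 ;;
    esac
    [ "$author" != "$sender" ]
}
```

A caller might use it as `needs_in_body_from "$author" "$sender" && printf 'From: %s\n\n' "$author"`.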
Re: [PATCH v1] travis-ci: override CFLAGS properly, add -Wdeclaration-after-statement
On 9 February 2016 at 18:42, Junio C Hamano wrote:
> Lars Schneider writes:
>> Jeff Merkey made me aware of http://kernelnewbies.org/FirstKernelPatch [2]
>> where I found checkpatch.pl [3]. Would it make sense to check all commits
>> that are not in next/master/maint with this script on Travis-CI?
>
> That does not help very much. These changes are already shown to
> people and dirtied their eyes, and most likely I've already have
> wasted time tweaking the glitches out locally. The damage has
> already been done.
>
> It would make a lot of sense if the checkpatch is called inside
> Roberto Tyley's "pull-request-to-patch-submission" thing, though.

I've not personally run checkpatch.pl (as Peff mentioned, it's not
actually a documented part of the Git project's recommended contribution
workflow) - I'm still trying to understand whether it will restrict its
errors to just the things that are introduced in a patch, or if it will
indiscriminately mention existing problems too (of which I guess there
are many already present in the live Git codebase?).

If it mentions _existing_ problems, I wouldn't personally want it in any
automated flow until it can be tuned to find the trees of master/maint
totally clean. At that point it could be added to the Travis build, and
GitHub would automatically reflect the Travis status in any git/git PR.

I like the idea of giving helpful guidance to users on how to make their
patches cleaner - I'm not that enthusiastic about submitGit invoking the
checkpatch.pl script directly at this point, given that it lives in a
separate project (the linux kernel) and the version Junio uses is
patched off _that_ - I'm lazy enough to not want to try to get that all
to work reliably on a little transient Heroku box.
Re: [PATCH 2/2] stash: use "stash--helper"
On 28 January 2016 at 21:41, Stefan Beller wrote:
> On Thu, Jan 28, 2016 at 1:25 PM, Matthias Aßhauer wrote:
>>>> https://github.com/git/git/pull/191
>>>
>>> Oh I see you're using the pull-request to email translator, cool!

Yay!

>> Yes, I did. It definitly makes things easier if you are not used to mailing
>> lists, but it was also a bit of a kerfuffle. I tried to start working on
>> coverletter support, but I couldn't get it to accept the amazon SES
>> credentials I provided. I ended up manually submiting the coverletter. It
>> also didn't like my name.

Apologies for that - https://github.com/rtyley/submitgit/pull/26 has
just been deployed, which should resolve the encoding for non-US ASCII
characters - if you feel like submitting another patch, and want to put
the eszett back into your GitHub account display name, I'd be interested
to know how that goes.

> Not sure if Roberto, the creator of that tool, follows the mailing
> list. I cc'd him.

I don't closely follow the mailing list, so thanks for the cc!

Roberto
Re: [RFC/PATCH v1] Add Travis CI support
On 28 September 2015 at 19:47, Junio C Hamano wrote:
> I won't enable it on github.com:gitster/git anyway, so I do not
> think that is a concern. I thought what people are talking about
> was to add it on github.com:git/git, but have I been misreading the
> thread? I do not even own the latter repository (I only can push
> into it).

I was momentarily surprised to hear that Junio doesn't own
github.com/git/git but I had a quick look at the github.com/git
organisation, and it turns out that Peff and Scott Chacon are the
current owners - so at the moment I think they're the only ones who
could switch on the GitHub webhook to hit Travis.

For what it's worth, I'd love to see Travis CI - or any form of CI -
running for the core Git project. It doesn't require giving write access
to Travis, and beyond the good reasons given by Lars, I'm also
personally interested because it opens up the possibility of some useful
enhancements to the submitGit flow - so that you can't send email to the
list without knowing you've broken tests first.

Regarding Luke's concerns about excess emails coming from CI, default
Travis behaviour is for emails to be sent to the committer and author,
but only if they have write access to the repository the commit was
pushed to:

http://docs.travis-ci.com/user/notifications/#How-is-the-build-email-receiver-determined%3F

If Travis emails do become problematic, you can disable them completely
by adding 2 lines of config to the .travis.yml:

http://docs.travis-ci.com/user/notifications/#Email-notifications

Given this, enabling Travis CI for git/git seems pretty low risk, are
there any strong objections to it happening?
Re: [PATCH 1/2] rebase -i: demonstrate incorrect behavior of post-rewrite
On 22 May 2015 at 16:59, Junio C Hamano gits...@pobox.com wrote:
> Roberto, isn't your threading of multi-patch series busted? Why is
> 1/2 a follow-up to 2/2? Do you have a time-machine ;-)?

Oh, embarrassing, I better destroy the time-machine:

https://github.com/rtyley/submitgit/pull/5

This was due to me not realising that the GitHub API returns commit
lists for PRs in reverse-chronological order... thanks for pointing that
out!
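The ordering fix described in this message amounts to reversing the commit list before numbering the patches. A minimal shell sketch, assuming the commit subjects arrive one per line, newest first (the function names `oldest_first` and `number_patches` are hypothetical, and submitGit itself is not written in shell):

```shell
# Reverse stdin so that a newest-first commit list becomes oldest-first.
oldest_first() {
    awk '{ lines[NR] = $0 } END { for (i = NR; i >= 1; i--) print lines[i] }'
}

# Prefix each line with its position in the series, e.g. "1/2 <subject>".
number_patches() {
    awk '{ lines[NR] = $0 }
         END { for (i = 1; i <= NR; i++) printf "%d/%d %s\n", i, NR, lines[i] }'
}
```

Piping a newest-first list through `oldest_first | number_patches` numbers the earliest commit as 1/N, so later patches follow it in the thread.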
[Announce] submitGit for patch submission (was Diffing submodule does not yield complete logs)
On Tuesday, 19 May 2015, Stefan Beller sbel...@google.com wrote:
> On Tue, May 19, 2015 at 12:29 PM, Robert Dailey rcdailey.li...@gmail.com wrote:
>> How do you send your patches inline?

[snip]

> This workflow discussion was a topic at the GitMerge2015 conference,
> and there are essentially 2 groups, those who know how to send email
> and those who complain about it. A solution was agreed on by nearly all
> of the contributors. It would be awesome to have a git-to-email proxy,
> such that you could do a git push proxy master:refs/for/mailinglist
> and this proxy would convert the push into sending patch series to the
> mailing list. It could even convert the following discussion back into
> comments (on Github?) but as a first step we'd want to try out a one
> way proxy.
>
> Unfortunately nobody stepped up to actually do the work, yet :(

Hello, I'm stepping up to do that work :) Or at least, I'm implementing
a one-way GitHub PR -> Mailing list tool, called submitGit:

https://submitgit.herokuapp.com/

Here's what a user does:

* creates a PR on https://github.com/git/git
* logs into https://submitgit.herokuapp.com/ with GitHub auth
* selects their PR on https://submitgit.herokuapp.com/git/git/pulls
* gets submitGit to email the PR as patches to themselves, in order to
  check it looks ok
* when they're ready, gets submitGit to send it to the mailing list on
  their behalf

All discussion of the patch *stays* on the mailing list - I'm not
attempting to change anything about the Git community process, other
than make it easier for a wider group of people to submit patches to the
list.

For hard-core contributors to Git, I'd imagine that git format-patch and
send-email remain the fastest way to do their work. But those tools are
_unfamiliar to the majority of Git users_ - so submitGit aims to cater
to those users, because they definitely have valuable contributions to
make, which would be tragic to throw away.
I've been working on submitGit in my spare time for the past few weeks,
and there are still features I plan to add (like guiding the user to
more 'correct' word wrapping, sign-off, etc), but given this discussion,
I wanted to chime in and let people know what's here so far. It would be
great if people could take the time to explore the tool (you don't have
to raise a git/git PR in order to try sending one *to yourself*, for
instance) and give feedback on list, or in GitHub issues:

https://github.com/rtyley/submitgit/issues

I've been lucky enough to discuss the ideas around submitGit with a few
people at the Git-Merge conf, so thanks to Peff, Thomas Ferris
Nicolaisen, and Emma Jane Hogbin Westby for listening to me (not to
imply their endorsement of what I've done, just thanks for talking about
it!).

Roberto
Re: Diffing submodule does not yield complete logs for merge commits
On Tuesday, 19 May 2015, Stefan Beller sbel...@google.com wrote:
> On Tue, May 19, 2015 at 12:29 PM, Robert Dailey rcdailey.li...@gmail.com wrote:
>> How do you send your patches inline?
>
> This workflow discussion was a topic at the GitMerge2015 conference,
> and there are essentially 2 groups, those who know how to send email
> and those who complain about it. A solution was agreed on by nearly all
> of the contributors. It would be awesome to have a git-to-email proxy,
> such that you could do a git push proxy master:refs/for/mailinglist
> and this proxy would convert the push into sending patch series to the
> mailing list. It could even convert the following discussion back into
> comments (on Github?) but as a first step we'd want to try out a one
> way proxy.
>
> Unfortunately nobody stepped up to actually do the work, yet :(

I've replied to this on a separate announcement thread on the Git
mailing list here:

http://thread.gmane.org/gmane.comp.version-control.git/269699

...I've created a new tool called submitGit, which aims to help.

>> I am willing to review the typical workflow for contributing via git
>> on mailing lists but I haven't seen any informative reading material
>> on this. I just find using command line to email patches and dealing
>> with other issues not worth the trouble. Lack of syntax highlighting,
>> lack of monospace font, the fact that I'm basically forced to install
>> mail client software just to contribute a single git patch.

I'd be interested to know what you think!

Roberto
Re: filter-branch performance
On 9 December 2014 at 18:59, Jeff King p...@peff.net wrote:
> On Tue, Dec 09, 2014 at 07:52:33PM +0100, Henning Moll wrote:
>> I assume that there is a lot of process forking going on. Could that be
>> the cause?
>
> Yes. filter-branch is a shell scripts, and it is probably running
> multiple git commands per commit it is filtering.
>
>> Any ideas how to further improve?

Depending on how much time you can sink into improving the performance
(versus just allowing the process to run to completion), you could also
look into a non-forking solution, as well as not bothering to load the
commit trees.

To me non-forking means putting everything into the JVM by using JGit,
like the BFG does, though libgit2 might also be an option. Changing the
BFG's code to do the transformation in your script is absolutely
trivial - define a commit-node cleaner like this:

  object SetCommitterToAuthor extends CommitNodeCleaner {
    override def fixer(kit: CommitNodeCleaner.Kit) = c =>
      c.copy(committer = c.author) // PersonIdent class holds name, email and time
  }

...trivial if you don't mind compiling Scala with SBT that is, and I'm
sure some people do! A DSL for non-Scala people to define their own BFG
scripts would be good, I must get on that some day.

The BFG is generally faster than filter-branch for 3 reasons:

 1. No forking - everything stays in the JVM process
 2. Embarrassingly parallel algorithm makes good use of multi-core machines
 3. Memoization means no Git object (file or folder) is cleaned more than once

In the case of your problem, only the first factor will be noticeably
helpful. Unfortunately commits do need to be cleaned sequentially, as
their hashes depend on the hashes of their parents, and filter-branch
doesn't clean /commits/ more than once, the way it does with files or
folders - so the last 2 reasons in the list won't be significant. For
your specific use case tho', the fact that BFG doesn't load the file
tree at all unless it needs to clean it will also help.
I decided to knock up an egregious hack in the BFG to see what
performance would be like. I ran it against a fairly large repo
(https://github.com/bfg-repo-cleaner-demos/intellij-community-original),
100k commits, stored in /dev/shm, and used the SetCommitterToAuthor code
above. The BFG run completed in 31.7 seconds, you can see the resulting
repo here:

https://github.com/rtyley/intellij-community-set-committer-to-author

I started running the same test some time ago using filter-branch,
unfortunately that test has not completed yet - the BFG appears to be
substantially faster.

Before:

$ git cat-file -p b02bf46c4e93c2e8570910cdd68eb6f4ce21ff81
tree 7a412e49ecdbd966d7efe5fe746ff3ea3b6067d1
parent 8794219e3e84aed3cc8af926ffd74beafa51fb6b
author peter pe...@jetbrains.com 1370854045 +0200
committer peter pe...@jetbrains.com 1370854098 +0200

After:

$ git cat-file -p 3adb7b2a5c87320a5a028b6a59a7132c75a6e91c
tree 7a412e49ecdbd966d7efe5fe746ff3ea3b6067d1
parent 5efcdb551789b0d0bb541de9325f09521c5fbcb6
author peter pe...@jetbrains.com 1370854045 +0200
committer peter pe...@jetbrains.com 1370854045 +0200 <- time fixed

The relevant code is in:

https://github.com/rtyley/bfg-repo-cleaner/compare/set-committer-to-author
Re: filter-branch performance
On 10 December 2014 at 14:37, Jeff King p...@peff.net wrote:
> On Wed, Dec 10, 2014 at 02:18:24PM +, Roberto Tyley wrote:
>> object SetCommitterToAuthor extends CommitNodeCleaner {
>>   override def fixer(kit: CommitNodeCleaner.Kit) = c =>
>>     c.copy(committer = c.author) // PersonIdent class holds name, email and time
>> }
>
> Thanks. I _almost_ mentioned BFG in the original email, but I didn't
> think it could do arbitrary fixes like this. Can you monkey-patch in
> arbitrary code, or do you have to rebuild all of BFG to include the
> snippet above?

Well, I publish a bfg-library jar to Maven Central, so you don't need to
rebuild that:

http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22bfg-library_2.11%22

...in principle you can write a Java/Groovy/whatever project that calls
that jar (your entry point would be
com.madgag.git.bfg.cleaner.RepoRewriter) - tho' to be honest, I can't
swear to how /friendly/ the API would be to call from non-Scala-land, as
I haven't tried it.

Incidentally, if people want to try compiling this monkey-patched BFG at
home, this is how you'd do it:

* Install SBT - http://www.scala-sbt.org/download.html (or 'brew install
  sbt' for Mac OS X)
* git clone https://github.com/rtyley/bfg-repo-cleaner.git --branch set-committer-to-author
* cd bfg-repo-cleaner
* sbt bfg/run --no-blob-protection

There will be a lot of automated downloading of dependencies, and
compilation will be slow the first time around, but at least there
aren't that many steps. I do realise that being Scala/JVM based makes
working on the BFG a bit of a specialist activity at the moment!

>> A DSL for non-Scala people to define their own BFG scripts would be
>> good, I must get on that some day.
>
> That would be cool. Even if the DSL was just Java, if you could do
> something like:
>
>   vi fix.java
>   javac fix.java
>   bfg --filter=fix.class
>
> that would be very useful (and I am probably showing my lack of Java
> chops by getting the compilation command or filenames wrong :) ).

Your syntax is right :) I'll give it some thought.
>> I started running the same test some time ago using filter-branch,
>> unfortunately that test has not completed yet - the BFG appears to be
>> substantially faster.
>
> No fair if you didn't run filter-branch on a PC and BFG on a Raspberry
> Pi. You have to give us a fighting chance. :)

I guess I made that rod for my own back :) http://youtu.be/Ir4IHzPhJuI
for those who haven't seen it.
Re: Blobs not referenced by file (anymore) are not removed by GC
On 10 December 2014 at 16:07, Junio C Hamano gits...@pobox.com wrote:
> Jeff King p...@peff.net writes:
>>>   git reflog expire --expire=now --all
>>>   git gc --prune=now --aggressive
>>>
>>> Maybe:
>>>
>>>   git gc --purge
>>
>> Yeah, that is common enough that it might be worthwhile (you probably
>> want --expire-unreachable in the reflog invocation, though).
>
> Also you would not want an unconditional --aggressive.

After a big rewrite deleting files the re-optimisation of --aggressive
can make a big difference to packsize - for instance 1.2GB to 768MB in a
test I just ran - but of course it is *much* slower, so I suspect you're
right about not including it.

I wasn't aware of the '--expire-unreachable=all' switch, though it seems
like a 'milder' version of the '--expire=now' switch? - in that it would
keep reflog entries if they haven't been changed, which is fair enough
and compatible with the 'purge' goal.
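Pulling together the suggestions in this message, the clean-up could be wrapped in a small shell function. This is a sketch only: `git gc --purge` is a proposal in this thread rather than an existing flag, the function name `purge_unreachable` is hypothetical, and the `GIT` override is an assumption added here so the commands can be previewed (for example with `GIT=echo`) before running them for real.

```shell
# Hypothetical wrapper for the proposed "purge" behaviour: expire reflog
# entries for unreachable commits, then prune unreferenced objects now.
# Extra arguments (e.g. --aggressive) are passed through to "git gc", so
# the slower re-optimisation stays opt-in.
purge_unreachable() {
    git_cmd=${GIT:-git}
    "$git_cmd" reflog expire --expire-unreachable=now --all &&
    "$git_cmd" gc --prune=now "$@"
}
```

Run inside a repository, `purge_unreachable` does the quick clean-up and `purge_unreachable --aggressive` adds the repack that shrank the pack in the test mentioned above.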
Re: filter-branch performance
On 10 December 2014 at 16:05, Junio C Hamano gits...@pobox.com wrote:
> Roberto Tyley roberto.ty...@gmail.com writes:
>> The BFG is generally faster than filter-branch for 3 reasons:
>>
>>  1. No forking - everything stays in the JVM process
>>  2. Embarrassingly parallel algorithm makes good use of multi-core machines
>>  3. Memoization means no Git object (file or folder) is cleaned more than once
>>
>> In the case of your problem, only the first factor will be noticeably
>> helpful. Unfortunately commits do need to be cleaned sequentially, as
>> their hashes depend on the hashes of their parents, and filter-branch
>> doesn't clean /commits/ more than once, the way it does with files or
>> folders - so the last 2 reasons in the list won't be significant.
>
> Just this part. If your history is bushy, you should be able to
> rewrite histories of merged branches in parallel up to the point
> they are merged---rewriting of the merge commit of course has to
> wait until all the branches have been rewritten, though.

That's true, and the bfg does take advantage of that parallelism, so as
well as point 1, point 2 will provide some benefit if history is bushy
enough :)
Re: Blobs not referenced by file (anymore) are not removed by GC
On 9 December 2014 at 14:14, Jeff King p...@peff.net wrote:
> On Mon, Dec 08, 2014 at 05:22:23PM +0100, Martin Scherer wrote:
>> # invoke bfg --delete-folders something multiple times with different pattern.
>> # try to cleanup
>> git gc --aggressive --prune=now
>> # big blobs still in history
>> git fsck # no results
>> git fsck --full --unreachable --dangling # no results
>
> Might you still have reflogs pointing to the objects? Try:
>
>   git reflog expire --expire-unreachable=now --all

Yeah, we figured that's what it was!
https://github.com/rtyley/bfg-repo-cleaner/issues/62#issuecomment-66152559

> I also don't know if BFG keeps backup refs around (filter-branch, for
> example, writes a copy of the original refs into refs/original; you
> would want to delete that if you're trying to slim down the repo).

The BFG reports the ref changes to the command line (and outputs a full
list of changed object-ids in
repo-name.git.bfg-report/[datetime]/object-id-map.old-new.txt) but
doesn't keep refs (like refs/original) around because that would get in
the way of the BFG's explicit intended use-case of removing unwanted
data.

Thanks for the object-size checking scripts, very useful.

Roberto
Re: Blobs not referenced by file (anymore) are not removed by GC
On Tuesday, 9 December 2014, Jeff King p...@peff.net wrote:
> I actually think filter-branch's refs/original is a bit outdated at
> this point. The information is there in the reflogs already, and
> dealing with refs/original often causes confusion in my experience.
> It could probably use a git filter-branch --restore or something to
> switch each $ref to $ref@{1} (after making sure that the reflog entry
> was from filter-branch, of course).

Yeah, I'd agree that refs/original can cause confusion.

> Not that I expect you to want to work on filter-branch. :) But maybe
> food for thought for a BFG feature.

I haven't heard much demand for a recover/restore feature on the BFG
(I think by the time people get to the BFG, they're pretty sure they
want to go ahead with the procedure!) but I'll bear it in mind.

Mind you, to make the post-rewrite clean-up easier, I'd be happy to
contribute a patch that gives 'gc' a flag to do the equivalent of:

  git reflog expire --expire=now --all
  git gc --prune=now --aggressive

Maybe:

  git gc --purge

??
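Until something like that exists, the two-step clean-up quoted above can be wrapped in a small script. The following is only a sketch: the function name `purge` and the injectable `run` parameter are illustrative assumptions, and `git gc --purge` itself remains a hypothetical flag from the message, not an existing option.

```python
import subprocess

# The two commands quoted in the message above, as argv lists.
PURGE_COMMANDS = [
    ["git", "reflog", "expire", "--expire=now", "--all"],
    ["git", "gc", "--prune=now", "--aggressive"],
]


def purge(repo_dir, run=subprocess.run):
    """Run both clean-up commands inside repo_dir.

    check=True makes subprocess.run raise on a non-zero exit status,
    so gc is not attempted if expiring the reflog fails.
    """
    for argv in PURGE_COMMANDS:
        run(argv, cwd=repo_dir, check=True)
```

Calling `purge("/path/to/my-repo.git")` on a freshly rewritten repository would expire every reflog entry and then prune the now-unreachable objects. This is destructive, so it only makes sense once the rewritten history has been checked.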
Re: Blobs not referenced by file (anymore) are not removed by GC
Hi Martin,

I'm the developer of the BFG - I'd guess that there probably isn't a
bug for Git developers here, so you might want to open one or more
issues at https://github.com/rtyley/bfg-repo-cleaner/issues, where I'd
be happy to take a look.

best regards,
Roberto

On 8 Dec 2014 16:35, Martin Scherer m.sche...@fu-berlin.de wrote:
> Hi,
>
> after using BFG on a repo given certain directory globs, all of those
> files(names) are gone from history, but can not be collected by
> garbage collection anymore. So the blobs of the underlying files are
> not deleted and only the file names are not associated with the blob
> anymore.
>
> I wonder if I discovered a bug (at least in bfg). But I expect git to
> discover that these blobs are not used in any way (so they have to be
> associated to something, right?)
>
> # invoke bfg --delete-folders something multiple times with different pattern.
> # try to cleanup
> git gc --aggressive --prune=now
> # big blobs still in history
> git fsck # no results
> git fsck --full --unreachable --dangling # no results
>
> to verify if the blobs are still there, see the output of git gc
>
> git verify-pack -v .git/objects/pack/pack-*.idx | egrep "^\w+ blob\W+[0-9]+ [0-9]+ [0-9]+$" | sort -k 3 -n -r > bigobjects.txt
> head bigobjects.txt
> # outputs
> 9451427d7335395779b91864418630d2f0af780a blob 7895212 1869047 7657491
>
> Also if bfg is being told to remove the biggest blob (bfg -B 1) with
> no-blob-protection, it does not succeed in removing it.
>
> --- output of bfg -B 1
> Found 1 blob ids for large blobs - biggest=7895212 smallest=7895212
> BFG aborting: No refs to update - no dirty commits found??
> ---
>
> The repo can be found here.
> https://github.com/marscher/stallone_stale_objects
>
> I will restart all over to cleanup the history, but I guess this might
> be interesting for git developers.
>
> Best,
> Martin
old git documentation pages hosted at kernel.org
The Git documentation pages hosted at kernel.org are a bit over a year
out of date (Last updated 2013-02-15 19:24:31 UTC) - so from around
Git v1.8:

https://www.kernel.org/pub/software/scm/git/docs/

Are they fiddly to update? Should they be updated in celebration of
Git 2.0, or maybe instead redirect to http://git-scm.com/docs ?
[PATCH] Fix documentation AsciiDoc links for external urls
Turns out that putting 'link:' before the 'http' is actually superfluous
in AsciiDoc, as there's already a predefined macro to handle it.

"http, https, [etc] URLs are rendered using predefined inline macros."
http://www.methods.co.nz/asciidoc/userguide.html#_urls

"Hypertext links to files on the local file system are specified
using the link inline macro."
http://www.methods.co.nz/asciidoc/userguide.html#_linking_to_local_documents

Despite being superfluous, the reference implementation of AsciiDoc
tolerates the extra 'link:' and silently removes it, giving a functioning
link in the generated HTML. However, AsciiDoctor (the Ruby implementation
of AsciiDoc used to render the http://git-scm.com/ site) does /not/ have
this behaviour, and so generates broken links, as can be seen here:

http://git-scm.com/docs/git-cvsimport (links to cvs2git and parsecvs)
http://git-scm.com/docs/git-filter-branch (link to The BFG)

It's worth noting that after this change, the html generated by 'make
html' in the git project is identical, and all links still work.

Signed-off-by: Roberto Tyley roberto.ty...@gmail.com
---
 Documentation/git-cvsimport.txt           | 4 ++--
 Documentation/git-filter-branch.txt       | 4 ++--
 Documentation/gitcore-tutorial.txt        | 2 +-
 Documentation/gitcvs-migration.txt        | 2 +-
 Documentation/gitweb.txt                  | 2 +-
 Documentation/technical/http-protocol.txt | 4 ++--
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-cvsimport.txt b/Documentation/git-cvsimport.txt
index 2df9953..260f39f 100644
--- a/Documentation/git-cvsimport.txt
+++ b/Documentation/git-cvsimport.txt
@@ -21,8 +21,8 @@ DESCRIPTION
 *WARNING:* `git cvsimport` uses cvsps version 2, which is considered
 deprecated; it does not work with cvsps version 3 and later. If you are
 performing a one-shot import of a CVS repository consider using
-link:http://cvs2svn.tigris.org/cvs2git.html[cvs2git] or
-link:https://github.com/BartMassey/parsecvs[parsecvs].
+http://cvs2svn.tigris.org/cvs2git.html[cvs2git] or
+https://github.com/BartMassey/parsecvs[parsecvs].
 
 Imports a CVS repository into Git. It will either create a new
 repository, or incrementally import into an existing one.
diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
index 2eba627..09535f2 100644
--- a/Documentation/git-filter-branch.txt
+++ b/Documentation/git-filter-branch.txt
@@ -436,7 +436,7 @@ git-filter-branch allows you to make complex shell-scripted rewrites
 of your Git history, but you probably don't need this flexibility if
 you're simply _removing unwanted data_ like large files or passwords.
 For those operations you may want to consider
-link:http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner],
+http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner],
 a JVM-based alternative to git-filter-branch, typically at least
 10-50x faster for those use-cases, and with quite different
 characteristics:
@@ -455,7 +455,7 @@ characteristics:
   _is_ possible to write filters that include their own parallellism,
   in the scripts executed against each commit.
 
-* The link:http://rtyley.github.io/bfg-repo-cleaner/#examples[command options]
+* The http://rtyley.github.io/bfg-repo-cleaner/#examples[command options]
   are much more restrictive than git-filter branch, and dedicated just
   to the tasks of removing unwanted data- e.g:
   `--strip-blobs-bigger-than 1M`.
diff --git a/Documentation/gitcore-tutorial.txt b/Documentation/gitcore-tutorial.txt
index 058a352..d2d7c21 100644
--- a/Documentation/gitcore-tutorial.txt
+++ b/Documentation/gitcore-tutorial.txt
@@ -1443,7 +1443,7 @@ Although Git is a truly distributed system, it is often
 convenient to organize your project with an informal hierarchy
 of developers. Linux kernel development is run this way. There
 is a nice illustration (page 17, Merges to Mainline) in
-link:http://www.xenotime.net/linux/mentor/linux-mentoring-2006.pdf[Randy Dunlap's presentation].
+http://www.xenotime.net/linux/mentor/linux-mentoring-2006.pdf[Randy Dunlap's presentation].
 
 It should be stressed that this hierarchy is purely *informal*.
 There is nothing fundamental in Git that enforces the chain of
diff --git a/Documentation/gitcvs-migration.txt b/Documentation/gitcvs-migration.txt
index 5ea94cb..5f4e890 100644
--- a/Documentation/gitcvs-migration.txt
+++ b/Documentation/gitcvs-migration.txt
@@ -117,7 +117,7 @@ Importing a CVS archive
 ---
 
 First, install version 2.1 or higher of cvsps from
-link:http://www.cobite.com/cvsps/[http://www.cobite.com/cvsps/] and make
+http://www.cobite.com/cvsps/[http://www.cobite.com/cvsps/] and make
 sure it is in your path. Then cd to a checked out CVS working directory
 of the project you are interested in and run linkgit:git-cvsimport[1]:
 
diff --git a/Documentation/gitweb.txt b/Documentation/gitweb.txt
index cca14b8..cd9c895 100644
--- a/Documentation/gitweb.txt
+++ b
[PATCH] Fix documentation AsciiDoc links for external urls
Turns out that putting 'link:' before the 'http' is actually superfluous
in AsciiDoc, as there's already a predefined macro to handle it.

"http, https, [etc] URLs are rendered using predefined inline macros."
http://www.methods.co.nz/asciidoc/userguide.html#_urls

"Hypertext links to files on the local file system are specified
using the link inline macro."
http://www.methods.co.nz/asciidoc/userguide.html#_linking_to_local_documents

Despite being superfluous, the reference implementation of AsciiDoc
tolerates the extra 'link:' and silently removes it, giving a functioning
link in the generated HTML. However, AsciiDoctor (the Ruby implementation
of AsciiDoc used to render the http://git-scm.com/ site) does /not/ have
this behaviour, and so generates broken links, as can be seen here:

http://git-scm.com/docs/git-cvsimport (links to cvs2git and parsecvs)
http://git-scm.com/docs/git-filter-branch (link to The BFG)

It's worth noting that after this change, the html generated by 'make
html' in the git project is identical, and all links still work.
---
 Documentation/git-cvsimport.txt           | 4 ++--
 Documentation/git-filter-branch.txt       | 4 ++--
 Documentation/gitcore-tutorial.txt        | 2 +-
 Documentation/gitcvs-migration.txt        | 2 +-
 Documentation/gitweb.txt                  | 2 +-
 Documentation/technical/http-protocol.txt | 4 ++--
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-cvsimport.txt b/Documentation/git-cvsimport.txt
index 2df9953..260f39f 100644
--- a/Documentation/git-cvsimport.txt
+++ b/Documentation/git-cvsimport.txt
@@ -21,8 +21,8 @@ DESCRIPTION
 *WARNING:* `git cvsimport` uses cvsps version 2, which is considered
 deprecated; it does not work with cvsps version 3 and later. If you are
 performing a one-shot import of a CVS repository consider using
-link:http://cvs2svn.tigris.org/cvs2git.html[cvs2git] or
-link:https://github.com/BartMassey/parsecvs[parsecvs].
+http://cvs2svn.tigris.org/cvs2git.html[cvs2git] or
+https://github.com/BartMassey/parsecvs[parsecvs].
 
 Imports a CVS repository into Git. It will either create a new
 repository, or incrementally import into an existing one.
diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
index 2eba627..09535f2 100644
--- a/Documentation/git-filter-branch.txt
+++ b/Documentation/git-filter-branch.txt
@@ -436,7 +436,7 @@ git-filter-branch allows you to make complex shell-scripted rewrites
 of your Git history, but you probably don't need this flexibility if
 you're simply _removing unwanted data_ like large files or passwords.
 For those operations you may want to consider
-link:http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner],
+http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner],
 a JVM-based alternative to git-filter-branch, typically at least
 10-50x faster for those use-cases, and with quite different
 characteristics:
@@ -455,7 +455,7 @@ characteristics:
   _is_ possible to write filters that include their own parallellism,
   in the scripts executed against each commit.
 
-* The link:http://rtyley.github.io/bfg-repo-cleaner/#examples[command options]
+* The http://rtyley.github.io/bfg-repo-cleaner/#examples[command options]
   are much more restrictive than git-filter branch, and dedicated just
   to the tasks of removing unwanted data- e.g:
   `--strip-blobs-bigger-than 1M`.
diff --git a/Documentation/gitcore-tutorial.txt b/Documentation/gitcore-tutorial.txt
index 058a352..d2d7c21 100644
--- a/Documentation/gitcore-tutorial.txt
+++ b/Documentation/gitcore-tutorial.txt
@@ -1443,7 +1443,7 @@ Although Git is a truly distributed system, it is often
 convenient to organize your project with an informal hierarchy
 of developers. Linux kernel development is run this way. There
 is a nice illustration (page 17, Merges to Mainline) in
-link:http://www.xenotime.net/linux/mentor/linux-mentoring-2006.pdf[Randy Dunlap's presentation].
+http://www.xenotime.net/linux/mentor/linux-mentoring-2006.pdf[Randy Dunlap's presentation].
 
 It should be stressed that this hierarchy is purely *informal*.
 There is nothing fundamental in Git that enforces the chain of
diff --git a/Documentation/gitcvs-migration.txt b/Documentation/gitcvs-migration.txt
index 5ea94cb..5f4e890 100644
--- a/Documentation/gitcvs-migration.txt
+++ b/Documentation/gitcvs-migration.txt
@@ -117,7 +117,7 @@ Importing a CVS archive
 ---
 
 First, install version 2.1 or higher of cvsps from
-link:http://www.cobite.com/cvsps/[http://www.cobite.com/cvsps/] and make
+http://www.cobite.com/cvsps/[http://www.cobite.com/cvsps/] and make
 sure it is in your path. Then cd to a checked out CVS working directory
 of the project you are interested in and run linkgit:git-cvsimport[1]:
 
diff --git a/Documentation/gitweb.txt b/Documentation/gitweb.txt
index cca14b8..cd9c895 100644
--- a/Documentation/gitweb.txt
+++ b/Documentation/gitweb.txt
@@ -84,7 +84,7 @@ separator
Re: [BUG?] inconsistent `git reflog show` output, possibly `git fsck` output
On 21/09/2013 23:16, Keshav Kini wrote:

[SNIP]

> This situation came about because the BFG Repo-Cleaner doesn't write
> new reflog entries after creating its new objects and moving refs
> around.

True enough - I don't think the BFG does write new entries to the
reflog when it does the final ref-update, and it would be nicer if it
did. I'll get that fixed.

thanks,
Roberto
Re: commit-message attack for extracting sensitive data from rewritten Git history
On 9 April 2013 18:01, Jeff King p...@peff.net wrote:
> On Tue, Apr 09, 2013 at 08:03:24AM +0200, Johannes Sixt wrote:
>> If A mentions B (think of cherry-pick -x), then you must ensure that
>> the branch containing B was traversed first.
>
> Yeah, you're right. Multiple passes are necessary to get it completely
> right. And because each pass may change more commit id's, you have to
> recurse to pick up those changes, and keep going until you have a pass
> with no changes.

Just to give some context on how the BFG handles this (without doing
multiple passes):

The BFG makes a design choice (based on its intended use-case of
annihilating unwanted data) that a specific tree or blob will always
be cleaned in exactly the same way - because when you're trying to get
rid of large blobs or private data, you most likely /don't care/ where
it is, what commit it belongs to, how old it is. The id for a cleaned
tree or blob is always the same no matter where it came from, and so
the BFG maintains an in-memory mapping of 'dirty' to 'clean' object ids
while cleaning a repo - whenever an object (commit, tag, tree, blob) is
cleaned, these values are stored in the map:

  dirty-id -> clean-id
  clean-id -> clean-id

(in terms of memory overhead, this amounts to only ~ 128MB for even
quite a large repo like the linux kernel, so I don't spend much time
worrying about it)

The map memoises the cleaning functions on all objects, so an object
(particularly a tree) never gets cleaned more than once, which is one
of the things that makes the BFG fast.
Having these memoised functions makes cleaning commit messages fairly
easy - the message is grepped for hex strings more than a few
characters in length, and if a matched string resolves uniquely to an
object id in the repo, the clean() method is called on it to get the
cleaned id - which will either return immediately with a previously
calculated result, or if the id came from a different branch, trigger
a cascade of more cleaning, eventually returning the required cleaned
id.

In the case of git-filter-branch, the user has a lot more freedom to
change the tree-structure of commits on a commit-by-commit basis, so
memoising tree-cleaning is out of the question, but I guess it might be
possible to do memoisation of just the commit ids to short-cut the
multiple-pass problem.

- Roberto Tyley
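The memoised map and the commit-message substitution described in this message can be sketched roughly as follows. This is a toy model in Python, not the BFG's actual Scala code: the `Cleaner` class, the `clean_fn` callback, the 7-character minimum for hex strings, and keeping the abbreviation length of the original id are all assumptions made for illustration.

```python
import re


class Cleaner:
    """Toy model: object ids are hex strings, cleaning is a supplied function."""

    def __init__(self, known_ids, clean_fn):
        self.known_ids = set(known_ids)  # every object id in the toy repo
        self.clean_fn = clean_fn         # hypothetical: dirty id -> clean id
        self.memo = {}                   # dirty-id -> clean-id, clean-id -> clean-id
        self.calls = 0                   # how often real cleaning work was done

    def clean(self, oid):
        # Return a previously calculated result immediately if there is one.
        if oid in self.memo:
            return self.memo[oid]
        self.calls += 1
        new_id = self.clean_fn(oid)
        self.memo[oid] = new_id
        self.memo[new_id] = new_id
        return new_id

    def resolve(self, abbrev):
        # Only accept an abbreviation that matches exactly one known id.
        matches = [oid for oid in self.known_ids if oid.startswith(abbrev)]
        return matches[0] if len(matches) == 1 else None

    def clean_message(self, message):
        def substitute(match):
            text = match.group(0)
            full = self.resolve(text)
            if full is None:
                return text  # not an object id in this repo, leave untouched
            return self.clean(full)[:len(text)]

        # Assumption: "more than a few characters" is taken as 7 to 40 hex digits.
        return re.sub(r"\b[0-9a-f]{7,40}\b", substitute, message)
```

Because `clean()` consults the map first, a message that mentions an id from another branch either gets an immediate answer or triggers the cleaning of that object on demand, which is how the multiple-pass problem is avoided in this model.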
commit-message attack for extracting sensitive data from rewritten Git history
This is a demonstration of a mildly-interesting security concern
relating to Git & git-filter-branch - not a vulnerability in Git
itself, just in the way it can be used. I thought it was interesting
to demonstrate that there is sometimes an avenue of attack for
recovering sensitive data that's been removed from Git history using
git-filter-branch. I think it's a low-severity issue, you may wish to
ignore this, and indeed I've been very politely told already that it's
clearly nonsense :)

Here's an unmodified repo, in which the user unwisely committed a
database password:

https://github.com/bfg-repo-cleaner-demos/gma-demo-repo-original/commit/8c9cfe3c

The unwise commit is reverted with a second commit using 'git revert',
which obviously leaves the password in Git history, and - some time
later - it's decided to properly clean the repo history with
git-filter-branch & git gc, purging the password so the repo can be
more widely shared (open-sourced, or just externally hosted).

git-filter-branch works exactly as intended, purging the password, but
the one thing it does not - typically - do is update the commit
message. So in the cleaned repo, the commit message for the revert
commit still looks like this:

https://github.com/bfg-repo-cleaner-demos/gma-demo-repo-git-filter-branch-cleaned/commit/bf0637a5

It contains a commit id (8c9cfe3) which is no longer in the repo, but
can very easily be associated with an existing commit simply by
examining the subject line of the reverted commit (Carelessly checking
password into source control). It's also obvious, from examining the
repo, where the excised data was removed (ie at the db.password= line).

At this point it's possible to do a brute-force attack where you
generate possible passwords, insert them into the available commit's
tree, and compare them against the leaked commit id. When the commit
id matches, the sensitive data has been recovered.
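The core of such a search can be sketched in a few lines of Python. To stay short, this sketch is deliberately simplified: it matches a candidate against a leaked *blob* id rather than rebuilding the tree and commit objects to match the commit id, as the attack described above would have to. The template text, the function names and the alphabet are assumptions for illustration; the one firm fact used is that Git hashes a blob as SHA-1 over the header `blob <size>\0` followed by the content.

```python
import hashlib
from itertools import product

# Assumed search alphabet: digits and lower-case letters.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def git_blob_id(content: bytes) -> str:
    """Object id Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def brute_force(template: str, target_id: str, max_len: int):
    """Try every candidate up to max_len characters in the template.

    template must contain a {password} placeholder; target_id may be a
    full id or an abbreviation. Returns the matching candidate or None.
    """
    for length in range(1, max_len + 1):
        for chars in product(ALPHABET, repeat=length):
            candidate = "".join(chars)
            content = template.format(password=candidate).encode()
            if git_blob_id(content).startswith(target_id):
                return candidate
    return None
```

The search space grows as 36^n for this alphabet, so short or low-entropy secrets are the ones realistically at risk.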
A proof-of-concept implementation of this attack was indeed able to
recover the purged password:

--
$ java -jar gma-0.1.jar 8c9cfe3c attack-pinpoint gma-demo-repo-git-filter-branch-cleaned
Brute-force search using these characters : 0123456789abcdefghijklmnopqrstuvwxyz
Available commit, presumed cleaned : 8ebbf661
File path : src/main/resources/config.properties
Template blob : dca1a2fb
Exhausted strings of length 1 or less
...
Exhausted strings of length 4 or less
Match with '0g6rw'
--

So all of this amounts to a fairly low severity issue - people should
always change credentials when they mistakenly commit them to a repo -
but I guess the point is that from a paranoia point of view, you want
to remove all information - including old commit hashes buried in
commit messages - that relate to sensitive data when you clean a repo
for sharing.

The git-filter-branch command has a --msg-filter option which could be
used for this purpose, with the application of some judicious
bash-scripting, grep & sed-ing. However, I must confess that I believe
users would be better advised to use The BFG:

http://rtyley.github.io/bfg-repo-cleaner/

The BFG already addresses this issue by replacing all old Git
object-ids found in commit/tag messages with the updated id. For
instance, here's that exact same commit message when cleaned with the
BFG:

https://github.com/bfg-repo-cleaner-demos/gma-demo-repo-bfg-cleaned/commit/35840201

In the case that the user specifies that a filtering operation is not
removing 'private' data, the BFG replaces old ids with text of the form
'newid [formerly oldid]', but if the operation is in fact to strip
private data, the replacement value is simply the newid - and without
the old commit id, the attack described above is not possible.
I believe it's worth educating users to give them a more realistic
understanding of their exposure, and would like to update the
documentation of git-filter-branch to give them a better idea of their
options for removing private data - that would include noting the BFG
as alternative.

- Roberto Tyley

https://github.com/rtyley/bfg-repo-cleaner/blob/v1.2.0/src/main/scala/com/madgag/git/bfg/cleaner/ObjectIdSubstitutor.scala#L33-L60
A fast alternative to git-filter-branch - The BFG Repo-Cleaner
I recently released The BFG Repo-Cleaner, a new tool for cleansing bad
data out of Git repository histories. The BFG is typically at least
10-50x faster than git-filter-branch at these tasks:

* Removing Crazy Big Files from repo history
* Removing Passwords, Credentials & other Private data

http://rtyley.github.com/bfg-repo-cleaner/

As an example, these are timings for deleting an arbitrary file from
the large GCC repository (148495 commits):

The BFG : 3m29s
  $ bfg -D README-fixinc

git filter-branch : 472m31s
  $ git filter-branch --index-filter 'git rm --cached --ignore-unmatch gcc/README-fixinc' --prune-empty --tag-name-filter cat -- --all

(roughly a 135x speed increase, reducing the task of processing a
large codebase from an overnight job to the work of a few minutes -
all timings done in a 4GB tmpfs ramdisk)

The BFG has some simple but very powerful command-line options, which
perform at similar speed:

remove all blobs bigger than 1 megabyte :
  $ bfg --strip-blobs-bigger-than 1M my-repo.git

replace all passwords (listed in a file 'passwords.txt') with ***REMOVED*** :
  $ bfg --replace-banned-strings passwords.txt my-repo.git

The BFG's performance advantage comes mainly from preventing repeated
examination of the same tree objects. The approach of
git-filter-branch performs filtering for each commit, against the
complete file-hierarchy of each commit, one after the other, even
though commit trees are largely very similar. For the use-cases of The
BFG that's unnecessary - we don't care where, and in which commit, a
'bad' file exists - we just want it dealt with. Consequently the BFG
processes the Git object db on a memoised tree-by-tree basis,
processing each and every file & folder exactly once - the final
processing of the commit hierarchy is very quick. This _does_ mean
that it's not possible to delete files based on their absolute path
within the repo, but they can be deleted based on their filename,
blob-id, or contents.
This, and multi-core processing by default, gives the dramatic
speed-up while still providing the same results. There's more
performance data here:

https://docs.google.com/spreadsheet/ccc?key=0AsR1d5Zpes8HdER3VGU1a3dOcmVHMmtzT2dsS2xNenc

I'd welcome feedback, and if anyone has cause to filter a repository's
history in future, I'd appreciate you giving the BFG a try and letting
me know how you found it.

thanks,
Roberto Tyley
software dev @ The Guardian

http://rtyley.github.com/bfg-repo-cleaner/
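The memoised tree-by-tree processing described in the announcement can be illustrated with a toy model. This is a sketch in Python under stated assumptions, not the BFG's implementation: the dictionary layout of `trees`, the helper `make_tree_cleaner`, the shortened SHA-1 used as a stand-in for a rewritten tree's id, and the example ids are all made up for the illustration.

```python
import hashlib


def make_tree_cleaner(trees, is_bad_name):
    """trees maps a tree id to a list of (name, kind, child_id) entries,
    where kind is 'tree' or 'blob'. Returns (clean, cleaned, visits)."""
    memo = {}     # dirty tree id -> clean tree id
    cleaned = {}  # clean tree id -> entries of the rewritten tree
    visits = []   # tree ids that were actually processed

    def clean(tree_id):
        if tree_id in memo:
            return memo[tree_id]  # this folder was already dealt with
        visits.append(tree_id)
        entries = []
        for name, kind, child_id in trees[tree_id]:
            if kind == "blob" and is_bad_name(name):
                continue  # drop the unwanted file wherever it appears
            if kind == "tree":
                child_id = clean(child_id)
            entries.append((name, kind, child_id))
        # Stand-in for hashing the rewritten tree object.
        new_id = hashlib.sha1(repr(entries).encode()).hexdigest()[:8]
        cleaned[new_id] = entries
        memo[tree_id] = new_id
        return new_id

    return clean, cleaned, visits


# Two commit root trees that share the same 'docs' folder.
trees = {
    "docs": [("README-fixinc", "blob", "b1"), ("guide.txt", "blob", "b2")],
    "root1": [("docs", "tree", "docs"), ("a.c", "blob", "b3")],
    "root2": [("docs", "tree", "docs"), ("a.c", "blob", "b4")],
}
clean, cleaned, visits = make_tree_cleaner(trees, lambda n: n == "README-fixinc")
new_roots = [clean(root) for root in ("root1", "root2")]
```

Here the shared `docs` folder is processed once even though two root trees reference it. The filter only sees a file name, never a full path, which mirrors the limitation mentioned in the announcement.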