Re: filter-branch IO optimization
Hi,

> The usual advice is "use an index-filter instead". It's *much*
> faster than a tree filter. However:

I tried the last example from the git-filter-branch manpage, but it failed. It seems that the GIT_INDEX_FILE environment variable isn't honoured by git-update-index: no index.new file is created, so the mv call fails.

My second try (as the index-filter command) was:

    git ls-files -s > ../_INDEX_TMP
    cat ../_INDEX_TMP | sed "s-\t\"*-&addons/-" | git update-index --index-info
    rm -f ../_INDEX_TMP

It works fine in the worktree (I see the files renamed in the index), but not when run as --index-filter. It seems the index file isn't used at all (or some completely different one is).

By the way, inside the index filter, GIT_INDEX_FILE is

    /home/devel/vnc/openerp/workspace/pkg/openerp-extra-bundle.git/.git-rewrite/t/../index

That is obviously a different (temporary) index file, yet many examples on the web suggest commands like 'git add --cached' or 'git rm --cached' _without_ passing the GIT_INDEX_FILE variable.

Could there be a bug where this variable isn't honoured properly everywhere?

-- 
Kind regards
Enrico Weigelt
VNC - Virtual Network Consult GmbH
Head Of Development
Pariser Platz 4a, D-10117 Berlin
Tel.: +49 (30) 3464615-20
Fax: +49 (30) 3464615-59
enrico.weig...@vnc.biz; www.vnc.de
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
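For illustration, a minimal sketch of what the sed step in the man-page example does to `git ls-files -s` output, which `git update-index --index-info` then reads back. The helper name `prefix_index_paths` and the sample entry are hypothetical, and the sketch assumes the prefix contains no `-` or `&` (both are special in this sed expression). It uses a literal tab rather than `\t`, since not every sed understands `\t`.

```shell
# Hypothetical helper: prefix every path in `git ls-files -s` style input
# with the directory given as $1. Each input line has the form
#   <mode> <object> <stage><TAB><path>
prefix_index_paths() {
    tab=$(printf '\t')
    # Match the tab (plus an optional opening quote used for unusual path
    # names) and re-emit it via "&", followed by the new directory prefix.
    sed "s-${tab}\"*-&$1/-"
}

# Sample entry; e69de29... is the object name of the empty blob.
sample=$(printf '100644 %s 0\tREADME\n' e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 |
    prefix_index_paths addons)
printf '%s\n' "$sample"
```

The transformed line carries the path `addons/README` after the tab; mode, object name and stage are left untouched.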
Re: filter-branch IO optimization
<snip>

I did some more experiments, and it seems that a missing index file isn't created automatically. When I copy the original index file to the temporary location instead, it runs well. But I still have to wait for the final result to check whether it really overwrites the whole index or just adds new files.

cu
-- 
Kind regards
Enrico Weigelt
Re: filter-branch IO optimization
Hi folks,

I finally got the index-filter part working. The main problem, IIRC, was that git-update-index didn't automatically create an empty index, so I needed to copy one in explicitly (I created it manually from an empty repo).

My current filter code is:

    # If an email is empty, take it from the other identity, or use a
    # placeholder when both are empty.
    if [ ! "$GIT_AUTHOR_EMAIL" ] && [ ! "$GIT_COMMITTER_EMAIL" ]; then
        export GIT_AUTHOR_EMAIL=nob...@none.org
        export GIT_COMMITTER_EMAIL=nob...@none.org
    elif [ ! "$GIT_AUTHOR_EMAIL" ]; then
        export GIT_AUTHOR_EMAIL="$GIT_COMMITTER_EMAIL"
    elif [ ! "$GIT_COMMITTER_EMAIL" ]; then
        export GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
    fi

    # Same for the names.
    if [ ! "$GIT_AUTHOR_NAME" ] && [ ! "$GIT_COMMITTER_NAME" ]; then
        export GIT_AUTHOR_NAME=nob...@none.org
        export GIT_COMMITTER_NAME=nob...@none.org
    elif [ ! "$GIT_AUTHOR_NAME" ]; then
        export GIT_AUTHOR_NAME="$GIT_COMMITTER_NAME"
    elif [ ! "$GIT_COMMITTER_NAME" ]; then
        export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME"
    fi

    # Start from an empty index, then add only the wanted module's
    # entries, moved under addons/.
    tab=$(printf '\t')    # literal tab; grep does not expand \t
    cp ../../../../scripts/index.empty "$GIT_INDEX_FILE.new"
    git ls-files -s |
        sed "s-${tab}\"*-&addons/-" |
        grep -e "${tab}\"*addons/$module" |
        ( export GIT_INDEX_FILE="$GIT_INDEX_FILE.new" ; git update-index --index-info )
    mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"

Now another problem: this leaves behind thousands of now-empty merge nodes (--prune-empty doesn't seem to catch them all), so I loop through additional `git filter-branch --prune-empty` runs until the ref remains unchanged. This process is even more time-consuming, as it takes a great many passes (I haven't counted them yet).

Does anyone have an idea why a single run doesn't catch them all?

cu
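The repeated pruning described above could be scripted roughly as follows. This is only a sketch of the loop as described, not an explanation of why one pass is insufficient. The function name `prune_until_stable` is hypothetical, and `-f` is assumed to be acceptable, since it overwrites the backup in refs/original/ left by the previous pass.

```shell
# Hypothetical helper: rerun `git filter-branch --prune-empty` on a ref
# until its tip no longer changes. $1 is the ref (default: HEAD).
prune_until_stable() {
    ref=${1:-HEAD}
    old=
    new=$(git rev-parse "$ref")
    while [ "$old" != "$new" ]; do
        old=$new
        # Stop if a pass fails instead of looping forever.
        git filter-branch -f --prune-empty -- "$ref" || break
        new=$(git rev-parse "$ref")
    done
}

# Usage, inside the repository being rewritten:
#   prune_until_stable HEAD
```

Each pass still walks the whole history, so this does not reduce the total time; it only removes the manual repetition.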
Re: filter-branch IO optimization
On Fri, Oct 12, 2012 at 04:49:54PM +0200, Enrico Weigelt wrote:

> > The usual advice is "use an index-filter instead". It's *much*
> > faster than a tree filter. However:
>
> I've tried the last example from git-filter-branch manpage, but failed.
> Seems like the GIT_INDEX_FILE env variable doesn't get honoured by
> git-update-index, no index.new file created, and so mv call fails.
>
> My second try (as index-filter command) was:
>
>     git ls-files -s > ../_INDEX_TMP
>     cat ../_INDEX_TMP | sed "s-\t\"*-&addons/-" | git update-index --index-info
>     rm -f ../_INDEX_TMP

I didn't look closely at your individual problem, but that example has proven flaky before. There were some simpler formulations given in this thread:

  http://thread.gmane.org/gmane.comp.version-control.git/195492

In particular, Junio suggested:

    git filter-branch --index-filter '
      rm -f "$GIT_INDEX_FILE"
      git read-tree --prefix=newsubdir/ "$GIT_COMMIT"
    ' HEAD

-Peff
Re: filter-branch IO optimization
On 11.10.2012 17:39, Enrico Weigelt wrote:

> The main goal of this filtering is splitting out many modules from a
> large upstream repo into their own downstream repos.
...
> The next step I have in mind is using --subdirectory-filter, but open
> questions are:
>
> * does it suffer from the same problems w/ empty username/email
>   like --tree-filter ?

I think so.

> ** if yes: what can I do about it (have an additional pass for
>    fixing that before running the --tree-filter) ?

Use --env-filter.

> * can I somehow teach the --subdirectory-filter to place the result
>   under some <somedir> instead of directly to root ?

No, but see the last example in the man page.

> * can I use --tree-filter in combination with --subdirectory-filter ?
>   which one is executed first ?

Yes. --subdirectory-filter applies first.

-- Hannes
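As a rough sketch of the --env-filter suggestion: the snippet below fills in empty author/committer fields from the other identity and falls back to a placeholder when both are empty. The function name `fix_identity` and the placeholder values are assumptions made for illustration, not something from this thread.

```shell
# Hypothetical helper: make sure none of the four identity variables is empty.
fix_identity() {
    # Placeholder values are assumptions; pick whatever suits the project.
    placeholder_name="unknown"
    placeholder_email="unknown@example.invalid"

    # ${VAR:=default} assigns only when VAR is unset or empty.
    : "${GIT_AUTHOR_NAME:=${GIT_COMMITTER_NAME:-$placeholder_name}}"
    : "${GIT_COMMITTER_NAME:=$GIT_AUTHOR_NAME}"
    : "${GIT_AUTHOR_EMAIL:=${GIT_COMMITTER_EMAIL:-$placeholder_email}}"
    : "${GIT_COMMITTER_EMAIL:=$GIT_AUTHOR_EMAIL}"

    export GIT_AUTHOR_NAME GIT_COMMITTER_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_EMAIL
}
```

Since the --env-filter argument is evaluated as shell code for every commit, the body of this function (everything between the braces) could be passed as that argument, so the variables are set before the commit is rewritten.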
Re: filter-branch IO optimization
Enrico Weigelt <enrico.weig...@vnc.biz> writes:

> for certain projects, I need to regularly run filter-branch on quite
> large repos (10k commits), and that needs to be run multiple times,
> which takes several hours, so I'm looking for optimizations.
[...]
> #2: run a tree-filter which:
>     * removes all files not belonging to the wanted module
>     * move the module directory under another subdir (./addons/)
>     * fix author/committer name/email if empty (because otherwise fails)

The usual advice is "use an index-filter instead". It's *much* faster than a tree filter. However:

>     * fix character sets and indentation of source files

That last step is rather crazy. At the very least you will want to operate only on files that were changed since the parent commit, so as to avoid scanning the whole tree. If you do this right, it should also fit into an index-filter.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
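One way to find the files changed since the parent commit is `git diff-tree`. The sketch below is only a starting point: the helper name `changed_paths` is hypothetical, and it assumes merge commits are handled separately, since plain `git diff-tree` prints nothing for a merge unless asked to (for example with `-m`).

```shell
# Hypothetical helper: list the files a commit adds or modifies relative
# to its parent. $1 is the commit (inside a filter, "$GIT_COMMIT").
changed_paths() {
    # -r recurses into subdirectories; --root makes a parentless commit
    # list all of its files; --diff-filter=AM skips deletions, which
    # have no content left to fix.
    git diff-tree -r --root --no-commit-id --name-only --diff-filter=AM "$1"
}

# Usage, inside a filter:
#   changed_paths "$GIT_COMMIT" | while IFS= read -r path; do
#       : # fix character set / indentation of "$path" here
#   done
```

Only the listed files would then need re-encoding and re-indenting, instead of every file in the tree for every commit.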