Re: filter-branch IO optimization

2012-10-12 Thread Enrico Weigelt
Hi,

> The usual advice is to use an index-filter instead.  It's *much* faster
> than a tree filter.  However:

I've tried the last example from the git-filter-branch manpage, but it failed.
It seems the GIT_INDEX_FILE environment variable isn't honoured by
git-update-index: no index.new file is created, so the mv call fails.

My second try (as index-filter command) was:

git ls-files -s > ../_INDEX_TMP
cat ../_INDEX_TMP |
sed "s-\t\"*-&addons/-" |
git update-index --index-info
rm -f ../_INDEX_TMP

It works fine in the worktree (I see the files renamed in the index),
but not when run as --index-filter. It seems the index file isn't
used at all (or some completely different one is).
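
For illustration, a minimal sketch of what the sed step does to a single line of `git ls-files -s` output. The sample path is hypothetical, and a literal tab is used instead of `\t` because not every sed understands that escape:

```shell
# One line in the format printed by `git ls-files -s`:
#   <mode> <object id> <stage><TAB><path>
# The path is a made-up example; the object id is the empty blob.
tab=$(printf '\t')
sample="100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0${tab}module_a/__init__.py"

# Insert "addons/" directly after the tab (and an optional opening quote),
# which is what the man page's sed expression is meant to do.
rewritten=$(printf '%s\n' "$sample" | sed "s-${tab}\"*-&addons/-")

printf '%s\n' "$rewritten"
```

The rewritten line can then be fed to `git update-index --index-info`, which only adds or updates entries; it does not remove the old paths from an index that already contains them.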

By the way, inside the index filter, GIT_INDEX_FILE is set to

/home/devel/vnc/openerp/workspace/pkg/openerp-extra-bundle.git/.git-rewrite/t/../index

That is obviously a different (temporary) index file, while many examples
on the web suggest using commands like 'git add --cached' or
'git rm --cached' _without_ passing the GIT_INDEX_FILE variable.

Could there be a bug where this variable isn't honoured properly
everywhere?

--
Mit freundlichen Grüßen / Kind regards

Enrico Weigelt
VNC - Virtual Network Consult GmbH
Head Of Development

Pariser Platz 4a, D-10117 Berlin
Tel.: +49 (30) 3464615-20
Fax: +49 (30) 3464615-59

enrico.weig...@vnc.biz; www.vnc.de
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: filter-branch IO optimization

2012-10-12 Thread Enrico Weigelt
<snip>

I did some more experiments, and it seems that a missing index file
isn't created automatically.

When I instead copy the original index file to the temporary
location, it runs well. But I still have to wait for the final
result to check whether it really overwrites the whole index
or just adds new files.


cu


Re: filter-branch IO optimization

2012-10-12 Thread Enrico Weigelt
Hi folks,

I've now finally managed the index-filter part.
The main problem, IIRC, was that git-update-index didn't
automatically create an empty index, so I needed to copy one in
explicitly (I created it manually in an empty repo).

My current filter code is:

# Fill in missing author/committer e-mail addresses.
if [ -z "$GIT_AUTHOR_EMAIL" ] && [ -z "$GIT_COMMITTER_EMAIL" ]; then
    export GIT_AUTHOR_EMAIL=nob...@none.org
    export GIT_COMMITTER_EMAIL=nob...@none.org
elif [ -z "$GIT_AUTHOR_EMAIL" ]; then
    export GIT_AUTHOR_EMAIL="$GIT_COMMITTER_EMAIL"
elif [ -z "$GIT_COMMITTER_EMAIL" ]; then
    export GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
fi

# Fill in missing author/committer names the same way.
if [ -z "$GIT_AUTHOR_NAME" ] && [ -z "$GIT_COMMITTER_NAME" ]; then
    export GIT_AUTHOR_NAME=nob...@none.org
    export GIT_COMMITTER_NAME=nob...@none.org
elif [ -z "$GIT_AUTHOR_NAME" ]; then
    export GIT_AUTHOR_NAME="$GIT_COMMITTER_NAME"
elif [ -z "$GIT_COMMITTER_NAME" ]; then
    export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME"
fi

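# Start from an empty index (update-index did not create one by itself).
cp ../../../../scripts/index.empty "$GIT_INDEX_FILE.new"

# Literal tab: sed and grep do not portably understand "\t".
tab=$(printf '\t')

# Prefix every path with addons/, then keep only the wanted module.
git ls-files -s |
    sed "s-$tab\"*-&addons/-" |
    grep -e "$tab\"*addons/$module" |
    ( export GIT_INDEX_FILE="$GIT_INDEX_FILE.new"; git update-index --index-info )

mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"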


Now another problem: this leaves behind thousands of now-empty
merge nodes (--prune-empty doesn't seem to catch them all),
so I loop through additional `git filter-branch --prune-empty`
runs until the ref remains unchanged.

This process is even more time-consuming, as it takes a great many
passes (I haven't counted them yet).

Does anyone have an idea why a single run doesn't catch them all?
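
The "repeat until the ref stops changing" loop described above could be sketched roughly as follows. The functions `current_ref` and `run_pass` are hypothetical hooks introduced only for this illustration; here they are stubbed so that the loop can be run without a repository, and the loop also counts the passes:

```shell
# Stub state standing in for a repository that needs three pruning passes.
remaining=3

# Hypothetical hook: in a real setup this might be `git rev-parse HEAD`.
current_ref() { printf 'ref-%s\n' "$remaining"; }

# Hypothetical hook: in a real setup this might be
# `git filter-branch -f --prune-empty -- HEAD`.
run_pass() { if [ "$remaining" -gt 0 ]; then remaining=$((remaining - 1)); fi; }

# Run passes until one of them leaves the ref unchanged.
prune_until_stable() {
    passes=0
    prev=""
    cur=$(current_ref)
    while [ "$cur" != "$prev" ]; do
        run_pass
        passes=$((passes + 1))
        prev=$cur
        cur=$(current_ref)
    done
}

prune_until_stable
printf 'passes needed: %s\n' "$passes"
```

Note that the final pass is always a no-op: it is the one that confirms nothing changed any more, so the count is one higher than the number of passes that actually rewrote something.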


cu


Re: filter-branch IO optimization

2012-10-12 Thread Jeff King
On Fri, Oct 12, 2012 at 04:49:54PM +0200, Enrico Weigelt wrote:

>> The usual advice is to use an index-filter instead.  It's *much* faster
>> than a tree filter.  However:
>
> I've tried the last example from git-filter-branch manpage, but failed.
> Seems like the GIT_INDEX_FILE env variable doesn't get honoured by
> git-update-index, no index.new file created, and so mv call fails.
>
> My second try (as index-filter command) was:
>
> git ls-files -s > ../_INDEX_TMP
> cat ../_INDEX_TMP |
> sed "s-\t\"*-&addons/-" |
> git update-index --index-info
> rm -f ../_INDEX_TMP

I didn't look closely at your individual problem, but that example has
proven flaky before.  There were some simpler formulations given in this
thread:

  http://thread.gmane.org/gmane.comp.version-control.git/195492

In particular, Junio suggested:

  git filter-branch --index-filter '
    rm -f "$GIT_INDEX_FILE"
    git read-tree --prefix=newsubdir/ "$GIT_COMMIT"
  ' HEAD
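
To see what the `read-tree --prefix` step does in isolation, here is a small sketch that runs it outside filter-branch, against a scratch repository and a throwaway index file. The repository, file name and index path are made up for the example; only `newsubdir/` is taken from the command above:

```shell
# Scratch repository with a single file staged in the default index.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
printf 'hello\n' > file.txt
git add file.txt

# Record the staged content as a tree object (no commit needed).
tree=$(git write-tree)

# Use a separate index file that does not exist yet, mimicking the
# `rm -f "$GIT_INDEX_FILE"` step, and read the tree in under a prefix.
idx="$repo/.git/index.demo"
rm -f "$idx"
GIT_INDEX_FILE="$idx" git read-tree --prefix=newsubdir/ "$tree"

# Every path in the new index now lives below newsubdir/.
paths=$(GIT_INDEX_FILE="$idx" git ls-files)
printf '%s\n' "$paths"
```

Because read-tree builds the index from scratch here, there is no need to pre-create an empty index file or to pipe entries through sed.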

-Peff


Re: filter-branch IO optimization

2012-10-11 Thread Johannes Sixt
On 11.10.2012 17:39, Enrico Weigelt wrote:
> The main goal of this filtering is splitting out many modules from a
> large upstream repo into their own downstream repos.
...
> The next step I have in mind is using --subdirectory-filter, but open
> questions are:
>
> * does it suffer from the same problems w/ empty username/email like
>   --tree-filter ?

I think so.

> ** if yes: what can I do about it (have an additional pass for fixing that
>    before running the --tree-filter) ?

Use --env-filter.
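
As a rough sketch, the body of such an --env-filter could fill empty identity fields from each other and fall back to a placeholder. The function name `fix_identity` and the placeholder address are assumptions made for this example, not something from the thread:

```shell
# Hypothetical helper: make sure none of the four identity variables
# is empty before the commit gets rewritten.
fix_identity() {
    placeholder="nobody@example.invalid"   # assumed fallback value

    # Author falls back to committer, then to the placeholder;
    # committer falls back to the (now non-empty) author.
    GIT_AUTHOR_EMAIL=${GIT_AUTHOR_EMAIL:-${GIT_COMMITTER_EMAIL:-$placeholder}}
    GIT_COMMITTER_EMAIL=${GIT_COMMITTER_EMAIL:-$GIT_AUTHOR_EMAIL}
    GIT_AUTHOR_NAME=${GIT_AUTHOR_NAME:-${GIT_COMMITTER_NAME:-$placeholder}}
    GIT_COMMITTER_NAME=${GIT_COMMITTER_NAME:-$GIT_AUTHOR_NAME}

    export GIT_AUTHOR_EMAIL GIT_COMMITTER_EMAIL GIT_AUTHOR_NAME GIT_COMMITTER_NAME
}

# Example input: only the committer e-mail is set.
GIT_AUTHOR_EMAIL=""
GIT_COMMITTER_EMAIL="dev@example.invalid"
GIT_AUTHOR_NAME=""
GIT_COMMITTER_NAME=""

fix_identity
printf '%s <%s>\n' "$GIT_AUTHOR_NAME" "$GIT_AUTHOR_EMAIL"
```

Since --env-filter runs before the commit is recreated, the same filter-branch invocation can fix the identities and apply a tree or index filter, without a separate pass.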

> * can I somehow teach the --subdirectory-filter to place the result under some
>   somedir instead of directly to root ?

No, but see the last example in the man page.

> * can I use --tree-filter in combination with --subdirectory-filter ?
>   which one is executed first ?

Yes. --subdirectory-filter applies first.

-- Hannes



Re: filter-branch IO optimization

2012-10-11 Thread Thomas Rast
Enrico Weigelt <enrico.weig...@vnc.biz> writes:

> for certain projects, I need to regularly run filter-branch on quite
> large repos (10k commits), and that needs to be run multiple times,
> which takes several hours, so I'm looking for optimizations.
[...]
> #2: run a tree-filter which:
> * removes all files not belonging to the wanted module
> * moves the module directory under another subdir (./addons/)
> * fixes author/committer name/email if empty (because otherwise it fails)

The usual advice is to use an index-filter instead.  It's *much* faster
than a tree filter.  However:

> * fix character sets and indentation of source files

That last step is rather crazy.  At the very least you will want to only
operate on files that were changed since the parent commit, so as to
avoid scanning the whole tree.  If you do this right, it should also fit
into an index-filter.
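
One way to find "files that were changed since the parent" is to compare two trees with `git diff-tree`. The sketch below does this in a scratch repository with two hand-made trees; in a real filter the trees would presumably come from the commit being rewritten and its parent, which is an assumption not spelled out here:

```shell
# Scratch repository; file names and contents are made up.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# "Parent" state: two files.
printf 'a\n' > a.txt
printf 'b\n' > b.txt
git add a.txt b.txt
parent_tree=$(git write-tree)

# "Commit" state: only b.txt is modified.
printf 'b changed\n' > b.txt
git add b.txt
commit_tree=$(git write-tree)

# List only the paths that differ between the two trees; a cleanup
# step would then touch these files instead of scanning everything.
changed=$(git diff-tree -r --name-only "$parent_tree" "$commit_tree")
printf '%s\n' "$changed"
```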

-- 
Thomas Rast
trast@{inf,student}.ethz.ch