Re: [git-users] git rev-list --objects doesn't show moves

2014-10-02 Thread Dale R. Worley
> From: Roman Neuhauser 

> the inefficiency i'd like to avoid is in the diff-tree initialization.
> while the strace output is nice and short, most of it is loading shared
> libraries and reading the various .git* files; i hoped there would be
> a way to spend that energy once per N commits described.

That might be less important than it seems:

I once wrote a revision of "tar" that could write compressed tar files
to a tape drive.  (The tape drive did not have built-in compression.)
The standard "tar" can do that, but it compresses the entire tar file
as a unit, so if any part of the file is corrupted, you can't read the
remainder of the file.  This is not good for a backup tape.

So I modified tar to compress each file individually before putting it
into the tar file.  As a first implementation, I had tar simply create
a subprocess which ran gzip -- once for each file that was written.
So gzip was run tens to hundreds of thousands of times when writing a
backup tape.

The effect on performance was zero.  (And this was on a processor that
ran at tens of MHz.)  The reason seemed to be that once all the files
that are needed to get a gzip process running are in the disk cache,
the process starts *very* quickly.  Yes, it takes a zillion CPU
cycles, but that's not the slow part of the computer.

So don't fret about the efficiency of part of your system before you
know that it is actually performance-limiting.
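
One way to check this before optimizing is to time the startup cost directly. The following is only a rough sketch: the helper name `average_spawn_seconds` is made up for illustration, and the stand-in command (a Python interpreter that exits immediately) is an assumption, to be replaced by the real command of interest.

```python
import subprocess
import sys
import time


def average_spawn_seconds(command, runs=20):
    """Run `command` `runs` times and return the mean wall-clock time per run."""
    # One untimed warm-up run, so the executable and its libraries
    # are already in the disk cache when the measurement starts.
    subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Stand-in command (an assumption): an interpreter that exits at once.
    # Inside a repository, substitute e.g. ["git", "diff-tree", "-r", "HEAD"].
    command = [sys.executable, "-c", "pass"]
    print(f"{average_spawn_seconds(command) * 1000:.2f} ms per invocation")
```

Multiplying the per-invocation figure by the number of commits in a typical range gives an estimate of how much a single-process solution could save at best.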

Dale

-- 
You received this message because you are subscribed to the Google Groups "Git 
for human beings" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to git-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [git-users] git rev-list --objects doesn't show moves

2014-10-01 Thread Roman Neuhauser
# wor...@alum.mit.edu / 2014-10-01 11:37:38 -0400:
> > From: Roman Neuhauser 
> 
> > yup, i'd like a plumbing equivalent of `git log --raw ...`.  AFAICT
> > the closest to that is git-diff-tree, except that implies N invocations
> > instead of one, a sad loss of efficiency i'd love to avoid.

> [...] there is no stored summary of "what is changed by this commit",
> the only way to determine that information is to compare each file
> reference of each commit with the cognate file reference in its
> predecessor commit.

> There's no way to do that whose run time is not proportional to both
> the number of commits and the number of files.

the inefficiency i'd like to avoid is in the diff-tree initialization.
while the strace output is nice and short, most of it is loading shared
libraries and reading the various .git* files; i hoped there would be
a way to spend that energy once per N commits described.

it's a "storm in a tea cup" in this particular use case, but i've been
thinking about the distinction between plumbing and porcelain, and how
well the plumbing fulfills its promise of enabling other porcelains.
it looks like any theoretical git-log competition written on top of the
plumbing is quite badly undercut by git-diff-tree being a single-pair
operation.
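
For what it's worth, git-diff-tree(1) documents a `--stdin` option that reads commit ids from standard input, so one diff-tree process can describe a whole range fed to it by git-rev-list. Below is a minimal sketch of that pipeline, not a tested tool: the function names `parse_raw` and `describe_range` are invented for illustration, and the parser assumes paths without tabs or other characters that git would quote (`-z` would be the robust choice).

```python
import subprocess


def parse_raw(output):
    """Group `git diff-tree --stdin` raw output into (commit_id, changes) pairs.

    Each change is (status, old_blob, new_blob, paths); paths has two
    entries for renames and copies, one otherwise.
    """
    commits = []
    for line in output.splitlines():
        if not line:
            continue
        if line.startswith(":"):
            # ":<old mode> <new mode> <old id> <new id> <status>\t<path>[\t<path>]"
            meta, *paths = line.split("\t")
            _old_mode, _new_mode, old_blob, new_blob, status = meta[1:].split(" ")
            commits[-1][1].append((status, old_blob, new_blob, paths))
        else:
            # A line without a leading colon is the id of the commit
            # whose changes follow.
            commits.append((line.strip(), []))
    return commits


def describe_range(rev_range, repo="."):
    """Run rev-list once and diff-tree once for the whole range."""
    revs = subprocess.run(
        ["git", "-C", repo, "rev-list", "--reverse", rev_range],
        check=True, capture_output=True, text=True,
    ).stdout
    # -r recurses into subtrees, -M and -C ask for rename and copy
    # detection, --root shows a root commit as a set of additions.
    raw = subprocess.run(
        ["git", "-C", repo, "diff-tree", "--stdin", "-r", "-M", "-C", "--root"],
        input=revs, check=True, capture_output=True, text=True,
    ).stdout
    return parse_raw(raw)
```

Called as `describe_range("origin/master..HEAD")` (a hypothetical range), this starts two git processes regardless of how many commits the range contains. Merge commits produce no entries unless diff-tree is also given `-m` or `-c`.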

-- 
roman



Re: [git-users] git rev-list --objects doesn't show moves

2014-10-01 Thread Dale R. Worley
> From: Roman Neuhauser 

> yup, i'd like a plumbing equivalent of `git log --raw ...`.  AFAICT
> the closest to that is git-diff-tree, except that implies N invocations
> instead of one, a sad loss of efficiency i'd love to avoid.

You may be beyond my knowledge here, but if you want to list the
changes that were made to the file-tree by each of a series of
commits, you will pretty much have to do one invocation of diff-tree
for each commit, or something else that is functionally equivalent.
The reason is that there is no stored summary of "what is changed by
this commit"; the only way to determine that information is to compare
each file reference of each commit with the cognate file reference in
its predecessor commit.  There's no way to do that whose run time is
not proportional to both the number of commits and the number of
files.
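
The comparison described above can be sketched with a toy model in which each commit is a flat mapping from path to blob id. This is an illustration only: real Git stores nested tree objects and also detects inexact renames by content similarity, whereas this sketch reports a rename only when the blob id is identical. The function names and the "0000aaaa" blob id of the starting snapshot are made up; the other ids are taken from the example earlier in the thread.

```python
def diff_snapshots(old, new):
    """Compare two {path: blob_id} snapshots and classify the differences."""
    changes = []
    removed = {p: b for p, b in old.items() if p not in new}
    added = {p: b for p, b in new.items() if p not in old}
    # An added path whose blob id matches a removed path is an exact rename.
    by_blob = {blob: path for path, blob in removed.items()}
    for path in sorted(added):
        source = by_blob.pop(added[path], None)
        if source is not None:
            changes.append(("R", source, path))
            del removed[source]
        else:
            changes.append(("A", path))
    changes += [("D", path) for path in sorted(removed)]
    changes += [
        ("M", path)
        for path in sorted(old.keys() & new.keys())
        if old[path] != new[path]
    ]
    return changes


def log_raw(history):
    """Diff each snapshot against its predecessor: one comparison per commit."""
    return [diff_snapshots(prev, cur) for prev, cur in zip(history, history[1:])]


history = [
    {"README": "0000aaaa"},                               # state before the range
    {"README": "162441d0"},                               # 1/3 README fancier
    {"README": "162441d0", ".gitignore": "32682119"},     # 2/3 ignore vim swapfiles
    {"README.txt": "162441d0", ".gitignore": "32682119"}, # 3/3 README is now README.txt
]
for changes in log_raw(history):
    print(changes)
# [('M', 'README')]
# [('A', '.gitignore')]
# [('R', 'README', 'README.txt')]
```

Each call to `diff_snapshots` looks at every path in both snapshots, and `log_raw` makes one such call per commit, which is where the cost proportional to commits times files comes from in this flat model.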

Dale



Re: [git-users] git rev-list --objects doesn't show moves

2014-10-01 Thread Roman Neuhauser
# wor...@alum.mit.edu / 2014-09-29 11:45:00 -0400:
> > From: Roman Neuhauser 
> > 
> > i'm writing an alternative to git-request-pull.  its output includes
> > a log of the commit range, eg:
> > 
> >   1/3 76a23b86 043603cc README fancier
> >   162441d0 README
> >   2/3 87990615 ab984c9b ignore vim swapfiles
> >   32682119 .gitignore
> >   3/3 2c842d2d 2ab371a4 README is now README.txt
> > 
> > each commit is represented by a line giving its position in the range,
> > the treeid, the commitid and the subject line, followed by a series of
> > lines identifying affected files, each line with the objectid and path.
> > 
> > i'm gathering the data with `git-rev-list --objects`, but it doesn't
> > mention objects that were moved (git mv) in a given commit; this is
> > visible in the last (3/3) commit in the example above: that commit was
> > just `git mv README README.txt`.
> > 
> > i want the output to identify moves and copies.  what are my options?
> > am i missing an option in git-rev-list(1)?  should i use a different
> > piece of plumbing?
> 
> The fundamental problem is that Git's data structures don't list moves
> and copies.  For that matter, they don't list adds and deletes,
> either.  As stored, each commit just tells the contents of the
> directory tree.  What you appear to want is something that compares
> one or more commits and tells what the differences between them are.

yup, i'd like a plumbing equivalent of `git log --raw ...`.  AFAICT
the closest to that is git-diff-tree, except that implies N invocations
instead of one, a sad loss of efficiency i'd love to avoid.
 
> OTOH, is that what you *really* want?  You say that you're "writing an
> alternative to git-request-pull".  What is the definition of this
> output?  What purposes do you expect the output to be put to?
> 
> For instance, when you're pulling commit 3/3 from the remote, you
> don't *need* to download the blob that is the current contents of
> README.txt (and the former contents of README) because you already
> have it in your repository.  So "git-rev-list --objects" doesn't list
> it.

this is for human consumption in an email-based code review process.
think git-request-pull for the overall picture plus git-format-patch
for individual commits.  from the readme:

  Pull requests are often sent repeatedly: Alice clones Bob's
  repository, commits some changes and sends him a pull request.
  Bob reviews the proposed changes and requests a few modifications.

  Alice tweaks her branch as requested and sends another pull request.
  Bob is a busy person and wants a very quick overview of the
  differences between the old and new pull request. Alice would do well
  to tell Bob which parts of the patch series changed in the second
  iteration.

-- 
roman



Re: [git-users] git rev-list --objects doesn't show moves

2014-09-29 Thread Dale R. Worley
> From: Roman Neuhauser 
> 
> i'm writing an alternative to git-request-pull.  its output includes
> a log of the commit range, eg:
> 
>   1/3 76a23b86 043603cc README fancier
>   162441d0 README
>   2/3 87990615 ab984c9b ignore vim swapfiles
>   32682119 .gitignore
>   3/3 2c842d2d 2ab371a4 README is now README.txt
> 
> each commit is represented by a line giving its position in the range,
> the treeid, the commitid and the subject line, followed by a series of
> lines identifying affected files, each line with the objectid and path.
> 
> i'm gathering the data with `git-rev-list --objects`, but it doesn't
> mention objects that were moved (git mv) in a given commit; this is
> visible in the last (3/3) commit in the example above: that commit was
> just `git mv README README.txt`.
> 
> i want the output to identify moves and copies.  what are my options?
> am i missing an option in git-rev-list(1)?  should i use a different
> piece of plumbing?

The fundamental problem is that Git's data structures don't list moves
and copies.  For that matter, they don't list adds and deletes,
either.  As stored, each commit just tells the contents of the
directory tree.  What you appear to want is something that compares
one or more commits and tells what the differences between them are.

OTOH, is that what you *really* want?  You say that you're "writing an
alternative to git-request-pull".  What is the definition of this
output?  What purposes do you expect the output to be put to?

For instance, when you're pulling commit 3/3 from the remote, you
don't *need* to download the blob that is the current contents of
README.txt (and the former contents of README) because you already
have it in your repository.  So "git-rev-list --objects" doesn't list
it.
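
A simplified model of this "only objects you don't already have" behaviour, as a sketch: it tracks blobs only, ignores tree objects, and walks the commits oldest first, so it is not a description of how git-rev-list actually traverses history. The function name `new_objects` is made up; the blob ids are the ones from the example above.

```python
def new_objects(history, already_have=()):
    """For each {path: blob_id} snapshot, list the blobs not seen before."""
    seen = set(already_have)
    result = []
    for snapshot in history:
        fresh = []
        for path, blob in sorted(snapshot.items()):
            if blob not in seen:
                seen.add(blob)
                fresh.append((blob, path))
        result.append(fresh)
    return result


history = [
    {"README": "162441d0"},                               # 1/3 README fancier
    {"README": "162441d0", ".gitignore": "32682119"},     # 2/3 ignore vim swapfiles
    {"README.txt": "162441d0", ".gitignore": "32682119"}, # 3/3 README is now README.txt
]
for fresh in new_objects(history):
    print(fresh)
# [('162441d0', 'README')]
# [('32682119', '.gitignore')]
# []
```

The third commit contributes no new blob because README.txt points at the blob that was already listed under its old name, which matches the empty 3/3 entry in the listing above.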

Dale
