Re: Get all tips quickly

2014-04-14 Thread Kirill Likhodedov

Hi Michael, Ævar,

Thank you very much for your answers.

Both 'git show-ref' and 'git for-each-ref' are about twice as fast as 'git log
--branches --tags --remotes' with warmed-up FS caches, but take the same time
on a "cold" FS.

It seems that all these approaches internally walk down from all references. 

So, given that there is no way to get just the tips (I was actually hoping that Git
might be storing them somewhere else), we will stick with 'git show-ref -d'.

Thanks a lot,
Kirill.



--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Get all tips quickly

2014-04-13 Thread Ævar Arnfjörð Bjarmason
On Sun, Apr 13, 2014 at 4:19 PM, Kirill Likhodedov
kirill.likhode...@jetbrains.com wrote:
> Hi,
>
> What is the fastest possible way to get all "tips" (leaves of the Git log graph)
> in a Git repository with hashes of the commits they point to?

Tried git for-each-ref and the various options it has?

Doing this for 35k tags is still going to be non-trivial.
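
For illustration, here is a minimal sketch of how `git for-each-ref` could list every reference together with the commit it resolves to. It relies on the `%(objecttype)`, `%(objectname)` and `%(refname)` format atoms and their `*`-prefixed variants, which describe the object an annotated tag points to. The function names `keep_commits` and `list_commit_refs` are hypothetical, and nothing here is a measurement on a 35k-tag repository.

```shell
# Filter lines of the form
#   <type> <sha> [<peeled-type> <peeled-sha>] <refname>
# and print "<commit-sha> <refname>" for refs that resolve to a commit.
# Refs pointing at trees or blobs are dropped, and so is any tag whose
# reported peeled type is not "commit".
keep_commits() {
    awk '
        $1 == "commit"                { print $2, $NF }
        $1 == "tag" && $3 == "commit" { print $4, $NF }
    '
}

# Run inside a repository. Optional arguments are for-each-ref patterns.
list_commit_refs() {
    git for-each-ref \
        --format='%(objecttype) %(objectname) %(*objecttype) %(*objectname) %(refname)' \
        "$@" |
    keep_commits
}
```

Calling `list_commit_refs refs/tags` inside a repository would restrict the listing to tags. For branches and lightweight tags the two `*` fields are empty, which is why the filter reads the reference name from the last field.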


Re: Get all tips quickly

2014-04-13 Thread Michael Haggerty
On 04/13/2014 04:19 PM, Kirill Likhodedov wrote:
> What is the fastest possible way to get all "tips" (leaves of the Git log
> graph) in a Git repository with hashes of the commits they point to?
>
> We at JetBrains are tuning the performance of the Git log integration in
> IntelliJ IDEA and want to get all tips as fast as possible. Currently
> we use `git log --branches --tags --remotes --no-walk`, but the
> problem is that it is slow if there are a lot of references. In our
> repository we have about 35K tags, and therefore the tags are the main
> slowdown. On the other hand, we have just a couple of dozen tips
> as well as branch references, and `git log --branches --remotes` is
> very fast.
>
> So we are searching for a way to get the tags pointing to the graph leaves
> faster.

The fastest way to get all references plus the commits that are pointed
at by annotated tags would probably be `git show-ref -d`.  The
funny-looking entries like refs/tags/v1.7.0^{} are the annotated tags
peeled to the object that they ultimately refer to.  But this command
doesn't tell you the types of the objects, and there can be trees and blobs
mixed in.
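
As a rough sketch of how that output could be consumed, the filter below reads `git show-ref -d` output on stdin and prints one line per reference, preferring the peeled `^{}` entry when there is one. The function name `peel_show_ref` is made up for this example, and it assumes each peeled entry comes after the plain entry for the same tag. It still cannot tell commits from trees or blobs.

```shell
# Read "<sha> <refname>" lines, as printed by `git show-ref -d`, and
# print "<sha> <refname>" once per ref, using the peeled SHA-1 if present.
peel_show_ref() {
    awk '
        {
            name = $2
            # Strip a trailing "^{}" marker, if present, to recover the ref name.
            if (length(name) > 3 && substr(name, length(name) - 2) == "^{}")
                name = substr(name, 1, length(name) - 3)
            if (!(name in sha)) order[++n] = name
            sha[name] = $1      # a later (peeled) entry overrides the tag object
        }
        END { for (i = 1; i <= n; i++) print sha[order[i]], order[i] }
    '
}
```

Inside a repository it would be used as `git show-ref -d | peel_show_ref`.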

If your question is also to figure out the minimum set of references
that are needed to include all tips (i.e., commits with no descendants),
then the answer is trickier.  There is a command that should do what you
say:

    git merge-base --independent <commit>...

but (1) with a lot of references, your arguments wouldn't all fit on the
command line (recursive use of xargs might be needed), (2) I don't know
if merge-base --independent is programmed to work efficiently on so
many inputs, and (3) I don't know of a cheap way of getting a list of
all commits referred to by references (i.e., dereferencing annotated
tags but ignoring references/annotated tags that refer to trees or blobs).


Another approach is to start by finding the leaf commits by SHA-1.  You
can do this by listing all commits, and listing all commits' parents,
and then finding the objects that appear in the first list but not the
second.  This could look like

    comm -23 \
        <(git log --all --pretty=format:'%H' | sort -u) \
        <(git log --all --pretty=format:'%P' | tr ' ' '\n' | sort -u)
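
The same set difference can probably be computed in a single history walk. The sketch below assumes the input is what `git rev-list --all --parents` prints, one line per commit with the commit first and its parents after it. The function name `find_leaves` is hypothetical, and the approach keeps every commit hash in memory.

```shell
# Read "<commit> [<parent>...]" lines and print the commits that never
# appear as a parent of another commit, i.e. the leaves of the graph.
find_leaves() {
    awk '
        {
            commits[$1] = 1
            for (i = 2; i <= NF; i++) parents[$i] = 1
        }
        END {
            for (c in commits)
                if (!(c in parents)) print c
        }
    ' | sort
}
```

Inside a repository it would be used as `git rev-list --all --parents | find_leaves`.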

If you want reference names corresponding to these SHA-1s, you could use
name-rev to convert the SHA-1s into refnames:

    git rev-parse --symbolic-full-name $(
        comm -23 \
            <(git log --all --pretty=format:'%H' | sort -u) \
            <(git log --all --pretty=format:'%P' | tr ' ' '\n' | sort -u) |
        git name-rev --stdin --name-only
    )

The rev-parse --symbolic-full-name is needed because name-rev seems
able to emit only abbreviated reference names.


In practice, you might want to cache some of the results to avoid having
to do a full history traversal every time.
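
One possible shape for such a cache, as a sketch only: store the result next to a fingerprint of the reference list and recompute only when the fingerprint changes. The helper name `cached` and the cache file layout (fingerprint on the first line, result after it) are assumptions made for this example.

```shell
# usage: cached <cache_file> <fingerprint> <command> [<args>...]
# Print the cached result if it was stored under the same fingerprint;
# otherwise run the command, store its output, and print it.
cached() {
    local cache_file=$1 fingerprint=$2
    shift 2
    if [ -f "$cache_file" ] &&
       [ "$(head -n 1 "$cache_file")" = "$fingerprint" ]; then
        tail -n +2 "$cache_file"
    else
        # Write to a temporary file first so that a failed command
        # never replaces a good cache.
        { printf '%s\n' "$fingerprint"; "$@"; } > "$cache_file.tmp" &&
        mv "$cache_file.tmp" "$cache_file" &&
        tail -n +2 "$cache_file"
    fi
}
```

The fingerprint could be something like `"$(git show-ref | cksum)"`, on the assumption that the set of leaves can only change when some reference changes. A detached HEAD is not covered by that fingerprint.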

> We also tried to read tags by manually parsing .git files (it is
> faster than invoking git log), but unfortunately annotated tags in
> .git/refs/tags/ are written without the hashes they point to (unlike
> .git/packed-refs).

I strongly recommend against parsing these files yourselves.  Your
software would not be robust against any future changes to the file
formats etc.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/