Re: is there a fast web-interface to git for huge repos?

2013-06-14 Thread Holger Hellmuth (IKS)

On 07.06.2013 22:21, Constantine A. Murenin wrote:

I'm totally fine with daily updates; but I think there still has to be
some better way of doing this than wasting 0.5s of CPU time and 5s of
HDD time (if completely cold) for each blame / log, at the price of
more storage and some pre-caching, and (daily (in my use-case))
fine-grained incremental updates.


To get a feel for the numbers: I would guess 'git blame' is mostly run 
against the newest version and the release version of a file, right? I 
couldn't find the number of files in BSD, so let's take Linux instead: 
that is 25k files for version 2.6.27. Let's say 35k files altogether for 
both release and newer versions of the files.


A typical page of git blame output on GitHub seems to be in the vicinity 
of 500 kbytes, but that seems to include lots of overhead for convenience 
features. At least that means it is a good upper bound.


35k files times 500 kbytes gives 17.5 Gbytes, a trivial size for a static 
*disk*-based cache. It is also a manageable size for affordable SSDs.
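
The estimate above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the file count and page size are the guesses from this message, the function name is made up for illustration, and decimal units (1 kbyte = 1000 bytes) are assumed.

```python
# Back-of-the-envelope size of a static blame-page cache.
# Both inputs are the rough guesses quoted in the message, not measurements.
FILES = 35_000            # release + newest versions of the files
PAGE_BYTES = 500 * 1000   # ~500 kbytes per rendered page (upper bound)


def cache_size_gb(files: int, page_bytes: int) -> float:
    """Total cache size in decimal gigabytes (hypothetical helper)."""
    return files * page_bytes / 1e9


if __name__ == "__main__":
    print(f"{cache_size_gb(FILES, PAGE_BYTES):.1f} GB")  # 17.5 GB
```

A smaller per-page size (for example plain text instead of a full HTML page) scales the result down linearly.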

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: is there a fast web-interface to git for huge repos?

2013-06-07 Thread Fredrik Gustafsson
On Thu, Jun 06, 2013 at 06:35:43PM -0700, Constantine A. Murenin wrote:
 I'm interested in running a web interface to this and other similar
 git repositories (FreeBSD and NetBSD git repositories are even much,
 much bigger).
 
 Software-wise, is there no way to make cold access for git-log and
 git-blame to be orders of magnitude less than ~5s, and warm access
 less than ~0.5s?

The obvious way would be to cache the results. You can even put an
update-cache hook in the git repositories so that the cache is always
up to date.

There are some dynamic web frontends like cgit and gitweb out there, but
there are also static ones like git-arr ( http://blitiri.com.ar/p/git-arr/
) that might be more of an option for you.

-- 
Best regards
Fredrik Gustafsson

tel: 0733-608274
e-mail: iv...@iveqy.com


Re: is there a fast web-interface to git for huge repos?

2013-06-07 Thread Constantine A. Murenin
On 6 June 2013 23:33, Fredrik Gustafsson iv...@iveqy.com wrote:
 On Thu, Jun 06, 2013 at 06:35:43PM -0700, Constantine A. Murenin wrote:
 I'm interested in running a web interface to this and other similar
 git repositories (FreeBSD and NetBSD git repositories are even much,
 much bigger).

 Software-wise, is there no way to make cold access for git-log and
 git-blame to be orders of magnitude less than ~5s, and warm access
 less than ~0.5s?

 The obvious way would be to cache the results. You can even put an

That would do nothing to prevent slowness of the cold requests, which
already run for 5s when completely cold.

In fact, unless done right, it would actually slow things down, as
lines would not necessarily show up as they're ready.

 update-cache hook in the git repositories to make the cache always be up to
 date.

That's entirely inefficient.  It'll probably take hours or days to
pre-cache all the html pages with a naive wget and the list of all the
files.  Not a solution at all.

(0.5s x 35k files = about 5 hours of CPU time for one of log/blame, plus
another 5h for the other)
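
As a rough check of that figure, the arithmetic can be written out as below. This is a sketch only: 0.5 s per page is the warm timing quoted in the thread, 35k is the file-count guess, the function name is hypothetical, and the work is assumed to run serially on one core.

```python
def precache_hours(files: int, seconds_per_page: float, page_kinds: int = 1) -> float:
    """Serial CPU time, in hours, to render `page_kinds` pages for every file."""
    return files * seconds_per_page * page_kinds / 3600


if __name__ == "__main__":
    # One pass (log only, or blame only).
    print(round(precache_hours(35_000, 0.5), 2))     # 4.86
    # Two passes (log and blame).
    print(round(precache_hours(35_000, 0.5, 2), 2))  # 9.72
```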

 There's some dynamic web frontends like cgit and gitweb out there but
 there's also static ones like git-arr ( http://blitiri.com.ar/p/git-arr/
 ) that might be more of an option to you.

The concept for git-arr looks interesting, but it has neither blame
nor log, so, it's kinda pointless, because the whole thing that's slow
is exactly blame and log.

There has to be some way to improve these matters.  No one wants to
wait 5 seconds until a page is generated; we're not running enterprise
software here, latency is important!

C.


Re: is there a fast web-interface to git for huge repos?

2013-06-07 Thread Fredrik Gustafsson
On Fri, Jun 07, 2013 at 10:05:37AM -0700, Constantine A. Murenin wrote:
 On 6 June 2013 23:33, Fredrik Gustafsson iv...@iveqy.com wrote:
  On Thu, Jun 06, 2013 at 06:35:43PM -0700, Constantine A. Murenin wrote:
  I'm interested in running a web interface to this and other similar
  git repositories (FreeBSD and NetBSD git repositories are even much,
  much bigger).
 
  Software-wise, is there no way to make cold access for git-log and
  git-blame to be orders of magnitude less than ~5s, and warm access
  less than ~0.5s?
 
  The obvious way would be to cache the results. You can even put an
 
 That would do nothing to prevent slowness of the cold requests, which
 already run for 5s when completely cold.
 
 In fact, unless done right, it would actually slow things down, as
 lines would not necessarily show up as they're ready.

You need to cache this _before_ the web request. Don't let the web
request trigger a cache update; let a git push to the repository
trigger it instead.
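
A minimal sketch of that split between "fill on push" and "read on request" might look like this. Everything here is an illustrative assumption rather than part of any existing tool: the function names, the on-disk layout, and the `render` callback that would wrap the actual `git blame` or `git log` call. In practice `on_push` would be called from something like a post-receive hook.

```python
import hashlib
from pathlib import Path
from typing import Callable, Iterable, Optional


def cache_path(root: Path, commit: str, path: str, kind: str) -> Path:
    """Location of one pre-rendered page; kind is e.g. 'blame' or 'log'."""
    key = "\n".join((kind, commit, path)).encode()
    digest = hashlib.sha1(key).hexdigest()
    return root / digest[:2] / digest[2:]


def on_push(root: Path, commit: str, paths: Iterable[str], kind: str,
            render: Callable[[str, str], str]) -> None:
    """Push side: render and store every page before anyone asks for it."""
    for path in paths:
        target = cache_path(root, commit, path, kind)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(render(commit, path))


def serve(root: Path, commit: str, path: str, kind: str) -> Optional[str]:
    """Request side: read-only lookup that never runs git itself."""
    target = cache_path(root, commit, path, kind)
    return target.read_text() if target.exists() else None
```

The point of the design is that `serve` has no slow path: a miss returns None instead of triggering a render, so request latency does not depend on git.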

 
  update-cache hook in the git repositories to make the cache always be up to
  date.
 
 That's entirely inefficient.  It'll probably take hours or days to
 pre-cache all the html pages with a naive wget and the list of all the
 files.  Not a solution at all.
 
 (0.5s x 35k files = 5 hours for log/blame, plus another 5h of cpu time
 for blame/log)

That's a one-time penalty. Why would that be a problem? And why is wget
even mentioned? Did we misunderstand each other?

 
  There's some dynamic web frontends like cgit and gitweb out there but
  there's also static ones like git-arr ( http://blitiri.com.ar/p/git-arr/
  ) that might be more of an option to you.
 
 The concept for git-arr looks interesting, but it has neither blame
 nor log, so, it's kinda pointless, because the whole thing that's slow
 is exactly blame and log.
 
 There has to be some way to improve these matters.  No one wants to
 wait 5 seconds until a page is generated, we're not running enterprise
 software here, latency is important!
 
 C.

Git's internal structures make blame in particular pretty expensive.
There's nothing you can really do about it algorithm-wise (as far as I
know; if there were, people would already have improved it).

The solution here is to have a hot repository to speed up things.

There are of course little things you can do. I imagine that using git
repack in a sane way could probably speed things up, as well as git gc.

-- 
Best regards
Fredrik Gustafsson

tel: 0733-608274
e-mail: iv...@iveqy.com


Re: is there a fast web-interface to git for huge repos?

2013-06-07 Thread Constantine A. Murenin
On 7 June 2013 10:57, Fredrik Gustafsson iv...@iveqy.com wrote:
 On Fri, Jun 07, 2013 at 10:05:37AM -0700, Constantine A. Murenin wrote:
 On 6 June 2013 23:33, Fredrik Gustafsson iv...@iveqy.com wrote:
  On Thu, Jun 06, 2013 at 06:35:43PM -0700, Constantine A. Murenin wrote:
  I'm interested in running a web interface to this and other similar
  git repositories (FreeBSD and NetBSD git repositories are even much,
  much bigger).
 
  Software-wise, is there no way to make cold access for git-log and
  git-blame to be orders of magnitude less than ~5s, and warm access
  less than ~0.5s?
 
  The obvious way would be to cache the results. You can even put an

 That would do nothing to prevent slowness of the cold requests, which
 already run for 5s when completely cold.

 In fact, unless done right, it would actually slow things down, as
 lines would not necessarily show up as they're ready.

 You need to cache this _before_ the web-request. Don't let the
 web-request trigger a cache-update but a git push to the repository.


  update-cache hook in the git repositories to make the cache always be up to
  date.

 That's entirely inefficient.  It'll probably take hours or days to
 pre-cache all the html pages with a naive wget and the list of all the
 files.  Not a solution at all.

 (0.5s x 35k files = 5 hours for log/blame, plus another 5h of cpu time
 for blame/log)

 That's a one-time penalty. Why would that be a problem? And why is wget
 even mentioned? Did we misunderstand each other?

`wget` or `curl --head` would be used to trigger the caching.

I don't understand how it's a one-time penalty.  No one wants to look
at an old copy of the repository, so if, say, I want to have a gitweb
of all 4 BSDs, updated daily, then even with lots of RAM (e.g. to
eliminate the cold-case 5s penalty and reduce each page to 0.5s), on a
quad-core box, I'd be lucky to complete a generation of all the pages
within 12h or so, obviously using the machine at, or above, 50%
capacity just for the caching.  Or several days or even a couple of
weeks on an Intel Atom or VIA Nano with 2GB of RAM or so.  Obviously
not acceptable; there has to be a better solution.

One could, I guess, regenerate only the pages which have changed, but
it still sounds like an ugly solution, where you'd have to generate a
list of files that have changed between one generation and the next,
and you'd still have very high CPU, cache and storage requirements.
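
For what it's worth, the list of files changed between two generations is something git can produce directly. The sketch below assumes each generation is identified by a commit id and that a page only needs regenerating when its file differs between the two commits; the function names are hypothetical, while `git diff --name-only -z` is a standard git invocation.

```python
import subprocess
from typing import List


def parse_name_only(output: str) -> List[str]:
    """Split NUL-separated `git diff --name-only -z` output into paths."""
    return [p for p in output.split("\0") if p]


def changed_paths(repo: str, old: str, new: str) -> List[str]:
    """Paths that differ between two commits (needs git on PATH)."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "-z", old, new],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return parse_name_only(result.stdout)
```

With a daily update, only the paths returned by `changed_paths(repo, yesterday_commit, today_commit)` would need their pages regenerated, which is normally a small fraction of the tree.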

C.

  There's some dynamic web frontends like cgit and gitweb out there but
  there's also static ones like git-arr ( http://blitiri.com.ar/p/git-arr/
  ) that might be more of an option to you.

 The concept for git-arr looks interesting, but it has neither blame
 nor log, so, it's kinda pointless, because the whole thing that's slow
 is exactly blame and log.

 There has to be some way to improve these matters.  No one wants to
 wait 5 seconds until a page is generated, we're not running enterprise
 software here, latency is important!

 C.

 Git's internal structures make just blame pretty expensive. There's
 nothing you really can do for it algorithm-wise (as far as I know, if
 there was, people would already have improved it).

 The solution here is to have a hot repository to speed up things.

 There's of course little things you can do. I imagine that using git
 repack in a sane way probably could speed things up, as well as git gc.

 --
 Best regards
 Fredrik Gustafsson

 tel: 0733-608274
 e-mail: iv...@iveqy.com


is there a fast web-interface to git for huge repos?

2013-06-06 Thread Constantine A. Murenin
Hi,

On a relatively idle Intel Core i7 975 @ 3.33GHz (quad-core):

Cns# cd DragonFly/

Cns# time git log sys/sys/sockbuf.h > /dev/null
0.540u 0.140s 0:04.30 15.8% 0+0k 2754+55io 6484pf+0w
Cns# time git log sys/sys/sockbuf.h > /dev/null
0.000u 0.030s 0:00.52 5.7%  0+0k 0+0io 0pf+0w
Cns# time git log sys/sys/sockbuf.h > /dev/null
0.180u 0.020s 0:00.52 38.4% 0+0k 0+2io 0pf+0w
Cns# time git log sys/sys/sockbuf.h > /dev/null
0.420u 0.020s 0:00.52 84.6% 0+0k 0+0io 0pf+0w

And, right away, a semi-cold git-blame:

Cns# time git blame sys/sys/sockbuf.h > /dev/null
0.340u 0.040s 0:01.91 19.8% 0+0k 769+45io 2078pf+0w
Cns# time git blame sys/sys/sockbuf.h > /dev/null
0.340u 0.010s 0:00.36 97.2% 0+0k 0+2io 0pf+0w
Cns# time git blame sys/sys/sockbuf.h > /dev/null
0.310u 0.040s 0:00.36 97.2% 0+0k 0+0io 0pf+0w
Cns# time git blame sys/sys/sockbuf.h > /dev/null
0.310u 0.050s 0:00.36 100.0% 0+0k 0+0io 0pf+0w
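
To make the cold/warm difference in these timings explicit, the lines can be split into CPU time and time spent waiting (mostly disk I/O in the cold case). The sketch below assumes the csh-style layout shown above (user, system, elapsed as m:ss.ss); the helper names are made up for illustration.

```python
import re
from typing import Tuple

# Matches the start of a csh `time` line: "<user>u <sys>s <min>:<sec>".
TIME_RE = re.compile(
    r"(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s\s+(?P<min>\d+):(?P<sec>[\d.]+)"
)


def parse_time_line(line: str) -> Tuple[float, float]:
    """Return (cpu_seconds, elapsed_seconds) for one timing line."""
    m = TIME_RE.match(line.strip())
    if m is None:
        raise ValueError("unrecognised time line: %r" % line)
    cpu = float(m["user"]) + float(m["sys"])
    elapsed = 60 * int(m["min"]) + float(m["sec"])
    return cpu, elapsed


def waiting_seconds(line: str) -> float:
    """Elapsed time not accounted for by CPU use."""
    cpu, elapsed = parse_time_line(line)
    return elapsed - cpu


if __name__ == "__main__":
    cold = "0.540u 0.140s 0:04.30 15.8% 0+0k 2754+55io 6484pf+0w"
    warm = "0.420u 0.020s 0:00.52 84.6% 0+0k 0+0io 0pf+0w"
    print(f"cold git log: {waiting_seconds(cold):.2f}s waiting")  # 3.62s
    print(f"warm git log: {waiting_seconds(warm):.2f}s waiting")  # 0.08s
```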


I'm interested in running a web interface to this and other similar
git repositories (FreeBSD and NetBSD git repositories are even much,
much bigger).

Software-wise, is there no way to make cold access for git-log and
git-blame orders of magnitude faster than ~5s, and warm access faster
than ~0.5s?

C.