Re: is there a fast web-interface to git for huge repos?
On 07.06.2013 22:21, Constantine A. Murenin wrote:
> I'm totally fine with daily updates; but I think there still has to be
> some better way of doing this than wasting 0.5s of CPU time and 5s of
> HDD time (if completely cold) for each blame / log, at the price of
> more storage and some pre-caching, and (daily (in my use-case))
> fine-grained incremental updates.

To get a feel for the numbers: I would guess 'git blame' is mostly run
against the newest version and the release version of a file, right?

I couldn't find the number of files in BSD, so let's take Linux
instead: that is 25k files for version 2.6.27. Let's say 35k files
altogether for both release and newer versions of the files.

A typical page of git blame output on GitHub seems to be in the
vicinity of 500 kbytes, but that seems to include lots of overhead for
comfort functions. At least that means it is a good upper bound.

35k files times 500k gives 17.5 Gbytes, a trivial value for a static
*disk* based cache. It is also a manageable value for affordable SSDs.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
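[The sizing estimate above can be reproduced with plain shell arithmetic; the 35k-file and 500 KB/page figures are the assumptions from the message, not measured values:]

```shell
# Back-of-envelope cache sizing, using the thread's assumed figures:
# 35,000 rendered blame pages at ~500 KB each.
files=35000
kb_per_page=500
total_kb=$((files * kb_per_page))
printf '%s GB\n' $((total_kb / 1000 / 1000))   # prints "17 GB" (~17.5 GB before truncation)
```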
Re: is there a fast web-interface to git for huge repos?
On Thu, Jun 06, 2013 at 06:35:43PM -0700, Constantine A. Murenin wrote:
> I'm interested in running a web interface to this and other similar
> git repositories (FreeBSD and NetBSD git repositories are even much,
> much bigger).
>
> Software-wise, is there no way to make cold access for git-log and
> git-blame be orders of magnitude less than ~5s, and warm access less
> than ~0.5s?

The obvious way would be to cache the results. You can even put an
update-cache hook in the git repositories to make the cache always be
up to date.

There are some dynamic web frontends like cgit and gitweb out there,
but there are also static ones like git-arr
( http://blitiri.com.ar/p/git-arr/ ) that might be more of an option
for you.

--
Best regards,
Fredrik Gustafsson
tel: 0733-608274
e-mail: iv...@iveqy.com
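[A minimal sketch of the hook idea suggested above, written as a post-receive hook helper. The cache directory and the plain-text "rendering" are assumptions for illustration; a real frontend would emit HTML instead:]

```shell
# Hypothetical post-receive hook body: pre-render `git blame` for every
# file touched by a push, so later web requests hit a warm cache.
# CACHE_DIR is an assumed location, not part of git or any frontend.
CACHE_DIR=${CACHE_DIR:-/var/cache/git-blame}

# Reads "oldrev newrev refname" lines on stdin, exactly as git feeds
# them to hooks/post-receive.
prerender_blame() {
    while read -r oldrev newrev refname; do
        git diff --name-only "$oldrev" "$newrev" |
        while IFS= read -r path; do
            mkdir -p "$CACHE_DIR/$(dirname "$path")"
            # Plain-text blame as a stand-in for a rendered page; drop
            # the cache entry if the path no longer exists at newrev.
            git blame "$newrev" -- "$path" > "$CACHE_DIR/$path.blame" 2>/dev/null ||
                rm -f "$CACHE_DIR/$path.blame"
        done
    done
}
```

Installed as `hooks/post-receive` (with a `prerender_blame` call at the end), this only re-renders the paths a push actually changed, which is what keeps the cache cheap to maintain.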
Re: is there a fast web-interface to git for huge repos?
On 6 June 2013 23:33, Fredrik Gustafsson <iv...@iveqy.com> wrote:
> On Thu, Jun 06, 2013 at 06:35:43PM -0700, Constantine A. Murenin wrote:
>> I'm interested in running a web interface to this and other similar
>> git repositories (FreeBSD and NetBSD git repositories are even much,
>> much bigger).
>>
>> Software-wise, is there no way to make cold access for git-log and
>> git-blame be orders of magnitude less than ~5s, and warm access less
>> than ~0.5s?
>
> The obvious way would be to cache the results. You can even put an

That would do nothing to prevent slowness of the cold requests, which
already run for 5s when completely cold. In fact, unless done right, it
would actually slow things down, as lines would not necessarily show up
as they're ready.

> update-cache hook in the git repositories to make the cache always be
> up to date.

That's entirely inefficient. It'll probably take hours or days to
pre-cache all the html pages with a naive wget and the list of all the
files. Not a solution at all. (0.5s x 35k files = 5 hours for
log/blame, plus another 5h of CPU time for blame/log.)

> There are some dynamic web frontends like cgit and gitweb out there,
> but there are also static ones like git-arr
> ( http://blitiri.com.ar/p/git-arr/ ) that might be more of an option
> for you.

The concept of git-arr looks interesting, but it has neither blame nor
log, so it's kinda pointless, because the whole thing that's slow is
exactly blame and log.

There has to be some way to improve these matters. No one wants to wait
5 seconds until a page is generated; we're not running enterprise
software here, latency is important!

C.
Re: is there a fast web-interface to git for huge repos?
On Fri, Jun 07, 2013 at 10:05:37AM -0700, Constantine A. Murenin wrote:
> On 6 June 2013 23:33, Fredrik Gustafsson <iv...@iveqy.com> wrote:
>> On Thu, Jun 06, 2013 at 06:35:43PM -0700, Constantine A. Murenin wrote:
>>> I'm interested in running a web interface to this and other similar
>>> git repositories (FreeBSD and NetBSD git repositories are even much,
>>> much bigger).
>>>
>>> Software-wise, is there no way to make cold access for git-log and
>>> git-blame be orders of magnitude less than ~5s, and warm access less
>>> than ~0.5s?
>>
>> The obvious way would be to cache the results. You can even put an
>
> That would do nothing to prevent slowness of the cold requests, which
> already run for 5s when completely cold. In fact, unless done right,
> it would actually slow things down, as lines would not necessarily
> show up as they're ready.

You need to cache this _before_ the web-request. Don't let the
web-request trigger a cache update, but a git push to the repository.

>> update-cache hook in the git repositories to make the cache always be
>> up to date.
>
> That's entirely inefficient. It'll probably take hours or days to
> pre-cache all the html pages with a naive wget and the list of all
> the files. Not a solution at all. (0.5s x 35k files = 5 hours for
> log/blame, plus another 5h of CPU time for blame/log.)

That's a one-time penalty. Why would that be a problem? And why is wget
even mentioned? Did we misunderstand each other?

>> There are some dynamic web frontends like cgit and gitweb out there,
>> but there are also static ones like git-arr
>> ( http://blitiri.com.ar/p/git-arr/ ) that might be more of an option
>> for you.
>
> The concept of git-arr looks interesting, but it has neither blame
> nor log, so it's kinda pointless, because the whole thing that's slow
> is exactly blame and log.
>
> There has to be some way to improve these matters. No one wants to
> wait 5 seconds until a page is generated; we're not running
> enterprise software here, latency is important!
>
> C.

Git's internal structures make just blame pretty expensive. There's
nothing you really can do for it algorithm-wise (as far as I know; if
there were, people would already have improved it). The solution here
is to have a hot repository to speed things up.

There are of course little things you can do. I imagine that using git
repack in a sane way could probably speed things up, as well as git gc.

--
Best regards,
Fredrik Gustafsson
tel: 0733-608274
e-mail: iv...@iveqy.com
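[The repack/gc suggestion above might look something like the following maintenance pass; the window/depth values are illustrative, not tuned recommendations:]

```shell
# Maintenance pass intended to help cold log/blame times by
# consolidating history into a single, aggressively delta-packed
# packfile and pruning loose objects afterwards.
optimize_repo() {
    git repack -a -d -f --window=250 --depth=250 &&
    git gc --prune=now
}
```

Run from inside the repository in question; larger `--window`/`--depth` trade longer repack time for better deltas and fewer disk seeks on cold reads.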
Re: is there a fast web-interface to git for huge repos?
On 7 June 2013 10:57, Fredrik Gustafsson <iv...@iveqy.com> wrote:
> On Fri, Jun 07, 2013 at 10:05:37AM -0700, Constantine A. Murenin wrote:
>> On 6 June 2013 23:33, Fredrik Gustafsson <iv...@iveqy.com> wrote:
>>> On Thu, Jun 06, 2013 at 06:35:43PM -0700, Constantine A. Murenin wrote:
>>>> I'm interested in running a web interface to this and other similar
>>>> git repositories (FreeBSD and NetBSD git repositories are even
>>>> much, much bigger).
>>>>
>>>> Software-wise, is there no way to make cold access for git-log and
>>>> git-blame be orders of magnitude less than ~5s, and warm access
>>>> less than ~0.5s?
>>>
>>> The obvious way would be to cache the results. You can even put an
>>
>> That would do nothing to prevent slowness of the cold requests, which
>> already run for 5s when completely cold. In fact, unless done right,
>> it would actually slow things down, as lines would not necessarily
>> show up as they're ready.
>
> You need to cache this _before_ the web-request. Don't let the
> web-request trigger a cache update, but a git push to the repository.
>
>>> update-cache hook in the git repositories to make the cache always
>>> be up to date.
>>
>> That's entirely inefficient. It'll probably take hours or days to
>> pre-cache all the html pages with a naive wget and the list of all
>> the files. Not a solution at all. (0.5s x 35k files = 5 hours for
>> log/blame, plus another 5h of CPU time for blame/log.)
>
> That's a one-time penalty. Why would that be a problem? And why is
> wget even mentioned? Did we misunderstand each other?

`wget` or `curl --head` would be used to trigger the caching.

I don't understand how it's a one-time penalty. No one wants to look at
an old copy of the repository, so, pretty much, if, say, I want to have
a gitweb of all 4 BSDs, updated daily, then, even with lots of RAM
(e.g. to eliminate the cold-case 5s penalty, and reduce each page to
0.5s), on a quad-core box, I'd be lucky to complete a generation of all
the pages within 12h or so, obviously using the machine at, or above,
50% capacity just for the caching. Or several days, or even a couple of
weeks, on an Intel Atom or VIA Nano with 2GB of RAM or so. Obviously
not acceptable; there has to be a better solution.

One could, I guess, only regenerate the pages which have changed, but
it still sounds like an ugly solution, where you'd have to be
generating a list of files that have changed between one generation and
the next, and you'd still have very high CPU, cache and storage
requirements.

C.

>>> There are some dynamic web frontends like cgit and gitweb out there,
>>> but there are also static ones like git-arr
>>> ( http://blitiri.com.ar/p/git-arr/ ) that might be more of an option
>>> for you.
>>
>> The concept of git-arr looks interesting, but it has neither blame
>> nor log, so it's kinda pointless, because the whole thing that's slow
>> is exactly blame and log.
>>
>> There has to be some way to improve these matters. No one wants to
>> wait 5 seconds until a page is generated; we're not running
>> enterprise software here, latency is important!
>
> Git's internal structures make just blame pretty expensive. There's
> nothing you really can do for it algorithm-wise (as far as I know; if
> there were, people would already have improved it). The solution here
> is to have a hot repository to speed things up.
>
> There are of course little things you can do. I imagine that using git
> repack in a sane way could probably speed things up, as well as git
> gc.
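[The "list of files that have changed between one generation and the next" is something git can produce directly, which makes the incremental approach less ugly than it sounds. A sketch, assuming a hypothetical `.cache` layout for the rendered pages:]

```shell
# Daily incremental refresh (sketch; the .cache layout is an assumption):
# regenerate blame pages only for paths changed since the last cached rev.
refresh_cache() {
    last=$(cat .cache/last-rev)
    now=$(git rev-parse HEAD)
    git diff --name-only "$last" "$now" |
    while IFS= read -r path; do
        mkdir -p ".cache/blame/$(dirname "$path")"
        # Plain-text blame as a stand-in for a rendered page; drop the
        # entry if the path was deleted between the two revisions.
        git blame "$now" -- "$path" > ".cache/blame/$path.txt" 2>/dev/null ||
            rm -f ".cache/blame/$path.txt"
    done
    echo "$now" > .cache/last-rev
}
```

The cost per day is then proportional to the day's churn, not to the 35k-file total, so the 5-hour full-regeneration penalty is paid only once.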
is there a fast web-interface to git for huge repos?
Hi,

On a relatively-empty Intel Core i7 975 @ 3.33GHz (quad-core):

Cns# cd DragonFly/
Cns# time git log sys/sys/sockbuf.h > /dev/null
0.540u 0.140s 0:04.30 15.8%     0+0k 2754+55io 6484pf+0w
Cns# time git log sys/sys/sockbuf.h > /dev/null
0.000u 0.030s 0:00.52 5.7%      0+0k 0+0io 0pf+0w
Cns# time git log sys/sys/sockbuf.h > /dev/null
0.180u 0.020s 0:00.52 38.4%     0+0k 0+2io 0pf+0w
Cns# time git log sys/sys/sockbuf.h > /dev/null
0.420u 0.020s 0:00.52 84.6%     0+0k 0+0io 0pf+0w

And, right away, a semi-cold git-blame:

Cns# time git blame sys/sys/sockbuf.h > /dev/null
0.340u 0.040s 0:01.91 19.8%     0+0k 769+45io 2078pf+0w
Cns# time git blame sys/sys/sockbuf.h > /dev/null
0.340u 0.010s 0:00.36 97.2%     0+0k 0+2io 0pf+0w
Cns# time git blame sys/sys/sockbuf.h > /dev/null
0.310u 0.040s 0:00.36 97.2%     0+0k 0+0io 0pf+0w
Cns# time git blame sys/sys/sockbuf.h > /dev/null
0.310u 0.050s 0:00.36 100.0%    0+0k 0+0io 0pf+0w

I'm interested in running a web interface to this and other similar
git repositories (FreeBSD and NetBSD git repositories are even much,
much bigger).

Software-wise, is there no way to make cold access for git-log and
git-blame be orders of magnitude less than ~5s, and warm access less
than ~0.5s?

C.