Re: [git-users] What is the maximum number of repositories a git server can have

2016-04-05 Thread display_name_taken
Thanks for the suggestions.
Definitely will evaluate them as well.
Cheers


-- 
You received this message because you are subscribed to the Google Groups "Git 
for human beings" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to git-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [git-users] What is the maximum number of repositories a git server can have

2016-04-05 Thread Konstantin Khomoutov
On Mon, 4 Apr 2016 13:50:26 -0700 (PDT)
display_name_taken  wrote:

> Thanks for the post Konstantin.
> There won't be many concurrent users (say 10) hence not many cloning
> at the same time.
> My main requirement was using git as some type of storage space with 
> versioning capability.

Well, Git is optimized for handling small-to-medium-sized files and
assumes they don't change a lot between commits (this is
understandable once you consider the use case it was created --
and tailored -- for: working on the Linux kernel source code).

Git's capability of "diffing" (comparing in a human-sensible way)
the contents of files recorded in different commits also relies
on these files being textual, where "textual" means some 8-bit encoding
(typically UTF-8, but various ISO-* and Windows-* encodings would work
as well).

So, while Git is able to work with large files, and it's able to work
with binary files (including files containing text in weird encodings
such as UTF-16/UCS-2 etc.), these are not the things it's optimized for,
and you might find yourself dancing around Git trying to make it do
things with your data which you expected to get "for free".

What I'm actually leading you to is that you might not really need a
full-blown version-control system, because those are typically
tailored for working on source code and other "plain text" stuff.
Hence you might consider lightweight solutions intended for versioned
backups.  I suggest looking at rdiff-backup, attic and obnam -- to name
just a few.  All these systems allow you to periodically "push" a new
state of a filesystem hierarchy rooted in a directory to "a server"
(which might reside on the local machine).  The server stores this new
snapshot using various deduplication/delta compression techniques, and
lets you inspect the list of "revisions" and extract files from any
selected revision.
A "winning" feature compared to Git is that in most cases they allow
you to prune past revisions -- which might or might not be useful for
your use case.

All in all, if all you're concerned with is disk storage then Git is
relatively OK with it -- provided your data does not change too much
between adjacent commits (obviously, if you store 100 MB worth of data
in a commit and then store 100 MB of completely different data in the
next commit, that second commit won't compress well against the
previous one and you'll end up with some 200 MB of data in the
repository).
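
A rough sketch of that last point, in Python.  It uses zlib with the
previous version as a preset dictionary, which is only a stand-in for
delta compression and not the algorithm Git uses when packing objects;
the sizes, the seed and the function name are made up for illustration.

```python
import random
import zlib


def compressed_size(new_data: bytes, previous: bytes) -> int:
    """Size of new_data when deflated with `previous` as a preset dictionary.

    The preset dictionary only approximates "compress against the previous
    version"; it is not how Git builds its pack files.
    """
    compressor = zlib.compressobj(level=9, zdict=previous)
    return len(compressor.compress(new_data) + compressor.flush())


rng = random.Random(0)
SIZE = 16_000  # kept small so the whole old version fits in zlib's 32 KiB window

old_version = bytes(rng.getrandbits(8) for _ in range(SIZE))

# "Small change": the same content with 16 bytes overwritten in the middle.
edited = bytearray(old_version)
edited[8000:8016] = bytes(16)
similar_version = bytes(edited)

# "Completely different": fresh random content of the same size.
different_version = bytes(rng.getrandbits(8) for _ in range(SIZE))

similar_cost = compressed_size(similar_version, old_version)
different_cost = compressed_size(different_version, old_version)

print(f"slightly changed version: {similar_cost} bytes")
print(f"unrelated version:        {different_cost} bytes")
```

The slightly changed version costs a small fraction of its size to
store, while the unrelated one costs about as much as storing it from
scratch.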



Re: [git-users] What is the maximum number of repositories a git server can have

2016-04-04 Thread Konstantin Khomoutov
On Mon, 4 Apr 2016 12:59:38 -0700 (PDT)
display_name_taken  wrote:

> I'm trying to find out what the maximum number of repositories a
> given git server can handle is, or whether there is such a limit.
> For example, can a git server work without any issues if there are
> 1000 repositories?
> Each repository will have only a few files (10 or fewer) and the size
> of each repository will be a few megabytes.

You maintain a wrong mental model about Git.

When you access a Git repository hosted on a server you either talk to
an SSH server or an HTTP server or the so-called Git daemon directly
(when a repository is being accessed using a URL beginning with
git://), but in any case, to access that repository a separate OS
process is started -- running one of the special low-level Git tools.

Hence *if* you're in the end using Git to serve your repositories, the
server's capacity is more about handling the desired number of
simultaneously running processes, each potentially using a hefty amount
of memory (big fetches for instance).  Disk throughput might also
become a bottleneck.
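
As a back-of-the-envelope sketch in Python: the function name and every
number below are assumptions picked for illustration, not measurements
of any real Git server, and the per-process figure in particular is
something you would have to measure yourself.

```python
def max_concurrent_clients(total_ram_mb: int, reserved_mb: int, per_process_mb: int) -> int:
    """How many serving processes fit in memory at once.

    total_ram_mb   -- RAM installed on the server
    reserved_mb    -- RAM set aside for the OS, the SSH/HTTP front end, caches
    per_process_mb -- assumed peak memory of one serving process
    """
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    usable_mb = max(total_ram_mb - reserved_mb, 0)
    return usable_mb // per_process_mb


# Hypothetical box: 8 GiB of RAM, 2 GiB reserved, 512 MiB per big fetch.
print(max_concurrent_clients(total_ram_mb=8192, reserved_mb=2048, per_process_mb=512))
```

Note that the number of repositories does not appear in this estimate
at all; only the number of clients being served at the same moment
does.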

On the other hand, here we're talking about concurrent access.
I'm pretty sure that to really have 1000 clients concurrently cloning
big repositories, you'd have to have some 100 times as many
repositories, because IMO such "thundering herd" spikes, when all the
potential clients all of a sudden start cloning their repositories, are
pretty unlikely -- unless you're being DoSed.

In any case, what's the point in such "guesseneering"?
Set up a test server and simulate the desired workload.
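
A minimal sketch of such a simulation in Python.  The helper names are
made up, and the workload below is a harmless placeholder command so
the script runs anywhere; for a real test you would substitute clone
commands against your own test server.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_timed(command):
    """Run one command and return (exit_code, seconds_elapsed)."""
    started = time.perf_counter()
    completed = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return completed.returncode, time.perf_counter() - started


def simulate_clients(commands, concurrency):
    """Run the commands with at most `concurrency` of them in flight at once."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(run_timed, commands))


# Placeholder workload: ten do-nothing processes.  For a real test, each
# entry would be something like
#   ["git", "clone", "--quiet", "<URL of a test repository>", "<fresh target dir>"]
# where the URL and the directory are yours to fill in.
commands = [[sys.executable, "-c", "pass"] for _ in range(10)]
results = simulate_clients(commands, concurrency=10)

failures = sum(1 for code, _ in results if code != 0)
slowest = max(seconds for _, seconds in results)
print(f"{len(results)} clients, {failures} failures, slowest took {slowest:.2f}s")
```

Watching the server's memory and disk activity while this runs tells
you more than the timings alone.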

If you're going to host that many repositories, you should not be
pulling your sample data out of anecdotes heard on some public forum.

Oh, and while we're at it: the model I outlined -- one specialized Git
process serving a single client accessing a single repository --
applies, again, when you're using plain vanilla Git for serving (plus
maybe some front end, such as an SSH or HTTP server).  If you're using
a solution which does not use plain Git (such as Gitblit), the
situation may be very different.



[git-users] What is the maximum number of repositories a git server can have

2016-04-04 Thread display_name_taken
I'm trying to find out what the maximum number of repositories a given
git server can handle is, or whether there is such a limit.
For example, can a git server work without any issues if there are 1000
repositories?
Each repository will have only a few files (10 or fewer) and the size of
each repository will be a few megabytes.
