Re: How much is too much data in an svn repository?

2022-09-26 Thread Doug Robinson
Sean:

On Thu, Sep 22, 2022 at 3:59 PM Sean McBride wrote:

> Our svn repo is about 110 GB for a full checkout. Larger on the server of
> course, with all history, weighting about 142 GB.
>
> There haven't been any performance issues, it's working great.
>
> But now some users are interested in committing an additional 200 GB of
> mostly large binary files.
>
> I worry about it becoming "too big".  At what point does that happen?
> Terabytes?  Petabytes?  100s of GB?
>

WANdisco supports customers with Subversion repositories in the TiB range
with millions of revisions.  As others have mentioned, the repository size
matters only when it is time to back it up.  Large backups can be managed
with different techniques at different costs (only some of which have
been mentioned so far).

What tends to be more important on a day-to-day basis is the size of the
checkout: TCP throughput is limited by latency, so the larger the working
copy at any given latency, the longer it takes to check out.  And the higher
the latency, well...  The number of files/directories in a revision can be
an issue with certain operations, as can the amount of change history for a
single file (e.g. "svn blame" can get slow...).  And the chatty nature of
WebDAV means that latency compounds the time required.  Using "svnserve"
helps only in some circumstances, since it is difficult to have it cache as
much as Apache (and not at all for multi-user support), so it scales
differently for different operations.
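
One way to keep the checkout itself bounded is a sparse checkout, so each
user only materializes the subtrees they actually need.  A rough sketch
(the URL and paths below are placeholders):

    # Check out only the top level of trunk...
    svn checkout --depth immediates https://svn.example.com/repo/trunk wc
    cd wc
    # ...then deepen just the subtrees this user works on
    svn update --set-depth infinity src docs

That does not change the latency arithmetic, but it shrinks the amount of
data it applies to.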

I've read some excellent suggestions about using artifact management
systems for build artifacts - definitely.

All that said, I think it wise to keep repository size bounded to what you
(your company?) can reasonably support.

Cheers.

Doug
-- 
DOUGLAS B ROBINSON  SENIOR PRODUCT MANAGER

T +1 925 396 1125
E doug.robin...@wandisco.com



Re: How much is too much data in an svn repository?

2022-09-23 Thread Jeffrey Walton
On Thu, Sep 22, 2022 at 3:59 PM Sean McBride  wrote:
>
> Our svn repo is about 110 GB for a full checkout. Larger on the server of 
> course, with all history, weighting about 142 GB.
>
> There haven't been any performance issues, it's working great.
>
> But now some users are interested in committing an additional 200 GB of 
> mostly large binary files.
>
> I worry about it becoming "too big".  At what point does that happen?  
> Terabytes?  Petabytes?  100s of GB?

I've never encountered a problem with "too big," but I have
encountered problems with binary file types causing an SVN client or
server to hang. I experienced it back in 2012 or 2013 on a very large
collection of repos. I tried to check out/clone, and the operation
would hang about 6 or 8 hours in.

Through trial and error we discovered a developer had checked in
object files from an Xcode build, and the SVN client or server would
hang on the object files. I don't recall if it was all object files
or just a particular one. As an added twist, I think we were using
TortoiseSVN on Windows, so it may have been a bad interaction with
TortoiseSVN on Windows. Once we manually deleted the object files,
the check-out/clone proceeded.

I don't know if that would happen nowadays.
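
Either way, keeping build output out of the repository avoids the problem
entirely.  A minimal sketch (the patterns are only examples):

    # Ignore object files in one directory (stored as a versioned property)
    svn propset svn:ignore "*.o" .

    # Or ignore them in every working copy via the client config,
    # in the [miscellany] section of ~/.subversion/config:
    #   global-ignores = *.o *.obj *.a

Note that svn:ignore only suppresses unversioned files; anything already
added or committed has to be removed explicitly.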

Jeff


Re: How much is too much data in an svn repository?

2022-09-23 Thread Nico Kadel-Garcia
On Fri, Sep 23, 2022 at 7:43 AM Mark Phippard  wrote:
>
> On Thu, Sep 22, 2022 at 3:59 PM Sean McBride  wrote:
> >
> > Hi all,
> >
> > Our svn repo is about 110 GB for a full checkout. Larger on the server of 
> > course, with all history, weighting about 142 GB.
> >
> > There haven't been any performance issues, it's working great.
> >
> > But now some users are interested in committing an additional 200 GB of 
> > mostly large binary files.
> >
> > I worry about it becoming "too big".  At what point does that happen?  
> > Terabytes?  Petabytes?  100s of GB?
>
> Assuming you have the disk space then there is no real upper limit.

There are practical limits. The files and file descriptors for years
or decades of irrelevant history accumulate. Bulky accidental commits,
such as large binary objects, accumulate and create burdens for backup
or high availability. And keeping around old tags that haven't been
used in years encourages re-introducing obsolete APIs, errors, or
security flaws.
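
One partial mitigation for the sheer number of files a long history leaves
behind is packing the repository shards (a sketch; it assumes an FSFS
repository and the path is a placeholder):

    # Consolidate completed revision shards into single pack files
    svnadmin pack /srv/svn/repo

That deletes no history; it only reduces the file count on disk.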

> That said ... do not discount the administrative burden. Are you
> backing up your repository? Whether using dump/load, svnsync or
> hotcopy the bigger the repository the more of a burden it will be on
> these tools.
>
> If this is just about storing binary files why not consider solutions
> that were meant for that like an object storage platform like S3 or
> minio or a package manager like Maven, Nuget etc.
>
> A big negative of Subversion repositories is you cannot ever delete
> anything. Do you really need to keep all these binaries forever?
>
> Mark


Re: How much is too much data in an svn repository?

2022-09-23 Thread Nathan Hartman
On Thu, Sep 22, 2022 at 3:59 PM Sean McBride  wrote:
>
> Hi all,
>
> Our svn repo is about 110 GB for a full checkout. Larger on the server of 
> course, with all history, weighting about 142 GB.
>
> There haven't been any performance issues, it's working great.
>
> But now some users are interested in committing an additional 200 GB of 
> mostly large binary files.
>
> I worry about it becoming "too big".  At what point does that happen?  
> Terabytes?  Petabytes?  100s of GB?
>
> Thanks,
>
> Sean

It occurs to me that we don't have a FAQ or other easy-to-find
documentation on maximums, such as the maximum file size, etc.

The largest publicly-accessible SVN repository of which I am aware is
the Apache.org one in which Subversion's own sources (as well as those
of numerous other projects) are housed. This repository contains
approximately 1.9 million revisions. According to [1] the dump of this
repository expands to over 65 gigabytes.

But that seems to be a drop in the ocean when Aleksa writes:

On Fri, Sep 23, 2022 at 3:45 AM Aleksa Todorović  wrote:
> I can confirm that Subversion can handle repositories with 100,000+ 
> revisions, size of committed files ranging from few bytes to several GBs, and 
> total repo size of up to 20TB.

It is possible that others here are aware of even larger repositories.

My biggest concern mirrors what Mark said about administrative burden:
the size of backups and the time it takes to make them. Mark addressed
that point quite well. Whatever you do, you must have good backups!
(My $dayjob does backups 3 different ways: the filesystem on which the
repository is stored is backed up regularly; in addition we take
periodic 'hotcopy' backups and periodic full 'dump' backups.)
Obviously, as a repository grows, this takes longer and requires more
storage.
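
As the repository grows, the 'dump' part can at least be taken
incrementally rather than in full each time.  A hedged example (the
revision range and path are placeholders):

    # Dump only the revisions added since the previous backup
    svnadmin dump /srv/svn/repo -r 150000:160000 --incremental --deltas \
        > repo-r150000-160000.dump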

[1] http://svn-dump.apache.org

Cheers,
Nathan


Re: How much is too much data in an svn repository?

2022-09-23 Thread Daniel Sahlberg
Hi,

In addition to all the other responses, I'd like to advertise the "pristines
on demand" feature that got some traction in the spring.

Subversion normally stores every file twice on the client side (in the
"working copy"): once as the actual working file and once as a "pristine",
i.e. the file as it was when checked out, in the .svn folder. The idea of
"pristines on demand" is to store the file only once, at the expense of some
operations requiring more bandwidth. I'm not sure about the status, but it
is not part of any current release yet. Karl Fogel and Julian Foad were
involved in this; more details can be found in the list archives of the
d...@subversion.apache.org list.
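
You can see the duplication in any existing working copy; a rough
illustration ("wc" below is a placeholder path):

    du -sh wc                  # working files plus the .svn area
    du -sh wc/.svn/pristine    # the second, pristine copy of every file

The idea is that with pristines on demand the second number shrinks to
roughly the files that currently need a pristine.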

Kind regards,
Daniel



Den tors 22 sep. 2022 kl 21:59 skrev Sean McBride :

> Hi all,
>
> Our svn repo is about 110 GB for a full checkout. Larger on the server of
> course, with all history, weighting about 142 GB.
>
> There haven't been any performance issues, it's working great.
>
> But now some users are interested in committing an additional 200 GB of
> mostly large binary files.
>
> I worry about it becoming "too big".  At what point does that happen?
> Terabytes?  Petabytes?  100s of GB?
>
> Thanks,
>
> Sean
>


Re: How much is too much data in an svn repository?

2022-09-23 Thread Graham Leggett via users
On 23 Sep 2022, at 13:42, Mark Phippard  wrote:

> A big negative of Subversion repositories is you cannot ever delete
> anything. Do you really need to keep all these binaries forever?

In our regulated world that is an important feature.

Once the repos get too big we start new ones. In the meantime, there is no such 
thing as “we did fraud^H^H^H^H^H a delete to make space”.

Regards,
Graham
—



Re: How much is too much data in an svn repository?

2022-09-23 Thread Mark Phippard
On Thu, Sep 22, 2022 at 3:59 PM Sean McBride  wrote:
>
> Hi all,
>
> Our svn repo is about 110 GB for a full checkout. Larger on the server of 
> course, with all history, weighting about 142 GB.
>
> There haven't been any performance issues, it's working great.
>
> But now some users are interested in committing an additional 200 GB of 
> mostly large binary files.
>
> I worry about it becoming "too big".  At what point does that happen?  
> Terabytes?  Petabytes?  100s of GB?

Assuming you have the disk space then there is no real upper limit.

That said ... do not discount the administrative burden. Are you
backing up your repository? Whether you use dump/load, svnsync or
hotcopy, the bigger the repository, the more of a burden it will be on
these tools.
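
For reference, the usual invocations look roughly like this (all paths
and URLs are placeholders):

    # Portable, replayable dump of the whole history
    svnadmin dump /srv/svn/repo > repo.dump

    # Consistent copy of the live repository on the same host
    svnadmin hotcopy /srv/svn/repo /backups/repo-copy

    # Mirror into a second, empty repository, possibly on another server;
    # the mirror needs a pre-revprop-change hook that permits the sync
    svnsync initialize https://backup.example.com/svn/mirror \
        https://svn.example.com/svn/repo
    svnsync synchronize https://backup.example.com/svn/mirror

All three re-read or re-copy more data as the repository grows.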

If this is just about storing binary files, why not consider solutions
that were meant for that, like an object storage platform (S3 or minio)
or a package manager (Maven, Nuget, etc.)?

A big negative of Subversion repositories is you cannot ever delete
anything. Do you really need to keep all these binaries forever?

Mark


Re: How much is too much data in an svn repository?

2022-09-23 Thread Graham Leggett via users
On 22 Sep 2022, at 21:59, Sean McBride  wrote:

> Our svn repo is about 110 GB for a full checkout. Larger on the server of 
> course, with all history, weighting about 142 GB.
> 
> There haven't been any performance issues, it's working great.
> 
> But now some users are interested in committing an additional 200 GB of 
> mostly large binary files.
> 
> I worry about it becoming "too big".  At what point does that happen?  
> Terabytes?  Petabytes?  100s of GB?

From experience it becomes too big when the underlying disk gets full. As long 
as your underlying disks can handle it, it works fine.

I use SVN for versioned incremental backups of files in the 0.5 GB range. I've
seen reports of others checking in multi-GB files as backups with no trouble.

The best thing to do is to physically try it. Make a copy of your repo, then
try checking things into it, and see where your issues are.
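
Something along these lines (paths and sizes are placeholders):

    # Copy the repository, then commit a large test file against the copy
    svnadmin hotcopy /srv/svn/repo /srv/svn/repo-test
    svn checkout file:///srv/svn/repo-test/trunk wc-test
    cd wc-test
    dd if=/dev/urandom of=big.bin bs=1M count=2048
    svn add big.bin
    svn commit -m "Test: 2 GB binary commit"

That exercises commit, storage and checkout behaviour without touching the
real repository.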

Regards,
Graham
—



Re: How much is too much data in an svn repository?

2022-09-23 Thread Aleksa Todorović
Hi all,

I can confirm that Subversion can handle repositories with 100,000+
revisions, committed files ranging from a few bytes to several GBs, and a
total repo size of up to 20TB. The speed issues I'm seeing are mostly
related to hard drive operations, but they do not prevent efficient work.
The only very noticeable speed issues are on commits involving thousands of
files (those happen from time to time), on the client side, where it takes
a lot of time to commit (all those files need to be compared by content)
and also to update (there is always a copy of each file in the .svn
directory). Outside of that, Subversion performs really well.

Hope this helps.

Regards,
Aleksa


On Fri, Sep 23, 2022 at 9:33 AM Justin MASSIOT | Zentek <justin.mass...@zentek.fr> wrote:

> Hello Sean,
>
> I have not enough experience to answer your question, but I'm very
> concerned about large binary files. Whereas I have a more "splitted"
> structure of repositories.
> I'm following this discussion ;-) Can anyone bring some inputs on this
> topic?
>
> Justin MASSIOT  |  Zentek
>
>
> On Thu, 22 Sept 2022 at 21:59, Sean McBride 
> wrote:
>
>> Hi all,
>>
>> Our svn repo is about 110 GB for a full checkout. Larger on the server of
>> course, with all history, weighting about 142 GB.
>>
>> There haven't been any performance issues, it's working great.
>>
>> But now some users are interested in committing an additional 200 GB of
>> mostly large binary files.
>>
>> I worry about it becoming "too big".  At what point does that happen?
>> Terabytes?  Petabytes?  100s of GB?
>>
>> Thanks,
>>
>> Sean
>>
>


Re: How much is too much data in an svn repository?

2022-09-23 Thread Justin MASSIOT | Zentek
Hello Sean,

I don't have enough experience to answer your question, but I'm very
concerned about large binary files, even though I have a more split-up
structure of repositories.
I'm following this discussion ;-) Can anyone offer some input on this
topic?

Justin MASSIOT  |  Zentek


On Thu, 22 Sept 2022 at 21:59, Sean McBride  wrote:

> Hi all,
>
> Our svn repo is about 110 GB for a full checkout. Larger on the server of
> course, with all history, weighting about 142 GB.
>
> There haven't been any performance issues, it's working great.
>
> But now some users are interested in committing an additional 200 GB of
> mostly large binary files.
>
> I worry about it becoming "too big".  At what point does that happen?
> Terabytes?  Petabytes?  100s of GB?
>
> Thanks,
>
> Sean
>