Re: How much is too much data in an svn repository?

2022-09-23 Thread Jeffrey Walton
On Thu, Sep 22, 2022 at 3:59 PM Sean McBride  wrote:
>
> Our svn repo is about 110 GB for a full checkout. Larger on the server of 
> course, with all history, weighing about 142 GB.
>
> There haven't been any performance issues, it's working great.
>
> But now some users are interested in committing an additional 200 GB of 
> mostly large binary files.
>
> I worry about it becoming "too big".  At what point does that happen?  
> Terabytes?  Petabytes?  100s of GB?

I've never encountered a problem with "too big," but I have
encountered problems with binary file types causing an SVN client or
server to hang. I experienced it back in 2012 or 2013 on a very large
collection of repos. I tried to check out/clone, and the operation
would hang about 6 or 8 hours in.

Through trial and error we discovered a developer had checked in
object files from an Xcode build, and the SVN client or server would
hang on them. I don't recall if it was all object files or just a
particular one. As an added twist, I think we were using TortoiseSVN
on Windows, so it may have been a bad interaction with TortoiseSVN on
Windows. Once we manually deleted the object files, the
checkout/clone proceeded.

I don't know if that would happen nowadays.
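For what it's worth, one way to keep build artifacts out in the first
place is a pre-commit hook. Here is a minimal sketch; the blocked
extensions and the `svnlook changed` wiring are my assumptions for
illustration, not something from this incident:

```python
#!/usr/bin/env python3
"""Sketch of a pre-commit hook that rejects build artifacts.

Illustrative only: the blocked extensions and the svnlook wiring are
assumptions, not something Subversion prescribes.
"""
import subprocess
import sys

BLOCKED_EXTENSIONS = (".o", ".obj", ".a", ".dylib")

def blocked_paths(changed_paths):
    """Return the subset of changed paths that look like build artifacts."""
    return [p for p in changed_paths
            if p.lower().endswith(BLOCKED_EXTENSIONS)]

def main(repo, txn):
    # `svnlook changed` prints one line per changed path; the path starts
    # after a short flags column (parsing here is deliberately simple).
    out = subprocess.run(
        ["svnlook", "changed", "--transaction", txn, repo],
        capture_output=True, text=True, check=True).stdout
    paths = [line[4:].strip() for line in out.splitlines() if line.strip()]
    bad = blocked_paths(paths)
    if bad:
        print("Commit rejected; build artifacts found:", file=sys.stderr)
        for p in bad:
            print("  " + p, file=sys.stderr)
        return 1
    return 0

if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Installed as `hooks/pre-commit` on the server, a hook like this would have
bounced the commit before the object files ever entered history.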

Jeff


Re: How much is too much data in an svn repository?

2022-09-23 Thread Nico Kadel-Garcia
On Fri, Sep 23, 2022 at 7:43 AM Mark Phippard  wrote:
>
> On Thu, Sep 22, 2022 at 3:59 PM Sean McBride  wrote:
> >
> > Hi all,
> >
> > Our svn repo is about 110 GB for a full checkout. Larger on the server of 
> > course, with all history, weighing about 142 GB.
> >
> > There haven't been any performance issues, it's working great.
> >
> > But now some users are interested in committing an additional 200 GB of 
> > mostly large binary files.
> >
> > I worry about it becoming "too big".  At what point does that happen?  
> > Terabytes?  Petabytes?  100s of GB?
>
> Assuming you have the disk space then there is no real upper limit.

There are practical limits. Files for years or decades of irrelevant
history accumulate. Bulky accidental commits, such as large binary
objects, accumulate too and create burdens for backup or high
availability. And keeping around old tags that haven't been used in
years encourages re-introducing obsolete APIs, errors, or security
flaws.

> That said ... do not discount the administrative burden. Are you
> backing up your repository? Whether you use dump/load, svnsync, or
> hotcopy, the bigger the repository, the more of a burden it will be on
> these tools.
>
> If this is just about storing binary files, why not consider solutions
> that were meant for that, like an object storage platform (S3 or MinIO)
> or a package manager (Maven, NuGet, etc.)?
>
> A big negative of Subversion repositories is you cannot ever delete
> anything. Do you really need to keep all these binaries forever?
>
> Mark


Re: How much is too much data in an svn repository?

2022-09-23 Thread Nathan Hartman
On Thu, Sep 22, 2022 at 3:59 PM Sean McBride  wrote:
>
> Hi all,
>
> Our svn repo is about 110 GB for a full checkout. Larger on the server of 
> course, with all history, weighing about 142 GB.
>
> There haven't been any performance issues, it's working great.
>
> But now some users are interested in committing an additional 200 GB of 
> mostly large binary files.
>
> I worry about it becoming "too big".  At what point does that happen?  
> Terabytes?  Petabytes?  100s of GB?
>
> Thanks,
>
> Sean

It occurs to me that we don't have a FAQ or other easy-to-find
documentation on maximums, such as the maximum file size, etc.

The largest publicly-accessible SVN repository of which I am aware is
the Apache.org one in which Subversion's own sources (as well as those
of numerous other projects) are housed. This repository contains
approximately 1.9 million revisions. According to [1] the dump of this
repository expands to over 65 gigabytes.
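As a rough back-of-the-envelope check (treating "over 65 gigabytes" as
65 GiB), that works out to a few tens of kilobytes per revision on
average:

```python
# Rough average revision size for the Apache.org repository,
# using the figures quoted above.
dump_bytes = 65 * 1024**3      # "over 65 gigabytes" per [1]
revisions = 1_900_000          # approximately 1.9 million revisions
avg_kib = dump_bytes / revisions / 1024
print(round(avg_kib))          # roughly 36 KiB per revision
```

Mostly small text commits, in other words, which is exactly the workload
where size never becomes a practical concern.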

But that seems to be a drop in the ocean when Aleksa writes:

On Fri, Sep 23, 2022 at 3:45 AM Aleksa Todorović  wrote:
> I can confirm that Subversion can handle repositories with 100,000+ 
> revisions, sizes of committed files ranging from a few bytes to several GBs, and 
> a total repo size of up to 20TB.

It is possible that others here are aware of even larger repositories.

My biggest concern mirrors what Mark said about administrative burden:
the size of backups and the time it takes to make them. Mark addressed
that point quite well. Whatever you do, you must have good backups!
(My $dayjob does backups 3 different ways: the filesystem on which the
repository is stored is backed up regularly; in addition, we take
periodic 'hotcopy' backups and periodic full 'dump' backups.)
Obviously, as a repository grows, this takes longer and requires more
storage.
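As an illustration of the two svnadmin-based flavors, the sketch below
builds the command lines (paths are hypothetical) and only echoes them,
so it is safe to run without a real repository:

```shell
#!/bin/sh
# Sketch of two svnadmin-based backup flavors (paths are hypothetical).
# The commands are echoed rather than executed, so the sketch is
# harmless on a machine without a Subversion server.
REPO=/var/svn/myrepo
BACKUP=/backup/svn

# 'hotcopy' produces a byte-for-byte copy, including hooks and config
HOTCOPY_CMD="svnadmin hotcopy $REPO $BACKUP/hotcopy"

# 'dump' produces a portable history stream; gzip keeps the size down
DUMP_CMD="svnadmin dump --quiet $REPO | gzip > $BACKUP/full.dump.gz"

echo "$HOTCOPY_CMD"
echo "$DUMP_CMD"
```

Both commands scale linearly with repository size, which is why the
administrative burden grows along with the data.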

[1] http://svn-dump.apache.org

Cheers,
Nathan


Re: How much is too much data in an svn repository?

2022-09-23 Thread Daniel Sahlberg
Hi,

In addition to all other responses, I'd like to advertise the "pristines on
demand" feature that got some traction in the spring.

Subversion normally stores all files twice on the client side (in the
"working copy": once as the actual file and once as a "pristine", i.e. as
the file was when checked out, in the .svn folder). The idea of
"pristines on demand" is to store the file only once, at the expense of
some operations requiring more bandwidth. I'm not sure about the status,
but it is not part of any current release yet. Karl Fogel and Julian Foad
were involved in this; more details can be found in the archives of the
d...@subversion.apache.org list.

Kind regards,
Daniel



Den tors 22 sep. 2022 kl 21:59 skrev Sean McBride :

> Hi all,
>
> Our svn repo is about 110 GB for a full checkout. Larger on the server of
> course, with all history, weighing about 142 GB.
>
> There haven't been any performance issues, it's working great.
>
> But now some users are interested in committing an additional 200 GB of
> mostly large binary files.
>
> I worry about it becoming "too big".  At what point does that happen?
> Terabytes?  Petabytes?  100s of GB?
>
> Thanks,
>
> Sean
>


Re: How much is too much data in an svn repository?

2022-09-23 Thread Graham Leggett via users
On 23 Sep 2022, at 13:42, Mark Phippard  wrote:

> A big negative of Subversion repositories is you cannot ever delete
> anything. Do you really need to keep all these binaries forever?

In our regulated world that is an important feature.

Once the repos get too big we start new ones. In the meantime, there is no such 
thing as “we did fraud^H^H^H^H^H a delete to make space”.

Regards,
Graham
—



Re: How much is too much data in an svn repository?

2022-09-23 Thread Mark Phippard
On Thu, Sep 22, 2022 at 3:59 PM Sean McBride  wrote:
>
> Hi all,
>
> Our svn repo is about 110 GB for a full checkout. Larger on the server of 
> course, with all history, weighing about 142 GB.
>
> There haven't been any performance issues, it's working great.
>
> But now some users are interested in committing an additional 200 GB of 
> mostly large binary files.
>
> I worry about it becoming "too big".  At what point does that happen?  
> Terabytes?  Petabytes?  100s of GB?

Assuming you have the disk space then there is no real upper limit.

That said ... do not discount the administrative burden. Are you
backing up your repository? Whether you use dump/load, svnsync, or
hotcopy, the bigger the repository, the more of a burden it will be on
these tools.

If this is just about storing binary files, why not consider solutions
that were meant for that, like an object storage platform (S3 or MinIO)
or a package manager (Maven, NuGet, etc.)?

A big negative of Subversion repositories is you cannot ever delete
anything. Do you really need to keep all these binaries forever?

Mark


Re: How much is too much data in an svn repository?

2022-09-23 Thread Graham Leggett via users
On 22 Sep 2022, at 21:59, Sean McBride  wrote:

> Our svn repo is about 110 GB for a full checkout. Larger on the server of 
> course, with all history, weighing about 142 GB.
> 
> There haven't been any performance issues, it's working great.
> 
> But now some users are interested in committing an additional 200 GB of 
> mostly large binary files.
> 
> I worry about it becoming "too big".  At what point does that happen?  
> Terabytes?  Petabytes?  100s of GB?

From experience it becomes too big when the underlying disk gets full. As long 
as your underlying disks can handle it, it works fine.

I use SVN for versioned incremental backups of files in the 0.5GB range. I’ve 
seen reports of others checking in multi GB files as backups with no trouble.

Best thing to do is to physically try it. Make a copy of your repo, then try 
checking things into it, and see where your issues are.

Regards,
Graham
—



Re: How much is too much data in an svn repository?

2022-09-23 Thread Aleksa Todorović
Hi all,

I can confirm that Subversion can handle repositories with 100,000+
revisions, sizes of committed files ranging from a few bytes to several
GBs, and a total repo size of up to 20TB. The speed issues that I'm
seeing are mostly related to hard drive operations, but they do not
prevent efficient work. The only very noticeable speed issues are on
commits with thousands of files (those happen from time to time) on the
client side, where it takes a lot of time to commit (all those files need
to be compared by content), but also to update (there is always a copy of
each file in the .svn directory). Outside of that, Subversion performs
really well.

Hope this helps.

Regards,
Aleksa


On Fri, Sep 23, 2022 at 9:33 AM Justin MASSIOT | Zentek <
justin.mass...@zentek.fr> wrote:

> Hello Sean,
>
> I don't have enough experience to answer your question, but I'm very
> concerned about large binary files, whereas I have a more "split"
> structure of repositories.
> I'm following this discussion ;-) Can anyone bring some input on this
> topic?
>
> Justin MASSIOT  |  Zentek
>
>
> On Thu, 22 Sept 2022 at 21:59, Sean McBride 
> wrote:
>
>> Hi all,
>>
>> Our svn repo is about 110 GB for a full checkout. Larger on the server of
>> course, with all history, weighing about 142 GB.
>>
>> There haven't been any performance issues, it's working great.
>>
>> But now some users are interested in committing an additional 200 GB of
>> mostly large binary files.
>>
>> I worry about it becoming "too big".  At what point does that happen?
>> Terabytes?  Petabytes?  100s of GB?
>>
>> Thanks,
>>
>> Sean
>>
>


Re: How much is too much data in an svn repository?

2022-09-23 Thread Justin MASSIOT | Zentek
Hello Sean,

I don't have enough experience to answer your question, but I'm very
concerned about large binary files, whereas I have a more "split"
structure of repositories.
I'm following this discussion ;-) Can anyone bring some input on this
topic?

Justin MASSIOT  |  Zentek


On Thu, 22 Sept 2022 at 21:59, Sean McBride  wrote:

> Hi all,
>
> Our svn repo is about 110 GB for a full checkout. Larger on the server of
> course, with all history, weighing about 142 GB.
>
> There haven't been any performance issues, it's working great.
>
> But now some users are interested in committing an additional 200 GB of
> mostly large binary files.
>
> I worry about it becoming "too big".  At what point does that happen?
> Terabytes?  Petabytes?  100s of GB?
>
> Thanks,
>
> Sean
>


Re: Using IIS as a reverse proxy in front of Apache/SVN

2022-09-23 Thread Daniel Sahlberg
I'm following up on an old e-mail [1] on how to use IIS as a reverse proxy
in front of Apache Subversion.

Previously I found a way to use the URL Rewrite module to forward
requests to mod_dav_svn. This was working fine until I tried to access a
file with a "+" in the filename.

[[[
$ svn log "https://svn.example.com/svn/repo/file with + in filename.txt"

svn: E170013: Unable to connect to a repository at URL '
https://svn.example.com/svn/repo/file%20with%20+%20in%20filename.txt'

svn: E160013: '/svn/repo/file%20with%20+%20in%20filename.txt' path not found

$
]]]

It turns out that IIS will not accept requests containing "+" and will
reply with HTTP error 404.11 instead of rewriting the request and
forwarding it to mod_dav_svn.

A similar problem is described here [2], and IIS can be configured with
"allowDoubleEscaping" [3]. With this configuration change, everything
worked as expected.
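The ambiguity IIS is guarding against can be sketched in a few lines of
Python, using the URL from the log above: a path containing a literal
"+" decodes differently depending on whether "+" is treated as an
encoded space.

```python
from urllib.parse import unquote, unquote_plus

path = '/svn/repo/file%20with%20+%20in%20filename.txt'

# Decoding that leaves '+' alone, as mod_dav_svn expects for a path:
literal = unquote(path)        # '/svn/repo/file with + in filename.txt'

# Form-style decoding that treats '+' as an escaped space:
form = unquote_plus(path)      # '/svn/repo/file with   in filename.txt'

# The two readings disagree, which is roughly why IIS request filtering
# rejects the URL unless allowDoubleEscaping is enabled.
print(literal != form)         # True
```

With allowDoubleEscaping enabled, IIS stops trying to resolve that
ambiguity itself and passes the URL through to Apache unchanged.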

The solution was found by supp...@visualsvn.com (we use their software
stack on the server).

The relevant parts of web.config can be found below.

Kind regards,
Daniel Sahlberg


[1] https://lists.apache.org/thread/bcx15smyth43v1t1vvhqnc8bhxt5b5kd
[2] https://github.com/go-gitea/gitea/issues/10236
[3]
https://learn.microsoft.com/en-us/iis/configuration/system.webserver/security/requestfiltering/#attributes

[[[
<!--
  Reconstructed sketch: the mail archive stripped the original XML tags,
  so the rule names and match patterns below are illustrative rather
  than verbatim.
-->
<configuration>
  <system.webServer>
    <security>
      <requestFiltering allowDoubleEscaping="true" />
    </security>
    <rewrite>
      <rules>
        <rule name="Redirect to HTTPS" stopProcessing="true">
          <match url="(.*)" />
          <conditions>
            <add input="{HTTPS}" pattern="^OFF$" />
          </conditions>
          <action type="Redirect" url="https://{HTTP_HOST}{REQUEST_URI}" />
        </rule>
        <rule name="Reverse proxy to mod_dav_svn" stopProcessing="true">
          <match url="(.*)" />
          <action type="Rewrite" url="http://127.0.0.1:81/{R:0}"
                  logRewrittenUrl="true" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
]]]