Re: upgrading OpenStack (was: salsa.debian.org partially down)

2019-08-19 Thread Thomas Goirand
On 8/18/19 8:37 AM, Bastian Blank wrote:
> OpenStack is nice if it is run by people with knowledge and enough
> resources. It is no longer nice if you need to upgrade and generally
> maintain it.

As I just wrote on the wiki [1], upgrading within the lifecycle of a Debian
stable release is generally not a problem. What's painful is going from one
Debian release to the next (e.g. upgrading a rabbitmq-server cluster is
noticeably painful). However, you don't really *have* to upgrade: you could
migrate your workload to a new deployment on the new Debian release.

Of course, you'd need more servers to do that, but at the same time,
after 3 years of production, you probably aren't against a refresh of
your infrastructure.

Cheers,

Thomas Goirand (zigo)

[1] https://wiki.debian.org/OpenStack#Upgrading_OpenStack_in_Debian



Re: salsa.debian.org partially down

2019-08-18 Thread Bastian Blank
On Fri, Aug 16, 2019 at 01:35:58PM +0800, Aron Xu wrote:
>   we can even have its own OpenStack
> installation if the team really likes it.

You really want to maintain an OpenStack installation?  Have you done
so already?  Have you done upgrades of this installation for at least
two years (one Debian cycle)?

OpenStack is nice if it is run by people with knowledge and enough
resources.  It is no longer nice if you need to upgrade and generally
maintain it.

Bastian

-- 
Many Myths are based on truth
-- Spock, "The Way to Eden",  stardate 5832.3



Re: salsa.debian.org partially down

2019-08-17 Thread Hector Oron
Hello,

On Sun, 18 Aug 2019 at 0:15, Thomas Goirand wrote:

> I'm all for it. The DSA team can use my installer, which is in Buster [1].

Great! DSA will evaluate that option now that it is in Buster.

> But last time I checked, the DSA team refused both my help and the idea
> to run OpenStack.

Last time was 4 or 5 years ago. DSA never refused your help, but
objected to giving away admin rights to run services, per DSA policy.

Regards,
-- 
 Héctor Orón  -.. . -... .. .- -.   -.. . ...- . .-.. --- .--. . .-.



Re: Gitlab support in Zuul (was: salsa.debian.org partially down)

2019-08-17 Thread Thomas Goirand
Hi Jeremy,

-Off-list-

thanks for all the valuable info.

On 8/16/19 1:23 PM, Jeremy Stanley wrote:
> I know the Zuul community would welcome a
> driver for Gitlab, but that's unlikely to materialize unless people
> who want to use Zuul and Gitlab together write and contribute it.

I did read about at least some intention to add Zuul support for Gitlab
a few months ago, but I haven't seen anything come of it. It'd be
awesome if it materializes one day.

> I'm simply glad to see
> increasing uptake of automated testing in Debian relying on
> free/libre open source software, but have no interest in viewing
> choice between these solutions as a competition.
+1

What drives me here is adding more freeness in our Gitlab CI stuff,
which is using the non-free GCE. The only solution that I know of that
would be completely free would be Zuul and the way the OpenStack CI runs.

I very much love the social aspect, where the OpenStack CI runs on donated
compute power. I'd love to see the same model in Debian.

I also completely hate the fact Salsa is becoming more and more coupled
with Google. First with built artifact storage, and now with the CI.

Cheers,

Thomas Goirand (zigo)



Re: salsa.debian.org partially down

2019-08-17 Thread Thomas Goirand
On 8/16/19 7:35 AM, Aron Xu wrote:
> On Thu, Aug 15, 2019 at 6:23 PM Thomas Goirand  wrote:
>> [...]
>> 1/ CI Jobs going faster
>> 2/ Have a way more workers
>> 3/ Don't have all our eggs on the same Google basket
>> 4/ Use free software platforms instead of GCE
>> 5/ Stop being bound to a single VM provider [1]
>>
> 
> Would it be more attractive to use dedicated hardware for such a
> long-running, resource-hungry service? We already have quite a few servers
> running at our hosting partners' sites, so adding a few more won't be a
> huge problem. Doing so will make the service a lot more reliable and
> flexible in the long run; we can even have its own OpenStack
> installation if the team really likes it.
> 
> Regards,
> Aron

I'm all for it. The DSA team can use my installer, which is in Buster [1].
But last time I checked, the DSA team refused both my help and the idea
of running OpenStack.

Thomas Goirand (zigo)

[1] https://packages.debian.org/openstack-cluster-installer



Re: salsa.debian.org partially down

2019-08-17 Thread Ian Jackson
gregor herrmann writes ("Re: salsa.debian.org partially down"):
> Ack. And that's all I wanted to say: that I found Ian's term "foolish
> user action" inappropriate in this case of an accidental DOS.

Err, yes.  I retract that comment.  I had misunderstood what had
occurred.

Sorry,
Ian.

-- 
Ian Jackson   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.



Re: salsa.debian.org partially down

2019-08-16 Thread gregor herrmann
On Fri, 16 Aug 2019 10:30:07 +0200, Alexander Wirt wrote:

> > > On Fri, 16 Aug 2019, gregor herrmann wrote:
> > > > I don't see any reason for calling this action itself foolish.

> And as I said, you will always find ways to ddos a service (that does not
> mean the user was foolish, intended to do it or is even to blame for that). 

Ack. And that's all I wanted to say: that I found Ian's term "foolish
user action" inappropriate in this case of an accidental DOS.


Cheers,
gregor

-- 
 .''`.  https://info.comodo.priv.at -- Debian Developer https://www.debian.org
 : :' : OpenPGP fingerprint D1E1 316E 93A7 60A8 104D  85FA BB3A 6801 8649 AA06
 `. `'  Member VIBE!AT & SPI Inc. -- Supporter Free Software Foundation Europe
   `-   NP: U2: Everlasting Love




Gitlab support in Zuul (was: salsa.debian.org partially down)

2019-08-16 Thread Jeremy Stanley
On 2019-08-15 12:22:58 +0200 (+0200), Thomas Goirand wrote:
[...]
> I've read that this year, there were some efforts to add Gitlab
> support to Zuul. I don't know what the status is though, but I
> know that it's possible to add Gerrit in front, and then have
> Gerrit plugged to Zuul.
[...]

While several Gitlab users have expressed an interest in writing a
connection driver to be able to actively integrate Zuul, I'm aware
of no actual work started to contribute this yet. For the moment
Zuul only integrates with Gerrit, Pagure and Github (in addition to
its generic Git driver), with support for Bitbucket now very close
to being finished as well. I know the Zuul community would welcome a
driver for Gitlab, but that's unlikely to materialize unless people
who want to use Zuul and Gitlab together write and contribute it.

The Debian community went all-in on Gitlab, and that has included
increasing reliance on Gitlab's built-in CI solution. Even though
I'm an upstream Zuul maintainer, I'm not going to advocate for it
here, or indeed for any particular solution, as I'm a firm believer
in projects using what works best for them. I'm simply glad to see
increasing uptake of automated testing in Debian relying on
free/libre open source software, but have no interest in viewing
choice between these solutions as a competition. When any one free
software solution wins, we all win.
-- 
Jeremy Stanley




Re: salsa.debian.org partially down

2019-08-16 Thread Raphael Hertzog
On Fri, 16 Aug 2019, Alexander Wirt wrote:
> I am a bit surprised: from the first day on, we said that there are limited
> resources for CI and that you should be nice to the service. That's even
> documented: 
> 
> "We mean that. Really. Be nice to the server. At some point in the future we 
> hope to add some dedicated Runners servers - Sponsors welcome! ;)"

And in the same message, you mention that you are looking for sponsors to
get more resources... so you clearly did not shut the door to grow the
service and I thank you for that.

I understand it's not always as easy as throwing additional resources at
it... otherwise it would not be a problem, and you would not have much
trouble getting more resources.

I saw an offer in this thread
(CAMr=8w4ArFb_rdAqr60U5zYuqwEFLbvGgj=scxflug9yfh+...@mail.gmail.com from
Aron Xu) and I also saw the discussion on IRC
suggesting that we should spend the money we got donated to actually buy
hardware we need. And I'm sure that a public call for sponsors would
get some significant results as well.

> And we mean it that way, so don't be surprised if we tell you that you
> overload things.

I'm not surprised, I learnt to expect it from you ;-)

> We are always improving things, but anyhow, there are
> limits - as it is for every other service within debian. 

Indeed.

Cheers,
-- 
Raphaël Hertzog ◈ Debian Developer

Support Debian LTS: https://www.freexian.com/services/debian-lts.html
Learn to master Debian: https://debian-handbook.info/get/




Re: salsa.debian.org partially down

2019-08-16 Thread Alexander Wirt
On Fri, 16 Aug 2019, Raphael Hertzog wrote:

> Hi,
> 
> On Fri, 16 Aug 2019, gregor herrmann wrote:
> > From what I know, this was not a "foolish user action" but an action
> > by a dedicated maintainer who enabled salsa-ci for all packages
> > ("projects") of a specific team; so they used a service advertised by
> > the salsa and salsa-ci teams. That this service doesn't work as
> > advertised or at least doesn't work for the amount of packages a
> > medium-sized team might have is deplorable and needs some action but
> > I don't see any reason for calling this action itself foolish.
> 
> +1
> 
> FWIW, I did the same for Kali linux on 500 packages which are
> hosted on https://gitlab.com/kalilinux/packages/ and it generated that many
> pipelines as well without any significant issue.
> 
> Obviously, gitlab.com has certainly much more resources behind it
> than salsa but I believe that we should be able to just do that without
> bringing salsa to its knees. It's quite common to do mass-update to many
> repositories with "mr" and that would generate just as many pipelines
> too.
> 
> I understand that the Salsa admins will have to find ways to grow the
> available resources and so far they did a very good job on this level,
> (except the part where they always express their grumpiness in a way that
> is hostile to many users) so I'm confident that they will find solutions.
> 
> They already moved most of the work to external Google VMs to make the service
> scale (at the start it was running entirely on the few dedicated
> runners). Same for storage of many artifacts/log files.
> 
> When we looked into replacing FusionForge, GitLab was not necessarily
> their preference (at least for formorer IIRC) but they listened to the
> feedback from DDs on this level and it's pretty clear (at least to me)
> that the GitLab CI features are the reason why many DDs voted for GitLab.
> 
> So, indeed, we should not blame users because they enable CI, we selected
> GitLab because of those features.
> 
> In summary: thank you Salsa admins and keep up the good work! (And try to be
> less grumpy)
I am a bit surprised: from the first day on, we said that there are limited
resources for CI and that you should be nice to the service. That's even
documented: 

"We mean that. Really. Be nice to the server. At some point in the future we 
hope to add some dedicated Runners servers - Sponsors welcome! ;)"

And we mean it that way, so don't be surprised if we tell you that you
overload things. We are always improving things, but anyhow, there are
limits - as it is for every other service within debian. 

So please, please don't tell me what you expect and so on. Just be happy that
it works so well.

Alex





Re: salsa.debian.org partially down

2019-08-16 Thread Raphael Hertzog
Hi,

On Fri, 16 Aug 2019, gregor herrmann wrote:
> From what I know, this was not a "foolish user action" but an action
> by a dedicated maintainer who enabled salsa-ci for all packages
> ("projects") of a specific team; so they used a service advertised by
> the salsa and salsa-ci teams. That this service doesn't work as
> advertised or at least doesn't work for the amount of packages a
> medium-sized team might have is deplorable and needs some action but
> I don't see any reason for calling this action itself foolish.

+1

FWIW, I did the same for Kali linux on 500 packages which are
hosted on https://gitlab.com/kalilinux/packages/ and it generated that many
pipelines as well without any significant issue.

Obviously, gitlab.com has certainly much more resources behind it
than salsa but I believe that we should be able to just do that without
bringing salsa to its knees. It's quite common to do mass-update to many
repositories with "mr" and that would generate just as many pipelines
too.

I understand that the Salsa admins will have to find ways to grow the
available resources and so far they did a very good job on this level,
(except the part where they always express their grumpiness in a way that
is hostile to many users) so I'm confident that they will find solutions.

They already moved most of the work to external Google VMs to make the service
scale (at the start it was running entirely on the few dedicated
runners). Same for storage of many artifacts/log files.

When we looked into replacing FusionForge, GitLab was not necessarily
their preference (at least for formorer IIRC) but they listened to the
feedback from DDs on this level and it's pretty clear (at least to me)
that the GitLab CI features are the reason why many DDs voted for GitLab.

So, indeed, we should not blame users because they enable CI, we selected
GitLab because of those features.

In summary: thank you Salsa admins and keep up the good work! (And try to be
less grumpy)

Cheers,
-- 
Raphaël Hertzog ◈ Debian Developer

Support Debian LTS: https://www.freexian.com/services/debian-lts.html
Learn to master Debian: https://debian-handbook.info/get/




Re: salsa.debian.org partially down

2019-08-16 Thread Alexander Wirt
On Fri, 16 Aug 2019, Daniel Leidert wrote:

> On Friday, 16.08.2019, at 08:58 +0200, Alexander Wirt wrote:
> > On Fri, 16 Aug 2019, gregor herrmann wrote:
> 
> [..]
> > > From what I know, this was not a "foolish user action" but an action
> > > by a dedicated maintainer who enabled salsa-ci for all packages
> > > ("projects") of a specific team; so they used a service advertised by
> > > the salsa and salsa-ci teams. That this service doesn't work as
> > > advertised or at least doesn't work for the amount of packages a
> > > medium-sized team might have is deplorable and needs some action but
> > > I don't see any reason for calling this action itself foolish.
> > 
> > All our services have somewhat limited resources and need to work for all
> > developers. To be honest: I do expect everyone using any of our
> > (Debian) services to think before doing things like that, be it sending a
> > few thousand mails to our mailing lists, creating a few thousand jobs on
> > salsa, uploading a few thousand packages, or creating a few thousand bugs. 
> 
> I'm with Gregor on this one. I can expect our services not to be forced to
> their knees just by an increased workload (hint: queue).
> 
> Gitlab has a configuration setting to limit how many jobs can run concurrently. The
> user's action might not have been optimal, but it has shown that the current
> setting is not optimal either. So I think it's best not to blame the user, but to
> adjust our Gitlab configuration, announce the changes, point out how to skip
> pipelines [1] and move on.
And as I said, you will always find ways to ddos a service (that does not
mean the user was foolish, intended to do it or is even to blame for that). 

But yeah go ahead: write documentation. 

Alex




Re: salsa.debian.org partially down

2019-08-16 Thread Daniel Leidert
On Friday, 16.08.2019, at 08:58 +0200, Alexander Wirt wrote:
> On Fri, 16 Aug 2019, gregor herrmann wrote:

[..]
> > From what I know, this was not a "foolish user action" but an action
> > by a dedicated maintainer who enabled salsa-ci for all packages
> > ("projects") of a specific team; so they used a service advertised by
> > the salsa and salsa-ci teams. That this service doesn't work as
> > advertised or at least doesn't work for the amount of packages a
> > medium-sized team might have is deplorable and needs some action but
> > I don't see any reason for calling this action itself foolish.
> 
> All our services have somewhat limited resources and need to work for all
> developers. To be honest: I do expect everyone using any of our
> (Debian) services to think before doing things like that, be it sending a
> few thousand mails to our mailing lists, creating a few thousand jobs on
> salsa, uploading a few thousand packages, or creating a few thousand bugs. 

I'm with Gregor on this one. I can expect our services not to be forced to
their knees just by an increased workload (hint: queue).

Gitlab has a configuration setting to limit how many jobs can run concurrently. The
user's action might not have been optimal, but it has shown that the current
setting is not optimal either. So I think it's best not to blame the user, but to
adjust our Gitlab configuration, announce the changes, point out how to skip
pipelines [1] and move on.
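
As a small illustration of the "skip pipelines" point: GitLab documents
"[ci skip]" and "[skip ci]" (in any capitalization) as commit message markers
that create the commit without running a pipeline. The sketch below is only a
helper for preparing commit messages in a mass update; the function name
with_ci_skip and the sample messages are made up for this example and are not
part of any Salsa tooling.

```python
import re

# Markers GitLab recognizes in a commit message, matched case-insensitively.
SKIP_MARKER = re.compile(r"\[(ci skip|skip ci)\]", re.IGNORECASE)


def with_ci_skip(message):
    """Return the commit message with a skip marker, adding one if missing.

    Hypothetical helper: it only edits the message text, it does not talk
    to GitLab at all.
    """
    if SKIP_MARKER.search(message):
        return message
    return f"{message.rstrip()} [ci skip]"


print(with_ci_skip("Bump Standards-Version"))
print(with_ci_skip("Refresh patches [SKIP CI]"))
```

Used this way, a scripted change across hundreds of repositories would still
land in git, but would not enqueue one pipeline per repository.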

[1] https://docs.gitlab.com/ee/ci/yaml/#skipping-jobs

Regards, Daniel




Re: salsa.debian.org partially down

2019-08-16 Thread Alexander Wirt
On Fri, 16 Aug 2019, gregor herrmann wrote:

> On Wed, 14 Aug 2019 08:39:22 +0100, Ian Jackson wrote:
> 
> > Alexander Wirt writes ("Re: salsa.debian.org partially down"):
> > > It is already recovered. We will investigate where we can extend the
> > > resources. But some misuses (like creating >1300 merge requests via the API
> > > on a big project, which in consequence run >1300 CI jobs, that...) can't be
> > > solved regardless of how many resources we add. 
> > Thanks for the reports from you and Bastian.  Thanks also for having
> > the energy and effort to deal with this kind of thing.  It's annoying
> > when a thing you're responsible for breaks because of foolish user
> > action, and then you have to scramble to fix it.
> 
> From what I know, this was not a "foolish user action" but an action
> by a dedicated maintainer who enabled salsa-ci for all packages
> ("projects") of a specific team; so they used a service advertised by
> the salsa and salsa-ci teams. That this service doesn't work as
> advertised or at least doesn't work for the amount of packages a
> medium-sized team might have is deplorable and needs some action but
> I don't see any reason for calling this action itself foolish.

All our services have somewhat limited resources and need to work for all
developers. To be honest: I do expect everyone using any of our
(Debian) services to think before doing things like that, be it sending a
few thousand mails to our mailing lists, creating a few thousand jobs on
salsa, uploading a few thousand packages, or creating a few thousand bugs. 

Alex





Re: salsa.debian.org partially down

2019-08-15 Thread Aron Xu
On Thu, Aug 15, 2019 at 6:23 PM Thomas Goirand  wrote:
> [...]
> 1/ CI Jobs going faster
> 2/ Have a way more workers
> 3/ Don't have all our eggs on the same Google basket
> 4/ Use free software platforms instead of GCE
> 5/ Stop being bound to a single VM provider [1]
>

Would it be more attractive to use dedicated hardware for such a
long-running, resource-hungry service? We already have quite a few servers
running at our hosting partners' sites, so adding a few more won't be a
huge problem. Doing so will make the service a lot more reliable and
flexible in the long run; we can even have its own OpenStack
installation if the team really likes it.

Regards,
Aron



Re: salsa.debian.org partially down

2019-08-15 Thread gregor herrmann
On Wed, 14 Aug 2019 08:39:22 +0100, Ian Jackson wrote:

> Alexander Wirt writes ("Re: salsa.debian.org partially down"):
> > It is already recovered. We will investigate where we can extend the
> > resources. But some misuses (like creating >1300 merge requests via the API
> > on a big project, which in consequence run >1300 CI jobs, that...) can't be
> > solved regardless of how many resources we add. 
> Thanks for the reports from you and Bastian.  Thanks also for having
> the energy and effort to deal with this kind of thing.  It's annoying
> when a thing you're responsible for breaks because of foolish user
> action, and then you have to scramble to fix it.

From what I know, this was not a "foolish user action" but an action
by a dedicated maintainer who enabled salsa-ci for all packages
("projects") of a specific team; so they used a service advertised by
the salsa and salsa-ci teams. That this service doesn't work as
advertised or at least doesn't work for the amount of packages a
medium-sized team might have is deplorable and needs some action but
I don't see any reason for calling this action itself foolish.
 

Cheers,
gregor

-- 
 .''`.  https://info.comodo.priv.at -- Debian Developer https://www.debian.org
 : :' : OpenPGP fingerprint D1E1 316E 93A7 60A8 104D  85FA BB3A 6801 8649 AA06
 `. `'  Member VIBE!AT & SPI Inc. -- Supporter Free Software Foundation Europe
   `-   NP: Janis Joplin: Cry Baby (live)




How to use subjects (was: Re: salsa.debian.org partially down)

2019-08-15 Thread Bastian Blank
Hi Thomas

On Thu, Aug 15, 2019 at 12:22:58PM +0200, Thomas Goirand wrote:
> I probably should have mentioned that my remark was not so much related
> to the crash, but more to what I experienced using Salsa's CI.

If your remarks don't relate to the subject, please start a new thread,
or at least properly change the subject.

Regards,
Bastian

-- 
Dismissed.  That's a Star Fleet expression for, "Get out."
-- Capt. Kathryn Janeway, Star Trek: Voyager, "The Cloud"



Re: salsa.debian.org partially down

2019-08-15 Thread Thomas Goirand
On 8/15/19 11:03 AM, Ian Jackson wrote:
> (off-list)
> 
> Ian.

You replied off-list, so I won't quote you, though I would like my answer
to be public.

I probably should have mentioned that my remark was not so much related
to the crash, but more to what I experienced using Salsa's CI. It's
currently super slow because, apparently, it runs on GCE using the smallest
instance possible (maybe because we have limited credit?). As a
result, adding the Salsa CI team's script just fails for some packages,
for example OpenVSwitch, because of the slowness. In the extreme
case of OpenVSwitch, almost all tests time out during the build. I
found the overall experience frustrating, and the Salsa CI not very
helpful because it is too slow.

So, as an alternative to what I experienced, I thought using Zuul + Nodepool
like in the OpenStack CI would probably solve the slowness and resource
starvation issues. What's done there is that OpenStack-based cloud
providers sponsor a pool of VMs. These are maintained using
nodepool, which "pre-provisions" the VMs. That's a bit like Apache
prefork, where processes are spawned in advance to be ready for the next
connections, except that here we're talking about VMs ready to accept CI
jobs. Then Zuul handles the job queue and scheduling. Overall, this is a
very efficient CI, able to spawn thousands of jobs on many
different cloud providers.
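
To make the pre-provisioning idea concrete, here is a minimal Python sketch.
It is not how nodepool or Zuul are implemented and uses none of their APIs;
the names WarmPool, schedule and the "vm-N" labels are invented for the
example. It only shows the pattern described above: workers are booted ahead
of demand, and a scheduler drains a job queue onto whichever workers are
already ready.

```python
from collections import deque


class WarmPool:
    """Toy stand-in for a pool of pre-provisioned workers (hypothetical)."""

    def __init__(self, target_ready):
        self.target_ready = target_ready
        self.ready = deque()
        self.booted = 0
        self.replenish()

    def replenish(self):
        # Boot workers ahead of demand until the ready count hits the target.
        while len(self.ready) < self.target_ready:
            self.booted += 1
            self.ready.append(f"vm-{self.booted}")

    def acquire(self):
        # Hand out an already-booted worker, or None if the pool is drained.
        return self.ready.popleft() if self.ready else None


def schedule(jobs, pool, max_running):
    """Assign queued jobs to ready workers, at most max_running at a time.

    Returns (assignments, still_queued).
    """
    queue = deque(jobs)
    assignments = []
    while queue and len(assignments) < max_running:
        vm = pool.acquire()
        if vm is None:
            break
        assignments.append((queue.popleft(), vm))
    # Refill the pool so the next batch of jobs also finds warm workers.
    pool.replenish()
    return assignments, list(queue)


pool = WarmPool(target_ready=3)
running, waiting = schedule([f"job-{i}" for i in range(1, 6)], pool, max_running=3)
print(running)
print(waiting)
```

In this toy run, three jobs start immediately on warm workers, two stay
queued, and the pool boots three replacements in the background of the next
scheduling round.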

I'm not sure exactly what the Salsa issue was. Whatever it was, I
strongly believe that Zuul + Nodepool would help:

1/ CI jobs going faster
2/ Have way more workers
3/ Don't have all our eggs in the same Google basket
4/ Use free software platforms instead of GCE
5/ Stop being bound to a single VM provider [1]

Not only OpenStack uses Zuul: large organizations such as Netflix and the
Wikimedia Foundation do too.

I've read that this year, there were some efforts to add Gitlab support
to Zuul. I don't know what the status is though, but I know that it's
possible to add Gerrit in front, and then have Gerrit plugged into Zuul.
I've used Gerrit previously to do OpenStack packaging (the version of
OpenStack in Stretch was done this way), and I was very happy with it. I'd
love to have the opportunity to do this again, but with Salsa
and the Debian infra this time.

Cheers,

Thomas Goirand (zigo)

P.S: If you want an idea of what it is like with nodepool in action, go
here: http://grafana.openstack.org then click on "Home" and select
nodepool. There you'll see graphs of spawned instances.

[1] The OpenStack CI is currently configured to use VMs from:
Fortnebula, OVH (2 regions), Rackspace (2 regions), and Vexxhost, though
it used to have a large number of sponsors.



Re: salsa.debian.org partially down

2019-08-14 Thread Bastian Blank
On Wed, Aug 14, 2019 at 10:53:56AM +0200, Thomas Goirand wrote:
> On 8/13/19 1:59 PM, Alexander Wirt wrote:
> > It is already recovered. We will investigate where we can extend the
> > resources. But some misuses (like creating >1300 merge requests via the API
> > on a big project, which in consequence run >1300 CI jobs, that...) can't be
> > solved regardless of how many resources we add. 
> It'd be nice if Gitlab could be plugged into something like Zuul [1],
> and then we'd use nodepool, which would handle this type of load.

Please describe the problem to solve first.  Because formorer only
talked about what triggered the problem, not what part of the whole
system actually broke.

(Hint: it's not the number of jobs, or at least not directly.  And even
a different CI platform could not fix an overload that a different
system brought to its knees.)

Regards,

-- 
Conquest is easy. Control is not.
-- Kirk, "Mirror, Mirror", stardate unknown



Re: salsa.debian.org partially down

2019-08-14 Thread Thomas Goirand
On 8/13/19 1:59 PM, Alexander Wirt wrote:
> It is already recovered. We will investigate where we can extend the
> resources. But some misuses (like creating >1300 merge requests via the API
> on a big project, which in consequence run >1300 CI jobs, that...) can't be
> solved regardless of how many resources we add. 
> 
> Alex

It'd be nice if Gitlab could be plugged into something like Zuul [1],
and then we'd use nodepool, which would handle this type of load.

Cheers,

Thomas Goirand (zigo)

[1] https://zuul-ci.org/docs/zuul/



Re: salsa.debian.org partially down

2019-08-14 Thread Ian Jackson
Alexander Wirt writes ("Re: salsa.debian.org partially down"):
> It is already recovered. We will investigate where we can extend the
> resources. But some misuses (like creating >1300 merge requests via the API
> on a big project, which in consequence run >1300 CI jobs, that...) can't be
> solved regardless of how many resources we add. 

Thanks for the reports from you and Bastian.  Thanks also for having
the energy and effort to deal with this kind of thing.  It's annoying
when a thing you're responsible for breaks because of foolish user
action, and then you have to scramble to fix it.

Maybe I'm teaching my grandmother to suck eggs, but your message made
me want to suggest possible solutions/mitigations for the problem you
mention above.  Please feel free to disregard what follows.


I think the problem can be summarised/generalised as "someone makes
more requests to salsa than it has capacity to fulfil".

Traditional approaches to this include (mentioning all that I can
think of, even inappropriate or already-done ones; and, not knowing
what features gitlab has for this):

 * Per-user quotas.  (The kind of user who submits 1300 MRs might well
   react to a limit by creating more guest accounts...)

 * Per-project quotas.  (This avoids the above problem.  It
   ring-fences problems with poor contributor behaviour to the
   projects whose contributors are behaving poorly.)

 * Queuing jobs, so that the effect is contained (eg to the CI
   subsystem) until an administrator can cancel some jobs.  I think
   maybe earlier when Bastian wrote "It turns out that the configured
   amount of concurrency in CI builds can't be handled by the current
   available system resources" he was referring to a tuneable which
   would have the effect of queueing things, next time.  I guess
   you've adjusted this already.

 * Restricting resource-intensive actions to certain users.
   In our context this would seem to involve asking project
   maintainers to manually trigger CI on MRs.  That seems like it
   would be annoying and best avoided if we can.

 * Balkanising the system into multiple instances (perhaps with
   different configurations) so that each instance is exposed to a
   much smaller userbase.  I doubt we have the effort for this even if
   we could come up with a sensible division, and liked the idea.
   (One way to test the waters in this direction would be for someone
   to set up a competitor to salsa based on an entirely different
   management stack.)

 * Documentation, deterrence and punishment.  I mention this for
   completeness; given that we have so many users, and also offer
   guest accounts, this is not an appropriate strategy for salsa.
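
Two of the approaches in the list above, per-project quotas and queueing, can
be sketched together in a few lines of Python. This is only an illustration
under made-up assumptions: the function admit, its parameters
per_project_limit and max_concurrent, and the project names are hypothetical
and do not correspond to any GitLab setting or API.

```python
from collections import defaultdict, deque


def admit(requests, per_project_limit, max_concurrent):
    """Split (project, job) requests into running, queued and rejected.

    - A project may have at most per_project_limit jobs accepted in total.
    - At most max_concurrent accepted jobs run at once; the rest wait in
      FIFO order instead of loading the system.
    """
    accepted_per_project = defaultdict(int)
    running, queued, rejected = [], deque(), []
    for project, job in requests:
        if accepted_per_project[project] >= per_project_limit:
            # Quota exhausted: only this project is affected.
            rejected.append((project, job))
            continue
        accepted_per_project[project] += 1
        if len(running) < max_concurrent:
            running.append((project, job))
        else:
            queued.append((project, job))
    return running, list(queued), rejected


requests = [("big-team", n) for n in range(5)] + [("small-team", 0)]
running, queued, rejected = admit(requests, per_project_limit=3, max_concurrent=2)
print(running)
print(queued)
print(rejected)
```

The point of the design is containment: the flood from "big-team" is capped
by its own quota, while the single "small-team" job is merely delayed in the
queue rather than refused.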

I hope that you find this message useful, rather than just a statement
of things which are mostly obvious and/or irrelevant.

Regards,
Ian.

-- 
Ian Jackson   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.



Re: salsa.debian.org partially down

2019-08-13 Thread Alexander Wirt
On Tue, 13 Aug 2019, Hector Oron wrote:

> Hello,
> 
> On Tue, 13 Aug 2019 at 11:51, Bastian Blank wrote:
> >
> > Hi folks
> >
> > salsa.debian.org is partially down for now.  Especially everything that
> > concerns Salsa CI.
> >
> > Someone decided to inject a few thousand CI jobs using Salsa CI
> > yesterday evening.  Since then the system is no longer at 50% load, but
> > at 100%.  It turns out that the configured amount of concurrency in CI
> > builds can't be handled by the currently available system resources.
> >
> > This now affects all user access to salsa.debian.org.  So we disabled
> > access to all the Salsa CI stuff for now, which will make everything
> > using it fail.
> >
> > We have to discuss first how we can go forward.
> 
> That is very bad news. I hope it can be recovered soon. If you
> need more resources to be able to handle more jobs or longer timeouts,
> please let the DPL know so more resources can be added.
It is already recovered. We will investigate where we can extend the
resources. But some misuses (like creating >1300 merge requests via the API
on a big project, which in consequence run >1300 CI jobs, that...) can't be
solved regardless of how many resources we add. 

Alex
 



Re: salsa.debian.org partially down

2019-08-13 Thread Hector Oron
Hello,

On Tue, 13 Aug 2019 at 11:51, Bastian Blank wrote:
>
> Hi folks
>
> salsa.debian.org is partially down for now.  Especially everything that
> concerns Salsa CI.
>
> Someone decided to inject a few thousand CI jobs using Salsa CI
> yesterday evening.  Since then the system is no longer at 50% load, but
> at 100%.  It turns out that the configured amount of concurrency in CI
> builds can't be handled by the currently available system resources.
>
> This now affects all user access to salsa.debian.org.  So we disabled
> access to all the Salsa CI stuff for now, which will make everything
> using it fail.
>
> We have to discuss first how we can go forward.

That is very bad news. I hope it can be recovered soon. If you
need more resources to be able to handle more jobs or longer timeouts,
please let the DPL know so more resources can be added.

Regards,
-- 
 Héctor Orón  -.. . -... .. .- -.   -.. . ...- . .-.. --- .--. . .-.