Re: [Gluster-infra] Download.gluster.org 27 April 2016 postmortem

2016-04-27 Thread Amye Scavarda
On Wed, Apr 27, 2016 at 10:24 AM, Mike Hulsman  wrote:

>
> Quoting Kaushal M :
>
> On Wed, Apr 27, 2016 at 5:21 PM, Michael Scherer 
>> wrote:
>>
>>> On Wednesday 27 April 2016 at 14:39 +0300, Eyal Edri wrote:
>>>
>>>> Excellent post-mortem!
>>>>
>>>> Do you think it's worth adding mirrors to gluster repos like oVirt is
>>>> doing? [1]
>>>>
>>>> [1]
>>>> http://ovirt-infra-docs.readthedocs.org/en/latest/General/Mirror.html
>>>>
>>>
>>> That could be a solution.
>>>
>>> But we have the resources to host a mirror ourselves in the DC; it just
>>> needs an IP address and a migration of servers (which is taking an awful
>>> lot of time to happen :/ ).
>>>
>>> One issue we would have with a mirror is the download stats.
>>>
>>> That, and the need to have a mirrorlist; not sure how that's done on
>>> the dnf/yum side these days.
>>>
>>>
>> Someone recently offered to mirror download.gluster.org (I need to dig
>> archives to find out who exactly). Didn't we take up their offer?
>>
> I offered to mirror gluster on ftp.nluug.nl.
> We have been mirroring oVirt for a while, and are happy to set up a mirror
> for gluster.
> Our bandwidth is 10Gb, and we are located in Amsterdam, the Netherlands.
> I am happy to set up a mirror.
>
> Mike Hulsman
>
>
>>
>>> --
>>> Michael Scherer
>>> Sysadmin, Community Infrastructure and Platform, OSAS
>>>
>>>
I've reached out to our metrics team to see what happens to our download
metrics if we have a mirror, as being able to have accurate project metrics
is pretty important.

I'll let you know what solution they come up with and we'll move forward
from there.

- amye

-- 
Amye Scavarda | a...@redhat.com | Gluster Community Lead
___
Gluster-infra mailing list
Gluster-infra@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-infra

Re: [Gluster-infra] [ovirt-users] [Attention needed] GlusterFS repository down - affects CI / Installations

2016-04-27 Thread Sandro Bonazzola
On Wed, Apr 27, 2016 at 11:09 AM, Niels de Vos  wrote:

> On Wed, Apr 27, 2016 at 02:30:57PM +0530, Ravishankar N wrote:
> > @gluster infra  - FYI.
> >
> > On 04/27/2016 02:20 PM, Nadav Goldin wrote:
> > >Hi,
> > >The GlusterFS repository became unavailable this morning, as a result all
> > >Jenkins jobs that use the repository will fail, the common error would be:
> > >
> > >
> > >http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/epel-7/noarch/repodata/repomd.xml:
> > >[Errno 14] HTTP Error 403 - Forbidden
> > >
> > >
> > >Also, installations of oVirt will fail.
>
> I thought oVirt moved to using the packages from the CentOS Storage SIG?
>

We did that for CentOS Virt SIG builds.
On oVirt upstream we're still on Gluster upstream.
We'll move to Storage SIG there as well.



> In any case, automated tests should probably use those instead of the
> packages on download.gluster.org. We're trying to minimize the work
> packagers need to do, and get the glusterfs and other components in the
> repositories that are provided by different distributions.
>
> For more details, see the quickstart for the Storage SIG here:
>   https://wiki.centos.org/SpecialInterestGroup/Storage/gluster-Quickstart
>
> HTH,
> Niels
>
> ___
> Users mailing list
> us...@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>


-- 
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com

Re: [Gluster-infra] Download.gluster.org 27 April 2016 postmortem

2016-04-27 Thread Michael Scherer
On Wednesday 27 April 2016 at 12:56 +0200, Michael Scherer wrote:

> Potential improvements to make:
> - add monitoring on gluster side
> - use the centos sig repo on ovirt side
> - add more sysadmins for gluster
> - add a redundant service for that
>   - a 2nd download server with a shared gluster backend
> - migrate the storage to a proper setup with 1 single block device,
> rather than 2.

So I did the last item (did I say that lvm and pvmove kick ass?)

I am looking at options for adding nagios, since that does not need a
public IP, so I could run it on formicary.gluster.org until we can
do a proper setup (so no web interface, but just notifications would be a
start).
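
A minimal sketch of what such a notification-only check could look like, written as a Nagios-style plugin in Python. The exit codes follow the usual Nagios plugin convention (0 for OK, 2 for CRITICAL); the function names and the choice of repomd.xml as the probe URL are assumptions for illustration, not something already deployed.

```python
import urllib.error
import urllib.request

# Probe URL taken from the error report on the list; using it as the
# health-check target is an assumption of this sketch.
REPOMD_URL = ("http://download.gluster.org/pub/gluster/glusterfs/LATEST/"
              "EPEL.repo/epel-7/noarch/repodata/repomd.xml")

# Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def classify(status):
    """Map an HTTP status (or None when there was no answer) to (exit_code, message)."""
    if status == 200:
        return OK, "OK - repomd.xml served (HTTP 200)"
    if status is None:
        return CRITICAL, "CRITICAL - no HTTP response"
    return CRITICAL, "CRITICAL - HTTP %d" % status


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request failed outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers (such as the 403 in this incident) end up here.
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def main():
    """Run the check once, print the one-line status, and return the exit code."""
    code, message = classify(fetch_status(REPOMD_URL))
    print(message)
    return code
```

Calling `sys.exit(main())` from a cron job or a Nagios command definition would be enough to get a notification for a 403 like the one in this incident.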

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS





Re: [Gluster-infra] Download.gluster.org 27 April 2016 postmortem

2016-04-27 Thread Michael Scherer
On Wednesday 27 April 2016 at 17:35 +0530, Kaushal M wrote:
> On Wed, Apr 27, 2016 at 5:26 PM, Kaushal M  wrote:
> > On Wed, Apr 27, 2016 at 5:21 PM, Michael Scherer  
> > wrote:
> >> On Wednesday 27 April 2016 at 14:39 +0300, Eyal Edri wrote:
> >>> Excellent post-mortem!
> >>>
> >>> Do you think it's worth adding mirrors to gluster repos like oVirt is
> >>> doing?
> >>> [1]
> >>>
> >>> [1] http://ovirt-infra-docs.readthedocs.org/en/latest/General/Mirror.html
> >>
> >> That could be a solution.
> >>
> >> But we have the resources to host a mirror ourselves in the DC; it just
> >> needs an IP address and a migration of servers (which is taking an awful
> >> lot of time to happen :/ ).
> >>
> >> One issue we would have with a mirror is the download stats.
> >>
> >> That, and the need to have a mirrorlist; not sure how that's done on
> >> the dnf/yum side these days.
> >>
> >
> > Someone recently offered to mirror download.gluster.org (I need to dig
> > archives to find out who exactly). Didn't we take up their offer?
> 
> The offer was made from nluug.nl [1]. The last mail in the thread, on
> Mar 10 [2], said the offer was still open, and we just needed to set up
> the sync.
> 
> Michael, did you just lose track of this with your long PTOs?

that's likely :)

(this and other fires that were more urgent to deal with)

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS





Re: [Gluster-infra] Download.gluster.org 27 April 2016 postmortem

2016-04-27 Thread Michael Scherer
On Wednesday 27 April 2016 at 17:26 +0530, Kaushal M wrote:
> On Wed, Apr 27, 2016 at 5:21 PM, Michael Scherer wrote:
> > On Wednesday 27 April 2016 at 14:39 +0300, Eyal Edri wrote:
> >> Excellent post-mortem!
> >>
> >> Do you think it's worth adding mirrors to gluster repos like oVirt is doing?
> >> [1]
> >>
> >> [1] http://ovirt-infra-docs.readthedocs.org/en/latest/General/Mirror.html
> >
> > That could be a solution.
> >
> > But we have the resources to host a mirror ourselves in the DC; it just
> > needs an IP address and a migration of servers (which is taking an awful
> > lot of time to happen :/ ).
> >
> > One issue we would have with a mirror is the download stats.
> >
> > That, and the need to have a mirrorlist; not sure how that's done on
> > the dnf/yum side these days.
> >
> 
> Someone recently offered to mirror download.gluster.org (I need to dig
> archives to find out who exactly). Didn't we take up their offer?

It does not seem so.

But my question regarding the download stats is still unanswered. Given
how critical they seem to be, I would like to be sure we have a plan.

A mirror is the first step, but then:
- how do we signal to people to use both the mirror and the main server
- are we OK with the download stats dropping by an unspecified amount
- who will edit the website for the change

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS





Re: [Gluster-infra] Download.gluster.org 27 April 2016 postmortem

2016-04-27 Thread Eyal Edri
Excellent post-mortem!

Do you think it's worth adding mirrors to gluster repos like oVirt is doing?
[1]

[1] http://ovirt-infra-docs.readthedocs.org/en/latest/General/Mirror.html

On Wed, Apr 27, 2016 at 1:56 PM, Michael Scherer 
wrote:

> Hi,
>
> As promised, here is the post-mortem of the incident. If you would like
> to see more information, or have any remarks, please do not hesitate, since
> this is our first attempt at one.
>
> I modelled it on the example in
> http://shop.oreilly.com/product/0636920041528.do, as that is the book I am
> reading at the moment (Appendix D). We will formalize that later.
>
>
>
> Download.gluster.org was not serving files
> Date: 2016-04-27
> Participating people:
>  - misc
>
> Summary:
>
> Download.gluster.org http server was returning error 403 for all URLs,
> which impacted oVirt Jenkins jobs and users of the repository,
> among others. The server is used to distribute gluster rpms.
>
> Impact:
> - oVirt CI jobs got blocked
> - users couldn't install gluster
>
> Root cause:
> The underlying block device on Rackspace was down for an undiagnosed
> reason, triggering XFS errors on the server and thus 403s at the HTTP
> level.
>
> The root cause of the block device error is still unknown; no errors
> have been seen on the Rackspace status page for this DC. A ticket was
> opened with Rackspace to see what was going on (160427-iad-814). A
> follow-up to this post-mortem will be done if the ticket says something
> more than "shit happens".
>
> Resolution:
>
> The whole server was rebooted, and upon reboot, the block device came
> back.
>
> Lessons learned:
> - what went well:
>   - people notified the admin quickly on irc and on gluster-infra
>
> - where we were lucky
>   - the server and block device came back immediately
>   - it failed during business hours of EMEA with misc being on irc (just
> arrived at the office)
>
>
> - what went badly
>   - we do not have proper HA for the service
>   - we do not have automated monitoring for it
>   - the setup uses two 120G block devices in LVM, making it roughly
> twice as likely to fail
>
> Timeline (in UTC)
> - 05:39 first error message in the log about an XFS error
> - 08:41 misc is pinged on irc
> - 08:56 misc acks and diagnoses the issue
> - 09:00 the server and service are back to normal
> - 09:00 first mail about the problem hits gluster-infra
>
>
> Potential improvements to make:
> - add monitoring on gluster side
> - use the centos sig repo on ovirt side
> - add more sysadmins for gluster
> - add a redundant service for that
>   - a 2nd download server with a shared gluster backend
> - migrate the storage to a proper setup with 1 single block device,
> rather than 2.
>
>
> --
> Michael Scherer
> Sysadmin, Community Infrastructure and Platform, OSAS
>
>
>
> ___
> Infra mailing list
> in...@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/infra
>
>


-- 
Eyal Edri
Associate Manager
RHEV DevOps
EMEA ENG Virtualization R
Red Hat Israel

phone: +972-9-7692018
irc: eedri (on #tlv #rhev-dev #rhev-integ)

Re: [Gluster-infra] Download.gluster.org 27 April 2016 postmortem

2016-04-27 Thread Kaushal M
On Wed, Apr 27, 2016 at 5:21 PM, Michael Scherer  wrote:
> On Wednesday 27 April 2016 at 14:39 +0300, Eyal Edri wrote:
>> Excellent post-mortem!
>>
>> Do you think it's worth adding mirrors to gluster repos like oVirt is doing?
>> [1]
>>
>> [1] http://ovirt-infra-docs.readthedocs.org/en/latest/General/Mirror.html
>
> That could be a solution.
>
> But we have the resources to host a mirror ourselves in the DC; it just
> needs an IP address and a migration of servers (which is taking an awful
> lot of time to happen :/ ).
>
> One issue we would have with a mirror is the download stats.
>
> That, and the need to have a mirrorlist; not sure how that's done on
> the dnf/yum side these days.
>

Someone recently offered to mirror download.gluster.org (I need to dig
archives to find out who exactly). Didn't we take up their offer?

>
> --
> Michael Scherer
> Sysadmin, Community Infrastructure and Platform, OSAS
>
>
>
> ___
> Gluster-infra mailing list
> Gluster-infra@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-infra

Re: [Gluster-infra] Download.gluster.org 27 April 2016 postmortem

2016-04-27 Thread Michael Scherer
On Wednesday 27 April 2016 at 14:39 +0300, Eyal Edri wrote:
> Excellent post-mortem!
> 
> Do you think it's worth adding mirrors to gluster repos like oVirt is doing?
> [1]
> 
> [1] http://ovirt-infra-docs.readthedocs.org/en/latest/General/Mirror.html

That could be a solution. 

But we have the resources to host a mirror ourselves in the DC; it just
needs an IP address and a migration of servers (which is taking an awful
lot of time to happen :/ ).

One issue we would have with a mirror is the download stats.

That, and the need to have a mirrorlist; not sure how that's done on
the dnf/yum side these days.
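
On the mirrorlist question: yum and dnf accept a `mirrorlist=` option in a .repo stanza, pointing at a plain-text file that lists one base URL per line. Below is a minimal sketch in Python that renders both pieces. Only the download.gluster.org base path comes from the error report on this list; the mirror host, the mirrorlist location, the repo id and the helper names are all hypothetical.

```python
# Hypothetical URLs: only the download.gluster.org base path comes from the
# error report on this list; the mirror host and mirrorlist location are made up.
PRIMARY = ("http://download.gluster.org/pub/gluster/glusterfs/LATEST/"
           "EPEL.repo/epel-7/noarch/")
MIRROR = "http://mirror.example.org/gluster/glusterfs/LATEST/EPEL.repo/epel-7/noarch/"
MIRRORLIST_URL = "http://download.gluster.org/mirrorlist-epel-7-noarch.txt"


def render_mirrorlist(base_urls):
    """Return the mirrorlist body: plain text, one base URL per line."""
    return "\n".join(base_urls) + "\n"


def render_repo_stanza(repo_id, name, mirrorlist_url):
    """Return a .repo stanza that uses mirrorlist= rather than a single base URL."""
    lines = [
        "[%s]" % repo_id,
        "name=%s" % name,
        "mirrorlist=%s" % mirrorlist_url,
        "enabled=1",
        # GPG settings (gpgcheck=, gpgkey=) are left out of this sketch.
    ]
    return "\n".join(lines) + "\n"


mirrorlist_body = render_mirrorlist([PRIMARY, MIRROR])
repo_stanza = render_repo_stanza("glusterfs-epel", "GlusterFS packages", MIRRORLIST_URL)
```

With this layout the client picks a server from the list by itself, so a single server failure no longer blocks installs, but downloads served by the mirror would not show up in the main server's logs.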


-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS





[Gluster-infra] Download.gluster.org 27 April 2016 postmortem

2016-04-27 Thread Michael Scherer
Hi,

As promised, here is the post-mortem of the incident. If you would like
to see more information, or have any remarks, please do not hesitate, since
this is our first attempt at one.

I modelled it on the example in
http://shop.oreilly.com/product/0636920041528.do, as that is the book I am
reading at the moment (Appendix D). We will formalize that later.



Download.gluster.org was not serving files
Date: 2016-04-27
Participating people:
 - misc

Summary:

Download.gluster.org http server was returning error 403 for all URLs,
which impacted oVirt Jenkins jobs and users of the repository,
among others. The server is used to distribute gluster rpms.

Impact:
- oVirt CI jobs got blocked
- users couldn't install gluster

Root cause:
The underlying block device on Rackspace was down for an undiagnosed
reason, triggering XFS errors on the server and thus 403s at the HTTP
level.

The root cause of the block device error is still unknown; no errors
have been seen on the Rackspace status page for this DC. A ticket was
opened with Rackspace to see what was going on (160427-iad-814). A
follow-up to this post-mortem will be done if the ticket says something
more than "shit happens".

Resolution:

The whole server was rebooted, and upon reboot, the block device came
back.

Lessons learned:
- what went well:
  - people notified the admin quickly on irc and on gluster-infra

- where we were lucky
  - the server and block device came back immediately
  - it failed during business hours of EMEA with misc being on irc (just
arrived at the office)


- what went badly
  - we do not have proper HA for the service
  - we do not have automated monitoring for it
  - the setup uses two 120G block devices in LVM, making it roughly
twice as likely to fail
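
The doubling of risk mentioned in the last item can be made concrete with a small calculation. This is a sketch under two assumptions that are not in the incident data: the two block devices fail independently, and losing either one takes the logical volume down. The per-device probability used below is an arbitrary illustrative value, not a measured one.

```python
def volume_failure_probability(p_device, n_devices):
    """Probability that at least one of n independent devices fails."""
    return 1 - (1 - p_device) ** n_devices


p = 0.01  # illustrative per-device failure probability, not a measured value
one_device = volume_failure_probability(p, 1)
two_devices = volume_failure_probability(p, 2)
ratio = two_devices / one_device  # close to 2 when p is small
```

For small values of `p` the ratio stays just under 2, which is where the rough doubling comes from.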

Timeline (in UTC)
- 05:39 first error message in the log about an XFS error
- 08:41 misc is pinged on irc
- 08:56 misc acks and diagnoses the issue
- 09:00 the server and service are back to normal
- 09:00 first mail about the problem hits gluster-infra
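
The durations implied by the timeline above can be worked out as follows. This is only arithmetic on the listed UTC timestamps; the variable names are labels chosen for this sketch.

```python
from datetime import datetime


def minutes_between(start, end):
    """Minutes from start to end, both given as HH:MM on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


first_error, pinged, acked, restored = "05:39", "08:41", "08:56", "09:00"

time_to_notice = minutes_between(first_error, pinged)   # 182 minutes
time_to_ack = minutes_between(pinged, acked)            # 15 minutes
time_to_fix = minutes_between(acked, restored)          # 4 minutes
total_outage = minutes_between(first_error, restored)   # 201 minutes
```

About three of the roughly three hours and twenty minutes of outage passed before anyone was alerted, which is the gap that automated monitoring would be expected to shorten.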
 

Potential improvements to make:
- add monitoring on gluster side
- use the centos sig repo on ovirt side
- add more sysadmins for gluster
- add a redundant service for that
  - a 2nd download server with a shared gluster backend
- migrate the storage to a proper setup with 1 single block device,
rather than 2.


-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS





Re: [Gluster-infra] Please install netstat on the Jenkins slaves for regression testing

2016-04-27 Thread Niels de Vos
On Wed, Apr 27, 2016 at 11:29:53AM +0200, Michael Scherer wrote:
> On Wednesday 27 April 2016 at 11:04 +0200, Niels de Vos wrote:
> > We have one test-case that uses netstat (tests/bugs/fuse/bug-924726.t).
> > When netstat is not installed, this testcase will not be run correctly.
> > 
> > Please merge/squash this change and apply it to the slaves:
> >   https://github.com/gluster/gluster.org_ansible_configuration/pull/1
> 
> So I merged it, then figured that I should have tested it and/or not
> pushed to the repo first.
> 
> But there is no netstat package and netstat is already in net-tools.

Hmm, well, I assumed the playbook was run on slave27... The patch that
explicitly tests for netstat failed here:

  https://build.gluster.org/job/rackspace-regression-2GB-triggered/20020/console

> So I need to investigate the problem a bit more.

Maybe it is a failure in how the existence of netstat is tested? I
remember something about NetBSD not supporting the --version option:

  http://review.gluster.org/#/c/13547/3/run-tests.sh

But it seems that regression succeeded for NetBSD, so maybe I'm
remembering things incorrectly. I'll try to check it out another time
too; it is not urgent.
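
One way to sidestep the --version question would be to check only that the binary is on PATH, without running it. A minimal sketch in Python follows; the real check lives in run-tests.sh, a shell script whose current logic is not shown in this thread, and `tool_available` is a hypothetical helper name.

```python
import shutil


def tool_available(name):
    """Return True if an executable called name is found on PATH."""
    return shutil.which(name) is not None


# True on a slave where net-tools is installed, False otherwise.
netstat_present = tool_available("netstat")
```

Since nothing is executed, the result does not depend on which command-line options a given platform's netstat understands.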

Thanks,
Niels



Re: [Gluster-infra] Please install netstat on the Jenkins slaves for regression testing

2016-04-27 Thread Michael Scherer
On Wednesday 27 April 2016 at 11:04 +0200, Niels de Vos wrote:
> We have one test-case that uses netstat (tests/bugs/fuse/bug-924726.t).
> When netstat is not installed, this testcase will not be run correctly.
> 
> Please merge/squash this change and apply it to the slaves:
>   https://github.com/gluster/gluster.org_ansible_configuration/pull/1

So I merged it, then figured that I should have tested it and/or not
pushed to the repo first.

But there is no netstat package, and netstat is already in net-tools.

So I need to investigate the problem a bit more.

But thanks for the first PR :)

> Once done, we can re-run the regression tests for
> http://review.gluster.org/13547 .
> 
> Thanks,
> Niels
> ___
> Gluster-infra mailing list
> Gluster-infra@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-infra

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS





Re: [Gluster-infra] [ovirt-users] [Attention needed] GlusterFS repository down - affects CI / Installations

2016-04-27 Thread Niels de Vos
On Wed, Apr 27, 2016 at 02:30:57PM +0530, Ravishankar N wrote:
> @gluster infra  - FYI.
> 
> On 04/27/2016 02:20 PM, Nadav Goldin wrote:
> >Hi,
> >The GlusterFS repository became unavailable this morning, as a result all
> >Jenkins jobs that use the repository will fail, the common error would be:
> >
> >
> > http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/epel-7/noarch/repodata/repomd.xml:
> >[Errno 14] HTTP Error 403 - Forbidden
> >
> >
> >Also, installations of oVirt will fail.

I thought oVirt moved to using the packages from the CentOS Storage SIG?
In any case, automated tests should probably use those instead of the
packages on download.gluster.org. We're trying to minimize the work
packagers need to do, and get the glusterfs and other components in the
repositories that are provided by different distributions.

For more details, see the quickstart for the Storage SIG here:
  https://wiki.centos.org/SpecialInterestGroup/Storage/gluster-Quickstart

HTH,
Niels



[Gluster-infra] Please install netstat on the Jenkins slaves for regression testing

2016-04-27 Thread Niels de Vos
We have one test-case that uses netstat (tests/bugs/fuse/bug-924726.t).
When netstat is not installed, this testcase will not be run correctly.

Please merge/squash this change and apply it to the slaves:
  https://github.com/gluster/gluster.org_ansible_configuration/pull/1

Once done, we can re-run the regression tests for
http://review.gluster.org/13547 .

Thanks,
Niels



Re: [Gluster-infra] [ovirt-users] [Attention needed] GlusterFS repository down - affects CI / Installations

2016-04-27 Thread Ravishankar N

@gluster infra  - FYI.

On 04/27/2016 02:20 PM, Nadav Goldin wrote:

Hi,
The GlusterFS repository became unavailable this morning, as a result 
all Jenkins jobs that use the repository will fail, the common error 
would be:



http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/epel-7/noarch/repodata/repomd.xml:
[Errno 14] HTTP Error 403 - Forbidden


Also, installations of oVirt will fail.

We are working on a solution and will update asap.

Nadav.



___
Users mailing list
us...@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


