Re: [OpenStack-Infra] Public numbers about the scale of the infrastructure/CI ?

2018-03-29 Thread David Moreau Simard
The talk was this week and it's up on YouTube [1].

During the talk, which was basically one long live demo, we:

- Sent a patch to fix a typo in the talk [2]
- Fixed a Zuul job through speculative testing [3]
- Updated the openstack-infra IRC meeting chair [4]

Oh, and we also added an item to the agenda of the next meeting to discuss this talk [5].

It was fun.

[1]: https://youtu.be/6gTsL7E7U7Q
[2]: https://review.openstack.org/#/c/556738/
[3]: https://review.openstack.org/#/c/556615/
[4]: https://review.openstack.org/#/c/557095/
[5]: https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting


David Moreau Simard
Senior Software Engineer | OpenStack RDO

dmsimard = [irc, github, twitter]



___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Public numbers about the scale of the infrastructure/CI ?

2018-03-26 Thread David Moreau Simard
Good point.

I'll work with that instead.

David Moreau Simard
Senior Software Engineer | OpenStack RDO

dmsimard = [irc, github, twitter]




Re: [OpenStack-Infra] Public numbers about the scale of the infrastructure/CI ?

2018-03-26 Thread James E. Blair
David Moreau Simard  writes:

> On Mon, Mar 26, 2018 at 10:20 AM, James E. Blair  wrote:
>>> - # of jobs and Ansible playbooks per month run by Zuul
>>
>> I'm curious about this one -- how were you planning on defining these
>> values and obtaining them?
>>
>
> I've needed to pull statistics out of Zuul in the past for RDO (e.g. to
> justify the budget for CI resources), and I use the SQL reporter data
> to do it. It looks like this:
>
> SET @range_start = '2018-02-01 00:00:00';
> SET @range_end   = '2018-02-28 23:59:59';
>
> -- One row per reported build started in the range, with its duration
> SELECT job_name,
>        result,
>        start_time,
>        end_time,
>        TIMEDIFF(end_time, start_time) AS duration
> FROM zuul_build
> WHERE start_time BETWEEN @range_start AND @range_end;
>
> This gets me the number of *jobs* per month, and from that I can
> extrapolate a number of playbooks, knowing that:
> - the base and post playbooks add up to a fairly constant X per job
> - there is at least one "run" playbook
>
> So assuming 1000 jobs ran, I can say something like:
> 1000 jobs and at least [1000 * (X+1)] playbooks
>
> It's not a precise number, but it's a lower bound: we know we run more
> playbooks than that.
>
> I have also been thinking that, to get a more accurate number, I could
> sum all of the executor playbook results (which are in Graphite), but
> the history for those doesn't go very far back.
> Ex: stats.zuul.executor.ze*_openstack_org.phase.*.*

The SQL query gets the number of completed jobs which are *reported*.
It doesn't get you two other numbers, which are the jobs *launched*
(many of which may have been aborted before completion), or the jobs
*completed* (the results of many of which may have been discarded due to
changes in the environment).  In reality, the system is likely to be
significantly busier than the number of jobs reported will indicate.

Both of the other values can be obtained from graphite or by parsing
logs.  I think for this purpose, graphite might be sufficient.  (The
only time I'd recommend going to logs is when we need to find
project-specific resource usage information.)

stats_counts.zuul.executor.*.builds should be all jobs launched.
stats_counts.zuul.tenant.*.pipeline.*.all_jobs should be all jobs completed.
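To make the distinction between the three counts concrete, here is a minimal sketch of a hypothetical helper (`MonthlyJobCounts` is not part of Zuul) that takes one month's totals for jobs launched, completed and reported, and derives how many were aborted or had their results discarded. The mapping of fields to metrics follows the description above; the figures are made up, and since the three numbers come from different sources they may not line up exactly, so the differences are best treated as rough.

```python
from dataclasses import dataclass


@dataclass
class MonthlyJobCounts:
    """Three ways of counting Zuul jobs over one month (hypothetical helper)."""

    launched: int   # summed stats_counts.zuul.executor.*.builds
    completed: int  # summed stats_counts.zuul.tenant.*.pipeline.*.all_jobs
    reported: int   # rows found in zuul_build by the SQL query

    def aborted(self) -> int:
        """Launched but never completed, e.g. cancelled part-way."""
        return self.launched - self.completed

    def discarded(self) -> int:
        """Completed, but the result was thrown away instead of reported."""
        return self.completed - self.reported

    def launched_per_reported(self) -> float:
        """How much busier the system is than the reported count suggests."""
        if self.reported == 0:
            raise ValueError("no reported jobs in this period")
        return self.launched / self.reported


# Made-up figures, only to show how the three counts relate.
counts = MonthlyJobCounts(launched=1300, completed=1150, reported=1000)
print(counts.aborted())                # 150
print(counts.discarded())              # 150
print(counts.launched_per_reported())  # 1.3
```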

-Jim


Re: [OpenStack-Infra] Public numbers about the scale of the infrastructure/CI ?

2018-03-26 Thread David Moreau Simard
On Mon, Mar 26, 2018 at 10:20 AM, James E. Blair  wrote:
>> - # of jobs and Ansible playbooks per month run by Zuul
>
> I'm curious about this one -- how were you planning on defining these
> values and obtaining them?
>

I've needed to pull statistics out of Zuul in the past for RDO (e.g. to
justify the budget for CI resources), and I use the SQL reporter data
to do it. It looks like this:

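SET @range_start = '2018-02-01 00:00:00';
SET @range_end   = '2018-02-28 23:59:59';

-- One row per reported build started in the range, with its duration
SELECT job_name,
       result,
       start_time,
       end_time,
       TIMEDIFF(end_time, start_time) AS duration
FROM zuul_build
WHERE start_time BETWEEN @range_start AND @range_end;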
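To turn the per-build rows of the query above into a single monthly total, the same filter can be aggregated with COUNT(*). This is a minimal sketch run against a throwaway in-memory SQLite table that only mimics the `job_name`, `result`, `start_time` and `end_time` columns of `zuul_build`; the rows are made up, and because SQLite has no `TIMEDIFF` the duration column is left out. It also uses a half-open date range, which avoids having to name the last second of the month.

```python
import sqlite3

# Hypothetical stand-in for Zuul's zuul_build table; only the columns used
# by the query are modelled, and the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE zuul_build (job_name TEXT, result TEXT,"
    " start_time TEXT, end_time TEXT)"
)
conn.executemany(
    "INSERT INTO zuul_build VALUES (?, ?, ?, ?)",
    [
        ("tox-py35", "SUCCESS", "2018-02-01 10:00:00", "2018-02-01 10:12:00"),
        ("tox-pep8", "FAILURE", "2018-02-14 08:30:00", "2018-02-14 08:35:00"),
        ("tox-py35", "SUCCESS", "2018-02-28 23:00:00", "2018-02-28 23:20:00"),
        ("tox-py35", "SUCCESS", "2018-03-01 00:05:00", "2018-03-01 00:25:00"),
    ],
)

# Count reported builds per result for February 2018
# (start >= Feb 1, start < Mar 1).
rows = conn.execute(
    """
    SELECT result, COUNT(*) AS builds
    FROM zuul_build
    WHERE start_time >= ? AND start_time < ?
    GROUP BY result
    ORDER BY result
    """,
    ("2018-02-01 00:00:00", "2018-03-01 00:00:00"),
).fetchall()

monthly_total = sum(count for _, count in rows)
print(rows)           # [('FAILURE', 1), ('SUCCESS', 2)]
print(monthly_total)  # 3
```

The same `GROUP BY result` / `COUNT(*)` shape should carry over to the MySQL database the SQL reporter writes to, with the table name adjusted if a different prefix is configured.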

This gets me the number of *jobs* per month, and from that I can
extrapolate a number of playbooks, knowing that:
- the base and post playbooks add up to a fairly constant X per job
- there is at least one "run" playbook

So assuming 1000 jobs ran, I can say something like:
1000 jobs and at least [1000 * (X+1)] playbooks

It's not a precise number, but it's a lower bound: we know we run more
playbooks than that.
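The estimate is plain arithmetic. A minimal sketch, where `base_playbooks` stands for the X above; the value 4 used below is only illustrative, not a measured number:

```python
def playbook_lower_bound(jobs: int, base_playbooks: int) -> int:
    """Lower bound on playbooks run by `jobs` jobs.

    Each job runs the base and post playbooks (assumed to be a fairly
    constant `base_playbooks`, the "X" above) plus at least one "run"
    playbook, so the true total is at least jobs * (X + 1).
    """
    if jobs < 0 or base_playbooks < 0:
        raise ValueError("counts must be non-negative")
    return jobs * (base_playbooks + 1)


# 1000 jobs, with a purely hypothetical X = 4:
print(playbook_lower_bound(1000, 4))  # 5000
```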

I have also been thinking that, to get a more accurate number, I could
sum all of the executor playbook results (which are in Graphite), but
the history for those doesn't go very far back.
Ex: stats.zuul.executor.ze*_openstack_org.phase.*.*
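Summing those results can be done through Graphite's render API, which returns a list of series, each with `[value, timestamp]` datapoints (value is null where there was no data). This is a minimal sketch; the host is hypothetical, and it assumes the phase metrics are statsd counters, which statsd's Graphite backend by default stores both under `stats.` (a per-second rate) and `stats_counts.` (raw counts), so the raw-count prefix is used here. Depending on the `storage-aggregation` settings, older downsampled data may be averaged rather than summed, which would skew long-range totals.

```python
import json
from urllib.parse import urlencode


def render_url(base: str, target: str, start: str, end: str) -> str:
    """Build a Graphite render API URL asking for `target` as JSON."""
    query = urlencode(
        {"target": target, "from": start, "until": end, "format": "json"}
    )
    return f"{base.rstrip('/')}/render?{query}"


def sum_datapoints(payload: str) -> float:
    """Sum every non-null value across all series in a render JSON reply."""
    total = 0.0
    for series in json.loads(payload):
        for value, _timestamp in series["datapoints"]:
            if value is not None:
                total += value
    return total


# Hypothetical host; sumSeries() folds all matching series into one.
url = render_url(
    "https://graphite.example.org",
    "sumSeries(stats_counts.zuul.executor.ze*_openstack_org.phase.*.*)",
    "20180201",
    "20180301",
)

# Fabricated reply, only to show the shape of the data being summed.
sample = json.dumps([
    {
        "target": "sumSeries(...)",
        "datapoints": [[12.0, 1517443200], [None, 1517443260], [30.0, 1517443320]],
    }
])
print(sum_datapoints(sample))  # 42.0
```

Against a real server, the body could be fetched with `urllib.request.urlopen(url).read().decode()` and passed to `sum_datapoints`.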

David Moreau Simard
Senior Software Engineer | OpenStack RDO

dmsimard = [irc, github, twitter]


Re: [OpenStack-Infra] Public numbers about the scale of the infrastructure/CI ?

2018-03-26 Thread James E. Blair
David Moreau Simard  writes:

> Unless there are any objections, I'd have a slide with up-to-date
> numbers such as:

I don't have any objection to making them public (I believe nearly all,
if not all, of these are public already).  But I would like them to be
as accurate as possible :).

> - # of projects hosted (as per git.openstack.org)
> - # of servers (in aggregate of all our regions)
> -- (Maybe some big highlights like the size of logstash, logs.o.o, Zuul)
> - Nodepool capacity (number of clouds, aggregate capacity)
> - # of jobs and Ansible playbooks per month run by Zuul

I'm curious about this one -- how were you planning on defining these
values and obtaining them?

> - Approximate number of maintained and hosted services (irc,
> gerritbot, meetbot, gerrit, git, mailing lists, wiki, ask.openstack,
> storyboard, codesearch, etc.)
> - Probably some high level numbers from Stackalytics
> - Maybe something else I haven't thought about

-Jim


[OpenStack-Infra] Public numbers about the scale of the infrastructure/CI ?

2018-03-24 Thread David Moreau Simard
Hi -infra,

I'll be presenting a talk at a local OpenStack meetup next week [1]
that will highlight some examples of how people can help and
contribute to the infrastructure project.
The talk will be recorded and should hopefully serve as a form of
informal documentation.

I'd like to disclose some semi-official numbers (as I'd personally
pull them up) to let people have an idea of the scale our contributors
are maintaining.
I suppose this data is already somewhat public if you know where to
look but I don't think it's been written down in a digestible format
in recent history.

Unless there are any objections, I'd have a slide with up-to-date numbers such as:
- # of projects hosted (as per git.openstack.org)
- # of servers (in aggregate of all our regions)
-- (Maybe some big highlights like the size of logstash, logs.o.o, Zuul)
- Nodepool capacity (number of clouds, aggregate capacity)
- # of jobs and Ansible playbooks per month run by Zuul
- Approximate number of maintained and hosted services (irc,
gerritbot, meetbot, gerrit, git, mailing lists, wiki, ask.openstack,
storyboard, codesearch, etc.)
- Probably some high level numbers from Stackalytics
- Maybe something else I haven't thought about
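For the first item, one possible way to get a project count (a swapped-in technique, not necessarily how the slide's number would be produced) is Gerrit's REST API rather than git.openstack.org: `GET /projects/` returns a JSON object keyed by project name, preceded by a `)]}'` line that clients must strip. A minimal parsing sketch with a made-up sample reply:

```python
import json

# Gerrit prefixes JSON replies with this line to defeat XSSI attacks.
XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body: str):
    """Drop Gerrit's anti-XSSI prefix, then decode the JSON that follows."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def count_projects(body: str) -> int:
    """Count projects in a `GET /projects/` reply (a name -> info map)."""
    return len(parse_gerrit_json(body))


# Made-up, truncated reply in the shape Gerrit returns.
sample = XSSI_PREFIX + "\n" + json.dumps({
    "openstack/nova": {"id": "openstack%2Fnova"},
    "openstack-infra/zuul": {"id": "openstack-infra%2Fzuul"},
})
print(count_projects(sample))  # 2
```

Against the live server, the body would come from something like `urllib.request.urlopen("https://review.openstack.org/projects/").read().decode("utf-8")`; the count may not match git.openstack.org's listing exactly.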

The idea of the talk is not to brag about all the stuff we're doing
but rather, "hey, you don't need to be a pro in OpenStack to
contribute, we've got all these different things you can help with".

I realize it's a bit last minute, but please let me know if you see
anything wrong with this!

[1]: https://www.meetup.com/Montreal-OpenStack/events/248344351/

David Moreau Simard
Senior Software Engineer | OpenStack RDO

dmsimard = [irc, github, twitter]
