On Wed, Aug 13, 2014 at 2:48 PM, Duncan Thomas <duncan.tho...@gmail.com> wrote:
> On 13 August 2014 13:57, Matthew Treinish <mtrein...@kortar.org> wrote:
> > On Tue, Aug 12, 2014 at 01:45:17AM +0400, Boris Pavlovic wrote:
> >> Keystone, Glance, Cinder, Neutron and Heat are running rally performance
> >> jobs that can already be used for performance testing, benchmarking and
> >> regression testing. These jobs support in-tree plugins for all
> >> components (scenarios, load generators, benchmark context) and they can
> >> use Rally fully without any interaction with the Rally team. More about
> >> these jobs:
> >> https://docs.google.com/a/mirantis.com/document/d/1s93IBuyx24dM3SmPcboBp7N47RQedT8u4AJPgOHp9-A/
> >> So I really don't see anything like this in tempest (even in the
> >> foreseeable future)
> >
> > So this is actually the communication problem I mentioned before. Singling out
> > individual projects and getting them to add a rally job is not "cross project"
> > communication. (this is part of what I meant by "push using Rally") There was no
> > larger discussion on the ML or a topic in the project meeting about adding these
> > jobs. There was no discussion about the value vs risk of adding new jobs to the
> > gate. Also, this is why less than half of the integrated projects have these
> > jobs. Having asymmetry like this between gating workloads on projects helps no
> > one.
>
> So the advantage of the approach, rather than having a massive
> cross-project discussion, is that interested projects (I've been very
> interested from a cinder core PoV) act as a test bed for other
> projects. 'Cross project' discussions rarely come to other teams, they
> rely on people to find them, whereas Boris came to us, said I've got
> this thing you might like, try it out, tell me what you want. He took
> feedback, iterated fast and investigated bugs. It has been a genuine
> pleasure to work with him, and I feel we made progress faster than we
> would have done if it was trying to please everybody.
>
> > That being said the reason I think osprofiler has been more accepted and its
> > adoption into oslo is not nearly as contentious is because it's an independent
> > library that has value outside of itself. You don't need to pull in a monolithic
> > stack to use it. Which is a design point more conducive with the rest of
> > OpenStack.
>
> Sorry, are you suggesting tempest isn't a giant monolithic thing?
> I was able to comprehend the rally code very quickly; that
> isn't even slightly true of tempest. Having one simple tool that does
> one thing well is exactly what rally has tried to do - tempest seems
> to want to be five different things at once (CI, installation tests,
> trademark, performance, stress testing, ...)
>
> >> Matt, Sean - seriously community is about convincing people, not about
> >> forcing people to do something against their will. You are making huge
> >> architectural decisions without deep knowledge about what Rally is, or
> >> what its use cases, road map, goals and audience are.
> >>
> >> IMHO community is about convincing people. So the QA
> >> program should convince the Rally team (at least me) to make such changes.
> >> The key to convincing me is to say how this will help OpenStack to perform
> >> better.
> >
> > If community, per your definition, is about convincing people then there needs
> > to be a 2-way discussion. This is an especially key point considering the
> > feedback on this thread is basically the same feedback you've been getting since
> > you first announced Rally on the ML. [1] (and from even before that I think, but
> > it's hard to remember all the details from that far back) I'm afraid that
> > without a shared willingness to explore what we're suggesting because of
> > preconceived notions then I fail to see the point in moving forward. The fact
> > that this feedback has been ignored is why this discussion has come up at all.
> >
> >> Currently the Rally team sees a lot of issues related to this decision:
> >>
> >> 1) It breaks already existing performance jobs (Heat, Glance, Cinder,
> >> Neutron, Keystone)
> >
> > So firstly, I want to say I find these jobs troubling. Not just from the fact
> > that because of the nature of the gate (2nd level virt on public clouds) the
> > variability between jobs can be staggering. I can't imagine what value there is
> > in running synthetic benchmarks in this environment. It would only reliably
> > catch the most egregious of regressions. Also from what I can tell none of these
> > jobs actually compare the timing data to the previous results, it just generates
> > the data and makes a pretty graph. The burden appears to be on the user to
> > figure out what it means, which really isn't that useful. How have these jobs
> > actually helped? IMO the real value in performance testing in the gate is to
> > capture the longer term trends in the data. Which is something these jobs are
> > not doing.
>
> So I put in a change to dump out the raw data from each run into a
> zipped json file so that I can start looking at the value of
> collecting this data.... As an experiment I think it is very
> worthwhile. The gate job is non-voting, and apparently, at least on the
> cinder front, highly reliable. The job runs fast enough it isn't
> slowing the gate down - we aren't running out of nodes on the gate as
> far as I can tell, so I don't understand the hostility towards it.
> We'll run it for a bit, see if it proves useful, if it doesn't then we
> can turn it off and try something else.
>

We actually run out of nodes almost every day now (except weekends): we have
about 800 nodes, and hit that quota most days [0][1]. While the output of the
rally job [2] is very impressive, with our constrained number of nodes, I am
still struggling to grok the value of running this job on every patch.
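
For what it's worth, here is a minimal sketch of the kind of check raised above: comparing a run's timing data to previous results instead of only plotting it. It assumes each run's raw data is a gzip-compressed JSON list of per-iteration durations in seconds. That file layout, the function names and the 50% tolerance are all assumptions for illustration, not Rally's actual output format or API. The wide tolerance reflects the point that, with the variability of gate nodes, only large regressions can be caught reliably.

```python
import gzip
import json
import statistics


def load_durations(path):
    """Read a gzip-compressed JSON file holding a list of durations (seconds).

    The file layout is an assumption made for this sketch.
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [float(d) for d in json.load(fh)]


def is_regression(current, history, tolerance=0.5):
    """Return True if the current run looks slower than previous runs.

    `current` is a list of durations from the run under test and `history`
    is a non-empty list of such lists from earlier runs. The baseline is the
    median of the per-run medians, which damps the effect of a single noisy
    node. The run is flagged only if its median exceeds the baseline by more
    than `tolerance` (0.5 means 50%).
    """
    baseline = statistics.median(statistics.median(run) for run in history)
    return statistics.median(current) > baseline * (1.0 + tolerance)
```

A job could call `is_regression(load_durations(new_file), [load_durations(p) for p in old_files])` and report the result, which would at least move the burden of interpreting the data off the reader of the graph.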

[0] http://graphite.openstack.org/render/?from=-24hours&height=180&until=now&width=334&bgcolor=ffffff&fgcolor=000000&areaMode=stacked&target=color(alias(sumSeries(stats.gauges.nodepool.target.building),%20%27Building%27),%20%27ffbf52%27)&target=color(alias(sumSeries(stats.gauges.nodepool.target.ready),%20%27Available%27),%20%2700c868%27)&target=color(alias(sumSeries(stats.gauges.nodepool.target.used),%20%27In%20Use%27),%20%276464ff%27)&target=color(alias(sumSeries(stats.gauges.nodepool.target.delete),%20%27Deleting%27),%20%27c864ff%27)&title=Test%20Nodes&_t=0.8509290898218751#1407982412165
[1] http://graphite.openstack.org/render/?from=-12days&height=180&until=now&width=334&bgcolor=ffffff&fgcolor=000000&areaMode=stacked&target=color(alias(sumSeries(stats.gauges.nodepool.target.building),%20%27Building%27),%20%27ffbf52%27)&target=color(alias(sumSeries(stats.gauges.nodepool.target.ready),%20%27Available%27),%20%2700c868%27)&target=color(alias(sumSeries(stats.gauges.nodepool.target.used),%20%27In%20Use%27),%20%276464ff%27)&target=color(alias(sumSeries(stats.gauges.nodepool.target.delete),%20%27Deleting%27),%20%27c864ff%27)&title=Test%20Nodes&_t=0.8509290898218751#1407982412165
[2] http://logs.openstack.org/02/109202/4/check/gate-rally-dsvm-cinder/bbc256b/rally-plot/results.html.gz

> I'm confused by the hostility about this gate job - it is costing us
> nothing, if it turns out to be a pain we'll turn it off.
>
> Rally as a general tool has enabled me to do things that I wouldn't
> even consider trying with tempest. There shouldn't be a problem with a
> small number of parallel efforts - that's a founding principle of
> open source in general.
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>