Re: [openstack-dev] [infra] placement job is busted in stable/newton (NO MORE HOSTS LEFT)

2017-01-12 Thread Jeremy Stanley
On 2017-01-11 20:40:26 -0600 (-0600), Matt Riedemann wrote:
[...]
> Well I guess it's less sinister than all that, it was just a matter of when
> the nova change landed, which was meant for newton but happened in Ocata:
[...]

Oh, good catch! I didn't even think to check whether that hook
existed in nova's stable branches, I was only looking at master
because I hadn't paid close enough attention to your subject line.
-- 
Jeremy Stanley

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [infra] placement job is busted in stable/newton (NO MORE HOSTS LEFT)

2017-01-11 Thread Matt Riedemann

On 1/11/2017 8:18 PM, Matt Riedemann wrote:
> On 1/11/2017 9:19 AM, Jeremy Stanley wrote:
>> If you look in the _zuul_ansible/scripts directory you'll see that
>> shell script which exited nonzero is the one calling devstack-gate,
>> so we've got something broken near the end of the job as you
>> surmise. I think it might be the post_test_hook:
>>
>> http://logs.openstack.org/57/416757/1/check/gate-tempest-dsvm-neutron-placement-full-ubuntu-xenial-nv/dfe0c38/logs/devstack-gate-post_test_hook.txt.gz
>>
>> Looking in the nova repo, tools/hooks/post_test_hook.sh is a
>> relative symlink to gate/post_test_hook.sh but for some reason the
>> job doesn't seem to be following that. You might try recreating this
>> locally with the logs/reproduce.sh from that run and see if you get
>> the same behavior.
>
> Hmm, I'm guessing this is somehow related to this:
>
> https://review.openstack.org/#/c/378952/
>
> But I'm not entirely sure how or why yet...I'll have to talk to Old Man
> Dague in the morning.



Well I guess it's less sinister than all that, it was just a matter of 
when the nova change landed, which was meant for newton but happened in 
Ocata:


https://review.openstack.org/#/c/376567/

So the script isn't there in newton. I'll push a change to the job in 
project-config so that it only runs the hook if it exists.
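
For illustration only, the kind of guard I have in mind looks roughly like the sketch below. It is written in Python just to show the idea; the real job definition lives in project-config and is not Python, and the function name run_hook_if_present is a made-up helper, not anything from devstack-gate.

```python
import os
import subprocess


def run_hook_if_present(hook_path):
    """Run hook_path only if it is an existing, executable file.

    Returns the hook's exit status, or None when the hook is skipped.
    Both the name and the return convention are illustrative assumptions.
    """
    # os.path.isfile follows symlinks, so a dangling symlink is skipped too.
    if not (os.path.isfile(hook_path) and os.access(hook_path, os.X_OK)):
        return None
    # Run the hook directly and hand its exit status back to the caller.
    return subprocess.call([hook_path])
```

On a branch where the hook was never added, the call returns None instead of failing the job on a missing file.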


--

Thanks,

Matt Riedemann




Re: [openstack-dev] [infra] placement job is busted in stable/newton (NO MORE HOSTS LEFT)

2017-01-11 Thread Matt Riedemann

On 1/11/2017 9:19 AM, Jeremy Stanley wrote:
> If you look in the _zuul_ansible/scripts directory you'll see that
> shell script which exited nonzero is the one calling devstack-gate,
> so we've got something broken near the end of the job as you
> surmise. I think it might be the post_test_hook:
>
> http://logs.openstack.org/57/416757/1/check/gate-tempest-dsvm-neutron-placement-full-ubuntu-xenial-nv/dfe0c38/logs/devstack-gate-post_test_hook.txt.gz
>
> Looking in the nova repo, tools/hooks/post_test_hook.sh is a
> relative symlink to gate/post_test_hook.sh but for some reason the
> job doesn't seem to be following that. You might try recreating this
> locally with the logs/reproduce.sh from that run and see if you get
> the same behavior.



Hmm, I'm guessing this is somehow related to this:

https://review.openstack.org/#/c/378952/

But I'm not entirely sure how or why yet...I'll have to talk to Old Man 
Dague in the morning.


--

Thanks,

Matt Riedemann




Re: [openstack-dev] [infra] placement job is busted in stable/newton (NO MORE HOSTS LEFT)

2017-01-11 Thread Matt Riedemann

On 1/11/2017 7:17 AM, Sylvain Bauza wrote:
> On a separate change, I also have the placement job being -1 because of
> the ComputeFilter saying that the service is disabled because of
> 'connection of libvirt lost':
>
> http://logs.openstack.org/20/415520/5/check/gate-tempest-dsvm-neutron-placement-full-ubuntu-xenial-nv/19fcab4/logs/screen-n-sch.txt.gz#_2017-01-11_04_33_35_995




That's probably due to one of:

http://status.openstack.org//elastic-recheck/index.html#1646779
http://status.openstack.org//elastic-recheck/index.html#1643911
http://status.openstack.org//elastic-recheck/index.html#1638982

--

Thanks,

Matt Riedemann




Re: [openstack-dev] [infra] placement job is busted in stable/newton (NO MORE HOSTS LEFT)

2017-01-11 Thread Jeremy Stanley
On 2017-01-10 20:03:28 -0500 (-0500), Matt Riedemann wrote:
> I'm trying to sort out failures in the placement job in stable/newton
> where the tests aren't failing but it's something in the host cleanup step
> that blows up.
> 
> Looking here I see this:
> 
> http://logs.openstack.org/57/416757/1/check/gate-tempest-dsvm-neutron-placement-full-ubuntu-xenial-nv/dfe0c38/_zuul_ansible/ansible_log.txt.gz
[...]
> 2017-01-04 23:44:42,880 p=10771 u=zuul |  fatal: [node]: FAILED! =>
> {"changed": true, "cmd": ["/tmp/05-cb20affd78a84851b47992ff129722af.sh"],
> "delta": "0:57:51.734808", "end": "2017-01-04 23:44:42.632473", "failed":
> true, "rc": 127, "start": "2017-01-04 22:46:50.897665", "stderr": "",
> "stdout": "", "stdout_lines": [], "warnings": []}
[...]

If you look in the _zuul_ansible/scripts directory you'll see that
shell script which exited nonzero is the one calling devstack-gate,
so we've got something broken near the end of the job as you
surmise. I think it might be the post_test_hook:

http://logs.openstack.org/57/416757/1/check/gate-tempest-dsvm-neutron-placement-full-ubuntu-xenial-nv/dfe0c38/logs/devstack-gate-post_test_hook.txt.gz

Looking in the nova repo, tools/hooks/post_test_hook.sh is a
relative symlink to gate/post_test_hook.sh but for some reason the
job doesn't seem to be following that. You might try recreating this
locally with the logs/reproduce.sh from that run and see if you get
the same behavior.
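
If it helps when reproducing locally, a quick way to tell whether a hook path is a symlink whose target is missing on the checked-out branch is sketched below. This is only an illustration under my own assumptions; describe_hook is a hypothetical helper, not part of devstack-gate or nova.

```python
import os


def describe_hook(path):
    """Classify path as 'missing', 'dangling symlink', 'symlink' or 'plain path'."""
    if os.path.islink(path):
        # os.path.exists follows the link, so False means the target is gone.
        return "symlink" if os.path.exists(path) else "dangling symlink"
    return "plain path" if os.path.exists(path) else "missing"
```

Running it against the hook path in a checkout of each branch would show whether the link or its target is absent there.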
-- 
Jeremy Stanley



Re: [openstack-dev] [infra] placement job is busted in stable/newton (NO MORE HOSTS LEFT)

2017-01-11 Thread Sylvain Bauza


On 11/01/2017 02:03, Matt Riedemann wrote:
> I'm trying to sort out failures in the placement job in stable/newton
> where the tests aren't failing but it's something in the host
> cleanup step that blows up.
> 
> Looking here I see this:
> 
> http://logs.openstack.org/57/416757/1/check/gate-tempest-dsvm-neutron-placement-full-ubuntu-xenial-nv/dfe0c38/_zuul_ansible/ansible_log.txt.gz
> 
> 
> 2017-01-04 22:46:50,761 p=10771 u=zuul |  changed: [node] => {"changed":
> true, "checksum": "7f4d51086f4bc4de5ae6d83c00b0e458b8606aa2", "dest":
> "/tmp/05-cb20affd78a84851b47992ff129722af.sh", "gid": 3001, "group":
> "jenkins", "md5sum": "2de9baa70e4d28bbcca550a17959beab", "mode": "0555",
> "owner": "jenkins", "size": 647, "src":
> "/tmp/tmpz_guiR/.ansible/remote_tmp/ansible-tmp-1483570010.54-207083993908564/source",
> "state": "file", "uid": 3000}
> 2017-01-04 22:46:50,775 p=10771 u=zuul |  TASK [command generated from
> JJB] **
> 2017-01-04 23:44:42,880 p=10771 u=zuul |  fatal: [node]: FAILED! =>
> {"changed": true, "cmd":
> ["/tmp/05-cb20affd78a84851b47992ff129722af.sh"], "delta":
> "0:57:51.734808", "end": "2017-01-04 23:44:42.632473", "failed": true,
> "rc": 127, "start": "2017-01-04 22:46:50.897665", "stderr": "",
> "stdout": "", "stdout_lines": [], "warnings": []}
> 2017-01-04 23:44:42,887 p=10771 u=zuul |  NO MORE HOSTS LEFT
> *
> 2017-01-04 23:44:42,888 p=10771 u=zuul |  PLAY RECAP
> *
> 2017-01-04 23:44:42,888 p=10771 u=zuul |  node   :
> ok=13   changed=13   unreachable=0    failed=1
> 
> I'm not sure what the 'NO MORE HOSTS LEFT' error means. Is there
> something wrong with the post/cleanup step for this job in newton? It's
> non-voting but we're backporting bug fixes for this code since it needs
> to work to upgrade to ocata.
> 


Is there a follow-up on the above problem?

On a separate change, I also have the placement job being -1 because of
the ComputeFilter saying that the service is disabled because of
'connection of libvirt lost':

http://logs.openstack.org/20/415520/5/check/gate-tempest-dsvm-neutron-placement-full-ubuntu-xenial-nv/19fcab4/logs/screen-n-sch.txt.gz#_2017-01-11_04_33_35_995


-Sylvain



[openstack-dev] [infra] placement job is busted in stable/newton (NO MORE HOSTS LEFT)

2017-01-10 Thread Matt Riedemann
I'm trying to sort out failures in the placement job in stable/newton 
where the tests aren't failing but it's something in the host 
cleanup step that blows up.


Looking here I see this:

http://logs.openstack.org/57/416757/1/check/gate-tempest-dsvm-neutron-placement-full-ubuntu-xenial-nv/dfe0c38/_zuul_ansible/ansible_log.txt.gz

2017-01-04 22:46:50,761 p=10771 u=zuul |  changed: [node] => {"changed": 
true, "checksum": "7f4d51086f4bc4de5ae6d83c00b0e458b8606aa2", "dest": 
"/tmp/05-cb20affd78a84851b47992ff129722af.sh", "gid": 3001, "group": 
"jenkins", "md5sum": "2de9baa70e4d28bbcca550a17959beab", "mode": "0555", 
"owner": "jenkins", "size": 647, "src": 
"/tmp/tmpz_guiR/.ansible/remote_tmp/ansible-tmp-1483570010.54-207083993908564/source", 
"state": "file", "uid": 3000}
2017-01-04 22:46:50,775 p=10771 u=zuul |  TASK [command generated from 
JJB] **
2017-01-04 23:44:42,880 p=10771 u=zuul |  fatal: [node]: FAILED! => 
{"changed": true, "cmd": 
["/tmp/05-cb20affd78a84851b47992ff129722af.sh"], "delta": 
"0:57:51.734808", "end": "2017-01-04 23:44:42.632473", "failed": true, 
"rc": 127, "start": "2017-01-04 22:46:50.897665", "stderr": "", 
"stdout": "", "stdout_lines": [], "warnings": []}
2017-01-04 23:44:42,887 p=10771 u=zuul |  NO MORE HOSTS LEFT 
*
2017-01-04 23:44:42,888 p=10771 u=zuul |  PLAY RECAP 
*
2017-01-04 23:44:42,888 p=10771 u=zuul |  node   : 
ok=13   changed=13   unreachable=0    failed=1


I'm not sure what the 'NO MORE HOSTS LEFT' error means. Is there 
something wrong with the post/cleanup step for this job in newton? It's 
non-voting but we're backporting bug fixes for this code since it needs 
to work to upgrade to ocata.
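
For anyone digging through the same log, here is a small sketch that pulls the result fields out of the fatal line above. It assumes the text after "FAILED! => " is valid JSON, which holds for this line but is an assumption in general; parse_failed_task is a hypothetical helper name. As far as I understand it, "NO MORE HOSTS LEFT" is just Ansible reporting that every host in the play has failed, so the task's "rc" is the more useful thing to look at.

```python
import json

# The fatal line from the log above, joined back onto a single line.
FATAL_LINE = (
    '2017-01-04 23:44:42,880 p=10771 u=zuul |  fatal: [node]: FAILED! => '
    '{"changed": true, "cmd": ["/tmp/05-cb20affd78a84851b47992ff129722af.sh"], '
    '"delta": "0:57:51.734808", "end": "2017-01-04 23:44:42.632473", '
    '"failed": true, "rc": 127, "start": "2017-01-04 22:46:50.897665", '
    '"stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}'
)


def parse_failed_task(line):
    """Return the JSON result dict from an Ansible 'FAILED! =>' log line."""
    # Everything after the marker is treated as the JSON result payload.
    _, _, payload = line.partition("FAILED! => ")
    if not payload:
        raise ValueError("not a failed-task line")
    return json.loads(payload)


result = parse_failed_task(FATAL_LINE)
print(result["rc"], result["cmd"][0], result["delta"])
```

The "rc" here is 127, which POSIX shells conventionally use for "command not found", so something the script tried to run was not there.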


--

Thanks,

Matt Riedemann

