Indeed that is the problem. I am sshing as root. The only possible explanations I have for this working in the ovirt automated testing setup are:
- Either the ovirt automated tests are using a host inventory file with become: true?
- The inventory file ssh credentials happen to be for the vdsm user? (And become is just not doing anything)
- Some other janky thing that allows the access

The same failure also happens for the immediately following task `Initialize metadata volume` and many more tasks after that. I will correct it and then open a pull request with the fix, given your input that you believe it's a problem.

~Kyle

On Tue, Apr 3, 2018 at 3:08 PM, RabidCicada <[email protected]> wrote:
> Alright. By differentially comparing against the successful
> postgres-command sudo-becomes, it looks like the salient difference is
> that `Copy configuration archive to storage` is missing become: true.
> Testing now.
>
> The similar postgres commands have become: true while this one does not.
>
> ~Kyle
>
> On Tue, Apr 3, 2018 at 1:19 PM, Simone Tiraboschi <[email protected]> wrote:
>
>> On Tue, Apr 3, 2018 at 5:21 PM, RabidCicada <[email protected]> wrote:
>>
>>> I've attached a full debug packet below. I include the log file from
>>> ovirt-hosted-engine-setup, relevant cmd line info, and info from the
>>> command line where epdb has a breakpoint in playbook.py from ansible
>>> itself. I also include info from commands I ran after it failed, plus
>>> the ferried-over script from /root/.ansible/tmp that is run.
>>>
>>> *Output on cmd line:*
>>> [ INFO  ] TASK [Copy configuration archive to storage]
>>> [ ERROR ] fatal: [localhost]: FAILED! => {
>>>     "changed": true,
>>>     "cmd": [
>>>         "dd",
>>>         "bs=20480",
>>>         "count=1",
>>>         "oflag=direct",
>>>         "if=/var/tmp/localvmbCDQIR/5ef881f5-c992-48d2-b969-a0b6156bdf7c",
>>>         "of=/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c"
>>>     ],
>>>     "delta": "0:00:00.004336",
>>>     "end": "2018-04-03 15:01:55.581823",
>>>     "invocation": {
>>>         "module_args": {
>>>             "_raw_params": "dd bs=20480 count=1 oflag=direct if=\"/var/tmp/localvmbCDQIR/5ef881f5-c992-48d2-b969-a0b6156bdf7c\" of=\"/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c\"",
>>>             "_uses_shell": false,
>>>             "chdir": null,
>>>             "creates": null,
>>>             "executable": null,
>>>             "removes": null,
>>>             "stdin": null,
>>>             "warn": true
>>>         }
>>>     },
>>>     "msg": "non-zero return code",
>>>     "rc": 1,
>>>     "start": "2018-04-03 15:01:55.577487",
>>>     "stderr": "dd: failed to open ‘/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c’: Permission denied",
>>>     "stderr_lines": [
>>>         "dd: failed to open ‘/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c’: Permission denied"
>>>     ],
>>>     "stdout": "",
>>>     "stdout_lines": []
>>> }
>>> [ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
>>
>> In the playbook we have on that task:
>>     become_user: vdsm
>>     become_method: sudo
>>
>> but I fear it got somehow ignored.
>> I'll investigate it.
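In Ansible, `become_user` and `become_method` on their own do not trigger privilege escalation; a task is escalated only when `become: true` is also set on the task, the play, or in configuration. If that is indeed the cause here, the corrected task would presumably look something like the following sketch, built from the snippet quoted in this thread rather than from the actual merged patch:

```yaml
- name: Copy configuration archive to storage
  command: dd bs=20480 count=1 oflag=direct if="{{ LOCAL_VM_DIR }}/{{ he_conf_disk_details.disk.image_id }}" of="{{ he_conf_disk_path }}"
  become: true          # the missing flag; without it the two lines below are inert
  become_user: vdsm
  become_method: sudo
  changed_when: True
  tags: [ 'skip_ansible_lint' ]
```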
>>>
>>> *Output from ansible epdb tracepoint:*
>>>
>>> Using module file /usr/lib/python2.7/site-packages/ansible/modules/commands/command.py
>>> <localhost> ESTABLISH LOCAL CONNECTION FOR USER: root
>>> <localhost> EXEC /bin/sh -c 'echo ~ && sleep 0'
>>> <localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055 `" && echo ansible-tmp-1522767715.36-81496549401055="` echo /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055 `" ) && sleep 0'
>>> <localhost> PUT /tmp/tmpGMGdjh TO /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/command.py
>>> <localhost> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/ /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/command.py && sleep 0'
>>> <localhost> EXEC /bin/sh -c '/usr/bin/python /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/command.py && sleep 0'
>>>     to retry, use: --limit @/usr/share/ovirt-hosted-engine-setup/ansible/create_target_vm.retry
>>>
>>> The above command.py is the one I have attached as problematic_command.py
>>>
>>> *Investigation After Failure:*
>>> [root@node ~]# ls -al '/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c'
>>> -rw-rw----. 1 vdsm kvm 20480 Apr  3 15:01 /rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c
>>>
>>> sudo -u vdsm dd bs=20480 count=1 oflag=direct if="/var/tmp/localvmbCDQIR/5ef881f5-c992-48d2-b969-a0b6156bdf7c" of="/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c"
>>> 1+0 records in
>>> 1+0 records out
>>>
>>> It seems to me that somehow it is not getting the right permissions even though the playbook has:
>>>
>>>     - name: Copy configuration archive to storage
>>>       command: dd bs=20480 count=1 oflag=direct if="{{ LOCAL_VM_DIR }}/{{ he_conf_disk_details.disk.image_id }}" of="{{ he_conf_disk_path }}"
>>>       become_user: vdsm
>>>       become_method: sudo
>>>       changed_when: True
>>>       tags: [ 'skip_ansible_lint' ]
>>>
>>> On Tue, Apr 3, 2018 at 8:51 AM, RabidCicada <[email protected]> wrote:
>>>
>>>> I am now also running with:
>>>>
>>>>     export ANSIBLE_VERBOSITY=5
>>>>     export ANSIBLE_FORKS=1
>>>>     export ANSIBLE_KEEP_REMOTE_FILES=1
>>>>
>>>> On Tue, Apr 3, 2018 at 8:49 AM, RabidCicada <[email protected]> wrote:
>>>>
>>>>> Here's the log.
>>>>>
>>>>> So the command that it says it ran is:
>>>>> dd bs=20480 count=1 oflag=direct if=/var/tmp/localvmHaWb6G/1cce8df2-1810-4063-b4e2-e19a2c5b1909 of=/rhev/data-center/mnt/node.local:_srv_data/3c7485ea-14e3-40c1-b627-f89a819ed1d6/images/2c1f7c2f-b8f7-46d4-ac66-8ff1e9649e29/1cce8df2-1810-4063-b4e2-e19a2c5b1909
>>>>>
>>>>> But we all know that was done with:
>>>>>
>>>>>     - name: Copy configuration archive to storage
>>>>>       command: dd bs=20480 count=1 oflag=direct if="{{ LOCAL_VM_DIR }}/{{ he_conf_disk_details.disk.image_id }}" of="{{ he_conf_disk_path }}"
>>>>>       become_user: vdsm
>>>>>       become_method: sudo
>>>>>       changed_when: True
>>>>>       tags: [ 'skip_ansible_lint' ]
>>>>>
>>>>> So I manually replicated it with `sudo -u vdsm dd bs=20480 count=1 if=/var/tmp/localvmHaWb6G/1cce8df2-1810-4063-b4e2-e19a2c5b1909 of=/rhev/data-center/mnt/node.local:_srv_data/3c7485ea-14e3-40c1-b627-f89a819ed1d6/images/2c1f7c2f-b8f7-46d4-ac66-8ff1e9649e29/1cce8df2-1810-4063-b4e2-e19a2c5b1909`
>>>>>
>>>>> And it works when I do it manually. Though I think I didn't use oflag=direct (just realised this).
>>>>>
>>>>> I eventually put a pause task directly preceding it, with debug output that showed the file paths. I manually ran the command and it worked. Then I let the playbook do it... it failed. I checked all the permissions, e.g. vdsm:kvm. All looks good from a filesystem point of view. I'm beginning (naively) to suspect a race condition for the file access problem... but have come nowhere close to solving it.
>>>>>
>>>>> *Can you suggest a good way to continue executing the install process from the create_target_vm.yml playbook (with proper variables and context from otopi etc.)?* I currently have to restart the entire process over again and wait quite a while for it to circle back around.
>>>>>
>>>>> I have since discovered epdb and I've set breakpoints directly in playbook.py of ansible just to see better log output. I insert epdb.serve and use netcat to connect, since epdb on python 2.7.5 and up seems to have problems using epdb.connect() itself.
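One possible answer to the re-run question above, sketched under assumptions: the playbook can in principle be invoked standalone with ansible-playbook against localhost, but otopi normally injects the variables, so the he_vars.yml file named here is hypothetical and its values (LOCAL_VM_DIR, he_conf_disk_path, and so on) would have to be recovered from the ovirt-hosted-engine-setup ansible log first.

```sh
# Hypothetical standalone re-run of the failing playbook; the inventory
# string and he_vars.yml are assumptions, not part of the shipped tooling.
ansible-playbook -vvvv -i localhost, -c local \
    /usr/share/ovirt-hosted-engine-setup/ansible/create_target_vm.yml \
    -e @he_vars.yml
```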
>>>>>
>>>>> ~Kyle
>>>>>
>>>>> On Tue, Apr 3, 2018 at 4:06 AM, Simone Tiraboschi <[email protected]> wrote:
>>>>>
>>>>>> On Mon, Apr 2, 2018 at 4:52 PM, RabidCicada <[email protected]> wrote:
>>>>>>
>>>>>>> Heyo everyone. I'm trying to debug hosted-engine --deploy. It is failing in `Copy configuration archive to storage` in `create_target_vm.yml`. My most important query here is how to get good debug output from ansible through hosted-engine. I'm running hosted-engine through an ssh session.
>>>>>>>
>>>>>>> I can't figure out how to get good debug output from ansible within that workflow. I see it's running through otopi. I tried setting typical `debugger: on_failed` hooks etc. and tried many incantations on the command line and in config files to get ansible to help me out. The debugger: directive and other debugger-related ansible config settings wouldn't result in any debugger popping up. I also can't seem to pass normal -vvvv flags to hosted-engine and have them reach ansible. Ultimately I tried to use a `pause` directive and it complained that it was in a non-interactive shell. I figured it might be the result of my ssh session, so I enabled tty allocation with -t -t. It did not resolve the issue.
>>>>>>>
>>>>>>> I eventually wrote-my-own/stole a callback_plugin that checks an environment variable and enables `display.verbosity = int(v)`, since I can't seem to pass typical -vvvv stuff to ansible through `hosted-engine --deploy`. It gives me the best info that I have so far, but it won't give me enough to debug issues around Gathering Facts or what looks like a sudo/permission problem in `Copy configuration archive to storage` in `create_target_vm.yml`.
>>>>>>> I took and used the exact command that they use, and it works when I run it manually (but I can't get debug output to show me the exact sudo command being executed), hence my interest in passing -vvvv or equivalent to ansible through `hosted-engine`. I intentionally disabled the VM_directory cleanup so that I could execute the same stuff.
>>>>>>>
>>>>>>> So.... after all that... what is a good way to get deep debug info from hosted-engine ansible stuff?
>>>>>>
>>>>>> You should already find all the relevant log entries in a file called /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-ansible-create_target_vm-{timestamp}-{hash}.log
>>>>>>
>>>>>> Can you please share it?
>>>>>>
>>>>>>> Or does anyone have intuition for the possible sudo problem?
>>>>>>> ~Kyle
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Users mailing list
>>>>>>> [email protected]
>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
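The environment-variable verbosity trick described in the thread can be sketched roughly as follows. This is a minimal illustration of the idea only: `HE_ANSIBLE_VERBOSITY` and `apply_env_verbosity` are made-up names, and `DummyDisplay` stands in for Ansible's real `Display` object, which an actual callback plugin would reach by subclassing `CallbackBase`.

```python
import os

class DummyDisplay:
    """Stand-in for ansible.utils.display.Display (illustrative only)."""
    verbosity = 0

def apply_env_verbosity(display, var='HE_ANSIBLE_VERBOSITY'):
    # Read the requested verbosity from the environment, mimicking the
    # callback-plugin workaround described above; leave the display
    # untouched when the variable is unset.
    v = os.environ.get(var)
    if v is not None:
        display.verbosity = int(v)
    return display.verbosity

if __name__ == '__main__':
    os.environ['HE_ANSIBLE_VERBOSITY'] = '5'
    print(apply_env_verbosity(DummyDisplay()))  # prints 5
```

In a real plugin the same two lines would run in the callback's `__init__`, which is why it works even when there is no way to pass -vvvv on the command line.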
_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users

