On Tue, Apr 3, 2018 at 9:46 PM, RabidCicada <[email protected]> wrote:
> Indeed that is the problem. I am sshing as root.
>
> The only possible explanations I have for this working in the oVirt
> automated testing setup are:
>
> - Either the oVirt automated tests must be using a host inventory file
>   with become: true?
> - The inventory file ssh credentials happen to be the vdsm user's? (And
>   become is just not doing anything.)
> - Some other janky thing that allows the access.

Yes, you are absolutely right: we are running the CI suite with no_root_squash
https://github.com/oVirt/ovirt-system-tests/blob/master/common/deploy-scripts/setup_storage_nfs_el7.sh#L17
while we recommend anonuid=36,anongid=36,all_squash
https://www.ovirt.org/documentation/how-to/troubleshooting/troubleshooting-nfs-storage-issues/
and this explains why it can write as root in the CI env.
Thanks for the report!

> It also happens for the immediately following task `Initialize metadata
> volume` and many more following that task.
>
> I will correct it and then open a pull request, citing your input that
> you believe it's a problem.
>
> ~Kyle
>
> On Tue, Apr 3, 2018 at 3:08 PM, RabidCicada <[email protected]> wrote:
>
>> Alright. By differential comparison to the successful postgres-command
>> sudo-becomes, it looks like the salient difference is that `Copy
>> configuration archive to storage` is missing become: true. Testing now.
>>
>> The similar postgres commands have become: true while this one does not.
>>
>> ~Kyle
>>
>> On Tue, Apr 3, 2018 at 1:19 PM, Simone Tiraboschi <[email protected]> wrote:
>>
>>> On Tue, Apr 3, 2018 at 5:21 PM, RabidCicada <[email protected]> wrote:
>>>
>>>> I've attached a full debug packet below. I include the log file from
>>>> ovirt-hosted-engine-setup, relevant command-line info, info from the
>>>> command line where epdb has a breakpoint in Ansible's own playbook.py,
>>>> and info from commands I ran after it failed.
>>>> I also attach the ferried-over script in /root/.ansible/tmp that is run.
>>>>
>>>> *Output on cmd line:*
>>>> [ INFO  ] TASK [Copy configuration archive to storage]
>>>> [ ERROR ] fatal: [localhost]: FAILED! => {
>>>>     "changed": true,
>>>>     "cmd": [
>>>>         "dd",
>>>>         "bs=20480",
>>>>         "count=1",
>>>>         "oflag=direct",
>>>>         "if=/var/tmp/localvmbCDQIR/5ef881f5-c992-48d2-b969-a0b6156bdf7c",
>>>>         "of=/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c"
>>>>     ],
>>>>     "delta": "0:00:00.004336",
>>>>     "end": "2018-04-03 15:01:55.581823",
>>>>     "invocation": {
>>>>         "module_args": {
>>>>             "_raw_params": "dd bs=20480 count=1 oflag=direct if=\"/var/tmp/localvmbCDQIR/5ef881f5-c992-48d2-b969-a0b6156bdf7c\" of=\"/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c\"",
>>>>             "_uses_shell": false,
>>>>             "chdir": null,
>>>>             "creates": null,
>>>>             "executable": null,
>>>>             "removes": null,
>>>>             "stdin": null,
>>>>             "warn": true
>>>>         }
>>>>     },
>>>>     "msg": "non-zero return code",
>>>>     "rc": 1,
>>>>     "start": "2018-04-03 15:01:55.577487",
>>>>     "stderr": "dd: failed to open ‘/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c’: Permission denied",
>>>>     "stderr_lines": [
>>>>         "dd: failed to open ‘/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c’: Permission denied"
>>>>     ],
>>>>     "stdout": "",
>>>>     "stdout_lines": []
>>>> }
>>>> [ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
>>>
>>> In the playbook we have on that task:
>>>   become_user: vdsm
>>>   become_method: sudo
>>>
>>> but I fear it somehow got ignored. I'll investigate it.
>>>
>>>> *Output from ansible epdb tracepoint:*
>>>>
>>>> Using module file /usr/lib/python2.7/site-packages/ansible/modules/commands/command.py
>>>> <localhost> ESTABLISH LOCAL CONNECTION FOR USER: root
>>>> <localhost> EXEC /bin/sh -c 'echo ~ && sleep 0'
>>>> <localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055 `" && echo ansible-tmp-1522767715.36-81496549401055="` echo /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055 `" ) && sleep 0'
>>>> <localhost> PUT /tmp/tmpGMGdjh TO /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/command.py
>>>> <localhost> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/ /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/command.py && sleep 0'
>>>> <localhost> EXEC /bin/sh -c '/usr/bin/python /root/.ansible/tmp/ansible-tmp-1522767715.36-81496549401055/command.py && sleep 0'
>>>>         to retry, use: --limit @/usr/share/ovirt-hosted-engine-setup/ansible/create_target_vm.retry
>>>>
>>>> The above command.py is the one I have attached as problematic_command.py.
>>>>
>>>> *Investigation After Failure:*
>>>> [root@node ~]# ls -al '/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c'
>>>> -rw-rw----. 1 vdsm kvm 20480 Apr  3 15:01 /rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c
>>>>
>>>> sudo -u vdsm dd bs=20480 count=1 oflag=direct if="/var/tmp/localvmbCDQIR/5ef881f5-c992-48d2-b969-a0b6156bdf7c" of="/rhev/data-center/mnt/node.local:_srv_data/81292f3f-11d3-4e38-9afa-62e133aa8017/images/c5510e77-1ee0-479c-b6cf-24c179313a45/5ef881f5-c992-48d2-b969-a0b6156bdf7c"
>>>> 1+0 records in
>>>> 1+0 records out
>>>>
>>>> It seems to me that somehow it is not getting the right permissions,
>>>> even though the playbook has:
>>>>
>>>>   - name: Copy configuration archive to storage
>>>>     command: dd bs=20480 count=1 oflag=direct if="{{ LOCAL_VM_DIR }}/{{ he_conf_disk_details.disk.image_id }}" of="{{ he_conf_disk_path }}"
>>>>     become_user: vdsm
>>>>     become_method: sudo
>>>>     changed_when: True
>>>>     tags: [ 'skip_ansible_lint' ]
>>>>
>>>> On Tue, Apr 3, 2018 at 8:51 AM, RabidCicada <[email protected]> wrote:
>>>>
>>>>> I am now also running with:
>>>>>
>>>>>   export ANSIBLE_VERBOSITY=5
>>>>>   export ANSIBLE_FORKS=1
>>>>>   export ANSIBLE_KEEP_REMOTE_FILES=1
>>>>>
>>>>> On Tue, Apr 3, 2018 at 8:49 AM, RabidCicada <[email protected]> wrote:
>>>>>
>>>>>> Here's the log.
>>>>>>
>>>>>> So the command that it says it ran is:
>>>>>> dd bs=20480 count=1 oflag=direct if=/var/tmp/localvmHaWb6G/1cce8df2-1810-4063-b4e2-e19a2c5b1909 of=/rhev/data-center/mnt/node.local:_srv_data/3c7485ea-14e3-40c1-b627-f89a819ed1d6/images/2c1f7c2f-b8f7-46d4-ac66-8ff1e9649e29/1cce8df2-1810-4063-b4e2-e19a2c5b1909
>>>>>>
>>>>>> But we all know that was done with:
>>>>>>
>>>>>>   - name: Copy configuration archive to storage
>>>>>>     command: dd bs=20480 count=1 oflag=direct if="{{ LOCAL_VM_DIR }}/{{ he_conf_disk_details.disk.image_id }}" of="{{ he_conf_disk_path }}"
>>>>>>     become_user: vdsm
>>>>>>     become_method: sudo
>>>>>>     changed_when: True
>>>>>>     tags: [ 'skip_ansible_lint' ]
>>>>>>
>>>>>> So I manually replicated it with `sudo -u vdsm dd bs=20480 count=1 if=/var/tmp/localvmHaWb6G/1cce8df2-1810-4063-b4e2-e19a2c5b1909 of=/rhev/data-center/mnt/node.local:_srv_data/3c7485ea-14e3-40c1-b627-f89a819ed1d6/images/2c1f7c2f-b8f7-46d4-ac66-8ff1e9649e29/1cce8df2-1810-4063-b4e2-e19a2c5b1909`
>>>>>> and it works when I do it by hand. Though I think I didn't use
>>>>>> oflag=direct (just realised this).
>>>>>>
>>>>>> I eventually put a pause task directly preceding it with debug output
>>>>>> that showed the file paths. I manually ran the command and it worked.
>>>>>> Then I let the playbook do it... failed. I checked all the
>>>>>> permissions, e.g. vdsm:kvm. All looks good from a filesystem point of
>>>>>> view. I'm beginning (naively) to suspect a race condition for the
>>>>>> file access problem... but have come nowhere close to solving it.
>>>>>>
>>>>>> *Can you suggest a good way to continue executing the install process
>>>>>> from the create_target_vm.yml playbook (with proper variables and
>>>>>> context from otopi etc.)?* I currently have to restart the entire
>>>>>> process over again and wait quite a while for it to circle back
>>>>>> around.
>>>>>>
>>>>>> I have since discovered epdb and I've set breakpoints directly in
>>>>>> Ansible's playbook.py just to see better log output.
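[Editor's note: the missing `become: true` identified earlier in this thread suggests a correction like the following. This is a sketch based on the task as quoted above, not the final upstream patch: in Ansible, `become_user` and `become_method` on their own do not trigger privilege escalation; `become: true` must also be set, otherwise the module runs as the connecting user (here, root over the root-squashed NFS mount).]

```yaml
# Sketch of the corrected task; variable names are as quoted in the thread.
- name: Copy configuration archive to storage
  command: dd bs=20480 count=1 oflag=direct if="{{ LOCAL_VM_DIR }}/{{ he_conf_disk_details.disk.image_id }}" of="{{ he_conf_disk_path }}"
  become: true          # the missing directive: without it, become_user is ignored
  become_user: vdsm
  become_method: sudo
  changed_when: True
  tags: [ 'skip_ansible_lint' ]
```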
>>>>>> I insert epdb.serve and use netcat to connect, since epdb on Python
>>>>>> 2.7.5 and up seems to have problems using epdb.connect() itself.
>>>>>>
>>>>>> ~Kyle
>>>>>>
>>>>>> On Tue, Apr 3, 2018 at 4:06 AM, Simone Tiraboschi <[email protected]> wrote:
>>>>>>
>>>>>>> On Mon, Apr 2, 2018 at 4:52 PM, RabidCicada <[email protected]> wrote:
>>>>>>>
>>>>>>>> Heyo everyone. I'm trying to debug hosted-engine --deploy. It is
>>>>>>>> failing in `Copy configuration archive to storage` in
>>>>>>>> `create_target_vm.yml`. My most important query here is how to get
>>>>>>>> good debug output from Ansible through hosted-engine. I'm running
>>>>>>>> hosted-engine through an ssh session.
>>>>>>>>
>>>>>>>> I can't figure out how to get good debug output from Ansible within
>>>>>>>> that workflow. I see it's running through otopi. I tried setting
>>>>>>>> typical `debugger: on_failed` hooks etc. and tried many incantations
>>>>>>>> on the command line and in config files to get Ansible to help me
>>>>>>>> out. The debugger: directive and other debugger-related Ansible
>>>>>>>> config settings wouldn't result in any debugger popping up. I also
>>>>>>>> can't seem to pass the normal -vvvv flags through hosted-engine to
>>>>>>>> Ansible. Ultimately I tried a `pause` directive and it complained
>>>>>>>> that it was in a non-interactive shell. I figured it might be the
>>>>>>>> result of my ssh session, so I enabled tty allocation with -t -t.
>>>>>>>> It did not resolve the issue.
>>>>>>>>
>>>>>>>> I eventually wrote-my-own/stole a callback_plugin that checks an
>>>>>>>> environment variable and enables `display.verbosity = int(v)`, since
>>>>>>>> I can't seem to pass the typical -vvvv stuff to Ansible through
>>>>>>>> `hosted-engine --deploy`. It gives me the best info that I have so far.
>>>>>>>> But it won't give me enough to debug issues around Gathering Facts
>>>>>>>> or what looks like a sudo/permission problem in `Copy configuration
>>>>>>>> archive to storage` in `create_target_vm.yml`. I took the exact
>>>>>>>> command it uses and ran it manually, and it works (but I can't get
>>>>>>>> debug output to show me the exact sudo command being executed),
>>>>>>>> hence my interest in passing -vvvv or equivalent to Ansible through
>>>>>>>> `hosted-engine`. I intentionally disabled the VM directory cleanup
>>>>>>>> so that I could execute the same stuff.
>>>>>>>>
>>>>>>>> So... after all that... what is a good way to get deep debug info
>>>>>>>> from the hosted-engine Ansible stuff?
>>>>>>>
>>>>>>> You should already find all the relevant log entries in a file called
>>>>>>> /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-ansible-create_target_vm-{timestamp}-{hash}.log
>>>>>>>
>>>>>>> Can you please share it?
>>>>>>>
>>>>>>>> Or does anyone have intuition for the possible sudo problem?
>>>>>>>> ~Kyle
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Users mailing list
>>>>>>>> [email protected]
>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users

