As to why root: 'nodeapply' ultimately has to ssh in to the nodes in a way that 
currently allows arbitrary commands as root, so it does not go through the 
confluent api. That means you have to be 'really root' to run nodeshell and 
nodeapply, since root is the only user allowed, by default, to ssh into nodes 
that way.

If you want a common user to be able to do that, then you can create:

/var/lib/confluent/public/site/ssh/*.rootpubkey

Files containing a public key that your user has access to.  You can also use 
ssh-agent and ssh-add to load /root/.ssh/id_ed25519 for non-interactive use 
in a session.
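
As a rough sketch of the first option: publish_rootpubkey below is a hypothetical helper, not part of confluent, and the key path and file name you pass in are your own choice. Only the directory and the .rootpubkey suffix come from the note above.

```shell
# Hypothetical helper: copy a user's public key into the confluent site ssh
# directory under a name ending in .rootpubkey.
publish_rootpubkey() {
    pubkey="$1"    # path to the user's existing public key
    name="$2"      # label for the file name, e.g. the user name
    keydir="${3:-/var/lib/confluent/public/site/ssh}"   # default from the note above
    # Mode 0644 is an assumption; adjust if your site needs something stricter.
    install -m 0644 "$pubkey" "$keydir/$name.rootpubkey"
}
```

From a root shell that would be something like `publish_rootpubkey /home/brian/.ssh/id_ed25519.pub brian`, where the home directory path is assumed. For the ssh-agent route, running `eval "$(ssh-agent -s)"` followed by `ssh-add /root/.ssh/id_ed25519` inside one root shell (a single `sudo -i` session rather than a fresh `sudo -i` per command) should ask for the passphrase once and then let nodeapply run without prompting again.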


In terms of why that fails, there are some things I would check:
/var/log/confluent/stdout
/var/log/confluent/stderr
/var/log/confluent/trace

And the output of:

confluent_selfcheck -an node01
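
A minimal sketch for grabbing the recent part of those three logs in one go. show_confluent_logs is a hypothetical helper name, not a confluent command, and the 50-line tail is an arbitrary choice.

```shell
# Hypothetical helper: print the last 50 lines of each confluent log that
# exists and is readable, with a header line naming the file.
show_confluent_logs() {
    logdir="${1:-/var/log/confluent}"   # default is the directory listed above
    for name in stdout stderr trace; do
        f="$logdir/$name"
        if [ -r "$f" ]; then
            echo "== $f"
            tail -n 50 "$f"
        fi
    done
}
```

Running `show_confluent_logs` as root right after a failed 'nodeapply -F node01', together with the confluent_selfcheck output above, should capture the relevant part.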
________________________________
From: Brian Joiner <martinitime1...@gmail.com>
Sent: Wednesday, October 2, 2024 10:00 PM
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Subject: [External] [xcat-user] Confluent nodeapply -F fail

I'm trying to set up syncfiles to transfer some configs for munge and slurm, and 
no matter what I do, 'nodeapply -F <nodename>' fails if run after the node has 
booted:

[brian@confluent01 confluent]$ sudo -i nodeapply -F node01
Enter passphrase for key '/root/.ssh/id_ed25519':
node01:
node01: ---------------------------------------------------------------------------
node01: Running python script 'syncfileclient' from https://10.13.13.5/confluent-public/os/rocky-9.2-x86_64-default/scripts/
node01: Executing in /tmp/confluentscripts.Nm4k5fCWl
node01: Traceback (most recent call last):
node01:   File "/tmp/confluentscripts.Nm4k5fCWl/syncfileclient", line 286, in <module>
node01:     synchronize()
node01:   File "/tmp/confluentscripts.Nm4k5fCWl/syncfileclient", line 233, in synchronize
node01:     status, rsp = ac.grab_url_with_status('/confluent-api/self/remotesyncfiles')
node01:   File "/opt/confluent/bin/apiclient", line 413, in grab_url_with_status
node01:     raise Exception(rsp.read())
node01: 'syncfileclient' exited with code 1
node01: Exception: b"500 - Command '['rsync', '-rvLD', '/tmp/tmp9wxzgajv.synctonode01/', 'root@[10.13.13.11]:/']' returned non-zero exit status 255."


I have tried using various section headers like APPENDONCE and REPLACE in 
/var/lib/confluent/public/os/rocky-9.2-x86_64-default/syncfiles, or just no 
header, and it fails every time. I have no issue running other post scripts 
from the various scripts subdirs.  The files I want to transfer are in 
/var/lib/confluent/syncfiles, but I even tried them in 
/var/lib/confluent/public/syncfiles and it made no difference.  Also, my user 
'brian' is supposed to be an admin, but I have to use 'sudo -i' to run anything.
Syncfiles on xCAT never gave me any issues, and I'm using the same syntax in 
this syncfiles file for Confluent:  /source/file -> /destination/file




Brian Joiner
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
