There are two problems on the table currently:
Problem 1) The linux portions of the utils.pm::check_connection() function currently use two checks, one 'netstat' command check and one 'who' command check. With IPv6 hosts, the netstat returned lines don't look the same, therefore the regex on the netstat return doesn't stand a chance of working. I'm concerned that this type of regex is a losing battle. What do you all think? The who seems to work more reliable, could we just take netstat out?
An aside question .... Why do we require VCL to be aware of when the users connect in the first place? What if I want to reserve a server instance, but never want to SSH into it. Currently, VCL would not allow that use case because of these additional checks.
Problem 2) When a reservation is made, VCL gives us a username, password, and an IP to log in with. We logon to the web portal with the 'admin' username, but when VCL goes to determine if the user has logged in yet, it checks for the user "administrator". Why are they different? How do we get it to use the same username that was given to the user?
Thanks! Brian Brian Bouterse Secure Open Systems Initiative 919.698.8796 On Feb 10, 2009, at 1:24 PM, Brian Bouterse wrote:
When running the netstat -an command on our reserved machine we get the following line:tcp 0 0 ::ffff:22.214.171.124:22 ::ffff:152.14.243:50257 ESTABLISHEDThe ::ffff: explains why the regex below doesn't match. Any comments on what to do about this? Change the regex?/tcp\s+([0-9]*)\s+([0-9]*)\s($ipaddress:22)\s+([.0-9]*):([0-9]*)(.*) (ESTABLISHED)/Brian Brian Bouterse Secure Open Systems Initiative 919.698.8796 On Feb 7, 2009, at 12:15 PM, Andy Kurth wrote:The reserved process is not detecting that you have logged in to the node. The problem is occurring in utils.pm::check_connection(). For Linux images, the subroutine attempts to determine if the user is logged in via "netstat -an" and "who".The "who" command is failing to determine that you logged in because the login name you're using is "admin" instead of "administrator". I think you'll be able to get past this problem if you change your reservation login name (user.unityid) to "administrator".(Note: There's a bug in reserved.pm which is causing this. It is passing "Administrator" instead of the correct user name to check_connection() if forimaging=1, OS=Linux, and userstandalone=1.)The "netstat -an" command is failing because the regex in utils.pm::check_connection() isn't finding a match. Here's the regex and what the "netstat -an" output line looks like when I connect to a Linux image via SSH:/tcp\s+([0-9]*)\s+([0-9]*)\s($ipaddress:22)\s+([.0-9]*):([0-9]*)(.*) (ESTABLISHED)/tcp 0 0 126.96.36.199:22 188.8.131.52:51803 ESTABLISHEDI'm not sure exactly why it isn't finding a match because the netstat output isn't displayed in the log file. To troubleshoot this, first remove the last "1" argument from this command near line 1911 in utils.pm: run_ssh_command($shortnodename, $identity, "netstat -an", "root", 22, 1);The last argument instructs run_ssh_command to not display the command output. This was added because netstat returns a lot of output and is called many times when a request is in the reserved state. The output was causing the log file to grow quickly.Once you see the netstat output in the log file you should be able to tell why the regex isn't matching. The $ipaddress variable is the IP of the node. In your log file, it is 184.108.40.206. The 4th grouping returned by the regex should match your remote IP, which was 220.127.116.11 in your log file. You should expect to see a netstat line like this: tcp 0 0 18.104.22.168:22 22.214.171.124:51803 ESTABLISHEDThe "connect to host test_vm port 22: No route to host" failures at the end of the log file begin to occur after you manually delete the computerloadlog 'begin' entry, the image.pm process is then able to start up, then this is called:2009-02-06 14:56:56|31543|93:93|image|esx.pm:capture(524)|Power off command: /usr/lib/vmware-viperl/apps/vm/vmcontrol.pl --server 'blade2-7. cnl.ncsu.edu' --vmname test_vm --operation poweroff --username root --password 'xxx'2009-02-06 14:57:03|31543|93:93|image|esx.pm:capture(527)|Powered off: |31543|93:93|image| virtual machine 'test_vm' under host blade2-7.cnl.ncsu.edu powered offHope this helps, Andy Brian Bouterse wrote:Our testing steps are these: - make the reservation/request - wait for connect button - click connect button - connect to the machine - touch a file for uniqueness purposes - then you can hit the create image button- indicate we want to update the image (not create a new image from this one). Still looking into why the child create to handle the original imaging reservation/request continues to try to ssh continuously instead of marking the imaging event as in use like it should. That log is attached; it's towards the end, and really shows itself with repeated messages like:|31467|93:93|reserved| ---- WARNING ----|31467|93:93|reserved| 2009-02-06 14:58:21|31467|93:93|reserved| utils.pm:run_ssh_command(6807)|failed to run SSH command after 3 attempts, command: /usr/bin/ssh -i /etc/vcl/bladelinuxkey_id_rsa - l root -p 22 -x test_vm 'who' 2>&1, exit status: 255, output: |31467|93:93|reserved| ssh output (who): ssh: connect to host test_vm port 22: No route to host|31467|93:93|reserved| ( 0) utils.pm, notify (line: 683) |31467|93:93|reserved| (-1) utils.pm, run_ssh_command (line: 6807) |31467|93:93|reserved| (-2) utils.pm, check_connection (line: 1931) |31467|93:93|reserved| (-3) reserved.pm, process (line: 410) |31467|93:93|reserved| (-4) vcld, make_new_child (line: 599) |31467|93:93|reserved| (-5) vcld, main (line: 347) Brian Brian Bouterse Secure Open Systems Initiative 919.698.8796 On Feb 6, 2009, at 3:41 PM, Aaron Peeler wrote:What are your steps when you start the imaging process? You have to do the following: - make the reservation/request - wait for connect button - click connect button - connect to the machine- then you can hit the create image button to start the imaging process.If your in the reserved state as reported in the log file below - the process is polling the node looking for the user to connect. Once connected then it proceeds to the inuse state.This is actually a user interface bug. We're assuming users will actually login to the remote node and install something - but in your case if your just testing I can see going straight to the image creation mode. Created JIRA issue VCL-84.Under normal operation you don't need to clean on the computerloadlog. Since your developing and likely having to break things, it's going to behave a little bit different.Aaron--On February 6, 2009 3:12:16 PM -0500 Andrew Brown <brow...@gmail.com > wrote:So we have things mostly working except for one detail: When we check out a computer for capture through "update image", all that goes fine. But when we go and hit "create" to actually capture the image, vcld hangswith the log messages that Brian posted, repeating.If we go in and remove the database entry in the computerloadlog table, the capture goes through; however, there seems to be a process that lingers and keeps trying to ssh to the machine even when our capture isproceeding. It gives us log entries like this: 2009-02-0614:56:57|31467|93:93|reserved|utils.pm:check_connection(1828)| checkingfor connection by administrator on test_vm, attempt 4 2009-02-0614:56:57|31467|93:93|reserved|utils.pm:run_ssh_command(6729)| executingSSH command on test_vm: netstat -an 2009-02-0614:56:58|31467|93:93|reserved|utils.pm:run_ssh_command(6748)|$?: 0,signal: 0, core dump: 0, exit status: 0 2009-02-0614:56:58|31467|93:93|reserved|utils.pm:run_ssh_command(6799)|SSH commandexecuted on test_vm: netstat -an, returning (0, output) 2009-02-0614:56:58|31467|93:93|reserved|utils.pm:run_ssh_command(6729)| executingSSH command on test_vm: who 2009-02-0614:56:58|31467|93:93|reserved|utils.pm:run_ssh_command(6748)|$?: 0,signal: 0, core dump: 0, exit status: 0 2009-02-0614:56:58|31467|93:93|reserved|utils.pm:run_ssh_command(6796)| run_ssh_command output: admin pts/0 Feb 6 15:59 (126.96.36.199) 2009-02-0614:56:58|31467|93:93|reserved|utils.pm:run_ssh_command(6799)|SSH commandexecuted on test_vm: who, returning (0, output)(the pid being different from the one that is going into our captureroutine)This may be unrelated, I don't know. But we need to fix it so we don't need to manually delete the computerloadlog entry each time, as well asfix it so this second process... goes away or something. -AndrewOn Fri, Feb 6, 2009 at 1:23 PM, Aaron Peeler <aaron_pee...@ncsu.edu >wrote:Actually it's computerloadlog The computerloadlog table has three purposes.It also has two dependent tables computerloadflow and computerloadstate1) Provides information to the user about what state user request is in, where it's at in loading, performing post install tasks, adding user acct, etc. This information is seen on the current reservations page by the user clicking on the pending link. This link opens a ajax based window and will provided the state the load is in. At some point duringyour load routine - you'll want to add these type of calls in the appropriate places. insertloadlog($reservation_id, $computer_id, "startload", "$computer_node_name $image_name");But this isn't required to get a provision module functional. It's a nice feature, so the user can be aware of what's going on. We'll put togetherinformation on how to best use the insertloadlog.2) If a new|reload request is started and the computer state is reload or reloading. The forked request process will use that table to determine what to do. Either take over, fail, or let the current owning processcomplete.3) It is also used in vcld to prevent duplicate or competing processes.It gets cleaned out on a per-reservation basis. When a reservations goes into the inuse state and when a reservation is removed from the db.In your case, since your just testing your module, just manually emptythe computerloadlog table as needed. Aaron --On February 6, 2009 11:20:47 AM -0500 Brian Bouterse <bmbou...@ncsu.edu> wrote: What is the purpose of the loadlog table? Currently the ESX provisioningmodule doesn't update any information in the loadlog table, and the provisioning works fine. We're working on the capture() portions now, and when we go through the web interface to have it "capture" ourreservation, vcld warns saying: | 25052|72:72|image| ---- WARNING ---- | 25052|72:72|image| 2009-02-06| 11:19:56|25052|72:72|image|vcld:main(272)|reservation 72 is already | being processed 25052|72:72|image| ( 0) utils.pm, notify (line: 683)| 25052|72:72|image| (-1) vcld, main (line: 272) Is this normal for the "image update" process? Best, Brian Brian Bouterse Secure Open Systems Initiative 919.698.8796 \