Re: role of loadlog table - computerloadlog

Brian Bouterse Tue, 10 Feb 2009 10:25:27 -0800

When running the netstat -an command on our reserved machine we getthe following line:

tcp 0 0 ::ffff:152.14.17.82:22 ::ffff:152.14.243:50257ESTABLISHED

The ::ffff: explains why the regex below doesn't match. Any commentson what to do about this? Change the regex?

/tcp\s+([0-9]*)\s+([0-9]*)\s($ipaddress:22)\s+([.0-9]*):([0-9]*)(.*)(ESTABLISHED)/


Brian


Brian Bouterse
Secure Open Systems Initiative
919.698.8796




On Feb 7, 2009, at 12:15 PM, Andy Kurth wrote:

The reserved process is not detecting that you have logged in to thenode. The problem is occurring in utils.pm::check_connection().For Linux images, the subroutine attempts to determine if the useris logged in via "netstat -an" and "who".
The "who" command is failing to determine that you logged in becausethe login name you're using is "admin" instead of "administrator".I think you'll be able to get past this problem if you change yourreservation login name (user.unityid) to "administrator".
(Note: There's a bug in reserved.pm which is causing this. It ispassing "Administrator" instead of the correct user name tocheck_connection() if forimaging=1, OS=Linux, and userstandalone=1.)
The "netstat -an" command is failing because the regex inutils.pm::check_connection() isn't finding a match. Here's theregex and what the "netstat -an" output line looks like when Iconnect to a Linux image via SSH:
/tcp\s+([0-9]*)\s+([0-9]*)\s($ipaddress:22)\s+([.0-9]*):([0-9]*)(.*)(ESTABLISHED)/
tcp 0 0 152.46.18.198:2269.111.94.75:51803 ESTABLISHED
I'm not sure exactly why it isn't finding a match because thenetstat output isn't displayed in the log file. To troubleshootthis, first remove the last "1" argument from this command near line1911 in utils.pm:run_ssh_command($shortnodename, $identity, "netstat -an", "root",22, 1);
The last argument instructs run_ssh_command to not display thecommand output. This was added because netstat returns a lot ofoutput and is called many times when a request is in the reservedstate. The output was causing the log file to grow quickly.
Once you see the netstat output in the log file you should be ableto tell why the regex isn't matching. The $ipaddress variable isthe IP of the node. In your log file, it is 152.14.17.82. The 4thgrouping returned by the regex should match your remote IP, whichwas 152.14.244.35 in your log file. You should expect to see anetstat line like this:tcp 0 0 152.14.17.82:22152.14.244.35:51803 ESTABLISHED
The "connect to host test_vm port 22: No route to host" failures atthe end of the log file begin to occur after you manually delete thecomputerloadlog 'begin' entry, the image.pm process is then able tostart up, then this is called:
2009-02-06 14:56:56|31543|93:93|image|esx.pm:capture(524)|Power offcommand: /usr/lib/vmware-viperl/apps/vm/vmcontrol.pl --server'blade2-7.cnl.ncsu.edu' --vmname test_vm --operation poweroff --username root--password 'xxx'
2009-02-06 14:57:03|31543|93:93|image|esx.pm:capture(527)|Powered off:
|31543|93:93|image| virtual machine 'test_vm' under hostblade2-7.cnl.ncsu.edu powered off
Hope this helps,
Andy

Brian Bouterse wrote:
Our testing steps are these:
- make the reservation/request
- wait for connect button
- click connect button
- connect to the machine
- touch a file for uniqueness purposes
- then you can hit the create image button
- indicate we want to update the image (not create a new image fromthis one).Still looking into why the child create to handle the originalimaging reservation/request continues to try to ssh continuouslyinstead of marking the imaging event as in use like it should.That log is attached; it's towards the end, and really shows itselfwith repeated messages like:
|31467|93:93|reserved| ---- WARNING ----
|31467|93:93|reserved| 2009-02-06 14:58:21|31467|93:93|reserved|utils.pm:run_ssh_command(6807)|failed to run SSH command after 3attempts, command: /usr/bin/ssh -i /etc/vcl/bladelinuxkey_id_rsa -l root -p 22 -x test_vm 'who' 2>&1, exit status: 255, output:|31467|93:93|reserved| ssh output (who): ssh: connect to hosttest_vm port 22: No route to host
|31467|93:93|reserved| ( 0) utils.pm, notify (line: 683)
|31467|93:93|reserved| (-1) utils.pm, run_ssh_command (line: 6807)
|31467|93:93|reserved| (-2) utils.pm, check_connection (line: 1931)
|31467|93:93|reserved| (-3) reserved.pm, process (line: 410)
|31467|93:93|reserved| (-4) vcld, make_new_child (line: 599)
|31467|93:93|reserved| (-5) vcld, main (line: 347)
Brian
Brian Bouterse
Secure Open Systems Initiative
919.698.8796
On Feb 6, 2009, at 3:41 PM, Aaron Peeler wrote:
What are your steps when you start the imaging process?

You have to do the following:
- make the reservation/request
- wait for connect button
- click connect button
- connect to the machine
- then you can hit the create image button to start the imagingprocess.
If your in the reserved state as reported in the log file below -the process is polling the node looking for the user to connect.Once connected then it proceeds to the inuse state.
This is actually a user interface bug. We're assuming users willactually login to the remote node and install something - but inyour case if your just testing I can see going straight to theimage creation mode. Created JIRA issue VCL-84.
Under normal operation you don't need to clean on thecomputerloadlog. Since your developing and likely having to breakthings, it's going to behave a little bit different.
Aaron
--On February 6, 2009 3:12:16 PM -0500 Andrew Brown <brow...@gmail.com> wrote:
So we have things mostly working except for one detail: When wecheck outa computer for capture through "update image", all that goesfine. Butwhen we go and hit "create" to actually capture the image, vcldhangs
with the log messages that Brian posted, repeating.
If we go in and remove the database entry in the computerloadlogtable,
the capture goes through; however, there seems to be a process that
lingers and keeps trying to ssh to the machine even when ourcapture is
proceeding. It gives us log entries like this:
2009-02-06
14:56:57|31467|93:93|reserved|utils.pm:check_connection(1828)|checking
for connection by administrator on test_vm, attempt 4
2009-02-06
14:56:57|31467|93:93|reserved|utils.pm:run_ssh_command(6729)|executing
SSH command on test_vm: netstat -an
2009-02-06
14:56:58|31467|93:93|reserved|utils.pm:run_ssh_command(6748)|$?: 0,
signal: 0, core dump: 0, exit status: 0
2009-02-06
14:56:58|31467|93:93|reserved|utils.pm:run_ssh_command(6799)|SSHcommand
executed on test_vm: netstat -an, returning (0, output)
2009-02-06
14:56:58|31467|93:93|reserved|utils.pm:run_ssh_command(6729)|executing
SSH command on test_vm: who
2009-02-06
14:56:58|31467|93:93|reserved|utils.pm:run_ssh_command(6748)|$?: 0,
signal: 0, core dump: 0, exit status: 0
2009-02-06
14:56:58|31467|93:93|reserved|utils.pm:run_ssh_command(6796)|run_ssh_comm
and output: admin pts/0 Feb 6 15:59 (152.14.244.35)
2009-02-06
14:56:58|31467|93:93|reserved|utils.pm:run_ssh_command(6799)|SSHcommand
executed on test_vm: who, returning (0, output)
(the pid being different from the one that is going into ourcapture
routine)
This may be unrelated, I don't know. But we need to fix it so wedon'tneed to manually delete the computerloadlog entry each time, aswell as
fix it so this second process... goes away or something.

-Andrew
On Fri, Feb 6, 2009 at 1:23 PM, Aaron Peeler<aaron_pee...@ncsu.edu>
wrote:
Actually it's computerloadlog

The computerloadlog table has three purposes.
It also has two dependent tables computerloadflow andcomputerloadstate
1) Provides information to the user about what state userrequest is in,where it's at in loading, performing post install tasks, addinguseracct, etc. This information is seen on the current reservationspage bythe user clicking on the pending link. This link opens a ajaxbasedwindow and will provided the state the load is in. At some pointduring
your load routine - you'll want to add these type of calls in the
appropriate places.

insertloadlog($reservation_id, $computer_id, "startload",
"$computer_node_name $image_name");
But this isn't required to get a provision module functional.It's a nicefeature, so the user can be aware of what's going on. We'll puttogether
information on how to best use the insertloadlog.
2) If a new|reload request is started and the computer state isreload orreloading. The forked request process will use that table todeterminewhat to do. Either take over, fail, or let the current owningprocess
complete.
3) It is also used in vcld to prevent duplicate or competingprocesses.
It gets cleaned out on a per-reservation basis. When areservations goesinto the inuse state and when a reservation is removed from thedb.
In your case, since your just testing your module, just manuallyempty
the computerloadlog table as needed.

Aaron



--On February 6, 2009 11:20:47 AM -0500 Brian Bouterse
<bmbou...@ncsu.edu> wrote:

What is the purpose of the loadlog table?  Currently the ESX
provisioning
module doesn't update any information in the loadlog table, andtheprovisioning works fine. We're working on the capture()portions now,
and when we go through the web interface to have it "capture" our
reservation, vcld warns saying:

| 25052|72:72|image| ---- WARNING ----
| 25052|72:72|image| 2009-02-06
| 11:19:56|25052|72:72|image|vcld:main(272)|reservation 72 isalready| being processed 25052|72:72|image| ( 0) utils.pm, notify(line: 683)
| 25052|72:72|image| (-1) vcld, main (line: 272)

Is this normal for the "image update" process?

Best,
Brian

Brian Bouterse
Secure Open Systems Initiative
919.698.8796



\

Re: role of loadlog table - computerloadlog

Reply via email to