Dario & Brodsky,

I am running into the same problem here on a fresh install of CentOS5.6, with 
xCAT-2.6.8-snap201109140900 installed from the xcat-core and xcat-deps repos on 
sourceforge. DNS also appears to be correct: I can resolve both the master 
and node host names from the node undergoing installation. I added some code to 
my pre.rh script to enable ssh during the install. After the installation has 
finished, the system just sits there and never reboots. 

I see messages like this in the logs on the master node.

Oct 17 15:39:39 itrc-adm2 xCAT: start up sshd
Oct 17 15:39:39 itrc-adm2 xCAT: /xcatpost/syncfiles: there is no sync file template for the node
Oct 17 15:39:39 itrc-adm2 xcat: addsiteyum: repos/centos5.7/x86_64 is not a directory
Oct 17 15:39:39 itrc-adm2 xcat: Retrying flag update
Oct 17 15:40:19 itrc-adm2 last message repeated 4 times
Oct 17 15:41:29 itrc-adm2 last message repeated 7 times
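
The timestamps above are consistent with one retry roughly every 10 seconds
(4 repeats in the 40 seconds before 15:40:19, then 7 in the next 70 seconds),
though that is only an inference from these few lines. As a rough sketch, the
retries in a syslog excerpt could be totalled as follows. count_flag_retries
is a hypothetical helper name, not part of xCAT, and it assumes that syslog's
"last message repeated N times" lines directly follow the retry message they
refer to:

```bash
# Hypothetical helper: total the "Retrying flag update" attempts in a syslog
# excerpt, including syslog's "last message repeated N times" summaries.
# Reads the files given as arguments, or standard input if there are none.
count_flag_retries() {
    awk '
        /Retrying flag update/ { n++; armed = 1; next }
        # Only count a "repeated" summary if it follows a retry message.
        armed && /last message repeated [0-9]+ times/ {
            for (i = 1; i <= NF; i++)
                if ($i == "repeated") n += $(i + 1)
            next
        }
        { armed = 0 }
        END { print n + 0 }
    ' "$@"
}
```

For the log excerpt above this prints 12 (1 + 4 + 7). It would typically be
called with the master node's syslog file as its argument.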
 

I logged into the node itrc-adm2 and ran ps to see what was running and I came 
across this:

3609 root       1004 S   /bin/awk -f /xcatpost/updateflag.awk 10.250.19.1 3002
(The IP address shown is correct for the master node.)


This is interesting, because the directory /xcatpost doesn't exist on 
itrc-adm2. 

I looked around and there are xcatpost files in /mnt/sysimage/xcatpost/.
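
One possible explanation, offered as an assumption rather than a confirmed
diagnosis: during a kickstart install the installed system is normally mounted
at /mnt/sysimage and the %post section runs chrooted into it, so a process
started there would see /mnt/sysimage/xcatpost as /xcatpost. Here is a small
sketch for checking where updateflag.awk is visible from an ssh session on the
node. find_updateflag is a hypothetical helper, and the default roots are
assumptions:

```bash
# Hypothetical helper: print every path at which xcatpost/updateflag.awk
# exists under the given root directories.
# With no arguments it checks "/" and "/mnt/sysimage" (assumed anaconda layout).
find_updateflag() {
    [ "$#" -gt 0 ] || set -- / /mnt/sysimage
    for root in "$@"; do
        # Strip a trailing slash so "/" becomes "" and paths join cleanly.
        candidate="${root%/}/xcatpost/updateflag.awk"
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
}
```

On Linux, readlink /proc/3609/root (using the PID from the ps output) should
show which root directory that awk process is running under.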

If you need me to post any more information, please let me know.

Regards,

Jamie I. Fargen
Systems Administrator
Research Computing
University of South Florida
[email protected]
813-974-4108

________________________________________
From: [email protected] 
[[email protected]]
Sent: Monday, October 17, 2011 1:17 PM
To: [email protected]
Subject: xCAT-user Digest, Vol 26, Issue 10

Send xCAT-user mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.sourceforge.net/lists/listinfo/xcat-user
or, via email, send a message with subject or body 'help' to
        [email protected]

You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of xCAT-user digest..."


Today's Topics:

   1. updateflag.awk hangs forever? (Dario Dorella)
   2. Re: updateflag.awk hangs forever? (Russell Jones)
   3. Re: updateflag.awk hangs forever? (Brodsky Denis-RM08520)
   4. Re: updateflag.awk hangs forever? (Lissa Valletta)
   5. updateflag.awk hangs forever? (Dario Dorella)
   6. updateflag.awk hangs forever? (Dario Dorella)
   7. updateflag.awk hangs forever? (Dario Dorella)
   8. Re: updateflag.awk hangs forever? (Lissa Valletta)


----------------------------------------------------------------------

Message: 1
Date: Sun, 16 Oct 2011 11:46:07 +0200
From: Dario Dorella <[email protected]>
Subject: [xcat-user] updateflag.awk hangs forever?
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hello list,

   I am trying to install a CentOS5 cluster using xCAT, and looking at
what happens during installation it seems that the updateflag.awk never
receives its "done" message from xcatd and keeps looping.

Has anybody an idea on what I might be doing wrong and on how to debug this?


Thx,
Dario



------------------------------

Message: 2
Date: Sun, 16 Oct 2011 20:11:33 -0500
From: Russell Jones <[email protected]>
Subject: Re: [xcat-user] updateflag.awk hangs forever?
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Sounds like perhaps a postscript may be hanging. Do you have any custom
postscripts? Any interesting messages in the other terminal windows
during install?

Are you truly doing diskfull node installation or are these diskless
nodes? There's a known issue with CentOS 5.5 and the xcatdsklspost
script that will cause a stateless node to hang during boot.



On 10/16/2011 4:46 AM, Dario Dorella wrote:
> Hello list,
>
>     I am trying to install a CentOS5 cluster using xCAT, and looking at
> what happens during installation it seems that the updateflag.awk never
> receives its "done" message from xcatd and keeps looping.
>
> Has anybody an idea on what I might be doing wrong and on how to debug this?
>
>
> Thx,
> Dario
>
> ------------------------------------------------------------------------------
> All the data continuously generated in your IT infrastructure contains a
> definitive record of customers, application performance, security
> threats, fraudulent activity and more. Splunk takes this data and makes
> sense of it. Business sense. IT sense. Common sense.
> http://p.sf.net/sfu/splunk-d2d-oct
> _______________________________________________
> xCAT-user mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/xcat-user
>



------------------------------

Message: 3
Date: Mon, 17 Oct 2011 06:35:21 +0000
From: Brodsky Denis-RM08520 <[email protected]>
Subject: Re: [xcat-user] updateflag.awk hangs forever?
To: xCAT Users Mailing list <[email protected]>
Message-ID:
        <cecdca94ddf5884a84ea374a38c714a301042...@039-sn1mpn1-003.039d.mgd.msft.net>
Content-Type: text/plain; charset="us-ascii"

Hello,

I have the same problem, and still no fix.


-----Original Message-----
From: Dario Dorella [mailto:[email protected]]
Sent: Sunday, October 16, 2011 11:46
To: [email protected]
Subject: [xcat-user] updateflag.awk hangs forever?

Hello list,

   I am trying to install a CentOS5 cluster using xCAT, and looking at what 
happens during installation it seems that the updateflag.awk never receives its 
"done" message from xcatd and keeps looping.

Has anybody an idea on what I might be doing wrong and on how to debug this?


Thx,
Dario






------------------------------

Message: 4
Date: Mon, 17 Oct 2011 07:29:58 -0400
From: Lissa Valletta <[email protected]>
Subject: Re: [xcat-user] updateflag.awk hangs forever?
To: xCAT Users Mailing list <[email protected]>
Cc: [email protected]
Message-ID:
        <of1822366a.668d3d32-on8525792c.003eff15-8525792c.003f2...@us.ibm.com>
Content-Type: text/plain; charset=US-ASCII

Many times it is because, at that point during the install, the node cannot
contact the Management Server at the provided IP address. Check the master
attribute in the site table (the IP address as known by the node)
and /etc/resolv.conf on the node.
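
As a sketch of that check: on the management node, tabdump site (or
gettab key=master site.value) prints the master attribute, and on the node the
resolver configuration can be listed with a small helper. nameservers is a
hypothetical function name, and it only reads the file it is given:

```bash
# Hypothetical helper: list the nameserver addresses in a resolv.conf-style
# file (defaults to /etc/resolv.conf). Commented-out entries are skipped
# because their first field is "#", not "nameserver".
nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

Running nameservers on the installing node and comparing the output with the
site table's master value shows quickly whether the node is pointed at the
addresses you expect.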

Lissa K. Valletta
2-3/T12
Poughkeepsie, NY 12601
(tie 293) 433-3102





From:   Russell Jones <[email protected]>
To:     [email protected]
Date:   10/16/2011 09:14 PM
Subject:        Re: [xcat-user] updateflag.awk hangs forever?



Sounds like perhaps a postscript may be hanging. Do you have any custom
postscripts? Any interesting messages in the other terminal windows
during install?

Are you truly doing diskfull node installation or are these diskless
nodes? There's a known issue with CentOS 5.5 and the xcatdsklspost
script that will cause a stateless node to hang during boot.



On 10/16/2011 4:46 AM, Dario Dorella wrote:
> Hello list,
>
>     I am trying to install a CentOS5 cluster using xCAT, and looking at
> what happens during installation it seems that the updateflag.awk never
> receives its "done" message from xcatd and keeps looping.
>
> Has anybody an idea on what I might be doing wrong and on how to debug this?
>
>
> Thx,
> Dario






------------------------------

Message: 5
Date: Mon, 17 Oct 2011 15:56:55 +0200
From: Dario Dorella <[email protected]>
Subject: [xcat-user] updateflag.awk hangs forever?
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

I looked on the installing node, under /tmp/ and in ~root, for log files
and messages. The problem seems to be that xCATd never answers with
"ready"; all I can get is an empty response from $MASTER:3002.

I was able to replicate the problem on a working environment by breaking
name resolution, but on the machines where this happens unprovoked,
node name resolution seems to be fine.

Is there any way I can trace what's happening from the xCATd point of
view? I want to know why when it receives the call on 3002 it answers
with an empty string.
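
From the description in this thread, the client side of the exchange appears
to be: read a line from port 3002, send the flag after "ready", stop after
"done", and otherwise log "Retrying flag update" and try again. The following
is a sketch of that logic under those assumptions, not the actual
updateflag.awk. next_action and probe_flag_port are hypothetical names, and
probe_flag_port needs bash (it uses /dev/tcp and read -t):

```bash
# Decide what a client should do after reading one line from the flag port.
# Assumption (from this thread only): the server greets with "ready" and
# acknowledges with "done"; anything else, including an empty reply, is a retry.
next_action() {
    case "$1" in
        "ready") echo send ;;
        "done")  echo finish ;;
        *)       echo retry ;;
    esac
}

# Bash-only probe: connect to host $1 (port $2, default 3002), wait up to
# 5 seconds for the first line, and show it together with the resulting action.
# Runs in a subshell so the file descriptor and any failure stay contained.
probe_flag_port() (
    exec 3<>"/dev/tcp/$1/${2:-3002}" || exit 1
    line=
    IFS= read -r -t 5 line <&3
    printf 'first line: [%s] -> %s\n' "$line" "$(next_action "$line")"
)
```

Running probe_flag_port 10.250.19.1 from the installing node (with your own
master address) would show whether the first line really is empty.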


Thx,
Dario



------------------------------

Message: 6
Date: Mon, 17 Oct 2011 16:02:42 +0200
From: Dario Dorella <[email protected]>
Subject: [xcat-user] updateflag.awk hangs forever?
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Lissa,

   I don't think this is network related: "tcpdump" and "netstat" showed
the right stuff. Again, how can I get an insight on what xCATd is thinking?


Thx,
Dario



------------------------------

Message: 7
Date: Mon, 17 Oct 2011 15:47:54 +0200
From: Dario Dorella <[email protected]>
Subject: [xcat-user] updateflag.awk hangs forever?
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

I looked on the installing node, under /tmp/ and in ~root, for log files
and messages. The problem seems to be that xCATd never answers with
"ready"; all I can get is an empty response from $MASTER:3002.

I was able to replicate the problem on a working environment by breaking
name resolution, but on the machines where this happens unprovoked,
node name resolution seems to be fine.

Is there any way I can trace what's happening from the xCATd point of
view? I want to know why when it receives the call on 3002 it answers
with an empty string.


Thx,
Dario



------------------------------

Message: 8
Date: Mon, 17 Oct 2011 13:06:54 -0400
From: Lissa Valletta <[email protected]>
Subject: Re: [xcat-user] updateflag.awk hangs forever?
To: xCAT Users Mailing list <[email protected]>
Cc: [email protected]
Message-ID:
        <of698cbd29.781aaf5b-on8525792c.005d7216-8525792c.005e0...@us.ibm.com>
Content-Type: text/plain; charset=US-ASCII

If you look on the node, you will find /tmp/mypostscript, which is the
script that runs after install. The last thing it does is run
updateflag.awk $MASTER 3002 "installstatus booted", which sends
the status over port 3002 to $MASTER.
$MASTER should be defined earlier in the script in an export, as shown below.
If for some reason the node cannot contact the Management Node at the address
defined there, the booted status never gets set.

For example:
/tmp/mypostscript:
.
.
MASTER=10.16.0.103
export MASTER
.
.
.
updateflag.awk $MASTER 3002 "installstatus booted"

You could also check /var/log/xcat/xcat.log on the node.
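
To confirm which address the node is actually using, the value can be read
back out of the generated script. This is a minimal sketch, assuming the
assignment appears on its own line as in the example above.
master_from_postscript is a hypothetical helper, not an xCAT command:

```bash
# Hypothetical helper: print the value assigned to MASTER in a postscript
# (defaults to /tmp/mypostscript). Accepts "MASTER=..." or "export MASTER=...";
# if there are several assignments, the last one wins. Double quotes around
# the value are stripped.
master_from_postscript() {
    awk -F= '
        /^[ \t]*(export[ \t]+)?MASTER=/ { v = $2; gsub(/"/, "", v) }
        END { if (v != "") print v }
    ' "${1:-/tmp/mypostscript}"
}
```

ping -c 1 "$(master_from_postscript)" then tests basic reachability of that
address from the node.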

Lissa K. Valletta
2-3/T12
Poughkeepsie, NY 12601
(tie 293) 433-3102





From:   Dario Dorella <[email protected]>
To:     [email protected]
Date:   10/17/2011 10:13 AM
Subject:        [xcat-user] updateflag.awk hangs forever?



Hi Lissa,

   I don't think this is network related: "tcpdump" and "netstat" showed
the right stuff. Again, how can I get an insight on what xCATd is thinking?


Thx,
Dario







------------------------------



End of xCAT-user Digest, Vol 26, Issue 10
*****************************************
