Re: [xcat-user] xCAT 2.9.1, problem kickstarting centos7

2015-11-24 Thread Josh Nielsen
Also, there are two logs you can use to investigate postscripts
further. As long as you can get onto the deployed OS (that is, the
deployment is not a total failure), when postscripts do not complete
you can look in that node's local /var/log/xcat/xcat.log. The second
log is optional and you have to set it up yourself in the kickstart
(for testing you can just edit the /install/autoinst/[nodename]
kickstart file directly): specify a log file for your %post section
like this: %post --log=/root/ks-post.log. If all else fails you can
add your own debugging messages to xcatdsklspost to track how far you
are getting.
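
As a rough sketch of how those two logs could be scanned on the node after
a failed deployment, something like the following Python might help. The
log paths are the ones mentioned above; the keyword list and the helper
names suspicious_lines and scan are my own assumptions, not anything xCAT
provides.

```python
import re

# Log locations named in this message. ks-post.log only exists if the
# kickstart %post section was started with --log=/root/ks-post.log.
LOGS = ["/var/log/xcat/xcat.log", "/root/ks-post.log"]

# Assumed keywords; adjust to whatever your postscripts print on failure.
PATTERN = re.compile(r"error|fail|fatal|no such file", re.IGNORECASE)


def suspicious_lines(lines):
    """Return (line_number, text) pairs for lines matching PATTERN."""
    return [(n, line.rstrip("\n"))
            for n, line in enumerate(lines, start=1)
            if PATTERN.search(line)]


def scan(paths=LOGS):
    """Print matching lines from every log that can be read on this node."""
    for path in paths:
        try:
            with open(path, errors="replace") as fh:
                hits = suspicious_lines(fh)
        except OSError as exc:
            print(f"{path}: cannot read ({exc.strerror})")
            continue
        for n, text in hits:
            print(f"{path}:{n}: {text}")
```

Calling scan() as root on the deployed node prints each matching line with
its file name and line number, which is usually enough to see where the
postscripts stopped.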

I hope that helps in some way.

Regards,
Josh


Re: [xcat-user] xCAT 2.9.1, problem kickstarting centos7

2015-11-24 Thread Josh Nielsen
Hi Nathan,

Well, I may not be of any help at all (and I'm not familiar with CentOS 7
deployment), but since you mentioned DNS it reminded me that I once ran
makedns -e [external_server] for some CentOS 6.5 nodes and for some reason
only the forward lookup entries were added to the DNS server, not the
reverse entries. My postscripts were failing each time for an unknown
reason when I attempted to deploy the nodes, and it turned out that they
couldn't complete without the reverse lookup because some of the xCAT code
in xcatdsklspost queries it.

After running the command manually I saw this error:

    /opt/xcat/xcatdsklspost 6

    awk: //xcatpost/updateflag.awk:22: fatal: remote host and port information (3002, installstatus booted) invalid

When I searched for a solution to this error I found this message, which
pointed me to a reverse lookup problem:
http://sourceforge.net/p/xcat/mailman/message/27872412/

So maybe that's just one more thing to strike off your DNS and postscript
checklist, if you haven't already.
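
For anyone wanting to rule this out quickly, here is a minimal sketch of a
forward-plus-reverse lookup check in Python. It uses only the standard
socket module; the function name check_forward_reverse is made up for
illustration, and the lookup functions are passed in as parameters so the
logic can be tried without a live DNS server.

```python
import socket


def check_forward_reverse(name,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve name to an address, then resolve that address back.

    Returns (ip, reverse_names, ok). ok is True only when the reverse
    lookup succeeds and one of the returned names matches `name`,
    either exactly or by short hostname.
    """
    try:
        ip = forward(name)
    except OSError:
        return None, [], False      # forward lookup failed
    try:
        primary, aliases, _addrs = reverse(ip)
    except OSError:
        return ip, [], False        # forward works, reverse entry missing
    names = [primary] + list(aliases)
    short = name.split(".")[0].lower()
    ok = any(n.lower() == name.lower() or n.split(".")[0].lower() == short
             for n in names)
    return ip, names, ok


# Example call (needs working name resolution, so it is left commented out):
# print(check_forward_reverse("your-node-name"))
```

A result with an address but ok set to False is the situation I ran into:
the forward entry exists and the reverse entry does not.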

Regards,
Josh


Re: [xcat-user] xCAT 2.9.1, problem kickstarting centos7

2015-11-24 Thread Heald, Nathan T.
No, this is still in the same state.

As best as I can tell DNS resolution is working, but it’s possible I’m 
mistaken. No post scripts run so not much should be different once it boots a 
second time (and I run a "nodeset boot" before it starts a 2nd install). It can 
ping the xcat server’s hostname fine once it comes back up and I can examine 
the node. I haven’t had much more time to work on this, but I do plan to keep 
chipping away at it as time allows.

This is the xcat error I see for reference:
Oct 21 16:35:15 oss01 systemd: Starting LSB: xCATpost...
Oct 21 16:35:15 oss01 xcatpostinit1: /opt/xcat/xcatinstallpost: line 9: /xcatpost/xcatlib.sh: No such file or directory
Oct 21 16:35:15 oss01 systemd: xcatpostinit1.service: control process exited, code=exited status=1
Oct 21 16:35:15 oss01 systemd: Failed to start LSB: xCATpost.
Oct 21 16:35:15 oss01 systemd: Unit xcatpostinit1.service entered failed state.
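
The log shows xcatinstallpost failing because /xcatpost/xcatlib.sh does not
exist. A quick way to confirm what is and is not on the node is a sketch
like the one below. The path list is taken from this thread and may not be
complete, and missing_paths is a hypothetical helper, not an xCAT tool.

```python
import os

# Paths mentioned in this thread. /xcatpost is expected to be populated
# during the install; the /opt/xcat scripts are the ones that use it.
EXPECTED = [
    "/opt/xcat/xcatinstallpost",
    "/opt/xcat/xcatdsklspost",
    "/xcatpost",
    "/xcatpost/xcatlib.sh",
]


def missing_paths(paths=EXPECTED, exists=os.path.exists):
    """Return the entries of `paths` for which `exists` is false."""
    return [p for p in paths if not exists(p)]
```

Running print(missing_paths()) on the node lists whichever of these paths
are absent; in my case I would expect both /xcatpost entries to show up.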

Thanks,
-Nathan


From: Josh Nielsen
Date: Monday, November 23, 2015 at 3:43 PM
To: xCAT Users Mailing list, Nathan Heald
Cc: "russa...@comcast.net"
Subject: Re: [xcat-user] xCAT 2.9.1, problem kickstarting centos7

And did you ever figure out your problem Nathan?

-Josh

On Mon, Nov 23, 2015 at 2:34 PM, Josh Nielsen wrote:
I was going to post a new thread about CentOS 7 but thought I might piggyback
on this one since it is a similar topic. I currently have xCAT 2.8.3, and it
sounds from this thread like upgrading to 2.10 is part of the solution
for deploying CentOS 7.1. The issue I'm having is that during a PXE
boot the node fetches the correct CentOS 7.1 image and begins trying to deploy,
but after it shows "Mounted Configuration File System" and "Started Show
Plymouth Boot Screen" I get a message like this: "dracut-initqueue: Warning:
Could not boot" and it hangs there.

The error says little about what caused it, and I didn't learn much
from removing "quiet" from the PXE kernel boot parameters. I also can't get the
Ctrl+Alt+F keys to work, at least in a VM, for switching between terminals like
you could in CentOS 6 (which was very helpful for debugging).

Is this most likely because the kickstart file itself is not formatted
correctly (I'm reusing my CentOS 6.4 kickstart until I figure out how CentOS
7.1 differs), or perhaps because the kickstart is not being properly fetched via
the gpxe configuration in /tftpboot/xcat/xnba/nodes? Currently the gpxe boot
configuration for the node I'm trying to deploy CentOS 7.1 to looks like this:

#!gpxe
#install centos7.1-x86_64-compute
imgfetch -n kernel http://${next-server}/tftpboot/xcat/osimage/centos7.1-x86_64-install-compute/vmlinuz
imgload kernel
imgargs kernel repo=http://10.20.0.101:80/install/centos7.1/x86_64 ks=http://10.20.0.101:80/install/autoinst/node0067c ksdevice=bootif cmdline console=tty0 console=ttyS0,115200n8r BOOTIF=01-${netX/machyp}
imgfetch http://${next-server}/tftpboot/xcat/osimage/centos7.1-x86_64-install-compute/initrd.img
imgexec kernel
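
To double check which kickstart and repo URLs the node is actually being
handed, the kernel arguments can be pulled out of a file like this with a
few lines of Python. This is only a sketch: kernel_args is a hypothetical
helper, and it assumes the imgargs line is a single unwrapped line, as it
is in the file on disk.

```python
def kernel_args(gpxe_text):
    """Collect kernel arguments from the 'imgargs kernel' line of a gPXE script.

    Every key maps to a list of its values in order of appearance;
    bare flags such as 'cmdline' are stored with the value True.
    """
    args = {}
    for line in gpxe_text.splitlines():
        parts = line.split()
        if parts[:2] != ["imgargs", "kernel"]:
            continue
        for token in parts[2:]:
            key, sep, value = token.partition("=")
            args.setdefault(key, []).append(value if sep else True)
    return args


# The imgargs line from the configuration above, as a test input.
SAMPLE = """#!gpxe
imgload kernel
imgargs kernel repo=http://10.20.0.101:80/install/centos7.1/x86_64 ks=http://10.20.0.101:80/install/autoinst/node0067c ksdevice=bootif cmdline console=tty0 console=ttyS0,115200n8r BOOTIF=01-${netX/machyp}
imgexec kernel
"""

args = kernel_args(SAMPLE)
print("kickstart:", args.get("ks"))
print("repo:", args.get("repo"))
```

The printed ks URL can then be fetched by hand from another machine (with
curl or a browser) to confirm the kickstart file is really being served.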

I manually changed ksdevice to "bootif" just to make sure it was using the
right interface (it was set to eth0, but CentOS 7.1 got rid of "eth" for "em",
didn't it?). Nonetheless, I think my kickstart is being successfully fetched,
because I changed the disk formatting commands in the kickstart and when I
removed "quiet" from the boot I saw errors related to disk
formatting/partitioning. So I'm partial to thinking that the old CentOS 6.4
kickstart configuration is not 100% compatible with CentOS 7.1, but I want to
double check what xCAT 2.10 brings to the table that might be necessary
for CentOS 7.1 deployment. Comments or thoughts?

Regards,
Josh

On Fri, Oct 23, 2015 at 1:44 PM, Russell Auld wrote:
It looks like xcatdsklspost does get called even for stateful installs. Look at
the script header.
Usually in cases like this, the issue is that the node being imaged can't
resolve the name of the master node. Make sure your DNS is working properly.

On Oct 23, 2015 12:52 PM, "Heald, Nathan T." wrote:
>
> To follow up:
>
> I have resolved the PXE problem by upgrading further to xCAT 2.10. Now it
> sets kickstart parameters that CentOS 7 responds to.
>
> However, I have a new problem: the rinstall is now looping. I’ve gotten as far
> as seeing that "/xcatpost" is never created on my stateful install. The xCAT
> debugging page suggests networking problems as the first thing to check. So
> far I’ve not found anything on that front. I can’t find what specifically
> creates /xcatpost during the install. I see that /opt/xcat was created which
>