Hi
I'd suggest a reinstall, and if this occurs again I will investigate further.
Not a great solution, I grant, but I suspect that one or more patches did
not install or back out properly.
Enda
Brian Kolaci wrote:
That's what I was thinking - do a re-install. I ran broken for a month.
I was at my limit and gave the directory moves a shot.
Prior to that I did updates, and even upgrades (it originally happened
on a U1 install), but it continued to fail after the U2 and even U3b3
upgrades.
I did the grep and only the packages SUNWrcmdr and SUNWtnetr have
postinstalls that reference pam.conf.
But like I said, ever since I did the "move the directory entry of SUNWcsr
before the others", the system has been working again. But I believe I
can still recreate this on my other BE.
Let me know if you want me to investigate further with the other BE
and I will, otherwise I'll just forget about it and hope it doesn't
happen again...
Enda O'Connor ( Sun Micro Systems Ireland) wrote:
Hi
I strongly suspect that somehow updatemanager has created some
inconsistency.
If the old BE lists pam.conf as type e, class pamconf, in
/var/sadm/install/contents, then as long as the pspool pkgmap entry is
also e pamconf, this should just install.
Perhaps grep for pam.conf in
/var/sadm/pkg/*/save/pspool/*/install/*
and see if something else is modifying this file outside of
those packages that depend on SUNWcsr.
I'm running out of ideas short of a reinstall.
Enda
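The search suggested above could be scripted roughly as follows. This is only a sketch: the function name find_pamconf_refs and its optional root argument are made up for illustration, and it assumes the save/pspool layout quoted in this thread. It redirects grep output instead of using grep -q, since /usr/bin/grep on Solaris 10 may not support that option.

```shell
# Hypothetical helper: list package install scripts that mention pam.conf.
# The first argument overrides the package root (default /var/sadm/pkg),
# which makes it possible to point the search at a mounted alternate BE.
find_pamconf_refs() {
    pkgroot=${1:-/var/sadm/pkg}
    for f in "$pkgroot"/*/save/pspool/*/install/*; do
        [ -f "$f" ] || continue
        # Print the file name only when the script references pam.conf.
        if grep 'pam\.conf' "$f" >/dev/null 2>&1; then
            echo "$f"
        fi
    done
}
```

Calling find_pamconf_refs with no argument searches the running system; any file it prints that does not belong to SUNWcsr or a package depending on SUNWcsr would be worth a closer look.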
Brian Kolaci wrote:
Yep, it's there. And the dependency is listed there.
The save/pspool... pkgmap entry is:
1 e pamconf etc/pam.conf 0644 root sys 3224 28137 1106348054
I'm just stunned/amazed that the "directory swapping" actually worked.
But I know that was the only change that fixed the issue. I had created
a zone right before it, did the directory entry swapping and created
another zone right after it, and the second zone was OK.
Also before the directory swapping I used DTrace to confirm that
something else was manipulating /etc/pam.conf in the zone prior
to the pamconf action script.
I still have an old BE that had the issue (with Solaris 10 U3 beta 3).
I can boot into that to diagnose this further (just go to single user,
mount it and luactivate it, right?) Let me know what I should do/try
to find out what is being installed in the zone prior to SUNWcsr.
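The pkgmap entry quoted in this message can also be checked mechanically. The sketch below assumes the pkgmap(4) field order seen in that line (part, ftype, class, path, ...); the function name pamconf_entry_ok and its return codes are illustrative choices, not part of any Solaris tool.

```shell
# Hypothetical check: does a pkgmap list etc/pam.conf as type "e", class "pamconf"?
# Assumed field order: part ftype class path mode owner group size cksum modtime
# Returns 0 if the entry looks right, 1 if it has another type or class,
# and 2 if no etc/pam.conf entry is present at all.
pamconf_entry_ok() {
    pkgmap=$1
    while read part ftype class path rest; do
        if [ "$path" = "etc/pam.conf" ]; then
            if [ "$ftype" = "e" ] && [ "$class" = "pamconf" ]; then
                return 0
            else
                return 1
            fi
        fi
    done < "$pkgmap"
    return 2
}
```

For example, pamconf_entry_ok /var/sadm/pkg/SUNWcsr/save/pspool/SUNWcsr/pkgmap should succeed on a system whose entry matches the line shown above.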
Enda O'Connor ( Sun Micro Systems Ireland) wrote:
Hi
There should be a file
/var/sadm/pkg/SUNWkrbu/save/pspool/SUNWkrbu/install/depend
which should explicitly call out SUNWcsr.
Could you check the /var/sadm/pkg/SUNWcsr/save/pspool/SUNWcsr/pkgmap
and tell me what the entry is for pam.conf?
I suspect that this entry is corrupt.
I very much suspect that updatemanager has caused the system to
become out of sync.
Basically, in my experience updatemanager needs a lot of space in
/var to work. It downloads and uncompresses the patches, which can
consume a lot of space when installing many patches, given that the
average patch is about 6 MB zipped to start with.
Enda
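A quick way to confirm that a depend file really calls out SUNWcsr might look like the sketch below. It assumes the depend(4) convention that prerequisite lines start with "P" followed by the package abbreviation; the function name has_prereq is hypothetical.

```shell
# Hypothetical helper: succeed if DEPEND_FILE names PKG as a prerequisite.
# Assumes depend(4) lines of the form "P <pkgabbrev> <package name>".
# Returns 0 if found, 1 if not listed, 2 if the depend file is missing.
has_prereq() {
    depend_file=$1
    want=$2
    [ -f "$depend_file" ] || return 2
    while read type name rest; do
        if [ "$type" = "P" ] && [ "$name" = "$want" ]; then
            return 0
        fi
    done < "$depend_file"
    return 1
}
```

Usage, with the path from this message: has_prereq /var/sadm/pkg/SUNWkrbu/save/pspool/SUNWkrbu/install/depend SUNWcsr && echo "dependency present".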
Brian Kolaci wrote:
I believe that SUNWkrbu is creating the file /etc/pam.conf.
How are the dependencies expressed? What tells (or should tell)
the system that SUNWcsr must be installed before SUNWkrbu (or
any other package)?
I don't have the logs from the failed update, but this all started
when "updatemanager" failed to update patches on the system. The
reason for updatemanager failing is clear though (zonepath for the
local zones was on the root filesystem and it filled to 100% and
couldn't boot the zones since the /zones filesystem wasn't mounted).
Enda O'Connor ( Sun Micro Systems Ireland) wrote:
Hi Brian
It is not clear to me why this is happening; my system appears to have
correct dependencies on SUNWcsr. The only way that I know that
this could happen is if etc/pam.conf got converted to type "f" by
mistake, which does not appear to be the case on your failed machine,
or
a package creates pam.conf without a dependency on SUNWcsr,
or the pspool/SUNWcsr/pkgmap entry for pam.conf is damaged.
What is the pam.conf entry like in
/var/sadm/pkg/SUNWcsr/save/pspool/SUNWcsr/pkgmap (is it type e
with the pamconf CAS)?
It almost looks like the system has somehow been corrupted by a patchrm
failure.
But that is just a guess without any concrete evidence.
Did you remove any patches (if so, which ones)?
Do you have an install_log from a failed zone install (in
/var/sadm/system/logs in the non-global zone)?
What release and what patches have you applied to the failed system?
Brian Kolaci wrote:
Hi Enda,
the grep returns:
/etc/pam.conf e pamconf 0644 root sys 3224 28137 1106348054 SUNWcsr
Yes, there was one of the other packages. I looked and it appears
it was SUNWkrbu.
Strangely, when I mv'd the directories around (at least the way I did it),
the order they came back from ls changed, which was enough to make it
start working again.
My rationale was to look at the directories there, then clear the
directory entry toward the top of the list by moving that directory out
of the way, then moving SUNWcsr back into the package directory (again
assuming that it will be assigned to the first free slot), then move the
other directory back in. Like I said, it did work, but having that work
didn't give me a warm & fuzzy feeling...
Enda O'Connor ( Sun Micro Systems Ireland) wrote:
Hi Brian
Packages are installed in dependency order, i.e. SUNWcsr always
installs before any other package that requires it, no matter
how they are ordered in /var/sadm/pkg. Any patch can
change the order of the packages in /var/sadm/pkg with respect to
time stamps etc., so we do need to do this in dependency order.
For example, SUNWtnetr has /var/sadm/pkg/SUNWtnetr/install/depend,
which calls out SUNWcsr.
I mv'ed packages in /var/sadm/pkg around and it had no effect on
the ordering; see <zone-path>/root/var/sadm/system/logs/install_log.
It should never change with respect to SUNWcsr: SUNWcsr will always
install before any other package that calls it out as a dependency.
Could I see a log of a failed install? It seems some package is
installing pam.conf without a dependency on SUNWcsr.
But unless the system is corrupted the packages you mention:
SUNWsshdr SUNWtnetr SUNWrcmdr SUNWwebminu
all install after SUNWcsr.
What does grep etc/pam.conf /var/sadm/install/contents say?
Enda
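To illustrate what "dependency order" means here, the sketch below derives an order from the install/depend files with the standard tsort utility. It is not how zoneadm or pkgadd actually computes the order (the thread does not show that); the function name dep_order and the optional root argument are assumptions for illustration, and it needs a POSIX shell.

```shell
# Hypothetical sketch: print packages so that prerequisites come first.
# Reads <pkgroot>/<pkg>/install/depend, treating "P <pkg>" lines as prerequisites.
dep_order() {
    pkgroot=${1:-/var/sadm/pkg}
    for d in "$pkgroot"/*; do
        [ -d "$d" ] || continue
        pkg=$(basename "$d")
        # An identical pair tells tsort the package exists without ordering it.
        echo "$pkg $pkg"
        dep="$d/install/depend"
        [ -f "$dep" ] || continue
        while read type name rest; do
            if [ "$type" = "P" ]; then
                # "prerequisite dependent": the prerequisite sorts first.
                echo "$name $pkg"
            fi
        done < "$dep"
    done | tsort
}
```

Because the order comes only from the depend files, the position of a directory inside /var/sadm/pkg makes no difference to the output, which is the behaviour described above.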
Brian Kolaci wrote:
Well, I finally solved this obscure case. I think this is a silly
way to determine package ordering and dependencies which can cripple
an installation. I believe a bug should be filed, but I'm not sure
what to file it against.
Due to a failed patch update, something happened with the on-disk
directory ordering in /var/sadm/pkg. I found that /etc/pam.conf
is referenced in the packages SUNWcsr SUNWsshdr SUNWtnetr SUNWrcmdr
SUNWwebminu SUNWman. Apparently their installation order is based
on the order they're returned by opendir/readdir. There should be a
dependency on SUNWcsr by all packages that reference /etc/pam.conf
in any class action scripts, since SUNWcsr needs to install it
before any other package can modify it.
So the logic to determine package installation order needs to
be updated to include the above dependency. What dependency checks
are used to calculate the order? Is there any "official" order the
packages should be installed in (so that I may rebuild the
/var/sadm/pkg directory to be in the proper order)?
Should you run into this problem, I'll quickly post how I fixed this.
# cd /var/sadm/pkg
# ls -ltd SUNWcsr SUNWsshdr SUNWtnetr SUNWrcmdr SUNWwebminu
If SUNWcsr isn't at the top of the list, you're going to have problems.
I fixed it by creating a tmp directory and moving the one at the top of
the list into it, then mv SUNWcsr to the tmp folder, then mv SUNWcsr back,
followed by moving the other directory back.
# mkdir tmp
# mv SUNWtnetr tmp
# mv SUNWcsr tmp
# ls -ltd SUNWcsr SUNWsshdr SUNWtnetr SUNWrcmdr SUNWwebminu
SUNWcsr: No such file or directory
SUNWtnetr: No such file or directory
drwxr-xr-x 4 root root 512 Jul 18 15:22 SUNWsshdr
drwxr-xr-x 4 root root 512 Jul 18 15:21 SUNWrcmdr
drwxr-xr-x 4 root root 512 May 3 2005 SUNWwebminu
# mv tmp/SUNWcsr .
# mv tmp/SUNWtnetr .
# ls -ltd SUNWcsr SUNWsshdr SUNWtnetr SUNWrcmdr SUNWwebminu
and maybe I just got lucky, but SUNWcsr was now at the top of the list.
All zone creations now work properly and /etc/pam.conf matches the
global zone.
Brian Kolaci wrote:
Thanks for the reply.
I'm digging through i.pamconf to find out why it's not copying the file.
This seems to be the problem. It's doing the editing, but not the
initial copy of the file. I checked the CLEANUP_FILE and found that
it had logged messages "default entries updated", which means it is
not copying the file, which means the file already exists.
Perhaps there's some kind of package installation ordering issue.
The i.pamconf script checks for the existence of /etc/pam.conf and
only copies it if it doesn't exist. If another package that tries to
manipulate /etc/pam.conf gets installed before SUNWcsr and actually
creates it, then the copy will never be done. This looks like what is
happening.
What order are the packages installed in? Is there a way to adjust
that order to ensure that SUNWcsr comes before the other one(s) that
are manipulating the file? What is the correct order for
package installation?
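The "copy only if missing, otherwise edit" behaviour described above follows the usual class action script pattern, where the script reads source/destination pairs on standard input. The sketch below is a simplified stand-in, not the real i.pamconf: the function name install_or_edit and the messages it prints are invented for illustration.

```shell
# Simplified sketch of a class action script loop (NOT the real i.pamconf).
# Each stdin line holds "<src> <dest>": the packaged file and its target path.
install_or_edit() {
    while read src dest; do
        if [ ! -f "$dest" ]; then
            # Target absent: install the packaged copy as-is.
            cp "$src" "$dest"
            echo "copied $dest"
        else
            # Target already present: a real script would only merge entries here,
            # so a stub created earlier by another package is never replaced.
            echo "exists, would edit $dest"
        fi
    done
}
```

Feeding it a destination that some other script has already created shows the failure mode: the packaged file is never copied, and only the edit branch runs.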
Renaud Manus wrote:
The SUNWcsr pkgmap defines /etc/pam.conf as an 'e' (editable) type file
with a class action script 'pamconf'. In this situation, when you install
a new zone and it comes to installing the SUNWcsr package, the class action
script will just copy the file from /var/sadm/pkg/SUNWcsr/save/... to
[ZONEROOTPATH]/etc.
After that, it's possible that some packages (e.g. SUNWtnetr) need to
modify pam.conf, to add new entries for example; they do so in their
postinstall script.
You could find all the files on both systems that manipulate
pam.conf and compare them.
e.g.
# find /var/sadm/pkg/SUNWcsr -type f -exec /usr/xpg4/bin/grep -q pam.conf {} \; -print
-- Renaud
Brian Kolaci wrote:
Hi,
I'm still having zone creation issues where my /etc/pam.conf is corrupt.
I have 2 machines, one works fine, the other always creates the
zone with a bad /etc/pam.conf.
I used the DTrace Toolkit "opensnoop" program to watch on both machines.
I see on the "good" machine, where it creates the /etc/pam.conf correctly,
that a process properly copies the file from the pspool directory:
0 29509 cp 4 /var/sadm/pkg/SUNWcsr/save/pspool/SUNWcsr/reloc/etc/pam.conf
This happens during the "Initializing package <x> of <y>:
percent complete: ??%" phase.
I never see this on the machine having issues. In fact what I do see is:
0 16561 cat -1 /pool1/zones/bktest2/root/etc/pam.conf
0 16564 grep 7 /pool1/zones/bktest2/root/etc/pam.conf
0 16565 sh 7 /pool1/zones/bktest2/root/etc/pam.conf
0 17485 cat 6 /pool1/zones/bktest2/root/etc/pam.conf
0 17487 cat 6 /tmp/pam.conf.17484
0 17489 grep 6 /pool1/zones/bktest2/root/etc/pam.conf
0 17490 sh 6 /pool1/zones/bktest2/root/etc/pam.conf
0 17491 grep 6 /pool1/zones/bktest2/root/etc/pam.conf
0 17492 sh 6 /pool1/zones/bktest2/root/etc/pam.conf
so it appears to be trying to manipulate the file rather than just copy it.
What determines whether a file is copied from the save/pspool/... directory
rather than just a postinstall script trying to manipulate it?
I've even upgraded the system to the latest U3 beta and the problem persists.
Is the process flow for creating zones documented somewhere?
_______________________________________________
zones-discuss mailing list
zones-discuss@opensolaris.org